Error Correlation Guide
This guide covers the error correlation analysis features, including release correlation, user correlation, and time-based correlation.
⚙️ Optional Feature - Error correlation is disabled by default. Enable it in your initializer:
RailsErrorDashboard.configure do |config|
config.enable_error_correlation = true
end
Table of Contents
- Overview
- Release Correlation
- User Correlation
- Time-Based Correlation
- Period Comparison
- Use Cases
- Configuration
- Best Practices
Overview
Error correlation helps you answer critical questions:
- Release Impact: Did this version introduce more errors?
- User Impact: Are the same users experiencing multiple issues?
- Temporal Patterns: Do certain errors occur together?
- Trend Analysis: Are errors increasing or decreasing?
Why Correlation Matters
Understanding correlations enables:
- Release Quality: Identify problematic releases quickly
- User Experience: Find users most impacted by errors
- Root Cause Analysis: Discover hidden relationships between errors
- Proactive Fixes: Address issues before they escalate
Release Correlation
What is Release Correlation?
Release correlation links errors to specific app versions or git commits, helping you identify which releases introduced errors.
Errors by App Version
Track errors per version to spot problematic releases:
correlation = RailsErrorDashboard::Queries::ErrorCorrelation.new(days: 30)
versions = correlation.errors_by_version
versions.each do |version, data|
puts "Version: #{version}"
puts " Total Errors: #{data[:count]}"
puts " Error Types: #{data[:error_types]}"
puts " Critical Errors: #{data[:critical_count]}"
puts " Platforms: #{data[:platforms].join(', ')}"
puts " First Seen: #{data[:first_seen]}"
puts " Last Seen: #{data[:last_seen]}"
end
Output:
Version: 2.1.0
Total Errors: 450
Error Types: 23
Critical Errors: 12
Platforms: iOS, Android
First Seen: 2025-12-20 09:00:00
Last Seen: 2025-12-25 14:30:00
Version: 2.0.5
Total Errors: 120
Error Types: 8
Critical Errors: 2
Platforms: iOS, Android
First Seen: 2025-12-15 10:00:00
Last Seen: 2025-12-24 16:00:00
Metrics Explained:
- Total Errors: Sum of all error occurrences for this version
- Error Types: Number of distinct error types (e.g., NoMethodError, ArgumentError)
- Critical Errors: Count of high-severity errors
- Platforms: Which platforms are running this version
- First/Last Seen: Time range for this version’s errors
Errors by Git SHA
For more granular tracking, correlate errors with specific commits:
shas = correlation.errors_by_git_sha
shas.each do |sha, data|
puts "Commit: #{sha[0..7]}"
puts " Total Errors: #{data[:count]}"
puts " Error Types: #{data[:error_types]}"
puts " App Versions: #{data[:app_versions].join(', ')}"
end
Use Case: Identify the exact commit that introduced a bug.
Example:
Commit: a3b4c5d6
Total Errors: 280
Error Types: 15
App Versions: 2.1.0, 2.1.1
Commit: e7f8g9h0
Total Errors: 45
Error Types: 5
App Versions: 2.1.0
Interpretation: Commit a3b4c5d6 is associated with significantly more errors → likely contains a bug.
Problematic Releases
Automatically identify releases with abnormally high error rates:
problematic = correlation.problematic_releases
problematic.each do |release|
puts "⚠️ Version #{release[:version]}"
puts " Errors: #{release[:error_count]}"
puts " Deviation: +#{release[:deviation_from_avg]}% from average"
puts " Critical: #{release[:critical_count]}"
puts " Error Types: #{release[:error_types]}"
end
Algorithm:
- Calculate average errors per version
- Flag versions with >2x the average
- Sort by error count (worst first)
Output:
⚠️ Version 2.1.0
Errors: 450
Deviation: +275% from average
Critical: 12
Error Types: 23
Interpretation: Version 2.1.0 has 275% more errors than average → rollback or hotfix urgently.
Configuration Requirements
To enable release correlation, errors must include version metadata:
# When logging errors
RailsErrorDashboard::Commands::LogError.call(
error_type: "NoMethodError",
message: "undefined method 'name'",
backtrace: [...],
app_version: "2.1.0", # Required for version correlation
git_sha: "a3b4c5d6e7f8", # Optional, for commit correlation
# ...
)
Database Columns:
app_version(string) - Semantic version (e.g., “2.1.0”)git_sha(string) - Full git commit hash
If these columns are missing, correlation queries return empty results.
User Correlation
What is User Correlation?
User correlation identifies users experiencing multiple error types, helping you understand user impact and prioritize fixes.
Multi-Error Users
Find users affected by multiple different error types:
correlation = RailsErrorDashboard::Queries::ErrorCorrelation.new(days: 30)
multi_error_users = correlation.multi_error_users(min_error_types: 2)
multi_error_users.each do |user_data|
puts "User: #{user_data[:user_email]}"
puts " Error Types: #{user_data[:error_type_count]}"
puts " Total Errors: #{user_data[:total_errors]}"
puts " Types: #{user_data[:error_types].join(', ')}"
end
Output:
User: user@example.com
Error Types: 5
Total Errors: 23
Types: NoMethodError, ArgumentError, NetworkError, TimeoutError, ValidationError
User: another@example.com
Error Types: 3
Total Errors: 12
Types: PaymentError, DatabaseError, CacheError
Interpretation:
user@example.comexperienced 5 different error types → severely impacted- They hit 23 total errors → frustrating experience
- Reach out to this user, investigate their usage patterns
Parameters:
min_error_types(default: 2) - Minimum distinct error types to include
Error Type User Overlap
Calculate how many users experience two specific error types:
overlap = correlation.error_type_user_overlap(
"NoMethodError",
"ArgumentError"
)
puts "Users with NoMethodError: #{overlap[:users_a_count]}"
puts "Users with ArgumentError: #{overlap[:users_b_count]}"
puts "Users with both: #{overlap[:overlap_count]}"
puts "Overlap percentage: #{overlap[:overlap_percentage]}%"
puts "Sample users: #{overlap[:overlapping_user_ids].join(', ')}"
Output:
Users with NoMethodError: 150
Users with ArgumentError: 120
Users with both: 80
Overlap percentage: 66.7%
Sample users: 45, 67, 89, 102, 134, 156, 178, 190, 203, 245
Interpretation:
- 66.7% of users with ArgumentError also get NoMethodError
- High overlap suggests related errors or common user path
- Fixing NoMethodError may also reduce ArgumentError
Use Cases:
- Identify cascading errors (one error causes another)
- Find common user workflows triggering multiple errors
- Prioritize fixes with highest user impact
Configuration Requirements
User correlation requires:
user_idcolumn in error_logs table- User model with
emailattribute (configurable)
# config/initializers/rails_error_dashboard.rb
RailsErrorDashboard.configure do |config|
config.user_model = "User" # Your user model class name
end
# When logging errors
RailsErrorDashboard::Commands::LogError.call(
error_type: "NoMethodError",
user_id: current_user.id, # Required for user correlation
# ...
)
Time-Based Correlation
What is Time-Based Correlation?
Time-based correlation finds error types that occur at similar times of day, suggesting shared root causes or triggers.
Analyzing Time Correlation
correlation = RailsErrorDashboard::Queries::ErrorCorrelation.new(days: 30)
time_correlated = correlation.time_correlated_errors
time_correlated.each do |pair_name, data|
puts "#{data[:error_type_a]} ↔ #{data[:error_type_b]}"
puts " Correlation: #{(data[:correlation] * 100).round}%"
puts " Strength: #{data[:strength]}"
end
Output:
NoMethodError ↔ DatabaseError
Correlation: 85%
Strength: strong
TimeoutError ↔ NetworkError
Correlation: 92%
Strength: strong
PaymentError ↔ ValidationError
Correlation: 62%
Strength: moderate
Interpretation:
- NoMethodError and DatabaseError occur at similar hours (85% correlation)
- Likely shared root cause: database issues trigger NoMethodError
- TimeoutError and NetworkError also correlate (92%)
- Both probably caused by network infrastructure issues
Correlation Algorithm
Uses Pearson correlation coefficient to measure similarity:
def calculate_time_correlation(series_a, series_b)
# series_a and series_b are arrays of 24 elements (hourly counts)
# Calculate means
mean_a = series_a.sum.to_f / 24
mean_b = series_b.sum.to_f / 24
# Calculate covariance and standard deviations
covariance = 0.0
std_a = 0.0
std_b = 0.0
24.times do |i|
diff_a = series_a[i] - mean_a
diff_b = series_b[i] - mean_b
covariance += diff_a * diff_b
std_a += diff_a**2
std_b += diff_b**2
end
# Pearson correlation
correlation = covariance / Math.sqrt(std_a * std_b)
correlation.round(3)
end
Correlation Values:
- 1.0 = Perfect positive correlation (always occur together)
- 0.5-1.0 = Strong correlation (often occur together)
- 0.0-0.5 = Weak correlation (sometimes occur together)
- -1.0-0.0 = Negative correlation (occur at opposite times)
Strength Classification:
- Strong: correlation >= 0.8
- Moderate: correlation >= 0.5
- Weak: correlation < 0.5 (filtered out by default)
Use Cases for Time Correlation
1. Identify Cascading Failures:
DatabaseError ↔ CacheError (90% correlation)
→ Database issues cause cache to fail
→ Fix database to resolve both
2. Find Shared Dependencies:
PaymentError ↔ ExternalAPIError (85% correlation)
→ Both depend on third-party payment API
→ Add retry logic for API timeouts
3. Detect Infrastructure Issues:
All errors spike at 2 AM (high correlation)
→ Nightly backup job causing resource contention
→ Reschedule backup or scale infrastructure
Period Comparison
What is Period Comparison?
Period comparison shows how error rates changed between two time periods, helping identify trends.
Comparing Periods
correlation = RailsErrorDashboard::Queries::ErrorCorrelation.new(days: 30)
comparison = correlation.period_comparison
puts "Current Period (last #{days/2} days):"
puts " Errors: #{comparison[:current_period][:count]}"
puts " #{comparison[:current_period][:start]} to #{comparison[:current_period][:end]}"
puts "Previous Period (#{days/2} days before):"
puts " Errors: #{comparison[:previous_period][:count]}"
puts " #{comparison[:previous_period][:start]} to #{comparison[:previous_period][:end]}"
puts "Change: #{comparison[:change]} (#{comparison[:change_percentage]}%)"
puts "Trend: #{comparison[:trend]}"
Output:
Current Period (last 15 days):
Errors: 1850
2025-12-10 to 2025-12-25
Previous Period (15 days before):
Errors: 1200
2025-11-25 to 2025-12-09
Change: +650 (+54.2%)
Trend: increasing_significantly
Trend Classification
Trends are classified based on percentage change:
| Trend | Change | Meaning |
|---|---|---|
| decreasing_significantly | < -20% | Major improvement |
| decreasing | -20% to -5% | Moderate improvement |
| stable | -5% to +5% | No significant change |
| increasing | +5% to +20% | Moderate concern |
| increasing_significantly | > +20% | Major concern |
Use Cases:
- Post-release monitoring: Check if errors increased after deployment
- Improvement tracking: Verify that fixes are working
- Capacity planning: Identify long-term growth trends
Use Cases
Scenario 1: Post-Release Validation
Question: “Did v2.1.0 introduce new errors?”
Analysis:
correlation = ErrorCorrelation.new(days: 7)
versions = correlation.errors_by_version
v210 = versions["2.1.0"]
v205 = versions["2.0.5"]
puts "v2.1.0: #{v210[:count]} errors, #{v210[:error_types]} types"
puts "v2.0.5: #{v205[:count]} errors, #{v205[:error_types]} types"
puts "Increase: +#{v210[:count] - v205[:count]} errors"
Interpretation:
v2.1.0: 450 errors, 23 types
v2.0.5: 120 errors, 8 types
Increase: +330 errors
Conclusion: v2.1.0 introduced 330 more errors and 15 new error types → rollback or hotfix immediately.
Scenario 2: Identifying Power Users vs Affected Users
Question: “Who should we reach out to?”
Analysis:
multi_error_users = correlation.multi_error_users(min_error_types: 3)
multi_error_users.first(5).each do |user|
puts "#{user[:user_email]}: #{user[:error_type_count]} types, #{user[:total_errors]} errors"
end
Output:
power_user@example.com: 7 types, 45 errors
affected_user@example.com: 5 types, 23 errors
casual_user@example.com: 3 types, 8 errors
Action:
- Reach out to
power_user@example.com(most impacted) - Offer compensation or support
- Gather feedback on their experience
- Investigate their usage patterns
Scenario 3: Root Cause via Time Correlation
Question: “Why do these errors always occur together?”
Analysis:
time_correlated = correlation.time_correlated_errors
# Find errors correlated with DatabaseError
database_correlated = time_correlated.select do |_, data|
data[:error_type_a] == "DatabaseError" || data[:error_type_b] == "DatabaseError"
end
database_correlated.each do |_, data|
other_error = data[:error_type_a] == "DatabaseError" ? data[:error_type_b] : data[:error_type_a]
puts "#{other_error}: #{(data[:correlation] * 100).round}% correlation"
end
Output:
CacheError: 90% correlation
NoMethodError: 85% correlation
TimeoutError: 78% correlation
Conclusion:
- DatabaseError triggers CacheError (cache depends on database)
- DatabaseError causes NoMethodError (missing data = nil.method)
- DatabaseError leads to TimeoutError (retry logic)
Action: Fix DatabaseError to resolve all correlated errors.
Scenario 4: Trend Monitoring
Question: “Are we improving or getting worse?”
Analysis:
# Weekly checks
week1 = ErrorCorrelation.new(days: 7).period_comparison
week2 = ErrorCorrelation.new(days: 14).period_comparison
month = ErrorCorrelation.new(days: 30).period_comparison
puts "This week: #{week1[:trend]} (#{week1[:change_percentage]}%)"
puts "Last 2 weeks: #{week2[:trend]} (#{week2[:change_percentage]}%)"
puts "Last month: #{month[:trend]} (#{month[:change_percentage]}%)"
Output:
This week: decreasing (-12%)
Last 2 weeks: decreasing (-8%)
Last month: stable (+2%)
Interpretation: Recent fixes are working (decreasing trend), but long-term trend is stable.
Scenario 5: A/B Test Impact
Question: “Did the new feature increase errors?”
Analysis:
# Assume feature deployed in v2.1.0
correlation = ErrorCorrelation.new(days: 14)
# Users on new version
v210_errors = correlation.errors_by_version["2.1.0"]
# Users still on old version
v205_errors = correlation.errors_by_version["2.0.5"]
puts "New version (v2.1.0): #{v210_errors[:count]} errors"
puts "Old version (v2.0.5): #{v205_errors[:count]} errors"
Interpretation:
- If v2.1.0 has significantly more errors → feature has bugs
- If similar → feature is stable
- Compare error types to see if new errors introduced
Configuration
Required Columns
For full correlation analysis, ensure these columns exist:
# db/migrate/XXXXXX_add_correlation_columns.rb
class AddCorrelationColumns < ActiveRecord::Migration[7.0]
def change
add_column :rails_error_dashboard_error_logs, :app_version, :string
add_column :rails_error_dashboard_error_logs, :git_sha, :string
add_column :rails_error_dashboard_error_logs, :user_id, :integer
add_index :rails_error_dashboard_error_logs, :app_version
add_index :rails_error_dashboard_error_logs, :git_sha
add_index :rails_error_dashboard_error_logs, :user_id
end
end
User Model Configuration
# config/initializers/rails_error_dashboard.rb
RailsErrorDashboard.configure do |config|
config.user_model = "User" # Your user model class
end
Logging Errors with Correlation Data
# In your error handler
RailsErrorDashboard::Commands::LogError.call(
error_type: error.class.name,
message: error.message,
backtrace: error.backtrace,
# Release correlation
app_version: ENV['APP_VERSION'] || "unknown",
git_sha: ENV['GIT_SHA'] || `git rev-parse HEAD`.strip,
# User correlation
user_id: current_user&.id,
# Other metadata
platform: request.user_agent =~ /iPhone/ ? "iOS" : "Android",
# ...
)
Best Practices
1. Track App Version Consistently
Use semantic versioning:
✓ Good: "2.1.0", "2.1.1", "2.2.0"
✗ Bad: "v2.1", "2.1-beta", "latest"
Set in environment:
# config/application.rb
config.app_version = ENV['APP_VERSION'] || '1.0.0'
# Or read from package.json, build number, etc.
2. Monitor Problematic Releases Daily
Set up daily alerts:
# In a scheduled job
correlation = ErrorCorrelation.new(days: 1)
problematic = correlation.problematic_releases
if problematic.any?
AlertService.notify(
title: "Problematic releases detected",
releases: problematic.map { |r| "#{r[:version]} (+#{r[:deviation_from_avg]}%)" }
)
end
3. Investigate Multi-Error Users
Weekly review:
# In a scheduled report
multi_error_users = correlation.multi_error_users(min_error_types: 3)
top_10 = multi_error_users.first(10)
CsTeam.send_report(
title: "Top 10 Most Impacted Users",
users: top_10
)
Action:
- Reach out to these users
- Offer support or compensation
- Investigate common patterns
4. Use Time Correlation for Root Cause
When stuck on root cause:
- Find all errors correlated with the mystery error
- Identify the common dependency
- That’s likely your root cause
Example:
correlated = correlation.time_correlated_errors
mystery_error_corr = correlated.select do |_, data|
data[:error_type_a] == "MysteryError" || data[:error_type_b] == "MysteryError"
end
# If all correlated errors involve ExternalAPI
# → MysteryError is caused by ExternalAPI issues
5. Compare Periods After Fixes
After deploying a fix:
# Wait 24 hours, then check
before_fix = ErrorCorrelation.new(days: 14).period_comparison
after_fix = ErrorCorrelation.new(days: 7).period_comparison
if after_fix[:trend] == :decreasing
puts "Fix is working! Errors down #{after_fix[:change_percentage]}%"
else
puts "Fix didn't help. Keep investigating."
end
6. Document Version History
Maintain a version log:
v2.1.0 (2025-12-20):
- Added new payment flow
- Errors: 450 total, 12 critical
- Status: ⚠️ Problematic, hotfix planned
v2.0.5 (2025-12-15):
- Fixed checkout bug
- Errors: 120 total, 2 critical
- Status: ✓ Stable
v2.0.0 (2025-12-01):
- Major redesign
- Errors: 800 total, 45 critical
- Status: ⚠️ Problematic, rolled back
Troubleshooting
“No version data available”
Cause: app_version column missing or always null
Fix:
# Add column
rails g migration add_app_version_to_error_logs app_version:string
rails db:migrate
# Ensure errors include version
RailsErrorDashboard::Commands::LogError.call(
app_version: ENV['APP_VERSION'], # Must be set!
# ...
)
“Multi-error users showing strange results”
Cause: User model not found or email missing
Fix:
# Verify configuration
RailsErrorDashboard.configuration.user_model
# => "User"
# Check if User model has email
User.column_names.include?('email')
# => true
# If using different attribute:
# Update find_user_email method in error_correlation.rb
“Time correlation shows no results”
Cause: Not enough error types (need at least 2)
Fix:
# Check distinct error types
ErrorLog.distinct.pluck(:error_type).count
# If < 2, wait for more data
# Lower correlation threshold to see weaker correlations
# (requires code change to min correlation threshold)
“Period comparison shows unexpected trend”
Cause: Time period split doesn’t align with deployment
Fix: Use custom date ranges instead of automatic split:
# Instead of using days parameter (which splits in half)
# Query specific periods:
current = ErrorLog.where("occurred_at >= ?", deployment_date).count
previous = ErrorLog.where("occurred_at < ?", deployment_date).where("occurred_at >= ?", deployment_date - 7.days).count
API Reference
ErrorCorrelation Query Object
correlation = RailsErrorDashboard::Queries::ErrorCorrelation.new(days: 30)
# Release correlation
correlation.errors_by_version
# => { "2.1.0" => { count: 450, error_types: 23, ... }, ... }
correlation.errors_by_git_sha
# => { "a3b4c5d6" => { count: 280, error_types: 15, ... }, ... }
correlation.problematic_releases
# => [{ version: "2.1.0", error_count: 450, deviation_from_avg: 275.0, ... }]
# User correlation
correlation.multi_error_users(min_error_types: 2)
# => [{ user_id: 45, user_email: "...", error_types: [...], total_errors: 23 }]
correlation.error_type_user_overlap("NoMethodError", "ArgumentError")
# => { users_a_count: 150, users_b_count: 120, overlap_count: 80, overlap_percentage: 66.7 }
# Time correlation
correlation.time_correlated_errors
# => { "NoMethodError <-> DatabaseError" => { correlation: 0.85, strength: :strong } }
# Period comparison
correlation.period_comparison
# => { current_period: {...}, previous_period: {...}, change: 650, change_percentage: 54.2, trend: :increasing_significantly }
# Platform-specific errors
correlation.platform_specific_errors
# => { "iOS" => [{ error_type: "...", count: 120, platform_specific: true }] }
Further Reading
- Advanced Error Grouping Guide - Finding related errors
- Baseline Monitoring Guide - Statistical anomaly detection
- Platform Comparison Guide - Platform health analysis
- Occurrence Patterns Guide - Temporal pattern detection