Advanced Error Grouping
This guide covers the advanced error grouping features, including fuzzy error matching, co-occurring error detection, and error cascade identification.
⚙️ Optional Features - All grouping features are disabled by default. Enable them in your initializer:
RailsErrorDashboard.configure do |config|
config.enable_similar_errors = true # Fuzzy error matching
config.enable_co_occurring_errors = true # Co-occurring patterns
config.enable_error_cascades = true # Cascade detection
end
Table of Contents
Overview
Intelligent error grouping that goes beyond simple error type matching:
- Fuzzy Matching: Find similar errors even when error_hash differs
- Co-occurring Errors: Detect errors that happen together in time
- Cascade Detection: Identify error chains (A causes B causes C)
These features help you:
- Reduce noise by grouping related errors
- Identify root causes faster
- Understand error dependencies
- Prioritize fixes based on impact
Fuzzy Error Matching
What It Does
Fuzzy matching finds errors that are similar but not identical. It uses:
- Backtrace similarity: Compares stack frames (70% weight)
- Message similarity: Uses Levenshtein distance (30% weight)
How It Works
# In your error show page, similar errors are automatically displayed
# Or access programmatically:
error = RailsErrorDashboard::ErrorLog.find(123)
similar = error.similar_errors(threshold: 0.6, limit: 10)
similar.each do |result|
puts "#{result[:error].error_type} - #{result[:similarity]}% similar"
end
Similarity Calculation
- Backtrace Frames (70%):
- Extracts file paths and method names from stack traces
- Ignores line numbers (they change with code edits)
- Uses Jaccard similarity on frame sets
- Example:
app/controllers/users_controller.rb:create
- Message Similarity (30%):
- Normalizes messages (replaces numbers, quoted strings)
- Calculates Levenshtein distance
- Example: “User 123 not found” → “User N not found”
- Combined Score:
similarity = (backtrace_score * 0.7) + (message_score * 0.3)
Configuration
Default threshold is 0.6 (60% similar). This provides a good balance between recall and precision.
# Adjust threshold for your needs
similar = error.similar_errors(threshold: 0.7) # More strict (fewer results)
similar = error.similar_errors(threshold: 0.5) # More lenient (more results)
Platform-Based Matching
By default, fuzzy matching only compares errors from the same platform (iOS to iOS, Android to Android). This is because:
- Platform-specific code differs significantly
- More relevant matches for debugging
- Faster computation
Use Cases
- After Refactoring: Find errors that moved to different line numbers
- Similar User Reports: Group “User X not found” errors
- Code Duplication: Identify similar error patterns across files
- Root Cause Analysis: See if fixing one error resolves similar ones
UI Display
On the error show page, you’ll see a “Similar Errors” card showing:
- Similarity percentage (color-coded progress bar)
- Error type and message preview
- Platform and occurrence count
- Link to view the similar error
Co-occurring Errors
What It Does
Detects errors that happen together within a time window (default: 5 minutes). This helps identify:
- Cascading failures
- Related bugs
- Concurrent issues
How It Works
error = RailsErrorDashboard::ErrorLog.find(123)
co_occurring = error.co_occurring_errors(
window_minutes: 5,
min_frequency: 2,
limit: 10
)
co_occurring.each do |result|
error = result[:error]
puts "#{error.error_type} occurs together #{result[:frequency]} times"
puts "Average delay: #{result[:avg_delay_seconds]}s"
end
Parameters
- window_minutes (default: 5): Time window for co-occurrence
- min_frequency (default: 2): Minimum times errors must co-occur
- limit (default: 10): Maximum results to return
Algorithm
- Get all occurrence timestamps for the target error
- For each timestamp, find other errors within ±window_minutes
- Count co-occurrences and calculate average delay
- Return errors sorted by frequency
Use Cases
- API Cascades: When one API call fails, related calls fail too
- Dependency Issues: Database error → Cache error → Timeout
- User Workflows: Errors in multi-step processes
- Load Spikes: Multiple services fail under high load
UI Display
The “Co-occurring Errors” card shows:
- Error type and message
- Frequency of co-occurrence
- Average time delay between errors
- Link to the co-occurring error
Error Cascades
What It Does
Identifies parent-child relationships between errors:
- Parents: Errors that cause this error
- Children: Errors this error causes
How It Works
The system automatically detects cascades using background jobs:
# Manual cascade detection (runs hourly by default)
RailsErrorDashboard::Services::CascadeDetector.call
# Access cascade patterns for an error
error = RailsErrorDashboard::ErrorLog.find(123)
cascades = error.error_cascades(min_probability: 0.5)
puts "Parent errors (what causes this):"
cascades[:parents].each do |pattern|
puts " #{pattern.parent_error.error_type} → #{pattern.cascade_probability * 100}%"
end
puts "Child errors (what this causes):"
cascades[:children].each do |pattern|
puts " #{pattern.child_error.error_type} → #{pattern.cascade_probability * 100}%"
end
Detection Algorithm
- Time Window: Looks 0-60 seconds before/after each error
- Pattern Tracking: If A then B happens >70% of time = cascade
- Probability Calculation:
probability = times_B_after_A / total_A_occurrences - Minimum Frequency: Requires 3+ occurrences to confirm pattern
Cascade Probability Levels
- High (70-100%): Strong cascade relationship
- Medium (50-70%): Moderate cascade relationship
- Low (<50%): Filtered out by default
Background Job
Cascade detection runs automatically via CascadeDetectionJob:
- Frequency: Hourly
- Lookback: Last 24 hours
- Updates: Existing patterns are updated with new data
Use Cases
- Root Cause: Fix the parent error to resolve children
- Impact Analysis: See downstream effects of an error
- Priority Setting: Parent errors have higher impact
- Architecture Review: Identify tight coupling issues
UI Display
The “Error Cascades” section shows:
- Visual diagram: Parent → This Error → Children
- Cascade probability percentage
- Average delay between errors
- Links to parent and child errors
Configuration
Fuzzy Matching Configuration
# In config/initializers/rails_error_dashboard.rb
RailsErrorDashboard.configure do |config|
# Fuzzy matching threshold (0.0 - 1.0)
# Default: 0.6 (60% similar)
# Higher = more strict, fewer matches
# Lower = more lenient, more matches
# No config needed - adjust via method parameters
end
Cascade Detection Configuration
# Cascade detection window (seconds)
# Default: 60 seconds
# Adjust via CascadeDetector parameters:
RailsErrorDashboard::Services::CascadeDetector.call(
window_seconds: 120, # 2-minute window
min_frequency: 5, # Require 5+ occurrences
min_probability: 0.7 # 70% probability threshold
)
Platform-Based Matching
Fuzzy matching compares errors from the same platform only. This is currently a sensible default and cannot be configured.
Use Cases
Scenario 1: After a Major Refactor
Problem: You refactored your authentication system. Now errors appear at different line numbers.
Solution: Use fuzzy matching to find related authentication errors:
- View any auth error
- Check “Similar Errors” card
- See all related auth errors (even with different line numbers)
- Fix the root cause once
Scenario 2: Mysterious Cascading Failures
Problem: Users report multiple errors during checkout, but you’re not sure which is the root cause.
Solution: Use cascade detection:
- View the final error users see
- Check “Error Cascades” → “Parent Errors”
- Follow the chain back to the root cause
- Fix the parent error to resolve all children
Scenario 3: Load Spike Investigation
Problem: During peak hours, you see multiple different errors occurring.
Solution: Use co-occurring errors:
- Pick any error from the spike
- Check “Co-occurring Errors” card
- See all errors that happen together
- Identify if it’s a database bottleneck, API timeout, etc.
Scenario 4: Code Duplication Detection
Problem: You suspect similar error handling exists in multiple controllers.
Solution: Use fuzzy matching:
- View an error from one controller
- Check “Similar Errors” with threshold 0.5
- See errors from other controllers with similar backtraces
- Refactor common error handling into shared code
Best Practices
1. Start with Default Thresholds
The defaults are well-tested:
- Fuzzy matching: 0.6 (60%)
- Cascade window: 60 seconds
- Co-occurrence window: 5 minutes
Only adjust if you have specific needs.
2. Monitor Cascade Detection
Check cascade patterns weekly:
- Are there unexpected cascades?
- Can you fix parent errors to prevent children?
- Are cascades indicating architectural issues?
3. Use Co-occurring Errors for Root Cause
When investigating production issues:
- Find the most recent error
- Check co-occurring errors
- Look for the earliest error in the sequence
- That’s likely your root cause
4. Leverage Similar Errors for Cleanup
Periodically:
- Sort errors by occurrence count
- View top errors
- Check “Similar Errors”
- Group and fix related issues together
Performance Considerations
Fuzzy Matching
- Computation: O(n) where n = number of candidate errors
- Caching: Similarity scores are calculated on-demand
- Limits: Results limited to 10 by default
Co-occurring Errors
- Database: Uses indexed occurred_at queries
- Memory: Loads error occurrences into memory
- Performance: Fast for normal error volumes (<10k/day)
Cascade Detection
- Background Job: Runs hourly, doesn’t block requests
- Database: Creates cascade_patterns table
- Cleanup: Old patterns (>90 days) should be cleaned periodically
Troubleshooting
“No similar errors found”
Possible causes:
- Error is unique (truly different from others)
- Threshold is too high (try 0.5)
- Different platforms (iOS vs Android)
Solution: Lower the threshold or check if the error is genuinely unique.
“Too many co-occurring errors”
Possible causes:
- Time window too large (>5 minutes)
- High error volume during peak hours
- Many unrelated errors happening simultaneously
Solution:
- Reduce window_minutes to 2-3
- Increase min_frequency to filter noise
- Focus on specific error types
“Cascade detection not working”
Possible causes:
- Background job not running
- ErrorOccurrence table doesn’t exist (run migrations)
- Window too short (errors happen >60s apart)
Solution:
- Check
rails_error_dashboard_cascade_patternstable exists - Ensure background jobs are running (
rails jobs:work) - Increase window_seconds if needed
Database Schema
Similarity Tracking
No additional columns needed for fuzzy matching (computed on-demand).
Error Occurrences
create_table :rails_error_dashboard_error_occurrences do |t|
t.references :error_log
t.datetime :occurred_at, null: false
t.integer :user_id
t.string :request_id
t.timestamps
end
add_index :rails_error_dashboard_error_occurrences, [:occurred_at, :error_log_id]
Cascade Patterns
create_table :rails_error_dashboard_cascade_patterns do |t|
t.integer :parent_error_id, null: false
t.integer :child_error_id, null: false
t.integer :frequency, default: 0
t.float :avg_delay_seconds
t.float :cascade_probability
t.datetime :last_detected_at
t.timestamps
end
add_index :rails_error_dashboard_cascade_patterns, [:parent_error_id, :child_error_id]
API Reference
ErrorLog#similar_errors
error.similar_errors(threshold: 0.6, limit: 10)
# Returns: [{ error: ErrorLog, similarity: 0.85 }, ...]
ErrorLog#co_occurring_errors
error.co_occurring_errors(window_minutes: 5, min_frequency: 2, limit: 10)
# Returns: [{ error: ErrorLog, frequency: 5, avg_delay_seconds: 12.5 }, ...]
ErrorLog#error_cascades
error.error_cascades(min_probability: 0.5)
# Returns: { parents: [CascadePattern, ...], children: [CascadePattern, ...] }
Services::CascadeDetector.call
RailsErrorDashboard::Services::CascadeDetector.call(
lookback_hours: 24,
window_seconds: 60,
min_frequency: 3,
min_probability: 0.7
)
# Returns: { detected: 5, updated: 3 }
Further Reading
- Baseline Monitoring Guide - Statistical anomaly detection
- Platform Comparison Guide - iOS vs Android analysis
- Occurrence Patterns Guide - Temporal pattern detection
- Error Correlation Guide - Release and user correlation