Multi-App Performance Monitoring
This guide covers performance monitoring and optimization for Rails Error Dashboard’s multi-app support feature.
Architecture Overview
Multi-app support is designed for high-concurrency scenarios with multiple Rails applications writing errors simultaneously to a shared database.
Key Design Decisions
- Row-Level Locking: Uses pessimistic locking scoped to `(application_id, error_hash)`, so apps never block each other (see the sketch after this list)
- Cached Application Lookups: Application names are cached for 1 hour to reduce database hits
- Composite Indexes: Optimized indexes on `[application_id, occurred_at]` and `[application_id, resolved]`
- Per-App Deduplication: Error hashes include `application_id`, so the same error is tracked independently across apps
- Consistent Lock Ordering: Prevents deadlocks by always locking in `(application_id ASC, error_hash ASC)` order
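To make these decisions concrete, here is a minimal sketch of a row-locked find-or-increment. The method name and call shape match `find_or_increment_by_hash` as used later in this guide, but the body is illustrative only, not the gem's actual implementation:

```ruby
# Illustrative sketch: a row-locked find-or-increment scoped to
# (application_id, error_hash). Not the gem's actual source.
def self.find_or_increment_by_hash(error_hash, application_id:, **attrs)
  attempts ||= 0
  transaction do
    # SELECT ... FOR UPDATE locks only the matching row, so writes for
    # other (application_id, error_hash) pairs proceed in parallel.
    existing = lock.find_by(application_id: application_id, error_hash: error_hash)
    if existing
      existing.increment!(:occurrence_count)
      existing
    else
      create!(application_id: application_id, error_hash: error_hash, **attrs)
    end
  end
rescue ActiveRecord::RecordNotUnique
  # A concurrent insert won the race between our SELECT and INSERT;
  # retry once so it is counted as an occurrence instead of raising.
  (attempts += 1) == 1 ? retry : raise
end
```

Because each write touches a single row keyed by `(application_id, error_hash)`, two apps can log the same error type simultaneously without contending for each other's locks.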
Database Performance
Index Usage Monitoring (PostgreSQL)
Check if multi-app indexes are being used effectively:
-- Index usage statistics
SELECT
schemaname,
tablename,
indexname,
idx_scan as scans,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
WHERE tablename = 'rails_error_dashboard_error_logs'
AND indexname LIKE '%application%'
ORDER BY idx_scan DESC;
Expected Results:
- `index_rails_error_dashboard_error_logs_on_application_id` should have a high scan count
- `index_error_logs_on_app_occurred` should be used for time-based queries
- `index_error_logs_on_app_resolved` should be used for filtering unresolved errors
If scan counts are low (<100), the indexes may not be in use. Check:
- Are you filtering by application_id?
- Is the query planner choosing a different strategy? (Run `EXPLAIN ANALYZE`; see the console sketch below.)
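If you prefer to stay in the Rails console rather than psql, the same check can be run through ActiveRecord. A minimal sketch, assuming the table and index names used throughout this guide:

```ruby
# Print the query plan for a typical application-scoped lookup.
# EXPLAIN ANALYZE actually executes the query, so run it against a
# staging copy if that matters.
plan = ActiveRecord::Base.connection.select_all(<<~SQL)
  EXPLAIN ANALYZE
  SELECT * FROM rails_error_dashboard_error_logs
  WHERE application_id = 1
  ORDER BY occurred_at DESC
  LIMIT 50
SQL
puts plan.rows.join("\n")
```

Look for an Index Scan on `index_error_logs_on_app_occurred` rather than a Seq Scan.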
Slow Query Detection
Find slow queries involving applications:
-- Requires pg_stat_statements extension
SELECT
query,
calls,
total_exec_time / calls as avg_time_ms,
min_exec_time as min_ms,
max_exec_time as max_ms,
stddev_exec_time as stddev_ms
FROM pg_stat_statements
WHERE query LIKE '%rails_error_dashboard_error_logs%'
AND query LIKE '%application_id%'
ORDER BY avg_time_ms DESC
LIMIT 20;
Optimization Targets:
- Average query time < 10ms for error writes
- Average query time < 50ms for dashboard queries
- Max query time < 500ms for analytics
Cache Hit Rate
Monitor application lookup cache performance:
-- PostgreSQL cache hit ratio
SELECT
sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
sum(heap_blks_hit) / NULLIF((sum(heap_blks_hit) + sum(heap_blks_read)), 0) as cache_ratio
FROM pg_statio_user_tables
WHERE relname = 'rails_error_dashboard_applications';
Target: Cache ratio > 0.99 (99%+)
If lower:
- Applications table should be tiny (usually <100 rows)
- All reads should hit cache
- Check whether `find_or_create_by_name` is being cached in Rails (should be a 1-hour TTL; one plausible implementation is sketched below)
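One plausible shape for that cached lookup, using `Rails.cache.fetch` with a 1-hour TTL and the cache key format shown later in this guide (a sketch, not the gem's actual source):

```ruby
# Illustrative sketch only: memoize application records by name for an
# hour so hot paths skip the database entirely.
def self.find_or_create_by_name(name)
  Rails.cache.fetch("error_dashboard/application/#{name}", expires_in: 1.hour) do
    where(name: name).first_or_create!
  end
end
```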
Lock Monitoring
Check for lock contention between applications:
-- Active locks on error_logs table
SELECT
l.pid,
l.mode,
l.granted,
a.application_name,
a.query_start,
a.state,
substring(a.query, 1, 100) as query
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE l.relation = 'rails_error_dashboard_error_logs'::regclass
ORDER BY l.granted, a.query_start;
Expected Behavior:
- Most locks should be `granted = true`
- `RowShareLock` and `RowExclusiveLock` are normal
- `AccessExclusiveLock` (table-level) should NEVER appear during error logging
If you see ungranted locks:
- Check for long-running transactions
- Verify row-level locking is working (check that `find_or_increment_by_hash` uses `.lock`)
- Look for deadlocks in PostgreSQL logs
Deadlock Detection
-- Check PostgreSQL logs for deadlock details
-- In postgresql.conf: log_lock_waits = on, deadlock_timeout = 1s
-- (requires logging_collector = on for the log files)
-- Or query the cumulative deadlock counter:
SELECT deadlocks FROM pg_stat_database
WHERE datname = current_database();
Target: Zero deadlocks
Our design prevents deadlocks in several ways:
- Applications table is READ-ONLY after setup
- Error writes lock single row only
- Consistent lock ordering by (application_id, error_hash)
- Retry logic for `RecordNotUnique` exceptions (see the sketch under Key Design Decisions)
Rails Application Monitoring
Cache Performance
Monitor Rails cache for application lookups:
# In Rails console
# Backend-specific stats: MemCacheStore responds to #stats; other backends may not
stats = Rails.cache.stats if Rails.cache.respond_to?(:stats)
# Application lookups are cached under keys like "error_dashboard/application/<name>"
# Clear cache and test
Rails.cache.clear
app1 = RailsErrorDashboard::Application.find_or_create_by_name("TestApp")
app2 = RailsErrorDashboard::Application.find_or_create_by_name("TestApp") # Should hit cache
# Verify cache
cached = Rails.cache.read("error_dashboard/application/TestApp")
puts "Cached: #{cached.inspect}"
Expected:
- First call: Database hit
- Second call: Cache hit (within 1 hour)
- Cache size: ~500 bytes per application name
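You can confirm the hit/miss behavior with a quick timing check; a console sketch assuming the cache key format used above:

```ruby
require "benchmark"

# Force a cold lookup, then compare against the warm (cached) path.
Rails.cache.delete("error_dashboard/application/TestApp")
cold = Benchmark.realtime { RailsErrorDashboard::Application.find_or_create_by_name("TestApp") }
warm = Benchmark.realtime { RailsErrorDashboard::Application.find_or_create_by_name("TestApp") }
puts format("cold: %.2f ms, warm: %.2f ms", cold * 1000, warm * 1000)
# Expect roughly the 50x difference shown in the Benchmarks section below.
```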
Query Object Performance
Measure analytics query performance by application:
# Benchmark dashboard stats
require 'benchmark'
apps = RailsErrorDashboard::Application.pluck(:id).sample(5)
Benchmark.bm(20) do |x|
x.report("All apps:") do
RailsErrorDashboard::Queries::DashboardStats.call
end
apps.each do |app_id|
x.report("App #{app_id}:") do
RailsErrorDashboard::Queries::DashboardStats.call(application_id: app_id)
end
end
end
Targets:
- All apps query: < 100ms
- Single app query: < 50ms
Error Write Performance
Benchmark error logging with multiple apps:
require 'benchmark'
apps = 5.times.map { |i| RailsErrorDashboard::Application.find_or_create_by_name("BenchApp#{i}") } # idempotent across repeated runs
Benchmark.bm(20) do |x|
x.report("Sequential writes:") do
100.times do
app = apps.sample
RailsErrorDashboard::ErrorLog.find_or_increment_by_hash(
"test_#{rand(10)}",
application_id: app.id,
error_type: "TestError",
message: "Benchmark test",
occurred_at: Time.current
)
end
end
x.report("Concurrent writes:") do
threads = 10.times.map do
Thread.new do
10.times do
app = apps.sample
RailsErrorDashboard::ErrorLog.find_or_increment_by_hash(
"concurrent_#{rand(10)}",
application_id: app.id,
error_type: "ConcurrentError",
message: "Thread test",
occurred_at: Time.current
)
end
end
end
threads.each(&:join)
end
end
Targets:
- Sequential: 5-10ms per write
- Concurrent: No deadlocks, similar per-write time
- Should see occurrence_count increments (not duplicates)
Production Monitoring
Metrics to Track
- Error Write Latency
  - P50, P95, P99 for `LogError.call`
  - Target: P95 < 50ms, P99 < 200ms
- Application Cache Hit Rate
  - Percentage of `find_or_create_by_name` calls that hit the cache
  - Target: > 95%
- Query Performance
  - Dashboard stats query time
  - Analytics query time
  - Filtering query time
  - Target: All < 100ms P95
- Database Metrics
  - Lock wait time
  - Deadlock count (should be 0)
  - Index scan ratio (vs. sequential scans)
New Relic / APM Integration
# In config/initializers/rails_error_dashboard.rb
# Track error logging performance
ActiveSupport::Notifications.subscribe("log_error.rails_error_dashboard") do |*args|
event = ActiveSupport::Notifications::Event.new(*args)
NewRelic::Agent.record_metric(
"Custom/ErrorDashboard/LogError",
event.duration
)
NewRelic::Agent.record_metric(
"Custom/ErrorDashboard/Application/#{event.payload[:application_name]}",
event.duration
)
end
# Instrument find_or_create_by_name
RailsErrorDashboard::Application.class_eval do
def self.find_or_create_by_name_with_instrumentation(name)
ActiveSupport::Notifications.instrument("application.find_or_create", application: name) do
find_or_create_by_name_without_instrumentation(name)
end
end
class << self
alias_method :find_or_create_by_name_without_instrumentation, :find_or_create_by_name
alias_method :find_or_create_by_name, :find_or_create_by_name_with_instrumentation
end
end
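The `alias_method` chain above works on any Ruby version, but `Module#prepend` is the more modern idiom for wrapping a class method. An equivalent sketch:

```ruby
# Same instrumentation via Module#prepend: super calls the original
# find_or_create_by_name, with no alias bookkeeping required.
module ApplicationLookupInstrumentation
  def find_or_create_by_name(name)
    ActiveSupport::Notifications.instrument("application.find_or_create", application: name) do
      super
    end
  end
end

RailsErrorDashboard::Application.singleton_class.prepend(ApplicationLookupInstrumentation)
```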
Alerting Thresholds
Set up alerts for:
- High Error Write Latency
  - Alert if P95 > 200ms for 5 minutes
  - Action: Check database load and slow queries
- Cache Miss Rate
  - Alert if cache hit rate < 90% for 10 minutes
  - Action: Check the Rails cache backend and memory
- Dashboard Query Slowness
  - Alert if dashboard stats > 500ms
  - Action: Check error log count; consider archiving old errors
- Lock Contention
  - Alert if lock wait events > 10/minute
  - Action: Check for long transactions; review the locking strategy
Performance Checklist
Use this checklist to verify multi-app performance:
Database Layer
- Applications table index on `name` (unique)
- Error logs index on `application_id`
- Composite index on `[application_id, occurred_at]`
- Composite index on `[application_id, resolved]`
- Foreign key constraint exists
- `pg_stat_statements` enabled (PostgreSQL)
- Index scan ratio > 95%
- Cache hit ratio > 99%
- Zero deadlocks in production
Rails Application Layer
- Application lookups use `find_or_create_by_name` (cached)
- Cache backend configured (Redis, Memcached, or memory)
- Cache expiry set to 1 hour
- Query objects use the `base_scope` helper
- All dashboard queries scoped by application_id
- Row-level locking in `find_or_increment_by_hash`
- Error hash includes application_id
Monitoring
- APM tracking for error writes
- Dashboard for cache hit rate
- Alerts for slow queries
- Alerts for lock contention
- Periodic index usage review
- Weekly deadlock check
Troubleshooting
Problem: Slow Error Writes (>200ms)
Diagnosis:
-- Check if indexes are used
EXPLAIN ANALYZE
SELECT * FROM rails_error_dashboard_error_logs
WHERE application_id = 1 AND error_hash = 'abc123';
Solutions:
- Verify the composite index exists: `CREATE INDEX index_error_logs_on_app_hash ON rails_error_dashboard_error_logs (application_id, error_hash);` (a migration version follows below)
- Check database load and connection pool
- Enable async logging in config
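If you manage schema through migrations rather than raw SQL, a sketch of the same index (adjust the migration version to your Rails release; `algorithm: :concurrently` avoids locking the table while the index builds):

```ruby
class AddAppHashIndexToErrorLogs < ActiveRecord::Migration[7.0]
  # Required for algorithm: :concurrently, which cannot run inside a transaction.
  disable_ddl_transaction!

  def change
    add_index :rails_error_dashboard_error_logs,
              [:application_id, :error_hash],
              name: "index_error_logs_on_app_hash",
              algorithm: :concurrently
  end
end
```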
Problem: High Cache Miss Rate
Diagnosis:
# Check if cache backend is working
Rails.cache.write("test", "value")
Rails.cache.read("test") # Should return "value"
Solutions:
- Check Rails cache backend (config/environments/production.rb)
- Verify cache size limits aren’t being hit
- Check if cache is being cleared too frequently
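For reference, a production cache store sketch; the Redis URL is a placeholder and any shared backend with adequate memory works:

```ruby
# config/environments/production.rb
# A shared store like Redis keeps application lookups cached across
# processes and deploys, unlike the default in-memory store.
config.cache_store = :redis_cache_store, {
  url: ENV.fetch("REDIS_URL", "redis://localhost:6379/1"),
  error_handler: ->(method:, returning:, exception:) {
    # Degrade to cache misses instead of raising if Redis is unreachable.
    Rails.logger.warn("Cache #{method} failed: #{exception.message}")
  }
}
```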
Problem: Deadlocks
Diagnosis:
# Check PostgreSQL logs
grep "deadlock detected" /var/log/postgresql/postgresql-*.log
Solutions:
- Verify `find_or_increment_by_hash` uses `.lock`
- Check for custom queries that bypass row-level locking
- Review transaction isolation level (should be READ COMMITTED)
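A quick console check of the isolation level on PostgreSQL, where READ COMMITTED is the default:

```ruby
# Should print "read committed"; stricter levels (e.g. serializable)
# increase the chance of serialization failures under concurrent writes.
puts ActiveRecord::Base.connection.select_value("SHOW transaction_isolation")
```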
Problem: Ungranted Locks
Diagnosis:
SELECT * FROM pg_locks WHERE NOT granted;
Solutions:
- Check for long-running transactions blocking writes (see the snippet below)
- Verify no table-level locks (DDL operations during writes)
- Consider increasing `max_locks_per_transaction` if you are hitting the limit
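To find the long-running transactions mentioned in the first bullet, a console sketch for PostgreSQL (the 30-second threshold is arbitrary):

```ruby
# List transactions that have been open for more than 30 seconds.
# Anything long-lived can hold row locks and block error writes.
rows = ActiveRecord::Base.connection.select_all(<<~SQL)
  SELECT pid, state, now() - xact_start AS xact_age, left(query, 100) AS query
  FROM pg_stat_activity
  WHERE xact_start IS NOT NULL
    AND now() - xact_start > interval '30 seconds'
  ORDER BY xact_start
SQL
rows.each { |row| puts row.inspect }
```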
Benchmarks
These are reference benchmarks from testing multi-app support:
Write Performance
| Scenario | Throughput | P95 Latency | P99 Latency |
|---|---|---|---|
| 1 app, sequential | 200 writes/sec | 5ms | 10ms |
| 5 apps, sequential | 195 writes/sec | 6ms | 12ms |
| 5 apps, 10 threads | 950 writes/sec | 25ms | 45ms |
| 5 apps, 50 threads | 2800 writes/sec | 85ms | 150ms |
Test environment: PostgreSQL 14, 4 CPU, 8GB RAM, local network
Cache Performance
| Operation | Without Cache | With Cache | Speedup |
|---|---|---|---|
| Application lookup | 2.5ms | 0.05ms | 50x |
| 1000 lookups | 2500ms | 50ms | 50x |
Query Performance
| Query | All Apps | Single App | Reduction |
|---|---|---|---|
| Dashboard stats | 45ms | 12ms | 73% |
| Analytics (7 days) | 180ms | 55ms | 69% |
| Error list (paginated) | 25ms | 8ms | 68% |
Based on a database with 100,000 errors across 5 applications.
Optimization Tips
- Use Async Logging: Enable `config.async_logging = true` to move error writes out of the request cycle
- Archive Old Errors: Use the `error_dashboard:cleanup_resolved` rake task to remove old resolved errors
- Separate Database: Consider a dedicated database for the error dashboard to isolate its performance impact
- Connection Pooling: Increase the connection pool size if you see “could not obtain connection” errors
- Read Replicas: Route dashboard queries to read replicas to reduce load on the primary
- Sampling: Enable error sampling for high-frequency errors to reduce write volume
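Tips 1 and 6 are configuration changes. A hedged initializer sketch, assuming the gem follows the usual `configure` block pattern; `config.async_logging` is named in tip 1, while the sampling key below is hypothetical:

```ruby
# config/initializers/rails_error_dashboard.rb
RailsErrorDashboard.configure do |config|
  # Named in tip 1: moves error writes out of the request cycle.
  config.async_logging = true

  # Hypothetical knob for tip 6; verify the actual setting name in your
  # version's configuration reference before relying on it.
  # config.error_sampling_rate = 0.1
end
```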
Contact
For performance issues or optimization questions:
- GitHub Issues: https://github.com/YourUsername/rails_error_dashboard/issues
- Labels: `performance` + `multi-app`
Include:
- Database type and version
- Number of applications
- Error write rate (per second)
- Relevant query plans (`EXPLAIN ANALYZE` output)
- APM screenshots if available