how-to/troubleshoot-dynaplex.md


title: Troubleshooting Guide

Overview

This guide provides diagnostic and troubleshooting tips for the Dynaplex system. It covers the following areas:

  • Service Health - Check if services are running correctly
  • Log Analysis - Parse and analyze service logs
  • Connection Issues - Debug database, MQTT, and HTTP connections
  • Performance - Identify slow operations and bottlenecks
  • Error Tracking - Find and analyze errors in production

Common Issues Quick Reference

Issue 1: Service Won't Start

Symptoms:

  • Service fails to start in Aspire
  • Container app shows "Provisioning" state
  • Health checks failing

Common causes:

  1. Missing configuration: Check appsettings.json and environment variables
  2. Database not ready: Ensure db-manager completed migrations
  3. Port conflict: Check if port is already in use
  4. Dependency not started: Check service dependencies in Aspire

Issue 2: Database Connection Errors

Symptoms:

  • "Unable to connect to database" errors
  • Timeout exceptions
  • NullReferenceException in data access

Common causes:

  1. Wrong connection string format (SQL Server vs PostgreSQL)
  2. Database server not running
  3. Firewall blocking connection
  4. SSL/TLS certificate issues
  5. Service lifetime error (scoped service in singleton)
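
Causes 2 and 3 can be ruled out quickly by testing whether the database port accepts TCP connections at all. The sketch below is one way to do that, assuming bash and the coreutils `timeout` command are available (stock macOS does not ship `timeout`). The function name `can_connect` and the example host and port are illustrative, not part of Dynaplex.

```shell
# Return 0 if HOST:PORT accepts a TCP connection within WAIT seconds.
# Uses bash's built-in /dev/tcp redirection, so no database client is needed.
can_connect() {
  local host="$1" port="$2" wait_seconds="${3:-3}"
  timeout "$wait_seconds" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Hypothetical usage; take the host and port from your connection string:
# can_connect localhost 5432 && echo "reachable" || echo "not reachable"
```

A successful connection only shows that something is listening on the port. Authentication, TLS, and connection string format still need to be checked separately.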

Issue 3: Service Lifetime Error

Error: Cannot consume scoped service 'DbContext' from singleton 'IHostedService'

Root Cause: A hosted service is registered as a singleton, so it cannot take a scoped service such as a DbContext directly through its constructor

Fix:

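// Change from:
public MyProcessor(ILogger<MyProcessor> logger, DbContext db) { }

// To:
private readonly IServiceProvider _serviceProvider;

public MyProcessor(ILogger<MyProcessor> logger, IServiceProvider serviceProvider) {
    _serviceProvider = serviceProvider;
}

// Access the scoped service inside an async method (for example ExecuteAsync),
// creating one scope per unit of work so the DbContext is disposed afterwards:
await using var scope = _serviceProvider.CreateAsyncScope();
var db = scope.ServiceProvider.GetRequiredService<DbContext>();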

Issue 4: PostgreSQL DateTime Error

Error: Cannot write DateTimeOffset with Offset=-04:00:00 to PostgreSQL

Root Cause: A DateTimeOffset with a non-UTC offset is being written to a PostgreSQL timestamp column, which only accepts an offset of zero (UTC)

Fix:

// Change property type:
public DateTime Timestamp { get; set; }  // Not DateTimeOffset

// Convert when setting:
entity.Timestamp = source.UtcDateTime;
entity.CreatedAt = DateTime.UtcNow;

Issue 5: MQTT Session Takeover

Symptoms: Repeated connect/disconnect cycle

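Root Cause: Multiple instances connect with the same client ID. A broker allows one connection per client ID, so each new connection disconnects the previous one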

Fix:

{
  "Mqtt": {
    "AppendUniqueIdToClientId": true
  }
}

Issue 6: Duplicate Message Processing

Symptoms: Same message processed by multiple instances

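Root Cause: Each instance holds its own subscription, so the broker delivers every message to all of them. A shared subscription distributes messages across the instances in the group instead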

Fix:

{
  "Mqtt": {
    "SharedSubscriptionGroup": "my-processors"
  }
}

Issue 7: Messages Lost During Disconnect

Symptoms: Messages not delivered after reconnection

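Root Cause: CleanSession=true tells the broker to discard the session, including subscriptions and queued QoS 1 and 2 messages, when the client disconnects. With CleanSession=false the broker keeps them for the same client ID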

Fix:

{
  "Mqtt": {
    "CleanSession": false
  }
}

Issue 8: Port Binding Conflicts

Error: listen tcp [::1]:5000: bind: address already in use

Root Cause: Hardcoded port in launchSettings.json or AppHost

Fix:

  1. Remove applicationUrl from launchSettings.json
  2. Remove targetPort from AppHost registration
  3. Let Aspire assign ports dynamically

Common Port Conflicts:

  • Port 5000: macOS AirPlay Receiver
  • Port 8080: Common development servers
  • Port 3000: Node.js development servers
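
Step 1 of the fix requires finding every launch profile that still pins a URL. A minimal sketch of that search follows. The function name `find_hardcoded_urls` is illustrative, and it assumes the profiles live in files named launchSettings.json somewhere under the given directory.

```shell
# List each launchSettings.json line that sets an applicationUrl.
# Prints "file:line:content" per match and returns non-zero when nothing is found.
find_hardcoded_urls() {
  local root="${1:-.}"
  grep -rn --include="launchSettings.json" '"applicationUrl"' "$root"
}

# Hypothetical usage from the repository root:
# find_hardcoded_urls .
```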

Issue 9: Slow Performance

Symptoms:

  • API requests taking too long
  • High CPU/memory usage
  • Database query timeouts

Common causes:

  1. Missing database indexes
  2. N+1 query problems
  3. Large result sets loaded into memory
  4. Synchronous I/O blocking threads
  5. Inefficient LINQ queries
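
An N+1 problem usually shows up in logs as the same query shape repeated many times with different parameter values. The sketch below counts query shapes, assuming a file with one SQL statement per line. That input format is an assumption, so real logs may need pre-filtering first. The function name `top_repeated_queries` is illustrative.

```shell
# Print the most frequently repeated query shapes in LOG_FILE.
# Quoted and numeric literals are replaced with "?" so that queries differing
# only in parameter values collapse into one shape before counting.
top_repeated_queries() {
  local log_file="$1" limit="${2:-5}"
  sed -e "s/'[^']*'/?/g" -e 's/[0-9][0-9]*/?/g' "$log_file" |
    LC_ALL=C sort | uniq -c | sort -rn | head -n "$limit"
}

# Hypothetical usage:
# top_repeated_queries queries.log 10
```

This is a heuristic: the numeric replacement also rewrites digits inside identifiers, so treat a high count as a prompt to inspect the calling code, not as proof.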

Issue 10: Build Errors After Refactoring

Common causes:

  • Missing using statements
  • Namespace changes not updated
  • Project references broken
  • Missing NuGet packages

Systematic Fix:

  1. Check using statements
  2. Verify namespaces match folders
  3. Validate project references
  4. Restore NuGet packages
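
Step 3 can be partly automated. The sketch below reports ProjectReference entries whose target file no longer exists. It assumes each reference sits on a single line with an Include="..." attribute, and the function name `check_project_refs` is illustrative.

```shell
# Report ProjectReference entries that point at a missing file.
# Prints "<csproj>: <missing path>" per broken reference and returns 1 if any exist.
check_project_refs() {
  local root="${1:-.}" missing
  missing=$(
    find "$root" -name '*.csproj' | while IFS= read -r proj; do
      proj_dir=$(dirname "$proj")
      # Extract each Include value and convert Windows backslashes to slashes.
      grep -o '<ProjectReference[^>]*Include="[^"]*"' "$proj" |
        sed -e 's/.*Include="\([^"]*\)"/\1/' -e 's|\\|/|g' |
        while IFS= read -r ref; do
          [ -f "$proj_dir/$ref" ] || echo "$proj: $ref"
        done
    done
  )
  if [ -n "$missing" ]; then
    echo "$missing"
    return 1
  fi
}

# Hypothetical usage from the repository root:
# check_project_refs . || echo "fix the references listed above"
```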

Troubleshooting Workflow

Step 1: Identify the Problem

  • What is the symptom?
  • When did it start?
  • Is it consistent or intermittent?
  • Which service(s) are affected?

Step 2: Gather Information

  • Check service health status
  • Review recent logs
  • Check distributed traces
  • Look for related errors

Step 3: Form Hypothesis

  • What could cause this symptom?
  • Have there been recent changes?
  • Are there any patterns?

Step 4: Test Hypothesis

  • Check configuration
  • Test connections
  • Review code changes
  • Run diagnostic scripts

Step 5: Apply Fix

  • Make minimal changes
  • Test in development first
  • Deploy to staging
  • Monitor after deployment

Step 6: Verify Resolution

  • Check service health
  • Monitor logs for errors
  • Run smoke tests
  • Document the fix

Quick Diagnostic Commands

# Build errors
dotnet build --no-restore 2>&1 | grep error

# Find project references (run from the repository root)
grep -r "ProjectReference" --include="*.csproj" .

# Check namespace consistency
grep -r "^namespace" --include="*.cs" . | sort

# Check MQTT config
grep -A 20 '"Mqtt"' appsettings.json

Tips for Effective Diagnostics

  1. Start with service health: Check if all services are running
  2. Use distributed tracing: Follow request path through system
  3. Correlate logs: Use operation IDs to connect related logs
  4. Check recent changes: What changed before the issue started?
  5. Reproduce locally: Try to reproduce in development
  6. Narrow the scope: Isolate which service has the problem
  7. Read error messages: They often contain the root cause
  8. Check configuration: Many issues are config-related
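
Tip 3 can be sketched as a small helper that pulls every line mentioning one operation ID out of several service logs and orders them by time. It assumes each log line starts with an ISO-8601 UTC timestamp and contains the operation ID as plain text. That log format, the function name `trace_operation`, and the file names in the usage line are assumptions.

```shell
# Print all lines containing OP_ID from the given log files, oldest first.
# -h drops file names so lines from different services sort together;
# -F treats the ID as a literal string, not a regular expression.
trace_operation() {
  local op_id="$1"
  shift
  grep -hF -- "$op_id" "$@" | LC_ALL=C sort
}

# Hypothetical usage:
# trace_operation 4bf92f3577b34da6 api.log worker.log
```

Sorting works here only because ISO-8601 timestamps in the same time zone order correctly as plain text.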

References

External documentation: