# ADR-026: Structured Logging and Observability Strategy

## Status

Accepted (Partially Implemented)

## Implementation Progress

### Completed

- ✅ OpenTelemetry integration for logs, metrics, and traces (Extensions.cs)
- ✅ OTLP exporter configuration via environment variables
- ✅ Azure Monitor integration support
- ✅ Serilog file logging in Development mode with daily rolling
- ✅ Health check filtering from traces
- ✅ Component name enrichment for log files
- ✅ Polly/Resilience logging noise reduction
- ✅ Aspire Dashboard for local development observability
### In Progress
- ⚠️ Standardized logging patterns across all components
- ⚠️ High-performance LoggerMessage source generators
### Remaining
- ⏳ Consistent use of log scopes for operations
- ⏳ Sensitive data redaction with Microsoft.Extensions.Compliance
- ⏳ Custom metrics instrumentation per component
- ⏳ Production log aggregation configuration
- ⏳ Alerting and dashboard setup
## Key Implementation Files

- `strata/service-defaults/src/Acsis.Dynaplex.Strata.ServiceDefaults/Extensions.cs` - Component-specific logging patterns (to be standardized)
## Context
Observability is critical for understanding, debugging, and operating distributed microservices. The Dynaplex architecture requires comprehensive telemetry that correlates across service boundaries while maintaining high performance and protecting sensitive data.
### Requirements
- Performance: Logging must not impact service latency or throughput
- Correlation: Trace requests across service boundaries automatically
- Structure: Logs must be machine-parseable for querying and alerting
- Security: Sensitive data must never appear in logs
- Consistency: All components must follow the same patterns
- Flexibility: Support multiple backends (OTLP, Azure Monitor, local files)
### Existing Foundation
- .NET Aspire provides OpenTelemetry configuration via ServiceDefaults
- OpenTelemetry is the industry-standard, vendor-neutral observability framework
- .NET 10 has first-class support for LoggerMessage source generators
## Decision
We adopt a three-pillar observability strategy built on OpenTelemetry:
- Logs: Structured logging via ILogger with OpenTelemetry export
- Traces: Distributed tracing via System.Diagnostics.Activity
- Metrics: Custom and automatic metrics via System.Diagnostics.Metrics
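Concretely, the three pillars map onto native .NET APIs that OpenTelemetry exports directly. A minimal sketch of all three side by side (the component and instrument names here are illustrative, not from the codebase):

```csharp
// ExampleTelemetry.cs - illustrative only; shows the three pillar APIs together
public class ExampleTelemetry
{
    // Traces: one ActivitySource per component
    private static readonly ActivitySource Source = new("Acsis.Dynaplex.Example");

    // Metrics: one Meter per component, instruments created from it
    private static readonly Meter Meter = new("Acsis.Dynaplex.Example");
    private static readonly Counter<long> Processed =
        Meter.CreateCounter<long>("example.operations.processed");

    public void Process(ILogger logger)
    {
        // Trace span; its TraceId flows into the log entry below via Activity.Current
        using var activity = Source.StartActivity("Process");

        // Structured log, correlated automatically with the active span
        logger.LogInformation("Processing operation {OperationName}", "Process");

        // Metric increment
        Processed.Add(1);
    }
}
```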
### Core Principles
- OpenTelemetry-first: Use native .NET logging/tracing APIs that feed into OpenTelemetry
- High-performance by default: LoggerMessage source generators for all logging
- Automatic correlation: Activity.TraceId propagates through all calls
- Redact, don't omit: Use data classification to redact sensitive info
- Semantic conventions: Follow OpenTelemetry semantic conventions
## Consequences

### Positive
- Performance: Source-generated logging eliminates boxing and allocation
- Debugging: Trace any request across all services via TraceId
- Searchability: Query structured logs by any property
- Flexibility: Switch observability backends without code changes
- Compliance: Audit trails with full context, sensitive data redacted
### Negative
- Storage costs: Structured logs are larger than plain text
- Learning curve: Team must learn structured logging patterns
- Migration effort: Existing logging must be updated
### Neutral
- Tool selection: Azure Monitor for production, Aspire Dashboard for dev
## Implementation

### 1. High-Performance Logging with Source Generators
Every component should define its log messages using the LoggerMessageAttribute for maximum performance. This eliminates boxing, reduces allocations, and provides compile-time validation.
```csharp
// CatalogLogMessages.cs - Define all log messages for the component
namespace Acsis.Dynaplex.Engines.Catalog;

/// <summary>
/// High-performance log messages for the Catalog component.
/// Uses source generators to eliminate boxing and reduce allocations.
/// </summary>
public static partial class CatalogLogMessages
{
    // ========================================
    // Item Operations (EventId 1000-1099)
    // ========================================

    [LoggerMessage(
        EventId = 1000,
        Level = LogLevel.Information,
        Message = "Creating item {ItemName} of type {ItemTypeId} at location {LocationId}")]
    public static partial void CreatingItem(
        this ILogger logger,
        string itemName,
        Guid itemTypeId,
        Guid locationId);

    [LoggerMessage(
        EventId = 1001,
        Level = LogLevel.Information,
        Message = "Created item {ItemId} with passport {PassportId}")]
    public static partial void CreatedItem(
        this ILogger logger,
        Guid itemId,
        Guid passportId);

    [LoggerMessage(
        EventId = 1002,
        Level = LogLevel.Warning,
        Message = "Item {ItemId} not found")]
    public static partial void ItemNotFound(
        this ILogger logger,
        Guid itemId);

    [LoggerMessage(
        EventId = 1003,
        Level = LogLevel.Error,
        Message = "Failed to create item {ItemName}")]
    public static partial void FailedToCreateItem(
        this ILogger logger,
        string itemName,
        Exception exception);

    // ========================================
    // Move Operations (EventId 1100-1199)
    // ========================================

    [LoggerMessage(
        EventId = 1100,
        Level = LogLevel.Information,
        Message = "Moving {ItemCount} items from {SourceLocationId} to {DestinationLocationId}")]
    public static partial void MovingItems(
        this ILogger logger,
        int itemCount,
        Guid sourceLocationId,
        Guid destinationLocationId);

    [LoggerMessage(
        EventId = 1101,
        Level = LogLevel.Information,
        Message = "Moved item {ItemId} to location {LocationId}")]
    public static partial void MovedItem(
        this ILogger logger,
        Guid itemId,
        Guid locationId);

    // ========================================
    // Batch Operations (EventId 1200-1299)
    // ========================================

    [LoggerMessage(
        EventId = 1200,
        Level = LogLevel.Information,
        Message = "Starting batch import of {ItemCount} items")]
    public static partial void StartingBatchImport(
        this ILogger logger,
        int itemCount);

    [LoggerMessage(
        EventId = 1201,
        Level = LogLevel.Information,
        Message = "Completed batch import: {SuccessCount} succeeded, {FailureCount} failed")]
    public static partial void CompletedBatchImport(
        this ILogger logger,
        int successCount,
        int failureCount);

    [LoggerMessage(
        EventId = 1202,
        Level = LogLevel.Warning,
        Message = "Batch import item {Index} failed: {Reason}")]
    public static partial void BatchImportItemFailed(
        this ILogger logger,
        int index,
        string reason);
}
```
### 2. Using Log Messages in Services

```csharp
public class ItemsService(
    CatalogDb db,
    ILogger<ItemsService> logger,
    ICurrentTenantService tenantService)
{
    public async Task<Item> CreateItemAsync(CreateItemRequest request, CancellationToken ct)
    {
        // Log entry with structured properties
        logger.CreatingItem(request.Name, request.ItemTypeId, request.LocationId);

        try
        {
            var item = new Item
            {
                Id = Guid.NewGuid(),
                Name = request.Name,
                ItemTypeId = request.ItemTypeId,
                LocationId = request.LocationId,
                TenantId = tenantService.TenantId
            };

            db.Items.Add(item);
            await db.SaveChangesAsync(ct);

            // Log success with result data
            logger.CreatedItem(item.Id, item.PassportId);
            return item;
        }
        catch (Exception ex)
        {
            // Log failure - the trailing Exception parameter is captured
            // automatically by the LoggerMessage source generator
            logger.FailedToCreateItem(request.Name, ex);
            throw;
        }
    }
}
```
### 3. Log Scopes for Complex Operations

Use log scopes to add context that applies to multiple log entries:

```csharp
public class BatchProcessor(ILogger<BatchProcessor> logger)
{
    public async Task ProcessBatchAsync(Batch batch, CancellationToken ct)
    {
        // All logs within this scope will include BatchId, BatchSize, and BatchType
        using (logger.BeginScope(new Dictionary<string, object>
        {
            ["BatchId"] = batch.Id,
            ["BatchSize"] = batch.Items.Count,
            ["BatchType"] = batch.Type.ToString()
        }))
        {
            logger.StartingBatchImport(batch.Items.Count);

            var successCount = 0;
            var failureCount = 0;

            for (var i = 0; i < batch.Items.Count; i++)
            {
                var item = batch.Items[i];

                // Inner scope for per-item context
                using (logger.BeginScope("Processing item {ItemIndex} of {TotalItems}", i + 1, batch.Items.Count))
                {
                    try
                    {
                        await ProcessItemAsync(item, ct);
                        successCount++;
                    }
                    catch (Exception ex)
                    {
                        failureCount++;
                        logger.BatchImportItemFailed(i, ex.Message);
                    }
                }
            }

            logger.CompletedBatchImport(successCount, failureCount);
        }
    }
}
```
### 4. Automatic Correlation with OpenTelemetry

OpenTelemetry automatically propagates TraceId and SpanId across HTTP calls. The ServiceDefaults configuration handles this:

```csharp
// In Extensions.cs - already implemented
builder.Logging.AddOpenTelemetry(logging =>
{
    logging.IncludeFormattedMessage = true;
    logging.IncludeScopes = true; // Essential for scope propagation
});
```

When viewing logs, you can filter by TraceId to see all logs from a single request across all services:

```text
TraceId: 4bf92f3577b34da6a3ce929d0e0e4736
```
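With an Azure Monitor backend, for example, a Kusto query along these lines returns every log row for one trace (`traces` and `operation_Id` are the Application Insights schema names; other backends use different field names for the trace identifier):

```kusto
traces
| where operation_Id == "4bf92f3577b34da6a3ce929d0e0e4736"
| order by timestamp asc
```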
### 5. Tenant and User Context Enrichment

Add tenant and user context to all logs automatically via middleware:

```csharp
// TelemetryEnrichmentMiddleware.cs
public class TelemetryEnrichmentMiddleware(RequestDelegate next)
{
    public async Task InvokeAsync(HttpContext context)
    {
        var activity = Activity.Current;

        // Add tenant context
        if (context.User.FindFirst("tenant_id")?.Value is { } tenantId)
        {
            activity?.SetTag("tenant.id", tenantId);
            activity?.SetBaggage("tenant.id", tenantId);
        }

        // Add user context (never log sensitive user data)
        if (context.User.FindFirst("sub")?.Value is { } userId)
        {
            activity?.SetTag("user.id", userId);
        }

        // Add request context
        activity?.SetTag("http.request.id", context.TraceIdentifier);

        await next(context);
    }
}
```
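The middleware also needs to be registered in the request pipeline, after authentication so the `tenant_id` and `sub` claims are populated. A sketch of the wiring in Program.cs:

```csharp
app.UseAuthentication();
app.UseAuthorization();

// Register after auth so the user's claims are available to the enricher
app.UseMiddleware<TelemetryEnrichmentMiddleware>();
```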
### 6. Sensitive Data Redaction

Use Microsoft.Extensions.Compliance for automatic redaction of sensitive data:

```csharp
// Define data classifications and matching parameter attributes.
// DataClassification takes a taxonomy name plus a value; the taxonomy
// name "Dynaplex" here is illustrative.
public static class DataClassifications
{
    public static DataClassification Pii { get; } = new("Dynaplex", nameof(Pii));
    public static DataClassification Sensitive { get; } = new("Dynaplex", nameof(Sensitive));
}

public sealed class PiiAttribute() : DataClassificationAttribute(DataClassifications.Pii) { }
public sealed class SensitiveAttribute() : DataClassificationAttribute(DataClassifications.Sensitive) { }

// Apply to log parameters
[LoggerMessage(
    EventId = 2000,
    Level = LogLevel.Information,
    Message = "User {UserId} logged in from {IpAddress}")]
public static partial void UserLoggedIn(
    this ILogger logger,
    string userId,
    [Sensitive] string ipAddress); // IP address will be redacted

// Configure redaction in DI
services.AddRedaction(builder =>
{
    builder.SetRedactor<ErasingRedactor>(DataClassifications.Pii);
    builder.SetRedactor<StarRedactor>(DataClassifications.Sensitive); // StarRedactor: our custom Redactor implementation
});

// Enable redaction in logging
builder.Logging.EnableRedaction();
```
### 7. Component-Specific Metrics

Each component should define custom metrics for business operations:

```csharp
// CatalogMetrics.cs
public class CatalogMetrics
{
    private readonly Counter<long> _itemsCreated;
    private readonly Counter<long> _itemsMoved;
    private readonly Histogram<double> _batchProcessingDuration;

    public CatalogMetrics(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("Acsis.Dynaplex.Engines.Catalog");

        _itemsCreated = meter.CreateCounter<long>(
            "catalog.items.created",
            unit: "{item}",
            description: "Number of items created");

        _itemsMoved = meter.CreateCounter<long>(
            "catalog.items.moved",
            unit: "{item}",
            description: "Number of items moved");

        _batchProcessingDuration = meter.CreateHistogram<double>(
            "catalog.batch.duration",
            unit: "s",
            description: "Duration of batch processing operations");
    }

    public void RecordItemCreated(string itemType) =>
        _itemsCreated.Add(1, new KeyValuePair<string, object?>("item.type", itemType));

    public void RecordItemMoved(Guid fromLocation, Guid toLocation) =>
        _itemsMoved.Add(1,
            new KeyValuePair<string, object?>("from.location", fromLocation.ToString()),
            new KeyValuePair<string, object?>("to.location", toLocation.ToString()));

    public void RecordBatchDuration(double seconds, string batchType) =>
        _batchProcessingDuration.Record(seconds,
            new KeyValuePair<string, object?>("batch.type", batchType));
}
```
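To be usable, the metrics class must be registered once in DI and injected wherever operations are recorded. A sketch, assuming `IMeterFactory` is already registered (it is when metrics are enabled via ServiceDefaults):

```csharp
// Registration: singleton so each instrument is created exactly once
services.AddSingleton<CatalogMetrics>();

// Usage: inject alongside the logger and record at the same points you log
public class MoveService(CatalogMetrics metrics, ILogger<MoveService> logger)
{
    public void OnItemMoved(Guid itemId, Guid from, Guid to)
    {
        logger.MovedItem(itemId, to);
        metrics.RecordItemMoved(from, to);
    }
}
```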
### 8. Configuration

#### Development (appsettings.Development.json)

```json
{
  "Logging": {
    "LogLevel": {
      "Default": "Debug",
      "Microsoft": "Information",
      "Microsoft.AspNetCore": "Warning",
      "Microsoft.EntityFrameworkCore": "Information",
      "Microsoft.EntityFrameworkCore.Database.Command": "Information"
    },
    "Console": {
      "IncludeScopes": true,
      "TimestampFormat": "HH:mm:ss.fff "
    }
  }
}
```
#### Production

Log levels via appsettings.json:

```json
{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft": "Warning",
      "Microsoft.AspNetCore": "Warning",
      "Microsoft.EntityFrameworkCore": "Warning"
    }
  }
}
```

Exporter settings via environment variables (the OTLP exporter reads these directly):

```text
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-otel-collector:4317
APPLICATIONINSIGHTS_CONNECTION_STRING=InstrumentationKey=...
```
### 9. Event ID Allocation Strategy
Each component should use a distinct range of Event IDs:
| Component | EventId Range | Example |
|---|---|---|
| Catalog | 1000-1999 | 1000 = ItemCreated |
| Spatial | 2000-2999 | 2000 = LocationCreated |
| Transport | 3000-3999 | 3000 = ShipmentCreated |
| Workflow | 4000-4999 | 4000 = WorkflowStarted |
| Identity | 5000-5999 | 5000 = UserAuthenticated |
| IoT | 6000-6999 | 6000 = TagRead |
| Events | 7000-7999 | 7000 = EventPublished |
| Prism | 8000-8999 | 8000 = PassportCreated |
| Core-Data | 9000-9999 | 9000 = ReferenceDataLoaded |
### 10. Log Level Guidelines
| Level | When to Use | Examples |
|---|---|---|
| Trace | Detailed diagnostic info, development only | Method entry/exit, variable values |
| Debug | Development and troubleshooting | Cache hits/misses, query details |
| Information | Normal operations, business events | Item created, user logged in |
| Warning | Unusual but handled situations | Retry occurred, deprecated API used |
| Error | Operation failures, exceptions | Database error, external API failed |
| Critical | System failures requiring immediate attention | Service unavailable, data corruption |
## Migration Plan
- Phase 1: Create LogMessages classes for each component ⏳
- Phase 2: Migrate existing logging to use source generators ⏳
- Phase 3: Add log scopes to complex operations ⏳
- Phase 4: Implement sensitive data redaction ⏳
- Phase 5: Add component-specific metrics ⏳
- Phase 6: Configure production log aggregation ⏳
- Phase 7: Create dashboards and alerts ⏳
## Related ADRs
- ADR-019: OpenTelemetry Integration (traces and metrics)
- ADR-020: Container-Aware Configuration (environment-based settings)
- ADR-027: Event-Driven Architecture (event logging)