Documentation

fsds/transport-duplicate-delivery-id-fix.md

Duplicate Delivery ID Error - Diagnosis and Fix Plan

Problem Summary

PostgreSQL error 23505: duplicate key value violates unique constraint "IX_deliveries_shipment_id_delivery_number_tenant_id" occurs during BBU component's Oracle/OBLPN processing.

Root Cause Analysis

Finding 1: Dead Code Identified

There are two methods in OracleProcessor.cs that upsert shipments:

Method Lines Status Duplicate Handling
UpsertShipmentWithContextAsync 1809-1962 ACTIVE (called at line 1202) Partial - has try-catch but retry is unprotected
UpsertShipmentAsync 2787-2960 DEAD CODE (never called) None

Finding 2: Unprotected Retry in Active Code Path

In UpsertShipmentWithContextAsync, the duplicate handling has a vulnerability:

// Line 1909-1927
try {
    deliveriesByOrder = await EnsureShipmentDeliveriesAsync(...);
    await transportDb.SaveChangesAsync(cancellationToken);  // ← First attempt
} catch(DbUpdateException ex) when (...) {
    // Clear tracker, re-fetch shipment
    deliveriesByOrder = await EnsureShipmentDeliveriesAsync(...);
    await transportDb.SaveChangesAsync(cancellationToken);  // ← RETRY HAS NO CATCH!
}

The Bug: Line 1926 (retry SaveChangesAsync) has no try-catch. If another racing process creates a new delivery between the re-fetch and retry save, the exception propagates unhandled.

Finding 3: Race Condition Scenario

  1. Process A: First attempt fails on delivery D1 (23505)
  2. Process A: Catches error, clears tracker, re-fetches shipment (now includes D1)
  3. Process C: Meanwhile creates delivery D2 (committed to DB)
  4. Process A: EnsureShipmentDeliveriesAsync doesn't see D2 (not yet visible during re-fetch)
  5. Process A: Retry creates D2, calls SaveChanges
  6. Boom: 23505 on D2 with no catch - exception propagates!

Finding 4: Database Evidence

Same delivery numbers across multiple shipments created within milliseconds:

delivery_number  | shipment_id                          | status_updated_at_utc
120920253151003  | 019afae1-8213-7537-959a-ed54b57f3f03 | 2025-12-07 22:14:34.228112
120920253151003  | 019afae1-8210-71dd-bf5c-eff512ca48fe | 2025-12-07 22:14:34.220680

Fix Plan

Step 1: Remove Dead Code

File: engines/bbu/src/Acsis.Dynaplex.Engines.Bbu/Services/OracleProcessor.cs

Delete the entire UpsertShipmentAsync method (lines ~2782-2960). It's never called and just adds confusion and maintenance burden.

Step 2: Add Retry Loop to Handle Cascading Race Conditions

File: engines/bbu/src/Acsis.Dynaplex.Engines.Bbu/Services/OracleProcessor.cs

Replace the single try-catch with a retry loop that handles multiple consecutive race conditions:

Current code (lines 1905-1927):

Dictionary<string, Delivery> deliveriesByOrder;

// Handle race condition: multiple parallel processors may try to create the same delivery
// If we hit a duplicate key error, clear the tracker and retry once
try {
    deliveriesByOrder = await EnsureShipmentDeliveriesAsync(transportDb, shipment!, deliveries, prismDb, cancellationToken);
    await transportDb.SaveChangesAsync(cancellationToken);
} catch(DbUpdateException ex) when (ex.InnerException is PostgresException { SqlState: "23505" } pgEx
                                    && pgEx.ConstraintName?.Contains("deliveries") == true) {
    _logger.LogWarning("Duplicate delivery detected for shipment {Oblpn}, retrying with fresh data", oblpn);

    // Clear the change tracker to remove conflicting entities
    transportDb.ChangeTracker.Clear();

    // Re-fetch the shipment with deliveries from database
    shipment = await transportDb.Shipments
        .Include(s => s.Deliveries)
        .FirstAsync(s => s.Id == shipment!.Id, cancellationToken);

    // Retry - this time EnsureShipmentDeliveriesAsync will find the existing deliveries
    deliveriesByOrder = await EnsureShipmentDeliveriesAsync(transportDb, shipment, deliveries, prismDb, cancellationToken);
    await transportDb.SaveChangesAsync(cancellationToken);
}

Updated code with retry loop:

Dictionary<string, Delivery> deliveriesByOrder;

// Handle race conditions: multiple parallel processors may try to create the same delivery
// Use a retry loop to handle cascading race conditions under high parallelism
const int maxRetries = 3;
for(var attempt = 1; attempt <= maxRetries; attempt++) {
    try {
        deliveriesByOrder = await EnsureShipmentDeliveriesAsync(transportDb, shipment!, deliveries, prismDb, cancellationToken);
        await transportDb.SaveChangesAsync(cancellationToken);
        break; // Success - exit loop
    } catch(DbUpdateException ex) when (
        attempt < maxRetries &&
        ex.InnerException is PostgresException { SqlState: "23505" } pgEx &&
        pgEx.ConstraintName?.Contains("deliveries") == true) {

        _logger.LogWarning(
            "Duplicate delivery detected for shipment {Oblpn} (attempt {Attempt}/{MaxRetries}), retrying with fresh data",
            oblpn, attempt, maxRetries);

        // Clear the change tracker to remove conflicting entities
        transportDb.ChangeTracker.Clear();

        // Re-fetch the shipment with deliveries from database
        shipment = await transportDb.Shipments
            .Include(s => s.Deliveries)
            .FirstAsync(s => s.Id == shipment!.Id, cancellationToken);
    }
}

Files to Modify

  1. engines/bbu/src/Acsis.Dynaplex.Engines.Bbu/Services/OracleProcessor.cs
    • Delete UpsertShipmentAsync method (lines ~2782-2960) - dead code cleanup
    • Replace try-catch with retry loop (lines ~1905-1927)

Testing

  1. Run the BBU component with parallel OBLPN message processing
  2. Verify no more 23505 errors on deliveries
  3. Confirm the log shows retry attempts under high load
  4. Verify deliveries are created correctly without duplicates

Summary

  • Root cause: The single retry in the catch block had no protection against cascading race conditions
  • Dead code: UpsertShipmentAsync is never called and should be removed
  • Fix: Replace single try-catch with a retry loop (max 3 attempts) to handle multiple racing processes