`fsds/prism-data-cleanup-and-journal-backfill-plan.md`

# Data Cleanup and Journal Backfill Plan

## Overview
Fix duplicate EPC/basket and OBLPN/stack records, then backfill journal tables for pre-existing data.
## Part 1: Duplicate Basket Cleanup (2,566 duplicate EPCs)

### Findings
- 2,566 EPCs are each assigned to 2 different Bakery Basket items
- The "newer" baskets (created Dec 2) have MORE activity because BBU hardening deployed Dec 2
- Movement counts per basket are reasonable (max ~19, not 822 as aggregate suggested)
- Strategy: Keep older (original) basket, migrate all relations, delete newer
### Tables to Update (in order)

1. `spatial.movements` - Update `item_id` from newer basket to older basket
2. `bbu.tag_read_processing` - Update `basket_id` from newer to older
3. `bbu.camera_read_baskets` - Update `basket_item_id` from newer to older
4. `bbu.movement_deduplication` - Update `basket_id` from newer to older
5. `prism.identifiers` - Delete identifier records for newer baskets (EPC_HEX)
6. `catalog.items` - Delete the newer basket items
7. `prism.passports` - Delete passports for newer baskets (cascade should handle most)
### SQL Script Approach

```sql
-- Step 1: Create mapping table of older->newer basket pairs
CREATE TEMP TABLE basket_duplicate_map AS
WITH ranked AS (
    SELECT
        i.value AS epc,
        i.associated_object_id AS basket_id,
        ROW_NUMBER() OVER (
            PARTITION BY i.value
            ORDER BY to_timestamp((('x' || substring(i.associated_object_id::text, 1, 8) ||
                substring(i.associated_object_id::text, 10, 4))::bit(48)::bigint) / 1000.0)
        ) AS rn
    FROM prism.identifiers i
    WHERE i.identifier_type_id = 273
      AND i.value IN (
          SELECT value
          FROM prism.identifiers
          WHERE identifier_type_id = 273
          GROUP BY value
          HAVING COUNT(DISTINCT associated_object_id) > 1
      )
)
SELECT
    r1.epc,
    r1.basket_id AS keep_basket_id,
    r2.basket_id AS delete_basket_id
FROM ranked r1
JOIN ranked r2 ON r1.epc = r2.epc AND r1.rn = 1 AND r2.rn = 2;

-- Steps 2-5: Migrate relations (UPDATE statements)
-- Steps 6-7: Delete duplicates
```
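Steps 2-5 all follow the same shape; a minimal sketch for `spatial.movements` (Step 2), assuming the `basket_duplicate_map` temp table above:

```sql
-- Repoint movements from the newer (duplicate) basket to the kept one.
UPDATE spatial.movements m
SET item_id = bdm.keep_basket_id
FROM basket_duplicate_map bdm
WHERE m.item_id = bdm.delete_basket_id;
```

The same pattern applies to `bbu.tag_read_processing`, `bbu.camera_read_baskets`, and `bbu.movement_deduplication`, substituting the relevant basket-reference column from the table list above.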
## Part 2: Duplicate Stack Cleanup (18,547 duplicate OBLPNs)

### Findings
- Both older and newer stacks have allocations to the SAME shipments
- Allocations are truly duplicated (same OBLPN allocated twice to same shipment)
- Strategy: Keep older (original) stack, delete newer stack AND its allocations
### Verification Confirmed
- Older and newer stacks point to same shipment_id for each OBLPN pair
- Safe to delete newer stacks without data loss
### Tables to Update (in order)

1. `bbu.camera_read_processing` - Update `matched_stack_id` from newer to older (11 records)
2. `transport.shipment_item_allocations` - Delete allocations for newer stacks
3. `prism.identifiers` - Delete all identifiers for newer stacks (OBLPN, PRIMARY_OBLPN, etc.)
4. `catalog.items` - Delete the newer stack items
5. `prism.passports` - Delete passports for newer stacks
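As a sketch of the allocation delete, assuming a `stack_duplicate_map` temp table built the same way as `basket_duplicate_map` in Part 1 (with `keep_stack_id`/`delete_stack_id` columns, keyed on the OBLPN identifier type) and assuming the allocation table references the stack item via a `stack_item_id` column (an illustrative name, not confirmed against the schema):

```sql
-- Remove the duplicated allocations that belong to the newer (to-be-deleted) stacks.
-- NOTE: stack_item_id is an assumed column name; adjust to the real FK column.
DELETE FROM transport.shipment_item_allocations a
USING stack_duplicate_map sdm
WHERE a.stack_item_id = sdm.delete_stack_id;
```

Since verification confirmed both stacks point at the same `shipment_id`, dropping the newer stack's allocations loses no shipment linkage.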
## Part 3: Code-Level Uniqueness Enforcement

### Decision
- Do NOT add database constraints yet (tenant scoping needs consideration)
- Add application-level checks in BBU component
### Already Protected (Confirmed)

- Basket creation: `BasketProcessingService.cs` checks the EPC via `PrismQueryExtensions.GetEntityIdsByIdentifierAsync()` before creating (added Dec 2, 2025)
- Stack creation: `OracleProcessor.cs` uses the same pattern for the OBLPN lookup before creating
### No Additional Code Changes Needed

Both paths have had proper upsert logic since the Dec 2 hardening deployment.
## Part 4: Journal Backfill

### Current State
- Journal triggers added Dec 9, 2025
- ~99.3% of passports (2,178,938) and identifiers (4,095,442) lack journal entries
- Attribute tables have 0 journal entries
### Backfill Strategy

#### Timestamp Sources (in priority order)

1. IoT: `zb_tag_reads.event_timestamp` - For baskets (first EPC read time)
2. `background_service_runs.started_at` - For Oracle-imported entities
3. UUIDv7 extraction - Fallback: extract timestamp from the entity ID itself
4. Default - Use earliest known system timestamp if no source available
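The UUIDv7 fallback works because the first 48 bits of a UUIDv7 encode milliseconds since the Unix epoch. A self-contained sketch of the extraction expression used throughout this plan, here applied to the `rfid-processor` user ID from the mapping table below as a sample literal:

```sql
-- Reassemble the first 12 hex digits (48 bits) of the UUID, skipping the dash,
-- and interpret them as milliseconds since the Unix epoch.
SELECT to_timestamp(
    (('x' || substring('019a4865-3c89-7069-a21c-510374e7f460', 1, 8)
          || substring('019a4865-3c89-7069-a21c-510374e7f460', 10, 4)
     )::bit(48)::bigint) / 1000.0
) AS embedded_created_at;
```

For the backfill proper, the same expression runs over `global_id::text` (or `associated_object_id::text` in the Part 1 mapping query).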
### Journal Table Schema (per JournalTable base class)

- `seq_id`: 1 (first version)
- `valid_from`: <calculated timestamp>
- `valid_to`: `'infinity'::timestamptz`
- `initiated_by_user_id`: <matched processor user UUID>
- `invalidated_by_user_id`: NULL
### System User Mapping for `initiated_by_user_id`
| Entity Type | User | UUID |
|---|---|---|
| Baskets (Bakery Basket items) | rfid-processor | 019a4865-3c89-7069-a21c-510374e7f460 |
| Stacks (from Oracle) | oblpn-processor | 019a4865-3b47-72b3-8a86-e2f3b89a2ee1 |
| Shipments/Deliveries | shipment-processor | 019a4865-3ca1-7624-8625-13cd1d35ca50 |
| Camera reads | camera-processor | 019a4865-3c98-702f-9fae-23b0be51de11 |
| Locations/Other | acsis | 019a4865-1616-7080-8fe3-8be053ae98d4 |
### Backfill Script Approach

```sql
-- Must set session variable for trigger compatibility
SET app.user_id = '<system-user-uuid>';

-- Insert into journal tables directly (bypass triggers for initial backfill)
INSERT INTO prism.passports_journal
    (global_id, platform_type_id, seq_id, valid_from, valid_to, initiated_by_user_id)
SELECT
    p.global_id,
    p.platform_type_id,
    1 AS seq_id,
    COALESCE(
        -- Try to find creation time from source data
        <source_timestamp_subquery>,
        -- Fallback to UUIDv7 extraction
        to_timestamp((('x' || substring(p.global_id::text, 1, 8) ||
            substring(p.global_id::text, 10, 4))::bit(48)::bigint) / 1000.0)
    ) AS valid_from,
    'infinity'::timestamptz AS valid_to,
    '<system-user-uuid>'::uuid AS initiated_by_user_id
FROM prism.passports p
WHERE NOT EXISTS (SELECT 1 FROM prism.passports_journal pj WHERE pj.global_id = p.global_id);
```
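To address the backfill-performance risk noted in the risks table, the insert can be chunked. A sketch that is run repeatedly until it inserts zero rows; the 50,000 batch size and the UUIDv7-only timestamp (dropping the source-data lookup for brevity) are simplifying assumptions:

```sql
-- Batched backfill sketch: rerun until no rows remain without a journal entry.
-- Keeps each transaction small during low-activity windows.
INSERT INTO prism.passports_journal
    (global_id, platform_type_id, seq_id, valid_from, valid_to, initiated_by_user_id)
SELECT
    p.global_id,
    p.platform_type_id,
    1,
    to_timestamp((('x' || substring(p.global_id::text, 1, 8) ||
        substring(p.global_id::text, 10, 4))::bit(48)::bigint) / 1000.0),
    'infinity'::timestamptz,
    '019a4865-1616-7080-8fe3-8be053ae98d4'::uuid  -- acsis fallback user
FROM prism.passports p
WHERE NOT EXISTS (SELECT 1 FROM prism.passports_journal pj
                  WHERE pj.global_id = p.global_id)
LIMIT 50000;
```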
### Tables to Backfill

- `passports_journal` - 2,178,938 records
- `identifiers_journal` - 4,095,442 records
- `attribute_definitions_journal` - 2 records
- `attribute_assignments_journal` - 2 records
- `attribute_values_journal` - 1,146 records
## Execution Order

1. Backup database (user already planning this)
2. Part 1: Clean up duplicate baskets
3. Part 2: Clean up duplicate stacks
4. Part 3: Verify code protections (already done - no changes needed)
5. Part 4: Run journal backfill scripts
6. Verify: Run data integrity checks post-cleanup
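The final verification step can reuse the Part 1 duplicate-detection predicate; after cleanup, for example, this should return zero rows:

```sql
-- Any EPC still mapped to more than one basket indicates a missed duplicate.
SELECT value, COUNT(DISTINCT associated_object_id) AS basket_count
FROM prism.identifiers
WHERE identifier_type_id = 273
GROUP BY value
HAVING COUNT(DISTINCT associated_object_id) > 1;
```

An analogous check over the OBLPN identifier types covers Part 2, and a `NOT EXISTS` count against each journal table confirms the backfill reached 100%.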
## Key Files Referenced

- `/engines/bbu/src/Acsis.Dynaplex.Engines.Bbu/Services/BasketProcessingService.cs` - Basket upsert logic
- `/engines/bbu/src/Acsis.Dynaplex.Engines.Bbu/Services/OracleProcessor.cs` - Stack upsert logic
- `/engines/prism/src/Acsis.Dynaplex.Engines.Prism.Database/Routines/journal_writer.sql` - Journal trigger
- `/strata/core/src/Acsis.Dynaplex/JournalTable.cs` - Journal base class
## Risks and Mitigations
| Risk | Mitigation |
|---|---|
| FK violations during delete | Delete in correct order; use temp mapping tables |
| Missing data after cleanup | Verify all relations migrated before deleting |
| Journal backfill performance | Batch inserts, run during low-activity period |
| Wrong timestamps in journal | Use multiple fallback sources; UUIDv7 extraction as last resort |