Data Lineage
·article·2026-06-12
Data Lineage
Definition
The ability to trace any reported number back through every transformation to its original source: an API feed (with batch id), a document (with reference), or a manual entry (with author and attachment).
Worked Example
Dashboard: 'Service X June COGS per active user: $0.41'
$0.41
|- = June COGS $8,610 / 21,000 active users
|- COGS $8,610
| |- $6,000 AWS Cost Explorer API, batch 2026-07-02
| |- $1,450 Stripe fees feed, batch #5520
| |- $ 900 Twilio invoice INV-88412 (AP #3107)
| |- $ 260 manual entry by ops@..., receipt.pdf attached
|- 21,000 users
|- Google Analytics daily aggregation
Interpretation & Pitfalls
A number whose source cannot be identified is, for audit purposes, not a number — it's a rumor. Manual entries demand the strictest metadata because they are the weakest link.
In TupicFinance
Every data point carries an explicit source field (AWS, Stripe, Cloudflare, Google Analytics, manual), making the lineage walk a query, not an investigation.