A duplicate invoice payment is among the smallest of accounting losses and among the hardest to notice. Each overpayment is modest, often a few thousand dollars buried inside a normal payment run, but the aggregate across a year is meaningful: industry benchmarks place it at 0.1% to 0.5% of total AP spend. Duplicates persist even at companies running modern AP automation because most platforms check only the invoice number, while vendors routinely re-issue the same invoice with a slightly modified number (a suffix added, a leading zero dropped, a hyphen moved).
Duplicate detection is the validation control that runs right after invoice OCR capture, and it addresses one of the leakage points behind the numbers in our AP automation benchmarks. Here is what it actually requires, why generic tools miss it, and how to recover what has already leaked.
The leakage math, at scale
| Annual AP spend | Low (0.1%) | Mid (0.25%) | High (0.5%) |
|---|---|---|---|
| $10M | $10K | $25K | $50K |
| $50M | $50K | $125K | $250K |
| $100M | $100K | $250K | $500K |
| $500M | $500K | $1.25M | $2.5M |
| $1B | $1M | $2.5M | $5M |
The leakage is invisible at the individual invoice level; the annual aggregate is the number CFOs notice once it's quantified. Recovery efforts after payment (vendor outreach, credit memos, refunds) typically claw back only 40 to 60 percent of identified duplicates, because vendors push back and time has passed. Prevention at intake captures the full amount.
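The table is straightforward arithmetic, so it is easy to size for your own spend. The sketch below reproduces the $50M row and shows what stays lost if duplicates are chased after payment rather than blocked at intake. The function names are illustrative, and the rates and recovery percentages are the benchmark figures quoted above, not measurements of any particular company.

```python
# Benchmark rates from the table above, in basis points (0.1%, 0.25%, 0.5%).
RATES_BPS = (10, 25, 50)

def leakage_range(annual_spend):
    """Estimated annual duplicate leakage (low, mid, high) in whole dollars."""
    return tuple(annual_spend * bps // 10_000 for bps in RATES_BPS)

def unrecovered(leakage, recovery_pct):
    """Dollars still lost after a recovery cycle that claws back recovery_pct percent."""
    return leakage * (100 - recovery_pct) // 100

low, mid, high = leakage_range(50_000_000)
print(low, mid, high)        # 50000 125000 250000
print(unrecovered(mid, 40))  # 75000
print(unrecovered(mid, 60))  # 50000
```

Integer basis points keep the arithmetic exact; at the mid rate on $50M of spend, a 40 to 60 percent recovery still leaves $50K to $75K on the table.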
Why generic detection misses 30%+ of duplicates
Of the six common duplicate patterns, single-field detection catches one. Multi-field detection catches all six. The single-field approach is what most generic AP automation ships with by default because it's easy to implement; the multi-field approach is what actually prevents leakage.
Five detection patterns that catch real duplicates
Hash of (vendor entity + amount + date + invoice number) compared against the paid-invoice history. Catches exact resubmissions and, when the fields are normalized before hashing, minor format variations. Example: invoice 12345 paid Jan 5, same vendor submits 12345 on Jan 8.
Same vendor + same amount + invoice number within a small edit distance. Catches suffix additions, dropped leading zeros, dash insertions. Example: 12345 versus 12345-R, INV-12345, 0012345.
Same vendor entity, same exact amount, within a configurable date window (typically 60 to 90 days). Catches re-invoicing of the same transaction with a new number. Example: $14,250 to Acme on Mar 1 and again Apr 15.
The vendor master maintains parent-subsidiary relationships, so two invoices from different legal entities of the same group for the same amount are flagged. Example: Acme USA and Acme UK each invoice $50K.
If a credit memo cleared a paid invoice, any later invoice from the same vendor for the same amount within the window is flagged. This is the most common real-world duplicate source. Example: paid Jan 5, credit memo Feb 10, new invoice Feb 15.
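To make the five patterns concrete, here is a minimal sketch of how they might be checked for one new invoice against one paid invoice. Everything in it is an assumption for illustration: the `Invoice` fields, the `PARENT_GROUP` lookup standing in for a vendor master, the normalization rules, the one-edit threshold, and the 90-day window. It is not a description of any specific product's model.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Invoice:
    vendor: str          # legal entity name
    number: str          # invoice number as submitted
    amount_cents: int    # integer cents avoids float comparison issues
    invoice_date: date   # date printed on the invoice

# Hypothetical vendor master: legal entity -> parent group.
PARENT_GROUP = {"Acme USA": "Acme Group", "Acme UK": "Acme Group"}

def exact_key(inv):
    """Pattern 1: hash of vendor entity + amount + date + invoice number."""
    raw = f"{inv.vendor}|{inv.amount_cents}|{inv.invoice_date.isoformat()}|{inv.number}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def normalize_number(number):
    """Drop punctuation, a leading alphabetic prefix, and leading zeros."""
    s = re.sub(r"[^A-Z0-9]", "", number.upper())
    s = re.sub(r"^[A-Z]+", "", s)
    return s.lstrip("0")

def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def duplicate_reasons(new, paid, credit_memo_date=None, window_days=90, max_edits=1):
    """Return the names of the patterns under which `new` looks like a duplicate of `paid`."""
    reasons = []
    same_entity = new.vendor == paid.vendor
    same_amount = new.amount_cents == paid.amount_cents
    days_apart = abs((new.invoice_date - paid.invoice_date).days)
    if exact_key(new) == exact_key(paid):
        reasons.append("exact")
    if same_entity and same_amount:
        a, b = normalize_number(new.number), normalize_number(paid.number)
        if edit_distance(a, b) <= max_edits:
            reasons.append("fuzzy_number")          # pattern 2
        if days_apart <= window_days:
            reasons.append("amount_window")         # pattern 3
    same_group = (PARENT_GROUP.get(new.vendor, new.vendor)
                  == PARENT_GROUP.get(paid.vendor, paid.vendor))
    if not same_entity and same_group and same_amount:
        reasons.append("vendor_hierarchy")          # pattern 4
    if credit_memo_date is not None and same_entity and same_amount:
        if 0 <= (new.invoice_date - credit_memo_date).days <= window_days:
            reasons.append("credit_memo_pair")      # pattern 5
    return reasons

paid = Invoice("Acme USA", "12345", 1_425_000, date(2024, 3, 1))
resubmitted = Invoice("Acme USA", "INV-0012345", 1_425_000, date(2024, 4, 15))
print(duplicate_reasons(resubmitted, paid))  # ['fuzzy_number', 'amount_window']
```

Returning every reason that fires, rather than a single yes or no, gives the reviewer the explanation for the flag alongside the matched pair. In a real system the thresholds would need tuning, since a wider window or looser edit distance trades more catches for more false positives.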
Retroactive recovery: scanning paid invoices for missed duplicates
The four-step retroactive duplicate-payment recovery scan
Recovery rates vary by vendor relationship, time elapsed, and documentation quality. Specialized duplicate-payment recovery firms charge 25 to 40 percent of recovered funds; running the scan in-house against a documented model preserves the full recovery and builds the prevention capability at the same time.
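As a rough illustration of what an in-house scan could look like, the sketch below groups paid invoices by vendor and amount and flags consecutive payments that fall within a date window. It is a simplified stand-in and not the four-step procedure itself. The tuple layout, the 90-day window, and the function names are assumptions, and the fee range is the 25 to 40 percent quoted above.

```python
from collections import defaultdict
from datetime import date

def scan_paid_history(payments, window_days=90):
    """Flag candidate duplicate pairs in paid history.

    payments: iterable of (payment_id, vendor, amount_cents, paid_on) tuples.
    Returns a list of (earlier_id, later_id, amount_cents) candidates.
    """
    buckets = defaultdict(list)
    for p in payments:
        buckets[(p[1], p[2])].append(p)      # same vendor, same exact amount
    candidates = []
    for group in buckets.values():
        group.sort(key=lambda p: p[3])       # oldest payment first
        for earlier, later in zip(group, group[1:]):
            if (later[3] - earlier[3]).days <= window_days:
                candidates.append((earlier[0], later[0], later[2]))
    return candidates

def firm_fee_range(recovered_cents):
    """Fee a recovery firm would keep at 25% and 40% of recovered funds."""
    return recovered_cents * 25 // 100, recovered_cents * 40 // 100

history = [
    ("P1", "Acme", 1_425_000, date(2024, 3, 1)),
    ("P2", "Acme", 1_425_000, date(2024, 4, 15)),
    ("P3", "Acme", 1_425_000, date(2024, 12, 1)),
    ("P4", "Globex", 500_000, date(2024, 3, 2)),
]
print(scan_paid_history(history))  # [('P1', 'P2', 1425000)]
print(firm_fee_range(1_425_000))   # (356250, 570000)
```

The output is a list of candidates, not confirmed duplicates: legitimate recurring charges such as rent or subscriptions will match on vendor and amount, so each pair still needs analyst review before any vendor outreach.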
What good looks like
A clean duplicate detection workflow runs at every invoice intake against the full history of paid invoices and the queue of in-flight invoices. Detection uses multi-field matching: fuzzy logic on the invoice number, exact match on vendor entity and amount, configurable date windows, and vendor-hierarchy resolution. Flagged duplicates surface to the AP analyst with the matched pair displayed side by side and the reason highlighted. Confirmed duplicates block payment automatically; false positives close out with a documented reason that feeds back into the model. The leakage rate is tracked monthly as a KPI alongside touchless rate and days-to-pay.
At Cadel, duplicate detection runs as part of the validation layer. The multi-field model checks every new invoice against the full payment history on vendor entity, amount, date, and invoice-number combinations, with fuzzy matching and credit-memo pairing built in. Vendor entity hierarchy is maintained in the vendor master, and retroactive scanning runs on demand against historical data, surfacing recovery opportunities the CFO can act on directly.
See how Cadel runs AP automation with duplicate detection in the validation layer, or get in touch to run a multi-field scan against a year of your paid AP history and size the recovery.