Invoice OCR Software: Why 85% Accuracy Breaks AP

Invoice OCR software hits 85-95% accuracy, but the last 15% breaks AP. Where OCR fails, the four-layer intake workflow, and how to pick the best tool.

Cadel Team, 7 min read

Every accounts payable workflow starts with intake. A vendor sends an invoice. Someone or something has to read the file, identify the relevant fields, and post the data into the AP module. Invoice OCR software made that step faster. The marketing claim is touchless processing. The practitioner reality is that the last 15% of accuracy is what determines whether the AP team is doing review work or correction work.

Invoice OCR software is the data-extraction step inside automated invoice processing: the OCR engine reads the file, the validation layer checks it, and the exception workflow handles what fails. Those three layers are what separate AP automation that scales from AP automation that creates a different kind of backlog. This is the intake layer behind the numbers in our AP automation benchmarks, and the first control gap covered in AP automation best practices for mid-market.

What invoice OCR actually extracts

The standard extraction set, why accuracy lands where it does, and how Cadel closes the gap

| Extracted field | Typical accuracy | Why accuracy isn’t 100% | How Cadel solves it |
| --- | --- | --- | --- |
| Vendor name and address | 95%+ | Subsidiary names, DBAs, and aliases don't match the vendor master, so the exception sits at the matching layer. | Fuzzy matching with alias dictionaries; repeat mismatches surface as suggested aliases. |
| Invoice number | 95%+ | Inconsistent schemes trigger false-positive duplicates; credit memos get blocked as re-invoices. | Duplicate detection on a multi-field hash, with credit memos classified as their own type. |
| Invoice date and due date | 90-95% | Format ambiguity (10/05 = Oct 5 or May 10) and multi-date invoices confuse field detection. | Locale-aware parsing with role classification (issue, service, due) against terms history. |
| Line items (description, qty, unit price) | 75-85% | Multi-page tables, merged cells, and freight lines break layout analysis and lose column context. | Vendor-specific layout learning plus multi-page column-anchor logic. |
| Subtotal, tax, and total | 90-95% | European formats (1.234,56), multi-currency, and conditional VAT drive wrong totals. | Locale-aware number parsing, currency detection with FX conversion, and tax classification. |
| Payment terms and PO reference | 75-85% | PO numbers sit in non-standard positions and prose terms ('Net 30, 2/10') resist parsing. | PO matching at intake, prose-term parsing, and a separate workflow for PO-less invoices. |
| Hand-written annotations | 30-50% | OCR is trained on printed text; handwriting varies too much to extract reliably. | Detect handwriting and flag for review with the region highlighted; original preserved. |
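
The date-format ambiguity in the table (10/05 as Oct 5 or May 10) is easy to make concrete. The following is a minimal Python sketch, not Cadel's implementation: `candidate_dates` is a hypothetical helper that lists every valid calendar date a two-part numeric string could mean, so a validation layer can flag the field when more than one reading survives.

```python
from datetime import date


def candidate_dates(text: str, year: int) -> set[date]:
    """Return every valid date that an 'NN/NN' string could represent."""
    first, second = (int(part) for part in text.split("/"))
    candidates = set()
    # Try month/day order, then day/month order.
    for month, day in ((first, second), (second, first)):
        try:
            candidates.add(date(year, month, day))
        except ValueError:
            # Not a real calendar date in this order (e.g. month 25).
            pass
    return candidates


def is_ambiguous_date(text: str, year: int) -> bool:
    """True when both orderings give different, valid dates."""
    return len(candidate_dates(text, year)) > 1
```

Under this sketch, `10/05` is flagged because both readings are valid, while `25/12` is not, since only one ordering is a real date. A production system would presumably resolve the flagged cases with the vendor's locale and terms history, as the table describes.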

The averages quoted in vendor pitches typically come from the high-accuracy fields. The fields that drive the close (line items, tax, and PO references) are the ones where the workflow stalls.
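
To illustrate the "multi-field hash" idea from the invoice-number row, here is a minimal Python sketch. The field set, the normalization rules, and the `ExtractedInvoice` type are assumptions made for illustration; the article does not specify which fields Cadel hashes.

```python
import hashlib
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ExtractedInvoice:
    # Hypothetical shape of an extracted document.
    vendor_id: str
    invoice_number: str
    doc_type: str  # e.g. "invoice" or "credit_memo"
    total: Decimal
    currency: str


def duplicate_key(inv: ExtractedInvoice) -> str:
    """Hash several normalized fields so no single noisy field decides alone."""
    # Drop separators and case so 'INV-0042' and 'inv 0042' compare equal.
    number = "".join(ch for ch in inv.invoice_number.upper() if ch.isalnum())
    parts = [
        inv.vendor_id.strip().upper(),
        number,
        inv.doc_type,  # keeps a credit memo distinct from the invoice it reverses
        f"{inv.total:.2f}",
        inv.currency.upper(),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

Two extractions of the same invoice that differ only in punctuation or case produce the same key, while a credit memo that reuses the invoice number produces a different one, which is one way to avoid blocking it as a re-invoice.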

Where invoice OCR software breaks

These four failure modes account for most of the exception volume in invoice data extraction, and they are where generic invoice scanning software and open-source OCR libraries fall down.

Four failure modes that account for most exception volume

01. Multi-line item tables
Tables with merged cells, sub-totals, freight lines, or continuation across pages confuse line-item extraction. The engine may merge lines, miss a continuation, or extract the wrong total. Common on construction, manufacturing, and logistics invoices.
Fix: Vendor-specific layout learning plus multi-page column-anchor logic.

02. Foreign currency and number formats
A USD invoice from an EU vendor can confuse the engine. In the European convention (1.234,56), the period groups thousands and the comma marks the decimal, so an engine that assumes US formatting misreads the amount. Multi-currency invoices need a currency-detection layer that most generic OCR engines lack.
Fix: Locale-aware number parsing with currency detection and FX conversion.

03. Hand-written annotations
Vendor notes, manual discount overrides, and hand-stamped 'PAID' or 'DRAFT' markings get missed. The engine reads the printed text but loses the handwritten context, so the invoice posts at the wrong amount or status.
Fix: Detect handwriting and route to review with the region flagged.

04. Vendor-specific layouts
Each vendor has its own template. Procurement-grade vendors send clean digital files; smaller vendors send hand-typed invoices, photographed receipts, or scanned faxes. Generic OCR struggles unless trained on each format.
Fix: Per-vendor template learning that improves with every invoice.
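
The number-format failure mode can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's parser: `parse_amount` and `is_ambiguous_amount` are hypothetical helpers, and the decimal separator is assumed to be known from the vendor's locale.

```python
import re
from decimal import Decimal

# One separator followed by exactly three digits: '1.234' or '1,234'.
_AMBIGUOUS = re.compile(r"^\d{1,3}[.,]\d{3}$")


def parse_amount(text: str, decimal_sep: str) -> Decimal:
    """Parse an amount string given the locale's decimal separator."""
    if decimal_sep not in (".", ","):
        raise ValueError("decimal_sep must be '.' or ','")
    group_sep = "," if decimal_sep == "." else "."
    # Remove spaces and grouping separators, then normalize the decimal mark.
    cleaned = text.strip().replace(" ", "").replace(group_sep, "")
    return Decimal(cleaned.replace(decimal_sep, "."))


def is_ambiguous_amount(text: str) -> bool:
    """True when the string reads as a valid amount under both conventions."""
    return bool(_AMBIGUOUS.match(text.strip()))
```

With the European separator, `parse_amount("1.234,56", ",")` yields `Decimal("1234.56")`. Parsing the same string with US assumptions, `parse_amount("1.234,56", ".")`, yields `Decimal("1.23456")`, which is the silent misread described above. Strings such as `1.234` cannot be resolved from the text alone, so the sketch flags them instead of guessing.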

The 85/15 problem at scale

| Invoice volume / month | Touchless (85%) | Manual exceptions (15%) |
| --- | --- | --- |
| 100 invoices | 85 clear | 15 to correct |
| 1,000 invoices | 850 clear | 150 to correct |
| 10,000 invoices | 8,500 clear | 1,500 to correct |
| 50,000 invoices | 42,500 clear | 7,500 to correct |

At small volumes, a 15% exception rate is manageable on a spreadsheet. At enterprise scale, it requires a dedicated exception-handling team that can cost more than the OCR saves. The number that matters is not the extraction rate on the easy 85%; it is the cost per exception on the remaining 15%.
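
The arithmetic behind the 85/15 split can be written down directly. The Python sketch below is illustrative only: the function names are hypothetical, and `saving_per_touchless` and `cost_per_exception` are placeholders for figures you would measure in your own AP team, not numbers from this article.

```python
def exception_load(volume: int, touchless_rate: float = 0.85) -> tuple[int, int]:
    """Split monthly volume into (touchless, manual exception) counts."""
    clear = round(volume * touchless_rate)
    return clear, volume - clear


def net_monthly_saving(
    volume: int,
    touchless_rate: float,
    saving_per_touchless: float,  # assumed: saving on each touchless invoice
    cost_per_exception: float,    # assumed: fully loaded cost to fix one exception
) -> float:
    """Savings on touchless invoices minus the cost of handling exceptions."""
    clear, exceptions = exception_load(volume, touchless_rate)
    return clear * saving_per_touchless - exceptions * cost_per_exception


def breakeven_cost_per_exception(
    touchless_rate: float, saving_per_touchless: float
) -> float:
    """Exception cost at which the touchless savings are fully consumed."""
    return touchless_rate / (1 - touchless_rate) * saving_per_touchless
```

Under this simple model, at an 85% touchless rate the savings are wiped out once a single exception costs roughly 5.7 times what a touchless invoice saves. That ratio is why the cost per exception, rather than the headline extraction rate, is the figure to measure.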

The complete AP intake workflow

OCR is one of four layers in automated invoice processing

Layer 1: Capture
Email, scan, EDI feed, vendor portal upload. Every channel funnels into the same intake queue.

Layer 2: OCR + classification
Extract structured fields. Classify document type (invoice, credit memo, statement).

Layer 3: Validation
Match vendor master, PO, contract, and prior-invoice patterns. Flag mismatches at the field level.

Layer 4: Exception routing
Failed validations go to the right owner with the right context, not to a generic queue.

OCR is layer 2 of four. Without the validation layer, an extracted invoice with the wrong total still posts cleanly. Without exception routing, a failed validation sits for days until someone reviews the queue manually. Vendors that pitch OCR as AP automation are pitching one layer of four.
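
A minimal Python sketch of layers 3 and 4 shows why an invoice with the wrong total should not post cleanly. Everything here is hypothetical: the `Invoice` shape, the in-memory `VENDOR_MASTER` and `OPEN_POS` sets (a real system would query the ERP), and the `OWNERS` mapping are assumptions for illustration, not Cadel's data model.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical reference data standing in for ERP lookups.
VENDOR_MASTER = {"V-100", "V-200"}
OPEN_POS = {"PO-7001"}

# Hypothetical owner per failing field, so nothing lands in a generic queue.
OWNERS = {
    "vendor_id": "vendor-master team",
    "total": "AP analyst",
    "po_number": "purchasing",
}


@dataclass
class Invoice:
    vendor_id: str
    subtotal: Decimal
    tax: Decimal
    total: Decimal
    po_number: Optional[str] = None


def validate(inv: Invoice) -> dict[str, str]:
    """Layer 3: return {field: reason} for every field-level mismatch."""
    failures = {}
    if inv.vendor_id not in VENDOR_MASTER:
        failures["vendor_id"] = "not in vendor master"
    if inv.subtotal + inv.tax != inv.total:
        failures["total"] = "subtotal + tax does not equal total"
    if inv.po_number is not None and inv.po_number not in OPEN_POS:
        failures["po_number"] = "no matching open PO"
    return failures


def route(inv: Invoice) -> list[tuple[str, str, str]]:
    """Layer 4: one (owner, field, reason) per failure; empty means post."""
    return [(OWNERS[name], name, reason) for name, reason in validate(inv).items()]
```

An invoice that passes every check returns an empty list and can post touchless. One with a misread total comes back as a single item addressed to a named owner, with the failing field and the reason attached.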

How to choose the best OCR software for invoice processing

When comparing OCR software for invoice processing, look past the headline accuracy number. The selection criteria that matter for a working AP automation workflow are:

  • Vendor-specific (AI) format learning so the engine improves on a vendor's invoices over time rather than staying flat
  • Multi-currency support, including European number formats and FX conversion
  • Robust line-item OCR on multi-page invoices with continuation tables
  • A native ERP integration or OCR API so extracted data posts directly to the AP module
  • An exception routing workflow with named owners and SLA timers
  • An audit trail that preserves the original document alongside the extracted fields for the auditor
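
One way to look past the headline number during an evaluation is to score a labeled sample of your own invoices field by field. The Python sketch below assumes you have each tool's extracted fields and a hand-verified ground truth for the same documents; the function names and the dictionary layout are illustrative assumptions.

```python
from collections import defaultdict


def per_field_accuracy(samples: list[tuple[dict, dict]]) -> dict[str, float]:
    """samples holds (extracted, truth) pairs; returns accuracy per field."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for extracted, truth in samples:
        for name, expected in truth.items():
            totals[name] += 1
            if extracted.get(name) == expected:
                hits[name] += 1
    return {name: hits[name] / totals[name] for name in totals}


def touchless_rate(samples: list[tuple[dict, dict]]) -> float:
    """Share of documents where every labeled field was extracted correctly."""
    clean = sum(
        1
        for extracted, truth in samples
        if all(extracted.get(name) == value for name, value in truth.items())
    )
    return clean / len(samples)
```

Reporting both numbers side by side makes the gap visible: a tool can score well on vendor name and invoice number yet still have a low document-level touchless rate if PO references or line items are frequently wrong.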

At Cadel, invoice OCR is the second of four layers in a complete AP intake workflow. Vendor-specific format learning runs continuously, multi-currency and line-item extraction are first-class features, and failed validations route to the right owner automatically with full context. The audit trail is preserved end to end, so the 85/15 accuracy split becomes a managed queue rather than a growing backlog.

See how Cadel runs AP automation end to end, with invoice OCR as one layer of four, or get in touch to run extraction across 100 of your most varied vendor invoices and see where the exceptions land.
