Simple Invoice Processing
AP invoice OCR + GSTIN verification + tax-math checks — invoice processing automation in under 30 seconds per document.
| # | Description | Qty | Rate (₹) | Amount (₹) |
|---|---|---|---|---|
| 1 | Monthly Office Space Rent | 1 | 1,80,000 | 1,80,000 |
| 2 | Common Area Maintenance | 1 | 25,000 | 25,000 |
| 3 | Electricity Reimbursement | 1 | 40,000 | 40,000 |
The Problem
Manual invoice processing breaks at scale. A controller at a mid-market company processes invoices from three to five distinct vendor categories every cycle, and each one carries its own failure mode that pure-OCR or template-based tools miss.
Three to five vendor categories per cycle
Domestic GST-registered suppliers with multi-rate line items, inter-state vendors needing IGST vs CGST+SGST routing, and international vendors whose EUR or USD invoices carry no GSTIN and no Indian tax fields at all.
GSTIN state-code cross-check, every invoice
The AP clerk reads the vendor GSTIN, extracts the state code (first 2 digits), and compares it against the declared place of supply to determine the correct tax type, invoice by invoice, manually.
Tax math on every line
Verify line_amount × tax_rate ÷ 100 per line, then aggregate. A single Acme Corp bill with five line items at four rates (18%, 12%, 5%, 0%) means five multiplications plus a summation, and every step is a chance for a keying error that lands in GSTR-2B two months later.
International invoices break the ruleset
A EUR-denominated invoice from a US vendor to a German buyer carries no GSTIN, no place of supply, and no tax-rate fields. A GST-only validation either crashes on null fields or surfaces false-positive errors an AP clerk must clear one by one.
The scale-break point for most mid-market finance teams: beyond ~100 invoices per month per AP resource, the error rate on manual invoice data extraction, math verification and GSTIN state-code checks rises faster than headcount can compensate, and the downstream cost (ITC mismatches, duplicate payments, vendor disputes) becomes material.
Why It Matters: Regulatory Context
Every tax invoice from a registered Indian supplier sits inside four overlapping CGST / IGST rules. Booking the wrong tax type or missing a required field creates a reconciliation mismatch in GSTR-2B that the AP team must fix before filing GSTR-3B — or face a demand notice.
Required invoice fields
Every tax invoice must carry supplier GSTIN, recipient GSTIN, place of supply, HSN/SAC code, and a separate tax amount per rate slab. Missing any of these fields disqualifies the recipient from claiming Input Tax Credit on that invoice.
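A field-presence check along these lines can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's implementation: the field names follow the extraction schema described further down, the function name is hypothetical, and checking tax_amount on every line is an assumed stand-in for the per-rate-slab requirement.

```python
# Hypothetical mandatory-field check for a domestic tax invoice.
# The field list mirrors the requirement above; names follow this workflow's schema.
REQUIRED_HEADER_FIELDS = ("vendor_gstin", "customer_gstin", "place_of_supply")


def missing_required_fields(header: dict, lines: list) -> list:
    """Return the names of mandatory fields that are absent or empty."""
    missing = [name for name in REQUIRED_HEADER_FIELDS if not header.get(name)]
    for i, line in enumerate(lines):
        # HSN/SAC code is expected on every line item.
        if not line.get("item_code"):
            missing.append(f"lines[{i}].item_code")
        # A tax amount of 0 (zero-rated line) is valid; only a null counts as missing.
        if line.get("tax_amount") is None:
            missing.append(f"lines[{i}].tax_amount")
    return missing
```

Treating a zero tax_amount as present matters here: a zero-rated line is a legitimate entry, not a missing field.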
Inter-state vs intra-state tax routing
If supplier state code matches place of supply state code → intra-state → CGST + SGST. If they differ → inter-state → IGST. Booking the wrong type triggers a GSTR-2B reconciliation mismatch the AP team must correct before filing GSTR-3B.
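The routing rule reduces to a single comparison of two-digit state codes. A minimal sketch, where the function name and the returned labels are illustrative:

```python
def tax_type(supplier_state_code: str, place_of_supply_code: str) -> str:
    """Return the tax heads that apply, given two 2-digit GST state codes."""
    if supplier_state_code == place_of_supply_code:
        return "CGST+SGST"  # intra-state supply
    return "IGST"  # inter-state supply
```

For example, a Maharashtra supplier (state code 27, the first two characters of its GSTIN) supplying to a place of supply in Karnataka (29) routes to IGST, while the same supplier billing within Maharashtra routes to CGST+SGST.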
ITC capped at supplier’s filed return
Input Tax Credit can only be claimed up to the amount the supplier has reflected in their GSTR-1. An over-claimed or mis-booked ITC must be reversed — often with interest under Section 50.
GSTR-2B discrepancy resolution
Any discrepancy between the buyer’s claimed credit and the supplier’s filed GSTR-1, as auto-populated in GSTR-2B, must be addressed within the same return period; otherwise it surfaces as a demand notice under Section 73 or 74.
For controllers managing 150 vendor invoices a month across domestic and international suppliers — PDFs with embedded text, scanned images with embossed seals, German Rechnung layouts under the EU VAT Directive 2006/112/EC — reconciling formats manually means the data that enters the ERP is only as accurate as the clerk’s reading of the source document.
What This Workflow Automates
Eight deterministic steps that turn AP invoice OCR into structured, validated, ERP-ready data. The full workflow runs in under 30 seconds per document and produces a structured JSON output with a discrete validation_results array that every AP controller can audit line by line: the same invoice processing software pattern mid-market finance teams use to feed a downstream three-way match.
Document ingestion & format detection
Accepts PDF invoices — digitally generated or scanned — and identifies the document type as Invoice, routing it to the right extraction schema whether the source is a GST-registered domestic supplier or a foreign vendor with no Indian tax fields.
Structured header extraction
For each invoice, extracts invoice_number, invoice_date, vendor_name, vendor_gstin, vendor_address, customer_name, customer_gstin, customer_address, place_of_supply, currency, subtotal, tax_amount and total_amount.
Line item extraction
For every line on the invoice, line item extraction captures item_code (HSN / SAC where present), description, quantity, unit_price, line_amount, tax_rate and tax_amount as a structured array — preserving mixed-rate line items as distinct records.
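The header and line-item fields listed in the two steps above can be pictured as a typed record. This is an illustrative model only: the field names come from this page, while the Python types (Decimal for money, string dates) and the class names are assumptions. Every field defaults to None so that an unextractable value stays null instead of silently becoming zero.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional


@dataclass
class LineItem:
    item_code: Optional[str] = None        # HSN / SAC where present
    description: Optional[str] = None
    quantity: Optional[Decimal] = None
    unit_price: Optional[Decimal] = None
    line_amount: Optional[Decimal] = None
    tax_rate: Optional[Decimal] = None     # percent, e.g. Decimal("18")
    tax_amount: Optional[Decimal] = None


@dataclass
class ExtractedInvoice:
    invoice_number: Optional[str] = None
    invoice_date: Optional[str] = None     # kept as printed; parsing is a later step
    vendor_name: Optional[str] = None
    vendor_gstin: Optional[str] = None
    vendor_address: Optional[str] = None
    customer_name: Optional[str] = None
    customer_gstin: Optional[str] = None
    customer_address: Optional[str] = None
    place_of_supply: Optional[str] = None
    currency: Optional[str] = None
    subtotal: Optional[Decimal] = None
    tax_amount: Optional[Decimal] = None
    total_amount: Optional[Decimal] = None
    # Mixed-rate lines stay as distinct records.
    line_items: list[LineItem] = field(default_factory=list)
```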
Line-level tax math check
For each line with a non-null tax_rate and line_amount, computes the expected tax as line_amount × tax_rate ÷ 100 and compares it to the extracted tax_amount. Any discrepancy raises a line-level validation exception.
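A minimal sketch of the line-level tax math check, assuming amounts arrive as Decimal and that expected tax is rounded half-up to two decimal places (paise) before comparison. The rounding convention and the shape of each exception record are assumptions, not documented behaviour.

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")


def check_line_tax(lines: list[dict]) -> list[dict]:
    """Compare each extracted tax_amount with line_amount x tax_rate / 100."""
    exceptions = []
    for i, line in enumerate(lines):
        amount, rate = line.get("line_amount"), line.get("tax_rate")
        extracted = line.get("tax_amount")
        if amount is None or rate is None:
            continue  # the check only runs when both inputs are non-null
        expected = (amount * rate / 100).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)
        # A null extracted tax against a computable expected tax is also a discrepancy.
        if extracted is None or expected != extracted:
            exceptions.append({"line": i, "rule": "line_tax_math",
                               "expected": expected, "extracted": extracted})
    return exceptions
```

Run against the three rows of the sample table at an assumed 18% rate, with the third row's tax deliberately mis-keyed as 7,300, only that row is flagged, with an expected value of 7,200.00. A zero-rated line (15,000 at 0%) passes, because an expected tax of 0.00 equals an extracted tax of 0.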
Subtotal vs line-sum check
Sums all extracted line_amount values and compares the result to the header-level subtotal field. Flags the invoice if the difference is non-zero — catching the common OCR or vendor-keying error where the printed subtotal silently differs from the line math.
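The subtotal check is a Decimal sum and an exact comparison. In this sketch the result-record layout is hypothetical. Skipping when any input is null follows the null-handling behaviour described in the FAQ below rather than anything stated for this step.

```python
from decimal import Decimal


def check_subtotal(header: dict, lines: list[dict]) -> list[dict]:
    """Flag the invoice when sum(line_amount) differs from the header subtotal."""
    subtotal = header.get("subtotal")
    amounts = [line.get("line_amount") for line in lines]
    if subtotal is None or any(a is None for a in amounts):
        return [{"rule": "subtotal_vs_line_sum", "status": "skipped", "reason": "null input"}]
    difference = subtotal - sum(amounts, Decimal("0"))
    if difference != 0:
        return [{"rule": "subtotal_vs_line_sum", "status": "failed", "difference": difference}]
    return []
```

With the sample table's lines (1,80,000 + 25,000 + 40,000), a printed subtotal of 2,45,000 passes, while a printed 2,50,000 is flagged with a difference of 5,000.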
GSTIN verification & state-code routing
Where both vendor_gstin and customer_gstin are present, GSTIN verification extracts the supplier state code from the first two digits of the vendor GSTIN and compares it to the state code in the declared place_of_supply — surfacing any inter-state vs intra-state mismatch before tax-type booking.
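One way to express this step in Python is sketched below. Two details are assumptions: that place_of_supply begins with its two-digit state code (for example "29-Karnataka"), and that a hypothetical charged_tax_type value records which tax heads the invoice actually charges.

```python
import re
from typing import Optional

STATE_CODE = re.compile(r"\s*(\d{2})")  # leading two-digit state code


def verify_state_routing(vendor_gstin: Optional[str], customer_gstin: Optional[str],
                         place_of_supply: Optional[str],
                         charged_tax_type: str) -> Optional[dict]:
    """Return a validation exception if the charged tax type contradicts the state codes."""
    if not (vendor_gstin and customer_gstin and place_of_supply):
        return None  # the rule only applies when both GSTINs and a place of supply exist
    match = STATE_CODE.match(place_of_supply)
    if match is None:
        return {"rule": "gstin_state_routing", "status": "failed",
                "reason": "unparseable place_of_supply"}
    supplier_state = vendor_gstin[:2]  # first two GSTIN characters are the state code
    expected = "CGST+SGST" if supplier_state == match.group(1) else "IGST"
    if charged_tax_type != expected:
        return {"rule": "gstin_state_routing", "status": "failed",
                "expected": expected, "charged": charged_tax_type}
    return None
```

For example, a state-27 supplier billing CGST+SGST against a place of supply of "29-Karnataka" is flagged with IGST as the expected routing.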
International invoice handling
Where both GSTINs are null — as with EUR or USD foreign-vendor invoices — bypasses every GST-specific validation and extracts only the fields that exist (currency, line items, subtotal, total) without raising false-positive errors on absent tax fields.
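The bypass amounts to choosing a rule set before any validation runs. In this sketch the rule identifiers are hypothetical, because the workflow's internal names are not documented on this page.

```python
# Hypothetical rule identifiers.
COMMON_RULES = ("subtotal_vs_line_sum",)
GST_RULES = ("required_gst_fields", "line_tax_math", "gstin_state_routing", "hsn_sac_present")


def rules_for(header: dict) -> tuple:
    """Choose validations; both GSTINs null marks a foreign-vendor invoice."""
    if not header.get("vendor_gstin") and not header.get("customer_gstin"):
        return COMMON_RULES  # no GST-specific checks, so no false positives on absent fields
    return COMMON_RULES + GST_RULES
```

Selecting rules up front, instead of letting each GST rule fail on null inputs, is what keeps a EUR invoice out of the exception queue.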
Unreadable document flagging
Where OCR yields no extractable structured fields (e.g., a dense, seal-stamped scanned invoice), returns an empty extracted_fields object and records the file as requiring manual review — preventing a silent null-record from passing downstream to the ERP.
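A sketch of the manual-review routing, assuming an extraction counts as empty when every value is null or blank. The status strings are hypothetical.

```python
def has_content(value) -> bool:
    """True when an extracted value carries usable data."""
    return value not in (None, "", [], {})


def route_document(file_name: str, extracted_fields: dict) -> dict:
    """Send files with no usable extraction to manual review instead of the ERP."""
    if not any(has_content(v) for v in extracted_fields.values()):
        return {"file": file_name, "extracted_fields": {}, "status": "manual_review",
                "reason": "no extractable structured fields"}
    return {"file": file_name, "extracted_fields": extracted_fields,
            "status": "ready_for_validation"}
```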
Edge Cases We Simulate
A battery of synthetic test scenarios that exercise every failure mode we have seen in real-world invoice data. Each scenario produces a deterministic outcome an auditor or controller can verify in seconds.
Inter-State Supply Mismatch
Null Tax Fields — International Invoice
Subtotal vs Line-Item Sum
Sums all line_amount values and compares the result to the extracted subtotal, raising a validation exception when the difference is non-zero.
Unreadable / Scanned Invoice
Returns an empty extracted_fields object and routes the file for manual review, preventing a silent null-record from passing to the ERP.
Duplicate Invoice Detection
Flags an invoice_number as a potential duplicate when it matches a previously processed record for the same vendor_gstin or vendor_name; clearing the flag requires explicit approver sign-off.
Sample Files & Results
Four seeded invoices — each one engineered to exercise a different failure mode. Three extract cleanly. One is deliberately unreadable, to prove the workflow surfaces it for manual review instead of posting bad data to the ERP.
Acme Corp → Acme Corp
Place-of-supply (state 27) vs customer state (29) correctly routed as inter-state IGST, not CGST+SGST. Per-line tax math (₹2,45,000 × 18% = ₹44,100) validated against header total.
Acme Corp → Schneider GmbH
Vendor address, customer address, subscription period and total extracted from a bilingual DE/EN document. The null-GSTIN bypass skips intra-state/inter-state and HSN validations instead of generating exceptions an AP clerk would have to clear.
Acme Corp → Acme Corp
Per-line tax math verified at 4 different rates. The zero-rated export documentation line (SAC 998399, ₹15,000 @ 0%) extracts cleanly with tax_rate=0 — not flagged as a missing-field error.
Dense seal-stamped scan
The workflow returns an empty extraction and routes the file for review — preventing the silent zero-value posting (or duplicate payment if the vendor resubmits the same invoice) that would happen if a partial extraction reached the ERP unchecked.
Why Automation Wins Here
For a mid-market AP team processing 100 domestic and international invoices per month, this AP automation software replaces an estimated 12–15 hours of manual field extraction, GSTIN verification and tax-math checking with a process that runs in under 30 seconds per document.
Math errors caught upstream
Computing line_amount × tax_rate ÷ 100 per line catches arithmetic discrepancies invisible to a clerk reading the printed total — reducing ITC booking errors that would require reversal under Rule 37A of the CGST Rules.
Quiet exception queue
Null-GSTIN detection eliminates the false-positive exceptions that an AP clerk would otherwise have to clear on every international invoice — keeping the queue limited to genuine anomalies, not format-driven noise.
Audit-ready, every invoice
Every processed invoice produces a structured JSON artifact (extracted fields + validation_results array) that can be attached directly to the AP voucher. This gives a more reproducible audit-evidence trail, in the sense of ICAI SA 500 (Audit Evidence), than a manually annotated printout. The output also feeds three-way matching and downstream GSTIN validation flows without further transformation.
Frequently Asked Questions
The questions accountants and finance controllers ask most often before deploying invoice automation.
The workflow checks GSTIN format compliance per the alphanumeric structure mandated under Rule 10 of the CGST Rules, 2017, and verifies that the tax arithmetic on each line is consistent with the rates specified under Schedule I–IV of the CGST Act, 2017. It also extracts the place of supply field to help determine whether IGST (Section 5 of the IGST Act) or CGST+SGST (Section 9 of the CGST Act) should apply — a distinction that directly affects Section 16(2)(aa) input tax credit eligibility.
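As an illustration of a GSTIN format check, the sketch below pairs the commonly documented 15-character layout (two-digit state code, ten-character PAN, entity number, the letter Z, and a check character) with the widely published mod-36 check-character scheme. This page does not say whether the workflow validates the check character, so treat that part as an assumption.

```python
import re

# 2-digit state code + PAN (5 letters, 4 digits, 1 letter) + entity number + "Z" + check char
GSTIN_PATTERN = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]")
CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def gstin_check_char(first14: str) -> str:
    """Compute the mod-36 check character over the first 14 characters."""
    total = 0
    for i, ch in enumerate(first14):
        product = CHARSET.index(ch) * (2 if i % 2 else 1)  # weights alternate 1, 2, 1, 2, ...
        total += product // 36 + product % 36
    return CHARSET[(36 - total % 36) % 36]


def is_valid_gstin(gstin: str) -> bool:
    """Structural match plus check-character verification."""
    gstin = gstin.strip().upper()
    return bool(GSTIN_PATTERN.fullmatch(gstin)) and gstin_check_char(gstin[:14]) == gstin[14]
```

The regex runs first, so gstin_check_char only ever sees characters from CHARSET and a string of exactly 15 characters.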
Yes. When neither vendor nor customer GSTIN is present, the workflow automatically bypasses GST validations and extracts the core commercial fields — currency, line items, subtotal, and total — without raising false errors. The demo data includes a EUR-denominated invoice between a US vendor and a German buyer, confirming that currency codes other than INR are captured as-is. Multi-currency conversion to a functional currency must be handled downstream in the ERP per the applicable standard (IAS 21 or ASC 830).
Cadel outputs a structured JSON object for each invoice containing all extracted and validated fields, which can be mapped to the chart of accounts and vendor master of any ERP through a standard API or CSV export. No custom ERP connector is required at the extraction stage; the validated payload is designed to slot into the AP entry screen of TallyPrime, NetSuite Bill, or SAP FB60 with field-level mapping configured once at onboarding.
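For the CSV route, a minimal flattening sketch using Python's standard csv module is shown below. The column selection and the one-row-per-line-item layout are assumptions chosen for illustration. The actual ERP field mapping is configured at onboarding and is not reproduced here.

```python
import csv
import io

HEADER_COLS = ["invoice_number", "invoice_date", "vendor_gstin", "currency"]
LINE_COLS = ["item_code", "description", "quantity", "unit_price",
             "line_amount", "tax_rate", "tax_amount"]


def invoice_to_csv(invoice: dict) -> str:
    """Flatten one invoice payload into CSV text, one row per line item."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=HEADER_COLS + LINE_COLS)
    writer.writeheader()
    for line in invoice.get("line_items", []):
        row = {col: invoice.get(col) for col in HEADER_COLS}  # repeat header fields per row
        row.update({col: line.get(col) for col in LINE_COLS})
        writer.writerow(row)  # csv writes None as an empty field, so nulls stay blank
    return buf.getvalue()
```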
Every extraction run stores the original document, the raw OCR output, the structured field extraction, and the full list of validation results — including any exceptions raised — as an immutable, timestamped record. This supports the documentation requirements under ICAI SA 230 (Audit Documentation) for external auditors and gives internal audit teams a line-by-line evidence chain from source PDF to ERP posting without relying on email threads or manual logs.
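A minimal sketch of such a record is shown below, assuming a SHA-256 digest of the source file is stored alongside the other artifacts so a reviewer can confirm the archived original is unchanged. The frozen dataclass only prevents in-process mutation; durable immutability would depend on the storage layer (for example write-once storage), which this sketch does not model. All names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    source_sha256: str            # digest of the original document bytes
    raw_ocr_text: str
    extracted_fields_json: str
    validation_results_json: str
    recorded_at: str              # UTC timestamp, ISO 8601


def make_audit_record(source_bytes: bytes, raw_ocr_text: str,
                      extracted: dict, results: list) -> AuditRecord:
    """Bundle one extraction run into a read-only, timestamped record."""
    return AuditRecord(
        source_sha256=hashlib.sha256(source_bytes).hexdigest(),
        raw_ocr_text=raw_ocr_text,
        # sort_keys gives a stable serialization; default=str handles Decimal values
        extracted_fields_json=json.dumps(extracted, sort_keys=True, default=str),
        validation_results_json=json.dumps(results, sort_keys=True, default=str),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
```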
Fields that cannot be extracted are recorded as null rather than defaulting to zero, preventing silent arithmetic errors in downstream calculations. Validation rules that depend on a null field — for example, a unit-price-times-quantity check — are skipped with an explicit note in the validation results, so the reviewer knows exactly which fields require manual entry before the invoice is posted.
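The skip-with-note behaviour can be sketched for the quantity-times-unit-price check mentioned above. The rule name and result fields are illustrative assumptions.

```python
from decimal import Decimal


def check_qty_times_price(line: dict) -> dict:
    """Check quantity x unit_price == line_amount, or skip with an explicit note."""
    inputs = (("quantity", line.get("quantity")),
              ("unit_price", line.get("unit_price")),
              ("line_amount", line.get("line_amount")))
    missing = [name for name, value in inputs if value is None]
    if missing:
        # Never substitute zero for a null; tell the reviewer what needs manual entry.
        return {"rule": "qty_x_unit_price", "status": "skipped",
                "note": "null field(s): " + ", ".join(missing)}
    qty, price, amount = (value for _, value in inputs)
    return {"rule": "qty_x_unit_price",
            "status": "passed" if qty * price == amount else "failed"}
```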
The workflow compares the extracted invoice_number and vendor_gstin (or vendor_name for international vendors) against previously processed records in the same batch and against the vendor ledger. A match raises a duplicate-payment risk flag that must be explicitly cleared by an approver — consistent with the duplicate-payment control objectives described in COSO Internal Control — Integrated Framework and testable under ICAI SA 240 fraud-risk procedures.
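A sketch of the duplicate match follows. It assumes case- and whitespace-insensitive comparison of invoice numbers and vendor identifiers, which is an illustrative choice, and it compares against an in-memory list of processed records rather than a real vendor ledger.

```python
from typing import Optional


def _norm(value: Optional[str]) -> str:
    """Upper-case and collapse whitespace so 'inv-104 ' matches 'INV-104'."""
    return " ".join((value or "").upper().split())


def is_duplicate(new: dict, old: dict) -> bool:
    """Same invoice_number and same vendor_gstin, or same vendor_name as a fallback."""
    number = _norm(new.get("invoice_number"))
    if not number or number != _norm(old.get("invoice_number")):
        return False
    for attr in ("vendor_gstin", "vendor_name"):
        if _norm(new.get(attr)) and _norm(new.get(attr)) == _norm(old.get(attr)):
            return True
    return False


def duplicate_flags(invoice: dict, processed: list) -> list:
    """Raise a duplicate-payment risk flag that an approver must clear explicitly."""
    matches = [p for p in processed if is_duplicate(invoice, p)]
    if not matches:
        return []
    return [{"rule": "duplicate_invoice", "status": "needs_approver_clearance",
             "matches": len(matches)}]
```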
This workflow handles the invoice extraction and validation layer only — it is the upstream foundation for three-way match, not the match engine itself. The structured JSON it produces (with normalized line_amount, tax_rate, vendor_gstin and invoice_number fields) is exactly the shape Cadel's separate N-Way Reconciliation workflow consumes to compare invoice lines against PO and GRN lines, surface quantity, price and term variances, and flag any mismatches before payment is released. For teams whose three-way matching control is currently driven by spreadsheets, deploying invoice processing automation first removes the largest source of garbage-in errors that downstream matching has to clean up.
Generic OCR reads pixels and returns text. Template-based invoice processing software reads pixels, locates fields against a per-vendor template, and breaks the moment a vendor changes their layout. Cadel's AP invoice OCR is schema-driven, not template-driven: the workflow extracts a fixed set of structured fields (invoice number, GSTIN, line items, tax rates, totals) using LLM-grounded extraction with a domain-specific GST invoice schema, runs deterministic math validation on every line, and produces an auditable validation_results array. It works on first-seen vendors with no template maintenance, on bilingual layouts (e.g. German Rechnung, US commercial invoice), and on scanned PDFs — without breaking when a vendor moves their tax field to a different position.