Simple Invoice Processing
Cadel extracts structured fields from domestic GST invoices and international vendor invoices, then runs deterministic math and compliance checks before the invoice touches your ERP.
The Problem
Accounts payable teams at mid-market companies typically receive invoices from three to five distinct vendor categories in a single processing cycle: domestic GST-registered suppliers with multi-rate line items, inter-state vendors where the correct tax type (IGST vs. CGST+SGST) depends on a state-code cross-check, and international vendors whose EUR- or USD-denominated invoices carry no GSTIN and no Indian tax fields at all. Processing these manually means an AP clerk must read the vendor GSTIN, extract the state code, compare it against the declared place of supply, sum every line amount independently, and verify that each line's tax amount equals line_amount × tax_rate ÷ 100 — all before keying the invoice into the ERP. At 50 invoices per month that routine consumes roughly two full working days; at 200 invoices it becomes a part-time role.
The math-check problem alone is significant. A single vendor invoice like the Synapse Digital Services bill (invoice SDS/2025/INV-107) carries five line items taxed at four different rates (18%, 12%, 5%, and 0% for zero-rated export documentation), with an aggregate tax of ₹20,300. Verifying that figure manually requires four separate multiplications, one for each taxed line, and one summation. Any one of those steps can introduce a keying error that stays invisible until the GSTR-2B auto-populated credit diverges from the purchase register. At that point Section 16(2)(aa) of the CGST Act, 2017 restricts the Input Tax Credit claim to the amount reflected in the supplier's filed return, so an over-claimed or mis-booked ITC must be reversed, often with interest under Section 50.
International invoices compound the problem differently. A EUR-denominated invoice from a US-registered vendor to a German entity carries no GSTIN, no place of supply, and no tax-rate fields. A validation ruleset built only for GST invoices will either crash on null fields or generate false-positive errors that an AP clerk must clear one by one, adding noise to the exception queue and delaying payment runs.
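A minimal sketch of how a ruleset might avoid that failure mode, assuming the decision is made once per invoice from the two GSTIN fields. The function name and the check names are hypothetical labels for illustration, not Cadel's API. The source only describes the cases where both GSTINs are present or both are null; this sketch simply does not run the GST check when either one is missing.

```python
from typing import List, Optional


def select_validations(vendor_gstin: Optional[str],
                       customer_gstin: Optional[str]) -> List[str]:
    """Return the names of the checks to run for one invoice (hypothetical names)."""
    # Arithmetic checks apply to every invoice; they guard against nulls themselves.
    checks = ["line_tax_math", "subtotal_vs_line_sum"]
    # GST-specific checks run only when both parties carry a GSTIN.
    if vendor_gstin and customer_gstin:
        checks.append("state_code_vs_place_of_supply")
    return checks


# International invoice: no GSTIN on either side, so no GST check is scheduled.
print(select_validations(None, None))
# Domestic invoice: the state-code check is added.
print(select_validations("27AAAFZ7890L1Z2", "29AAACB9876F1ZN"))
```

Deciding up front which checks apply means an absent tax field never reaches a rule that would dereference it.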
The scale-break point for most mid-market finance teams is approximately 80–100 invoices per month processed by a single AP resource. Beyond that volume, the error rate on manual math verification and GSTIN state-code checks rises faster than headcount can compensate, and the downstream cost — ITC mismatches, duplicate payments, and vendor disputes — becomes material.
Why It Matters: Context
Under the GST framework, every tax invoice from a registered supplier must comply with Rule 46 of the CGST Rules, 2017, which mandates that the invoice carry the supplier's GSTIN, the recipient's GSTIN, the place of supply, the applicable HSN or SAC code, and a separate tax amount for each rate slab. The place of supply determines the tax type: where the supplier's state code (the first two digits of the GSTIN) matches the place of supply state code, the transaction is intra-state and attracts CGST+SGST; where they differ, it is inter-state and attracts IGST under Section 5 of the IGST Act, 2017. Booking the wrong tax type — for example, booking CGST+SGST on what is actually an inter-state supply — creates a reconciliation mismatch in GSTR-2B that the AP team must correct before filing GSTR-3B, with the risk of a demand notice under Section 73 or 74 of the CGST Act if the error is not caught in the same return period.
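The state-code rule described above can be sketched in a few lines. This is an illustration only: it assumes the place of supply has already been normalized to a two-digit state code, and the function names are hypothetical. The GSTIN is the Maharashtra sample used elsewhere on this page; the place-of-supply codes passed in are illustrative inputs.

```python
def supplier_state_code(gstin: str) -> str:
    """The first two characters of a GSTIN are the supplier's state code."""
    return gstin[:2]


def expected_tax_type(vendor_gstin: str, place_of_supply_code: str) -> str:
    """Intra-state supplies attract CGST+SGST; inter-state supplies attract IGST."""
    if supplier_state_code(vendor_gstin) == place_of_supply_code:
        return "CGST+SGST"
    return "IGST"


# Maharashtra supplier (27) with place of supply Karnataka (29): inter-state.
print(expected_tax_type("27AAAFZ7890L1Z2", "29"))  # IGST
# Same supplier with place of supply Maharashtra (27): intra-state.
print(expected_tax_type("27AAAFZ7890L1Z2", "27"))  # CGST+SGST
```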
For mid-market companies without a dedicated GST compliance team, these checks are done ad hoc by the same AP resource processing the invoice. The practical reality is that a controller managing 150 vendor invoices a month across domestic and international suppliers has no standardized extraction template: some vendors supply PDFs with embedded text, others supply scanned images with embossed seals, and international suppliers follow their own national invoice formats (e.g., German Rechnung layouts under the EU VAT Directive 2006/112/EC) that share no field labels with a GST tax invoice. Reconciling these formats manually, without a structured extraction layer, means the data that enters the ERP is only as accurate as the clerk's reading of the source document.
The specific failure consequence is a duplicate payment or an ITC reversal. Duplicate payments occur when the same invoice number is re-submitted in a subsequent batch — a common vendor error — and the AP team has no automated check against previously processed invoice numbers for the same GSTIN. ITC reversals occur when a math error in the tax amount field is booked without verification, the credit is claimed in GSTR-3B, and the supplier's filed GSTR-1 reflects a different amount, triggering an auto-populated discrepancy in GSTR-2B that the taxpayer must address under Rule 37A of the CGST Rules.
What This Workflow Automates
- Document ingestion and format detection: The workflow accepts PDF invoices — whether digitally generated or scanned — and identifies the document type as Invoice, routing it to the appropriate extraction schema regardless of whether the source is a domestic GST-registered entity or a foreign vendor with no Indian tax fields.
- Structured field extraction: For each invoice, the workflow extracts `invoice_number`, `invoice_date`, `vendor_name`, `vendor_gstin`, `vendor_address`, `customer_name`, `customer_gstin`, `customer_address`, `place_of_supply`, `currency`, `incoterm`, `subtotal`, `tax_amount`, and `total_amount` at the header level.
- Line-item extraction: For every line on the invoice, the workflow extracts `line_number`, `item_code` (HSN/SAC where present), `description`, `quantity`, `unit`, `unit_price`, `line_amount`, `tax_rate`, and `tax_amount` as a structured array, preserving mixed-rate line items as distinct records.
- Math validation (line-level tax check): For each line item carrying a non-null `tax_rate` and `line_amount`, the workflow computes the expected tax as `line_amount × tax_rate ÷ 100` and compares it to the extracted `tax_amount`; any discrepancy raises a line-level validation exception.
- Math validation (subtotal vs. line-item sum): The workflow sums all extracted `line_amount` values and compares the result to the header-level `subtotal` field, flagging the invoice if the difference is non-zero.
- GST compliance checks (domestic invoices only): Where both `vendor_gstin` and `customer_gstin` are present, the workflow extracts the supplier state code from the first two digits of the vendor GSTIN and compares it to the state code in the declared `place_of_supply`, surfacing any inter-state/intra-state mismatch as a flagged field for AP review before tax-type booking.
- International invoice handling: Where `vendor_gstin` and `customer_gstin` are both null, as with EUR- or USD-denominated foreign vendor invoices, the workflow bypasses all GST-specific validations and extracts only the available fields (currency, line items, subtotal, total) without raising false-positive errors on absent tax fields.
- Unreadable document flagging: Where OCR yields no extractable structured fields (e.g., a dense, seal-stamped scanned invoice), the workflow returns an empty `extracted_fields` object and records the file as requiring manual review, preventing a silent null-record from passing downstream to the ERP.
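The two math validations in the list above could look roughly like the sketch below. It is an assumption-laden illustration, not the product's implementation: the function names, the result keys (`check`, `status`, `note`, and so on) and the choice of `Decimal` arithmetic with exact comparison are all hypothetical. Only the field names `line_amount`, `tax_rate`, `tax_amount`, and `subtotal` come from the extraction schema described here.

```python
from decimal import Decimal


def _dec(value) -> Decimal:
    # Convert via str() so float inputs do not carry binary rounding noise.
    return Decimal(str(value))


def check_line_tax(line: dict) -> dict:
    """Line-level check: tax_amount should equal line_amount * tax_rate / 100."""
    result = {"check": "line_tax", "line_number": line.get("line_number")}
    amount, rate, tax = (line.get(k) for k in ("line_amount", "tax_rate", "tax_amount"))
    if amount is None or rate is None or tax is None:
        # Null inputs are skipped with a note, never treated as zero.
        return {**result, "status": "skipped", "note": "required field is null"}
    expected = _dec(amount) * _dec(rate) / Decimal("100")
    status = "pass" if expected == _dec(tax) else "fail"
    return {**result, "status": status, "expected_tax": str(expected)}


def check_subtotal(subtotal, lines: list) -> dict:
    """Header check: subtotal should equal the sum of all line_amount values."""
    amounts = [line.get("line_amount") for line in lines]
    if subtotal is None or any(a is None for a in amounts):
        return {"check": "subtotal", "status": "skipped", "note": "required field is null"}
    difference = _dec(subtotal) - sum((_dec(a) for a in amounts), Decimal("0"))
    status = "pass" if difference == 0 else "fail"
    return {"check": "subtotal", "status": status, "difference": str(difference)}


# Illustrative lines (made-up values): the second one has a wrong tax amount.
lines = [
    {"line_number": 1, "line_amount": 1000, "tax_rate": 18, "tax_amount": 180},
    {"line_number": 2, "line_amount": 500, "tax_rate": 12, "tax_amount": 65},
]
print([check_line_tax(line)["status"] for line in lines])
print(check_subtotal(1500, lines)["status"])
```

Exact decimal comparison matches the "any discrepancy" wording above; a deployment that needs to absorb rounding on printed invoices would have to add a tolerance, which this sketch deliberately leaves out.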
All eight steps execute in under 30 seconds per invoice batch, producing a structured JSON output with deterministic field values and a discrete validation-results array that an AP controller can audit line by line.
Edge Cases We Simulate
The workflow ships with a battery of synthetic test scenarios that exercise every failure mode we have seen in real-world data. Each scenario produces a deterministic outcome that an auditor or controller can verify in seconds.
| Scenario | What's wrong | Expected outcome |
|---|---|---|
| Mixed GST Rate Lines | A single invoice carries line items taxed at different GST rates (e.g., 18%, 12%, 5%, and 0% zero-rated export lines), making manual tax-sum verification error-prone. | Each line's tax amount is independently verified against its stated rate and line amount; the aggregate tax is cross-checked against the invoice-level tax field, flagging any mismatch. |
| Inter-State Supply Mismatch | Vendor GSTIN state code and the stated place of supply are inconsistent — for example, a Maharashtra-registered vendor (state code 27) billing with place of supply set to Karnataka (state code 29) — which determines whether IGST or CGST+SGST applies. | Workflow surfaces the supplier state code extracted from the vendor GSTIN and the declared place of supply as separate flagged fields, allowing the AP team to confirm correct tax type before booking. |
| Null Tax Fields on International Invoice | A foreign-vendor invoice (e.g., EUR-denominated invoice from a US entity to a German company) carries no GST, no GSTIN, and no tax-rate fields, which would break a GST-only validation ruleset. | Workflow detects absence of GSTIN on both vendor and customer sides and skips GST-specific validations, extracting only currency, line items, subtotal, and total without raising false-positive errors. |
| Subtotal vs. Line-Item Sum Discrepancy | The printed subtotal on the invoice does not equal the arithmetic sum of individual line amounts, a common OCR or vendor-keying error that passes unnoticed when invoices are processed manually. | Workflow computes the sum of all extracted line_amount values and compares it to the extracted subtotal field, raising a validation exception when the difference is non-zero. |
| Unreadable or Scanned Invoice with No Extractable Fields | A dense, seal-stamped, or low-resolution scanned invoice (such as a 25-row document with an embossed company seal) yields no extractable structured fields after OCR. | Workflow returns an empty extracted_fields object and records the file as requiring manual review, preventing a silent null-record from being passed downstream to the ERP. |
| Duplicate Invoice Number Detection | The same invoice number from a vendor is submitted a second time within the processing batch or against the existing vendor ledger, a common cause of duplicate payments. | Workflow flags the invoice_number as a potential duplicate when it matches a previously processed record for the same vendor_gstin or vendor_name, requiring explicit human confirmation before posting. |
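The duplicate-invoice scenario in the last row can be approximated with a keyed lookup. The sketch below is hypothetical: the class and function names are invented for illustration, and the whitespace and case normalization is an assumption rather than documented behavior. It keys each invoice on `vendor_gstin` where present and falls back to `vendor_name` otherwise, as the scenario describes.

```python
from typing import Iterable, Optional, Tuple

Key = Tuple[str, str]


def duplicate_key(invoice_number: str, vendor_gstin: Optional[str], vendor_name: str) -> Key:
    # Prefer the GSTIN; fall back to the vendor name for vendors without one.
    vendor_id = vendor_gstin if vendor_gstin else vendor_name.strip().lower()
    return (vendor_id, invoice_number.strip().upper())


class DuplicateChecker:
    """Tracks keys seen in the current batch plus any preloaded ledger keys."""

    def __init__(self, ledger_keys: Iterable[Key] = ()):
        self._seen = set(ledger_keys)

    def is_potential_duplicate(self, invoice: dict) -> bool:
        key = duplicate_key(
            invoice["invoice_number"],
            invoice.get("vendor_gstin"),
            invoice.get("vendor_name") or "",
        )
        if key in self._seen:
            return True  # needs explicit human confirmation before posting
        self._seen.add(key)
        return False


checker = DuplicateChecker()
invoice = {
    "invoice_number": "ZRD/2025/INV-201",
    "vendor_gstin": "27AAAFZ7890L1Z2",
    "vendor_name": "Zenith Realty Developers LLP",
}
print(checker.is_potential_duplicate(invoice))  # first submission
print(checker.is_potential_duplicate(invoice))  # resubmission of the same invoice
```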
Sample Documents
Download or inspect the seeded sample files used to demonstrate this workflow:
| File | Document type | Notes |
|---|---|---|
| zrd_invoice_landlord_landscape_nocoa.pdf | Invoice | INR invoice from Zenith Realty Developers LLP (GSTIN 27AAAFZ7890L1Z2, Maharashtra) to Bangalore Bags Private Limited (GSTIN 29AAACB9876F1ZN, Karnataka) covering three lines (office rent, CAM charges, and electricity reimbursement), all at 18% GST; demonstrates inter-state supply scenario and SAC code extraction. |
| invoice_german.pdf | Invoice | EUR-denominated invoice from Ceipal Procurewise (Rochester, NY, USA) to Schneider Technologie GmbH (Berlin, Germany) for an annual SaaS subscription; demonstrates international invoice extraction with null GSTIN, null tax fields, and bilingual German-English line item description. |
| synapse_digital_invoice_services_gst.pdf | Invoice | INR invoice from Synapse Digital Services Pvt. Ltd. (GSTIN 29AABCT1234F1Z8) covering five lines across three different GST rates (18%, 12%, 5%) and one zero-rated export line; demonstrates mixed-rate tax math validation and SAC code parsing. |
| dsg_invoice_25rows_seal.pdf | Invoice | High-row-count, seal-stamped scanned invoice that returns empty extracted fields; demonstrates the workflow's graceful handling of unreadable documents by flagging them for manual review rather than posting null data. |
Sample Results
Running the workflow across the four demo invoices produced structured extraction on three of four documents. The Zenith Realty invoice (ZRD/2025/INV-201, INR ₹289,100) extracted all three line items — Office Space Rent at ₹180,000, Common Area Maintenance at ₹25,000, and Electricity Reimbursement at ₹40,000 — each at an 18% GST rate, with a header tax amount of ₹44,100 recorded for cross-check. The Synapse Digital Services invoice (SDS/2025/INV-107, INR ₹165,300) extracted five line items across four tax rate slabs (18%, 12%, 5%, and 0%), with SAC codes 998313, 998315, 998221, 998316, and 998399 recorded per line — the zero-rated export documentation line (₹15,000, SAC 998399) extracted cleanly with a tax_rate of 0.0 and tax_amount of 0.0, confirming the workflow does not misclassify a nil-tax line as a missing-field error. The German invoice (DEMS3310-PW-2, EUR €300) extracted vendor address, customer address, subscription period, and total amount without raising any GST validation errors, confirming the null-GSTIN bypass path operates correctly.
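The Zenith Realty figures quoted above can be reproduced by hand or with a few lines of arithmetic. The snippet below is only a worked check of the published numbers, not output from the workflow; the variable names are illustrative.

```python
from decimal import Decimal

# Line amounts and the 18% rate as reported for invoice ZRD/2025/INV-201.
line_amounts = {
    "Office Space Rent": Decimal("180000"),
    "Common Area Maintenance": Decimal("25000"),
    "Electricity Reimbursement": Decimal("40000"),
}
rate = Decimal("18")

# Expected tax per line: line_amount * tax_rate / 100.
line_taxes = {name: amount * rate / Decimal("100") for name, amount in line_amounts.items()}
subtotal = sum(line_amounts.values(), Decimal("0"))
total_tax = sum(line_taxes.values(), Decimal("0"))

assert total_tax == Decimal("44100")              # matches the header tax amount
assert subtotal + total_tax == Decimal("289100")  # matches the invoice total
```

Per line, the expected tax works out to ₹32,400, ₹4,500, and ₹7,200, which sum to the ₹44,100 recorded in the header.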
The fourth file — dsg_invoice_25rows_seal.pdf, a dense 25-row document with an embossed company seal — returned an empty extracted_fields object, demonstrating the unreadable-document exception class: rather than posting a null or partial record to the ERP, the workflow surfaces the file explicitly for manual review. This is the specific exception class that causes silent data-quality failures when invoices are processed without a structured extraction layer — a partially extracted invoice with a null total_amount that reaches the ERP unchecked can result in a zero-value posting or a duplicate payment if the vendor resubmits the same invoice in the next cycle.
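One way to picture the unreadable-document path is as a routing decision on the extraction result. The function name and the `status` values below are assumptions made for illustration; only the empty `extracted_fields` object and the manual-review outcome are described on this page.

```python
def route_extraction(file_name: str, extracted_fields: dict) -> dict:
    """Send empty extractions to manual review instead of posting them downstream."""
    if not extracted_fields:
        # Nothing usable came back from OCR: surface the file explicitly.
        return {"file": file_name, "extracted_fields": {}, "status": "manual_review"}
    return {"file": file_name, "extracted_fields": extracted_fields, "status": "extracted"}


print(route_extraction("dsg_invoice_25rows_seal.pdf", {})["status"])
print(route_extraction("invoice_german.pdf", {"currency": "EUR"})["status"])
```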
Why Automation Wins Here
For a mid-market AP team processing 100 domestic and international invoices per month, the workflow replaces an estimated 12–15 hours of manual field extraction, math verification, and GSTIN state-code checking with a process that runs in under 30 seconds per document. The math validation step — computing line_amount × tax_rate ÷ 100 for every line and comparing the aggregate to the header tax field — catches arithmetic discrepancies that are invisible to a clerk reading the printed total, reducing the probability of an ITC booking error that would require reversal under Rule 37A of the CGST Rules. The null-GSTIN detection path eliminates false-positive exceptions on international invoices, keeping the exception queue limited to genuine anomalies rather than format-driven noise.
Every processed invoice produces a structured JSON artifact containing the full extracted-fields object and a discrete validation_results array. This artifact is directly attachable to the AP voucher in the audit file: the controller can demonstrate, for any invoice, exactly which fields were extracted, which math checks were run, and which exceptions — if any — were raised before the invoice was posted. For internal auditors applying ICAI SA 500 (Audit Evidence) standards, a machine-generated extraction log with a timestamped validation result provides a more reproducible evidence trail than a manually annotated invoice printout.
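As a rough picture of such an artifact, the sketch below assembles one in Python. The top-level keys `extracted_fields` and `validation_results` follow the names used on this page; the `processed_at` key, the function name, and the shape of each validation entry are hypothetical and may differ from the real payload.

```python
import json
from datetime import datetime, timezone


def build_artifact(extracted_fields: dict, validation_results: list) -> str:
    """Serialize one invoice's extraction and validation outcome as JSON."""
    artifact = {
        "processed_at": datetime.now(timezone.utc).isoformat(),  # hypothetical key
        "extracted_fields": extracted_fields,
        "validation_results": validation_results,
    }
    return json.dumps(artifact, ensure_ascii=False, indent=2)


# Header values from the Zenith Realty sample; validation entries are illustrative.
payload = build_artifact(
    {"invoice_number": "ZRD/2025/INV-201", "currency": "INR",
     "tax_amount": 44100, "total_amount": 289100},
    [{"check": "line_tax", "line_number": 1, "status": "pass"},
     {"check": "subtotal", "status": "pass"}],
)
print(payload)
```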
Frequently Asked Questions
Which GST compliance checks does the workflow perform?
The workflow checks GSTIN format compliance per the alphanumeric structure mandated under Rule 10 of the CGST Rules, 2017, and verifies that the tax arithmetic on each line is consistent with the rates specified under Schedule I–IV of the CGST Act, 2017. It also extracts the place of supply field to help determine whether IGST (Section 5 of the IGST Act) or CGST+SGST (Section 9 of the CGST Act) should apply, a distinction that directly affects Section 16(2)(aa) input tax credit eligibility.
Can the workflow handle international and multi-currency invoices?
Yes. When neither vendor nor customer GSTIN is present, the workflow automatically bypasses GST validations and extracts the core commercial fields (currency, line items, subtotal, and total) without raising false errors. The demo data includes a EUR-denominated invoice between a US vendor and a German buyer, confirming that currency codes other than INR are captured as-is. Multi-currency conversion to a functional currency must be handled downstream in the ERP per the applicable standard (IAS 21 or ASC 830).
How does the output integrate with an ERP?
Cadel outputs a structured JSON object for each invoice containing all extracted and validated fields, which can be mapped to the chart-of-accounts and vendor master of any ERP through a standard API or CSV export. No custom ERP connector is required at the extraction stage; the validated payload is designed to slot into the AP entry screen of Tally Prime, NetSuite Bill, or SAP FB60 with field-level mapping configured once at onboarding.
What audit trail does the workflow keep?
Every extraction run stores the original document, the raw OCR output, the structured field extraction, and the full list of validation results (including any exceptions raised) as an immutable, timestamped record. This supports the documentation requirements under ICAI SA 230 (Audit Documentation) for external auditors and gives internal audit teams a line-by-line evidence chain from source PDF to ERP posting without relying on email threads or manual logs.
What happens when a field cannot be extracted?
Fields that cannot be extracted are recorded as null rather than defaulting to zero, preventing silent arithmetic errors in downstream calculations. Validation rules that depend on a null field (for example, a unit-price-times-quantity check) are skipped with an explicit note in the validation results, so the reviewer knows exactly which fields require manual entry before the invoice is posted.
How does the workflow detect duplicate invoices?
The workflow compares the extracted invoice_number and vendor_gstin (or vendor_name for international vendors) against previously processed records in the same batch and against the vendor ledger. A match raises a duplicate-payment risk flag that must be explicitly cleared by an approver, consistent with the duplicate-payment control objectives described in COSO Internal Control – Integrated Framework and testable under ICAI SA 240 fraud-risk procedures.
This workflow is deployed and live in our demo environment. Upload your own documents to see it in action.