GST Certificate Extractor with Checksum Validation

Form GST REG-06 OCR + GSTIN structure check + base-36 Luhn checksum — GST certificate validation in under 30 seconds per document.

Live demo: Drop your own Form GST REG-06 certificates and see Cadel extract every field and validate the GSTIN checksum in seconds.

The Problem

Vendor onboarding and AP teams in mid-market companies collect Form GST REG-06 certificates from hundreds of suppliers every year — and manual processing of those PDFs creates compounding risk at every step.

15–40 person-hours per onboarding cycle

Manually keying GSTIN, legal name, trade name, constitution type, principal address, date of liability and approving authority from each Form GST REG-06 PDF takes 5–10 minutes per certificate. Across a vendor base of 200–500 active suppliers that is 15–40 person-hours per cycle with no built-in verification step.

Tampered GSTINs invisible to the human eye

The GSTN encodes a base-36 Luhn check character at position 15 of every GSTIN. A certificate where someone altered the GSTIN — swapping a digit or transposing two characters — will still appear 15 characters long with a valid state code and Z at position 14. Only the checksum computation reveals the alteration, which a manual review cannot perform reliably at scale.

ITC at risk from unvalidated supplier GSTINs

Under Section 16(2)(aa) of the CGST Act, 2017 (inserted by the Finance Act, 2021), Input Tax Credit is admissible only when the supplier has reported the outward supply and it is reflected in the buyer’s GSTR-2B. An invalid or fabricated GSTIN on a supplier certificate is an early indicator that ITC claimed against that supplier may be denied during scrutiny. Circular No. 183/15/2022-GST, issued by the CBIC, confirms that buyers bear due-diligence responsibility.

Spot-checking breaks at 50+ new vendors per month

At scale — 50 or more new vendor certificates per month — teams resort to spot-checking, and the exception queue grows invisibly until a GST audit or annual vendor verification exercise forces a full review. Constitution mismatches (e.g., extracting “Private Limited Company” when the certificate reads “Limited Liability Partnership”) affect TDS applicability under Sections 194C vs. 194J of the Income Tax Act and go undetected.

₹65L+

Estimated ITC exposure per annum in a mid-market firm with 300 active vendors if even 2–3% of GSTINs on file carry an undetected checksum error or are subsequently cancelled. Section 50 of the CGST Act mandates interest on reversed ITC — compounding the cost of a vendor master that was never validated at onboarding.

Why It Matters: Regulatory Framework

Form GST REG-06 sits at the intersection of four overlapping rules — each one creating a specific obligation the vendor onboarding controller must satisfy before a supplier enters the master register.

Rule 10(1) · CGST Rules 2017

The prescribed certificate format

Form GST REG-06 is the Certificate of Registration issued under Rule 10(1) of the CGST Rules, 2017. It is the only government-prescribed format that carries a GSTIN, constitution of business, principal place of business address, date of liability and period of validity in a single structured document — making it the canonical source for vendor master data.

Section 16(2)(aa) · CGST Act 2017

ITC conditioned on supplier GSTR-1 filing

Input Tax Credit is admissible only up to the credit available in the buyer’s auto-populated GSTR-2B. A supplier with a forged or corrupted GSTIN will not file valid GSTR-1 returns under that GSTIN, meaning every ITC claim associated with that vendor is at risk of reversal — plus interest under Section 50 — during any future scrutiny or audit.

Circular No. 183/15/2022-GST · CBIC

Buyer bears due-diligence responsibility

The circular, issued by the Central Board of Indirect Taxes and Customs (CBIC), clarifies that a buyer cannot claim ignorance of a supplier’s registration status or the authenticity of its GSTIN. Controllers who accept certificates without programmatic verification are exposed to demand notices under Sections 73 and 74 of the CGST Act if ITC is later found inadmissible.

SA 505 · ICAI Standards on Auditing

External confirmations as audit evidence

Under ICAI’s Standard on Auditing SA 505 (External Confirmations), auditors may independently circularise GSTIN validity as part of an indirect tax audit. A vendor master holding unvalidated or checksum-failed GSTINs will trigger qualified findings. Cadel’s per-certificate audit trail, with computed vs. extracted check characters, gives the controller documentary evidence to produce when such confirmations are requested.

What This Workflow Automates

Seven deterministic passes from raw Form GST REG-06 PDF to a validated, vendor-master-ready record, in under 30 seconds per certificate, with a binary pass/fail result for each of four checks that every controller can audit without re-running the computation.

01

Document ingestion & type identification

Accepts PDF uploads through the Cadel inbox. The classifier identifies each file as a Cert document type by matching the Form GST REG-06 header, sub-header (“Certificate of Registration under the CGST Act, 2017”), and the presence of a GSTIN field — routing only valid certificate PDFs to the extraction pipeline.
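As a rough illustration of the three-marker match described above, here is a minimal Python sketch. It assumes the OCR text of the first page is already available as a string; the function name `looks_like_reg06` and the exact matching rules are hypothetical, not Cadel's classifier.

```python
import re

# Markers named in the step above; the exact matching rules are an assumption.
HEADER = "form gst reg-06"
SUBHEADER = "certificate of registration"
# Loose GSTIN-shaped token (two digits + 13 alphanumerics); strict checks come later.
GSTIN_TOKEN = re.compile(r"\b\d{2}[0-9A-Z]{13}\b")


def looks_like_reg06(page_text: str) -> bool:
    """Return True when the text carries all three REG-06 markers."""
    # Collapse whitespace so line breaks inside the header do not matter.
    normalised = " ".join(page_text.split())
    lowered = normalised.lower()
    return (
        HEADER in lowered
        and SUBHEADER in lowered
        and GSTIN_TOKEN.search(normalised.upper()) is not None
    )
```

Files that fail this test would simply not be routed to the extraction pipeline.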

02

Structured field extraction

The OCR and extraction layer isolates nine named fields from each certificate: gstin, legal_name, trade_name, constitution, principal_address, date_of_liability, valid_from, registration_type and approving_authority. Fields not present in the PDF are recorded as null, not silently omitted.
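One way to guarantee that absent fields surface as explicit nulls is to give the record a fixed schema. The sketch below uses the nine field names listed above; the class name `CertificateRecord` and the helper `from_extraction` are hypothetical and stand in for whatever structure the extraction layer really uses.

```python
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class CertificateRecord:
    """The nine named fields; every field defaults to None (null)."""
    gstin: Optional[str] = None
    legal_name: Optional[str] = None
    trade_name: Optional[str] = None
    constitution: Optional[str] = None
    principal_address: Optional[str] = None
    date_of_liability: Optional[str] = None
    valid_from: Optional[str] = None
    registration_type: Optional[str] = None
    approving_authority: Optional[str] = None


def from_extraction(raw: dict) -> CertificateRecord:
    """Copy known fields from raw OCR output; missing keys stay None."""
    names = [f.name for f in fields(CertificateRecord)]
    return CertificateRecord(**{name: raw.get(name) for name in names})


record = from_extraction({"gstin": "29ACMEA1234B1Z5", "legal_name": "Acme Corp Pvt Ltd"})
print(asdict(record)["trade_name"])  # None, recorded rather than omitted
```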

03

GSTIN structural validation

Confirms that the extracted GSTIN is exactly 15 characters long, that positions 1–2 are a recognised two-digit numeric state code (01 through 38), that positions 3–12 follow the PAN-derived alphanumeric pattern, and that position 14 is the letter Z. Any structural violation fires a FAIL immediately — before the checksum step is even attempted.
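The structural rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the PAN segment is taken to be five letters, four digits and one letter, position 13 is treated as any alphanumeric character, and the function name `check_structure` is hypothetical.

```python
import re

# Assumed shape: 2-digit state code, PAN (5 letters, 4 digits, 1 letter),
# one alphanumeric entity character, literal Z, one alphanumeric check character.
GSTIN_SHAPE = re.compile(r"(\d{2})[A-Z]{5}\d{4}[A-Z][0-9A-Z]Z[0-9A-Z]")


def check_structure(gstin: str) -> tuple[bool, str]:
    """Return (passed, reason) for the structural check only."""
    if len(gstin) != 15:
        return False, "length is not 15"
    match = GSTIN_SHAPE.fullmatch(gstin)
    if match is None:
        return False, "pattern mismatch"
    if not 1 <= int(match.group(1)) <= 38:
        return False, "state code outside 01-38"
    return True, "ok"
```

With these assumptions, `check_structure("29ACMEA1234B1Z5")` returns `(True, "ok")`, while the same string with Y at position 14 returns `(False, "pattern mismatch")`.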

04

Base-36 Luhn checksum verification

Applies the GSTN-specified base-36 variant of the Luhn algorithm over positions 1–14 of the GSTIN and compares the computed check character to the extracted character at position 15. A mismatch — such as extracted O vs. computed 4 — fires a FAIL with the expected value shown inline, giving the controller the exact evidence needed to reject the certificate.
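For readers who want to see the arithmetic, the sketch below implements a Luhn mod-36 check in Python as it is commonly described for GSTINs. The character ordering (0-9 then A-Z), the alternating weights 1, 2 starting from the left, and the function names are assumptions of this sketch, not a statement of Cadel's internal code. The sample string in the usage note is an illustrative value chosen for this sketch and is not one of the certificates on this page.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # value of a character = its index


def compute_check_char(first14: str) -> str:
    """Compute the expected check character from positions 1-14."""
    if len(first14) != 14:
        raise ValueError("expected the first 14 characters of a GSTIN")
    total = 0
    for index, char in enumerate(first14):
        value = ALPHABET.index(char)        # raises ValueError on other characters
        weight = 2 if index % 2 else 1      # weights 1, 2, 1, 2, ... from the left
        product = value * weight
        # Fold the product back into base 36: quotient plus remainder.
        total += product // 36 + product % 36
    return ALPHABET[(36 - total % 36) % 36]


def check_checksum(gstin: str) -> tuple[bool, str]:
    """Return (passed, expected_char); run the structural check first."""
    expected = compute_check_char(gstin[:14])
    return gstin[14] == expected, expected
```

Under these assumptions, `check_checksum("27AAPFU0939F1ZV")` returns `(True, "V")`, and replacing the last character with O returns `(False, "V")`, which is the computed-versus-extracted pair a controller would see inline.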

05

Legal name presence check

Confirms that the legal_name field is non-null and non-empty after extraction. The legal name must match the PAN-linked entity name in the supplier master; a missing or blank field indicates OCR failure or a corrupted PDF, and the certificate is routed to the exception queue for manual review regardless of the GSTIN result.

06

Principal address presence check

Confirms that principal_address is non-null and non-empty. The principal place of business address is required for state-specific compliance obligations and for matching the supplier in the e-invoicing IRP system. A blank address — from a partially printed or truncated PDF — fires a FAIL independently of the GSTIN checks.
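The two presence checks in steps 05 and 06 reduce to the same test. A minimal Python sketch follows; treating whitespace-only values as empty is an assumption, and the function names and result keys are hypothetical.

```python
from typing import Optional


def is_present(value: Optional[str]) -> bool:
    """True when the field is neither None nor blank after trimming."""
    return value is not None and value.strip() != ""


def presence_checks(record: dict) -> dict:
    """Run both presence checks; a missing key counts as null."""
    return {
        "legal_name_required": is_present(record.get("legal_name")),
        "principal_address_required": is_present(record.get("principal_address")),
    }
```

Each result is independent of the GSTIN checks, so a record can fail here even when structure and checksum both pass.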

07

Inbox badge + Excel export

Each certificate receives a Valid badge when all four checks pass or an Invalid badge when any check fails; failed records are routed to the exception queue. All extracted fields and validation outcomes across the full batch are written to a structured Excel export — one row per certificate, one validation-result column per check — ready to import into Tally Prime, Zoho Books, SAP or any ERP vendor master.
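The badge rule and the one-row-per-certificate layout can be sketched as follows. To stay within the Python standard library this example writes CSV rather than a real Excel workbook, which is a deliberate substitution; the column names and the functions `badge` and `export_rows` are hypothetical.

```python
import csv
import io

# One validation-result column per check, in a fixed order.
CHECKS = [
    "gstin_structure",
    "gstin_checksum",
    "legal_name_required",
    "principal_address_required",
]


def badge(results: dict[str, bool]) -> str:
    """Valid only when all four checks pass."""
    return "Valid" if all(results[check] for check in CHECKS) else "Invalid"


def export_rows(records: list[dict]) -> str:
    """Render one CSV row per certificate with PASS/FAIL per check."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["file", *CHECKS, "badge"], lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        row = {"file": record["file"], "badge": badge(record["results"])}
        for check in CHECKS:
            row[check] = "PASS" if record["results"][check] else "FAIL"
        writer.writerow(row)
    return buffer.getvalue()
```

A record whose checksum check is False produces a row ending in `Invalid`, and that row is the one that would be routed to the exception queue.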

Edge Cases We Simulate

Six synthetic test scenarios that exercise every failure mode observed in real-world certificate batches. Each scenario produces a deterministic outcome an auditor or controller can verify in seconds.

Karnataka — Valid GSTIN

Scenario: Standard Form GST REG-06 issued in Karnataka (state code 29) for a Private Limited Company. GSTIN 29ACMEA1234B1Z5: all 15 positions structurally correct, base-36 Luhn check character at position 15 resolves to the expected value.
Expected outcome: GSTIN Structure: PASS. GSTIN Checksum: PASS. Legal Name and Principal Address fields populated. Inbox badge: Valid. All nine structured fields written to the Excel export.

Maharashtra — LLP Constitution

What’s tricky: Certificate issued in Maharashtra (state code 27) for a Limited Liability Partnership with GSTIN 27ACMEA1234B1Z5. The constitution field reads “Limited Liability Partnership”, a value that differs from the common “Private Limited Company” and must be extracted verbatim without normalisation errors that would misclassify TDS obligations.
Expected outcome: GSTIN Checksum: PASS (check character verified). Constitution extracted as “Limited Liability Partnership”. All four validations pass; inbox badge: Valid.

Haryana — Tampered Checksum

What’s wrong: Certificate for Haryana (state code 06) passes all structural rules (15 characters, numeric state code, Z at position 14), but the final check character O does not match the base-36 Luhn computed value of 4, indicating the GSTIN was manually altered after issuance.
Expected outcome: GSTIN Structure: PASS. GSTIN Checksum: FAIL (extracted O, computed 4). Inbox badge: Invalid. Record routed to exception queue with computed vs. extracted values shown inline.

Missing Legal Name

What’s wrong: A certificate PDF where the legal name field is blank or unparseable due to OCR failure on a scanned, low-resolution document. The legal name is required for vendor master matching under Sections 194C/194J TDS obligations.
Expected outcome: Legal Name Required validation fires as FAIL. Record flagged in exception queue regardless of GSTIN checksum outcome. Excel export includes a blank legal name cell with a FAIL marker in the validation column.

Missing Principal Address

What’s wrong: A certificate where the principal business address block is absent (from a partially printed or truncated PDF), causing address extraction to return an empty string. The address is required for state-specific GST compliance obligations.
Expected outcome: Principal Address Required validation fires as FAIL. Inbox badge shows Invalid. Excel export includes a blank address cell with a FAIL marker in the corresponding validation column.

OCR Misread: ‘0’ vs ‘O’ in GSTIN

What’s wrong: A scanned certificate where OCR misreads the digit 0 as the letter O in the GSTIN body. The 15-character count is preserved and the state code is valid, so a format-only check reports PASS. Only the checksum detects the substitution.
Expected outcome: GSTIN Structure: PASS. GSTIN Checksum: FAIL (computed check character differs from extracted). Record flagged for manual correction before the GSTIN is admitted to the vendor master.

Sample Files & Results

Three seeded Form GST REG-06 certificates — each engineered to exercise a specific validation scenario. Two pass cleanly across all four checks. One triggers the checksum FAIL that a format-only review would have missed.

Cert · Karnataka · Private Ltd
Valid

cert_001_karnataka_valid.pdf

Acme Corp Pvt Ltd · GSTIN 29ACMEA1234B1Z5 · Karnataka (state 29)
Checks: 4 / 4 (all PASS)
Constitution: Pvt Ltd (extracted verbatim)
Fields: 9 / 9 (all populated)

Demonstrates a clean end-to-end pass on all four validations including the base-36 Luhn checksum. Approving authority details (Superintendent of Central Tax, Bengaluru West Commissionerate) extracted into a structured field alongside the principal address.

Cert · Maharashtra · LLP
Valid

cert_002_maharashtra_llp.pdf

Acme Corp LLP · GSTIN 27ACMEA1234B1Z5 · Maharashtra (state 27)
Checks: 4 / 4 (all PASS)
Constitution: LLP (verbatim)
Checksum char: F (verified)

Demonstrates multi-state support and verbatim extraction of the “Limited Liability Partnership” constitution value. The LLP constitution affects TDS rate applicability under Section 194C vs. 194J of the Income Tax Act; a normalisation error here would misclassify the deduction category for all future payments to this vendor.

Cert · Haryana · Tampered
Invalid

cert_003_tampered_invalid.pdf

Acme Corp Sales · Haryana (state 06) · Checksum FAIL
Structure: PASS (15 chars, Z at 14)
Checksum: FAIL (O ≠ 4)
Routing: Exception queue

The GSTIN passes every visual and structural test — 15 characters, numeric state code 06, Z at position 14 — which is exactly what a manual review would have confirmed and cleared. The base-36 Luhn computation over positions 1–14 produces the expected check character 4, not the extracted character O. Without automated checksum validation, this certificate would have entered the vendor master undetected.

Sample Results in Detail

Running the workflow against all three certificates produced the following outcomes. Certificates cert_001_karnataka_valid.pdf (GSTIN 29ACMEA1234B1Z5, Karnataka, Private Limited Company) and cert_002_maharashtra_llp.pdf (GSTIN 27ACMEA1234B1Z5, Maharashtra, LLP) each passed all four validations: GSTIN structure, base-36 Luhn checksum, legal name presence, and principal address presence. Across the two clean certificates, 8 of 8 validation checks resolved to PASS, all nine structured fields were populated in each record, and the constitution values — “Private Limited Company” and “Limited Liability Partnership” respectively — were extracted verbatim without normalisation.

Certificate cert_003_tampered_invalid.pdf (Haryana, state 06) demonstrated the checksum validation’s practical value. The GSTIN passed the structural check — 15 characters, state code 06, Z at position 14 — precisely what a visual review would have confirmed and cleared. However, the base-36 Luhn computation over positions 1–14 produced the expected check character 4, not the extracted character O. The workflow fired a FAIL on the checksum check, set the inbox badge to Invalid, and routed the record to the exception queue. The Excel export logged the extracted character, the computed expected character, and the timestamp of the validation — the exact evidence chain required under ICAI SA 230 (Audit Documentation) for a controller’s working paper file.

Why Automation Wins Here

For a vendor onboarding team processing 50–300 Form GST REG-06 certificates per month, automated GSTIN validation replaces manual field transcription (an estimated 25–50 person-hours per cycle at 300 certificates) with a deterministic four-check pipeline that runs in under 30 seconds per certificate, and it catches the one class of error that no manual process can reliably detect at scale.

5–10 min → <30 s
Per-certificate processing time from PDF to validated vendor master record
25–50 hrs
Person-hours saved per cycle on a 300-vendor onboarding workload
100%
GSTINs verified against the base-36 Luhn algorithm, not just format-checked
0
False-pass tampered GSTINs admitted to the vendor master undetected
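The hours figure above follows directly from the per-certificate timings. A short Python check of the arithmetic, using only numbers quoted on this page (the helper name `person_hours` is hypothetical):

```python
def person_hours(certificates: int, minutes_each: float) -> float:
    """Effort in hours for a batch of certificates."""
    return certificates * minutes_each / 60


# Manual keying: 300 certificates at 5 to 10 minutes each.
manual_low = person_hours(300, 5)       # 25.0 hours
manual_high = person_hours(300, 10)     # 50.0 hours

# Upper bound for the automated run at 30 seconds (0.5 minutes) per certificate.
automated_max = person_hours(300, 0.5)  # 2.5 hours of machine time
```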

Checksum errors caught before vendor master entry

Applying the GSTN base-36 Luhn algorithm deterministically over every certificate, rather than only format-checking the 15-character length, catches forged, OCR-corrupted and copy-paste-transposed GSTINs that pass all structural rules. These are the GSTINs that would later surface as ITC reversal demands, with interest under Section 50 of the CGST Act, if admitted undetected.

Audit-ready evidence, every certificate

Each certificate run produces a structured Excel artifact — extracted fields, per-check validation results, computed vs. extracted check characters, timestamp — that meets the documentation standard under ICAI SA 230 (Audit Documentation) and satisfies the due-diligence evidence requirement established by Circular No. 183/15/2022-GST. Attachable directly to the vendor onboarding working paper without further preparation.

ERP-ready output, zero manual re-keying

The structured Excel export maps directly to vendor master fields in Tally Prime, Zoho Books, SAP Business One, and Oracle NetSuite — GSTIN, legal name, constitution, registration type and principal address all populated in the correct columns. Constitution extracted verbatim (not normalised) preserves the TDS rate distinction between “Private Limited Company” and “Limited Liability Partnership” without downstream correction.

Frequently Asked Questions

The questions compliance controllers, vendor onboarding managers and internal auditors ask before deploying GST certificate validation automation.

Which specific regulation governs the GSTIN format and checksum that this workflow validates?

The GSTIN format is specified by the Goods and Services Tax Network (GSTN) under the Central Goods and Services Tax Act, 2017 (CGST Act). The 15-character structure — two-digit state code, ten-character PAN, entity number, Z at position 14, and a base-36 Luhn check character at position 15 — is defined in the GSTN technical specification for taxpayer registration. The checksum algorithm is a base-36 variant of the Luhn algorithm and Cadel implements it deterministically: there is no heuristic or probabilistic element. Form GST REG-06 is the prescribed certificate format under Rule 10(1) of the CGST Rules, 2017.

Can a GSTIN pass the structural check but fail the checksum check, and why does that matter?

Yes. A GSTIN can have exactly 15 characters, a valid numeric state code, and Z at position 14 — satisfying all structural rules — while carrying an incorrect check character at position 15. This is precisely the scenario demonstrated by cert_003_tampered_invalid.pdf, where the extracted character is O but the computed value is 4. Manual alteration of a GSTIN on a PDF, OCR misreads of ambiguous characters (e.g., 0 vs O), and copy-paste errors in vendor master data all produce structurally valid but arithmetically invalid GSTINs. A format-only check cannot catch these errors; only the checksum computation can.

How does this workflow integrate with Tally, Zoho Books, or other accounting systems used by mid-market companies?

Cadel produces a structured Excel export containing all extracted fields — GSTIN, legal name, trade name, constitution, principal address, registration type, date of liability, validity dates and approving authority — along with a per-certificate validation status column. This file can be imported directly into Tally Prime’s ledger creation screen, Zoho Books’ contact master, or any ERP vendor master via the standard CSV/Excel import. For companies running SAP Business One or Oracle NetSuite, the same export maps to vendor master fields without transformation.

What audit trail does Cadel create when processing GST registration certificates?

Each uploaded PDF is assigned a unique document ID, and the workflow records the extracted field values, the raw OCR output, the four validation outcomes (structure, checksum, legal name, principal address), and a timestamp for each step. Validation failures are logged with the computed vs. extracted check character so that an internal auditor or a GST practitioner can confirm the finding without re-running the calculation. The exception queue preserves all failed records in their original state alongside the failure reason, satisfying the documentation requirements under ICAI SA 230 (Audit Documentation) for evidence of third-party credential verification.
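To make the shape of such a log entry concrete, here is a hedged Python sketch of a single checksum audit record. Every key name and the function `audit_entry` are hypothetical; only the file name and the O versus 4 pair come from the tampered sample described on this page.

```python
import json
import uuid
from datetime import datetime, timezone


def audit_entry(file_name: str, extracted: str, computed: str) -> dict:
    """Build one checksum audit record; all key names are hypothetical."""
    return {
        "document_id": str(uuid.uuid4()),   # unique ID per uploaded PDF
        "file": file_name,
        "check": "gstin_checksum",
        "extracted_check_char": extracted,
        "computed_check_char": computed,
        "result": "PASS" if extracted == computed else "FAIL",
        "validated_at": datetime.now(timezone.utc).isoformat(),
    }


entry = audit_entry("cert_003_tampered_invalid.pdf", extracted="O", computed="4")
print(json.dumps(entry, indent=2))
```

Storing both characters alongside the result is what lets a reviewer confirm the finding without re-running the calculation.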

Does the workflow handle certificates for all GST constitution types, including proprietorships and LLPs?

Yes. The constitution field is extracted as a free-text string directly from the certificate, so values such as “Proprietorship”, “Partnership Firm”, “Limited Liability Partnership”, “Private Limited Company”, “Public Limited Company”, and “HUF” are all captured verbatim without normalisation. This matters because the constitution type determines TDS rate applicability: under Section 194C of the Income Tax Act, payments to an individual or HUF attract 1% TDS while payments to any other payee, including a company, firm or LLP, attract 2%. A misclassification caused by silent normalisation would persist in every payment run for that vendor.

Is this workflow applicable to GST certificates issued in all Indian states and union territories?

Yes. The state code occupies characters 1–2 of the GSTIN and ranges from 01 (Jammu & Kashmir) to 38 (Ladakh), covering all 28 states and 8 union territories recognised under the CGST Act. The workflow validates the numeric format of the state code as part of the structural check but does not restrict processing to any specific state. The three demo certificates cover Karnataka (29), Maharashtra (27) and Haryana (06), illustrating multi-state operation.

Does Cadel perform live GSTN portal verification in addition to local checksum validation?

The checksum workflow runs fully offline using the deterministic GSTN base-36 Luhn algorithm — no portal lookup is required or performed during extraction. This makes the process instantaneous and auditable: the computed expected check character is logged alongside the extracted character for every certificate. For live GSTN portal verification (to confirm the GSTIN is currently active and not cancelled), the validated GSTIN list produced by this workflow can be fed into a separate online verification step via the GSTN taxpayer search API, which is a distinct process outside the scope of this certificate extraction workflow.

Can the workflow detect a GSTIN that was valid at issuance but has since been cancelled or suspended?

The checksum validation confirms only that the GSTIN string is structurally correct and arithmetically consistent. It cannot detect subsequent cancellation or suspension, which requires a live GSTN portal query under the CGST Act’s registration management provisions. The workflow is designed as a fast, offline first-pass filter: it eliminates forged or corrupted certificates immediately, before any portal query is spent on them, and the validated GSTIN list can then be submitted for portal status checks in bulk. This two-stage approach is consistent with the due-diligence standard described in Circular No. 183/15/2022-GST.