Clean Baseline
PAN ACMEA1234B (4th char 'A' = trust, valid), GSTIN 27ACMEA1234B1Z5, state code 27 (Maharashtra) matches address, IFSC ACME0001234 follows 4+0+6 pattern.
PAN structure, GSTIN Luhn checksum, and state-code cross-check: customer onboarding KYC validated in under 30 seconds per form.
Mid-market AR teams onboard 20–200 new customers per month. Every form carries at least 13 discrete fields that require customer KYC validation — and the checks that matter most are exactly the ones that manual review skips under time pressure.
The 15th character of every GSTIN is a base-36 Luhn check digit defined under Rule 10 of the CGST Rules 2017. Computing it requires encoding each of the first 14 characters into base-36 and running the Luhn algorithm — a step that takes minutes per record and is universally skipped in manual onboarding. A fabricated or mis-transcribed GSTIN passes visual inspection but fails the algorithm.
Under Rule 10A of the CGST Rules 2017, a GSTIN is derived directly from the entity's PAN: characters at positions 3–12 of the GSTIN must be identical to the standalone PAN. An AR analyst comparing two strings of similar-looking alphanumerics across two fields on a physical form misses transpositions. A customer submitting a valid-looking but mismatched GSTIN can later claim the GSTIN was a data-entry error — after ITC has already been claimed against it.
The first two digits of a GSTIN encode the registration state under Section 22 of the CGST Act 2017. A customer with a Karnataka registered address (state code 29) submitting a GSTIN beginning with 07 (Delhi) has either a stale GSTIN from a previous registration or is using the wrong identifier. This mismatch is visible in the form data but requires a side-by-side comparison that manual review does not systematically perform.
A missing billing email discovered at month-end delays invoice dispatch and distorts DSO. A missing bank account blocks payment terms. Required-field validation — trivial to automate — is the check most likely to be waived when a sales team is pressuring finance to clear the customer quickly. The cost is paid weeks later when the AR aging report shows a customer who cannot receive e-invoices because no email address was ever captured.
Two to three working days is the manual cycle time to process a batch of 50 new customer KYC forms (visual PAN inspection, GSTIN structure review, base-36 Luhn computation, PAN-embed comparison, state-code cross-reference, and required-field audit) for an AR team that doesn't have a dedicated KYC operations desk. During that window, e-invoices are blocked, ITC is at risk, and new customer accounts sit idle. Under Section 16(2)(aa) of the CGST Act 2017, any ITC claimed against a GSTIN not reflected correctly in GSTR-1 is reversible with interest under Section 50.
Customer KYC in India sits at the intersection of three regulatory frameworks plus RBI KYC Master Directions. Every validation rule in this workflow maps directly to a specific statutory or regulatory provision — not a best practice, but a compliance obligation.
CBDT Rule 114 prescribes the exact 10-character PAN format: characters 1–5 and 10 are letters; characters 6–9 are digits; the 4th character encodes the entity type from the set {P, C, H, F, A, T, B, L, J, G}. A PAN with any character outside this specification cannot have been issued by the Income Tax Department in its standard format and is therefore either fabricated or transcribed incorrectly. Onboarding a customer against an invalid PAN exposes every subsequent invoice and payment to scrutiny.
Rule 10A of the CGST Rules 2017 specifies that a GSTIN is constructed by prefixing the two-digit state code to the entity's PAN and appending an entity number (position 13), the literal character Z (position 14), and a Luhn check character (position 15). Characters at positions 3–12 of the GSTIN must be identical to the standalone PAN. Any divergence between these two fields, whether from a data-entry error or a deliberate substitution, means the GSTIN cannot have been legitimately issued for that entity. The Luhn checksum at position 15 is computationally verifiable without any external API call.
Section 22 of the CGST Act 2017 requires GST registration in the state where the entity's principal place of business is located. The two-digit state code embedded in GSTIN positions 1–2 must therefore correspond to the entity's declared registered address. A GSTIN with a Delhi prefix (07) on a form declaring a Karnataka address (29) indicates either stale data from a prior registration or the submission of a GSTIN belonging to a different legal entity. Raising invoices against such a GSTIN exposes the buyer's ITC under Section 16(2)(aa).
For companies onboarding financial counterparties — distributors, channel partners, or customers with credit terms — RBI KYC Master Directions require the collection and verification of identity documents as part of the customer due diligence (CDD) process. The Cadel workflow's validation log, which records the extracted field values and the pass/fail result of each check, satisfies the documentary evidence requirements under the CDD framework and supports the audit trail expected under ICAI SA 230 (Audit Documentation) for the onboarding workpaper package.
Seven deterministic steps, one extraction pass followed by six validation checks, take a form from PDF upload to a PASS/FAIL customer master record in under 30 seconds per form. Every check is computable from the document itself, with no external API dependency.
Cadel ingests the uploaded Customer KYC & Onboarding Form (PDF) and extracts all 13 structured fields — legal name, trade name, PAN, GSTIN, constitution, registered address, state, state code, contact person, billing email, phone, bank account, and IFSC — using document-type recognition trained on the kyc_form schema.
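One way to picture the extracted record is a flat structure holding the 13 fields listed above. The sketch below is illustrative only: the class name `KycForm`, the helper `missing_fields`, and the snake_case field names are assumptions, not Cadel's actual kyc_form schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class KycForm:
    """Hypothetical container for the 13 fields named in the extraction step."""
    legal_name: Optional[str] = None
    trade_name: Optional[str] = None
    pan: Optional[str] = None
    gstin: Optional[str] = None
    constitution: Optional[str] = None
    registered_address: Optional[str] = None
    state: Optional[str] = None
    state_code: Optional[str] = None
    contact_person: Optional[str] = None
    billing_email: Optional[str] = None
    phone: Optional[str] = None
    bank_account: Optional[str] = None
    ifsc: Optional[str] = None


def missing_fields(form: KycForm) -> list[str]:
    """Return the names of fields that are None or blank after extraction."""
    return [f.name for f in fields(form)
            if not (getattr(form, f.name) or "").strip()]
```

Keeping every field optional lets a partially extracted form still be represented, so the later checks can report exactly which values were absent.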
Confirms the PAN is exactly 10 characters, that characters 1–5 and 10 are letters, that characters 6–9 are digits, and that the 4th character belongs to the CBDT-permitted entity-type set {P, C, H, F, A, T, B, L, J, G} as specified under Section 139A of the Income-tax Act 1961 read with Rule 114. Any deviation raises a FAIL on the pan_structure rule.
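The PAN structure rule described above can be sketched with a regular expression plus a dedicated check on position 4, so the failure reason can name the offending character. This is a minimal sketch, not Cadel's implementation: the function name `check_pan_structure` is hypothetical, the PAN value in the usage note is made up, and uppercase-only input is an assumption.

```python
import re

# Entity-type codes allowed at position 4, as listed in the text above.
ENTITY_TYPE_CODES = "PCHFATBLJG"

# 3 letters, 1 entity-type code, 1 letter, 4 digits, 1 letter (uppercase assumed).
PAN_PATTERN = re.compile(
    rf"[A-Z]{{3}}[{ENTITY_TYPE_CODES}][A-Z][0-9]{{4}}[A-Z]"
)


def check_pan_structure(pan: str) -> tuple[bool, str]:
    """Return (passed, reason) for a pan_structure-style check."""
    if len(pan) != 10:
        return False, f"expected 10 characters, got {len(pan)}"
    if pan[3] not in ENTITY_TYPE_CODES:
        allowed = ", ".join(ENTITY_TYPE_CODES)
        return False, f"position 4 is {pan[3]!r}, allowed: {allowed}"
    if not PAN_PATTERN.fullmatch(pan):
        return False, "characters 1-5 and 10 must be letters, 6-9 digits"
    return True, "ok"
```

For example, `check_pan_structure("ABCTY1234D")` passes, while `check_pan_structure("ABCXY1234D")` fails with a reason that names position 4 and the allowed set.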
Confirms the GSTIN is exactly 15 characters, that position 14 is the literal character Z, and that the two-digit state-code prefix corresponds to a valid Indian state or union territory code under the CGST Rules. Structural failure here blocks all downstream GSTIN checks.
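A rough sketch of the structural gate follows. The function name `check_gstin_structure` is hypothetical, and the state-code map is deliberately partial: it holds only the three codes this page mentions, whereas a real deployment would load the full official state and union territory list.

```python
# Partial, illustrative map: only the codes mentioned in this document.
STATE_CODES = {"07": "Delhi", "27": "Maharashtra", "29": "Karnataka"}


def check_gstin_structure(
    gstin: str, state_codes: dict[str, str] = STATE_CODES
) -> tuple[bool, str]:
    """Return (passed, reason) for the length, literal-Z, and prefix checks."""
    if len(gstin) != 15:
        return False, f"expected 15 characters, got {len(gstin)}"
    if gstin[13] != "Z":  # position 14, 1-indexed
        return False, f"position 14 is {gstin[13]!r}, expected 'Z'"
    if gstin[:2] not in state_codes:
        return False, f"unknown state-code prefix {gstin[:2]!r}"
    return True, "ok"
```

Returning early on the first structural failure mirrors the behaviour described above, where a structural failure blocks the downstream GSTIN checks.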
Encodes characters 1–14 of the GSTIN into base-36 (digits 0–9 at face value; letters A–Z as values 10–35), applies the Luhn algorithm, and compares the computed check character against position 15. A mismatch raises a FAIL on gstin_checksum per Rule 10 of the CGST Rules 2017. The input values and computed check character are written to the audit log.
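The checksum step can be reproduced in a few lines. The sketch below uses the commonly published mod-36 variant of Luhn, with weights alternating 1, 2 from the leftmost character and the two base-36 digits of each product summed. Whether Cadel uses exactly this weighting is an assumption, and the function names and the sample GSTIN are made up for illustration.

```python
CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def gstin_check_char(first14: str) -> str:
    """Compute the base-36 check character for the first 14 GSTIN characters."""
    if len(first14) != 14:
        raise ValueError("expected exactly 14 characters")
    total = 0
    for index, char in enumerate(first14):
        value = CHARSET.index(char)        # 0-9 -> 0..9, A-Z -> 10..35
        factor = 2 if index % 2 else 1     # weights alternate 1, 2, 1, 2, ...
        product = value * factor
        total += product // 36 + product % 36  # sum the product's base-36 digits
    return CHARSET[(36 - total % 36) % 36]


def check_gstin_checksum(gstin: str) -> tuple[bool, str]:
    """Return (passed, reason), reporting expected vs. submitted character."""
    expected = gstin_check_char(gstin[:14])
    submitted = gstin[14:15]
    if submitted == expected:
        return True, "ok"
    return False, f"expected {expected!r}, submitted {submitted!r}"
```

For the made-up prefix `27ABCTY1234D1Z` the computed check character is `N`, so `27ABCTY1234D1ZN` passes and any other final character fails. `CHARSET.index` raises `ValueError` on lowercase or non-alphanumeric input, which is why the structural check is assumed to run first.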
Extracts characters at GSTIN positions 3–12 and compares them character-by-character against the standalone PAN field on the form. A divergence raises a FAIL on pan_in_gstin_match per Rule 10A of the CGST Rules 2017, and the exception record surfaces both strings side-by-side for manual comparison.
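As a sketch of the embed comparison, the slice `gstin[2:12]` corresponds to positions 3–12 in 1-indexed terms. The function name `check_pan_in_gstin` and the sample values are hypothetical. Listing the differing positions is an addition for readability, not something the workflow is stated to do.

```python
def check_pan_in_gstin(pan: str, gstin: str) -> tuple[bool, str]:
    """Compare the standalone PAN with the PAN embedded in the GSTIN."""
    embedded = gstin[2:12]  # positions 3-12, 1-indexed
    if embedded == pan:
        return True, "ok"
    # 1-indexed PAN positions where the two strings disagree.
    diffs = [i + 1 for i, (a, b) in enumerate(zip(pan, embedded)) if a != b]
    return False, (
        f"PAN {pan} vs GSTIN-embedded {embedded}; "
        f"differing positions: {diffs}"
    )
```

A transposed pair such as `ABCTY1243D` against a GSTIN embedding `ABCTY1234D` is reported with positions 8 and 9, which is the kind of error a visual comparison tends to miss.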
Compares the two-digit state code embedded in GSTIN positions 1–2 against the state_code field declared on the form. A mismatch raises a FAIL on state_code_match per Section 22 of the CGST Act 2017. The exception record displays both values — e.g., GSTIN-embedded 07 (Delhi) against declared 29 (Karnataka) — so the controller can request a corrected certificate.
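The cross-field comparison itself is small. In this sketch the function name `check_state_code` is hypothetical, and zero-padding a declared value such as `7` to `07` is an assumption about how loosely the form field might be filled in.

```python
def check_state_code(gstin: str, declared_state_code: str) -> tuple[bool, str]:
    """Compare the GSTIN prefix with the state code declared on the form."""
    embedded = gstin[:2]  # positions 1-2
    declared = declared_state_code.strip().zfill(2)  # tolerate "7" for "07"
    if embedded == declared:
        return True, "ok"
    return False, f"GSTIN-embedded {embedded} vs declared {declared}"
```

With a GSTIN starting `07` and a declared code of `29`, the reason string carries both values, matching the side-by-side display described above.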
Checks that legal name and billing email are present and non-null. Any blank or null value raises a FAIL on required_fields, holding the record in the exception queue until the field is supplied. Records that pass all six checks receive a green PASS badge and are written to the customer master. An Excel export of the full master — with per-rule validation status — is available immediately.
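The required-field gate and the final PASS/FAIL decision might look like the following. This is a sketch under assumptions: the record is modelled as a plain dict, the key `legal_name` and both function names are hypothetical, and whitespace-only values are treated as blank.

```python
REQUIRED_FIELDS = ("legal_name", "billing_email")


def check_required_fields(record: dict) -> tuple[bool, str]:
    """Fail when any required field is missing, None, or blank."""
    missing = [name for name in REQUIRED_FIELDS
               if not str(record.get(name) or "").strip()]
    if missing:
        return False, "missing: " + ", ".join(missing)
    return True, "ok"


def overall_status(rule_results: dict[str, bool]) -> str:
    """PASS only when every rule passed; any single FAIL holds the record."""
    return "PASS" if all(rule_results.values()) else "FAIL"
```

A record would be written to the master only when `overall_status` returns `"PASS"`. Otherwise it stays in the exception queue with the per-rule reasons attached.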
Six synthetic test scenarios ship with the workflow: one clean baseline plus five scenarios that are each engineered to trigger exactly one deterministic failure mode. Every scenario produces a PASS or FAIL that an auditor or controller can verify in seconds against the validation rule definition.
PAN ACMEA1234B (4th char 'A' = trust, valid), GSTIN 27ACMEA1234B1Z5, state code 27 (Maharashtra) matches address, IFSC ACME0001234 follows 4+0+6 pattern.
The PAN field reads ACMEA1234B but GSTIN positions 3–12 embed a different 10-character string. Under Rule 10A of the CGST Rules 2017, the two must be identical; any divergence indicates a data-entry error or a forged GSTIN.
pan_in_gstin_match raises a FAIL; exception record surfaces both the submitted PAN and the GSTIN-embedded string side-by-side for manual review.
The GSTIN begins with 07 (Delhi) while the registered address is Bengaluru 560035, Karnataka (state code 29). Section 22 of the CGST Act 2017 requires registration in the state of the principal place of business.
state_code_match raises a FAIL; exception record displays the GSTIN-embedded state code (07) and the declared state code (29) side-by-side so the controller can request a corrected certificate.
The PAN has X as the 4th character. CBDT Rule 114 restricts the 4th character to the set {P, C, H, F, A, T, B, L, J, G} encoding the taxpayer entity type. X is not in this set, so the PAN cannot have been legitimately issued.
pan_structure raises a FAIL; onboarding is blocked and the exception record lists the specific position (4), the submitted character, and the allowed-value set.
The billing_email field is blank on the submitted form. Without it, system-generated invoices and e-way bill dispatch notifications cannot be delivered, and the customer cannot be linked to the AR aging module.
required_fields raises a FAIL; the record is held in the exception queue until a valid email address is supplied. No incomplete master entry is created.
gstin_checksum raises a FAIL; the workflow displays the expected check character alongside the submitted character so the operations team can request a corrected GSTIN certificate from the customer.
Five seeded KYC forms ship with the workflow: one clean baseline that passes every check and four exception scenarios, each triggering exactly one deterministic FAIL.
All 13 fields were extracted correctly from every PDF.
PAN ACMEA1234B · GSTIN 27ACMEA1234B1Z5 · State 27 (MH) · IFSC ACME0001234
Baseline form confirming the happy-path master write. PAN 4th character 'A' (valid entity-type code), GSTIN Luhn checksum verified, PAN embed at positions 3–12 matches standalone PAN, state code 27 consistent with Maharashtra address.
Structurally valid GSTIN (checksum passes) but PAN embed mismatch caught. Exception record surfaces the submitted PAN and the GSTIN-embedded string side-by-side. This is one of the hardest failures to catch manually because the GSTIN looks valid on its own.
GSTIN prefix 07 (Delhi) against declared state code 29 (Karnataka)
The GSTIN passes Luhn checksum and PAN embed checks — it is structurally valid. Only the cross-field state-code consistency check (GSTIN prefix vs. declared state) catches the discrepancy. This class of error is the most consequential under Section 16(2)(aa) and the hardest to detect manually.
PAN 4th character X is outside the CBDT-permitted entity-type set
Exception record lists the specific position (4), the submitted character (X), and the complete allowed-value set {P, C, H, F, A, T, B, L, J, G} per CBDT Rule 114. Onboarding is blocked until the customer supplies a corrected PAN.
billing_email field is null on submitted form
All PAN and GSTIN checks pass; record is blocked solely on the missing email. The exception queue entry specifies the missing field, preventing creation of an incomplete customer master entry that would surface as an invoice delivery failure at month-end.
Running the workflow against the five demo forms produced the following aggregate outcomes: 1 form received all-PASS status across all six rules and was written to the master; 4 forms each triggered exactly one deterministic FAIL. The state-code mismatch exception is worth highlighting: the GSTIN in that form passed both the Luhn checksum and the PAN embed check — it was structurally well-formed — and was only caught by the cross-field consistency rule. This is the error class most consequential under Section 16(2)(aa) of the CGST Act 2017 and the one most consistently missed in manual review.
For a team processing 50 new customer KYC forms per onboarding cycle, the manual path — visual PAN inspection, GSTIN format review, base-36 Luhn computation, PAN-embed comparison, state-code cross-reference, and required-field audit — takes two to three working days and reliably misses cross-field checks. Cadel reduces that cycle to under 30 seconds per form with zero ambiguity on every rule result.
PAN-embed mismatches and state-code inconsistencies are caught before the customer enters the master — preventing e-invoices from being raised against a GSTIN that doesn't match the counterparty's actual registration. A GSTIN error discovered post-invoice can expose the buyer's ITC to reversal under Section 16(2)(aa) of the CGST Act 2017, plus interest under Section 50. Catching it at onboarding costs nothing; catching it in a GST audit is expensive.
Every KYC form processed generates an immutable validation log recording the document hash (SHA-256), extraction timestamp, all 13 extracted field values, and the per-rule PASS/FAIL result with the specific failure reason. This log satisfies ICAI SA 230 (Audit Documentation) requirements and provides the customer due-diligence evidence trail expected under RBI KYC Master Directions 2016. No manual reformatting required before filing in the onboarding workpaper package.
The validated customer master record is exported as an Excel file importable into Tally Prime (via Tally Data Exchange XML), SAP Business One (via Data Transfer Workbench), or any ERP that accepts CSV/XLSX customer master uploads. For companies running SAP S/4HANA or Oracle NetSuite, a JSON webhook can push the validated record directly into the customer master endpoint. The exception queue carries only genuine anomalies — not noise from missing fields that were actually present but not extracted.
Questions finance controllers and AR teams ask before deploying customer KYC automation.
PAN structure validation derives from Section 139A of the Income-tax Act 1961 read with CBDT Rule 114, which prescribes the 10-character alphanumeric format and restricts the 4th character to entity-type codes {P, C, H, F, A, T, B, L, J, G}. GSTIN validation — including the 15-character structure, state-code prefix, and base-36 Luhn check digit — is governed by Rule 10 and Rule 10A of the CGST Rules 2017, framed under Section 22 of the CGST Act 2017. The requirement that GSTIN positions 3–12 embed the entity's PAN is explicitly stated in the GSTIN format specification issued by the GSTN.
The workflow applies the base-36 Luhn algorithm over the first 14 characters of the GSTIN — treating digits 0–9 at face value and letters A–Z as values 10–35 — and compares the computed check character against position 15. The algorithm steps, input string, computed value, and submitted value are all written to the validation audit log, which is exportable as a timestamped XLSX or JSON file. An internal auditor can reproduce the check independently using the same inputs from the log without requiring access to the Cadel platform.
The workflow performs deterministic document parsing and structural validation only — it does not call UIDAI's Aadhaar e-KYC API or any external government registry. All six checks (PAN structure, GSTIN format, GSTIN Luhn checksum, PAN-in-GSTIN embed, state-code match, and required-field gate) are fully computable from the document itself, with no network dependency and no per-query cost. For workflows that require live GSTN status verification (e.g., checking whether a GSTIN is active or cancelled), the validated GSTIN output can be passed to a downstream API-based lookup step.
Cadel ingests the KYC form as a PDF and outputs a validated customer master record as an Excel file that can be imported into Tally Prime (using the Tally Data Exchange XML format after a one-step mapping), SAP Business One (via the Data Transfer Workbench), or any ERP that accepts CSV/XLSX customer master uploads. For companies already running SAP S/4HANA or Oracle NetSuite, a JSON webhook can push the validated record directly into the customer master endpoint without manual file transfer.
Failed records are routed to a dedicated exception queue rather than rejected outright. Each queued record displays the specific validation rule that failed, the submitted value, and — where applicable — the expected value or allowed set (for example, the list of valid 4th-character entity-type codes for PAN, or both state codes in a state-code mismatch). A controller or operations staff member can correct the form, resubmit it through the same workflow, and the record will pass through all six validation checks again before being written to the master. No failed record is written to the customer master until all FAIL-severity checks are cleared.
A single customer legal entity can have up to 37 GSTINs — one per state or union territory in which it holds a principal or additional place of business. The workflow validates each GSTIN submitted on the KYC form independently: structure, checksum, PAN embed, and state-code consistency are each checked per GSTIN row. If the same PAN is associated with multiple GSTINs across different state codes, the workflow creates one customer master record per GSTIN and links them by the common PAN, preserving the multi-state registration structure required for place-of-supply determinations under Section 12 and Section 13 of the IGST Act 2017.
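Linking multi-state registrations by their common PAN amounts to grouping GSTINs on positions 3–12. A minimal sketch follows. The function name `group_gstins_by_pan` is hypothetical, and the sample GSTINs in the usage note are made up (their check characters are placeholders that this function does not validate).

```python
from collections import defaultdict


def group_gstins_by_pan(gstins: list[str]) -> dict[str, list[str]]:
    """Group GSTINs under the PAN embedded at positions 3-12."""
    groups: dict[str, list[str]] = defaultdict(list)
    for gstin in gstins:
        groups[gstin[2:12]].append(gstin)
    return dict(groups)
```

Given `["27ABCTY1234D1ZN", "29ABCTY1234D1Z0", "07PQRCK5678L1Z0"]`, the first two entries land under `ABCTY1234D` and the third under `PQRCK5678L`, so each GSTIN can become its own master record while sharing a PAN key.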
Every KYC form processed generates an immutable validation log that records the document hash (SHA-256), extraction timestamp, all 13 extracted field values, and the pass/fail result of each of the six validation rules — with the rule name, severity, input value, and failure reason stored as structured data. This log satisfies the documentation requirements under ICAI SA 230 (Audit Documentation) and supports the customer due-diligence evidence trail expected under RBI KYC Master Directions 2016 (updated 2023) for financial counterparty onboarding. The log is exportable in XLSX and JSON formats and is retained for the duration configured by the organization's data-retention policy.
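To make the shape of such a log entry concrete, here is a hedged sketch using only the Python standard library. The key names, the function `build_log_entry`, and the sample values are assumptions rather than Cadel's actual log format, and the sketch does not itself enforce immutability, which would depend on where the entry is stored.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_log_entry(pdf_bytes: bytes, extracted: dict,
                    rule_results: list[dict]) -> dict:
    """Assemble one JSON-serialisable validation log entry."""
    return {
        "document_sha256": hashlib.sha256(pdf_bytes).hexdigest(),
        "extracted_at": datetime.now(timezone.utc).isoformat(),
        "fields": extracted,
        "rules": rule_results,
    }


entry = build_log_entry(
    b"%PDF-1.7 placeholder bytes",  # stand-in for the uploaded file
    {"pan": "ABCTY1234D", "billing_email": None},
    [{"rule": "required_fields", "severity": "FAIL", "input": None,
      "passed": False, "reason": "billing_email is blank"}],
)
print(json.dumps(entry, indent=2))
```

Because the hash is computed over the raw file bytes, an auditor holding the original PDF can recompute it and confirm the log entry refers to the same document.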
When both PAN and GSTIN fields are null on a KYC form — as with a foreign entity that is not registered under the Indian GST regime — the workflow skips all five PAN/GSTIN validation rules and checks only the required-field gate for legal name and billing email. The customer master record is created with a clear flag indicating that GST validation was bypassed due to absent identifiers, and the record is exportable in the same XLSX format as domestic customer records for import into Tally Prime or any other ERP.
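The bypass logic can be sketched as a small planning step that decides which rules to run. Several things here are assumptions: the function name `plan_checks`, the flag name `gst_validation_bypassed`, the rule name `gstin_structure` (the page does not name the GSTIN format rule), and the choice to run every rule when only one of the two identifiers is missing.

```python
PAN_GSTIN_RULES = [
    "pan_structure",
    "gstin_structure",       # hypothetical name for the GSTIN format check
    "gstin_checksum",
    "pan_in_gstin_match",
    "state_code_match",
]


def is_blank(value) -> bool:
    """Treat None, empty, and whitespace-only values as absent."""
    return not (value or "").strip()


def plan_checks(pan, gstin) -> dict:
    """Decide which rules apply to a form based on the identifiers present."""
    if is_blank(pan) and is_blank(gstin):
        # Foreign or unregistered entity: only the required-field gate runs.
        return {"rules": ["required_fields"], "gst_validation_bypassed": True}
    return {"rules": PAN_GSTIN_RULES + ["required_fields"],
            "gst_validation_bypassed": False}
```

Carrying the flag on the record keeps the bypass visible in the exported master instead of making a skipped check look like a passed one.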
The workflow's validation rules are deterministic and document-based — they verify structural correctness (format, Luhn checksum, PAN embed, state code) but do not call the GSTN public API to check registration status or cancellation. A structurally valid GSTIN with a correct checksum and matching PAN embed will pass all six checks even if the registration has been cancelled by the tax authority. For customers where live status verification is required, the validated GSTIN output from this workflow can be passed to a downstream GSTN API lookup step as a separate control.