KYC Verification
Aadhaar + PAN + GST + cheque + sanctions + PEP: KYC verification software that processes each bundle in under 45 seconds.
The Problem
Manual KYC review of a single applicant bundle takes 35–45 minutes: classifying 9 document types, keying fields into a checklist, comparing the applicant’s name across four documents, and validating the GSTIN-PAN substring rule. At 50 bundles/month that’s ~30 hours of analyst time, even at the low end of that range.
9 doc types per bundle, one checklist
Aadhaar, PAN, GST REG-06, Excise Licence CL-7 or CL-9, cancelled cheque, KYC Top Sheet, Group Syndicate Form, NOC — classified by hand into a spreadsheet checklist. CL-7 vs CL-9 are visually identical except for one premises-type clause, so misclassifications are routine.
GSTIN verification & PAN substring rule
Rule 10 of the CGST Rules requires characters 3–12 of the GSTIN to equal the PAN. A mismatch at a single position invalidates the GSTIN, yet it is easy to miss when an analyst reads the document by eye. Manual GSTIN verification at scale therefore carries a silent ITC-claim risk.
Cheque IFSC missing on older leaves
Older cheque leaves carry only the MICR band and no printed IFSC — leaving the bank account uncorroborated against the RBI MICR-IFSC master. Beneficiary disbursements proceed to unverified accounts unless the analyst manually resolves the MICR.
Name reconciliation across 4 documents
Aadhaar carries the full legal name. PAN may show only initials. The cheque carries the beneficiary name. CL-7/CL-9 carries the licence holder. Transliteration variants, initial expansion and surname tokenisation mean string equality fails, so an analyst has to judge each pair against a 0.85 similarity threshold by eye.
~30 hours per month of manual review for a 50-bundle compliance workload. The consequence of a missed exception isn’t theoretical: an unflagged GSTIN-PAN mismatch, or a beneficiary cheque whose name doesn’t reconcile against the licence holder, can trigger a recoverable disbursement, Section 12AA suspension proceedings under the Karnataka Excise Act, and an internal audit flag.
Why It Matters: Regulatory Context
KYC sits at the intersection of four mandatory regulatory regimes — PMLA, RBI Master Direction, CGST Rules, and the relevant state licensing act. Each one carries enforceable consequences for unverified beneficiaries.
Anti-money-laundering identity proof
The Prevention of Money Laundering Act 2002 requires every reporting entity to verify customer identity and keep records; the KYC framework built on it adds sanctions-list screening (for example, against the OFAC and EU lists) and politically-exposed-person checks. The audit trail must persist for 5 years after relationship closure.
Risk-based KYC tiers
RBI’s Master Direction on KYC mandates risk-based customer due diligence with periodic re-verification. Banks and NBFCs face penalties for onboarding beneficiaries without Aadhaar-PAN linkage and beneficial-owner identification, both of which are required at account opening, not later.
GSTIN format & PAN linkage
Rule 10 of the CGST Rules requires characters 3–12 of the 15-character GSTIN to equal the supplier’s PAN. A mismatch invalidates the GSTIN for ITC purposes — manual GSTIN verification at the document level catches what Tally Prime and SAP do not at master-record import.
Beneficiary-name matching for disbursements
State-level licensing acts (e.g. Karnataka Excise Act 1965 — Section 12AA suspension) require that the cheque beneficiary name match the licence holder for every OWNED outlet. Mismatched beneficiaries trigger recoverable disbursements and internal audit flags.
What This Workflow Automates
Seven deterministic steps that classify, extract and cross-reference a KYC bundle, then issue a verdict, in ~45 seconds. Every output field traces to its source document and validation rule.
Bundle ingestion & doc classification
Ingests the applicant bundle and classifies each file into one of nine registered types — using the premises-type clause to disambiguate CL-7 from CL-9 instead of the layout.
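A minimal sketch of content-based classification, assuming the document text has already been OCR'd. The keyword rules, the type labels and the premises-clause phrases ("hotel and boarding house" for CL-7, "refreshment room" for CL-9) are hypothetical placeholders, not the workflow's actual rules; the point illustrated is that the two excise forms are separated by the premises clause, never by layout.

```python
import re

# Hypothetical keyword rules, checked in order (most specific first).
# The workflow's real classifier is not documented; these phrases are
# illustrative assumptions only.
DOC_RULES = [
    ("KYC_TOP_SHEET", r"kyc top sheet"),
    ("GROUP_SYNDICATE_FORM", r"group syndicate"),
    ("EXCISE_LICENCE", r"department of excise|excise licence"),
    ("GST_REG_06", r"form gst reg-06"),
    ("NOC", r"no objection certificate"),
    ("CANCELLED_CHEQUE", r"\bcancelled\b|or bearer"),
    ("PAN", r"permanent account number|income tax department"),
    ("AADHAAR", r"unique identification authority|aadhaar"),
]

# Assumed premises-type wording: CL-7 for hotels and boarding houses,
# CL-9 for refreshment rooms (bars).
PREMISES_CLAUSES = {
    "CL-7": r"hotel and boarding house",
    "CL-9": r"refreshment room",
}


def classify(text: str) -> str:
    """Return one of the nine registered types, or a fallback label."""
    lowered = text.lower()
    for doc_type, pattern in DOC_RULES:
        if not re.search(pattern, lowered):
            continue
        if doc_type != "EXCISE_LICENCE":
            return doc_type
        # CL-7 and CL-9 share a layout, so decide on the premises clause alone.
        hits = [form for form, clause in PREMISES_CLAUSES.items()
                if re.search(clause, lowered)]
        return hits[0] if len(hits) == 1 else "EXCISE_LICENCE_UNRESOLVED"
    return "UNKNOWN"
```

Rules are tried most-specific first, so a KYC Top Sheet that mentions Aadhaar or a cancelled cheque is not misfiled as one of those documents. An excise licence where neither clause (or both) is found falls through to EXCISE_LICENCE_UNRESOLVED instead of being guessed.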
Structured field extraction
Extracts the 12-digit Aadhaar number and a masked-Aadhaar flag, the 10-character PAN, the 15-character GSTIN, the licence register number and excise year, the MICR line, account number, IFSC and beneficiary name, and the outlet table from the Group Syndicate Form.
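Most of the identifiers above have fixed shapes, so a first extraction pass can be regex-based. The sketch below assumes clean OCR text; the patterns and the `extract_fields` name are illustrative, not the workflow's extractor, and the register number, account number and outlet table are left out because their layouts are not specified here.

```python
import re

# Illustrative patterns (assumptions, not the workflow's extractor).
# Real OCR output needs more tolerant matching than this.
AADHAAR_RE = re.compile(r"\b([0-9X]{4}) ?([0-9X]{4}) ?([0-9]{4})\b")
PATTERNS = {
    "pan": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
    "gstin": re.compile(r"\b[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z][0-9A-Z]{2}\b"),
    "ifsc": re.compile(r"\b[A-Z]{4}0[A-Z0-9]{6}\b"),
    "micr": re.compile(r"\b[0-9]{9}\b"),
}


def extract_fields(text: str) -> dict:
    fields: dict = {}
    match = AADHAAR_RE.search(text)
    if match:
        fields["aadhaar"] = "".join(match.groups())
        # Masked copies hide the first eight digits, e.g. "XXXX XXXX 1234".
        fields["aadhaar_masked"] = "X" in match.group(1) + match.group(2)
    for name, pattern in PATTERNS.items():
        found = pattern.search(text)
        if found:
            fields[name] = found.group(0)
    return fields
```

The word boundaries matter: they stop the PAN pattern from matching inside a GSTIN, and the MICR pattern from matching inside a longer digit run.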
GSTIN verification (Rule 10 substring)
Validates that characters 3–12 of the GSTIN equal the PAN. On mismatch, the workflow logs the exact character positions that diverge and issues a FAIL on the GST document — preventing downstream ITC claim risk.
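A sketch of the substring check with position-level logging, assuming the GSTIN and PAN have already been extracted; `check_gstin_pan` is a hypothetical name. Note the off-by-one: characters 3–12 in 1-indexed terms are the zero-indexed slice `gstin[2:12]`.

```python
def check_gstin_pan(gstin: str, pan: str) -> dict:
    """Rule 10 substring check: GSTIN characters 3-12 (1-indexed) must equal the PAN."""
    gstin, pan = gstin.strip().upper(), pan.strip().upper()
    if len(gstin) != 15 or len(pan) != 10:
        return {"verdict": "FAIL", "reason": "unexpected length", "diverging_positions": []}
    embedded = gstin[2:12]  # zero-indexed slice covering characters 3..12
    # Record 1-indexed GSTIN positions where the embedded PAN differs.
    diverging = [i + 3 for i, (g, p) in enumerate(zip(embedded, pan)) if g != p]
    return {
        "verdict": "FAIL" if diverging else "PASS",
        "reason": "substring mismatch" if diverging else "",
        "diverging_positions": diverging,
    }
```

For example, `check_gstin_pan("29ACMEA1234B1Z5", "ACMEA1243B")` fails with diverging positions `[10, 11]`: a transposed pair of digits that is easy to miss by eye.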
Cheque MICR → IFSC resolution
For older cheque leaves with no printed IFSC, resolves the MICR code against the RBI MICR-IFSC master. Marks the IFSC field FAIL if resolution does not return a single valid bank-branch record.
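A minimal resolver, assuming the RBI MICR-IFSC master has already been loaded into a dictionary from 9-digit MICR code to the list of IFSC codes recorded against it (the loader and the master's file format are assumptions and are not shown). Any result other than exactly one record is a FAIL, as described above.

```python
from typing import Dict, List


def resolve_ifsc(micr: str, master: Dict[str, List[str]]) -> dict:
    """Resolve a 9-digit MICR code to exactly one IFSC or mark the field FAIL."""
    micr = micr.strip()
    if len(micr) != 9 or not micr.isdigit():
        return {"field": "IFSC", "verdict": "FAIL", "reason": "malformed MICR"}
    # MICR layout: 3-digit city code, 3-digit bank code, 3-digit branch code.
    city, bank, branch = micr[:3], micr[3:6], micr[6:]
    candidates = master.get(micr, [])
    if len(candidates) != 1:
        return {"field": "IFSC", "verdict": "FAIL",
                "reason": f"{len(candidates)} branch records for MICR {micr}"}
    return {"field": "IFSC", "verdict": "PASS", "ifsc": candidates[0],
            "micr_parts": {"city": city, "bank": bank, "branch": branch}}
```

The city/bank/branch split is returned only for the log; resolution itself is a single exact lookup, so zero matches and ambiguous matches fail the same way.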
Name cross-reference across 4 docs
Compares the applicant name across Aadhaar, PAN, cheque and the CL-7/CL-9 licence holder using initial expansion, surname tokenisation and transliteration-tolerant matching. Each comparison records a similarity score; a score below 0.85 routes the bundle to NEEDS_REVIEW with the character-level diff retained.
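One way to approximate this matching with only the standard library. It is a swapped-in technique, not the workflow's matcher: `difflib.SequenceMatcher` character ratios stand in for true transliteration-tolerant matching, single-letter tokens are treated as initials, and tokens are paired best-first regardless of order so surname-first names still align. The function names are hypothetical, and the sketch assumes Latin-script names.

```python
import re
from difflib import SequenceMatcher

THRESHOLD = 0.85  # review threshold stated in the text


def tokens(name: str) -> list:
    # Uppercase, turn punctuation into spaces, split: "R.K. Sharma" -> R, K, SHARMA
    return re.sub(r"[^A-Z ]", " ", name.upper()).split()


def token_score(a: str, b: str) -> float:
    if len(a) == 1 or len(b) == 1:
        # Initial expansion: "R" is compatible with "RAJESH".
        return 1.0 if a[0] == b[0] else 0.0
    # Stand-in for transliteration tolerance: plain character-sequence ratio.
    return SequenceMatcher(None, a, b).ratio()


def name_similarity(x: str, y: str) -> float:
    tx, ty = tokens(x), tokens(y)
    if not tx or not ty:
        return 0.0
    short, long_ = (tx, ty) if len(tx) <= len(ty) else (ty, tx)
    remaining, scores = list(long_), []
    for tok in short:
        # Best unused partner, so surname-first ordering still lines up.
        best = max(remaining, key=lambda other: token_score(tok, other))
        scores.append(token_score(tok, best))
        remaining.remove(best)
    return sum(scores) / len(scores)


def compare_names(x: str, y: str) -> dict:
    score = name_similarity(x, y)
    result = {"score": round(score, 3),
              "verdict": "PASS" if score >= THRESHOLD else "NEEDS_REVIEW"}
    if result["verdict"] == "NEEDS_REVIEW":
        # Keep a character-level diff (non-equal opcodes) for the reviewer.
        ops = SequenceMatcher(None, x, y).get_opcodes()
        result["diff"] = [(tag, x[i1:i2], y[j1:j2])
                          for tag, i1, i2, j1, j2 in ops if tag != "equal"]
    return result
```

Scores are averaged over the shorter name, so a document that omits a middle name is not penalised; averaging over the longer name would be the stricter design choice.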
Group Syndicate Form reconciliation
Reconciles each outlet row against its CL-7 or CL-9 licence holder, tags the row as OWNED or LEASED, and applies the beneficiary-name match only to OWNED rows — preventing false fails on leased premises.
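A sketch of the OWNED/LEASED gating, assuming each outlet row already carries its tenure tag, the holder named on its CL-7 or CL-9, and the beneficiary from the syndicate bank table. `OutletRow` and `reconcile_outlets` are hypothetical names, and exact normalised equality stands in for the fuzzy name matcher to keep the example short.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OutletRow:
    outlet_id: str
    tenure: str          # "OWNED" or "LEASED", as tagged on the form
    licence_holder: str  # holder named on the outlet's CL-7 / CL-9
    beneficiary: str     # account holder from the syndicate bank table


def _norm(name: str) -> str:
    return " ".join(name.upper().replace(".", " ").split())


def reconcile_outlets(rows: List[OutletRow]) -> dict:
    per_outlet = {}
    for row in rows:
        if row.tenure == "LEASED":
            # Beneficiary-name rule applies only to OWNED premises.
            per_outlet[row.outlet_id] = "NOT_APPLICABLE"
        elif row.tenure == "OWNED":
            same = _norm(row.licence_holder) == _norm(row.beneficiary)
            per_outlet[row.outlet_id] = "PASS" if same else "FAIL"
        else:
            per_outlet[row.outlet_id] = "NEEDS_REVIEW"  # tenure missing or unreadable
    verdicts = set(per_outlet.values())
    if "FAIL" in verdicts:
        syndicate = "FAIL"
    elif "NEEDS_REVIEW" in verdicts or not rows:
        syndicate = "NEEDS_REVIEW"
    else:
        syndicate = "PASS"
    return {"outlets": per_outlet, "syndicate": syndicate}
```

LEASED rows come back as NOT_APPLICABLE rather than PASS, so the per-outlet report shows that the beneficiary rule was skipped, not satisfied; in the syndicate roll-up any FAIL dominates.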
Per-doc & per-bundle verdict
Issues PASS / FAIL / NEEDS_REVIEW at both document and bundle level. Writes a structured JSON verdict file plus a human-readable verification report listing every field, its source document, its validation rule and its outcome.
Edge Cases We Simulate
The workflow ships with a battery of synthetic test scenarios that exercise every failure mode we have seen in real-world data. Each scenario produces a deterministic outcome that an auditor or controller can verify in seconds.
Name Variance Across Documents
Cancelled Cheque Without IFSC
GSTIN PAN Mismatch
Group Syndicate With Mixed Owned and Leased Outlets
Aadhaar Masked Or Partially Redacted
Sample Documents
Seeded sample files used to demonstrate this workflow. Each one exercises a specific scenario or failure mode.
Demonstrates 12-digit Aadhaar parsing, QR signature check, and address extraction across front and back pages.
Validates 10-character PAN format ACMEA1234B and cross-references the 4th character against entity type.
Cancelled cheque used to extract account number, IFSC, MICR, and account-holder name for beneficiary verification.
Form GST REG-06 with 15-character GSTIN, used for PAN-GSTIN substring check and trade-name match.
Karnataka Department of Excise Form CL-9 for a bar/restaurant; tests CL-7 vs CL-9 classifier disambiguation.
Multi-outlet syndicate filing with beneficiary bank table; tests owned-vs-leased reconciliation logic.
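The PAN check described for the PAN sample can be sketched as a format test plus a lookup on the fourth character. The entity-code table reflects commonly published meanings (for example P for individual, C for company, E for LLP) and is an assumption to confirm against the Income Tax Department's own specification; `check_pan` and the entity labels are hypothetical.

```python
import re

PAN_RE = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")

# Commonly published meanings of the 4th PAN character (assumption; verify
# against the Income Tax Department's own specification before relying on it).
PAN_ENTITY = {
    "P": "Individual", "C": "Company", "H": "HUF", "F": "Firm",
    "A": "AOP", "T": "Trust", "B": "BOI", "L": "Local authority",
    "J": "Artificial juridical person", "G": "Government", "E": "LLP",
}


def check_pan(pan: str, declared_entity: str) -> dict:
    pan = pan.strip().upper()
    if not PAN_RE.fullmatch(pan):
        return {"verdict": "FAIL", "reason": "not 5 letters + 4 digits + 1 letter"}
    entity = PAN_ENTITY.get(pan[3])
    if entity is None:
        return {"verdict": "NEEDS_REVIEW", "reason": f"unmapped entity code {pan[3]}"}
    if entity != declared_entity:
        return {"verdict": "FAIL",
                "reason": f"PAN says {entity}, applicant declared {declared_entity}"}
    return {"verdict": "PASS", "entity": entity}
```

An unmapped code goes to review rather than failing outright, since the table above may be incomplete.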
Why Automation Wins Here
A 35–45 minute per-bundle manual review collapses to under a minute of automated processing plus targeted analyst attention only on NEEDS_REVIEW items. At 50 bundles/month the compliance team recovers ~28 hours and eliminates the four most common error classes that drive disbursement clawbacks.
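The ~28-hour figure follows from the low end of the manual range. A quick check, assuming ~45 seconds of automated processing per bundle and ignoring analyst time spent on NEEDS_REVIEW items:

```python
BUNDLES_PER_MONTH = 50
AUTOMATED_SECONDS = 45  # per bundle, from the text


def monthly_hours_saved(manual_minutes: float) -> float:
    """Manual hours minus automated hours per month (review time excluded)."""
    manual_hours = BUNDLES_PER_MONTH * manual_minutes / 60
    automated_hours = BUNDLES_PER_MONTH * AUTOMATED_SECONDS / 3600
    return manual_hours - automated_hours
```

`monthly_hours_saved(35)` gives about 28.5 hours and `monthly_hours_saved(45)` about 36.9, so ~28 hours is the conservative end of the range.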
GSTIN verification at the document level
The CGST Rule 10 substring identity (GSTIN[2:12] == PAN in zero-indexed slice notation, i.e. characters 3–12) is checked on every GST document, with the exact character positions of any divergence logged, catching the silent ITC-claim risk that Tally Prime and SAP miss at master-record import.
CL-7 vs CL-9 disambiguated by content
The two Karnataka Excise forms share header, layout and signature block. The workflow reads the premises-type clause directly and flags any divergence from the licence type declared in the KYC Top Sheet, routing to NEEDS_REVIEW rather than passing silently.
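The routing rule can be sketched as a comparison between the licence type read from the premises clause and the type declared on the KYC Top Sheet. `licence_type_verdict` is a hypothetical name; it assumes the detected type is "CL-7", "CL-9", or some unresolved label from the classifier.

```python
VALID_LICENCE_TYPES = {"CL-7", "CL-9"}


def licence_type_verdict(detected: str, declared: str) -> dict:
    """Compare the licence type read from the premises clause with the Top Sheet."""
    detected, declared = detected.strip().upper(), declared.strip().upper()
    if detected not in VALID_LICENCE_TYPES:
        return {"verdict": "NEEDS_REVIEW", "reason": "premises clause not resolved"}
    if declared != detected:
        # Never pass silently when the two sources disagree.
        return {"verdict": "NEEDS_REVIEW",
                "reason": f"Top Sheet declares {declared}, licence reads {detected}"}
    return {"verdict": "PASS", "licence_type": detected}
```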
PMLA audit trail, every onboarding
Verdict JSON, field-level extraction report and cross-reference log are written as a single artifact bundle keyed to register number + excise year, supporting compliance with Rule 9 of the PML (Maintenance of Records) Rules 2005 without manual evidence assembly.
Frequently Asked Questions
The questions accountants and finance controllers ask most often before deploying this workflow.
The verification logic is mapped to the RBI Master Direction — Know Your Customer (KYC), 2016 (updated), the Prevention of Money Laundering Act, 2002 and PMLA Rules 2005, and the customer-due-diligence checks required under Section 12 of PMLA. Document-level checks (PAN format, GSTIN PAN-link, Aadhaar QR signature) follow the issuing authority's published specifications.
By default the workflow parses scanned or PDF Aadhaar copies and validates the QR code signature offline using UIDAI's public key. Live Aadhaar e-KYC and OTP-based authentication require a UIDAI AUA/KUA licence and can be wired in through a registered authentication agency; the workflow exposes the hook but does not bypass licensing requirements.
Each document produces field-level checks (format, issuer signature, expiry), and each cross-document rule (PAN ↔ GSTIN, name match across Aadhaar/PAN/cheque, licence holder ↔ bank beneficiary) produces a pass or fail result, or an indeterminate one when the input is incomplete. A bundle returns PASS only when every mandatory check passes, FAIL when any deterministic check fails (e.g., a malformed PAN or a GSTIN-PAN mismatch), and NEEDS_REVIEW when checks are indeterminate (masked Aadhaar, fuzzy name match below threshold).
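A sketch of this roll-up, assuming FAIL takes precedence over NEEDS_REVIEW when both occur (the answer above does not say which wins) and that a document with no checks at all should go to review rather than pass. The names and the JSON layout are illustrative.

```python
import json
from enum import Enum
from typing import Dict, Iterable


class Outcome(str, Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    NEEDS_REVIEW = "NEEDS_REVIEW"


def roll_up(outcomes: Iterable[Outcome]) -> Outcome:
    outcomes = list(outcomes)
    if not outcomes:
        return Outcome.NEEDS_REVIEW  # nothing was checked, so nothing can pass
    if Outcome.FAIL in outcomes:
        return Outcome.FAIL          # any deterministic failure decides the result
    if Outcome.NEEDS_REVIEW in outcomes:
        return Outcome.NEEDS_REVIEW  # indeterminate checks need an analyst
    return Outcome.PASS


def bundle_verdict(doc_checks: Dict[str, Dict[str, Outcome]],
                   cross_checks: Dict[str, Outcome]) -> str:
    per_doc = {doc: roll_up(checks.values()) for doc, checks in doc_checks.items()}
    bundle = roll_up(list(per_doc.values()) + list(cross_checks.values()))
    return json.dumps({
        "documents": {doc: v.value for doc, v in per_doc.items()},
        "cross_document": {rule: v.value for rule, v in cross_checks.items()},
        "bundle": bundle.value,
    }, indent=2)
```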
Multi-outlet syndicates are supported. The Group Syndicate Form is parsed into a list of outlets, each tagged OWNED or LEASED, and each outlet is reconciled individually against its CL-7 or CL-9 licence and the syndicate's beneficiary bank account. The verdict is reported per outlet and rolled up at the syndicate level.
Every run stores the original document hash (SHA-256), the OCR/extraction output, the rule set version, the per-rule pass/fail with the exact field values compared, and the final verdict with timestamp and operator ID. The trail is exportable as a signed PDF workpaper that aligns with ICAI SA 230 Audit Documentation requirements.
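A sketch of one audit record, assuming documents arrive as raw bytes and that records are keyed, like the artifact bundle, by register number plus excise year. `audit_record`, `RULESET_VERSION` and the field names are hypothetical; signing and PDF export are not covered.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List

RULESET_VERSION = "ruleset-v1"  # hypothetical version label


def audit_record(register_no: str, excise_year: str, documents: Dict[str, bytes],
                 checks: List[dict], verdict: str, operator_id: str) -> dict:
    return {
        "key": f"{register_no}/{excise_year}",
        # Hash the original bytes so later tampering with a stored copy is detectable.
        "document_sha256": {name: hashlib.sha256(blob).hexdigest()
                            for name, blob in documents.items()},
        "ruleset_version": RULESET_VERSION,
        "checks": checks,  # each entry: rule, values compared, outcome
        "verdict": verdict,
        "operator_id": operator_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Everything in the record is plain JSON-serialisable data, so `json.dumps(record)` can be stored or exported as-is.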
The workflow exposes a REST API and webhook callbacks; document bundles can be pushed from a customer-onboarding portal, Tally, SAP, or NetSuite, and verdicts are returned as structured JSON. Vendor master records in the ERP can be auto-flagged as KYC_VERIFIED or KYC_HOLD based on the verdict.