1 Introduction

Resident identification systems are pervasive in the world today, with many using biometrics [15]. These systems hold and mediate vast amounts of private data, which in many cases is also used to facilitate welfare schemes and other public programs. Aadhaar is a 12-digit unique ID issued by the Indian government to each Indian resident (not citizen), using their demographic and biometric information. To date, over 1.3 billion residents have been enrolled [34]: it is the largest biometric identity system ever built and is linked to bank accounts, income tax numbers, social security schemes, etc. And while Aadhaar is technically not required for many things (such as getting a new cellular connection), its ubiquity has rendered it the default form of identification in India.

Though public trust in Aadhaar is crucial, the system has been relatively opaque, leading to much confusion and speculation. Civil activists [4] and media outlets [41] have alleged that Aadhaar is vulnerable to numerous types of breaches; corroborating these claims is difficult as there exists no comprehensive resource detailing Aadhaar’s system and security architecture. Public documentation about Aadhaar is outdated or ambiguous, and no unified description of the infrastructure exists. As a result, one has to collate information from multiple (often unreliable) sources. We present the first comprehensive description of Aadhaar, analyze all reported privacy or security breaches, and assess defenses against future attacks. We also report the first known cryptographic issue (fortunately not exploitable at scale under current conditions) in the system.

Contributions. Comprehensive snapshot: We outline the journey of an individual’s data through the Aadhaar system and the entities involved (for data collection, processing, storage, and usage), covering the entire body of publicly available information on Aadhaar. Previous work has looked at authentication or verification, etc. [4, 30], but none have covered the whole infrastructure.

Security Flaws: We analyze all documentation made public by UIDAI (trawling through thousands of pages over time) as well as all alleged attacks to compile and analyze possible security issues. We find that the way Aadhaar generates IVs for AES (it uses AES-GCM) opens up the possibility of mounting an identity forgery attack and stealing data. We note that the attack is not currently deployable: we made sure that it is not exploitable before publishing. However, any batching of queries or capture of multiple messages within the same second may still render the system insecure. Specifically, one could forge the identity of any individual whose Aadhaar number is available.
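To illustrate why messages captured within the same second matter, the following minimal sketch assumes (hypothetically; the derivation is not spelled out in the text above) that the 12-byte GCM IV is taken from a timestamp string with one-second resolution. The function name `iv_from_timestamp` and the timestamp format are illustrative assumptions. Under that assumption, two requests issued in the same second receive the same IV, and reusing an IV under the same key is exactly the condition under which AES-GCM loses its authenticity guarantee.

```python
from datetime import datetime


def iv_from_timestamp(ts: datetime) -> bytes:
    """Hypothetical IV derivation: last 12 bytes of a second-resolution timestamp.

    This is an assumption made for illustration, not a confirmed description
    of the deployed system.
    """
    stamp = ts.strftime("%Y-%m-%dT%H:%M:%S")  # sub-second precision is dropped
    return stamp.encode("ascii")[-12:]


# Two requests in the same second, and one request a second later.
first = datetime(2021, 3, 1, 10, 15, 30, 120_000)
second = datetime(2021, 3, 1, 10, 15, 30, 870_000)
later = datetime(2021, 3, 1, 10, 15, 31, 0)

# Same second: identical IVs. If the key is also the same, GCM is broken.
same_second_collision = iv_from_timestamp(first) == iv_from_timestamp(second)

# Different second: the IVs differ.
next_second_collision = iv_from_timestamp(first) == iv_from_timestamp(later)
```

The sketch deliberately stops short of any encryption: it only shows the precondition (IV collision), which is harmless on its own and becomes dangerous only when the key is also reused.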

1.1 Paper Overview

Section 2 provides a brief background and discusses related work. A list of all abbreviations, in order of appearance, is provided in Appendix B. Section 3 describes Aadhaar’s infrastructure in detail (along with data privacy and security policies). This snapshot is divided into the following main sections: the Enrollment Ecosystem (Sect. 3.1), the Authentication Ecosystem (Sect. 3.2), and the Central Identities Data Repository or CIDR (Sect. 3.3). Section 4 details the security of different endpoints at which an individual’s data is vulnerable to attacks. Section 5 discusses information security in Aadhaar, using standard benchmarks. We define the threat model and discuss a cryptographic flaw we identified and its mitigation strategies (Sect. 5.2). We use the threat model along with the snapshot to filter legitimate attacks from our database of media allegations (Sect. 6). We discuss possible attacks and categorize the feasibility of these breaches based on the threat actor involved, the cost (time and resources), and the level of security provided by Aadhaar (Sect. 6). Section 6 discusses technical and structural mitigation strategies for each type of breach. A study of alleged attacks is provided in supplementary analysis Appendix C.

2 Background

The Unique Identification Authority of India (UIDAI) was established in January 2009. Its mission was to issue a unique identification (UID) number, an “Aadhaar Number,” to every resident of the country. The UID’s purpose was to be a one-stop identification that is eventually linked to every social service to make the disbursement of welfare services effective and efficient (by reducing leakages). The bill that provides legal backing to Aadhaar is called the “Aadhaar (Targeted Delivery of Financial and other Subsidies, benefits and services) Act.” Apart from providing Indian residents with a unique identity (an Aadhaar number), the UIDAI is also responsible for providing a platform for residents to authenticate their physical presence [63] at a point of service. Aadhaar’s policies regarding its vision, ethical implications, data security, and privacy have been under intense scrutiny [20].

This becomes all the more important with Aadhaar’s ubiquity. It is different from login.gov [5, 11], for example. It is not merely a single point of contact system for welfare. Aadhaar is what you can use to get on a plane, to open a bank account, to get a phone connection. Getting tested or vaccinated for COVID-19? Aadhaar. It is MOSIP [40] on steroids: closed-source, universal, and practically (although not officially) mandatory.

2.1 Related Work

National identification projects of many countries have attracted considerable academic research—Jamaica’s attempt [33], Nepal’s National Identity Project (NIDP) [3], UAE’s ID system [6], Europe’s e-ID systems [9], United States’ Social Security Number [18], etc. Being the world’s largest biometric ID system, India’s Aadhaar has been an active research topic in the areas of ICTD, HCI, security, and privacy. Singh and Jackson [35] perform an ethnographic study of Aadhaar. They find exclusion of people in various phases: during enrollment, while authenticating, and while linking (“seeding”) their Aadhaar numbers with existing public welfare databases (like the Public Distribution System database). Srinivasan and Johri [36] draw similarities between the legitimization and support tactics of Aadhaar and previously successful infrastructure projects like railroads in British India and dams in post-Independence India.

Prior security and privacy works have recommended using a Trust and Role-Based Access Control Model for internal Aadhaar processes and using cryptography to prevent illegal tracking and profiling [30]. Rajput and Gopinath [31] have analyzed the privacy of authentication workflows offered by Aadhaar and recommended new ones. The work of Agrawal, Banerjee and Sharma [4], though relatively informal, is the closest to ours. It provides a broad analysis of Aadhaar’s vulnerabilities like faking biometrics, identification without consent, and illegal tracking by collation of data across service providers. Our work differs from these: we present a detailed overview of the system and do not assume the correctness of media allegations and activism (which are essential in their own right). Instead, we analyze Aadhaar’s security and allegations against it based on an extensive study of available documentation.

Fig. 1. Flowchart of Aadhaar’s architecture. Yellow cells depict entry points into the enrollment (left) and authentication (right) ecosystems. Enrollment starts with the resident visiting the Enrollment Agency which uses an enrollment software provided by the Enrollment Service. The data is then sent to the Registrars for verification. If de-duplication succeeds, the data is stored in the CIDR and the user is enrolled. The authentication procedure starts with the Aadhaar holder’s information reaching the CIDR via AUA and ASA. The biometric data is captured by the authorization devices, sent to the CIDR through AUA and ASA. The response is sent back by the CIDR via the same route. (Color figure online)

3 Snapshot: Aadhaar System Design

Aadhaar has three primary components: (1) the Enrollment ecosystem, (2) the Authentication ecosystem, and (3) the CIDR (Central Identities Data Repository). Enrollment handles onboarding and assigning of unique identity numbers. Authentication provides verification services when residents want to prove their identity. CIDR is a database that stores the collected biometric and demographic data. We provide an overview of a typical resident’s interaction with the Aadhaar system and then discuss its usability and the three components. An overview of the entire architecture is available in Fig. 1.

Usability of Aadhaar. The entire process assumes significant privilege: that a resident can read and speak fluently, has a phone (for many services, a smartphone), access to the internet, etc. Also, during the COVID pandemic, many centers are either fully or partially shut down: simple tasks such as linking a mobile number to one’s Aadhaar for the first time have turned herculean. If one’s Aadhaar number is lost (e.g., loss of card), there is no way to recover it for someone without a mobile phone (or with an unlinked phone). This can result in loss of welfare [7], and restoring the UID is incredibly difficult. On the other hand, there is no way to remove one’s data from the CIDR if the resident wants or needs this (e.g., when changing residency to another country). There are also on-ground issues like the prevalent use of the Aadhaar “card” or a photocopy as a visual proof of identity without biometric validation (e.g., at airports).

3.1 Enrollment Ecosystem

The Enrollment ecosystem (Fig. 2) handles onboarding of residents into Aadhaar with the objective of providing each resident with a unique ID (UID). It also handles updating of demographic and biometric details of existing UID holders. Residents enroll only once but may request updates. The ecosystem is designed to work offline to allow enrollment of residents from areas that lack connectivity. There are two major actors: Registrars and Enrollment Agencies (EAs). UIDAI appoints Registrars, and each Registrar appoints EAs under it.

Registrar: UIDAI partners with various ministries, banks, public sector organizations, and other agencies that interact with Indian residents [61, 66] to facilitate issuing Aadhaar numbers by enrolling residents and validating resident data during enrollment and updation. Registrars must take special measures to enroll women, children, persons with disabilities, unskilled workers, nomadic tribes, and people belonging to marginalized groups who cannot produce a valid Proof of Identity (PoI) and/or Proof of Address (PoA) [61]. “Introducers” are individuals (such as Registrar employees, members of local administrative and elected bodies, etc.) recognized by Registrars to confirm resident data without PoI or PoA. Registrars must follow protocols and standards prescribed by the UIDAI. They usually outsource these tasks to EAs. While they are responsible for the correct functioning of these EAs, there is no mention of Registrars having to inform UIDAI about the EAs. A Registrar uses a UIDAI developed Enrollment Client to enroll residents, and must follow the Demographic Data Standards and Verification Procedure (DDSVP) [43].

Security (Policy and Logs). The MoUs between Registrars and UIDAI specify that UIDAI periodically audits the Registrars and EAs (frequency not specified). Although the standard penalties are nowhere specified, if a Registrar fails to follow the security mandates, UIDAI will only make “reasonable attempts” [66] to discuss and resolve difficulties with the Registrar. Organizations have been penalized in the past: UIDAI terminated a Registrar’s contract citing “enormous number of complaints of corruption and enrollment process violations against Aadhaar Enrollment/Update Centres under CSC e-Gov” [37].

Enrollment Agency. Registrars employ third-party vendors called Enrollment Agencies (EAs) to carry out enrollment services using tools and procedures [59] prescribed by the UIDAI. Sometimes, Registrars double up as EAs instead of employing external EAs. For example, a bank may use its branches as EAs. In such cases, “Enrollment Agency” and “Enrollment Centre” become synonymous. As this is pervasive, we use these terms interchangeably in this paper. EAs are the on-ground functional arm of the Enrollment ecosystem and are responsible for providing operators and supervisors for each Enrollment Centre [60]. These Enrollment Operators (EOs) collect demographic and biometric data for enrollment or updation using UIDAI-approved equipment [53]. Before enrollment, EAs must verify the resident’s PoA and PoI documents and ensure that the details entered in the Aadhaar Enrollment Client match. This verification is done by duly appointed officers at the EA called Verifiers [62].

Fig. 2. Flowchart of the Aadhaar Enrollment Ecosystem. The resident’s data is captured by the Enrollment Client and sent via the SFTP client for de-duplication. After multiple validity checks, an Aadhaar identity is generated and a physical card is printed.

Security (Technical). Enrollment Equipment – UIDAI mandates Registrars to follow guidelines to set up the enrollment environment. Only certified equipment is allowed [49]. The Enrollment Client is equipped to work under “Indian conditions”, which we assume means low lighting, lack of internet connectivity, dusty environments, etc. [26]. Data Validation – The resident’s PoI and PoA documents are verified by the Verifier, and details are entered into the Enrollment Client by the EO, followed by biometric data capture and validation by the resident. Most onboarding happens offline—data is periodically synced with CIDR [53]. Operator Activity Tracking – Every EO using the Enrollment Client must sign each enrollment and update with their own biometrics. EO login involves a username, password, and the EO’s biometrics [53].

Security (Policy and Logs). When a Registrar hires an EA, the EOs working there need training and certification. The UIDAI provides a questionnaire [44] and a presentation to ensure basic training. The “Training, Testing and Certification” team designs lessons to ensure that EOs can recognize the necessary documents for the first check [65]. Periodically, “Mega Training and Certification Programs” [50] are organized to facilitate mass onboarding of operators when there is high demand. Refresher courses are also organized.

3.2 Authentication Ecosystem

The Authentication ecosystem (Fig. 3) provides paperless identity verification: Authentication – Uses an Aadhaar number and a one-time password (or biometrics) as a second factor to authenticate an individual. The CIDR returns a signed Yes/No [57]. e-KYC – identity verification via a signed and encrypted demographic record (name, age, address, etc.) from the CIDR.

Fig. 3. Flowchart of Aadhaar’s Authentication Ecosystem. We start at bottom right with a resident requesting a service. Aadhaar details are sent to the CIDR either through an AUA Server directly to the Production Server or via an ASA server. The CIDR then authenticates this information and returns the results via the same route.

AUAs and KUAs: A requesting entity is an agency that uses Aadhaar authentication and e-KYC facilities to provide services such as opening bank accounts, LPG connections, purchasing mobile SIMs, etc. [57]. There are two types of requesting entities [51, 52]: an Authentication User Agency (AUA) uses only the authentication service, while a Know-Your-Customer User Agency (KUA) also uses the e-KYC service. When serving an individual, an AUA submits their Aadhaar number and demographic/biometric information to the CIDR for authentication [27]. An AUA connects to the CIDR through an Authentication Service Agency (ASA), which owns a secure connection to the CIDR. In response, the AUA receives a digitally signed response from the CIDR. A sub-AUA uses Aadhaar authentication to enable its services by contracting the services of an AUA. A KUA, in addition to being an AUA, uses e-KYC authentication facility to retrieve a resident’s personal information from the CIDR. When an Aadhaar holder wants to submit their KYC details to a KUA, they download a copy of their e-KYC in XML or QR Code format from the Aadhaar website. This is encrypted with a “Share Code” set by the user. To verify the submitted file, a request is sent to CIDR through a KSA. The KUA receives a “digitally signed [machine readable XML] e-KYC authentication response with encrypted e-KYC data [58].” The KUA uses this copy of the holder’s KYC data retrieved from UIDAI to verify the offline copy the resident submitted. The encrypted XML file contains the resident name, download reference number, address, photo, gender, DoB/YoB, hash of mobile number, hash of email.

Security (Technical). Aadhaar numbers collected by an AUA/KUA are encrypted and stored locally in an “Aadhaar Data Vault” [13]. The encryption keys must be stored in a Hardware Security Module (HSM). The UIDAI neither mandates audits nor specifies repercussions if the vault stores plaintext. The implementation of the Data Vault is usually outsourced, and many third-party vendors [22] offer their own variants. An AUA/KUA can transmit biometric information over a network only after creating an encrypted Personal Identity Data (PID) block in accordance with UIDAI specifications [47]. The encrypted PID block cannot be stored except for buffered authentication (for up to 24 h, after which it must be deleted from local storage) [25]. AUA/KUAs send authentication and e-KYC requests to ASAs/KSAs (who relay them to the CIDR) via secure private lines or a secure channel (SSL, VPN) [42].

Security (Policy and Logs). Access to the application, audit logs, source code etc. is only given to authorized personnel [25]. The basis on which a person becomes authorized and the extent of access are unknown. AUAs/KUAs are required to maintain online logs of each authentication transaction for two years, for grievance and dispute redressal. After this, logs are archived offline for five more years and then deleted (unless required in a pending dispute). The logs record the Aadhaar number, auth request, CIDR’s response, information disclosed upon authentication, and the person’s consent for authentication [25, p. 12]. Logs do not store PID information. No encryption/safety standards are specified; we discuss the resultant privacy issues in Sect. 5.3. Aadhaar holders can self-generate Virtual IDs (VID) for privacy. VIDs are temporary, revocable 16-digit random numbers that are one-way mapped from the Aadhaar number [64]. This mapping should be secret and the Aadhaar number should not be recoverable from it. The algorithm used for generating VIDs is not specified.
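The retention schedule for authentication logs described above (two years online, five further years in offline archive, then deletion unless a dispute is pending) can be summarized as a small state function. This is an illustrative sketch only: the function and state names are hypothetical, and the handling of the exact boundary dates is an assumption, since the policy text does not specify it.

```python
from datetime import date

ONLINE_YEARS = 2    # logs kept online for grievance and dispute redressal
ARCHIVE_YEARS = 5   # further years in offline archive before deletion


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def log_state(transaction_date: date, today: date, pending_dispute: bool = False) -> str:
    """Return where a transaction log should reside on the given day."""
    online_until = add_years(transaction_date, ONLINE_YEARS)
    archive_until = add_years(online_until, ARCHIVE_YEARS)
    if today < online_until:
        return "online"
    if today < archive_until:
        return "archived_offline"
    # Past the retention window: keep only if a dispute still needs the log.
    return "retained_for_dispute" if pending_dispute else "deleted"


# A log from 1 January 2020 at different points in its lifetime.
state_2021 = log_state(date(2020, 1, 1), date(2021, 6, 1))
state_2023 = log_state(date(2020, 1, 1), date(2023, 1, 1))
state_2027 = log_state(date(2020, 1, 1), date(2027, 1, 1))
state_2027_dispute = log_state(date(2020, 1, 1), date(2027, 1, 1), pending_dispute=True)
```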

AUAs/KUAs are required to ensure that their operations are audited, including information security controls and technical testing like vulnerability assessment, penetration tests, etc., especially for new technologies introduced [25]. This audit must be done by a recognised body (presumably government empanelled auditors [12]) annually and on a need basis [25, p. 46] or by UIDAI itself to ensure compliance. Although UIDAI states that only authorized personnel can access the audit trails, selection criteria and security policies are unspecified.

ASAs and KSAs: Authentication/KYC Service Agencies (ASAs/KSAs) are public and private agencies that have an “established secure leased line connectivity with the CIDR” [57] in accordance with UIDAI’s standards and specifications [25]. Only they can interact directly with the CIDR in the Authentication ecosystem. ASAs provide secure CIDR access to AUAs for authentication; KSAs are ASAs with additional e-KYC permissions and therefore serve KUAs. Hence, ASAs/KSAs act as enabling intermediaries between an AUA/KUA and the CIDR as shown in Figs. 1 and 3. There are 27 live ASAs/KSAs [56].

Security (Technical). Servers used by ASAs to connect to the CIDR must be located within India. The ASA/KSA server host must be within a segregated network segment, isolated from the rest of the ASA/KSA network, and solely dedicated to Aadhaar authentication. The PID block includes the keys generated by the ASAs/KSAs (sensitive and must never be stored). ASAs perform key generation, distribution, and storage.

Security (Policy and Logs). Access control, communication policies, log maintenance and expiration, and audit protocols are the same as those of AUAs/KUAs (Refer to Sect. 3.2). The logs can be accessed by UIDAI or the requesting entity solely for grievance and dispute redressal and contain the following information: identity of the requesting entity, parameters of authentication request submitted, and parameters received as authentication response.

3.3 CIDR (Central Identities Data Repository)

The Central Identities Data Repository (CIDR) is a centralized database that stores all Aadhaar numbers and corresponding demographic and biometric data. Maintained by UIDAI and distributed across multiple servers throughout India, CIDR is the core of Aadhaar and interacts with both the Enrollment and Authentication ecosystems. CIDR is also (indirectly) responsible for deduplication as deduplication servers access biometric data residing in the CIDR to check for matches before enrolling a new resident. Post-enrollment access to the CIDR comprises mainly authentication and e-KYC requests (see Sect. 3.2).

Security (Technical). Enrollment Client: The connection between the CIDR and the Enrollment Client is protected using SSL. The enrollment data (XML) is POSTed to the CIDR [26, 45]. To ensure only certified operators and Enrollment Clients connect to the CIDR, each time an operator logs into the client, an XML document containing the machine identifier, enrollment agency code, and station number is sent to the CIDR for validation. The CIDR then sends back a security token, which is used to send subsequent enrollment data. The XML document containing the enrollment data is sent in the form of packets to the CIDR, each of which is encrypted using a public key published by UIDAI, and signed by the sender (to avoid wasting resources on extracting packets without a valid signature [26]). This packet encryption phase is handled by the Client Security module of the Enrollment Client, which also stores certificates and manages keys. The key management uses public-key style encryption where two sets of public keys are maintained – one for data exchange between the Enrollment Client and the CIDR, and another for data exchange between the Registrar and the CIDR. The CIDR is classified as a Protected System under the IT Act, and the link between the CIDR and the Enrollment Client is encrypted using 2048 bit PKI. Deduplication: Deduplication at the billion scale has never been previously attempted [26]. For risk mitigation, UIDAI has three independent ABIS (Automatic Biometric Identification System) providers performing biometric deduplication. At enrollment, Aadhaar first does a demographic and reduced biometric check for matches. The Aadhaar enrollment server integrates the ABIS solutions using an ABIS API and dynamically allocates deduplication requests to the 3 ABIS servers. Then, ABIS deduplication servers are sent packages of size 3–5 MB. 
The enrollment packet (containing all demographic, biometric, and metadata) is encrypted at the client side and then sent to CIDR; the CIDR interacts with the ABIS servers and sends them these packages. Only the Enrollment Server (maintained by CIDR) can decrypt the enrollment packet. It does this in memory; the decrypted packet is never sent to storage. Original biometric data is archived and sent to offline storage and is not available on an online network. 2048-bit PKI is used throughout. See supplementary analysis Appendix C for more details. When a registered device is called, it captures, processes, and encodes the digitally signed biometric record. The biometric data received by the CIDR is essentially a Base-64 of the DSA signature of a hash (SHA-256) of the biometric data and a timestamp, device code, and device private key.
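The encoding of the signed biometric record can be sketched as a hash, sign, then Base64 pipeline. The sketch below is one possible reading of the description above and makes two loud assumptions: the hashed inputs are taken to be the biometric data, the timestamp, and the device code (in that order, simply concatenated), and the DSA signature is replaced by an HMAC-SHA256 tag as a stand-in, because the Python standard library has no DSA implementation. HMAC is a symmetric construction and is not a digital signature; all names here are hypothetical.

```python
import base64
import hashlib
import hmac


def encode_biometric_record(biometric: bytes, timestamp: str,
                            device_code: str, device_key: bytes) -> str:
    """Hash the record fields, tag the digest with the device key, Base64 the tag.

    Stand-in only: a real device would produce a DSA signature with its
    private key instead of the HMAC tag computed here.
    """
    # Step 1: SHA-256 over the (assumed) concatenation of the record fields.
    digest = hashlib.sha256(
        biometric + timestamp.encode("utf-8") + device_code.encode("utf-8")
    ).digest()
    # Step 2: stand-in for the DSA signature over the digest.
    tag = hmac.new(device_key, digest, hashlib.sha256).digest()
    # Step 3: Base64 encoding for transport inside the XML request.
    return base64.b64encode(tag).decode("ascii")


record_a = encode_biometric_record(b"\x01\x02\x03", "2021-03-01T10:15:30", "DEV-001", b"k" * 32)
record_b = encode_biometric_record(b"\x01\x02\x03", "2021-03-01T10:15:31", "DEV-001", b"k" * 32)
```

Because the timestamp is part of the hashed input, the same biometric capture yields a different encoded record at a different time, which is what lets the receiver reject stale captures.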

4 Security Landscape

We consider the security of different endpoints at which an individual’s data could be vulnerable and the steps Aadhaar takes to prevent any attacks.

4.1 Hardware Security and Certification

Biometric data is first collected during registration, and subsequently used to verify that individual’s identity. These biometric devices, therefore, are a critical component of Aadhaar. The official documentation [49] specifies two types of devices. Public Devices are biometric capture devices that can be attached to the Aadhaar application provided to AUA/Sub-AUA to capture Aadhaar compliant biometric data. The application then encrypts the data before authentication. Registered Devices (RD) have three key additional features over public devices. Each RD has a unique device identifier, biometric data is signed with the device key to ensure liveness and encrypted on-device rather than on the host application, and lastly, the RD service is certified regardless of the device provider. “RD service” refers to the process of capturing biometrics, signing them, and forming a personal identity data (PID) block before returning to the application.

Device Compliance Levels. The RD service is certified over two levels. Level 0 Compliance ensures that the implementation of signing and encryption of biometrics is within the software zone at the host’s OS level. This includes ensuring that the associated private keys are not compromised through access via any external applications within the OS, and that the biometric data cannot be injected maliciously. Level 1 Compliance enhances security by ensuring that the signing and encryption take place within a Trusted Execution Environment (TEE). The private keys and the biometrics are stored in, and accessed via, the TEE.

Pre-certified Hardware: Any provider of an L1 compliant device needs to supply “Pre-certified” Hardware (PCH) and accompanying system software. This must protect against Hardware Cloning, Hardware Tampering (Physical, voltage, frequency, temperature attacks on crypto blocks), Differential Power analysis, Probing, Memory segregation of cryptographic operations, Cryptography implementation vulnerability, Attacks against Secure Boot and Secure Upgrade and TEE, and Secure processor OS attacks.

Certification: The agencies responsible for the certification are UIDAI and Standardization Testing and Quality Certification (STQC) Directorate (which is an attached office of the Ministry of Electronics and Information Technology). The certification process is exhaustive and combines testing over multiple, widely regarded industry and government standards like NIST’s FIPS [38] for the security of cryptographic modules, PCI PTS [29] and PED for physical and software tampering, GlobalPlatform certification for the TEE, and other dedicated hardware for L1, like secure boot, secure upgrade, etc. More details are available in [49]. UIDAI and STQC also check for tamper responsiveness: these devices can detect box-open tampering, chemical tampering, etc. and destroy sensitive data upon detection. However, a small part of hardware and system software is vendor self-certified. We were unable to find any reasoning for this; it is unclear how a vendor can verifiably self-certify a lack of backdoors!

4.2 Key Management and Device Registration

Each device provider must register and obtain a device provider ID via UIDAI. UIDAI then signs a public-key certificate procured by the device provider from a certificate authority (CA) licensed by the Govt. of India’s Controller of Certifying Authorities (CCA). These certificates are X.509 v3 compliant. Furthermore, the UIDAI policy specifies time periods after which device keys have to be rotated.

L1 compliant devices store their signing and encryption keys in PCH. There exists a hardware key-store in these devices. The certificate issued for the device, called the Chip Identity Certificate, is stored therein and must be non-clonable. The signing and encrypting key-pair generation and the cryptographic operations happen within this hardware key-store. L0 compliant devices, by contrast, have a software-based key-store provided by the OS. Common software security practices are specified and required for this key-store in [49]. All accesses to this key-store are logged. The private key is not extractable in any format, and the key-store is cleared and zeroed if the RD service is deleted. The key-store password is auto-generated using some random data, user credentials, and device identities of hardware like hard disk serial number, processor ID, and other device IDs. This key derivation is not public and is obfuscated to prevent attacks. We note that this can be dangerous. Historically, security by obscurity has been a terrible idea [39], and has meant that bad security went uncriticized.

4.3 Biometric Deduplication and Locking

Since Aadhaar has the face, fingerprint, and iris biometrics for enrolled residents, it can combine these for de-duplication upon enrollment. With ten fingerprints and a facial image, a 95% de-duplication rate could be achieved over a population of 50 million. To increase the de-duplication rate to 99%, usage of iris biometrics was proposed. However, there is no documentation about the matching algorithms running at the ABIS and how well they perform. The accuracy listed above implies that authentication for valid Aadhaar numbers and corresponding residents might fail for a small fraction of requests. While UIDAI has not released any documentation about the de-duplication process, we discovered the following information from our interviews of Aadhaar personnel: The de-duplication problem is viewed and solved as a multi-class classification problem where there are as many classes as there are individuals in the Aadhaar database. Using deep learning techniques, the set consisting of Aadhaar IDs, ten fingerprints, iris and face biometric data is pre-processed before classification. Since this is a huge dataset, this process is optimized by reducing some features. If candidate duplicates are discovered, they are checked using some more features along with a combination of manual assistance. The biometric algorithms used were described as standard ones from the works of Jain et al. [21, 67].
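The two-stage idea reported above (a cheap pass over reduced features, followed by a check with more features on the surviving candidates) can be illustrated with a toy coarse-to-fine search. This is not the ABIS algorithm, which is undocumented: plain Euclidean distance with thresholds is swapped in for the classifier, the manual review step is omitted, and the feature vectors, thresholds, and names are all made up for illustration.

```python
def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def coarse_shortlist(candidate, enrolled, coarse_dims=2, coarse_threshold=0.5):
    """Stage 1: compare only the first few features to discard most records cheaply."""
    return [
        uid for uid, feats in enrolled.items()
        if distance(candidate[:coarse_dims], feats[:coarse_dims]) <= coarse_threshold
    ]


def find_duplicates(candidate, enrolled, fine_threshold=0.2):
    """Stage 2: re-check the shortlist using the full feature vector."""
    return [
        uid for uid in coarse_shortlist(candidate, enrolled)
        if distance(candidate, enrolled[uid]) <= fine_threshold
    ]


# Toy database: three enrolled records with four features each.
enrolled = {
    "A": [0.1, 0.2, 0.3, 0.4],
    "B": [0.1, 0.25, 0.9, 0.9],
    "C": [0.9, 0.9, 0.9, 0.9],
}
candidate = [0.1, 0.2, 0.3, 0.45]

shortlist = coarse_shortlist(candidate, enrolled)   # A and B survive the coarse pass
duplicates = find_duplicates(candidate, enrolled)   # only A survives the fine pass
```

The design trade-off is the usual one for coarse-to-fine search: the first stage must be cheap and must not miss true duplicates, so its threshold is loose; precision is recovered in the second stage, which runs on far fewer records.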

5 Security, Privacy and Attacks

Defining “security” and “privacy” in the context of Aadhaar is nontrivial. It’s easy to provide stringent requirements, but those would almost certainly result in the exclusion of large sections of marginalized people in India, who may not have much documentation—precisely those we want to help. Many Indians also routinely use different spellings for their names (and other data) and may need to update the same without requiring a complicated court process (names in various Indian languages can be anglicized in multiple ways). Therefore, any realistic treatment of security (and attacks) cannot be too broad; we detail our Aadhaar-specific interpretations of the CIA (Confidentiality, Integrity, and Availability) information security triad in this section. We also explicitly list a variety of threat actors and their abilities (see supplementary analysis Appendix C).

Classifying Attacks. We use the CIA standard for information security. Any attack must violate one or more of: Confidentiality – Access to a resident’s data (demographic or biometric) collected at the time of enrollment or updation is granted only to authorized individuals within UIDAI and its partner organizations. Integrity – A resident’s information within the CIDR or during transmission is not modified or lost in an unauthorized manner. Availability – A resident’s data is available to authorized entities within UIDAI and its partner organizations when required.
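As a small illustration of this classification, an incident can be tagged with the set of CIA properties it violates; anything that violates none of them is not counted as an attack. The type and function names below are hypothetical, and the two sample incidents are only examples of how the tagging works, not results from our analysis.

```python
from enum import Flag, auto


class Violation(Flag):
    """CIA properties an incident may violate (combinable with |)."""
    CONFIDENTIALITY = auto()
    INTEGRITY = auto()
    AVAILABILITY = auto()


def classify(unauthorized_access: bool, unauthorized_change_or_loss: bool,
             authorized_access_blocked: bool) -> Violation:
    """Map three yes/no observations about an incident to violated properties."""
    result = Violation(0)
    if unauthorized_access:
        result |= Violation.CONFIDENTIALITY
    if unauthorized_change_or_loss:
        result |= Violation.INTEGRITY
    if authorized_access_blocked:
        result |= Violation.AVAILABILITY
    return result


def is_attack(violations: Violation) -> bool:
    """An incident counts as an attack only if it violates at least one property."""
    return violations != Violation(0)


# Example: data is copied without authorization but left unmodified.
copied_data = classify(True, False, False)
# Example: data is copied and the stored record is also altered.
copied_and_altered = classify(True, True, False)
# Example: an allegation that violates none of the three properties.
no_violation = classify(False, False, False)
```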

5.1 Threat Actors

We conduct a threat actor analysis to identify possible threats as an individual’s data travels through the system. In the report in Appendix C, we classify threat actors based on their capability, motivation, and damage caused, and give low/medium/high ratings for each. The threat actors we identified are described below.

Rogue Enrollment Operator: The first barrier between an individual’s information and the central repository is the enrollment operator, who is responsible for asking the individual for their information and verifying its authenticity. A rogue operator could enroll the individual with faulty data or, worse, make a copy of their data and enroll a fake resident instead.

Rogue Agency Seeking AUA/ASA Services: AUA/ASA provide services to agencies seeking to become requesting entities for authentication. Aadhaar specifies the criteria for such agencies [46]. However, in some cases, the authentication devices are operator-assisted: a service might be provided without authentication or based on identity forgery. E.g., an operator at a cellular agency could authenticate twice by using Anita’s Aadhaar details (when she applies for a new SIM) and keep one connection for themselves.

Rogue Enrollment Agent: A rogue enrollment agent can help generate fake Aadhaar cards; in practice, there is little oversight in place.

Rogue UIDAI Official: The access privileges of a high-ranking UIDAI official, if misused, can result in identity theft, fake voter IDs, and more.

External Parties: Governments, IT companies, and curious residents could try to access confidential Aadhaar information for varying motives. The resources possessed by all these external parties can vary quite a bit.

5.2 Forbidden Attack: A Cryptographic Challenge

We describe a possible cryptographic attack on Aadhaar; note that carrying out such an attack would be illegal, as Aadhaar is classified as a “protected system” under Section 70 of the Indian IT Act, 2000 [1]. We reported this attack to UIDAI, which validated its correctness and ensured its mitigation.

Aadhaar’s API security document [54, p. 29] details that packaged biometrics are sent for authentication as a \(\textsf{Pid}\) (Personal Identity Data) element, which is a base-64 encoded block. Before base-64 encoding, the \(\textsf{Pid}\) blocks are encrypted with a dynamic session key using the AES-256 symmetric algorithm in Galois Counter Mode (GCM). Refer to Appendix A for details about GCM. One major issue, discussed by Antoine Joux in his comments to NIST on GCM [8], is the forbidden attack with a repeated IV: if an adversary sees two different messages encrypted with the same IV, it can inject malicious content into the communication channel. One such attack is demonstrated in detail by Böck et al. [10].

The document [54] describes exactly how Aadhaar instantiates AES GCM: “The last 12 bytes of the \(\textsf{ts}\) (string formatted date) is used as the IV or nonce.” The \(\textsf{ts}\) attribute (timestamp) is described as follows [54, p. 15]: “Timestamp at the time of capture of authentication input. This is in the format YYYY-MM-DDThh:mm:ss (derived from ISO 8601).”

The implementation available in the GitHub repo [24] and on the old Aadhaar developer portal [48], as well as our interviews with Aadhaar officials, confirm this timestamp format. So, suppose the timestamp is 2020-06-22T19:47:30. Then the last 12 bytes are -22T19:47:30, and the string used as the IV for AES-GCM comprises just the day of the month and the time. Trivially, the IV is reused if multiple messages are sent within the same second, or if messages are buffered or batched. Further, the IV -22T19:47:30 repeats at 19:47:30 on the 22nd of each month, leading to monthly IV reuse. We describe this forbidden attack formally in Appendix A. Briefly: because of IV reuse, authentication requests can be altered over the channel, so an adversary can exchange their invalid biometrics with valid data and still authenticate. (They cannot recover keys, but we want to protect the data, not just the keys.) As a consequence, a malicious party can open a bank account, fly domestically, get a SIM card, etc., all in someone else’s name.
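The IV derivation quoted above can be reproduced in a few lines. The following is a minimal sketch in Python, assuming only the timestamp format and the "last 12 bytes" rule stated in [54]; the helper names `format_ts` and `iv_from_ts` are ours and do not come from any Aadhaar code base.

```python
from datetime import datetime, timedelta


def format_ts(t: datetime) -> str:
    """Format a capture time as YYYY-MM-DDThh:mm:ss, the ts format quoted from [54]."""
    return t.strftime("%Y-%m-%dT%H:%M:%S")


def iv_from_ts(ts: str) -> bytes:
    """Hypothetical helper: take the last 12 bytes of the ts string as the IV."""
    return ts.encode("ascii")[-12:]


june = datetime(2020, 6, 22, 19, 47, 30)
july = datetime(2020, 7, 22, 19, 47, 30)      # same day-of-month and time, one month later
next_second = june + timedelta(seconds=1)

iv_june = iv_from_ts(format_ts(june))
iv_july = iv_from_ts(format_ts(july))
iv_next = iv_from_ts(format_ts(next_second))

print(iv_june)             # b'-22T19:47:30'
print(iv_june == iv_july)  # True: year and month are truncated away
print(iv_june == iv_next)  # False: the IV changes only once per second
```

The year and month never reach the IV, so two requests captured in the same second, or at the same day-of-month and time in different months, map to the same 12 bytes.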

Benchmarking. Using data published by the Govt. of India [55], we estimate how many times AES-GCM is used for encrypting requests. One source of such requests is the Authentication API; the other is e-KYC, which also uses AES-GCM in exactly the same way [45]. Between October 2016 and September 2019, 7.9 billion requests were made for e-KYC; on average, \(\sim 83\) requests fell within each second, i.e., the IV was reused \(\sim 83\) times per second. Consequently, the malleability of the encrypted plaintext becomes a major security issue, and hence, all chosen-ciphertext attacks become feasible.
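The per-second figure follows from simple arithmetic. The sketch below assumes the reporting window runs from 1 October 2016 up to (but not including) 1 October 2019, and that requests are spread uniformly over time; both are simplifying assumptions of ours, and the variable names are illustrative.

```python
from datetime import date

E_KYC_REQUESTS = 7.9e9            # e-KYC request count quoted in the text

start = date(2016, 10, 1)         # assumed start of the reporting window
end = date(2019, 10, 1)           # assumed (exclusive) end of the window

window_seconds = (end - start).days * 24 * 60 * 60
requests_per_second = E_KYC_REQUESTS / window_seconds

print(window_seconds)                  # 94608000
print(round(requests_per_second, 1))   # 83.5
```

Since the IV has one-second granularity, the average number of requests per second is also the average number of requests sharing one IV value. Real traffic is bursty, so peak reuse would be higher than this average.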

Mitigation. The IV for AES-GCM is 96 bits (12 bytes) long, and we need to prevent IV reuse. Currently, the IV is of the form -22T19:47:30 (day of month and time). In this format, the IV takes fewer than \(2^{22}\) different values (since the day ranges over 1–31, the hour over 0–23, and the minutes and seconds over 0–59 each). If a simple counter were used instead, it could take values in the whole \(2^{96}\) space (as the IV length is 96 bits). However, keeping such a counter synchronized across 30 million devices proved impractical, and so the Aadhaar team decided (see Footnote 4) to use timestamps as IVs, since this information is available on all devices. To mitigate this attack, all AES-GCM communications now occur over secure channels with unique session keys. This prevents the attack from being exploitable. Note that the UIDAI encrypts all communications and storage across Aadhaar. UIDAI policy is to use RSA [32] with 2048-bit keys for public-key encryption and AES with 256-bit keys for symmetric-key encryption [47].
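The size of the timestamp-derived IV space can be checked directly. This is a small sketch of the count described above, assuming every day-of-month from 1 to 31 and every hour, minute, and second value can occur; the constant names are ours.

```python
import math

DAYS, HOURS, MINUTES, SECONDS = 31, 24, 60, 60   # ranges listed in the text

timestamp_ivs = DAYS * HOURS * MINUTES * SECONDS  # distinct day-and-time strings
counter_ivs = 2 ** 96                             # values a full 96-bit counter could take

print(timestamp_ivs)                       # 2678400
print(round(math.log2(timestamp_ivs), 2))  # 21.35
print(timestamp_ivs < 2 ** 22)             # True
```

In other words, the timestamp format uses a little over 21 of the 96 available bits, which is why repeats are unavoidable at Aadhaar's request volume.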

5.3 Privacy Issues

Aadhaar’s policy for logging requests and responses creates two issues: (1) the privacy of registered individuals in the event of a breach; and (2) the possibility of surveillance. The logs are rich spatio-temporal data on almost everyone in India. Obviously, a leak would be catastrophic if the data is not anonymized; but even “anonymized” spatio-temporal data can be used to uniquely identify a very large fraction of the individuals, as demonstrated by de Montjoye et al. [28]. Therefore, the use of virtual IDs (see Sect. 3.2) is essential. However, existing documentation is ambiguous as to whether virtual IDs are used by default for authentication requests. Further, while all communication of Aadhaar’s biometric templates is end-to-end encrypted, the templates remain vulnerable to social engineering attacks and the like at ECs; the privacy loss inherent in the storage of biometric templates for a national ID is beyond the scope of this work.

Non-KYC operations should not reveal anything beyond verification (yes/no). If an entity has knowledge of columns \(a_1, a_2, \ldots, a_k\) of a person’s Aadhaar information, they should not be able to gain knowledge of the \((k+1)\)-th column \(a_{k+1}\), including by brute force, i.e., by checking against the same column multiple times (given someone’s name and phone number, an entity should not be able to query multiple times with different dates of birth). Services using aggregated data must be differentially private. The work of Wilson et al. [68] focuses on this approach and provides extensive theoretical and practical analysis. This gives a scalable method which is generic enough to apply to all national ID systems, including Aadhaar. Extensive data logging for almost a decade means that such a system can very easily be used to track registered individuals. Differentially private (DP) anonymized logs can be used to protect against such tracking. While such logs and streams have been studied in some detail [16, 23], it remains to be seen if such proposals would be feasible at this scale (see Sect. 5.2). The closest (in scale) DP system is the recent work of the US Census Bureau [17], which shows that DP is not a one-size-fits-all solution [19]. Aadhaar is meant to ensure the targeted delivery of benefits and services to Indian residents. Verification of a resident’s existence to receive a service must not leak personally identifiable information. Another way to mitigate privacy concerns is brokered identification [11]. Here, a centralized hub mediates communication between an identity authority and a user with identity credentials. The US FCCX [2] and GOV.UK Verify proposed using this, but were unable to ensure all the properties required (see [11]). Using such a mechanism would mitigate the possibility of surveillance using Aadhaar authentication requests.
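To make the requirement that aggregated data be differentially private concrete, the following is a minimal sketch of the textbook Laplace mechanism for a counting query. It is a generic illustration, not the method of Wilson et al. [68] or anything deployed in Aadhaar; the function names, the example count, and the choice \(\varepsilon = 1\) are all assumptions of ours.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-DP; one person changes a count by at most 1."""
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
# Hypothetical aggregate: number of authentications at one service in one day.
noisy = dp_count(true_count=1000, epsilon=1.0, rng=rng)
print(noisy)
```

Smaller values of \(\varepsilon\) add more noise and give stronger privacy. Each released statistic consumes part of a privacy budget, which is one reason feasibility at Aadhaar's scale remains an open question.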

6 Media Allegations Analysis

Filtering Legitimate Breaches. Our primary database of media allegations consists of 36 reports from various news outlets. We filter breaches that are “legitimate” based on our knowledge of the Aadhaar infrastructure and our definitions of security and privacy. This yielded 17 legitimate security breaches and 10 privacy breaches, which were further analyzed. (Security and privacy breaches are not mutually exclusive.) Additionally, for each legitimate security breach, we ascertain whether or not there was a breach of Confidentiality, Integrity, or Availability of data in the Aadhaar infrastructure. (See Table C in supplementary material Appendix C.) According to our analysis, the most prevalent breach is of confidentiality; this usually entails a subset of Aadhaar data being made public. Prevention goes back to ensuring that data is secured in encrypted “data vaults” and access is limited. Breaches of integrity are also common; they compromise the quality of the central database. They typically occur at an individual level, involving a small set of rogue insider-agents or the hacking of individual accounts. Such a breach is easily detected if performed repeatedly, but for a specific use case, such as introducing certain individuals into the database, it is virtually undetectable. OTP-based security, standardized punishments, and closing some known structural gaps could mitigate this. Breaches of availability are rare and occur only in cases of insider attacks. The CIDR itself is reasonably secure, and removing or editing information is hard to do illegally. Internal attacks can be mitigated by using a decentralized system of checks and balances where no individual can commit edits alone [14]. For example, all operations by high-level employees could require approval by randomly chosen officers (anonymously).

Attack Analysis. We define three broad classes of attacks: (1) Server compromise: Hacking of the UIDAI or partner software/database. (2) Infrastructural loopholes: Access via legitimate UIDAI channels. (3) Sub-par hardware: UIDAI hardware tricked into approving false biometrics as genuine due to flaws or backdoors. We analyze the feasibility of attacks based on the cost (time and resources used) and the effort required to protect against them. We then suggest mitigation strategies to ensure robust security. A detailed breakdown of our examination is provided in Appendix C. Aadhaar is predominantly vulnerable to “Infrastructural loopholes.” These breaches exploit general negligence in setting or adhering to security protocols. As discussed, agents of Aadhaar, especially EOs, can effectively be a threat to the security of the database if their credentials are not stored properly (multiple instances of this have occurred). This kind of breach is detected often, but measures taken to curb it are seemingly nonexistent. Complementary and robust security standards, like OTPs and iris scans for these Aadhaar agents, may be effective in ensuring accountability. The CIDR database is secure and there exist no reports of it being hacked, but data held by UIDAI’s partner organizations is regularly stored insecurely. We recommend that the UIDAI set stricter standards and enforce them across the board. No one should store any Aadhaar data except the CIDR. Any queries to the database should go through the CIDR, and local copies should not be stored.

Privacy Breach Analysis. Listing the various allegations of Aadhaar privacy breaches, we find that limited access to the database and illegal or insecure storage of Aadhaar information are common. These are primarily due to improper or inefficient handling of data by UIDAI’s partner organizations. We summarize the number and type of privacy breaches in Appendix C of the supplementary material. In either case, the pivotal issue is that an individual can be identified, resulting in the misuse of their data by malicious actors. This can include surveillance, profiling, or the creation of new services (without consent) by the state or private actors. Most security breaches happen within the Enrollment Ecosystem; privacy breaches largely appear in the Authentication Ecosystem. For Aadhaar to be effective in the targeted delivery of subsidies, it needs to ensure that resident data is private beyond enrollment. If organizations require Aadhaar data to analyze aggregated trends, we strongly recommend that differentially private systems be used.

7 Conclusion

We analyze Aadhaar, the world’s largest digital biometric identification system, and provide the first detailed, unified description of the infrastructure. We conclude that the framework does not have glaring security flaws of the kind suggested by media reports. Almost all the issues we found were due to a set of challenges unique to a system at Aadhaar’s scale. While we discussed mitigations for any flaws we found, we did not make any policy recommendations in this paper: if we had to make one, it would be for the system to be significantly more transparent and open-source. Throughout its lifetime, Aadhaar has been subject to multiple allegations that have made national headlines in India. We list, analyze, and classify these allegations to allow for a more balanced view of Aadhaar, identifying which ones are likely to be legitimate. (We note that most of the alleged attacks are now infeasible.)

We emphasize that our focus remained on the strengths and vulnerabilities of the technology, structure, and policy behind Aadhaar, and not issues with large-scale biometric ID schemes in general.