
A CARE-driven approach to Indian genomics

Pragya Chaube

On 9 January 2025, Prime Minister Narendra Modi unveiled the Genome India Project, a groundbreaking initiative that has sequenced 10,000 genomes from 83 distinct population groups across India’s vast demographic landscape. This landmark achievement is a significant step toward a health genomics revolution in the country.

Genomics data. Photo Credit: NHGRI

Human health-focused genetics and genomics research relies heavily on harnessing genetic diversity to identify genetic variants that influence disease susceptibility or resilience. This knowledge enables accurate diagnosis, prognosis, the development of novel therapeutics, and the tailored application of precision medicine. Unfortunately, non-European ancestry groups are severely underrepresented in genomic databases globally. A 2022 study revealed that participants of European ancestry accounted for 86.3% of genome-wide association studies (GWAS) — a method for identifying genetic markers linked to diseases or traits — while South Asian populations contributed a mere 0.8%. Expanding representation in genomics databases is crucial to ensuring biomedical research and precision medicine are inclusive and beneficial for Indian populations.

The Genome India Project marks an important first step in this direction but remains far from comprehensive. Programs like the USA’s All of Us initiative aim to enroll at least 1 million participants, with an emphasis on historically underrepresented populations, which amounts to about 0.3% of the national population. Similarly, the UK’s 100,000 Genomes Project represents about 0.15% of the country’s diverse population. By comparison, the 10,000 genomes in Genome India cover only approximately 0.0007% of the population and about 2% of India’s documented 4,600 population groups, leaving significant gaps in representation that must be addressed for the project to achieve its full potential.
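
The coverage comparison above is simple arithmetic, and a short sketch makes it easy to re-check. The cohort sizes come from the text; the national population figures are rounded assumptions introduced here for illustration and are not taken from the article.

```python
# Approximate national populations. These are rounded ASSUMPTIONS for
# illustration only; substitute current census estimates as needed.
POPULATION = {
    "India": 1_430_000_000,
    "USA": 335_000_000,
    "UK": 67_000_000,
}

# Cohort sizes as stated in the article.
COHORT = {
    "India": 10_000,     # Genome India
    "USA": 1_000_000,    # All of Us enrolment target
    "UK": 100_000,       # 100,000 Genomes Project
}


def coverage_percent(cohort: int, population: int) -> float:
    """Share of a population covered by a cohort, as a percentage."""
    return 100.0 * cohort / population


if __name__ == "__main__":
    for country, cohort in COHORT.items():
        share = coverage_percent(cohort, POPULATION[country])
        print(f"{country}: {share:.4f}% of the population")

    # 83 sampled groups out of roughly 4,600 documented population groups.
    print(f"Population groups covered: {coverage_percent(83, 4_600):.1f}%")
```

Under these assumed populations, the shares come to roughly 0.3% for the USA, 0.15% for the UK, and 0.0007% for India, with about 1.8% of documented population groups sampled.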

The genomic data generated through the Genome India Project is now centrally housed at the Indian Biological Data Centre (IBDC) and is intended to be made accessible to researchers worldwide. However, India’s unique population dynamics and the absence of robust data protection laws necessitate a cautious approach. Genomic data also differs from other personal data in several critical ways. In addition, global good-practice guidance on data governance includes the CARE framework, which aims to ensure that data is collected and used in a responsible, equitable, and transparent manner, particularly in the context of indigenous data sovereignty: the inherent right of vulnerable communities, especially indigenous peoples, to control, govern, and manage the sharing of their own data. Governance frameworks and privacy guidelines for genomic data must therefore be tailored to address these complexities and must incorporate global standards for data sharing and governance.

Genomics data – why should we be concerned?

Havasupai tribe. Photo Credit: US Department of the Interior


Between 1990 and 1994, researchers from Arizona State University collected DNA samples from the Havasupai tribe to study type 2 diabetes, a condition with high prevalence in the community. Approximately 400 members consented to participate. However, in 2003, a tribe member discovered that the samples had been used in unrelated studies on migration, whose findings contradicted the tribe’s oral history of their origins and posed potential legal risks to their land claims. Evidence also emerged that the samples were intended for mental health research without the tribe’s knowledge, raising concerns about perpetuating stigmatising stereotypes.

In 2004, the Havasupai tribe sued the Arizona Board of Regents, alleging misuse of their genetic data. Researchers argued that broad consent, which allowed subsequent studies, protected them legally. However, this consent was obtained in English, a second language for the tribe, which called its validity into question. After a six-year legal battle, the case was settled in the Havasupai’s favour in 2010, with monetary compensation and the return of the DNA samples.

This landmark case underscored the complexities of genomic data governance, particularly for marginalised communities. Unlike most other personal data, genomic data carries both individual and collective implications. It reveals personal traits, disease risks, and ancestry while also containing information about relatives, ethnic groups, and populations, raising ethical questions about data ownership and consent.

Genomic data’s long-term relevance further complicates governance. Unlike other data, it remains biologically significant throughout a person’s life and across generations, necessitating robust safeguards for its storage and use. 

Moreover, genomic findings often intersect with cultural beliefs, ancestry, and identity, potentially conflicting with traditional narratives or impacting legal claims.

The misuse of genomic data can harm entire communities, leading to stigmatisation or exploitation, particularly for Indigenous populations. Informed consent poses additional challenges, as the potential future uses of data may be unknown at the time of collection. The collective nature of genomic data also means that decisions about one person’s data can affect others without their explicit consent.

Indian population structures – and what needs to be done?

The practices of consanguineous marriage and endogamy (marrying within one’s caste) have created a distinct genetic structure across India. They have limited gene flow between groups, effectively forming “genetically endogamous” populations. Consequently, genomic data from a few individuals can reveal their community, making improper use of such data a potential source of harm or stigma for entire communities, particularly vulnerable indigenous groups. While including these populations in genomics research is essential to extend biomedical benefits, misuse of their data risks perpetuating stereotypes, undermining cultural identities, and causing long-term harm.

To mitigate these risks, a set of robust principles and governance models is required to ensure the ethical and equitable use of genomic data.

Privacy and long-term protection

Strong anonymisation protocols are crucial to safeguard individual, familial, and community privacy. Techniques like pseudonymisation and noise addition can obscure identities while preserving data utility. Privacy protections must be designed for long-term relevance, aligning with the enduring nature of genomic data. Strategies must account for decades, not just the initial years, of data security.
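
The two techniques named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how IBDC or Genome India protects data: the function names, the secret key, and the `epsilon` privacy parameter are all hypothetical. Pseudonymisation is shown as a keyed hash (HMAC-SHA256) of a participant identifier, and noise addition as Laplace noise applied to an aggregate count. Note that pseudonymising an identifier does not anonymise the genome itself, which remains identifying.

```python
import hashlib
import hmac
import random


def pseudonymise(participant_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    Without the key, the pseudonym cannot be recomputed from the identifier.
    """
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Add Laplace noise with scale 1/epsilon to a count.

    Assumes each participant changes the count by at most 1 (sensitivity 1).
    Smaller epsilon means more noise and stronger protection.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical key; in practice it would be held only by the data custodian.
    key = b"hypothetical-secret-held-by-the-data-custodian"
    print(pseudonymise("PARTICIPANT-0001", key))

    rng = random.Random(42)  # seeded only to make the demo repeatable
    print(noisy_count(true_count=37, epsilon=0.5, rng=rng))
```

The trade-off between privacy and utility sits in the noise scale: released counts stay useful for population-level statistics, while the contribution of any single participant is obscured.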

Informed consent and the right to withdraw

Informed consent is fundamental but must go beyond one-time, broad agreements. Consent should be a continuous, iterative process conducted in local languages, and it should be sought afresh for new analyses or secondary use of existing data.

Since genomic data implicates families and communities, consent mechanisms should extend beyond individuals. Additionally, individuals and communities must have the “Right to Withdraw,” allowing for the complete erasure of their data at any point.
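
One way to picture purpose-specific, revocable consent is as a record checked before every use of the data. The sketch below is purely illustrative: `ConsentRecord`, `is_use_permitted`, and the purpose labels are hypothetical names, and it only gates access. Actual erasure of withdrawn data would need a separate deletion process.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class ConsentRecord:
    """Hypothetical per-participant consent record."""
    participant_pseudonym: str
    community: str
    language: str  # language in which consent was explained and obtained
    approved_purposes: Set[str] = field(default_factory=set)
    withdrawn_on: Optional[date] = None

    def grant(self, purpose: str) -> None:
        """Record consent for one specific research purpose."""
        self.approved_purposes.add(purpose)

    def withdraw(self, on: date) -> None:
        """Exercise the right to withdraw: revoke every approved purpose."""
        self.withdrawn_on = on
        self.approved_purposes.clear()


def is_use_permitted(
    record: ConsentRecord, purpose: str, community_approved: Set[str]
) -> bool:
    """A use is allowed only if the individual AND the community approved it."""
    if record.withdrawn_on is not None:
        return False
    return purpose in record.approved_purposes and purpose in community_approved


if __name__ == "__main__":
    record = ConsentRecord("a1b2c3", "community-A", "hi")
    record.grant("type-2-diabetes")
    community_ok = {"type-2-diabetes"}

    print(is_use_permitted(record, "type-2-diabetes", community_ok))
    print(is_use_permitted(record, "migration-history", community_ok))
```

A secondary use such as a migration study fails this check until both the participant and the community have approved that specific purpose, which is the safeguard that broad, one-time consent lacks.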

Community data governance

For indigenous and marginalised groups, community-driven governance is essential. Implementing the CARE Principles (Collective Benefit, Authority to Control, Responsibility, and Ethics) ensures respect for cultural values and data sovereignty. CARE prioritises community empowerment, cultural sensitivity, equitable benefit-sharing, and long-term trust.

Adopting CARE in Indian genomics programs

To integrate CARE into initiatives like Genome India:

  1. Engage communities at every research stage through Community-Based Participatory Research (CBPR).
  2. Ensure communities retain authority over their data.
  3. Develop mechanisms for equitable benefit-sharing, including healthcare and economic opportunities. These deserve particular attention when data is made available to private-sector researchers or shared across borders.
  4. Establish ethical oversight bodies to enforce CARE adherence and to recommend ways of aligning national biomedical research and genomic data policies with the CARE principles.
  5. Implement federated data-sharing models that enable regional centres and universities to manage data locally, with mentorship from institutions like IBDC or IGIB.

By embedding these principles, India can advance genomics research while safeguarding the rights and dignity of its diverse communities.
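
The federated model in item 5 can be illustrated with a small sketch in which individual genotypes never leave a regional node and only aggregate counts are shared. All names here (`RegionalNode`, `pooled_allele_frequency`, the `min_cohort` threshold) are hypothetical, and the sketch assumes a single variant with genotypes coded as 0, 1, or 2 copies of the alternate allele.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RegionalNode:
    """Hypothetical regional centre that keeps participant-level data local."""
    name: str
    genotypes: List[int]  # alternate-allele copies per participant: 0, 1, or 2

    def local_summary(self, min_cohort: int = 5) -> Optional[Tuple[int, int]]:
        """Return (alternate alleles, total alleles), or None if the cohort
        is too small to share without risking re-identification."""
        if len(self.genotypes) < min_cohort:
            return None
        return sum(self.genotypes), 2 * len(self.genotypes)


def pooled_allele_frequency(nodes: List[RegionalNode]) -> float:
    """Combine per-node aggregates; raw genotypes are never exchanged."""
    alt = 0
    total = 0
    for node in nodes:
        summary = node.local_summary()
        if summary is None:
            continue  # node withholds its data
        alt += summary[0]
        total += summary[1]
    if total == 0:
        raise ValueError("no node shared a summary")
    return alt / total


if __name__ == "__main__":
    nodes = [
        RegionalNode("centre-A", [0, 1, 2, 1, 0]),
        RegionalNode("centre-B", [2, 2, 1, 0, 0, 1]),
        RegionalNode("centre-C", [1, 1]),  # below threshold, not shared
    ]
    print(f"Pooled alternate-allele frequency: {pooled_allele_frequency(nodes):.3f}")
```

Because each node decides what summary to release, communities and the institutions that represent them retain authority over their data while still contributing to national-level analyses.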

Conclusion

Indian genomics programs are at a nascent stage, offering policymakers a chance to embed ethical and equitable practices. By adopting participatory governance models and frameworks like the CARE Principles, these programs can empower communities, safeguard cultural values, and ensure sustainable outcomes. Stringent regulations are crucial to protect community interests and enable responsible data use. With the right strategies, India can advance healthcare, foster scientific innovation, and establish a globally recognised model for sustainable and inclusive genomics research.
