Who builds AI matters: why women must be moved from the margins to the centre
The same women whose hands are shaping AI models are largely absent from the data those models learn from.
In February, India hosted its most ambitious AI summit yet, at a scale meant to demonstrate its seriousness. Sovereign AI has been declared a national priority. Yet women remain largely invisible in the rooms where these decisions are made, despite their significant contributions to AI.
Women make up only 22% of the global AI workforce and just 12% of AI researchers worldwide (UNESCO, 2022). In senior leadership, the share falls below 14% (World Economic Forum, 2023). These numbers are not a pipeline problem that will fix itself. They determine which questions get asked at the design stage, which populations are treated as default users, and which trade-offs are considered acceptable.
Here is something that rarely gets said clearly: women are building AI. Mira Murati was CTO of OpenAI. Daniela Amodei is President and co-founder of Anthropic. Across labs and research institutions, women are doing frontier technical work. But they are almost never the faces of these organisations. Their names do not become shorthand for the field. That invisibility sends a signal. When young women cannot see themselves in the AI story, they do not imagine themselves in it.
India has its own version of this story, and it is more layered than most people realise. In villages across Jharkhand, Telangana, and Tamil Nadu, women from tribal and first-generation educated backgrounds are contributing to AI by doing the painstaking work of labelling images, annotating text, and correcting data, one tag at a time, thereby helping train models.
India supplies roughly half the world’s data annotation workforce, and women make up the majority of these workers (NASSCOM; Scry AI, 2024). One annotator in Jharkhand tends crops by day and labels data by night. Another in Tamil Nadu, with a master’s in mathematics, labels road scenes for autonomous vehicles from a small-town office and told AFP: “Being here in my hometown and learning about AI makes me feel very proud.”
Here is the paradox that should bother us. The same women whose hands are shaping these models are largely absent from the data those models learn from. They speak in dialects that are not in any training corpus. They access services through interfaces not designed for them. They describe health symptoms, crop failures, and loan needs in languages and registers that global AI has never been trained to understand. They are inside the machine and yet invisible to it.
That invisibility is not accidental. The training sets powering most large AI models were built primarily from English-language internet content generated by a narrow demographic slice. Women’s speech patterns, healthcare needs, agricultural knowledge, and social contexts are systematically absent from what these systems are taught to recognise as normal.
For India, the gap is wider still. Consider a woman in rural Rajasthan accessing a government scheme through a voice interface, a Dalit woman entrepreneur in Bihar applying for a loan, or a tribal farmer’s wife in Odisha describing a crop disease in her dialect. None of these voices shaped the systems that will increasingly shape their lives.
At the same time, India is showing that a different path is possible, and that taking it is not just the right thing to do but a strategic advantage. Sarvam AI, selected under the IndiaAI Mission, unveiled two foundational models at the summit, trained on 22 Indian languages, optimised for voice, and built to run in low-bandwidth environments. Gnani.ai launched Vachana, a voice cloning system supporting 12 Indian languages. The Tata AI Sakhi programme brought 1,553 women artisans and entrepreneurs from Jharkhand, Bihar, Odisha, Rajasthan, and Gujarat to the summit itself, where they worked hands-on with AI tools.
What these initiatives have in common is that they treat the Indian context as a feature rather than a constraint. A model that works in Bhojpuri and Gondi, that understands how a rural woman describes a medical symptom or a crop disease, that functions on a basic smartphone with patchy connectivity, serves a billion people that no Western AI currently reaches. India does not need to out-spend the US or China on AI. It needs to out-contextualise them.
Women, who represent half of that underserved billion, are not a demographic to include as an afterthought. They are the market, the use case, and the design requirement.
What India now needs is to act on that proof. Four things, specifically.
First, Indian AI datasets must reflect Indian reality, disaggregated by gender, language, geography, and income. The women doing annotation work in smaller towns are a national asset. Make their knowledge visible in the training data, instead of leaving their labour invisible in the supply chain.
Second, any AI system used in healthcare, credit, welfare, or employment must pass a mandatory impact assessment before deployment and an independent audit after. If it is not inclusive and accessible across gender and language groups, it does not scale.
Third, the smartphone and digital access gap is an AI infrastructure problem. Only 35% of Indian women own smartphones, compared with 55% of men. That gap determines who is in the training data and whose needs future models learn from. Closing it is not a social programme. It is a prerequisite.
Fourth, the funding pipeline needs a direct fix. Women-led startups raised less than 9% of all startup capital in 2024 and account for just 7.5% of all active startups (Tracxn; DealStreetAsia, 2025). Nearly half of India’s science PhD students are women, yet fewer than 20% become working scientists (Indian Academy of Sciences). The IndiaAI Mission needs an explicit mandate: fund women-led AI research and publish the gender breakdown of recipients.
Following the model of Lakhpati Didi and Drone Didi, the government could back an AI Didi programme.
India is early enough in building its AI stack to make deliberate choices. The women annotating data in Jharkhand, the voice engineers at Gnani, and the artisans at the Tata AI Sakhi programme are already part of this story. The task now is to move them from the margins to the centre.
Who builds AI is not a diversity question. It is a design question, and the answer lies at the intersection of policy, infrastructure, datasets, talent, and capital. Right now, women are missing from all five.
(Sarika Bhattacharyya is VP – Institutional Advancement, Plaksha University and Prof Rajesh Sharma is Program Chair – CS & AI at Plaksha University)
(Disclaimer: The views and opinions expressed in this article are those of the authors and do not necessarily reflect the views of YourStory.)

