Meet the women from India's villages supporting the country's AI boom

India has long been a data powerhouse feeding the world's tech giants. With the recent surge in demand for training AI models, thousands of people from rural areas are stepping in to fuel the Indian LLM push.

Monday, September 02, 2024, 8 min read

Sitting on the small steps of her home in a village in Odisha, 18-year-old Chandrama smiles as she stares at her mobile phone, which is playing back her own voice speaking Odia, her mother tongue.

She lives in Raghurajpur, the home of Pattachitra painters. Despite the widespread fame of her village’s cultural heritage, many families, including hers, earn less than Rs 996 a month on average from the art form. 

However, Chandrama is an exception. She earns Rs 4,980 a week, just by reading text aloud in her native language. 

“My family is going through a lot of pain to pay for me to go to college. When I finish my studies, I want to return that favour. I want to support them. I hope others can also get assistance from such a project,” Chandrama tells YourStory. 

The project she is referring to is one of several catalysing the AI boom in India.

While models like OpenAI’s ChatGPT and Google’s Gemini are undeniably well-known names by now, startups in India are going a step further to build more localised solutions to solve real-world problems. 

Bigger tech companies are relying on datasets consisting of voice and text, signalling a huge demand to build AI applications. But India in particular holds a very powerful asset: Indic languages. 

AI models built in Indic languages have never been more prominently in the spotlight than in recent years. Two years ago, Meta AI built a single AI model, NLLB-200, which translates across 200 different languages. 

But where is this data sourced from, and how can a nation like India, being home to over 1,600 languages and dialects, effectively leverage this asset? 

Empowering the rural workforce

Chandrama works for Bengaluru-based non-profit organisation Karya, which calls itself ‘the world’s first ethical data company’ and helps create datasets in native languages. By partnering with other NGOs, it identifies and trains workers from Tier II and Tier III cities. The organisation claims to pay 20 times more than the minimum wage for data collection and annotation.

Chandrama Swain from Raghurajpur, a heritage crafts village located in Puri district, Odisha.

This may not sound like much, but it is a life-changing amount for people like 19-year-old Moumita Saha Das, from a rural town near Kolkata. Each evening, as she leads her buffaloes to a nearby pond, Moumita pulls out her phone and continues her digitising work, a task that has become crucial to her daily life.

Every day, she wakes up at 4 am to start her digitising tasks before heading to school. Evenings are spent managing both her rural duties and her digital tasks.

"I currently don’t have a job. I was able to pay the bills for my home. This work takes less time and energy than manual work, and the pay is much better. I now feel that I am capable of doing any new work. I didn’t feel like that before," Moumita tells YourStory.

With the recent upsurge in hiring for roles focused on training AI models, thousands of people from rural India are stepping up to drive the next wave of the AI revolution in the country.

Both Chandrama and Moumita are part of this growing movement, which is making datasets in native languages a crucial resource.

“We [Karya] train workers in rural India to become subject matter experts. Our experts undertake a range of activities that include building multimodal Indic language speech datasets, performing human-in-the-loop tasks, mitigating bias and conducting culturally-sensitive LLM (Large Language Model) evaluations, and overall contributing to the development of AI models in a range of regional languages across India,” Manu Chopra, CEO of Karya, tells YourStory.

To join, the criteria are quite simple: workers must possess a smartphone and basic tech skills, with no requirement of advanced education. The initiative specifically targets those who are excluded from traditional job markets due to limited digital skills, socio-economic constraints, or geographic isolation.

This year, Karya aims to create AI-based digital jobs for 100,000 people in India. But it isn’t the only player driving this boom in rural employment.

Homegrown AI startups

Another startup working towards bringing regional language compatibility to the Indian GenAI ecosystem is Sarvam AI, which raised $41 million just five months after inception. This Bengaluru-based startup has one goal: building large language models that support Indian languages. 

In a bid to support the needs of the Indian market, the firm recently launched several products including enterprise and open-source solutions, as part of its full-stack Generative AI (GenAI) platform. 

The products are voice-enabled and support ten Indian languages: Hindi, Tamil, Punjabi, Odia, Gujarati, Telugu, Malayalam, Marathi, Bengali, and Kannada.

Elsewhere, AI startup Gan.AI has recently launched Myna-mini, a new text-to-speech (TTS) model which supports 22 official Indic languages and English. Its key features include diverse voices from different regions of India and support for code-mixed languages.

“We have a large in-house data collection and annotation team, and we also work with outsourced data collection agency partners who collect and record audio datasets for us across languages,” says Suvrat, Founder & CEO, Gan.AI.

This kind of regional language support has use cases in several sectors, ranging from agriculture to healthcare. For example, E-Bhaasha Setu, an IIIT-Hyderabad-incubated firm, provides support in translating patient consent forms and information sheets into Hindi, Malayalam, and Telugu, among other languages. The platform, in turn, relies on several freelancers to train its models.

AI and the Indian gig economy

Online labour platforms, sometimes referred to as online outsourcing, crowdwork, or gig platforms, have enabled workers to engage with multiple clients on flexible schedules, rather than being tied to a single full-time employer.

The Indian gig workforce is projected to grow to 23.5 million by FY30, up from 7.7 million in FY21, says a NASSCOM report.

Within this workforce, AI and automation skills (53.57%) will be the most critical for gig workers in the next five years, followed by advanced technical skills (21.43%) and sustainability practices (14.29%), says a report by TeamLease.

This increasing demand for specialised AI roles is evident from the recent surge in freelance positions for “AI Tutors” or “GenAI Speech Trainers”, which have been listed on LinkedIn and other job portals. 

“Gig space, especially in the AI sector, is actually picking up. It is very important for platforms like us, to start adopting and bring those things to the forefront in terms of bringing employment to the workforce,” says Mahesh Kumar, Co-founder of online recruitment portal Gigin.

He believes that training AI models is not the only kind of opportunity, and that there is room for more.

“AI-related work can be categorised into four different roles. One is the AI-enabling services, which deal with data annotation. The second is validating the correctness of AI decisions, whether it’s vision images or summary of text. The third is evaluating the outcome that AI gives,” he explains.

The last is building the AI itself, work done mostly by high-skilled workers who build machine learning models.

Swarnalatha Nayak, from Raghurajpur, Odisha, earned $60 in her first week at Karya.

In addition to boosting employment in rural areas, various Indian companies are also advancing the gig economy by offering opportunities in AI model training.

Jaideep Kewalramani, Head of Employability and Chief Operating Officer at TeamLease EdTech, says that AI is driving growth, particularly for those unable to work traditional jobs or lacking formal education. TeamLease helps facilitate AI development in collaboration with other organisations.

“The space will end up giving employment, or self employment to nine crore people and collectively contribute about 1.25% to the overall GDP. It is a great opportunity for two distinct groups which typically the mainstream sector would have ignored,” he explains.

Literacy and technology access have long impeded individuals like Chandrama and Moumita from thriving in Indian society, with language barriers exacerbating the challenge.

With LLMs and voice translation, powered by a rural workforce, India's AI boom is on the verge of a breakthrough that could empower millions — by making use of an asset that is intrinsically theirs. 

However, data scarcity is still a significant hurdle, says Vinod Sankaranarayanan, Head of Digital Public Goods and Infrastructure, Thoughtworks, which has partnered with multiple organisations like Karya before.

“Building a robust AI model requires high-quality datasets, which were particularly scarce for many Indian languages. Additionally, the vast linguistic diversity of India, including numerous dialects and regional variations, posed challenges in ensuring accurate and contextually relevant translations,” says Sankaranarayanan. 

While Karya’s core model is meant only to create a pathway for supplemental income through data work, Aakanksha Gulati, CEO of ACT, believes the impact of such platforms on long-term skilling and upward financial mobility cannot be ignored.

"In a time when we’re seeing millions of people migrate to our already overcrowded cities for work opportunities, platforms like Karya can play a critical role in providing meaningful work-from-home livelihoods in rural India," Gulati says.

"As they financially contribute to the household, we’re finding that families associated with Karya tend to become more progressive. The women in turn are seen to gain more voice in the household, and most importantly, garner the confidence and agency to, over time, invest in their upskilling and graduate to more financially rewarding and long-term career opportunities," she adds.

For now, these jobs are letting the rural workforce have their little wins. From supporting their families to helping further their children’s education, AI gig work is helping many people like Chandrama and Moumita fulfil personal milestones.

Ramola Didi, a former construction worker from Soda, Rajasthan, shares what this work has meant for her.

“With the money I earned [through Karya], I was able to pay my children’s school fees and even fulfil their wishes. If I get such an opportunity again in the future I will do the work very happily,” she says. 

(The article has been updated)


Edited by Jyoti Narayan