Meet Aman Chadha, the AI Expert Building Indic LLMs in Partnership with IITs
Explore the synergy of AI and healthcare through the development of Indic LLMs, opening doors to accessible and culturally inclusive medical AI
The development of Indic Large Language Models (LLMs) in collaboration with Indian Institutes of Technology (IITs) marks a significant advancement in AI technology tailored to Indian languages and contexts. Leading this groundbreaking initiative is Aman Chadha, a Stanford University alumnus and head of the generative AI research team at AWS. Chadha's primary focus is on building India's first medical LLM that supports Hindi and various other Indic languages. This project, in partnership with IIT Patna, aims to fill the gap in medical AI models that cater specifically to Indic languages, a space currently dominated by models such as Google's Med-PaLM, which primarily focus on English.
Chadha's approach involves using OpenHathi as the base LLM, which is then fine-tuned on medical data in Indic languages. This customisation process is challenging due to the complexity of medical jargon. The project also emphasises the importance of patient privacy and focuses on using anonymised data. A major hurdle in this development is the scarcity of computational resources in India, which are crucial for training these advanced models. Despite these challenges, Chadha remains optimistic, believing that constraints can foster innovative solutions.
The significance of this project extends beyond the medical field. It reflects the broader movement towards creating AI models that are culturally and linguistically inclusive. This inclusivity is critical for preserving diverse languages and making digital services more accessible in local languages. Last year, the Indian government launched the Bhashini project, aiming to develop technologies for translating content across Indian languages and crowd-sourcing voice datasets to enhance digital service accessibility.
Educational institutions like IISc and IIT Madras, as well as companies like Microsoft, are also involved in building datasets for Indic languages. However, challenges remain in terms of data scarcity and fragmentation, especially for languages other than Hindi. Tech Mahindra, for instance, is actively working on Project Indus, sourcing information from various platforms to build datasets for these languages. The company is also addressing potential biases in AI models by using a combination of human annotation and automatic techniques.
The development of Indic LLMs is a step towards democratising AI technology, making it more relevant and accessible to a broader section of the Indian population. It also highlights the need for more government funding and support in building AI ecosystems that cater to diverse linguistic and cultural backgrounds.