Avalanche

Experts call for fast, low-latency compute infrastructure to unlock India’s AI future

Indian companies are discovering that low latency, not larger models, is now the frontier of AI. This roundtable, featuring Groq alongside leaders from fintech, commerce, mobility and FMCG companies, looked at what it will take to get there.

Friday, December 19, 2025

Over the past year, AI has seeped into nearly every corner of Indian enterprise. But as adoption widens, the industry is confronting a harder truth: the next leap forward will come not from smarter models, but from faster ones.

Whether in customer service, commerce, education, mobility, or fintech, companies now need AI that can respond instantly, operate across dozens of languages, and remain economically viable at national scale. Real time has become the new frontier.

This inflection point formed the backdrop for an exclusive closed-door roundtable at TechSparks 2025 titled ‘Faster than Now: Building India’s Real-Time AI Future’. Moderated by Chinmaya Saxena, Partner at BeeNext, the discussion convened technology leaders, including Ashish Dhagat (Hector Beverages), Ashish Jain (Lead School), Vaibhav Magon (Acko), Sarat Buddhiraju (BigBasket), Gaurav Jain (InCred Finance), Aniruddh Jain (Cars24), Nikhil Bhat (M2P), Ajit Kumar (BharatPe), and Narayan Babu (Zeta), along with Groq’s Scott Albin and Mehul Patel. Together, they explored what it would take to operationalize real-time AI across India’s most demanding sectors.

The demand for faster, smarter, multilingual AI

A recurring theme throughout the conversation was the rise of voice as India’s primary interface. With users across the country preferring calls, vernacular interaction, and conversational experiences, real-time responsiveness becomes essential. Participants noted that even a half-second pause can make a bot feel unnatural, while a one- or two-second delay typically leads to immediate call drops. Voice bots have grown sophisticated enough that most users cannot distinguish them from human agents, but only when latency stays imperceptibly low. That fragility makes speed the defining factor for voice-led AI in India.

India’s linguistic diversity adds another layer of complexity. While Hindi and a few northern languages perform reasonably well on existing models, accuracy dips sharply for southern and eastern languages, and even more so for dialects spoken in smaller towns and rural markets. This impacts everything from loan servicing to education support, where language precision directly shapes trust, outcomes, or completion rates. Solving for latency without solving for language, several participants noted, would leave real-time AI half-built.

Another emerging frontier is intent-based search. Ecommerce and quick-commerce businesses are seeing a rise in long-tail queries, such as “healthy snacks for office”, “gifts for a diabetic father”, and “ingredients for Pongal”, that require contextual understanding rather than simple keyword mapping. Large language models (LLMs) can decode these intents effectively, but the economics don’t work. With search volumes running into thousands per second, routing even a fraction of them to an LLM means hundreds of thousands of tokens per second. Coupled with multi-second inference times, the approach becomes unviable in production. The group agreed that only ultra-low-latency, low-cost models can make LLM-powered search mainstream in India.
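To make the search arithmetic concrete, here is a rough back-of-the-envelope sketch. The specific inputs (5,000 searches per second, a 10% share routed to an LLM, and 500 tokens per query including prompt and response) are illustrative assumptions, not figures quoted at the roundtable, and the function name is hypothetical.

```python
def llm_tokens_per_second(searches_per_second: float,
                          llm_fraction: float,
                          tokens_per_query: float) -> float:
    """Estimate the aggregate token throughput an LLM tier must sustain.

    searches_per_second: total search queries arriving each second
    llm_fraction: share of those queries routed to the LLM (0 to 1)
    tokens_per_query: prompt plus completion tokens per routed query
    """
    if not 0.0 <= llm_fraction <= 1.0:
        raise ValueError("llm_fraction must be between 0 and 1")
    return searches_per_second * llm_fraction * tokens_per_query


# Illustrative assumptions only; none of these numbers come from the article.
SEARCHES_PER_SECOND = 5_000   # "thousands per second"
LLM_FRACTION = 0.10           # "even a fraction" of queries
TOKENS_PER_QUERY = 500        # assumed prompt + response size

load = llm_tokens_per_second(SEARCHES_PER_SECOND, LLM_FRACTION, TOKENS_PER_QUERY)
print(f"{load:,.0f} tokens per second")
```

Under these assumed inputs the estimate lands at roughly 250,000 tokens per second, which is the order of magnitude the participants described. Changing any one input scales the result linearly, which is why both the routed share and the tokens used per query matter so much to the economics.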

Balancing regulation, risk, and real-time expectations

For fintech players, the most pressing constraint is data residency and compliance. Personal and transactional data cannot leave India, not even momentarily for inference. As India strengthens data protection norms, banks and NBFCs must assume an even tighter interpretation of cross-border data flows. This limits adoption of AI in areas like credit decisioning, collections, fraud detection, and merchant servicing, regardless of model performance. A shared view emerged that the future of real-time fintech AI depends on secure, India-based sovereign compute that meets regulatory requirements while delivering near-instant responses.

While speed is important, participants pointed out that not all AI must be real time. Intelligent document processing, underwriting summaries, and KYC validation can remain near-time or batch-driven without sacrificing value. But each company identified at least one use case, such as voice negotiations, personalised recommendations, real-time nudges for field agents, or instant quality checks, where sub-second AI would directly boost revenue, reduce risk, or improve customer satisfaction. This real-time versus near-time versus batch distinction is now shaping enterprise architecture decisions.

Infrastructure: The missing link

A live demonstration by Groq brought the infrastructure debate into sharper focus. Using its LPU-based architecture, the demo showcased rapid time-to-first-token, sub-200 millisecond responses, and significant cost advantages over GPU-based inference. Discussions of 5x speed gains and as much as 90% cost reductions at scale generated palpable interest across sectors where unit economics determine feasibility. But beyond raw performance, participants highlighted predictability as the true differentiator: consistent latency, stable output formats, and the ability to run custom or fine-tuned models without constantly reworking internal systems.

What India needs to unlock real-time AI

By the end of the session, the group converged on a shared blueprint for India’s real-time AI evolution:

• Latency must drop below perceptible thresholds, especially for voice-first experiences.

• Token economics must suit high-volume Indian use cases, making smaller, fine-tuned models increasingly critical.

• Language support must expand meaningfully, covering more Indian languages and dialects.

• Data residency must be solved through robust local computation and compliant deployments.

• Real-time AI must deliver business value, not just automation, whether through better conversions, fraud reduction, faster decisions, or superior user experience.

Despite the hurdles, participants agreed that India’s complexity makes it the ideal testing ground for real-time AI. If technology can deliver high-speed, multilingual, low-cost AI at India’s scale, it can perform anywhere. The conversation closed with a broad agreement that India’s next wave of AI progress will depend on how quickly real-time systems mature.

(Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the views of YourStory.)
