Inside India’s AI infrastructure leap as E2E Networks breaks global performance limits
At TechSparks 2025, Mohamed Imran K R explained why India needs sovereign, state-of-the-art infrastructure to scale generative AI for the world.
Most conversations about India's AI ambitions circle around the same talking points: foundation models, LLMs, dataset curation, and responsible AI. But there's a harder, less glamorous problem that kills most AI projects before they see the light of day: infrastructure. Not the kind you provision on a dashboard, but the kind that determines whether your proof-of-concept ever becomes a product anyone can actually use.
At TechSparks 2025, YourStory's flagship tech event, Mohamed Imran K R, Co-Founder and CTO of E2E Networks, took the stage for a keynote on state-of-the-art (SOTA) infrastructure for generative AI, and he was blunt about what's broken. "While there is a lot of interest around how generative AI models are deployed, how they are built, and how they are trained, unless you have very good infrastructure, it cannot be taken to the masses," he says.
Imran has spent 15 years in the cloud infrastructure trenches, and his diagnosis is sharp. The gap between a working demo and population-scale deployment? That's where most Indian AI startups die. Not because their models are weak, but because the infrastructure beneath them can't hold the weight.
Why infrastructure is the true barrier to AI scale
Here's the problem most people miss. Everyone obsesses over model architecture, but models are useless if they can't train fast, can't deploy at scale, and can't serve millions of concurrent users without collapsing. Infrastructure is the moat. It's what separates a research paper from a business.
The journey from POC to product-market fit to global scale requires what Imran calls SOTA infrastructure—state-of-the-art systems that can handle the brutal demands of generative AI without burning through budgets or time.
This isn't just about renting GPUs. It's about network throughput, storage pipelines, utilization rates, security layers, orchestration, and a dozen other variables that, if misconfigured by even a small margin, can turn a 12-day training run into a 20-day money pit.
E2E Networks' breakthrough: beating global benchmarks
Here's where it gets interesting. E2E recently ran the NCCL tests (NCCL, pronounced "nickel", is Nvidia's Collective Communications Library, and its test suite is a standard benchmark for measuring large-scale infrastructure performance) on a cluster of 1,024 H100 GPUs. The goal is to hit 400 gigabytes per second, which is notoriously difficult. Most consider 360 GB/sec a strong result. E2E hit 380 GB/sec. That's not just good; it's better than most published benchmarks from infrastructure providers globally. But the more telling number is model FLOP utilization (MFU), a metric that measures how much of your GPUs' compute capacity you're actually using versus how much is sitting idle.
Nvidia's best published utilization rate? 53.4%. E2E's? 54%.
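For readers who want to see how these two headline metrics are typically computed, here is a minimal Python sketch. The bus-bandwidth formula follows the all-reduce convention documented in Nvidia's nccl-tests suite. The MFU estimate uses the widely cited approximation of roughly 6 × parameters FLOPs per training token for a dense transformer, which ignores attention FLOPs. The function names and sample inputs are illustrative assumptions. The keynote did not describe E2E's actual measurement setup.

```python
def allreduce_bus_bandwidth(bytes_per_rank: float, seconds: float, n_ranks: int) -> float:
    """Bus bandwidth in bytes/s for an all-reduce, per the nccl-tests convention.

    algbw = size / time; busbw = algbw * 2 * (n - 1) / n, which normalizes
    the result so it can be compared against the hardware link speed.
    """
    algbw = bytes_per_rank / seconds
    return algbw * 2 * (n_ranks - 1) / n_ranks


def model_flops_utilization(params: float, tokens_per_second: float,
                            n_gpus: int, peak_flops_per_gpu: float) -> float:
    """MFU = achieved model FLOPs/s divided by the cluster's aggregate peak FLOPs/s.

    Uses the ~6 * params FLOPs-per-token estimate (forward + backward pass)
    for dense transformer training; attention FLOPs are ignored.
    """
    achieved = 6 * params * tokens_per_second
    peak = n_gpus * peak_flops_per_gpu
    return achieved / peak


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not E2E's measurements.
    busbw = allreduce_bus_bandwidth(bytes_per_rank=8e9, seconds=0.05, n_ranks=1024)
    print(f"bus bandwidth: {busbw / 1e9:.1f} GB/s")
    # 989e12 is the commonly quoted dense BF16 peak for an H100 SXM (assumption).
    mfu = model_flops_utilization(params=70e9, tokens_per_second=1.2e6,
                                  n_gpus=1024, peak_flops_per_gpu=989e12)
    print(f"MFU: {mfu:.1%}")
```

Because MFU counts only the FLOPs the model mathematically requires, time lost to communication stalls, data loading, or recomputation shows up as a lower percentage. That is why it is a stricter measure than raw GPU "busy" time.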
This might sound like a marginal difference, but in the GPU world, idle compute is catastrophic. "If you are sitting at 20, 30% every month on a very large cluster, you are potentially losing thousands of dollars every hour," Imran points out.
At scale (think 1,000- or 2,000-GPU clusters), every percentage point of utilization lost translates to massive financial waste. The difference between 40% and 54% utilization isn't just performance. It's viability.
And unlike the CPU era, where 10-20% utilization was considered normal, GPUs cost an order of magnitude more per unit. You can't afford to let them sit idle. E2E's infrastructure ensures they don't.
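The arithmetic behind these claims is simple enough to sketch. Training time scales inversely with utilization, and spending on idle capacity scales with (1 − utilization). In the minimal Python sketch below, the helper names are illustrative and the hourly GPU price is a hypothetical placeholder, not an E2E or market rate.

```python
SECONDS_PER_DAY = 86_400


def training_days(total_flops: float, n_gpus: int,
                  peak_flops_per_gpu: float, mfu: float) -> float:
    """Wall-clock days needed to perform `total_flops` of useful model work at a given MFU."""
    effective_flops_per_second = n_gpus * peak_flops_per_gpu * mfu
    return total_flops / effective_flops_per_second / SECONDS_PER_DAY


def idle_cost_per_hour(n_gpus: int, price_per_gpu_hour: float,
                       utilization: float) -> float:
    """Dollars per hour paid for GPU capacity that does no useful work."""
    return n_gpus * price_per_gpu_hour * (1.0 - utilization)


if __name__ == "__main__":
    # Hypothetical price per GPU-hour; substitute a real rate before drawing conclusions.
    price = 2.50
    for util in (0.30, 0.40, 0.54):
        idle = idle_cost_per_hour(1024, price, util)
        print(f"utilization {util:.0%}: ${idle:,.0f}/hour spent on idle capacity")
```

At the assumed $2.50 per GPU-hour, a 1,024-GPU cluster running at 30% utilization pays for roughly $1,800 of idle capacity every hour, in line with Imran's warning. Because training time is proportional to 1/MFU, the same model also reproduces the earlier 12-day versus 20-day contrast: a run that takes 12 days at 54% MFU would take 20 days at about 32% MFU.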
The low-code, end-to-end AI platform
E2E has packaged its GPUs, InfiniBand networking, parallel file systems, orchestration, and security into a low-code platform that handles both training and inference.
Launching a 1,024-GPU cluster? Minutes. Traditionally, it takes days just to get the infrastructure configured correctly.
"All you have to do is just log in to the login node and start your training activity just like that," Imran explains. The platform includes distributed job schedulers, support for frameworks like PyTorch Lightning, and automated orchestration. Everything comes up ready.
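Part of what "everything comes up ready" hides is process wiring. Every GPU in a 1,024-GPU job needs a unique rank so the distributed framework can join it to the same communicator. The minimal sketch below shows the conventional layout, assuming 128 nodes with 8 GPUs each (a common H100 server configuration, not a detail from the keynote). The helper names are hypothetical. RANK, LOCAL_RANK, and WORLD_SIZE are the environment variables PyTorch's torchrun launcher sets for each worker process.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RankAssignment:
    node_rank: int    # which server the process runs on
    local_rank: int   # which GPU on that server
    global_rank: int  # unique id across the whole job
    world_size: int   # total number of processes in the job


def assign_ranks(n_nodes: int, gpus_per_node: int) -> list[RankAssignment]:
    """One process per GPU; global rank = node_rank * gpus_per_node + local_rank."""
    world_size = n_nodes * gpus_per_node
    return [
        RankAssignment(node, local, node * gpus_per_node + local, world_size)
        for node in range(n_nodes)
        for local in range(gpus_per_node)
    ]


def launcher_env(a: RankAssignment) -> dict[str, str]:
    """Per-worker environment variables in the style torchrun sets."""
    return {
        "RANK": str(a.global_rank),
        "LOCAL_RANK": str(a.local_rank),
        "WORLD_SIZE": str(a.world_size),
    }


if __name__ == "__main__":
    ranks = assign_ranks(n_nodes=128, gpus_per_node=8)
    print(len(ranks), launcher_env(ranks[9]))
```

A real scheduler must also agree on a rendezvous address, confirm that every node is healthy, and map each process to the right InfiniBand interface. That setup work is what usually stretches cluster bring-up into days.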
Inference is handled with equal sophistication. The platform manages token throughput, time-to-first-token (TTFT), concurrent requests, and auto-scaling: all the variables that determine whether an LLM can serve real users or collapse under load. It supports modern inference engines and can serve open models such as Meta's Llama at enterprise scale, as alternatives to ChatGPT or Claude.
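To make these serving metrics concrete, here is a minimal Python sketch. It computes time-to-first-token and aggregate output-token throughput from per-request timings, and it applies a deliberately simple concurrency-based autoscaling rule. The trace format, function names, and scaling rule are illustrative assumptions. The keynote did not describe how E2E's platform measures or scales inference.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestTrace:
    sent_at: float         # client-side send time, in seconds
    first_token_at: float  # arrival time of the first output token
    finished_at: float     # arrival time of the last output token
    output_tokens: int


def time_to_first_token(trace: RequestTrace) -> float:
    return trace.first_token_at - trace.sent_at


def output_throughput(traces: list[RequestTrace]) -> float:
    """Aggregate output tokens per second over the observed window."""
    start = min(t.sent_at for t in traces)
    end = max(t.finished_at for t in traces)
    return sum(t.output_tokens for t in traces) / (end - start)


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, valid for 0 < p <= 100."""
    ordered = sorted(values)
    index = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[index]


def replicas_needed(concurrent_requests: int, max_concurrency_per_replica: int,
                    min_replicas: int = 1) -> int:
    """Enough model replicas that none exceeds its concurrency limit."""
    return max(min_replicas, math.ceil(concurrent_requests / max_concurrency_per_replica))
```

Production autoscalers usually add smoothing and cool-down periods so that replicas are not added and removed on every traffic spike. This rule omits those for clarity. Tail TTFT (for example, the 95th percentile) is usually a better health signal than the average, because users experience the slow requests.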
Critically, the platform bridges a gap Imran identifies between two emerging AI roles: model builders and model deployers. "These are two equal skill sets that you need to hire," he notes. You can have the world's best transformer architecture, but if you can't deploy it at scale, you have nothing. E2E's platform turns model builders into model deployers by abstracting away the infrastructure complexity.
Sovereignty, security, and India's AI mission
E2E is also a partner in the India AI Mission, providing infrastructure to some of the country's most ambitious foundation model developers selected by the government.
This isn't just a commercial partnership; it's strategic. "If your startup is doing something sensitive for the government or something, and if your data is not within the sovereign location, then you might be at risk of not being able to be in full control of your data," Imran says.
For use cases in government, defense, education, and other sensitive sectors, data sovereignty isn't optional. E2E offers infrastructure that's governed by Indian laws, handled by an Indian company, and physically located in India. No external dependencies. No foreign control.
The platform also includes enterprise-grade security features, such as access control, token authentication, and rollback capabilities, because in AI, security failures are uniquely expensive.
A DDoS attack on a website? Annoying. An unauthenticated endpoint hammered with AI inference requests? "You can raise the data center temperature by a few degrees," notes Imran, because GPU servers are power-intensive at scale. The cost of a security lapse in AI infrastructure isn't just financial; it's operational.
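Imran's point about unauthenticated endpoints is easy to illustrate: the cheapest place to reject a request is before it ever reaches a GPU. The sketch below assumes a simple bearer-token scheme with an in-memory set of issued tokens. The header handling, token store, and `run_model` callback are hypothetical stand-ins, not E2E's implementation. It uses Python's standard `secrets` module to mint tokens and `hmac.compare_digest` for constant-time comparison.

```python
from __future__ import annotations

import hmac
import secrets
from typing import Callable


def issue_token() -> str:
    """Mint a random, URL-safe API token from 32 bytes of entropy."""
    return secrets.token_urlsafe(32)


def is_authorized(headers: dict[str, str], valid_tokens: set[str]) -> bool:
    # Assumes header names are already normalized; real HTTP headers are case-insensitive.
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or not token:
        return False
    matched = False
    for candidate in valid_tokens:
        # compare_digest avoids leaking, via timing, how many leading bytes matched.
        matched |= hmac.compare_digest(token.encode(), candidate.encode())
    return matched


def handle_inference(headers: dict[str, str], prompt: str, valid_tokens: set[str],
                     run_model: Callable[[str], str]) -> tuple[int, str]:
    """Reject unauthenticated calls before any GPU work is scheduled."""
    if not is_authorized(headers, valid_tokens):
        return 401, "unauthorized"
    return 200, run_model(prompt)
```

The key design choice is ordering. The check runs before the request is queued for inference, so a flood of unauthenticated calls costs a string comparison rather than GPU time. A production system would add rate limiting and token revocation on top of this.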
Software optimization, hardware efficiency
E2E's performance gains aren't just about throwing hardware at problems. They've built strong partnerships with Indian hardware and software players, leveraging SDKs, operators, and optimized inference frameworks from Nvidia (like NIM) and others.
The result? Significant performance improvements from the same hardware through software-level optimizations. "You can get significant amounts of gain through the same hardware," Imran says. This software-led approach to efficiency allows E2E to meet international performance standards without requiring cutting-edge hardware refreshes every quarter.
It's a sovereign-class platform, fully developed in India, meeting global benchmarks. "Our performance is globally benchmarked and essentially all the international standards in terms of performance for training, everything is met right here in India," he concludes.
India's AI builders can now build without limits
The significance here isn't just technical. It's structural. For years, Indian AI companies have had to choose between building locally with subpar infrastructure or relying on foreign cloud providers, with all the sovereignty, latency, and cost issues that this entails.
Imran ended the keynote with a simple idea. Software-driven efficiency is the real unlock for India’s AI future. With the right architecture, optimized pipelines, secure access controls, and a fully sovereign stack, India can train, deploy, and scale world-class AI models without depending on global infrastructure giants.
The message at TechSparks 2025 was clear. India’s AI builders do not need to wait for global infrastructure to catch up. The future of high-performance, accessible, sovereign AI infrastructure is already being built here.


