NeuralGarage bets on generative AI to make dubbed shows appear more natural

Bengaluru-based deep tech startup NeuralGarage, which is backed by Exfinity Ventures and angel investor Amit Patni, seeks to reduce audio-visual dissonance in dubbed content with its generative AI product VisualDub. The startup’s clients include Amazon India, Microsoft, Hippo Video, and Pixis.

Monday, July 31, 2023, 6 min read

What if Spanish actors could speak Tamil with natural ease? Or at least appear to speak Tamil naturally?

This is possible with the intervention of generative AI, say the founders of Bengaluru-based deep tech startup NeuralGarage.

Today, content owners and distributors dub content in several languages to reach a wider audience. However, the dubbed content doesn’t offer a cohesive viewing experience as the lip and jaw movements of the actors do not match the words coming out of their mouths.

Take the example of a Spanish show dubbed in Tamil. The audio is in Tamil, but the actors on the screen still look like they are speaking Spanish. More often than not, viewers find this mismatch irksome and may eventually lose interest in the show.

This is the problem that NeuralGarage aims to address with its flagship product VisualDub.

VisualDub grew out of a personal experience.

Anjan Banerjee, one of the founders of NeuralGarage, is an avid fan of Korean shows and movies. While watching the Korean movie Train to Busan dubbed in English, he experienced a disconnect, as the dubbed audio did not synchronise with the facial movements of the actors.

This bothered him, as it prevented him from fully immersing himself in the stories and appreciating their visual aspects.

“I have a great fondness for Korean films, and this was clearly at its peak during the lockdown. But I had problems with the visual dissonance due to the lack of audio-video synchronisation. This sparked the idea of whether this could be solved at a time when I was fully immersed in my work on generative networks,” says Banerjee, Chief Product Officer at NeuralGarage.

He wondered if advancements in artificial intelligence could help address the issue and decided to embark upon research to explore the possibilities of this technology.

Driven by his curiosity and desire to bridge the gap between audio and video, Banerjee began studying the potential of AI along with his batchmates from IIT Kanpur, Subhabrata Debnath and Subhashish Saha.

As the trio began building the VisualDub technology to address audio-visual dissonance, they also reached out to Mandar Natekar, a media and entertainment veteran, for advice and mentorship, which paved the way for the birth of NeuralGarage.

Birth of NeuralGarage

In 2015, Debnath, Banerjee, and Saha founded Visage Map, a facial recognition startup, which was later acquired by FaceFirst, a US-based facial tech company. They quit FaceFirst in 2021 and started working on the VisualDub technology.

The same year, they founded the deep tech startup NeuralGarage with Natekar, who has 20 years of experience at companies such as Viacom18, Times Television Network, Turner International, Reliance Entertainment, and Sony.

Eliminating audio-visual discord

VisualDub runs on proprietary algorithms that map phonemes, the smallest units of speech sound, to visemes, the corresponding lip shapes. These are unique mappings that are universally true for every language in the world.
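
To make the idea of a phoneme-to-viseme mapping concrete, here is a minimal Python sketch. It is not NeuralGarage's proprietary algorithm: the table, the viseme labels, and the function name `visemes_for` are all hypothetical and chosen only for illustration. It shows that several phonemes can share one lip shape (for example "p", "b", and "m" all close the lips), so a sequence of sounds reduces to a shorter sequence of mouth shapes.

```python
# Illustrative only: a tiny, hand-picked phoneme-to-viseme table.
# The labels and groupings are assumptions made for this sketch and
# are not taken from VisualDub.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_to_teeth", "v": "lip_to_teeth",
    "aa": "open_jaw", "ae": "open_jaw",
    "uw": "rounded_lips", "ow": "rounded_lips",
    "iy": "spread_lips", "eh": "spread_lips",
}


def visemes_for(phonemes):
    """Map a phoneme sequence to viseme labels, collapsing consecutive repeats."""
    shapes = []
    for phoneme in phonemes:
        # Phonemes missing from the toy table fall back to a neutral mouth shape.
        shape = PHONEME_TO_VISEME.get(phoneme.lower(), "neutral")
        if not shapes or shapes[-1] != shape:
            shapes.append(shape)
    return shapes


# The phonemes of a word like "mama" alternate between two lip shapes.
print(visemes_for(["m", "aa", "m", "aa"]))
```

A production system would work on timed audio features rather than a static lookup, but the lookup captures the core relationship between a sound and the lip shape that accompanies it.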

Visual dissonance happens when the audio cues and visual cues are not in sync. VisualDub’s proprietary generative AI tech removes the discord that’s apparent in dubbed content by syncing the jaw and lip movements of actors with the words being spoken.

Generative AI transforms facial parts using audio activations, blending them with the rest of the scene. The lip movements are tweaked to match syllables, and the jaw and chin movements and smile lines are harmonised with this to make the dubbed content visually realistic and natural. This technology is person- and language-agnostic.

“Removing visual dissonance makes the dubbed content look authentic and local, which helps viewers and consumers connect more with the content,” says Natekar.

The synchronisation solution doesn’t interfere with the actual dubbing process. A technology layer is added on top of the dubbed content, he adds.

NeuralGarage offers this technology through API integration, SaaS, and desktop software. The beta version of the software was released two months ago.

The startup uses Amazon Web Services for client delivery and to ensure security and privacy. Additionally, it leverages complex AI and computer vision algorithms to improve content consumption, delivery, and creation.

The technology has been tested in more than 30 languages across the world, including many Indian languages and international languages such as Italian, German, Spanish, Japanese, Korean, and Mandarin.

Business and growth

Recently, Amazon’s ad campaign with actor Manoj Bajpayee was shot in Hindi and dubbed in Tamil, Telugu, Kannada, Malayalam, Bengali, Gujarati, and Marathi.

VisualDub was then used to lip-sync the creative in the dubbed languages, giving the impression that it was actually shot in each of these languages and creating an authentic connection with the consumer, explains Natekar.

NeuralGarage generates business from verticals such as advertising, influencer marketing, content creation, OTT, and films. Film and edtech content account for more than 90% of its revenues. All these projects are in progress.

The startup has 10 clients including Amazon India, Microsoft, Hippo Video, and Pixis.