How a synthetic video of Barack Obama inspired this AI scientist to build a digital human-based talking video platform

Jyoti Joshi is the CEO and Co-founder of Kroop AI, a deeptech startup that has developed a sophisticated audio-visual deep learning-based platform “The Artiste”. It generates high-quality videos with digital avatars with just audio or text as input.


Wednesday, June 01, 2022


Kroop AI was one of the five Indian startups to pitch at the recently held Cannes Next – an executive conference and innovation-driven business development platform, exploring the future of the entertainment sector.

Founded by Jyoti Joshi, Milan Chaudhuri, and Sarthak Gupta in February 2021, Kroop AI is a deeptech startup in the audio-visual content generation space.


Jyoti Joshi - CEO and Co-founder, Kroop AI

Of the Cannes experience, Jyoti says, “The energy at the India pavilion and the passion of fellow founders showcasing their ideas was amazing. This event gave me a huge platform to network with content creators, filmmakers, directors and metaverse experts. They were highly impressed with the product and tech we have to offer, as it can significantly reduce costs and help produce content in various languages, with proper lip-sync, much faster.”

A first-generation entrepreneur, Jyoti is an AI scientist with a PhD from the University of Canberra, Australia. She has worked on AI for mental health analysis, with stints at the University of Waterloo, Canada, and Monash University, Australia. She returned to India last year and is currently based in Jalandhar, Punjab.

How a Barack Obama video led to a deeptech startup

In 2017, researchers at the University of Washington created a lip-synced video of former US President Barack Obama by blending existing audio and footage. The program used AI to match the audio of a person speaking with realistic mouth shapes, and then grafted those mouth shapes onto an existing video.

This video proved to be the starting point for Kroop AI.

Jyoti explains, “When we came across Barack Obama's synthetic video, we saw a large market opportunity. We started with a detection algorithm for finding manipulations to face and audio. The 'aha' moment was a realisation: to improve the quality of the detector, we had to generate high-quality synthetic data. And the synthetic data generated by our algorithms was of high quality! We all went back to the whiteboard and drew up plans to introduce an ethical platform for synthetic generation for B2B.”

To tap into the opportunity of synthetic data generation, the team built high-quality voice cloning and text-to-speech (TTS) capabilities to complement the facial animation. A cloud-based UI was then developed and released as The Artiste AI Studio.

“Kroop AI’s sophisticated audio-visual deep learning-based platform The Artiste generates high-quality videos with digital avatars with just audio or text as input. The platform is geared towards creating data for the Metaverse. In keeping with the current trend of personalised marketing, Kroop AI helps in generating high-quality marketing and customer relation videos with its Studio,” Jyoti says.

Elaborating on the tool, she says, “Kroop AI provides accurate lip-sync while dubbing content from one language to another. This has a huge impact on the entertainment and advertising industries. Many movies dubbed from one language to another use voice-over artists, but the dubbed version does not provide lip-sync. This lack of lip-sync not only affects the engagement and immersive experience of the viewer but also affects viewers with hearing disabilities. Our technology makes fixing lip-sync issues easy and fast. The same is also highly relevant in the advertising industry. Celebrities shoot an ad in one language, but if the content needs to be converted to a regional language for impact, we can do that with perfect lip-sync.”

Media and publication houses can also offer audio-visual support in different languages. Kroop AI also helps make games more interactive: with near real-time lip-sync driven by audio input during gameplay, the gaming experience becomes far more engaging.

Reusing older content has become simple

Kroop AI’s tech is now used by both startups and large organisations to generate audio-visual content quickly through an easy-to-access cloud-based studio and API. Kroop AI works with organisations in the Metaverse space by providing an API that animates digital avatar characters’ faces with just text or audio as input.

Creators and studios on tight budgets can create high-quality audio-visual content with just text or audio as input using Kroop AI’s platform. It also helps organisations reuse previously recorded videos by generating new lip movements for either a Kroop AI-generated voice or a recorded voice-over.

Jyoti points out that, as a result, edtech companies do not need to re-record content for edits and updates. With Kroop AI, reusing older content with new information has become simple.

“With tech capabilities also comes responsibility. Our platform, through its VizMantiz API, enables organisations to detect unethically generated synthetic media to prevent its misuse,” she adds.

Jyoti’s co-founders are Sarthak Gupta and Milan Chaudhari. Both graduated with a BTech in Computer Science and Engineering from IIT Ropar in 2019. She was introduced to them by her husband, who is a professor at IIT Ropar.

“We came up with the name Kroop AI because ‘K’ stands for an infinitely large number and ‘Roop’ is the Hindi word we use for various digital avatars. In other words, with Kroop AI, one can create an infinite number of digital avatars,” she says.

Kroop AI offers clients both API- and UI-based access. Its target audience spans the entertainment and edtech sectors, both in India and internationally. It follows a SaaS-based revenue model and charges based on the number of minutes.

“Our aim is to revolutionise the content creation industry. With our studio, the content can be created 100X faster and 10X cheaper. Now the long process of dubbing a movie can be done easily with just a click of the mouse. Animating 2D and 3D characters is faster with The Artiste AI studio,” Jyoti adds.

Its biggest competition in this space is Synthesia AI, an AI video platform.

Kroop AI started as a bootstrapped company but has now raised $230,000 in a pre-seed round from 100X.VC, LetsVenture and angel investors.

“In 2022, we have seen unparalleled interest and growth. This is attributable both to a better understanding of artificial intelligence-based audio-visual content generation and to growing interest in the Metaverse. We aim to augment the generation capabilities of The Artiste Studio and expand at both national and international levels. We plan to raise another round this year,” she says.

(Disclaimer: The article has been updated to correct the amount of funds raised in a pre-seed round)

Edited by Anju Narayanan