[Tech30] This startup creates customised audio content through voice cloning

Founded in 2018, startup Deepsync Technologies uses AI to learn how you speak, saving hours of recording time when creating audio content.

Producing an audiobook takes at least a month, and in some cases up to two. With the digital publishing and ebooks segment holding huge potential, consider a technology that can reduce the time taken to produce an audiobook by 90 percent.

Bengaluru-based technology startup Deepsync Technologies has set out to leverage AI to create audio content through voice cloning. Not a robotic staccato, but a real human voice.

The voice cloning technology market is estimated to touch $1.74 billion by 2023, while the broader audio-based content market is a multi-billion dollar industry. Ishan Sharma and Rishikesh Kumar founded the startup in December 2018 to address this large market and to capitalise on this unicorn segment.

On the importance of the technology, Ishan stresses, “We need AI to produce top quality audio content to match the need for billions of people. They all want high quality and a personalised experience.”


The two founders of Deepsync met online on GitHub. Both were working on similar voice cloning projects, and after a few interactions they realised they shared an interest in building an artificial intelligence (AI) startup that creates content.

Deepsync Technologies: Ishan Sharma (left) and Rishikesh Kumar

Ishan believes that there is a need for greater adoption of audio-based content with personalised experiences, and that is the gap Deepsync bridges. On a practical level, coordinating the many people involved, including voice artists, production houses, editors and post-production teams, is currently not financially feasible, and Deepsync’s voice cloning helps save time, money and effort.

The technology takes different languages, genres and other nuances and infuses them into the voice.

The startup recently made its technology available to the public, and it currently supports cloning short voice clips. Deepsync’s technology platform leverages AI to clone a voice. This speeds up production because voices are replicated automatically, unlike the current manual process.

Ishan likens the technology to the revolution that the printing press brought to society when it was first introduced, making books available to everyone.

Starting up and cloning voices

Since the launch of the startup in December 2018, Deepsync has been busy building its technology platform. A month ago, it made the platform available to the public. The proof of the pudding, though, lies in how it works in the real world.

Deepsync was engaged by an edutech company to create audio content, and that was where it developed a proof of concept with 97 percent accuracy. “This was an Indian edutech company but the proof of concept was in the Indonesian language, but it worked. It is a testament that our platform can work for all languages,” says Ishan.

The founder of Deepsync says the quality of its voice output is comparable to that of a studio recording. More importantly, it saves a considerable amount of time and money.

The startup, which has received pre-seed stage investment from a Hong Kong-based incubator and was also part of the Lightspeed Extreme Entrepreneurs programme, plans to take a deep dive into this segment.

Ready to contentify

“In the next year, we plan to do about 500 hours of audio production every month. This can be scaled up as we go along,” says Ishan.

As the startup is still in its initial stages, the duration of cloned audio is still short, running to only a few minutes.

Deepsync has started with the English language, with a focus on audiobooks and podcasting. It has also started testing in Hindi and has achieved around 90 percent accuracy.

The team plans to adopt a full-house studio model and partner with other important players in this setup, such as voice artists and production heads. “The quality of our voice technology will be superior, and the narrative will be very human in terms of genre, style of speaking, etc,” says Ishan.

In terms of a business model, Deepsync is open to both licensing and royalty models depending on the requirement.

“The starting point for us is narrative audio, and it will certainly improve over time. We have started with a small segment and are gradually building the appetite (for more),” says Ishan.

Deepsync also plans to add new features to its technology platform, though Ishan is certain that the company is not taking away human jobs, only augmenting human effort.

The founder of Deepsync believes that no other company in the Indian landscape is focused on this segment, which gives Deepsync a head start.

Ishan cites ebooks as an example, explaining that as the cost of production falls, the written word could proliferate with human-sounding narration in multiple languages, reaching a wider audience. The journey of Deepsync has just begun.

(Edited by Suruchi Kapur Gomes)

YourStory’s Tech30 companies list is an annual selection of 30 carefully curated and disruptive tech-based startups that we believe will shape the new narrative for India and the world.