[App Fridays] Meet Otter, an AI-powered app that converts voice to text in real time
Otter is an app for the future. Tech experts say its voice recognition technology is better than that of Amazon Alexa, Google Assistant, and Apple Siri.
Otter is a new-age, AI-powered app that generates text transcripts of voice conversations. It debuted at the Mobile World Congress (MWC) in Barcelona in February. Its maker, AISense, a California-based deep tech startup staffed by former Google engineers, enabled MWC delegates to record, transcribe and share their meetings and pitches seamlessly, and in real time.
Since then, Otter has made it to possibly every list of ‘Best apps of 2018’. It is one of the top productivity apps out there. Sections of the tech media have called it an “AI breakthrough” that even tops the voice recognition work being done by giants Amazon Alexa, Google Assistant and Apple Siri.
Even though voice is the current obsession of global tech companies, smart and accurate ‘voice-to-text’ conversions have been hard to come by. (Sure, IBM offers speech-to-text services, but those are achieved through complicated supercomputers.) Otter is out to correct that, and make smart voice notes an everyday convenience for professionals.
Its speech recognition technology can identify multiple voices in a conversation. The app records and transcribes voices, and spits out text transcripts in near real time, with a delay of only 2-3 seconds. It also automatically adds keywords that reflect what the meeting was all about, and makes any part of the conversation instantly searchable and shareable.
Otter’s inbuilt AI engine, Machine Learning algorithms, and Natural Language Processing (NLP) software get smarter with more usage, and can recognise a multitude of voices without error. Of course, you may not get a flawless transcript, but you can listen to parts of the audio later, and clean up the required text. The free version of the app allows you to search for conversations that happened in the past 90 days.
At MWC, AISense Founder-CEO Sam Liang said,
“Otter empowers the user to use AI for everyday conversations, so they can focus on what is being said and forget about taking notes. Our technology is quite different. We call it ‘Ambient Voice Intelligence’ and we use the word ambient to indicate that this is working in the background. Your brain can only remember 10-20 percent of the information (from a meeting)... So, we thought we can help people capture that information and then search for it really fast.”
Otter is a free app available on Android and iOS, but there is a paid premium version too. The free version offers 600 minutes (10 hours) of transcription, while the upgraded app comes with 6,000 minutes (100 hours) of recording time. There is a customised offering for enterprises as well.
Let’s take a look at the app.
You start by creating an Otter account or signing up with Gmail.
Otter onboards you by asking you to record a sample of your voice to help its AI algorithms recognise it. You can skip this step too.
The homepage displays your dashboard, which gives a count of the minutes used up. 600 free minutes are available every month. The limit resets after 31 days, which means that if you exhaust the free minutes before then, you don’t get any more until the reset.
The time limits can be raised by switching to the paid version. In India, Otter Premium is priced at Rs 650 per month. If you’re a journalist who is continuously recording and transcribing interviews, an annual subscription (Rs 5,300) makes sense.
The dashboard lists all recorded conversations along with the date, timestamp, and duration of each clip.
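To see why the annual plan makes sense for heavy users, here is a quick back-of-the-envelope calculation using the two prices quoted above. It is only a rough sketch: the function name is made up for illustration, and it assumes the monthly price stays at Rs 650 for all 12 months.

```python
# Prices quoted in this review (Otter Premium, India).
MONTHLY_PRICE_RS = 650
ANNUAL_PRICE_RS = 5300


def annual_saving(monthly_price, annual_price):
    """Compare 12 monthly payments against one annual payment.

    Hypothetical helper for illustration; assumes the monthly
    price does not change during the year.
    """
    twelve_months = monthly_price * 12
    saving = twelve_months - annual_price
    percent = round(100 * saving / twelve_months, 1)
    return twelve_months, saving, percent


twelve_months, saving, percent = annual_saving(MONTHLY_PRICE_RS, ANNUAL_PRICE_RS)
print(f"12 monthly payments: Rs {twelve_months}")
print(f"Annual plan: Rs {ANNUAL_PRICE_RS}")
print(f"Saving: Rs {saving} ({percent}%)")
```

In other words, paying month by month for a full year would cost Rs 7,800, so the annual plan works out to roughly a third cheaper.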
When you click on a conversation, it takes you to the transcript. This is the busiest and most important page of the app. The search bar is right on top, where you can look for any part of the transcript.
Below that are 10-20 keywords that indicate what the conversation was all about. These are generated by the app’s Machine Learning (ML) algorithms. You can click on any keyword to jump to a certain part of the transcript.
The playback bar is at the bottom of the screen. You can slow down the audio if you have to clean up the text. Otter syncs the audio with the text during playback. So, you can tap on any highlighted word to hear it distinctly. Or, you can simply scroll down the transcript without playing the audio.
The strange thing about Otter is that it breaks dialogue into multiple lines. Sometimes, the transcript is broken even mid-sentence, punctuation goes missing, and one or more words might be misinterpreted. So, ‘place’ becomes ‘face’, ‘five’ becomes ‘fine’, and Colaba (a place in Mumbai) becomes Columbia.
If there’s any drawback in this well-intentioned app, it is this, and it is a major one at that. Otter is said to be working on an improvement. And of course, the AI engine gets smarter with more entries.
The app also allows you to export audio and/or text to other locations such as email, WhatsApp, and social media apps.
You can also share meetings with groups and collaborators within or outside Otter.
Conversations can be deleted too. Otter says that if a user deletes any transcript, it is permanently deleted from its servers too. This is to ensure greater data privacy.
In Settings, you can select your recording preferences. The app can even record audio from a nearby Bluetooth-connected device. You can choose to stream audio through WiFi or mobile data. Otter also allows you to connect more than one Google account simultaneously.
So, should you Otter?
Both yes and no.
Yes, if you are a journalist or a student who has to record and make sense of long hours of interviews, lectures, and so on almost every day of your life.
Also yes, if you are a business professional who has to continually share minutes of meetings with people who may or may not have been present. This app is a smart note-taker and voice recorder rolled into one that really comes in handy in such situations.
No, if you are expecting 100 percent accuracy in transcripts and lack the patience to clean them up. It is important to note that Otter is a mere enabler: it reduces your effort but doesn’t eliminate it altogether. Not yet.
It’s a smart app. With AI and NLP getting smarter by the day, Otter will be a much more polished offering in no time. It could do with some feature improvements though.
First, conversations recorded on the Otter app may not be the best in terms of audio quality. Further, only English conversations can be processed for now.
Otter certainly has the potential to become a “category-defining application” (it wants to be the Dropbox or Slack of audio) if more languages are enabled. The app just needs to ramp up its NLP software for that.
Also, text annotations ought to be allowed if Otter wants to become an integral part of people’s professional or academic lives.
Otter is a big step in the right direction, but the journey has just begun!