One LaaS thing: building the Indian Internet in vernacular languages

Monday, August 31, 2015

A living language is a throbbing, vital thing, ever changing, ever growing and mirroring the people who speak and write it. Our great provincial languages are no dialects or vernaculars, as the ignorant sometimes call them. They are ancient languages with a rich inheritance, each spoken by many millions of people, each tied up inextricably with the life and culture and ideas of the masses as well as the upper classes. It is axiomatic that the masses can only grow educationally and culturally through the medium of their own language.
               - Jawaharlal Nehru, ‘The Question of Language’, 1937 essay

Even to a consummate insider, India’s diversity is staggering to behold. If India is a tapestry that binds together a dizzying array of cultures and cuisines, then its languages are the individual threads that represent them. For Indians, language is the most powerful marker of identity, superseding both caste and religion.


India today speaks 780 languages (according to the People’s Linguistic Survey of India), written in 86 different scripts. Twenty-nine of them are spoken by at least a million people, and 22 are recognized by the Constitution as scheduled languages. And then, of course, there is English. From being the language of colonial oppression, it has now risen to the status of the language of aspiration. Whenever I bring up the need to promote vernacular languages, I’m waved away by most, who say that the rise of English speakers is inevitable. And perhaps it is.

However, for the time being, India is overwhelmingly vernacular. By most estimates, India has between 100 and 120 million English speakers. That’s a measly 10% of the population. And if one goes by the ability to read English, this number dwindles to between 60 and 80 million people.

Now consider this: India has 220 million mobile Internet users as of today, with 20 million more being added every quarter. While English-language content accounts for 56% of the content on the Internet, Indian languages account for less than 0.1%. Even if we assume that all those who can speak or read English were the earliest adopters of the mobile Internet, India still has anywhere between 100 and 160 million users who have no comprehension of the content presented to them.
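For readers who want to check the arithmetic, here is a small sketch of how the 100 to 160 million range follows from the figures quoted above. It assumes the user base is 220 million, and that the bounds come from subtracting the highest estimate of English speakers (120 million) and the lowest estimate of English readers (60 million). The variable names are only illustrative.

```python
# Figures quoted in the article, in millions of people.
mobile_internet_users = 220
english_readers_low = 60      # lowest estimate: people who can read English
english_speakers_high = 120   # highest estimate: people who can speak English

# Assume every English user is already online; the rest have no English.
non_english_low = mobile_internet_users - english_speakers_high
non_english_high = mobile_internet_users - english_readers_low

print(f"{non_english_low}-{non_english_high} million users without English")
```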

Have you ever wondered how that might feel?

Humour me for a second and go to the settings menu on your smartphone. Find the tab that says ‘Language and Input’ and then select a language or script you don’t understand, such as Russian (written in Cyrillic) or Bahasa. Then hit the home key. Now try and change it back to English. That sense of total disorientation and nervous fumbling is what these 100+ million users experience every single day.

And yet, for all our A/B testing on where to put the ‘buy now’ button and which shade of orange it should be, all our talk about how UX is everything, and all the excitement about India being mobile-first, no one has yet solved this very fundamental issue.

Unlike the Chinese and Japanese Internet, which were built in native scripts from day zero, the Indian Internet was built in English. But as a nation, we are at a stage in our Internet growth story where English-only just doesn’t cut it anymore. It is no accident that the circulation of vernacular-language newspapers far outstrips that of any English daily. To go truly deep, to reach out to our users, to make a digital India, vernacular is the only way.

This is exactly why we’re extremely excited to announce our investment in Reverie Language Technologies. Reverie is the only company building a Language as a Service (LaaS) platform spanning the full stack of digital vernacular technology: cross-device font rendering, business-grade transliteration and translation, language input through native-language keyboards, and contextual search.

All this vernacular goodness is delivered via their platform in a developer-friendly package: an app or a website simply integrates their SDK into its codebase.

Now there will always be those who say that people who can’t comprehend English are not high-value customers. However, in the same breath, everyone in the e-commerce ecosystem will hail the sales they’re generating from Tier 2, 3, and 4 cities. It is self-evident that such arguments are founded in perception and not data. One visit to cities like Ahmedabad, Coimbatore, Surat, or Ludhiana is sufficient to change the perception of English supremacy as the language of commerce.

And yet, small business owners and single-person entrepreneurs who can read English self-select into the digital economy, while those who cannot are excluded, leaving a huge untapped market. It is inevitable that these untapped markets will become the next growth engine for the digital economy. E-commerce companies, cab aggregators, hyper-local grocery startups, and pretty much every other digital property will have to learn to speak to their customers in their language.

This is why we believe that Reverie’s language platform will become a foundational technology of the Indian Internet. We are very excited to partner with Reverie, as their vision is completely aligned with Aspada’s mission to solve hard problems.

Key elements of building the Indian Internet: creation and comprehension

Given that Indian-language content accounts for less than 0.1% of the total, creating an easy language input interface becomes fundamental to building an Indian Internet. And with consumers creating the vast majority of Internet content, a mobile keyboard becomes the most prudent mode of language input. For app developers and mobile-based businesses, the challenge is to present content that can be easily understood. As we’ve already established, companies are great at reaching out to English audiences. Communicating with their non-English-speaking customers is where the real challenge lies. However, creating fresh content from the ground up is time- and resource-intensive; this is where translation or “localization” kicks in.

Content creation - Language input: A basic prerequisite to delivering content in vernacular languages is the ability to create content. In its most basic form, this is essentially a keyboard, but on top of this several other systems such as a publisher, word editor and other programs can be built. Language input for Indic scripts can be achieved by two different approaches: native input, where the user types directly in the characters of the script, and transliteration-based input, where the user types phonetically in Roman letters and the keyboard converts the text into the native script. A great example of both of these input methods can be seen in the Swalekh keyboard developed by Reverie Language Technologies, which combines both modes of language input across 11 languages, along with predictive typing, and is available for download on the Google Play store.

Native and Transliteration-based typing on Reverie’s Swalekh keyboard

Comprehension - ‘Localization’ or domain-specific translation:

Here’s another little test. Let me ask you to translate the word ‘Play’ into Hindi.

If you said ‘Khel’, you would be right. But what if I was talking about music? You’d instantly go, “Oh, in that case, it should be ‘Bajaao’”, and with a tip of the hat to Raju band, you’d also be right. But if I was talking about Shabana Azmi and then asked you to translate, you’d probably say ‘Naatak’. Also right.

This is exactly why translation algorithms have historically failed to achieve business-grade accuracy. Popular translation algorithms have adopted a “one-size-fits-all” approach, which is poorly suited to capturing the nuances of context, culture, and idiom embedded in Indian languages. As the ‘Play’ example shows, a translation request without contextual information is bound to fail. Reverie solves this problem by adopting a domain-specific approach to translation. Since it primarily serves businesses that operate in particular segments, Reverie uses the contextual data associated with those segments to achieve higher accuracy in translation.
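To make the ‘Play’ example concrete, here is a minimal sketch of a domain-keyed glossary lookup. It is not Reverie’s implementation: the `GLOSSARY` table, the domain labels (`sports`, `music`, `theatre`) and the `translate` function are all assumptions made purely for illustration. The point is only that the caller has to supply a domain before a translation can be chosen.

```python
# Toy glossary: one English word, several Hindi renderings, keyed by domain.
# The domain labels and entries are illustrative assumptions, not real data.
GLOSSARY = {
    "play": {
        "sports": "khel",
        "music": "bajaao",
        "theatre": "naatak",
    },
}


def translate(word: str, domain: str) -> str:
    """Return the Hindi rendering of `word` that fits the given domain."""
    senses = GLOSSARY.get(word.lower())
    if senses is None:
        raise KeyError(f"no glossary entry for {word!r}")
    if domain not in senses:
        raise KeyError(f"no sense of {word!r} for domain {domain!r}")
    return senses[domain]


for domain in ("sports", "music", "theatre"):
    print(domain, "->", translate("Play", domain))
```

A request without a domain has no sensible answer in this sketch, which mirrors the argument above: without context, the translation is a guess.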

Search: The Holy Grail

From “Yeh dil maange more” to “Yenna Rascala”, it is obvious that India’s usage of English is very free-form. We very liberally shove an English word into a Hindi sentence or throw in words from our native tongue when we can’t find the English analogue.

Imagine, if you will, that Nitish, a college-going student in Allahabad, is feeling particularly dapper and decides he wants to buy a pair of red shoes for Diwali. While searching for red shoes on a fashion e-commerce website, Nitish can potentially type in at least four different strings: “Red Shoes”, “Laal Joota” (the literal Hindi translation of red shoes), “Laal Shoes” or “Red Joota”. In such cases, the search algorithm must be intelligent enough to identify that all these search terms mean the same thing and deliver identical results.

Indian users are accustomed to using vernacular languages interchangeably with English words and any search algorithm that hopes to work in the vernacular market must hold this to be axiomatic.
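One simple way to picture this requirement is query normalization: map each token, whether English or romanized Hindi, to a single canonical form before searching. The sketch below assumes a tiny hand-written synonym table (`CANONICAL`) and a hypothetical `normalize_query` helper. A real system would need far richer data and spelling tolerance, so treat this as an illustration of the idea rather than how any production search engine works.

```python
from typing import Tuple

# Hypothetical synonym table: romanized Hindi (and variant English) tokens
# mapped to one canonical English token.
CANONICAL = {
    "laal": "red",
    "joota": "shoes",
    "shoe": "shoes",
}


def normalize_query(query: str) -> Tuple[str, ...]:
    """Lower-case the query, split on whitespace, and canonicalize each token."""
    tokens = query.lower().split()
    return tuple(CANONICAL.get(token, token) for token in tokens)


queries = ["Red Shoes", "Laal Joota", "Laal Shoes", "Red Joota"]
normalized = {normalize_query(q) for q in queries}
print(normalized)  # all four queries collapse to {('red', 'shoes')}
```

Because all four strings normalize to the same tuple, a search index keyed on the canonical tokens would return identical results for each of them.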

To offer a truly rich search experience to users in the vernacular languages, all the other elements of the stack, namely fonts, font rendering, native and transliterative input, and domain-specific translation, must work together intelligently. Moreover, the input strings may arrive in Devanagari script instead of Roman script, in which case the fluidity of language input must also be accounted for. Another aspect of intelligent, user-friendly search is recognizing brand names and avoiding literal translation in such cases. For example, if one searches for John Players (a popular ITC brand of men’s apparel) and requests results in Hindi, the translation algorithm should be able to identify the brand and not convert it to “John Khiladi”, which would be meaningless and therefore yield no results. So we return to the criticality of domain-specific search: without context there can be no comprehension, and therefore no search.
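The brand-name problem can be sketched as a “protected terms” pass that runs before word-by-word translation. Everything in the block below is an assumption for illustration: the `WORD_TRANSLATIONS` table, the `PROTECTED_BRANDS` list and the `translate_query` function are hypothetical, and a naive word-level lookup stands in for a real translation engine.

```python
import re

# Illustrative data only: a tiny English-to-Hindi word list and a brand list.
WORD_TRANSLATIONS = {"players": "khiladi", "red": "laal", "shoes": "joota"}
PROTECTED_BRANDS = ["John Players"]


def translate_query(query: str) -> str:
    """Translate word by word, passing protected brand names through unchanged."""
    # The capturing group makes re.split keep each brand as its own segment.
    pattern = "(" + "|".join(re.escape(b) for b in PROTECTED_BRANDS) + ")"
    segments = re.split(pattern, query, flags=re.IGNORECASE)
    brands_lower = {b.lower() for b in PROTECTED_BRANDS}

    out = []
    for segment in segments:
        if segment.lower() in brands_lower:
            out.append(segment)  # keep the brand verbatim
        else:
            words = segment.split()
            out.extend(WORD_TRANSLATIONS.get(w.lower(), w) for w in words)
    return " ".join(out)


print(translate_query("John Players red shoes"))  # John Players laal joota
```

Without the protected list, the same lookup would turn “Players” into “khiladi” and produce the meaningless “John khiladi” described above.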

Introducing Language as a Service (LaaS)

While there are several players in the market that build individual elements of the vernacular technology stack, there is a dire need for a full-stack approach to solving the vernacular problem, with a special focus on mobile platforms. An app developer or an e-commerce company doesn’t want to deal with a keyboard company, a translation management tool and a separate search engine, and then spend months of engineering effort integrating them all into their back-end, only to see a poor UX in vernacular.

Reverie addresses all these issues with their Language as a Service (LaaS) platform. It allows online businesses to integrate local languages into their web and mobile applications with very little effort expended on engineering and integrations. Once the Software Development Kit (SDK) is integrated into their application and servers, Reverie’s cloud-based backend takes care of translation, rendering text in multiple languages, language input and analytics.

HDFC securities recently integrated the SDK into their HDFC securities multilingual application to great effect.

We believe that Reverie will have a very real business impact on every app or website. Speaking to customers in their mother tongues will not only improve conversions but will also help users understand messages such as delivery timelines, stockouts and other text-based nuances that most non-English speakers miss.

Another key benefit is stickiness. It is no secret that every app’s greatest battle is staying on the phone. Uninstall rates regularly hover around the 70% mark. All that money and effort spent in acquiring a download often evaporates into thin air because the customer has no idea what you’re trying to say. When apps can speak to customers in a language they can understand, communicating the value proposition is no longer a question of comprehension. To put it simply, an app available in a vernacular language has a much better chance of sticking.

Furthermore, every app developer and mobile commerce company is looking for that elusive goal: engagement. Companies want users to spend more time on their app because that means more transactions, more ad revenue, and more stickiness. It’s the reason why we’re bombarded with so many notifications today. Enabling vernacular will mean that the user will finally be able to talk back and let the company and the user community know what they think and feel. Additionally, in today’s marketplace-driven ecosystem, engagement from sellers becomes critically important to ensure a high-quality catalogue and a pleasant consumer experience.

On a personal note, being a polyglot myself has only strengthened my belief that the vernacular language revolution is upon us. Having grown up in a business family in a Tier-2 town like Mangalore, I have seen first-hand that most business is conducted in local languages. If you have ever spoken to a stranger in their mother tongue and seen that inevitable spark in their eyes, then you know the value of speaking another language. Language enables greater understanding, which is the very basis of building trust. And that, in a word, is priceless.