The Indian internet’s second big turning point will be language tech


Language technology took several large strides this year. If there’s one thing that 2017 has taught us, it’s that anyone building solutions for India ignores Indian languages at their own risk, writes Arvind Pani, Founder of Reverie.

As 2017 draws to a close, the march of technology continues to gain momentum.

A few years ago, the landscape of India’s internet was fundamentally changed by the explosion of mobile users coming online, and we’re now seeing a similar phase of rapid growth - this time powered by Indian languages.

Language technology has taken several large strides of its own this year, with milestones that mark how far the field has come.

Here’s a summary of some of the more important leaps forward that have happened this year in language tech, and their implications for the average Indian.

Quantifying Indic language reach online

In April 2017, Google and KPMG released a report, Indian Languages - Defining India’s Internet, on the presence and reach of Indian languages online.

The key takeaways included the fact that Indic language internet users (234 million) have already surpassed English users (175 million), and that this trend will only accelerate as time goes by.

The report projects that 90 percent of Indians coming online for the first time over the next five years will do so in their own language, bringing those numbers to a projected 536 million Indian language users vs 199 million English users.
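To put those figures in proportion, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes the two groups do not overlap, which the report’s headline numbers do not spell out, and the variable and function names are purely illustrative.

```python
# User counts quoted above from the Google-KPMG report, in millions.
# Assumption: the two groups are treated as non-overlapping.
current = {"indian_language": 234, "english": 175}
projected = {"indian_language": 536, "english": 199}


def indian_language_share(users):
    """Indian language users as a percentage of the combined total."""
    total = users["indian_language"] + users["english"]
    return 100 * users["indian_language"] / total


def growth_multiple(group):
    """How many times larger the projected figure is than the current one."""
    return projected[group] / current[group]


share_now = indian_language_share(current)
share_projected = indian_language_share(projected)

print(f"Indian language share: {share_now:.1f}% now, {share_projected:.1f}% projected")
print(f"Indian language users grow {growth_multiple('indian_language'):.2f}x")
print(f"English users grow {growth_multiple('english'):.2f}x")
```

Under that assumption, Indian language users go from a little over half of the combined base to nearly three quarters, while the English user base grows only modestly.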

Sometimes, these statistics can be surprising. According to the Digital Indian Language Report by Reverie Language Technologies, Hindi, Marathi, and Gujarati are the three most used Indian languages online, even though Gujarati is not among the top three Indian languages by native speakers.

Caged in by the lack of localisation and services in Indian languages, these users have so far stuck to low-friction verticals. Reverie’s report lists social media, messaging, browsing, and entertainment as the ones they use the most. That will change as companies start building solutions that target this user base.

Which brings us to the next development.

India’s mobile language mandate

In a significant push forward, the Government of India mandated Indic language support for all mobile devices in India, a decisive step towards increasing the digital presence of these languages. Once this mandate goes into effect (February 1, 2018), all new phones in India will have to support all 22 official Indian languages, as well as input functionality in at least two Indian languages.

One of the larger implications of this move is that support for India’s non-English speaking population - over one billion people - will become a prerequisite for digital devices, a new default setting.

With the proliferation of cheap data plans and affordable handsets, one can only imagine how this move could impact the country’s internet in the longer run.

Digital government services

In addition, the Government of India has been pushing for more government services to be available online. As internet penetration increases, the internet increasingly becomes both an outreach platform for and a facilitator of government services.

State governments are also pushing for language localisation across digital platforms, on both smartphones and websites.

The Government of India’s UMANG (Unified Mobile Application For New-Age Governance) app was unveiled in November this year, and it came with support for 12 Indian languages. Because UMANG is an all-in-one app that lets citizens find and access other government services, its language support will make government services in general easier to access.

BHIM, with its mission of bringing digital payments to the masses, was also built with accessibility in mind. It was released with support for multiple Indian languages, ensuring that the average Indian citizen would have as much access to digital payments as their English-speaking, upper-middle-class fellow citizens.

Machine Learning and voice search

One of the biggest developments that marked this year in language tech was the advent of Machine Learning and voice search.

Machine Learning helps power more accurate, precise translation, which is essential for localising content at scale. It allows translation systems to learn from millions of examples and patterns and continuously improve the naturalness of their translations.

Indian languages have certain linguistic quirks that can otherwise confuse translation systems, like stark differences between formal and colloquial vocabulary. Water, for example, can be jal or pānī depending on formality, and the wrong variant would sound horribly out of place.

Voice search, of course, lets users find content by speaking to their devices. Indians who are coming online for the first time may be more comfortable searching by voice than by typing, since Indic language typing would be completely new to them. Voice, on the other hand, isn’t. According to Google’s own data, 28 percent of Google searches in India are voice queries.

Building solutions

Tech companies are finally waking up to the fact that Indian languages need digital support too, and that involves creating a user experience that is completely optimised for Indian languages - merely providing a suboptimal, patchwork user experience won’t do. Developing language tech, however, comes with its own numerous challenges.

There’s a very real scarcity of resources for building digital support for Indian languages.

European languages, for example, have Europarl, a parallel corpus (the same text aligned sentence by sentence across languages) drawn from the proceedings of the European Parliament. Indian languages have nothing comparable. This means these resources for Indian languages have to be built from the ground up.
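For readers unfamiliar with the term, the parallel language data described above is, at its simplest, a collection of sentences paired with their translations. The sketch below is a toy illustration only: the class name, field names, helper function, and the two sample sentences are hypothetical and do not reflect the actual format of Europarl or any other dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One aligned entry: a sentence and its translation (hypothetical layout)."""
    source_lang: str
    target_lang: str
    source_text: str
    target_text: str


# Two illustrative English-Hindi pairs; a usable corpus would need millions.
corpus = [
    SentencePair("en", "hi", "I need water.", "मुझे पानी चाहिए।"),
    SentencePair("en", "hi", "Where is the water?", "पानी कहाँ है?"),
]


def pairs_for(entries, source_lang, target_lang):
    """Return only the entries for the requested language pair."""
    return [
        pair
        for pair in entries
        if pair.source_lang == source_lang and pair.target_lang == target_lang
    ]


english_hindi = pairs_for(corpus, "en", "hi")
english_tamil = pairs_for(corpus, "en", "ta")

print(len(english_hindi), "English-Hindi pairs")
print(len(english_tamil), "English-Tamil pairs")
```

The empty English-Tamil result is the point: a translation system has nothing to learn from until someone collects and aligns that data.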

It’s an exciting space to be in, as there are a whole host of problems to be solved, and whatever solutions are built will end up impacting the lives of hundreds of millions of Indians, forming an essential part of their daily lives.

If there’s one thing that 2017 has taught us, it’s that if you’re building solutions for India, you should ignore Indian languages at your own risk.


