[Techie Tuesday] Meet Debdoot Mukherjee, the man driving ShareChat's data science vision
Debdoot Mukherjee’s day job at ShareChat involves applying data science to understand nuance and context in the myriad conversations of Bharat.
Models and tools built by him and the team of data scientists he oversees help ShareChat make sense of the tonnes of text, audio, video, and photos posted by its 130 million users, and serve them the most “relevant content feed”.
“With AI, we essentially make sense of what the Bharat user wants,” Debdoot tells YourStory. “Our challenge is to understand every user’s taste and show them the right piece of content at the right time on their feed. What we do is matchmaking of content with consumers to deliver value.”
It may sound like a one-line value proposition, but it is driven by an intricate framework of advanced algorithms, recommendation engines, deep learning models, language processing systems, and data analytics.
Or, as Debdoot calls it, the “state of the art technology” for the Bharat user.
An IIT-D Gold Medalist and data scientist for 12 years, Debdoot joined ShareChat in early 2019. As the VP of AI at the $650-million soonicorn, he builds, tests, deploys, and drives the data science pipeline at the largest Indic language social media network.
Contextualising social media with AI
The data science team Debdoot leads works across advanced technologies like computer vision, natural language processing (NLP), optical character recognition (OCR), contextual search, text analytics, and data mining.
ShareChat users generate swathes of unstructured data across 15 Indic languages every day, which the AI and ML frameworks crawl through to serve the most personalised content feed to each user.
The social media platform attracts users from India’s small towns and from Indic-language-speaking pockets of Bangladesh, Nepal, and the Middle East.
“Their needs may not be polished, or their discourse may not be majoritarian as on other social media platforms. That is where ShareChat delivers value,” Debdoot says, adding, “It becomes a more interesting problem for AI to solve when you are serving needs in hundreds of genres across so many languages.”
Over the years, ShareChat has birthed a “longtail of content genres” that differentiates it from other social media networks.
While the startup’s initial growth was driven by the demand for shareable content — memes, jokes, greetings, bhajans, shayaris, short videos — it now services a variety of “new content needs” from education to entertainment to information.
The VP elaborates, “In an applied environment like ours, all threads of data science start meeting. We recognise patterns in content by combining visual, text, and audio data. The first step in language understanding is recovering text embedded in photos and videos, which makes up most of the content. We have built an in-house OCR model that can understand Indic scripts and a variety of fonts. After retrieval of data, we use NLP and deep learning to interpret the content type. The availability of data in low-resource languages like Assamese and Odia can be a challenge. But ML lets you transfer your understanding from data-rich languages like Hindi to train models in other languages.”
ShareChat harnesses data not only to personalise feeds but also to auto-tag content, perform complex sentiment analysis, detect spam, and control abuse. “We plan to open-source our OCR pipeline because we see a lot of interest from Indian universities,” Debdoot reveals.
He highlights that in a consumer tech startup, you get “first-hand access to data” and once you solve a problem, the user feedback is immediate.
However, that wasn’t the case a decade ago when he was cutting his teeth in enterprise AI in the hallowed corridors of IBM Research. “ML was largely restricted to research labs or academia. It did not have any applications in the industry that could be touted as successful,” he shares.
But IBM Research was his first “real exposure” to these advanced disciplines of computer science and engineering. “That is where my interest started spiking,” he says.
“I was fortunate to have been exposed to a research environment early in my career when the first applications of machine learning were coming through. Our lab had PhDs and Nobel Prize winners. The learning that happened was tremendous.”
From enterprise research to startups
After earning a master’s in Computer Science & Engineering from IIT Delhi, Debdoot joined IBM Research as an R&D engineer in 2008.
During his six-year stint there, he initiated several research projects, gained expertise in information retrieval, data mining, and enterprise search, and built end-to-end tools and models to improve the productivity of knowledge workers.
“IBM had a very large global services division. There was a lot of documentation that happened... PPTs, Word docs, Excel sheets. These were created and put in a silo and never looked at again. Our challenge was to bring this knowledge to life. We built models that could extract information from unstructured documents. ML helped resolve ambiguities in sketches and information, and converted drawings into structured data sets.”
By 2014, IBM’s long feedback cycles and reliance on clients had begun to wear Debdoot out. It was also the time when startups began to boom in India.
It “inspired” him to make the move from enterprise research to consumer tech. “I wanted to experience the startup culture where you create problems, find solutions, deploy, and move on to the next problem,” he says.
In early 2014, he quit IBM to join the data science team at Myntra, which was on a mission to personalise fashion for users at the time. Within a month of his joining, Myntra was acquired by Flipkart, and a new world of possibilities opened up for him.
Personalising fashion with AI
The problem statement at Myntra was simple: Can something as personal as fashion be transformed with data science?
These were the early days of the ecommerce boom, and Myntra, already a leading fashion e-tailer, would go on to pioneer the use of AI in fashion.
Debdoot, who was Myntra’s lead data scientist, developed a Customer Insights Platform, a data-led framework that creates rich profiles of customers to understand their taste in fashion, site navigation patterns, intent, purchase behaviour, responses to content notifications, recommendations, and real-time offers.
The framework slices and dices the data to create micro segments of customers based on their behaviour. These “novel predictive models of customer understanding” allow Myntra to deliver a hyper-personalised 1:1 shopping experience to every user.
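Micro-segmentation of this kind is often done by clustering customers on behavioural features. The sketch below is purely illustrative and is not Myntra's actual method: it assumes each customer has already been reduced to a small vector of normalised features (hypothetically, visit frequency and basket value scaled to [0, 1]) and groups them with plain k-means (Lloyd's algorithm) using only the Python standard library.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster feature vectors into k segments with Lloyd's algorithm.

    `points` is a list of equal-length numeric tuples (hypothetical,
    already-normalised behavioural features per customer).
    Returns (labels, centroids).
    """
    rng = random.Random(seed)
    # Initialise centroids by sampling k distinct customers.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each customer joins the nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Two frequent, high-spend shoppers and two occasional, low-spend shoppers.
customers = [(0.9, 0.8), (0.85, 0.9), (0.1, 0.2), (0.15, 0.1)]
labels, centroids = kmeans(customers, k=2)
print(labels)
```

The function name, the feature layout, and the choice of k-means are assumptions made for this example. A real customer-insights system would use far richer features and a library implementation such as scikit-learn's `KMeans`.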
In fact, an online retailer can go a step beyond physical stores.
Debdoot elaborates, “In a brick-and-mortar store, you cannot change the layout of the store for every customer. But it is possible to do that in an online portal because you are essentially playing with pixels. The entire catalogue can be shown with only you in mind. That was one of the early experiments we did in 2014. We hyper-personalised product listings and made searches more contextual based on our predictive modelling of the customer. It had a healthy impact on all business metrics.”
Customer analytics came to be integrated at the core of Myntra, and all decision-making was data-driven. Debdoot steered all data science programmes at the ecommerce company and built a “personalisation pipeline” to serve a variety of use cases and improve the visibility of products.
“We transformed decision making across all business functions from merchandising and marketing to demand forecasting and strategy,” he says.
But despite the deep impact of AI in fashion ecommerce, it was still a “niche domain”. “There was only so much you could do in fashion,” Debdoot says.
His desire to build products in a more “mass domain” prompted him to leave Myntra in late 2015 and join homegrown messaging unicorn Hike. That began his journey in social, mobile, and communication.
AI for the Next Billion Users
Hike helped Debdoot gain a keen understanding of the Next Billion Users.
Also known as the ‘Bharat user’, this set has skipped a generation of technology and gone from no internet to high-speed 4G. They think, act, speak, and consume differently from the first-wave internet users in India.
When Debdoot joined Hike as its Head of Data Science, it was still an early-stage startup, and its “portfolio of problems was much wider” than Myntra’s.
“Hike had a diversity of problems to solve. It was doing stuff with Indic languages, and that presented rich opportunities to do NLP, offer recommendations based on mining of social networks, build ML on the camera front using computer vision, and so on,” he shares.
But perhaps the most effective use of AI and data was in Stickers.
Debdoot says, “Stickers was one of the flagship features. Hike was catering to a young audience (18-23) and its core value proposition was to help them communicate with rich expressions. Our goal was to replace texting with Stickers, and make it possible to have a sticker for nearly everything anyone would want to say in a chat conversation. AI helped better that.”
Hike had three things to solve:
1) Can users complete an entire conversation using just Stickers?
2) How much of a conversation can the app enable by suggesting quick and contextual Stickers?
3) And how can the platform build that capability across Indian languages?
“We took on all the problems,” says Debdoot. “The first one with AI, and the rest with NLP and deep learning.”
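The models Debdoot's team built were far richer than anything shown here; the toy below only illustrates the shape of the second problem, suggesting contextual stickers from the latest message. It assumes a hypothetical catalogue in which each sticker carries a set of text tags (including a few romanised Hindi words, as an assumption about how multilingual tags might look) and ranks stickers by how many tags appear in the message.

```python
import re

# Hypothetical sticker catalogue: sticker id -> descriptive tags (illustrative only).
STICKERS = {
    "good_morning": {"good", "morning", "gm", "suprabhat"},
    "congrats": {"congrats", "congratulations", "badhai", "well", "done"},
    "laughing": {"lol", "haha", "hahaha", "funny"},
    "thanks": {"thanks", "thank", "shukriya", "dhanyavaad"},
}


def suggest_stickers(message, catalogue=STICKERS, top_n=2):
    """Rank stickers by the number of their tags found in the message."""
    # Lowercase and split the message into alphabetic tokens.
    tokens = set(re.findall(r"[a-z]+", message.lower()))
    scored = [(len(tags & tokens), sticker_id) for sticker_id, tags in catalogue.items()]
    # Keep stickers with at least one matching tag; sort by score, then by id.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [sticker_id for _, sticker_id in ranked[:top_n]]


print(suggest_stickers("Haha that was funny, thanks!"))  # ['laughing', 'thanks']
```

Keyword overlap is a deliberately simple stand-in; the conversation and emotion models described in the article would replace the tag-matching step with learned representations of intent and tone.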
During his four-year stint, he led a team of data scientists that mined what Debdoot calls “one of the biggest repositories of user insight in India”. They created conversation and emotion models, search graphs, recommender systems, and spam filters, which not only improved the discovery of Stickers but also individualised them.
“It had a tremendous impact on the sticker-to-text ratio, which went up in double digits, and Hike’s entire value proposition became centred around Stickers,” Debdoot says. “It also made the app very Indian.”
By 2019, when he left Hike to join ShareChat, the world had upped the ante on Indic language offerings. Every tech company was building for Bharat; as VCs often put it, the action had moved ‘From Silicon Valley to Indus Valley’.
Future of AI in consumer tech
If the buzz around GPT-3 is anything to go by, the future of AI may be here.
GPT-3, or Generative Pre-trained Transformer 3, is a language prediction model that uses deep learning to produce human-like text. Built by San Francisco-based research lab OpenAI, it could change the way content is served.
Debdoot reckons its possibilities are great and GPT-3 has “stunned AI experts” with some unimaginable use cases. “Even though OpenAI has kept the last-level details fairly guarded, people are playing with it in interesting ways. Generating content will be a task GPT-3 could be ideally suited for,” he says.
Does that change things for ShareChat, which is optimising its AI to improve content delivery? And what else can data science solve in social media?
“We have aligned our content feed with user needs, and the engagement is improving on a daily basis. But can we predict a content mix that can balance short-term engagement with long-term retention? Data science can gauge what users may be interested in later. It can learn from one session to serve them in the next session. Understanding abstract content, which is not easily identifiable by machines, is what we are getting better at.”
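One simple way to frame the trade-off Debdoot describes is to score each candidate post by a weighted blend of a short-term signal and a long-term signal. The sketch assumes two hypothetical model outputs per post, `p_engage` (predicted chance of an immediate like or share) and `p_return` (predicted contribution to the user returning next session), plus a tunable `retention_weight`. None of these names come from ShareChat, and a linear blend is only one of several possible approaches.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    post_id: str
    p_engage: float  # hypothetical: predicted probability of immediate engagement
    p_return: float  # hypothetical: predicted lift in next-session return


def rank_feed(candidates, retention_weight=0.3, k=3):
    """Return the top-k post ids under a linear blend of both objectives.

    retention_weight = 0 ranks purely for short-term engagement;
    retention_weight = 1 ranks purely for long-term retention.
    """
    if not 0.0 <= retention_weight <= 1.0:
        raise ValueError("retention_weight must be in [0, 1]")

    def score(c):
        return (1 - retention_weight) * c.p_engage + retention_weight * c.p_return

    ranked = sorted(candidates, key=score, reverse=True)
    return [c.post_id for c in ranked[:k]]


feed = [Candidate("A", 0.9, 0.1), Candidate("B", 0.5, 0.8), Candidate("C", 0.2, 0.3)]
print(rank_feed(feed, retention_weight=0.0))  # ['A', 'B', 'C']
print(rank_feed(feed, retention_weight=1.0))  # ['B', 'C', 'A']
```

Sliding `retention_weight` between 0 and 1 shifts the feed from clickable, in-the-moment posts towards content expected to bring users back, which is the balance the quote above asks data science to predict.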
For a man who’s spent over 12 years advancing applied AI in India, Debdoot is still wide-eyed about the power of data and what it can potentially achieve.
He credits the “hyper growth” in these disciplines to the easy access to learning.
“All the advances are coming through because the AI and ML community was perhaps the first set of people to reap the benefits of online education. There are so many people operating on a level playing field as far as access is concerned. That was not the case even 10 years ago.”
Today, the world’s a stage and it’s still Day One at ShareChat.
Edited by Saheli Sen Gupta