Meet Joydeep Sen Sarma, the IITian who revolutionised big data at Facebook
Our Techie Tuesdays’ candidate this week, Joydeep Sen Sarma, is the quintessential Indian geek who believes in figuring things out for himself. Whether it was preparing for IIT-JEE and scoring an all-India rank 18 or taking on the whole tech industry over eventually consistent database systems, he knew he could do it and he was right. As Data Infrastructure Lead at Facebook, Joydeep built the Hive data warehouse software along with his colleague Ashish Thusoo. He later co-founded Qubole with him.
If you've ever taken an engineering entrance exam, you would know what it takes to get an all-India rank 18 in IIT-JEE. And that too, without coaching, and sitting in a township on the outskirts of Haridwar. But with the kind of determination that Joydeep Sen Sarma had, it didn’t seem all that difficult.
His life story has elements that most of us can easily relate to. Joydeep, our Techie Tuesdays’ protagonist this week, may spell his surname differently, but he retains the characteristics of being ‘Sharmaji’s son’ through different stages of his life.
Joydeep wasn’t interested in computers when he joined IIT Delhi. It was only in his third year that he discovered his love for programming. The rest, as they say, is history.
He holds multiple patents from his stint at NetApp. He started up, failed, but started up again. He built Hive on top of Hadoop at Facebook and has never looked back since. Almost six years ago, he co-founded Qubole, a Big Data-as-a-Service startup which has now grown to become one of the most reliable big data services on the cloud.
Joydeep shared his story during a recent interaction with YourStory. Excerpts from the conversation:
The making of IIT JEE AIR 18
Joydeep was born in Bhopal in a standard middle-class family. His father worked in a public sector undertaking (BHEL) and moved to Delhi when Joydeep was six years old. In Delhi, he went to Mother's International School, which to this day exerts a deep influence on Joydeep. He recalls,
The school had a spiritual inclination and it played an important role in reinforcing some of my personal traits like compassion and empathy. I remember that the houses in the school were named after human qualities (and not on people) -- Sincerity, Honesty, Aspiration, Gratitude.
In Class VII, Joydeep moved to Ranipur (a BHEL township on the outskirts of Haridwar) following his father’s transfer there. This wasn’t a very pleasant experience for Joydeep as he had started forming some strong bonds after spending seven years in Delhi. He joined Delhi Public School, Ranipur.
Joydeep could never quite fit into this school and withdrew into a shell of his own with just books for company. He says, “Earlier, I spent a fair amount of time playing, interacting with people. I loved painting, arts, music. But when I came to Haridwar, my only love was books and academics.”
According to Joydeep, in Ranipur, parents seemed to be in competition with each other and measured their self-worth by how good their children were in studies. Joydeep wasn't much affected by it but was keen to do well in the competitive exams, to ensure financial security for his family.
He cracked the NTSE (National Talent Search Examination). He considers his JEE (Joint Entrance Examination) preparation a landmark experience; he went into hiding for 2-3 years. Joydeep has spoken at length about his experiences during the JEE preparation in this blog post.
Joydeep didn’t like the idea of tutors and believed in figuring things out for himself. His motto was: ‘I'll figure it out’. He got study materials from Brilliant Tutorials and Agrawal's coaching. He says, “Along with the study material, I got the pamphlet of previous year's JEE toppers. Pankaj Gupta, from Mother's International School, Delhi had secured an AIR 15 in 1991. I had this deep fear in my head that I was lagging behind. But I also thought if he could do it so could I.”
Before Joydeep, the highest rank anyone from Haridwar could manage was AIR 193. His AIR 18 became a new benchmark for the coming generation of students in Haridwar and inspired many to emulate his performance at IIT-JEE.
However, it's not something that Joydeep is particularly proud of. He says rather philosophically,
I'm always a part of an ecosystem, and irrespective of whether I am going through ups or downs, I'll remain that. If you see randomly placed 0s and 1s, you can find patterns once in a while. There may be a few 1s in a row or a few 0s in a row. It's nothing but the sequence of outcomes of coin tosses. I may look like a rockstar or a bunch of 1s, but as members of a society/ecosystem, we're nothing but random coin tosses, and it just happens that sometimes the coins line up one way or the other.
Computer science at IIT Delhi – an acquired love
Joydeep’s parents moved back to Delhi and his father subsequently retired. Joydeep initially wanted Electronics, as he had no background in Computer Science from his school days. However, after talking to parents of other students during counselling, he realised that Computer Science made more sense for an AIR 18, and he joined the Computer Science department at IIT Delhi.
Joydeep took time to get used to life in IIT. He wasn’t prepared for the hustle and bustle of life on the campus. Just before joining IIT, Joydeep had a bout of measles which made him quite weak. He says, “Everyone was pulling each other down. The students were not only competitive but also adversarial to an extent.”
While Joydeep was learning boolean logic and true/false, a lot of his classmates were already familiar with programming. He didn’t have any competitive advantage, nor did he have any love for the subject. It took him some time to warm up to computer science. He says,
The unfortunate thing about IIT and our education system is that it's completely driven by theory whereas I'm not a theory kind of guy.
He wished somebody had taught him computer programming as a way of building something. Nowadays, things have changed and students are building real world projects on campus.
In his third year, Joydeep started working in labs which had Sun workstations. Before that, students used to work on the mainframes. This was the first time Joydeep felt nice about computing because of good graphics. He started to enjoy computer science then. By the time he reached his fourth year of college, he started working on operating systems, parallel computing, building databases and writing large programs.
For his final-year project, he worked on the PARAM supercomputer, on performance modelling of a weather forecasting system. Basically, there was a gigantic FORTRAN codebase which did weather forecasting and ran on the PARAM supercomputer, and his project was to make it faster.
Joydeep also recalls that at the time, the ERNET lab was the only lab with internet connectivity and every student wanted to be there.
Looking back at his IIT days, Joydeep says he suffered because he didn't have any role models right from his school days. He says,
I became the middle of the pack at IIT because I didn't have clarity on what I was interested in and where I saw myself, and only towards the end did I start getting clarity. I liked the machines of Sun Microsystems and started reading about Bill Joy, the founder. After reading about him, I thought maybe I wanted to be like him.
Bridging the social, verbal and cultural gap
After B.Tech, students had three main choices: an MBA, preparing for the IAS, or going for an MS. Only a few students opted for a job after B.Tech. Joydeep opted for a Master’s in Computer Science from the University of Pittsburgh.
After landing in the US, Joydeep faced multiple challenges, and every time he emerged a better and stronger person. He says, “It’s truly said that human beings are like rubber bands. You don't really know how much you can stretch. It's only when you put yourself in a challenging situation that you stretch yourself.”
Due to his accent, Joydeep had a tough time as a teaching assistant in the beginning. There were social, verbal and cultural gaps which caused problems for him. He eventually figured out what was wrong with his accent and worked on it. Meanwhile, he was performing well in his academics. He did his major project with Professor Rami Melhem, who was the head of the department. He worked on parallel and real-time computing, and wrote code for real-time Linux. He ended up publishing some papers as well.
But he soon realised that he wanted to solve more real-life problems and decided to take up a job in the industry. Microsoft and Oracle were top choices for everyone then and Joydeep had job offers from both. He joined Oracle. Unfortunately, just before he joined Oracle, Joydeep’s father passed away. He later had to take his mother to the US with him.
Within a few months of joining Oracle, Joydeep started feeling frustrated and restless there. He had built some networking stack for the company, rewrote some parts of it, did some locking enhancements and made some performance gains. It wasn't bad but it wasn't enough for him. He says, “I felt I was in a deep dark well where I didn’t know what's outside the well.” With an urge to build things, he joined Network Appliance (NetApp).
In 1999, Network Appliance (NetApp) was one of the most respected names in Silicon Valley. It was around 400 people strong then. Its founders were well known for writing the industry's first log-structured file system, and Network Appliance became a template for many companies to follow.
At NetApp, Joydeep initially worked on the high-availability stack, where he concentrated on clustering and distributed systems. At the time, eventual consistency was becoming really popular. He had built a small eventually consistent synchronization mechanism for Network Appliance's HA clusters without knowing what category it fell into.
Joydeep really liked the culture at NetApp and admired the founders, but after spending about three years there, he felt he could do more elsewhere. In 2002, two senior architects from NetApp left and started up. Hugo Patterson joined Data Domain as CTO and Srini started Datsi Appliance. Joydeep had offers from both the companies. He joined Srini, as the company offered him a co-founder-like role and he was its first software engineer. The startup was funded by Rajeev Motwani.
Joydeep wrote a distributed file system on Linux. Within three months, he wrote a full system from scratch in the Linux kernel. It was very stressful, as Joydeep was operating on very little sleep, but he recalls it as a great experience. In the course of his work he met many people whom he wouldn't have met otherwise.
However, after three months, he realised that the startup was going nowhere. He then met his friend Arvind Jain (who had then left Akamai) and discussed the idea of working on a system like Data Domain. This disk-based backup was hot then and deduplication was an interesting problem to solve. Joydeep came up with a very neat algorithm and wrote another file system for doing deduplication.
Joydeep was an H-1B visa holder, and without employment he couldn't have stayed in the US for long. So, he decided to rejoin NetApp, where he planned to moonlight on his startup idea as well.
Getting older and wiser
During his second stint at NetApp, Joydeep recalls how he tried to pitch his startup idea to the NetApp CTO Steve Kleiman. “I told him I had built something and asked him if he was interested in buying or doing something about it. I was very naive and I didn't know that technology by itself has no value. It’s the customer, business and use cases which have the value.”
While he was at NetApp, there was a big industry shift from tape-based backup systems to disk-based backup systems. Joydeep sensed an opportunity to make money in this transition, which he felt could give many a startup a break to enter the industry and grow big. He tried talking to CTOs, architects and CPOs, trying to sell the concept, but eventually it didn't work out. While his partner Arvind joined a company that was into deduplication, Joydeep busied himself with some other interesting prototype projects at NetApp.
Joydeep won the CTO awards for two consecutive years for his projects. The first one was a Log Replay. He says, “I was sitting with two filers connecting back to back, figuring out how to transfer memory from one place to another and then I had an Aha! moment. I realised I need not do memory to memory transfer and all I had to do was to convert random reads to sequential reads. While the filer was running, instead of just maintaining the log of operations I would log the buffers needed by the operations. That would keep track of the exact disc sectors and blocks that were required to perform that operation.”
NetApp later patented this technique, with the patent in Joydeep’s name.
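The idea of turning random reads into sequential ones can be illustrated with a small sketch. It is not NetApp's implementation or the patented design; the operation log, block numbers and function names below are hypothetical, and the model simply assumes that a read costs a seek whenever it does not directly follow the previous block.

```python
def count_seeks(block_sequence):
    """Count reads that do not directly follow the previously read block."""
    seeks = 0
    previous = None
    for block in block_sequence:
        if previous is None or block != previous + 1:
            seeks += 1
        previous = block
    return seeks


# Hypothetical log: each operation is stored with the disk blocks it needs.
OPERATION_LOG = [
    ("write_a", [907, 12]),
    ("write_b", [13, 450]),
    ("write_c", [908, 14]),
]


def replay_order(log):
    """Blocks in the order a plain operation-by-operation replay touches them."""
    return [block for _, blocks in log for block in blocks]


def prefetch_order(log):
    """Blocks sorted by position, as a prefetch pass could read them."""
    return sorted({block for _, blocks in log for block in blocks})


if __name__ == "__main__":
    print("replay seeks:", count_seeks(replay_order(OPERATION_LOG)))
    print("prefetch seeks:", count_seeks(prefetch_order(OPERATION_LOG)))
```

Because the log already names every block the operations will need, the blocks can be fetched in disk order before the operations are replayed, which is what reduces the seek count in this toy model.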
One day, Joydeep overheard Kleiman saying that it would be really cool if NetApp could do backup from the filers directly from the discs. He decided to work on this. He says, “I knew that I could hook up multiple filers where one filer had access to other guy’s disc. I designed a system and implemented a prototype where one of these filers would be a normal filer and the other would attach to this guy's disc and mount them in read only manner (allowing you to access the data in the snapshot manner).” This idea didn't succeed from a business point of view.
Joydeep had become an intrapreneur at NetApp. By the time he finished this project, he was already bored of these ideas and decided to leave NetApp.
Entry to the data world
Joydeep’s friend Abhinav Gupta joined Yahoo! and invited him to join as well. It was a break from networking and an entry into the world of data for Joydeep. He joined the behavioural targeting platform there and built the first in-house recommendations platform at Yahoo! He used collaborative filtering and built some models using the Weka stack. The recommendations platform was used in Yahoo! Shopping and Yahoo! Travel.
It was a big learning curve for Joydeep. He learnt data mining and built some models and services around that. However, he realised that Yahoo! was imploding and decided to leave. After building the recommendations platform, he worked briefly with the graphical ads team to figure out the ads business. His project was to build a campaign optimisation platform to help Yahoo!'s sales team figure out how to tune a client's campaign for better performance.
He saw an opportunity in cross-site targeting of ads, i.e. selling the data from one site so that it could be used to target ads on other relevant websites. Around that time, his friend Ashish Thusoo's company was also folding up and the duo decided to work on this idea. They pitched it to investors, but all of them refused, citing privacy issues with the idea. They still built a prototype (an ad server) and used the CPL (cost per lead) model. Two companies later on got funded on this model -- BlueKai (bought out by Oracle) and eXelate (Israel-based).
The big data revolution at Facebook – Hive
One of Joydeep’s colleagues from Yahoo! joined Facebook and referred him to the company. He was interviewed and got through, and without thinking much, he joined them. It was a daunting experience for Joydeep because Facebook was populated by 20-something engineers and he was in his early 30s. He recalls, “I had not seen that kind of (raw) talent before. Looking at all these folks, I realized how much time I had lost.”
Joydeep chose the data team over the infrastructure team because there wasn't anyone in the data team. He roped in Ashish Thusoo as well. Namit Jain, who's now heading Nutanix India, was the third member and Zheng Shao, who now heads the data team at Uber, was the fourth member of Facebook's data team. The team installed Hadoop and started moving data processing from Oracle to Hadoop.
I had this idea of building a system where I could combine Python and SQL together. That's how I came up with the concept of Hive (now Apache Hive). I started writing SQL operators and added it to Hadoop streaming and made it look like you could query in SQL and Python together.
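To give a feel for what mixing SQL-style operators with Python over streamed rows might look like, here is a minimal sketch. It is not Hive's code and not the original prototype; the operator names, the sample rows and the tab-separated layout are assumptions made purely for illustration, loosely modelled on the way Hadoop streaming passes records as lines of text.

```python
import io
from collections import Counter


def select_where(lines, column, value):
    """SQL-like filter: keep tab-separated rows whose column equals value."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[column] == value:
            yield fields


def group_count(rows, column):
    """SQL-like GROUP BY on one column with COUNT(*)."""
    return Counter(row[column] for row in rows)


# Hypothetical sample rows: user_id, country, action
SAMPLE = "1\tIN\tlike\n2\tUS\tlike\n3\tIN\tshare\n4\tIN\tlike\n"


def likes_by_country(stream):
    """Roughly: SELECT country, COUNT(*) WHERE action = 'like' GROUP BY country."""
    rows = select_where(stream, column=2, value="like")
    return dict(group_count(rows, column=1))


if __name__ == "__main__":
    print(likes_by_country(io.StringIO(SAMPLE)))
```

Each operator consumes and produces plain rows, so ordinary Python code can be slotted in between the SQL-like steps.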
Joydeep had already pulled all user data into Hadoop and published it as a table in Hive. That was a data set everybody (in Facebook) wanted access to. Joydeep was trying to open up Hadoop to the entire Facebook team.
Talking about the naming of the project ‘Hive’, Joydeep says, “I had two names in mind -- Hive or House, as everything in Hadoop ecosystem has to start with H. I went with Hive.”
Two days later, Joydeep’s boss Jeff Hammerbacher (now co-founder of Cloudera) announced that Facebook was working on Hive. Interestingly, he thought that Joydeep and others were only creating a table catalogue, whereas they planned to write the Hive Query Language (HiveQL). With Hive, Joydeep ended up creating one of the most widely used data warehouse infrastructures today. He did that without ever having taken a formal course in relational algebra.
While Joydeep had the idea and vision, Ashish helped with the details. He was from Oracle and knew how to go about building such a system. Joydeep and Ashish presented Hive 1 at the Hadoop Summit in 2008 and it became enormously popular. Until then, it was still a hack. So, Joydeep and his team decided to write a proper database on top of Hadoop. This was internally called Hive 2 and later on became Apache Hive.
Once a tinkerer, always a tinkerer
Facebook was interesting but Joydeep always wanted to start up. In 2010, Joydeep and Ashish started thinking of building a company around Hive. Venture capitalists told them that it was a small market and there wasn’t room for another player since Cloudera was already there. That’s when Joydeep learnt the importance of telling the story to VCs in the way they wanted to hear it. He says,
VCs never fund ideas that promise to save money (because there's only so much money you can save). Instead they'll invest in ideas that increase productivity (where you can make millions of people more productive, do more analysis and process more data).
Joydeep believes that if he had started a company back then, it would have had a much bigger presence now. They were the owners of Hive and it was the hottest project (in big data) then.
Apart from working on Hive, Joydeep architected the backend for Facebook Messages. He played an important role in convincing the company to go with HBase (a NoSQL database) for the Messages backend. He also built the backend for the Facebook Credits platform.
When Amazon came up with DynamoDB, Joydeep had his reservations about the platform and thought it to be flawed. When he explained his point of view via a blog post, he had to face severe criticism from the industry. He recalls,
I had built eventually consistent systems in the past without anybody telling me that these are eventually consistent. I knew it inside out. I was literally fighting the whole industry at that time. Everyone turned and said ‘you must be really stupid to say that as everyone will move to eventually consistent databases.’
Just as deep learning is a fad today, back then everyone in Silicon Valley was talking about the CAP theorem and eventually consistent database systems. Today, eventually consistent databases are almost dead.
Genesis of Qubole
After spending four years at Facebook, even though Joydeep had enough stock options vested to sustain him for a lifetime, he left the social media giant to start up. He had already missed the opportunity with disk-based backup systems and with behavioural targeting ads.
He told Ashish (his co-founder), “Cloud is the next big thing and I don't want to miss this time. The whole world is going to move to cloud.”
He saw a great opportunity to ride the cloud trend while fitting it in with his (and Ashish’s) database background and their desire to build proprietary software. That was the genesis of Qubole.
The venture was initially named Canopy Data, but it turned out that there was a company called Canopy Financials which had committed Wall Street fraud, and the investors suggested that they come up with a new name. By that time Joydeep had figured that the company would do more database work. He wanted to build a service that would allow people to build data cubes (a common term in business intelligence). So, they wanted to name the company after cubes. Domain name availability issues led them to Qubole.
Qubole over the years
Initially, Joydeep wanted to build a realtime platform where one would publish the data and get the database. The platform would build the whole pipeline (data to database). The team started building the stream processing system but soon realised that it was going to take a lot of time and effort to build an open source project from scratch. It also takes a long time to get people to use such a project and to win large reference customers, and the odds of getting those customers were even lower. Around the same time, people reached out to Joydeep and Ashish, suggesting that they work on Hive on the cloud. This made sense, and Joydeep and Ashish secured funding based on this thesis. Raising money was easy for them because of their track record.
The duo built Hive as a service in the AWS cloud. They kept building what the market asked for. Joydeep says,
“The market wanted a big data restaurant where they can have everything and they can pick and choose and build the stack they want. They also wanted this stack to save them money, be easy to use and secure.”
Qubole has never strayed from its vision of bringing together cloud, SaaS and big data but it has evolved with time. The team consists of more than 110 engineers and product managers worldwide. They have companies like Pinterest, Expedia, NetApp, Ola, Capillary, Saavn as their customers.
It took some time for people to understand that it didn’t make sense to re-invent the wheel and instead of building an in-house big data (or data infrastructure) team from scratch, they could leverage Qubole’s service.
Qubole has raised $50 million and has an ARR in the tens of millions of dollars. Having come this far with Qubole, Joydeep shares his biggest learning,
Most people don't understand the dynamics of taking money (raising funding) when they start a company. A company that raises external money has to execute at a certain velocity. There's nothing bad about it and often you’ll have freedom in executing, but the fact that VC money drags you in a certain direction is a big learning.
Challenges at Qubole
Qubole is now a 'shippable SaaS' company, which means that the company often has to set up a dedicated Qubole service for its largest customers. That is something the founders didn't foresee at all when they started the company (this also happens due to regulatory reasons -- Europe, for instance, needs its own dedicated service). Coming from companies like Facebook and Yahoo! (which only have a single instance of their service worldwide), this was a big surprise for Joydeep.
When the team built Qubole initially, there were no containers, no Node.js. Even web sockets were not as popular then. Qubole was built mostly in Ruby on Rails. Now, the company is moving to a world where their services run on every cloud. It's a big transition for a company like Qubole with so many customers and usage, as they will have to rearchitect the whole stack.
In general, one of the challenges for Qubole is to keep abreast of the technology curve. Joydeep says, “When we started, Hive was pretty big, but nowadays, Spark has grown bigger. We had to make that transition, and today we have a strong customer base for Spark. And that wasn't our first or last transition.”
Qubole is essentially a platform company which provides lots of open source projects as a service to its customers. So, it has to continuously keep moving from one technology stack to another and yet build expertise and proprietary value add around it.
In fact, Joydeep believes that the biggest problem faced by the company isn't technical. It's how to iterate fast. He says,
We have to build a rock-solid enterprise data analysis platform and, at the same time, we have to innovate quickly as well. Balancing speed and quality has been the hardest problem we've solved and it's something we didn't foresee.
Team building philosophy
Qubole does multiple tasks. They range from looking into UI to application platform to changing the internals (Hadoop, Spark) to DevOps to quality. Joydeep looks for the right person for the right role. One of the consistent things he looks for is to have a cultural fit. He says,
We need people who listen, are proactive, problem-solvers, action oriented, and curious, who aren't political or just passing the buck around, who've energy and who work well with other people. These are basic attributes. If we find someone good, we create projects to get the best out of them. I want to build a company which gets the best out of its people.
Joydeep believes in the truism he heard during his school days and which is attributed to The Mother (spiritual collaborator of Sri Aurobindo): No child of mine is a zero. It essentially means every person has unique skills and strengths, and different people are good in different roles.
Joydeep believes that companies are all about people, and in a reasonably sized company, decisions are about convincing people and taking them along. He takes decisions based on this philosophy: first define objectives, then list out all the options and then objectively evaluate those choices. He says,
What's important is that as a team once you come to a decision, just stick to it.
The next phase
According to Joydeep, technology is the distribution of efforts in different dimensions. He says, “Revolutionary innovations are an important part of that but not the only part.”
Looking back, Joydeep realizes that while he was not a hacker, he was always a risk taker.
He believes that what one desires in life depends very much on the context of what one associates with. Now that he’s back in India, he would like to do things that are more directly correlated with outcomes in this country.
Joydeep still remembers what his instructor told him when he was taking motorcycle riding lessons in the US: the motorcycle doesn't go where you point the handlebar; it goes where you look. Your lean, the handlebar and everything else will align accordingly. Joydeep adds,
Life is like that. You end up going where you're looking and what your heart desires. And people who have capability are limited only by what they desire to achieve.
- As a kid Joydeep loved knitting. He was fascinated by how one could create something of varying shapes and sizes from a ball of yarn.
- Given his experience and expertise, Joydeep has shared his opinion on why India has failed to produce tech giants like Facebook, Google, and Apple.