Milind Borate - the man building the ‘North Star’ of data protection
In this week’s Techie Tuesdays, we bring to you the story of Milind Borate, Co-founder and CTO of Druva, one of the biggest tech startups of India. In the last two decades, Milind has come a long way to become one of the key authorities in database and storage technologies.
Milind Borate lives by the principle of simplicity — whether it’s choosing a product or making a decision. This is one of the first few things you get to know about Milind when you meet him. He is the Co-founder and CTO of Druva, almost a decade-old cloud-based data protection and governance solutions company.
Two loves have propelled Milind towards building one of India’s most successful tech startups: technology and Pune. Yes, Pune. Milind has always been based in Pune, except for the two years he spent at IIT Bombay for his MTech. Though he had the option to go elsewhere, he chose to stay close to Pune.
Over the last two decades, Milind has worked considerably on storage and database technologies. He is now focusing on cloud storage and machine learning for unstructured data. He’s our Techie Tuesdays candidate of the week.
Childhood spoiler — fear of examination
Milind was born and brought up in Pune. His father is a civil engineer at Military Engineering Services (MES) while his mother taught kids (and started a school) for almost 35-40 years. Milind remembers being an average student and always a back bencher. Though he wasn’t into learning with the hundreds of students in one class, he was happy exploring his interests in science outside that class. Whether it was the failed attempt to build a steam engine-powered car or to use a step transformer and battery cells as a weapon, Milind found his mojo in science. He also read a lot, mostly fiction in those days.
As a child, Milind used to be extremely scared of examinations. He recalls, “That pressure of delivering something in three hours and that fear that the questions could fall outside the range of what I had studied, didn’t really help me in studying anything in depth. That stressed me out.”
His fear of exams was so acute that he decided not to go for higher studies. He wanted to get a job with the minimum required education and opted for a three-year diploma in engineering. His father, who had wanted him to be a doctor, was unhappy but the stubborn Milind didn’t change his decision.
From diploma to degree — discovering the love for programming
All through his education, Milind came across teachers who instilled in him a love for certain subjects. It was Maths and Physics in his school days. During his diploma, the Applied Mechanics professor had a big influence on Milind. In his second and third years, when he was taught about microprocessors and assembly language programming, he developed a liking for programming. He adds, “We had EPROM burners then. We wrote code in assembly on a piece of paper. Then we looked up the opcodes (binary) for each of those instructions (to convert assembly code to machine language). We typed those opcodes (0 to F) on a telephone keypad-sized keyboard. And then the burner would write it into a little EPROM.”
Since Milind got interested in the coursework during his diploma, he decided to pursue engineering further. In 1990, he joined Pune Institute of Computer Technology (PICT), the only college in Pune which offered only a computer science degree. He had never wanted to leave Pune.
During his graduation, the professor teaching Unix shaped his thinking and developed his interest in systems programming. After his graduation, he joined Persistent Systems. Though he had a job offer from Motorola as well, he preferred Persistent Systems for its smaller team. There he worked on a distributed database. Anand Deshpande, Founder of Persistent Systems, urged Milind to write a white paper on query processing (of a distributed database). He hated the idea then and only now realises how stupid that decision was. At Persistent Systems, he worked mostly on the compiler stack: understanding DBMS, SQL and database schema.
Need for more learning
After spending a year at Persistent Systems, Milind joined a US-based startup, Advanced Computing Systems Company (ACSC), which was working on NFS caching and had built an AIX kernel driver for NFS caching. He was back to his love for Unix and the kernel. He also realised that the basic building blocks of Windows NT are similar to those of Unix. At ACSC, Milind worked on the following:
- Network stack dealing with NFS
- Process and memory management
- Storage stack
Soon, Milind started to feel the need to go back to college to learn more and joined IIT Bombay for an MTech in Computer Science. There he focussed on learning storage and database systems. His thesis was on distributed database systems — about building the network operators, i.e. how do you take a query and figure out which partitions it needs to touch (and split it and send it to those).
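The routing problem described above (work out which partitions a query touches, split it, and send each piece to the right place) can be illustrated with a minimal sketch. It assumes a simple range-partitioned key space; the partition bounds, node names and the `SubQuery` type are hypothetical and are not taken from Milind's thesis.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass

# Hypothetical layout: partition i owns keys from its lower bound up to
# (but not including) the next partition's lower bound. The last partition
# is unbounded above.
PARTITION_LOWER_BOUNDS = [0, 1000, 2000, 3000]
PARTITION_NODES = ["node-a", "node-b", "node-c", "node-d"]


@dataclass(frozen=True)
class SubQuery:
    node: str
    low: int   # inclusive
    high: int  # inclusive


def partition_for(key: int) -> int:
    """Return the index of the partition that owns `key`."""
    if key < PARTITION_LOWER_BOUNDS[0]:
        raise ValueError(f"key {key} is below the first partition")
    return bisect_right(PARTITION_LOWER_BOUNDS, key) - 1


def split_range_query(low: int, high: int) -> list[SubQuery]:
    """Split a range query [low, high] into one sub-query per partition touched."""
    if low > high:
        return []
    subqueries = []
    for index in range(partition_for(low), partition_for(high) + 1):
        part_low = PARTITION_LOWER_BOUNDS[index]
        if index + 1 < len(PARTITION_LOWER_BOUNDS):
            part_high = PARTITION_LOWER_BOUNDS[index + 1] - 1
        else:
            part_high = high  # last partition has no upper bound
        subqueries.append(
            SubQuery(PARTITION_NODES[index], max(low, part_low), min(high, part_high))
        )
    return subqueries
```

For example, `split_range_query(950, 2100)` yields three sub-queries, one each for `node-a`, `node-b` and `node-c`, while partitions the range never touches are skipped entirely.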
At IIT Bombay, Milind had traversed a long distance, from hating the idea of writing a white paper (at Persistent Systems) to writing a book. Along with his ACSC colleagues Sandeep Phadke and Prasad Dabak, he wrote a book based on their experiences with Windows NT called ‘Undocumented Windows NT’.
Systems engineering paradise — Veritas
In his final year at IIT Bombay, Milind had finalised the criteria for him joining a company:
- should be in systems engineering,
- should be a small team, and
- should be in Pune.
Veritas was the only one that satisfied his criteria and he joined Veritas’s file systems group. It had a ten-member team of fresh postgraduates. Milind worked on a file system check utility and then on a file system migration utility (which takes a Unix file system and converts it into a Veritas File System (VxFS)). Milind says,
At that time, there were tons of bugs in that system that touched every part of file systems. We fixed those bugs. It was a great exposure as we were forced to look at the code written by brilliant systems engineers and find bugs in it. Testing a piece of code is more challenging than writing it. In early 2000, Veritas File Systems and Veritas Volume Manager were like the gold standard of storage.
Milind worked on this file system for two years, and then on developing a clustered file system (CFS) where his team modified VxFS to work in a cluster. That project didn't see much commercial success but Milind learnt a lot about distributed systems. He led the HP-UX part of it. He explains, “At that time, HP-UX was transitioning from buffer cache to page cache. So, we were working on making VxFS work with page cache and then making sure that the clustered file systems worked with it too.”
One major problem to solve for clustered file system was cache coherency. These caches are controlled by the operating systems (and file systems don't control them directly).
In 2003, Milind moved to another group internally (at Veritas) that focussed on upcoming tech. For example, he worked on building a file system where one server would handle all the namespace operations so that other servers could deal with the data part of it, because namespace operations were hard to scale. Milind left Veritas in 2005 when the Veritas-Symantec merger was announced, because he was sure that Veritas would no longer remain the kind of small organisation where he would want to work.
Startup #1 — Coriolis technologies
Milind started a services company to build software that tech companies were outsourcing. He named his startup Coriolis Technologies after the French scientist Coriolis. The company did not do well and he left after running it for two and a half years (his partner stayed on).
At that time, Milind didn't want to go back to a regular job and he didn't know what was next in store for him.
Starting up Druva
In late 2007, Milind met Jaspreet Singh and Ramani Kothandaraman, who later became Druva’s co-founders. Milind knew them through the clients of his previous company. He recalls,
All of us wanted to work in product space. We thought that India as a growing market was ignored by US software companies. We realised that hundreds of manufacturing companies with revenues greater than $100 million needed to maintain small IT shops for daily production and other management. That was critical data for them and we wanted to provide data recovery for that.
The trio thought that these companies would not mind putting a million dollars into a disaster recovery plan for their data. At that time, Veritas, EMC and CA (among other companies) had solutions in this field, but Druva’s bet was to make the solution affordable and much simpler to deploy.
The company was named Druva because,
- it was in disaster recovery ('dr') domain, and
- the persistence of data which connotes the North Star — it is there even when the whole world goes around.
Wrong market assessment —> product iteration —> success
Jaspreet and Milind built (coded) the product in six months. It was a kernel driver which would trap all writes going to the local hard disk and send that data over the network to another machine, which would simply write it to a remote hard disk. In other words, the data is replicated continuously, and if the primary machine dies, all its data is safe on the remote machine. Unlike other existing solutions, clients wouldn’t need a separate storage array (or storage area network); a regular hard disk could be used with Druva’s solution.
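The idea of trapping writes and replaying them on a second machine can be sketched in a few lines. This is a user-space toy under stated assumptions, not Druva's kernel driver: the `Disk` and `ReplicatingDisk` classes are hypothetical, disks are modelled as in-memory dictionaries, and the "network" is just an in-order queue.

```python
from collections import deque

BLOCK_SIZE = 4  # tiny block size, for demonstration only


class Disk:
    """In-memory stand-in for a hard disk addressed by block number."""

    def __init__(self):
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data

    def read(self, block_no):
        return self.blocks.get(block_no, b"\x00" * BLOCK_SIZE)


class ReplicatingDisk:
    """Applies each write locally and queues it for the remote disk."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote
        self.pending = deque()  # writes not yet applied remotely, oldest first

    def write(self, block_no, data):
        self.local.write(block_no, data)
        self.pending.append((block_no, data))

    def ship(self, max_writes=None):
        """Replay queued writes on the remote in their original order.

        Returns the number of writes shipped.
        """
        shipped = 0
        while self.pending and (max_writes is None or shipped < max_writes):
            block_no, data = self.pending.popleft()
            self.remote.write(block_no, data)
            shipped += 1
        return shipped
```

Preserving write order matters here: if two writes hit the same block, replaying them out of order would leave the remote copy with stale data.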
When Druva’s team went back to the manufacturers, they either got little support (companies refused to share even a hard disk for storage) or faced demands to customise the solution (one client asked for data copying and recovery over a satellite link). Soon, the team realised that the market was not ready yet. In the process, they also got a request to build a data recovery solution for end systems (laptop data protection).
The team validated this as a gap in the market and started building an endpoint data protection solution. The target shifted from manufacturing firms to firms with a large number of employees using laptops and desktops to create content on these personal devices (and not on a server). Since endpoint backup was still a $200-300 million market, it didn’t draw much attention from the bigger data recovery companies (and hence no solution was being offered).
Success begets success
The market uptake was almost instantaneous for inSync (Druva’s endpoint data backup solution). To begin with, the product was very simple yet complete (an on-premise solution) with a small storage engine and a small client which would scan file systems on a Windows box and transfer those files over a network to the backend system. Milind says, “The users on the move didn’t have a consistent network connection to the backup server, so we made sure that even if the network connection kept going up and down, we would continue to back up the data and if the network broke, the backup would resume from the same point next time. The product could be downloaded and deployed very easily. As a result, we started getting traction from North America and Europe by the end of 2008.”
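Resuming a backup "from the same point" comes down to remembering how much data the server has already received. The following is a minimal sketch of that idea, not inSync's actual protocol: `FlakyLink`, the list standing in for the server, and the byte-offset checkpoint are all assumptions made for illustration.

```python
class FlakyLink:
    """Hypothetical network link that drops after a fixed number of sends."""

    def __init__(self, fail_after):
        self.fail_after = fail_after
        self.sent = 0

    def send(self, server, chunk):
        if self.sent >= self.fail_after:
            raise ConnectionError("link dropped")
        server.append(chunk)
        self.sent += 1


def backup(data, server, link, offset=0, chunk_size=4):
    """Send `data` starting at `offset`, one chunk at a time.

    Returns the checkpoint: the offset of the first byte the server has
    not yet received. It equals len(data) once the backup is complete.
    """
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        try:
            link.send(server, chunk)
        except ConnectionError:
            return offset  # resume from here next time
        offset += len(chunk)
    return offset
```

A caller would persist the returned checkpoint and pass it back as `offset` on the next attempt, so chunks that already reached the server are never sent again.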
The fundamentals of de-duplication and WAN optimisation were right from the beginning. Once the core pieces around inSync as a product were right, it was more about the completion of the product in terms of the following:
- integration with Active Directory,
- doing a Mac/Linux port,
- changing to a backup storage engine which would scale better, etc.
The two-dimensional growth of Druva
Milind looks at the growth of Druva in two dimensions:
- On X-axis they put all data sources and
- On Y-axis they put all the use cases.
Every year, the company expanded on either the data source side (mobile phones/tablets) or the use case side (data loss prevention). Milind and his team also started working on a cloud implementation of its technology. On the data sources side, apart from end user data, they started tackling server data as well. Legal hold emerged as an important use case, and after the Enron case, it picked up further (in cases of litigation). Before this feature, an IT admin had to copy data from individual machines onto a hard disk for legal hold.
Milind decided to build out Druva’s cloud stack from scratch to support scaling better. He says,
Technologies which work at that level are distributed database and object storage where you can have single instance of that storage but be able to scale it. As you move from file system to object storage, the paradigm just shifts.
Tech stack @ Druva
At its base are object storage and a distributed database. Over that, there is the public cloud (AWS and Azure), and then there are engines that create another level of indexing (for cost effectiveness) over the object storage and the distributed database (which itself acts as an indexing engine). On top of it, there is a file system which allows users to create folders and to write and read data. It is a time-indexed file system with de-duplication. On top of that, there’s a backup engine which accepts data from the backup clients and stores it in the file system. Its primary job is to optimise the network interface. Then, on one dimension, there’s the client piece, which runs on endpoints ranging from mobile phones and tablets to servers. On the other side of the stack, there’s an entire configuration management piece. Finally, the UI is rendered by React.
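A time-indexed store with de-duplication can be sketched as two maps: one from content hash to chunk, and one from (path, time) to the list of hashes making up that version. The sketch below assumes fixed-size chunks and SHA-256 hashing purely for illustration; the `DedupStore` class is hypothetical and the article does not describe how Druva's file system actually chunks or indexes data.

```python
import hashlib


class DedupStore:
    """Toy versioned store that keeps each distinct chunk only once."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}     # SHA-256 hex digest -> chunk bytes
        self.snapshots = {}  # (path, timestamp) -> ordered list of digests

    def put(self, path, timestamp, data):
        """Store one version of a file, reusing chunks already present."""
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # no-op if already stored
            digests.append(digest)
        self.snapshots[(path, timestamp)] = digests

    def get(self, path, timestamp):
        """Reassemble the file as it looked at `timestamp`."""
        return b"".join(self.chunks[d] for d in self.snapshots[(path, timestamp)])
```

Storing two versions of a file that differ only in their last chunk adds just one new chunk to `chunks`, yet both versions remain individually retrievable by timestamp.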
Though it was the right thing to do at that time, picking Python as the primary coding language is something Milind is changing now. He says,
Over time, we figured out that Python is not the best language for efficient scaling. Rust and Golang are emerging as better options now. We are rewriting parts of the code in other languages now (primarily Golang) and trying to change the system one piece at a time.
Milind believes that Druva’s biggest strength is its understanding of the public cloud: how computing, storage and networking work there. His team is exploring AI/ML on the use cases side, and after legal hold, they even built a feature called proactive compliance, which scans through all the data, looks for sensitive information (like credit card numbers), and generates reports about the compliance status of the organisation. The next step is to scan this data and advise customers on the deeper aspects using machine learning. According to Milind, this is going to shape the future of both Druva and this industry.
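Scanning text for credit card numbers is commonly done by pattern-matching digit runs and then filtering them with the Luhn checksum, which most payment card numbers satisfy. The sketch below shows that general technique only; it is an assumption that Druva's feature works along similar lines, and the function names are hypothetical.

```python
from __future__ import annotations

import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in `text` that look like valid card numbers."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found
```

The checksum step is what keeps false positives down: an arbitrary 16-digit order number will usually fail it, while the widely used test number 4111 1111 1111 1111 passes.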
How to hire a techie
Milind looks for the following four qualities while hiring:
- Passion - It can be judged by simple coding exercises. Milind’s favourite question is to ask candidates to implement the strtok() function in C. Most of the time, candidates think it’s simple and try to finish it in two minutes. He says, “I look for people who write it once, then figure out that they need to handle a lot more cases, start modifying the code, and then write the function again.”
- Common sense - It can be judged by giving the candidates fictitious problems like “if you want to build the elevator from earth to moon how would you go about it?” If a person immediately starts working on it, it's a hint that they are not using their common sense. Common sense would be to question why not a rocket (instead of an elevator).
- Team play - More of we than I.
- Humility - Giving due credits to others.
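The "lot more cases" in the strtok() question are easier to see in code. The sketch below mirrors the observable behaviour of C's strtok() in Python, used here only for brevity (the interview question itself is in C). The `Tokenizer` class is hypothetical, and unlike the C function it does not modify its input or rely on hidden static state.

```python
class Tokenizer:
    """Walks a string and hands out tokens the way C's strtok() does."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_token(self, delimiters):
        """Return the next token, or None when the input is exhausted.

        The delimiter set may change between calls, as with strtok().
        """
        text = self.text
        # Case 1: skip any run of leading delimiters.
        while self.pos < len(text) and text[self.pos] in delimiters:
            self.pos += 1
        # Case 2: nothing but delimiters (or nothing at all) remains.
        if self.pos == len(text):
            return None
        start = self.pos
        while self.pos < len(text) and text[self.pos] not in delimiters:
            self.pos += 1
        token = text[start:self.pos]
        # Case 3: consume the delimiter that ended the token, so a later
        # call with a different delimiter set does not see it again.
        if self.pos < len(text):
            self.pos += 1
        return token
```

Leading delimiters, consecutive delimiters, trailing delimiters, empty input and a delimiter set that changes mid-stream are the kinds of cases a two-minute answer tends to miss.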
Of honesty, practicality, and simplicity
If there’s one word (and quality) to describe Milind, it’ll be simplicity. He is an honest person who prefers to keep things transparent with people around him. He keeps practicality and common sense close to his heart.
Even after a decade at Druva, Milind is driven to do more by
1. working with smart minds and being able to talk to someone who thinks alike.
2. the feeling that a fairly complex piece of code he has written takes on a life of its own after a point (and grows to become more impactful).
At this stage, for Milind, it’s all about enjoying every moment of the journey rather than waiting for the next magical episode to start.