Self-driving databases are the future, says Feifei Li of Alibaba
Alibaba and Jack Ma are household names today. The tech giant has a market cap among the global top 10 and has expanded into all major markets in the world.
Technology has driven Alibaba’s superlative growth and has helped the company do what Amazon, Google, eBay, PayPal, wholesalers, and umpteen manufacturers do in the US.
Today, companies the world over are adopting artificial intelligence and adapting to the future of business, and Alibaba is leading the way.
Fostering the tech behemoth’s growth is Feifei Li, Vice President of Alibaba Group and President of Database Systems at Alibaba. Before joining the tech conglomerate, whose revenues are $56.15 billion, Feifei was a professor at the University of Utah in the US.
Feifei believes AI is “yet to make a big impact” and remains limited to heuristic uses such as computer vision and speech and voice recognition. But he is clear about one thing: AI will, in time, play a big role in the way business is conducted.
Feifei Li, Vice President of Alibaba Group and President of Database Systems at the tech conglomerate.
YourStory caught up with Feifei for a candid conversation on the future of databases, what engineers need to learn to power automated databases, and what the company has to offer to data scientists.
Edited excerpts of the interview:
YourStory: What is changing in database technologies and where is this tech heading?
Feifei Li: Databases are a mature technology; relational databases in particular have been around for 40 years. I feel like a dinosaur. That's part of the reason why this conversation is important and exciting.
You know what happened to dinosaurs, right? They went extinct. So, how does one evolve in the tech world and not become extinct?
The cloud has provided several opportunities, and there are now several cloud-native database companies that can compete with the likes of Oracle. The future is the cloud-native database.
But not many people realise that the cloud is essentially a virtualisation of resources such as storage and compute. These resources are pooled and sold as infrastructure-as-a-service. This is powerful because the cloud is elastic and easily scalable, which is why you see a proliferation of new startups.
Instead of working with fixed costs, you can work on a pool of resources with a variable cost. That’s why business conversations are now about elasticity and high availability. You can be highly available if you are in the cloud; there will be zero downtime.
Now, coming back to cloud-native databases. Cloud-native database systems have been around since 2005. Storage, networking, and virtualisation were the first disruptive technologies to take off as cloud offerings. After that, many changes happened in the platform layer, with algorithms coming in by 2014. Tech disruption happens layer by layer, and the database no longer has to be a legacy technology.
In a traditional database, resources (storage and compute) are bundled together and you cannot tap the power of pooled resources.
Our database, PolarDB, decouples compute and storage. This lets companies scale compute and storage up or down independently. You can manage CPU or database resources at the click of a button; it is automated. At Alibaba, we have the Auto Scaler, which automates and monitors workloads without people having to perform the tasks manually. It is on demand and elastic, which means businesses save on costs. It even includes NewSQL.
YS: What is NewSQL?
FL: Jargon aside, to explain this technically I have to start with the old relational database management systems built for structured data.
Earlier, a big part of the database business was about providing consistency and durability guarantees, that is, ensuring that every update was applied consistently and that committed data was never lost. You needed systems that could handle high-throughput workloads while still guaranteeing consistency.
Google changed all this 10 years ago. Its belief was that this old model could not work for new applications that generated massive amounts of data; the world needed highly available databases rather than strict consistency guarantees.
Modern businesses needed highly scalable databases, unlike traditional ones built around a rigidly structured approach to data. A decade ago, rather than worrying about the traditional requirement of consistency, it became important to scale horizontally with distributed solutions that could handle massive data. That's how big data processing tools like MapReduce and Hadoop were born.
This also gave rise to NewSQL systems, which emerged around a decade ago and can handle massive amounts of data using decoupled resources in the cloud.
A company could scale from 100 nodes to 1,000 nodes in seconds, as ecommerce companies need to during a sale, when traffic spikes. Alibaba has a partnership with MongoDB, which offers scalable NoSQL database technology.
NewSQL is not just about scale; it also gives you consistency guarantees like a relational database. It is built on distributed, cloud-native architectures. We also have a hybrid database management system in which database instances can run both on premises and in the cloud.
YS: Is there a product for data scientists?
FL: Our product, Data Lake Analytics (DLA), combines data from both legacy and cloud infrastructure sources.
With DLA, data from file systems, relational databases and NewSQL systems can be pulled into our data lake for interactive analytical processing. These analytical databases process structured and unstructured data together at large scale, which helps data scientists apply ML algorithms to both kinds of data.
Working with data becomes much easier than before. Data scientists are more productive because they don't have to spend as much time structuring data.
We also have a product, DataWorks, which offers several ML algorithms to help data scientists make sense of data.
YS: What does the word AI mean to you?
FL: Cloud computing has changed everything because it has fuelled the growth of data. But, we are still far from real AI.
We use deep neural networks today, and they need large-scale data to be really useful. AI is still a black box, but AI techniques used as heuristics have worked. They have made a mark in facial recognition, computer vision, and speech recognition.
Now, AI is making a mark in databases too. We will have self-driving databases in the future, and our roadmap is to fully automate the database. Automating databases is complex because usage varies from customer to customer, which makes it hard to automate the entire process.
However, we can use AI for common scenarios. For example, we can help ecommerce or traditional systems to manage their latency and scalability and use algorithms to ensure that databases are secure and running fine.
We are integrating blockchain with database systems through Ledger DB, which can record and verify the integrity of data and logs. Ant Financial, part of Alibaba Group, runs Alipay, which uses blockchain. It has a strong ecosystem: when money is transferred from one party to another, the company uses blockchain to track the integrity of transactions between banks and merchants.
YS: What technologies of the future must engineers focus on when they join Alibaba?
FL: Developers need not worry about which programming languages they know. But they do need to know open source, because we will never build our systems on closed technologies. If you are a database engineer, you don't have to learn anything new; knowing Postgres or Oracle DB will do.
At Alibaba, you need strong fundamentals in math and logic.
(Edited by Teja Lele Desai)