Rob Thomas is co-author of the book ‘Big Data Revolution: what farmers, doctors and insurance agents teach us about discovering Big Data patterns’ (see my book review). He is Vice President of Product Development for Big Data and Information Management at IBM.
His focus is on cloud, appliance, and on-premises software development, with an emphasis on Agile development processes and continuous delivery. He has led teams at IBM sites in the US, Canada, Germany, UK, Japan, China and India. Thomas also led IBM’s acquisitions of Initiate Systems, Netezza and Vivisimo. He previously worked at Merrill Lynch and Wheat First Securities.
Rob Thomas joins us in this exclusive interview on the rise of Big Data and analytics, their impact on startups and enterprises, and future skills in the Data era. (See also my interview with Tom Davenport, author of ‘BigData @ Work.’)
YS: How was your book ‘Big Data Revolution’ received? What were some of the unusual responses and reactions you got?
RT: The book has been received quite well. The approach of the book was to profile a variety of industries and how they are driving innovation with data. Those are stories that almost anyone can learn from and apply.
The Financial Times reviewed it, which helped it get a lot more attention than expected. I hear the most from clients of mine that have read it, and are trying to apply it to their business. I think it probably has worked best to generate brainstorming and discussion.
YS: In the time since your book was published, what are some notable new examples you have come across of companies that have effectively harnessed Big Data?
RT: I am amazed by what I see happening in oil and gas. Many of those companies are rethinking their entire business processes with data: from exploration to refining to distribution. Their biggest challenge tends to be data ingestion at scale.
I also see retailers getting much more aggressive about applying data to build a single view of their customers, and using that view to personalize offers and engagement.
YS: What are the typical challenges companies face as they scale up and encounter ever-growing volumes of data? How can they prevent being overwhelmed?
RT: Skills are the biggest issue in any enterprise. This Data era requires the usage of new technology and tools, and most organizations are not prepared for that. I think BigDataUniversity.com is a great example of education that can help change the nature of skills in an enterprise.
The impact of cloud on IT is profound. Most media focuses on the reduced costs of starting a company and the ability to reduce capital expense via the cloud. While those are both significant impacts, I believe the bigger impact will be on the traditional definition of a skilled IT worker.
As organizations move to the cloud, the traditional IT skills of systems administrator, architect, DBA and IT operations will likely be diminished, if not completely eliminated, at some point over the next 5-10 years. To be clear, I think this will take a long time to play out. But that is precisely why now is the time to prepare for the coming revolution.
YS: How should innovators strike that delicate balance between ‘Stick to your vision’ and ‘Adapt to a changed world in the face of new data’?
RT: I’m a big believer in disruption, so ‘stick to your vision’ doesn’t resonate with me. The organizations that will survive and thrive in the Data era will be the most agile and adaptable. Competing and innovating have changed forever for companies. The next assault will be on the individuals within the companies. In the next five years, every employee will be "dealing with Darwin" personally, when it comes to their skills.
YS: What are good examples of companies that completely pivoted in their strategy or operations because of insights from Big Data? How were they able to succeed?
RT: Monsanto, which I profiled in the book, is a great example. They changed agriculture from being about machines to being about data and insights. Monsanto has developed a clear understanding of their users (different types of farmers) and what will make them more productive in their work. With that data, they can make recommendations to fundamentally change agricultural production in a positive way.
YS: What hybrid strategies are emerging which tap the ‘Big Data’ of mobile operators and the ‘small data’ on user smartphones/tablets?
RT: We do a lot of work in telecommunications, around call data record analysis. To analyze that huge corpus of data and to take action on it requires streaming analytics, which is really a new paradigm for data analysis. I like to think of it as ‘continuous insights,’ where a business analyst can ask a question and then continue to get answers to that question, forever - as new information flows in.
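The idea of ‘continuous insights’ can be illustrated with a minimal Python sketch. It is not IBM's streaming technology, only a toy standing query that re-answers one question each time a new call record arrives. The record layout (tower ID plus a dropped-call flag), the tower names and the function name are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical call record: (cell tower ID, whether the call was dropped).
CallRecord = Tuple[str, bool]


def dropped_calls_by_tower(records: Iterable[CallRecord]) -> Iterator[Dict[str, int]]:
    """Standing query: after each record, yield the dropped-call count per tower."""
    dropped: Counter = Counter()
    for tower, was_dropped in records:
        if was_dropped:
            dropped[tower] += 1
        # Emit a fresh answer every time a new record flows in.
        yield dict(dropped)


# A short list stands in for what would really be an unbounded feed.
sample_feed = [
    ("tower-A", True),
    ("tower-B", False),
    ("tower-A", True),
    ("tower-B", True),
]

answers = list(dropped_calls_by_tower(sample_feed))
for answer in answers:
    print(answer)
```

Because the query is a generator, it keeps no history of the raw records, only the running counts, which is what lets the same question stay open as the feed grows.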
YS: Who are some of the Big Data innovators in large companies whom you admire the most today?
RT: I like to see organizations championing the idea of a Chief Data Officer. Having a senior executive who is only focused on data assets is pretty critical in my mind. This role typically spans governance, security and analytics. So, it is a very dynamic and broad role.
YS: What new trends are Big Data innovators creating or disrupting today?
RT: I like IBM’s Spark Technology Centre in San Francisco. By focusing on the open source community, the pace of innovation is dramatic. We contributed SystemML, a key piece of IBM intellectual property around machine learning, which will help advance machine learning in Spark.
Machine learning is better equipped to deal with the modern business environment than traditional statistical approaches, because it can adapt. IBM’s machine learning technology makes expressing algorithms at scale much faster and easier. Our data scientists, mathematicians and engineers will work with the open source community to help push the boundaries of Spark technology with the goal of creating a new era of smart applications to fuel modern and evolving enterprises.
With machine learning at the core of applications, they can drive insight in the moment. Applications with machine learning at their core get smarter and more customized through interactions with data, devices and people - and as they learn, they provide previously untapped opportunity. We can take on what may have been seen as unsolvable problems by using all the information that surrounds us and bringing the right insight or suggestion to our fingertips right when it's most needed.
It is my view that over the next five years, machine learning applications will lead to new breakthroughs that will assist us in making good choices, look out for us, and help us navigate our world in ways never before dreamed possible.
YS: What is your current field of research in the area of Big Data and analytics?
RT: I am very interested in the Apache Spark project. Spark will transform how data is accessed, processed, and utilized in an enterprise. The key points for any company to understand about Spark are:
1) Spark is the analytics operating system. Any company interested in analytics at scale will be using Spark.
2) Spark unifies data sources in an organization. Hadoop is one of many repositories that Spark may tap into.
3) The unified programming model of Spark (Scala) makes it the best choice for developers building data-rich analytic applications.
4) The real value of Spark is realized through Machine Learning (ML). ML automates analytics and is the next great scale effect for organizations.
YS: What is your next book going to be about?
RT: Hah, I can’t even imagine another one yet. But I’m sure I’ll do it at some point!
YS: What is your parting message to the startups and business leaders in our audience?
RT: Be bold and try new things with data. I see way too many companies who are more focused on the possible downside than the potential upside. Leaders will make big bets and win big.