[Startup Bharat] These Kerala-based engineers help clients extract data at scale from the internet
The internet is the largest data source ever created — there is no questioning that. Extracting data at scale from this global repository is not easy, however. It requires a lot of technical resources to get ready-to-use data.
These technological barriers limit access to structured web data, especially for non-technical people in charge of critical decision-making and business intelligence.
Datahut was founded to democratise web data so that everyone, regardless of their technical expertise, can access and use web data as business inputs.
Says Tony Paul, Co-founder of Datahut,
“We solve this problem by using our cloud-based data extraction platform which delivers ready-to-use data so the user can access data without worrying about the technological challenges involved.”
Tony started Datahut in 2015 along with his college friends, Jezeel Muhammed and Binoop Balakrishnan. The Kochi-based startup helps its customers regularly get organised, ready-to-use data from websites at scale.
This data finds application in fields ranging from price analysis at retailers to investment analysis at hedge funds.
The founding of Datahut
The three founders have been friends since their student days at the LBS College of Engineering in Kasaragod, Kerala. Their entrepreneurial journey wasn’t always smooth sailing, they recall.
“We had a few failed startups before Datahut. We started our first venture from our hostel room, came to Kochi in 2011, while still in college, with the product, which failed miserably. Then, we started taking up consulting projects and a few of them were in big data analytics. It was a period of struggle, we had many days without a proper meal,” says Tony.
After working on projects that had data at their core, the trio understood there was a huge demand for a data-as-a-service model for web data. This was the genesis of the idea behind Datahut.
The founders claim that, around the time they started up, the data-extraction market was dominated by self-service tools, freelancers, and custom scraper development agencies.
According to Tony, a self-service tool simply cannot meet the large-scale data needs of an enterprise. When such a tool is used for large-scale data extraction, the company essentially trades data quality for extraction speed, so it is never a reliable option, he explains, adding,
“You can’t depend completely on custom scraper development either, as it will be painstakingly slow. The best way to achieve data quality at scale is to adopt a hybrid data extraction method — use automation wherever possible and develop custom configurations when there are complex websites.”
He stresses the need for an ‘intelligence layer’ when huge amounts of raw data are scattered across thousands of websites and tens of thousands of categories. This intelligence layer is used to analyse and manage various forms of data.
“This is necessary to ensure data integrity. This intelligence layer requires skilled people and constantly improved self-learning algorithms,” adds Tony.
Datahut claims to solve these problems with a platform that uses automation wherever possible and can be manually configured where automation falls short.
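The hybrid idea described here can be illustrated with a minimal Python sketch. It is not Datahut's implementation: the function names, the `CUSTOM_EXTRACTORS` registry, and the `complex-shop.example` domain are all hypothetical, chosen only to show an automated default path with a hand-written override for a difficult site.

```python
import re
from typing import Callable, Dict
from urllib.parse import urlparse

Extractor = Callable[[str], Dict[str, str]]


def generic_extractor(html: str) -> Dict[str, str]:
    """Automated path: pull the page title with a simple pattern."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return {"title": match.group(1).strip() if match else ""}


def complex_shop_extractor(html: str) -> Dict[str, str]:
    """Hand-written configuration for a hypothetical site with unusual markup."""
    record = generic_extractor(html)
    price = re.search(r'data-price="([^"]+)"', html)
    record["price"] = price.group(1) if price else ""
    return record


# Hypothetical registry: only sites that need special handling are listed here.
CUSTOM_EXTRACTORS: Dict[str, Extractor] = {
    "complex-shop.example": complex_shop_extractor,
}


def extract(url: str, html: str) -> Dict[str, str]:
    """Use a custom configuration when one exists, otherwise the automated default."""
    host = urlparse(url).hostname or ""
    extractor = CUSTOM_EXTRACTORS.get(host, generic_extractor)
    return extractor(html)
```

Most sites would go through `generic_extractor`, and only the ones that break it need an entry in the registry, which is one way to keep manual effort proportional to the number of complex sites.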
The startup’s platform essentially aids enterprises that require data from websites delivered on a regular basis and in a structured format.
“We run the extraction on our platform and deliver the data via cloud-based data-sharing tools such as Amazon S3 or Dropbox,” explains Tony.
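As a rough illustration of what "structured format" can mean before a file is handed to a sharing tool, the Python sketch below turns extracted records into CSV text with a fixed set of columns. The helper name `records_to_csv` and the sample column names are assumptions made for the example, and the upload step itself is left out.

```python
import csv
import io
from typing import Dict, Iterable, List


def records_to_csv(records: Iterable[Dict[str, str]], columns: List[str]) -> str:
    """Serialise records to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # drop fields the client did not ask for
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells so every row has the same shape.
        writer.writerow({col: record.get(col, "") for col in columns})
    return buffer.getvalue()
```

Fixing the column list up front means every delivery has the same shape, even when individual pages are missing a field.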
The company now has a team of about 40 employees.
Datahut primarily targets the US market, which accounts for more than 70 percent of its revenue. The company is also actively focussing on the UK and Europe more broadly to grow its customer base.
Its client roster mostly comprises big enterprises in ecommerce, real estate, finance, travel, and airlines that need data for competitive intelligence.
“We’ve worked with companies like eBay, Nielsen, Workday, and BCG and also with universities like the London School of Economics and Columbia where researchers use our data sets. Two of the big four accounting firms and one of the world’s largest pharma companies are also our customers,” says Tony.
The startup charges clients based on data volume, with fees ranging from $40 to $10,000 per month.
Crunching the numbers
According to Stratistics MRC, the global data integration market is expected to grow from its value of $7.45 billion in 2017 to $24.95 billion by 2027, at a CAGR of 14.3 percent. Some of the other players in the market include import.io and Parsehub.
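For readers who want to check the growth arithmetic, the compound annual growth rate is (end value / start value) raised to the power 1/years, minus 1. The short Python sketch below applies that standard formula to the figures quoted above; the helper name `cagr` is an assumption made for the example.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10 percent)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article, in billions of US dollars.
ten_year_rate = cagr(7.45, 24.95, 10)   # 2017 to 2027
nine_year_rate = cagr(7.45, 24.95, 9)   # same values over nine years

print(f"10-year CAGR: {ten_year_rate:.1%}")
print(f"9-year CAGR: {nine_year_rate:.1%}")
```

With these inputs, a ten-year span works out to roughly 12.8 percent a year and a nine-year span to roughly 14.4 percent, so the quoted 14.3 percent sits closer to a nine-year horizon.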
Commenting on Datahut’s USP, Tony says,
“Our hybrid data extraction model gives us better coverage and the flexibility to handle complex data extraction and data structuring requirements at scale. Our competitors do not have this flexibility.”
The startup has been completely bootstrapped from day one. “We started building the platform only after signing up our first customer. We were profitable from the first month. We have been growing steadily from the inception, signing up a few big customers and partners,” says Tony.
It aims to hit $5 million in ARR by 2023.
Datahut plans to partner with more large enterprises to cater to its target market. It also wants to scale and develop partnerships with several AI- and ML-focussed companies and more self-service analytics enterprises to help their customer base get better intelligence from publicly available data.
The startup is also looking to integrate more automation into its platform and build tools that will make it easier for people to derive insights from data. It also aims to educate companies about the potential of web data in their business applications.
The team plans to scale up its sales efforts in the UK and Europe by opening a sales office in the UK within the next five years.
(Edited by Athirupa Geetha Manichandar)