In this article, I highlight five young companies that have made an impact on the Big Data ecosystem and have the potential to become key players in this space.
Cloudera – This is one of the first companies to bet on the Big Data paradigm. Started in 2009, Cloudera went on to become one of the most successful Big Data companies. In 2010, it secured $5 million in funding from Accel Partners, which boosted its R&D efforts. Cloudera was the first to commercialize Apache Hadoop, much as Red Hat chose to commercialize Linux. It should be credited with democratizing Big Data through its customized, enterprise-friendly Hadoop distribution called CDH (Cloudera Distribution Including Hadoop). CDH makes it easy to get started with Apache Hadoop by offering options that range from packages that can be installed on Linux-based servers to VMs that can be launched on popular cloud platforms like AWS and Rackspace. The fact that Oracle chose to embrace CDH for its Big Data appliance speaks volumes about the maturity of the stack. Some tools originally developed by Cloudera, such as Sqoop, have made their way into Apache’s official projects. With an impressive clientele that includes the likes of eBay, Nokia, Samsung and Qualcomm, Cloudera is the poster child of the Big Data movement.
Hortonworks – Started as a Yahoo! spin-off, Hortonworks was built on the same premise of making Apache Hadoop more affordable to enterprises. Hortonworks was formed by key architects and core Hadoop committers from Yahoo!. Its core offering, Hortonworks Data Platform (HDP), is an open source data management platform based on Apache Hadoop. The key differentiator of HDP is Ambari, an open source management tool for Apache Hadoop. Unlike Cloudera Manager, which is available only to enterprise customers, Ambari is completely open source and free. Though Hortonworks is younger than Cloudera, it has generated considerable buzz in the market. Microsoft dropped an internal parallel computing project called Dryad in favor of Apache Hadoop and decided to go with the Hortonworks stack for its HadoopOnAzure offering. This is not surprising given that Microsoft and Yahoo! are now allies in the search engine world. Hortonworks is also getting closer to developers by partnering with Talend to integrate HDP with Talend Open Studio. Watch out for Hortonworks, as it has the potential to make it big!
MapR – Based in San Jose, California, MapR is a young Big Data startup that went live with its product last year. MapR claims that its distribution of Apache Hadoop, dubbed M3 / M5, is the fastest and most reliable Apache Hadoop distribution. It seems to have achieved this by tweaking the storage layer, which is designed to be faster, easier to use and more reliable than HDFS. Through a feature called Direct Access NFS, MapR lets customers work with data sets using standard tools, unlike HDFS, which is optimized for reads rather than writes. MapR also claims to offer better automation and management of Apache Hadoop clusters. EMC decided to bundle MapR as part of its Greenplum HD enterprise edition. AWS offers the MapR stack through its Elastic MapReduce service. The recent announcement that MapR is available on Google Compute Engine is a big deal for the folks at this fast-growing Big Data startup!
DataStax – Founded in 2010, this is an innovative startup that promises to solve Big Data challenges by leveraging some of the emerging technologies. DataStax chose to build its business model on the open source NoSQL database Apache Cassandra. Through the mix of Apache Cassandra, Apache Solr and Apache Hadoop, DataStax offers real-time analytics to enterprises. DataStax claims to avoid the ETL jobs needed to move data from traditional sources into Apache Hadoop by taking advantage of Apache Cassandra’s features. DataStax’s USP is its ability to mix and match real-time data streams with historical data, enabling enterprises to analyze data faster. Its customers include Disney, Netflix, Rackspace and Cisco, among others. HP Cloud offers DataStax Enterprise as a service to its customers. With NoSQL becoming mainstream, DataStax is all set to benefit from the adoption of Apache Cassandra among internet-scale companies.
Karmasphere – One of the oldest companies in the Big Data space, Karmasphere was founded in 2005 in Cupertino, California. Its differentiating factor is its visualization tools, which help customers make sense of Big Data. Amazon offers Karmasphere Analyst, a product that helps data professionals and analysts explore and interact with Big Data on Amazon Elastic MapReduce. Karmasphere Studio, another product, is a plug-in for the Eclipse IDE that provides a familiar graphical environment supporting the complete lifecycle of developing Apache Hadoop-based solutions. By enabling prototyping, developing, testing, debugging, optimizing, and deploying, Karmasphere Studio makes it easy for developers to deal with Big Data workflows. Karmasphere fills a gap in the Big Data space with its visualization and developer tools.
I want to conclude this article with a disclaimer: these are not the only companies making a splash in the Big Data space. There are many other innovative startups, like Datameer, Splunk and Platfora, with the potential to make it big. I will cover these players in future articles.
- Janakiram MSV, Chief Editor, CloudStory.in