Companies are jumping on the Internet of Things (IoT) bandwagon, and for good reasons. The McKinsey Global Institute estimates that the IoT could have an economic impact of up to $6.2 trillion a year by 2025. Many people wonder whether companies are ready for the explosion of data generated by the IoT. As with any new technology, security is always the first point of resistance. I agree that the IoT brings a wave of new security concerns, but the bigger concern is how woefully unprepared most data centers are for the massive amount of data that will soon be coming from all of those "things."
Some companies still cling to the belief that they can manage their own data centers better than the various cloud providers can. That denial should all but disappear once petabyte-scale data becomes a reality for enterprises. Enterprises will have to ask themselves, "Do we want to be in the infrastructure business?" Staying in that business is what it will take to provide enough bandwidth, disk storage, and compute power to keep up with the demand for data ingestion, storage, and real-time analytics that serves the business's needs. If there ever was a use case for the cloud, the combination of IoT and Big Data is it.
The IoT Process and Its Challenges
Processing all of the data from the IoT is an exercise in big data that boils down to three major steps: data ingestion (harvesting data), data storage, and analytics. The business value of big data lies in the analytics, whereas data ingestion and data storage are a cost of doing business and are becoming commodities. Experts estimate that over half of all big data projects fail, and that most of those failures occur because the projects never get past the data ingestion phase.
Even if enterprises make it past the data ingestion phase, the data storage phase presents another set of challenges. Here, companies must learn new technologies such as Hadoop and MapReduce, and they must provision enough disk, network, and compute capacity to keep up with the influx of new data. There is a major skills shortage in this area, which poses a serious challenge for the do-it-yourself (DIY) model.
The challenge in the analytics phase is integrating the IoT data with existing data warehouse investments. This can be extremely difficult when the database technologies underlying the data warehouse differ from those used for the IoT data. To make matters worse, provisioning and maintaining enough infrastructure to keep up with the incoming flow of data is an arduous, costly task that keeps risks high throughout the life of the IoT investment. It is also highly likely that the demand for real-time analytics, combined with storing many petabytes of data, will require different server, disk, and network infrastructure than what exists in most data centers today. This will lead to even larger infrastructure costs and consume additional floor space. The DIY model is a very expensive undertaking, and its risk/reward ratio often makes it an unattractive investment for many companies.
Big Data Strategies
To counter the slow time to market and high costs that come with the DIY model, companies are taking three approaches. The first and most popular is to leverage one of the many database-as-a-service (DBaaS) offerings in the marketplace. Solutions like Amazon Redshift, Hortonworks Enterprise Hadoop, and Cloudera Enterprise provide automation and database management services so that customers do not have to install, manage, and operate the underlying technologies required to make large-scale data platforms scale. In the DIY model, engineers need to acquire a broad range of skills to work with those underlying technologies. DBaaS solutions abstract away much of that complexity so that engineers can focus on the data rather than on the collection of technologies that make up the database. The drawback of DBaaS is that engineers are typically still needed to extract or query data from these systems, so the business remains heavily dependent on IT to extract value from the data.
A second approach is to leverage managed big data services. Managed service providers (MSPs) like Treasure Data take responsibility for data ingestion and database management, and they also provide capabilities for performing analytics and extracting datasets. This model lets customers focus on analytics, where the business value is, and outsource the hard stuff to the MSP. Managed big data services allow customers to get to market very quickly without a large upfront investment. They also address the skills gap that many companies face. Some companies use this model to ingest data from IoT devices and other sources, and then extract aggregated data to bring back in-house and join to their existing data warehouse investments.
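The last pattern, reducing raw IoT readings to aggregates and then joining them to in-house warehouse data, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the reading tuples stand in for data extracted from a managed service, `device_sites` stands in for an existing warehouse table, and the functions `aggregate_daily` and `join_with_warehouse` are illustrative names, not any provider's API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw readings extracted from a managed ingestion service:
# (device_id, day, temperature_celsius)
raw_readings = [
    ("dev-1", "d1", 21.0),
    ("dev-1", "d1", 23.0),
    ("dev-2", "d1", 30.5),
    ("dev-2", "d2", 29.5),
]

# Hypothetical in-house warehouse table mapping device_id -> site.
device_sites = {"dev-1": "Plant A", "dev-2": "Plant B"}


def aggregate_daily(readings):
    """Reduce raw readings to one average value per (device, day)."""
    buckets = defaultdict(list)
    for device_id, day, value in readings:
        buckets[(device_id, day)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def join_with_warehouse(aggregates, sites):
    """Inner-join the aggregates to the warehouse table on device_id."""
    rows = []
    for (device_id, day), avg in sorted(aggregates.items()):
        if device_id in sites:  # drop devices unknown to the warehouse
            rows.append((sites[device_id], device_id, day, avg))
    return rows


for row in join_with_warehouse(aggregate_daily(raw_readings), device_sites):
    print(row)
```

Only the small aggregated result crosses back into the data center, which is the point of the pattern: the raw, high-volume stream stays with the MSP.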
One problem with the first two approaches is that they are tied to a single DBaaS technology. Many enterprises have different use cases for their big data challenges, and these often require different types of database solutions. It is not rare for an enterprise to need two or more of the following NoSQL database types: key-value store, column store, document store, and graph database. In addition, enterprises often require data to reside in multiple datacenters and in both public and private cloud endpoints. This quickly becomes a complex matrix of database technologies mapped to datacenter locations. One company addressing this problem is GoGrid. You may remember GoGrid as an early IaaS company. Its focus today is on solving this matrix of NoSQL-to-datacenter combinations. GoGrid works with many of the database providers to offer "1-Button-Deploy" technology for Hadoop, Cassandra, MongoDB, Cloudera, Hortonworks, Riak, and others. These database technologies can be deployed in any GoGrid datacenter or in one of the many ISVs that GoGrid partners with.
In a discussion, GoGrid CEO John Keagy stated that "… technical difficulties are the big challenge to big data adoption. Yes, many businesses will simply outsource to 'black-box' managed services. However, don't count out the incredibly well funded ISVs. They have many billions in funding to spend on making their big data technologies easy to consume by the masses." Back in May, GoGrid released its 1-Button-Deploy solutions to the public on the OpenOrchestration.org website. As the founder of this open source project, GoGrid provides a library of popular Big Data solutions and a repository for orchestration solutions, tools, and services.
Large data volumes from the IoT will drive radical changes within today's datacenters and will require new Big Data strategies within enterprises. Because of the skills shortage and the need to constantly procure infrastructure to keep up with the volume of incoming data, enterprises will start moving away from DIY models toward PaaS, managed, and orchestrated solutions. The value of the IoT is in the data. The sooner enterprises can start analyzing their data, the more business value they can derive. Vendors are stepping up to the plate to remove the complexity and risks of data ingestion and data management so that customers can focus on analytics. Watch this space closely. The winners will win big.