Is Hadoop Ready for Security?

Picture Source: artistsinspireartists



In 2008, the number of internet-connected devices surpassed the number of people on the planet and Facebook overtook MySpace as the most popular social network. At the time, few people grasped the impact that these rapidly expanding digital networks would have on both national and cyber security. This was also the year I first used Hadoop, a distributed storage and processing framework.

Since then, Hadoop has become the core of nearly every major large-scale analytic solution, but it has yet to reach its full potential in security. To address this, last week Cloudera, Endgame and other contributors announced the acceptance of Apache Spot, a cyber analytics framework, into the Apache Software Foundation incubator. At the intersection of Hadoop and security, this new project aims to revolutionize security analytics.


“When you invent the ship, you also invent the shipwreck” - Paul Virilio


Back in 2008, the security industry was recovering from the Zeus trojan and unknowingly gearing up for a date with Conficker, a worm that would go on to infect upwards of 9 million devices across 190 countries. Simultaneously, government think tanks warned that Web 2.0 social networks would facilitate the radicalization and recruitment of terrorists.

As a computer engineer in the US Intelligence Community, I was lucky enough to work on these large-scale problems: problems that, at their core, require solutions capable of ingesting, storing, and analyzing massive amounts of data in order to discern good from bad.

Around this time, engineering teams at search engine companies were the only others working on internet-scale data problems. Inspired by Google’s MapReduce and Google File System papers, Doug Cutting and a team at Yahoo open-sourced Apache Hadoop, a framework that made it possible to work with large data sets across inexpensive hardware. Upon its release in 2006, this project began democratizing large-scale data analysis and gained adoption across a variety of industries.

Seeing the promise of Hadoop, the Intelligence Community became an early adopter, as it needed to cost-effectively perform analysis at unprecedented scale. In fact, it ultimately invested in Cloudera, the first company founded to make Hadoop enterprise ready.


Fast Forward to Today: Hadoop for Security

In 2016, forward-leaning security teams across industry and government are increasingly adopting Hadoop to complement their Security Information and Event Management (SIEM) systems. A number of fundamental characteristics make Hadoop attractive for this application:

1. Scalability: Network devices, users, and security products emit a seemingly infinite flow of data. Based on its distributed architecture, Hadoop provides a framework capable of dealing with the volume and velocity of this cross-enterprise data.

2. Low Cost-per-Byte: Detection, incident response, and compliance use cases increasingly demand longer data retention windows. Due to its use of commodity hardware and open source software, Hadoop achieves a scaling cost that is orders of magnitude lower than commercial alternatives.

3. Flexibility: Starting with a single Apache project, the Hadoop family has grown into an ecosystem of thirty-plus interrelated projects. Providing a “zoo” of data storage, retrieval, ingest, processing, and analytic capabilities, the Hadoop family is designed to address various technical requirements, from stream processing to low-latency in-memory analytics.

Unfortunately, many Hadoop-based security projects exceed budget and miss deadlines. To kick off a project, engineers have to write thousands of lines of code to ingest, integrate, store, and process disparate security data feeds. Additionally, the numerous ways of storing data (e.g., Accumulo, HBase, Cassandra, Kudu...) and processing it tee up a myriad of design decisions. All of this distracts from the development and refinement of the innovative analytics our industry needs.
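To give a feel for the integration work described above, here is a minimal Python sketch that maps two differently shaped feeds onto one common record layout. Everything in it is an assumption made for illustration: the feed formats, the field names, and the functions `normalize_flow` and `normalize_proxy` are hypothetical, and the common schema is not Apache Spot's data model. Real feeds also need error handling, enrichment, and storage decisions, which is where the thousands of lines come from.

```python
from datetime import datetime, timezone

# Hypothetical common schema for illustration only
# (this is NOT Apache Spot's actual data model).
COMMON_FIELDS = ("timestamp", "src_ip", "dst_ip", "source")


def normalize_flow(line: str) -> dict:
    """Parse a made-up CSV flow record: epoch_seconds,src_ip,dst_ip,bytes."""
    epoch, src, dst, _byte_count = line.strip().split(",")
    return {
        # Convert epoch seconds to an ISO 8601 string in UTC.
        "timestamp": datetime.fromtimestamp(int(epoch), tz=timezone.utc).isoformat(),
        "src_ip": src,
        "dst_ip": dst,
        "source": "flow",
    }


def normalize_proxy(event: dict) -> dict:
    """Map a made-up, already-parsed proxy log event to the common schema."""
    return {
        "timestamp": event["time"],
        "src_ip": event["client"],
        "dst_ip": event["server"],
        "source": "proxy",
    }


# Two feeds with different shapes end up as records with identical keys.
records = [
    normalize_flow("1475280000,10.0.0.5,93.184.216.34,1420"),
    normalize_proxy(
        {
            "time": "2016-10-01T00:00:05+00:00",
            "client": "10.0.0.5",
            "server": "93.184.216.34",
        }
    ),
]

for record in records:
    print(record)
```

Each new feed needs its own parser like these, and every downstream analytic depends on the field names agreeing, which is why a shared data model matters.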


Apache Spot

Apache Spot is a new open source project designed to address this problem and to accelerate innovation and sharing within the security community. It provides an extensible turnkey solution for ingesting, processing, and analyzing data from security products and infrastructure. Purpose-built for security, Spot does the heavy lifting, providing out-of-the-box connectors that automate the ingest and processing of relevant feeds. Through its open data models, customers and partners are able to share data and analytics across teams, strengthening the broader community. Hadoop has come a long way since its inception, and Apache Spot opens the door for exciting security applications.

Endgame is proud to partner with Cloudera and Intel to accelerate the adoption of Apache Spot across customers and partners. Our expertise in using machine learning to hunt for adversaries and our deep knowledge of endpoint behavior will help Apache Spot become a prominent part of the Hadoop ecosystem. We’re excited to contribute to this open source project and to continue pushing the industry forward to solve the toughest security challenges.

To find out more about Apache Spot, check out the announcement from Cloudera and get involved.