The Endgame Guide to Informed Cocktail Party Conversations on Data Science and the Latest Security Trends at RSA 2015


The statistician George Box famously noted that “all models are wrong, but some are useful.” This is especially useful advice when looking at quantitatively driven analytics, a topic that is increasingly dominating research and media coverage in the security industry. While the move toward more data-driven analyses is a welcome one, without a proper understanding of the indicators, parameters, and compilation of the data, the field is ripe for misinterpretation and apples-to-oranges comparisons of the state of the security threatscape. New security industry research and related media coverage over the last week indicate a strong focus on the escalation of cyber attacks over the last year. The quantitative findings, coupled with the growing qualitative narrative around China’s Great Cannon and its capacity to extend censorship beyond Chinese sovereign territory, indicate a troubling rise in malicious activity in the cyber domain. These estimates, at a strategic level, are likely correct, but they are prone to misinterpretation and confusion when translated into business, policy, and course-of-action decisions for executive leaders. To make sense of competing analytics and best evaluate specific organizational risks, executives need to understand how data and behavioral science work together. As security executives and practitioners get ready to head to RSA next week, here are a few guidelines for comparing and interpreting the latest security research:

  • Parameters: Many of the recent industry reports focus on specific geographic or industry coverage or even company size. For instance, headlines that attacks are up 40% pertain only to large companies with over 2,500 employees. Similarly, headlines that cyber attacks cost companies $400B require the qualification that this is an estimate for some companies. Likewise, SCADA systems seem especially vulnerable in analyses that focus solely on SCADA systems, and those findings may or may not apply to other targets. Finally, the research is frequently based on a sample or subset of the data and therefore may not reflect the entire population. In short, findings in one region, vertical, or target type do not necessarily translate into the same risk factor outside of those specific parameters. This problem can be exacerbated depending on the sample size of the data. The type, severity, and frequency of attacks against the financial services industry in the US likely vary significantly from those targeting the telecommunications industry in Peru. Distinguishing even further based on company size adds another level of complexity that cannot be ignored.
  • Measurement: What constitutes an attack? This is perhaps one of the most challenging and inconsistent areas of quantitative security analytics. For instance, in the critical infrastructure industry there are significant discrepancies in the number of reported attacks, partly due to a lack of consensus on the nature of an attack. Critical infrastructure is not alone, as organizations vary in their definition of an attack. What was the target? Where did the attack originate? Was data breached? For some, the breach of data appears to be the distinguishing element in defining an attack. “It wasn’t an actual hack, no data was breached,” noted Alex Willette, when the State of Maine’s website went down last month after being the target of a series of denial of service attacks. And this is just the key independent variable. A series of control and dependent variables are also prone to measurement discrepancies. In fact, most quantitative analytics base their measurement on raw numbers and ignore what percentage of the larger population those numbers represent. In an industry where the number of connected objects and people continues to expand exponentially, the raw numbers mask the growing population size from which these measurements are drawn. Are there more attacks simply because there are a greater number of connected devices and people? Maybe not, but it certainly is a factor that must be considered in any rigorous analysis.
  • Collection: Even with the parameters and measurement well established, the security industry faces great challenges in data collection. This is both a technical and a social challenge. Clearly, the technical means to collect the data often remain proprietary and therefore limit apples-to-apples comparisons of the findings. However, the social dimension likely provides an even greater collection problem. Unlike other areas where risk factors are visible (such as conflict), the security industry leans heavily on self-reporting of breaches. This is one of the many areas where behavioral and social science can be integrated into the quantitative analytics. For instance, the notion of norms emerges frequently, but it is rarely applied to norms pertaining to reporting. Previously, companies and organizations were disinclined to report a breach for fear of the reputational costs. Is this norm even more embedded in light of CEOs at Target and Sony losing their positions? Or is rising awareness of geopolitical threats leading to greater disclosure to the government? In short, the latest figures on escalating malicious digital activity might reflect changes in reporting, changes in detection, increased activity, or more likely a confluence of the three. Given the nature of obfuscation and continued norms that may limit reporting or even information sharing, it is essential to remain cognizant of how data collection directly impacts any findings in the security industry.
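
To make the raw-numbers point from the measurement guideline concrete, here is a minimal Python sketch. All of the figures in it are invented purely for illustration and do not come from any of the reports discussed above, and the helper names `attacks_per_thousand` and `percent_change` are hypothetical. It shows how a headline increase in attack counts can coexist with a falling attack rate once the growing population of connected devices is taken into account.

```python
def attacks_per_thousand(attacks: int, population: int) -> float:
    """Return the number of attacks per 1,000 members of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * attacks / population


def percent_change(old: float, new: float) -> float:
    """Return the percent change from old to new."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return 100 * (new - old) / old


# Hypothetical, made-up figures for illustration only.
last_year = {"attacks": 1000, "devices": 50000}
this_year = {"attacks": 1400, "devices": 80000}

# The headline number: change in the raw count of attacks.
raw_change = percent_change(last_year["attacks"], this_year["attacks"])

# The normalized number: change in attacks per 1,000 connected devices.
rate_last = attacks_per_thousand(last_year["attacks"], last_year["devices"])
rate_this = attacks_per_thousand(this_year["attacks"], this_year["devices"])
rate_change = percent_change(rate_last, rate_this)

print(f"Raw attack count change: {raw_change:+.1f}%")
print(f"Attacks per 1,000 devices: {rate_last:.1f} -> {rate_this:.1f}")
print(f"Attack rate change: {rate_change:+.1f}%")
```

With these invented inputs, the raw count rises by 40% while the rate per 1,000 devices falls from 20.0 to 17.5, a 12.5% decrease. Neither number is wrong, but they answer different questions, which is why it matters to ask which one a given report is citing.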

As corporate executives and the security industry flock to San Francisco next week for the RSA conference, there will be plenty of discussion on the latest reports and big data techniques to help tackle the escalatory nature of malicious digital activity. This may be the one time that discussions of data munging and structuring are welcome at the numerous cocktail party receptions that coincide with the RSA conference. When asked for thought-provoking insights on the latest trends in the security industry, it never hurts to remember that models are oversimplifications of reality. The parameters, measurement, and collection of the data dramatically impact a model’s robustness, and thus the validity of the findings. It is best to avoid oversimplifying such a complex domain, and instead to dig beneath the surface of the latest trends to fully comprehend exactly how they might apply to a given organization.