Securing Big Data Analytics

Big data is being gathered and analyzed everywhere. But are we prepared to deal with its security implications? By Mehak Chawla

“The bigger the data sets, the more the security risks associated with them.” This is a maxim that has rarely been disputed in the IT world. That is why, even as big data and its analytical potential attract attention across the world, CSOs and CIOs are fretting about the consequences for their security set-ups.

The worldwide big data market is estimated to be around $3.2 billion, growing at a CAGR of 31.6% and expected to reach $23.8 billion by 2016. 80% of this growth is going to be driven by unstructured data. By 2015, big data will create 4.4 million IT jobs globally.

The hype around big data can be gauged from the fact that TIME magazine declared “Big Data” the #2 buzzword of 2012, behind only “Fiscal Cliff”. 2012 was also the year in which the most interesting character in the U.S. presidential election was not the winner, but a data analyst at the New York Times named Nate Silver. Silver predicted the outcome down to the popular vote percentage and the breakdown in the Electoral College.

With such examples in front of us, there is little doubt that big data, if analyzed using the right tools, has the potential to reveal spectacular facts and actionable information. What is yet to be recognized, however, is the security cost that we are paying for this information.

Big data and security
Perhaps the biggest security threat around big data is that there is no defined input perimeter. Big data analytics systems differ from traditional systems in that they have no dedicated source for their incoming data. For instance, an SCM application’s input sources are the ERP and PLM apps, whose data is in turn entered by the company’s employees. In a big data scenario, that arrangement breaks down because the source of information might be on the other side of the globe, for all we know.

According to Arvind Benegal, Practice Head, Security, CTO Organization, Persistent Systems, the security risks associated with this relatively recent technology are still being understood in their full scope. “This is especially so when customer and business information is being collated from different sources, merged, distilled, transported and stored at a large scale and high speed. So naturally, the complexity of security challenges such as governance of data access, exposure of confidential data, integrity of data, and regulatory compliance get magnified.”

The major apprehension here is that confidential information could potentially be combined with other data sets and unintentionally exposed. Organizations also run the risk of breaching privacy laws which require that data remain within specific geographical locations, for example, not allowing data to cross country borders. This is all the more important now that there are myriad ways to process Personally Identifiable Information (PII), including biometrics.

Rohit Tripathi, Vice President – Product Architecture and Technology Strategy, SAP Labs India, elaborates that, beyond technology, one of the most common use cases of big data is to marry different data sources such as social media feeds, weblog data, and enterprise business information and generate meaningful insights from them. “While enterprises can typically confirm and guarantee the quality and security of their own data, it is much harder in the case of external data where the contents are not verified.”

Some believe that though collecting big data sets and keeping them in isolation might not be a major security concern, analyzing that data together is a potential risk. Ajay Biyani, Solutions Consultant, South Asia, Verizon Terremark, says that the problem with big data analytics is that we are collecting so much data from varied sources that we can never be sure if a classified bit of information is being broadcast to a data engineer.

Biyani believes that in this situation of complex analytics, differentiation of data types before they are fed into the macro system is a plausible solution. Amit Yoran, Senior Vice President, GM Security Management and Compliance, RSA, agrees with that proposition. “Big data analytics is a macro trend and is both a threat and an opportunity. Big data often exists in silos and thus needs multiple analytical methods which are leveraging various tech components. Differentiation of data types is the evolution of big data security analytics.”

However, the appliances that we are using today to analyze big data might not be quite there yet when it comes to security.

The appliance puzzle
Even as big data appliances like EMC’s Greenplum, SAP’s HANA and Oracle’s Exadata family are gaining traction, the truth is that their security aspects still remain a grey area. Even though a lot of these proprietary devices come with in-built security features, customers rarely make use of them.

The other fact here is that since most of these devices, and indeed big data analytics itself, are still nascent concepts, the magnitude of the threats that these appliances can come under remains elusive. Explains Ekta Aggarwal, Program Manager, Information & Communication Technologies, Frost & Sullivan, “With data residing everywhere, the top imperative for organizations is to structure, analyze and utilize it using technology in order to derive maximum value to support business objectives. There is a high probability that this structured data becomes rich target for potential attacks, any type of fraud or crime.”

There is also the probability that adoption might be preceding understanding when it comes to big data appliances. Says Pawan Kumar S, Executive Director, PwC India, “Many OEMs have got onto the bandwagon of big data and in a hurry invented, repackaged or created big data appliances. Since many of them are getting used for the first time by organizations, strong best practices and lessons learned are yet to be applied to these implementations, therefore creating new information security risks.”

Aggarwal adds that the question enterprises need to answer now is not only whether they are adequately equipped to avert attacks, but also whether their traditional security frameworks and tools are comprehensive enough to secure their businesses in the face of the growing volume and complexity of big data.

Dealing with big data security
Though the need to secure both big data and the appliances that process it might be clear, the solution isn’t that simple. Intelligence-driven security, or security analytics, is getting more sophisticated by the day but is still only sparsely deployed. Biyani opines that the adoption of security analytics is slow because of the cost and volume involved. Another big challenge is that systems from vendors are still not finely tuned and are difficult to integrate with existing security platforms. Vendors also need to develop adapters for legacy systems.

Tripathi of SAP Labs offers another perspective on security analytics: “On its own, big data does not pose any special security threat. However, most problems arise due to limited security and systems management features that are provided by many big data software solutions, especially the ones that are open source.” An organization should therefore weigh the in-built security mechanisms when choosing its big data appliance.

Tripathi suggests that, as an indirect security measure, organizations should also address data privacy and data quality when considering big data solutions. “Information systems that leverage big data often identify individual users and analyze their behavior on the web. Many jurisdictions such as the Indian Government’s Information Technology Rules (ITR) and EU Data Protection Directive have regulations that control how the data associated with an individual is managed. Applications need to be built to meet the legislative requirements that apply to information systems that leverage big data.”

The standard security concerns around the movement of data between the public cloud and the enterprise also need to be looked into. “However, the latest in technology capabilities help isolate selective large data volumes that can be moved over secure ftp to Amazon for batch computations. Furthermore, the open source community around big data is also actively responding to security concerns,” says Tripathi.

Third-party verification of devices and other security parameters could also come in handy. “Since most projects are not home-grown, working with vendors to develop technologies required for managing the risks of big data, such as better data masking, meta-tagging, data classification, and fine-grained access control becomes paramount. Data classification further allows a potentially gargantuan problem to be broken down into tractable chunks,” says Benegal of Persistent. Security vendors are also developing security analytics platforms, such as the one recently launched by RSA, to tackle big data security.

Saket Modi, CEO, Lucideus Tech, explains that enterprises are employing traditional methods like firewalls and IDS-IPS for securing data appliances. “They also have the option of getting their data devices tested for security vulnerabilities by using services of companies like Lucideus Tech.”

Pawan Kumar S of PwC India opines that enterprises have to learn new ways of dealing with the security concerns emerging from big data. Additional controls have to be put in place to deal with new threats. “Enterprise could take two broad approaches in this aspect. The first is to implement information security controls as if the big data implementations are the most mission critical applications and then start tempering down the controls selectively. The second is to implement a toned down version of controls to start with, and slowly increase the security levels with continuous evaluations of risk.”

Given that most big data implementations will hold both critical or sensitive data and non-sensitive data, the best approach could be to deal with these types separately rather than create and implement the same framework across all data. Classification of data is the natural evolution of big data security analytics.
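To make the idea of handling sensitive and non-sensitive data separately more concrete, here is a minimal Python sketch. It is an illustration only, not a method described by any of the people quoted: the field names in SENSITIVE_FIELDS, the email pattern, and the masking rule are all assumptions chosen for the example, and a real deployment would derive them from its own data governance policy.

```python
import re

# Hypothetical list of field names treated as sensitive (an assumption for
# this sketch; a real policy would define its own).
SENSITIVE_FIELDS = {"email", "phone", "national_id"}

# Loose pattern for email-like strings inside free-text values.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def classify_record(record: dict) -> str:
    """Label a record 'sensitive' if it has a listed field or an email-like value."""
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            return "sensitive"
        if isinstance(value, str) and EMAIL_PATTERN.search(value):
            return "sensitive"
    return "non_sensitive"


def mask_value(value: str, visible: int = 2) -> str:
    """Keep the last `visible` characters and replace the rest with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def route_records(records):
    """Split records into two streams; listed fields are masked in the sensitive one."""
    sensitive, non_sensitive = [], []
    for record in records:
        if classify_record(record) == "sensitive":
            masked = {
                key: mask_value(str(value)) if key in SENSITIVE_FIELDS else value
                for key, value in record.items()
            }
            sensitive.append(masked)
        else:
            non_sensitive.append(record)
    return sensitive, non_sensitive
```

In this sketch only the listed fields are masked. A record that merely contains an email-like string in free text is still routed to the sensitive stream, so that stricter controls can be applied to it downstream, while the non-sensitive stream can run under a lighter framework.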

Toward security analytics
Security analytics, or intelligence-driven security, is increasingly being talked about as an effective way to mitigate the risks associated with big data. Biyani of Verizon informs us that some pilot projects in intelligence-driven security are already evolving into full-fledged projects.

The upside here is that big data analytics can, to a great extent, assist intelligence-driven security. As Benegal puts it, “Fortunately, security analytics is itself a wonderful byproduct of big data, and so inherent in the problem lies the solution.”

Kumar of PwC corroborates this, saying that big data initiatives can be implemented for security analytics. Since big data platforms enable the analysis of vast amounts of data of varied types and velocities, they are a natural fit for the security analytics type of business use case. “Many security oriented organizations, especially defense and police departments have started implementing small scale very focused security analytics in India.”

Modi of Lucideus Tech also mentions that by-products of big data like predictive analysis and data mining can help an organization develop an active security analytics framework.

The demand for services and solutions around security analytics for big data is likely to follow the adoption of big data technologies by businesses. According to Tripathi, it is only a matter of time before Indian businesses realize the architectural advantages of big data based security analytics, which allows them to handle larger quantities of data and to scale up as the number of events grows. Combining this with the query and analytical capabilities of big data technologies, businesses could have a real-time Security Information and Event Management (SIEM) solution.
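As a rough illustration of the kind of event correlation such a system performs, the Python sketch below flags sources that produce repeated failed logins within a short time window. It is a toy example under stated assumptions, not a description of any vendor's product: the event format (timestamp in seconds, source, outcome), the 60-second window, and the threshold of three failures are all hypothetical.

```python
from collections import defaultdict, deque


def flag_sources(events, window_seconds=60, threshold=3):
    """Return the sources with at least `threshold` failed logins in a sliding window.

    `events` is an iterable of (timestamp_seconds, source, outcome) tuples,
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)  # source -> timestamps of its recent failures
    flagged = set()
    for timestamp, source, outcome in events:
        if outcome != "failure":
            continue
        window = recent[source]
        window.append(timestamp)
        # Drop failures that have fallen out of the time window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            flagged.add(source)
    return flagged
```

Because the function keeps only the failures inside the current window for each source, memory use grows with the number of active sources rather than with the total event volume, which is the property that matters when events arrive at big data scale.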

Art Coviello, Executive Chairman, RSA, says that the idea of an intelligence-driven model is gradually becoming conventional wisdom. “Big data will be applied in two ways: in security management and in the development and application of individual controls. Because sources of security data are almost limitless, there is a requirement for security management that goes well beyond traditional SIEM.”

When it comes to big data security, we need to move beyond the reactive and perimeter-based security dogmas of the past and speed the adoption of intelligence-driven security. With a technology as transformational in nature as big data, organizations need to create a transformational security strategy.
