Dataset validation within big data security architecture

(1)

This is the author’s version of a work that was submitted/accepted for pub-lication in the following source:

Wong, Andrew, Liu, Vicky, Caelli, William, & Camtepe, Seyit (2014) Dataset validation within big data security architecture. In International Workshop of Information Technology and Internet Finance (PACIS 2014), 24-28 June 2014, Chengdu, China.

This file was downloaded from: http://eprints.qut.edu.au/76366/

c

Notice: Changes introduced as a result of publishing processes such as copy-editing and formatting may not be reflected in this document. For a definitive version of this work, please refer to the published source:

(2)

DATASET VALIDATION WITHIN BIG DATA SECURITY

ARCHITECTURE

Andrew Wong, Faculty of Science and Technology, Queensland University of Technology,

Brisbane, Australia, [email protected]

Vicky Liu, Faculty of Science and Technology, Queensland University of Technology,

Brisbane, Australia, [email protected]

William Caelli, IISEC Pty Ltd, [email protected]

Seyit Camtepe, Faculty of Science and Technology, Queensland University of Technology,

Brisbane, Australia, [email protected]

Abstract

In recent years, increasing focus has been made on making good business decisions utilizing the product of data analysis. With the advent of the Big Data phenomenon, this is even more apparent than ever before. But the question is how can organizations trust decisions made on the basis of results obtained from analysis of untrusted data? Assurances and trust that data and datasets that inform these decisions have not been tainted by outside agency.

This study will propose enabling the authentication of datasets specifically by the extension of the RESTful architectural scheme to include authentication parameters while operating within a larger holistic security framework architecture or model compliant to legislation.

(3)

1 INTRODUCTION

In today’s business environment the use of information technologies to augment traditional business intelligence in order to gain a competitive advantage is an increasing focus. One of the more current applications of information technology is the use of data gathered from clients or customers in analytical activities in order to predict trends and/or monitor consumer behaviour. Collection of this big data is a big challenge, pun intended, just because of the sheer amount of data generated by users. Predictions of the amount of data that will have passed through the internet by the year 2020 is about 35 zettabytes, or 35 x 1021 bytes or 35 billion terabytes and the rate of increase will grow even higher as the years pass. Coupled with this, the emergence of the Big Data phenomenon is attracting a large amount of attention and placing a higher imperative for Big Data analytics and storage. The International Data Corporation (IDC, 2013) now projects that the market for Big Data will increase through to $32.4 billion in the year 2017.

With the increased market for Big Data, as well as the potential long term effects that Big Data will have on decision making and enterprise resource planning, this brings to mind the importance of safeguarding this data. Protecting the confidentiality, integrity, availability, and privacy of customer data and of data storage and transmission will become even more critical as the impact of a security breach can be severe.

This paper provides a brief look into what Big Data is, illustrating need for further research into Big Data security and proposes a study on a modular, holistic security architecture for Big Data based on an extended Open and Trusted Health Information System framework developed at QUT.

This study will exclude organization data collection activities, as this is not within the scope of security.

2 LITERATURE REVIEW

2.1 What is Big Data?

The first traceable recorded use of the term “Big Data” in the modern context was by John Mashey (1997) in various talks circa 1997-1998 while he was at Silicon Graphics. This is then followed by Weiss and Indurkhya (1998) who refer to Big Data in a data-mining context and suggest practical difficulties for its application.

A research note by Laney (2001) defined what would be the now ubiquitous three V’s of Big Data. Volume, Velocity and Veracity. This is THE first definition of the characteristics of Big Data and Gartner’s definition of Big Data mentions these V’s exclusively. From here, the Big Data phenomenon begins to pick up speed with more and more organizations adopting, co-opting and expanding on Laney’s concepts (Laney, 2012) and over the years, many attempts to append or expand this definition to include characteristics such as Variability, Value, Veracity, and Volatility, among others (Wired.com, 2013; InformationWeek, 2013) can be found.

Kusnetzky (2010) refers to Big Data in the context of a discipline which states that “Big Data” is the specific tools, processes and procedures allowing an organization to create, manipulate and manage very large data sets.

From this one can see the evolution of the Big Data phenomenon, from a vague concept, to something more defined, and now to something approaching a discipline which organizations are currently adopting.

(4)

The profusion of definitions and the ongoing debate over what constitutes Big Data indicates the relative complexity of the subject matter and the need for the concept to mature further. For the purposes of this study, Big Data will be defined as ‘Data and Data Collections of ever increasing Volume, Velocity, Variety, and Complexity, that are used by organizations to operate effectively, make accurate decisions, reduce risks while providing reliable service to customers’. However, as this study does not delve into Big Data directly, this definition will serve to inform the requirements for this Data’s security.

Some parties have identified the privacy concerns which may arise from the potential use of Big Data [16], and a number of research organizations have embarked on security research for Big Data Architecture, but these remain relatively few in number. Some of the related security work will be explored in Section 4.

2.2 An Issue of Trust

Big Data is being used across the broad spectrum of fields and industries, from governance to retail, science to security. Some high profile applications of Big Data include: Retail (Mercier et al., 2013), Genomics (Winney et al, 2012; Donelly, 2014), Governance (Archer, 2013; Joseph and Johnson, 2013; Kim, 2014), Geospatial data (Hemsley, 2013), Health (McGregor, 2013; Celi et al, 2013; Schouten, 2013), Fraud Prevention (Hipgrave, 2013), and Cyber Security (Ponemon, 2013). With this wide scale adoption of the Big Data paradigm comes an awareness of the privacy concerns of using data from certain sources.

Without sufficient thought to keep this data secure across the board, there is a potential for future improper use of this data in future. It may be extremely improbable that this occurs, but with advances in future technologies and analytics capabilities happening at a fast rate, data holds insight and value which was hitherto undiscovered.

With even cyber security utilizing Big Data analytics in a bigger way now than in the past, it may be prudent to consider certain trust issues. Because as it is big data, the feasibility to manually examine each and every data output for its veracity is contraindicated.

Trust has to be in the collections of data (datasets) and the processes used to analyse them. In large corporations, large datasets are being transferred around the globe. The transferal process, which includes details about its transmission and receipt, is part of this trust. The transfer itself needs to be accurate, authenticated and correct.

How can organizations trust conclusions/decisions made on the basis of results obtained from analysis of untrusted data? The short answer is: they can’t. Water drawn from a poisoned well will prove inevitably to be poisoned as well. Garbage in, garbage out. Therefore, assurances and trust that the data and datasets that information is being generated from is not tainted are needed.

2.3 Consequences of a data breach: a justification for improved security

Continued evolution of security is important because, as the utility, nature, and complexity of these systems are constantly changing and increasing, the number of people looking to exploit these types of systems is also increasing. Individuals or parties with malicious intent are constantly on the lookout for new ways to compromise these systems. No rest for the wicked, as it were.

Furthermore, as businesses and the processes in which BI is generated (including BDA) evolve towards a cloud-based model (Liu, 2013; Talia, 2013), security is more often than not, playing catch-up with hackers. With this, it is even more necessary for security technologies to develop to a sufficient maturity to be able to handle or at least mitigate even new unforeseen threats.

The most direct obvious consequence or impact of insufficient capability to meet this threat is financial loss, whether through the cost of investigation or legal penalties paid as a direct result of data

(5)

breaches, even if no data was actually stolen in the attack. Another effect of a data breach is the loss of customer confidence.

In December 2013, Target (Target, 2013) announced that personal information and credit card details of up to 70 million individuals had been compromised. Soon after, Neiman Marcus (CSO, 2014) as well as several other US retailers came forward to announce that they too had experienced a data breach. The following table is an estimate of the cost of the data breach:

Company Data Severity Potential Cost* Method of Breach

Neiman Marcus

July - Dec 2013 (CNBC, 2014)

350,000 payment cards exposed (Neiman Marcus Group, 2014)

Estimated to be $27,300,000

Hack, malware on POS registers Target Dec 2013 Est. 70 million individuals

affected (Target 2014)

Estimated to be $5,460,000,000

Hack, malware on POS registers

Table 1. Estimated cost of a data breach (Estimated costs were calculated based on a Ponemon Institute study on the Costs of a Data Breach (2013) which comes to an average of US$78 per record for the Retail industry)

These incidents may or may not be directly related to Big Data Analytics and/or Business Intelligence itself, however, if the pattern of these attacks is mirrored at a later date in a Big Data environment, the effect of such a breach would be even greater in the long term, as many critical decisions are made on the basis of results obtained during analytics activities.

2.4 Related Work

Although not many academic papers which refer to or propose an architecture for Big Data, there are several data centre and cloud computing related architectural papers which may be of relevance. Several general works describing the conception of Big Data architecture are: the NIST Cloud Computing Reference Architecture, (NIST, 2011), IBM Business Analytics and Optimization Reference Architecture (IBM, 2011), Big Data Ecosystem Reference Architecture (Microsoft, 2011), and Understanding System and Architecture for Big Data (IBM, 2012).

Oracle published its own Information Management and Big Data: A Reference Architecture (Oracle, 2013) in February 2013.

In September 2013, a draft paper which proposed a Big Data Architecture Framework (Demchenko et al., 2013) was made available online by researchers at the Universiteit van Amsterdam. This framework developed with and as part of the NIST Big Data Working Group addresses all aspects of Big Data which includes the aspect of Big Data security most relevant to the proposed study.

The security portion of the framework proposed by Demchenko et al. deals specifically with the new paradigm of data-centric security. They suggest a Federated Access and Delivery Infrastructure (FADI) which include components that support inter-cloud Cloud Service Brokers, Trust Brokers, and Federated Identity Provider services.

They also define Data Centric Access Control Policies which use the attribute-based policy language XACML which allows the description of rules for multi-domain resources which is used in conjunction with NoSQL databases. The paper mandates the use of encryption to enforce its access control and the use of a Trusted Infrastructure Bootstrapping Protocol which has its basis the Trusted Computing Group’s Reference architecture and Trusted Platform Module (TPM). Demchenko et al.’s paper however, does not discuss application security as one of the components of its security framework.

Zhao et al. (2014) proposes a security framework in G-Hadoop for Big Data Computing across distributed Cloud data centres. Their framework uses a master-slave paradigm which allows slave

(6)

nodes to process information without being aware of user information and also a single-sign-on process.

Access control is provided by restricting user access to an SSL connection. The authentication concept of the G-Hadoop security framework proposed here by Zhao et al. follows the same concepts ascribed to by the Globus Security Infrastructure, which is a standard for Grid security. Zhao et al.’s framework utilizes various code, signature, and cryptographic based methods to secure the G-Hadoop application from man-in-the-middle, replay and delay attacks.

However, if the G-Hadoop implementation does not run on a trusted platform it will not assure platform integrity and security and still leave the system vulnerable to attacks that exploits vulnerabilities in the platform.

One more body of related work is the Open Security Architectures (OSA) listed at www.opensecurityarchitecture.org. The Open Security Architecture is an open source security framework developed and owned by the community in accordance with Creative Commons Share-alike. Organizations and developers may pick and choose among control catalogues, visual patterns and assessment methods to tailor-make a security architecture specific to needs.

However, as the OSA is technology agnostic, organizations may have to look further than the OSA to choose appropriate technologies (Christen, 2009) to enable security frameworks. This could result in organizations making less than optimal choices due to a lack of information or experience with the relevant technologies.

2.5 Understanding the legal requirements of Big Data Security

It is important in the socio-political context for applications and architectures for system security to be compliant to existing as well as upcoming legislation, especially in an area as controversial as the collection and analysis of what could very well be personally identifiable information.

As of this paper’s writing, Australia is in the process of implementing the new Privacy Amendment (Enhancing Privacy Protection) Act 2012, which establishes the 13 Australian Privacy Principles (APP) and applies to all organisations and Australian Government (and Norfolk Island Government) agencies from 12 March 2014 onwards.

The United States Government has numerous federal and state legislation which have provisions for data privacy. Among them are, the Privacy Act of 1974, the Privacy Protection Act of 1978 (PPA), the Children’s Online Privacy Protection Act of 1998 (COPPA), the Children’s Internet Protection Act of 2001 (CIPA), the Electronic Communications Privacy Act of 1986 (ECPA), and the Federal Information Security Management Act (FISMA).

The European Union has issued the European Union Data Protection Directive of 1998 and ratified the EU Internet Privacy Law of 2002 (Directive 2002/58/EC) with the express intention of protecting privacy of data. In Switzerland, the protection of Data is governed by the Federal Act on Data Protection (FADP). The United Kingdom has put into force the Data Protection Act 1998 (DPA) in the year 2000.

Meanwhile in Asia, Singapore has the Personal Data Protection Act 2012 (PDPA) which will come into force on 2 July 2014 and Taiwan passed their Personal Data Protection Act 2010 (PDPA) which came into force in October 2012. In Malaysia, their own Personal Data Protection Act 2010 (PDPA) took effect in November 2013.

It is interesting to note that there is no commonly held legislation or treaty on privacy, but what is common among each nation’s data privacy legislation is that the respective organizations and data holders are responsible to exercise due diligence using reasonable methods to secure the privacy of information on penalty of severe consequences. However, what is considered to be “reasonable

(7)

methods” for securing privacy of data is subject to each country’s individual definition. Nevertheless, most countries recognize a universal human “right to privacy”.

No doubt further work will be required to ascertain and ensure that the proposed architecture will be compliant with extant as well as emerging privacy legislation. A comparative analysis of the privacy legislation of four countries will be done as part of the requirement specification of the proposed architecture. Tentatively, it is proposed that the four countries are Australia, the United States, the European Union, and either Taiwan or Singapore.

The reason for the selection of these countries instead of BRIC (Brazil, Russia, India, China) or MIST (Mexico, Indonesia, South Korea, Turkey), where technology off-shoring is growing, is because legislation on privacy and data collection in the selected countries have been given increasing focus in previous years.

3 THE APPROACH – A HOLISTIC SOLUTION

There are many security technologies that are available to the various organizations intending to secure their systems and data against unwanted access. However, the fact of the matter is that each technology will each individually solve only one aspect of the problem. A security architecture, which is a holistic approach that utilizes multiple, intermeshing security technologies, would be in a better position to provide a framework for security against current and emergent threats than piecemeal non-integrated measures.

It is the aim of this research to propose an Open and Trusted security architecture based on an extended Open and Trusted Health Information System (OTHIS) architecture for Business Intelligence and Big Data systems.

The original OTHIS architecture was designed to address privacy and security requirements for Health Information Systems in a holistic manner by these requirements at each level within a general HIS framework to ensure protection of data from internal and external threats (Liu et al, 2008).

3.1 Characterization of OTHIS

OTHIS was characterized as:

 An Open architecture for a holistic approach to Health Information Systems which is compliant with the requirements of health legislation

 Built on top a ‘Trusted computing-base’ or Trusted Platform Module.

 Being a modular system containing three main modules which consider security from the point of view of each of the three states of information:

o Health Informatics Access Control o Health Informatics Application Security o Health Informatics Network Security

The reasons for basing a security architecture for Big Data on the OTHIS architecture are three-fold. Firstly, the OTHIS architecture was designed to handle the privacy and security requirements of the healthcare industry, which places a high premium on keeping the confidentiality and privacy of client information, inviolate, while being compliant with existing health information legislation. This high level of information assurance coupled with ability to handle high volume information will lend itself effectively to Big Data analytics systems.

Secondly, the OTHIS utilizes as foundation a trusted computing-base, also known as the Trusted Platform Module (TPM). Most organizations already possess or are in the process of acquiring hardware which incorporate TPM, and this should help the process of adoption. Furthermore, as the TPM is the recognized industrial standard for trustworthy computing, most major manufacturers and vendors already integrate these chips into their respective professional products.

(8)

Thirdly, the modular nature of OTHIS which recognizes the need to secure and protect data in each of the three states of information, is essential in a case such as Big Data, where information is being transferred, processed and stored at close to real-time speeds. The modules will also provide overlapping fields of security coverage against threats (defence-in-depth).

There are several criteria that can be conceived of as a start to adapt this framework for use in the Big Data analytics/Business Intelligence field:

 Ensure that the architecture must fulfil all regulation and legislative requirements

 Incorporate evolving cloud structures into an extended OTHIS architecture in order to be compatible with industry shift toward cloud-centric models.

 Establish appropriate Authentication and Authorization for use of big data assets by way of well thought out Access Control policies

 Have an appropriate method of monitoring and auditing use of big data assets  Make use of latest network security protocols such as TLS, IPSec, and DNSSec.

 Use appropriate levels of encryption in order to safeguard the privacy of data as required by privacy legislation.

 This architecture must retain enough performance to still be able to run analytic functions expeditiously while maintaining a secure operating state.

4 DISCUSSION

4.1 The Proposed Architecture

(9)

4.1.1 Tier 1 – Governance

It has already been described that compliance with legislation will be a part of the requirement specification for the development of the proposed architecture. Under the new Australian Privacy Amendment (Enhancing Privacy Protection) Act 2012, the civil penalties for organizations or individuals who are found in breach of the Act can be up to 2,000 penalty units ($340,000) for individuals, or up to five times that (10,000 penalty points, or, $1,700,000) for organizations (Australian Parliament, 2012).

It is almost universally recognized that a human element is one of the main components of Information Systems (IS) together with Hardware, Software, Data, and Procedure. Even when every technological aspect of the system is secure and trustworthy, it can still be compromised as a result of human action. Therefore it is vital to establish appropriate guidelines for user behaviour to supplement technological methods for IS security.

4.1.2 Tier 2 – System

The focus for the proposed will be the actual system tier itself. System security is composed of the Operating System (OS) and the Access Control (AC), Application Layer Security (ALS), and Network Security (NS) modules. Access Control is integrated into the OS and interfaces throughout the whole system.

The trusted OS for OTHIS is SELinux (Liu et al., 2009) which uses a Mandatory Access Control (MAC) schema. Due to need for fine layers of granularity, it will not be a pure MAC AC model that will suffice, but a joint one, perhaps Role Based Access Control (RBAC) or Discretionary Access Control (DAC) added to the mix. The OTHIS model was based on an RBAC-MAC hybrid which was suitable to the task for simultaneous control and flexibility when compared to a pure MAC schema. The SELinux kernel also has a sandbox mechanism that provides the ability to separate each of the applications running on the Application Layer in order to prevent alterations whilst still allowing communication of parameters and data between separate computing spaces similar to the concept of virtualization. This is particularly useful given the machine specifications of high performance computing allows different computing spaces for different tasks and means that even if one part of the system is compromised, it does not provide ingress to another area of the system facilitating quick fail recovery.

Outside Access Control, the Application Layer also requires its own security mechanisms. Firstly, certain cryptographic methods such as a “keyed hash” function and requiring the use of a “message authentication code”. Liu et al (2008) believes that although there may be a performance penalty because of the necessary complexity of using public key cipher schemes in a situation where maintaining record confidentiality for multiple access roles, advances in hardware since then will have made the performance issue negligible.

Other Application Layer Security mechanisms that could be implemented is the classification of how securely data should be held, what could be potentially called Sensitivity Labelling. Oracle (2003) has an implementation of what they call Label Security which discusses this particular concept coupled with Discretionary Access Control. However, Oracle’s implementation is at database level to provide label security at cell or role level. Perhaps a mechanism to label/classify/categorize the sensitivity of the different types of data in Big Data can be discovered.

It is further proposed that a modified Digital Signature could be used to provide nonrepudiation at the user side by not only having the sender digitally sign, but the authorized recipient to sign. This has the benefit of requiring both sender and recipient to acknowledge dispatch/receipt of files but increases complexity by requiring two separate but uniquely verifiable digital signatures. This of benefit for the secure communication of datasets/information for providers-to-users as well as provider-to-provider.

(10)

OTHIS currently does not include DNS authentication mechanisms and further work is required in extending the Network Security module to include mechanisms to verify and confirm that HTTP Servers are who they say they are, and allow for this architecture’s use in Cloud-based environments. 4.1.3 Tier 3 – Platform

As mentioned in Section 4.3.2, Tier 2 is the actual focus of the study proposed. In brief, Tier 3 is composed of the trusted firmware and hardware that comprise the Trusted Platform Module of the trusted computing platform. The workings of the Trusted Platform Module and associated hardware is outside the scope of the proposed study at this time.

4.2 Verification method for architecture

Construction of a physical prototype of the architecture would be attempted, as far as possible, within the scope of resources available, but it is probably that the proposed architecture will be verified through use of “codelets”.

Codelets are, in essence sections of code which would demonstrate the feasibility, manageability and scalability of the technologies involved that amplify or make the architecture possible while still require far less, in terms of both human resource and funding.

5 CONCLUSION

In conclusion, before any further inferences can be made, further investigation, confirmation, artefact-building and, evaluation will have to be undertaken to determine the reasonableness, feasibility, manageability, scalability and performance of the proposed architecture as this just the beginning of this research undertaking and as such a working paper.

The significance of this study will be demonstrating whether or not there is a sustainable method for establishing trust in Big Data analytics systems using architectural methods with the current state of technology from the perspectives of both user and provider. At the end of the day, this study will contribute to enable an acceptable level of trust for those who use Big Data analytics in decision making whilst maintaining compliance with respective privacy legislation.

References

Archer, Glenn & Australia. Department of Finance and Deregulation (2013). Big data strategy issues paper. Dept. of Finance and Deregulation, [Canberra, A.C.T.] Retrieved

from:http://www.finance.gov.au/files/2013/03/Big-Data-Strategy-Issues-Paper1.pdf

Biehn, N. (2013) The Missing V’s in Big Data: Viability and Value. WIRED.com Retrieved from: http://www.wired.com/insights/2013/05/the-missing-vs-in-big-data-viability-and-value/ Big Data Analytics in Cyber Defense (2013) Ponemon Institute Retrieved from:

http://www.ponemon.org/local/upload/file/Big_Data_Analytics_in_Cyber_Defense_V12.pdf Celi, Leo Anthony, MD,M.S., M.P.H., Mark, Roger G,M.D., PhD., Stone, D. J., M.D., &

Montgomery, R. A. (2013). "Big data" in the intensive care unit: Closing the data loop. American Journal of Respiratory and Critical Care Medicine, 187(11), 1157-60. Retrieved from

http://search.proquest.com/docview/1446322712?accountid=13380

Christen, T. (2009) OSA Case Study: Applying OSA to Construct a High-Security B2C Internet Service. CTO DSwiss Ltd. Retrieved from:

www.opensecurityarchitecture.org/cms/media/OSA_Case_Study_DataInherit_2009-Oct.pdf Demchenko, Y., et al. (2013). "Architecture Framework and Components for the Big Data

(11)

Donelly, P. (2014) Genetics and geography: using big data in genomics to understand the history of the people of the British Isles –Abstract. Proceeds of Statistical Modelling and Analysis of Big Data Workshop. QUT Retrieved from:

http://nanos2013.qut.edu.au/downloads/big_data_workshop_abstracts.pdf

Gartner (n.d.) IT Glossary > Big Data Retrieved from: http://www.gartner.com/it-glossary/big-data/ Gattiker, A. E., Gebara, F. H., Gheith, A., Hofstee, H. P., Jamsek, D. A., Li, J., ... & Wong, P. W.

(2012). Understanding system and architecture for big data. IBM research. Retrieved from: http://researcher.ibm.com/files/us-jianli/IBM.Research.Big.Data.Terabyte.sort.pltechreport.pdf Grimes, S. (2013) Big Data: Avoid ‘Wanna V’ Confusion. InformationWeek.com Retrieved from:

http://www.informationweek.com/big-data/big-data-analytics/big-data-avoid-wanna-v-confusion/d/d-id/1111077?

Hemsley, P. (2013). Newman exposure yields spatial data. Government News, Vol. 33, No. 3, Jun 2013: 8. Retrieved from:

http://search.informit.com.au/documentSummary;dn=415461916158961;res=IELBUS> ISSN: 1447-0500

Hipgrave, S. (2013). "Smarter fraud investigations with big data analytics." Network Security 2013(12): 7-9. Retrieved from:

(12)

IBM Business Analytics and Optimization Reference Architecture (IBM, 2011) IBM Retrieved from: https://www.ibm.com/developerworks/community/files/basic/anonymous/api/library/48d92427-

47d3-4e75-b54c-b6acfbd608c0/document/aa78f77c-0d57-4f41-a923-50e5c6374b6d/media/BAO%20Reference%20Architecture.pdf

Information Management and Big Data: A Reference Architecture (2013) Oracle White Paper Retrieved from: http://www.oracle.com/technetwork/topics/entarch/articles/info-mgmt-big-data-ref-arch-1902853.pdf

Joseph, R. C. and N. A. Johnson (2013). "Big Data and Transformational Government." IT Professional 15(6): 43-48.

Kim, G.-H., et al. (2014). "Big-data applications in the government sector." Commun. ACM 57(3): 78-85.

Kusnetzky, D. (2010) What is “Big Data”? ZDNet.com Retrieved from: http://www.zdnet.com/blog/virtualization/what-is-big-data/1708

Laney, D. (2001). 3D Data Management: Controlling Data Volume, Velocity, and Variety. META Group Retrieved from: http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf

Laney, D. (2012). Deja VVVu: Others Claiming Gartner’s Construct for Big Data. Retrieved from: http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data/

Levin, O. (2011) Big Data Ecosystem Reference Architecture. Microsoft Corporation. Retrieved from: http://bigdatawg.nist.gov/_uploadfiles/M0015_v1_1596737703.docx

Levinger, J.E., Needham, P., Pesati, V., Srividya, T. (2003) Oracle Label Security. Oracle Corp. Retrieved from: http://docs.oracle.com/cd/B12037_01/network.101/b10774.pdf

Liu, F., Tong, J., Mao, J., Bohn, R., Messina, J., Badger, L., & Leaf, D. (2011) NIST Cloud Computing Reference Architecture. National Institute of Standards and Technology. U.S. Department of Commerce. Special Publication 500-292 Retrieved from:

http://www.nist.gov/customcf/get_pdf.cfm?pub_id=909505

Liu, H. (2013). "Big Data Drives Cloud Adoption in Enterprise." IEEE Internet Computing 17(4): 68-71.

Liu, V., Franco, L., Caelli, William, May, L., & Sahama, T. (2009) Open and trusted information systems/health informatics access control (OTHIS/HIAC). In Brankovic, Ljiljana & Susilo, Willy (Eds.) Seventh Australasian Information Security Conference (AISC 2009), Australian Computer Science, Wellington, New Zealand, pp. 99-108. Retrieved from:

http://eprints.qut.edu.au/28148/

Liu, V., May, L.J., Caelli, W.J., & Croll, P.R. (2007) A Sustainable Approach to Security and Privacy in Health Information Systems. In Toleman, Mark, Cater-Steel, Aileen, & Roberts,

Dave (Eds.) 18th Australasian Conference on Information Systems, 5-7 December, Toowoomba, Qld, Australia. Retrieved from: http://eprints.qut.edu.au/11192/

Manadhata, P. Big data for security: challenges, opportunities, and examples, ACM. Mashey J. R. (1997-1998) Big Data and the Next Wave of Infrastress. Retrieved from:

http://static.usenix.org/event/usenix99/invited_talks/mashey.pdf

McGregor, C. (2013). "Big Data in Neonatal Intensive Care." Computer 46(6): 54-59.

Mercier, K., Richards, B., Shockley, R. (2013) Analytics: The real-world use of big data in retail. IBM Institute for Business Value. Retrieved from: http://ibm.co/14Eeo4C

Neiman Marcus Group Press Release (2014) Neiman Marcus Group Retrieved from: http://www.neimanmarcus.com/en-au/NM/Security-Info/cat49570732/c.cat

New IDC Worldwide Big Data Technology and Services Forecast Shows Market Expected to Grow to $32.4 Billion in 2017 (2013). IDC Retrieved from:

http://www.idc.com/getdoc.jsp?containerId=prUS24542113

Niccolai, J. (2014) US retailer Neiman Marcus notifying customers after card data breach. CSO.com.au Retrieved from:

http://www.cso.com.au/article/535699/us_retailer_neiman_marcus_notifying_customers_after_car d_data_breach/

(13)

Ponemon Cost of a Data Breach (2013) Symantec and Ponemon Institute Retrieved from:

http://www.symantec.com/about/news/resources/press_kits/detail.jsp?pkid=ponemon-2013&om_ext_cid=biz_socmed_twitter_facebook_marketwire_linkedin_2013Jun_worldwide_Cos tofaDataBreach

Popper, N. (2014) Breach at Neiman Marcus went undetected from July to December. CNBC Retrieved from: http://www.cnbc.com/id/101344172

Schouten, P. (2013). Big data in health care solving provider revenue leakage with advanced analytics. Healthcare Financial Management, 67(2), 40-42. Retrieved from

http://search.proquest.com/docview/1313504786?accountid=13380

Talia, D. (2013). "Clouds for Scalable Big Data Analytics." Computer 46(5): 98-101.

Target Confirms Unauthorized Access to Payment Card Data in U.S. Stores (2013) Target Press Release Retrieved from: http://pressroom.target.com/news/target-confirms-unauthorized-access-to-payment-card-data-in-u-s-stores

Target Provides Update on Data Breach and Financial Performance (2014) Target Press Release Retrieved from: http://pressroom.target.com/news/target-provides-update-on-data-breach-and-financial-performance

Weiss, S. M. (1998). Predictive data mining: a practical guide. Morgan Kaufmann.

Winney, B., Royrvik, E. C., Tonks, S., Yang, X., Cheshire, J., Longley, P., . . . Nicodemus, K. (2012). People of the British Isles: Preliminary analysis of genotypes and surnames in a UK-control population. European Journal of Human Genetics: EJHG, 20(2), 203-210.

doi:10.1038/ejhg.2011.127 Retrieved from:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3260910/

Zhao, J., et al. (2014). "A security framework in G-Hadoop for big data computing across distributed Cloud data centres." Journal of Computer and System Sciences.