1. Introduction
To use big data securely, security functions must accommodate the heterogeneous composition of diverse software, network domains, and operating systems. Software-defined networking techniques help enable secure big data deployments. The big data phenomenon is commonly defined by three dimensions, variety, volume, and velocity, all of which are covered in this chapter. The study will discuss several challenges regarding the vulnerability of business data and access controls within distributed frameworks. Moreover, it will explain the main issues of big data, comprising data security and access control along with real-time security compliance. Organizations must deal with petabyte-scale collections of data gathered from transaction histories, clickstreams, sensors, and other sources. Existing tools cannot process this data easily, so unique technologies are used instead. The key to understanding big data is to use it in ways that support real-life profitable or beneficial outcomes. Big data is of three main types: structured, semi-structured, and unstructured. This section describes the characteristics of big data, and the five V's concept is also included in this chapter. The study will also help to improve security and address other challenges regarding the implementation of big data.
Project Background
The quantity of data is increasing rapidly day by day with the growing use of the Internet, social networks, and smartphones. Big data refers to vast collections of complex data sets that commonly reach petabyte and exabyte sizes. Big data can be seen across business and finance, where enormous amounts of banking, stock exchange, and onsite and online purchasing data flow through automated systems on a regular basis. These data are then gathered and recorded to monitor market behavior, customer behavior, and inventories. Big data processing involves six essential stages: Data Visualization, Data Structuring, Data Collation, Data Interpretation, Data Extraction, and Data Acquisition (Chaudhari and Srivastava 2016). Apart from these processing stages, the technology faces certain challenges regarding data encryption and security. These challenges are as follows:
The initial challenge for enterprises is to select important and relevant data. With such extreme volumes of data, it becomes essential for enterprises to be able to isolate what is relevant.
The second challenge is that the majority of data points are not connected within organizations, and this lack of connectivity is a serious hurdle. The technology is, after all, about the connections between transactional points.
In order to leverage big data, an individual has to work across several departments such as Finance, Engineering, and IT. The procurement and ownership of this data therefore has to be a collaborative effort across these departments, which proves to be a significant organizational challenge (Securitymagazine.com 2020).
There is also a security angle associated with the collection of big data, which is one more significant challenge preventing organizations from realizing the complete benefits of big data analysis.
Aim/Objectives
The research aims to evaluate the data security issues involved with big data. It also tries to find solutions for overcoming the issues faced while using big data.
Research Objectives
To identify the security issues involved with big data
To recommend solutions for overcoming the issues involved with big data
Research questions
This research study will focus on the following research questions:
How can the security of data recorded in big data systems be increased by utilizing digital signatures and authentication of data subsets?
How can the challenges in big data be mitigated, and a robust access control plan, including encryption and decryption techniques for legitimate users, be implemented before data is processed?
How can the challenges of big data be identified so that they can be mitigated effectively?
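The first research question above concerns signing and authenticating data subsets. As a minimal illustration only, the sketch below signs each subset with an HMAC-SHA256 tag so that tampering can be detected on verification; the key, record layout, and function names are hypothetical, and a production system would obtain keys from a key-management service rather than hard-coding them.

```python
import hashlib
import hmac

# Hypothetical shared secret; in practice this comes from a key-management system.
SECRET_KEY = b"example-key-do-not-use-in-production"

def sign_subset(records):
    """Sign a data subset with an HMAC-SHA256 tag over its serialized records."""
    payload = "\n".join(records).encode("utf-8")
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def verify_subset(records, signature):
    """Verify a subset against its stored tag using a constant-time comparison."""
    return hmac.compare_digest(sign_subset(records), signature)

subset = ["txn-001,100.00", "txn-002,42.50"]
tag = sign_subset(subset)
assert verify_subset(subset, tag)                         # untampered subset verifies
assert not verify_subset(subset + ["txn-003,9.99"], tag)  # any modification is detected
```

A legitimate user holding the key can thus authenticate a subset before trusting it, which is the property the research question asks about.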
2. Literature Review
2.1 Overview of Big Data
In today’s knowledge- and technology-driven society, data is treated as the most crucial resource of an organization. Collecting and processing large data sets is difficult because this information comes from multiple heterogeneous, autonomous sources. These datasets are measured in exabytes. The term big data is normally used to evaluate and compare enormous data sets. Big data consists mostly of shapeless data that requires more real-time analysis (Manoj and Krishnan 2020). Client reviews on commercial websites, remarks on social networking sites, electronic medical records, bank records, and photos and videos posted online are examples of big data. Big data takes three structural forms: structured, semi-structured, and unstructured.
Structured data: Structured data, such as numbers and words, can easily be analyzed and categorized. It includes sales figures, account balances, and transaction information, and is maintained within standard databases. Such data is generated, for example, by sensors embedded in global positioning system devices, electronic devices, and smartphones.
Semi-Structured Data: This is a form of structured data that does not follow an explicit, fixed design. It is inherently self-describing and uses tags to implement hierarchies of fields and records within the data. Weblogs and social media feeds are examples of semi-structured data (Granberg and He 2018).
Unstructured Data: Complex information such as client reviews from commercial sites, multimedia, and comments on social networking sites and websites is presented as unstructured data. It is hard to separate these data into categories and conduct numerical analysis on them. Big data involves the maintenance of large, mainly heterogeneous data, and is characterized along three dimensions: volume, variety, and velocity.
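The three structural forms described above can be sketched with Python's standard library; the sample records below are invented for illustration only.

```python
import csv
import io
import json

# Structured: fixed schema, easy to query (e.g. account or transaction records).
structured = io.StringIO("account,balance\nA-1,250.75\nA-2,99.10\n")
rows = list(csv.DictReader(structured))
assert rows[0]["balance"] == "250.75"  # every row has the same named fields

# Semi-structured: self-describing tags, no fixed schema (e.g. a social feed item).
semi = json.loads('{"user": "alice", "tags": ["travel"], "extra": {"likes": 3}}')
assert semi["extra"]["likes"] == 3  # nesting and optional fields vary per record

# Unstructured: free text with no schema, so analysis needs search or NLP techniques.
review = "Great product, but delivery was slow."
assert "delivery" in review  # only substring search, no field access
```

The contrast is visible in the access patterns: named columns for structured data, tag navigation for semi-structured data, and raw text scanning for unstructured data.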
2.2 Characteristics and applications of Big Data
The volume dimension of big data concerns the generation and collection of large amounts of data. Data volume is growing rapidly because of the burst of machine-generated data (sensor data, web log files, data records) and the increased human involvement in social media. Big data analysis depends on these large quantities of data, while variety covers all of its different types. The generation and delivery of big data are also characterized by the frequency, or velocity, at which this volume arrives.
Big data is a new idea, and individuals, researchers, and organizations have provided numerous definitions of it. The three V’s of volume, velocity, and variety were articulated by industry analyst Doug Laney, and they capture the performance and principles of big data. The Statistical Analysis System (SAS) later added variability and complexity. In 2014, Kirk Borne of Data Science Central proposed that big data consists of 10 V’s: volume, velocity, variety, variability, value, validity, vocabulary, vagueness, venue, and veracity (Li 2017). Management of big data depends on these characteristics; however, a few gaps still exist which need to be resolved to gain better insight into the area. Five V’s of big data are described below:
Volume: Unimaginable amounts of information are generated from social media, M2M sensors, images, videos, cell phones, and credit cards; this is the volume of big data. Distributed systems are now used to store data across several locations.
Veracity: Veracity evaluates the reliability of the collected data. Because data is a valuable asset for business, big data systems need alternative ways to filter out unstructured and irrelevant data. The veracity process involves analyzing the captured data and maintaining its accuracy.
Value: Value is a major concern related to the reliability of collected data. It is mainly through this process that business value is derived from big data; value preserves the importance and trustworthiness of the data.
Velocity: Velocity concerns the speed at which data is processed, and it improves the capability to understand and respond to events. For example, Google Maps provides real-time traffic information by analyzing the speed of mobile devices running Google Maps on the road.
Variety: Variety covers the evaluation of different kinds of data such as pictures, videos, and other multimedia. The structured or unstructured nature of data is assessed under variety, and the wide variety of unstructured data increases the difficulty of storing, analyzing, and mining it.
Application of Big data
Big data is a widespread technology that is used in almost every business sector. Applications of big data are described below:
Big data technology is widely used in the travel and tourism industry, where it supports the demand for travel facilities in many places and the growth of the business.
Big data technology helps banks and other financial organizations to understand customer behavior (Atillen and Robles 2019).
Healthcare sectors have already been enhanced by big data technology. Personalized healthcare services are provided to individual patients with the help of medical professionals, healthcare personnel, and predictive analytics.
2.3 Tools and Methods of Big Data
Different kinds of tools and techniques are involved in the maintenance of data, such as Google BigTable, SimpleDB, data stream management systems, Voldemort, and MemcacheDB. However, these technologies are suited to traditional data, not big data, as big data cannot be stored on a single machine. Hadoop, BigTable, and MapReduce are the technologies mainly used in big data handling, and among them Hadoop is the most widely used.
Hadoop: Hadoop is an Apache open-source framework written in Java. Large data sets are easily processed and categorized with Hadoop technology. Hadoop consists of three layers, among which the Hadoop Distributed File System (HDFS) and MapReduce are considered the major ones.
HDFS: The Hadoop Distributed File System is a storage system used to store very large files, running on clusters of commodity hardware with streaming data-access patterns. An HDFS cluster has two main types of node: the name node and the data nodes. The name node maintains the file-system tree and the metadata for all directories and files in the tree. The data nodes store data and retrieve blocks as instructed by clients or the name node.
MapReduce processing/Computation Layer: MapReduce is a programming paradigm mainly implemented to manage applications across multiple distributed servers. Large, complex data is broken into small units and processed by the MapReduce process, which can read data from several places including the web, databases, and mounted local file systems (Clarke et al. 2016). MapReduce divides the computations between the different servers, or nodes, and reassigns incomplete work to another node when needed. It mainly handles scheduling and cluster resource management.
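The map, shuffle, and reduce phases described above can be sketched in plain Python as a single-process word count; this is an illustration of the paradigm only, not the Hadoop framework itself, which would distribute each phase across cluster nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit (word, 1) pairs from one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(mapped_pairs):
    """Shuffle: group intermediate values by key, as the framework does between nodes."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values into a final count."""
    return {key: sum(values) for key, values in groups.items()}

# Two input splits, as if read from separate HDFS blocks.
splits = ["big data needs processing", "big data needs storage"]
mapped = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle(mapped))
assert counts["big"] == 2 and counts["storage"] == 1
```

Because each split is mapped independently and each key is reduced independently, the same program structure scales to many servers, which is the point of the paradigm.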
2.4 Issues and challenges with Big Data
The volume of data being collected, stored, and processed is growing day by day, which produces new challenges in terms of data security. Widely used security mechanisms such as DMZs and firewalls cannot simply be applied to big data infrastructure. Challenges in big data are mainly divided into two categories: engineering and semantic. Data management activities such as efficient storage and querying are engineering challenges, while determining the meaning of information in large, unstructured data sets is a semantic challenge. Security-related aspects, namely controlling and protecting the data, are a further challenge of big data: collected data contains sensitive information and financial records, which makes security a major concern. The complexity of big data is itself treated as a major challenge (Suharto and Fasa 2017). Because big data is collected from comments, videos, images, and other non-numeric sources, challenges keep increasing regarding storage, capture, search, visualization, and analysis; the main complexity lies in how the collected data can be understood and analyzed. Big data is largely generated from social media and sensor networks, so finding the useful data among unwanted data is a major challenge. To mitigate this problem it is important to filter all collected data, but separating essential data from the collected mass is a time-consuming process. Lack of proper infrastructure is another major cause of issues in big data. The security objectives for big data are the same as for other types of data: to preserve its integrity, availability, and confidentiality. Security- and privacy-related challenges only increase with the complexity of big data.
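The filtering step mentioned above, separating essential data from unwanted collected data, can be sketched as a simple validation pass; the field names and sample records here are hypothetical, standing in for whatever schema a real ingestion pipeline would enforce.

```python
# Hypothetical required schema for an ingested record.
REQUIRED_FIELDS = {"source", "timestamp", "payload"}

def is_useful(record):
    """Keep only records that carry every required field with a non-empty value."""
    return all(record.get(field) for field in REQUIRED_FIELDS)

collected = [
    {"source": "sensor-7", "timestamp": "2020-07-18T10:00", "payload": "23.5C"},
    {"source": "social", "timestamp": "", "payload": "spam link"},  # empty timestamp
    {"source": "sensor-9", "timestamp": "2020-07-18T10:01"},        # missing payload
]
useful = [record for record in collected if is_useful(record)]
assert len(useful) == 1 and useful[0]["source"] == "sensor-7"
```

Even this trivial filter must touch every record, which hints at why separating essential data is described as time-consuming at petabyte scale.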
2.5 Summary
Big data introduces new technologies for data harvesting and for extracting value from data. Traditional technologies are unable to process, manage, and capture big data, whereas big data platforms provide new and unique technologies to extract insight out of varied, voluminous, high-velocity data. Big data technology is used in various sectors such as travel and tourism, banking and finance, healthcare, telecommunications, information technology, and manufacturing. MapReduce, Hadoop, and BigTable are the technologies used to handle big data, and massive volumes of data are processed by them. Big data is mainly collected from social media, audio, video, and other sources; semi-structured and unstructured data is available in raw format. Nowadays companies face challenges regarding big data implementation, and the major cause of these problems is the complexity of the data.
This study has also described the challenges around big data. The literature review provides a clear idea of how big data is processed and will be very helpful for performing big data operations. From this analysis it can be said that security is the biggest problem of big data. It is necessary to concentrate on the security side of big data and to use proper technologies to perform big data analysis.
3. Research Methodology
Research approach
The research approach of this study will assist the researcher in identifying suitable techniques and is classified into two categories: deductive and inductive. The deductive approach will develop the strategy within the organization based on existing literature and theories, whereas the inductive approach will deliver the scope for developments in the project within the organization.
Research design
The research design sets out the techniques used to determine the variables associated with the research problems and to find answers to the research questions. In this study, an exploratory design will be followed in order to develop a connection between two variables regarding the strategies used in the organization.
Research strategy
The research strategy is the technique for processing and inspecting the research problem regarding big data security. An ethnographic strategy will mostly be utilized in order to gather various case studies, articles, and relevant, authenticated journals.
Data Collection
In this research study, data collection is the technique of retrieving relevant and authentic information and data sources. To meet the research objectives, the collection of information is mandatory in order to know the real scenarios of organizations' big data security strategies. Both primary and secondary techniques will be followed to gather the information; the secondary data will be collected from several articles, journals, magazines, and web portals.
Sources
Journal:
Chaudhari, N. and Srivastava, S., 2016, April. Big data security issues and challenges. In 2016 International Conference on Computing, Communication and Automation (ICCCA) (pp. 60-64). IEEE.
Magazine:
Securitymagazine.com, 2020. Available at: https://www.securitymagazine.com/articles/84461-top-10-big-data-security-and-privacy-challenges-report-released (Accessed: 18 July 2020).
News
6 Big Data Security Issues for 2019 and Beyond (2019). Available at: https://rtslabs.com/6-big-data-security-issues-for-2019-and-beyond/ (Accessed: 18 July 2020).
References
Manoj, M.K. and Krishnan, S.S.R., 2020. Decentralizing Privacy Using Blockchain to Protect Private Data and Challenges With IPFS. In Transforming Businesses With Bitcoin Mining and Blockchain Applications (pp. 207-220). IGI Global.
Granberg, M. and He, D., 2018. The Future of Big Data Analysis in Facility Management-A Study of Implementation areas (Master's thesis).
Li, Q., 2017. Citywide Time-dependent Grid-based Traffic Emissions Estimation and Air Quality Inference Using Big Data. The University of Wisconsin-Madison.
Atillen, F. and Robles, A.C.M.O., 2019. Challenges and Concerns in Attaining Sustainable Development Goals: Basis for SDG Advocacy Framework in South Cotabato. Asian Journal of Multidisciplinary Studies, 2(1).
Clarke, P., Coveney, P.V., Heavens, A.F., Jaykkä, J., Joachimi, B., Karastergiou, A., Konstantinidis, N., Korn, A., Mann, R.G., McEwen, J.D. and de Ridder, S., 2016. Big Data in the physical sciences: challenges and opportunities. Alan Turing Institute.
Suharto, S. and Fasa, M.I., 2017. The Challenges of Islamic Bank for Accelerating the Growth of Micro, Small and Medium Enterprises (MSMEs) in Indonesia. Li Falah: Journal of Islamic Economics and Business Studies, 2(2), pp.1-19.