FUTURE DEVELOPMENT - Networking for Big Data Chapman pdf

The data volumes created each year grow exponentially. They reached 2.8 zettabytes in 2012, a number that is as gigantic as it sounds, and are expected to double again by 2015 [28]. The technologies that process these amounts of data have to scale, and supercomputers have been emerging to provide the computing power needed. Real-time data analysis is also becoming increasingly important. Hadoop is batch oriented: a simple query might take minutes to return, so it is not suitable for real-time operations. The real-time computation system Storm, which came to Twitter through its acquisition of BackType and is now an open source project at the Apache Foundation, was developed to process unbounded streams of data and can be used with any programming language [29]. It is fault tolerant and scalable, and it can process one million tuples per second per node.
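The idea of processing an unbounded stream tuple by tuple, rather than as a finished batch, can be sketched with plain Python generators. This is only an illustration under stated assumptions, not Storm's API: the function names (`sentence_source`, `split_words`, `running_counts`, `latest_counts`) and the sample sentences are hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, Tuple


def sentence_source() -> Iterator[str]:
    """Hypothetical unbounded source: cycles forever over a few sentences."""
    sentences = ["big data needs fast answers", "fast data is big data"]
    while True:
        for sentence in sentences:
            yield sentence


def split_words(sentences: Iterable[str]) -> Iterator[str]:
    """First processing step: emit one tuple (here a word) per input word."""
    for sentence in sentences:
        for word in sentence.split():
            yield word


def running_counts(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Second step: emit an updated (word, count) pair as each word arrives."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def latest_counts(n_tuples: int) -> dict:
    """Consume only the first n_tuples results of the endless pipeline."""
    pipeline = running_counts(split_words(sentence_source()))
    return dict(islice(pipeline, n_tuples))
```

Because each stage yields results as soon as input arrives, a consumer sees up-to-date counts without waiting for the stream to end, which a batch job could not offer since the stream never ends.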

To provide the needed performance, in-memory databases, also called memory-resident databases, have been developed. They primarily use a computer’s main memory for data storage rather than the slower disk storage subsystem. They are used for applications where response time is critical, such as real-time analytics.
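As a small illustration of the in-memory approach, Python's standard `sqlite3` module can open a database that lives entirely in main memory by connecting to `":memory:"`. The table layout, the `top_pages` helper, and the sample click data below are assumptions made for this sketch, not part of any product discussed here.

```python
import sqlite3


def top_pages(clicks, limit=2):
    """Load click events into an in-memory database and aggregate them."""
    conn = sqlite3.connect(":memory:")  # data is held in RAM, not on disk
    try:
        conn.execute("CREATE TABLE clicks (page TEXT, ms INTEGER)")
        conn.executemany("INSERT INTO clicks VALUES (?, ?)", clicks)
        rows = conn.execute(
            "SELECT page, COUNT(*) AS hits FROM clicks "
            "GROUP BY page ORDER BY hits DESC, page ASC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()  # the in-memory data is discarded here
    return rows


# Hypothetical sample data: (page, response time in milliseconds)
sample = [("/home", 12), ("/cart", 30), ("/home", 9),
          ("/home", 15), ("/cart", 22), ("/help", 40)]
```

Calling `top_pages(sample)` returns the two most visited pages with their hit counts. The trade-off is visible in the `finally` clause: once the connection closes the data is gone, so a production in-memory database needs some separate mechanism for durability.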

Similarly, in-memory distributed data grids use data caching mechanisms to improve performance and scalability. For instance, the Hadoop MapReduce engine can be run against data cached in memory for fast execution. New caching nodes can be added if more processing power is needed.
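The caching and scale-out behavior described above can be sketched as a toy grid that partitions cached entries across several in-memory nodes. The `CacheGrid` class, its `loader` callback (standing in for a slow backing store), and the modulo-based partitioning are all assumptions for illustration; they are not taken from any particular data grid product.

```python
import hashlib


class CacheGrid:
    """Toy data grid: keys are partitioned across in-memory node caches."""

    def __init__(self, node_count, loader):
        self.loader = loader          # hypothetical slow backing-store lookup
        self.nodes = [{} for _ in range(node_count)]
        self.loads = 0                # how often the slow store was hit

    def _node_for(self, key):
        # Stable hash so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def get(self, key):
        node = self._node_for(key)
        if key not in node:           # cache miss: fetch once, keep in memory
            node[key] = self.loader(key)
            self.loads += 1
        return node[key]

    def add_node(self):
        """Add a caching node and redistribute the cached entries."""
        entries = [item for node in self.nodes for item in node.items()]
        self.nodes = [{} for _ in range(len(self.nodes) + 1)]
        for key, value in entries:
            self._node_for(key)[key] = value
```

For example, `grid = CacheGrid(2, lambda k: k.upper())` hits the loader only on the first `grid.get("a")`, and the entry stays cached after `grid.add_node()`. Modulo partitioning moves most keys whenever a node is added; consistent hashing is a common way to limit that movement, which this sketch leaves out for brevity.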

Complex Event Processing (CEP) is a method for tracking and analyzing streams of data about events as they happen, combining data from multiple sources. It is used to identify events such as opportunities or threats. The large amount of available information about events is called the event cloud. By analyzing and correlating simple events, complex events can be discovered. CEP is used in fraud detection, stock trading, business activity monitoring, and security monitoring.
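A minimal sketch of the correlation step, using fraud detection as the example: simple payment events from two hypothetical feeds are merged, and a complex event is raised when the same card appears in two different countries within a time window. The `Payment` record, the `detect_suspicious` function, the 600-second window, and the sample feeds are assumptions for illustration, not a real CEP engine.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Payment:
    time: int      # seconds since an arbitrary start point
    card: str
    country: str
    source: str    # which feed reported the event


def detect_suspicious(events: Iterable[Payment],
                      window: int = 600) -> List[Tuple[str, str, str]]:
    """Flag a card seen in two different countries within `window` seconds."""
    last_seen: Dict[str, Payment] = {}
    alerts = []
    # Merge the feeds into one time-ordered stream before correlating.
    for event in sorted(events, key=lambda e: e.time):
        previous = last_seen.get(event.card)
        if (previous is not None
                and previous.country != event.country
                and event.time - previous.time <= window):
            alerts.append((event.card, previous.country, event.country))
        last_seen[event.card] = event
    return alerts


# Hypothetical feeds: point-of-sale terminals and a web shop.
pos_feed = [Payment(0, "c1", "DE", "pos"), Payment(50, "c2", "FR", "pos")]
web_feed = [Payment(300, "c1", "US", "web"), Payment(5000, "c2", "US", "web")]
```

With these feeds, `detect_suspicious(pos_feed + web_feed)` flags only card `c1`: neither feed alone contains anything unusual, and the complex event only appears once the two sources are correlated. Card `c2` also changes country, but outside the window.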

BigData has also been moving to the cloud, offering data analysis in a data science as a service (DSaaS) paradigm. DSaaS lets users focus on the analysis task without being concerned with the underlying platforms or technologies. One BigData cloud solution is Google’s BigQuery [30]. It lets the user upload data into BigQuery and analyze it using SQL-like queries. BigQuery can be accessed through the browser, a command-line tool, or the Representational State Transfer Application Programming Interface (REST API) from the Java, PHP, or Python programming languages.

As cloud computing has become a mainstream trend in computing, more cloud-based BigData solutions are expected in the near future.

REFERENCES

1. V. Mayer-Schonberger, and K. Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think, New York, USA: Houghton Mifflin Harcourt Publishing Company, 2013.

2. A. Twinkle, and S. Paul, Addressing big data with Hadoop, International Journal of Computer

3. D. Klein, P. Tran-Gia, and M. Hartmann, Big Data, Informatik-Spektrum, 36(3), 319–323, 2013.

4. J. O. Chan, An architecture for big data analytics, Communications of the IIMA, 13(2), 1–13, 2013.

5. Y. S. Tan, J. Tan, E. S. Chng, B.-S. Lee, J. Li, S. Date, H. P. Chak, X. Xiao, and A. Narishige, Hadoop framework: Impact of data organization on performance, Software: Practice and Experience, 43(11), 1241–1260, 2013.

6. J. Dittrich, S. Richter, and S. Schuh, Efficient or Hadoop: Why not both? Datenbank-Spektrum, 13(1), 17–22, 2013.

7. G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, Dynamo: Amazon’s highly available key-value store, SIGOPS Operating Systems Review, 41(6), 205–220, 2007.

8. F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, Bigtable: A distributed storage system for structured data, ACM Transactions on Computer Systems, 26(2), 1–26, 2008.

9. I. Tomasic, A. Rashkovska, and M. Depolli, Using Hadoop MapReduce in a multicluster environment. 36th International Convention on Information & Communication Technology Electronics & Microelectronics (MIPRO), http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6596280&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6596280, 345–350, 2013.

10. S. Richter, J.-A. Quiané-Ruiz, S. Schuh, and J. Dittrich, Towards zero-overhead static and adaptive indexing in Hadoop, The VLDB Journal, 23(3), 469–494, 2014.

11. K. Shvachko, K. Hairong, S. Radia, and R. Chansler, The Hadoop distributed file system. IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5496972&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5496972, 1–10, 2010.

12. A. Thusoo, Z. Shao, S. Anthony, D. Borthakur, N. Jain, J. S. Sarma, R. Murthy, and H. Liu, Data warehousing and analytics infrastructure at Facebook, in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, Indianapolis, Indiana, USA, 2010, pp. 1013–1020.

13. J. Vijayan, Hadoop works alongside RDBMS, Computerworld, 45(15), 5–5, 2011.

14. P. Zikopoulos, D. deRoos, K. Parasuraman, T. Deutsch, J. Giles, and D. Corrigan, Harness the Power of Big Data: The IBM Big Data Platform, McGraw-Hill Osborne Media, New York, USA, 2012.

15. Welcome to Apache™ Hadoop®!, 19 September, 2014; http://hadoop.apache.org/.

16. J. Dean, and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, OSDI 2004.

17. MapR. MapR Direct Access NFS, https://www.mapr.com/sites/default/files/mapr-tech-brief-direct-access-nfs-2.pdf.

18. W. Frings, and M. Hennecke, A system level view of Petascale I/O on IBM Blue Gene/P, Computer Science - Research and Development, 26(3–4), 275–283, 2011.

19. B. Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, 2nd ed., Heidelberg: Springer, 2011.

20. I. H. Witten, E. Frank, and M. A. Hall, Data Mining, 3rd ed., Burlington, MA, USA: Elsevier, 2011.

21. A. Dasgupta, Y. V. Sun, I. R. König, J. E. Bailey-Wilson, and J. D. Malley, Brief review of regression-based and machine learning methods in genetic epidemiology: The Genetic Analysis Workshop 17 experience, Genetic Epidemiology, 35(S1), S5–S11, 2011.

22. SPSS Modeler, 18 September, 2014; http://www-01.ibm.com/software/analytics/spss/products/modeler/.


23. SAS Enterprise Miner, 18 September, 2014; http://www.sas.com/en_us/software/analytics/enterprise-miner.html.

24. STATISTICA Features Overview, 18 September, 2014; http://www.statsoft.com/Products/STATISTICA-Features.

25. RapidMiner, 18 September, 2014; http://rapidminer.com/products/rapidminer-studio/.

26. Weka 3: Data Mining Software in Java, 18 September, 2014; http://www.cs.waikato.ac.nz/ml/weka/.

27. Mahout, 19 September, 2014; https://mahout.apache.org/.

28. P. Tucker, Has big data made anonymity impossible? MIT Technology Review, 116(4), 2013.

29. Storm, 19 September, 2014; https://storm.incubator.apache.org/.


Chapter 7: Network Configuration
