Big Data Collection Study for Providing Efficient Information
Jun-soo Yun, Jin-tae Park, Hyun-seo Hwang and Il-young Moon
Computer Science and Engineering, Korea University of Technology and
Education, 1600, Chungjeol-ro, Byeongcheon-myeon, Dongnam-gu, Cheonan-si,
Chungcheongnam-do, Korea
Abstract
With the development of smartphones and the Internet, large amounts of data are communicated indiscriminately, causing data traffic to increase rapidly. In social commerce especially, the sheer volume of information makes it difficult for users to find the information they need by themselves. Even the companies that provide these services find it difficult to understand what information their users need. The senseless growth in the amount of unclassified data has become a new problem of this age. If only the required data is collected and analyzed so that accurately customized services can be provided, the data traffic problem can be alleviated. Therefore, in this paper, we describe an efficient method of collecting big data in order to provide customized advertising services to users.
Keywords: Big Data, Social Commerce, Data Traffic, Smartphone
1. Introduction
The first explosion of data in the modern era can be defined as "the birth of the Internet." The Internet provided the framework for worldwide information exchange, and within this framework secondary information, such as replicated and derivative works, was mass-produced indiscriminately. We have now reached an explosion of secondary data [1]. The current data explosion is caused by the diversification of data sources and by the data generated through communication among them. Social networks and social commerce are at the heart of big data on the Internet because services such as Facebook and Twitter mass-produce data about the relationships between their users [2]. In addition, according to survey results, monthly mobile data traffic, which stood at 2.5 EB at the end of 2014, is expected to grow nearly tenfold to 24.3 EB by the end of 2019 [3][4]. Data is now generated in volumes that existing data management and analysis methods cannot handle [5]. Therefore, as the amount of data increases rapidly, not only are techniques for classifying data required, but also services that help Internet users easily find the data they need [6]. In countries already advanced in big data, such as the United States and those in Europe, companies in particular are increasing their interest and investment in order to obtain useful information and value for society and humanity [7, 8]. By analyzing user preferences, companies can carry out targeted marketing and improve the quality of their services [9, 10]. Therefore, in this paper, we describe how to efficiently collect the big data needed to analyze this vast amount of data.
2. Related Research
2.1. Amazon
Amazon.com is an international e-commerce company headquartered in Seattle, Washington, in the United States, and is the world's largest online shopping brokerage company. Amazon.com started as an online bookstore and expanded into various fields through new business ventures in 1997. Amazon uses cloud computing technology for big data analysis. Cloud computing is a concept in which IT resources are integrated into huge cloud servers and individuals use online only the computing resources they require. Amazon Cloud Drive allows up to eight devices to be registered per account and provides its service only to registered devices; regardless of device type, any device with a web browser can connect to the storage, and an Android application is also provided. Through this thoroughly customer-centric approach to business development, Amazon has maintained the highest level of customer satisfaction in the e-commerce industry. Amazon also operates AWS (Amazon Web Services), which allows any company to store its information with Amazon and run its software on Amazon's computers. Companies using these cloud services pay only for the capacity they use. The system is flexible enough to respond quickly and to operate reliably even when traffic fluctuates greatly. Figure 1 shows Amazon RDS.
Figure 1. Amazon RDS
2.2. Google
Google is a US multinational company whose main business areas are web search, cloud computing, and advertising. Google offers a big data analysis service called "Google Cloud Dataflow" (hereafter, GCD). GCD is linked to Google's cloud platform and can process large amounts of data in two modes: "batch mode" and "streaming mode." The basic concept of GCD is that it takes charge of all the peripheral functions involved in handling big data, such as optimization, distribution, scheduling, and monitoring, so that users can concentrate on their big data analysis applications. Figure 2 shows Google's customized advertising service.
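The difference between GCD's two processing modes can be illustrated without any Google API. The following is a minimal sketch in plain Python, not the Cloud Dataflow SDK; the event format, the keyword-counting task, and the 60-second fixed window are illustrative assumptions. Batch mode processes a bounded dataset in one pass, while streaming mode assigns continuously arriving events to time windows and aggregates each window separately.

```python
from collections import Counter, defaultdict
from typing import Iterable

# Hypothetical event record: (timestamp_in_seconds, keyword).
Event = tuple[int, str]


def count_batch(events: Iterable[Event]) -> Counter:
    """Batch mode: the whole bounded dataset is available, so count it in one pass."""
    return Counter(keyword for _, keyword in events)


def count_streaming(events: Iterable[Event], window_s: int = 60) -> dict[int, Counter]:
    """Streaming mode: group events into fixed (tumbling) windows keyed by
    window start time, and keep a separate count for each window."""
    windows: dict[int, Counter] = defaultdict(Counter)
    for ts, keyword in events:
        window_start = ts - ts % window_s
        windows[window_start][keyword] += 1
    return dict(windows)


events = [(5, "shoes"), (30, "coffee"), (61, "shoes"), (90, "shoes"), (125, "coffee")]
print(count_batch(events))
print(count_streaming(events))
```

In streaming mode, results for a window can be emitted as soon as that window closes, instead of waiting for all data to arrive.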
Figure 2. Google Customized Advertising
2.3. eBay
eBay is a multinational Internet C2C company based in the United States. eBay uses predictive analytics based on SAP HANA to expand sellers' businesses, reduce the burden on business analysts, and grow the eBay marketplace. SAP HANA is a platform for real-time analytics and applications. It is a next-generation solution for real-time business that allows all users to query and analyze large volumes of data containing detailed information from the moment a business transaction occurs, giving enterprises the business insight they require. SAP HANA can analyze 500 indices, automatically select the best model, and identify the actual positive responders with 100% accuracy and 97% reliability. Because this automated system provides reliable detection, eBay analysts gain time to invest in strategic business, and eBay can focus on connecting buyers with the items they need. Figure 3 shows eBay's predictive analysis service.
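The idea of automatically selecting the best of several candidate models can be sketched generically. This is not SAP HANA's API or its actual algorithm: the single "purchase score" feature, the threshold rules used as candidate models, and the validation data are all hypothetical. Each candidate is scored on the same held-out data, and the one with the highest accuracy is kept.

```python
from typing import Callable

# Hypothetical labeled validation data: (purchase_score, actually_responded).
Sample = tuple[float, bool]
Model = Callable[[float], bool]


def accuracy(model: Model, samples: list[Sample]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(model(x) == y for x, y in samples)
    return correct / len(samples)


def select_best_model(candidates: dict[str, Model], samples: list[Sample]) -> tuple[str, float]:
    """Score every candidate on the same validation data and keep the best one."""
    scored = {name: accuracy(m, samples) for name, m in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Candidate models: simple threshold rules on one score (illustrative only).
# The default argument t=t binds each threshold at definition time.
candidates = {f"threshold_{t}": (lambda x, t=t: x >= t) for t in (0.3, 0.5, 0.7)}

validation = [(0.9, True), (0.8, True), (0.6, True), (0.52, True),
              (0.4, False), (0.2, False), (0.55, False)]
print(select_best_model(candidates, validation))
```

A production system would compare far richer models and use metrics beyond plain accuracy, but the selection loop has the same shape.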
Figure 3. eBay Forecast Analysis
2.4. Kakao
Kakao is a company that offers a free global mobile messaging service, KakaoTalk. Kakao also provides a separate content recommendation service called "Kakao Topic." Kakao Topic gathers content from various sources, such as magazines, web magazines, news, communities, and SNS, organizes it by theme according to each user's personal interests, and lets the user view it easily in one place. For personalization in particular, an automatic algorithm based on social filtering analyzes the topics that are currently popular and recommends customized personal content. Kakao Topic also provides a variety of features that make content more convenient to enjoy. First, with the "greeting" function, users can configure their own lists that collect content of interest. Second, to share a particular piece of content with friends, users can press the Share button at the bottom of the content card or at the bottom of the body page and share the content through KakaoTalk, KakaoStory, Facebook, and similar services. In addition, Kakao plans to add convenience features that further enhance social interaction and personalization, such as showing content that friends are paying attention to, setting interest keywords, and comments. Figure 4 shows Kakao Topic.
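Kakao's actual social-filtering algorithm is not described here, so the following sketch assumes a simple user-based collaborative filtering scheme: users with similar interaction histories are found by cosine similarity, and content they liked but the target user has not seen is ranked by similarity-weighted ratings. The user names, content IDs, and ratings are hypothetical.

```python
import math

# Hypothetical interaction data: user -> {content_id: rating or click count}.
Ratings = dict[str, dict[str, float]]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse rating vectors."""
    common = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user: str, ratings: Ratings, top_n: int = 3) -> list[str]:
    """Score unseen content by the similarity-weighted ratings of other users."""
    seen = ratings[user]
    scores: dict[str, float] = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(seen, their)
        if sim <= 0:
            continue  # users with no overlap contribute nothing
        for item, r in their.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * r
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


ratings = {
    "alice": {"news_1": 5, "travel_2": 3},
    "bob": {"news_1": 4, "travel_2": 2, "food_3": 5},
    "carol": {"sports_4": 5},
}
print(recommend("alice", ratings))
```

Here "alice" receives "food_3" because "bob" has a similar history, while "carol" shares nothing with her and is ignored.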
Figure 4. Kakao Topic
3. Main Discussion
To collect big data efficiently, we developed a mobile social advertising platform that sends advertisements and coupons for nearby commercial areas based on the user's location information. Figure 5 shows the developed platform. The social advertising platform collects data from multiple users, analyzes it, and provides customized services based on the results. Figure 6 shows the database design used for data collection, in which tables are configured to store the interests of each user. Because these tables store only data from which interests can be analyzed, the time required for subsequent analysis is shortened and the storage space needed for the data is reduced.
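The per-user interest table described above could look roughly like the following. This is a minimal sketch using Python's built-in `sqlite3` module; the table name `user_interest`, its columns, and the additive scoring are hypothetical, since the paper's actual schema (Figure 6) is not reproduced in the text. The upsert syntax requires SQLite 3.24 or later.

```python
import sqlite3

# Hypothetical schema: one row per (user, interest category) with an accumulated
# score. Only interest data is stored, not raw logs, which keeps later analysis small.
SCHEMA = """
CREATE TABLE IF NOT EXISTS user_interest (
    user_id  TEXT NOT NULL,
    category TEXT NOT NULL,
    score    REAL NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, category)
)
"""


def record_interest(conn: sqlite3.Connection, user_id: str, category: str,
                    weight: float = 1.0) -> None:
    """Add weight to a user's interest in a category, inserting the row if needed."""
    conn.execute(
        "INSERT INTO user_interest (user_id, category, score) VALUES (?, ?, ?) "
        "ON CONFLICT(user_id, category) DO UPDATE SET score = score + excluded.score",
        (user_id, category, weight),
    )


def top_interests(conn: sqlite3.Connection, user_id: str,
                  limit: int = 3) -> list[tuple[str, float]]:
    """Return the user's highest-scoring interest categories."""
    rows = conn.execute(
        "SELECT category, score FROM user_interest WHERE user_id = ? "
        "ORDER BY score DESC, category LIMIT ?",
        (user_id, limit),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_interest(conn, "u1", "cafe")
record_interest(conn, "u1", "cafe", 2.0)
record_interest(conn, "u1", "fashion")
print(top_interests(conn, "u1"))
```

Accumulating a score in place, rather than appending one row per event, is what keeps the stored data proportional to the number of interests instead of the number of events.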
Figure 6. Database Design
3.1. Big Data Collection Techniques
Data collection is the process of repeatedly converting, storing, and analyzing data in order to derive value from it. Structured data present inside an organization is typically gathered by a log collector, while unstructured data present outside the organization is gathered by crawling, by RSS readers, or by programs that use the Open APIs provided by social network services. Figure 7 shows the types of automatic big data collection methods.
First, a log collector gathers data present inside the organization by collecting web server logs, such as web loading logs, transaction logs, click logs, and database logs.
Second, web crawling mainly collects data published on the Internet and social data present outside the organization.
Third, sensing collects data from various sensors.
Fourth, RSS and Open APIs are used to collect data; they are technologies for implementing the participatory Web 2.0 programming environment, in which data is generated, shared, and passed along.
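The first and fourth collection methods can be sketched with the Python standard library alone. The log parser assumes the Common Log Format written by many web servers, and the RSS parser assumes RSS 2.0 `<item>` elements with `title` and `link` children; the sample log line and feed are hypothetical. Crawling and sensor collection are omitted.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Common Log Format: host ident user [time] "method path protocol" status size
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Internal collection: turn one access-log line into a click record."""
    m = CLF.match(line)
    if m is None:
        return None  # skip malformed lines rather than failing the whole batch
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    return rec


def parse_rss(xml_text: str) -> list[dict[str, str]]:
    """External collection: extract the title and link of each RSS 2.0 <item>."""
    root = ET.fromstring(xml_text)
    return [
        {"title": item.findtext("title", ""), "link": item.findtext("link", "")}
        for item in root.iter("item")
    ]


sample_line = '203.0.113.5 - - [10/Oct/2015:13:55:36 +0900] "GET /coupon/cafe HTTP/1.1" 200 2326'
sample_rss = """<rss version="2.0"><channel><title>Deals</title>
<item><title>Cafe coupon</title><link>http://example.com/1</link></item>
<item><title>Shoe sale</title><link>http://example.com/2</link></item>
</channel></rss>"""

print(parse_log_line(sample_line))
print(parse_rss(sample_rss))
```

Both functions reduce their input to small records, which matches the paper's goal of storing only the data needed for later analysis.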
3.2. User’s Query Collection
The simplest way to learn a user's interests is to collect and analyze the user's search queries. Using the collected search keywords simplifies data mining, which is one of the big data analysis techniques.
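As a minimal sketch of this step, the user's queries can be tokenized and counted so that the most frequent keywords serve as a first estimate of interest. The stop-word list and the sample queries are hypothetical, and a real deployment handling Korean queries would need a proper morphological analyzer rather than the simple `\w+` tokenizer used here.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system needs a language-specific one.
STOP_WORDS = {"the", "a", "near", "for", "in", "me"}


def interest_keywords(queries: list[str], top_n: int = 3) -> list[tuple[str, int]]:
    """Tokenize search queries, drop stop words, and return the most frequent keywords."""
    counts = Counter(
        token
        for q in queries
        for token in re.findall(r"\w+", q.lower())
        if token not in STOP_WORDS
    )
    return counts.most_common(top_n)


queries = ["cafe near me", "Cafe coupon", "running shoes", "coupon for shoes", "cafe in Cheonan"]
print(interest_keywords(queries))
```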
3.3. Collection of User Connection Time
By collecting the time users spend connected to their areas of interest in social commerce, it is possible to grasp their interests. Using ASP, the time at which a user starts accessing an area of interest is registered in the session, and the time at which the connection ends is also registered in the session.
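The paper records session start and end times with ASP; the following sketch swaps in Python to show only the aggregation step that would follow. Each hypothetical session record is `(area, start, end)` with ISO-8601 timestamps, and the total connection time per area of interest is summed.

```python
from collections import defaultdict
from datetime import datetime


def dwell_time_by_area(sessions: list[tuple[str, str, str]]) -> dict[str, float]:
    """Sum connection time in seconds per area of interest.

    Each record is (area, start, end), mirroring the start and end times
    that are registered in the session."""
    totals: dict[str, float] = defaultdict(float)
    for area, start, end in sessions:
        duration = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        # Guard against clock errors producing negative durations.
        totals[area] += max(duration.total_seconds(), 0.0)
    return dict(totals)


sessions = [
    ("cafe", "2015-10-10T13:00:00", "2015-10-10T13:05:00"),
    ("fashion", "2015-10-10T13:05:00", "2015-10-10T13:06:30"),
    ("cafe", "2015-10-10T14:00:00", "2015-10-10T14:02:00"),
]
print(dwell_time_by_area(sessions))
```

Longer cumulative time in an area is treated as a stronger signal of interest in it.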
3.4. Collection of the Number of User Accesses
Log analysis of access counts makes it possible to measure how many times each page is used. By storing the pages each user accesses and extracting the themes of those stored pages, the user's interests can be determined.
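A minimal sketch of this step, assuming the access log has already been reduced to `(user, page)` pairs and that each page's theme is known. The `PAGE_THEMES` mapping is hypothetical; the text only says the theme is extracted from the stored page. Page hit counts and per-user theme counts are computed from the same log.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from page path to theme; in practice the theme would be
# extracted from the stored page (e.g. its category or keywords).
PAGE_THEMES = {"/deal/101": "cafe", "/deal/102": "cafe", "/deal/201": "fashion"}


def page_hits(access_log: list[tuple[str, str]]) -> Counter:
    """How many times each page was accessed, across all users."""
    return Counter(page for _, page in access_log)


def theme_interest(access_log: list[tuple[str, str]]) -> dict[str, Counter]:
    """Aggregate each user's page accesses by page theme."""
    per_user: dict[str, Counter] = defaultdict(Counter)
    for user, page in access_log:
        theme = PAGE_THEMES.get(page)
        if theme is not None:  # ignore pages with no known theme
            per_user[user][theme] += 1
    return dict(per_user)


log = [("u1", "/deal/101"), ("u1", "/deal/102"), ("u1", "/deal/201"),
       ("u2", "/deal/201"), ("u2", "/help")]
print(page_hits(log))
print(theme_interest(log))
```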
3.5. Usability Analysis
A usability analysis solution is required to collect information from users of the mobile social advertising platform. The usability analysis solution runs continuously in the background of the application and collects user information in real time. The user data obtained by integrating the usability analysis solution with the mobile application is stored on the server, which analyzes the user's information and provides the user with the information required. In addition, the data collected on the server is used for text mining, one of the big data analysis techniques. The analyzed data is made available to companies registered with the social advertising platform, which can use it to provide users with the information they need and as a basis for new services.
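How the background collector hands data to the server is not specified, so the following is a hypothetical sketch of one common approach: events are buffered on the device and sent in small batches. The `EventBuffer` class, the event fields, and the `send` callable (standing in for the actual upload) are all assumptions for illustration.

```python
from typing import Callable


class EventBuffer:
    """Collects usage events in the app and hands them to the server in batches.

    `send` is a hypothetical callable standing in for the upload to the server;
    batching avoids one network request per event."""

    def __init__(self, send: Callable[[list[dict]], None], batch_size: int = 3) -> None:
        self.send = send
        self.batch_size = batch_size
        self.pending: list[dict] = []

    def record(self, event: dict) -> None:
        """Buffer one event and send the batch once it is full."""
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered (e.g. when the app moves to the background)."""
        if self.pending:
            self.send(self.pending)
            self.pending = []


uploaded: list[list[dict]] = []  # stands in for the server
buf = EventBuffer(uploaded.append, batch_size=2)
for screen in ["home", "coupon", "map"]:
    buf.record({"type": "screen_view", "screen": screen})
buf.flush()
print(uploaded)
```

A small batch size keeps collection close to real time, while a larger one reduces traffic; the right trade-off depends on the platform.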
4. Conclusion
The National ICT Strategy Board describes big data as collecting data in large quantities, extracting valuable information by analyzing and using it, and, on the basis of the resulting knowledge, actively responding to or predicting changes. Ever larger amounts of information are now exchanged over the Internet, and with the development of smartphones and the wireless Internet, the amount of data is growing too fast to be fully grasped. As a result, users find it difficult to obtain the information they need through the Internet, and companies likewise find it difficult to identify the information they should provide, so in practice it is hard to offer services that match users' needs. In social commerce in particular, which has grown rapidly, complex advertising has been criticized for making services "hard to use." In Korea, a growing number of software companies provide analysis tools for statistics, data mining, machine learning, and pattern recognition. A need has therefore arisen for services that meet users' needs, and with it the term targeted marketing has appeared. To realize this concept of providing customized services by analyzing user preferences, it is necessary to grasp users' interests accurately.
In this paper, we studied how to collect only the user data necessary to provide such customized services. Collecting only the data that matches the purpose can reduce data overload and is expected to enable more accurate analysis. In the future, by integrating the usability analysis solution with other mobile applications to gather information from more users, we expect to be able to offer more services to them.
Acknowledgments
This research was financially supported by the Ministry of Trade, Industry and Energy(MOTIE) and Korea Institute for Advancement of Technology(KIAT) through the Research and Development for Regional Industry.
References
[1] Y. H. Young, “The advent of an era of big data”, Korea Institute of Information Technology, vol. 10, no. 3, (2012), pp. 39-43.
[2] Y. I. Yun and S. Kim, “Big Data and Cloud Era”, Korea Information and Communications Society, vol. 30, no. 4, (2013), pp. 3-6.
[3] S. H. Lee, “Utilization of big data”, Korea Institute of Information Technology, vol. 10, no. 3, (2012), pp. 51-54.
[4] S. H. Lee, “A Big Data Example of Support Public Services”, Korea Institute of Information Technology, Proceedings of KIIT Summer Conference, (2013).
[5] C. W. Ahn and S. K. Hwang, “Big Data Technologies and Main Issues”, Journal of Information Department, vol. 30, no. 6, (2012), pp. 10-17.
[6] S. W. Choi, H. Y. Kim and Y. K. Kim, “Research Trend of Bigdata Technology and Analysis Technique”, Korea Information Science Society, vol. 39, no. 2, (2012), pp. 194-196.
[7] S. I. Shin, J. S. Lee, S. J. Jang and S. P. Lee, “The Development of the Bi-directionally Personalized Broadcasting and the Targeting Advertisement System Based on the User Profile Techniques”, Journal of Broadcast Engineering, vol. 15, no. 5, (2010), pp. 632-341.
[8] J. M. Ahn, “Online Behavioral Advertising and Privacy”, Journal of Cyber communication Academic Society, vol. 30, no. 4, (2013), pp. 43-86.
[9] C. N. Jeon and I. W. Seo, “Analyzing the Bigdata for Practical Using into Technology Marketing: Focusing on the Potential Buyer Extraction”, Journal of Korean Strategic Marketing Association, vol. 21, no. 2, (2013), pp. 181-203.
[10] C. S. Im, “Technology Trends on Big Data Analysis Tools”, Korean Institute of Next Generation Computing, vol. 10, no. 5, (2014), pp. 77-84.
Authors
Yun Jun Soo, M.S
Computer Science & Engineering,
Korea University of Technology and Education,
Gajeon-ri, Byeongcheon-myeon, Dongnam-gu, Cheonan-si, Chungcheongnam-do 330-708, Korea
Park Jin Tae, Ph.D
Computer Science & Engineering,
Korea University of Technology and Education,
Gajeon-ri, Byeongcheon-myeon, Dongnam-gu, Cheonan-si, Chungcheongnam-do 330-708, Korea
Hwang Hyun Seo, M.S
Computer Science & Engineering,
Korea University of Technology and Education,
Gajeon-ri, Byeongcheon-myeon, Dongnam-gu, Cheonan-si, Chungcheongnam-do 330-708, Korea
Moon Il Young, Ph.D
Computer Science & Engineering,
Korea University of Technology and Education,
Gajeon-ri, Byeongcheon-myeon, Dongnam-gu, Cheonan-si, Chungcheongnam-do 330-708, Korea