Connection Manager
8.2 Future Work
Although the framework provides the basic functionality needed to explore Big Data in an ad hoc manner, it still has limitations. Its data management facility is very limited: it supports only structured data, supplied as text files, for loading into Hive. Unstructured datasets are not supported, because they must first be converted into structured data by extracting meaningful information from them. The module repository (algorithm repository) is also limited. Currently, the algorithms apply only to particular types of datasets, namely those that contain temporal information or historical data. For instance, if structured information could be extracted from the last fifty years of newspaper headlines, it could be used to draw conclusions from this large corpus about the important events that happened during a specific period, say a single year or the past ten years.
To improve the framework, we need to develop a better data handling utility that supports the extraction and manipulation of unstructured information and is designed with non-technical users as its main users. More generic algorithms need to be developed so that more kinds of datasets can be analyzed. An analytical library of statistical and mathematical functions also needs to be developed, which the framework can invoke for analytics by passing only the input values.
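As an illustration of the proposed analytical library, the following is a minimal sketch in Perl, the language of the example algorithm in Appendix II. It is not part of the current framework: the registry `%ANALYTICS`, the function `run_analytic`, and the two sample functions are hypothetical names chosen for this example. The sketch only shows the idea of invoking a statistical function by name while passing nothing but the input values.

```perl
use strict;
use warnings;

# Hypothetical registry of analytic functions, keyed by name.
# Each entry takes a list of numbers and returns a single number.
my %ANALYTICS = (
    mean => sub {
        my @v = @_;
        my $total = 0;
        $total += $_ for @v;
        return $total / @v;
    },
    # Population variance: mean of squared deviations from the mean.
    variance => sub {
        my @v = @_;
        my $total = 0;
        $total += $_ for @v;
        my $mean = $total / @v;
        my $squares = 0;
        $squares += ($_ - $mean) ** 2 for @v;
        return $squares / @v;
    },
);

# Look up a function by name and apply it to the input values.
sub run_analytic {
    my ($name, @values) = @_;
    my $fn = $ANALYTICS{$name} or die "unknown analytic '$name'\n";
    die "no input values\n" unless @values;
    return $fn->(@values);
}

print run_analytic('mean', 2, 4, 6), "\n";   # prints 4
```

A lookup table keeps the caller independent of the individual functions, so under this assumption new statistical functions could be added without changing the code that invokes them.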
Appendix
Appendix I
Server Side Class Diagram
Appendix II
Example Algorithm used for Hourly Analysis
#!/usr/bin/perl
#global variables
my $temp_start_date = '2005-04-01';
my $format = '%Y-%m-%d';
my $temp_original_date = '2005-04-01';
@temp_dates_array = split(/-/, $temp_start_date);
$temp_date = $temp_dates_array[2]+31*$temp_dates_array[1]+365*$temp_dates_array[0];
my $temp_hour = 0;
my $interval_sum = 0.0;
my $interval_average = 0.0;
my $count=0;
#This flag is for the case where there is only one hour
my $flag=0;
my $date_comparison_flag=0;
my $current_hour_changed_date=0;

while ($input = <STDIN>) {
    @columns = split(/\t/, $input);
    my $date = $columns[0];
    my $time = $columns[1];
    my $score = $columns[2];
    chomp($date);
    chomp($time);
    chomp($score);

    #handling time
    @current_time = split(/:/,$time);
    my $current_hour = $current_time[0] + 0;
    @current_date_array = split(/-/, $date);
    $db_date = $current_date_array[2]+31*$current_date_array[1]+365*$current_date_array[0];

    if ($db_date == $temp_date) {
        $date_comparison_flag=1;
        if ($current_hour == $temp_hour) {
            my $current_minute = $current_time[1] + 0;
            if ($current_minute <= 60) {
                $interval_sum += $score;
                $count++;
                $flag=1;
            }
        }
        else {
            if ($temp_hour != 0 ) {
                $interval_average = $interval_sum/$count;
                my $temporary_current_hour = $current_hour-1;
                if ($temporary_current_hour == 0) {
                    $temporary_current_hour++;
                }
                print "$date, $temporary_current_hour:00:00, $interval_average\n";
                $temporary_current_hour=0;
                $count=0;
                $interval_sum=0;
                $flag=1;
                my $current_minute_for_changed_hour = $current_time[1] + 0;
                if ($current_minute_for_changed_hour <= 60) {
                    $interval_sum += $score;
                    $count++;
                }
                my $current_hour_changed = $current_time[0] + 0;
                #displaying the last hour at output
                $current_hour_changed_date = $current_hour_changed;
                $temp_hour = $current_hour_changed;
            }
            else #first time hour check
            {
                my $current_minute = $current_time[1] + 0;
                if ($current_minute <= 60) {
                    $interval_sum += $score;
                    $count++;
                    $flag=1;
                }
                my $current_hour_inside = $current_time[0] + 0;
                $current_hour_changed_date = $current_hour_inside;
                $temp_hour = $current_hour_inside;
            }
        }
    }
    else {
        if ($date_comparison_flag == 1) {
            $interval_average = $interval_sum/$count;
            my $temporary_current_hour = $current_hour;
            if ($current_hour_changed_date == 0) {
                $current_hour_changed_date++;
            }
            print "$temp_original_date, $current_hour_changed_date:00:00, $interval_average\n";
            $current_hour_changed_date=0;
        }
        $temp_date = $db_date;
        $temp_original_date = $date;
        $interval_sum=0;
        $count=0;
        $flag=1;
        my $current_minute_date_changed = $current_time[1] + 0;
        if ($current_minute_date_changed <= 60) {
            $interval_sum += $score;
            $count++;
        }
        $current_hour_changed_date = $current_time[0] + 0;
        $temp_hour = $current_hour_changed_date;
    }
}#end of while loop

#for handling one hour only (case where hour will never change)
if ($flag == 1) {
    $interval_average = $interval_sum/$count;
    if ($current_hour_changed_date == 0) {
        $current_hour_changed_date++;
    }
    print "$temp_original_date, $current_hour_changed_date:00:00, $interval_average\n";
}
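The hourly averaging performed by the script above can also be sketched more compactly by grouping rows on a combined date and hour key. The following is an illustrative alternative, not the implementation used in the framework: the function name `hourly_averages` and the sample rows are hypothetical, and the input is assumed to use the same tab-separated date, time, and score columns. Unlike the listing above, this sketch labels each average with the hour it was computed for and treats hour 0 like any other hour, so its output is not identical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the framework): average the score column
# per (date, hour) bucket. Each input line is assumed to be
# "date<TAB>time<TAB>score", e.g. "2005-04-01\t09:15:00\t2.0".
sub hourly_averages {
    my @lines = @_;
    my (%sum, %count, @order);
    for my $line (@lines) {
        chomp $line;
        my ($date, $time, $score) = split /\t/, $line;
        next unless defined $score;                      # skip malformed rows
        my ($hour) = split /:/, $time;
        my $key = sprintf '%s %02d', $date, $hour;
        push @order, $key unless exists $count{$key};    # keep first-seen order
        $sum{$key}   += $score;
        $count{$key} += 1;
    }
    # One [key, average] pair per bucket, in order of first appearance.
    return map { [ $_, $sum{$_} / $count{$_} ] } @order;
}

# Usage with three made-up rows; in practice the rows could be read
# from STDIN, as in the script above.
my @rows = (
    "2005-04-01\t09:15:00\t2.0\n",
    "2005-04-01\t09:45:00\t4.0\n",
    "2005-04-01\t10:05:00\t6.0\n",
);
for my $result (hourly_averages(@rows)) {
    printf "%s:00:00, %.2f\n", @$result;
}
```

Keeping the sums and counts in hashes removes the need for the flags that track hour and date changes, at the cost of holding one entry per hour in memory until the input ends.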