

8.2 Future Work

Although the framework provides all the basic functionality needed to explore Big Data in an ad hoc manner, it still has limitations. Its data management facility is very limited: it only supports structured data in the form of text files for loading into Hive. Unstructured datasets are not supported, because they must first be converted into structured data by extracting meaningful information from them. The module repository (algorithm repository) is also limited. Currently, the algorithms apply only to particular types of datasets, for instance datasets that involve temporal information or historical data. As an example, if structured information could be extracted from the headlines of the last fifty years of newspapers, it could be used to draw conclusions from this large corpus about important events during a specific time period, such as a single year or the past ten years.

To improve the framework, a better data-handling utility is needed that supports the extraction and manipulation of unstructured information and is designed with non-technical users as its main audience. More generic algorithms also need to be developed so that a wider range of datasets can be analyzed. Finally, an analytical library of statistical and mathematical functions should be developed, which the framework could invoke for analytics simply by passing input values.



Appendix


Appendix I

Server Side Class Diagram

Appendix II

Example Algorithm used for Hourly Analysis

```perl
#!/usr/bin/perl
# Hourly analysis: reads tab-separated "date<TAB>time<TAB>score" rows from
# STDIN, sorted by date and time, and prints "date, hour:00:00, average"
# for every (date, hour) group.
use strict;
use warnings;

my ($cur_date, $cur_hour);            # key of the group being accumulated
my ($interval_sum, $count) = (0.0, 0);

# Print the average of the current group, if any, and reset the accumulators.
sub flush_interval {
    return unless $count;
    my $interval_average = $interval_sum / $count;
    print "$cur_date, $cur_hour:00:00, $interval_average\n";
    ($interval_sum, $count) = (0.0, 0);
}

while (my $input = <STDIN>) {
    chomp $input;
    my ($date, $time, $score) = split /\t/, $input;
    next unless defined $score && length $score;    # skip malformed rows

    my $hour = (split /:/, $time)[0] + 0;           # "09:15:00" -> 9

    # A new date or a new hour closes the previous interval. Hour 0 is
    # treated like any other hour, and dates are compared as strings
    # rather than through the day+31*month+365*year encoding, which can
    # map two different dates to the same number.
    if (!defined $cur_date || $date ne $cur_date || $hour != $cur_hour) {
        flush_interval();
        ($cur_date, $cur_hour) = ($date, $hour);
    }
    $interval_sum += $score;
    $count++;
}

flush_interval();    # the last interval has no following row to close it
```
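
The streaming script relies on rows arriving sorted by date and time, because it only compares each row with the previous one. As a cross-check, the same per-hour averaging can be sketched without that assumption by accumulating sums and counts in a hash keyed by date and hour. This is a minimal sketch: `hourly_averages` and the sample rows are hypothetical and not part of the original framework, and the input is assumed to use the same tab-separated `date`, `time`, `score` layout.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original framework): averages the
# scores of "date<TAB>time<TAB>score" rows per (date, hour). Input order
# does not matter, because every group is accumulated in a hash.
sub hourly_averages {
    my (%sum, %count);
    for my $line (@_) {
        my $row = $line;                     # copy so the caller's data is untouched
        chomp $row;
        my ($date, $time, $score) = split /\t/, $row;
        next unless defined $score && length $score;    # skip malformed rows
        my $hour = (split /:/, $time)[0] + 0;           # "09:15:00" -> 9
        $sum{$date}{$hour}   += $score;
        $count{$date}{$hour} += 1;
    }

    # Emit groups in chronological order: ISO dates sort correctly as
    # strings, hours are sorted numerically.
    my @result;
    for my $date (sort keys %sum) {
        for my $hour (sort { $a <=> $b } keys %{ $sum{$date} }) {
            my $average = $sum{$date}{$hour} / $count{$date}{$hour};
            push @result, "$date, $hour:00:00, $average";
        }
    }
    return @result;
}

# Small in-memory example (deliberately unsorted); in practice the lines
# would come from <STDIN>.
my @sample = (
    "2005-04-01\t09:15:00\t1.0",
    "2005-04-01\t09:45:00\t2.0",
    "2005-04-02\t00:10:00\t3.0",
    "2005-04-01\t10:05:00\t4.0",
);
print "$_\n" for hourly_averages(@sample);
```

For the sample rows this prints `2005-04-01, 9:00:00, 1.5`, then `2005-04-01, 10:00:00, 4`, then `2005-04-02, 0:00:00, 3`. The trade-off is memory: the hash keeps one entry per (date, hour) group, whereas the streaming version above needs only constant memory for sorted input.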
