• No results found

Programming for Big Data

N/A
N/A
Protected

Academic year: 2021

Share "Programming for Big Data"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

H8BGD: Programming for

Big Data

Long Title: Programming for Big Data Language of Instruction: English

Module Code: H8BGD

Credits: 5

NFQ Level: LEVEL 8

Field of Study: Software and applications development and analysis

Taxonomy: Blooms

Module Delivered in 2 programme(s) Module Coordinator: EUGENE O'LOUGHLIN Module editor: Ioana Ghergulescu Teaching and Learning

Strategy: The learning strategy involves the use of lectures, tutorials and practical labs work as appropriate.Furthermore, the module uses a learner-centred approach that acknowledges the variety of learning preferences in the classroom. It is highly interactive incorporating opportunities for learning by doing. Students will participate in activities designed to facilitate the development of their learning skills and strategies, aimed to enhance their ability to engage with and participate in modules taken during the course of their studies.

Learning Environment: Learning will take place in both a classroom and computer laboratory environment with access to IT resources. Learners will have access to library resources, both physical & electronic and to faculty outside of the classroom where required. Module materials will be placed on Moodle, the Colleges virtual learning environment. The labs will concentrate on implementing programs and manipulating data for analysis, and how best to implement the theory learned during the module.

Module Description: Enable learners to acquire the necessary key programming skills required to develop applications for processing big data. identify and illustrate the challenges associated with developing programs and applications for big data. Learners will gain practical experience in programming for big data through the use of a range of relevant and appropriate programming languages for given processing tasks (e.g., data extraction, aggregation, reporting etc.). Utilise cloud computing platforms and develop programs for data processing in distributed computing environments.

Learning Outcomes

On successful completion of this module the learner will be able to:

LO1 Design algorithms and implement key programming patterns and constructs for big data

LO2 Assess the challenges associated with processing big data datasets and compare and contrast programming for big data vis-à-vis programming for conventional datasets

LO3 Investigate parallel and distributed computing and write programs for processing datasets in distributed computing and cloud computing environments using relevant programming paradigms (e.g., MapReduce) and relevant programming languages (e.g., Pig, Hive)

LO4 Apply practical skills using a professional tool/language of data analytics (e.g., R) Pre-requisite learning

Requirements

This is prior learning (or a practical skill) that is mandatory before enrolment in this module is allowed. You may not enrol on this module if you have not acquired the learning specified in this section.

(2)

H8BGD: Programming for

Big Data

Module Content & Assessment

Indicative Content

1. Introduction to Data Programming (50%)

Algorithm design Program I/O Data types and data structures Program control and process models Programming constructs Programming types (imperative, declarative, functional, logic) Programming languages for data analytics (e.g., R, Python) Developing programs for data processing activities (e.g., data extraction, cleaning, merging, aggregation, analysis, reporting)

2. Big Data Programming (50%)

Challenges associated with programming for big data Parallelism for computational processes Storage and compute locality Distributed computing Utilisation of cloud computing platforms for big data processing Distributed programming paradigms Distributed programming environments (e.g., Hadoop/HBase) MapReduce algorithm design Big data programming tools and languages (e.g., Pig, Hive) Learning Environment

Learning will take place in both a classroom and computer laboratory environment with access to IT resources. Learners will have access to library resources, both physical & electronic and to faculty outside of the classroom where required. Module materials will be placed on Moodle, the College’s virtual learning environment. Labs The labs will concentrate on implementing programs and manipulating data for analysis, and how best to implement the theory learned during the module.

Assessment Breakdown %

Coursework 100.00%

Full Time

Coursework Assessment

Type Assessment Description Outcomeaddressed % oftotal AssessmentDate

Practical

(0260) Assessment will be through a series of continuous assessment practicalassignments given throughout the semester. Sample assessment: create a Python program that computes a company’s inventory and returns the stock for a product requested by the user.

1,2,3,4 50.00 n/a

Project Learners will be assessed through a project with both practical and research elements. Sample project: You are required to carry out a series of analyses of two datasets utilising appropriate programming languages and programming environments. For each of the chosen datasets you are required to compile a report of the analysis (circa 3,000 words for the report)

1,2,3,4 50.00 n/a

No End of Module Assessment Reassessment Requirement Repeat examination

Reassessment of this module will consist of a repeat examination. It is possible that there will also be a requirement to be reassessed in a coursework element.

Reassessment Description

Learners will be afforded an opportunity to repeat the final examination and all learning outcomes will be assessed in the repeat sitting. NCIRL reserves the right to alter the nature and timings of assessment

(3)

H8BGD: Programming for

Big Data

Module Workload

Workload: Full Time

Workload Type Workload Description Hours Frequency Average

Weekly Learner Workload

Lecture No Description 2 Every

Week 2.00

Tutorial No Description 1 Every

Week 1.00

Independent Learning No Description 7.5 Once per

semester 0.63

Total Hours 10.50 Total Weekly Learner Workload 3.63 Total Weekly Contact Hours 3.00 Workload: Part Time

Workload Type Workload Description Hours Frequency Average

Weekly Learner Workload

Lecture No Description 2 Every

Week 2.00

Tutorial No Description 1 Every

Week 1.00

Independent Learning No Description 89 Once per

semester 7.42

Total Hours 92.00 Total Weekly Learner Workload 10.42 Total Weekly Contact Hours 3.00

(4)

Module Resources

Recommended Book Resources

Paul Teetor, R Cookbook, O'Reilly Media [ISBN: 0596809158]

Tom White, Hadoop: The Definitive Guide, O'Reilly Media [ISBN: 1449311520] Supplementary Book Resources

Thomas H. Cormen... [et al.] 2009, Introduction to algorithms, MIT Press Cambridge, Mass. [ISBN: 0262033844.] Donald Miner, Adam Shook, MapReduce Design Patterns, O'Reilly Media [ISBN: 1449327176.]

This module does not have any article/paper resources Other Resources

Website: MIT Open Courseware videolectures.nethttp://videolectures.net/mit6046jf05_int roduction_algorithms/ Website: Cloudera Universityhttp://university.cloudera.com/onlineres ources/hadoopecosystem.html

Website: MIT Open Coursewarehttp://ocw.mit.edu/courses/electrical-en gineering-and-computer-science/6-00sc-in troduction-to-computer-science-and-progr amming-spring-2011/index.htm

Website: Andrew M. Raim 2013, Introduction to Distributed Computing with pbdR at the UMBC http://userpages.umbc.edu/~gobbert/paper s/pbdRtara2013.pdf

Book: Wes McKinney 2012, Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, O'Reilly Book: Anand Rajaraman, Jeffrey David Ullman 2014, Mining of Massive Datasets, Cambridge University Press

(5)

Module Delivered in

Programme Code Programme Semester Delivery

BSHTM B.Sc. (Hons) in Technology Management 7 Group Elective 1

References

Related documents

Compute centric view external users, distributed computing, federations Site (HPC) systems Remote compute resources data staging End-users launch and manage jobs mounted

At this Stage, the data mining challenges is to find suitable algorithms in tackling the difficulties raised by the Big Data volumes, distributed data

Hadoop is a Java-based programming framework that supports the processing of large data sets in a. distributed computing

Common elements specific to Big Data arise from the use of multiple infrastructure tiers (both storage and computing) for processing Big Data; the use of new

Shekhar et al., Spatial Big Data Challenges Intersecting Mobility and Cloud Computing, ACM.. SIGMOD Workshop on Data Engineering for Wireless and Mobile

Security challenges with distributed-NFV vCPE Enterprise/CPE Mobile Edge Computing Distributed NFV Controller Compute uCPE Compute OpenStack Controller to Compute

In this paper, we researched on enhancing performance of processing time by applying parallelism optimization when implementing big data analytics with parallel

Today’s supercomputers or multiprocessor systems which can provide huge parallelism has become the dominant computing platforms (through the proliferation of multi-core processors),