CSMBD16-Big Data Analytics

Module Provider: School of Mathematical, Physical and Computational Sciences
Number of credits: 10 [5 ECTS credits]
Level: 7
Terms in which taught: Autumn term module
Pre-requisites:
Non-modular pre-requisites:
Co-requisites: CSMCC16 Cloud Computing; CSMDM16 Data Analytics and Mining
Modules excluded: SEMBD13 Big Data Analytics
Module version for: 2016/7

Module Convenor: Dr Frederic Stahl

Email: f.t.stahl@reading.ac.uk

Summary module description:

Aims:
The analysis of Big Data is not just the analysis of very large data sources, even though this is part of it. Big Data is commonly characterised by four aspects: Volume, Velocity, Variety and Veracity. Volume refers to the actual size of the data, for which computationally scalable methods are needed; Velocity refers to the very fast generation of data, for which data stream processing methods are needed in time-critical applications; Variety refers to the different types of data, possibly unstructured, such as video streams, click streams or audio files; Veracity refers to the challenge of establishing the trust of decision makers in the knowledge extracted by Big Data Analytics techniques.

This unit aims to address these aspects and challenges of Big Data Analytics by introducing: scalable parallel data mining algorithms that can be executed on computer clusters using frameworks such as Hadoop; data stream mining techniques and algorithms for the analysis of high-velocity data; sentiment analysis techniques for unstructured data such as micro-blogging and social network data; and scalable recommender systems. A further aim of the unit is to introduce software systems used for Big Data Analytics, such as Hadoop and Mahout.
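
As an informal illustration of the Velocity aspect, the sketch below shows one way a statistic can be maintained over a data stream in a single pass, without storing the data. It uses Welford's running mean and variance, which is chosen here purely as an example and is not taken from the module materials; the class name RunningStats and the sample readings are hypothetical.

```python
class RunningStats:
    """One-pass mean and sample variance (Welford's method).

    Illustrative only: each value is seen once and then discarded,
    which is the basic constraint of data stream processing.
    """

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new stream element into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance of the values seen so far (0.0 if fewer than two)."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def demo():
    # Hypothetical sensor readings standing in for a fast data stream.
    stats = RunningStats()
    for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        stats.update(reading)
    return stats.n, stats.mean, stats.variance()


if __name__ == "__main__":
    print(demo())
```

Memory use stays constant however long the stream runs, which is the property that makes this style of algorithm suitable for time-critical applications.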

Assessable learning outcomes:
1. The students will be able to discuss, identify and describe challenges of Big Data Analytics. Furthermore, the students will be able to appraise relevant algorithms, tools and techniques to tackle these challenges.
2. The students will learn how to apply Big Data Analytics techniques and algorithms to solve challenges in Big Data Analytics.
3. The students will be able to analyse complex Big Data Analytics problems, develop and appraise analytics techniques to tackle the problems and evaluate solutions.
4. The students will learn how to redefine and modify solutions to analytics problems so that they can be applied to new but similar problems.

Additional outcomes:
The students will recognise real-world applications of Big Data Analytics and demonstrate how to deploy and evaluate data mining applications for Big Data on computer clusters.

Outline content:
• Introduction to Big Data Analytics principles and challenges.
• Data mining techniques and tools for the analysis of large data sets, in particular parallel data mining techniques.
• Data mining algorithms and tools for the analysis of fast-streaming, real-time data.
• Data mining techniques for building recommender systems.
• Data mining techniques and algorithms for unstructured data analysis.

Reading List:
Essential Text:
Data Mining: Concepts and Techniques (Second Edition)
Jiawei Han, Micheline Kamber
Morgan Kaufmann Publishers, March 2006.
ISBN: 978-1-55860-901-3

Mahout in Action
Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman
ISBN 9781935182689

Further reading:
Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)
Ian H. Witten, Eibe Frank

Brief description of teaching and learning methods:
The module comprises lectures (20 hours), practical sessions (10 hours) and a major coursework. The lectures introduce the basic concepts and methodologies of advanced Data Analytics. The students will gain further insight and skills in the taught subjects through reading assignments and through hands-on Big Data Analytics activities in the practical sessions. A final coursework will allow the students to apply some of the concepts learned to a practical case.

Contact hours:
                                   Autumn   Spring   Summer
Lectures                           20
Practical classes and workshops    10
Guided independent study           70

Total hours by term                100.00

Total hours for module             100.00

Summative Assessment Methods:
Method Percentage
Written exam 50
Project output other than dissertation 50

Other information on summative assessment:
- Final project (50%)
- Final exam: one-and-a-half-hour paper comprising module-related questions (50%)

Formative assessment methods:

Penalties for late submission:
Penalties for late submission on this module are in accordance with the University policy. Please refer to page 5 of the Postgraduate Guide to Assessment for further information: http://www.reading.ac.uk/internal/exams/student/exa-guidePG.aspx

Length of examination:
1.5 hours

Requirements for a pass:
50% overall module mark

Reassessment arrangements:
Resit by examination.

Additional Costs (specified where applicable):
1) Required text books:
2) Specialist equipment or materials:
3) Specialist clothing, footwear or headgear:
4) Printing and binding:
5) Computers and devices with a particular specification:
6) Travel, accommodation and subsistence:

Last updated: 4 January 2017
