CSMBD16-Big Data Analytics

Module Provider: Computer Science
Number of credits: 10 [5 ECTS credits]
Level: 7
Terms in which taught: Spring term module
Pre-requisites:
Non-modular pre-requisites:
Co-requisites: CSMCC16 Cloud Computing and CSMDM16 Data Analytics and Mining
Modules excluded:
Current from: 2020/1

Module Convenor: Mr Mohammed Al-Khafajiy

Email: m.d.al-khafajiy@reading.ac.uk

Type of module:

Summary module description:

This module covers the topic of Big Data.


Aims:

The analysis of Big Data is not just the analysis of very large data sources, even though this is part of it. Typically, Big Data is characterised by four aspects: Volume, Velocity, Variety, and Veracity. This view of Big Data is commonly accepted. Volume refers to the actual size of the data, for which computationally scalable methods are needed; Velocity refers to the very fast generation of data, for which data stream processing methods are needed in time-critical applications; Variety refers to the different types of data, possibly unstructured data such as video streams, click streams or audio files; Veracity refers to the challenge of establishing the trust of decision makers in the knowledge extracted by Big Data Analytics techniques.



This unit aims to address these aspects and challenges of Big Data Analytics by introducing: scalable parallel data mining algorithms that can be executed on computer clusters using frameworks such as Hadoop; data stream mining techniques and algorithms for the analysis of high-velocity data; sentiment analysis techniques for unstructured data such as micro-blogging data and social network data; and scalable recommender systems. A further aim of the unit is to introduce software systems used for Big Data Analytics such as KNIME, MOA, MapReduce and Spark.


Assessable learning outcomes:


  • The students will be able to discuss, identify and describe challenges of Big Data Analytics. Furthermore, the students will be able to appraise relevant algorithms, tools and techniques to tackle these challenges.

  • The students will learn how to apply Big Data Analytics techniques and algorithms to solve challenges in Big Data Analytics.

  • The students will be able to analyse complex Big Data Analytics problems, develop and appraise analytics techniques to tackle the problems and evaluate solutions.

  • The students will learn how to redefine and modify solutions to analytics problems, so that they can be applied to new but similar problems.


Additional outcomes:

The students will recognise real-world applications of Big Data Analytics and also demonstrate how to deploy and evaluate data mining applications for Big Data on computer clusters.


Outline content:


  • Introduction to Big Data Analytics principles and challenges;

  • Data mining techniques and tools for Large Data Set Analysis, in particular parallel data mining techniques;

  • Data mining algorithms and tools for the analysis of fast-streaming, real-time data;

  • Data mining techniques for building recommender systems;

  • Data mining techniques and algorithms for unstructured data analysis.



Reading List: Essential Text:



Data Mining: Concepts and Techniques (Second Edition), Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers, March 2006. ISBN: 978-1-55860-901-3



Mahout in Action, Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman. ISBN: 9781935182689



Further reading:



Data Mining: Practical Machine Learning Tools and Techniques (Second Edition), Ian H. Witten and Eibe Frank

Brief description of teaching and learning methods:

The module comprises lectures and practical sessions. The lectures introduce the basic concepts and methodologies of advanced Data Analytics. Students will gain further insight into, and skills in, the taught subjects through the practical sessions. A project-based assignment will allow students to apply the concepts learned to a practical case.


Contact hours:
  Autumn Spring Summer
Lectures 10
Practical classes and workshops 10
Guided independent study: 80
       
Total hours by term 0 0
       
Total hours for module 100

Summative Assessment Methods:
Method Percentage
Written exam 50
Project output other than dissertation 50

Summative assessment- Examinations:

One 1.5-hour examination paper in May/June.


Summative assessment- Coursework and in-class tests:

One project-based assignment (50%).


Formative assessment methods:

Penalties for late submission:
Penalties for late submission on this module are in accordance with the University policy. Please refer to page 5 of the Postgraduate Guide to Assessment for further information: http://www.reading.ac.uk/internal/exams/student/exa-guidePG.aspx

Assessment requirements for a pass:

A mark of 50% overall.


Reassessment arrangements:

One 2-hour examination paper in August/September. Note that the resit module mark will be the higher of (a) the mark from this resit exam and (b) an average of this resit exam mark and previous coursework marks, weighted as per the first attempt (50% exam, 50% coursework).  
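
As an illustration only, the resit rule above can be sketched as a short calculation. The function and parameter names are hypothetical, marks are assumed to be percentages, and no rounding is applied; the University's own regulations remain the authoritative statement of the rule.

```python
def resit_module_mark(resit_exam, coursework, exam_weight=0.5):
    """Sketch of the resit rule: the higher of (a) the resit exam mark and
    (b) the weighted average of the resit exam mark and the previous
    coursework mark. The default 0.5 reflects the 50% exam, 50% coursework
    weighting stated for the first attempt. Names here are illustrative
    assumptions, not part of any University system."""
    weighted_average = exam_weight * resit_exam + (1.0 - exam_weight) * coursework
    return max(resit_exam, weighted_average)


# Strong resit exam, weak coursework: the exam mark alone is used.
print(resit_module_mark(62, 40))  # 62 (the weighted average would be 51.0)

# Weak resit exam, strong coursework: the weighted average is used.
print(resit_module_mark(48, 70))  # 59.0
```

In other words, previous coursework can raise the resit module mark but can never lower it below the resit exam mark.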


Additional Costs (specified where applicable):
1) Required text books:
2) Specialist equipment or materials:
3) Specialist clothing, footwear or headgear:
4) Printing and binding:
5) Computers and devices with a particular specification:
6) Travel, accommodation and subsistence:

Last updated: 10 August 2020

THE INFORMATION CONTAINED IN THIS MODULE DESCRIPTION DOES NOT FORM ANY PART OF A STUDENT'S CONTRACT.
