ST3SML-Statistical Data Science and Machine Learning

Module Provider: Mathematics and Statistics
Number of credits: 10 [5 ECTS credits]
Terms in which taught: Spring term module
Pre-requisites: MA1MSP Mathematical and Statistical Programming or MA2MPR Mathematical Programming and ST1PS Probability and Statistics
Non-modular pre-requisites:
Modules excluded:
Current from: 2020/1

Module Convenor: Dr Fazil Baksh


Type of module:

Summary module description:

The topics of Data Science, Machine Learning and Artificial Intelligence have recently become part of the public consciousness, in part due to their successful application in industry (most notably at large technology companies). Many of the most successful techniques used in these fields are underpinned by statistical techniques. This module begins by covering some of these underpinning techniques, and shows how they may be applied to problems in Data Science and Machine Learning.


This module aims to give students a solid understanding of the types of methods that are used in Statistical Machine Learning, and the ability to implement and use some of them. It also aims to connect students with research being conducted in this area.

Assessable learning outcomes:

By the end of the module it is expected that the student will be able to:

  • use and explain underpinning statistical methods for Data Science and Machine Learning;

  • produce software implementations of the methods taught in the module;

  • use statistical learning tools to build and evaluate algorithms for supervised learning.

Additional outcomes:

The student will also gain experience of reading the scientific literature and learning about current research.

Outline content:

The module will begin with an introduction to Data Science, Machine Learning and Artificial Intelligence, then describe the ideas that underpin the statistical approach to these topics. The module focuses on Machine Learning, covering the topics of regression and classification, including: linear and logistic regression; linear and quadratic discriminant analysis; resampling methods; model selection and regularisation; ridge regression; lasso; dimension reduction methods; principal components regression; partial least squares; high dimensional problems; regression splines; generalised additive models; tree-based methods; bagging; stacking; random forests; boosting; neural networks and deep learning; support vector machines.

Brief description of teaching and learning methods:

The core material will be delivered in 16 lectures. These will be supported by material from the book "An Introduction to Statistical Learning with Applications in R", which is freely available online, along with research articles and blog posts. This range of sources will expose students to the way a Data Scientist working in industry or academia would learn their subject. It gives interested students a route to explore the subject more widely, while still giving them an easy-to-follow path through the material.

There will be 4 practical PC lab sessions spread in between the lectures. Each will give the students the chance to learn to code up concepts covered in the lectures. 

There will be one assignment, handed out at the beginning of the module and due at the end. The assignment will consist of problems whose solution requires software implementations of the algorithms taught in the module. The PC labs will cover problems very close to those in the assignment, to motivate students to attend the labs and engage with the module as it progresses.

Additional support with programming will be offered where required.

Contact hours:
                                   Autumn  Spring  Summer
Lectures                                       16
Practical classes and workshops                 4
Guided independent study                       80
Total hours by term                     0     100       0
Total hours for module                        100

Summative Assessment Methods:
Method Percentage
Written exam 70
Set exercise 30

Summative assessment- Examinations:

One exam, 2 hours

Summative assessment- Coursework and in-class tests:

One assignment, with questions that are related to content covered in practicals.

Formative assessment methods:

Feedback given during practicals.

Penalties for late submission:

The Module Convenor will apply the following penalties for work submitted late:

  • where the piece of work is submitted after the original deadline (or any formally agreed extension to the deadline): 10% of the total marks available for that piece of work will be deducted from the mark for each working day (or part thereof) following the deadline up to a total of five working days;
  • where the piece of work is submitted more than five working days after the original deadline (or any formally agreed extension to the deadline): a mark of zero will be recorded.
The University policy statement on penalties for late submission can be found at:
You are strongly advised to ensure that coursework is submitted by the relevant deadline. You should note that it is advisable to submit work in an unfinished state rather than to fail to submit any work.
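
As an illustration only, the penalty rule above can be written as a short Python sketch. The function name `late_penalised_mark` and its parameters are hypothetical, and the sketch assumes that a penalised mark cannot fall below zero, which the policy text does not state explicitly.

```python
import math


def late_penalised_mark(raw_mark: float,
                        working_days_late: float,
                        marks_available: float = 100.0) -> float:
    """Apply the late-submission penalty described above (illustrative sketch)."""
    if working_days_late <= 0:
        # Submitted by the deadline (or agreed extension): no penalty.
        return raw_mark
    if working_days_late > 5:
        # More than five working days late: a mark of zero is recorded.
        return 0.0
    # Each working day, or part of one, counts as a full day.
    days_charged = math.ceil(working_days_late)
    # 10% of the marks available is deducted per charged day.
    deduction = marks_available * days_charged / 10
    # Assumption: the recorded mark is floored at zero.
    return max(0.0, raw_mark - deduction)
```

Under these assumptions, work marked 65 out of 100 and submitted a day and a half late would be charged two working days and recorded as 45.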

Assessment requirements for a pass:

A mark of 40% overall.

Reassessment arrangements:

One examination paper of 2 hours' duration in August/September. The resit module mark will be the higher of the exam mark alone (100% exam) and the exam mark combined with the previous coursework mark (70% exam, 30% coursework).
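
The resit rule above amounts to taking the larger of two candidate marks. A minimal Python sketch follows; the function name `resit_module_mark` is hypothetical, and both marks are assumed to be percentages.

```python
def resit_module_mark(exam_mark: float, coursework_mark: float) -> float:
    """Return the higher of the two candidate resit marks (illustrative sketch)."""
    # Candidate 1: the resit exam counts for 100% of the module mark.
    exam_only = exam_mark
    # Candidate 2: 70% resit exam plus 30% previous coursework.
    weighted = 0.7 * exam_mark + 0.3 * coursework_mark
    return max(exam_only, weighted)
```

For example, a resit exam mark of 50 with a previous coursework mark of 70 would give a weighted mark of 56, so 56 would be recorded; with a coursework mark of 20 the weighted mark would be 41, so the exam mark of 50 would stand.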

Additional Costs (specified where applicable):



Required text books


Specialist equipment or materials


Specialist clothing, footwear or headgear


Printing and binding


Computers and devices with a particular specification


Travel, accommodation and subsistence


Last updated: 4 April 2020

