CS3TM20-Text Mining and Natural Language Processing

Module Provider: Computer Science
Number of credits: 10 [5 ECTS credits]
Terms in which taught: Spring term module
Non-modular pre-requisites:
Modules excluded:
Current from: 2020/1

Module Convenor: Dr Huizhi Liang

Email: huizhi.liang@reading.ac.uk

Type of module:

Summary module description:

This module introduces both the theory and practice of Text Mining and Natural Language Processing (NLP).


The aim of this module is to introduce the field of text mining and natural language processing. A key focus is the theory and practice of processing text data at the lexical, syntactic, and semantic levels. The module also covers typical application areas such as text classification, topic detection, information extraction, and information retrieval for large-scale text data. Advanced topics such as deep learning for NLP, dialogue systems, machine translation, and current research in the field are also included.


This module also encourages students to develop a set of professional skills, such as problem solving, creativity, technical report writing, organization and time management, self-reflection, software design and development, end-user awareness, action planning and decision making, commercial awareness, critical analysis of published literature, and an appreciation of the value of diversity.

Assessable learning outcomes:

By the end of this module, students should be able to

  • Understand and apply the fundamental principles of text mining and natural language processing;

  • Apply methods and algorithms to process different types of textual data;

  • Empirically evaluate the performance of methods and algorithms using accuracy and efficiency metrics;

  • Apply analytical and programming skills using existing NLP methods and tools such as NLTK and scikit-learn (Python).

Additional outcomes:

This module will provide an overview of the field of Text Mining and NLP and its sub-areas, and will introduce and explain its key techniques, including their applicability and limitations. Topics covered will include:

  • Regular expressions, text normalization, and edit distance

  • N-grams and language models, part-of-speech tagging

  • Lexical semantics, word senses, and WordNet

  • Syntactic and Semantic parsing

  • Text classification, topic detection, sentiment analysis

  • Information extraction, including named entity recognition and relation extraction

  • Information retrieval and recommender systems

  • Advanced topics: deep learning for NLP

  • Advanced topics: question answering, dialogue systems, machine translation

Outline content:

Brief description of teaching and learning methods:

The course material will be introduced through lectures and practicals. The lecture material will be applied during lab practical sessions. The lab work will support students in developing high-fidelity prototypes from the concepts and storyboards, and in planning for evaluation.

Contact hours:
  Autumn Spring Summer
Lectures 16
Practical classes and workshops 4
Guided independent study:      
    Wider reading (independent) 5
    Wider reading (directed) 5
    Exam revision/preparation 20
    Advance preparation for classes 3
    Preparation of practical report 5
    Completion of formative assessment tasks 30
    Revision and preparation 10
    Reflection 2
Total hours by term 0 0
Total hours for module 100

Summative Assessment Methods:
Method Percentage
Written exam 50
Set exercise 50

Summative assessment- Examinations:

One 1.5-hour examination paper in May/June.

Summative assessment- Coursework and in-class tests:

An individual assignment.

Formative assessment methods:

Students will be provided with formative feedback towards preparation of the coursework in tutorial sessions.

Penalties for late submission:

The Module Convenor will apply the following penalties for work submitted late:

  • where the piece of work is submitted after the original deadline (or any formally agreed extension to the deadline): 10% of the total marks available for that piece of work will be deducted from the mark for each working day (or part thereof) following the deadline up to a total of five working days;
  • where the piece of work is submitted more than five working days after the original deadline (or any formally agreed extension to the deadline): a mark of zero will be recorded.
The University policy statement on penalties for late submission can be found at: http://www.reading.ac.uk/web/FILES/qualitysupport/penaltiesforlatesubmission.pdf
You are strongly advised to ensure that coursework is submitted by the relevant deadline. You should note that it is advisable to submit work in an unfinished state rather than to fail to submit any work.
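
As an informal illustration only, the late-submission rule above can be sketched in Python. The function name and parameters are hypothetical, and the floor at zero is an assumption that the rule does not state; the University policy statement remains the authoritative source.

```python
import math


def late_mark(raw_mark, marks_available, working_days_late):
    """Illustrative sketch of the late-submission penalty described above."""
    if working_days_late <= 0:
        return raw_mark  # submitted by the deadline: no penalty
    # "or part thereof": any part of a working day counts as a full day
    days = math.ceil(working_days_late)
    if days > 5:
        return 0.0  # more than five working days late: mark of zero
    # 10% of the total marks available is deducted per working day late
    penalty = marks_available * days / 10
    # Assumption (not stated in the rule): the mark cannot fall below zero
    return max(0.0, raw_mark - penalty)
```

Under these assumptions, a piece of work marked 65 out of 100 and submitted one and a half working days late counts as two days late, so `late_mark(65, 100, 1.5)` gives 45.0.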

Assessment requirements for a pass:

A mark of 40% overall.

Reassessment arrangements:

One 2-hour examination paper in August/September. Note that the resit module mark will be the higher of (a) the mark from this resit exam and (b) an average of this resit exam mark and previous coursework marks, weighted as per the first attempt (50% exam, 50% coursework). 
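
The resit rule above amounts to taking the higher of two numbers. A minimal Python sketch follows; the function and parameter names are hypothetical, and marks are assumed to be percentages.

```python
def resit_module_mark(resit_exam, previous_coursework, exam_weight=0.5):
    """Illustrative sketch of the resit module mark described above."""
    # (b) average of the resit exam mark and previous coursework mark,
    # weighted as per the first attempt (50% exam, 50% coursework)
    weighted = exam_weight * resit_exam + (1 - exam_weight) * previous_coursework
    # The resit module mark is the higher of (a) the resit exam mark and (b)
    return max(resit_exam, weighted)
```

For instance, a resit exam mark of 50 with a previous coursework mark of 70 gives 60.0, whereas the same exam mark with a coursework mark of 30 gives 50, because the exam mark alone is then the higher figure.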

Additional Costs (specified where applicable):

Last updated: 16 April 2020

