The British Academic Spoken English Corpus

The British Academic Spoken English (BASE) corpus is a collection of transcripts of lectures and seminars recorded at two universities in the UK during the period 1998-2005. The version of the corpus that can be accessed through Sketch Engine consists of 160 lectures recorded in a variety of university departments. The holdings are distributed across four broad disciplinary groups (Arts and Humanities, Life and Medical Sciences, Physical Sciences, and Social Studies and Sciences), each represented by 40 lectures.

The lectures have been transcribed and annotated in accordance with the TEI Guidelines. File names are made up of five letters and three digits, in which the first two letters indicate the disciplinary group, the next three indicate that the file is a transcript of a lecture, and the digits are unique identifiers:

ah [Arts and Humanities]
ls [Life and Medical Sciences]
ps [Physical Sciences]
ss [Social Studies and Sciences]

lct [lecture]

0nn [unique identifier]
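
The naming scheme above can be sketched in code. The following Python snippet is a minimal illustration, not part of the BASE distribution: the function name parse_base_name and the BaseFileName type are hypothetical, and it assumes that file names are lower case and are passed without a file extension.

```python
import re
from typing import NamedTuple

# Disciplinary group codes, as listed in the naming scheme.
GROUPS = {
    "ah": "Arts and Humanities",
    "ls": "Life and Medical Sciences",
    "ps": "Physical Sciences",
    "ss": "Social Studies and Sciences",
}

# Two-letter group code, the literal "lct" (lecture), then three digits.
# Assumption: names are lower case and carry no file extension.
_NAME_PATTERN = re.compile(r"(ah|ls|ps|ss)lct(\d{3})")


class BaseFileName(NamedTuple):
    """Hypothetical container for the parts of a BASE file name."""
    group_code: str
    group: str
    event_type: str
    number: int


def parse_base_name(name: str) -> BaseFileName:
    """Split a name such as 'pslct012' into its documented components."""
    match = _NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a BASE lecture file name: {name!r}")
    code, digits = match.groups()
    return BaseFileName(code, GROUPS[code], "lecture", int(digits))


if __name__ == "__main__":
    print(parse_base_name("pslct012"))
```

A name that does not follow the pattern (an unknown group code, a missing "lct", or the wrong number of digits) raises ValueError, which makes it easy to spot stray files when working through a directory of transcripts.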

The Manual (PDF) explains the spelling and transcription conventions adopted. Some of the mark-up was changed when the corpus was converted to Sketch Engine format; further details will be made available soon. A set of guidelines on forming CQL queries to explore the BASE corpus in Sketch Engine will also be added soon.

A spreadsheet detailing the files in the BASE corpus can also be downloaded as an Excel file.

For further information or guidance, contact the BASE team through Paul Thompson (p.a.thompson@reading.ac.uk).