The British Academic Written English Corpus

The British Academic Written English (BAWE) corpus is a collaboration between the universities of Warwick, Reading and Oxford Brookes. It was collected as part of the project 'An Investigation of Genres of Assessed Writing in British Higher Education'. The project was funded by the Economic and Social Research Council (2004-2007; project number RES-000-23-0800).

The corpus contains just under 3000 good-standard student assignments. Holdings are fairly evenly distributed across four broad disciplinary areas (Arts and Humanities, Social Sciences, Life Sciences and Physical Sciences) and across four levels of study (three undergraduate years and taught masters level). Thirty-five disciplines are represented.

The version of the corpus in Sketch Engine has been prepared by Paul Thompson and Alois Heuboeck at Reading. The files have been tagged by Paul Rayson at Lancaster University for POS (CLAWS tagset) and for semantic category, using WMatrix. Details on how to make use of the tags in CQL queries will appear on this page shortly.

The documentation for the BAWE corpus (without information on POS and semantic tagging) can be downloaded from here.

Information about discipline and level has been recorded for each assignment file, alongside other types of contextual information that did not influence collection policy, such as gender, year of birth, native speaker status, and years of UK secondary education.

Subject to the rights of the three institutions (Warwick, Reading, Oxford Brookes) in the BAWE corpus, and pursuant to the ESRC agreement, the BAWE corpus is available to researchers for research purposes PROVIDED THAT the following conditions are met:

  1. The corpus files are not to be distributed in either their original form or in modified form.
  2. The texts are used for research purposes only; they should not be reproduced in teaching materials.
  3. The texts are not reproduced in full for a wider audience/readership, although researchers are free to quote short passages of text (up to 200 running words from any given text).
  4. The BAWE corpus developers (contact: Hilary Nesi) are informed of all projects, dissertations, theses, presentations or publications arising from analysis of the corpus.
  5. Researchers acknowledge their use of the corpus using the following form of words: "The data in this study come from the British Academic Written English (BAWE) corpus, which was developed at the Universities of Warwick, Reading and Oxford Brookes under the directorship of Hilary Nesi and Sheena Gardner (formerly of the Centre for Applied Linguistics [previously called CELTE], Warwick), Paul Thompson (Department of Applied Linguistics, Reading) and Paul Wickens (Westminster Institute of Education, Oxford Brookes), with funding from the ESRC (RES-000-23-0800)."

For further information or guidance, contact the BAWE team through Paul Thompson.