The British Academic Written English (BAWE) corpus is a collaboration between the universities of Warwick, Reading and Oxford Brookes. It was collected as part of the project 'An Investigation of Genres of Assessed Writing in British Higher Education', which was funded by the Economic and Social Research Council (2004-2007; project number RES-000-23-0800).
The corpus contains just under 3000 good-standard student assignments. Holdings are fairly evenly distributed across four broad disciplinary areas (Arts and Humanities, Social Sciences, Life Sciences and Physical Sciences) and across four levels of study (three undergraduate years and taught masters level). Thirty-five disciplines are represented.
The version of the corpus in Sketch Engine has been prepared by Paul Thompson and Alois Heuboeck at Reading. The files have been tagged by Paul Rayson at Lancaster University for POS (CLAWS tagset; see http://ucrel.lancs.ac.uk/claws7tags.html) and for semantic category (see http://ucrel.lancs.ac.uk/usas/) using WMatrix. Details on how to make use of the tags in CQL queries will appear on this page shortly.
The documentation for the BAWE corpus (without information on POS and semantic tagging) can be downloaded from here.
Information about discipline and level has been recorded for each assignment file, alongside other types of contextual information that did not influence collection policy, such as gender, year of birth, native speaker status, and years of UK secondary education.
Subject to the rights of the three institutions (Warwick, Reading, Oxford Brookes) in the BAWE corpus, and pursuant to the ESRC agreement, the BAWE corpus is available to researchers for research purposes PROVIDED THAT the following conditions are met:
For further information or guidance, contact the BAWE team through Paul Thompson.