
This paper first appeared in a slightly different form in the Proceedings of the Teaching and Language Corpora 98 Conference. Permission to reproduce the article here is gratefully acknowledged.


Examining PhD theses in a corpus

Paul Thompson

1 Introduction

To what extent do PhD theses constitute a genre? Swales (1990), in his seminal text on genre analysis, devotes little space to the PhD thesis as a genre. He suggests only that one of its key features might be the degree to which it uses metadiscourse to organise a long text. Genre analysis has tended to focus instead on the research article, a more accessible and manageable genre of academic text. However, a number of recent studies in EAP have looked at the texts that students are expected to produce in academic contexts. These include argumentative essays (Hyland 1990) and theses and dissertations (for example, Hanania and Akhtar 1985, Dudley-Evans 1994, Shaw 1992). Descriptions of the language and organisation of such texts nevertheless remain limited.

A distinctive feature of the Reading Academic Text (RAT) corpus, held at the Centre for Applied Language Studies, University of Reading, is that it contains the complete texts of theses (see Carne 1996 for the rationale behind the establishment of the corpus). Concordancing techniques promise to make these long texts more tractable to analysis. A major question must first be resolved, however. Do these texts constitute a genre in their own right? If so, it would be justifiable to analyse language use across all the theses as an exploration of the thesis genre. Alternatively, it might be better to subdivide the theses according to certain criteria. To answer this question, I decided to analyse selected textual features in theses from two subdisciplines, in order to explore the discourse conventions of different disciplinary communities. This paper reports on one such feature, the use of citations. Previous studies of citations have tended to focus on a particular section of a text, such as the introduction. The present study, by contrast, examines citations throughout complete texts.

I initially intended to analyse the use of reporting verbs in theses. My aim was to investigate how student writers position themselves in relation to other, already established researchers in their field. Problems arose, however, in distinguishing reporting verbs that make extratextual reference from those that make intratextual reference. It also proved difficult to draw a clear line between verbs that function as reporting verbs and those that are narrative. Citations, on the other hand, are easily identifiable and form a discrete set. Examining citations makes it possible to limit the verbs to be examined to those directly controlled by the citation subject in a given sentence. It also makes it possible to investigate the different functions that references to other researchers perform in the texts. For this purpose, a set of functional categories was devised (see Section 3 below), and all citations within the texts were tagged accordingly. The instances of each type of citation were then counted for each chapter and for each complete text. The counts showed which types of citation were most common in each of the two subdisciplines, and how citations were distributed across complete texts. Through this analysis, I hoped to identify patterns of usage that would elucidate the ways in which writers in each subdiscipline construct their texts.

2 The corpus

The RAT corpus at present contains 30 PhD theses written by native-speaker students at the University of Reading between 1991 and 1998. In the present study, fourteen theses were examined: six from the Department of Agricultural Economics (hereafter AgEcon, or AE) and eight from the Department of Agricultural Botany (hereafter AgBot, or AB).

Each chapter of a thesis is stored in its own ASCII file. The focus of analysis is the individual writer's own verbal text, so all tables, plates, quotations and figures have been stripped out. COCOA tags are inserted to mark where these features have been removed. The file names indicate the department for which each thesis was written. AgEcon theses are named 'TAE-n', where TAE stands for 'Thesis, Agricultural Economics' and n is the number assigned to the particular thesis. AgBot theses are named 'TAB-n'.

Table 1 shows the word counts for the fourteen theses in the study, after preparation of the text. In this sample, the average length of an AgEcon thesis is 65,717 words, while that of an AgBot thesis is 33,081 words. The AgEcon theses are therefore nearly twice as long.

Thesis   Word count
TAE1     77,787
TAE2     39,711
TAE3     70,125
TAE4     55,189
TAE5     99,176
TAE6     52,316
TAB1     18,491
TAB2     39,189
TAB3     31,485
TAB4     28,114
TAB5     38,733
TAB6     57,374
TAB7     15,298
TAB8     35,961
Table 1 Word counts for the 14 theses
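As a quick check on the averages quoted above, the following Python sketch recomputes them from the Table 1 word counts. The dictionary and function names are illustrative only. The figures themselves are taken from the table.

```python
from statistics import mean

# Word counts after text preparation, copied from Table 1
WORD_COUNTS = {
    "TAE1": 77_787, "TAE2": 39_711, "TAE3": 70_125,
    "TAE4": 55_189, "TAE5": 99_176, "TAE6": 52_316,
    "TAB1": 18_491, "TAB2": 39_189, "TAB3": 31_485, "TAB4": 28_114,
    "TAB5": 38_733, "TAB6": 57_374, "TAB7": 15_298, "TAB8": 35_961,
}

def mean_length(prefix):
    """Mean word count of the theses whose label starts with prefix ('TAE' or 'TAB')."""
    return mean(n for label, n in WORD_COUNTS.items() if label.startswith(prefix))

agecon = mean_length("TAE")
agbot = mean_length("TAB")
print(f"AgEcon mean: {agecon:,.0f} words")
print(f"AgBot mean:  {agbot:,.0f} words")
print(f"Ratio: {agecon / agbot:.2f}")
```

Run as a script, this prints means of 65,717 and 33,081 words and a ratio of 1.99, which agrees with the text.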

3 The Tagging System

Following Swales (1990:141), citations were divided into integral and non-integral citations. An integral citation appears within the sentence. When such a citation takes the form of a name followed by a year, the name is typically incorporated into the syntax of the sentence and is not enclosed in brackets. In a non-integral citation, the citation is set apart from the sentence in brackets and plays no explicit grammatical role in it. Citations can also take the form of a numbered reference rather than a name and year. None of the writers in the corpus use this style, so it was not considered in this study.
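For name-and-year citations, the integral/non-integral distinction largely comes down to where the brackets fall. The Python sketch below illustrates this distinction on a single sentence. It is a hypothetical heuristic, not the tagging procedure used in this study, and the function names and the year pattern are assumptions. Any four-digit year from 1800 to 2099 inside brackets is treated as a citation year. If a capitalised word shares the innermost bracket with that year, the citation is labelled non-integral.

```python
import re

# Assumed pattern for a citation year: 1800-2099, optionally suffixed a/b/c (e.g. 1990a)
YEAR = re.compile(r"\b(?:1[89]\d\d|20\d\d)[a-z]?\b")
CAPITALISED_WORD = re.compile(r"\b[A-Z][a-z]")

def innermost_open_bracket(text, pos):
    """Index of the unclosed '(' nearest before pos, or -1 if there is none."""
    depth = 0
    for i in range(pos - 1, -1, -1):
        if text[i] == ")":
            depth += 1
        elif text[i] == "(":
            if depth == 0:
                return i
            depth -= 1
    return -1

def classify_citations(sentence):
    """Label each bracketed citation year in a sentence as integral or non-integral."""
    labels = []
    for match in YEAR.finditer(sentence):
        start = innermost_open_bracket(sentence, match.start())
        if start == -1:
            continue  # a bare year outside brackets is not treated as a citation
        # Text between the opening bracket and the year, e.g. "Stover and Dickson, "
        inside = sentence[start + 1 : match.start()]
        if CAPITALISED_WORD.search(inside):
            labels.append("non-integral")  # name and year share the brackets
        else:
            labels.append("integral")  # only the year is bracketed; the name is in the sentence
    return labels
```

For example, `classify_citations("Miller and Tanksley (1990a) found no such correlation")` returns `["integral"]`. Nested forms such as '(e.g. Thirtle and Bottomley (1988) ...)' would be mislabelled as integral, which is one reason a human coder is still needed for cases like these.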

3.1 Non-integral citations

Tag: <RN Source>

    The citation tells the reader where the information (verbal or numerical) or idea comes from. The function of the citation is that of attribution. For example:

    Both diseases are of economic importance, but black Sigatoka develops much more rapidly, causes more severe defoliation, and is more difficult to control than yellow Sigatoka (Stover and Dickson, 1976). (TAB-005)

Tag: <RN Ident>

    The citation identifies an actor in the sentence, whether the actor is mentioned explicitly or only implied. For example:

    It has been suggested (Wardlaw, 1972) that M. fijiensis might be a mutant of M. musicola ... (TAB-005)

Tag: <RN Refer>

    The citation refers the reader to a text for further details, which are not given in the writer's own text. This form of citation usually includes the word 'see', though not always.

    This equation can be rearranged to express Total Factor Productivity as a function of research spending (see Thirtle, 1988). (TAE-002)

Tag: <RN Example>

    The citation provides one or more examples of the studies referred to in the sentence. The name(s) are typically prefaced by 'e.g.' or 'for example', though not always.

    The existing literature on the returns to research is considerable, and several summaries of the indicators found in the literature are available (e.g. Thirtle and Bottomley (1988) or Echeverria (1990)) (TAE-002)

3.2 Integral citations

Two types of integral citation can be distinguished. In the first, the citation controls a lexical verb in the clause, either as its subject (X argues ...) or as the agent in a passive construction (It is argued by X ...). Adapting the categories of Thompson and Ye (1991), I distinguished three types of process in the lexical verbs controlled by the citation:

Tag: <RI Research verb>

    These are verbs which refer to research work and denote 'real-world' activities. Verbs connected with experimental procedures, findings, measurement, categorisation and observation (in the sense of watching) are included in this group.

    Miller and Tanksley (1990a) found no such correlation when studying tomato genomic clones. (TAB-002)

Tag: <RI Discourse verb>

    These are verbs which describe processes in which verbal expression is involved, such as suggesting, reporting, commenting, or arguing.

    Nodari et al. (1992) suggest that this difference may be due to the fact that random clones mainly detect point mutations, whereas ... (TAB-002)

Tag: <RI Other verb>

    Thompson and Ye had a third category, mental processes, which covered verbs of cognition (believe, consider, think). In the corpus, however, certain verbs fit neither of the previous two categories. Examples include 'provide' or 'offer' (as in 'offer a view' or 'offer an alternative') and 'visit' (as in 'Young (1990) visited ...'). The third category used here was therefore created as a catch-all for the remaining verbs.

    Bassett (1986) provides a comprehensive survey of the use of pyrethroids in UK agriculture. (TAE-002)

Tag: <RI Naming>

    The second type of integral citation appears within the sentence but does not control a lexical verb. The citation works as a noun phrase and typically functions in one of two ways. It may act 1) as a modifier, as in 'the work of Fuller (1997)' or 'Fuller's (1997) work', or 2) as a free-standing noun phrase followed by a linking verb, as in 'Fuller (1997) is the best example of this approach'.

    Surprisingly no attempt was made on publication of the work of Fukuda et al. (1989), to assay ACC oxidase from plant sources under these conditions. (TAB-007)

3.3 Non-citation

Tags: <RN Non-cit> <RI Non-cit>

    Occurrences of a name in the text that did not appear as a citation (i.e. with no year or page reference attached to the name) were also tagged. The exception was a name used to identify a particular theorem, model, law, or other commonly recognised construct, which was not tagged. These 'non-citations' occur, of course, after the researcher has already been cited.

    These lower order moments potentially provide enough information to accurately specify an appropriate lag structure (see Silver and Wallace) with minimal priors. (TAE-005)

    Schmidt suggests the last expression can be regarded as the truncated remainder which although time dependant is asymptotically negligible and thus can be omitted in estimation. (TAE-005)

3.4 Quotation

Tag: <DQ>

A tag was inserted to indicate where a citation was followed by a direct quotation from the cited text.

In general 'the World Bank has tended to use the Internal Rate of Return as its principal discounted measure' (Gittinger 1992). (TAE-002)

3.5 Elaboration

Tag: <elab>

    A citation may either be restricted to a single sentence or be elaborated upon in the following lines of text. In the latter case, the tag <elab> was added immediately after the citation tag(s). For example:

      Tapia et al., (1990) <RI Research><elab> studied the anatomical features of leaf surfaces of Grande Naine (AAA), False horn plantain (AAB) and Pelipita (ABB), and the relationship with resistance to M. fijiensis. They identified an apparent relationship between stomatal density and resistance, however, this was confounded with genome, as studies were made across genomic compositions but not within AA and AB. (TAB-005)

    In cases where more than one work was cited for a particular statement, only one tag was inserted. In other words, the tag indicates that a citation has been made; it does not indicate whether it is a single or a multiple reference citation.
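The study counted each citation type per chapter and per complete thesis. The Python sketch below shows what such a tally might look like. It assumes the tags appear inline exactly as in the Tapia et al. example above (e.g. <RI Research><elab>). The regular expression and function names are hypothetical. The optional word 'verb' is folded away, so <RI Discourse verb> and <RI Discourse> are counted together.

```python
import re
from collections import Counter

# Hypothetical pattern for the tags described in Section 3, e.g.
# <RN Source>, <RI Discourse verb>, <RN Non-cit>, <DQ>, <elab>
TAG = re.compile(r"<(RN|RI) ([A-Za-z-]+)(?: verb)?>|<(DQ|elab)>")

def tally_tags(tagged_text):
    """Count each tag type in one tagged chapter."""
    counts = Counter()
    for m in TAG.finditer(tagged_text):
        if m.group(1):  # a citation tag: key it as e.g. "RN Source"
            counts[f"{m.group(1)} {m.group(2)}"] += 1
        else:  # a <DQ> or <elab> marker
            counts[m.group(3)] += 1
    return counts

def tally_thesis(chapters):
    """Sum per-chapter tallies into a whole-thesis tally."""
    total = Counter()
    for chapter in chapters:
        total.update(tally_tags(chapter))
    return total
```

Summing the keys that begin with 'RN' or with 'RI' would give totals of the kind shown in Table 2. Whether a given total should include the non-citation tags is a separate decision.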

4 Results and discussion

Space does not allow a full report of the results here. Instead, general data are presented, together with a commentary on the main points emerging from the study. It must be stressed that these observations are highly generalised and that they pertain only to the small number of theses in the corpus.

RN = non-integral citations; RI = integral citations

Thesis    RN     RI       Thesis    RN     RI
TAE-001   103    345      TAB-001   83     55
TAE-002   110    115      TAB-002   400    85
TAE-003   24     151      TAB-003   138    100
TAE-004   284    182      TAB-004   115    68
TAE-005   179    376      TAB-005   155    53
TAE-006   99     176      TAB-006   115    73
                          TAB-007   143    20
                          TAB-008   95     125
TAE sum   799    1345     TAB sum   1244   479
Table 2 Integral and non-integral citations by thesis and by subdiscipline

Non-integral citations are clearly far more common than integral citations in the AgBot theses, and the reverse holds for the AgEcon theses. In the Agricultural Botany theses, it could be said, writers tend to focus on previous findings or suggestions rather than on the researchers who made them. In the Agricultural Economics theses, on the other hand, more attention is paid to the individuals who developed approaches, formulated equations, or articulated complex models. These individuals therefore appear as actors within sentences.

This broad-brush depiction of tendencies overlooks inconsistencies within the data, however. TAE4 shows the opposite tendency to the other AgEcon theses, as does TAB8 within the AgBot group. An explanation for TAE4 is that a major section of the thesis is devoted to an economic history of external trade policy in four sub-Saharan African countries. In that section, <RN Source> is the preferred mode of citation (166 of the 201 citations in the chapter), allowing the focus to fall on the facts. TAB8 presents a series of studies in which the findings of other researchers are repeatedly cited for comparison, so it uses <RI + verb> citations repeatedly.
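The two exceptions (TAE-004 and TAB-008 in Table 2) can be identified mechanically from the table. In the Python sketch below, the names are illustrative. It compares each thesis's dominant citation type with the dominant type for its subdiscipline, which is calculated by summing the per-thesis figures.

```python
# (non-integral RN, integral RI) citation counts per thesis, from Table 2
COUNTS = {
    "TAE-001": (103, 345), "TAE-002": (110, 115), "TAE-003": (24, 151),
    "TAE-004": (284, 182), "TAE-005": (179, 376), "TAE-006": (99, 176),
    "TAB-001": (83, 55), "TAB-002": (400, 85), "TAB-003": (138, 100),
    "TAB-004": (115, 68), "TAB-005": (155, 53), "TAB-006": (115, 73),
    "TAB-007": (143, 20), "TAB-008": (95, 125),
}

def dominant(rn, ri):
    """'RN' if non-integral citations outnumber integral ones, else 'RI'."""
    return "RN" if rn > ri else "RI"

def group_dominant(prefix):
    """Dominant citation type for a subdiscipline, summed over its theses."""
    rn = sum(v[0] for k, v in COUNTS.items() if k.startswith(prefix))
    ri = sum(v[1] for k, v in COUNTS.items() if k.startswith(prefix))
    return dominant(rn, ri)

def find_exceptions():
    """Theses whose dominant type differs from that of their subdiscipline."""
    return [
        thesis for thesis, (rn, ri) in COUNTS.items()
        if dominant(rn, ri) != group_dominant(thesis[:3])
    ]

print(find_exceptions())  # ['TAE-004', 'TAB-008']
```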

There was no clear contrast in the ratio of 'research' verbs to 'discourse' verbs. For AgEcon the ratio was 243:236, and for AgBot it was 225:165. Examining the individual theses and the functions of the verbs within different sections reveals more, but there is no space to discuss this here. It should also be noted that the distinction between research and discourse was at times problematic. It was difficult to determine whether the discussion and analysis of models counted as research work or as discourse. An important aspect of disciplinary difference is the varying notion of what constitutes research, and without a clear understanding of this, these categories are difficult to apply.

The use of a name without a full citation (coded as <RI Non-cit> or <RN Non-cit>) was rare in the AgBot theses (16 instances) but common in the AgEcon theses (368 instances). Similar contrasts were found in the use of direct quotations (AB: 26; AE: 128) and in elaborations of citations (AB: 81; AE: 392). Furthermore, elaborations in AgBot theses tended to be restricted to one or two sentences, while those in AgEcon theses tended to be much longer. This indicates again that individual researchers play explicit roles in the texts of the AgEcon writers. Their ideas or models are often discussed at some length and are frequently quoted verbatim. In the AgBot texts, by contrast, findings, facts and observations tend to be foregrounded.
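The AgEcon theses are nearly twice as long as the AgBot theses, so raw totals could exaggerate these contrasts. One way to check is to normalise each count per 10,000 words using the Table 1 word counts. This check is not part of the analysis reported here. Note also that the Table 1 counts exclude the stripped-out quotations, tables and figures. A Python sketch:

```python
# Total words per subdiscipline: sums of the Table 1 counts
AGECON_WORDS = sum([77_787, 39_711, 70_125, 55_189, 99_176, 52_316])
AGBOT_WORDS = sum([18_491, 39_189, 31_485, 28_114, 38_733, 57_374, 15_298, 35_961])

# Raw counts from the paragraph above: (AgEcon, AgBot)
RAW_COUNTS = {
    "name without citation": (368, 16),
    "direct quotation": (128, 26),
    "elaboration": (392, 81),
}

def per_10k(count, words):
    """Occurrences per 10,000 words of running text."""
    return 10_000 * count / words

for feature, (ae, ab) in RAW_COUNTS.items():
    ae_rate = per_10k(ae, AGECON_WORDS)
    ab_rate = per_10k(ab, AGBOT_WORDS)
    print(f"{feature}: AgEcon {ae_rate:.2f}, AgBot {ab_rate:.2f}, ratio {ae_rate / ab_rate:.1f}")
```

With these figures, the AgEcon rate remains roughly three to fifteen times the AgBot rate for each feature. The contrast therefore does not appear to be an artefact of thesis length.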

There was also a noticeable difference between the subdisciplines in how citations were distributed across the complete theses. The AgEcon theses contained many citations in sections that discussed models or approaches, and far fewer in sections that described particular organisations, situations or events. This correlates well with Bloor and Bloor's (1993) suggestion that economists modify propositions relating to field-central claims and tend not to modify substantive statements.

A further finding was that the AgEcon writers used comparatively few citations in the final chapter of the thesis. They averaged 8 per thesis, 4 of which were in fact non-citations (<RI Non-cit>). This was in marked contrast to the concluding chapters of the AgBot theses, which contained on average 31 citations, none of them <RI Non-cit>. The AgBot theses correspond reasonably well to the hourglass IMRD model suggested by Hill et al. (1982). In this model, the writer places the work within the context of the field in general in the introduction. The writer then narrows the focus in the description of the studies, which therefore requires fewer citations. Finally, the concluding section broadens out again and places the present work within the wider field. The AgEcon theses, on the other hand, do not fit this model. The AgEcon writers in the corpus appear to craft their own framework for analysis out of an extensive critical survey of the approaches and techniques of others. They then assess the effectiveness of their model by applying it to a data set. It could further be argued that the AgEcon writers are creating a thesis in a relatively self-contained way. The AgBot writers, by contrast, see a need to place their research within the context of a collective body of scientific knowledge.

5 Conclusion

This study has revealed distinct differences in the use of citations in theses in two subdisciplines. The dispersion of citations throughout the complete texts also indicates that writers in the two subdisciplines construct their texts in markedly different ways. The results suggest that disciplines conceive of the nature and purpose of research, and of a thesis, differently, and that it would be more generative, for a fuller description of discourse practices in academic settings, to examine differences between texts in different disciplines than it would be to attempt to generalise across all fields.

Details of the theses examined

    TAE-001: R. J. Loader (1995) Investigating and assessing agricultural and food marketing systems

    TAE-002: H. S. Beck (1994) The economic value of long term agricultural research

    TAE-003: A. S. Bailey (1996) The estimation of input-output coefficients for agriculture from whole farm accounting data

    TAE-004: M. A. Gadbois (1997) The effects of exchange rate variability and export instability on selected exports from sub-Saharan African countries

    TAE-005: Y. J. G. Khatri (1994) Technical change and the returns to research in UK agriculture 1953-1990

    TAE-006: Steve L. Wiggins (1991) Managing the implementation of agricultural and rural development in the Third World

    TAB-001: C. Darwen (1991) A study of fructan metabolism in the Jerusalem artichoke (Helianthus tuberosus L.)

    TAB-002: S. Berry (1995) Molecular marker analysis of cultivated sunflower (Helianthus annuus L.)

    TAB-003: A. C. Grundy (1993) The implications of extensification for crop weed interactions in cereals

    TAB-004: J. C. Peters (1994) Pattern and impact of disease in natural plant communities of different age

    TAB-005: A. Johanson (1993) Molecular methods for the identification and detection of the Mycosphaerella species that cause Sigatoka leaf spots of banana and plantain

    TAB-006: S. Barrow (1997) A monograph of Phoenix L. (Palmae: Coryphoideae)

    TAB-007: J. J. Smith (1993) Biochemistry of 1-aminocyclopropane-1-carboxylate (ACC) oxidase (the ethylene-forming enzyme) isolated from ripening fruits

    TAB-008: P. J. Harkett (1996) Studies on the use of cut seed tubers for the production of potatoes for French fry processing

The permission of the authors to use their texts for linguistic analysis is gratefully acknowledged.

References

    Bloor, M. and T. Bloor (1993) 'How economists modify propositions'. In W. Henderson, A. Dudley-Evans and R. Backhouse (eds) Economics and Language. London: Routledge.

    Carne, C. (1996) 'Corpora, genre analysis and dissertation writing: an evaluation of the potential of corpus-based techniques in the study of academic writing'. In S. Botley, J. Glass, T. McEnery and A. Wilson (eds) Proceedings of Teaching and Language Corpora 1996. Lancaster: UCREL.

    Dudley-Evans, A. (1994) 'Genre analysis: an approach to text analysis for ESP'. In M. Coulthard (ed.) Advances in Written Text Analysis. London: Routledge.

    Hanania, E. and K. Akhtar (1985) 'Verb form and rhetorical function in science writing: a study of MS theses in Biology, Chemistry and Physics'. ESP Journal 4/1: 45-58.

    Hill, S., B. Soppelsa and G. West (1982) 'Teaching ESL students to read and write experimental research papers'. TESOL Quarterly 16: 333-347.

    Hyland, K. (1990) 'A genre description of the argumentative essay'. RELC Journal 21/1: 66-78.

    Shaw, P. (1992) 'Reasons for the correlation of voice, tense, and sentence function in reporting verbs'. Applied Linguistics 13/3: 302-319.

    Swales, J. (1990) Genre Analysis. Cambridge: Cambridge University Press.

    Thompson, G. and Ye Yiyun (1991) 'Evaluation in the reporting verbs used in academic papers'. Applied Linguistics 12/4: 365-382.
