Review of: Gard B. Jenset, Barbara McGillivray. Quantitative historical linguistics: A corpus framework. Oxford: Oxford University Press, 2017. 256 p. ISBN 9780198718178

 
PII: S0373658X0008306-5-1
DOI: 10.31857/S0373658X0008306-5
Publication type: Review
Status: Published
Affiliation:
Russian State University for the Humanities
Russian Presidential Academy of National Economy and Public Administration
Stockholm University
Address: Russian Federation, Moscow; Kingdom of Sweden, Stockholm
Journal name: Voprosy Jazykoznanija
Edition: Issue 1
Pages: 155-160
Acknowledgment: Work on the paper was supported by the project “Texts and practices of folklore: Typology, semiotics, new research methods” at the Russian State University for the Humanities.
Publication date: 02.03.2020
Number of characters: 20356
The publication under review is a comparatively rare specimen in contemporary linguistics: it is essentially a book-length argument in favour of a particular approach to doing historical-linguistics research. The authors aim “to introduce the framework for quantitative historical linguistics, and to provide some examples of how this framework can be applied in research” (p. 1; emphasis in the original), and they do precisely this. Along the way, however, they also devote a great deal of effort to persuading the reader that their framework is the best possible way of doing historical linguistics and to refuting alternative takes on the matter.
Jenset and McGillivray begin their argument by positing that the family of statistical models developed in corpus linguistics must be adopted by the historical-linguistics community. They note that even though historical linguistics is known to be highly “data-centric”, “quantitative corpus methods are still underused and often misused in historical linguistics, and an overarching methodological structure inside which to place such methods is missing” (p. 4). The book therefore endeavours to show “what it means to be empirical in historical linguistics research and how to go about doing it” (ibid.).
The authors then pose and resolve several methodological questions, the most important of which are the following:
  • Why should historical linguistics be corpus-based and quantitative? (Because otherwise it is impossible to reproduce other people’s research, properly formulate and refute claims, and compare models.)
  • Why should historical linguistics be probabilistic? (Because rigid symbolic models tend to be vulnerable to linguistic variation and performance factors. Jenset and McGillivray underline that it is possible to adhere to strict symbolic models of grammar on the theoretical level but still investigate their realisations using probabilistic methods.)
The scholars also note that the methods used to analyse corpus data must be adequate to the task. This boils down to two postulates: (i) presenting uncontextualised raw frequencies of occurrence of different phenomena is not enough; and (ii) as historical-linguistic trends are usually shaped by an array of factors, researchers should model them with multivariate methods, which also make it possible to estimate the explanatory power of competing hypotheses directly.
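Point (i) can be made concrete with a toy calculation. The Python sketch below normalises raw counts to a rate per million words so that subcorpora of different sizes become comparable. The counts and corpus sizes are invented for illustration and do not come from the book, and normalisation is only a first step: it still ignores the other conditioning factors that point (ii) asks researchers to model.

```python
# Hypothetical toy data: raw counts of some construction in two
# subcorpora of very different sizes (invented numbers, not from the book).
subcorpora = {
    "period_A": {"count": 120, "tokens": 400_000},
    "period_B": {"count": 150, "tokens": 1_000_000},
}


def per_million(count: int, tokens: int) -> float:
    """Relative frequency of a phenomenon per million corpus tokens."""
    if tokens <= 0:
        raise ValueError("corpus size must be positive")
    # Multiplying first keeps the arithmetic in integers until the final division.
    return count * 1_000_000 / tokens


for name, data in subcorpora.items():
    rate = per_million(data["count"], data["tokens"])
    print(f"{name}: raw = {data['count']}, per million = {rate:.1f}")
# In this toy data, raw counts suggest the construction grew (120 -> 150),
# while normalised rates show it fell (300.0 -> 150.0 per million words).
```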
Jenset and McGillivray then explore a sociological angle. They survey the current state of the art in historical linguistics by counting the number of quantitative and corpus-based articles in the latest issues of several historical-linguistics journals. They then compare the proportion of quantitative articles in each journal with the proportion of quantitative articles in Language, which serves as a baseline representing best practice in general linguistics. The scholars note that publications in Language tend on average to be more quantitative and empirical than those in historical-linguistics journals, and conclude that historical linguistics is not yet a truly empirical, data-driven discipline.
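The review does not say which statistical test, if any, underlies this comparison. Purely as an illustration of how a difference between two such proportions could be checked, here is a two-proportion z-test written with the Python standard library; the function name and the article counts are hypothetical placeholders, not the figures reported in the book.

```python
import math


def two_proportion_z_test(k1: int, n1: int, k2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test for the difference between proportions k1/n1 and k2/n2.

    Uses the pooled proportion for the standard error (the usual choice under
    the null hypothesis that both groups share one underlying proportion).
    Returns (z, p_value).
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("test undefined when all or no articles are quantitative")
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (invented, not from the book): 12 of 60 articles in a
# historical-linguistics journal are quantitative vs 30 of 60 in the baseline.
z, p = two_proportion_z_test(12, 60, 30, 60)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

A small p-value here would only indicate that the gap between the two journals is unlikely under equal underlying proportions; it does not by itself explain the gap.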
They contextualise this issue using Moore’s technology-adoption life cycle. From this perspective, the adoption of corpus-based quantitative historical linguistics has reached a perilous “chasm” between the “early adopter” and “early majority” stages. If the wider community’s refusal of, or hesitancy towards, empirical methods prevents the field from crossing this adoption threshold, the result could be fatal to the discipline or could at least seriously set back its development.
To push quantitative historical linguistics forward at this crucial juncture and carry it across the chasm, Jenset and McGillivray propose, in Chapter 2, a new framework for conducting research in historical linguistics.
First, they establish the terminology that such a framework requires. The following are regarded as the foundational terms:
  • Evidence: things that can be independently observed and verified by different researchers. Evidence can be quantitative (i.e. count-based) or distributional in nature; both types of evidence should be quantified in a way that makes independent verification feasible.
  • Claim: any statement based on the evidence, which does not repeat the evidence itself. Claims can be used as constituent elements for making further claims.
  • Probability: the researchers argue in favour of the Bayesian approach, in which probabilistic statements reflect the degree of their authors’ certainty, as this approach “is explicitly made contingent on our knowledge and our argumentation in a manner that is different and better than in the [frequentist] case” (p. 41).
  • Historical corpus: a machine-readable, systematically sampled collection of natural-language texts representative of some state of the language. The scholars note that non-systematic samples, such as collections of examples, can be biased and should not be regarded as corpora.
  • Linguistic annotation scheme: a consistent way to annotate texts from a corpus.
  • Hypothesis: a claim that can be empirically verified.
  • Model: a representation of some linguistic phenomenon derived from statistical verification of hypotheses on corpus data.
  • Trend: a directional change over time in the probability of some linguistic phenomenon, detectable and verifiable using statistical methods on corpus data.
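The definitions of probability and trend fit together naturally, and a minimal sketch can show how. Assume a binary variable (for instance, whether a construction uses a new variant) observed in an earlier and a later period. One can place a uniform Beta(1, 1) prior on the probability of the new variant in each period, update it with the corpus counts, and estimate the posterior probability that the later value is higher. The counts and function names are hypothetical, and this conjugate Beta-binomial comparison is a textbook Bayesian model chosen for illustration, not necessarily the model Jenset and McGillivray use.

```python
import random


def posterior_params(successes: int, trials: int, prior=(1.0, 1.0)) -> tuple[float, float]:
    """Beta posterior parameters for a binomial proportion under a Beta(a, b) prior."""
    a, b = prior
    return a + successes, b + (trials - successes)


def prob_increase(early: tuple[int, int], late: tuple[int, int],
                  draws: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(p_late > p_early | data).

    early, late: (occurrences of the new variant, total relevant contexts).
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    a1, b1 = posterior_params(*early)
    a2, b2 = posterior_params(*late)
    # Count how often a posterior draw for the later period exceeds one for the earlier.
    wins = sum(
        rng.betavariate(a2, b2) > rng.betavariate(a1, b1) for _ in range(draws)
    )
    return wins / draws


# Hypothetical counts: new variant in 20 of 100 contexts in an earlier
# period and 45 of 100 in a later one (invented numbers).
a, b = posterior_params(45, 100)
print(f"posterior mean, later period: {a / (a + b):.3f}")
print(f"P(later > earlier | data) ~ {prob_increase((20, 100), (45, 100)):.3f}")
```

In the Bayesian reading described above, a value close to 1 expresses strong certainty that the probability of the new variant rose between the two periods. A trend spanning more than two periods would more naturally be modelled with a regression on date.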
