Who Can Understand “Dunno”? Automatic Assessment of Text Complexity in Children’s Literature

 
PII: S013161170017239-1-1
DOI: 10.31857/S013161170017239-1
Publication type: Article
Status: Published
Authors
Affiliation: V. V. Vinogradov Russian Language Institute of the Russian Academy of Sciences
Address: Russian Federation, Moscow
Affiliation: Novosibirsk State University
Address: Russian Federation, Novosibirsk
Journal name: Russkaya Rech’
Edition: Issue 5
Pages: 55-68
Abstract

The need to assess the readability of a text arises in many situations: drafting legal texts or manuals, writing textbooks, and selecting literature for extracurricular reading. Assessing the readability of educational texts for children is especially interesting, since such texts are expected to satisfy several requirements that may contradict one another. Children should understand these texts well, and the texts should be relevant and interesting; at the same time, they should teach readers new concepts, words and constructions. Currently, age labels are assigned to children's texts manually by experts, which makes the process slow and laborious, and the results are likely to be subjective. We propose a method for automatically classifying texts by complexity using a neural network model. The method is intended for building a corpus of children's literature with target-age markup (within the framework of the Russian National Corpus). The quality of our model's predictions reaches 0.92. An automatic mechanism that estimates the readability level of a text with acceptable accuracy will make it possible to quickly create a representative corpus of texts written for children, from which texts that are clearly understandable to children of a given age can be selected. Such a corpus would be in demand among teachers, parents, translators of fiction, linguists, and anyone else who needs to select fiction that children can understand.

Keywords: corpus linguistics, children's literature, readability, text complexity, machine learning, neural networks
Acknowledgment: This research is supported by RFBR grant No. 19-29-14224.
Received: 12.12.2021
Publication date: 12.12.2021
Number of characters: 21764

