Methods for terminology extraction from scientific texts (based on Earth Sciences articles)



terminology, terminology extraction, topic modeling, scientific communication


The article describes the theoretical and applied aspects of the initial stage of work on the automatic extraction of terms from scientific texts. This stage is part of the state assignment of the Scientific Laboratory of Linguistic and Pedagogical Research on the topic "Linguosemiotic heterogeneity of the scientific picture of the world: theoretical and linguodidactic description". The aim of the research is to extract terms from a prepared corpus of scientific texts relating to a particular subject area. For this purpose, we used a corpus of Earth Sciences texts compiled by random sampling with the Semantic Scholar service. Term extraction by means of automatic text processing (ATP) is a promising area of research, as it simplifies the creation of terminology systems or ontologies for highly specialized subject areas. Given the rapidly changing flow of information, this type of work with texts remains relevant, since it allows large volumes of material to be processed faster and more efficiently. However, automatic term extraction is not always accurate, and its output may contain errors; the results therefore require additional verification and correction. Prospects for the study are related to the improvement of existing automatic text processing tools. In addition, the analysis of the extracted terms has laid the groundwork for further practical research into the creation of a digital product (a digital model of specific terminology systems) for the storage, systematization and use of terminology systems in a particular highly specialized subject area.
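Automatic term extraction is commonly organized as a statistical candidate-generation step followed by expert verification. The sketch below is not the authors' method; it is a minimal illustration, assuming a plain-text corpus, a small hypothetical stopword list and simple frequency ranking, of how candidate n-grams might be collected and how the share of candidates confirmed during manual checking (precision) might be computed. The names `candidate_terms`, `precision`, `STOPWORDS`, `max_len` and `min_freq` are illustrative assumptions, not part of any tool mentioned in the article.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a full list.
STOPWORDS = {"the", "of", "and", "in", "a", "an", "to", "is", "are", "for",
             "on", "by", "with", "from", "as", "at", "this", "that"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (hyphenated words kept whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def candidate_terms(texts, max_len=3, min_freq=2):
    """Count word n-grams of length 1..max_len that neither start nor end with a stopword.

    Returns (candidate, frequency) pairs with frequency >= min_freq, most frequent first.
    """
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip n-grams bounded by function words, e.g. "depends on".
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_freq]


def precision(extracted, verified):
    """Share of extracted candidates that a human expert confirmed as terms."""
    if not extracted:
        return 0.0
    return sum(1 for term in extracted if term in verified) / len(extracted)


if __name__ == "__main__":
    corpus = [
        "Groundwater recharge depends on soil moisture and precipitation.",
        "Soil moisture controls groundwater recharge in arid regions.",
    ]
    found = [term for term, _ in candidate_terms(corpus)]
    print(found)
    print(precision(found, {"groundwater recharge", "soil moisture"}))
```

On these two sample sentences, six candidates pass the frequency filter, including single words such as "recharge"; if only "groundwater recharge" and "soil moisture" are confirmed as terms, precision is 1/3. This illustrates why automatically extracted candidates still need the additional verification and correction mentioned above.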



Philological studies. Theoretical, applied and comparative linguistics