Principles of summarization of large amounts of data: features selection of language means

Authors

  • Liliya Sakaeva

Keywords:

large data volumes, linguistic data summarization, natural language, word

Abstract

Natural language is the main means by which people communicate and make sense of phenomena and processes. The accelerated introduction of information and communication technologies in all areas of society generates large volumes of data containing words, codes, symbols, and numbers. However, only a small part of this data can initially be understood and used by people. Computational approaches exist that use soft computing techniques and natural language to construct summaries, in the form of sentences, that describe large volumes of data. The purpose of this paper is to briefly describe a method used to construct linguistic summaries, emphasizing the use of natural language to offer relevant and comprehensive information about data. First, a background on the linguistic summarization of data is presented. Then two examples of the application of this technique to different databases are analyzed. In both cases, short natural-language sentences are obtained that offer summarized information about the data. In the second example, the beneficiaries showed a satisfaction index of 0.7 according to Iadov's logical framework.
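
As an illustration of the kind of sentence such approaches produce, the following is a minimal sketch of a quantified linguistic summary of the form "Q records are S", in the style commonly associated with Yager's work. It is not the method described in the paper: the data, the fuzzy sets `short` and `most`, and all function names are hypothetical choices made for this example. The truth degree of the sentence is taken as the quantifier's membership of the mean membership of the records in the summarizer.

```python
from typing import Callable, Sequence

Membership = Callable[[float], float]


def trapezoid(a: float, b: float, c: float, d: float) -> Membership:
    """Trapezoidal membership function with support [a, d] and core [b, c]."""
    def mu(x: float) -> float:
        if b <= x <= c:          # inside the core: full membership
            return 1.0
        if x <= a or x >= d:     # outside the support: no membership
            return 0.0
        if x < b:                # rising edge
            return (x - a) / (b - a)
        return (d - x) / (d - c)  # falling edge
    return mu


def most(r: float) -> float:
    """Assumed fuzzy quantifier 'most': 0 up to 0.3, 1 from 0.8, linear between."""
    if r <= 0.3:
        return 0.0
    if r >= 0.8:
        return 1.0
    return (r - 0.3) / (0.8 - 0.3)


def truth_of_summary(values: Sequence[float],
                     summarizer: Membership,
                     quantifier: Membership) -> float:
    """Truth degree of 'Q records are S': Q applied to the mean membership in S."""
    if not values:
        raise ValueError("cannot summarize an empty data set")
    proportion = sum(summarizer(v) for v in values) / len(values)
    return quantifier(proportion)


# Hypothetical data: project durations in days (not taken from the paper).
durations = [5, 8, 12, 15, 30, 7, 9, 18]

# Assumed summarizer 'short': fully short up to 10 days, not short from 20 days.
short = trapezoid(0, 0, 10, 20)

truth = truth_of_summary(durations, short, most)
sentence = f"Most projects are short (truth degree {truth:.2f})"
print(sentence)
```

The resulting sentence compresses the whole column of durations into one statement plus a degree of validity, which is the sense in which a linguistic summary makes a large data set readable.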

References

Pérez, I., Santos, O., García, R., Piñero, P., Ramírez, E. (2018). Discovering linguistic summaries for help in project management // Cuban Journal of Computer Science. Vol. 12. Uciencia. Pp. 163–175. (In English)

Fayyad, U. (2001). Knowledge Discovery in Databases: An Overview // Relational Data Mining. Berlin, Heidelberg: Springer. Pp. 28–47. (In English)

Yager, R.R., Yager, R.L. (2016). Using linguistic summaries and concepts for understanding large data // Engineering Applications of Artificial Intelligence. No 56. Pp. 273–280. (In English)

Kacprzyk, J., Zadrożny, S. (2016). Queries with Fuzzy Linguistic Quantifiers for Data of Variable Quality Using Some Extended OWA Operators // Advances in Intelligent Systems and Computing. Vol 400. Cham: Springer. Pp. 295–305. (In English)

Rodríguez, C.R. (2017). Construction of linguistic summaries of data from criminal processes. Research report (unpublished). Computers and Law Lab at University of Informatics Sciences. Havana, Cuba, 15 p. (In English)

Kuzmina, N.V. (1970). Metody issledovania pedagogicheskoi diiatelnosti [Methods of pedagogical activity research]. Leningrad: Izd-vo LGU. (In Russian)

Published

2019-09-09

Issue

Section

Linguistics and intercultural communication