A national corpus of the Uzbek language will be created...

The Uzbek language is one of the largest languages of the Turkic family, spoken by about 50 million people around the world. In recent years, practical work has been under way to strengthen the position and prestige of the state language at the international level, to define the prospects for its relations with other languages, to create a national corpus of the Uzbek language, and to develop Uzbek language textbooks, electronic programs and instruction for compatriots living abroad and foreign citizens who want to learn Uzbek.
The Concept for the Development of the Uzbek Language and the Improvement of Language Policy in 2020-2030 defines the active integration of the state language into modern information technologies and communications as a priority. The Concept places a great responsibility on us as specialists: to create an electronic national corpus of the Uzbek language that includes all scientific, theoretical and practical information about the language, and to publish and popularize it on the world information network.
Modern information technologies have opened up virtually unlimited opportunities for using the functional capabilities of a language. Machine translation, automatic editing and analysis, text-to-speech synthesizers, speech recognition (speech-to-text) software, electronic dictionaries, linguistic mobile applications, thesauri and language ontologies all demonstrate this. In particular, establishing a culture of creating and using modern electronic dictionaries has proven effective for acquiring language skills. Language corpora, which are being created at a rapid pace around the world, play an unmatched role in presenting and mastering a language.
In this regard, together with nine of our scientific and technical staff, we began practical work on the project "Designing the national corpus of the Uzbek language and developing a software package".
First, we defined the main tasks of the project: analyzing existing foreign national corpora, determining the principles for creating a national corpus of the Uzbek language, formulating software requirements, designing the software and developing its algorithms, and testing and approval.
In the course of this work, a model of the national corpus of the Uzbek language was created based on the analysis of existing foreign national corpora. Models and algorithms were developed for automatic text processing, tokenization, lemmatization and grammatical classification. The website uzbekcorpora.uz has been launched to make the national corpus of the Uzbek language available via the Internet.
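To give a sense of what tokenization and lemmatization involve, here is a minimal Python sketch. It is not the project's algorithm: the function names, the suffix lists and the strip order (case, then possessive, then plural) are illustrative assumptions, and a real lemmatizer would need a lexicon and a much fuller treatment of Uzbek morphology.

```python
import re

# Apostrophe-like characters that can occur inside Uzbek Latin words
# (as in o', g' and the tutuq belgisi). Assumed set, for illustration only.
APOSTROPHES = "'ʻʼ‘’"

# A token is a word (letters, optionally with internal apostrophes),
# a run of digits, or a single punctuation mark.
TOKEN_RE = re.compile(rf"[^\W\d_]+(?:[{APOSTROPHES}][^\W\d_]*)*|\d+|[^\w\s]")

# Hypothetical, heavily simplified suffix groups for nouns.
CASE = ("ning", "dan", "ni", "ga", "da")
POSSESSIVE = ("imiz", "ingiz", "im", "ing", "i")
PLURAL = ("lar",)


def tokenize(text):
    """Split text into word, number and punctuation tokens."""
    return TOKEN_RE.findall(text)


def strip_one(word, suffixes, min_stem=3):
    """Remove the first matching suffix, keeping at least min_stem characters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def naive_lemma(token):
    """Lowercase the token and strip one case, possessive and plural suffix."""
    word = token.lower()
    for group in (CASE, POSSESSIVE, PLURAL):
        word = strip_one(word, group)
    return word


if __name__ == "__main__":
    for token in tokenize("Kitoblarimizda o'zbek tili."):
        print(token, "->", naive_lemma(token))
```

With this sketch, "Kitoblarimizda" is reduced to "kitob" and "tili" to "til". Pure suffix stripping also over-strips words whose stems happen to end in one of these strings, and the tokenizer attaches a closing quotation mark to the preceding word, which is why production systems combine rules with dictionaries.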
The main result of the research will be a software package for creating a national corpus of the Uzbek language. Intended for specialists in corpus linguistics and built on modern Internet technologies, it makes it possible to create author-specific or thematic corpora, as well as the Uzbek national corpus, from selected Uzbek texts. The software has been tested by building a corpus from the texts of the Alpomish epic and is ready for use.
As a result of the practical work carried out in this direction, about 10 articles were published in foreign and local journals, 24 papers were published in conference proceedings, and 5 software certificates were obtained. In the next stages of the work, a corpus will be formed from a collection of selected Uzbek texts, and programs will be created that allow various kinds of scientific research to be conducted on the texts included in the corpus.
Suyun Karimov,
Professor of Samarkand State University.