G. Ibragimov Institute of Language, Literature and Art


Language: Tatar
Size: 15.7 million word occurrences
Number of sentences: 1 831 043
Number of word forms: 518 251
Number of lemmas: 219 433
Number of texts: 13 943

The Corpus includes prose and poetry by Tatar authors, texts of particular folklore genres, and works translated into Tatar from other languages. The texts date from the 19th century to the present.

Each work in the Corpus has metatextual markup that records the author, the title and genre of the work, and the time of its creation.

Most of the words in the Corpus are morphologically annotated with their lemmas, parts of speech, and grammatical features.

The Corpus materials are intended for philologists, language teachers, students, and schoolchildren, and will be useful to a wide range of people interested in the Tatar language and Tatar literature.

The developers express their deep gratitude to the publishing houses and foundations that provided electronic versions of texts for the Corpus!