TATAR BELLETRISTIC LITERATURE CORPUS
Language: Tatar
Size: 15 million word occurrences
Number of sentences: 1.8 million
The Corpus includes prose and poetry by Tatar authors, texts of certain folklore genres, and works translated from other languages into Tatar. The texts date from the 19th century to the present.
Each work in the Corpus has metadata markup that records the author, the title and genre of the work, and the time of its creation.
Most words in the Corpus are morphologically annotated with their lemmas, parts of speech, and grammatical features.
The Corpus materials are intended for philologists, language teachers, students, and schoolchildren, and will be useful to a wide range of people interested in the Tatar language and Tatar literature.
The developers express their deep gratitude to the publishing teams and funds that provided electronic versions of texts for the Corpus!