
About

A corpus is a systematic, computerized, and accurate collection of language data. Many kinds of linguistic research, and many decisions in language planning, are possible only with a linguistic corpus. The IranDak corpus was created at the Research Institute of Information Science and Technology of Iran and contains nearly 4,780,000 words. Its data are taken from articles published in Information Processing and Management Research. The corpus is not general-domain. It consists of highly specialized, interdisciplinary writing in fields such as information science and epistemology, information technology, knowledge management, computational linguistics, and terminology. This makes it especially valuable for tasks that require specialized text.

Corpus Features

Highly Specialized Texts

The IranDak corpus contains nearly 4,780,000 words. Its content is not general-domain but highly specialized and interdisciplinary, covering fields such as library and information science, information technology, knowledge management, information science and epistemology, computational linguistics, and terminology.

Effective Search

Search results show each occurrence of the query word or phrase in its linguistic context, together with the title of the article in which it appears, the article's subject, its author(s), and the frequency of the word or phrase.
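A results view of this kind resembles a keyword-in-context (KWIC) concordance. The Python sketch below shows one way such a lookup could work; it is not IranDak's implementation. The `Article` and `Hit` records, the `concordance` function, and its `window` parameter are hypothetical, and the whitespace tokenization is a simplification (real Persian text would need a tokenizer that handles, for example, the zero-width non-joiner).

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical article record; field names are illustrative only.
    title: str
    subject: str
    authors: list[str]
    text: str


@dataclass
class Hit:
    # One occurrence of the query, with its context and article metadata.
    left: str
    match: str
    right: str
    title: str
    subject: str
    authors: list[str]


def concordance(articles: list[Article], query: str, window: int = 3) -> tuple[list[Hit], int]:
    """Find every exact (case-sensitive) occurrence of a word or phrase.

    Returns the KWIC hits and the total frequency across all articles.
    Tokenization is a naive whitespace split (an assumption for this sketch).
    """
    q = query.split()
    if not q:
        return [], 0
    n = len(q)
    hits: list[Hit] = []
    for art in articles:
        tokens = art.text.split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == q:
                hits.append(Hit(
                    left=" ".join(tokens[max(0, i - window):i]),
                    match=" ".join(tokens[i:i + n]),
                    right=" ".join(tokens[i + n:i + n + window]),
                    title=art.title,
                    subject=art.subject,
                    authors=art.authors,
                ))
    return hits, len(hits)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the IranDak corpus.
    articles = [
        Article("Term extraction", "terminology", ["A. Author"],
                "automatic term extraction relies on a linguistic corpus of specialized text"),
        Article("Knowledge maps", "knowledge management", ["B. Author", "C. Author"],
                "a linguistic corpus supports knowledge management research"),
    ]
    hits, freq = concordance(articles, "linguistic corpus", window=2)
    for h in hits:
        print(f"{h.left} [{h.match}] {h.right} | {h.title} | {h.subject} | {', '.join(h.authors)}")
    print("frequency:", freq)  # frequency: 2
```

Here the frequency is simply the number of matches across all articles; a real system might also report per-article counts or normalized frequencies.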

Comprehensive Tags

The corpus is annotated with part-of-speech (POS) tags, which are used in natural language processing. These tags mark the grammatical category of each word (such as noun, adjective, or adverb).
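To illustrate how POS tags can be used once a corpus is annotated, the sketch below assumes a hypothetical representation in which each token is a `(word, tag)` pair. The tag labels (`N`, `ADJ`, `ADV`, `V`) and the helper functions are illustrative assumptions; the page does not specify IranDak's actual tag set or file format.

```python
from collections import Counter

# Hypothetical token representation: (word, POS tag).
# Tag labels are illustrative, not IranDak's actual tag set.
TaggedToken = tuple[str, str]


def words_with_tag(tokens: list[TaggedToken], tag: str) -> list[str]:
    """Return, in order, the words annotated with the given POS tag."""
    return [word for word, t in tokens if t == tag]


def tag_distribution(tokens: list[TaggedToken]) -> Counter:
    """Count how many tokens carry each POS tag."""
    return Counter(t for _, t in tokens)


if __name__ == "__main__":
    # Toy tagged sentence for illustration only.
    sentence = [
        ("specialized", "ADJ"), ("corpora", "N"), ("greatly", "ADV"),
        ("support", "V"), ("terminology", "N"), ("research", "N"),
    ]
    print(words_with_tag(sentence, "N"))              # ['corpora', 'terminology', 'research']
    print(tag_distribution(sentence).most_common(1))  # [('N', 3)]
```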
