Morphological Analysis Corpus Construction of Uyghur

Abudouwaili Gulinigeer; Abiderexiti Kahaerjiang; Wushouer Jiamila; Shen Yunfei; Maimaitimin Turenisha; Yibulayin Tuergen

Abstract

Morphological analysis is a fundamental task in natural language processing, and its results can be applied to downstream tasks such as named entity recognition, syntactic analysis, and machine translation. However, morphological analysis still faces many problems, such as low accuracy caused by a lack of resources. In this paper, to alleviate the lack of resources in Uyghur morphological analysis research, we construct a Uyghur morphological analysis corpus based on an analysis of grammatical features and the format of general morphological analysis corpora. We define morphological tags covering 14 dimensions and 53 features, and manually annotate and correct the dataset. The resulting corpus provides information such as the word, lemma, part of speech, morphological analysis tags, morphological segmentation, and lemmatization. We also analyze some basic features of the corpus and use the models and datasets provided by the SIGMORPHON Shared Task organizers to design comparative experiments that verify the corpus's usability. The experimental results are 85.56% and 88.29%, respectively. The corpus provides a reference for morphological analysis and promotes research on Uyghur natural language processing.

Anthology ID: 2021.ccl-1.96
Volume: Proceedings of the 20th Chinese National Conference on Computational Linguistics
Month: August
Year: 2021
Address: Huhhot, China
Editors: Sheng Li (李生), Maosong Sun (孙茂松), Yang Liu (刘洋), Hua Wu (吴华), Kang Liu (刘康), Wanxiang Che (车万翔), Shizhu He (何世柱), Gaoqi Rao (饶高琦)
Venue: CCL
Publisher: Chinese Information Processing Society of China
Pages: 1076–1086
Language: English
URL: https://aclanthology.org/2021.ccl-1.96/
Cite (ACL): Abudouwaili Gulinigeer, Abiderexiti Kahaerjiang, Wushouer Jiamila, Shen Yunfei, Maimaitimin Turenisha, and Yibulayin Tuergen. 2021. Morphological Analysis Corpus Construction of Uyghur. In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 1076–1086, Huhhot, China. Chinese Information Processing Society of China.
Cite (Informal): Morphological Analysis Corpus Construction of Uyghur (Gulinigeer et al., CCL 2021)
PDF: https://aclanthology.org/2021.ccl-1.96.pdf
