Chenhui Dou


2024

pdf bib
Improving Chinese Named Entity Recognition with Multi-grained Words and Part-of-Speech Tags via Joint Modeling
Chenhui Dou | Chen Gong | Zhenghua Li | Zhefeng Wang | Baoxing Huai | Min Zhang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Nowadays, character-based sequence labeling becomes the mainstream Chinese named entity recognition (CNER) approach, instead of word-based methods, since the latter degrades performance due to propagation of word segmentation (WS) errors. To make use of WS information, previous studies usually learn CNER and WS simultaneously with multi-task learning (MTL) framework, or treat WS information as extra guide features for CNER model, in which the utilization of WS information is indirect and shallow. In light of the complementary information inside multi-grained words, and the close connection between named entities and part-of-speech (POS) tags, this work proposes a tree parsing approach for joint modeling CNER, multi-grained word segmentation (MWS) and POS tagging tasks simultaneously. Specifically, we first propose a unified tree representation for MWS, POS tagging, and CNER.Then, we automatically construct the MWS-POS-NER data based on the unified tree representation for model training. Finally, we present a two-stage joint tree parsing framework. Experimental results on OntoNotes4 and OntoNotes5 show that our proposed approach of jointly modeling CNER with MWS and POS tagging achieves better or comparable performance with latest methods.