Issue: Wuhan Univ. J. Nat. Sci., Volume 28, Number 4, August 2023
Page(s): 299-308
DOI:
Published online: 06 September 2023
Computer Science
CLC number: TP183
EPT: Data Augmentation with Embedded Prompt Tuning for Low-Resource Named Entity Recognition
College of Informatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China
† To whom correspondence should be addressed. E-mail:
Data augmentation methods are often used to address data scarcity in natural language processing (NLP). However, token-label misalignment, in which tokens in augmented sentences are matched with incorrect entity labels, prevents data augmentation methods from achieving high scores on token-level tasks such as named entity recognition (NER). In this paper, we propose embedded prompt tuning (EPT), a novel data augmentation approach for low-resource NER. To address token-label misalignment, we implicitly embed NER labels as prompts into the hidden layer of a pre-trained language model, so that masked entity tokens can be predicted by the fine-tuned EPT. As a result, EPT can generate high-quality, highly diverse data containing various entities, which improves NER performance. Since cross-domain NER datasets are available, we also explore NER domain adaptation with EPT. The experimental results show that EPT achieves substantial improvements over the baseline methods on low-resource NER tasks.
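The token-label misalignment problem, and why refilling masked entity positions avoids it, can be illustrated with a toy sketch. This is not the EPT model: the function names, the example sentence, and the lookup table passed as `predict` (a hypothetical stand-in for a label-conditioned masked language model) are assumptions made only for illustration. Replacing one token with a multi-word phrase shifts every later token out of step with its BIO label, whereas masking entity tokens and refilling each position with exactly one token keeps the original label sequence valid.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"


def naive_replace(tokens: List[str], labels: List[str], i: int,
                  phrase: str) -> Tuple[List[str], List[str]]:
    """Swap token i for a phrase but keep the original label sequence.
    A multi-word phrase pushes every later token out of step with its label."""
    new_tokens = tokens[:i] + phrase.split() + tokens[i + 1:]
    return new_tokens, list(labels)


def mask_entities(tokens: List[str], labels: List[str]) -> List[str]:
    """Replace every token whose label is not 'O' with the mask symbol."""
    return [MASK if lab != "O" else tok for tok, lab in zip(tokens, labels)]


def refill(masked: List[str], labels: List[str],
           predict: Callable[[str], str]) -> Tuple[List[str], List[str]]:
    """Fill each mask with exactly one token chosen for that position's label.
    The sentence length never changes, so the labels stay aligned."""
    filled = [predict(lab) if tok == MASK else tok
              for tok, lab in zip(masked, labels)]
    return filled, list(labels)


if __name__ == "__main__":
    tokens = ["Alice", "visited", "Wuhan", "yesterday"]
    labels = ["B-PER", "O", "B-LOC", "O"]

    bad_tokens, bad_labels = naive_replace(tokens, labels, 2, "New York")
    # zip stops at the shorter list: "York" is paired with "O",
    # and "yesterday" is left without any label.
    print(list(zip(bad_tokens, bad_labels)))

    # Hypothetical stand-in for a label-conditioned masked language model.
    lexicon = {"B-PER": "Bob", "B-LOC": "Paris"}
    masked = mask_entities(tokens, labels)
    good_tokens, good_labels = refill(masked, labels, lambda lab: lexicon[lab])
    print(list(zip(good_tokens, good_labels)))
```

In a real system the lookup table would be replaced by a model that proposes diverse entity tokens conditioned on the label; the point of the sketch is only that a one-token-per-masked-position refill cannot desynchronize tokens and labels.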
Key words: data augmentation / token-label misalignment / named entity recognition / pre-trained language model / prompt
Biography: YU Hongfei, male, master's candidate, research directions: reinforcement learning, natural language processing. E-mail:
