Issue: Wuhan Univ. J. Nat. Sci., Volume 28, Number 4, August 2023
Page(s): 299-308
DOI: https://doi.org/10.1051/wujns/2023284299
Published online: 06 September 2023
Computer Science
CLC number: TP183
EPT: Data Augmentation with Embedded Prompt Tuning for Low-Resource Named Entity Recognition
College of Informatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China
† To whom correspondence should be addressed. E-mail: yhuang@mail.hzau.edu.cn
Received: 10 March 2023
Data augmentation methods are often used to address data scarcity in natural language processing (NLP). However, token-label misalignment, in which tokens are paired with incorrect entity labels in the augmented sentences, prevents these methods from achieving high scores on token-level tasks such as named entity recognition (NER). In this paper, we propose embedded prompt tuning (EPT) as a novel data augmentation approach for low-resource NER. To address token-label misalignment, we implicitly embed NER labels as prompts into the hidden layers of a pre-trained language model, so that masked entity tokens can be predicted by the fine-tuned EPT model. Hence, EPT can generate high-quality and highly diverse data containing various entities, which improves NER performance. Since cross-domain NER datasets are available, we also explore NER domain adaptation with EPT. The experimental results show that EPT achieves substantial improvements over the baseline methods on low-resource NER tasks.
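The abstract's central mechanism, embedding NER labels as prompts inside a pre-trained language model so that masked entity tokens can be regenerated for augmentation, can be illustrated with a small sketch. The code below is not the authors' implementation: it adds a learnable prompt vector per label at the embedding layer (a simplification of the hidden-layer injection described above), and the backbone name, tag set, and example sentence are assumptions made purely for illustration.

```python
# Illustrative sketch (not the authors' released code) of label-prompt
# masked-entity generation for NER data augmentation.
# Assumptions: a BERT-style masked language model from Hugging Face
# transformers; the tag set, model name, and example sentence are made up.

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForMaskedLM

LABELS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]  # illustrative tag set

class LabelPromptMLM(nn.Module):
    """Adds a learnable prompt vector per NER label to the token embeddings,
    then lets the masked language model predict the masked entity tokens."""

    def __init__(self, model_name: str = "bert-base-cased"):
        super().__init__()
        self.mlm = AutoModelForMaskedLM.from_pretrained(model_name)
        hidden = self.mlm.config.hidden_size
        self.label_prompt = nn.Embedding(len(LABELS), hidden)

    def forward(self, input_ids, attention_mask, label_ids, mlm_labels=None):
        # Word embeddings from the backbone.
        embeds = self.mlm.get_input_embeddings()(input_ids)
        # Embed each token's NER label as a prompt by adding it to the
        # corresponding token embedding (one way to realize "embedded prompt").
        embeds = embeds + self.label_prompt(label_ids)
        return self.mlm(inputs_embeds=embeds,
                        attention_mask=attention_mask,
                        labels=mlm_labels)

# Toy usage: mask an entity token and ask the model to regenerate it.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = LabelPromptMLM()

text = "John lives in Paris"
enc = tokenizer(text, return_tensors="pt")
input_ids = enc["input_ids"].clone()

# One illustrative label id per word piece ([CLS]/[SEP] treated as "O").
label_ids = torch.tensor([[0, 1, 0, 0, 3, 0]])  # O B-PER O O B-LOC O

# Mask the location entity; during fine-tuning the original token is the target.
mlm_labels = torch.full_like(input_ids, -100)
mlm_labels[0, 4] = input_ids[0, 4]
input_ids[0, 4] = tokenizer.mask_token_id

out = model(input_ids, enc["attention_mask"], label_ids, mlm_labels)
print(out.loss)  # fine-tune on this loss, then sample replacements for masks
```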
Key words: data augmentation / token-label misalignment / named entity recognition / pre-trained language model / prompt
Biography: YU Hongfei, male, Master candidate, research directions: reinforcement learning, natural language processing. E-mail: hfyu@webmail.hzau.edu.cn
© Wuhan University 2023
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.