E-resources
Full text
Peer-reviewed
  • Improving few-shot named en...
    Zhou, Diange; Li, Shengwen; Chen, Qizhi; Yao, Hong

    Neurocomputing (Amsterdam), 09/2024, Volume: 597
    Journal Article

    Named entity recognition (NER) identifies and categorizes entities in unstructured text and serves as a fundamental task for a variety of natural language processing (NLP) applications. In particular, emerging few-shot NER methods, which aim to learn model parameters well from only a few samples, have received considerable attention. Dominant few-shot NER methods usually employ pre-trained language models (PLMs) as their basic architecture and fine-tune the model parameters on a few NER samples. Because the sample size is small and PLMs contain a large number of parameters, fine-tuning may leave the PLM parameters highly biased. To address this issue, this study introduces semantic distribution distance constraints to optimize the fine-tuning of few-shot NER models and develops a framework named Semantic Constraints on few-shot Named Entity Recognition (SCNER). Specifically, the framework formulates the general knowledge transfer of PLMs as an optimal transport procedure with a semantic prior, and a Semantics-induced Optimal Transport (SOT) regularizer is developed to exploit the importance and similarities of tokens within sentences. SOT builds the semantic distribution of the sentence and defines transport costs between tokens to realize a token-level optimal transport procedure. Finally, SOT is employed as a regularization term for few-shot NER, introducing the semantic distribution distance constraint to effectively transfer general knowledge from PLMs. Experiments on four public datasets demonstrate that the proposed method significantly improves NER performance in both few-shot and fully supervised scenarios. SCNER is a general framework that can be applied to a variety of models without adding learning parameters, and it can enhance the generalization ability and adaptability of various few-shot NER models.
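    To make the abstract's idea of a token-level optimal-transport regularizer more concrete, the sketch below computes an entropic (Sinkhorn-style) OT distance between the token embeddings a fine-tuned model produces for a sentence and those of the frozen PLM, with token-importance weights defining the semantic distributions. This is a minimal illustration only: the function names, the cosine transport cost, the uniform reference distribution, and the Sinkhorn solver are all assumptions for exposition, not the paper's actual SOT formulation.

    ```python
    import numpy as np

    def sinkhorn(a, b, C, eps=0.1, n_iter=200):
        """Entropic-regularized OT plan between histograms a and b under cost C."""
        K = np.exp(-C / eps)          # Gibbs kernel
        u = np.ones_like(a)
        for _ in range(n_iter):       # alternating scaling updates
            v = b / (K.T @ u)
            u = a / (K @ v)
        return u[:, None] * K * v[None, :]   # transport plan P

    def sot_regularizer(emb_ft, emb_plm, importance):
        """Illustrative OT distance between fine-tuned and frozen PLM token
        embeddings of one sentence (a hypothetical stand-in for SOT)."""
        # cosine transport cost between token pairs
        ef = emb_ft / np.linalg.norm(emb_ft, axis=1, keepdims=True)
        ep = emb_plm / np.linalg.norm(emb_plm, axis=1, keepdims=True)
        C = 1.0 - ef @ ep.T
        # token-importance weights define the sentence's semantic distribution
        a = importance / importance.sum()
        b = np.ones(emb_plm.shape[0]) / emb_plm.shape[0]
        P = sinkhorn(a, b, C)
        # OT distance <P, C> would be added to the NER loss as a regularizer
        return float((P * C).sum())
    ```

    In such a setup the returned distance would be added to the task loss, so that fine-tuning on few samples is penalized for drifting far, in semantic-distribution terms, from the general knowledge encoded in the PLM.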
    • This paper highlights the contribution of semantic distribution distance constraints to general knowledge transfer.
    • The proposed approach is a general framework that adds no extra learning parameters.
    • Extensive experiments show the approach significantly improves the performance of the baseline models.