Journal of Scientific Information Research
Keywords
named entity recognition, BERT, boundary-aware, GAT, syntax enhancement
Abstract
[Purpose/significance] Traditional named entity recognition models built on character-level modeling perceive entity boundaries poorly. This study addresses the problem by using a multi-head graph attention network with dense connections to integrate syntax information, which carries entity boundary features, into the task, thereby improving recognition performance.
[Method/process] This study proposes a Syntax-enhanced Boundary-aware Named Entity Recognition Model (SynBNER), which uses BERT for semantic representation of the text and a densely connected graph attention network to integrate syntax information. In this way, the entity boundary information implicit in the syntax is incorporated into word representations, strengthening the model's perception of entity boundaries.
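The idea of a densely connected multi-head graph attention network over a syntax graph can be illustrated with a minimal sketch. It is not the SynBNER implementation: the abstract does not give the layer count, head count, dimensions, or how the dependency graph is built, so all of those are assumptions here. The functions `gat_head` and `dense_gat` are hypothetical names, the weights are random and untrained, and random vectors stand in for BERT token representations. The attention scores follow the standard graph attention formulation, and each layer receives the concatenation of the input and all earlier layer outputs (the dense connection).

```python
import numpy as np

def masked_softmax(scores, mask):
    """Row-wise softmax restricted to positions where mask is True."""
    scores = np.where(mask, scores, -1e9)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores) * mask
    return weights / weights.sum(axis=-1, keepdims=True)

def gat_head(h, adj, W, a_src, a_dst):
    """One graph attention head: attend only over syntactic neighbours."""
    z = h @ W                                        # (n, head_dim)
    # e_ij = LeakyReLU(a_src . z_i + a_dst . z_j)
    e = (z @ a_src)[:, None] + (z @ a_dst)[None, :]
    e = np.where(e > 0, e, 0.2 * e)
    alpha = masked_softmax(e, adj)                   # (n, n) attention weights
    return alpha @ z

def dense_gat(h0, adj, num_layers=2, num_heads=2, head_dim=4, seed=0):
    """Stack GAT layers; each layer sees all previous outputs (dense connection)."""
    rng = np.random.default_rng(seed)                # random, untrained weights
    features = [h0]
    for _ in range(num_layers):
        x = np.concatenate(features, axis=-1)
        heads = []
        for _ in range(num_heads):
            W = rng.normal(scale=0.1, size=(x.shape[1], head_dim))
            a_src = rng.normal(scale=0.1, size=head_dim)
            a_dst = rng.normal(scale=0.1, size=head_dim)
            heads.append(gat_head(x, adj, W, a_src, a_dst))
        features.append(np.tanh(np.concatenate(heads, axis=-1)))
    return np.concatenate(features, axis=-1)

# Toy sentence of 4 tokens with assumed dependency arcs (head, dependent).
n = 4
adj = np.eye(n, dtype=bool)                          # self-loops keep every row non-empty
for i, j in [(0, 1), (1, 2), (1, 3)]:
    adj[i, j] = adj[j, i] = True                     # treat arcs as undirected

h0 = np.random.default_rng(1).normal(size=(n, 8))    # stand-in for BERT outputs
out = dense_gat(h0, adj)
print(out.shape)
```

With these assumed settings, each token ends up with its original 8-dimensional vector plus two 8-dimensional syntax-aware layers, which a downstream tagger could then consume.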
[Result/conclusion] Experiments are conducted on the ACE2005, MSRA, and People's Daily datasets, on which the SynBNER model achieves F1 scores of 86.11%, 96.03%, and 95.81%, respectively. The results demonstrate that enhancing entity boundary awareness with syntax information can significantly improve the effectiveness of named entity recognition.
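To make the link between boundary perception and the reported F1 metric concrete, the following sketch computes an entity-level F1 score from BIO tag sequences. It assumes the common exact-match convention, in which a predicted entity counts only when both its boundaries and its type match the gold annotation; the abstract does not state which evaluation protocol or tagging scheme was used, and `extract_entities` and `entity_f1` are hypothetical helper names.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):     # sentinel closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if not inside:
            if start is not None:
                spans.add((start, i, etype))         # end index is exclusive
            start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans

def entity_f1(gold_tags, pred_tags):
    """Exact-match entity-level F1: boundaries and type must both agree."""
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "I-LOC"]
pred = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]   # LOC right boundary is wrong
print(entity_f1(gold, pred))
```

In this toy case the location entity is found but cut one token short, so under exact matching it counts as an error for both precision and recall. This is why boundary mistakes weigh heavily on entity-level F1.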
First Page
12
Last Page
23
Submission Date
15-Oct-2024
Revision Date
25-Nov-2024
Acceptance Date
26-Nov-2024
Published Date
01-Jul-2025
Digital Object Identifier (DOI)
10.19809/j.cnki.kjqbyj.2025.03.002
Recommended Citation
YU, Chuanming; DENG, Bin; and ZHANG, Zhengang (2025) "Syntax-enhanced Boundary-aware Named Entity Recognition Model," Journal of Scientific Information Research: Vol. 7: Iss. 3, Article 2.
DOI: 10.19809/j.cnki.kjqbyj.2025.03.002
Available at: https://eng.kjqbyj.com/journal/vol7/iss3/2
Included in
Applied Statistics Commons, Artificial Intelligence and Robotics Commons, Cataloging and Metadata Commons, Categorical Data Analysis Commons, Computational Linguistics Commons, Computer and Systems Architecture Commons, Databases and Information Systems Commons, Data Storage Systems Commons, Information Literacy Commons, Information Security Commons, Multivariate Analysis Commons, Probability Commons, Social and Cultural Anthropology Commons, Software Engineering Commons, Systems Architecture Commons, Theory and Algorithms Commons