Journal of Scientific Information Research
Keywords
entity relation extraction; method entity; automatic generation; syntactic template; seed learning
Abstract
[Purpose/significance] Method entities are related to many other kinds of entities, such as application scenarios, problems, and organizations. Extracting these relations helps capture technology development trends and strengthen innovation capability. [Method/process] This paper presents a method for extracting method entities and their relations based on automatically generated syntactic templates. A newly designed adaptive template improves flexibility and adaptability and reduces dependence on large-scale labeled data. Starting from a small number of seed triples, the method iteratively generates syntactic templates and uses them to extract method entities and relations from blogs on the artificial intelligence topic of CSDN. A filter model is then applied to improve extraction quality. [Result/conclusion] After 5 rounds of iterative extraction, the model's triple extraction accuracy reaches 55.2%, which is better than existing general-purpose models. The results show that the method makes effective use of limited labeled data to extract method entities and their relations in specific domains, and can support scientific and technological information analysis in academia and industry.
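The iterative seed-learning loop summarized in the abstract can be sketched roughly as below. This is a minimal illustration under loud assumptions, not the authors' implementation: each "template" here is just the surface token span between a known head and tail entity, whereas the paper designs adaptive syntactic templates; `keep` is a hypothetical stand-in for the filter model; and the function names, the four-token span limit, the method lexicon, and the toy English sentences are all invented for illustration (the paper works on Chinese CSDN blog text).

```python
from typing import Callable

Triple = tuple[str, str, str]                # (head entity, relation, tail entity)
Template = tuple[str, tuple[str, ...]]       # (relation, tokens between head and tail)


def learn_templates(sentences: list[list[str]], triples: set[Triple]) -> set[Template]:
    """Induce templates from sentences where a known head precedes its tail.

    Simplification (assumption): a template is the surface token span between
    the two entities, not a dependency-based syntactic template.
    """
    templates: set[Template] = set()
    for tokens in sentences:
        for head, rel, tail in triples:
            if head in tokens and tail in tokens:
                i, j = tokens.index(head), tokens.index(tail)  # first occurrences only
                if 1 < j - i <= 4:  # hypothetical cap; skip empty spans, which match anything
                    templates.add((rel, tuple(tokens[i + 1 : j])))
    return templates


def apply_templates(sentences: list[list[str]], templates: set[Template]) -> set[Triple]:
    """Extract candidate triples wherever a template's middle span occurs."""
    found: set[Triple] = set()
    for tokens in sentences:
        for rel, middle in templates:
            n = len(middle)
            for i in range(len(tokens) - n - 1):
                if tuple(tokens[i + 1 : i + 1 + n]) == middle:
                    found.add((tokens[i], rel, tokens[i + 1 + n]))
    return found


def bootstrap(
    sentences: list[list[str]],
    seeds: set[Triple],
    keep: Callable[[Triple], bool],
    rounds: int = 5,  # the abstract reports results after 5 rounds
) -> set[Triple]:
    """Alternate template induction and extraction, filtering new triples each round."""
    known = set(seeds)
    for _ in range(rounds):
        templates = learn_templates(sentences, known)
        candidates = apply_templates(sentences, templates) - known
        accepted = {t for t in candidates if keep(t)}  # stand-in for the filter model
        if not accepted:
            break  # nothing new survived the filter, so stop early
        known |= accepted
    return known


# Toy usage with invented data.
sentences = [
    ["BERT", "is", "applied", "to", "NER"],
    ["CNN", "is", "applied", "to", "classification"],
    ["LSTM", "is", "applied", "to", "translation"],
    ["ResNet", "is", "used", "for", "detection"],
]
seeds = {("BERT", "applied_to", "NER")}
method_lexicon = {"BERT", "CNN", "ResNet"}  # hypothetical filter resource
result = bootstrap(sentences, seeds, keep=lambda t: t[0] in method_lexicon)
print(sorted(result))
# [('BERT', 'applied_to', 'NER'), ('CNN', 'applied_to', 'classification')]
```

In this toy run the single seed yields one template ("is applied to"), which proposes two new triples; the stand-in filter rejects the LSTM triple, and the loop stops in the second round once no new triple survives. The filtering step is what keeps noisy extractions from feeding back into later rounds of template generation.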
First Page: 30
Last Page: 40
Submission Date: 09-Aug-2024
Revision Date: 09-Oct-2024
Acceptance Date: 13-Nov-2024
Published Date: 01-Jan-2025
Digital Object Identifier (DOI): 10.19809/j.cnki.kjqbyj.2025.01.003
Recommended Citation
LI, Kuiliang and HUANG, Bolin (2025) "Method Entity and Relation Extraction Based on Automatically Generated Syntactic Templates: A Case Study of CSDN Artificial Intelligence Blog," Journal of Scientific Information Research: Vol. 7: Iss. 1, Article 3.
DOI: 10.19809/j.cnki.kjqbyj.2025.01.003
Available at: https://eng.kjqbyj.com/journal/vol7/iss1/3