
Scientific Information Research

Keywords

BERT; major infectious diseases; decision support; entity recognition; deep learning; public health

Abstract

[Purpose/significance] Decision support for major public health emergencies has received little research attention, and the relevant knowledge bases remain underdeveloped. This study builds a corpus of entity knowledge about major infectious diseases in order to provide decision support for major infectious disease events. [Method/process] Taking infectious disease news from People's Daily Online, infectious disease entries from Baidu Encyclopedia, and the Chinese Academy of Sciences virus case database as research objects, the study uses a conditional random field model, a recurrent neural network model, and the pre-trained text representation model BERT to identify and analyze entity knowledge of major infectious disease events in public health emergencies, and analyzes the temporal evolution of infectious diseases through visualization. [Result/conclusion] A BERT-based model for automatically extracting entity knowledge of major infectious disease events in public health was constructed; its precision, recall, and F1 score (the harmonic mean of the two) reached 84.09%, 87.71%, and 85.86%, respectively. The model can provide timely, reliable, and effective information for decision-making by relevant departments.
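
The third figure reported in the abstract can be checked from the first two, assuming the "reconciliation average" in the original wording is the usual harmonic mean (F1) of precision and recall. The sketch below is only an arithmetic check; the function name `f1_score` is an illustrative choice and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Both arguments must be in the same units (here, percentages).
    Assumption: the abstract's third metric is the standard F1 score.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract, in percent.
    print(f"{f1_score(84.09, 87.71):.2f}")  # prints 85.86
```

The result matches the 85.86% stated in the abstract, which supports reading the first reported figure as precision.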

First Page

23

