
Scientific Information Research

Keywords

K-means; autoencoder; topic model; topic detection and topic tracking

Abstract

[Purpose/significance] Exploring how topics in the literature evolved during COVID-19 can reveal the hot topics and evolution paths in various fields in detail, and can also provide decision support for the government's emergency response. [Method/process] In this paper, topic detection and tracking (TDT) technology was introduced to automatically detect and track topics in the literature and to mine their distribution and evolution paths. Text features were extracted by combining an autoencoder with Word2vec, the topic evolution process was studied with K-means clustering and cosine similarity, and the topic model was optimized in combination with the LDA model. [Result/conclusion] The experimental results showed that the topic words of the literature changed significantly over time, which is consistent with reality. In the early stage, topics concentrated on the epidemic in Wuhan, and attention then gradually shifted from "remote labor" to "vaccine". The research focused on epidemic prevention and control, economic public opinion, and medical and health care. Introducing TDT makes it possible to systematically detect and track topics in the COVID-19 literature, and the multi-dimensional topic model adapts better to changing research topics.
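
The abstract names cosine similarity as the measure used to follow topics over time but gives no implementation details. The sketch below is one plausible reading, not the authors' code: it assumes each time window has already been clustered (for example by K-means) into topic centroids, and it links each topic in the later window to its most similar topic in the earlier window. The function names, the 0.8 threshold, and the toy 3-dimensional centroids are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def track_topics(prev_centroids, curr_centroids, threshold=0.8):
    """Link each current topic to its most similar previous topic.

    Returns one entry per current topic: the index of the matched
    previous topic, or None when no previous topic reaches the
    threshold (interpreted here as a newly emerging topic).
    The threshold value is an assumption, not taken from the paper.
    """
    links = []
    for curr in curr_centroids:
        sims = [cosine_similarity(curr, prev) for prev in prev_centroids]
        best = max(range(len(sims)), key=sims.__getitem__) if sims else None
        if best is not None and sims[best] >= threshold:
            links.append(best)
        else:
            links.append(None)
    return links


# Toy centroids for two consecutive time windows (hypothetical values).
window_1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
window_2 = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
links = track_topics(window_1, window_2)
```

In this toy case the first topic of the later window continues the first topic of the earlier window, while the second has no sufficiently similar predecessor and is treated as new. In practice the centroids would come from the learned text features rather than hand-written vectors.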

First Page

49

