
Journal of Scientific Information Research

Keywords

topic evolution; BERTopic; semantic function; topic identification

Abstract

[Purpose/significance] Topic evolution analysis can help researchers quickly grasp the research hotspots and development trends of a discipline. However, existing topic models often overlook the semantic functions and structures of texts during topic extraction, making it difficult to reveal the deeper patterns of disciplinary development. This paper proposes an integrated framework for topic evolution analysis that combines the BERTopic model with semantic functions, aiming to enrich and improve the methodological system of topic evolution research.

[Method/process] First, the BERTopic model is used to extract topics, yielding the “Topic-Word” distribution. Next, a discourse parsing tool segments each abstract into five types of semantic function, yielding the “Semantic Function-Word” distribution. Finally, the two distributions are mapped onto each other to obtain the “Topic-Semantic Function” distribution. This approach analyzes topics from a semantic function perspective and examines how the semantic function distribution of a topic affects its evolution.
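
The mapping step can be pictured with a small sketch. The abstract does not say how the two distributions are combined, so the weighting here is an assumption: each topic word's weight (for example, the (word, weight) pairs that BERTopic's `get_topic()` returns for a topic) is split across semantic functions in proportion to how often that word occurs in segments of each function. The function names, the toy segments, and the labels “Result” and “Conclusion” are hypothetical; only “Background”, “Objective” and “Method” are named in the abstract.

```python
from collections import Counter, defaultdict

# Five semantic function labels. "Background", "Objective" and "Method" come from
# the abstract; "Result" and "Conclusion" are assumed for illustration.
FUNCTIONS = ["Background", "Objective", "Method", "Result", "Conclusion"]


def function_word_distribution(segments):
    """Build P(function | word) from hypothetical (function_label, tokens) pairs,
    one pair per parsed abstract segment."""
    counts = defaultdict(Counter)  # word -> counts of the functions it occurs in
    for label, tokens in segments:
        for tok in tokens:
            counts[tok][label] += 1
    dist = {}
    for word, c in counts.items():
        total = sum(c.values())
        dist[word] = {f: c[f] / total for f in FUNCTIONS}
    return dist


def topic_function_distribution(topic_words, func_given_word):
    """Map a Topic-Word list [(word, weight), ...] to a normalized distribution
    over semantic functions. Words never seen in any segment are skipped."""
    scores = {f: 0.0 for f in FUNCTIONS}
    for word, weight in topic_words:
        if word not in func_given_word:
            continue
        for f, p in func_given_word[word].items():
            scores[f] += weight * p  # split the word's weight by P(f | word)
    total = sum(scores.values())
    if total == 0:
        return scores  # no overlap between topic words and parsed segments
    return {f: s / total for f, s in scores.items()}


# Toy data (hypothetical): four parsed segments and one topic's top words.
segments = [
    ("Background", ["topic", "evolution", "hotspot"]),
    ("Method", ["bertopic", "topic", "embedding"]),
    ("Method", ["bertopic", "clustering"]),
    ("Result", ["topic", "popularity"]),
]
p_fw = function_word_distribution(segments)
topic = [("bertopic", 0.5), ("topic", 0.3), ("hotspot", 0.2)]
dist = topic_function_distribution(topic, p_fw)
print(max(dist, key=dist.get))  # Method
```

In this toy case about 0.6 of the topic's mass falls on “Method” and about 0.3 on “Background”, so the topic would be read as “Method”-oriented. Other combination rules (for instance, matching top words by embedding similarity instead of exact string overlap) are equally plausible readings of the abstract.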

[Result/conclusion] An empirical study in the field of library and information science shows that the semantic function distribution of a topic affects its research popularity. Topics oriented towards “Method” and “Objective” may continue to rise in the future, while topics oriented towards “Background” are relatively mature and may enter a decline phase. The proposed method provides a more granular and accurate analysis of discipline development dynamics, helping the academic community better understand the dynamic changes in research hotspots.
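
Research popularity can be measured in several ways, and the abstract does not state which measure is used. As an assumption, the sketch below takes a topic's popularity in a given year to be its share of that year's documents, then labels the topic rising or declining by the sign of an ordinary least-squares slope over the years. The function names, the tolerance `tol`, and the toy corpus are hypothetical.

```python
from collections import Counter


def yearly_share(doc_topics, doc_years, topic):
    """Share of each year's documents that are assigned to `topic`."""
    per_year = Counter(doc_years)
    hits = Counter(y for t, y in zip(doc_topics, doc_years) if t == topic)
    return {y: hits[y] / n for y, n in sorted(per_year.items())}


def trend_slope(series):
    """Ordinary least-squares slope of yearly share against year."""
    xs = list(series.keys())
    ys = list(series.values())
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den if den else 0.0  # a single year carries no trend


def classify(slope, tol=1e-3):
    """Hypothetical three-way label; `tol` absorbs negligible slopes."""
    if slope > tol:
        return "rising"
    if slope < -tol:
        return "declining"
    return "stable"


# Toy corpus (hypothetical): one topic id and one publication year per document.
doc_topics = [0, 1, 0, 0, 1, 1, 1, 1, 1]
doc_years = [2020] * 3 + [2021] * 3 + [2022] * 3

labels = {
    k: classify(trend_slope(yearly_share(doc_topics, doc_years, k)))
    for k in sorted(set(doc_topics))
}
print(labels)  # {0: 'declining', 1: 'rising'}
```

These trend labels could then be set against each topic's dominant semantic function to check whether “Method”- and “Objective”-oriented topics tend to rise while “Background”-oriented ones decline, as the study reports. A linear slope is the simplest choice; a short or noisy series might call for smoothing or a longer window.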

First Page

13

Last Page

23

Submission Date

February 2025

Revision Date

April 2025

Acceptance Date

April 2025

Publication Date

October 2025

Digital Object Identifier (DOI)

10.19809/j.cnki.kjqbyj.2025.04.002

