Journal of Scientific Information Research

Review of Automatic Processing of Ancient Chinese Character and Prospects for Its Development Trends in the New Era

Sanhong DENG, 1. School of Information Management, Nanjing University, Nanjing 210023 2. Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023
Haotian HU, 1. School of Information Management, Nanjing University, Nanjing 210023 2. Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023
Lei GUO, Intelligence and Information Research Center, The Sixth Academy of China Aerospace
Yanyan ZHANG, China Aerodynamics Research and Development Center

Keywords

ancient Chinese character automatic processing; digital humanities; traditional culture; cultural confidence; ancient Chinese character information processing

Abstract

[Purpose/significance] With the popularization of digitized ancient books and documents, the use of natural language processing and big data analysis to carry out text mining and knowledge discovery on ancient Chinese books has gradually become an important research direction in ancient-text information processing within the digital humanities, and an important way to express cultural confidence. [Method/process] This article defines the concept of automatic processing of ancient Chinese characters, sorts out its connotation and extension, and surveys the overall research status and development trends of the field from three aspects: research areas and model algorithms for automatic ancient-text processing; corpus resources and existing tools; and knowledge bases and platform systems. [Result/conclusion] We give a relatively comprehensive summary of the current state of research on the automatic processing of ancient Chinese characters and analyze the existing problems and deficiencies.

Submission Date

23-Oct-2020

Revision Date

20-Nov-2020

Published Date

01-Jan-2021

Recommended Citation

DENG, Sanhong; HU, Haotian; GUO, Lei; and ZHANG, Yanyan (2021) "Review of Automatic Processing of Ancient Chinese Character and Prospects for Its Development Trends in the New Era," Journal of Scientific Information Research: Vol. 3: Iss. 1, Article 1.
Available at: https://eng.kjqbyj.com/journal/vol3/iss1/1

Included in

Scholarly Communication Commons
