Graduation Thesis: Knowledge Graph Construction for a Specific Domain

2021-06-24 22:15:24

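Abstract (Chinese)

With the development and spread of Internet technology, the amount of information online keeps growing, and extracting useful information from this mass of data has become a research focus. Against this background, Google proposed the knowledge graph, a form of information organization that describes real-world entities and concepts and the relationships among them. Knowledge graphs are widely used in semantic search, question answering, and data mining, and have been applied in finance, library and information science, public safety, and many other fields.

Building on existing knowledge graph construction methods, this thesis explores how to construct and apply Chinese knowledge graphs for a specific domain. The main work is as follows: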

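1. Extracting useful information from open Internet data to build a high-quality graph. Chinese encyclopedia data, linked open data, and vertical websites contain large amounts of structured and semi-structured data. These data are fairly accurate, up to date, and easy to extract, so making full use of them helps ensure the quality of the graph.
2. Building a domain knowledge graph interactively. Knowledge graphs constructed fully automatically do not yet meet the quality requirements of specialized domains. This thesis studies interactive methods that improve construction efficiency: a topic model (LDA) for filtering texts, fusion of category labels from multiple sources to offer suggestions for the ontology, and word vectors (Word2Vec) to suggest synonym relations.
3. Providing application services for the domain knowledge graph. A Web-based service covers information retrieval, semantic search, and visualization of relationships. The thesis also investigates how to link and fuse multi-source heterogeneous information, and builds a visualization template that associates event information with temporal and geographic information.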

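Using open Internet data, this thesis builds a knowledge graph for the weapons and equipment domain. The graph contains more than 12,000 kinds of weapons and equipment together with their relationships and text descriptions. Because it draws on multiple data sources and the data were reprocessed in a standardized way, the graph has good coverage and accuracy.

Keywords: knowledge graph construction; data fusion; information management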

Abstract

With the development and popularization of Internet technology, the amount of information on the network has been increasing rapidly, and how to extract useful information from it has gradually become a hot topic. In this context, Google proposed the Knowledge Graph, a form of information organization that aims to describe real-world entities and concepts and the relationships between them. It is widely used in semantic search, question answering systems, and data mining, and has been applied in finance, library and information science, public safety, and many other areas.

On the basis of existing construction methods, this thesis explores how to construct and apply Chinese knowledge graphs in a specific domain. The main research is as follows:

1. Extracting valid data from the open Internet to construct a high-quality knowledge graph. Chinese encyclopedia data, linked open data, and vertical websites hold a large amount of structured and semi-structured data. These data are fairly accurate, timely, and easy to extract, so making full use of them helps to ensure the quality of the graph.

2. Constructing the graph with an interactive method. The quality of graphs built fully automatically by machine does not yet meet the requirements of domain applications. We study interactive methods that improve construction efficiency: a topic model (LDA) for text selection, fusion of classification labels from multiple sources to provide suggestions for the ontology, and word vectors (Word2Vec) to provide suggestions for synonym relations.

3. Studying application services for the domain knowledge graph. We provide Web-based application services, including information retrieval, semantic search, and relationship visualization, explore how to link and fuse heterogeneous multi-source information, and produce a visualization template that associates event information with time and geographic information.

Using open Internet data, we construct a knowledge graph for the weapons and equipment domain. The graph contains more than 12,000 kinds of weapons and equipment together with their relationships and text descriptions. It has good coverage and accuracy because it draws on multiple data sources and the data were reprocessed in a standardized way.

Key Words: knowledge graph construction; information integration; information management

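Contents

Chapter 1 Introduction

1.1 Research Background and Significance

1.2 Current Research and Open Problems

1.2.1 Knowledge Graph Construction Methods

1.2.2 Shortcomings of Knowledge Graph Construction Methods

1.2.3 Applications of Knowledge Graphs

1.2.4 Problems in Applying Knowledge Graphs

1.3 Research Content

1.4 Organization of the Thesis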

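Chapter 2 Data Collection and Information Extraction

2.1 Data Source Analysis

2.1.1 Open Encyclopedia Data

2.1.2 Vertical Website Data

2.1.3 Linked Open Data

2.1.4 Specialized Databases

2.2 Data Acquisition

2.2.1 Web Crawler Technology

2.2.2 Acquiring Information with a Scrapy Crawler

2.3 Chapter Summary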

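Chapter 3 Data Storage

3.1 Text Information Storage

3.1.1 Overview of SQL Databases

3.1.2 Overview of NoSQL Databases

3.1.3 Storing Text Information in the NoSQL Database MongoDB

3.2 Relationship Information Storage

3.2.1 Overview of Graph Databases

3.2.2 Storing Relationships in the Graph Database Neo4j

3.3 Chapter Summary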

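Chapter 4 Domain Knowledge Graph Construction

4.1 Construction Workflow and Methods

4.1.1 Data Acquisition

4.1.2 Ontology Construction

4.1.3 Entity Alignment

4.1.4 Entity Linking

4.2 Data Cleaning

4.2.1 Analysis of Sources of Irrelevant Information

4.2.2 Text Classification Based on LDA and SVM

4.3 Information Extraction

4.3.1 Structured and Semi-Structured Information Extraction

4.3.2 Unstructured Information Extraction

4.4 Ontology Construction Methods

4.4.1 Acquiring Hypernym-Hyponym Relations

4.4.2 Attribute Learning

4.4.3 Synonym Construction

4.4.4 Manual Ontology Editing

4.5 Multi-Source Entity Alignment

4.5.1 Aligning Entities to the Ontology

4.5.2 Aligning and Merging Entities from Multiple Sources

4.6 Entity Linking

4.6.1 Candidate Entity Discovery

4.6.2 Establishing Entity Links

4.7 Chapter Summary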

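Chapter 5 Knowledge Graph Applications

5.1 Web Service

5.1.1 Web Back End Based on the Flask Framework

5.1.2 Web Front-End Design

5.2 Information Retrieval

5.2.1 Searching by Entity Name

5.2.2 Fuzzy Search Based on Semantics

5.3 Information Fusion and Visualization

5.3.1 Information Fusion

5.3.2 Information Visualization

5.3.3 Heterogeneous Information Fusion and Visualization

5.4 Chapter Summary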

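Chapter 6 Conclusion

6.1 Summary

6.2 Future Work

6.2.1 Information Extraction from Unstructured Data

6.2.2 Linking and Fusing Multi-Source Heterogeneous Data

6.3 Chapter Summary

References

Acknowledgements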

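Chapter 1 Introduction

As Dr. Singhal of Google put it when introducing the Knowledge Graph, "The world is not made of strings, but is made of things." A knowledge graph describes the entities and concepts of the real world, together with their relationships and how these evolve.

1.1 Research Background and Significance

With the rapid development of network information technology, the Internet has become one of the main ways people obtain information and knowledge. Over the past twenty years, the way data are produced and disseminated has changed greatly, showing the following new characteristics^[1]: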

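(1) Volume: in its early usage, "big data" simply meant massive, large-scale data. The Internet generates more than 1 EB of data traffic every day; the text uploaded to Twitter each day exceeds the total text published by The New York Times over 60 years; and the video uploaded to video sites every second could play continuously for more than 50,000 years.

(2) Diversity: today's data are not only numerous and drawn from many sources; more importantly, they are diverse in form, structure, and content. Formats include not only text but also images, audio, video, time, geographic information, relationships, and more. Content is not merely varied in structure; it also carries complex association and evolution relationships, and those associations themselves change. This is the essential difference between big data in the new setting and traditional massive data (a single structure in large quantities).

(3) Spatiotemporality: data are not isolated and static but change dynamically with time and space. An entity's attributes and relationships may change with geographic location, time, or even the entity's own state, for example a ship's cargo load, a country's economic situation, or a car's fuel consumption at different vehicle ages, road conditions, and environments. This dynamic evolution of information makes data modeling considerably harder.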
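
The time dependence described in item (3) can be made concrete with a small sketch. The Python below is only an illustration under assumed names: `TimedValue`, `Entity`, `value_at`, and the sample tonnage figures are hypothetical and do not come from the thesis. Each attribute value carries the period in which it holds, so a lookup has to state the date it refers to.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimedValue:
    """One attribute value plus the period in which it is valid (hypothetical type)."""
    value: float
    valid_from: date
    valid_to: Optional[date] = None  # None means the value is still valid

    def covers(self, day: date) -> bool:
        # Half-open interval: valid_from <= day < valid_to
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


@dataclass
class Entity:
    """An entity whose attributes are lists of time-scoped values."""
    name: str
    attributes: dict = field(default_factory=dict)

    def set_value(self, attr, value, valid_from, valid_to=None):
        self.attributes.setdefault(attr, []).append(
            TimedValue(value, valid_from, valid_to)
        )

    def value_at(self, attr, day):
        # Return the value valid on `day`, or None if no period covers it.
        for timed in self.attributes.get(attr, []):
            if timed.covers(day):
                return timed.value
        return None


# Illustrative data only: a ship whose cargo load changes over time.
ship = Entity("example cargo ship")
ship.set_value("cargo_tonnes", 1200.0, date(2020, 1, 1), date(2020, 6, 1))
ship.set_value("cargo_tonnes", 800.0, date(2020, 6, 1))
```

Compared with a single static field, this representation keeps the history of an attribute, at the cost of requiring every query to supply a point in time. The same idea would extend to location or state by adding further qualifiers to `TimedValue`.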
