


 2021-11-06 11:11  

Abstract



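(2) Research on abstractive summarization: We propose an abstractive summarization model that combines the Multi-Head Attention mechanism with semantic relevance. A baseline model is built on seq2seq with Attention and Beam Search. On top of it, the Multi-Head Attention mechanism lets the model learn text features from multiple perspectives, the Mask mechanism introduces prior knowledge so that the model focuses more accurately on key positions during decoding, and semantic relevance makes the model favor summaries that are highly similar to the source text.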


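The contributions of this paper are as follows: for extractive summarization, the TextRank algorithm is improved by introducing text features; for abstractive summarization, a model combining the Multi-Head Attention mechanism and semantic relevance is proposed, and comparative experiments verify its effectiveness.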



With the advent of the big data era, the volume of text has far exceeded what can be processed manually. People want to learn more useful information in less time, so quickly and accurately extracting the core ideas of a text has become a research hotspot. With the development of natural language understanding and generation technology, automatic text summarization has matured and found many application scenarios. This paper targets Chinese news and studies two approaches: extractive summarization and abstractive summarization. The main work is as follows:

(1) Research on extractive summarization: Because the traditional TextRank algorithm does not fully exploit the features of sentences in a text, this paper integrates sentence position and sentence length features into the TextRank algorithm. Because summaries generated by the TextRank algorithm contain semantic duplication, this paper also removes redundant sentences when generating the summary.
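The excerpt does not give the weighting formulas or the redundancy criterion, so the following is only a minimal sketch of the idea. It assumes a multiplicative position weight, a length window, and a Jaccard threshold; the names `adjust_scores`, `select_without_redundancy`, `lead_bonus`, `min_len`, `max_len`, and `max_sim` are all hypothetical and are not taken from the thesis.

```python
def adjust_scores(base_scores, lengths, lead_bonus=0.2, min_len=5, max_len=40):
    """Reweight TextRank scores with assumed position and length factors."""
    n = len(base_scores)
    adjusted = []
    for i, (score, length) in enumerate(zip(base_scores, lengths)):
        # Assumption: earlier sentences in a news article get a larger weight.
        position_w = 1.0 + lead_bonus * (n - i) / n
        # Assumption: sentences outside the length window are down-weighted.
        length_w = 1.0 if min_len <= length <= max_len else 0.5
        adjusted.append(score * position_w * length_w)
    return adjusted


def jaccard(a, b):
    """Jaccard similarity between two token lists."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_without_redundancy(token_lists, scores, k=2, max_sim=0.5):
    """Greedily pick top-scoring sentences, skipping near-duplicates."""
    chosen = []
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for i in ranked:
        if len(chosen) >= k:
            break
        # Keep the sentence only if it is not too similar to any chosen one.
        if all(jaccard(token_lists[i], token_lists[j]) <= max_sim for j in chosen):
            chosen.append(i)
    return sorted(chosen)  # restore document order
```

Selecting greedily by score and filtering on similarity keeps the procedure unsupervised, in line with TextRank itself; other redundancy criteria would fit the same structure.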

(2) Research on abstractive summarization: This paper proposes an abstractive summarization model that combines the Multi-Head Attention mechanism with semantic relevance. First, we build a baseline model based on seq2seq with Attention and Beam Search. Then we introduce the Multi-Head Attention mechanism so that the model learns text features from multiple perspectives. We also use the Mask mechanism to introduce prior knowledge, which lets the model focus more accurately on key positions when decoding. Finally, we introduce semantic relevance so that the model favors summaries with high similarity to the source text.
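To make the attention step concrete, here is a simplified sketch of masked multi-head attention for a single decoder query. It is not the thesis model: the learned projection matrices of a full Multi-Head Attention layer are omitted, the heads are formed by slicing the vectors, and the mask simply blocks positions. How the thesis derives its mask from prior knowledge is not stated in this excerpt. All function names are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values, mask):
    """Masked scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    # Masked positions get a very negative score, so their weight is ~0.
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale if keep else -1e9
        for key, keep in zip(keys, mask)
    ]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def split_heads(vec, num_heads):
    """Slice a vector into num_heads equal parts (assumes divisibility)."""
    size = len(vec) // num_heads
    return [vec[h * size:(h + 1) * size] for h in range(num_heads)]


def multi_head_attention(query, keys, values, mask, num_heads=2):
    """Run attention independently per head and concatenate the results."""
    q_heads = split_heads(query, num_heads)
    k_heads = [split_heads(k, num_heads) for k in keys]
    v_heads = [split_heads(v, num_heads) for v in values]
    output = []
    for h in range(num_heads):
        output.extend(attention(
            q_heads[h],
            [k[h] for k in k_heads],
            [v[h] for v in v_heads],
            mask,
        ))
    return output
```

Each head computes its own attention weights over the source positions, which is what allows different heads to attend to different parts of the text.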

(3) Verification by comparative experiments: We evaluate the extractive summarization algorithm of this paper on the long-text dataset NLPCC, which shows that the improvements to the TextRank algorithm are effective. We also evaluate the proposed abstractive summarization model on the short-text dataset LCSTS and compare the results with several abstractive summarization models and with the extractive summarization model. The results show that the proposed model reaches ROUGE-1, ROUGE-2, and ROUGE-L scores of 32.5, 21.4, and 31.1, a substantial improvement over the baseline model. The summaries produced by the model are readable and reliable.
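For readers unfamiliar with the metric, the sketch below computes ROUGE-N and ROUGE-L on token lists. The excerpt does not say whether the reported numbers are recall or F1, nor how the text is tokenized, so the F1 form used here is an assumption, and the function names are illustrative. The scores it returns lie in [0, 1]; the values quoted above appear to be on a 0 to 100 scale.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(match, cand_total, ref_total):
    """F1 from a match count and the candidate/reference totals."""
    if match == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = match / cand_total
    recall = match / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """ROUGE-N as F1 over clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    match = sum((cand & ref).values())  # Counter & keeps the minimum counts
    return f1(match, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L as F1 over the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))
```

For Chinese text, the token lists would typically be characters or segmented words; which of the two the thesis uses is not stated here.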

The contributions of this paper are as follows. For extractive summarization, the TextRank algorithm is improved by introducing text features. For abstractive summarization, a model combining the Multi-Head Attention mechanism and semantic relevance is proposed, and comparative experiments verify its effectiveness.

Key Words: automatic summarization; deep learning; TextRank; Multi-Head Attention; semantic relevance

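Contents

Chapter 1 Introduction

1.1 Research Background and Significance

1.2 Research Status at Home and Abroad

1.3 Research Content of This Paper

1.4 Organization of This Paper

Chapter 2 Extractive Automatic Summarization Based on TextRank

2.1 The PageRank Algorithm

2.2 Extractive Automatic Summarization Based on the TextRank Algorithm

2.3 Improved Extractive Summarization Algorithm

2.3.1 Idea Behind the Improvement

2.3.2 Description of the Improved Algorithm

2.4 Chapter Summary

Chapter 3 Abstractive Automatic Summarization Based on Deep Learning

3.1 Background Knowledge

3.1.1 Recurrent Neural Networks

3.1.2 Long Short-Term Memory Networks

3.1.3 The seq2seq Architecture

3.1.4 The Attention Mechanism

3.2 Baseline Abstractive Summarization Model Based on the seq2seq Architecture

3.2.1 Bidirectional LSTM

3.2.2 Beam Search

3.3 Improved Abstractive Summarization Model

3.3.1 The Multi-Head Attention Mechanism

3.3.2 Introducing Prior Knowledge

3.3.3 Introducing Semantic Relevance

3.3.4 Abstractive Summarization Model Combining the Multi-Head Attention Mechanism and Semantic Relevance

3.4 Chapter Summary

Chapter 4 Experimental Results and Analysis

4.1 Datasets and Preprocessing

4.2 Evaluation Method

4.3 Parameter Settings of the Abstractive Summarization Model

4.4 Comparison and Analysis of Experimental Results

4.4.1 Comparison of Results for the Improved Extractive Summarization

4.4.2 Comparison of Results for Extractive and Abstractive Summarization

4.5 Chapter Summary

Chapter 5 Summary and Outlook

5.1 Summary of Work

5.2 Research Outlook

References

Acknowledgments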






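In 1958, Luhn [2] first proposed the concept of automatic summarization: he scored sentences using word-frequency statistics, compared the scores, and combined the highest-scoring sentences into a summary. Although the resulting summaries were highly redundant and of low quality, this work opened the study of automatic text summarization. For a long time afterward, extractive summarization remained the mainstream approach, and the graph-based TextRank algorithm [3] is one of the classic extractive algorithms. Its main idea is to take sentences as vertices and inter-sentence similarity as edges to build a TextRank graph of the text, then iteratively update the node scores. The importance score of a sentence is its value when the iteration converges, and the highest-scoring sentences are selected to form the summary. Its advantages are that it is simple to implement, unsupervised, and only weakly language-dependent, and its summaries are of relatively high quality.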
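The TextRank procedure described above can be sketched in a few lines. This is a minimal illustration and not the thesis implementation: it assumes whitespace tokenization (Chinese text would need a word segmenter first), the word-overlap similarity from the original TextRank paper, and a damping factor of 0.85. The function names and the parameters `d`, `tol`, and `max_iter` are illustrative choices.

```python
import math


def similarity(a, b):
    """Word overlap normalized by log sentence lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids a zero denominator for one-word sentences
    overlap = len(set(a) & set(b))
    return overlap / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, d=0.85, tol=1e-6, max_iter=100):
    """Return one importance score per sentence."""
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    if n == 0:
        return []
    # Edge weights: similarity between every pair of distinct sentences.
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            rank = sum(w[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - d) + d * rank)
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:  # stop once the scores have converged
            break
    return scores


def summarize(sentences, k=2):
    """Pick the k highest-scoring sentences, kept in document order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares no words with any other receives only the base score `1 - d`, so it is the least likely to be selected.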
