Implementation of a Frequent Itemset Algorithm Based on Hadoop (Graduation Thesis)

2022-03-21 20:42:01

Total thesis length: 19,160 characters

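Abstract

We live in an era of information explosion, behind which lies a volume of data growing at a startling, geometric rate. According to the Digital Universe study by IDC (International Data Corporation), the global data volume was 0.18 ZB in 2006 and grew to 1.8 ZB in 2011, and it is projected to reach 40 ZB by 2020, by which time about 33% of the data will contain valuable information. As the variety and number of electronic devices increase rapidly, data sources are also becoming more diverse: from wearable devices that record exercise data to giant radio telescopes that probe the origin of the universe, all of them produce data continuously. This growth is both an opportunity and a challenge. We hold massive amounts of data, more than we can process, and this places very demanding requirements on the methods used to store and analyze the data and to mine the useful information in it.

To store and analyze massive data, this thesis implements frequent itemset algorithms on the Hadoop cloud computing platform in order to find frequent, valuable patterns in massive data. It first introduces the basics of the Apriori algorithm and points out its shortcomings, then uses the FP-growth algorithm to mine frequent itemsets efficiently, and describes in detail the principle of FP-growth and the reasons for its efficiency. Finally, the Mahout and Spark tools are used to extend and apply the Hadoop platform.

Keywords: Hadoop; FP-growth algorithm; frequent itemsets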

Implementation of a Frequent Itemset Algorithm Based on Hadoop

Abstract

We are living in an era of information explosion, driven by data volumes that grow at a remarkable, geometric rate. According to the Digital Universe study by IDC (International Data Corporation), total global data amounted to 0.18 ZB in 2006 and rose to 1.8 ZB in 2011; by 2020 the figure is expected to reach 40 ZB, of which about 33% will contain valuable information. With the rapid increase in the variety and number of electronic devices, the sources of data are also becoming more diverse: everything from wearable devices that record exercise data to giant radio telescopes that explore the origin of the universe generates a steady stream of data. The growth in data volume is both an opportunity and a challenge. We have huge amounts of data, more than can be processed, which sets very demanding requirements for the methods used to store and analyze the data and to mine the useful information it contains.

To store and analyze massive data, this thesis implements frequent itemset algorithms on the Hadoop cloud computing platform, with the aim of finding frequent and valuable patterns in large data sets. It first introduces the basics of the Apriori algorithm and points out its shortcomings, then uses the FP-growth algorithm to mine frequent itemsets efficiently, and explains in detail the principle and efficiency of FP-growth. Finally, the Mahout and Spark tools are used to extend and apply the Hadoop platform.

Key Words: Hadoop; FP-growth algorithm; frequent itemsets

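Chapter 1 Introduction

1.1 Background and Significance of the Topic

Data mining is an emerging technology centered on big data. It combines knowledge from artificial intelligence, statistics, information retrieval, data visualization, neural networks, and related fields, and it aims to extract usable information from massive, complex data and to present that information in a vivid, intuitive form.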
