Graduation Thesis: Feature Extraction of Location Big Data Based on Machine Learning

2022-01-27 15:57:42

Total thesis length: 24,159 characters

Abstract

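With the rapid development of wireless communication networks and Global Positioning System (GPS) technology, it has become possible to collect and transmit massive amounts of GPS data, and large numbers of historical movement trajectories can be stored persistently, forming spatio-temporal trajectory data. These data describe in detail the spatio-temporal dynamics of individuals or groups and contain behavioral information about the moving objects, making them highly valuable for applications such as traffic navigation, urban planning, and vehicle monitoring. Processing and analyzing trajectory data makes it possible to extract this value effectively.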

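This thesis uses machine learning methods to classify existing trajectory data by travel mode, with the aim of extracting motion features. The trajectory dataset used comes from the Geolife project of Microsoft Research Asia and contains the trajectory records of 182 users over more than three years. The specific research work is as follows: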

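First, existing trajectory data processing techniques are surveyed broadly, and the principles, performance, and evaluation metrics of classical machine learning classification algorithms are studied in depth. Second, four classification algorithms (random forest, logistic regression, SVM, and kNN) are used to train classification models on the trajectory data; the resulting models predict trajectory features, and the predictions are compared against the actual trajectory features to compare the four algorithms side by side. The results show that random forest achieves the best classification performance on this dataset. Finally, the kNN algorithm is improved to raise its classification performance, and the parameters of the existing random forest algorithm are tuned to obtain more accurate classification results.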

Keywords: trajectory data; machine learning; random forest; performance analysis

Feature Extraction of Position Data based on Machine Learning

Abstract

The rapid development of wireless communication networks and Global Positioning System (GPS) technology has made it possible to collect and transmit massive amounts of GPS data. Persistently stored historical trajectories form spatio-temporal trajectory data, which describe in detail the spatio-temporal dynamics of individuals or groups and contain behavioral information about the moving objects. These data are of great importance for applications such as traffic navigation, urban planning, and vehicle monitoring. Processing and analyzing trajectory data makes it possible to extract this value effectively.

The objective of this thesis is to use machine learning methods to classify existing trajectory data by travel mode in order to extract motion features. The trajectory dataset used in this work was collected by 182 users over a period of more than three years in the Geolife project at Microsoft Research Asia. The specific research work is as follows:

This work begins with a broad survey of existing trajectory data processing techniques and a study of the principles, performance, and evaluation metrics of classical machine learning classification algorithms. Second, four classification algorithms (random forest, logistic regression, SVM, and kNN) are used to train classification models on the trajectory data. The resulting models predict trajectory features, and the predictions are compared with the actual trajectory features to compare the classification performance of the four algorithms. The results show that random forest has the best classification performance on the dataset used. Finally, the kNN algorithm is improved to raise its classification performance, and the parameters of the existing random forest algorithm are tuned to achieve more accurate classification results.

Keywords: trajectory data; machine learning; random forest; performance analysis

Chapter 1 Introduction

1.1 Background and Significance

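A spatial trajectory is the trace generated by a moving object in geographic space. It is usually represented as a sequence of chronologically ordered points, for example p¹ → p² → · · · → pⁿ, where each point consists of a set of geospatial coordinates and a timestamp, such as p = (x, y, t).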
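The point-sequence representation above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `TrackPoint`, `haversine_m`, and `segment_speeds` are hypothetical, the coordinates (x, y) are assumed to be longitude and latitude in degrees, the timestamp t is assumed to be in seconds, and the Earth is approximated as a sphere. The sketch orders the points by time and derives one simple motion quantity, the speed between consecutive points.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (spherical approximation, an assumption)


@dataclass(frozen=True)
class TrackPoint:
    """One trajectory point p = (x, y, t); x/y are assumed to be lon/lat in degrees."""
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees
    t: float    # timestamp in seconds


def haversine_m(a: TrackPoint, b: TrackPoint) -> float:
    """Great-circle distance between two points in meters (haversine formula)."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def segment_speeds(trajectory: list[TrackPoint]) -> list[float]:
    """Speed in m/s between consecutive points, after sorting by timestamp.

    Pairs with a non-positive time difference are skipped to avoid division by zero.
    """
    ordered = sorted(trajectory, key=lambda p: p.t)
    speeds = []
    for prev, cur in zip(ordered, ordered[1:]):
        dt = cur.t - prev.t
        if dt > 0:
            speeds.append(haversine_m(prev, cur) / dt)
    return speeds
```

For example, `segment_speeds([TrackPoint(0.0, 0.0, 0.0), TrackPoint(0.0, 0.001, 10.0)])` returns a single speed of roughly 11 m/s, since 0.001 degrees of longitude at the equator is about 111 m. Per-segment quantities of this kind are one plausible input to a travel-mode classifier, although the source does not state which features the thesis actually uses.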

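Advances in location-acquisition technology have produced countless spatial trajectories that capture the mobility of all kinds of moving objects, such as people, vehicles, and animals. These trajectories give us unprecedented information for understanding moving objects and locations, and they have fostered a wide range of applications in location-based social networks [1], intelligent transportation systems, and urban computing [2]. The popularity of these applications in turn calls for in-depth research into new computing techniques that can discover valuable information in large volumes of trajectory data. Against this background, trajectory data mining has become an increasingly important research topic, attracting attention from many fields, including computer science, sociology, and geography.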
