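Abstract (Chinese version)
Sina Weibo now deeply influences people's daily lives. With the rapid spread of smartphones, people can post status updates anytime and anywhere, so the platform is characterized by real-time, fragmented information. As its features have matured, Sina Weibo has begun to form its own ecosystem: users can follow one another; comment on, repost, and like posts that interest them; and publish long-form posts, which makes the platform highly interactive and flexible. As the leading social media platform today, Sina Weibo has a huge user base, and the massive data it generates deserves careful study. This thesis studies the extraction of microblog data, topic detection, and sentiment analysis of microblog content.
Traditional web text extraction generally gathers information with a web crawler that uses graph traversal; this thesis instead uses the API provided by Sina Weibo to retrieve the desired microblog content.
This thesis introduces the general workflow of microblog topic detection and the corresponding algorithms. It mainly calls the keyword extraction algorithm built into the Chinese Academy of Sciences ICTCLAS 2014 word segmentation system to obtain microblog topics, and uses them to filter the relevant microblog content. On this basis, the microblog content is given a model representation through sentiment classification, converted into a data format that Weka can process, and then analyzed for sentiment by machine learning.
Keywords: microblog; data extraction; topic detection; machine learning; sentiment analysis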
Title: Microblogging Hot Topic Extraction and Analysis Techniques
Abstract
Sina Weibo has become deeply embedded in people's daily lives. With the rapidly growing popularity of smartphones, people can post a status update from anywhere with a tap of a finger, so the platform is characterized by real-time and fragmented information. As Sina Weibo's features have continued to improve, it has begun to form its own ecosystem: users can follow one another and comment on, repost, and like the posts of the people they follow. This strong interactivity and flexibility give microblogging a pronounced social character. As the leading social media platform today, Weibo has a huge user base, and the massive data it generates is well worth studying. This thesis studies microblog data extraction, topic detection, and sentiment analysis of microblog content.
Traditional web text extraction generally gathers information with a web crawler based on graph traversal. This thesis instead uses the API provided by Sina Weibo to retrieve the desired microblog content, which makes data extraction convenient and efficient.
After introducing the microblog topic detection process and the corresponding algorithms, the thesis calls the existing keyword extraction algorithm of the CAS ICTCLAS 2014 word segmentation system to obtain microblog topics and uses them to filter the corresponding microblog content. On this basis, the microblog content is processed with sentiment dictionaries, expressed in a data format that Weka can process, and then analyzed for sentiment by machine learning.
Keywords: microblogging; data acquisition; topic detection; machine learning; sentiment analysis
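The last step summarized in the abstract, expressing sentiment-dictionary features in a format Weka can read, can be illustrated with a minimal sketch. It assumes the posts have already been segmented into tokens (the thesis uses ICTCLAS for this) and writes Weka's ARFF text format. The tiny lexicons, the three features, and the function names `featurize` and `to_arff` are hypothetical placeholders chosen for illustration, not the thesis's actual feature set.

```python
# Hypothetical sentiment lexicons; a real system would load a full dictionary.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def featurize(tokens):
    """Count positive words, negative words, and total tokens in one post."""
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return pos, neg, len(tokens)


def to_arff(samples, relation="weibo_sentiment"):
    """Render (tokens, label) pairs as an ARFF document string for Weka."""
    lines = [
        f"@RELATION {relation}",
        "",
        "@ATTRIBUTE pos_count NUMERIC",
        "@ATTRIBUTE neg_count NUMERIC",
        "@ATTRIBUTE length NUMERIC",
        "@ATTRIBUTE class {positive,negative}",
        "",
        "@DATA",
    ]
    for tokens, label in samples:
        pos, neg, n = featurize(tokens)
        lines.append(f"{pos},{neg},{n},{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [
        (["I", "love", "this", "great", "phone"], "positive"),
        (["awful", "service"], "negative"),
    ]
    print(to_arff(demo))
```

The resulting text can be saved with an `.arff` extension and opened in Weka, where any of its classifiers can be trained on the `class` attribute.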
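Contents
Abstract (Chinese)
Abstract (English)
1 Introduction
1.1 Research background
1.2 State of research
1.3 Research content and significance
1.3.1 Research content
1.3.2 Research significance
1.4 Organization of the thesis
2 Background knowledge
2.1 Microblogging
2.1.1 History of microblogging, Sina Weibo, and its characteristics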