  

    Research on the Behavior of Microblog Users Based on Hadoop

    Abstract: In the era of big data, user behavior research no longer relies on random sampling to represent the whole population; it needs to study the data of all users. This brings challenges in data storage, data processing, and computation.

    This thesis studies the storage and mining of user behavior data on a Hadoop cloud platform. It designs and implements a distributed, highly reliable, highly available data storage module to address the difficulty of storing large volumes of data. It proposes a distributed parallel word segmentation algorithm based on MapReduce that uses all compute nodes in the cluster to segment massive amounts of Chinese text. Compared with traditional Chinese word segmentation, the algorithm improves segmentation efficiency more than threefold and addresses the current difficulty of segmenting massive text. The thesis then applies the Hadoop platform to microblog user behavior data: it first segments microblog posts from the Chongqing area, then computes daily statistics for each district and county of Chongqing on the words "感冒" (cold), "肺炎" (pneumonia), "发热" (fever), and "咳嗽" (cough). This addresses the sparse content, deeply hidden value, and mining difficulty of microblog data, and allows the relevant Chongqing departments to monitor local health conditions and issue early warnings. Finally, a results display module built on MySQL + JDBC + HTTP + Ajax presents the microblog user behavior analysis results across multiple dimensions.

    Keywords: HDFS; Hadoop; MapReduce; user behavior analysis; microblog users
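The per-district keyword statistics described in the abstract can be pictured as a map step followed by a reduce step. The sketch below is only an illustration under stated assumptions, not the thesis's implementation: it runs in a single Python process rather than on a Hadoop cluster, it swaps in plain substring counting for a real Chinese word segmenter, and the `(date, district, text)` record layout, the function names, and the sample posts are all hypothetical.

```python
from collections import defaultdict

# Symptom keywords named in the abstract.
KEYWORDS = ("感冒", "肺炎", "发热", "咳嗽")


def map_post(record):
    """Map phase: emit ((date, district, keyword), count) for one post.

    `record` is a hypothetical (date, district, text) tuple. Substring
    counting stands in for a real Chinese word segmenter here.
    """
    date, district, text = record
    for keyword in KEYWORDS:
        hits = text.count(keyword)
        if hits:
            yield (date, district, keyword), hits


def shuffle(pairs):
    """Group mapper output by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce phase: sum the partial counts for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in-process on an iterable of records."""
    pairs = (pair for record in records for pair in map_post(record))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


# Made-up sample posts, for illustration only.
posts = [
    ("2016-01-05", "渝中区", "今天感冒了,一直咳嗽"),
    ("2016-01-05", "渝中区", "感冒好难受"),
    ("2016-01-05", "江北区", "孩子发热,去医院检查"),
]
counts = run_job(posts)
for key, total in sorted(counts.items()):
    print(key, total)
```

Keying on `(date, district, keyword)` means each reducer call sees every partial count for one word in one district on one day, so the daily per-district totals fall out of a single summation. On a real cluster the mapper would run on many nodes in parallel and the framework would perform the grouping step.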

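    Contents

    Abstract (Chinese)
    Abstract (English)
    Contents
    1 Introduction
    1.1 Research Background
    1.2 Research Status at Home and Abroad
    1.2.1 Big Data Research Status at Home and Abroad
    1.2.2 Research Status of User Behavior Analysis
    1.3 Main Work
    1.4 Thesis Organization
    2 Study of Hadoop Big Data Technology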

