


    Thesis keywords: HDFS; Hadoop; MapReduce; user behavior analysis; Micro-blog users

    Research on the Behavior of Micro-blog Users Based on Hadoop

    Abstract: In the era of big data, user behavior research no longer works as before, analyzing a random sample taken to represent all users. It must instead study the data of every user. This shift brings new challenges in data storage, data processing, and computational difficulty.

    This paper studies the storage and mining of user behavior data on the Hadoop cloud platform. A distributed data storage module with high reliability and high availability is designed and implemented to solve the problem of storing large volumes of data. A MapReduce-based distributed parallel word segmentation algorithm is proposed that invokes all computing nodes in the cluster to segment massive amounts of Chinese text. Compared with traditional Chinese word segmentation, it improves segmentation efficiency more than threefold and addresses the current difficulty of segmenting massive text collections. Combining the Hadoop cloud platform with Micro-blog user behavior analysis, the paper first segments Micro-blog posts from the Chongqing area and then mines and counts the daily occurrences of words such as "cold", "pneumonia", "fever", and "cough" in each district and county of Chongqing. This effectively addresses the sparse content, deeply hidden value, and mining difficulty of Micro-blog data, and can support the relevant Chongqing authorities in local medical surveillance and early warning. Finally, a display module for the mining results is designed, using MySQL + JDBC + HTTP + Ajax to present the Micro-blog user behavior analysis results comprehensively across multiple dimensions.

    Keywords: User behavior analysis; HDFS; Hadoop; MapReduce; Micro-blog users
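    The abstract describes segmenting Chongqing Micro-blog posts with MapReduce and then counting the daily occurrences of symptom words in each district and county. As a minimal single-process sketch of that data flow, the Python below simulates the map, shuffle/sort, and reduce phases. It is not the thesis's implementation: the forward maximum matching segmenter, the tiny DICTIONARY, the Chinese forms chosen for the four symptom words, the record layout (district, date, text), and the names segment, mapper, reducer, and run_job are all assumptions for illustration. A real job would run the mapper and reducer as Hadoop tasks over posts stored in HDFS.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical lexicon: the thesis does not say which dictionary it uses.
DICTIONARY = {"感冒", "肺炎", "发烧", "咳嗽", "今天", "有点", "重庆"}
# Assumed Chinese forms of "cold", "pneumonia", "fever", "cough".
TARGET_WORDS = {"感冒", "肺炎", "发烧", "咳嗽"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def segment(text):
    """Forward maximum matching (a stand-in technique): at each position take
    the longest dictionary word that matches, else a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in DICTIONARY:
                words.append(candidate)
                i += size
                break
    return words


def mapper(record):
    """Map phase: one post -> ((district, date, word), 1) per target word."""
    district, date, text = record
    for word in segment(text):
        if word in TARGET_WORDS:
            yield (district, date, word), 1


def reducer(key, counts):
    """Reduce phase: sum all counts emitted for one (district, date, word)."""
    return key, sum(counts)


def run_job(records):
    """Simulate map -> shuffle/sort -> reduce in one process."""
    pairs = [kv for record in records for kv in mapper(record)]
    pairs.sort(key=itemgetter(0))  # shuffle/sort brings equal keys together
    return dict(
        reducer(key, (count for _, count in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    )
```

For example, `run_job([("district-A", "day-1", "今天有点感冒，还咳嗽")])` yields a count of 1 for both 感冒 and 咳嗽 under that district and day. Forward maximum matching is used here only because it is short; keying on (district, date, word) is what lets the reducers produce per-district daily counts in parallel.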


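    Abstract (Chinese)

    Abstract

    Contents

    1 Introduction

    1.1 Research Background

    1.2 Research Status in China and Abroad

    1.2.1 Research Status of Big Data in China and Abroad

    1.2.2 Research Status of User Behavior Analysis

    1.3 Main Work

    1.4 Organization of the Thesis

    2 Research on the Big Data Technology Hadoop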
