    Hadoop and KDD
    Abstract: Cloud computing is currently one of the most prominent and influential technologies on the Internet. With the arrival of the massive-data era, technologies for processing massive data are attracting more and more attention, and domestic Internet companies are shifting toward data mining. This project uses the computing power of a Hadoop-based cluster to mine massive traffic data. Domestic traffic simulation systems are maturing, and they leave behind a flood of traffic data; this project applies data mining techniques to that leftover data and feeds the results back to research and development personnel. The system prototype is built with Python, a language suited to rapid development, and the MapReduce model. The resulting data mining system provides massive-data analysis, querying and interpretation of analysis results, and simple queries over the massive data. It also aims to ensure the correctness of the analysis results and the stability of the algorithms, so that massive traffic data can be put to use and data mining techniques for traffic data can be validated. The data mining system model proposed in this project has high research value in the field of data analysis.
    Keywords: cloud computing; traffic data; data mining; Hadoop; Python
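    Contents
    1    Introduction
    1.1    Research Status at Home and Abroad
    1.2    Research Background and Significance
    2    Technical Background
    2.1    Platform Technology Background
    2.1.1    Introduction to Hadoop
    2.1.2    Introduction to HBase
    2.1.3    Introduction to HDFS
    2.1.4    Introduction to Hive
    2.2    Development Language Background
    2.2.1    Python Background
    2.2.2    wxPython Background
    2.2.3    Python + Hadoop
    2.3    Data Interchange Background
    2.3.1    Introduction to JSON
    2.3.2    Introduction to SSH
    2.4    Data Mining
    2.4.1    Introduction to Data Mining
    2.4.2    Introduction to Data Mining Models
    3    Main Content of This Project
    3.1    Basic System Architecture
    3.1.1    Client Presentation Layer
    3.1.2    Logic Processing Layer
    3.1.3    Back-End Data Layer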
