
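    The experimental results show that each of the two text classification algorithms has its own characteristics: Naive Bayes classifies quickly but with lower accuracy, whereas KNN, applied to the weighted high-dimensional sparse vectors, classifies more accurately but more slowly.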
    Title   Research on Text Classification Technology
    With the development of information technology, content-based information retrieval and data mining are attracting increasing research attention. Text categorization (TC) is regarded as an important foundation of information retrieval and text mining. Its key task is for a computer to determine the class label of a text from its content, given a set of training texts and their class labels.
    This paper compares two algorithms, Naive Bayes and KNN, on Chinese text categorization. First, the Chinese texts are segmented into words using ICTCLAS. Then feature selection is carried out by applying mutual information on top of document frequency (DF), and, to give the texts a uniform structure that can be processed, TF-IDF is used to weight the features. Finally, the test texts are classified with the two algorithms.
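    The feature selection and weighting steps can be illustrated with a minimal Python sketch. It assumes the texts are already segmented (ICTCLAS itself is not called), and it uses one common choice for each step: a DF threshold, pointwise mutual information maximised over classes, and TF-IDF with idf = log(N / df). The function names `select_features` and `tfidf`, the parameters `min_df` and `top_n`, and the toy corpus are hypothetical; the thesis may use different formulas and thresholds.

```python
import math
from collections import Counter

def select_features(docs, labels, min_df=2, top_n=2):
    """Keep terms that pass a DF threshold, ranked by max pointwise MI over classes."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    class_size = Counter(labels)
    df_in_class = Counter((t, c) for doc, c in zip(docs, labels) for t in set(doc))

    def mi(term):
        # PMI(t, c) = log( P(t, c) / (P(t) * P(c)) ), estimated from document counts.
        # Classes in which the term never occurs are skipped to avoid log(0).
        return max(
            math.log(df_in_class[(term, c)] * n / (df[term] * class_size[c]))
            for c in class_size
            if df_in_class[(term, c)] > 0
        )

    candidates = [t for t, count in df.items() if count >= min_df]
    ranked = sorted(candidates, key=lambda t: (-mi(t), t))  # ties broken by term
    return ranked[:top_n], df

def tfidf(doc, features, df, n_docs):
    """Sparse TF-IDF vector (term -> weight) restricted to the selected features."""
    tf = Counter(t for t in doc if t in features)
    return {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}

# Toy corpus of pre-segmented texts (assumed data, not from the thesis).
docs = [
    ["股票", "市场", "上涨", "今天"],   # stock, market, rise, today
    ["股票", "基金", "今天"],           # stock, fund, today
    ["球队", "比赛", "今天"],           # team, match, today
    ["球员", "比赛", "进球"],           # player, match, goal
]
labels = ["finance", "finance", "sports", "sports"]

selected, df = select_features(docs, labels, min_df=2, top_n=2)
vec = tfidf(docs[0], set(selected), df, len(docs))
```

    In this toy corpus the word "今天" (today) occurs in both classes, so its mutual information score is lower than that of the class-specific terms and it is dropped. The DF threshold matters because pointwise mutual information on its own tends to favour rare terms.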
    The experimental results show that each of the two text categorization algorithms has its own characteristics. Naive Bayes is faster than KNN but less accurate. KNN achieves better accuracy and categorization capability, but it is much slower.
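    The speed difference is easier to see from how each classifier makes a prediction. The sketch below is a minimal illustration, not the thesis implementation: it assumes a multinomial Naive Bayes with add-one smoothing and a KNN with cosine similarity over raw term counts (the thesis applies KNN to TF-IDF weighted vectors). All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Collect class priors and per-class term counts for multinomial Naive Bayes."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    vocab = {t for doc in docs for t in doc}
    return class_docs, term_counts, vocab

def predict_nb(model, doc):
    """Score each class once per token; cost does not grow with the training set size."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(n_docs / total_docs)
        for t in doc:
            if t in vocab:  # unseen terms are ignored
                # Add-one (Laplace) smoothing
                score += math.log((term_counts[label][t] + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_knn(docs, labels, doc, k=3):
    """Compare the query with every training document, then take a majority vote."""
    query = Counter(doc)
    ranked = sorted(
        zip(docs, labels),
        key=lambda pair: cosine(Counter(pair[0]), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy corpus of pre-segmented texts (assumed data, not from the thesis).
train_docs = [
    ["股票", "市场", "上涨"],   # stock, market, rise
    ["股票", "基金", "投资"],   # stock, fund, invest
    ["球队", "比赛", "胜利"],   # team, match, win
    ["球员", "比赛", "进球"],   # player, match, goal
]
train_labels = ["finance", "finance", "sports", "sports"]

query = ["股票", "投资", "市场"]
nb_label = predict_nb(train_nb(train_docs, train_labels), query)
knn_label = predict_knn(train_docs, train_labels, query, k=3)
```

    Under these assumptions, Naive Bayes prediction costs roughly (number of classes) x (tokens in the text), while KNN computes one similarity per training document, which is consistent with the slower classification reported for KNN.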
    Keywords  Chinese text categorization  Naive Bayes  KNN

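      Contents

    1  Introduction
    1.1  Significance of the research
    1.2  History and current state of the research
    1.3  Main work of this paper
    2  Overview of Chinese text classification technology
    2.1  Text classification systems
    2.2  Text preprocessing
    2.3  Several classification methods
    2.4  Performance evaluation
    3  Implementation of the text classification algorithms
    3.1  Chinese text preprocessing
    3.2  Feature weight calculation and dimensionality reduction
    3.3  Preprocessing workflow
    3.4  Implementation of the KNN algorithm
    3.5  Implementation of the Bayes algorithm
    4  Analysis of results
    4.1  System framework
    4.2  Evaluation of results
    Conclusion
    Acknowledgements
    References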
    1  Introduction