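Abstract
Text sentiment classification has gradually become a research focus of the Web 2.0 era. With the rapid development of social media such as microblogs and blogs, and with users' growing willingness to participate, the volume of user reviews on social media is increasing quickly. Mining user reviews plays an important role in product information monitoring and public opinion monitoring, and sentiment classification of these reviews is a basic and critical task within it.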
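Comparing sentiment classification performance on review texts from different domains, and thereby examining the domain differences and adaptability of sentiment classification methods, is a meaningful research topic.
This thesis performs sub-sentiment classification on three kinds of online reviews, namely ScienceNet blogs, Ctrip hotel reviews, and Dangdang book reviews, and compares the classification results. The main work covers the following four aspects.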
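First, using a support vector machine model, we classify the sentiment of the reviews in each of the three domains and obtain the classification results.
Second, we analyze how different feature weighting algorithms affect sentiment classification performance on the same corpus. Specifically, we compare six feature weights, Boolean, TF, Log(TF), TF-IDF, TF-CHI, and TF-RF, on each of the three corpora.
Third, we analyze how different corpora affect sentiment classification performance under the same feature weighting method, in order to identify the weighting method that yields the best performance.
Finally, we analyze how different thresholds affect sentiment classification performance, in order to find the best threshold.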
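The experimental results show that TF-IDF, TF-RF, and TF give the better sentiment classification performance, TF-CHI and Log(TF) perform moderately, and Boolean weighting performs poorly. Classification performance is best when the threshold is less than or equal to 0.5. The differences in classification performance across the three domains indicate that reviews in each domain have their own characteristics, so the feature weighting algorithm should be chosen according to the characteristics of the reviews themselves.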
Keywords: affective computing, sentiment classification, Chinese reviews, support vector machine, feature weighting
Graduation Thesis Abstract (English)
Title: Sentiment Classification of Chinese Reviews in Different Areas and Results Comparison
Abstract
Text sentiment classification has gradually become a research focus of the Web 2.0 era. At the same time, with the rapid development of microblogs, blogs, and other social media, and with users' increasing awareness of participation, user reviews on social media are growing rapidly. Mining these reviews plays an important role in monitoring product information and public opinion, and sentiment classification of user reviews is a fundamental and crucial part of that work.
A comparative study of sentiment classification results on reviews from different domains, which examines the domain differences and adaptability of sentiment classification methods, is a meaningful research topic.
Using three kinds of web reviews, ScienceNet blogs, Ctrip hotel reviews, and Dangdang book reviews, we conduct sub-sentiment classification and compare the classification results. The main work includes the following four aspects.
First, using a support vector machine model, we conduct sentiment classification on the reviews of each of the three domains and obtain the classification results.
Second, we analyze the influence of different feature weighting algorithms on sentiment classification performance for the same corpus. Specifically, we compare six feature weighting algorithms, Boolean weights, TF, Log(TF), TF-IDF, TF-CHI, and TF-RF, on each of the three corpora.
Third, we analyze the influence of different corpora on sentiment classification performance under the same feature weighting method, in order to determine the feature weighting method with the best sentiment classification performance.
Finally, we analyze the influence of different thresholds on sentiment classification performance, in order to determine the optimal threshold.
The experimental results indicate that the TF-IDF, TF-RF, and TF feature weighting algorithms give the better sentiment classification performance, TF-CHI and Log(TF) perform moderately, and Boolean weighting performs poorly. The best sentiment classification performance is obtained when the threshold is less than or equal to 0.5. The differences in classification performance across the three domains indicate that reviews in different domains have their own characteristics, so the feature weighting algorithm should be selected according to the characteristics of the reviews themselves.
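
The six feature weights compared above can be illustrated with a small sketch. The abstract does not give the formulas used in the thesis, so the definitions below are assumptions based on common textbook forms: Log(TF) is taken as log(1 + tf), IDF as log(N / df), CHI as the chi-square statistic of the 2x2 term/class table, and RF as log2(2 + a / max(1, c)), where a and c are the numbers of positive and negative documents containing the term. The function name feature_weights and the toy corpus are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def feature_weights(docs, labels, positive=1):
    """Compute six term weights per document for a binary-labelled corpus.

    docs:   list of token lists (already segmented reviews)
    labels: list of class labels, one per document
    Returns one dict per document: {term: {scheme_name: weight}}.
    The formulas are assumed common definitions, not the thesis's own.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos

    # Document frequency of each term, counted separately per class.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (df_pos if y == positive else df_neg).update(set(tokens))

    idf, rf, chi = {}, {}, {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive documents containing the term
        c = df_neg[term]   # negative documents containing the term
        b = n_pos - a      # positive documents without the term
        d = n_neg - c      # negative documents without the term
        idf[term] = math.log(n / (a + c))
        rf[term] = math.log2(2 + a / max(1, c))
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0

    weighted = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency within this document
        weighted.append({
            term: {
                "boolean": 1.0,
                "tf": float(count),
                "log_tf": math.log(1 + count),
                "tf_idf": count * idf[term],
                "tf_chi": count * chi[term],
                "tf_rf": count * rf[term],
            }
            for term, count in tf.items()
        })
    return weighted


# Hypothetical toy corpus: two positive and two negative reviews.
toy_docs = [["good", "hotel"], ["good", "good", "book"],
            ["bad", "hotel"], ["bad", "book"]]
toy_labels = [1, 1, 0, 0]
toy_weights = feature_weights(toy_docs, toy_labels)
print(toy_weights[1]["good"])
```

In this toy corpus the word "hotel" appears equally often in both classes, so its chi-square value is zero, while "good" appears only in positive reviews and receives a high TF-CHI and TF-RF weight. This shows how the supervised weights (TF-CHI, TF-RF) use class labels whereas Boolean, TF, Log(TF), and TF-IDF do not. The resulting weight vectors would then be passed to a classifier such as a support vector machine.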