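Abstract
Social media such as blogs and microblogs are developing rapidly, and the analysis and mining of social media has become a prominent topic of the Web 2.0 era. The sheer volume of information is overwhelming, and the emergence of tags and folksonomy offers a new way to retrieve the needed information quickly and accurately from a fast-growing information stream. The language used on microblogs is comparatively colloquial and strongly individual. The Hashtag is a new type of tag that arose in the microblog environment and is generally used to mark a specific topic.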
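This thesis first introduces the concept of the Hashtag, then outlines the theory of tags and classification systems, including the definition of tags, tag types, and the characteristics of folksonomy. It also surveys Hashtag research in China and abroad. In the empirical study, data were crawled from Sina Weibo and Twitter, and the Hashtags they contain were extracted, cleaned, processed, and analyzed. A classification scheme was then constructed, the Chinese and English Hashtags were classified, and the popular Hashtags of each type were compared, with emphasis on how Chinese and English Hashtags are distributed over statistical features such as part of speech, length, and frequency.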
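Finally, the thesis presents the conclusions of the comparative study of Chinese and English Hashtags and, based on the analysis, offers suggestions for making better use of Hashtags.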
Keywords: Hashtag, user tags, tagging system, part-of-speech tagging, folksonomy
Graduation Thesis: Foreign-Language Abstract
Title: A Comparative Study of Chinese and English Hashtags
Abstract
With the rapid development of social media such as blogs and microblogs (weibo), the analysis and mining of social media has become a focus of the Web 2.0 era. Users face a flood of information, and the emergence of tags and folksonomy makes it possible to retrieve the needed information quickly and accurately. The language used on microblogs is colloquial and strongly personal; the Hashtag is a new type of tag that arose in this environment and usually denotes a specific topic.
This thesis first defines the Hashtag, then outlines the relevant theory of tags and folksonomy, including the definition of tags, the types of tags, and the features of folksonomy. It also reviews existing research on Hashtags. Building on this theoretical foundation, the thesis extracts Hashtags from data collected from Sina Weibo and Twitter, then cleans, processes, and analyzes them. It also builds a classification scheme for the Chinese and English Hashtags, classifies them, and compares the popular Hashtags of each type with respect to part of speech, length, and frequency of use.
The last part of the thesis presents the conclusions of the comparative study of Chinese and English Hashtags and offers suggestions for making better use of them.
Keywords: Hashtag, user tags, tagging system, part-of-speech tagging, folksonomy
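The abstract describes extracting Hashtags from Sina Weibo and Twitter posts and comparing their frequency and length. The following is a minimal sketch of that step, not the thesis's actual implementation. It assumes that Weibo topics are wrapped in a pair of `#` characters and that Twitter hashtags are a single `#` followed by word characters. The names `WEIBO_TAG`, `TWITTER_TAG`, `extract_hashtags`, and `summarize`, as well as the sample posts, are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Assumption: Sina Weibo topics are wrapped in a pair of '#', e.g. #世界杯#.
WEIBO_TAG = re.compile(r"#([^#\s][^#]*?)#")
# Assumption: Twitter hashtags are one '#' followed by word characters.
TWITTER_TAG = re.compile(r"(?<!\w)#(\w+)")


def extract_hashtags(posts, pattern, lowercase=False):
    """Collect the body of every hashtag that `pattern` finds in `posts`."""
    tags = []
    for post in posts:
        for body in pattern.findall(post):
            body = body.strip()
            tags.append(body.lower() if lowercase else body)
    return tags


def summarize(tags):
    """Return (tag frequency, length distribution over distinct tags)."""
    freq = Counter(tags)
    length_dist = Counter(len(tag) for tag in freq)
    return freq, length_dist


# Hypothetical sample posts, not data from the thesis.
weibo_posts = ["#世界杯# 今晚决赛 #世界杯#", "好天气 #旅行日记#"]
tweets = ["Watching the final #WorldCup #football", "#worldcup was great"]

zh_freq, zh_lengths = summarize(extract_hashtags(weibo_posts, WEIBO_TAG))
en_freq, en_lengths = summarize(
    extract_hashtags(tweets, TWITTER_TAG, lowercase=True)
)

print(zh_freq.most_common(1))  # [('世界杯', 2)]
print(en_freq.most_common(1))  # [('worldcup', 2)]
```

Length is measured here in characters, which treats one Chinese character and one Latin letter as the same unit; a comparison across the two languages may need a different unit, such as words. Part-of-speech tagging is left out because it depends on a language-specific tagger.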
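Contents
1 Introduction
1.1 Research Background
1.2 Research Significance
1.3 Research Content
1.4 Structure of This Thesis
2.1 Basic Concepts
2.1.1 Social Media and Microblogs
2.1.2 Hashtag
2.2 Tags and Folksonomy
2.3 Review of Hashtag Research
3 Overview of the Data Processing Workflow
4 Statistics on External Features of Hashtag Data
4.1 Overview of the Hashtag Data
4.2 Counts of Chinese and English Hashtags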