Thesis keywords: feature selection, RELIEF, feature weights, FSLSI, projection matrix, MFA
Abstract Many applications, such as text processing, gene expression array analysis, and combinatorial chemistry, are characterized by high-dimensional data in which usually only a small subset of the features is really important. Feature selection is therefore desirable. It can enhance the generalization capability of a subsequent classifier and considerably speed up both learning and classification. Moreover, it improves model interpretability and significantly reduces storage requirements. In this paper, we study two feature selection algorithms, namely RELIEF and FSLSI. RELIEF is considered one of the most successful algorithms for assessing the quality of features. The key idea of RELIEF is to iteratively estimate feature weights according to their ability to discriminate between neighboring patterns. FSLSI is a preprocessing step for LSI that allows LSI to be approximated efficiently on large-scale text corpora. It uses a projection matrix to project each vector into a lower-dimensional feature space. We propose a novel algorithm, called Marginal Fisher Analysis (MFA), whose key idea is to take more documents into account when maximizing the average margin in a weighted feature space.
Keywords feature selection, RELIEF, feature weighting, FSLSI, projection matrix, MFA
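Contents
1 Introduction
2 Global Optimal Search Strategies
2.1 The RELIEF Algorithm
2.2 Feature Selection Based on the Latent Semantic Space (FSLSI)
3 Global Optimal Feature Search Based on Marginal Fisher Analysis
4 Experiments
4.1 Recognition Rate
4.2 Error Rate vs. Number of Optimal Features
4.3 Application of Global Optimal Feature Search to Face Recognition
4.4 A Study of K for MFA-Based K-Nearest Neighbors
Conclusion
Acknowledgments
References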
1 Introduction
1.1 Background on Feature Selection
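With the rapid development of information technology, the data describing objects of study in many fields, such as bioinformatics, image processing, and text mining, are becoming increasingly high-dimensional, which confronts data mining with the “curse of dimensionality”. Research on dimensionality reduction has emerged in response to this problem. Feature selection, as one method of dimensionality reduction, is widely used because it is simple, intuitive, and effective. It can remove redundant features, irrelevant features, and even noisy features, yielding a sample set that is nearly free of redundancy and noise. A suitable feature selection algorithm can effectively remove irrelevant and redundant features, improve the generalization performance and running efficiency of the learning algorithm, and produce a simpler, more easily understood learned model.
Research on Global Optimal Search Strategies in Feature Selection: http://www.751com.cn/jisuanji/lunwen_72445.html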
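To make the removal of irrelevant features concrete, the following is a minimal sketch of RELIEF-style feature weighting, the idea of scoring each feature by how well it separates a sample from its nearest neighbor of another class. It follows the classic nearest-hit / nearest-miss update and is not necessarily the exact variant studied in this thesis. The function name `relief_weights`, the Manhattan distance, the min-max scaling, and the synthetic two-feature data set are all assumptions made for illustration; the sketch also assumes every class has at least two samples.

```python
import numpy as np

def relief_weights(X, y, n_iter=None, seed=0):
    """Estimate RELIEF-style feature weights (illustrative sketch).

    Assumes every class contains at least two samples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # Scale each feature to [0, 1] so per-feature differences are comparable.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features: avoid division by zero
    Xs = (X - lo) / span

    rng = np.random.default_rng(seed)
    n_iter = n if n_iter is None else n_iter
    w = np.zeros(d)

    for i in rng.integers(0, n, size=n_iter):
        # Manhattan distance from sample i to every sample.
        dist = np.abs(Xs - Xs[i]).sum(axis=1)
        dist[i] = np.inf  # a sample is not its own neighbor

        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample

        # Reward features that differ on the miss, penalize those that differ on the hit.
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])

    return w / n_iter

# Synthetic data (assumption): feature 0 carries the class label, feature 1 is pure noise.
rng_demo = np.random.default_rng(1)
y_demo = np.repeat([0, 1], 50)
X_demo = np.column_stack([
    y_demo + 0.1 * rng_demo.normal(size=100),
    rng_demo.normal(size=100),
])
weights = relief_weights(X_demo, y_demo)
print(weights)
```

On data of this kind the informative feature should receive a clearly larger weight than the noise feature, so selecting features by thresholding or ranking the weights discards the irrelevant one.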