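Abstract
We live in an era of big data. Data is everywhere in daily life, people deal with it constantly in their work and study, and whatever they ultimately process and analyze is tied to data. Yet any single matter that calls for a decision or an analysis may involve a large amount of data, and that data may be discontinuous or even irregular, so conventional methods struggle to find patterns in it or to process it. Data mining technology has developed steadily in response, and dedicated algorithms now exist for processing and analyzing big data. They make analysis possible for matters that are hard to predict, for example bank loan risk, clinical decision making, and manufacturing.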
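A decision tree algorithm is a classification algorithm: for an event that requires a decision or an analysis, the structure of the tree is determined by the attributes of the factors that may affect the outcome. The C4.5 decision tree algorithm chooses each node of the tree by computing the information gain ratio of the attributes that influence the data. The attribute with the largest information gain ratio becomes the root node. The gain ratio is then recomputed on each resulting subset to choose the nodes below it, down to the leaf nodes, which forms a top-down decision tree. A record is tested at each node along its path until it reaches the class it belongs to.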
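For reference, the information gain ratio is conventionally defined as below. The notation is assumed here and is not taken from the thesis: S is the training set, A is a candidate attribute that partitions S into subsets S_v (one per value v of A), and p_c is the proportion of records in S that belong to class c.

```latex
\begin{align}
H(S) &= -\sum_{c} p_c \log_2 p_c \\
\mathrm{Gain}(S, A) &= H(S) - \sum_{v} \frac{|S_v|}{|S|}\, H(S_v) \\
\mathrm{SplitInfo}(S, A) &= -\sum_{v} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|} \\
\mathrm{GainRatio}(S, A) &= \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)}
\end{align}
```

Dividing by the split information penalizes attributes with many distinct values, which plain information gain tends to favor.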
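In schools, students' grades are regarded as the best standard for evaluating them and as a basis for teaching according to aptitude. However, many factors affect a student's grades, grades do not stay stable over time, and examination scores alone cannot establish a student's level. A student's learning must therefore be judged from behavior in class, study habits, and similar indicators. This requires analyzing all aspects of the student's performance together. With data mining, a student's learning status can be inferred from that overall performance.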
Keywords: data mining; big data; decision tree algorithm; C4.5 algorithm
Abstract
Today is the era of big data. Data can be seen everywhere in life, people handle it constantly at work and in their studies, and the things they finally process and analyze all relate to data. But each matter that needs a decision or an analysis may consist of a large amount of data, and this data may be discontinuous or even irregular, which makes it difficult to find patterns in it or to process it with general methods. Data mining technology has therefore developed continuously, and specialized algorithms are now available for analyzing large data sets. These bring analysis to things in life that are hard to predict, for example bank loan risk, clinical decision making, and production and manufacturing.
The decision tree algorithm is a kind of classification algorithm. For an event that needs a decision or an analysis, it determines the structure of the tree from the attributes of the factors that may affect the result. The C4.5 decision tree algorithm determines each node of the tree from the size of the information gain ratio. The attribute with the maximum information gain ratio is taken as the root node, and the gain ratio is then calculated again on each subset to determine the lower nodes down to the leaf nodes, forming a top-down decision tree. Data passes through the nodes of the tree, is tested at each one, and finally arrives at its own class.
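The root-selection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the records, the attribute names (`attendance`, `homework`), and the target `result` are hypothetical, and the sketch handles only categorical attributes. A full C4.5 also handles continuous attributes, missing values, and pruning, and repeats the selection recursively on each subset.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attr, target):
    """Information gain of splitting on `attr`, divided by the split information."""
    total = len(rows)
    base = entropy([r[target] for r in rows])

    # Group the target labels by the value each record has for `attr`.
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])

    # Weighted entropy of the target after the split.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())

    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy([r[attr] for r in rows])
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split the data
    return (base - conditional) / split_info


def best_attribute(rows, attrs, target):
    """Return the attribute with the largest gain ratio (the node to split on)."""
    return max(attrs, key=lambda a: gain_ratio(rows, a, target))


# Hypothetical student records, invented purely for illustration.
rows = [
    {"attendance": "high", "homework": "done",   "result": "good"},
    {"attendance": "high", "homework": "done",   "result": "good"},
    {"attendance": "high", "homework": "missed", "result": "good"},
    {"attendance": "low",  "homework": "done",   "result": "poor"},
    {"attendance": "low",  "homework": "missed", "result": "poor"},
    {"attendance": "low",  "homework": "missed", "result": "poor"},
]

root = best_attribute(rows, ["attendance", "homework"], "result")
print(root)
```

In this toy data `attendance` separates the two classes perfectly, so it has the larger gain ratio and would become the root node.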
In school, the quality of a student's grades is the best standard by which the school evaluates the student, and it is also a means of deciding how to teach according to aptitude. But many factors can affect student achievement, a student's performance is not fixed, and test scores alone cannot determine a student's level. The student's behavior in class and in study is therefore needed to finally judge the student's learning situation. This requires an overall analysis of every aspect of the student's performance. Using data mining, the student's learning situation can be judged from that performance.
Keywords: Data Mining; Big Data; Decision Tree Algorithm; C4.5 Algorithm
Contents
1. Introduction
2. Algorithm Introduction
Data Mining Algorithm Analysis and Practice, the C4.5 Decision Tree Algorithm: http://www.751com.cn/jisuanji/lunwen_50331.html