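         This design uses VS2010 to build a network information crawling system with sina as the search home page. The user types the information they want to learn about into the keyword input box, and the system fetches matching content from web pages and displays it in a table. Users can filter the news they are interested in and save it locally. In writing this paper, I have tried to combine theory with practical application: each theory is explained alongside the system, with attention to practical use and operating techniques, in the hope of showing more fully how this knowledge and these techniques are applied and implemented in the system.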
    Keywords: search engine; keywords; information crawling; data analysis
    Research on the Extraction and Statistics of Internet Information
    Abstract: With the rapid development of computer and network technology, people's lives have become more and more inseparable from the network. Given the Internet's course of development, its current applications, and the trends in network technology, there is every reason to believe that it will greatly change the way we live and work, and even our social values. Improved information carriers have brought explosive growth in information, and as the pace of society quickens, the time people have to obtain information grows shorter and shorter.
     This design uses VS2010 to build a network information crawling system with sina as the search home page. The user types the desired information into the keyword input box, and the system fetches matching content from web pages and displays it in a table. Users can filter the news they are interested in and save it locally. In writing this paper, I have tried to combine theory with practical application: each theory is explained alongside the system, with attention to practical use and operating techniques, in the hope of showing more fully how this knowledge and these techniques are applied and implemented in the system.
    Keywords: search engine; keywords; information crawling; data analysis
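    Contents
    1 Introduction
    1.1 Purpose and Significance of the Project
    1.2 Current Research at Home and Abroad and Development Trends
    1.3 Organization of This Paper
    2 Technical Background
    2.1 Web Page Analysis
    2.1.1 Overview
    2.1.2 Hypertext Markup Language (HTML)
    HTML is a language for describing web pages.
    2.2 Search Technology
    2.2.1 Overview of Search Technology
    2.2.2 Indexing Technology
    2.2.3 Processing Technology
    2.2.4 Intelligent Technology
    2.3 DLL Invocation
    2.4 Regular Expressions
    2.4.1 Concepts
    2.4.2 Introduction to Engines
    2.4.3 Symbol Functions
    2.5 C# Programming Environment
    2.5.1 Introduction to C#
    2.5.2 Definition of C#
    2.5.3 Features of C#
    2.6 Word Segmentation Technology
    2.6.1 Significance of Word Segmentation
    2.6.2 Classification of Chinese Word Segmentation Techniques
    2.6.3 Application of the Word Segmentation System in This Project
    3 Methods for Crawling and Counting Specific Web Page Text
    3.1 Overview of Web Page Crawling and Statistics
    3.2 Web Page Analysis
    3.3 Analysis and Statistics
    3.4 Invoking Word Segmentation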
