Web Information Extraction and Analysis Based on HTMLParser


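Abstract: Web information extraction and analysis operates on a specific web page chosen by the user: the information on that page is extracted and then analyzed. Two approaches are common. The first fetches the chosen page directly over the network and analyzes it; the second saves the page's HTML file locally and then analyzes the saved file. Finally, the analyzed information is stored in HTML format.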

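This thesis first introduces the HyperText Markup Language (HTML), then briefly describes the principles, classification, and use of HTMLParser. Its focus is a detailed study of HTMLParser-based page extraction, with emphasis on module design and the extraction workflow. Finally, the page extraction is implemented and debugged, and the analysis is completed.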
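The workflow outlined in the abstract (take a page either from the network or from a locally saved file, extract information from it, and store the result as HTML) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: it uses Python's standard-library html.parser module as a stand-in, not the HTMLParser library studied in the thesis, and the names LinkAndTitleExtractor, load_html, extract, and render_report are hypothetical. Treating "information" as the page title plus its hyperlinks is likewise an assumption made for the example.

```python
from html import escape
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen


class LinkAndTitleExtractor(HTMLParser):
    """Hypothetical extractor: collects the page title and every hyperlink."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.links = []            # list of (href, anchor text) tuples
        self._in_title = False
        self._current_href = None  # href of the <a> element being read, if any
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._current_href = href
                self._current_text = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._current_href is not None:
            text = "".join(self._current_text).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def load_html(source):
    """Return HTML text from a URL (online mode) or a local file (offline mode)."""
    if source.startswith(("http://", "https://")):
        with urlopen(source) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    return Path(source).read_text(encoding="utf-8", errors="replace")


def extract(html_text):
    """Parse the HTML text and return (title, links)."""
    parser = LinkAndTitleExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.links


def render_report(title, links):
    """Store the extracted information as a small HTML document (returned as text)."""
    rows = "\n".join(
        f'<li><a href="{escape(href)}">{escape(text or href)}</a></li>'
        for href, text in links
    )
    return (
        '<html><head><meta charset="utf-8">'
        f"<title>Report: {escape(title)}</title></head>\n"
        f"<body><h1>{escape(title)}</h1>\n<ul>\n{rows}\n</ul></body></html>"
    )
```

For example, assuming a page has been saved locally under the hypothetical name page.html, calling extract(load_html("page.html")) returns the title and link list, and the text returned by render_report can then be written to a file such as report.html. Passing an http:// or https:// address to load_html instead corresponds to the online approach.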

Thesis keywords: HTML, extraction, Parser, parsing

Graduation Project Report (Thesis): Foreign-Language Abstract

Title: Web Information Extraction and Analysis Based on HTMLParser

Abstract

Web information extraction and analysis operates on a specific web page designated by the user: it extracts the information on that page and performs some analysis on it. There are two usual methods. One retrieves the designated page directly from the network while online and analyzes it; the other first saves the page's HTML file locally and then analyzes that file. Finally, the analyzed information is stored in HTML format.

This paper first introduces HTML, then gives a brief account of the principles, classification, and use of HTMLParser. It then briefly describes how web pages are extracted and analyzed with HTMLParser, introduces the functional modules for page extraction and analysis, and finally verifies through testing that the system operates as intended.

Keywords: HTML, extraction, Parser, analysis

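1 Introduction

1.1 Research purpose and significance

1.2 Research content of the thesis

1.3 Organization of the thesis

2 Related principles and technologies

2.1 The HTML language

2.1.1 Concept of HTML

2.1.2 Writing HTML documents and naming web page files

2.1.3 Basic structure of HTML

2.1.4 Characteristics of HTML

2.2 HTML parsers

2.2.1 Concept of a parser

2.2.2 Grammar and structure of HTMLParser

2.2.3 How HTMLParser processes HTML pages

3 Design of the HTMLParser-based web information extraction and analysis system

3.1 System architecture design

3.2 Functional module design

3.2.1 Page fetching

3.2.2 Page parsing

3.2.3 Display module

3.2.4 File management

4 System implementation and operation

4.1 System implementation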
