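Abstract
With the rapid development of the information age, network technology has become increasingly important to our lives and work. Today, when information is highly developed, traditional newspapers and magazines can no longer meet people's needs, and the Internet has become an important channel for quickly acquiring, publishing and transmitting information, playing an important role in politics, the economy, daily life and other areas. Simply put, a news gathering system acts as an online news medium: it mainly handles the classification, uploading, review and publication of news, simulating the publishing process of a conventional news outlet.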
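This software was developed on the basis of web crawler software. The main function of the web news gathering system is to extract, in batches and accurately, the semi-structured and unstructured data in target Internet pages according to user-defined task configurations, convert it into structured records, and save them in a local database for internal use or for publication on the external network, so that external information can be acquired quickly. The basic functional modules of the system are user login, site management and news gathering, filtering and keyword search of the gathered results, and database management. Specifically: 1. login for administrators and users, and management of user information; 2. gathering news from specified web pages, and adding and managing gathering sites; 3. filtering and keyword search of the gathered news; 4. database management of the gathered news information.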
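The extraction step described above (a user-defined task configuration turns semi-structured page content into structured records in a local database) can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the task configuration (a site name, an HTML tag and a CSS class marking news headlines), the `RuleExtractor` class, the `collect` function and the `news(site, title, url)` table are all hypothetical, and only Python's standard `html.parser` and `sqlite3` modules are used. Fetching the page (for example with `urllib.request.urlopen`) is left out so the sketch runs offline.

```python
import sqlite3
from html.parser import HTMLParser


class RuleExtractor(HTMLParser):
    """Collects the text and href of every <tag class="css_class"> element on a page.

    The (tag, css_class) pair stands in for a hypothetical user-defined extraction rule.
    """

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.records = []
        self._current = None  # record being filled while inside a matching element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._current = {"title": "", "url": attrs.get("href") or ""}

    def handle_data(self, data):
        # Text of nested elements (e.g. <b>) is appended to the current title.
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == self.tag and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.records.append(self._current)
            self._current = None


def collect(html, task, conn):
    """Apply one task rule to a page, store the records, and return how many were extracted."""
    parser = RuleExtractor(task["tag"], task["css_class"])
    parser.feed(html)
    parser.close()
    # Hypothetical schema: one row per news item, URL used as a uniqueness key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news (site TEXT, title TEXT, url TEXT UNIQUE)"
    )
    # INSERT OR IGNORE skips URLs already stored, so re-running a task adds no duplicates.
    conn.executemany(
        "INSERT OR IGNORE INTO news (site, title, url) VALUES (?, ?, ?)",
        [(task["site"], r["title"], r["url"]) for r in parser.records],
    )
    conn.commit()
    return len(parser.records)


# Hypothetical page and task configuration, for illustration only.
page = (
    '<ul><li><a class="news-title" href="/n/1">Budget passed</a></li>'
    '<li><a class="ad" href="/x">Advertisement</a></li>'
    '<li><a class="news-title" href="/n/2"> Rain expected </a></li></ul>'
)
task = {"site": "example-news", "tag": "a", "css_class": "news-title"}
conn = sqlite3.connect(":memory:")
print(collect(page, task, conn))
print(conn.execute("SELECT title, url FROM news ORDER BY url").fetchall())
```

Keying the table on the URL is one simple way to make repeated crawls of the same site idempotent; a real system might instead compare titles, publication times or content hashes.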
Keywords: web crawler; news gathering; news management; data storage
Graduation Project Report (Thesis): Foreign-Language Abstract
Title: Web Page Information Acquisition
Abstract: With the rapid development of the information age, network technology is becoming increasingly important to our lives and work. Today, when information is highly developed, traditional newspapers and magazines can no longer meet people's needs, and the Internet has become an important channel for quickly acquiring, publishing and transmitting information; it plays an important role in politics, the economy and other aspects of life. Simply put, a news gathering system acts as an online news medium: it mainly handles the classification, uploading, review and publication of news, simulating the publishing process of conventional media.
This software was developed on the basis of web crawler software. The main function of the web news and information gathering system is to extract, in batches and with precision, the semi-structured and unstructured data in target Internet pages according to user-defined task configurations, convert it into structured records and store them in a local database for internal use or external publication, giving quick access to external information. The basic functional modules of the system are user login, site management and news gathering, filtering and keyword search of the gathered results, and database management, as follows: 1. administrator and user login, and user information management; 2. news gathering from specified pages, and adding and managing gathering sites; 3. filtering and keyword search of the gathered news; 4. database management of the gathered news and information.
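Filtering and keyword search over the gathered results could be done with a parameterized SQL query. The following is a minimal sketch using Python's standard `sqlite3` module; the `news(site, title, url)` table, the `search_news` helper and the sample rows are hypothetical, since the schema and search logic of the actual system are not described here.

```python
import sqlite3


def search_news(conn, keyword, site=None):
    """Return (site, title, url) rows whose title contains keyword, optionally for one site only."""
    # Escape LIKE wildcards so a keyword such as "50%" is matched literally.
    escaped = keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    sql = "SELECT site, title, url FROM news WHERE title LIKE ? ESCAPE '\\'"
    params = ["%" + escaped + "%"]
    if site is not None:  # optional filter on the source site
        sql += " AND site = ?"
        params.append(site)
    sql += " ORDER BY title"
    return conn.execute(sql, params).fetchall()


# Hypothetical sample data in an assumed news(site, title, url) table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (site TEXT, title TEXT, url TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO news VALUES (?, ?, ?)",
    [
        ("site-a", "City budget passed", "/a/1"),
        ("site-a", "Weather report", "/a/2"),
        ("site-b", "Budget debate continues", "/b/1"),
        ("site-b", "Turnout hits 50% mark", "/b/2"),
    ],
)
print(search_news(conn, "budget"))
print(search_news(conn, "budget", site="site-a"))
```

Passing the keyword as a bound parameter rather than concatenating it into the SQL text avoids SQL injection. Note that SQLite's `LIKE` is case-insensitive only for ASCII letters; Chinese keywords are matched as plain substrings.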
Keywords: web crawler; news gathering; news management; data storage
Contents
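1 Introduction
1.1 Background of the Project
1.2 Significance of the Project
2 Web News Gathering System
2.1 System Overview
2.2 Current State of News Gathering Systems in China and Abroad
2.3 Development Trends of News Gathering Systems
3 Development Technologies and Tools
3.1 System Development Tools
3.2 Database
4 Overall System Design
4.1 Feasibility Analysis
4.2 Overall System Architecture
4.3 Overview of System Functions
4.4 Database Design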
5 Detailed System Design and Code