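Abstract: In recent years, as the scale of the Internet and the number of its users have kept growing, Internet applications have developed rapidly and China's level of informatization has risen quickly. Network news, which relies on the Internet and uses its speed, convenience, and efficiency to advance news reporting, has opened a new era of media. Yet precisely because of these advantages, the Internet also creates many problems for users reading the news.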
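To address these problems, a network news collection system was developed with Visual Studio 2010 and SQL Server 2005; it collects the news users need according to a set of collection rules. The system uses web crawler technology to perform a deep traversal: starting from the website URLs that the user provides, it searches for news in depth, downloads and parses the relevant news pages, and stores the news in a database in structured form. Users can also choose whether to download the images found on a page, and can edit, delete, and view both the sites to be collected and the news already collected. In addition, the system's functional layout and interface presentation are designed on .NET Framework 4.0, so that users can operate the system quickly and conveniently.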
Thesis keywords: network news; collection; web crawler
Graduation Project Report (Thesis): Foreign-Language Abstract
Title: Network News Gathering System
Abstract
In recent years, the number of Internet users has continued to grow as Internet applications have developed rapidly, and network news has become one of the important applications on the Internet. Relying on the Internet and using its speed, ease, and efficiency to promote the development of news reporting, it has opened up a new era of media. However, precisely because of these advantages, the Internet also brings many problems to users reading the news, such as irregular news, false news content, and a confusing profusion of news websites.
To address these problems, this work uses Visual Studio 2010 and SQL Server 2005 to build an integrated network news gathering system; that is, the system gathers the news users need according to a set of collection rules. It uses web crawler technology with in-depth traversal: starting from the website URLs provided by users, it searches for news, downloads and parses the news pages, and stores the news in the database in a structured form. Users can also choose whether to download the pictures on a page, and can edit, delete, and view the websites to be collected and the news already collected. The system's layout, interface display, and interface functions are designed on .NET Framework 4.0 so that users can use the system conveniently.
Keywords: network news; gathering; web crawler
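The crawl described in the abstract (start from a user-supplied URL, traverse links in depth, download and parse each page) can be illustrated with a minimal sketch. It is written in Python purely for illustration; the thesis system itself was built with Visual Studio 2010, and nothing here is taken from its code. The names `LinkExtractor`, `crawl`, `fetch`, and `max_depth` are hypothetical, the traversal is assumed to be depth-first with a depth limit, and database storage is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth=2):
    """Depth-first crawl from start_url; returns {url: html} for fetched pages.

    fetch is a caller-supplied function mapping a URL to its HTML text,
    or to None when the page cannot be downloaded (an assumption of this
    sketch, so that the traversal can be tried without network access).
    """
    pages = {}
    visited = set()

    def visit(url, depth):
        if depth > max_depth or url in visited:
            return
        visited.add(url)
        html = fetch(url)
        if html is None:
            return
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            absolute, _fragment = urldefrag(urljoin(url, href))
            visit(absolute, depth + 1)

    visit(start_url, 0)
    return pages


if __name__ == "__main__":
    # A tiny in-memory "site" standing in for real news pages.
    site = {
        "http://news.example/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
        "http://news.example/a.html": '<a href="/c.html">C</a>',
        "http://news.example/b.html": '<a href="/">home</a>',
        "http://news.example/c.html": "<p>leaf</p>",
    }
    for url in sorted(crawl("http://news.example/", site.get, max_depth=1)):
        print(url)
```

Passing `fetch` as a parameter keeps the traversal logic separate from downloading; a real downloader (for example one built on `urllib.request.urlopen`) and a step that writes each parsed page to a database could be plugged in without changing `crawl`.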
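Contents
1 Introduction
1.1 Research purpose and significance
1.2 Current state of research on news collection
1.3 Organization of the thesis
2 System Overview
2.1 Definition of network news collection
2.2 Working principle of the collection system
2.2 Web crawler
2.3 Chapter summary
3 System Requirements Analysis
3.1 Overview
3.2 System requirements analysis
3.3 Feasibility analysis
3.4 Introduction to the development tools
3.5 System environment configuration
4 Overall Design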