Research on Automatic Extraction of Named Entities in Online News Text


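Abstract: As information grows explosively, users struggle to find what they actually need in large volumes of data, and information extraction technology emerged in response. In this field, named entities are the basic information elements of a text, so named-entity extraction is the foundation of information extraction. Starting from an introduction to named entities, this thesis studies the automatic extraction of named entities from online news text using the FuDanNLP system. It analyzes the extraction of four types of named entities: person names, times, place names, and organization names. Testing reveals problems in how the system extracts named entities; the thesis analyzes their causes and proposes improvements. It then focuses on designing rules and an algorithm for extracting organization names. An evaluation of the improved system shows that both its recall and its precision in named-entity extraction increased. Finally, the thesis offers an outlook on the system's further development.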
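The abstract says the thesis designs rules and an algorithm for organization-name extraction but does not state them here. As a purely illustrative sketch of what a suffix-triggered rule can look like, the code below assumes the sentence has already been segmented and tagged. The suffix list, the tag names (`LOC`, `NOUN`, `PROPN`, ...), the window size and the function `extract_org_names` are all hypothetical; they are not FuDanNLP's tag set or the author's rules.

```python
from typing import List, Tuple

# Hypothetical organization-name suffixes (not taken from the thesis).
ORG_SUFFIXES = ("公司", "大学", "银行", "医院", "委员会", "研究所", "政府")

# Hypothetical tags allowed as modifiers to the left of the suffix.
ALLOWED_LEFT_TAGS = {"LOC", "NOUN", "PROPN"}

# Assumed upper bound on how many modifier tokens may precede the suffix.
MAX_LEFT_TOKENS = 4


def extract_org_names(tokens: List[Tuple[str, str]]) -> List[str]:
    """Find organization names with a suffix-triggered left-extension rule.

    `tokens` is a segmented sentence given as (word, tag) pairs.
    """
    results = []
    for i, (word, _tag) in enumerate(tokens):
        # Trigger: the current word ends with a known organization suffix.
        if not word.endswith(ORG_SUFFIXES):
            continue
        start = i
        # Extend leftwards over allowed modifiers, within the window.
        while (start > 0
               and i - start < MAX_LEFT_TOKENS
               and tokens[start - 1][1] in ALLOWED_LEFT_TAGS):
            start -= 1
        # Require at least one modifier so a bare "公司" is not reported.
        if start < i:
            results.append("".join(w for w, _ in tokens[start:i + 1]))
    return results
```

For example, the tagged sequence 据/P 北京/LOC 大学/NOUN 消息/NOUN yields ["北京大学"]. Nested names such as 北京大学附属医院 produce both the inner and the outer name, so a fuller rule set would still need a step that chooses between overlapping candidates.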

Keywords: named entity; online news; information extraction; FuDanNLP

English Abstract

Title: Study on Automatic Extraction of Named Entities in e-News Text

Abstract

With the excessive growth of information, it is very difficult for users to find the information they really need among large amounts of data. Information extraction technology emerged in response, and in this field named entities are the basic information elements of a text. In this paper, I first describe named entities and then study the automatic extraction of named entities from online news text using the FuDanNLP system. The paper analyzes the extraction of four types of named entities, namely person names, times, place names, and organization names. Through experiments, I identified problems in the FuDanNLP system, explored their causes, and improved the system. In addition, I redefined the rules and designed an algorithm for extracting organization names. Finally, I evaluated the improved system and found that both its precision and its recall increased. Because the improved system still has many problems, I conclude with an outlook on future work.
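The abstract reports that precision and recall both increased but does not say how an extracted entity is matched against the reference annotation. The sketch below assumes exact matching on the entity's span and type, which is a common convention for named-entity evaluation; the `Entity` tuple layout and the function name `evaluate` are hypothetical.

```python
from typing import Set, Tuple

# Assumed entity representation: (start offset, end offset, entity type).
Entity = Tuple[int, int, str]


def evaluate(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over entity spans."""
    # An entity counts as correct only if span and type both match.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

With four reference entities and three predictions of which two match exactly, precision is 2/3, recall is 1/2 and F1 is 4/7. A prediction whose span is off by one character counts as wrong under this strict criterion; a looser, overlap-based criterion would give higher scores.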

Keywords: named entity; e-news; information extraction; FuDanNLP

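1 Introduction

2 Overview of Named Entities

2.1 Types of Named Entities

2.2 Recognition of Named Entities

2.3 Analysis of Named-Entity Extraction Methods

3 Analysis of Named Entities in Online News Text

3.1 Characteristics of News Text

3.2 Characteristics of Online News Text

3.3 Named Entities in Online News

3.3.1 Characteristics of Person Names

3.3.2 Characteristics of Place Names

3.3.3 Characteristics of Organization Names

4 A Chinese Named-Entity Extraction System: FuDanNLP

4.1 Introduction to FuDanNLP

4.1.1 Organizational Structure of FuDanNLP

4.1.2 Example of Calling FuDanNLP from the Command Line

4.1.3 FuDanNLP Directory Organization

4.1.4 FuDanNLP Java Package Structure

4.1.5 Overall Workflow of FuDanNLP

4.2 Named-Entity Recognition

4.3 FuDanNLP Development Roadmap

4.4 Performance Testing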
