Abstract
Every day, postal sorting systems handle several tons of mail. The main cause of mail rejection is the failure of the address-block localization task, in particular of its physical layout segmentation stage. The bottom-up and top-down segmentation methods bring different knowledge that should not be ignored when robustness must be increased. Hybrid methods combine the two strategies so that the advantages of one compensate for the weaknesses of the other. Starting from these remarks, our proposal uses a hybrid segmentation strategy better adapted to postal mail. The high-level stages are based on hierarchical graph coloring, which makes it possible to manage, through a pyramidal data organization, the complex rules that govern the interpretation of the decomposition of zones of interest into connected components.
To date, no other work in this context has made use of the power of this tool. Our approach was evaluated on a corpus of 10000 envelope images. The processing times and the rejection rate were considerably reduced.

1. Introduction
The most recent automatic mail sorting machines process about 17 mail pieces per second. This requires fast and precise OCR-based recognition of the address block. This recognition depends mainly on a correct organization of the address lines [1][2][3]. Once the envelope image has been acquired by a linear CCD camera, three principal modules contribute to the task of address-block localization: physical layout segmentation of the envelope image, feature extraction and address-block identification (see figure 1).
The physical layout segmentation phase has a great impact on the global performance of the sorting system. Generally, this segmentation denotes the decomposition of the envelope image into disjoint constitutive elements containing homogeneous components, so that they can be identified separately. These elements are usually separated by white space and form elementary geometric blocks, rectangular in the large majority of cases. In its literal sense, segmentation is very close in meaning to “analysis”. One speaks of over-segmentation when constitutive components are fragmented, and of under-segmentation when several constitutive components cannot be isolated from one another. In terms of effectiveness, we observed that traditional segmentation techniques face several constraints (figure 2): degraded images (folded envelopes); very large mail variety (quality, color and different paper textures); real-time constraints (limited processing time); skewed text lines on the envelopes; non-uniform spacing between characters, lines and blocks of text; the obligation to produce a result; high spatial resolution of the images (300 dpi); presence of parasitic elements near the address block (stamps, post office marks, printed logos…); superimposed information layers (stamps, handwritten notes…).

Figure 2. Very large mail variety

Taking these limits into account, we propose in this paper an original method for the physical layout segmentation of postal mail images. Using graph theory, the principle of our technique is based on a pyramidal representation of data. The fundamental objective is to increase the performance of each segmentation stage and its coherence with the other stages, in order to reduce mail rejection and processing time as much as possible. The developed method will be integrated into an automatic mail sorting system. The remainder of this paper is organized as follows. Section 2 reviews the various segmentation methods, together with previous work and its limits. The third section details the formal aspects of graph coloring. The fourth section describes the application of coloring to the segmentation problem. The results obtained are then commented on and discussed.

2. Various segmentation strategies
Segmentation methods analyze the envelope image in order to extract the textual block, divided into lines and characters. They are mainly based on the hierarchical revelation of the linear structure of physical components. The text regions are one of the main sources of information necessary for the automatic sorting of mail items. Clearly, the large number of constraints makes the segmentation of these vital regions very difficult. The literature generally refers to three segmentation strategies.
The bottom-up and top-down segmentation methods bring different knowledge that should not be ignored when robustness must be increased. Hybrid methods (or mixed approaches) combine the two strategies (bottom-up and top-down) so that the advantages of one compensate for the disadvantages of the other. This combination can reduce several errors caused by the traditional segmentation methods [4].
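
To make the idea of combining the two strategies concrete, the following minimal sketch pairs a bottom-up step (connected-component extraction) with a top-down step (cutting the image into horizontal bands at blank rows). It is only an illustration under simplifying assumptions: the image is a small binary matrix, and the function names and the `max_gap` parameter are hypothetical. It is not the graph-coloring method proposed in this paper.

```python
from collections import deque

def connected_components(img):
    """Bottom-up step: bounding boxes (top, left, bottom, right) of the
    4-connected components of a binary image given as a list of rows."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

def horizontal_bands(img):
    """Top-down step: split the image at blank rows and return the
    (first_row, last_row) range of each band of non-empty rows."""
    bands, start = [], None
    for r, row in enumerate(img):
        if any(row):
            if start is None:
                start = r
        elif start is not None:
            bands.append((start, r - 1))
            start = None
    if start is not None:
        bands.append((start, len(img) - 1))
    return bands

def hybrid_segment(img, max_gap=2):
    """Hybrid step: inside each band, merge neighbouring component boxes
    separated by at most max_gap blank columns (max_gap is an assumed,
    illustrative threshold)."""
    boxes = connected_components(img)
    blocks = []
    for first, last in horizontal_bands(img):
        in_band = sorted((b for b in boxes if first <= b[0] and b[2] <= last),
                         key=lambda b: b[1])
        current = None
        for top, left, bottom, right in in_band:
            if current is not None and left - current[3] - 1 <= max_gap:
                current = (min(current[0], top), current[1],
                           max(current[2], bottom), max(current[3], right))
            else:
                if current is not None:
                    blocks.append(current)
                current = (top, left, bottom, right)
        if current is not None:
            blocks.append(current)
    return blocks

# Toy binary "envelope": two text lines separated by one blank row.
img = [
    [1, 1, 0, 1, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0, 0, 0, 0],
]
print(hybrid_segment(img))  # [(0, 0, 1, 3), (0, 8, 1, 9), (3, 1, 3, 5)]
```

In this toy example the bottom-up step alone yields five components (over-segmentation), and the top-down step alone yields two bands (under-segmentation of the first line, which holds two distinct blocks). Combining them gives three blocks, which is the kind of complementarity hybrid methods aim for.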