IMPROVED PARCEL SORTING BY COMBINING AUTOMATIC SPEECH AND CHARACTER RECOGNITION
Automatic postal sorting systems have traditionally relied on optical character recognition (OCR) technology. While OCR systems perform well for flat mail items such as envelopes, their performance deteriorates for parcels. In this study, we propose a new multimodal solution for parcel sorting which combines automatic speech recognition (ASR) technology with OCR in order to deliver better performance. Our multimodal approach is based on estimating OCR output confidence, and then optionally using ASR system output when OCR results show low confidence. In particular, we propose a Levenshtein edit distance (LED) based measure to compute OCR confidence. Based on the OCR confidence measure, a dynamic fusion strategy is developed that forms its final decision based on (i) OCR output alone, (ii) ASR output alone, or (iii) a combination of ASR and OCR outputs. The proposed system is evaluated on speech and image data collected in real-world conditions. Our experiments show that the proposed multimodal solution achieves an overall zip code recognition rate of 90.2%, which is a substantial improvement over ASR alone (81%) and OCR alone (80.6%) systems. This advancement represents an important contribution that leverages OCR and ASR technologies to improve address recognition in parcels.
Index Terms— Automatic Speech Recognition, Optical Character Recognition, Parcel Sorting, Address Recognition
1. INTRODUCTION
Parcel sorters are large machines that automatically sort and direct mail towards its destination. Parcel sorting technology provides greater speed, efficiency, and reliability to mail delivery while cutting operational cost. The key step in automatic parcel sorting is reliable automatic identification of the address information on individual mail items. Traditionally, optical character recognition (OCR) has been used widely for reading addresses on mail items. While OCR performs extremely well on flat mail items (also known as flats) such as envelopes, its performance is lower on parcels. Address labels on flats are generally more consistently oriented and provide a strong black-font on white-background contrast, resulting in better OCR performance. However, address labels on parcels are inconsistently placed and often covered with plastic, and the parcel itself may be irregularly shaped, resulting in poorer OCR performance. Consequently, there is a need to improve automatic address recognition accuracy for parcels.
In this study, we explore the possibility of using automatic speech recognition (ASR) together with OCR to deliver improved address recognition accuracy for parcels. While multimodal approaches that combine OCR and ASR systems have been explored in other domains [1, 2], the application domain of postal automation is relatively underexplored. Here, it is important to note that parcel sorters are generally operated by a single person, whose primary job is to pick up the parcel and place it on the sorting machine conveyor belt. In the proposed multimodal solution, the operator is assigned a secondary job where he/she reads the address label aloud while placing the parcel. In this manner, the necessary speech input for the ASR system can be obtained. It is also important to note that the operation is minimally affected, as the operator is generally only required to speak the zip code (or the first three digits of the zip code).
In the proposed multimodal approach, the ASR and OCR systems work independently to decode their best results, which are later combined to deliver superior performance. In particular, the combination logic examines the OCR output and generates a confidence score. If the OCR output is generated with high confidence, then the ASR output is not used for final processing of the result. Conversely, if the OCR output is generated with low confidence, then the OCR output is not trusted and the ASR output alone is used for processing the final output. Finally, if the OCR output is generated with medium confidence, then a combination of OCR and ASR outputs is used to process the final result. In order to place the OCR output in high, medium, and low confidence categories, a confidence measure based on Levenshtein edit distance (LED) is proposed. The LED based measure is very efficient and effective at utilizing ASR output when OCR output is unlikely to be correct.
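The three-way decision described above can be illustrated with a minimal Python sketch. This excerpt does not define how the LED based confidence is computed or how the two outputs are combined in the medium-confidence case, so the details below are assumptions made only for illustration: the confidence is taken to be the edit distance from the OCR string to the nearest entry in a hypothetical list of valid zip codes (`valid_zips`), the thresholds `high_max` and `medium_max` are invented values, and the medium-confidence rule (pick the valid zip code closest to both outputs) is a stand-in rather than the authors' method.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = cur
    return prev[-1]


def ocr_distance(ocr_text: str, valid_zips: list[str]) -> int:
    """ASSUMPTION: a small distance to the nearest valid zip means high confidence."""
    return min(levenshtein(ocr_text, z) for z in valid_zips)


def fuse(ocr_text: str, asr_text: str, valid_zips: list[str],
         high_max: int = 0, medium_max: int = 1) -> str:
    """Choose the final zip code from OCR output, ASR output, or both.

    high_max and medium_max are hypothetical thresholds, not values from the paper.
    """
    d = ocr_distance(ocr_text, valid_zips)
    if d <= high_max:
        return ocr_text   # high confidence: OCR output alone
    if d > medium_max:
        return asr_text   # low confidence: ASR output alone
    # Medium confidence: ASSUMED combination rule, the valid zip code
    # with the smallest summed distance to both recognizer outputs.
    return min(valid_zips,
               key=lambda z: levenshtein(ocr_text, z) + levenshtein(asr_text, z))


valid = ["75080", "75081", "10001"]
print(fuse("75080", "10001", valid))  # OCR matches a valid zip exactly
print(fuse("7508O", "75081", valid))  # one OCR error: both outputs are combined
print(fuse("7x0y0", "75080", valid))  # badly garbled OCR: ASR output is used
```

In this sketch the ASR output is consulted only when the OCR string fails to match a valid zip code exactly, which mirrors the idea of using speech only when OCR is unlikely to be correct.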