Recognition of off-line printed Arabic text using Hidden Markov Models.
Publication date
27/06/2008
Keyword
Signal Processing
OCR
Feature extraction
Arabic text recognition
Hidden Markov Models (HMM)
Omni font recognition
Rights
© 2008 Elsevier. Reproduced in accordance with the publisher's self-archiving policy.
Peer-Reviewed
yes
Abstract
This paper describes a technique for automatic recognition of off-line printed Arabic text using Hidden Markov Models. Overlapping and non-overlapping hierarchical windows of different sizes are used to generate 16 features from each vertical sliding strip. Eight Arabic fonts were used for testing (Arial, Tahoma, Akhbar, Thuluth, Naskh, Simplified Arabic, Andalus, and Traditional Arabic). Experiments showed that different fonts reach their highest recognition rates at different numbers of states (5 or 7) and codebook sizes (128 or 256). Arabic text is cursive, and each character may have up to four different shapes depending on its position in a word. This work treats each shape as a separate class, resulting in a total of 126 classes (compared to 28 Arabic letters). The average recognition rates achieved were between 98.08% and 99.89% for the eight experimental fonts. The main contributions of this work are the novel hierarchical sliding window technique, which uses only 16 features per sliding window; the treatment of each shape of an Arabic character as a separate class; the elimination of the need to segment Arabic text; and the technique's applicability to other languages.
Version
Accepted Manuscript
Citation
Al-Muhtaseb, H. A., Mahmoud, S. A. and Qahwaji, R. S. R. (2008). Recognition of off-line printed Arabic text using Hidden Markov Models. Signal Processing, Vol. 88, No. 12, pp. 2902-2912.
Link to Version of Record
https://doi.org/doi:10.1016/j.sigpro.2008.06.013
Type
Article
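
The abstract describes extracting 16 features from each vertical sliding strip using overlapping and non-overlapping hierarchical windows, but it does not give the window layout. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a binary line image, ink density as the feature, and a hypothetical hierarchy of 8 non-overlapping cells, 7 overlapping windows that each span two adjacent cells, and 1 window over the whole strip (8 + 7 + 1 = 16). The function names, strip width, and step are illustrative.

```python
from typing import List

Image = List[List[int]]  # binary image: rows of 0/1 pixels, 1 = ink


def strip_features(image: Image, x0: int, width: int) -> List[float]:
    """Return 16 ink-density features for the vertical strip starting at column x0.

    Assumed (hypothetical) hierarchy, not taken from the paper:
      - 8 non-overlapping cells stacked top to bottom       -> 8 features
      - 7 overlapping windows, each spanning cells i, i + 1 -> 7 features
      - 1 window covering the whole strip                   -> 1 feature
    """
    height = len(image)
    # Row boundaries that split the strip into 8 nearly equal horizontal bands.
    bounds = [i * height // 8 for i in range(9)]

    def density(top: int, bottom: int) -> float:
        # Fraction of ink pixels in rows [top, bottom) of the strip.
        rows = image[top:bottom]
        area = sum(len(row[x0:x0 + width]) for row in rows)
        ink = sum(sum(row[x0:x0 + width]) for row in rows)
        return ink / area if area else 0.0

    cells = [density(bounds[i], bounds[i + 1]) for i in range(8)]
    pairs = [density(bounds[i], bounds[i + 2]) for i in range(7)]
    whole = [density(0, height)]
    return cells + pairs + whole


def sliding_features(image: Image, width: int = 2, step: int = 1) -> List[List[float]]:
    """Slide the strip across the line image; one 16-feature vector per position."""
    n_cols = len(image[0]) if image else 0
    return [strip_features(image, x, width)
            for x in range(0, n_cols - width + 1, step)]
```

In a pipeline like the one the abstract outlines, each 16-dimensional vector would then be quantized against a codebook (128 or 256 entries) and the resulting symbol sequence passed to the HMMs; those stages are not shown here. The sketch scans by increasing column index, so a right-to-left scan for Arabic would simply reverse the order of the returned vectors.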