
A novel framework for Farsi and Latin script identification and Farsi handwritten digit recognition

Author(s): Behrad Alireza | Khoddami Malike | Salehpour Mehdi

Journal: Journal of Automatic Control
ISSN 1450-9903

Volume: 20;
Issue: 1;
Start page: 17;
Date: 2010;

Keywords: curvature scale space | script identification | optical character recognition | CBWSVM | clustering | handwritten digit recognition | PCA | PCA-LDA

ABSTRACT
Optical character recognition (OCR) is an important task for converting handwritten and printed documents into digital format. In multilingual systems, script identification is a necessary step before an OCR algorithm can be applied. In this paper, novel methods are proposed for script identification and for the recognition of Farsi handwritten digits. Our script identification method is based on curvature scale space (CSS) features. The proposed features are rotation and scale invariant and can be used to identify scripts in different fonts. We assume that bilingual documents may contain Farsi and English words and characters side by side; the algorithm is therefore designed to identify the script at the connected-component level. The component-level results are then generalized to the word, line and page levels. For Farsi handwritten digits, we propose a recognition method that is reasonably robust against rotation and scaling. It extracts the required features using principal component analysis (PCA) and linear discriminant analysis (LDA), and the extracted features are then classified with a new algorithm called cluster-based weighted support vector machine (CBWSVM). The experimental results indicate that the proposed algorithms are promising.
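The abstract does not give the exact CSS feature definition, so what follows is only a minimal sketch of the generic curvature scale space computation, not the authors' implementation. It assumes a closed contour (for example, the boundary of one connected component) that has already been resampled to roughly uniform spacing. The contour is smoothed with a circular Gaussian at several scales sigma, and the positions where curvature changes sign are recorded. The function names `smooth_closed`, `curvature` and `css_zero_crossings` are hypothetical.

```python
import numpy as np


def smooth_closed(x, y, sigma):
    """Circularly convolve the coordinates of a closed contour with a Gaussian."""
    n = len(x)
    d = np.arange(n)
    d = np.minimum(d, n - d)                  # circular distance to sample 0
    g = np.exp(-0.5 * (d / sigma) ** 2)
    g /= g.sum()                              # unit-gain kernel
    G = np.fft.fft(g)
    # Circular convolution via the FFT keeps the contour closed
    xs = np.real(np.fft.ifft(np.fft.fft(x) * G))
    ys = np.real(np.fft.ifft(np.fft.fft(y) * G))
    return xs, ys


def curvature(x, y, eps=1e-12):
    """Discrete curvature using periodic central differences."""
    dx = (np.roll(x, -1) - np.roll(x, 1)) / 2
    dy = (np.roll(y, -1) - np.roll(y, 1)) / 2
    ddx = np.roll(x, -1) - 2 * x + np.roll(x, 1)
    ddy = np.roll(y, -1) - 2 * y + np.roll(y, 1)
    return (dx * ddy - dy * ddx) / ((dx ** 2 + dy ** 2) ** 1.5 + eps)


def css_zero_crossings(x, y, sigmas):
    """Map each scale sigma to the sample indices where curvature changes sign."""
    out = {}
    for s in sigmas:
        k = curvature(*smooth_closed(x, y, s))
        sign = np.sign(k)
        # A crossing lies between sample i and sample i + 1 (wrapping around)
        out[s] = np.nonzero(sign != np.roll(sign, -1))[0]
    return out


if __name__ == "__main__":
    theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
    r = 1 + 0.5 * np.cos(5 * theta)          # five-lobed, non-convex test shape
    css = css_zero_crossings(r * np.cos(theta), r * np.sin(theta), [1.0, 20.0])
    print({s: len(z) for s, z in css.items()})
```

On the five-lobed shape, the inflection points present at small sigma disappear as sigma grows, which is the behavior CSS descriptors build on. Rotation and uniform scaling do not change the sign of curvature, so the zero-crossing pattern is unaffected by them for a given sampling; how the authors turn these crossings into their final features is not stated in the abstract.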
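The abstract says that connected-component decisions are generalized to word, line and page levels but does not say how. One simple possibility, assumed here purely for illustration, is majority voting at each level. The nested-list page layout, the helper names `majority_label` and `aggregate`, and the labels `"fa"`/`"en"` are all hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Return the most frequent label; ties go to the label seen first."""
    # Counter.most_common orders equal counts by first insertion
    return Counter(labels).most_common(1)[0][0]


def aggregate(page):
    """Propagate connected-component script labels to word, line and page level.

    page: list of lines; each line is a list of words; each word is a list
    of per-component labels (hypothetical layout).
    """
    word_labels = [[majority_label(word) for word in line] for line in page]
    line_labels = [majority_label(words) for words in word_labels]
    # Page label: vote over all word labels on the page
    page_label = majority_label([w for words in word_labels for w in words])
    return word_labels, line_labels, page_label


if __name__ == "__main__":
    page = [
        [["fa", "fa", "en"], ["en", "en"], ["en"]],
        [["fa"], ["fa", "fa"], ["fa", "en", "fa"]],
    ]
    print(aggregate(page))
    # ([['fa', 'en', 'en'], ['fa', 'fa', 'fa']], ['en', 'fa'], 'fa')
```

Keeping the word-level labels, rather than only a single page label, preserves the mixed Farsi/English content that the component-level design is meant to handle.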
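The digit features are described as PCA followed by LDA. A common reason for this ordering is that PCA first reduces the dimensionality so that the within-class scatter matrix used by LDA is non-singular. The NumPy sketch below shows that generic two-stage projection under this assumption. `fit_pca_lda`, `transform` and the `n_pca` parameter are hypothetical names. The paper's CBWSVM classifier, which would consume the projected features, is not reproduced here.

```python
import numpy as np


def fit_pca_lda(X, y, n_pca):
    """Fit PCA to n_pca dims, then LDA to (n_classes - 1) dims.

    Returns the training mean and a combined (d, n_classes - 1) projection.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD: rows of Vt are principal directions, largest variance first
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pca].T
    Z = Xc @ P                        # PCA scores (zero mean)
    classes = np.unique(y)
    Sw = np.zeros((n_pca, n_pca))     # within-class scatter
    Sb = np.zeros((n_pca, n_pca))     # between-class scatter
    for c in classes:
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        Sb += len(Zc) * np.outer(mc, mc)   # global mean of Z is zero
    # Fisher directions: leading eigenvectors of Sw^-1 Sb
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[: len(classes) - 1]
    W = evecs[:, order].real
    return mean, P @ W


def transform(X, mean, proj):
    """Project samples with the fitted PCA-LDA projection."""
    return (X - mean) @ proj


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = rng.normal(size=(3, 10)) * 5       # synthetic 3-class data
    X = np.vstack([c + rng.normal(size=(50, 10)) for c in centers])
    y = np.repeat(np.arange(3), 50)
    mean, proj = fit_pca_lda(X, y, n_pca=5)
    print(transform(X, mean, proj).shape)
```

Keep `n_pca` no larger than the number of training samples minus the number of classes; otherwise `Sw` is necessarily singular and `np.linalg.solve` can fail or become unstable.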