A Geometric Feature Extraction Technique for Hindi Handwritten Character Recognition


Neha Assiwal
M. Tech Scholar
Department of Computer Science & Engineering
GITAM Kablana, Jhajjar (Haryana)

Dr. Neetu Sharma
Head of Dept.
Department of Computer Science & Engineering
GITAM Kablana, Jhajjar (Haryana)

Abstract

This paper presents a technique for offline handwritten character recognition that recognizes characters of Indian languages. The system is segmentation-based: each character is extracted separately, and a geometry-based technique is used for feature extraction. The features are based on the basic line types that form the character skeleton, and the output of the system is a feature vector. Feature vectors generated from a training set were then used to train a pattern recognition engine based on neural networks, so that the system can be benchmarked.

Keywords: Geometry, Character skeleton, Universe of Discourse, Zoning, Feature Extraction

________________________________________________________________________________________________________

I. INTRODUCTION

This work draws on a number of earlier studies. The literature describes many high-accuracy recognition systems for separated handwritten characters. However, feature extraction based on the local and global geometric features of the character skeleton has not been investigated much. The proposed algorithm concentrates on exactly this: it extracts the different line types that form a particular character, together with their positional features. To test the feature extraction technique, a neural network was trained with the feature vectors obtained from the system.

II. OVERVIEW OF THIS PAPER

Image preprocessing is the first stage of the system. The universe of discourse is then selected, because the features extracted from the scanned image include the positions of line segments and should therefore not depend on the image size. After the universe of discourse is selected, the image is divided into windows of equal size. To traverse the character skeleton, certain pixels in it are defined as starters, intersections and minor starters.

Character traversal starts after zoning is done on the image. Each zone is individually subjected to the process of extracting line segments. After the line segments have been extracted from the image, each of them has to be classified into one of the following line types: horizontal line, vertical line, right diagonal line, left diagonal line.


III. STAGES OF HINDI CHARACTER RECOGNITION

Fig. 1: (a) Stages Of HCR

IV. IMAGE PROCESSING

Image pre-processing involves the following steps:

1) Character extraction from the scanned image.
2) Binarization.
3) Background noise removal.
4) Skeletonization.
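The binarization step can be illustrated with a minimal sketch. The paper does not state which thresholding method is used, so the fixed global threshold of 128 and the function name binarize are assumptions made only for this example.

```python
def binarize(gray, threshold=128):
    """Map a grey-scale image (rows of 0-255 values) to a 0/1 image.

    Assumption: dark pixels are ink, so values below the threshold become 1.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# A tiny 3x3 grey-scale image with a dark diagonal stroke.
gray = [
    [250, 240, 30],
    [245, 20, 235],
    [10, 230, 255],
]
binary = binarize(gray)
```

Here the three dark pixels on the anti-diagonal become 1 and everything else becomes 0.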

V. UNIVERSE OF DISCOURSE

The universe of discourse is defined as the smallest matrix that fits the entire character skeleton. It is selected because the features extracted from the scanned character image include the positions of different line segments, so every character image should be independent of its image size.
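A minimal sketch of selecting the universe of discourse, assuming the skeleton is stored as a list of rows of 0/1 values (1 for skeleton pixels). The function name universe_of_discourse is hypothetical and chosen only for this illustration.

```python
def universe_of_discourse(binary):
    """Crop a 0/1 image to the smallest rectangle that contains every 1-pixel."""
    if not binary:
        return []
    # Rows and columns that contain at least one skeleton pixel.
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return []  # blank image: there is nothing to crop
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in binary[top:bottom + 1]]


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
cropped = universe_of_discourse(image)
```

The empty border rows and columns are discarded, so the cropped matrix is 2 rows by 3 columns regardless of where the character sat in the original scan.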

VI. ZONING

After the universe of discourse is selected, the image is divided into windows of equal size, and feature extraction is performed on the individual windows. Two types of zoning were used in the implemented system. The image was first zoned into 9 windows of equal size. Feature extraction was applied to the individual zones rather than to the complete image, which gives more information about the fine details of the character skeleton.

With zoning, the position of a line segment in the character skeleton itself becomes a feature, because a particular line segment of a character occurs in a particular zone in almost all cases.
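The division into 9 windows can be sketched as follows. This is an illustration only: the function name split_into_zones is hypothetical, and the use of integer division for the zone boundaries (for images whose size is not a multiple of 3) is an assumption, since the paper does not say how such images are handled.

```python
def split_into_zones(image, zone_rows=3, zone_cols=3):
    """Split an image (list of rows) into zone_rows x zone_cols windows.

    Zones are returned in row-major order. Boundaries use integer division,
    so zones are exactly equal only when the size divides evenly (assumption).
    """
    height, width = len(image), len(image[0])
    zones = []
    for i in range(zone_rows):
        r0, r1 = i * height // zone_rows, (i + 1) * height // zone_rows
        for j in range(zone_cols):
            c0, c1 = j * width // zone_cols, (j + 1) * width // zone_cols
            zones.append([row[c0:c1] for row in image[r0:r1]])
    return zones


# A 6x6 image whose pixel values encode their position, to make zones visible.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
zones = split_into_zones(image)
```

Each of the 9 zones of the 6x6 example is a 2x2 window, and features can then be computed per zone.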

To extract the different line segments in a particular zone, the entire skeleton in that zone should be traversed. For this purpose, certain pixels in the character skeleton were defined as starters, intersections and minor starters.


Starters:

Starters are those pixels with one neighbour in the character skeleton. Before character traversal starts, all the starters in the particular zone are found and stored in a list.
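A minimal sketch of finding starters in a zone, assuming the skeleton is a list of rows of 0/1 values and that "neighbour" means any of the 8 surrounding pixels. The helper names count_neighbours and find_starters are hypothetical.

```python
# Offsets of the 8 pixels surrounding a pixel (assumed neighbourhood).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def count_neighbours(skeleton, r, c):
    """Count skeleton pixels among the 8 neighbours of (r, c)."""
    height, width = len(skeleton), len(skeleton[0])
    return sum(
        1
        for dr, dc in OFFSETS
        if 0 <= r + dr < height and 0 <= c + dc < width and skeleton[r + dr][c + dc]
    )


def find_starters(skeleton):
    """Return (row, column) of every skeleton pixel with exactly one neighbour."""
    return [
        (r, c)
        for r, row in enumerate(skeleton)
        for c, value in enumerate(row)
        if value and count_neighbours(skeleton, r, c) == 1
    ]


zone = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]
starters = find_starters(zone)
```

In this zone the stroke has two end points, (0, 0) and (2, 3), and both are reported as starters.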

Fig. 3: Starter

Intersections:

The definition of an intersection is a little more complicated. The necessary but insufficient criterion for a pixel to be an intersection is that it should have more than one neighbour. Neighbouring pixels are classified into two categories: direct pixels and diagonal pixels. All pixels in the neighbourhood of the pixel under consideration in the horizontal and vertical directions are called direct pixels.

The remaining pixels in the neighbourhood, which lie in a diagonal direction from the pixel under consideration, are called diagonal pixels. To find the number of true neighbours of the pixel under consideration, it has to be classified based on the number of neighbours it has in the character skeleton. Pixels under consideration are classified as having 3 neighbours, 4 neighbours, or 5 or more neighbours.

Once all the intersections in the image are identified, they are stored in a list.
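The split of neighbours into direct and diagonal pixels can be sketched as follows. The rules that turn these counts into "true neighbours" are not given in this paper, so the sketch stops at the classification step; the names DIRECT, DIAGONAL and classify_neighbours are hypothetical.

```python
# Horizontal and vertical offsets (direct pixels) and diagonal offsets.
DIRECT = [(-1, 0), (0, -1), (0, 1), (1, 0)]
DIAGONAL = [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def neighbours(skeleton, r, c, offsets):
    """List the skeleton pixels reachable from (r, c) through the given offsets."""
    height, width = len(skeleton), len(skeleton[0])
    return [
        (r + dr, c + dc)
        for dr, dc in offsets
        if 0 <= r + dr < height and 0 <= c + dc < width and skeleton[r + dr][c + dc]
    ]


def classify_neighbours(skeleton, r, c):
    """Return (direct pixels, diagonal pixels) for the pixel at (r, c)."""
    return neighbours(skeleton, r, c, DIRECT), neighbours(skeleton, r, c, DIAGONAL)


# A small Y-shaped skeleton; the centre pixel is a candidate intersection.
skeleton = [
    [1, 0, 1],
    [0, 1, 0],
    [0, 1, 0],
]
direct, diagonal = classify_neighbours(skeleton, 1, 1)
```

The centre pixel has one direct and two diagonal neighbours, three in total, so it satisfies the necessary criterion of having more than one neighbour.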

Fig. 4: Intersection

Minor Starters:

Minor starters are created when the pixel under consideration has more than two neighbours. They are found in the course of traversal along the character skeleton. Two conditions can occur: intersection and non-intersection.

Fig. 5: Minor Starter

Starters, intersections and minor starters are different for different characters.

Character traversal starts after zoning is done on the image. Each zone is individually subjected to the process of extracting line segments. First, the starters and intersections in the zone are found and stored in a list. The algorithm then starts by considering the starters list.

Once all the starters are processed, the algorithm continues with the minor starters obtained along the way. All the line segments obtained during this process are stored, together with the positions of the pixels in each line segment. The algorithm stops once all the pixels in the image have been visited.

After line segments have been extracted from the image, they have to be classified into any one of the following line types: Horizontal line, Vertical line, Right diagonal line, Left diagonal line.
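The paper does not spell out the rule used to assign a line type to a segment. The following is therefore only a sketch under stated assumptions: a segment is an ordered list of (row, column) pixel positions, each step between consecutive pixels votes for a direction, and the majority wins. The convention that a right diagonal rises from left to right ("/") and a left diagonal falls ("\") is also an assumption, as are the function names.

```python
from collections import Counter


def step_type(p, q):
    """Classify the step from pixel p to pixel q, both given as (row, column)."""
    dr, dc = q[0] - p[0], q[1] - p[1]
    if dr == 0:
        return "horizontal"
    if dc == 0:
        return "vertical"
    # Row index grows downwards, so opposite signs mean the stroke rises
    # from left to right (assumed to be the "right diagonal").
    return "right diagonal" if dr * dc < 0 else "left diagonal"


def classify_segment(pixels):
    """Return the line type that most steps of the segment agree on."""
    if len(pixels) < 2:
        raise ValueError("a segment needs at least two pixels")
    votes = Counter(step_type(p, q) for p, q in zip(pixels, pixels[1:]))
    return votes.most_common(1)[0][0]


horizontal = classify_segment([(2, 0), (2, 1), (2, 2)])
rising = classify_segment([(3, 0), (2, 1), (1, 2)])
falling = classify_segment([(0, 0), (1, 1), (2, 2)])
```

A majority vote is used so that a mostly straight stroke with a single stray step is still assigned one line type; other rules, such as comparing only the end points, would be equally plausible.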

VII. FEATURE EXTRACTION

After the line type of each segment is determined, a feature vector is formed based on this information. Every zone has a feature vector corresponding to it. Under the proposed algorithm, every zone has a feature vector of length 9. Its elements are as follows:

1) No. of horizontal lines.
2) No. of vertical lines.
3) No. of right diagonal lines.
4) No. of left diagonal lines.
5) Normalized length of all horizontal lines.
6) Normalized length of all vertical lines.
7) Normalized length of all right diagonal lines.
8) Normalized length of all left diagonal lines.
9) Normalized area of the skeleton.

The number of any particular line type is normalized using the following method:

Value = 1 - ((number of lines / 10) x 2)

The normalized length of any particular line type is found using the following method:

Length = (Total pixels in that line type) / (Total zone pixels)
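The two normalization formulas can be written directly as functions. This is a minimal sketch; the function names are hypothetical, and "total zone pixels" is taken here to mean the number of pixels in the zone, which is an assumption.

```python
def normalized_count(number_of_lines):
    """Value = 1 - ((number of lines / 10) x 2)."""
    return 1 - (number_of_lines / 10) * 2


def normalized_length(pixels_in_line_type, total_zone_pixels):
    """Length = (total pixels in that line type) / (total zone pixels)."""
    return pixels_in_line_type / total_zone_pixels


no_lines = normalized_count(0)
five_lines = normalized_count(5)
length = normalized_length(12, 100)
```

With this formula a zone with no lines of a given type maps to 1, five lines map to 0, and ten lines map to -1, so the count feature stays within the range -1 to 1 for up to ten lines per zone.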

The feature vector explained here is extracted individually for each zone. So if there are N zones, there will be 9N elements in the feature vector for each image. For the proposed system, the original image was first zoned into 9 zones by dividing the image matrix. The features were then extracted for each zone.
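How the per-zone vectors combine into one vector per image can be sketched as follows. The input format (a list of (line type, pixel count) pairs per zone) and the function names are hypothetical, and the normalized area is assumed to be the number of skeleton pixels divided by the number of zone pixels, which the paper does not define explicitly.

```python
LINE_TYPES = ["horizontal", "vertical", "right diagonal", "left diagonal"]


def zone_features(segments, skeleton_pixels, zone_pixels):
    """Build the 9-element feature vector of one zone.

    segments: list of (line_type, pixel_count) pairs found in the zone.
    """
    counts = [sum(1 for t, _ in segments if t == lt) for lt in LINE_TYPES]
    lengths = [sum(n for t, n in segments if t == lt) for lt in LINE_TYPES]
    features = [1 - (count / 10) * 2 for count in counts]      # elements 1-4
    features += [length / zone_pixels for length in lengths]   # elements 5-8
    features.append(skeleton_pixels / zone_pixels)             # element 9 (assumed)
    return features


def image_features(zones):
    """Concatenate the zone vectors; N zones give 9N elements."""
    vector = []
    for segments, skeleton_pixels, zone_pixels in zones:
        vector.extend(zone_features(segments, skeleton_pixels, zone_pixels))
    return vector


one_zone = ([("horizontal", 4), ("vertical", 3), ("horizontal", 2)], 9, 100)
vector = image_features([one_zone] * 9)
```

For the 9-zone split this yields 81 elements, in zone order.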

The original image was then divided again, this time into 3 zones by dividing it in the horizontal direction, and features were extracted for each such zone. After zonal feature extraction, certain features were extracted for the entire image based on its regional properties: Euler number, regional area and eccentricity.

- Euler Number: The Euler number is defined as the difference between the number of objects and the number of holes in the image.

- Regional Area: Regional area is defined as the ratio of the number of pixels in the skeleton to the total number of pixels in the image.

- Eccentricity: Eccentricity is defined as the eccentricity of the smallest ellipse that fits the skeleton of the image.
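Two of these regional properties can be sketched from their definitions. The sketch assumes a 0/1 image, 8-connectivity for objects and 4-connectivity for holes (a common convention, not stated in the paper); all function names are hypothetical. Eccentricity is left out because it needs an ellipse fit that the paper does not detail.

```python
FOUR = [(-1, 0), (1, 0), (0, -1), (0, 1)]
EIGHT = FOUR + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(grid, target, offsets):
    """Count connected regions of cells equal to target (flood fill)."""
    height, width = len(grid), len(grid[0])
    seen, count = set(), 0
    for r in range(height):
        for c in range(width):
            if grid[r][c] != target or (r, c) in seen:
                continue
            count += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and grid[ny][nx] == target and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return count


def euler_number(image):
    """Number of objects minus number of holes."""
    width = len(image[0])
    # Pad with background so the outer background is one connected region.
    padded = ([[0] * (width + 2)]
              + [[0] + list(row) + [0] for row in image]
              + [[0] * (width + 2)])
    objects = count_components(padded, 1, EIGHT)
    holes = count_components(padded, 0, FOUR) - 1  # drop the outer background
    return objects - holes


def regional_area(image):
    """Ratio of skeleton pixels to all pixels in the image."""
    return sum(sum(row) for row in image) / sum(len(row) for row in image)


ring = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
```

For the ring, there is one object and one hole, so euler_number(ring) is 0, and 8 of its 9 pixels belong to the skeleton.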

VIII. EXPERIMENTAL RESULT

An image of handwritten Hindi text is taken as input, as shown below, and is then processed.

Fig. 6: Load an Image


Fig. 7: Train using Neural Network Tool

Once the training process is completed, the character is extracted. The image is converted to a grey-scale image and then to a binary image, after which edge detection is performed so that the application can read it. Five figures are shown, each illustrating a different processing step.

Fig. 8(a): Input Image with Noise


Fig. 8(c): Image Dilation

Fig. 8(d): Image Filling

Fig. 8(e): Image Thinning


Fig. 9: Extract text after five figures

Fig. 10: Output Displayed

IX. CONCLUSION

In this paper, we have proposed a geometric feature extraction technique for identifying handwritten characters of the Hindi script. The technique may be applied to the classification of cursive characters for Hindi handwritten character recognition. The proposed method was tested after training a neural network with a database of 650 images. The system outputs the recognized Hindi character. If a character does not match the database, the error message "match not found" is displayed.

The proposed algorithm is implemented in MATLAB and can read Hindi handwritten characters. The implementation is covered in the experimental result section, which demonstrates the working of the proposed algorithm. The system extracts Hindi characters from scanned images.
