HANDWRITTEN DEVNAGARI NUMBER RECOGNITION USING INTELLIGENT AGENT

Deepshikha¹, Akhilesh Pandey²

¹ Department of Computer Science Engineering, Suresh Gyan Vihar University, Jaipur, Rajasthan (India)

² Assistant Professor, Department of Computer Science Engineering, Gyan Vihar University (India)

ABSTRACT

The proposed approach extracts characters from an image by walking over the image in an agent-based environment. An OCR system has been developed in this dissertation for this purpose, implemented in MySQL, NetLogo, and Java. The interface of the system is written in Java, which also runs NetLogo. NetLogo is an agent-based modeling system that provides the environment in which the new approach is implemented. NetLogo extracts the characters from the image and produces a result derived from the walk of the OCR system over the image. The result of the walk over a character is matched against a dataset stored in MySQL, and the most similar character in the dataset is returned. The proposed OCR system was tested on various text images, and its performance was also compared with an online OCR application.

I INTRODUCTION

Since the invention of digital computers, relating human functions to computers has become a very interesting topic of research. Efficient algorithms have been developed in the past so that machines can recognize characters. OCR is defined as a system that produces a character file from images that are scanned copies of printed, handwritten, or typewritten text documents. It can be described as a visual recognition process that converts images of text into editable text. OCR technologies increase the efficiency of office work, because recognizing the characters in an image is an easier task than retyping the text. Owing to this advantage, OCR technology is used in a number of fields. There are two types of OCR technology, distinguished by the input device used: on-line recognition and off-line recognition. In the on-line method the data are collected by real-time devices such as digitizer tablets, whereas in the off-line method the data are collected from static devices such as scanners and cameras. On-line recognition allows information to be captured in real time as it is written, because the data are collected concurrently. In off-line recognition, a few techniques are used to prepare the image for the recognition process, and it also removes the noise and

algorithm development that is learned, and it is a branch of artificial intelligence [1]. A wide range of tasks is learned and performed by the machine [2]. Over the past two decades machine learning has been one of the most important technologies, and it is therefore used in various scientific domains such as robotics, computer vision, theoretical computer science, recognition, and optimization.

Figure 1: Showing the OCR Model

The diagram above shows different methods of character recognition for the Devanagari script. The five main methods shown are:

1. One-dimensional function.

2. Polygon approximation.

3. Spatial Domain Feature Extraction.

4. Moments.

5. Transform Domain.

II PAST STUDIES

In 1994, Brijesh K. Verma compared two networks for handwritten Hindi character recognition: the Radial Basis Function (RBF) network and the Multi-Layer Perceptron (MLP). The MLP networks were trained with the error back-propagation algorithm. The paper presented an automatic Hindi Character Recognition (HCR) system that uses MLP and RBF networks. Experiments were carried out on two hundred and forty-five samples from five writers. The MLP was superior in aspects such as recognition accuracy and memory usage, but its drawback was a long training time in comparison with the RBF networks [3].

In 2007-2008, M. Hanmandlu, O. V. Ramana Murthy and Vamsi Krishna Madasu proposed the recognition of handwritten Hindi characters on the basis of a modified exponential membership function fitted to fuzzy sets. The fuzzy sets are derived from features consisting of normalized distances evaluated using the box approach [4].

In 2008, Yi Li, Yefeng Zheng, David Doermann and Stefan Jaeger studied text line segmentation in freestyle handwritten documents. Algorithms developed for machine-printed and hand-printed documents face problems with curvilinear text lines and with small gaps between neighboring text lines [5].

In 2010, Abhimanyu Kumar and Samit Bhattacharya proposed a scheme, implemented on the iPhone, for online recognition of handwritten Devanagari script. Devanagari is an Indian script used for a number of Indian languages such as Sanskrit, Marathi, and Hindi [6].

III METHODOLOGIES OF OCR

The OCR process consists of the following steps:

a) Preprocessing

b) Segmentation

c) Feature extraction

d) Classification and recognition

a) Preprocessing

In this step the useful features of the text are retained and the unimportant information present in the image is discarded. To achieve this, the image undergoes a set of preprocessing operations, chosen according to the structure of the image. Some operations may be skipped or applied in a different sequence.

The important methods performed in this phase are as follows:

1. Grayscale conversion and thresholding.

2. Colour image processing.

3. Morphological operations.

b) Segmentation

This process divides the text lines of the image into sentences, words, and characters [7]. Segmentation is very important because character extraction at this stage directly affects the

Figure 2: Showing different segments

The two major types of segmentation are:

1) Explicit.

2) Implicit.

c) Feature Extraction

In this phase, which precedes recognition, the most representative information present in the raw data is extracted.

The various extraction techniques are mainly grouped into three categories [8][9]:

1. Geometric features.

2. Statistical features.

3. Global Transformation.
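
As an illustration of a statistical feature, the zoning idea named in the caption of Figure 3 can be sketched in Python. This is a minimal sketch and not the system's own code: the function name `zoning_features`, the grid sizes, and the tiny 4 x 4 bitmap standing in for the numeral zero are assumptions made for the example.

```python
def zoning_features(image, rows, cols):
    """Split a binary image (1 = ink, 0 = background) into rows x cols
    zones and return the fraction of ink pixels in each zone, row by row."""
    height, width = len(image), len(image[0])
    features = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            ink = sum(image[y][x]
                      for y in range(top, bottom)
                      for x in range(left, right))
            area = (bottom - top) * (right - left)
            # An empty zone (image smaller than the grid) contributes 0.0.
            features.append(ink / area if area else 0.0)
    return features


# Hypothetical 4 x 4 bitmap of a closed loop, standing in for a zero.
ZERO = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
```

Calling `zoning_features(ZERO, 2, 2)` gives one ink density per quadrant. Such a vector could then be compared with stored vectors in the classification step.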

d) Classification and Recognition

In this last step of character recognition, characters are recognized and classified according to their features.

The main categories of technique are as follows:

1) Template matching.

2) Statistical Technique.

3) Structural Technique.

Figure 3: Zoning of the number zero

IV PROBLEM STATEMENT

Three platforms are used in the project to develop the OCR system: MySQL, the Java programming language, and NetLogo. These platforms are required to realize the agent-based approach. The central part of the agent-based approach, recognizing a character in an image by walking over the image, is implemented in NetLogo; the other two platforms support NetLogo. Results that give information about a character become available only once an agent's walk over the character is complete. The result obtained from the walk is matched against the features of all characters to find out which character it belongs to. This requires a list holding the features of all characters (e.g. the edge count of each character), against which the result of the NetLogo walk is matched. MySQL is used to store this list of characters.
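
The matching step described above can be sketched as a nearest-match lookup. The Python sketch below is illustrative only: the feature names (`edge_count`, `endpoint_count`), the labels, the numeric values, and the in-memory dictionary standing in for the MySQL table are all assumptions, not data from the actual system.

```python
def squared_distance(a, b):
    """Sum of squared differences between two equal-length feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_character(walk_result, reference):
    """Return the label whose stored features are nearest to walk_result."""
    return min(reference,
               key=lambda label: squared_distance(walk_result, reference[label]))


# Placeholder feature list: label -> (edge_count, endpoint_count).
# These values are invented for illustration, not measured from real numerals.
REFERENCE = {
    "shape_a": (4, 0),
    "shape_b": (7, 2),
    "shape_c": (10, 3),
}
```

With these placeholder values, `closest_character((6, 2), REFERENCE)` selects `"shape_b"`, because its stored features differ least from the walk result. In the system described here the reference list would be read from the database rather than hard-coded.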

V IMPLEMENTATION

The task of image processing is to prepare the image for recognition and processing in the NetLogo environment. This step is needed for files that are noisy, blurred, damaged, or handwritten. It is not always needed for typewritten files: if the quality of the paper is very high, some steps of the algorithm can be skipped. For this reason some preprocessing algorithms are optional, and the user decides which image processing to apply according to the quality of the paper. The methods used in image processing are the thinning algorithm, the thresholding algorithm, and convolution filters. Some convolution filters are optional in this program, whereas the thinning and thresholding algorithms are always applied. Images processed in NetLogo must always be black and white, which is provided by the thresholding algorithm. The lines of the characters must be one pixel wide when the images are processed in NetLogo, which is provided by the thinning algorithm.
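
The text does not name the thinning algorithm that is used. As one possible illustration, the Python sketch below uses the Zhang-Suen thinning algorithm, which is an assumption swapped in for this example, to reduce a thick stroke to a line one pixel wide. The image is assumed to be a list of rows with 1 for ink and 0 for background, surrounded by a background border; the name `zhang_suen_thin` and the test bar are hypothetical.

```python
def zhang_suen_thin(image):
    """Thin a binary image (1 = ink, 0 = background) with Zhang-Suen.
    Pixels on the outer border are never examined, so the image is
    assumed to have a background border."""
    img = [row[:] for row in image]  # work on a copy
    height, width = len(img), len(img[0])

    def neighbours(y, x):
        # Clockwise, starting from the pixel directly above (P2..P9).
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1],
                img[y + 1][x + 1], img[y + 1][x], img[y + 1][x - 1],
                img[y][x - 1], img[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(1, height - 1):
                for x in range(1, width - 1):
                    if img[y][x] != 1:
                        continue
                    n = neighbours(y, x)
                    ink = sum(n)
                    # Number of 0 -> 1 transitions around the pixel.
                    transitions = sum(1 for i in range(8)
                                      if n[i] == 0 and n[(i + 1) % 8] == 1)
                    north, east, south, west = n[0], n[2], n[4], n[6]
                    if step == 0:
                        edge_ok = (north * east * south == 0
                                   and east * south * west == 0)
                    else:
                        edge_ok = (north * east * west == 0
                                   and north * south * west == 0)
                    if 2 <= ink <= 6 and transitions == 1 and edge_ok:
                        to_clear.append((y, x))
            # Clear all marked pixels at once, after the scan.
            for y, x in to_clear:
                img[y][x] = 0
            if to_clear:
                changed = True
    return img


# Hypothetical test stroke: a horizontal bar three pixels thick.
BAR = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
```

Applied to `BAR`, the function leaves a horizontal line one pixel thick along the middle row, which is the kind of input the walking agent is said to require.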

The user loads the image of the text document into the system with the “load image” button, and the loaded image is displayed to the left of the button. The original text document is loaded with the “load text” button. When the user clicks the image processing button, the image processing options are displayed: convolution filters such as edge detection and noise removal filters, the thresholding methods, and the thinning algorithm. The methods implemented by the system are listed below:

• Thresholding: local methods and Otsu's method are used.

• Edge detection: Sobel, Roberts, Prewitt, and Laplacian filters are used.

• Noise removal: Gaussian, mean, and median filters are used.
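
Of the thresholding methods listed, Otsu's method picks the gray level that maximizes the variance between the two resulting classes. A minimal Python sketch is given below, assuming 8-bit gray levels (0 to 255) in a flat list and dark ink on a light background; the function names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes the between-class
    variance when pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_low = 0      # number of pixels with level <= t
    sum_low = 0         # sum of their gray levels
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_low += hist[t]
        sum_low += t * hist[t]
        if weight_low == 0:
            continue            # no pixels in the lower class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break               # all pixels are in the lower class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        between_var = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels, t):
    """Map dark pixels (<= t) to 1 (ink) and light pixels to 0 (paper)."""
    return [1 if p <= t else 0 for p in pixels]
```

For an image with two well-separated gray levels, the returned threshold falls between them, so `binarize` yields the black-and-white image that the later steps expect.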

Figure 4: Showing the method of detecting the handwritten number

According to the diagram above, the following process takes place:

• The input number is taken first.

• The number is then preprocessed.

• Features are extracted from the number and stored in the database.

• The intelligent agent then acts on the number.

• The result is compared with the existing numbers.

• Finally, the number is classified.

Figure 5: Contour formation shown by the movement of the turtle in NetLogo.

VI RESULTS

In the base paper of my dissertation, multiple voting schemes are used on offline handwritten numbers. That method uses a number of classifiers, which were not able to detect broken numbers. But the

this dissertation is user friendly. My dissertation also has a few drawbacks: the technology used in it cannot detect slanted numbers, and it cannot detect numbers that have a shirorekha (headline), for example the number eight written in Hindi.

Figure 6: Formation of zero by the movement of the turtle in NetLogo.

Figure 7: Diagram showing the recognition result.

VII CONCLUSION AND FUTURE WORK

This system proposes an effective method for achieving better recognition rates for handwritten Devanagari numerals. The strategy uses an intelligent agent approach to improve recognition accuracy. Previous techniques were not able to recognize broken Devanagari numerals efficiently. But this technique can easily and

Future work that can be done using this technique includes:

• Online recognition of Hindi numerals.

REFERENCES

1. Smola and S. V. N. Vishwanathan. Introduction to Machine Learning. Cambridge University, UK, 2008.

2. Mellouk and A. Chebira. Machine Learning. InTech, Croatia, 2009.

3. Brijesh K. Verma. Handwritten Hindi character recognition using multilayer perceptron and radial basis function neural networks. 1994.

4. M. Hanmandlu, O. V. Ramana Murthy and Vamsi Krishna Madasu. Fuzzy model based recognition of handwritten Hindi characters. 2007-2008.

5. Yi Li, Yefeng Zheng, David Doermann and Stefan Jaeger. Script independent text line segmentation in freestyle handwritten documents. 2008.

6. Abhimanyu Kumar and Samit Bhattacharya. Online Devanagari isolated character recognition for iPhone using hidden Markov model. 2010.

7. L. C. Jain and B. Lazzerini. Knowledge-Based Intelligent Techniques in Character Recognition. CRC Press, London, 1999.

8. V. J. Dongre and V. H. Mankar. A Review of Research on Devnagari Character Recognition. International Journal of Computer Applications, 12(2):8–15, 2010.

9. M. Cheriet, N. Kharma, C. Liu, and C.Y. Suen. Character Recognition Systems: A Guide for Students and