HANDWRITTEN DEVNAGARI NUMBER
RECOGNITION USING INTELLIGENT AGENT
Deepshikha
1, Akhilesh Pandey
2 1Department of Computer Science Engineering, Suresh Gyan Vihar University, Jaipur, Raj, (India).
2
Assistance Professor, Department of computer science engineering, Gyan Vihar University, (India)
ABSTRACT
The proposed approach deals with extraction of characters from image by walking over the image, in an
environment which is agent based. The OCR system has been developed in this dissertation to fulfill the aim of
extraction of image. The OCR system which has been developed here is implemented in MYSQL, NET LOGO, and
java. The interface of the system is created by java and java also runs NETLOGO.Net Logo is a modeling system
which is agent based and provides environment so that the new approach is implemented. The Net Logo extracts the
characters from the image and concludes the result. This result is derived by walking of the OCR system over the
image. The results of the walking movement over a character are matched against a dataset contained by My SQL
and the most similar character which possesses this result is found from the dataset. The proposed OCR system was
tested on various images of the texts and performance of the System was also compared with an online OCR
application
.
I INTRODUCTION
After the invention of digital computers correlating human functions to computers is becoming very interesting topic
of research .In past Efficient Algorithm has been developed so that Machines can recognize characters. OCR is
defined as the system that makes out the character file from the images which is scanned copy of the printed text
documents, handwritten documents and typewritten documents. We can say it a visual reorganization process which
converts text messages into texts which is editable. By using the OCR technologies, the efficiency of the office work
is increased, because in this technology the characters can be recognized from the image, which are an easy task
then re-typing the text. Due to the advantage of this OCR technology it is used in a number of fields. There are two
types OCR technologies on the basis of the input devices used: - line recognition and off-line recognition, In
On-line method the data are collected by real time devices such as digitizer tablets. But in off-On-line method the data are
collected from static devices such as scanners and cameras. Online recognition method allows the writing of
information in real-time because of the concurrent data collection structure. But in case of Off-line recognition
process few technologies are used for preparing the image for recognition process and it also removes the noise and
algorithm development that is learned and it is an artificial intelligent branch. [1] A wide range of task is learned and
performed by the machine [2].From the two past decades one of the most important technologies is machine
learning, and hence used in scientific domain of various types such as Robotics, Computer vision, theoretical
computer science, recognition and optimization
Figure 1: Showing the OCR Model
Above diagram is showing different methods of character recognition of Devanagri Script. The main five methods
shown are:
1. One dimensional function
2. Polygon approximation.
3. Spacial Domain Feature Extraction.
4. Moments.
5. Transform Domain.
II PAST STUDIES
In the year 1994 the author Brijesh k.verma proposed that in this paper the comparison is made between two
networks for the handwritten Hindi character recognition. The two networks which is compared are the Radial Basis
Function and the Multi-Layer Perceptnor (MCP).The algorithm that trained the MCP networks was the error back
propagation algorithm. In this paper the system which was presented was an automatic Hindi character Recognition
(HCR) which uses MLP and RBF networks. Two hundred and forty-five samples of five writers were experimented.
Multi-layer preceptor (MLP) was superior in working like in recognition accuracy, memory usage etc .But the
drawback of this system was that it suffered because of the long training time in comparison to the RBF networks.
In the year 2007-2008 authors M.Hanmandlu, O.V Ramana Marthy and Vamsi Krishna Madasu proposed that the
handwritten character recognition of Hindi characters on the basis of the modified exponential membership function;
related to the fuzzy sets derived on the basis of the features that consists of the approach that is normalized distance
evaluated using the box approach.[4]
In the year 2008 authors Yi Li, yefeng zheng, David Doermann and Stefan jaegar proposed that study should be
done free style handwritten documents segmentation of line. The algorithm which has been developed for machine
printed and hand printed documents faces problems in terms of lines of curvilinear text and the small gaps between
the neighboring texts.[5]
In the year 2010 authors Abhimanyu Kumar and Samit Bhattacharya proposed a scheme that is having an
implementation on the i-phone so that the handwritten devnagri script was recognized online.Devnagri script is one
of the Indian script; This script is used for a number of different Indian language like Sanskrit, Marathi, Hindi[6].
III METHODOLOGIES OF OCR
The process of OCR follows the following steps
a) Preprocessing process
b) Segmentation process
c) Feature extraction process
d) Classification and Recognition process
a)
Preprocessing
In this process texted part useful features are retained and the important information present in the image is
discarded. To achieve this image has to undergo a set of operation of preprocessig. And according to the image
structure these operations are chosen. Some operations may be discarded or can be applied in different sequence.
In this phase the important method performed are as follows
1. Gray scale and Thresh holding method.
2. Colour Image Processing method.
3. Morphological operations method.
b)
Segmentation
Image text lines are converted into characters words and sentences by the following process and its aim is also the
same [7].The step of segmentation is very important because character extraction at this stage affects directly the
Figure 2: Showing different segments
The two major types of segmentation process are:-
1) Explicit.
2) Implicit.
c)
Feature Extraction
The following phase before the process of recognition the most representative information present in the raw data is
being extracted.
There are various techniques of extraction which are mainly grouped in three categories. [8][9].
1.
Geometric features.2.
Statistical features.3.
Global Transformation.d) Classification and Recognition
In character recognition last step, according to the features of characters they are recognized and classified.
The main categories are as follows:-
1) Template matching.
2) Statistical Technique.
3) Structural Technique.
Figure 3: Zoning of the number zero
.IV PROBLEM STATEMENT
The main platform is used in the project to develop the OCR system; the three platforms are the MYSQL, the java
programming language and the Net logo. These platforms are required so that the agent based approach is fulfilled.
The important part of the agent based approach is the recognition of the character in any image by walking over the
image and its implementation in Net logo. But the other two platforms are used so that the Net logo is supported.
There are few results which gives information of the character; only when an agent’s move over a character is
completed. The result which we get from the movement over the character is matched with all character features, to
find out that to which character they belong, that’s why a list is needed which is having features of all characters
(e.g.:-edge count of characters); and is used in matching the result movement of Net logo. Next platform helps in
storing the list of characters.
V IMPLEMENTATION
The work of the image processing is to make ready the image for recognition and processing in Net logo
environment. This step is taken for the files that are having the features such as they are noisy, blurred, damaged and
handwritten. But this is not true for typewritten files because if the quality of the paper is very high then few steps of
this algorithm is skipped during typing the text. That’s why few preprocessing Algorithm is kept optional. In this
case the user makes the decision to use which image processing, according to the quality of the paper. There are few
methods which are used in image processing, they are the thinning algorithm, the Thresh holding algorithm and the
convolution filters. But few convolution filters is having optional work for this program. On the other hand the
Thinning and Threshold algorithm are always used for images. The images in Net logo should always be black
and white. The black and white image is provided by the Threshold Algorithm. The lines in the characters of the
images should be of one pixel in width when these images are processed in Net logo. And the one pixel width of the
line in the characters is provided by the Thinning algorithm.
The image of the text documents is loaded by the user into the system by using the “load image” button. The image
which is loaded is displayed on the left side of the button. The document which is the original text is also loaded by
using the button known as “load text” button. When the image processing button is clicked by the user then the
image processing options are displayed. These image processing options are the convolution filters like the edge
detection and noise removal filters, the threshold methods and the thinning algorithm methods. There are few
methods which are implemented by the system they are stated beneath:-
Thresh holding:-In this case the local methods and Ostu methods are used.
Edge detection:-In this case Sobel filter, Robert filters, Prewitt filters, and Laplacian filters are used.
Noise Removal:-Gaussian filters, Mean filters and Median filters are used in this case.
Figure 4: Showing the method of detecting the handwritten number
.According to the above diagram following process is taking place:-
Input number is taken first of all.
Then the preprocessing of that number is done.
Then the feature extraction of that number is done, and after that it is stored in the database.
Then intelligent agent acts on that number.
The comparison with the existing number is done.
Finally the classification of the numbers is done.
Figure 5: The contour formation is shown by the movement of turtle in net logo.
VI RESULTS
In the base paper of my dissertation multiple voting schemes is used on the offline handwritten numbers. In this
method a number of classifiers are used, those classifiers were not able to detect the broken numbers. But the
this dissertation is user friendly. There are also few drawbacks of my dissertation; the technology used in it can’t
detect the slanting numbers. It can’t detect the numbers having the sirorekha for example number eight written in
Hindi.
Figure 6: Showing the formation of zero by the movement of turtle in NET LOGO.
Figure 7: Diagram Showing the Reorganization Result.
VII CONCLUSION AND FUTURE WORK
This system proposes an effective method for achieving better recognition rates for handwritten Devanagri
numerals.This Strategy used intelligent agent approach to improve recognition accuracy. The previous techniques
were not efficiently able to recognize the devanagri numerals which were broken. But this technique can easily and
The future works that can be done using this technique are:
Online recognition of the Hindi numerals can be done.
REFERENCES
1. Smola and S. V. N. Vishwanathan. Introduction to Machine Learning. Cambridge University, UK, 2008.
2. Mellouk and A. Chebira. Machine Learning. In Tech, Crotia,2009.
3. Brijesh k.verma (Handwritten Hindi character using multilayer perleptron and radial basis function neural
networks)1994.
4. M.Hanmandlu,O.V Ramana Marthy and Vanisi Krishna Madasu (fuzzy model based recognition of
handwritten hindi characters)2007-2008.
5. Yi Li , Yefeng zheng, David Doermann and Stefan Jaegar(script independent text line segmentation in free
style handwritten documents)2008.
6. Abhimanyu kumar and samit Bhattacharya (online devanagri Isolated character recognition for i-phone
using hidden marcov model)2010.
7. L. C. Jain and B. Lazzerini. Knowledge-Based Intelligent Techniques in Character Recognition.CRC Press,
London, 1999.
8. V. J. Dongre and V. H. Mankar. A review of Research on Devnagari Character Recognition.
International Journal of Computer Applications, 12(2):8–15, 2010.
9. M. Cheriet, N. Kharma, C. Liu, and C.Y. Suen. Character Recognition Systems: A Guide for Students and