Tamil Character Recognition Using CNN-SVM Classifier


Volume 11, Issue 6, November 2019


A. Muthukumar¹, K. Shivani², P.G. Umamaheswary², P. Chitradevi²

Abstract---Convolutional neural networks (CNNs) and support vector machines (SVMs) play a significant part in the advance of computer vision applications. This article proposes combining the strengths of these two classifiers. The work uses the handwritten Tamil character dataset produced by HP Laboratories. The CNN-SVM algorithm is used to train the model on Tamil characters and achieves a recognition accuracy of 94% on both the training and testing datasets. The training and testing times are analyzed for various Tamil characters available in the dataset.

Keywords-Convolutional Neural Network (CNN), Tamil Character, Support Vector Machine (SVM).

I INTRODUCTION

Tamil is one of the most widely used primary languages in India. Its script contains 12 vowels, 18 consonants, and a special character; the vowels and consonants combine to form 216 compound characters, for a total of 247 characters. Because the vowels and consonants produce such a large set of characters, it is difficult to recognise a particular letter when it appears in old palm-leaf manuscripts or stone inscriptions. Compared with other languages, Tamil letters are difficult to recognise offline. This paper targets offline Tamil character recognition, so the input consists entirely of handwritten patterns. Offline character recognition converts handwritten text into an image, which is then converted into character codes that can be used in text-processing applications.

The dataset used in this paper has already undergone the necessary feature extraction process. It is provided in binarized format, which makes the extraction process easier. The dataset is divided for the testing process, and the same kind of patterns are given as input to the training process. The input images are passed into the hidden layers, where a pooling layer reduces their spatial dimensions. The ReLU layer is introduced to speed up the convolution process. Finally, an SVM classifier is applied to the extracted features as the last step of the CNN.

Feature extraction is a key factor in the success of a recognition system. It requires features that are clearly distinguishable between different classes while remaining invariant within the same class. Traditional hand-designed feature extraction is a complex and time-consuming task and cannot process raw images. At the same time the

The remainder of the paper is organized as follows. Section 2 describes the proposed work on Tamil character recognition. Section 3 gives a detailed description of the CNN model. Section 4 gives a basic explanation of the hybrid CNN-SVM classifier. Section 5 discusses the experimental results of the proposed handwritten Tamil character recognition. Section 6 concludes the paper and points to ideas for future work.

II PROPOSED WORK

This work aims to improve recognition accuracy on the Tamil character dataset. It focuses on a CNN-SVM classifier that recognises Tamil characters accurately and quickly. The training and testing times are measured for each Tamil letter, with the further aim of improving character visibility.

Figure 1 Overall proposed block diagram

As shown in Figure 1, raw input images are collected from the HP Labs dataset. A pre-processing step then removes unwanted noise and variations in illumination and brightness. This step also helps in extracting the features needed to recognise the images in the dataset. The CNN classifier processes the image through layers such as the pooling and fully connected layers. The SVM classifier finds the margin separating two classes. Finally, the features are matched and the recognised character is evaluated.

A. Pre-processing

Pre-processing reduces unwanted disturbances in the image and prepares it for the next stage. The given image is converted into the required format. Larger images require more computation. The pre-processing step consists of several sub-processes that modify the dataset images and make them suitable for efficient recognition.
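As an illustration of the kind of sub-process involved, the sketch below binarizes a grayscale image and shrinks it by nearest-neighbour sampling. The introduction notes that the dataset is supplied in binarized form, so this is only a sketch of what such a step could look like; the threshold of 128, the function names, and the target size are assumptions and are not taken from the paper.

```python
def binarize(image, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0/1 pixels.

    Assumption: ink is darker than the background, so pixels below the
    (hypothetical) threshold become 1 and all others become 0.
    """
    return [[1 if px < threshold else 0 for px in row] for row in image]


def resize_nearest(image, out_h, out_w):
    """Shrink or enlarge an image by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    gray = [[0, 255], [200, 100]]
    print(binarize(gray))  # dark pixels are marked as ink
    big = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(resize_nearest(big, 2, 2))  # 4x4 image reduced to 2x2
```

Reducing the image size before recognition lowers the amount of computation in every later layer, which is the motivation given above for handling large images.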

III CNN CLASSIFIER

A CNN is a multi-layer network within the wide range of machine learning architectures. It is broadly organised into two parts: a feature extractor and a trainable classifier. The feature extractor consists of feature map layers and retrieves discriminating features from the input image through two operations: convolutional filtering and down-sampling. The convolutional filter kernels applied to the feature maps are 5 by 5 pixels in size, and the down-sampling operation that follows the filtering has a ratio of 2. The classifier and the weights learned in the feature extractor are trained by the back-propagation method.
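To make the effect of the 5 by 5 kernels and the down-sampling ratio of 2 concrete, the following sketch traces how the side length of a square feature map changes through successive stages. The input size of 32 pixels, the use of unpadded convolution with stride 1, and the number of stages are assumptions chosen for illustration; the paper does not state them.

```python
def conv_output_size(size, kernel=5, stride=1, padding=0):
    """Side length after a convolution (assumed unpadded, stride 1 by default)."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample_output_size(size, ratio=2):
    """Side length after down-sampling by the given ratio."""
    return size // ratio


def trace_feature_map_sizes(input_size, stages):
    """List the side length after every convolution and down-sampling step."""
    sizes = [input_size]
    size = input_size
    for _ in range(stages):
        size = conv_output_size(size)
        sizes.append(size)
        size = downsample_output_size(size)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    # Hypothetical 32x32 input passed through two conv + down-sampling stages.
    print(trace_feature_map_sizes(32, 2))  # [32, 28, 14, 10, 5]
```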

A. Convolution Layer

This layer differs from an ordinary neural network layer in that not every unit is connected to the next layer through weights and biases; each unit is connected only to a small region of the image. The weights and biases belong to filters, or kernels, which are convolved over every region of the input image to produce a feature map. The number of parameters needed for the convolution is small because the same filter is slid across the whole image to detect a given feature.
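The weight sharing described above can be sketched in a few lines of plain Python. The example slides one kernel over an image to produce a feature map and counts the parameters of a convolution layer. The kernel values, the image, and the figure of 32 filters are hypothetical; like most CNN libraries, the sketch computes a cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


def conv_param_count(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus one bias per filter; independent of the image size."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]  # hypothetical 2x2 kernel for a small demo
    print(conv2d_valid(image, kernel))
    print(conv_param_count(5, 5, 1, 32))  # 5x5 kernels, 32 filters assumed
```

Because the parameter count depends only on the kernel size and the number of filters, it stays the same however large the input image is.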

B. Pooling Layer

A pooling layer is used to reduce the spatial size of the model as well as the number of parameters, and thereby the computation. This layer applies a fixed function to its input, so no parameters are learned. Different types of pooling are available, such as max pooling, stochastic pooling, and average pooling. Max pooling is the most commonly used pooling algorithm. In max pooling, an n×n window is slid across and down the input with a given stride, and the maximum value in each n×n region is taken, which reduces the size of the input. The layer provides translation invariance, so that the model can still recognise the image even with a small difference in position. However, position information is lost because the size is reduced.
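A minimal sketch of max pooling is given below, assuming a 2×2 window with stride 2 (the paper does not state the window size or stride). The second call illustrates the translation invariance mentioned above: moving the largest value within a window does not change the pooled output.

```python
def max_pool2d(feature_map, n=2, stride=2):
    """Take the maximum of each n x n window; no parameters are learned."""
    pooled = []
    for r in range(0, len(feature_map) - n + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - n + 1, stride):
            window = [feature_map[r + i][c + j] for i in range(n) for j in range(n)]
            row.append(max(window))
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    fmap = [[1, 3, 2, 4], [5, 6, 1, 2], [7, 2, 9, 1], [3, 4, 5, 6]]
    print(max_pool2d(fmap))  # 4x4 input reduced to 2x2
    # The same peak value at two different positions inside one window.
    print(max_pool2d([[0, 9], [0, 0]]), max_pool2d([[0, 0], [9, 0]]))
```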

C. Fully Connected Layer

In this layer, the flattened output of the last pooling layer is fed as input to a fully connected layer. This layer acts like a conventional neural network layer, in which every neuron of the previous layer is connected to the current layer. The number of parameters in this layer is therefore higher than in the convolution layer. The fully connected layer is connected to an output layer, which is usually a classifier.
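The sketch below shows flattening followed by a fully connected layer, together with its parameter count. The input length of 400 is a hypothetical value; the output size of 156 matches the number of character classes reported for the dataset in the experimental section.

```python
def flatten(feature_maps):
    """Turn a list of 2-D feature maps into one flat input vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense_forward(inputs, weights, biases):
    """Every input is connected to every output neuron."""
    return [
        sum(w * x for w, x in zip(weight_row, inputs)) + b
        for weight_row, b in zip(weights, biases)
    ]


def dense_param_count(n_in, n_out):
    """One weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out


if __name__ == "__main__":
    maps = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    print(flatten(maps))
    print(dense_forward([1, 2], [[1, 0], [0.5, 0.5]], [0, 1]))
    # Hypothetical 400 flattened inputs feeding 156 class outputs.
    print(dense_param_count(400, 156))
```

Compared with a convolution layer of 5 by 5 kernels, where each filter adds only 26 parameters for a single-channel input, the dense layer grows with the product of its input and output sizes.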

IV SVM CLASSIFIER

$$\max_{\alpha}\; \sum_{i=1}^{N}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\,\alpha_j\,A_i\,A_j\,\phi(B_i)\cdot\phi(B_j) \tag{1}$$

This objective function is subject to the constraints

$$\sum_{i=1}^{N}\alpha_i A_i = 0, \qquad 0 \le \alpha_i \le C, \quad i = 1, \dots, N \tag{2}$$

where $C$ is the cost parameter, which determines the cost of violating the constraints, $\alpha_i$ are the Lagrange multipliers, and $\phi(\cdot)$ is the feature mapping function. A sample $X$ is classified by the decision function

$$f(X) = \mathrm{Sgn}\Bigl(\sum_{i=1}^{N}\alpha_i\,A_i\,\phi(B_i)\cdot\phi(X) + b\Bigr) \tag{3}$$

Here $\{A_i, B_i\}$, $i = 1, 2, 3, \dots, N$, denotes the training set, where $A_i$ is the output (class label) for the training sample $B_i$, and $b$ is the bias. The role of a hyperplane is to separate the different classes; the SVM selects the hyperplane that leaves an equal and maximal margin between the classes.
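The decision rule of an SVM in dual form, the sign of a kernel-weighted sum over the training samples plus a bias, can be sketched as follows. The sketch assumes a linear kernel (the feature map is the identity) and uses a hand-built two-point training set whose multipliers of 0.25 satisfy the maximum-margin conditions for that toy problem; none of these values come from the paper.

```python
def linear_kernel(u, v):
    """Dot product; corresponds to an identity feature map (assumption)."""
    return sum(a * b for a, b in zip(u, v))


def svm_decision(x, samples, labels, alphas, bias, kernel=linear_kernel):
    """Sign of the kernel-weighted sum over training samples plus the bias."""
    score = sum(
        alpha * label * kernel(sample, x)
        for sample, label, alpha in zip(samples, labels, alphas)
    ) + bias
    return 1 if score >= 0 else -1


def satisfies_constraints(labels, alphas, cost, tol=1e-9):
    """Check sum(alpha_i * label_i) = 0 and 0 <= alpha_i <= C."""
    balanced = abs(sum(a * y for a, y in zip(alphas, labels))) <= tol
    boxed = all(-tol <= a <= cost + tol for a in alphas)
    return balanced and boxed


if __name__ == "__main__":
    # Toy training set: one sample per class, both are support vectors.
    samples = [[1.0, 1.0], [-1.0, -1.0]]
    labels = [1, -1]
    alphas = [0.25, 0.25]
    print(satisfies_constraints(labels, alphas, cost=1.0))
    print(svm_decision([2.0, 0.5], samples, labels, alphas, bias=0.0))
    print(svm_decision([-3.0, 1.0], samples, labels, alphas, bias=0.0))
```

A binary rule of this kind separates two classes only; recognising many character classes requires combining several such classifiers, a detail the paper does not specify.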

V EXPERIMENTAL RESULTS

A. Dataset

This work uses the isolated handwritten Tamil character dataset developed by HP Laboratories. The dataset contains 156 distinct Tamil characters written by native Tamil writers using an HP tablet. It consists of roughly 500 samples per class, for a total of 82,928 samples, and is publicly available. The complete Tamil alphabet is represented by these 156 characters.

B. Training and Testing Images

In this first experiment, all 156 classes are used, each with roughly 500 samples. The 82,928 samples in the dataset were divided into training and validation sets at run time in an 80:20 ratio. The CNN architecture follows the model described in the previous section. The model is compiled after setting hyperparameters such as the optimizer and the batch size. After several runs with different batch sizes, we fixed the batch size at 64. Larger batch sizes could not be run because the whole setup ran on a CPU. Once the model was compiled, the pre-processed samples were fed to the model. Training is carried out for 100 epochs, and after every epoch validation is performed on the validation split. We also used early stopping to avoid overfitting the model. After tuning the parameters for different numbers of filters and kernel sizes, the final model with the highest accuracy was retained. The retained model is then evaluated on a separate set of test samples available in the HP Labs dataset that was not seen during training or validation. The test set consists of 26,926 samples, about 170 samples per class, and the ground truth is also available.
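The training protocol above (an 80:20 split, a batch size of 64, and early stopping) can be sketched with the standard library alone. The shuffling seed, the patience value, and the validation losses in the demo are hypothetical; the paper does not report them.

```python
import random


def train_validation_split(samples, train_fraction=0.8, seed=0):
    """Shuffle and split samples into training and validation parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def batches(samples, batch_size=64):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training stops.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs (patience is an assumed value).
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)


if __name__ == "__main__":
    train, val = train_validation_split(range(82928))
    print(len(train), len(val))
    print(sum(1 for _ in batches(train)))  # mini-batches per epoch
    print(early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.71], patience=3))
```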

C. Activation Function

Several activation functions are used across the various architectures of convolutional neural networks. Nonlinear activation functions such as the Rectified Linear Unit (ReLU), Parametric ReLU, Leaky ReLU, and others have shown better results than the classical sigmoid or tangent functions. These nonlinear functions have helped to speed up training. In this work we tried several activation functions and found ReLU to be more effective than the rest.
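For reference, the activation functions named above can be written as follows. The leaky slope of 0.01 is a commonly used default and is an assumption here; in Parametric ReLU the slope is a learned parameter, so it is passed in explicitly.

```python
import math


def relu(x):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Leaky ReLU: a small fixed slope for negative inputs (0.01 assumed)."""
    return x if x > 0 else slope * x


def parametric_relu(x, learned_slope):
    """Parametric ReLU: like Leaky ReLU, but the slope is learned in training."""
    return x if x > 0 else learned_slope * x


def sigmoid(x):
    """Classical sigmoid; saturates for large positive or negative inputs."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    for value in (-2.0, 0.0, 3.0):
        print(value, relu(value), leaky_relu(value), sigmoid(value), math.tanh(value))
```

ReLU involves only a comparison, whereas the sigmoid and tangent functions require an exponential and flatten out for large inputs, which is one common explanation for the faster training noted above.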

Proposed Steps:

Step 1: In this work, 247 Tamil letters are used to train the model on images from the given dataset.

Step 2: The work proceeds through pre-processing, feature extraction, and classification.

Step 3: The dataset is provided in binarized form so that the features can be extracted easily.

Step 4: The binarized images then enter the pooling layer of the CNN classifier, which filters the input images.

Step 5: The model retrieves the trained features from the CNN architecture.

Step 6: Finally, the remaining unknown characters are recognised by the SVM classifier.
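The steps above can be sketched end to end on a toy problem: a binarized image passes through a convolution, ReLU, and max pooling to give a feature vector, which a linear SVM decision rule then classifies. Everything specific in the sketch is hypothetical: the 3×3 kernel, the SVM weights and bias (set by hand, not trained), and the two-class task of telling a vertical stroke from a horizontal one. It shows the flow of data only, not the authors' trained model.

```python
def conv2d_valid(image, kernel):
    """Unpadded convolution with one shared kernel."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]


def relu_map(fmap):
    """Apply ReLU to every element of a feature map."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2."""
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, len(fmap[0]) - 1, 2)
        ]
        for r in range(0, len(fmap) - 1, 2)
    ]


def cnn_features(image, kernel):
    """Steps 4-5: convolution, ReLU, pooling, then flattening."""
    pooled = max_pool2x2(relu_map(conv2d_valid(image, kernel)))
    return [v for row in pooled for v in row]


def svm_classify(features, weights, bias):
    """Step 6: linear SVM decision rule on the CNN features."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score >= 0 else -1


# Hypothetical vertical-stroke detector and hand-set SVM parameters.
KERNEL = [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]]
WEIGHTS = [1, 1, 1, 1]
BIAS = -6


def recognise(binary_image):
    """Return +1 for a vertical stroke and -1 otherwise (toy labels)."""
    return svm_classify(cnn_features(binary_image, KERNEL), WEIGHTS, BIAS)


if __name__ == "__main__":
    vertical = [[0, 0, 1, 0, 0, 0] for _ in range(6)]
    horizontal = [[1] * 6 if r == 2 else [0] * 6 for r in range(6)]
    print(recognise(vertical), recognise(horizontal))
```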

Figure 2a Raw Image from dataset

As shown in Figure 2a, raw images are collected from the input dataset.

As shown in Figure 2b, the input images are pre-processed before continuing to the feature extraction process.

Figure 2c Output Image from CNN-SVM Algorithm

As shown in Figure 2c, the CNN classifier extracts feature values from the input image, which helps to recognise the Tamil character efficiently and with good accuracy.

Figure 2d Calculation of Training Epochs

As shown in Figure 2d, training is continued over several iterations to improve the CNN classifier output before the SVM is combined with the CNN classifier.

Figure 3 Graphical representation of testing and training time for Tamil characters

Table 1 Testing and training time for Tamil character recognition using the CNN-SVM classifier.

Input image    Training time    Testing time
[image]        5.4              16
[image]        7.2              11
[image]        6.5              12
[image]        6.8              10.4
[image]        7.3              9.4
[image]        2                8
[image]        3                8
[image]        2                9
[image]        2                8
[image]        3                10
[image]        3                9

Table 2. Recognition Accuracy for this proposed work

Proposed work Accuracy Error

CNN-SVM 94% 6%

Tables 1 and 2 show that the proposed work achieves a recognition accuracy of 94% and an error rate of 6% across various Tamil characters. The combined CNN-SVM method can therefore be adopted for OCR. In addition, the training and testing time for each Tamil character is calculated and analysed.
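The two figures in Table 2 are complementary: the error rate is one minus the accuracy. A short sketch of how both are obtained from true and predicted labels follows; the label lists in the demo are synthetic and chosen only so that 94 of 100 predictions are correct.

```python
def accuracy_and_error(true_labels, predicted_labels):
    """Return (accuracy, error rate) as fractions between 0 and 1."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    accuracy = correct / len(true_labels)
    return accuracy, 1.0 - accuracy


if __name__ == "__main__":
    # Synthetic labels: 100 samples, the last 6 deliberately misclassified.
    truth = [0] * 100
    predicted = [0] * 94 + [1] * 6
    acc, err = accuracy_and_error(truth, predicted)
    print(f"accuracy = {acc:.0%}, error = {err:.0%}")
```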

VI CONCLUSION

This paper proposes a new approach to Tamil character recognition using the HP Labs dataset. The method examines each Tamil character with a CNN-SVM classifier to calculate the recognition accuracy. More effective and efficient feature extraction helps to analyse handwritten characters efficiently. The proposed work produces optimum results and outperforms other existing methods. Future work will analyse the features further by incorporating better segmentation accuracy.

Acknowledgment

The authors acknowledge the use of the computational facilities available in the Signal Processing and VLSI Design laboratory, which was established with the support of the Department of Science and Technology (DST).


First Author: A. Muthukumar, Assistant Professor, Electronics and Communication Engineering, Kalasalingam Academy of Research and Education, Madurai, India

Second Author: K. Shivani, Electronics and Communication Engineering, Kalasalingam Academy of Research and Education, Madurai, India

Third Author: P.G. Umamaheswary, Electronics and Communication Engineering, Kalasalingam Academy of Research and Education, Madurai, India

Fourth Author: P. Chitradevi, Electronics and Communication Engineering, Kalasalingam Academy of Research and Education, Madurai, India