A Review On Various Techniques For Skew Detection And Correction In Handwritten Text Documents

Ambica Rani 1, Er. Harinderpal Singh 2

1 M. Tech Student, IT Department, AIET, Faridkot
2 Assistant Professor, CSE Department, AIET, Faridkot

Abstract

Skew detection and correction methods align a handwritten text document so that its rectangular regions, such as paragraphs, text lines and tables, are straight. Skew can be detected in scanned document images as well as in handwritten text document images. In this paper, we discuss different techniques for skew detection and correction, with a focus on detecting skew in Indian scripts. These methods include the vertical projection profile, the horizontal projection profile and the Hough transform. Some of them detect skew and give better results but are slow. So a new technique for skew detection would reduce the time and cost.

Keywords: Skew detection, skew estimation, skew correction, projection profile analysis.

Introduction

Nowadays skew detection has become vital for the recognition of scanned images because the originality of the scanned image is degraded. Sometimes the document is tilted when it is photocopied or scanned, and the resulting image is not up to the mark. These defects in the images are referred to as skew. The text in a skewed document is sometimes diminished, so the document becomes difficult to read. Here we examine skew detection and correction using several algorithms. Using these algorithms we find the faults in the document; finding these faults is referred to as skew detection. After skew detection, the faults need to be corrected; these corrections are referred to as skew correction. Fig. 1 shows a skewed document.

Figure: 1 Skewed Document Image

Skew can be broadly classified into two categories, namely:

Single Skew: The whole document is skewed by a single angle. Most document images have this type of skew. This work deals with the single-skew problem. A lot of work has been done in this field and a lot of research is still going on.

Multiple Skew: The scanned document can have many sections, each of which may be skewed by a different angle. Detecting this type of skew needs a lot of effort. Multiple-skew problems occur rarely and have not received much attention from researchers.

Limitations of Skew in Documents

• If a document is skewed, it is difficult for the reader to read.

• Documents that include skew are of reduced, low quality.

Skew can also be classified by direction into two categories:

• Clockwise Skew

• Anticlockwise Skew

Clockwise Skew: Clockwise skew refers to defects in the document in the same direction as the moving hands of a clock. It can also be termed positive skew.

Anticlockwise Skew: Anticlockwise skew refers to defects in the document in the direction opposite to the moving hands of a clock. It can also be termed negative skew.

Different methods to find the skew…

When the skew is tilted in the downward direction, it is said to be downward skew. To correct the skew, we must make sure that the angle formed is 90° in the anticlockwise direction.

Figure: 2 Downward Skew

When the skew is tilted in the upward direction, it is said to be upward skew. As with the correction made for downward skew, we must make sure that the angle formed is 90° in the clockwise direction.

Figure: 3 Upward Skew

Literature Survey

Marian Wagdy, Ibrahima Faye, Dayang Rohaya, Document Image Skew Detection and Correction Method Based on Extreme Points

In this paper the authors present a method for estimating the skew angle of a document image. The main idea is that any document image has objects with a rectangular shape, such as paragraphs, text lines, tables and figures, and these objects can be bounded by rectangles. The authors use the properties of extreme points to obtain the corners of the rectangle that fits the largest connected component of the document image. The angle of this rectangle represents the document skew angle. The experimental results show the high performance of the algorithm in detecting the skew angle for a variety of documents with different levels of complexity. The proposed method was implemented in MATLAB R2009a and tested on a variety of documents such as journals and books. Each document image was skewed by different ground-truth angles in the range [-89, 89] degrees.

Bishakha Jain, Mrinaljit Borah, A Comparison Paper on Skew Detection of Scanned Document Image Based on Horizontal and Vertical Projection Profile Analysis

Many techniques already exist, and more are being developed, for detecting the skew of scanned document images. In this paper the authors describe skew detection and correction of scanned document images written in the Assamese language using horizontal and vertical projection profiles. The algorithm was implemented on input images in Assamese. The horizontal profile technique could be used for skew correction of images with some noise. The algorithm estimates skew only if the angle is within ±15°.

Naazia Makkar and Sukhjit Singh, A Brief Tour to Various Skew Detection and Correction Techniques

During the scanning of a document, skew is inevitably introduced into the document image. The scanned text image is non-editable: although it contains text, one cannot edit it or make any change if required. This paper covers various skew detection and correction techniques. The methods provide a very efficient way to calculate the skew. Correcting the skewed scanned document image is very important because it has a direct effect on the reliability and efficiency of the segmentation and feature extraction stages.

Ruby Singh, Ramandeep Kaur, Skew Detection In Image Processing

Many researchers have proposed different methodologies for text skew estimation in binary and gray-scale images. They have been used widely for skew identification in printed text. Many algorithms exist for detecting and correcting a slant or skew in a given document or image. Some of them provide better accuracy but are slow; others are limited in the range of angles they can handle. The paper therefore argues that a new technique for skew detection would reduce the time and cost.

Lipi Shah, Ripal Patel, Shreyal Patel, Jay Maniar, Skew Detection and Correction for Gujarati Printed and Handwritten Character using Linear Regression

In this paper, the authors propose an approach for skew detection and correction of handwritten and printed Gujarati documents using linear regression. Skew detection and correction is important for any recognition system because it directly affects the recognition of characters and documents. The proposed method uses the linear regression formula to detect the angle of rotation and to correct it for printed and handwritten documents and characters. With this approach the authors report up to 59.63% accuracy for printed and 45.58% accuracy for handwritten documents and characters. The proposed method is simple and fast, both in detecting the angle of rotation and in correcting the skewed image.
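The linear regression idea can be illustrated with a minimal sketch. It is not the cited authors' implementation: it assumes the input is a list of (x, y) pixel coordinates sampled along one text line, fits a least-squares line through them, and takes the arctangent of the slope as the skew angle. The function name `regression_skew_angle` and the sample data are hypothetical.

```python
import math

def regression_skew_angle(points):
    """Fit y = a*x + b by least squares and return atan(a) in degrees."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope of the least-squares line: covariance(x, y) / variance(x).
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all points share one x value; slope is undefined")
    return math.degrees(math.atan(sxy / sxx))

# Hypothetical baseline pixels of one text line, dropping 1 pixel per 10 columns
# (y grows downward in image coordinates).
baseline = [(x, 0.1 * x) for x in range(0, 200, 5)]
angle = regression_skew_angle(baseline)
```

Rotating the image by the negative of this angle would straighten the line. Real handwriting would need the sample points to be chosen first, for example from the lowest ink pixel of each column, which this sketch leaves out.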

Existing Techniques

Techniques can be broadly classified into three categories, namely:

Document enhancement and binarization

In this technique, the authors use retinex theory to solve the degradation problem. It is used to convert the lightness image into a bi-level image (black for text and white for background).

Dividing the document into connected components

In this technique, the document is divided into connected components using the erosion morphological operation. A square structuring element of width 4 is used to give an accurate result in detecting the extreme points of each connected component.

Skew angle estimation

In this technique, the authors estimate the angle of the largest connected component of the document image. They use the properties of extreme points to obtain the rectangle that fits the largest connected component with the same skew angle. Each connected component has eight extreme points (top-left, top-right, right-top, right-bottom, bottom-right, bottom-left, left-bottom, and left-top).
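As a rough illustration of the last two steps, the sketch below labels the largest connected component of a small binary grid and lists its eight extreme points. It assumes 8-connectivity and a grid where 1 marks a text pixel; both are assumptions, as are the function names. How the rectangle and its angle are derived from these points is not described in the text above, so the sketch stops at the points.

```python
from collections import deque

def largest_component(grid):
    """Return the (x, y) pixels of the largest 8-connected foreground region."""
    height, width = len(grid), len(grid[0])
    seen, best = set(), []
    for y0 in range(height):
        for x0 in range(width):
            if not grid[y0][x0] or (x0, y0) in seen:
                continue
            seen.add((x0, y0))
            queue, component = deque([(x0, y0)]), []
            while queue:  # breadth-first flood fill
                x, y = queue.popleft()
                component.append((x, y))
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and grid[ny][nx] and (nx, ny) not in seen):
                            seen.add((nx, ny))
                            queue.append((nx, ny))
            if len(component) > len(best):
                best = component
    return best

def extreme_points(pixels):
    """Return the eight extreme points of a pixel set, keyed by name."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    top, bottom, left, right = min(ys), max(ys), min(xs), max(xs)
    top_row = [x for x, y in pixels if y == top]
    bottom_row = [x for x, y in pixels if y == bottom]
    left_col = [y for x, y in pixels if x == left]
    right_col = [y for x, y in pixels if x == right]
    return {
        "top-left": (min(top_row), top),
        "top-right": (max(top_row), top),
        "right-top": (right, min(right_col)),
        "right-bottom": (right, max(right_col)),
        "bottom-right": (max(bottom_row), bottom),
        "bottom-left": (min(bottom_row), bottom),
        "left-bottom": (left, max(left_col)),
        "left-top": (left, min(left_col)),
    }

# Tiny hypothetical binary image; the lone pixel at the right is a second component.
grid = [
    [0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
]
component = largest_component(grid)
points = extreme_points(component)
```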

Vertical Projection Profile Analysis

Algorithm:

1. Read the image data into a matrix and convert it to grayscale.

2. Convert the grayscale image to a black background with white writing by comparing each pixel with 0.34.

3. Search for the first column with a white pixel, i.e., with a written pixel.

4. Store the entire image column-wise in a variable (Skew_input).

5. Add the elements of the input image matrix column-wise to get the number of white pixels per column, and store the result in a variable Sum_col.

6. The sum of the squares of the Sum_col values gives the value of the energy function for the skew angle.

7. Rotate the input image by the angle "rot_angle" and repeat steps 5 and 6 for this angle to obtain the value of the energy function.

8. Rotate the input image by the angle "(-)rot_angle" and repeat steps 5 and 6 for this angle to obtain the value of the energy function.

9. rot_angle = rot_angle - 1.

10. Repeat steps 7, 8 and 9 until rot_angle = 0.

11. Find the angle for which the value of the energy function is maximum.

12. This angle gives the skew angle.

13. As output, display the values of the energy function for each angle, along with the bar graph of the column values for the skew angle and the corrected image segment.

Horizontal Projection Profile Analysis

Algorithm:

1. Read the image data into a matrix and convert it to grayscale.

2. Convert the grayscale image to a black background with white writing by comparing each pixel with 0.34.

3. Search for the first column with a white pixel, i.e., with a written pixel.

4. Store one-fourth of the image row-wise in a variable (Skew_input).

5. Add the elements of the input image matrix row-wise to get the number of white pixels per row, and store the result in a variable Sum_row.

6. The sum of the squares of the Sum_row values gives the value of the energy function for the skew angle.

7. Rotate the input image by the angle "rot_angle" and repeat steps 5 and 6 for this angle to obtain the value of the energy function.

8. Rotate the input image by the angle "(-)rot_angle" and repeat steps 5 and 6 for this angle to obtain the value of the energy function.

9. rot_angle = rot_angle - 1.

10. Repeat steps 7, 8 and 9 until rot_angle = 0.

11. Find the angle for which the value of the energy function is maximum.

12. This angle gives the skew angle.

13. As output, display the values of the energy function for each angle, along with the bar graph of the row values for the skew angle and the corrected image segment.
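The projection profile search can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation from the cited work: the image is reduced to a list of foreground pixel coordinates, rotation is applied to those coordinates rather than to the image matrix, angles are tried in whole degrees over an assumed range of ±15, and pixels darker than 0.34 are assumed to be writing. The names `foreground_pixels`, `profile_energy` and `estimate_skew` are hypothetical.

```python
import math

def foreground_pixels(gray, threshold=0.34):
    """Return (x, y) of pixels darker than the threshold (assumed to be ink).

    `gray` is a list of rows with intensities in [0, 1].
    """
    return [(x, y) for y, row in enumerate(gray)
            for x, value in enumerate(row) if value < threshold]

def profile_energy(pixels, angle_deg, axis="row"):
    """Sum of squared projection counts after de-rotating by angle_deg."""
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    counts = {}
    for x, y in pixels:
        if axis == "row":   # horizontal profile: one bin per de-rotated row
            key = round(y * cos_t - x * sin_t)
        else:               # vertical profile: one bin per de-rotated column
            key = round(x * cos_t + y * sin_t)
        counts[key] = counts.get(key, 0) + 1
    return sum(c * c for c in counts.values())

def estimate_skew(pixels, max_angle=15, axis="row"):
    """Try every whole-degree angle in [-max_angle, max_angle]; keep the best."""
    best_angle, best_energy = 0, -1
    for angle in range(-max_angle, max_angle + 1):
        energy = profile_energy(pixels, angle, axis)
        if energy > best_energy:
            best_angle, best_energy = angle, energy
    return best_angle

# Synthetic page: three text lines, each tilted by 5 degrees.
slope = math.tan(math.radians(5))
page = [(x, round(x * slope) + offset)
        for offset in (0, 20, 40) for x in range(200)]
skew = estimate_skew(page)
```

The energy peaks when the pixels of each text line fall into as few profile bins as possible, which is why the sum of squares is used rather than the plain sum (the plain sum is the same for every angle).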

Thinning and Hough transform

The method has two stages. In the first stage, selected characters from the document image are blocked and thinning is performed over the blocked region [5]. In the second stage, the thinned coordinates are fed to Hough transform (HT) to estimate the skew angle accurately.

The detailed algorithm is shown below:

Algorithm

Step-1: Find connected components in the document image and compute average bounding height (AH).

Step-2: Select those connected components whose height is less than AH and remove very small connected components so that the dots of the character i, j, punctuation marks like full stop, comma, hyphen etc. are deleted.

Step-3: Block the selected component present in the document.

Step-4: Perform thinning operation over the selected block region.

Step-5: Remove the parallel straight lines using prespecified threshold.

Step-6: The obtained points are then subjected to Hough transform to estimate skew angle accurately.

Step-7: Stop.
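Step-6 can be illustrated with a small voting sketch. It assumes the thinned coordinates are already available as a list of (x, y) points and that the skew lies within ±45 degrees; the blocking, thinning and parallel-line removal steps are not shown. The function name `hough_skew_angle`, the angle range and the step size are assumptions made for this example.

```python
import math
from collections import Counter

def hough_skew_angle(points, max_skew=45, step=1.0):
    """Return the skew (degrees) of the line that collects the most votes."""
    votes = Counter()
    n_steps = int(round(2 * max_skew / step)) + 1
    for i in range(n_steps):
        skew = -max_skew + i * step
        # A line with this skew has its normal at skew + 90 degrees.
        theta = math.radians(skew + 90.0)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho = round(x * cos_t + y * sin_t)  # distance of the line from the origin
            votes[(i, rho)] += 1
    (best_i, _rho), _count = votes.most_common(1)[0]
    return -max_skew + best_i * step

# Hypothetical thinned coordinates: one stroke tilted by 10 degrees.
slope = math.tan(math.radians(10))
thinned = [(x, round(x * slope) + 30) for x in range(400)]
estimated = hough_skew_angle(thinned)
```

Each point votes for every candidate line through it, and points that lie on one straight stroke agree on a single (angle, distance) cell. A finer step gives a finer angle estimate at the cost of more votes to count.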

Topline

The topline algorithm [9] does not operate directly on the skewed image. First the skewed image is converted to a segment file or a thin segment file, and then the algorithm operates on one of these files to find skew angle.

Algorithm

Step 1: Input .bmp image to the OCR.exe and convert the image to seg.dat and thinseg.dat.


Step 2: Apply both the seg and thinseg algorithms to the images.

Step 3: Calculate the h1 and h2 coordinates.

Step 4: Calculate the width and the margin.

Step 5: Input these values to the formula for detecting the skew angle.

Step 6: The values are entered into the formula:

skew angle = atan(diff / (w - margin)).

Step 7: This angle is then used to rotate the image to obtain the skew-free image.

Step 8: The difference between the two angles can be seen easily.

Step 9: Stop
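The formula in the steps above can be written as a short function. The source does not define diff, so this sketch assumes it is the difference h2 - h1 between the two topline coordinates; the sample numbers are made up for illustration.

```python
import math

def topline_skew_angle(h1, h2, width, margin):
    """Skew angle in degrees from atan(diff / (w - margin)).

    Assumption: diff = h2 - h1, the height difference between the two
    topline coordinates; the source does not define diff explicitly.
    """
    run = width - margin
    if run <= 0:
        raise ValueError("width must exceed margin")
    return math.degrees(math.atan((h2 - h1) / run))

# Hypothetical measurements in pixels.
angle = topline_skew_angle(h1=100, h2=140, width=500, margin=100)
```

With these numbers the topline changes by 40 pixels over a run of 400 pixels, so the angle is atan(0.1), a little under 6 degrees.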

A few existing methods, with their accuracy, are given in the following table:

Author | Pub. Year | Method | Script | Comments
Rohit Sharma, Utkarsh Mathur, Naveen Srivastava | 2013 | Angular Skew Correction Algorithm | Devanagari | Angular skew was corrected with an accuracy of 94.91%
Bishakha Jain, Mrinaljit Borah | 2014 | Comparison of Horizontal and Vertical Projection Profile Analysis | Assamese Language | Skew could be estimated if the angle is within ±15°
Loveleen Kaur, Simple Jindal | 2011 | OCR | Gurumukhi Script | Achieved accuracy: 94%
Ruby Singh, Ramandeep Kaur | 2013 | Used different techniques | Gurumukhi Documents | The skew angle of the document can be determined in the range of -180° to 180°
Rajib Ghosh, Gouranga Mandal | 2012 | Words Recognition and Document Analysis System | Bangla Language | Accuracy achieved by taking 3839 words was 92.22%

Conclusion & Future Scope

In this paper, we have presented a review of skew detection and correction for handwritten text documents written in Indian languages such as Gurumukhi, Devanagari, Bangla and Assamese.

Algorithms for various techniques have been presented in this paper. It is concluded that a lot of work remains to be done to detect and correct skew in handwritten text documents in Devanagari script. A more robust algorithm needs to be developed to detect the skew angle in text documents and to correct the detected angle.

In future we plan to develop an algorithm that can efficiently detect and correct upward and downward skew in handwritten text documents written in various Indian scripts.

References

[1] Marian Wagdy, Ibrahima Faye, Dayang Rohaya, Document Image Skew Detection and Correction Method Based on Extreme Points

[2] Lipi Shah, Ripal Patel, Shreyal Patel, Jay Maniar, Skew Detection and Correction for Gujarati Printed and Handwritten Character using Linear Regression

[3] B. Aditya Vighnesh, Abhishek Kumar, B. Manikanta Yadav, Skew Detection in Handwritten Documents

[4] Tian Jipeng, G. Hemantha Kumar, H.K. Chethan, Skew Correction for Chinese Character using Hough Transform

[5] Naazia Makkar and Sukhjit Singh, A Brief tour to various Skew Detection and Correction Techniques

[6] Ruby Singh, Ramandeep Kaur, Skew Detection In Image Processing

[7] Rajib Ghosh, Gouranga Mandal, Skew Detection and Correction of Online Bangla Handwritten Word

[8] Bishakha Jain, Mrinaljit Borah, A Comparison paper on Skew Detection of scanned Document Image Based on Horizontal and vertical Projection Profile Analysis

[9] Loveleen Kaur, Simple Jindal, Skew Detection technique for various script

[10] Rohit Sharma, Utkarsh Mathur, Naveen Srivastava, Angular Skew Correction Algorithm for Handwritten Hindi Text

[11] Sepideh Barekat Rezaei, Abdholhossien Sarrafzadeh, and Jamshid Shanbehzadeh, Skew Detection of Scanned Document Images

[12] Utkarsh Mathur, Rohit Sharma, Script Independent Angular Skew Detection and Correction Algorithms

[13] Deepak Kumar, Dalwinder Singh, Modified Approach of Hough Transform for Skew Detection and Correction in Documented Images
