Multilingual Text Induced Spelling Correction


Academic year: 2020



! "!

#$

!

! "##$%&

' (

) (

* ( (

'

+

, - .

%

! * ( - *

-

+ ( * *

(

+ '

/ ( 0 .

)(*1.

( 2""32334%5!

( -

--! 6 ! (

7 * + 0

%! !

8 /! (

/

( (

( /

! + ( 6

! /! &

+ 9

8 ! *

7 !' 7

6 + / (*

! ! + ,

* - !(

( ( *

' !

( -

(

!

- +

! ( *

2%! - (

( : -

;%!

4%

* /!

-/ - <%+

9

* (*

+

9 (

*

- +

( (

!* *!

+ =

** '


* "& - !

/! 7 % - (

' !

*

!

* > ! "#?"%+

!

+

! ( ( *

( *% *

- ! ( *

!(('+

( ! (

*

+

%@

%

!( , "

(

+ 9 < +

( ! (

*( +

* *

: A: *(

( !

*+

!(

- !

(*

+ !

* ! (

* +

'

(

+ '( *'

(

+ 9

- ! % *

(* ! (

( +

'

*+ :*

'

* B

((!

! * *!

*

+ : *( ++

* * (

( * + *

'

(

*

! &

@ "22

@ 2C32C3?"$42 @ ""<

@

23""4<C"?C<! : & 2C32C3?"$42

23""4<C"?C< @ $#"4<3#C<C+

: *( ++

B "4$;?4;3;#4#

"2#<$#?#<"?2 @ $#"4<3#C<C+

*

( ++ D

** * +

'*

- *

- &

+

D ( '*

* B -

!*!

+

* -

6 ! 6 !6

6* /!"#$;%+

( &

'!(

B! -

%+

9 *

B - (

*+

9


'

++ @ ?!<?C!4;3!2<C! @

<!#3;!#33!333% *++ @

@"C!$2$!<;?!22<! @@C2#!;$<!#$2!<33 %

( ( (' (+

;;2 + *

*

+

7(* (

(

+ 7((

* &

%

%

* %

* %

%

( (

' + *

(! (

+

G I B

(

I! "#4<%+ (

*

( (+ !

++

(! * * -

* ( (+ (

*-* (!

* *

+ ( - ( (

( '

+ 9 ' *

(

( + (

( + 9

* Æ

' (

+ ( '

!(

(*

!* *

! + 9

*(

8

!( (+

/

( ! -

B *

+ 9

( !

B - '

* +

(! *

! !((** *

( (

+ ( *!

*

*

+

* '

+

( +

- &

! B

!

' ' ( (

! -

+ !

' + (

( ' (

! (

B *

(

- (

*

- !++-

* I +

! ! (

( ( + !

+ 9 ! ! (

' !

- +

9 ' (

- ** *

' (

+ (

- (

* ( !

+ 7' &


[Figure: TISC architecture, divided into a PREPROCESSING and a PROCESSING stage. Labels recoverable from the diagram: Corpus; Alphabet; Character Values; Word Unigram Frequency List; Lowercased Word Unigram Frequency List; Word Bigram Frequency List; Type Frequencies; Type Co-occurrence Frequencies; Compound Splitting; Cut-off if frequency < n; Sort; Unique; Hash; Anagram value list; Lexicon; Paired anagram values; 'chained' anagrams; Input Text; To be corrected? (Y/N); Anagram Checking; Correction at unigram level; Correction at bigram level; Correction at compound level; Agreement between levels; Post-correction; Output; Evaluation.]
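The architecture figure refers to character values, a hash step, and an anagram value list. As an illustration only, the following minimal sketch shows one way such an anagram key could work: each word gets a numeric value that does not depend on character order, so all anagrams of a word land in the same bucket. The specific function used here (the sum of each character's code point raised to the fifth power), the constant `POWER`, and the names `anagram_value` and `build_anagram_index` are assumptions made for this example and are not taken from the legible parts of the text.

```python
from collections import defaultdict

# Assumption for illustration: the exponent is a free parameter of this sketch.
POWER = 5


def anagram_value(word: str, power: int = POWER) -> int:
    """Order-independent numeric key: sum of code points raised to `power`."""
    return sum(ord(ch) ** power for ch in word)


def build_anagram_index(words):
    """Map each anagram value to the set of words that share it."""
    index = defaultdict(set)
    for word in words:
        index[anagram_value(word)].add(word)
    return index


# Toy lexicon; in a corpus-derived setting this would be the word type list.
lexicon = ["listen", "silent", "enlist", "tinsel", "stone", "notes"]
index = build_anagram_index(lexicon)

# Any permutation of the same characters retrieves the same bucket.
candidates = index[anagram_value("inlets")]
```

Because addition is commutative, the key ignores character order, so a transposition error maps to the same bucket as its correct form. A large exponent spreads the sums out so that words with different character bags rarely share a value, although collisions are not ruled out in general.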

72&

++ 'Æ*'8 1

5& - ! * *'8

'Æ 1 :5 %! (

* !

+

( (

- (

( (

( ( B

-!

+

+

(

(

- +

! ( '

( (

+ ,(

( * (

- +

( ' (

*

+

- - !

+

* '

D! (' (

! * ' (

+

( &

* - *

-

( B %+ ,

* ! -

! - + -

;* ( 2"2 ( (

-++

6 ( 6! 6!

6 (!6%+ :

(

;*

!

+

* +

!

*

! ++ % ( !

:

! 8 % + 9

* *

!

! '


* ( ( !

B

( *

+ Æ B

-

! *

- +

!"

*

-'

'

(+ !(' (

* *

+

D 0

*

B ( !

- !

D

0!2333%+ ?!333

( + (

2333 (

& (

+

#?+?J 4* '

+ 9 '

* !

+ 9

((!(

* ( *

+

(

* *! ( ( * *

2!333+

**+

9 - * ! (

( (

' ! (

*B(*

! +

B + F !

' +

* +

' +

9 (* *

*+ 7

( (

*( 7B

& ;K"3! "<! 23! 43! ;3! <3 "33%+

(

+ 7

* ( (

F / !

* !

+ 7 !(

";<!"33 ' + 7 *

(

*

+

"

*4+

# $ 7 ! (

/ (

(

3K; %+ (

(

+ 9 *

0"% +

! ! * (

02%+

! $ 9 "3#4

*

B G .

( +! 2334%+ 9 -

- ' ) +

- !

(

( + 9 ! (

"3< +

2; '

+ 9 ( "222

' ( +

* 4+

# %! $ 7 * ! (

*0 G

*!*

* - +

& "

9 7

+


&

$#'#' $()(* ')+&$ #+(#**

()+) ,++# #(&+# $+)'(

#$' #$& #$$$ #+)+

- $$(. #'. /. ('.

* 4&

!"#

* ;& *

( -! (

* ++

%!* *(

+ , (

(

!('

+ !

% * !

* ' '

!* '

+ 7'*

%

* + (

! *

( +

* ; ( * (

! +

#

'$ 7* !

( -

! !

! (

* *+ 0 -

& ( * '

+

* /

!( (

+

-

* '

* *

! ++

( ( *

( +

/

+

($ *; *

* /

' * * (

* * /% %

/7% %*+

* * + '

* (

+

*

* +

9 * *

*

*

( - ( + G

* (*

8 ! *

*

B + *

( '

+

* *

I ('+

-(

/ +

* !*

+

-

: *

*

+ !

( !


*

+ ,

( ! ++ (

15 /

+

( (

! * :+

( ( -(

(

: * + 9 ' -(

+

'

(*

(

! **

+ *

(( ('!

( '

*>'!"##2% * '+

!(( (

! *!

( '

*

* -+ 7

/ ! !

*+ 9 * !

(

(

* + -

!( ! +

! "

9 ! (

' + 9

(*

( -+ 9

- (

( - : *

( ( +

'* *!

( * + 9

( ( *!

* (

-

-! * '

*+ 9

( (

0 1

2 3

435

5

6748%

!

%

+ D G+.+ 0+ 2333+

+

! 2?$K2#4+

+G+ .' G+ G + "##C+

.0F

.*'+ !"!

#$$%+

7 L+ /+ "#$;+ B

+ ! M

C! 40 "#$;%&"C" K "C$+

+ "##$+ 7 G+ !&

' "!

/,.F0 9

G+2+

/ :+ 2334+ ( E'

(( + ! () *

+,,-,.+

/ +> !"#?"+ " " !

2 -

! $+;! <"4K<<?+

9! G ! 0!

+

> >'+ "##2+ B

( -+

"&! 2;;%&4CCK;4#+

M+ + + "#$<+ D *

! ! +

-!

"3?%! C3CKC"3+ , & /'

'G"$4;%& ?;<K?;? "#$<%+

/+ (! E+E ! ++ G! 7+ + 2334+

G.M"& ( * '

-) + /

0 +

0 G + 233;+

+ 1 2'( +,,34

( &+

> I+ "#4<+

5

+ 0+ + +!.*!0!
