The use of these commands is demonstrated in the following examples. For the exact parameters, refer to the command list.
Examples
Example 1: invoice detection
This first example is based on a very simple case. There are only two possible document types: an invoice and everything else. Only the invoice number is to be indexed.
In the first step, the rectangle list containing the region for the invoice text and its number is set through the OCR API, and detection is started.
x=ELO.OcrClearRect()
x=ELO.OcrAddRect("500,10,999,100")
x=ELO.OcrAnalyze(FileName,0)
© 1998..2002 ELO Digital Office GmbH Page: 48
The second step checks whether the form is an invoice. To do this, it tests whether the text "invoice" followed by the invoice number can be detected in text block 1.
CntRechnung=ELO.OcrPattern(10,"*'Invoice'_N*", ELO.OcrGetText(0))
This searches for a pattern of 5 parts:
1. Any prefix, possibly empty (e.g. stray characters that intrude into the rectangle from above because of an imprecise scanner feed)
2. The text 'invoice'
3. A string of blank characters of any length, possibly empty
4. A number (the invoice number)
5. Any further text (e.g. characters that do not belong to the current text but intrude into the detection rectangle)
If the pattern is detected, the function returns the value 5 (the number of pattern parts); in case of an error it returns a negative value. The texts of the individual pattern parts are then available in a text array and can be read with OcrGetPattern.
If CntRechnung=5 then
  ' it is an invoice, 5 pattern blocks were detected
  x=ELO.PrepareObjectEx(0,254,InvoiceMaskNo)
  ' part 3 (counted from 0) is the invoice number
  x=ELO.SetObjAttrib(0, ELO.OcrGetPattern(3))
  …
end if
In this simple example the four-step procedure is reduced to two steps. Step 3 is omitted because no further OCR regions have to be read, and step 4 is omitted because the index value was already detected during classification. The check for an invoice also places the invoice number in the pattern text array, where it can be used directly.
Example 2: Invoice/Delivery note detection
This example is also based on a fairly simple case. There are only two possible document types, an invoice and a delivery note. Only the invoice number or the delivery note number is to be indexed.
In the first step, the rectangle list is set up with the two regions for the invoice text and the delivery note text, each containing its number, and detection is started.
x=Elo.OcrClearRect()
x=Elo.OcrAddRect("500,250,999,390")
x=Elo.OcrAddRect("500,10,999,100")
x=Elo.OcrAnalyze(FileName,0)
Next, the script checks whether the form is an invoice or a delivery note. To do this, it tests whether the text 'invoice' can be detected in text block 1 or whether the text 'delivery note' exists in text block 2.
CntInvoice=Elo.OcrPattern(10,"*'Invoice'*", Elo.OcrGetText(0))
CntDeliveryNote=Elo.OcrPattern(10,"*'Delivery Note'*", Elo.OcrGetText(1))
If CntInvoice<0 then
  ' it is no invoice
  If CntDeliveryNote<0 then
    ' it is also no delivery note, so nothing is done
    ...
  end if
  ...
else
  if CntDeliveryNote<0 then
    ' it is really only an invoice
    ...
  end if
  ...
end if
Next, the index entry is made. No further rectangles are read in this simple example, since the invoice or delivery note number has already been detected by the OCR.
Select Case DocType
  case 1 'DeliveryNote
    x=ELO.PrepareObjectEx(0,254,DeliveryNoteMaskNo)
    CntDeliveryNote=Elo.OcrPattern(10,"*'Delivery Note'_N*", Elo.OcrGetText(1))
    If CntDeliveryNote=5 then
      x=ELO.SetObjAttrib(0, Elo.OcrGetPattern(3))
      ...
    end if
  case 2 'Invoice
    x=ELO.PrepareObjectEx(0,254,InvoiceMaskNo)
    CntInvoice=Elo.OcrPattern(10,"*'Invoice'_N*", Elo.OcrGetText(0))
    If CntInvoice=5 then
      x=ELO.SetObjAttrib(0, Elo.OcrGetPattern(3))
      …
    end if
end select
Example 3: Form detection (trade fair demo)
The following example is a complete script that detects three different forms and files them automatically in the archive (the directory is created as well if required).
'OCRELO.VBS 24.08.2000
'---
' © 2000 ELO Digital Office GmbH
' Author: M.Thiele
'---
' This script analyzes the postbox documents for certain texts
' which are marked by invoice numbers and then files the detected
' invoices into the appropriate tab (which is automatically
' created if required)
'
'---
set Elo=CreateObject("ELO.professional")
MaskNo=Elo.LookupMaskName("ELOInvoice")
' run through all postbox entries (at most 200)
Elo.SelectView(3)
' firstly analyzing all documents
Elo.Status "Analyze documents"
Elo.UnselectAllPostboxLines
for i=0 to 300
  res=Elo.PrepareObject( -1,i,MaskNo )
  if res=-6 then
    exit for
  end if
  if Elo.ObjShort="" then
    fname=Elo.ActivePostFile
    if UCase(Right(fname,4))=".TIF" then
      x=Elo.UpdatePostboxEx( 20,i )
      ...
' and now assign the detected documents in the archive
for j=i to 0 step -1
  Move(j)
next
' shortly cleaning the postbox view ...
x=Elo.UpdatePostboxEx(0,0)
Elo.Status("Finished")
' ... finished
' Help functions
'
' this function assigns a detected document in the archive
sub Move( iPostLine )
  res=Elo.PrepareObject( -1, iPostLine, 0 )
  if res>0 then
    Text=Elo.ObjShort
    Elo.Status("File: " & Text)
    ' the short name is a fixed-width record: number, date, customer number
    Datum=Mid(Text,12,10)
    Kdnr=Mid(Text,23,10)
    Renr=Trim(Left(Text,10))
    if Len(Datum)=10 and Len(Kdnr)>0 then
      iRet=Elo.AddPostBoxFile("")
      RegId=CheckRegister( Datum, Kdnr )
      ...
' this function checks, if the target tab for a document exists
' and creates it if required
function CheckRegister( Date, Customerno )
  RegId=Elo.LookupIndex( "RELO=" & Right(Date,4) & ":" & Customerno )
  if RegId<1 then
    ' tab does not exist, will be created
    DirectoryId=Elo.LookupIndex( "¿ELO Invoices¿" & Right(Date,4) )
    if DirectoryId>0 then
      ' directory found, now creating the tab
      if Elo.PrepareObjectEx( 0,253,0 ) then
        Elo.ObjShort= Customerno
        x=Elo.SetObjAttrib(0,Right(Date,4) & ":" & Customerno)
        x=Elo.SetObjAttribKey(0,"RELO")
        ...
' this function executes an OCR analysis for the current postbox
' document and puts the invoice number into the short name if a
' known document type was found
sub Analyze( iPostLine, FileName )
  x=Elo.OcrClearRect()
  'x=Elo.OcrPattern( 10,"*", Elo.OcrGetText(0) )
  'MsgBox Elo.OcrGetPattern(0)
  'x=Elo.OcrPattern( 10,"*", Elo.OcrGetText(1) )
  'MsgBox Elo.OcrGetPattern(0)
  Elo.ObjShort=""
  ...
      ' first half of this statement restored by analogy with the Credit case below
      Elo.ObjShort=Left(Elo.OcrGetPattern(3)&"          ",10) & " " & _
        Elo.OcrGetPattern(6) & " " & Elo.OcrGetPattern(12)
      x=Elo.SetObjAttrib(0,Elo.OcrGetPattern(12))
      x=Elo.SetObjAttrib(1,Elo.OcrGetPattern(3))
      x=Elo.SetObjAttrib(2,Elo.OcrGetPattern(9))
      Elo.ObjXDate=Elo.OcrGetPattern(6)
      found=true
    end if
  end if
  if not found then 'ELO Credit
    x=Elo.OcrPattern( 10,"*'Credit'L'Number:'NL'Date:'*L'Order-No.:'NL'Customer-No.:'N*", Elo.OcrGetText(0))
    if x>0 then
      ' short name: number padded to 10 characters, date, customer number
      Elo.ObjShort=Left(Elo.OcrGetPattern(4)&"          ",10) & " " & _
        Elo.OcrGetPattern(7) & " " & Elo.OcrGetPattern(13)
      x=Elo.SetObjAttrib(0,Elo.OcrGetPattern(13))
      x=Elo.SetObjAttrib(1,Elo.OcrGetPattern(4))
      x=Elo.SetObjAttrib(2,Elo.OcrGetPattern(10))
      Elo.ObjXDate=Elo.OcrGetPattern(7)
      found=true
    end if
  end if
  if not found then 'NOKIA Invoice
    x=Elo.OcrPattern( 10,"*'Invoice'n*'Customernumber'N*", Elo.OcrGetText(1) )
    if x>0 then
      ' this form carries no date, so a fixed date is used
      Elo.ObjShort=Left(Elo.OcrGetPattern(2)&"          ",10) & " 01.01.2000 " & Elo.OcrGetPattern(5)
      x=Elo.SetObjAttrib(0,Elo.OcrGetPattern(5))
      x=Elo.SetObjAttrib(1,Elo.OcrGetPattern(2))
      Elo.ObjXDate="01.01.2000"
      found=true
    end if
  end if
  if found then
    iRet=Elo.AddPostBoxFile("")
  end if
  Elo.Status "Detect line " & i & " : " & Elo.ObjShort
end sub
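The script passes data from Analyze to Move through the short name, which it treats as a fixed-width record: Analyze writes the number padded to 10 characters, a blank, the 10-character date, a blank, and the customer number, and Move reads the fields back with Left(Text,10), Mid(Text,12,10) and Mid(Text,23,10). The following Python sketch illustrates that layout outside of ELO. It is not ELO code, and the function names build_short_name and parse_short_name are hypothetical.

```python
def build_short_name(number, date, customer):
    """Mimic Left(number & <10 blanks>, 10) & " " & date & " " & customer."""
    # "<10.10" pads the number to 10 characters and truncates anything longer
    return f"{number:<10.10} {date} {customer}"


def parse_short_name(text):
    """Mimic Left(Text,10), Mid(Text,12,10) and Mid(Text,23,10) (1-based in VBScript)."""
    return {
        "number": text[:10].strip(),   # Trim(Left(Text,10))
        "date": text[11:21],           # Mid(Text,12,10)
        "customer": text[22:32],       # Mid(Text,23,10)
    }


record = build_short_name("4711", "24.08.2000", "1234")
print(repr(record))
print(parse_short_name(record))
```

The padding matters: if the number were not padded to exactly 10 characters, the date and customer number would no longer start at the positions that Move expects.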
Syntax of the format strings of the pattern detection
Pattern detection requires only a format string to be specified. ELO checks whether the text satisfies the pattern and splits the text into the pattern parts in a single step.
Please note that a pattern may consist of at most 32 parts (in version 104; this may change in later versions).
The following pattern parts are available:
*      Any text
       Accepts any text, including empty text. A length may be specified in front of the asterisk; the detected text must then be at least that long.
_      Blank characters
       Accepts any string of blank characters, including an empty one. A length specified in front of the underscore requires a string of exactly that length.
L      Line break
       Detects exactly one line break. A number in front of the L (e.g. 3L) requires exactly that number of line breaks.
N      Number
       Detects a string of digits of any length (but not an empty string). The first character that is not a digit ends the string (this includes line breaks and blank characters). If a multiplier is prefixed, a digit string of exactly that length is detected.
       Example: ABC12345XYZ
       *N*    detects [ABC][12345][XYZ]
       *3N*   detects [ABC][123][45XYZ]
       *6N*   detects nothing, because there is no digit string of this length.
n      Number (special OCR)
       As for N (Number), except that this form also accepts letters that resemble certain digits: O, o and Q are detected as 0 (zero), I and l are detected as 1 (one).
"..."  Text
       The text between the double quotes is accepted. The text must be present in exactly this form; no additional blank characters or line breaks are accepted.
       Example: xyzInvoice 123
       *"Invoice"_N*   detects [xyz][Invoice][ ][123][]
       *"Invoice"N*    detects nothing, because the format string does not account for the blank character
       *"INVOICE"_N*   detects nothing, because "Invoice" is written with different capitalization
'...'  Text
       As for the double quotes, but the comparison is not case sensitive.
       With single quotes, the third case in the example above would be detected as well.
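The rules above can be tried out with a small simulation. The following Python sketch is not ELO's implementation; it is a minimal backtracking matcher written only from the descriptions in this section. The names parse, ends and match are hypothetical, blank characters are assumed to be space and tab, a line break is assumed to be a single "\n", and letters accepted by n are returned unchanged.

```python
import re

# One pattern part: optional count plus symbol, or a quoted literal
TOKEN = re.compile(r"""(\d*)([*_LNn])|"([^"]*)"|'([^']*)'""")
DIGITS = set("0123456789")
CLASSES = {"_": set(" \t"), "L": set("\n"), "N": DIGITS, "n": DIGITS | set("OoQIl")}


def parse(pattern):
    """Split a format string into (kind, argument) parts."""
    parts, pos = [], 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"bad pattern at position {pos}")
        if m.group(2):
            parts.append((m.group(2), int(m.group(1)) if m.group(1) else None))
        elif m.group(3) is not None:
            parts.append(("exact", m.group(3)))
        else:
            parts.append(("nocase", m.group(4)))
        pos = m.end()
    return parts


def ends(part, text, i):
    """Yield every position at which `part` may end when it starts at text[i]."""
    kind, arg = part
    if kind == "*":                          # any text, at least `arg` characters
        yield from range(i + (arg or 0), len(text) + 1)
    elif kind in CLASSES:
        run = i
        while run < len(text) and text[run] in CLASSES[kind]:
            run += 1
        if arg is not None:                  # explicit count: exactly `arg` characters
            if run - i >= arg:
                yield i + arg
        elif kind == "_":                    # blanks: any length, may be empty
            yield from range(run, i - 1, -1)
        elif kind == "L":                    # exactly one line break
            if run > i:
                yield i + 1
        elif run > i:                        # N / n: the whole digit run, never empty
            yield run
    else:                                    # quoted literal
        chunk = text[i:i + len(arg)]
        same = chunk == arg if kind == "exact" else chunk.lower() == arg.lower()
        if same:
            yield i + len(arg)


def match(pattern, text):
    """Return the texts of all pattern parts, or None if the text does not fit."""
    parts = parse(pattern)

    def walk(k, i):
        if k == len(parts):
            return [] if i == len(text) else None
        for j in ends(parts[k], text, i):
            rest = walk(k + 1, j)
            if rest is not None:
                return [text[i:j]] + rest
        return None

    return walk(0, 0)


print(match("*3N*", "ABC12345XYZ"))
print(match('*"Invoice"_N*', "xyzInvoice 123"))
print(match('*"INVOICE"_N*', "xyzInvoice 123"))
```

The sketch requires the whole text to be consumed, which is why a pattern without a closing * finds nothing unless the text ends exactly where the last part ends.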
Annotations
Please note that the detection algorithm searches until it finds a match or until it is certain that there is none.
It does not get stuck at partial solutions. Given the pattern "*'invoice'_N*" and the text "xxx invoice copy yyy invoice 12345 zzz", a naive implementation could locate the word "invoice" in "invoice copy" and give up because no number follows. ELO, by contrast, continues searching after this failure and detects the text as you would expect.
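The difference between the two strategies can be illustrated outside of ELO. The following Python sketch is an analogy only: it uses Python's re module, whose engine also backtracks, as a stand-in for ELO's pattern detection, and the regular expression is an assumed translation of *'invoice'_N*. The function names are hypothetical.

```python
import re

TEXT = "xxx invoice copy yyy invoice 12345 zzz"


def naive(text):
    """Stop at the first 'invoice' and give up if no number follows it."""
    start = text.lower().find("invoice")
    if start < 0:
        return None
    m = re.match(r"[ \t]*(\d+)", text[start + len("invoice"):])
    return m.group(1) if m else None


def backtracking(text):
    """Regex analogue of *'invoice'_N*: later positions are retried after a failure."""
    m = re.fullmatch(r"(.*?)(invoice)([ \t]*)(\d+)(.*)", text, re.I | re.S)
    return list(m.groups()) if m else None


print(naive(TEXT))         # None
print(backtracking(TEXT))  # ['xxx invoice copy yyy ', 'invoice', ' ', '12345', ' zzz']
```

The first "invoice" ends up inside the leading any-text part, which is exactly the result described above.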
The pattern * in particular causes considerable internal effort, since in case of doubt many possibilities have to be analyzed. Nevertheless, this operator is often necessary. In particular, patterns should generally be framed by *…*. This absorbs stray characters at the beginning or the end of the text, which are often caused by foreign characters intruding into the detection rectangle. Even so, * should be used only where it is justified. The unnecessary but seemingly harmless combination ** does not change the detection result, but it has a very adverse effect on performance. For the text "abcd", the pattern ** forces ELO to check the possibilities [][abcd], [a][bcd], [ab][cd], [abc][d], [abcd][].
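To get a feeling for how quickly the number of possibilities grows, the splits can simply be enumerated. The following Python sketch is an illustration only and says nothing about how ELO searches internally; the function name splits is hypothetical.

```python
def splits(text, parts):
    """Yield every way to cut `text` into `parts` consecutive, possibly empty pieces."""
    if parts == 1:
        yield [text]
        return
    for cut in range(len(text) + 1):
        for rest in splits(text[cut:], parts - 1):
            yield [text[:cut]] + rest


# Two adjacent * parts on "abcd": the five candidates listed above
print(list(splits("abcd", 2)))
# [['', 'abcd'], ['a', 'bcd'], ['ab', 'cd'], ['abc', 'd'], ['abcd', '']]

# Three adjacent * parts on a 40-character text
print(len(list(splits("abcd" * 10, 3))))  # 861
```

Each additional * multiplies the candidates further, while the detection result stays the same.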
Do not put too much into one pattern. If an invoice contains a rectangle with successive lines for the invoice number, customer number, order number and job number, you can search for each number individually. Alternatively, you can formulate one complex search pattern: *'invoice'_N*'customer'_N*'order no.'_N*'job'_N*. In the second case all numbers are expected in one pass, but as soon as the OCR software misreads one of the keywords (e.g. cuslomer instead of customer), none of the numbers is detected. If the indexing is of interest to you only when everything was detected, the second variant is adequate. If you prefer that as much as possible is detected and only the missing parts have to be supplemented, let each number be detected individually.
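The trade-off can be reproduced with a small simulation. The following Python sketch again uses regular expressions as a stand-in for ELO patterns; the sample text, the field list and the helper number_after are assumptions made for illustration.

```python
import re

# OCR result with one misread keyword ("Cuslomer" instead of "Customer")
OCR_TEXT = "Invoice 4711\nCuslomer 1234\nOrder no. 88\nJob 9"
FIELDS = ["invoice", "customer", "order no.", "job"]


def number_after(keyword, text):
    """Analogue of *'keyword'_N*: return the number following the keyword, if any."""
    m = re.search(re.escape(keyword) + r"[ \t]*(\d+)", text, re.I)
    return m.group(1) if m else None


# One combined pattern: every keyword must be found, otherwise nothing matches
combined = re.fullmatch(
    ".*?" + ".*?".join(re.escape(k) + r"[ \t]*(\d+)" for k in FIELDS) + ".*",
    OCR_TEXT, re.I | re.S)

# One pattern per field: each number is detected independently
single = {k: number_after(k, OCR_TEXT) for k in FIELDS}

print(combined)  # None
print(single)    # {'invoice': '4711', 'customer': None, 'order no.': '88', 'job': '9'}
```

With individual patterns only the customer number has to be supplemented; with the combined pattern the whole document remains unindexed.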
Even a complex pattern can easily fail to be detected although the text should allow it. Since there is no special "pattern debugger", only stepwise trial and error helps here. Starting from the "defective" pattern
*'invoice'_N*'customer'_N*'order no.'N*'job'_N*
you first test
*'invoice'*
After that you check
*'invoice'_N*
*'invoice'_N*'customer'*
*'invoice'_N*'customer'_N*
etc.
The pattern is always extended by one step (or by several). Make sure that every test pattern ends with a *.
This * matches the entire remainder of the text. If it is missing, nothing will be detected.
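The list of test patterns can be generated mechanically. The following Python sketch is a hypothetical helper, not part of ELO: it splits a format string into its parts, based on the syntax described in this chapter, and yields each growing prefix closed with a *.

```python
import re

# One pattern part: optional count plus symbol, or a quoted literal
TOKEN = re.compile(r"""\d*[*_LNn]|"[^"]*"|'[^']*'""")


def growing_patterns(pattern):
    """Yield the test patterns for stepwise debugging, each closed with a *."""
    tokens = TOKEN.findall(pattern)
    for n in range(1, len(tokens) + 1):
        if tokens[n - 1].endswith("*"):
            continue  # a prefix ending in * was already produced by the previous step
        yield "".join(tokens[:n]) + "*"


for test_pattern in growing_patterns("*'invoice'_N*'customer'_N*"):
    print(test_pattern)
```

Each generated pattern is then tested against the OCR text in turn; the first one that is not detected points to the part that causes the problem.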
List of the Available OLE Functions