
Special Feature: An Automated Chinese Telephone Directory

Y. H. Chin, J. W. Jou, W. H. Peng, and C. C. Yang
Telecommunication Laboratories of Taiwan

COMPUTER, May 1975

Introduction

A real-time, on-line information storage and retrieval system for telephone information service has been designed at the Telecommunication Laboratories of Taiwan. The input and output of the system are processed in Chinese via a keyboard and a graphic display unit, respectively. System functions include sorting, merging, updating, displaying, and printing of data.

In order to avoid a serial, exhaustive search, the file is subdivided into blocks of length x. The optimal size of each block is determined so as to minimize the average number of comparisons (file accesses) to locate a record.

The system has been tested on a telephone office of medium capacity, about 8000 subscribers. The number of daily queries is about 1600 and the updating rate is approximately 50. On the basis of a comparison of the system's slowest response time (0.24 sec) with the quickest manual response time, the system is at least 130 times faster.

The design philosophy stresses fast responses to queries and quick updating of the data.

For a computer, a Chinese character is not a character but a pattern; therefore, a portion of the coding is related to Chinese.

System Description

The system hardware (see Figure 1) consists of an HP/2100A computer under the control of the Moving-Head Disk Operating System (DOS-III). Two secondary storage devices are attached: a moving-head disk unit and a magnetic tape unit. The disk unit is used to store the system programs, utility programs, and data files. The data files contain the telephone directory file and the Chinese character pattern file. The magnetic tape is used as a backup unit. The addresses of the telephone subscribers are stored on the tape for printing of the telephone book.*

Input devices consist of a paper tape reader, used as the main input device for reading utility programs and data, and a keyboard. The keyboard, which is used to type the phonetic spellings of the input Chinese characters,4 contains 37 phonetic symbols: 21 consonants and 16 vowels. The phonetic spellings are translated into a string of numerical codes, which is used as a search key to locate the desired record and to display it in Chinese.

*In general, an operator finds a user's phone number whenever a user's name is given in a query. Therefore, the operator at the service center has no need to find a phone number through the home address of a particular subscriber. In fact, the subscriber's address is confidential at the Query Service Center.

Figure 1. System Block Diagram of the Automatic Telephone Directory Inquiry Services System (labels: phonetic symbol keyboard, tape reader, graphic display, electrostatic printer, magnetic disk, magnetic tape)

Figure 2. Logical Structure of Data Base. (a) Data Base Structure and Searching Sequence of Telephone Directory Data Base (labels: key table with key and record number, telephone directory file, Chinese character pattern file with dot matrix, displaying). (b) Record Format for Telephone Directory File (user's name, 16 words; Tel. No., 3 words; pointer, 2 words).

Output devices include an HP/1331C X-Y graphic display, which is used to display the desired telephone record in Chinese, and an electrostatic printer (a VERSATEC plotter) for printing the telephone book in Chinese. The printing speed is about two lines/sec in Chinese characters. Depending on the size of the printed Chinese character, each line can contain 40 to 80 Chinese characters.

Apart from the DOS-III system program, we developed our own utility programs for handling data processing. The functions of these programs are searching, sorting, merging, converting, updating, displaying, printing, and plotting.

Data Base Structure

In this system we have two data bases: a telephone directory data base and a Chinese character pattern data base. The telephone directory data base is a three-level indexed sequential file containing three files: (1) index table, (2) key table, and (3) telephone directory file (Figure 2a). The record format of the key table has 6 bytes for the search key, 2 bytes for the corresponding record number in the telephone directory file, and 2 bytes for a pointer used for updating. The record format of the telephone directory file (Figure 2b) consists of three fixed-length fields: a 32-byte name field, which holds a variable-length name of 1 to 16 Chinese character codes, a 6-byte telephone number field, and a 4-byte pointer field for printing the telephone book. Since each sector (256 bytes) can store six 42-byte records, it is very easy to calculate the actual location on the disk. Each record is assigned a unique record number. Through this number, the physical address of a specified record can be identified with the following rules:

$$\text{sector} = \text{starting sector of the file} + \left\lfloor (S - 1)/R \right\rfloor \qquad (1)$$

$$\text{record number in the sector} = \text{remainder of } (S - 1)/R \qquad (2)$$

where S is the record number of the specified record, R is the number of records in each sector, and the division in (1) is an integer division.
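Rules (1) and (2) amount to one integer division with remainder. The following is a minimal sketch, not the authors' code; the function name `locate_record` and the starting sector value 100 are hypothetical, while the 256-byte sector and the 42-byte record come from the text.

```python
SECTOR_BYTES = 256   # sector size stated in the text
RECORD_BYTES = 42    # telephone directory record size stated in the text


def locate_record(record_number, records_per_sector, starting_sector=0):
    """Apply rules (1) and (2): return (sector, position within the sector).

    record_number is S (1-based), records_per_sector is R.
    """
    if record_number < 1:
        raise ValueError("record numbers start at 1")
    quotient, remainder = divmod(record_number - 1, records_per_sector)
    return starting_sector + quotient, remainder


# R = 6 for the telephone directory file (six 42-byte records per sector).
R_DIRECTORY = SECTOR_BYTES // RECORD_BYTES

# Hypothetical example: record 14 in a file that starts at sector 100.
sector, position = locate_record(14, R_DIRECTORY, starting_sector=100)
print(sector, position)  # 102 1
```

The position returned by rule (2) is zero-based here, so the first record of a sector has position 0.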

In the key table file, the entries are stored in ascending order with respect to the numerical value of their search key as well as their record number. In order to minimize the mean number of comparisons to locate the desired record, the key table file is subdivided into blocks of length x. The optimal block size x is determined with respect to combined search methods so that the mean number of comparisons is minimized. In the key table file, each block is initially 90% filled, with the remainder reserved for overflow. After a period of time the key table is reorganized to avoid filling up the reserved overflow space. If a block does fill up, an overflow signal is given, and a new key table with 10% reserved block space is automatically regenerated. Therefore, the logical structure of the data base does not need to be changed when the system is applied to a larger city like Taipei, which has 260,000 telephone subscribers.
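The 90% initial fill and the overflow signal can be illustrated with a small in-memory model. This is only a sketch under assumptions: the helper names `build_blocks` and `insert_key`, the block size of 10, and the integer keys are all hypothetical, and the real key table lives on disk rather than in Python lists.

```python
import bisect


def build_blocks(sorted_keys, block_size, fill_percent=90):
    """Split sorted keys into blocks that are initially fill_percent full."""
    per_block = max(1, block_size * fill_percent // 100)
    return [sorted_keys[i:i + per_block]
            for i in range(0, len(sorted_keys), per_block)]


def insert_key(blocks, key, block_size):
    """Insert key in order; return True when the target block has filled up."""
    first_keys = [block[0] for block in blocks]
    # Last block whose first key is <= key (the first block if there is none).
    j = max(0, bisect.bisect_right(first_keys, key) - 1)
    bisect.insort(blocks[j], key)
    return len(blocks[j]) >= block_size  # overflow signal


keys = list(range(0, 180, 2))                # 90 hypothetical search keys
blocks = build_blocks(keys, block_size=10)   # 9 keys per block, 1 slot reserved
overflow = insert_key(blocks, 5, block_size=10)
if overflow:
    # Regenerate the key table so every block regains its 10% reserve.
    all_keys = sorted(k for block in blocks for k in block)
    blocks = build_blocks(all_keys, block_size=10)
```

In this model a single insertion into a 9-of-10 block already triggers regeneration; with realistic block sizes the reserve absorbs many insertions first.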

Every Chinese character is represented by an 18 x 15 dot matrix and is assigned a unique number as its code, i.e., its record number (see Figure 3). Hence each Chinese character needs 36 bytes for its dot matrix representation. There are 1400 Chinese characters in use in our present system. These are stored in ascending order with respect to frequency of usage, so that characters with high usage can be accessed (i.e., made core-resident) very rapidly.3 These 1400 character representations form a file called the Chinese character pattern file. To access a Chinese character, simply get this character's code and set R = 7 in formulas (1) and (2). Then, the location is found and the desired dot matrix of the character is displayed. This operation takes, at most, one disk access. The reasons for using a dot matrix rather than other forms, such as the Chiao-Tung Radical System,5 are that it requires a smaller number of disk accesses and permits an elegant and symmetrical presentation of the Chinese character in display and print. (Typical video and hard copy outputs are shown in Figures 4 and 5.)

Figure 3. Chinese Character Pattern and the Record Format. A Chinese character expressed in an 18 x 15 dot matrix; record format of 18 words (36 bytes) holding the memory data of the dot matrix. Words 1 to 18 as printed: 00174, 00104, 04104, 04174, 77104, 04104, 16174, 16000, 16777, 16100, 15377, 24445, 44111, 04221, 04441, 24101, 14102, 04004.
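To make the 36-byte record concrete, the sketch below expands 18 words into an 18 x 15 grid of dots. Several things are assumptions rather than facts from the text: that the five-digit values in Figure 3 are octal, that each word carries one row of 15 dots with the most significant bit on the left, and the function name `render`. The word values are copied from the extracted figure text and may contain recognition errors, so the picture need not match the printed character.

```python
# Words as they appear in the extracted Figure 3 text (possibly OCR-damaged).
WORDS_OCTAL = [
    "00174", "00104", "04104", "04174", "77104", "04104",
    "16174", "16000", "16777", "16100", "15377", "24445",
    "44111", "04221", "04441", "24101", "14102", "04004",
]

ROWS, COLS = 18, 15  # dot matrix size stated in the text


def render(words_octal, cols=COLS):
    """Turn each octal word into one text row ('#' = dot set, '.' = empty)."""
    rows = []
    for word in words_octal:
        # Keep the low `cols` bits, most significant bit first (assumed order).
        bits = format(int(word, 8), "0{}b".format(cols))[-cols:]
        rows.append(bits.replace("1", "#").replace("0", "."))
    return rows


pattern = render(WORDS_OCTAL)
print("\n".join(pattern))
```

With R = 7 in rules (1) and (2), seven such 36-byte patterns fit in one 256-byte sector, which is why fetching a character costs at most one disk access.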

Relationship Between Search Methods and Optimal Block Size

To locate a desired record Ri from the file, the search sequence is as follows. A user's name is first translated into phonetic spellings, which are typed by an operator into the system through a keyboard; then the spellings are translated into a string of numerical codes, which is used as a search key. (This input operation is illustrated in Figure 6.) The search key is first compared with each entry in the index table of n/x entries in order to locate the block, say Bj, where the desired record Ri is stored. In case of a match, the code of each Chinese character in Ri is looked up in the Chinese character pattern file in order to find its corresponding dot matrix. From the dot matrices, the desired record Ri can be displayed and printed in Chinese. In case of a mismatch, an error message is sent to the operator. Depending on the received information, the operator can either retype the query or ask the requester to repeat his message.
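The search sequence can be sketched as a two-level lookup. This is an illustration under assumptions, not the system's code: the function names, the tiny key table, and the choice of binary search at both levels are hypothetical, and Python's `bisect` module stands in for the comparisons the system performs on disk blocks.

```python
import bisect


def build_index(blocks):
    """Index table: the largest search key of each block, one entry per block."""
    return [block[-1][0] for block in blocks]


def find_record_number(index, blocks, search_key):
    """Return the record number for search_key, or None on a mismatch."""
    # Step 1: use the index table to pick the block Bj that could hold the key.
    j = bisect.bisect_left(index, search_key)
    if j == len(blocks):
        return None  # key is larger than every key in the table
    # Step 2: binary search for the key inside block Bj.
    keys = [key for key, _ in blocks[j]]
    i = bisect.bisect_left(keys, search_key)
    if i < len(keys) and keys[i] == search_key:
        return blocks[j][i][1]
    return None


# Hypothetical key table: (search key, record number) pairs in ascending key order.
key_table = [(105, 3), (230, 1), (377, 7), (512, 2), (640, 5), (799, 4)]
blocks = [key_table[i:i + 2] for i in range(0, len(key_table), 2)]
index = build_index(blocks)

print(find_record_number(index, blocks, 377))  # 7
print(find_record_number(index, blocks, 400))  # None
```

A `None` result corresponds to the mismatch case, where the operator receives an error message.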

Figure 4. The displayed result: (a) multiple-phone-number record; (b) variable-length record; (c) subscribers with the same name are distinguished by their addresses inside the parentheses; (d) updated record shown in the second row.

Among the various known methods for retrieving information records from a data base, we have chosen ISAM (indexed sequential access method) and the binary search method for our application. For the time being, the other methods are unsuitable for our case. For example, the difficulties in using scatter storage techniques are (1) finding a satisfactory hashing function so that, after hashing the phonetic spellings of Chinese characters, the search keys are uniformly distributed in each block of the key table; and (2) estimating a suitable storage allocation scheme for the hashing table, because the number of telephone users is increasing rapidly. This problem is particularly serious for a minicomputer with only 16K core memory. Linear search, on the other hand, is time-consuming and therefore contradicts the requirement of a real-time reply to inquiries.

In order to minimize the mean number of comparisons to locate the record Ri from the block Bj, the problem of determining the optimal block size x with respect to search methods is solved as follows:

(1) For a sequential search in the index table and the selected block Bj, the optimal block size is $n^{1/2}$;

(2) For a sequential search in the index table and binary search in the block Bj, the optimal block size is $(n \ln 2)/2$;

(3) For a binary search in the index table and the selected block Bj, the optimal block size is $n^{1/2}$;

(4) For a binary search in the index table and sequential search in the selected block Bj, the optimal block size does not exist;

where n is the number of records in the telephone directory.**
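Cases (1) to (3) can be checked numerically by brute force. The sketch below assumes the usual mean-comparison counts, (m + 1)/2 for a sequential search over m entries and approximately log2(m + 1) - 1 for a binary search, and uses n = 8000, the size of the test office. The function names are hypothetical, and case (4) is left out because no optimum is reported for it.

```python
import math


def mean_seq_seq(n, x):
    """Sequential search in the index table (n/x entries) and in the block (x)."""
    return 0.5 * (n / x + 1) + 0.5 * (x + 1)


def mean_seq_bin(n, x):
    """Sequential search in the index table, binary search in the block."""
    return 0.5 * (n / x + 1) + (math.log2(x + 1) - 1)


def mean_bin_bin(n, x):
    """Binary search in both the index table and the block."""
    return (math.log2(n / x + 1) - 1) + (math.log2(x + 1) - 1)


def best_block_size(cost, n):
    """Integer block size in 1..n with the smallest mean number of comparisons."""
    return min(range(1, n + 1), key=lambda x: cost(n, x))


n = 8000
found = {
    "seq/seq": best_block_size(mean_seq_seq, n),
    "seq/bin": best_block_size(mean_seq_bin, n),
    "bin/bin": best_block_size(mean_bin_bin, n),
}
predicted = {
    "seq/seq": math.sqrt(n),
    "seq/bin": n * math.log(2) / 2,
    "bin/bin": math.sqrt(n),
}
for name in found:
    print(name, found[name], round(predicted[name], 1))
```

The brute-force minimum lands within a couple of records of each closed-form value; the small gap in the second case comes from the approximation used in the derivation.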

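**Detailed proof:

An ordered file of n records (either of fixed length or of variable length) is subdivided into blocks of length x. There are n/x blocks, each having x records. The optimal block size x with respect to various search methods is proved as follows:

(1) For a sequential search in the index table and in the block Bj, the optimal block size is $n^{1/2}$.

Proof: The mean number of comparisons to locate a record from the file is

$$\bar{a} = \frac{\tfrac{1}{2}\left(\frac{n}{x} + 1\right)\frac{n}{x}}{\frac{n}{x}} + \frac{\tfrac{1}{2}\,x(x + 1)}{x} = \tfrac{1}{2}\left(\frac{n}{x} + 1\right) + \tfrac{1}{2}(x + 1)$$

$$\frac{d\bar{a}}{dx} = 0 \;\Rightarrow\; -\frac{n}{x^2} + 1 = 0 \;\Rightarrow\; x = n^{1/2}$$

$$\left.\frac{d^2\bar{a}}{dx^2}\right|_{x = n^{1/2}} = \frac{n}{x^3} = \frac{1}{n^{1/2}} > 0 \;\Rightarrow\; x_{\min} = n^{1/2}$$

(2) For a sequential search in the index table and binary search in the block Bj, the optimal block size is $(n \ln 2)/2$.

Proof:

$$\bar{a} = \tfrac{1}{2}\left(\frac{n}{x} + 1\right) + \left(\frac{(x + 1)\log_2(x + 1)}{x} - 1\right) \approx \tfrac{1}{2}\left(\frac{n}{x} + 1\right) + \left(\log_2(x + 1) - 1\right) \quad \text{(see Reference 1; for } x > 10\text{)}$$

$$\frac{d\bar{a}}{dx} = 0 \;\Rightarrow\; x = \frac{n \ln 2}{2}$$

$$\left.\frac{d^2\bar{a}}{dx^2}\right|_{x = (n \ln 2)/2} = \frac{4}{(\ln 2)^3 n^2} > 0 \;\Rightarrow\; x_{\min} = \frac{n \ln 2}{2}$$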

Figure 5. The Printed Result. The 5-digit numbers represent continued phone numbers, e.g., 316364 represents two telephone numbers, 3163 and 3164.

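(3) For a binary search in the index table and the block Bj, the optimal block size is $n^{1/2}$.

Proof:

$$\bar{a} = \left(\log_2\left(\frac{n}{x} + 1\right) - 1\right) + \left(\log_2(x + 1) - 1\right)$$

$$\frac{d\bar{a}}{dx} = 0 \;\Rightarrow\; x = n^{1/2}$$

$$\left.\frac{d^2\bar{a}}{dx^2}\right|_{x = n^{1/2}} = \frac{2}{(\ln 2)\, n^{1/2}\left(n^{1/2} + 1\right)^2} > 0 \;\Rightarrow\; x_{\min} = n^{1/2}$$

(4) For a binary search in the index table and sequential search in the block Bj, the optimal block size does not exist.

Proof:

$$\bar{a} = \left(\log_2\left(\frac{n}{x} + 1\right) - 1\right) + \tfrac{1}{2}(x + 1)$$

$$\frac{d\bar{a}}{dx} = 0 \;\Rightarrow\; x = 0 \text{ or } -n$$

This is impossible since x is a positive integer.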

In all of these four expressions, the first term and the second term represent the mean number of comparisons in the index table and in the selected block Bj, respectively. For large n (e.g., n > 1000), the value of $\bar{a}$ in (2) and (3) is always less than the value in (1) whenever $x = x_{\min}$; therefore, the binary search should be adopted in either the index table or the block Bj, or in both.

Figure 6. Typical Example of Input Operations for Searching a Telephone Record (stages shown: user's name in Chinese, given by an inquirer; phonetic spellings, typed by an operator on a keyboard; translated numeric value, system generated; actual search key, starting to search).


System Testing

To insert a new record into the system, the whole information of a telephone record is first stored at the bottom of the telephone directory file. Then the record's search key (the phonetic spellings of the selected characters from the user's name) is stored at the appropriate location in the ordered key table file. This operation takes two disk accesses: one for writing the search key into the key table and one for writing the record in the telephone directory file. It takes an average of 10 seconds to key in a telephone record of 8 Chinese characters, which includes the character codes, the phonetic spellings of the search key, and the telephone number. Hence, the on-line insertion operation is not used in the testing system with one CPU during the busy hours (9-12 A.M. and 2-5 P.M. daily).2

The modification is simple: the record to be modified is first fetched and displayed on the graphic display (see Figure 5); then the information to be added or deleted can be processed similarly to an edit operation. The system's regenerating time, defined as the time of regenerating the key table, equals $b(2d_t + p_t)/n$, where b is the number of blocks in the key table, $d_t$ is the average disk access time, $p_t$ is the CPU time, and n is the total number of records in the telephone directory file. At present, the system's regeneration time (i.e., rebuilding cost) is approximately 0.7 milliseconds per record.
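As a rough plausibility check of the regeneration-time expression, the sketch below plugs in sample values. Only n = 8000 and the 30-millisecond disk access time appear in the text; the block count b = 90 (about n divided by the square root of n) and the CPU time of 2 milliseconds are assumptions chosen purely for illustration, and the function name is hypothetical.

```python
def regeneration_time_per_record(b, d_t, p_t, n):
    """Evaluate b(2*d_t + p_t)/n; times in milliseconds give ms per record."""
    return b * (2 * d_t + p_t) / n


# n and d_t are taken from the text; b and p_t are assumed values.
t = regeneration_time_per_record(b=90, d_t=30.0, p_t=2.0, n=8000)
print(round(t, 4))  # 0.6975
```

Under these assumed inputs the expression gives a value close to the reported 0.7 milliseconds per record; other combinations of b and p_t would fit equally well.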

The total size of our utility programs is about 6K bytes, and the size of the data file is about 313K bytes, of which 210K bytes are used for the telephone directory file, 50K bytes for the key table, and 53K bytes for the Chinese character pattern file. The total size of the system is about 320K bytes.

The response time to display a record in Chinese depends on the number of Chinese characters in the retrieved record. On the average, it takes 30 milliseconds (the average access time of an HP/7900 disk) to access a Chinese character. In the worst case, it takes 0.24 sec to display a record of 8 Chinese characters on the HP/1331C graphic display.
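The worst-case figure follows from one disk access per character. A minimal sketch of that arithmetic is given below; the function name and the `resident` parameter (characters already held in core, which need no disk access) are illustrative assumptions, not part of the system described.

```python
DISK_ACCESS_MS = 30  # average HP/7900 access time quoted in the text


def display_time_ms(num_characters, resident=0):
    """Worst-case fetch time: one disk access per character not held in core."""
    return max(0, num_characters - resident) * DISK_ACCESS_MS


print(display_time_ms(8))              # 240, i.e. the 0.24 sec worst case
print(display_time_ms(8, resident=3))  # 150
```

The second call shows why keeping frequently used characters in core shortens the response time.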

By comparison of the system's slowest response time (0.24 sec) with the quickest manual response time, the system is at least 130 times faster. When some of the most frequently used Chinese characters (about 100) are made core-resident, the system's response time will be improved to 200 times faster than that of a manual system.3

Acknowledgment

The authors would like to acknowledge the corrections and suggestions of Dr. R. C. T. Lee and Mr. Pierre Loisel. They are also indebted to the referees for their useful comments and to their colleagues in the Laboratories for their discussions.

References

1. G. Salton, Automatic Information Organization and Retrieval, McGraw-Hill Book Company, N.Y., 1968.

2. C. C. Yang, "Statistic Analysis on the Traffic of Taipei Query Service Center," Quarterly Report of the Telecommunication Labs, Vol. 3, No. 4, pp. 69-126, October 1973.

3. C. V. Ramamoorthy and Y. H. Chin, "An Efficient Organization of Large Frequency Dependent Files for Binary Searching," IEEE Trans. on Computers, October 1971, pp. 1178-1187.

4. S. K. Chang, C. S. Chiu, M. H. Yang, and B. S. Lin, "PEACE-A Phonetic Encoding and Chinese Editing System," Proceedings of the First International Symposium on Computers and Chinese I/O Systems, Academia Sinica, Taipei, Taiwan, R.O.C., Aug. 14-16, 1973, pp. 29-47.

5. C. C. Hsieh, M. W. Du, et al., "The Chiao-Tung Radical System," Proceedings of the First International Symposium on Computers and Chinese I/O Systems, pp. 49-78.

Yeh-hao Chin is an assistant professor in the Computer Sciences Department at Northwestern University. Earlier, he was with the Telecommunication Laboratories in Chung-Li, Taiwan, where he was responsible for the design and development of software systems for telephone service systems and electronic switching systems. During this period he was also a research fellow at the Electronic Laboratory of UC Berkeley, and an adjunct associate professor in the Computer Science Department of the National Chiao-Tung University.

He received the BSEE from the National Taiwan University in 1966 and the MS and PhD degrees in electrical engineering from the University of Texas in 1970 and 1972. Dr. Chin's research and teaching interests are in the design and development of data base systems for Chinese input and output.

Jun Wun Jou is a research scientist at the Telecommunication Laboratories in Taiwan, where he leads a group in designing an automatic telephone directory inquiry service system for the Taipei area. Jou received his MS degree from National Chiao-Tung University, Hsinchu, Taiwan, in 1970. He is a member of the Institute of Electrical Engineering of the Republic of China.

System Improvement

At present, each operation takes a certain number of disk accesses. The disk accesses include not only record retrieval but also location of the Chinese characters in the retrieved record; the time spent in locating the Chinese characters is especially significant. In order to reduce the disk access time, the usage frequency of each Chinese character is being monitored so that those characters used most often can be made core-resident. Also, some coding methods such as threshold functions for queries will be adopted to facilitate a user's query. The "best" threshold value is being studied so that whenever a communication ambiguity occurs, the misspelled phonetic input will still yield a correct answer (i.e., get the desired record). In summary, we plan to do the following in the near future: (1) make the input query format more flexible, (2) reduce the number of disk accesses, and (3) modify and extend the operating system so that the system can work in a time-sharing environment with multiple terminals.

W. H. Peng works for the Telecommunication Laboratories in Chung-Li, Taiwan, where he is engaged in research programs concerning information storage and retrieval. He received the BSEE and MSEE from the National Chiao-Tung University, Hsinchu, Taiwan, in 1968 and 1970.

Chen Chau Yang is a research scientist in the Computer Scientist Group of the Telecommunication Laboratories, where he is in charge of the design and implementation of an ... system at the Taipei Query Station. Yang received the BSEE and MSEE from National Chiao-Tung University, Hsinchu, Taiwan, in 1969 and 1971, and is a member of the Institute of Electrical Engineering of the Republic of China. His current research interests are in the field of information storage and retrieval.
