• No results found

Exploiting the Functionality of Object-Oriented Database. Management Systems for Information Retrieval 1. Gabriele Sonnenberger

N/A
N/A
Protected

Academic year: 2021

Share "Exploiting the Functionality of Object-Oriented Database. Management Systems for Information Retrieval 1. Gabriele Sonnenberger"

Copied!
10
0
0

Loading.... (view fulltext now)

Full text

(1)

1

Abstract

1 Introduction

App earedin: IEEEDataEngineeri ng Bulleti n, Vol. 19,No. 1,pp. 14-23,1996

Management Systems for Information Retrieval 1

GabrieleSonnenb erger

Ubilab,Union Bank of Switzerland

Bahnhofstrasse 45, 8021 Zurich,Switzerland

sonnenb [email protected]

Inthispaper, wepresenttheapproachofFIREtoutilizinganobject-orientedDatabase

Manage-mentSystem(DBMS)forInformationRetrieval(IR)purposes. First,acomprehensiveoverview

ofpreviousattempts touseDBMSs forimplementing IRsystems isgiven. Next,dierences b

e-tweenDBMSsandIRsystems,withregardtoindexingandretrieval,arediscussed. Inaddition,

some shortcomings of DBMSs with regard to supporting IR systems are pointed out. Then, an

overview of FIRE, which is designed as a reusable IR framework, is given and its approach

presented in more detail. Special attention is given to the design and implementation of an

IR-indexandhow retrieval eciencycanbeimproved by using the optimizationfacilitiesof the

underlyingobject-oriented DBMS.

and

Due torecent advances in InformationTechnology and Telecommunications, moreand more

informa-tion is pro duced and distributed by electronic means. Instead of plain ASCI I texts, information is

increasinglyenco ded in dierentforms,e.g.,asgraphs,tables orformattedtexts. Thesedevelopments

imp ose additional requirementson themanagement and retrieval ofinformation. Whereasin thepast

thefo cus in InformationRetrieval (IR)wasontext, which wasmostly considered tob e unstructured,

to day's IR systems have to face information which usually consists of structured and unstructured

partsand which mayb ecomp osed of dierent media.

Weobserve furtherchangesin theowand distribution ofinformation. In thepast,theuserof an

IRsystemwastypically apassive consumerof informationgatheredatthesp ecial sitesofprofessional

informationproviders;whilenowadaysauserisoftenb othaninformationconsumer aninformation

provider, e.g.,in a co op eration'sdo cument management system. Hence, datamanagementissues like

p ersistentstorageofdata,concurrencycontrol,and recoveryafterfailuresaregaining imp ortance,and

IRsystems mustcop ewith these issues.

Developing techniques for the indexing and retrieval of heterogeneous information units is a

de-manding topic genuine to the IR domain. In contrast to this, data management issues have already

b een investigated extensively by the Database (DB) community, which has also develop ed theories

and techniques applied in pro ductive systems. Rather than `reinventing the wheel', it is natural for

develop ers of IRsystems totrytoprot from theresults of the DBcommunitybybasing IR systems

(2)

2.1 Relational DBMSs

best match

ranked list

2 IR Systems Based on DBMSs

purp oses. Section 2reviewsprevious attemptstouseDBMSsforimplementingIRsystems. InSection

3, we discuss dierences b etween IR systems and DBMSs with regard to indexin g and retrieval, and

p oint outsome shortcomings of DBMSs with regard to supp orting IR systems. Finally in Section 4,

wesketchourIRframework,whichwecall FIRE,and presentourapproachforusing thefunctionality

ofanobject-orientedDBMSin anIRsystem. Thepap er concludeswithsomeremarksonfuturework.

A rst approach to use a DBMS for IR purp oses is to consider IR as a DB application. This has

b een prop osed, for instance, by Macleo d & Crawford [Mac83], Blair [Bla88], and Smeaton [Sme90].

Commontothese prop osals isthat theIR systems arebased on a relational DBMS,and SQLis used

as retrieval language. These IR systems do not store full do cuments, but bibli ographic references to

do cuments; these usually provide thetitle of thedo cument represented, thenamesof the authors,an

abstract,somecontentdescriptors, and thedetails ab out thedateand lo cationof thepublication.

This approach of making use of a DBMS has received major criticisms. One factor that has

b een criticized esp eciall y, e.g.,by Schek & Pistor [Sch82], is that therelational DB mo del represents

information in a rather unnatural waybya set of tables(relations). Further, SQL queries tend tob e

rathercomplexanddiculttounderstand;seeforinstancetheexamplesgivenin[Mac91]. Inaddition,

retrievalmayb equitecomputationallyexp ensive,esp ecially wheninformationfromdierenttableshas

tob ecombined bya `join'op eration.

Asevereshortcomingofthisapproachistherestrictiontoreferenceretrieval. Thisrestrictionisnot

so muchvoluntary asa matterof theunderlyin g relational DB mo del. Therelational mo del requires

theeldsof arecordtob eof axedlength, thusmakingit dicultforarelational DBMSapplication

to manage information units such as ordinary texts, which usually vary in length. (Of course, there

are work-arounds like follow-up records, but such work-arounds make the mo deling of information

even moreawkward.) A second severeshortcoming is thatSQLin itsbasic form isrestricted toexact

matching. An exact match is appropriate for many DB applications, esp ecially when information is

structuredtoa highdegree and thevo cabularyused isratherxed. In mostIRapplications however,

information units like texts are signicantly less structured and the vo cabulary used is usually

unre-stricted. Corresp ondingly, users of IR systems nd it far more dicult oreven imp ossible to issue a

querywhich successfully delivers all informationrelevant toagiven informationneed, but excludes

ir-relevantmaterial. Therefore,moreadvancedapproachestoIRabandonedtheexactmatchingparadigm

and nd instead the pieces of information which the user's query by applying weighting

schemata. The user receivesas result a which is sortedby theassumed probability that a

piece ofinformation is suited toanswertheuser's information need.

Severalattemptshaveb eenmade toprovide b etterDB supp ortforthedevelopment of IRsystems

byextendingtherelationalmo del,orbyaddingnewfeaturestoSQL.Toprovidemorenaturalexternal

views of information, Schek & Pistor [Sch82] have prop osed a generalization of the relational mo del

which allows nested relations. Lynch & Stonebraker [Lyn88] have intro duced abstract data typ es,

which make theformulationof content-based search conditions moreconvenient, althoughretrieval is

still based on the exact matching paradigm. An extension of SQL allowing a `similar-to' comparison

op eratorhasb een prop osedbyMotro[Mot88]. Furthermore,variousapproacheshave b eendevelop ed

(3)

2.3 Object-oriented DBMSs

A dierent approach to making use of a DBMS for IR purp oses is to couple an IR system with a

DBMS.Croftetal.[Cro92]attemptedalo osecoupling b etweentheIRsystemINQUERY andIRIS, a

prototyp eversionofanobject-orientedDBMS.Theychosean object-orientedDBMSsince theob

ject-orientedmo delallowsone, incontrasttotherelationalmo del,tostorecomplextextual informationin

aquitenaturalway. TheIRandtheDBsystemarecoupledexternallybyacontrolmo dule. Thereare

linksfrom theinformation unitsstoredbyINQUERY tothecorresp onding textualobjectsoftheIRIS

database. Theintegratedsystemprovidesthefunctionalityofb othunderlyin gsystems. Thus,theuser

may issue content-based queries as well as DB queries. However, a severe problem of this coupling

approach is that information is stored twice, by the IR system as well as by the DBMS. This may

cause consistency problems wheninformation is mo died. Furthermore,there is no full integration of

content-based queries and DBqueries.

Guetal.[Gu93]have chosenan alternative waytocouplean IRsystem witha DBMS.They have

emb eddedthefunctionalityoftheIRsystemINQUERYintotherelationalDBMSSybase. Furthermore,

they have extended SQL by a function which takes an INQUERY query as input and allows one to

cho ose theINQUERY database tob e consulted forevaluating the query. This system avoids storing

information twice: textual information is stored and managed by the IR system and other kinds of

informationbytheDBMS,whichalsoprovidesp ointerstothetextualinformationstoredandmanaged

by the IR system. The main drawback is that two dierent approachesfor managing and retrieving

information are used, which makes the management of information more dicult. Further, the b est

match retrievalparadigm isrestricted totextual information, whereasan application of this retrieval

paradigm tootherkinds of informationwouldb e highlydesirable, asforinstance p ointed outbyFuhr

[Fuh92].

Bearing in mind the shortcomings of the relational mo del, ithas b een prop osed to useother mo dels,

more appropriate than the relational DB mo del, as a basis for developing IR systems; e.g.,an array

mo del [Mac87] oranobject-orientedDB mo del [Har92p]. The object-oriented DBmo del is esp eciall y

app ealing, since object-oriented technology is maturing and the rst commercial systems are already

available.

Theobject-oriented approach has also b een adopted for developing our IR framework FIRE. The

frameworkisimplementedusingObjectStore[Lam91],acommerciallyavailableobject-orientedDBMS.

In ObjectStore,p ersistence isnotpart ofthe denitionof anobject,but amatterofallo cation atthe

time of object creation (`p ersistency byallo cation'). Thus, objects of thesame typ e can b e allo cated

p ersistentlyaswellastransiently. Furthermore,itmakesessentiallynodierence whetherwedealwith

a transient orp ersistent object. These features of ObjectStoreenable us todesign and implement an

IR system in a problem-adequate way,mostly neglecting data storage details. Thus the develop er of

anIRsystemcanrelyon theDBMS'smeansforp ersistentstorage,concurrencycontrol,recoveryafter

failure, etc.,without having toaccept severerestrictions, asexp erienced from relationalDBMSs.

Object-orientedDBMSsprovideausefulbasisforthedevelopmentofIRsystems,yetcouldsupp ort

IR systems even further with regard to p erformance issues. ObjectStore, for instance, allows one

to improve retrieval eciency by buildi ng sp ecialized indexes and by optimizing queries. However,

indexingandretrievalinan IRsystemdierin certainasp ectsfromindexingandretrievalinaDBMS,

as describ ed in the following section. Consequently these facilities cannot b e used directly for IR

purp oses. Mo difying thecorefunctionalityofa DBMSforexploiting thesefacilities isnotintherange

(4)

3 Some Dierences between IR Systems and DBMSs

3.1 Index

3.2 Retrieval

3.3 Indexing and Retrieval Functionality a prop erdesign on theIRside, asisshownin Section 4.

Theterm`index'isusedb othbytheDBcommunityandbytheIRcommunity,howeverwithdierent

meanings. While the function of a DB-index is to improve p erformance, an IR-index typically also

provides additional data. Indetail, we can observethe following dierences:

A typical IR-inde x is an inverted le withindex entries,each consisting of a key representing a

featurederived fromasourceandasetofp ostings. Ap ostingmayincludeinformationab outthe

frequencyofthegivenfeaturewithinthesource,ormaysp ecifyitsp ositionwithinthedo cument

inmoredetail. ADB-index, however,onlymaintainsthepathstotheobjects whereanattribute

hasa certainvalue.

In an IR application, information ab out the characteristics of the collection represented by an

index is required when computing the relevance of an information unit to a user's query. An

example of such information isthe maximum frequency of a termin a collection of texts orthe

meanvalueandstandarddeviationofasetofnumb ers. SinceaDB-indexservesonlytooptimize

access,no suchcollection information isneeded.

Due to the dierences, typical IR-indexes are not supp orted by DBMSs. Optimized access to

information | asis done bya DB-index | is,however, crucialforimplementin g ecient IRsystems.

Thus,howwe can useDB-indexes tobuild IR-indexes isan op en researchproblem.

Atypical DBMSprovidesaretrievalinterfacewhichallows onetocheckwhethertwovalues areequal,

and to determine whether a value issmaller orgreater than another value. Usually such queries can

b eoptimized, e.g.,byinstructing the DBMStocreateand maintain appropriateindexes.

Advanced IR systems aresupp osed tosupp ort approximate (`b est')matching. Unfortunately,

ap-proximate matching is not supp orted by DBMSs. This is a severe shortcoming, since approximate

matchingis farmoretime-consuming thanexact matching,and thereforedemands optimization

facil-ities. To dealwiththeadditional complexity,weurgentlyneed sp eciall y tuned matchingmetho ds and

techniques to avoid a linear search through an index. These metho ds and techniques should ideally

workhand in hand withavailable DBMStechniques.

In the case of a conventionalIR application, i.e. one which is not based on a DBMS, the application

develop er hastoprovideall theindexin g and retrievalfunctionality needed. Developing an IRsystem

ontopofaDBMSmightsavedesignandimplementationeort,sinceDBMSsalreadyprovideindexing

and retrievalfunctionality. The taskoftheapplication develop erwouldthenb etocho oseappropriate

functionalitiesandtoinstructtheDBMScorresp ondingly. Inordertoindexasetofobjectsforinstance,

theapplication develop er wouldhave tostatewhich objectattributes aretob eindexed inwhich way,

and to select the index structures b est suited for theattribute values. Indexes would then b e set up

(5)

InformationObject

InfoObjectElement

IOE-List

IOE-Set

IOE-PersonName

IOE-Date

IOE-Integer

IOE-String

. . .

ReprInfoUnit

. . .

. . .

. . .

Index-IDF

Index-2Poisson

Index

F I Re Figure 1:

4 The Approach of FIRE

4.1 An Overview of FIRE

ReprInfoUnit

ReprInfoUnit

ReprInfoUnit

ReprInfoUnit

Title Authors TextBody PublicationDate

InfoObjectElement

InfoObjectElement

atraditional IRmannerusing theindexingmechanismsof aDBMS.Thus,we needtoinvestigatehow

wecan useindexing and retrievalfunctionalities of object-orientedDBMSsforIRpurp oses.

FIRE is designed to facilitate the development of IR applications and to supp ort the exp erimental

evaluation of indexin g and retrieval techniques. In addition, the framework is supp osed to provide

basic functionaliti es for several media and should end up as a exible and extensible to ol that can

b e used for developing a wide range of IR applications. FIRE is an acronym for ` ramework for

nformation trievalApplications'. It is develop ed in co op erationwith theIR group at theRob ert

GordonUniversity,U.K.

The design and implementation of FIRE are based on an object-oriented approach. The object

mo del of FIRE denes thebasic IRconcepts and supplies functionalitie s that supp ort therealization

ofanIRapplication. ThissectiongivesashortoverviewofFIREfo cusingonthemostessentialclasses

ofFIRE'sobjectmo del. Amorecomprehensive descriptionofFIRE'sdesignandobjectmo delisgiven

in [Son95].

FIRErepresents do cuments by a set of features (or attributes). Note that the term do cument is

usedhere in a broadsense: do cuments mayconsist of structuredand unstructured partsand mayb e

comp osed of dierent media. The class (see Figure 1) mo dels do cuments in a generic

way. Itdenesmetho dswhichprovideinformationab outthemo delingofado cumentrepresentationas

wellasmetho dsforaccessingthefeaturesofado cumentrepresentation. Inaddition, and

its sub classes are resp onsible for organizing the indexing and retrievalof do cuments of theresp ective

typ e.

Overview ofthe mostimp ortant classes of FIRE'sobject mo del

Concretesub classesof denehowdo cumentsofacertaintyp e,e.g.,b o oks,tables,etc.,

arerepresentedinanapplication. Whendealingwithtextforinstance,anewsub classof

may b e intro duced, consisting of features like , , , and . The

mo deling ofconcrete typ esof do cuments is notpartof FIRE'sobjectmo del asthis is an

application-sp ecic task. Nevertheless, theframework supp ortsthe application develop er in this task. The class

and its sub classes provide a set of data typ es like string, integer, p erson name,

date, set,and list(c.f. Figure 1), which are intended to b e used forthe application-sp ec i c mo deling

of do cuments. If needed, the application develop er may extend this set of data typ es by adding new

sub classes. helps toreducetheeortfordeveloping anIRapplication byproviding

an asso ciated interface forindexing an information unit, determining the similarity of two units, etc.

(6)

Index

DB-Collection

IndexingFeature

Figure 2: 4.2 Design of an IR-Index Index Index

Index Index-IDF Index-2Poisson

Index Index Index IndexingFeature DB-Collection IndexingFeature IndexingFeature IndexingFeature

IndexingFeature Index

IndexingFeature IndexingFeature

Index Index Index addIFs Index IndexingFeature Index Index update Index

Index getIFsOf IndexingFeature

retractIFsOf IndexingFeature

clear IndexingFeature

Index retrieve

ReprInfoUnit The class is the implementation of a typical IR-index, which not only manages a set of

indexing featuresderived from acollection of do cumentsbut also providesinformation forcalculating

theprobabilityofrelevanceof informationunits totheuser's query. Theclass isageneric class,

which solves general tasks but do es not provide sp ecic IR functionality. The latter is supplied by

concretesub classesof like and ,whichimplementparticularweighting

schemata(see [Har92m]foran overviewofweighting schemata).

In the following two sections, we discuss the design and implementation of an IR-inde x in FIRE

morecomprehensively. First, wepresent thedesign of theclass withfo cus on IR relatedissues.

Then, we discuss somedetails of which supp ortan ecient evaluation of queries and enable us

toexploit theoptimization facilities ofthe underlying DBMS.

An basicallyconsistsofasetof s(seeFigure2). Inaddition,itmayb easso ciated

withzeroormoreobjectsofthetyp e ,whosepurp oseisexplainedinthenextsection. An

consists of a feature, e.g.,a normalized word from a text, and a sourcesp ecication.

The latter is essentially a reference to the do cument from which the feature has b een derived. In

addition,an maysp ecifyap ositionwithinthesource. Positionalinformationisneeded

for indicating to the user why a particular information unit has b een retrieved, which is done by

`highlighting' the relevant pieces of the unit. Furthermore, weighting schemata may use p ositional

information, e.g., it may b e assumed that words o ccurring in a heading are more imp ortant than

words in regular paragraphs. Finally, an is asso ciated with the indexing metho d

that has b een used for deriving the feature. Information units may b e indexed in many dierent

ways, and in many cases more than one metho d can b e applied for indexing a particular unit, e.g.,

weak or strong stemming when indexin g textual units. For consistency reasons, it is imp ortant that

all s of an have b een derived by the same metho d. Hence, FIRE asso ciates

s with the indexing metho d applied, and checks whether the to b e

added toan is compatiblewith theonespreviously added.

Structure of theclass

The class denes a set of op erations and metho ds (see Figure 3), which provide a uniform

interface indep en dent of anyparticular retrieval mo del. The metho d of incorp orates a

set of s into an . The adding of features do es not cause any up dating activities

like sorting the or re-calculating indexing weights. Up dates have to b e invoked explicitl y by

an message. We chose toseparate the pro cesses in order toavoidunnecessary computations,

for instance, sorting an again and again when indexing a whole collection of do cuments. The

class also provides a metho d,called ,fordetermining the s which have

b een derived from a particular source. Bythe metho d ,the s referring to

a given set of sources can b e retracted, whereas the metho d removes all s from

an . Finally, the metho d serves for retrieving information by evaluating single query

conditions. The results of the evaluation are passed to thequery do cument, which is a

(7)

Index

addIFs(features: IndexingFeatures): Boolean

update(): Boolean

getIFsOf(source: IO-Address): IndexingFeatures

retractIFsOf(sources: IO-Addresses): Boolean

clear(): Boolean

retrieve(condition: QueryCondition): BasicRetrievalResults

supportMatchers(names: Strings): Boolean

getSupportedMatchers(): Strings

Index-IDF

getNumberOfSources(): Integer

getMaxFreqOfAnyIndexingFeature(): Integer

getMaxFreqOfIndexingFeature(f: IndexingFeature): Integer

getFreqOfIndexingFeature(f: IndexingFeature, s: IO-Address): Integer

getIndexingWeightOfIndexingFeature(f: IndexingFeature, s: IO-Address): Real

Figure 3:

Figure 4:

4.3 Improving Retrieval Eciency Index Index supportMatchers Index IndexingFeature Index Index Index Index Index Index DB-Collection DB-Collection

DB-Collection IndexEntry IndexEntry

IndexingFeature

Index IndexEntry

IndexEntry IndexingFeature

DB-Collection Index

DB-Collection IndexEntry IndexingFeature

totheuser's query.

Op erations and metho ds of

FIRE supp orts an approximate matching on dierent data typ es. It allows one, for instance, to

retrieve do cuments which cover a particular topic with a high probability and to retrieve do cuments

which have a similar authorname ora similar publication date. Asdiscussed b efore, an approximate

matching mayb e quite time consuming. In order tocomp ensate computing eorts, an object

mayb e instructed tosupp ort particular matchingmetho ds. Thiscan b e done by an authorized user

at run-timevia the metho d . Also, previous instructions may b eoverwrittenby new

ones. Note thatan alwaysallows one touseanymatchingmetho d applicable tothegiven typ e

of s,but p erformssupp ortedmetho ds moreeciently. Thedetails oftheoptimization

of the matching pro cess are fully encapsulated by the class . This is advantageous asit avoids

b othering theuser withoptimization details.

Thesub classes of implement particularweightingschemata. Theydene additional metho ds

which calculate the information required by the resp ective weighting schema. Figure 4 shows an

example sub class of which implements a version of the `Inverse Do cument Frequency' (IDF)

weighting schema.

A concrete sub classof

Thesub classes of are notsp eciali zed to aparticular typ e of indexing features. Hence, they

can b e used in dierent contexts and theapplication develop er needs todene a new sub classonly if

an additional weighting schemais tob esupp orted.

An may b e asso ciated with zero or more objects of the class (see Figure 2). A

servestoimproveretrievaleciency. ItexploitstheoptimizationfacilitiesoftheDBMS

and maysupp ort approximatematchingmetho ds.

An object of the class consists of a collection of s. An is

comp osed of a key, which may b e of any typ e, and a set of p ointers to the s of an

which yield thesame key. (Thus, s haveessentially the samestructure asentries of

an inverted le.) The key of an is derived from the corresp onding , and

may have the same value as the feature or may b e assigned some co de for optimizing access. The

interfaceof ,whichisdepicted in Figure5,is similartotheinterfaceof . However,

(8)

DB-Collection

addIFs(features: IndexingFeatures): Boolean

update(): Boolean

retractIFs(features: IndexingFeatures): Boolean

clear(): Boolean

xMatch(feature: IndexingFeature): IndexingFeatures

setMatcher(name: String): Boolean

getMatcher(): String

setOptionsDB-Index(options: String): Boolean

getOptionsDB-Index(): String

Figure 5: DB-Collection DB-Collection IndexEntry IndexEntry DB-Collection restricted IndexingFeature

DB-Collection IndexEntry IndexingFeature

IndexingFeature Index

DB-Collection DB-Collection Index IndexingFeature IndexingFeature Matcher key

IndexingFeature InfoObjectElement OptionsDB-Collection Interfaceof

A utilizes the query optimization facilities of the underlying DBMS by instructing

theDBMStocreateanappropriateDB-indexforthegivencollectionof s. SincetheDBMS

do es not index do cument representations but collections of s, we can apply IR indexing

techniques without b eing restricted in any way by the DBMS. Nevertheless, we can take advantage

of the DBMS's facilities foroptimizing the evaluation of queries. As a furtheradvantage, FIRE do es

not need to provide any sp eciali zed index structures (B-trees, hash tables, etc.) as well as query

optimizationstrategies, but can relyon ObjectStore'smeans.

In addition, serves to optimize approximate matching. When we try to nd the

objects of a collection which are similar to a given object, we have to consider all objects. This

is in contrast to an exact matching where search can b e restricted in most cases, e.g., by a binary

search in an ordered index. To reduce complexity, FIRE allows one to p erform the approximate

matching in a form, which is a combinationof an exact and an approximatematching. In

this matching mo de, a key is generated in a rst step for the given with the query

condition. Suchakeymayb eforinstance aphonetic co deforap ersonname. Then thecorresp onding

isconsultedandits slo oked-uptodeterminethe sforwhich

thesamekeyhasb een generated. Finally, afull approximatematchingisp erformedwiththe selected

s. Note, an maysupp ort morethen one approximatematchingmetho d at the

sametimebycreating dierent s. Thisis useful,since dierent retrievalsituations may

require dierent matchingmetho ds. In thecaseno appropriate existsforthematching

metho d to b e p erformed, the do es the lo ok-up itself, of coursein a less ecient way,by going

throughtheasso ciated set of s.

The derivation of keys from s is a matterfor the matching algorithms, since they

know b est how to build appropriate keys. Also, the matching algorithms determine which kind of

DB-index is most appropriate for the resulting keys. In FIRE, matching algorithms are represented

bysp eciali zed classes ratherthanbymetho ds of other classes. This design decision allows theuser to

browse through the class hierarchy in order to see which matching algorithmsare available and how

they are to b e used; see [Son96] for further details. The classes implementing particular matching

algorithms are sub classes of an abstract class called , which is depicted in Figure 6. The

imp ortantfeatures of this class withregardto thecurrent topic are themetho d forderiving keys

from s(or s)and theattribute forsp ecifying

(9)

Matcher

TypeInfoObjectElement: String

TypeDB-Collection: String

OptionsDB-Collection: String

key(o: InfoObjectElement): Object

key(o: IndexingFeature): Object

match(o1: InfoObjectElement, o2: InfoObjectElement): Real

match(o1: IndexingFeature, o2: IndexingFeature): Real

keyMatch(o1: InfoObjectElement, key1: Object, o2: InfoObjectElement): Real

keyMatch(o1: IndexingFeature, key1: Object, o2: IndexingFeature): Real

restrictedKeyMatch(o1: IndexingFeature, key1: Object, o2: IndexingFeature, r: IO-Addresses): Real

Figure 6: Conclusions Acknowledgements References Matcher Matcher Index Matcher Index Matcher Information

Processing& Management

Proc. of the 15th Annual

Int.ACMSIGIR Conferenceon R &D in Information Retrieval(SIGIR-92) Class

Thedesignoftheclass and itssub classes easestheoptimizationofapproximatematching

metho ds. It isfully sucient toinstruct an to supp ortcertain (s). Knowing thenames

ofthematchingalgorithmstob esupp orted,the itselfcan gatherthenecessarydetailsbyasking

theprop er (s).

In this pap er, we have reviewed previous attempts to use DBMSs for implementing IR systems and

p ointed out some shortcomings of DBMSs with regard to the supp ort of IR systems. Furthermore,

wehave presented FIRE'sapproach for using the functionalityof an object-oriented DBMS in an IR

system. FIRE allowsone esp ecially

torepresentcomplex heterogeneous information in a naturalway,

toutilize thequery optimizationfacilitie s of theunderlying DBMSforIRpurp oses, and

toreduce thecomplexityof an approximatematching.

In the near future, we will p erform exp eriments to quantify the eect of the DBMS's optimization

facilities in IR applications and totest thep erformanceof therestricted approximatematching.

TheauthorwouldliketothankToreBratvold,Ubilab,andAndreasGepp ertfromtheUniversityZurich

for many fruitful discussions on the design of FIRE and the integration of DBMSs and IR systems.

Thanks are also due toManuel Bleichenbacher, Ubilab, for his work on ETOS, which is thesoftware

development platformusedforimplementing FIRE,and his technical assistance withETOS.

[Bla88] D.C.Blair: AnExtended Relational Do cument Retrieval Mo del. In:

,Vol. 24(3),pp.349-371,1988

[Cro92] W.B.Croft,L.A.Smith, and H.R. Turtle: ALo osely-Coupled Integrationof aText

RetrievalSystem and an Object-Oriented DatabaseSystem. In:

,pp. 223-232,

(10)

Int.ACMSIGIR Conferenceon R &D in Information Retrieval(SIGIR-92)

Proc. of EDBT

InformationRetrieval`93, Vonder Modellierung zur Anwendung

Information

Retrieval: Data Structures & Algorithms

TheComputer Journal

InternationalConferenceon Very Large Data

Bases

InformationTechnology: Researchand Development

Journal of the AmericanSociety for InformationScience

Journal of the American

Society for InformationScience

ACMTransactions of OceInformationSystems

Proc. of the 8th International Conferenceon Very Large

Data Bases

Program

Proc. of the 18th

Annual Int. ACMSIGIRConferenceonR &D in Information Retrieval(SIGIR`95) ,pp. 211-222,

1992

[Gar90] M.Garcia-Molina andD. Porter: Supp ortingProbabilistic Datain aRelational System. In:

,pp.60-74,1990

[Gu93] J.Gu,U.Thiel, and J. Zhao: Ecient Retrievalof ComplexObjects: Query Pro cessing in

aHybrid DBand IR System.In: G.Knorz, J.Krause and C.Womser-Hacker(eds.):

,pp.67-81,

Konstanz/Germany: UVK,1993

[Har92m] D.Harman: Ranking Algorithms.In: W.B.Frakesand R.Baeza-Yates (eds.):

,Englewo o d Clis/N.J.: Prentice Hall,1992

[Har92p] D.J.Harp er andA.D.M.Walker: ECLAIR: AnExtensible Class Library forInformation

Retrieval. In: ,Vol.35(3),pp.256-267,1992

[Lam91] C.Lamb,G.Landis, J.Orenstein, and D.Weinreb: The ObjectStoreDatabaseSystem.In:

Communications ofthe ACM,Vol. 34(10),pp.50-63,1991

[Lyn88] C.A.Lynchand M.Stonebraker: ExtendedUser-Dened Indexing withApplications to

Textual Databases.In: Pro c. ofthe14th

,pp.306-317,1988

[Mac83] I.A.Macleo d andR.G. Crawford: Do cument Retrieval asa DatabaseApplication. In:

,Vol. 2,pp.43-60, 1983

[Mac87] I.A.Macleo d andA.R. Reub er: The ArrayMo del: A ConceptualMo deling Approachto

Do cument Retrieval. In: ,Vol. 38

(3),pp. 162-170,1987

[Mac91] I.A.Macleo d: Text Retrievaland theRelational Mo del.In:

,Vol. 42(3),pp. 155-165,1991

[Mot88] A.Motro: VAGUE:A User Interface toRelational Databasesthat PermitsVague Queries.

In: ,Vol. 6(3),pp.187-214,1988

[Sch82] H.J.Schekand P.Pistor: DataStructures foranIntegratedDatabaseManagement and

InformationRetrievalSystem. In:

,pp.197-207,1982

[Sme90] A.Smeaton: Retriev: An InformationRetrieval SystemImplemented onTop ofa

Relational Database.In: ,Vol. 24(1),pp.21-32,1990

[Son95] G.Sonnenb ergerand H.-P.Frei: Design ofa Reusable IRFramework.In:

,pp.

49-57,1995

[Son96] G.Sonnenb erger,T.A. Bratvold,and H.-P.Frei: Use andReuse of Indexing and Retrieval

Functionality in aMultimedia IR Framework.(Pap er presentedatthe nalMIRO

References

Related documents

Keywords: Tensor decompositions, tensor approximations, Tucker model, CANDECOMP/PARAFAC model, com- pact visual data representation, higher-order SVD methods, data

The reader may therefore feel quite safe in allowing (but not exceeding) a similar margin in regard to most of the horoscopes in this book, remembering that time is likely to be

Primitive speech acts were adapted from the work of Austin ( 1962) and Searle ( 1969) by Dore in 197 5 in order to analyze the communicative functions of infants at the

It is evident from the author’s analysis of recently published NRI Data for 6 sub-Saharan Africa countries that the current framework and analysis of the Global Information

disappears from the sorted address list, the tracking block will output the pulse width (PW), and time of arrival (TOA).. The tracking block will also clear all internal variables

When the owner withdraws money from the business we debit drawings and credit cashbook (cash in hand or cash at bank). ii) Taking goods for own use and.. When the owner takes out

[r]

If its first argument is nonfresh, then the second cond line of walk ∗ must