1
Abstract
1 Introduction
App earedin: IEEEDataEngineeri ng Bulleti n, Vol. 19,No. 1,pp. 14-23,1996
Management Systems for Information Retrieval 1
GabrieleSonnenb erger
Ubilab,Union Bank of Switzerland
Bahnhofstrasse 45, 8021 Zurich,Switzerland
sonnenb [email protected]
Inthispaper, wepresenttheapproachofFIREtoutilizinganobject-orientedDatabase
Manage-mentSystem(DBMS)forInformationRetrieval(IR)purposes. First,acomprehensiveoverview
ofpreviousattempts touseDBMSs forimplementing IRsystems isgiven. Next,dierences b
e-tweenDBMSsandIRsystems,withregardtoindexingandretrieval,arediscussed. Inaddition,
some shortcomings of DBMSs with regard to supporting IR systems are pointed out. Then, an
overview of FIRE, which is designed as a reusable IR framework, is given and its approach
presented in more detail. Special attention is given to the design and implementation of an
IR-indexandhow retrieval eciencycanbeimproved by using the optimizationfacilitiesof the
underlyingobject-oriented DBMS.
and
Due torecent advances in InformationTechnology and Telecommunications, moreand more
informa-tion is pro duced and distributed by electronic means. Instead of plain ASCI I texts, information is
increasinglyenco ded in dierentforms,e.g.,asgraphs,tables orformattedtexts. Thesedevelopments
imp ose additional requirementson themanagement and retrieval ofinformation. Whereasin thepast
thefo cus in InformationRetrieval (IR)wasontext, which wasmostly considered tob e unstructured,
to day's IR systems have to face information which usually consists of structured and unstructured
partsand which mayb ecomp osed of dierent media.
Weobserve furtherchangesin theowand distribution ofinformation. In thepast,theuserof an
IRsystemwastypically apassive consumerof informationgatheredatthesp ecial sitesofprofessional
informationproviders;whilenowadaysauserisoftenb othaninformationconsumer aninformation
provider, e.g.,in a co op eration'sdo cument management system. Hence, datamanagementissues like
p ersistentstorageofdata,concurrencycontrol,and recoveryafterfailuresaregaining imp ortance,and
IRsystems mustcop ewith these issues.
Developing techniques for the indexing and retrieval of heterogeneous information units is a
de-manding topic genuine to the IR domain. In contrast to this, data management issues have already
b een investigated extensively by the Database (DB) community, which has also develop ed theories
and techniques applied in pro ductive systems. Rather than `reinventing the wheel', it is natural for
develop ers of IRsystems totrytoprot from theresults of the DBcommunitybybasing IR systems
2.1 Relational DBMSs
best match
ranked list
2 IR Systems Based on DBMSs
purp oses. Section 2reviewsprevious attemptstouseDBMSsforimplementingIRsystems. InSection
3, we discuss dierences b etween IR systems and DBMSs with regard to indexin g and retrieval, and
p oint outsome shortcomings of DBMSs with regard to supp orting IR systems. Finally in Section 4,
wesketchourIRframework,whichwecall FIRE,and presentourapproachforusing thefunctionality
ofanobject-orientedDBMSin anIRsystem. Thepap er concludeswithsomeremarksonfuturework.
A rst approach to use a DBMS for IR purp oses is to consider IR as a DB application. This has
b een prop osed, for instance, by Macleo d & Crawford [Mac83], Blair [Bla88], and Smeaton [Sme90].
Commontothese prop osals isthat theIR systems arebased on a relational DBMS,and SQLis used
as retrieval language. These IR systems do not store full do cuments, but bibli ographic references to
do cuments; these usually provide thetitle of thedo cument represented, thenamesof the authors,an
abstract,somecontentdescriptors, and thedetails ab out thedateand lo cationof thepublication.
This approach of making use of a DBMS has received major criticisms. One factor that has
b een criticized esp eciall y, e.g.,by Schek & Pistor [Sch82], is that therelational DB mo del represents
information in a rather unnatural waybya set of tables(relations). Further, SQL queries tend tob e
rathercomplexanddiculttounderstand;seeforinstancetheexamplesgivenin[Mac91]. Inaddition,
retrievalmayb equitecomputationallyexp ensive,esp ecially wheninformationfromdierenttableshas
tob ecombined bya `join'op eration.
Asevereshortcomingofthisapproachistherestrictiontoreferenceretrieval. Thisrestrictionisnot
so muchvoluntary asa matterof theunderlyin g relational DB mo del. Therelational mo del requires
theeldsof arecordtob eof axedlength, thusmakingit dicultforarelational DBMSapplication
to manage information units such as ordinary texts, which usually vary in length. (Of course, there
are work-arounds like follow-up records, but such work-arounds make the mo deling of information
even moreawkward.) A second severeshortcoming is thatSQLin itsbasic form isrestricted toexact
matching. An exact match is appropriate for many DB applications, esp ecially when information is
structuredtoa highdegree and thevo cabularyused isratherxed. In mostIRapplications however,
information units like texts are signicantly less structured and the vo cabulary used is usually
unre-stricted. Corresp ondingly, users of IR systems nd it far more dicult oreven imp ossible to issue a
querywhich successfully delivers all informationrelevant toagiven informationneed, but excludes
ir-relevantmaterial. Therefore,moreadvancedapproachestoIRabandonedtheexactmatchingparadigm
and nd instead the pieces of information which the user's query by applying weighting
schemata. The user receivesas result a which is sortedby theassumed probability that a
piece ofinformation is suited toanswertheuser's information need.
Severalattemptshaveb eenmade toprovide b etterDB supp ortforthedevelopment of IRsystems
byextendingtherelationalmo del,orbyaddingnewfeaturestoSQL.Toprovidemorenaturalexternal
views of information, Schek & Pistor [Sch82] have prop osed a generalization of the relational mo del
which allows nested relations. Lynch & Stonebraker [Lyn88] have intro duced abstract data typ es,
which make theformulationof content-based search conditions moreconvenient, althoughretrieval is
still based on the exact matching paradigm. An extension of SQL allowing a `similar-to' comparison
op eratorhasb een prop osedbyMotro[Mot88]. Furthermore,variousapproacheshave b eendevelop ed
2.3 Object-oriented DBMSs
A dierent approach to making use of a DBMS for IR purp oses is to couple an IR system with a
DBMS.Croftetal.[Cro92]attemptedalo osecoupling b etweentheIRsystemINQUERY andIRIS, a
prototyp eversionofanobject-orientedDBMS.Theychosean object-orientedDBMSsince theob
ject-orientedmo delallowsone, incontrasttotherelationalmo del,tostorecomplextextual informationin
aquitenaturalway. TheIRandtheDBsystemarecoupledexternallybyacontrolmo dule. Thereare
linksfrom theinformation unitsstoredbyINQUERY tothecorresp onding textualobjectsoftheIRIS
database. Theintegratedsystemprovidesthefunctionalityofb othunderlyin gsystems. Thus,theuser
may issue content-based queries as well as DB queries. However, a severe problem of this coupling
approach is that information is stored twice, by the IR system as well as by the DBMS. This may
cause consistency problems wheninformation is mo died. Furthermore,there is no full integration of
content-based queries and DBqueries.
Guetal.[Gu93]have chosenan alternative waytocouplean IRsystem witha DBMS.They have
emb eddedthefunctionalityoftheIRsystemINQUERYintotherelationalDBMSSybase. Furthermore,
they have extended SQL by a function which takes an INQUERY query as input and allows one to
cho ose theINQUERY database tob e consulted forevaluating the query. This system avoids storing
information twice: textual information is stored and managed by the IR system and other kinds of
informationbytheDBMS,whichalsoprovidesp ointerstothetextualinformationstoredandmanaged
by the IR system. The main drawback is that two dierent approachesfor managing and retrieving
information are used, which makes the management of information more dicult. Further, the b est
match retrievalparadigm isrestricted totextual information, whereasan application of this retrieval
paradigm tootherkinds of informationwouldb e highlydesirable, asforinstance p ointed outbyFuhr
[Fuh92].
Bearing in mind the shortcomings of the relational mo del, ithas b een prop osed to useother mo dels,
more appropriate than the relational DB mo del, as a basis for developing IR systems; e.g.,an array
mo del [Mac87] oranobject-orientedDB mo del [Har92p]. The object-oriented DBmo del is esp eciall y
app ealing, since object-oriented technology is maturing and the rst commercial systems are already
available.
Theobject-oriented approach has also b een adopted for developing our IR framework FIRE. The
frameworkisimplementedusingObjectStore[Lam91],acommerciallyavailableobject-orientedDBMS.
In ObjectStore,p ersistence isnotpart ofthe denitionof anobject,but amatterofallo cation atthe
time of object creation (`p ersistency byallo cation'). Thus, objects of thesame typ e can b e allo cated
p ersistentlyaswellastransiently. Furthermore,itmakesessentiallynodierence whetherwedealwith
a transient orp ersistent object. These features of ObjectStoreenable us todesign and implement an
IR system in a problem-adequate way,mostly neglecting data storage details. Thus the develop er of
anIRsystemcanrelyon theDBMS'smeansforp ersistentstorage,concurrencycontrol,recoveryafter
failure, etc.,without having toaccept severerestrictions, asexp erienced from relationalDBMSs.
Object-orientedDBMSsprovideausefulbasisforthedevelopmentofIRsystems,yetcouldsupp ort
IR systems even further with regard to p erformance issues. ObjectStore, for instance, allows one
to improve retrieval eciency by buildi ng sp ecialized indexes and by optimizing queries. However,
indexingandretrievalinan IRsystemdierin certainasp ectsfromindexingandretrievalinaDBMS,
as describ ed in the following section. Consequently these facilities cannot b e used directly for IR
purp oses. Mo difying thecorefunctionalityofa DBMSforexploiting thesefacilities isnotintherange
3 Some Dierences between IR Systems and DBMSs
3.1 Index
3.2 Retrieval
3.3 Indexing and Retrieval Functionality a prop erdesign on theIRside, asisshownin Section 4.
Theterm`index'isusedb othbytheDBcommunityandbytheIRcommunity,howeverwithdierent
meanings. While the function of a DB-index is to improve p erformance, an IR-index typically also
provides additional data. Indetail, we can observethe following dierences:
A typical IR-inde x is an inverted le withindex entries,each consisting of a key representing a
featurederived fromasourceandasetofp ostings. Ap ostingmayincludeinformationab outthe
frequencyofthegivenfeaturewithinthesource,ormaysp ecifyitsp ositionwithinthedo cument
inmoredetail. ADB-index, however,onlymaintainsthepathstotheobjects whereanattribute
hasa certainvalue.
In an IR application, information ab out the characteristics of the collection represented by an
index is required when computing the relevance of an information unit to a user's query. An
example of such information isthe maximum frequency of a termin a collection of texts orthe
meanvalueandstandarddeviationofasetofnumb ers. SinceaDB-indexservesonlytooptimize
access,no suchcollection information isneeded.
Due to the dierences, typical IR-indexes are not supp orted by DBMSs. Optimized access to
information | asis done bya DB-index | is,however, crucialforimplementin g ecient IRsystems.
Thus,howwe can useDB-indexes tobuild IR-indexes isan op en researchproblem.
Atypical DBMSprovidesaretrievalinterfacewhichallows onetocheckwhethertwovalues areequal,
and to determine whether a value issmaller orgreater than another value. Usually such queries can
b eoptimized, e.g.,byinstructing the DBMStocreateand maintain appropriateindexes.
Advanced IR systems aresupp osed tosupp ort approximate (`b est')matching. Unfortunately,
ap-proximate matching is not supp orted by DBMSs. This is a severe shortcoming, since approximate
matchingis farmoretime-consuming thanexact matching,and thereforedemands optimization
facil-ities. To dealwiththeadditional complexity,weurgentlyneed sp eciall y tuned matchingmetho ds and
techniques to avoid a linear search through an index. These metho ds and techniques should ideally
workhand in hand withavailable DBMStechniques.
In the case of a conventionalIR application, i.e. one which is not based on a DBMS, the application
develop er hastoprovideall theindexin g and retrievalfunctionality needed. Developing an IRsystem
ontopofaDBMSmightsavedesignandimplementationeort,sinceDBMSsalreadyprovideindexing
and retrievalfunctionality. The taskoftheapplication develop erwouldthenb etocho oseappropriate
functionalitiesandtoinstructtheDBMScorresp ondingly. Inordertoindexasetofobjectsforinstance,
theapplication develop er wouldhave tostatewhich objectattributes aretob eindexed inwhich way,
and to select the index structures b est suited for theattribute values. Indexes would then b e set up
InformationObject
InfoObjectElement
IOE-List
IOE-Set
IOE-PersonName
IOE-Date
IOE-Integer
IOE-String
. . .
ReprInfoUnit
. . .
. . .
. . .
Index-IDF
Index-2Poisson
Index
F I Re Figure 1:4 The Approach of FIRE
4.1 An Overview of FIRE
ReprInfoUnit
ReprInfoUnit
ReprInfoUnit
ReprInfoUnit
Title Authors TextBody PublicationDate
InfoObjectElement
InfoObjectElement
atraditional IRmannerusing theindexingmechanismsof aDBMS.Thus,we needtoinvestigatehow
wecan useindexing and retrievalfunctionalities of object-orientedDBMSsforIRpurp oses.
FIRE is designed to facilitate the development of IR applications and to supp ort the exp erimental
evaluation of indexin g and retrieval techniques. In addition, the framework is supp osed to provide
basic functionaliti es for several media and should end up as a exible and extensible to ol that can
b e used for developing a wide range of IR applications. FIRE is an acronym for ` ramework for
nformation trievalApplications'. It is develop ed in co op erationwith theIR group at theRob ert
GordonUniversity,U.K.
The design and implementation of FIRE are based on an object-oriented approach. The object
mo del of FIRE denes thebasic IRconcepts and supplies functionalitie s that supp ort therealization
ofanIRapplication. ThissectiongivesashortoverviewofFIREfo cusingonthemostessentialclasses
ofFIRE'sobjectmo del. Amorecomprehensive descriptionofFIRE'sdesignandobjectmo delisgiven
in [Son95].
FIRErepresents do cuments by a set of features (or attributes). Note that the term do cument is
usedhere in a broadsense: do cuments mayconsist of structuredand unstructured partsand mayb e
comp osed of dierent media. The class (see Figure 1) mo dels do cuments in a generic
way. Itdenesmetho dswhichprovideinformationab outthemo delingofado cumentrepresentationas
wellasmetho dsforaccessingthefeaturesofado cumentrepresentation. Inaddition, and
its sub classes are resp onsible for organizing the indexing and retrievalof do cuments of theresp ective
typ e.
Overview ofthe mostimp ortant classes of FIRE'sobject mo del
Concretesub classesof denehowdo cumentsofacertaintyp e,e.g.,b o oks,tables,etc.,
arerepresentedinanapplication. Whendealingwithtextforinstance,anewsub classof
may b e intro duced, consisting of features like , , , and . The
mo deling ofconcrete typ esof do cuments is notpartof FIRE'sobjectmo del asthis is an
application-sp ecic task. Nevertheless, theframework supp ortsthe application develop er in this task. The class
and its sub classes provide a set of data typ es like string, integer, p erson name,
date, set,and list(c.f. Figure 1), which are intended to b e used forthe application-sp ec i c mo deling
of do cuments. If needed, the application develop er may extend this set of data typ es by adding new
sub classes. helps toreducetheeortfordeveloping anIRapplication byproviding
an asso ciated interface forindexing an information unit, determining the similarity of two units, etc.
Index
DB-Collection
IndexingFeature
Figure 2: 4.2 Design of an IR-Index Index IndexIndex Index-IDF Index-2Poisson
Index Index Index IndexingFeature DB-Collection IndexingFeature IndexingFeature IndexingFeature
IndexingFeature Index
IndexingFeature IndexingFeature
Index Index Index addIFs Index IndexingFeature Index Index update Index
Index getIFsOf IndexingFeature
retractIFsOf IndexingFeature
clear IndexingFeature
Index retrieve
ReprInfoUnit The class is the implementation of a typical IR-index, which not only manages a set of
indexing featuresderived from acollection of do cumentsbut also providesinformation forcalculating
theprobabilityofrelevanceof informationunits totheuser's query. Theclass isageneric class,
which solves general tasks but do es not provide sp ecic IR functionality. The latter is supplied by
concretesub classesof like and ,whichimplementparticularweighting
schemata(see [Har92m]foran overviewofweighting schemata).
In the following two sections, we discuss the design and implementation of an IR-inde x in FIRE
morecomprehensively. First, wepresent thedesign of theclass withfo cus on IR relatedissues.
Then, we discuss somedetails of which supp ortan ecient evaluation of queries and enable us
toexploit theoptimization facilities ofthe underlying DBMS.
An basicallyconsistsofasetof s(seeFigure2). Inaddition,itmayb easso ciated
withzeroormoreobjectsofthetyp e ,whosepurp oseisexplainedinthenextsection. An
consists of a feature, e.g.,a normalized word from a text, and a sourcesp ecication.
The latter is essentially a reference to the do cument from which the feature has b een derived. In
addition,an maysp ecifyap ositionwithinthesource. Positionalinformationisneeded
for indicating to the user why a particular information unit has b een retrieved, which is done by
`highlighting' the relevant pieces of the unit. Furthermore, weighting schemata may use p ositional
information, e.g., it may b e assumed that words o ccurring in a heading are more imp ortant than
words in regular paragraphs. Finally, an is asso ciated with the indexing metho d
that has b een used for deriving the feature. Information units may b e indexed in many dierent
ways, and in many cases more than one metho d can b e applied for indexing a particular unit, e.g.,
weak or strong stemming when indexin g textual units. For consistency reasons, it is imp ortant that
all s of an have b een derived by the same metho d. Hence, FIRE asso ciates
s with the indexing metho d applied, and checks whether the to b e
added toan is compatiblewith theonespreviously added.
Structure of theclass
The class denes a set of op erations and metho ds (see Figure 3), which provide a uniform
interface indep en dent of anyparticular retrieval mo del. The metho d of incorp orates a
set of s into an . The adding of features do es not cause any up dating activities
like sorting the or re-calculating indexing weights. Up dates have to b e invoked explicitl y by
an message. We chose toseparate the pro cesses in order toavoidunnecessary computations,
for instance, sorting an again and again when indexing a whole collection of do cuments. The
class also provides a metho d,called ,fordetermining the s which have
b een derived from a particular source. Bythe metho d ,the s referring to
a given set of sources can b e retracted, whereas the metho d removes all s from
an . Finally, the metho d serves for retrieving information by evaluating single query
conditions. The results of the evaluation are passed to thequery do cument, which is a
Index
addIFs(features: IndexingFeatures): Boolean
update(): Boolean
getIFsOf(source: IO-Address): IndexingFeatures
retractIFsOf(sources: IO-Addresses): Boolean
clear(): Boolean
retrieve(condition: QueryCondition): BasicRetrievalResults
supportMatchers(names: Strings): Boolean
getSupportedMatchers(): Strings
Index-IDF
getNumberOfSources(): Integer
getMaxFreqOfAnyIndexingFeature(): Integer
getMaxFreqOfIndexingFeature(f: IndexingFeature): Integer
getFreqOfIndexingFeature(f: IndexingFeature, s: IO-Address): Integer
getIndexingWeightOfIndexingFeature(f: IndexingFeature, s: IO-Address): Real
Figure 3:
Figure 4:
4.3 Improving Retrieval Eciency Index Index supportMatchers Index IndexingFeature Index Index Index Index Index Index DB-Collection DB-Collection
DB-Collection IndexEntry IndexEntry
IndexingFeature
Index IndexEntry
IndexEntry IndexingFeature
DB-Collection Index
DB-Collection IndexEntry IndexingFeature
totheuser's query.
Op erations and metho ds of
FIRE supp orts an approximate matching on dierent data typ es. It allows one, for instance, to
retrieve do cuments which cover a particular topic with a high probability and to retrieve do cuments
which have a similar authorname ora similar publication date. Asdiscussed b efore, an approximate
matching mayb e quite time consuming. In order tocomp ensate computing eorts, an object
mayb e instructed tosupp ort particular matchingmetho ds. Thiscan b e done by an authorized user
at run-timevia the metho d . Also, previous instructions may b eoverwrittenby new
ones. Note thatan alwaysallows one touseanymatchingmetho d applicable tothegiven typ e
of s,but p erformssupp ortedmetho ds moreeciently. Thedetails oftheoptimization
of the matching pro cess are fully encapsulated by the class . This is advantageous asit avoids
b othering theuser withoptimization details.
Thesub classes of implement particularweightingschemata. Theydene additional metho ds
which calculate the information required by the resp ective weighting schema. Figure 4 shows an
example sub class of which implements a version of the `Inverse Do cument Frequency' (IDF)
weighting schema.
A concrete sub classof
Thesub classes of are notsp eciali zed to aparticular typ e of indexing features. Hence, they
can b e used in dierent contexts and theapplication develop er needs todene a new sub classonly if
an additional weighting schemais tob esupp orted.
An may b e asso ciated with zero or more objects of the class (see Figure 2). A
servestoimproveretrievaleciency. ItexploitstheoptimizationfacilitiesoftheDBMS
and maysupp ort approximatematchingmetho ds.
An object of the class consists of a collection of s. An is
comp osed of a key, which may b e of any typ e, and a set of p ointers to the s of an
which yield thesame key. (Thus, s haveessentially the samestructure asentries of
an inverted le.) The key of an is derived from the corresp onding , and
may have the same value as the feature or may b e assigned some co de for optimizing access. The
interfaceof ,whichisdepicted in Figure5,is similartotheinterfaceof . However,
DB-Collection
addIFs(features: IndexingFeatures): Boolean
update(): Boolean
retractIFs(features: IndexingFeatures): Boolean
clear(): Boolean
xMatch(feature: IndexingFeature): IndexingFeatures
setMatcher(name: String): Boolean
getMatcher(): String
setOptionsDB-Index(options: String): Boolean
getOptionsDB-Index(): String
Figure 5: DB-Collection DB-Collection IndexEntry IndexEntry DB-Collection restricted IndexingFeatureDB-Collection IndexEntry IndexingFeature
IndexingFeature Index
DB-Collection DB-Collection Index IndexingFeature IndexingFeature Matcher key
IndexingFeature InfoObjectElement OptionsDB-Collection Interfaceof
A utilizes the query optimization facilities of the underlying DBMS by instructing
theDBMStocreateanappropriateDB-indexforthegivencollectionof s. SincetheDBMS
do es not index do cument representations but collections of s, we can apply IR indexing
techniques without b eing restricted in any way by the DBMS. Nevertheless, we can take advantage
of the DBMS's facilities foroptimizing the evaluation of queries. As a furtheradvantage, FIRE do es
not need to provide any sp eciali zed index structures (B-trees, hash tables, etc.) as well as query
optimizationstrategies, but can relyon ObjectStore'smeans.
In addition, serves to optimize approximate matching. When we try to nd the
objects of a collection which are similar to a given object, we have to consider all objects. This
is in contrast to an exact matching where search can b e restricted in most cases, e.g., by a binary
search in an ordered index. To reduce complexity, FIRE allows one to p erform the approximate
matching in a form, which is a combinationof an exact and an approximatematching. In
this matching mo de, a key is generated in a rst step for the given with the query
condition. Suchakeymayb eforinstance aphonetic co deforap ersonname. Then thecorresp onding
isconsultedandits slo oked-uptodeterminethe sforwhich
thesamekeyhasb een generated. Finally, afull approximatematchingisp erformedwiththe selected
s. Note, an maysupp ort morethen one approximatematchingmetho d at the
sametimebycreating dierent s. Thisis useful,since dierent retrievalsituations may
require dierent matchingmetho ds. In thecaseno appropriate existsforthematching
metho d to b e p erformed, the do es the lo ok-up itself, of coursein a less ecient way,by going
throughtheasso ciated set of s.
The derivation of keys from s is a matterfor the matching algorithms, since they
know b est how to build appropriate keys. Also, the matching algorithms determine which kind of
DB-index is most appropriate for the resulting keys. In FIRE, matching algorithms are represented
bysp eciali zed classes ratherthanbymetho ds of other classes. This design decision allows theuser to
browse through the class hierarchy in order to see which matching algorithmsare available and how
they are to b e used; see [Son96] for further details. The classes implementing particular matching
algorithms are sub classes of an abstract class called , which is depicted in Figure 6. The
imp ortantfeatures of this class withregardto thecurrent topic are themetho d forderiving keys
from s(or s)and theattribute forsp ecifying
Matcher
TypeInfoObjectElement: String
TypeDB-Collection: String
OptionsDB-Collection: String
key(o: InfoObjectElement): Object
key(o: IndexingFeature): Object
match(o1: InfoObjectElement, o2: InfoObjectElement): Real
match(o1: IndexingFeature, o2: IndexingFeature): Real
keyMatch(o1: InfoObjectElement, key1: Object, o2: InfoObjectElement): Real
keyMatch(o1: IndexingFeature, key1: Object, o2: IndexingFeature): Real
restrictedKeyMatch(o1: IndexingFeature, key1: Object, o2: IndexingFeature, r: IO-Addresses): Real
Figure 6: Conclusions Acknowledgements References Matcher Matcher Index Matcher Index Matcher Information
Processing& Management
Proc. of the 15th Annual
Int.ACMSIGIR Conferenceon R &D in Information Retrieval(SIGIR-92) Class
Thedesignoftheclass and itssub classes easestheoptimizationofapproximatematching
metho ds. It isfully sucient toinstruct an to supp ortcertain (s). Knowing thenames
ofthematchingalgorithmstob esupp orted,the itselfcan gatherthenecessarydetailsbyasking
theprop er (s).
In this pap er, we have reviewed previous attempts to use DBMSs for implementing IR systems and
p ointed out some shortcomings of DBMSs with regard to the supp ort of IR systems. Furthermore,
wehave presented FIRE'sapproach for using the functionalityof an object-oriented DBMS in an IR
system. FIRE allowsone esp ecially
torepresentcomplex heterogeneous information in a naturalway,
toutilize thequery optimizationfacilitie s of theunderlying DBMSforIRpurp oses, and
toreduce thecomplexityof an approximatematching.
In the near future, we will p erform exp eriments to quantify the eect of the DBMS's optimization
facilities in IR applications and totest thep erformanceof therestricted approximatematching.
TheauthorwouldliketothankToreBratvold,Ubilab,andAndreasGepp ertfromtheUniversityZurich
for many fruitful discussions on the design of FIRE and the integration of DBMSs and IR systems.
Thanks are also due toManuel Bleichenbacher, Ubilab, for his work on ETOS, which is thesoftware
development platformusedforimplementing FIRE,and his technical assistance withETOS.
[Bla88] D.C.Blair: AnExtended Relational Do cument Retrieval Mo del. In:
,Vol. 24(3),pp.349-371,1988
[Cro92] W.B.Croft,L.A.Smith, and H.R. Turtle: ALo osely-Coupled Integrationof aText
RetrievalSystem and an Object-Oriented DatabaseSystem. In:
,pp. 223-232,
Int.ACMSIGIR Conferenceon R &D in Information Retrieval(SIGIR-92)
Proc. of EDBT
InformationRetrieval`93, Vonder Modellierung zur Anwendung
Information
Retrieval: Data Structures & Algorithms
TheComputer Journal
InternationalConferenceon Very Large Data
Bases
InformationTechnology: Researchand Development
Journal of the AmericanSociety for InformationScience
Journal of the American
Society for InformationScience
ACMTransactions of OceInformationSystems
Proc. of the 8th International Conferenceon Very Large
Data Bases
Program
Proc. of the 18th
Annual Int. ACMSIGIRConferenceonR &D in Information Retrieval(SIGIR`95) ,pp. 211-222,
1992
[Gar90] M.Garcia-Molina andD. Porter: Supp ortingProbabilistic Datain aRelational System. In:
,pp.60-74,1990
[Gu93] J.Gu,U.Thiel, and J. Zhao: Ecient Retrievalof ComplexObjects: Query Pro cessing in
aHybrid DBand IR System.In: G.Knorz, J.Krause and C.Womser-Hacker(eds.):
,pp.67-81,
Konstanz/Germany: UVK,1993
[Har92m] D.Harman: Ranking Algorithms.In: W.B.Frakesand R.Baeza-Yates (eds.):
,Englewo o d Clis/N.J.: Prentice Hall,1992
[Har92p] D.J.Harp er andA.D.M.Walker: ECLAIR: AnExtensible Class Library forInformation
Retrieval. In: ,Vol.35(3),pp.256-267,1992
[Lam91] C.Lamb,G.Landis, J.Orenstein, and D.Weinreb: The ObjectStoreDatabaseSystem.In:
Communications ofthe ACM,Vol. 34(10),pp.50-63,1991
[Lyn88] C.A.Lynchand M.Stonebraker: ExtendedUser-Dened Indexing withApplications to
Textual Databases.In: Pro c. ofthe14th
,pp.306-317,1988
[Mac83] I.A.Macleo d andR.G. Crawford: Do cument Retrieval asa DatabaseApplication. In:
,Vol. 2,pp.43-60, 1983
[Mac87] I.A.Macleo d andA.R. Reub er: The ArrayMo del: A ConceptualMo deling Approachto
Do cument Retrieval. In: ,Vol. 38
(3),pp. 162-170,1987
[Mac91] I.A.Macleo d: Text Retrievaland theRelational Mo del.In:
,Vol. 42(3),pp. 155-165,1991
[Mot88] A.Motro: VAGUE:A User Interface toRelational Databasesthat PermitsVague Queries.
In: ,Vol. 6(3),pp.187-214,1988
[Sch82] H.J.Schekand P.Pistor: DataStructures foranIntegratedDatabaseManagement and
InformationRetrievalSystem. In:
,pp.197-207,1982
[Sme90] A.Smeaton: Retriev: An InformationRetrieval SystemImplemented onTop ofa
Relational Database.In: ,Vol. 24(1),pp.21-32,1990
[Son95] G.Sonnenb ergerand H.-P.Frei: Design ofa Reusable IRFramework.In:
,pp.
49-57,1995
[Son96] G.Sonnenb erger,T.A. Bratvold,and H.-P.Frei: Use andReuse of Indexing and Retrieval
Functionality in aMultimedia IR Framework.(Pap er presentedatthe nalMIRO