1
Information access through
information technology
Created to support an invited lecture
at the International Conference MDGICT 2009 in Tamil Nadu, India, December 2009
2
These slides should be available from the WWW site
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)
3
Contents = summary = structure = overview of this presentation
1. General: Access to information: the evolution
2. More specific: So much information, so little time
4
Information access
through information technology
Access to information:
the evolution
5
Information is important for development
Research / Education / Journalism
Access to information = important
6
Timeline 1970 to 2010 (figure): libraries; online databases on the DIALOG host computer; personal computers; CDS/ISIS, free cataloguing software by UNESCO; library automation with integrated library management systems; CD-ROM with cheap 600 MB memory; Internet; electronic mail
7
Timeline 1970 to 2010 (figure): libraries; online databases on the DIALOG host computer; personal computers; CDS/ISIS, free cataloguing software by UNESCO; library automation with integrated library management systems; CD-ROM with cheap 600 MB memory; Internet; CDS/ISIS for Windows with user-friendly interfaces; DVD with more than 4000 MB memory; Microsoft Windows 95 PC operating system; WWW search engines to uncover the WWW; WWW based on the Internet (without search engine); electronic mail; Google: WWW search becomes the most popular information discovery tool
8
Timeline 1970 to 2010 (figure, continued): digital libraries, including repositories; electronic full-text journals; local link generators; authentication and authorization to access proprietary digital content
9
Timeline 1970 to 2010 (figure, continued): digital libraries, including repositories; electronic books; electronic full-text journals; local link generators; authentication and authorization to access proprietary digital content; Google Scholar allows discovery of academic information free of charge; libraries implement Electronic Resource Management Systems; social WWW, including Facebook, Flickr and YouTube; library automation with free software such as CDS/ISIS = ABCD; computing + memory for storage in the “Cloud”
10
Timeline 1970 to 2010 (figure, continued): digital libraries, including repositories; electronic books; electronic full-text journals; local link generators; authentication and authorization to access proprietary digital content; Google Scholar allows discovery of academic information free of charge; libraries implement Electronic Resource Management Systems; social WWW, including Facebook, Flickr and YouTube; library automation with free software such as CDS/ISIS = ABCD; computing + memory for storage in the “Cloud”; better integration, aggregation, federation?
11
Information access:
difficulties and bottlenecks
• Cost of content
»Books
»Journals
»Bibliographical databases
• Cost of computer hardware
»At server side
»At client side
• Cost of computer software
12
Information access:
difficulties and bottlenecks
• Need to train personnel
• Fast evolution
13
Information access:
difficulties and bottlenecks in developing countries
• Poor infrastructure
»Power supply
»Internet access
14
Information access
through information technology
So much information, so little time
15
Introduction:
scattering of sources
• Users want to exploit information sources fast and effectively.
• This is hindered because the digital, electronic information sources that may contain relevant information are created in many places and scattered over numerous computers all over the Internet and the WWW.
16
Introduction:
scattering of sources
• In other words:
integration / aggregation is still far from perfect.
17
Introduction:
scattering of sources: difficulties
• Using many information retrieval systems costs time:
1. They must be used one after the other, which requires many decisions and actions.
18
Introduction:
scattering of sources: difficulties
• Using many information retrieval systems costs time:
2. They offer different user interfaces in the retrieval phase, which is confusing
19
Introduction:
scattering of sources: difficulties
• Using many information retrieval systems costs time:
3. They present the information items found in various data formats.
20
Information access
through information technology
Problem statements
21
Problem statements
1. Which methods have been developed and applied to cope with this reality?
22
Problem statements
2. Which concrete
applications are available and how can an end-user exploit systems created in this domain?
23
Problem statements
3. How can information
intermediaries evaluate and apply these methods to
bring information more efficiently to end-users?
24
Information access
through information technology
Various methods
for information retrieval from scattered sources
25
Method 1: Merging = aggregating into a searchable database
(diagram: the contents of several databases or web sites are merged into 1 aggregated database, which users query through 1 search engine)
26
Method 2: Federated searching through scattered databases
(diagram: users query 1 federated search engine, which passes the query on to the search engines of several separate databases)
27
Both methods
offer benefits to the users
+ Saves users the time they would otherwise need to send queries to various servers or to browse through various systems.
☺
28
Both methods
offer benefits to the users
+ The users have to learn only 1 user interface for searching and only 1 search syntax,
instead of a user interface and a search syntax for each database.
☺
29
Both methods
offer benefits to the users
+ The system offers a uniform / consistent display of results in the output phase.
☺
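The three benefits above can be illustrated for Method 2 (federated searching) with a minimal Python sketch. It is not how any particular product works: the 2 sources, their query syntaxes (`ti=water`, `title:"water"`) and their record fields are hypothetical stand-ins for real remote databases. The federated engine accepts 1 query, translates it into each source's own syntax, sends the queries at the same time, and converts every hit into 1 common record format.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Source:
    """A hypothetical remote database with its own syntax and record format."""
    name: str
    translate: Callable[[str], str]          # common query -> source syntax
    search: Callable[[str], list[dict]]      # runs the source-specific query
    normalize: Callable[[dict], dict]        # source record -> common record


def search_a(query: str) -> list[dict]:
    # Hypothetical source A expects queries such as 'ti=water'.
    term = query.split("=", 1)[1]
    records = [{"TI": "Water quality", "AU": "Rao"}, {"TI": "Fisheries", "AU": "Lee"}]
    return [r for r in records if term in r["TI"].lower()]


def search_b(query: str) -> list[dict]:
    # Hypothetical source B expects queries such as 'title:"water"'.
    term = query.split(":", 1)[1].strip('"')
    records = [{"title": "Drinking water", "creator": "Diaz"}]
    return [r for r in records if term in r["title"].lower()]


SOURCES = [
    Source("A", lambda t: f"ti={t}", search_a,
           lambda r: {"title": r["TI"], "author": r["AU"]}),
    Source("B", lambda t: f'title:"{t}"', search_b,
           lambda r: {"title": r["title"], "author": r["creator"]}),
]


def federated_search(term: str) -> list[dict]:
    """Send 1 user query to every source at once; return uniform records."""
    term = term.lower()

    def run(src: Source) -> list[dict]:
        hits = src.search(src.translate(term))
        return [{**src.normalize(h), "source": src.name} for h in hits]

    with ThreadPoolExecutor() as pool:
        per_source = list(pool.map(run, SOURCES))
    # 1 consistent display: same fields and same sort order for every source.
    return sorted((r for hits in per_source for r in hits), key=lambda r: r["title"])
```

With these stand-ins, `federated_search("water")` returns 1 list sorted by title, with "Drinking water" (source B) before "Water quality" (source A); the `source` field still tells the user where each hit came from.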
30
Method 1: Merging = aggregating into a searchable database
(diagram: the contents of several databases or web sites are merged into 1 aggregated database, which users query through 1 search engine)
31
Method 2: Federated searching through scattered databases
(diagram: users query 1 federated search engine, which passes the query on to the search engines of several separate databases)
32
Federated searching
through scattered databases: why?
The perfect trip:
1. A cheap and nice flight
2. A cheap and nice hotel
3. A visit to a nice museum
4. Something nice to read (free via your library)
☺
33
Federated searching: application:
finding a suitable flight
Example:
• http://CheapTickets.com/ for the USA
34
Federated searching: application:
finding a hotel room in some city
Example
35
Federated searching:
searching in a museum
Example
36
Federated searching:
searching in a library
Example
37
So many digital libraries
through information technology
The various methods applied for end-users
38
Method 1: Merging = aggregating into a searchable database
(diagram: the contents of several databases or web sites are merged into 1 aggregated database, which users query through 1 search engine)
39
Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are built manually, by many people.
• They can be browsed by following a tree structure or a more complex variant of it.
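Browsing such a directory amounts to opening nested categories one by one. The Python sketch below assumes a tiny hypothetical directory (the category names and example.org URLs are invented). The cross-link under "Education" shows one simple form of the more complex variant: the same subtree is reachable along 2 different paths, so the structure is no longer a strict tree.

```python
# A hypothetical miniature subject directory: categories nest, sites sit at the leaves.
DIRECTORY = {
    "Science": {
        "Biology": {"_sites": ["http://example.org/botany"]},
        "Earth sciences": {
            "Oceanography": {"_sites": ["http://example.org/oceans"]},
        },
    },
    "Education": {},
}
# Cross-link: the Science subtree is also reachable from Education,
# so the structure becomes a graph rather than a strict tree.
DIRECTORY["Education"]["Science education"] = DIRECTORY["Science"]


def browse(path: list[str]) -> dict:
    """Open the categories in `path` one after the other, from the top."""
    node = DIRECTORY
    for name in path:
        node = node[name]
    return node


def subcategories(node: dict) -> list[str]:
    """The category names a user would see on this page of the directory."""
    return sorted(name for name in node if name != "_sites")
```

Browsing `["Education", "Science education", "Biology"]` ends on the very same page as `["Science", "Biology"]`.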
40
Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at http://dir.yahoo.com/
• Entries are NOT rated.
• Accessible free of charge.
Example
41
Internet global subject directories:
BUBL LINK
• A hypertext global subject directory to more than 10 000 WWW sites for the higher education community can be found at
http://bubl.ac.uk/link/ [accessed 2008]
• Accessible free of charge.
• The categories are based on the well-known general Dewey classification system.
Example
42
Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at http://www.dmoz.org/
• Accessible free of charge.
• Its contents may also be used in other systems;
this is indeed done in Webbrain.
Example
43
Internet global subject directories:
Librarians' Internet Index
• A hypertext global subject directory can be found at http://www.lii.org/ [accessed 2008]
• Accessible free of charge.
• Librarians select the sites and build the overview.
• The name ‘Internet Index’ may cause some confusion,
because this term often means the index of a full-text searchable database, that is, an Internet search engine.
This is NOT the case here.
Example
44
Internet global subject directories:
Intute
• http://www.intute.ac.uk/ [accessed 2008]
• Since 2006
• Accessible free of charge.
• Offers a collection of hypertext subject directories that focus on academic information sources
• Tutorials are also offered on how to find information in specific subject domains.
Example
45
Internet indexes:
automated search tools
• The basic, fundamental architecture of the WWW does NOT include a system to discover relevant information resources.
• Therefore search systems / engines have been implemented alongside the WWW, mainly by commercial companies.
46
Internet indexes:
automated search tools
• Since the 1990s, the situation has been as follows:
»WWW
= decentralised without central control
= good
»Search through the WWW
= centralised in a few systems,
each one managed by a commercial US company
= NOT (?) good
47
Internet indexes:
Google
• http://www.google.com/
• Available since 2001 with most of its features.
• The most popular search system since 2003.
Example
48
Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly information sources, including journal articles.
• A beta (test) version has been available since November 2004.
• The system is accessible starting from the home page of Google as one of the additional services,
or more directly from http://scholar.google.com/
Example
49
Internet indexes:
Bing
• http://www.bing.com/
• Available since 2009, as a beta (test) version.
• Replaces Microsoft Live Search.
Example
50
Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Since 2001.
• Offers access not only to files in HTML format, but also to PDF files.
Example
51
Internet indexes:
Ask
• Available from: http://www.ask.com/
• Offers a feature that most other search systems lack:
categorization = classification = refinement = clustering of search results,
to help the user cope with ambiguity in the meaning of the search query.
Example
52
Internet indexes:
Yahoo!
• An Internet search system is offered through http://www.yahoo.com/
• This is offered BESIDES the well-established, classical Yahoo! subject directory.
Example
53
Current awareness services focusing on WWW pages: Google Alerts
• Available at http://www.google.com/ and then see the page with additional services
or more directly from http://www.google.com/alerts/
• Since 2004.
• Can notify you in the future of new or changed WWW pages that are relevant to you.
• Is based on the popular Internet index Google.
• Works with search queries that you supply and that are stored on Google’s server computer.
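The principle of such a current awareness service can be sketched as follows. This is an illustration under stated assumptions, not Google's implementation: the "index" is a hypothetical dictionary from URL to page text, matching is a simple all-words test, and a page counts as changed when a SHA-256 fingerprint of its text differs from the one stored at the previous run.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Alert:
    """A stored search query plus what was reported for it last time."""
    query: str
    seen: dict[str, str] = field(default_factory=dict)  # URL -> text fingerprint


def fingerprint(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_alert(alert: Alert, index: dict[str, str]) -> list[str]:
    """Re-run the stored query; report only new or changed matching pages."""
    words = alert.query.lower().split()
    report = []
    for url, text in sorted(index.items()):
        if all(word in text.lower() for word in words):
            fp = fingerprint(text)
            if alert.seen.get(url) != fp:  # never seen, or text has changed
                report.append(url)
            alert.seen[url] = fp
    return report
```

Running `run_alert` a second time on unchanged pages returns an empty list, so the user is only told about what is new or has changed.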
54
Current awareness services focusing on WWW pages: Google Alerts
☺
55
Databases accessible over the Internet:
example: scientificcommons
Example
• http://www.scientificcommons.org/
• Since 2007
• Similar to OAIster:
allows you to search the full texts in scientific open access repositories all over the world.
☺
56
Databases accessible over the Internet: example: Medline
• Medline/PubMed offers bibliographic descriptions of publications on
medicine, free of charge.
Example
☺
57
Internet with WWW and printed books
• In recent years, the Internet with the WWW has become the primary information source for many people.
• However:
»A lot of information is still distributed only in the form of printed books
»The content of old printed books can still be interesting.
»The content of most printed books is (still) not available on the Internet.
58
Public access book databases provided by bookshops
• To find currently available books, the bibliographic databases assembled by big bookshops are interesting.
• Several offer good coverage.
• Many are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if an individual user wants to buy a book.
• Some provide a current awareness service, also free of charge.
• Take into account delivery costs: postage + import tax
59
Book databases accessible free of charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
• This company also offers other, more local versions that offer books in other languages, such as
http://www.amazon.co.uk/
http://www.amazon.fr/
• note: amazon, NOT amazone
• Subject description is poor.
• Take into account delivery costs: postage + import tax
Examples
60
Book databases accessible free of charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.barnesandnoble.com/ or http://www.bn.com/
Examples
61
Search systems for books that are made available by dealers
(diagram: the user searches a book dealer catalogue database, which holds descriptions of books and the real books for sale)
62
Search systems for books that are made available by dealers
(diagram: the user searches book dealer catalogue databases, which hold descriptions of books and the real books for sale)
63
Search systems for books that are made available by dealers
(diagram: the user searches book dealer catalogue databases, which hold descriptions of books and the real books for sale)
64
Free public access search systems for books: multi-dealer databases
» Multi-dealer database = database obtained by merging several existing catalogue / inventory databases, which are managed and updated by individual dealers/shops/sellers.
» Such a system can include from a few to more than 10 000 shops/dealers.
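The merging step behind a multi-dealer database can be sketched in Python. The dealers, ISBNs and prices below are invented, and real listing services certainly differ in detail; the sketch only shows the idea of Method 1: all dealer inventories are merged and indexed before any search (here with a simple inverted index from title words to ISBNs), so a search is a fast lookup and offers for the same book from several dealers appear together.

```python
from collections import defaultdict

# Hypothetical inventories, each managed and updated by its own dealer.
DEALERS = {
    "Dealer A": [{"isbn": "111", "title": "Tamil Nadu history", "price": 12.0}],
    "Dealer B": [{"isbn": "111", "title": "Tamil Nadu history", "price": 9.5},
                 {"isbn": "222", "title": "River ecology", "price": 20.0}],
}


def merge(dealers: dict) -> tuple[dict, dict, dict]:
    """Build the aggregated database once, before any search is made."""
    offers = defaultdict(list)   # ISBN -> [(price, dealer), ...]
    titles = {}                  # ISBN -> title
    index = defaultdict(set)     # title word -> ISBNs (inverted index)
    for dealer, records in dealers.items():
        for rec in records:
            offers[rec["isbn"]].append((rec["price"], dealer))
            titles[rec["isbn"]] = rec["title"]
            for word in rec["title"].lower().split():
                index[word].add(rec["isbn"])
    return offers, titles, index


def search(db: tuple[dict, dict, dict], query: str) -> list[tuple]:
    """Look up all query words; list each matching book with its offers, cheapest first."""
    offers, titles, index = db
    words = query.lower().split()
    if not words:
        return []
    isbns = set.intersection(*(index.get(w, set()) for w in words))
    return [(titles[i], sorted(offers[i])) for i in sorted(isbns)]
```

Because the index is built in advance, a search never contacts the dealers; the other side of this design is that a price change at a dealer only becomes visible after the next merge.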
65
Search systems for books that are made available by dealers
(diagram: the user searches book dealer catalogue databases, which hold descriptions of books and the real books for sale)
66
Search systems for books that are made available by dealers
(diagram: the user searches a multi-dealer database = merged book dealer databases, built from book dealer catalogue databases that hold descriptions of books and the real books for sale)
67
Search systems for books that are made available by dealers
(diagram: the user searches multi-dealer databases = merged book dealer databases, built from book dealer catalogue databases that hold descriptions of books and the real books for sale)
68
Search systems for books that are made available by dealers
(diagram: the user searches multi-dealer databases = merged book dealer databases, built from book dealer catalogue databases that hold descriptions of books and the real books for sale)
69
Free public access multi-dealer book databases: examples
• Amazon Marketplace:
http://www.amazon.com/
[accessed 2009]
• Integrated with the online bookshop Amazon on 1 WWW site:
used books are displayed alongside Amazon’s new books.
• “the world’s biggest online book bazaar”
• Subject description is poor.
• Take into account delivery costs: postage + tax
70
Free public access multi-dealer book databases: examples
• http://www.antiqbook.com/books/ [accessed 2009]
(NOT www.antiqbooks.com)
“ANTIQBOOK unites more than 400 independent booksellers from all over the world. You can use our search pages for a free search of over 3.8 million books, and order them directly from your bookseller. Strong areas in our database are books from European booksellers, many of them specialist antiquarian booksellers.
While ANTIQBOOK takes care that you can order safely from our
booksellers we do not take part in their sales. We just bring you in touch with some of the finest booksellers in the world. You can order your books straight from the source, at their original prices and no hidden costs or markup fees.”
71
Full-text databases of books:
introduction
• Some organisations have scanned the contents of thousands of books,
to make them full-text searchable through the Internet.
72
Full-text databases of books:
Google Books
• http://books.google.com/
• Since 2005
73
Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.
Example
74
Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer bibliographies or full-texts of journal articles in
particular subject domains and published by many publishers.
• Many publishers offer searchable bibliographies, but only of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer free access to bibliographies of articles published in journals from many publishers.
75
Online access databases about journal articles: Ingenta
• Available from: http://www.ingentaconnect.com/
• Ingenta allows you to search a bibliographic database of millions of journal articles,
including titles, authors and, in many cases, abstracts.
• The organisation claims to be
“The most comprehensive collection of academic and professional publications”
Example
76
Online access databases about journal articles: Infotrieve ArticleFinder
• Available from: http://www.infotrieve.com/
• Infotrieve allows you to search free of charge in a bibliographic database of the articles
of more than 20 000 journal titles and conference proceedings,
NOT full-text.
• Payment is required to receive the full text of a document.
Example
77
Online access databases about journal articles: Scirus
• The search interface: http://www.scirus.com
• This is a specialised Internet index that allows you to search for selected scientific information (only) on the WWW.
• This includes the peer-reviewed articles in the journals that are published in ScienceDirect by Elsevier.
• Offered free of charge by Elsevier.
• An article can be downloaded in full-text format only when a fee has been paid to the publisher.
Example
78
Online access databases about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly information sources, including journal articles.
• A beta (= test) version has been available since November 2004.
• The system is accessible starting from the home page of Google as one of the additional services besides the
normal, classical WWW search.
Example
79
Finding images on the Internet:
introduction
+ Several public access search systems are available free of charge to search for
images / pictures (artwork, photos, or both) on the Internet.
+ When you search for images, the results from such a system include not only links to the image files on the
Internet, but also small versions of the images themselves (so-called “thumbnails”).
Example
80
Finding images on the Internet:
screenshot of a Google image search
Example
81
Finding images on the Internet:
examples of search engines
• http://images.google.com/
or through http://www.google.com/
[accessed in 2009]
• The largest database in this category (at least in 2002…2008).
For each result, not only a thumbnail is shown but also its origin, with a readable URL;
this makes it easier to guess the relevance of the document.
Example
82
Finding images on the Internet:
examples of search engines
• http://images.search.yahoo.com/
[accessed in 2007]
or
http://yahoo.com/ or http://www.yahoo.com and then select the ‘Images’ search
Example
83
Finding images on the Internet:
examples of search engines
• http://www.ask.com/
[accessed in 2007]
Ask
Offers no indication of the number of images retrieved, which is a disadvantage when many pictures are found but only a few can be seen at a time.
84
Finding images on the Internet:
examples of search engines
• http://www.bing.com/
• Available since 2009, as a beta (test) version.
• Replacing Microsoft Live Search and Yahoo Search?
Example
85
Method 2: Federated searching through scattered databases
(diagram: users query 1 federated search engine, which passes the query on to the search engines of several separate databases)
86
Databases accessible over the Internet:
example
Example
• http://WorldWideScience.org/
• “A global science gateway connecting you to national and international scientific databases and portals.
Accelerates scientific discovery and progress by providing one-stop searching of global science sources.”
87
Databases accessible over the Internet: example
• http://www.scitopia.org/scitopia/
• Federated searching through various scientific databases
Examples
88
Meta-search systems on a server computer
• http://aftervote.com/
• http://draze.com/
• http://www.all4one.com
• http://www.bytesearch.com
• http://clusty.com/
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://jux2.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/
Example
89
Meta-search systems: server-based:
example: Clusty
• Adds value by analysing the retrieved results / hits / links / WWW documents, in order to
cluster / group / categorize / classify / map these under headings / classes / categories,
to make further selections by the user / searcher easier and faster.
• Can accomplish this on the fly, that is, WITHOUT pre-processing the documents before the search.
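Clusty's own clustering algorithm is not described here, so the Python sketch below uses a deliberately naive substitute: it labels groups with terms that occur in at least 2 of the retrieved snippets (ignoring the query words and a few stopwords) and assigns each snippet to the first matching label. Like the approach on the slide, it works on the retrieved results only, at search time, without pre-processing any documents. The snippets for the ambiguous query "jaguar" are invented.

```python
from collections import Counter

STOPWORDS = {"a", "and", "for", "in", "of", "on", "the", "to"}


def cluster_results(query: str, results: list[str], max_clusters: int = 3) -> dict[str, list[str]]:
    """Group retrieved snippets on the fly under terms they share."""
    skip = STOPWORDS | set(query.lower().split())
    # Distinct words per snippet, in order of appearance, minus query words and stopwords.
    docs = [[w for w in dict.fromkeys(r.lower().split()) if w not in skip] for r in results]
    counts = Counter(w for words in docs for w in words)
    # Candidate labels: terms shared by at least 2 snippets, most frequent first.
    shared = sorted((w for w, n in counts.items() if n >= 2), key=lambda w: (-counts[w], w))
    labels = shared[:max_clusters]
    clusters = {label: [] for label in labels}
    clusters["other"] = []
    for text, words in zip(results, docs):
        label = next((lab for lab in labels if lab in words), "other")
        clusters[label].append(text)
    return clusters
```

For the invented "jaguar" snippets, the car-related results end up under "car", the animal-related ones under "rainforest", and the single guitar result under "other", so the user can pick the intended meaning with 1 click.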
Example
90
Meta-search systems: server-based:
example: Clusty screenshot in 2009
91
Free public access book meta-search systems: types
We can make the following distinction between various types of meta-systems for searching:
1. Database resulting from merging several existing smaller databases = aggregator database
In the case of books:
multi-dealer database = “listing service”
2. Federated search system
= cross-database search system
92
Search systems for books that are made available by dealers
(diagram: the user searches federated book search systems, which query both multi-dealer databases = merged book dealer databases, and individual book dealer catalogue databases; these hold descriptions of books and the real books for sale)
93
Search systems for books that are made available by dealers
(diagram: the user searches federated book search systems, which query both multi-dealer databases = merged book dealer databases, and individual book dealer catalogue databases; these hold descriptions of books and the real books for sale)
94
Free public access federated search systems for books: examples
• http://www.addall.com/ [accessed 2009]
• Covers many book dealer databases and multi-dealer databases, including unique databases that are not covered by competing search systems.
• Can calculate the cost to ship/send a book to you, taking into account your country and currency.
• Searches only new books;
to find used books, a companion system should be used.
This is inconvenient if the user is interested in both types of books.
95
Free public access federated search systems for books: examples
• http://www.bookfinder.com/ [accessed 2007, 2008, 2009]
• BookFinder
• Covers many book dealer databases and multi-dealer databases, including unique databases that are not covered by competing search systems.
• Conveniently, new and used books are searched in 1 action;
the results are presented in 2 columns: new | used.
96
Online Public Access Catalogues:
simultaneous searching: examples
Example
• Simultaneous access to the catalogues of aquatic and marine science libraries, organised by IAMSLIC, using the Z39.50 protocol
97
Federated searching
offered by a university library
• The main goal of such a system is to offer easy and fast access to various information sources, NOT sophisticated and complicated searching.
• The user interface is simple, in line with this aim.
98
Information access
through information technology
Comparison of methods
for efficient information retrieval
99
Comparison of methods
for efficient information retrieval
Criterion | 2. Federated searching | 1. Merging databases
Presearch analysis of all data | - | +
100
Comparison of methods
for efficient information retrieval
Criterion | 2. Federated searching | 1. Merging databases
Presearch analysis of all data | - | +
Size of the coverage | + | -
101
Comparison of methods
for efficient information retrieval
Criterion | 2. Federated searching | 1. Merging databases
Presearch analysis of all data | - | +
Size of the coverage | + | -
Independent of Internet / WWW | - | +
102
Comparison of methods
for efficient information retrieval
- +
Presearch analysis of
all data
+ - Size of the coverage
- +
Independent of Internet /
WWW
+ 2.
Federated searching
- 1. Merging
databases
Up-to-date information
103
Comparison of methods
for efficient information retrieval
Criterion | 2. Federated searching | 1. Merging databases
Presearch analysis of all data | - | +
Size of the coverage | + | -
Independent of Internet / WWW | - | +
Up-to-date information | + | -
Speed of retrieval and display | - | +
104
Comparison of methods
for efficient information retrieval
+ The evolution of information and communication technology makes systems more powerful, easier to implement and use, and cheaper:
+ Merging information sources is driven mainly by the decreasing cost of hard disks and of computer memory in general.
+ Federated searching is driven mainly by the evolution of the Internet.
☺
105
Both methods bring
difficulties / challenges / problems
- In many cases there are differences among the merged sources in the formatting/structuring of their database records in fields.
This hinders
- searching limited to a field
- displaying selected fields only (such as title)
- sorting of the displayed records on the contents of a particular selected field (such as author or date)
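One common remedy is a crosswalk that maps each source's fields onto 1 common set of fields before searching, display or sorting. The Python sketch below is illustrative only: the field tags ("245", "100", "260c" are MARC-like; "Title", "Published" stand for a hypothetical dealer format) and the records are assumptions, and the year is simply taken as the first run of 4 digits in the date field.

```python
import re

# Hypothetical crosswalks: source field name -> common field name.
CROSSWALKS = {
    "library": {"245": "title", "100": "author", "260c": "date"},  # MARC-like tags
    "dealer": {"Title": "title", "Author": "author", "Published": "date"},
}


def to_common(source: str, record: dict) -> dict:
    """Rename fields to the common scheme and derive a sortable year."""
    mapping = CROSSWALKS[source]
    common = {mapping[k]: v for k, v in record.items() if k in mapping}
    match = re.search(r"\d{4}", common.get("date", ""))
    common["year"] = int(match.group()) if match else None
    return common


def field_search(records: list[dict], fld: str, term: str) -> list[dict]:
    """Search limited to 1 field, now possible across all sources."""
    return [r for r in records if term.lower() in r.get(fld, "").lower()]


def sort_by_year(records: list[dict]) -> list[dict]:
    # Records without a recognisable year go last.
    return sorted(records, key=lambda r: (r["year"] is None, r["year"] or 0))
```

After conversion, a library record dated "c2005." and a dealer record dated "March 1998" can be searched by author, displayed by title only, and sorted by year in 1 list.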
106
Both methods bring
difficulties / challenges / problems
- In many cases there are differences among sources in the metadata schemes that are applied in the databases to improve retrieval, such as
»classifications
»taxonomies
»thesaurus systems
»ontologies
- This hinders the exploitation of the added value of such metadata.
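A similar remedy for differing metadata schemes is a concordance that maps the terms of each scheme onto shared concepts. In the Python sketch below, the 2 schemes, their terms and codes, and the mappings are all hypothetical; the `unmapped` helper shows the practical limit: terms that nobody has mapped cannot contribute their added value to a combined search.

```python
# Hypothetical concordance: each scheme's own terms or codes -> 1 shared concept.
CONCORDANCE = {
    "thesaurus_A": {"Aquaculture": "fish farming", "Fish culture": "fish farming",
                    "Water quality": "water quality"},
    "classification_B": {"F-12": "fish farming", "W-07": "water supply"},
}

# Hypothetical records from 2 sources that use different metadata schemes.
RECORDS = [
    {"id": "a1", "scheme": "thesaurus_A", "subjects": ["Aquaculture"]},
    {"id": "a2", "scheme": "thesaurus_A", "subjects": ["Water quality"]},
    {"id": "b1", "scheme": "classification_B", "subjects": ["F-12"]},
    {"id": "b2", "scheme": "classification_B", "subjects": ["O-03"]},  # not mapped
]


def concepts(record: dict) -> set[str]:
    """Translate a record's subject terms into shared concepts."""
    table = CONCORDANCE[record["scheme"]]
    return {table[s] for s in record["subjects"] if s in table}


def subject_search(concept: str) -> list[str]:
    """1 subject search that exploits the metadata of both sources."""
    return [r["id"] for r in RECORDS if concept in concepts(r)]


def unmapped() -> list[tuple[str, str]]:
    """Subject terms that the concordance does not cover yet."""
    return [(r["id"], s) for r in RECORDS for s in r["subjects"]
            if s not in CONCORDANCE[r["scheme"]]]
```

A search for the concept "fish farming" then finds the thesaurus-indexed record and the classified record together, while record "b2" stays invisible to subject searching until its code is added to the concordance.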
107
Information access
through information technology
Conclusions
108
Introduction:
scattering of sources: difficulties
109
Introduction:
scattering of sources: difficulties
110
Methods for efficient information
retrieval:
conclusions
• The examples given show at least that progress in this field is impressive.
☺
111
Questions? Suggestions? Remarks?
112
• You are free to copy, distribute and display this work under the following conditions:
»Attribution:
You must mention the author.
»Noncommercial:
You may not use this work for commercial purposes.
»No Derivative Works:
You may not change, modify, alter, transform, or build upon this work.
• For any reuse or distribution, you must make clear to others the license terms of this work.