Metadata and
Resource and description
Resource
Content, format, …
Access method dependent on format (I can read it if I “know” its language)
Resource description
Independent of the format (I can read “people's comments” about the resource… provided that I know the language in which the comment is written)
F. Corno, L. Farinetti - Politecnico di Torino 3
Resource and description
description
resource
this resource was created on April 14th, 2009
the title of this resource is “Introduction to the Semantic Web”
the author of this resource is L. Farinetti
this resource is related to computer science, knowledge representation and metadata
the quality of this resource is high, according to F. Corno
this resource is suitable for PhD students
Resource and description
Resource
Content, format, …
Access method dependent on format (I can read it if I
“know” its language)
Standardization (i.e. common language for applications)???
Practically impossible …
Huge amount of existing information
Hundreds of human languages
Hundreds of computer languages (another word for formats)
Resource and description
Resource description
Independent of the format (I can read “people's comments” about the resource… provided that I know the language in which the comment is written)
Standardization (i.e. common language for applications)???
Feasible
Smaller amount of information, possibly new
Solution: define a standard language for writing
Resource and description
Metadata
Resource and description
description
resource
Date = 2009-04-14
Title = “Introduction to the Semantic Web”
Author = L. Farinetti
Topic = {computer science, knowledge representation, metadata}
Quality = high
Level = PhD students
Rated by F. Corno
Meaningful metadata annotations
Common language for describing resources: resource description standards
Common language for description field names: metadata standards
Common language for description field values: metadata standards + controlled vocabularies
Semantically rich descriptions to support search
Common language for field names
Title = ...
Author = … (Creator, Maker, Contributor, …). Problem: synonymy
Topic = … (Topics, Subject, Subjects, Argument, Arguments). Problem: singular / plural
Level = … (Educational level, destination, suitability, …). Problem: difficult to clearly define the concept in a few words
Date = … (Date of creation, date of last modification, date of revision, …). Problem: different concepts, need for more details
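The synonymy and singular/plural problems can be made concrete with a small lookup table. The sketch below is a minimal Python illustration, assuming a hand-made mapping (hypothetical, not taken from any standard) from field-name variants to one canonical name; `CANONICAL_FIELDS` and `normalize_record` are names invented for this example.

```python
# Hypothetical mapping from field-name variants to one canonical name.
# The variants come from the examples above; the canonical choices are assumptions.
CANONICAL_FIELDS = {
    "title": "Title",
    "author": "Author",
    "creator": "Author",
    "maker": "Author",
    "topic": "Topic",
    "topics": "Topic",
    "subject": "Topic",
    "subjects": "Topic",
    "argument": "Topic",
    "arguments": "Topic",
}


def normalize_record(record: dict[str, object]) -> dict[str, object]:
    """Rename known field-name variants; keep unknown names unchanged.

    If two variants of the same field are present, the later one wins.
    """
    normalized: dict[str, object] = {}
    for name, value in record.items():
        canonical = CANONICAL_FIELDS.get(name.strip().lower(), name)
        normalized[canonical] = value
    return normalized


record = {"Creator": "L. Farinetti", "Subjects": ["metadata"], "Level": "PhD students"}
print(normalize_record(record))
# {'Author': 'L. Farinetti', 'Topic': ['metadata'], 'Level': 'PhD students'}
```

“Contributor” is deliberately left out: it is not really the same role as the author. A renaming table like this only addresses synonymy; it cannot fix the Level and Date cases, where the problem is an unclear or overloaded concept rather than a different name.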
Common language for field names
Solution: metadata standards
Many standardization bodies are involved
Standards may be general, e.g. Dublin Core (DC), or may depend on goal, context, domain, …
e.g. educational resources (IEEE LOM), multimedia resources (MPEG-7), images (VRA), people (FOAF, IEEE PAPI), geospatial resources (CSDGM), bibliographical resources (MARC, OAI), cultural heritage resources (CIDOC CRM)
Dublin Core
Dublin Core Metadata Element Set (DCMES)
Building blocks to define metadata for the Semantic Web
15 elements, or categories, general enough to describe most of the published resources
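As a small illustration, the Python sketch below lists the 15 element names of DCMES version 1.1 (written in lowercase, as they appear in the DC element namespace) and checks which fields of the running example fall outside them. The `non_dc_fields` helper and the sample record are assumptions made for this example.

```python
# The 15 elements of the Dublin Core Metadata Element Set (DCMES), version 1.1.
DCMES_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def non_dc_fields(record: dict[str, object]) -> set[str]:
    """Return the field names in `record` that are not DCMES elements."""
    return {name for name in record if name not in DCMES_ELEMENTS}


description = {
    "title": "Introduction to the Semantic Web",
    "creator": "L. Farinetti",
    "date": "2009-04-14",
    "subject": ["computer science", "knowledge representation", "metadata"],
    "quality": "high",  # not a DC element
}
print(non_dc_fields(description))  # {'quality'}
```

Fields such as Quality or Level have no direct DCMES counterpart, which is where the domain-specific standards mentioned above come in.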
Example of description using Dublin Core (in RDF)
A paper in the “Ariadne” journal
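To give a feel for what “Dublin Core in RDF” means, here is a minimal Python sketch that writes a few DC fields of the running example as RDF triples in N-Triples syntax, using the DC element namespace `http://purl.org/dc/elements/1.1/` and plain string literals. The resource URI and the `dc_ntriples` helper are hypothetical; a real application would more likely use an RDF library and serialize to RDF/XML or Turtle.

```python
# DC elements namespace (DCMES 1.1).
DC = "http://purl.org/dc/elements/1.1/"


def escape_literal(text: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def dc_ntriples(resource_uri: str, fields: dict[str, list[str]]) -> list[str]:
    """One N-Triples line per (DC element, value) pair, values as plain literals."""
    lines = []
    for element, values in fields.items():
        for value in values:
            lines.append(f'<{resource_uri}> <{DC}{element}> "{escape_literal(value)}" .')
    return lines


# Hypothetical resource URI, for illustration only.
for line in dc_ntriples("http://example.org/papers/intro-sw", {
    "title": ["Introduction to the Semantic Web"],
    "creator": ["L. Farinetti"],
    "date": ["2009-04-14"],
}):
    print(line)
```

Each line is one statement “resource, DC element, value”, which is the same description as before, now in a form that any RDF-aware application can read.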
Common language for field values
Problems
Value type
Title = “Introduction to the Semantic Web” (type = string)
Date = 2009-04-14 (type = date)
Author = L. Farinetti (type = string): a “standard” format? Laura Farinetti, Farinetti Laura, Farinetti L., …
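Declaring a value type can be sketched in Python as below, assuming (for illustration only) that dates must be ISO 8601 strings such as 2009-04-14; `FIELD_TYPES` and `parse_value` are hypothetical names.

```python
from datetime import date

# Assumed expected value type for each field of the running example.
FIELD_TYPES = {"Title": str, "Author": str, "Date": date}


def parse_value(field: str, raw: str) -> object:
    """Convert a raw string to the field's declared type (ISO 8601 for dates)."""
    expected = FIELD_TYPES[field]
    if expected is date:
        return date.fromisoformat(raw)  # raises ValueError for e.g. "14/04/2009"
    return raw


print(parse_value("Date", "2009-04-14"))  # 2009-04-14
```

The type check helps for Date, but not for Author: “Laura Farinetti”, “Farinetti Laura” and “Farinetti L.” are all valid strings, so a type alone cannot tell that they name the same person.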
Common language for field values
Problems
Value type
Value restrictions? Freedom vs shared understanding
Quality = high: High, medium, low? 1 to 5? Any value?
Level = PhD students: any value? A list of possible values?
Topic = {computer science, knowledge representation, metadata}: any value?
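The two restriction options listed for Quality can be sketched in Python as follows. Both options, and the names `Quality` and `check_quality_scale`, are assumptions for illustration, not part of any metadata standard.

```python
from enum import Enum


class Quality(Enum):
    """Option A (assumed): a closed list of three values."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def check_quality_scale(score: int) -> int:
    """Option B (assumed): an integer score from 1 to 5."""
    if not 1 <= score <= 5:
        raise ValueError(f"quality score {score} outside 1..5")
    return score


print(Quality("high"))         # Quality.HIGH
print(check_quality_scale(4))  # 4
```

Either option trades freedom for shared understanding: `Quality("excellent")` is rejected with a `ValueError`, but every application that reads the field knows exactly which values to expect.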
Common language for field values
Solution: metadata standards + controlled vocabularies
Metadata standards: only some, and partially
Controlled vocabularies
Examples from IEEE LOM
IEEE 1484.12.1-2002 Learning Object Metadata (LOM) Standard
Developed by the IEEE Learning Technology Standards Committee (LTSC)
Standard to describe “Learning Objects” in order to guarantee their interoperability
… + controlled vocabularies
A closed list of named subjects, which can be used for classification
Metadata field values are restricted to a list of terms (selected by experts)
Topic = {computer science, informatics, knowledge representation, metadata}
Knowledge
Need for knowledge representation
Semantically rich descriptions need “understanding” of the meaning of a resource and of the domain related to the resource
Disambiguation of terms
Shared agreement on meanings
Description of the domain, with concepts and
relations among concepts
Example: Dublin Core metadata
Problems
Title usually offers good clues, but
it does not necessarily mention the names of all the subjects the user is interested in
it may presuppose knowledge the user does not actually possess
Subject is meant to convey precisely what the document is about, but
much depends on how extensive the set of keywords is, whether all related subjects are mentioned, and whether too many subjects are listed
Metadata does not say much about “how
Problems
Authors were free to define their own subject keywords
Results are not “about” topic maps, but “related to” topic maps
If an author forgets to list “topic maps”, his
Subject-based classification
Any form of content classification that groups objects by their subjects
e.g. the use of keywords to classify papers
Metadata fields describe what the objects are about by listing discrete subjects inside a subject-based classification
Important: the difference between describing the objects being classified and describing the subjects used to classify them
Metadata describe objects
Subject-based classification is the approach to describing subjects
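The difference between describing objects and grouping them by subject can be sketched in Python: the per-paper keyword lists are metadata about the objects, while inverting them groups the objects by subject. The papers and keywords below are hypothetical.

```python
from collections import defaultdict

# Hypothetical papers with author-chosen subject keywords (metadata about the objects).
papers = {
    "paper-1": ["topic maps", "XTM"],
    "paper-2": ["metadata", "topic maps"],
    "paper-3": ["metadata", "RDF"],
}


def group_by_subject(objects: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert object -> subjects into subject -> objects."""
    groups: dict[str, list[str]] = defaultdict(list)
    for obj, subjects in objects.items():
        for subject in subjects:
            groups[subject].append(obj)
    return dict(groups)


print(group_by_subject(papers)["topic maps"])  # ['paper-1', 'paper-2']
```

Nothing in this structure describes the subjects themselves, for instance what “topic maps” means or how it relates to “XTM”; adding that knowledge is what the classification techniques that follow are about.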
Subject-based classification ...
“On those remote pages it is written that animals are divided into:
a. those that belong to the Emperor
b. embalmed ones
c. those that are trained
d. suckling pigs
e. mermaids
f. fabulous ones
g. stray dogs
h. those that are included in this classification
i. those that tremble as if they were mad
j. innumerable ones
k. those drawn with a very fine camel's hair brush
l. others
m. those that have just broken a flower vase
n. those that resemble flies from a distance”
From The Celestial Emporium of Benevolent Knowledge, Borges
http://en.wikipedia.org/wiki/Celestial_Emporium_of_Benevolent_Knowledge's_Taxonomy
Subject-based classification techniques
Controlled vocabularies
Taxonomies
Thesauri
Faceted classification
Ontologies
Folksonomies
Others
Controlled vocabulary
A closed list of named subjects, which can be used for classification
Composed of terms: a particular name for a particular concept
similar to keywords
Terms are not concepts
A single term may be the name of one or more
concepts
A single concept may have multiple names
Controlled vocabulary
Goal
Prevent authors from defining terms that are meaningless, too broad or too narrow
Prevent authors from misspelling terms
Prevent different authors from choosing slightly different forms of the same term
The simplest form of controlled vocabulary is a list of terms (or “pick list”)
Topic = {computer science, knowledge representation, mtadata, RDF, topic navigation maps}
topic maps
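A pick list can be enforced with a simple membership test. The Python sketch below does that and, as an extra that a plain pick list does not provide, uses the standard library's `difflib.get_close_matches` to suggest the closest allowed term for a rejected one. The lowercase pick list and the `check_terms` helper are assumptions for this example.

```python
import difflib
from typing import Optional

# Pick list from the example above, lowercased for matching.
PICK_LIST = ["computer science", "knowledge representation", "metadata", "rdf", "topic maps"]


def check_terms(terms: list[str]) -> dict[str, Optional[str]]:
    """Map each rejected term to the closest pick-list term, or None if nothing is close."""
    problems: dict[str, Optional[str]] = {}
    for term in terms:
        key = term.strip().lower()
        if key in PICK_LIST:
            continue  # accepted as is
        matches = difflib.get_close_matches(key, PICK_LIST, n=1)
        problems[term] = matches[0] if matches else None
    return problems


print(check_terms(["computer science", "mtadata", "RDF", "topic navigation maps"]))
# {'mtadata': 'metadata', 'topic navigation maps': 'topic maps'}
```

Lowercasing covers the “slightly different forms” case (RDF vs rdf); the similarity-based suggestion only proposes a term, and the author or an expert still has to confirm it.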
Controlled vocabulary
Reduce the ambiguity inherent in normal human languages
Solve the problems of homographs, homonyms, synonyms and polysemes by ensuring
that each concept is described using only one authorized term
that each authorized term in the controlled vocabulary describes only one concept
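The two conditions above can be checked mechanically on a list of (term, concept) pairs. In this Python sketch the concept identifiers and the qualified terms “pupil (student)” and “pupil (eye)” are hypothetical; `violations` is a name made up for the example.

```python
from collections import defaultdict

# Hypothetical vocabulary entries: (authorized term, concept identifier).
entries = [
    ("pupil (student)", "C1"),
    ("pupil (eye)", "C2"),
    ("student", "C1"),  # second term for C1: violates "one term per concept"
]


def violations(pairs: list[tuple[str, str]]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Return (concepts with several terms, terms with several concepts)."""
    terms_of: dict[str, set[str]] = defaultdict(set)
    concepts_of: dict[str, set[str]] = defaultdict(set)
    for term, concept in pairs:
        terms_of[concept].add(term)
        concepts_of[term].add(concept)
    many_terms = {c: t for c, t in terms_of.items() if len(t) > 1}
    many_concepts = {t: c for t, c in concepts_of.items() if len(c) > 1}
    return many_terms, many_concepts


print(violations(entries))
```

Qualifying the homonym (“pupil (student)” vs “pupil (eye)”) satisfies the second condition. For the first, “student” would stay in the vocabulary only as a non-preferred entry pointing to the authorized term, which is what thesauri formalize with USE/UF.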
Problems solved
Synonym
different words with identical or very similar meanings
close
“Will you please close that door!”
“The tiger was now so close that I could smell it...”
pupil
student
opening in the iris of the eye
axes
('æk.səz) plural of axe
('æk.siz) plural of axis
Problems solved
Synonym
different words with identical or very similar meanings
student and pupil (noun)
buy and purchase (verb)
sick and ill (adjective)
to get
take (I'll get the drinks)
become (she got scared)
understand (I get it)
wood
a piece of a tree
Controlled vocabulary examples
Practically no “real” examples
With very little extra effort: taxonomies and thesauri!
Circuit theory
Electronic circuits
Microwave technology
Electron tubes
Semiconductor materials and devices
Dielectric materials and devices
Magnetic materials and devices
Superconducting materials and devices
…
Blood
Cord blood
Erythrocyte
Leukocyte
Basophil
Eosinophil
Lymphoblast
Lymphocyte
Monocyte
Neutrophil
…
Taxonomy
Subject-based classification that arranges the terms in the controlled vocabulary into a hierarchy
Dates back to Carl Linnæus's work on zoological and botanical classification (18th century)
Taxonomy
Allows related terms to be grouped together
It is clear that “topic maps” and “XTM” are related
Easier to classify documents
Easier to choose search keywords
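One practical payoff of the hierarchy is query expansion: a search keyword can be widened to all of its narrower terms. The Python sketch below assumes a small, hypothetical taxonomy stored as child → parent links; only the relation between “topic maps” and “XTM” echoes the slide, and placing XTM under topic maps is itself an assumption.

```python
# Hypothetical fragment of a subject taxonomy: child term -> parent (broader) term.
PARENT = {
    "knowledge representation": "computer science",
    "metadata": "knowledge representation",
    "topic maps": "knowledge representation",
    "XTM": "topic maps",
}


def narrower_terms(term: str) -> set[str]:
    """All descendants of `term` (children, grandchildren, ...)."""
    children: dict[str, list[str]] = {}
    for child, parent in PARENT.items():
        children.setdefault(parent, []).append(child)
    result: set[str] = set()
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


# Searching for "topic maps" can also match documents filed under "XTM".
print(narrower_terms("topic maps"))  # {'XTM'}
```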
Taxonomies and metadata
Metadata are stored as usual with the resource
The “subject” will contain only controlled terms
Controlled terms belong to a hierarchy, shared by all papers
Taxonomy example: INSPEC
INSPEC journal article
Taxonomy example: anatomy terms
Taxonomy example
Taxonomy limits
Only two kinds of relationships between terms
Parent = broader term
Child = narrower term
Questions a taxonomy cannot express:
“topic navigation maps”: a synonym? no longer in use? what is the difference?
“XML topic map”: a synonym? what is the difference?
Thesaurus
Extends taxonomies
subjects are arranged in a hierarchy
Other statements can be made about the subjects
Two ISO standards
ISO 2788 for monolingual thesauri
ISO 5964 for multilingual thesauri
Thesaurus relationships
BT – broader term
Refers to a term with wider or less specific meaning
Some systems allow multiple BTs for one term, while others do not
Inverse property: NT – narrower term
A taxonomy only uses BT and NT
SN – scope note
String explaining the term's meaning within the thesaurus
Useful when the precise meaning of the term is not obvious from context
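Because NT is the inverse of BT, a thesaurus only needs to store one direction. The Python sketch below stores BT as lists (so a term may have several broader terms, for systems that allow it) and derives NT; the terms and the scope note text are hypothetical.

```python
from collections import defaultdict

# Hypothetical thesaurus: term -> broader terms (BT).
BT: dict[str, list[str]] = {
    "knowledge representation": ["computer science"],
    "metadata": ["knowledge representation"],
    "topic maps": ["knowledge representation"],
}
# Hypothetical scope notes (SN), only where the meaning is not obvious.
SN: dict[str, str] = {
    "metadata": "Descriptions of a resource, independent of its format",
}


def narrower(bt: dict[str, list[str]]) -> dict[str, list[str]]:
    """Derive NT as the inverse of BT: if A has BT B, then B has NT A."""
    nt: dict[str, list[str]] = defaultdict(list)
    for term, broader_terms in bt.items():
        for b in broader_terms:
            nt[b].append(term)
    return dict(nt)


print(narrower(BT)["knowledge representation"])  # ['metadata', 'topic maps']
print(SN.get("metadata"))
```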
Thesaurus relationships
USE
Another term that is to be preferred instead of this term
Implies that the terms are synonymous
Inverse property: UF (used for)
TT – top term
The topmost ancestor of this term
The BT of the BT of the BT...
RT – related term
A term that is related to this term, without being a synonym of it or a broader/narrower term
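USE/UF and TT can be sketched in a few lines of Python. The sketch assumes a hypothetical thesaurus fragment with at most one BT per term and no cycles; treating “topic navigation maps” and “XML topic map” as non-preferred synonyms is also an assumption.

```python
# Hypothetical thesaurus fragment (one BT per term, no cycles).
BT = {"XTM": "topic maps", "topic maps": "knowledge representation",
      "knowledge representation": "computer science"}
USE = {"topic navigation maps": "topic maps", "XML topic map": "XTM"}  # non-preferred -> preferred


def preferred(term: str) -> str:
    """Follow USE from a non-preferred term to the preferred one."""
    return USE.get(term, term)


def used_for(term: str) -> list[str]:
    """UF is the inverse of USE: the non-preferred terms that point to `term`."""
    return [t for t, p in USE.items() if p == term]


def top_term(term: str) -> str:
    """TT: follow BT upwards until reaching a term with no BT."""
    while term in BT:
        term = BT[term]
    return term


print(preferred("XML topic map"))            # XTM
print(top_term(preferred("XML topic map")))  # computer science
print(used_for("topic maps"))                # ['topic navigation maps']
```

RT is not modeled here; it would be one more mapping, usually treated as symmetric.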
Thesaurus example
Thesaurus example
Library of Congress Subject Headings
W3C standard: SKOS
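A rough correspondence between thesaurus relationships and SKOS properties is USE/UF → `skos:prefLabel`/`skos:altLabel`, BT → `skos:broader`, RT → `skos:related`, SN → `skos:scopeNote`. The Python sketch below writes one concept in N-Triples following that mapping; the concept namespace `http://example.org/thesaurus/` and the `concept_triples` helper are hypothetical, and labels are not escaped, so they must not contain quotes.

```python
from typing import Optional

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/thesaurus/"  # hypothetical namespace for concept URIs


def concept_triples(cid: str, pref: str, alts: list[str], broader: list[str],
                    related: list[str], note: Optional[str] = None) -> list[str]:
    """N-Triples for one concept: USE/UF -> prefLabel/altLabel, BT -> broader,
    RT -> related, SN -> scopeNote (English labels, no escaping)."""
    s = f"<{BASE}{cid}>"
    triples = [f"{s} <{RDF_TYPE}> <{SKOS}Concept> .",
               f'{s} <{SKOS}prefLabel> "{pref}"@en .']
    triples += [f'{s} <{SKOS}altLabel> "{alt}"@en .' for alt in alts]
    triples += [f"{s} <{SKOS}broader> <{BASE}{b}> ." for b in broader]
    triples += [f"{s} <{SKOS}related> <{BASE}{r}> ." for r in related]
    if note is not None:
        triples.append(f'{s} <{SKOS}scopeNote> "{note}"@en .')
    return triples


for t in concept_triples("topic-maps", "topic maps", ["topic navigation maps"],
                         broader=["knowledge-representation"], related=["rdf"]):
    print(t)
```

NT does not need to be written explicitly: SKOS defines `skos:narrower` as the inverse of `skos:broader`, and `skos:related` as symmetric.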
Faceted classification
Proposed by S.R. Ranganathan in the '30s
Facets are the different axes along which documents can be classified
Each facet contains a number of terms
Usually with a thesaurus organization
Usually a term belongs to one facet only
A document is classified by selecting one term
from each facet
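The rule “one term from each facet” and the multi-dimensional search it enables can be sketched in Python as follows. The facet names, their terms and the document identifiers are all invented for illustration.

```python
# Hypothetical facets, each with its own list of terms.
FACETS = {
    "subject": {"computer science", "electronics"},
    "resource type": {"lecture slides", "journal article", "book"},
    "level": {"undergraduate", "PhD students"},
}


def classify(choice: dict[str, str]) -> dict[str, str]:
    """Accept a classification only if it picks exactly one valid term from every facet."""
    if set(choice) != set(FACETS):
        raise ValueError(f"expected one term for each facet: {sorted(FACETS)}")
    for facet, term in choice.items():
        if term not in FACETS[facet]:
            raise ValueError(f"{term!r} is not a term of facet {facet!r}")
    return choice


def search(docs: dict[str, dict[str, str]], wanted: dict[str, str]) -> list[str]:
    """Filter documents on any combination of facets."""
    return [name for name, facets in docs.items()
            if all(facets.get(f) == t for f, t in wanted.items())]


docs = {
    "intro-sw": classify({"subject": "computer science", "resource type": "lecture slides",
                          "level": "PhD students"}),
    "paper-42": classify({"subject": "computer science", "resource type": "journal article",
                          "level": "PhD students"}),
}
print(search(docs, {"level": "PhD students", "resource type": "lecture slides"}))  # ['intro-sw']
```

Adding a new facet extends the scheme without reclassifying along the existing axes, which is one way to read the scalability and flexibility advantages.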
Faceted classification example
Advantages
Multi-dimensionality
Persistence
Scalability
Flexibility
http://freeable.polito.it/
Ontology
Model for describing the world that consists of a set of types, properties, and relationships
Extends the other subject-based classification approaches
Has open vocabularies
Has open relationship types (not just BT/NT, RT and USE/UF)
Ontology structure
Concepts
Relationships
Is-a
Other
Instances
Folksonomy
Internet-mediated social environments
Tags compiled through social tagging
Social tagging
Decentralized practice where individuals and groups create, manage and share tags to annotate digital resources in an online social environment
Generally characterized by non-standard tagging
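Non-standard tagging is easy to observe by counting tags. In this Python sketch the tags are invented, and the clean-up rule (lowercase, drop spaces, hyphens and underscores) is one possible assumption, not a standard procedure.

```python
from collections import Counter

# Hypothetical tags freely assigned by different users to the same resource.
tags = ["semanticweb", "SemanticWeb", "semantic-web", "rdf", "RDF", "toread", "semantic web"]

raw = Counter(tags)
# Assumed clean-up: lowercase and drop spaces, hyphens and underscores.
normalized = Counter(t.lower().replace("-", "").replace("_", "").replace(" ", "") for t in tags)

print(len(raw), len(normalized))  # 7 3
print(normalized.most_common(1))  # [('semanticweb', 4)]
```

Normalization merges spelling variants but says nothing about meaning: a tag such as “toread” describes what a user plans to do, not what the resource is about.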
Other subject-based techniques
Synonym rings
Connect together a set of terms as being equivalent for search purposes
Similar to the UF/USE relationship of thesauri, but with no preferred term
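A synonym ring is essentially a set used for query expansion. The Python sketch below reuses the synonym pairs from the earlier examples as hypothetical rings; note that no member is marked as preferred.

```python
# Hypothetical synonym rings: sets of terms treated as equivalent for search.
RINGS = [
    {"student", "pupil"},
    {"buy", "purchase"},
    {"sick", "ill"},
]


def expand_query(term: str) -> set[str]:
    """Return every term in the same ring(s) as `term`; no term is preferred."""
    expanded = {term}
    for ring in RINGS:
        if term in ring:
            expanded |= ring
    return expanded


print(sorted(expand_query("pupil")))  # ['pupil', 'student']
```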
Other subject-based techniques
Authority file
Similar to a synonym ring, but consists of UF/USE relationships instead of synonym relationships
One term in each synonym ring is indicated as the preferred term for that subject
e.g. Library of Congress Name Authority File
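An authority file can be sketched as a mapping from every variant to one preferred form, which also answers the earlier question about author name formats. In this Python sketch the choice of “Farinetti, Laura” as the preferred heading is purely an assumption, and it is not claimed to be how any real authority file records the name.

```python
# Hypothetical authority file: every variant points (USE) to one preferred heading.
AUTHORITY = {
    "Farinetti, Laura": "Farinetti, Laura",  # preferred form (assumed)
    "Laura Farinetti": "Farinetti, Laura",
    "Farinetti L.": "Farinetti, Laura",
    "L. Farinetti": "Farinetti, Laura",
}


def preferred_heading(name: str) -> str:
    """Return the authorized form; raises KeyError for names not in the file."""
    return AUTHORITY[name]


def variants(heading: str) -> list[str]:
    """UF list: the non-preferred forms that point to `heading`."""
    return [v for v, p in AUTHORITY.items() if p == heading and v != heading]


authors = ["Laura Farinetti", "L. Farinetti", "Farinetti L."]
print({preferred_heading(a) for a in authors})  # {'Farinetti, Laura'}
```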
Subject-based classification summary
Terminology is rarely used in a consistent way
Controlled vocabularies are thesauri, thesauri are ontologies, …
Subject-based classification summary
Ontologies
An ontology is an explicit description of a domain
concepts
properties and attributes of concepts
constraints on properties and attributes
individuals (often, but not always)
An ontology defines a common vocabulary
Why develop an ontology?
To share common understanding of the structure of information
among people
among software agents
To enable reuse of domain knowledge
to avoid “re-inventing the wheel”
Example of ontology engineering
Example of ontology engineering
1. A piece of furniture consisting of a seat, legs, back, and often arms, designed to accommodate one person.
2. A seat of office, authority, or dignity, such as that of a bishop.
a. An office or position of authority, such as a professorship.
b. A person who holds an office or a position of authority, such as one who presides over a meeting or administers a department of instruction at a college; a chairperson.
3. The position of a player in an orchestra.
4. Slang. The electric chair.
5. A seat carried about on poles; a sedan chair.
6. Any of several devices that serve to support or secure, such as a metal block that supports and holds railroad track in position.
Example of ontology engineering
A piece of furniture consisting of a seat, legs, back, and often arms, designed to accommodate one person.
Example of ontology engineering
Something I can sit on: chair, seat, stool, bench
Adding table and the concept “sittable”
Adding the concept “for_sitting”
Ontology structure
(Figure: chair, seat, stool, bench and table, with the concepts “for_sitting” and “sittable”)
Concepts
Shorthand name
Synthetic title: “Furniture to sit on”
Definition: “Some piece of furniture that can be used to sit on, either by design or by its shape.”
Internationalization
(Figure: the synthetic title and the definition, repeated once per language)
Relationships
(Figure: is_a links among chair, seat, stool, bench, table, “for_sitting” and “sittable”; among room, classroom and dining room; and between material and wood. A later build of the same slide adds two made_of links.)
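The difference between is_a and the other, open relationship types can be sketched with a handful of triples. The edges below are one possible reading of the diagram, not necessarily the authors' exact links; in particular, the two made_of links are hypothetical.

```python
# One possible reading of the diagram (assumed, not necessarily the authors' exact edges).
TRIPLES = {
    ("chair", "is_a", "for_sitting"),
    ("seat", "is_a", "for_sitting"),
    ("stool", "is_a", "for_sitting"),
    ("bench", "is_a", "for_sitting"),
    ("for_sitting", "is_a", "sittable"),
    ("table", "is_a", "sittable"),
    ("wood", "is_a", "material"),
    ("classroom", "is_a", "room"),
    ("dining room", "is_a", "room"),
    ("chair", "made_of", "wood"),  # hypothetical made_of links
    ("table", "made_of", "wood"),
}


def ancestors(concept: str) -> set[str]:
    """All concepts reachable via is_a, treated as transitive."""
    found: set[str] = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in TRIPLES:
            if s == current and p == "is_a" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def related(concept: str, relation: str) -> set[str]:
    """Direct targets of any other relationship type, e.g. made_of (not transitive here)."""
    return {o for s, p, o in TRIPLES if s == concept and p == relation}


print(sorted(ancestors("stool")))   # ['for_sitting', 'sittable']
print(related("table", "made_of"))  # {'wood'}
```

Under this reading, a table is “sittable” without being “for_sitting”, matching the definition “either by design or by its shape”.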