Metadata and Metadata Standards

(1)

Metadata and

(2)

Resource and description

Resource

Content, format, …

Access method dependent on format (I can read it if I “know” its language)

Resource description

Independent of the format (I can read “people's comments” about the resource… provided that I know the language in which the comment is written)

(3)

F. Corno, L. Farinetti - Politecnico di Torino 3

Resource and description

description

resource

this resource was created on April 14th, 2009

the title of this resource is “Introduction to the Semantic Web”

the author of this resource is L. Farinetti

this resource is related to computer science, knowledge representation and metadata

the quality of this resource is high, according to F. Corno

this resource is suitable for PhD students

(4)

Resource and description

Resource

Content, format, …

Access method dependent on format (I can read it if I “know” its language)

Standardization (i.e. common language for applications) ???

Practically impossible …

Huge amount of existing information

Hundreds of human languages

Hundreds of computer languages (other word for formats)

(5)

Resource and description

Resource description

Independent of the format (I can read “people's comments” about the resource… provided that I know the language in which the comment is written)

Standardization (i.e. common language for applications) ???

Feasible

Smaller amount of information, possibly new

Solution: define a standard language for writing

(6)

Resource and description

this resource was created on April 14th, 2009

the title of this resource is “Introduction to the Semantic Web”

the author of this resource is L. Farinetti

this resource is related to computer science, knowledge representation and metadata

the quality of this resource is high, according to F. Corno

this resource is suitable for PhD students

Metadata

(7)

Resource and description

description

resource

Date = 2009-04-14

Title = “Introduction to the Semantic Web”

Author = L. Farinetti

Topic = {computer science, knowledge representation, metadata}

Quality = high

Level = PhD students

Rated by F. Corno
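The field = value description above maps naturally onto a key/value record. The following is a minimal sketch, assuming a plain Python dictionary as the container; the nesting of the rating under Quality and the helper name `describe` are illustrative choices, not part of the slides.

```python
from datetime import date

# Hypothetical in-memory form of the description shown above.
record = {
    "Date": date(2009, 4, 14),
    "Title": "Introduction to the Semantic Web",
    "Author": "L. Farinetti",
    "Topic": {"computer science", "knowledge representation", "metadata"},
    # Assumption: the rater is stored together with the quality judgement.
    "Quality": {"value": "high", "rated_by": "F. Corno"},
    "Level": "PhD students",
}


def describe(rec):
    """Render each field as a 'name = value' line, sorted by field name."""
    lines = []
    for name in sorted(rec):
        value = rec[name]
        if isinstance(value, set):
            value = "{" + ", ".join(sorted(value)) + "}"
        elif isinstance(value, dict):
            value = f"{value['value']} (rated by {value['rated_by']})"
        elif isinstance(value, date):
            value = value.isoformat()
        lines.append(f"{name} = {value}")
    return lines


for line in describe(record):
    print(line)
```

The record is independent of the format of the resource it describes: the same structure could annotate a PDF, a video or a web page.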

(8)

Meaningful metadata annotations

Common language for describing resources

Resource description standards

Common language for description field names

Metadata standards

Common language for description field values

Metadata standards + controlled vocabularies

Semantically rich descriptions to support search

(9)

Common language for field names

Field       Alternative names                                                  Problem

Title =     ...

Author =    Creator, Maker, Contributor …                                      Synonymy

Topic =     Topics, Subject, Subjects, Argument, Arguments                     Singular / plural

Level =     Educational level, destination, suitability, …                     Difficult to clearly define concept in a few words

Date =      Date of creation, date of last modification, date of revision, …   Different concepts: need for more details
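The field-name problems listed above (synonymy, singular/plural, ambiguous concepts) can be illustrated with a small normalization step. This is only a sketch under stated assumptions: the alias table, the canonical names (`creator`, `subject`, `title`) and the function names are hypothetical, and a real metadata standard would fix them by specification rather than by a lookup table.

```python
# Hypothetical alias table: every variant maps to one agreed field name.
ALIASES = {
    "title": "title",
    "author": "creator",
    "creator": "creator",
    "maker": "creator",
    "topic": "subject",
    "topics": "subject",
    "subject": "subject",
    "subjects": "subject",
    "argument": "subject",
    "arguments": "subject",
}

# Names that cover several distinct concepts and cannot be mapped blindly.
AMBIGUOUS = {"date", "level"}


def normalize_field(name):
    """Return the agreed field name, or raise ValueError if none applies."""
    key = name.strip().lower()
    if key in AMBIGUOUS:
        raise ValueError(f"{name!r} is ambiguous; a more specific field is needed")
    if key not in ALIASES:
        raise ValueError(f"unknown field name: {name!r}")
    return ALIASES[key]


def normalize_record(record):
    """Rewrite all field names of a record to their agreed form."""
    return {normalize_field(k): v for k, v in record.items()}


print(normalize_record({"Topics": "metadata", "Author": "L. Farinetti"}))
```

Rejecting "Date" instead of guessing reflects the last row of the table: creation, modification and revision dates are different concepts.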

(10)

Common language for field names

Solution: metadata standards

Many standardization bodies are involved

Standards may be general

e.g. Dublin Core (DC)

or may depend on goal, context, domain, …

e.g. educational resources (IEEE LOM), multimedia resources (MPEG-7), images (VRA), people (FOAF, IEEE PAPI), geospatial resources (CSDGM), bibliographical resources (MARC, OAI), cultural heritage resources (CIDOC CRM)

(11)

(12)

Dublin Core

Dublin Core Metadata Element Set (DCMES)

Building blocks to define metadata for the Semantic Web

15 elements, or categories, general enough to describe most of the published resources

(13)

(14)

Example of description using Dublin Core (in RDF)

A paper in the “Ariadne” journal
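The slide's own RDF example did not survive extraction, so the following is a stand-in sketch rather than the original figure. It serializes a Dublin Core description as RDF/XML with Python's standard `xml.etree.ElementTree`. The resource URI `http://example.org/semantic-web-intro` and the function name `dublin_core_rdf` are assumptions; the field values reuse the running example from the earlier slides, not the Ariadne paper.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_rdf(resource_uri, fields):
    """Return an RDF/XML string describing resource_uri with DC elements."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": resource_uri}
    )
    for element, value in fields:
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element!r}")
        ET.SubElement(desc, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dublin_core_rdf(
    "http://example.org/semantic-web-intro",  # hypothetical URI
    [
        ("title", "Introduction to the Semantic Web"),
        ("creator", "L. Farinetti"),
        ("date", "2009-04-14"),
        ("subject", "metadata"),
    ],
)
print(xml_text)
```

Elements may be repeated (for instance several `subject` entries), which is why the fields are passed as a list of pairs rather than a dictionary.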

(15)

Common language for field values

Problems

Value type

Title = “Introduction to the Semantic Web” (type = string)

Date = 2009-04-14 (type = date)

Author = L. Farinetti (type = string)

“standard” format? Laura Farinetti, Farinetti Laura, Farinetti L., …

(16)

Common language for field values

Problems

Value type

Value restrictions?

freedom vs shared understanding

Quality = high: High, medium, low? 1 to 5? any value?

Level = PhD students: any value? list of possible values?

Topic = {computer science, knowledge representation, metadata}: any value?
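The questions above (which type, which restrictions) become concrete once they are written as checks. A minimal sketch, assuming ISO 8601 dates and a three-value scale for Quality; both choices, and the function name `check_value`, are illustrative, since the slides leave the decision open.

```python
from datetime import date

# Hypothetical restrictions: one possible answer to "High, medium, low?".
ALLOWED = {
    "Quality": {"high", "medium", "low"},
}


def check_value(field, value):
    """Return the normalized value for a field, or raise ValueError."""
    if field == "Date":
        # Typed value: only ISO 8601 calendar dates (YYYY-MM-DD) are accepted.
        return date.fromisoformat(value)
    if field in ALLOWED:
        # Restricted value: must belong to the agreed list.
        normalized = value.strip().lower()
        if normalized not in ALLOWED[field]:
            raise ValueError(f"{value!r} is not an allowed value for {field}")
        return normalized
    # Unrestricted value: any string is accepted ("freedom").
    return value


print(check_value("Date", "2009-04-14"))
print(check_value("Quality", "High"))
```

Fields without an entry in `ALLOWED` stay free text, which is the "freedom" side of the trade-off; every entry added moves the field toward shared understanding.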

(17)

Common language for field values

Solution: metadata standards + controlled vocabularies

Metadata standards

Only some, and partially

Controlled vocabularies

(18)

Examples from IEEE LOM

IEEE 1484.12.1-2002 Learning Object Metadata (LOM) Standard

Developed by the IEEE Learning Technology Standards Committee (LTSC)

Standard to describe the “Learning Objects” in order to guarantee their interoperability

(19)

(20)
(21)

(22)

… + controlled vocabularies

A closed list of named subjects, which can be used for classification

Metadata field values are restricted to a list of terms (selected by experts)

Topic = {computer science, informatics, knowledge representation, metadata}

(23)

Knowledge

(24)

Need for knowledge representation

Semantically rich descriptions need an “understanding” of the meaning of a resource and of the domain related to the resource

Disambiguation of terms

Shared agreement on meanings

Description of the domain, with concepts and relations among concepts

(25)

Example: Dublin Core metadata

(26)

Problems

Title usually offers good clues, but

it does not necessarily mention all names of all subjects the user is interested in

it may presuppose knowledge the user does not actually possess

Subject is meant to convey precisely what the document is about, but

much depends on how extensive the set of keywords is, whether all related subjects are mentioned, and whether too many subjects are listed

Metadata does not say much about “how

(27)

(28)

Problems

Authors were free to define their own subject keywords

Results are not “about” topic maps, but “related to” topic maps

If an author forgets to list “topic maps”, his

(29)

Subject-based classification

Any form of content classification that groups objects by their subjects

e.g. the use of keywords to classify papers

Metadata fields describe what the objects are about by listing discrete subjects inside a subject-based classification

Important: difference between describing the objects being classified and describing the subjects used to classify them

Metadata describe objects

Subject-based classification is the approach to describe subjects

(30)

Subject-based classification ...

“On those remote pages it is written that animals are divided into:

a. those that belong to the Emperor

b. embalmed ones

c. those that are trained

d. suckling pigs

e. mermaids

f. fabulous ones

g. stray dogs

h. those that are included in this classification

i. those that tremble as if they were mad

j. innumerable ones

k. those drawn with a very fine camel's hair brush

l. others

m. those that have just broken a flower vase

n. those that resemble flies from a distance"

From The Celestial Emporium of Benevolent Knowledge, Borges

http://en.wikipedia.org/wiki/Celestial_Emporium_of_Benevolent_Knowledge's_Taxonomy

(31)

Subject-based classification techniques

Controlled vocabularies

Taxonomies

Thesauri

Faceted classification

Ontologies

Folksonomies

Others

(32)

Controlled vocabulary

A closed list of named subjects, which can be used for classification

Composed of terms: particular name for a particular concept

similar to keywords

Terms are not concepts

A single term may be the name of one or more concepts

A single concept may have multiple names

(33)

Controlled vocabulary

Goal

Prevent authors from defining terms that are meaningless, too broad or too narrow

Prevent authors from misspelling

Prevent different authors from choosing slightly different forms of the same term

The simplest form of controlled vocabulary is a list of terms (or “pick list”)

Topic = {computer science, knowledge representation, mtadata, RDF, topic navigation maps}

topic maps

(34)

Controlled vocabulary

Reduce ambiguity inherent in normal human languages

Solve the problems of homographs, homonyms, synonyms and polysemes by ensuring

That each concept is described using only one authorized term

That each authorized term in the controlled vocabulary describes only one concept
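The two guarantees above can be sketched as a lookup from entry terms to authorized terms. The vocabulary below is hypothetical: treating "informatics" as an entry term for "computer science" and "topic navigation maps" as an entry term for "topic maps" is an assumption made for illustration, as is the function name `authorize`.

```python
# Hypothetical pick list: entry term -> the single authorized term.
VOCABULARY = {
    "computer science": "computer science",
    "informatics": "computer science",
    "knowledge representation": "knowledge representation",
    "metadata": "metadata",
    "rdf": "RDF",
    "topic maps": "topic maps",
    "topic navigation maps": "topic maps",
}


def authorize(terms):
    """Split author-supplied terms into authorized terms and rejects.

    Returns (accepted, rejected): accepted is a set of authorized terms,
    rejected lists the inputs that are not in the vocabulary at all.
    """
    accepted, rejected = set(), []
    for term in terms:
        key = " ".join(term.lower().split())  # ignore case and extra spaces
        if key in VOCABULARY:
            accepted.add(VOCABULARY[key])
        else:
            rejected.append(term)
    return accepted, rejected


accepted, rejected = authorize(
    ["Informatics", "computer science", "mtadata", "topic navigation maps"]
)
print(sorted(accepted), rejected)
```

The misspelled "mtadata" is rejected instead of silently becoming a new subject, and the two spellings of the same concept collapse into one authorized term.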

(35)

Problems solved

Synonym

(36)

Problems solved

Synonym

different words with identical or very similar meanings

close: “Will you please close that door!” / “The tiger was now so close that I could smell it...”

pupil: student / opening in the iris of the eye

axes: ('æk.səz) plural of axe / ('æk.siz) plural of axis

(37)

Problems solved

Synonym

different words with identical or very similar meanings

student and pupil (noun)

buy and purchase (verb)

sick and ill (adjective)

to get: take (I'll get the drinks), become (she got scared), understand (I get it)

wood: a piece of a tree

(38)

Controlled vocabulary examples

Practically no “real” examples

With very little extra effort: taxonomies and thesauri!

Circuit theory

Electronic circuits

Microwave technology

Electron tubes

Semiconductor materials and devices

Dielectric materials and devices

Magnetic materials and devices

Superconducting materials and devices

Blood

Cord blood

Erythrocyte

Leukocyte

Basophil

Eosinophil

Lymphoblast

Lymphocyte

Monocyte

Neutrophil

(39)

Taxonomy

Subject-based classification that arranges the terms in the controlled vocabulary into a hierarchy

Dates back to Carl Linnæus's work on zoological and botanical classification (18th

(40)

Taxonomy

Allow related terms to be grouped together

It is clear that “topic maps” and “XTM” are related

Easier to classify documents

Easier to choose search keywords

(41)

Taxonomies and metadata

Metadata are stored as usual with the resource

The “subject” will contain only controlled terms

Controlled terms belong to a hierarchy, shared by all papers
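A shared hierarchy lets a search for a broad term also find resources tagged with narrower ones. The sketch below assumes a child-to-parent table; the specific hierarchy (XTM under topic maps under knowledge representation under computer science) is a hypothetical example, and `ancestors` and `matches` are illustrative names.

```python
# Hypothetical taxonomy: each term points to its parent (broader) term.
PARENT = {
    "XTM": "topic maps",
    "topic maps": "knowledge representation",
    "RDF": "knowledge representation",
    "knowledge representation": "computer science",
}


def ancestors(term):
    """Return the chain of broader terms, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def matches(subject_terms, query):
    """True if any subject term equals the query or lies below it."""
    return any(t == query or query in ancestors(t) for t in subject_terms)


print(ancestors("XTM"))
print(matches({"XTM"}, "topic maps"))
```

The resource itself still stores only its controlled subject terms; the hierarchy is kept once and shared by every description.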

(42)

Taxonomy example: INSPEC

(43)

(44)

INSPEC

journal

article

(45)

Taxonomy example: anatomy terms

(46)
(47)

Taxonomy example

(48)

Taxonomy limits

Only two kinds of relationships between terms

Parent = broader term

Child = narrower term

[Figure: taxonomy annotated with open questions. “topic navigation maps”: synonym, no more in use, difference? “XML topic map”: synonym, difference?]

(49)

Thesaurus

Extends taxonomies

subjects are arranged in a hierarchy

Other statements can be made about the subjects

Two ISO standards

ISO 2788 for monolingual thesauri

(50)

Thesaurus relationships

BT – broader term

Refers to a term with wider or less specific meaning

Some systems allow multiple BTs for one term, while others do not

Inverse property: NT – narrower term

A taxonomy only uses BT and NT

SN – scope note

String explaining its meaning within the thesaurus

Useful when the precise meaning of the term is not obvious from context

(51)

Thesaurus relationships

USE

Another term that is to be preferred instead of this term

Implies that the terms are synonymous

Inverse property: UF (used for)

TT – top term

The topmost ancestor of this term

The BT of the BT of the BT...

RT – related term

A term that is related to this term, without being a synonym of it or a broader/narrower term
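The relationships above (BT/NT, SN, USE/UF, TT, RT) can be stored on each term, with the inverse properties derived instead of entered twice. This is a sketch under stated assumptions: the `Term` class, the helper names and the sample entries are hypothetical, and `top_term` follows only the first BT when a term has several.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    name: str
    bt: List[str] = field(default_factory=list)  # broader terms
    rt: List[str] = field(default_factory=list)  # related terms
    use: Optional[str] = None                    # preferred term, if any
    sn: str = ""                                 # scope note


def build(terms) -> Dict[str, Term]:
    """Index the terms by name."""
    return {t.name: t for t in terms}


def narrower(thesaurus, name):
    """NT: inverse of BT, computed from the stored BT links."""
    return sorted(t.name for t in thesaurus.values() if name in t.bt)


def used_for(thesaurus, name):
    """UF: inverse of USE, the non-preferred terms pointing at name."""
    return sorted(t.name for t in thesaurus.values() if t.use == name)


def top_term(thesaurus, name):
    """TT: follow the first BT upwards until a term has none."""
    seen = set()
    while thesaurus[name].bt and name not in seen:
        seen.add(name)  # guard against accidental cycles
        name = thesaurus[name].bt[0]
    return name


# Hypothetical sample entries.
th = build([
    Term("knowledge representation"),
    Term("topic maps", bt=["knowledge representation"], rt=["RDF"],
         sn="Standard for representing subjects and their relations"),
    Term("XTM", bt=["topic maps"]),
    Term("RDF", bt=["knowledge representation"], rt=["topic maps"]),
    Term("topic navigation maps", use="topic maps"),
])

print(narrower(th, "knowledge representation"))
print(used_for(th, "topic maps"))
print(top_term(th, "XTM"))
```

Dropping every field except `bt` reduces the structure to a taxonomy, which matches the statement that a taxonomy only uses BT and NT.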

(52)

Thesaurus example

(53)

Thesaurus example

(54)

Thesaurus example

Library of Congress Subject Headings

(55)

W3C standard: SKOS

(56)

Faceted classification

Proposed by S.R. Ranganathan in the 1930s

Facets are the different axes along which documents can be classified

Each facet contains a number of terms

Usually with a thesaurus organization

Usually a term belongs to one facet only

A document is classified by selecting one term from each facet
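Selecting one term from each facet can be sketched as follows. The facets (`topic`, `level`, `format`), their terms and the document identifiers are hypothetical, chosen only to show the mechanism.

```python
# Hypothetical facets; each facet is an independent axis with its own terms.
FACETS = {
    "topic": {"metadata", "ontologies", "topic maps"},
    "level": {"undergraduate", "graduate", "PhD"},
    "format": {"slides", "paper", "video"},
}


def classify(**choices):
    """Validate a classification: exactly one known term per facet."""
    missing = FACETS.keys() - choices.keys()
    unknown = choices.keys() - FACETS.keys()
    if missing or unknown:
        raise ValueError(f"missing facets: {sorted(missing)}, "
                         f"unknown facets: {sorted(unknown)}")
    for facet, term in choices.items():
        if term not in FACETS[facet]:
            raise ValueError(f"{term!r} is not a term of facet {facet!r}")
    return dict(choices)


def search(documents, **filters):
    """Return ids of documents matching every given facet/term pair."""
    return sorted(
        doc_id for doc_id, facets in documents.items()
        if all(facets.get(f) == t for f, t in filters.items())
    )


docs = {
    "doc1": classify(topic="metadata", level="PhD", format="slides"),
    "doc2": classify(topic="ontologies", level="PhD", format="paper"),
    "doc3": classify(topic="metadata", level="graduate", format="slides"),
}
print(search(docs, level="PhD"))
print(search(docs, topic="metadata", format="slides"))
```

Because each facet is filtered independently, a query can combine any subset of axes, which is where the multi-dimensionality of faceted schemes comes from.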

(57)

Faceted classification example

(58)

Advantages

Multi-dimensionality

Persistence

Scalability

Flexibility

http://freeable.polito.it/

(59)

Ontology

Model for describing the world that consists of a set of types, properties, and relationships

Extends the other subject-based classification approaches

Has open vocabularies

Has open relationship types (not just BT/NT, RT and USE/UF)

(60)

Ontology structure

Concepts

Relationships

Is-a

Other

Instances

(61)

Folksonomy

Internet-mediated social environments

Tags compiled through social tagging

Social tagging

Decentralized practice where individuals and groups create, manage and share tags to annotate digital resources in an online social environment

Generally characterized by non-standard tagging

(62)
(63)

Other subject-based techniques

Synonym rings

Connect together a set of terms as being equivalent for search purposes

Similar to UF/USE relationship of thesauri, but no preferred term

(64)

Other subject-based techniques

Authority file

Similar to a synonym ring, but consists of UF/USE relationships instead of synonym relationships

One term in each synonym ring is indicated as the preferred term for that subject

e.g. Library of Congress Name Authority File
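The difference between a synonym ring and an authority file can be seen in a few lines. In this sketch the rings reuse the synonym pairs from the earlier slides, while the authority entry, including the choice of "Farinetti, Laura" as preferred form, is an assumption made for illustration.

```python
# Synonym rings: sets of equivalent terms, none of them preferred.
SYNONYM_RINGS = [
    {"student", "pupil"},
    {"buy", "purchase"},
    {"sick", "ill"},
]

# Authority file: one preferred form (hypothetical choice) plus its variants.
AUTHORITY = {
    "Farinetti, Laura": {"Laura Farinetti", "Farinetti Laura",
                         "Farinetti L.", "L. Farinetti"},
}


def expand_query(term):
    """Synonym ring: search for every term equivalent to the given one."""
    for ring in SYNONYM_RINGS:
        if term in ring:
            return set(ring)
    return {term}


def preferred(name):
    """Authority file: map any variant to the preferred form, else None."""
    for pref, variants in AUTHORITY.items():
        if name == pref or name in variants:
            return pref
    return None


print(sorted(expand_query("pupil")))
print(preferred("Farinetti L."))
```

A ring only widens a search; an authority file additionally decides which form is stored in the metadata.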

(65)

Subject-based classification summary

Terminology is rarely used in a consistent way

Controlled vocabularies are thesauri, thesauri are ontologies, …

(66)

Subject-based classification

summary

(67)
(68)

Ontologies

An ontology is an explicit description of a domain

concepts

properties and attributes of concepts

constraints on properties and attributes

individuals (often, but not always)

An ontology defines a common vocabulary

(69)

Why develop an ontology?

To share common understanding of the structure of information

among people

among software agents

To enable reuse of domain knowledge

to avoid “re-inventing the wheel”

(70)

Example of ontology engineering

(71)

Example of ontology engineering

1. A piece of furniture consisting of a seat, legs, back, and often arms, designed to accommodate one person.

2. A seat of office, authority, or dignity, such as that of a bishop.

   a. An office or position of authority, such as a professorship.

   b. A person who holds an office or a position of authority, such as one who presides over a meeting or administers a department of instruction at a college; a chairperson.

3. The position of a player in an orchestra.

4. Slang. The electric chair.

5. A seat carried about on poles; a sedan chair.

6. Any of several devices that serve to support or secure, such as a metal block that supports and holds railroad track in position.

(72)

Example of ontology engineering

A piece of furniture consisting of a seat, legs, back, and often arms, designed to accommodate one person.

(73)

Example of ontology engineering

(74)

Example of ontology engineering

Something I can sit on

chair

seat

stool

bench

Something I can sit on

(75)

chair

seat

stool

bench

Something I can sit on

“sittable”

(76)

chair

seat

stool

bench

table

Example of ontology engineering

Something I can sit on

(77)

Example of ontology engineering

Something I can sit on

chair

seat

stool

bench

“for_sitting”

table

“sittable”

(78)

Ontology structure

chair

seat

stool

bench

“for_sitting”

table

“sittable”

(79)

Concepts

Some piece of furniture that can

be used to sit on, either by

design or by its shape.

Furniture to sit on

Shorthand name

Synthetic

title

Definition

(80)

Internationalization

Some piece of furniture that can

be used to sit on, either by

design or by its shape.

Furniture to sit on

Shorthand name

Synthetic title

Definition


(81)

Relationships

[Figure: concept graph. Nodes: chair, seat, stool, bench, “for_sitting”, table, “sittable”, room, classroom, dining room, material, wood. Edges: is_a]

(82)

Relationships

[Figure: concept graph. Nodes: chair, seat, stool, bench, “for_sitting”, table, “sittable”, room, classroom, dining room, material, wood. Edges: is_a, made_of]
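Concepts linked by typed relationships can be written down as subject/predicate/object triples. The node and relationship names below come from the slide, but the extraction does not preserve which nodes each edge connects, so every individual triple here is an assumption made for illustration, as are the function names.

```python
# Hypothetical edges: the slide names the nodes and the relationship types,
# but not which pairs of nodes each edge connects.
TRIPLES = {
    ("chair", "is_a", "for_sitting"),
    ("stool", "is_a", "for_sitting"),
    ("bench", "is_a", "for_sitting"),
    ("for_sitting", "is_a", "sittable"),
    ("table", "is_a", "sittable"),
    ("classroom", "is_a", "room"),
    ("dining room", "is_a", "room"),
    ("wood", "is_a", "material"),
    ("chair", "made_of", "wood"),
    ("table", "made_of", "wood"),
}


def superclasses(concept):
    """All concepts reachable from concept through is_a (transitive)."""
    found, frontier = set(), [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in TRIPLES:
            if s == current and p == "is_a" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def objects(subject, predicate):
    """Direct values of one relationship for one concept."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


print(sorted(superclasses("chair")))
print(objects("chair", "made_of"))
```

Unlike BT/NT in a thesaurus, the predicate is an open slot: adding a new relationship type such as `made_of` needs no change to the data structure.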

(83)

Ontology building blocks

Ontologies generally describe:

Individuals

the basic or “ground level” objects

Classes

sets, collections, or types of objects

Attributes

properties, features, characteristics, or parameters that objects can have and share

Relationships

(84)

License

This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
