Taxonomy
Enterprise Content Management (ECM)
AIIM ECM Certificate programme
ECM Strategy, ECM Practitioner, ECM Specialist, Case Study
ECM Practitioner Course Outline
Sections: Foundations, Tools & Instruments, Futures
1. Introduction
2. Technologies & Functionality
3. Information Architecture
4. Create & Capture
5. Metadata
6. Taxonomy
7. Security & Control
8. Process & Automation
9. Findability
10. Delivery & Presentation
11. Trends & Directions
© AIIM | All rights reserved
Agenda
Defining taxonomies and classification
Subject-based classification
Taxonomies
Folksonomies
Ontologies
Thesaurus and Semantic networks
Business case for classification
Standards and guidelines
Classification challenges
Defining taxonomy (1)
Taxonomy is the science of classifying information
A taxonomy is a law for classifying information
Taxonomies are nearly ubiquitous, but poorly understood
Source: Dictionary.com
Defining taxonomy (2)
“In recent years, the business world has fallen in love with the term ‘taxonomies’. We use it specifically to refer to a hierarchical arrangement of categories within the user interface of a website or intranet.”
Source: Information Architecture for the World Wide Web (Louis Rosenfeld and Peter Morville, 2002)
AIIM website
[Screenshot with numbered callouts 1 to 4]
Understanding taxonomies
A taxonomy is a classification scheme
Such as the way that an individual classifies the contents of their e-mail inbox, a personal CD collection, or the contents of an iPod
A taxonomy is a knowledge map
Reflects how its owner conceives a given body of content (a knowledge domain), for purposes of browsing, navigating, discovering, and sharing that information
A taxonomy is semantic
Indicating the relationships between concepts, such as the relationship between a car and a steering wheel, in that the steering wheel is a “part of” a car
Source: Organising Knowledge (Patrick Lambe, 2007)
Category perspectives
Business function
Geo-political
Company focus vs. industry focus
Product or service
Business issues, conditions, events
Type/Source of content
Representations of taxonomies (1)
Lists
Trees
Hierarchies
Polyhierarchies
Matrices
Facets
System Maps
Source: Organising Knowledge (Patrick Lambe, 2007)
Representations of taxonomies (2)
Lists
Simple collection of related things. The relationship is defined by the purpose of the list.
Good when the domain is simple and the amount of content is small. Basic building blocks of all other taxonomical representations
Examples: Country codes, types of diseases
Source: Organising Knowledge (Patrick Lambe, 2007)
Source: Wikipedia
Representations of taxonomies (3)
Trees
Represents a transition from general to more specific relationships or whole to part.
Good when a list gets to be too long, and “naturally” breaks into subcategories.
Examples: Yellow pages (phone directories)
Source: Organising Knowledge (Patrick Lambe, 2007)
Source: CoreFiling.com
Representations of taxonomies (4)
Hierarchies
A specific tree structure that has inclusiveness, consistency, and maintains the same “type” of relationship at each level. The “child” inherits all of the characteristics of the “parent” and each child can only belong in one place in the taxonomy
Works best with mature, formal, logical schemes
Examples: Military rank, biological classification, family genealogy
Source: Organising Knowledge (Patrick Lambe, 2007)
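The two defining properties of a hierarchy (a child inherits its parent's characteristics, and each child sits in exactly one place) can be sketched in a few lines. This is only an illustration, not part of the course material: the `Category` class, its method names and the sample traits are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Category:
    """One node in a strict hierarchy: exactly one parent, inherited traits."""
    name: str
    traits: set = field(default_factory=set)
    parent: Optional["Category"] = None

    def add_child(self, name, traits=()):
        # A child is created under exactly one parent, so it can only
        # ever appear in one place in the taxonomy.
        return Category(name, set(traits), parent=self)

    def all_traits(self):
        # Own traits plus everything inherited from the ancestors.
        inherited = self.parent.all_traits() if self.parent else set()
        return inherited | self.traits

    def path(self):
        # Names from the root down to this node.
        prefix = self.parent.path() if self.parent else []
        return prefix + [self.name]


# Hypothetical biological example (names and traits are illustrative only).
animal = Category("Animal", {"multicellular"})
bird = animal.add_child("Bird", {"feathers"})
crow = bird.add_child("Crow", {"black plumage"})
```

Here `crow.path()` walks back to the root, and `crow.all_traits()` collects the traits of every ancestor as well as its own.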
Representations of taxonomies (5)
Polyhierarchies
Used when an item belongs in more than one place in the real world, and multiple organising principles are required. Provides “virtual linking” between hierarchies.
Example: a single collection of content concerning diseases can be organised/taxonomised via affected body part and causes
Source: Organising Knowledge (Patrick Lambe, 2007)
Source: Rosenfeld, Morville (2006)
Representations of taxonomies (6)
Matrices
Provides a 2 or 3-dimensional cross linking of taxonomies, and an ability to provide differing views into the same body of content.
Example: The same content could be located based on project manager, project initiation, and/or affected standards
Source: Organising Knowledge (Patrick Lambe, 2007)
Representations of taxonomies (6)
Facets
A multi-dimensional taxonomy composed of multiple tags, each tag representing an individual taxonomy, thus the content is categorised in multiple ways, within a single interface.
Example: selecting wines based on characteristics such as type, price, varietals, regions, and appellations.
Source: Organising Knowledge (Patrick Lambe, 2007) Source: wine.com
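A faceted scheme can be sketched as a list of items, each carrying one value per facet, plus a filter that narrows on any combination of facets. The catalogue, the facet names and both function names below are invented for illustration; they do not describe how wine.com or any product works.

```python
# Hypothetical catalogue; every value below is made up for illustration.
WINES = [
    {"name": "Wine A", "type": "red", "region": "Bordeaux", "price_band": "premium"},
    {"name": "Wine B", "type": "white", "region": "Marlborough", "price_band": "budget"},
    {"name": "Wine C", "type": "red", "region": "Rioja", "price_band": "budget"},
]


def facet_filter(items, **selected):
    """Keep items whose value matches every selected facet."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count items per value of one facet, as a faceted interface might show."""
    counts = {}
    for item in items:
        counts[item[facet]] = counts.get(item[facet], 0) + 1
    return counts
```

For example, `facet_filter(WINES, type="red", price_band="budget")` narrows on two facets at once, while `facet_counts(WINES, "type")` gives the per-value totals a user would browse.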
Representations of taxonomies (7)
System maps
Visual representations of a domain of knowledge
Labels represent taxonomy categories
Example: A collection of medical content relating to the human nervous system is accessible via a diagram of the human body. Each component of that system is illustrated in context, and labelled appropriately.
Source: Organising Knowledge (Patrick Lambe, 2007)
Defining classification
Classification:
“The systematic identification and arrangement of business activities and/or records into categories according to logically structured conventions, methods and procedural rules represented in a classification system”
Source: ISO 15489
What is classification?
In simple terms, it’s just grouping information together
Common examples of classification:
Cars
by make, model, performance
Food
tinned/fresh, type (meat, vegetable, grain)
TV programmes
comedy, thriller, quiz show
Clothes
adult/child, expensive/cheap, winter/summer
Dewey Decimal system
Used to classify information throughout the western world
Very Euro-centric
000 General & Bibliography
100 Philosophy & Psychology
200 Religion
300 Social Science
400 Languages & Linguistics
500 Sciences
600 Technology
700 Arts
800 Literature
900 Geography & History
Chinese library classification
43,600 categories. Constantly expanding to meet the needs of a rapidly changing nation
Political considerations drive some organisation
1) Marxism, Leninism, Maoism & Deng Xiaoping Theory
2) Philosophy and Religion
3) Social Sciences
4) Politics and Law
5) Military Science
6) Economics
7) Culture, Science, Education, and Sports
8) Languages and Linguistics
9) Literature
10) Art
11) History and Geography
12) Natural Science
13) Mathematics, Physics and Chemistry
14) Astronomy and Geoscience
15) Life Sciences
16) Medicine and Health Sciences
17) Agricultural Sciences
18) Industrial Technology
19) Transportation
20) Aviation and Aerospace
21) Environmental Science
US Library of Congress
Used to categorise books published in the United States
Expanded categories emphasise USA-specific history and interests
A) General Works
B) Philosophy, Psychology, Religion
C) History: Auxiliary Sciences
D) History: General and Old World
E) History: United States
F) History: Western Hemisphere
G) Geography, Anthropology, Recreation
H) Social Science
J) Political Science
K) Law
L) Education
M) Music
N) Fine Arts
P) Literature & Languages
Q) Science
R) Medicine
S) Agriculture
T) Technology
U) Military Science
V) Naval Science
Z) Bibliography & Library Science
What are classification schemes?
A classification scheme…
Is the structure an organisation uses for organising, accessing/retrieving, storing and managing its information
Can be used to classify records
A Business Classification Scheme (BCS) is a classification scheme based on an organisation’s business functions and activities
These are predominantly used for Records Management purposes
Classification schemes: Types
[Diagram: deployment options (keyword / thesaurus-based; hierarchical / tree style) set against principles of classification (functional; subject / thematic; organisational), with one combination marked “generally preferred”]
Hierarchical / tree style BCSs: Key
C = CLASS
F = FILE
R = RECORD
D = DOCUMENT
Schematic example: Hierarchical / tree BCS
[Diagram: a tree of classes (C) with files (F) at the lowest level]
Populated example: Hierarchical / tree BCS
Innovation, Knowledge Transfer and Technical Infrastructure (super function)
  Innovation (function)
  Knowledge Transfer (function)
  Technical Infrastructure (function)
    Standards and Accreditation (sub function)
      Policy Management (activity)
      Infrastructure Support (activity)
    National Measurement System (sub function)
      Policy Management (activity)
    Civil Space Activity (sub function)
      Space Regulation (activity)
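A hierarchical BCS of this kind maps naturally onto a nested structure from which full class paths can be generated. The sketch below is an illustration only. It assumes that the three sub functions sit under Technical Infrastructure, which follows the listing order but is not stated outright; the `class_paths` helper is a made-up name.

```python
# The populated example as nested dicts. An empty dict marks a leaf.
# Assumption: all three sub functions belong to "Technical Infrastructure".
BCS = {
    "Innovation, Knowledge Transfer and Technical Infrastructure": {
        "Innovation": {},
        "Knowledge Transfer": {},
        "Technical Infrastructure": {
            "Standards and Accreditation": {
                "Policy Management": {},
                "Infrastructure Support": {},
            },
            "National Measurement System": {"Policy Management": {}},
            "Civil Space Activity": {"Space Regulation": {}},
        },
    }
}


def class_paths(tree, prefix=()):
    """Yield the full path (super function downwards) of every leaf."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from class_paths(children, path)
        else:
            yield path


paths = [" / ".join(p) for p in class_paths(BCS)]
```

Note that "Policy Management" appears twice with different parents. The full path is what makes each occurrence unique, which is one reason tree-style schemes identify a class by its position rather than by its label alone.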
Toward subject-based classification
It’s often valuable to create multiple classifications
Users: Intended audience
Content: Inherent subject matter
Context: Temporal, organisational or political drivers
User-understood terms are critical
Especially important for e-commerce
People search Google for “cheap flights” 75x more than “low fares” (Source: Gerry McGovern)
Who are the users? Scientists? Consumers?
Context matters
Why this user with this content?
Source: Louis Rosenfeld LLC
Taxonomies in context
Source: Yahoo!
Hierarchies as implicit semantics
Divides information space into categories & subcategories, relating broader & narrower concepts via parent-child relationship
Generic = Class-species: Species B (Crow) is a member of Class A (Bird) & inherits characteristics of its parent
Whole-Part = B is a part of A (e.g., Index Finger is part of Hand)
Instance = B is an instance of A (e.g., Indian Ocean is an Ocean)
Differing views
Simple truth: People see (and label!) the world differently…
Sand trap, or bunker?
Personal taxonomy
• Personal classification of information
• E-mail folders -- most common manifestation
• Can improve relevance and findability to an individual
– Some approaches enable personal classification in addition to “authorised” taxonomy
– Gmail and some other systems employ faceted classification as well
• From enterprise perspective, personal taxonomies can be quite problematic
– No interoperability, linguistic chaos
– Impossible to establish enterprise-wide standards and vocabularies
• When combined with peers, can become a “folksonomy”
Folksonomy
Collaborative tagging of content with minimal controls
Relevance between metadata and content may be determined by users in a democratic fashion
Clusters emerge and communities typically self-organise around them (“Wisdom of the crowd”)
Typically arise in Web-based communities where individuals share content, then create and use tags
Best used when there is a critical mass of taggers
Can be a useful “bottom-up” approach to developing taxonomies
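One way to picture the “bottom-up” route from tags to taxonomy is to count how many different people use each tag and treat the widely shared ones as candidate terms. The sketch below is an assumption-laden illustration: the tagging data, the `min_users` threshold and the function name are all invented, and real tagging platforms may work quite differently.

```python
from collections import Counter

# Hypothetical tagging events: (user, item, tag). All values are made up.
TAGGINGS = [
    ("ann", "photo1", "Sunset"),
    ("bob", "photo1", "sunset"),
    ("cat", "photo2", "sunset "),
    ("ann", "photo2", "beach"),
    ("bob", "photo3", "beach"),
    ("cat", "photo3", "holiday"),
]


def candidate_terms(taggings, min_users=2):
    """Return tags used by at least `min_users` distinct people, most shared first."""
    users_per_tag = {}
    for user, _item, tag in taggings:
        normalised = tag.strip().lower()  # light clean-up only; no controlled list
        users_per_tag.setdefault(normalised, set()).add(user)
    counts = Counter({tag: len(users) for tag, users in users_per_tag.items()})
    return [tag for tag, n in counts.most_common() if n >= min_users]
```

The threshold stands in for the “critical mass of taggers”: a tag used by one person says little, while a tag shared by many is a reasonable candidate for a formal vocabulary.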
Folksonomy example
Source: flickr.com
What is an ontology?
Explicit specification or conceptualisation of a domain
Often subsume thesauri, but employ richer semantic relationships among terms and attributes
Apply rigid rules specifying terms and relationships
Do more than just control vocabulary; are a knowledge representation
Semantic technologies are typically centred around ontologies
An ontology for salad would contain the structure for how it relates to everything, from ingredients to growers to the rodents that might eat it, and how a salad is different in Japan vs. Italy
Why develop an ontology?
To improve knowledge sharing and reuse, and make software more adaptable to an environment
Share common understanding of the structure of information among people or software agents
Enable reuse of domain knowledge
Make domain assumptions explicit
Separate domain knowledge from operational knowledge
Analyse domain knowledge
Source: http://www.alphaworks.ibm.com/contentnr/introsemantics
The challenge of meaning
Meaning is a hard problem for machines and humans alike
Same term can have multiple meanings
Multiple terms can have the same meaning
Ultimately meaning is contextual
Dublin Core designed to disambiguate at a fundamental level
E.g., distinguishes definitively among “Creator”, “Contributor” and “Publisher”
But in the wild, it is much harder to achieve semantic agreement
Controlled vocabularies
Supporting tools based on collections of terms used to tag, track and describe content
For example, users may wish to organise content according to
business sector
geographical location
product type
organisation type
policy topic
Allow content to be described using only 'official terms'
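In practice, restricting description to 'official terms' amounts to checking each proposed tag against an approved list for its dimension. The following is a minimal sketch under stated assumptions: the vocabulary contents and the `validate_tags` function are hypothetical, and matching is a simple case-insensitive comparison.

```python
# Hypothetical 'official terms' for two of the dimensions listed above.
VOCABULARY = {
    "business sector": {"agriculture", "energy", "finance"},
    "product type": {"report", "guidance", "dataset"},
}


def validate_tags(tags, vocabulary=VOCABULARY):
    """Split proposed tags into accepted and rejected ones.

    `tags` maps a dimension name to the term proposed for it.
    """
    accepted, rejected = {}, {}
    for dimension, term in tags.items():
        if term.lower() in vocabulary.get(dimension, set()):
            accepted[dimension] = term.lower()
        else:
            rejected[dimension] = term
    return accepted, rejected


accepted, rejected = validate_tags(
    {"business sector": "Energy", "product type": "memo"}
)
```

Rejected terms are returned rather than silently dropped, so that someone maintaining the vocabulary can decide whether a new official term is needed.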
Controlled vocabularies: Types
Simple lists
Lists of terms allowed to be used to describe an information resource
Synonym rings
A 'ring' of connected terms, all treated as equivalent for searching
Synonym rings can be used to link acronyms, variant spellings or scientific / popular terms
Thesaurus
Hierarchical arrangement of broader and narrower meanings
Simple lists and synonym rings
Simple list of bovine diseases
Anaplasmosis
Babesiosis
Bovine spongiform encephalopathy (BSE)
Cysticercosis
Synonym ring for BSE
BSE
Mad cow disease
Bovine spongiform encephalopathy
Prion disease
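Because every member of a synonym ring is treated as equivalent for searching, a query for one term can be expanded to all of them before matching. The sketch below uses the BSE ring as its data; the documents, the function names and the naive substring matching are assumptions made purely for illustration.

```python
# The BSE ring; the documents below are made up for illustration.
RINGS = [
    {"bse", "mad cow disease", "bovine spongiform encephalopathy", "prion disease"},
]

DOCUMENTS = {
    1: "Guidance on mad cow disease for farmers",
    2: "BSE surveillance report",
    3: "Babesiosis treatment notes",
}


def expand(term, rings=RINGS):
    """Return every term treated as equivalent to `term` (including itself)."""
    term = term.lower()
    for ring in rings:
        if term in ring:
            return ring
    return {term}


def search(term, documents=DOCUMENTS):
    """Return ids of documents mentioning the term or any member of its ring."""
    variants = expand(term)
    return sorted(
        doc_id for doc_id, text in documents.items()
        if any(v in text.lower() for v in variants)
    )
```

A search for "BSE" then also finds the document that only says "mad cow disease", which is the point of the ring. Substring matching is used here only to keep the sketch short.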
Thesaurus
A networked collection of controlled vocabulary terms, using associative relationships
Used to manage and identify the relationships among and between terms
E.g. Equal to, Related to, Opposite of
Some examples from a hypothetical domain
Lettuce = Frisée (a.k.a. ‘a synonym ring’)
Lettuce is a narrower type of Greens
Coriander is related to Cilantro; but they are not equal
Useful to reconcile different lexicons across business units or functional groups
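The relationship types in the hypothetical domain (equal to, narrower than, related to) can be held in a small data structure. The `Thesaurus` class and its method names below are invented for this sketch and do not correspond to any particular thesaurus tool.

```python
from collections import defaultdict


class Thesaurus:
    """Terms linked by equivalent, broader/narrower and related relationships."""

    def __init__(self):
        self.equivalent = defaultdict(set)
        self.broader = defaultdict(set)
        self.narrower = defaultdict(set)
        self.related = defaultdict(set)

    def add_equivalent(self, a, b):
        # Equivalence is symmetric, so store it both ways.
        self.equivalent[a].add(b)
        self.equivalent[b].add(a)

    def add_broader(self, term, broader_term):
        # Recording one direction also records its inverse.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def add_related(self, a, b):
        # "Related to" is symmetric but does not imply equality.
        self.related[a].add(b)
        self.related[b].add(a)


# The hypothetical domain from the slide.
t = Thesaurus()
t.add_equivalent("Lettuce", "Frisée")
t.add_broader("Lettuce", "Greens")
t.add_related("Coriander", "Cilantro")
```

Keeping "related" separate from "equivalent" preserves the distinction drawn above: Coriander and Cilantro are linked, but a search for one would not automatically be treated as a search for the other.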
Sample thesaurus
Ontologies and taxonomies and thesauri
How does this relate to Taxonomies and Thesauri?
“We have all agreed to call this thing lettuce. Lettuce is a vegetable.”
There is a much larger potential pool of semantic information that a taxonomy may or may not contain:
“Lettuce grows in the ground. Rabbits are a hazard to lettuce growers. Tomatoes and cucumbers are often eaten with lettuce, and the three of these things together make what is called a salad. But, a salad is not only defined by the collection of these three things…in Japan, a mixture of seaweed and sesame seeds is a salad. In the Midwestern United States, a collection of Jell-O and radishes is called a salad, and there is no lettuce involved.”
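The richer statements in the lettuce example go beyond “is a” relationships, and one common way to hold such facts is as subject-predicate-object triples. The sketch below is a simplified illustration, not a full ontology language: the predicate names, the "salad (Japan)" label and the `query` helper are assumptions chosen for the example.

```python
# Each fact is a (subject, predicate, object) triple. Predicate names are invented.
TRIPLES = [
    ("lettuce", "is_a", "vegetable"),
    ("lettuce", "grows_in", "ground"),
    ("rabbit", "hazard_to", "lettuce grower"),
    ("salad", "contains", "lettuce"),
    ("salad", "contains", "tomato"),
    ("salad", "contains", "cucumber"),
    ("salad (Japan)", "contains", "seaweed"),
    ("salad (Japan)", "contains", "sesame seeds"),
]


def query(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching every part that is not None."""
    return [
        (s, p, o) for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]
```

Only the first triple is the kind of statement a taxonomy would hold. The others are the “larger pool of semantic information” that an ontology is meant to capture.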
Benefits of classifying records (1)
Providing linkages between individual records which accumulate to provide a continuous record of activity
Ensuring records are named in a consistent manner over time
Assisting in the retrieval of all records relating to a particular function or activity
Determining security protection and access appropriate for sets of records
Allocating user permissions for access to, or action on, particular groups of records
Benefits of classifying records (2)
Distributing responsibility for management of particular sets of records
Distributing records for action
Determining appropriate retention periods and disposition actions for records
Standards and guidelines
ISO 15489 - the international standard for records management
MoReq2 - the Model Requirements for the Management of Electronic Records
DIRKS - the Design and Implementation of Record-Keeping Systems methodology
ISO 2788 - Guidelines for the Establishment and Development of Monolingual Thesauri
Classification challenges (1)
Laborious and difficult to develop
Tendency to over-analyse
May need more than one
Classification of content into a categorisation scheme is ongoing work
Classification challenges (2)
Categories need ongoing care and feeding (including thesauri, taxonomies, controlled vocabularies and ontologies)
Content changes, context changes
Vocabularies change
Experience may breed new perspectives
What you have learned
How to leverage classification in general and taxonomies in particular as part of an ECM strategy
Different approaches to subject-based organisation schemes:
Taxonomies
Thesauri
Semantic networks
Ontologies
Folksonomies
Managing classification challenges