Best Practices for
Architecting Taxonomy and Metadata
in an Open Source Environment
Don Miller
Vice President of Sales, Concept Searching
Twitter @conceptsearch
Zach Wahl
President and Chief Executive Officer, Enterprise Knowledge
Expert Speakers
Zach Wahl, President and Chief Executive Officer at Enterprise Knowledge, has over 15 years’ experience leading programs in knowledge and information management, working with more than 200 public and private organizations to successfully design and implement information management systems. He has developed his own taxonomy design methodology, has authored courses on knowledge management, and is a frequent speaker and trainer.
Don Miller, Vice President of Sales at Concept Searching, has over 20 years’ experience in knowledge management. He is a frequent speaker on records management and information architecture challenges and solutions, and has been a guest speaker at Taxonomy Boot Camp and numerous SharePoint events about information organization and records management.
Agenda
• Enterprise Knowledge
• Introduction to Business Taxonomy for Open Source
• Open Source Challenges and Considerations
• Design Best Practices
• Taxonomy in Action
• Concept Searching
• Unique Approach
• Considerations
• Use Case
• Demonstration
• Next Steps
• Company founded in 2002
• Product launched in 2003
• Focus on management of structured and unstructured information
• Technology Platform
• Delivered as a web service
• Automatic concept identification, content tagging, auto-classification, taxonomy management
• Only statistical vendor that can extract conceptual metadata
• 2009, 2010, 2011, 2012, 2013 ‘100 Companies that Matter in KM’ (KMWorld) and Trend Setting product of 2009, 2010, 2011, 2012, 2013
• Authority to Operate enterprise wide US Air Force and enterprise wide NETCON US Army
• Locations: US, UK, and South Africa
• Client base: Fortune 500/1000 organizations
• Managed Partner under Microsoft global ISV Program - ‘go to partner’ for Microsoft for auto-classification and taxonomy management
• Smart Content Framework for Information Governance comprising
• Six Building Blocks for success
The Global Leader in
Managed Metadata Solutions
Enterprise Knowledge
Dedicated to Making Your Information Work for You
• Principals bring over 15 years of taxonomy design consulting with support for over 200 organizations globally.
• www.enterprise-knowledge.com
• Twitter: @EKConsulting
• Blog: http://www.enterprise-knowledge.com/category/blog/
• Core services include:
• Knowledge Management and Taxonomy
• Enterprise Search
• Application Development
Taxonomy Definitions
tax·on·o·my (tăk-sŏn′ə-mē)
n. pl. tax·on·o·mies
1. The classification of organisms in an ordered system that indicates natural relationships.
2. The science, laws, or principles of classification; systematics.
3. Division into ordered groups or categories: "Scholars have been laboring to develop a taxonomy of young killers" (Aric Press).
Zach’s Definition – Controlled vocabularies used to describe or characterize explicit concepts of information, for purposes of capture, management, and presentation.
Taxonomy and Metadata
• Provide structure to unstructured information
• Join or relate multiple disparate sources of information
• Provide multiple avenues to find and discover information
• Enable findability
Metadata “Card” (figure)
Fields: Title, Author, Doc Type, Topic, Department
Doc Type values: Brochures & Manuals, Memos, News, Policies & Procedures, Presentations, Reports
Topic values: Employee Services, Compensation, Retirement, Insurance, Education & Training, Manufacturing, Safety, Quality
Entry type shown: Free Text Entry
Taxonomy and Metadata
Content~Information~Data~Files
Metadata Fields
Metadata Values/Tags
Taxonomies (Flat or Hierarchical)~ Controlled Vocabularies
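The layering above (content, metadata fields, metadata values, and the controlled vocabularies behind them) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `ContentItem`, the function `invalid_tags`, and the field keys `doc_type` and `topic` are hypothetical, and the grouping of values under Doc Type and Topic is inferred from the metadata card slide.

```python
from dataclasses import dataclass, field

# Controlled vocabularies (flat taxonomies). Values come from the metadata
# card slide; assigning them to these two fields is an assumption.
VOCABULARIES = {
    "doc_type": {"Brochures & Manuals", "Memos", "News",
                 "Policies & Procedures", "Presentations", "Reports"},
    "topic": {"Employee Services", "Compensation", "Retirement", "Insurance",
              "Education & Training", "Manufacturing", "Safety", "Quality"},
}


@dataclass
class ContentItem:
    """A piece of content plus its metadata (hypothetical structure)."""
    title: str                                 # free-text field
    author: str                                # free-text field
    tags: dict = field(default_factory=dict)   # field name -> set of values


def invalid_tags(item):
    """Return (field, value) pairs that are not in a controlled vocabulary."""
    problems = []
    for name, values in item.tags.items():
        allowed = VOCABULARIES.get(name)
        for value in sorted(values):
            if allowed is None or value not in allowed:
                problems.append((name, value))
    return problems


item = ContentItem(
    title="Retirement plan guide",
    author="HR",
    tags={"doc_type": {"Memos"}, "topic": {"Retirement", "Pensions"}},
)
print(invalid_tags(item))  # "Pensions" is not in the topic vocabulary
```

Free-text fields such as Title accept anything, while tagged fields are checked against their vocabulary, which is what keeps tags consistent across publishers.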
Traditional v. Business Taxonomies
                     Traditional Taxonomy                          Business Taxonomy
Purpose              Categorization                                Findability
Designed By          Scientists/Librarians                         The Business
Managed By           Scientists/Librarians                         The Business
Used By              Scientists/Librarians                         Everyone
Complexity           Deep, Wide, Detailed                          Flat, Simple, Deconstructed
Key Characteristics  Mutually Exclusive, Collectively Exhaustive   Usable, Intuitive, Natural
The Business Taxonomy
• Usable – Easy to adopt and utilize for any skill level
• Relatively flat (2-3 levels)
• “Easy” to navigate
• Intuitive – Does not require training and reflects the way the user thinks
The Business Taxonomy
• Tends to be less rigid and constrained
• Influenced by “traditional” usability design
• Driven by the content and needs you have today
• Leverages multiple categorization approaches (via multiple metadata fields and multiple taxonomies)
• Accepts imperfect categorization
Open Source Challenges and Considerations
• Open Source is “free” and “easy”
• But taxonomy isn’t…
• There are multiple ways to use taxonomy
• Menus, Search, Tag Clouds, Page Tags
• Taxonomy design is not enough; you also need to plan for taxonomy implementation and exposure
• Open Source tools like Drupal favor “flat” taxonomies
• Faceting is easy to enable but requires diligent tagging and oversight
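To make the last point concrete, here is a minimal Python sketch of what a faceted view computes and why untagged content silently drops out of it. It does not use the Drupal or Solr APIs; the sample items and the function names `facet_counts`, `untagged`, and `filter_by` are all hypothetical.

```python
from collections import Counter

# Hypothetical content items; titles, fields, and values are illustrative.
ITEMS = [
    {"title": "Benefits overview", "doc_type": ["Presentations"],
     "topic": ["Insurance", "Retirement"]},
    {"title": "Plant safety memo", "doc_type": ["Memos"], "topic": ["Safety"]},
    {"title": "Untitled draft", "doc_type": ["Memos"], "topic": []},  # never tagged
]


def facet_counts(items, field_name):
    """Count how many items carry each value of one metadata field."""
    counts = Counter()
    for item in items:
        counts.update(item.get(field_name, []))
    return counts


def filter_by(items, field_name, value):
    """Titles of items tagged with the selected facet value."""
    return [i["title"] for i in items if value in i.get(field_name, [])]


def untagged(items, field_name):
    """Titles of items with no value for the field (invisible to that facet)."""
    return [i["title"] for i in items if not i.get(field_name)]


print(facet_counts(ITEMS, "topic"))
print(filter_by(ITEMS, "topic", "Safety"))
print(untagged(ITEMS, "topic"))
```

A periodic report like `untagged` is one simple form of the oversight the slide calls for: content without tags cannot be reached through any facet.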
Taxonomy Design for Open Source – Best Practices
• Define taxonomy purpose, audience, and use cases up front. Design before you build.
• Apply usability design best practices (limit depth and breadth, use plain language, etc.). Flat lists work best in Open Source content management tools.
• Leverage a primary category/topic taxonomy with supporting metadata fields. For instance, in Drupal, use multiple Lists with Views to enable faceting.
• Design for your end users and publishers.
• Employ analytics and support iterative design.
• Plan for the long-term – ensure governance plans are in place before content migration and rollout.
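The "limit depth and breadth" guidance can be checked automatically as part of iterative design. The sketch below is one possible approach, not a prescribed tool: the nested-dict representation, the sample hierarchy, the function names, and the breadth limit of 12 are assumptions, while the depth limit of 3 follows the "2-3 levels" guidance given earlier.

```python
MAX_DEPTH = 3      # based on the "relatively flat (2-3 levels)" guidance
MAX_BREADTH = 12   # hypothetical limit; choose one that suits your users

# Hypothetical hierarchy: each term maps to a dict of its child terms.
TAXONOMY = {
    "Employee Services": {"Compensation": {}, "Retirement": {}, "Insurance": {}},
    "Manufacturing": {"Safety": {}, "Quality": {}},
}


def depth(tree):
    """Number of levels in the taxonomy (0 for an empty tree)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())


def max_breadth(tree):
    """Largest number of sibling terms found at any one node."""
    if not tree:
        return 0
    widest_child = max(max_breadth(children) for children in tree.values())
    return max(len(tree), widest_child)


def review(tree):
    """Return a list of warnings; an empty list means the limits are met."""
    warnings = []
    if depth(tree) > MAX_DEPTH:
        warnings.append(f"depth {depth(tree)} exceeds {MAX_DEPTH}")
    if max_breadth(tree) > MAX_BREADTH:
        warnings.append(f"breadth {max_breadth(tree)} exceeds {MAX_BREADTH}")
    return warnings


print(depth(TAXONOMY), max_breadth(TAXONOMY), review(TAXONOMY))
```

Running a check like this whenever terms are added gives governance a simple, repeatable gate before content migration and rollout.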
• Concept Searching’s unique statistical concept identification underpins all technologies
• Multi-word suggestion is explicitly more valuable than single term suggestion algorithms
Concept Searching has a unique approach to ensure success
• conceptClassifier generates conceptual metadata by extracting multi-word terms, identifying ‘triple heart bypass’ as a single concept rather than as three separate keywords
• Metadata can be used by any search engine index or any application/process that uses metadata.
Concept Searching provides Automatic Concept Term Extraction
Single keywords taken out of context (figure): Triple (Baseball, Three); Heart (Organ, Center); Bypass (Highway, Avoid)
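The difference between single-keyword and multi-word tagging can be illustrated with a short Python sketch. This is not Concept Searching's statistical method, which the slides do not describe in detail; it swaps in a plain longest-match lookup against a known term list, and the function name `extract_terms` and the sample vocabulary are hypothetical.

```python
def extract_terms(text, known_terms, max_words=3):
    """Greedy longest-match lookup of known terms in the text.

    A dictionary-based stand-in used only to show why multi-word terms
    matter; tokenization is a simple whitespace split.
    """
    words = text.lower().split()
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in known_terms:
                found.append(candidate)
                i += n
                break
        else:
            i += 1  # no known term starts at this word
    return found


# Hypothetical vocabulary mixing single words and a compound term.
TERMS = {"triple", "heart", "bypass", "triple heart bypass"}

print(extract_terms("Patient scheduled for triple heart bypass", TERMS))
print(extract_terms("Take the bypass to avoid traffic", TERMS))
```

With the compound term in the vocabulary, the first sentence yields one tag for the medical concept instead of three ambiguous keywords, while the second sentence still yields only "bypass".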
Unique Approach
• Metadata driven application and enforcement of policies - conceptClassifier has been deployed since 2010 to automatically generate metadata and use that metadata to apply and enforce policies. Many clients are using the platform to support their information governance strategy.
• Proven, mature functionality out of the box - The platform has been deployed in numerous sites and applications across the enterprise, including MOSS and SharePoint 2010, 2013, Solr, Stellent, Documentum, SQL, Oracle, File Shares, and Exchange via SharePoint.
Smart Content Framework™
The whole is greater than the sum of its parts
Open Source Considerations
“Given enough eyeballs, all bugs are shallow.”
Linus’s Law, as formulated by Eric S. Raymond and named for Linus Torvalds, creator of Linux
• Security
• Quality
• Customizability
• Freedom (avoid vendor lock-in)
• Interoperability
• Auditability
• Support
• Cost
• Try Before You Buy
Any difference if you are purchasing ‘proprietary’ software? Not much!
Open Source or Proprietary – OK By Us
• Concept Searching Technology Platform
• conceptSearch
• conceptClassifier
• conceptTaxonomyManager
• conceptSQL
• conceptTaxonomyWorkflow
• conceptClassifier Technology Platform
• Compound Term Processing Engine
• Licensed for concept extraction only
• conceptClassifier
• conceptTaxonomyManager
• conceptTaxonomyWorkflow
Situation
• Company is the premier global provider of fee-based market intelligence, advisory services, and events for the information technology, telecommunications, and consumer technology markets
• Seeking a solution to enhance site visitors’ search experience
• Potential loss of revenues
Challenge
• Complex taxonomy requirements
• Inability for clients to identify the relevant information they were seeking
Solution
• conceptTaxonomyManager and conceptClassifier
• Solr
• Integrated in-house
Benefits
• Improved search results
• Increased accuracy and relevant retrieval of information for
“Automation is great, but still needs a human eye to gain that last bit of ground.
Anyway, it's a great story and I'm still very happy with
Concept Searching and the flexibility it gives us.”
Director, Enterprise Solutions Planning
Use Case
Smart Content Framework™ Building Blocks - Metadata, Insight
What’s the End Result?
• Technology from Concept Searching complements Enterprise Knowledge’s strategic and tactical planning experience and expertise in architecting solutions that improve business processes.
• Utilizing Concept Searching’s Smart Content Framework™ and intelligent metadata enabled solutions, this partnership addresses key challenges in enterprise search, records management, data privacy, migration, and content management in secure and complex environments.
For a comprehensive demo of the combined solution and discussion of expected ROI, please contact Don Miller at Concept Searching or
Thank You
Don Miller
Vice President of Sales, Concept Searching
Twitter @conceptsearch
Zach Wahl
President and Chief Executive Officer, Enterprise Knowledge
Twitter @EKConsulting