#AIIM14
A Primer On Metadata Analysis
Jeffrey Lewis, Records Management Program Manager, SOL Capital Management Co., @Info_Currency
What Is Metadata
§ Data About Data
§ Is used to relate information to other pieces of information and their related counterparts
§ Data that labels information for the purpose of organizing it, identifying it, and finding it again.
§ Defines what something is, what it’s about, and what characteristics it possesses.
§ Allows for finding other pieces of information, and objects that exhibit
Who Uses Metadata
§ Metadata Users Include
  § Computer Forensics
  § Records Managers
  § Disc Jockeys
  § Politicians
  § Athletes
  § Everyone!
Why Is Metadata Important
§ Reasons To Prioritize Metadata
  § Business Intelligence
  § Retention and Disposition
  § Security
Library Metadata Repository #2
What Can We Learn About Metadata From Bob Dylan
§ Come gather 'round people
Wherever you roam
And admit that the waters
Around you have grown
And accept it that soon
You'll be drenched to the bone
If your time to you
Is worth savin'
Then you better start swimmin'
Or you'll sink like a stone
Metadata Then and Now in RIM
Metadata Is Everywhere
§ Regardless of the content, metadata is crucial to go from information chaos to information opportunity
§ Content can come in one of three varieties
  § Structured
  § Unstructured
  § Semi-structured
Structured Content
Structured content is what powers many web services and is the Tower of Babel for different types of data and information between servers, computers and humans. Typically, structured content is referred to as data and stored in databases.
Examples include
• XML
• Excel Spreadsheets
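As a rough illustration of why structured content is easy for machines to read, the sketch below parses a small XML document with Python's standard `xml.etree.ElementTree` module. The record layout (the `record`, `title`, and `retention_years` names) is hypothetical and invented only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured records; element names are illustrative assumptions.
XML_DOC = """
<records>
  <record id="R-001">
    <title>Vendor Contract</title>
    <retention_years>7</retention_years>
  </record>
  <record id="R-002">
    <title>Meeting Minutes</title>
    <retention_years>3</retention_years>
  </record>
</records>
"""

def parse_records(xml_text):
    """Return one dict of metadata per <record> element."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": rec.get("id"),
            "title": rec.findtext("title"),
            "retention_years": int(rec.findtext("retention_years")),
        }
        for rec in root.findall("record")
    ]

records = parse_records(XML_DOC)
```

Because every value sits inside a named tag, no guessing is needed to tell a title from a retention period.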
Unstructured Content
One of the most common types of unstructured content that we will interact with is correspondence or other documents.
Structured content has information tagged for machines to read and parse. Unstructured content is designed for humans to read and extract key information from.
Two things to note about our example that tell us it is unstructured:
1) It is text heavy
2) Nothing in it can be readily classified or stored in a structured format (e.g., a table or database)
Semi-structured Content
Semi-structured content is still text heavy, but has content that can be parsed out and categorized. One such example is a webpage that has information tagged to make it searchable and readable by a computer, so it can be displayed in a format that is easy to read.
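The webpage example can be sketched with Python's standard `html.parser` module: the tagged parts (the title and `<meta>` tags) are pulled out as metadata, while the body stays free text. The page content and the `MetaTagParser` class are hypothetical, written only for illustration.

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collect <meta name=... content=...> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs and "content" in attrs:
            self.metadata[attrs["name"]] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; body text is ignored here.
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data

# Hypothetical page; the tag values are illustrative only.
PAGE = """<html><head>
<title>Records Retention Policy</title>
<meta name="author" content="Records Management">
<meta name="keywords" content="retention, disposition">
</head><body><p>Long free-text policy narrative goes here.</p></body></html>"""

parser = MetaTagParser()
parser.feed(PAGE)
page_metadata = parser.metadata
```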
Two Tools For Metadata Analysis of All Content
Metadata analysis does not have to be cumbersome, even for unstructured content. Two tools can make your life easier:
1) OCR
2) Autocategorization and Autoclassification
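To give a feel for autocategorization, here is a deliberately simple keyword-counting sketch. Commercial tools typically rely on trained models or richer rule sets; this example swaps in plain keyword rules, and the categories and trigger words are hypothetical.

```python
import re
from collections import Counter

# Hypothetical categories and trigger words; a real deployment would derive
# these from the organization's own file plan.
CATEGORY_KEYWORDS = {
    "Contracts": {"agreement", "contract", "vendor", "terms"},
    "Human Resources": {"employee", "benefits", "payroll", "hiring"},
    "Finance": {"invoice", "payment", "budget", "ledger"},
}

def categorize(text, default="Uncategorized"):
    """Return the category whose keywords appear most often in the text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        category: sum(words[kw] for kw in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # If no keyword matched at all, fall back to the default label.
    return best if scores[best] > 0 else default
```

For example, `categorize("Please find the attached invoice; payment is due in 30 days.")` scores two hits for the assumed Finance keywords and none for the others.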
Tip #1 For Metadata Analysis
§ Perform Data Quality Checks
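A data quality check can be as small as the sketch below, which flags missing required fields and implausible dates in one metadata record. The required field names and the ISO date format are assumptions made for the example, not rules from the presentation.

```python
from datetime import date

REQUIRED_FIELDS = ("title", "owner", "created")  # assumed required fields

def quality_issues(record, today=None):
    """Return a list of human-readable problems found in one metadata record."""
    today = today or date.today()
    issues = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            issues.append(f"missing {field}")
    created = record.get("created")
    if isinstance(created, str) and created.strip():
        try:
            if date.fromisoformat(created) > today:
                issues.append("created date is in the future")
        except ValueError:
            issues.append("created date is not YYYY-MM-DD")
    return issues
```

Running a check like this over every record before analysis shows how much of the metadata can actually be trusted.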
Tip #2 For Metadata Analysis
§ Don’t Hoard Data
§ According to the Compliance, Governance and Oversight Council:
“Organizations on average need to archive about 2-3% of data for legal hold, 5-10% to meet regulatory requirements, and 25% for business analysis and insights… Once you delete data that’s stale, the algorithms actually function much better from an analytics standpoint. Leaving stale data can actually skew the algorithms toward older facts.” (John Bertolucci, “Are You A Data Hoarder,” InformationWeek, February 25, 2013)
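One way to act on this tip is to use metadata to list records that are past their retention period and not on legal hold. The sketch below assumes each record carries hypothetical `created`, `retention_years`, and `legal_hold` fields; actual retention rules depend on the organization's schedule.

```python
from datetime import date

def age_in_years(created, today):
    """Whole years elapsed between the creation date and today."""
    before_anniversary = (today.month, today.day) < (created.month, created.day)
    return today.year - created.year - before_anniversary

def disposition_candidates(records, today):
    """Return ids of records past retention and not on legal hold."""
    return [
        rec["id"]
        for rec in records
        if not rec["legal_hold"]
        and age_in_years(rec["created"], today) >= rec["retention_years"]
    ]
```

The output is only a candidate list; a records manager would still review it before anything is deleted.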
Tip #3 For Metadata Analysis
§ Use Standards When Performing Metadata Analysis
§ Develop tools such as:
  § Taxonomy
  § Thesauri
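As a minimal sketch of how these two tools work together, a thesaurus can map variant terms to one preferred term, and a taxonomy can place that preferred term in a hierarchy. All terms and paths below are hypothetical examples.

```python
# Hypothetical thesaurus: variant term -> preferred term.
THESAURUS = {
    "agreement": "contract",
    "contracts": "contract",
    "bill": "invoice",
    "invoices": "invoice",
}

# Hypothetical taxonomy: preferred term -> path from broad to narrow.
TAXONOMY = {
    "contract": ("Legal", "Contracts"),
    "invoice": ("Finance", "Accounts Payable", "Invoices"),
}

def normalize_tag(raw_tag):
    """Map a free-text tag to its preferred term, or return it lowercased."""
    term = raw_tag.strip().lower()
    return THESAURUS.get(term, term)

def classify_tag(raw_tag):
    """Return the taxonomy path for a tag, or None if the term is unknown."""
    return TAXONOMY.get(normalize_tag(raw_tag))
```

With this in place, documents tagged "Bill", "invoices", or "invoice" all land in the same branch instead of being scattered across three labels.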
Tip #4 For Metadata Analysis
§ Ask The Right Questions of Metadata
§ The Questions You Ask Depend On The Content
Tip #5 For Metadata Analysis
§ Know The Tool Necessary
§ Your Search Tool Can Make All The
Tip #5 For Metadata Analysis
§ Infrastructure Search Techniques
  § Homogeneous search
  § Federated search
  § Universal search
§ Functional Search Techniques
  § Application Search
  § Parametric Search
  § Keyword Search
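The difference between two of the functional techniques can be sketched in a few lines, taking keyword search to mean matching words in the text and parametric search to mean filtering on metadata field values (the usual sense of the terms; the slide itself does not define them). The document index and field names are hypothetical.

```python
# Hypothetical document index; field names are illustrative assumptions.
DOCUMENTS = [
    {"id": 1, "type": "contract", "year": 2012, "text": "Vendor agreement for storage services"},
    {"id": 2, "type": "invoice", "year": 2013, "text": "Invoice for storage services"},
    {"id": 3, "type": "contract", "year": 2013, "text": "Employment agreement"},
]

def keyword_search(docs, keyword):
    """Match documents whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [d["id"] for d in docs if needle in d["text"].lower()]

def parametric_search(docs, **criteria):
    """Match documents whose metadata fields equal every given criterion."""
    return [
        d["id"]
        for d in docs
        if all(d.get(field) == value for field, value in criteria.items())
    ]
```

Keyword search works on any text, while parametric search is only as good as the metadata behind it, which is one reason the earlier tips matter.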
How Does Big Data Fit Into Metadata
§ Defining Big Data:
  § Gartner Group: The “Four V’s” definition: volume, velocity, variety, veracity
  § Oracle: The derivation of value from traditional relational database-driven business decision-making, augmented with new sources of unstructured data such as blogs, social media, sensor networks, and image data.
  § Intel: Generating a median of 300 terabytes of data weekly. Includes
How Does Big Data Fit Into Metadata
§ Defining Big Data:
  § Microsoft: The process of applying serious computing power, the latest in machine learning and artificial intelligence, to seriously massive and often highly complex sets of information.
  § The Method for an Integrated Knowledge Environment (MIKE2.0) definition: A high degree of permutation and interaction within a dataset, rather than the size of the dataset. “Big Data can be very small, and not all large datasets are Big.”
  § NIST: Data that exceeds the capacity or capability of current or conventional [analytic] methods and systems.
How Does Big Data Fit Into Metadata
§ Defining Big Data:
  § The application definition (arrived at by analyzing the Google Trends results for “big data”): Large volumes of unstructured and/or highly variable data that require the use of several different analysis tools and methods, including text mining, natural language processing, statistical programming, machine learning, and information visualization.
Metadata Vs. Big Data
§ How Does One Prioritize The Two
§ It Is Not An Either/Or
Three Reasons To Start With Metadata
1. Less Complex
2. Accessible To All
Amazon’s Revolutionary Approach
§ Amazon has leveraged metadata and Big Data for certain competitive advantages over the competition.
  § Preventing Warehouse Theft
  § Improving Customer Service