A Primer On Metadata Analysis


#AIIM14

Jeffrey Lewis
Records Management Program Manager, SOL Capital Management Co.
@Info_Currency

What Is Metadata

§ Data About Data
§ Is used to relate information to other pieces of information and their related counterparts
§ Data that labels information for the purpose of organizing it, identifying it and finding it again
§ Defines what something is, what it’s about and what characteristics it possesses
§ Allows for finding other pieces of information, and objects that exhibit

Who Uses Metadata

§ Metadata Users Include
  § Computer Forensics
  § Records Managers
  § Disc Jockeys
  § Politicians
  § Athletes
  § Everyone!

Why Is Metadata Important

§ Reasons To Prioritize Metadata
  § Business Intelligence
  § Retention and Disposition
  § Security

Library Metadata Repository #2

What Can We Learn About Metadata From Bob Dylan

§ Come gather 'round people
  Wherever you roam
  And admit that the waters
  Around you have grown
  And accept it that soon
  You'll be drenched to the bone
  If your time to you
  Is worth savin'
  Then you better start swimmin'
  Or you'll sink like a stone

Metadata Then and Now in RIM

Metadata Is Everywhere

§ Regardless of the content, metadata is crucial to go from information chaos to information opportunity
§ Content can come in one of three varieties
  § Structured
  § Unstructured
  § Semi-structured

Structured Content

Structured content is what powers many web services and is the Tower of Babel for different types of data and information between servers, computers and humans. Typically structured content is referred to as data and stored in databases.

Examples include

• XML
• Excel Spreadsheets
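To illustrate why structured content is easy for machines to read, here is a minimal Python sketch that pulls metadata out of a small XML record with the standard library's `xml.etree.ElementTree`. The record, its element names (`title`, `author`, `retention_years`) and the function name `extract_metadata` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the element and attribute names are made up.
SAMPLE_XML = """
<record id="R-001">
  <title>Vendor Contract</title>
  <author>J. Smith</author>
  <retention_years>7</retention_years>
</record>
"""

def extract_metadata(xml_text):
    """Return the record's attributes and child elements as a flat dict."""
    root = ET.fromstring(xml_text.strip())
    metadata = dict(root.attrib)          # e.g. the record id
    for child in root:                    # every tagged field under <record>
        metadata[child.tag] = (child.text or "").strip()
    return metadata

metadata = extract_metadata(SAMPLE_XML)
print(metadata)
```

Because every value is tagged, no guessing is needed to tell the title from the author; the tags themselves act as the metadata labels.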

Unstructured Content

One of the most common types of unstructured content that we will interact with is correspondence or other documents.

Structured content has information tagged for machines to read and parse. Unstructured content is designed for humans to read and extract key information.

Two things to note about our example that tell us it is unstructured:

1) It is text heavy
2) Nothing in it can be readily classified or stored in a structured format (e.g., table or database)

Semi-structured Content

Semi-structured content is still text heavy, but has content that can be parsed out and categorized. One such example is a webpage that has information tagged to make it searchable and readable by a computer, so it can be displayed in a format that is easily readable.
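As a rough sketch of the webpage example, the Python snippet below uses the standard library's `html.parser` to collect the tagged parts of a page (its `<title>` and `<meta>` tags) while ignoring the text-heavy body. The sample page and the class name `MetaTagParser` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page: tagged metadata in <head>, free text in <body>.
SAMPLE_PAGE = """
<html>
  <head>
    <title>Quarterly Records Report</title>
    <meta name="author" content="Records Office">
    <meta name="keywords" content="retention, disposition">
  </head>
  <body><p>Long, text-heavy body meant for human readers.</p></body>
</html>
"""

class MetaTagParser(HTMLParser):
    """Collect <meta name=... content=...> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs:
            self.metadata[attrs["name"]] = attrs.get("content", "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; body text is skipped.
        if self._in_title:
            self.metadata["title"] = data.strip()

parser = MetaTagParser()
parser.feed(SAMPLE_PAGE)
page_metadata = parser.metadata
print(page_metadata)
```

The body paragraph stays unparsed, which is what makes the page semi-structured rather than structured: only part of it can be categorized directly.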

Two Tools For Metadata Analysis of All Content

Metadata analysis does not have to be cumbersome, even for unstructured content. Two tools can make your life easier:

1) OCR
2) Autocategorization and Autoclassification
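To give a feel for autoclassification, here is a deliberately simple Python sketch that assigns a category by counting keyword hits. It assumes the text has already been extracted (for a scanned document, that would be the OCR step). The categories, keywords and the function name `autoclassify` are hypothetical, and commercial tools typically rely on more sophisticated statistical or machine-learning methods than this keyword rule.

```python
import re

# Hypothetical categories and trigger keywords, for illustration only.
CATEGORY_KEYWORDS = {
    "Contracts": {"agreement", "contract", "vendor"},
    "Human Resources": {"employee", "payroll", "benefits"},
    "Finance": {"invoice", "payment", "budget"},
}

def autoclassify(text, default="Unclassified"):
    """Return the category whose keywords appear most often in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {
        category: len(words & keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # No keyword matched at all: leave the document for manual review.
    return best if scores[best] > 0 else default

print(autoclassify("Please find the attached invoice; payment is due in 30 days."))
print(autoclassify("Lunch on Friday?"))
```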

Tip #1 For Metadata Analysis

§ Perform Data Quality Checks
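What a data quality check might look like in practice: the Python sketch below flags missing required fields, duplicate identifiers and invalid dates in a list of metadata records. The field names (`record_id`, `title`, `created`), the sample records and the choice of checks are assumptions made for illustration, not a prescribed checklist.

```python
from datetime import date

# Hypothetical required fields for a metadata record.
REQUIRED_FIELDS = ("record_id", "title", "created")

def check_quality(records):
    """Return a list of (record_id, problem) tuples."""
    problems = []
    seen_ids = set()
    for rec in records:
        rec_id = rec.get("record_id") or "<missing id>"
        # Check 1: every required field is present and non-empty.
        for field in REQUIRED_FIELDS:
            if not rec.get(field):
                problems.append((rec_id, f"missing {field}"))
        # Check 2: identifiers are unique.
        if rec.get("record_id") in seen_ids:
            problems.append((rec_id, "duplicate record_id"))
        elif rec.get("record_id"):
            seen_ids.add(rec["record_id"])
        # Check 3: the creation date is a valid ISO date (YYYY-MM-DD).
        created = rec.get("created")
        if created:
            try:
                date.fromisoformat(created)
            except ValueError:
                problems.append((rec_id, "created is not a valid ISO date"))
    return problems

records = [
    {"record_id": "R-001", "title": "Vendor Contract", "created": "2013-02-25"},
    {"record_id": "R-002", "title": "", "created": "2013-13-01"},
    {"record_id": "R-001", "title": "Vendor Contract (copy)", "created": "2013-03-01"},
]
problems = check_quality(records)
for rec_id, problem in problems:
    print(rec_id, problem)
```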

Tip #2 For Metadata Analysis

§ Don’t Hoard Data
§ According to the Compliance, Governance and Oversight Council: “Organizations on average need to archive about 2-3% of data for legal hold, 5-10% to meet regulatory requirements, and 25% for business analysis and insights…Once you delete data that’s stale, the algorithms actually function much better from an analytics standpoint. Leaving stale data can actually skew the algorithms toward older facts.” – John Bertolucci, “Are You A Data Hoarder”, published on February 25, 2013 in Information Week
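The percentages in the quote can be added up to see how much data is a candidate for disposal. The short Python sketch below does that arithmetic under one explicit assumption that the quote does not state: the three categories do not overlap. The names `RETENTION_NEEDS` and `retained_range` are hypothetical.

```python
# Percentages from the quote above, as (low, high) ranges.
# Assumption: the three categories do not overlap.
RETENTION_NEEDS = {
    "legal hold": (2, 3),
    "regulatory requirements": (5, 10),
    "business analysis and insights": (25, 25),
}

def retained_range(needs):
    """Sum the low and high ends of each category's percentage range."""
    low = sum(lo for lo, _ in needs.values())
    high = sum(hi for _, hi in needs.values())
    return low, high

low, high = retained_range(RETENTION_NEEDS)
print(f"Needs to be kept: {low}-{high}% of data")
print(f"Candidate for disposal: {100 - high}-{100 - low}% of data")
```

Under that assumption, roughly a third of the data needs to be kept and the remaining 62-68% is a candidate for disposal.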

Tip #3 For Metadata Analysis

§ Use Standards When Performing Metadata Analysis
§ Develop tools such as:
  § Taxonomy
  § Thesauri
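One way a thesaurus supports consistent metadata is by mapping variant terms to a single preferred term. The Python sketch below shows the idea with a tiny, hypothetical thesaurus; the terms and the function name `normalize_keywords` are invented for illustration.

```python
# Hypothetical thesaurus: variant term -> preferred term.
THESAURUS = {
    "agreement": "contract",
    "contracts": "contract",
    "staff": "employee",
    "personnel": "employee",
}

def normalize_keywords(keywords):
    """Map each keyword to its preferred term and drop duplicates."""
    normalized = []
    for keyword in keywords:
        term = keyword.strip().lower()
        preferred = THESAURUS.get(term, term)  # unknown terms pass through
        if preferred not in normalized:
            normalized.append(preferred)
    return normalized

print(normalize_keywords(["Agreement", "contracts", "Staff", "invoice"]))
```

With the variants collapsed, a search for "contract" finds items that were originally tagged "Agreement" or "contracts" as well.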

Tip #4 For Metadata Analysis

§ Ask The Right Questions of Metadata
§ The Questions You Ask Depend On The Content

Tip #5 For Metadata Analysis

§ Know The Tool Necessary
§ Your Search Tool Can Make All The

Tip #5 For Metadata Analysis

§ Infrastructure Search Techniques
  § Homogeneous search
  § Federated search
  § Universal search
§ Functional Search Techniques
  § Application Search
  § Parametric Search
  § Keyword Search
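The slide only names these techniques, so the Python sketch below rests on an assumed reading of two of them: keyword search as a free-text match, and parametric search as a filter on specific metadata fields. The sample records and both function names are hypothetical.

```python
# Hypothetical metadata records.
RECORDS = [
    {"title": "Vendor Contract 2013", "type": "contract", "year": 2013},
    {"title": "Payroll Summary", "type": "report", "year": 2013},
    {"title": "Vendor Contract 2012", "type": "contract", "year": 2012},
]

def keyword_search(records, keyword):
    """Match the keyword anywhere in the title, ignoring case."""
    needle = keyword.lower()
    return [r for r in records if needle in r["title"].lower()]

def parametric_search(records, **params):
    """Match records whose metadata fields equal every given parameter."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in params.items())
    ]

print([r["title"] for r in keyword_search(RECORDS, "vendor")])
print([r["title"] for r in parametric_search(RECORDS, type="contract", year=2013)])
```

The parametric query returns a narrower result than the keyword query because it relies on the metadata fields rather than on whatever words happen to appear in the title.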

 

How Does Big Data Fit Into Metadata

§ Defining Big Data:
  § Gartner Group: The “Four V’s” definition: volume, velocity, variety, veracity
  § Oracle: The derivation of value from traditional relational database-driven business decision-making, augmented with new sources of unstructured data such as blogs, social media, sensor networks, and image data.
  § Intel: Generating a median of 300 terabytes of data weekly. Includes

How Does Big Data Fit Into Metadata

§ Defining Big Data:
  § Microsoft: The process of applying serious computing power, the latest in machine learning and artificial intelligence, to seriously massive and often highly complex sets of information.
  § The Method for an Integrated Knowledge Environment (MIKE2.0) definition: A high degree of permutation and interaction within a dataset, rather than the size of the dataset. “Big Data can be very small, and not all large datasets are Big.”
  § NIST: Data that exceeds the capacity or capability of current or conventional [analytic] methods and systems.

How Does Big Data Fit Into Metadata

§ Defining Big Data:
  § The application definition (arrived at by analyzing the Google Trends results for “big data”): Large volumes of unstructured and/or highly variable data that require the use of several different analysis tools and methods, including text mining, natural language processing, statistical programming, machine learning, and information visualization.

Metadata Vs. Big Data

§ How Does One Prioritize The Two
§ It Is Not An Either/Or

Three Reasons To Start With Metadata

1. Less Complex
2. Accessible To All

Amazon’s Revolutionary Approach

§ Amazon has leveraged metadata and Big Data for certain competitive advantages over the competition.
§ Preventing Warehouse Theft
§ Improving Customer Service

The End
