Data Categories and Codes

V0.01 First version

WG Charter

Standardisation of data categories and codes for human communication resources

Value Proposition

A specific description of who will benefit from the adoption / implementation of the CWG outcomes / “deliverables” and what tangible impacts the adoption / implementation of the deliverables should have.

The outputs of this Working Group will assist with greater data sharing, data discovery, and interoperability of repositories/archives.

Who benefits:

• interdisciplinary researchers
• in-domain researchers
• researchers’ institutions (through increased visibility and accessibility)
• data collections/repositories (visibility, accessibility)

Outcomes:

• Engagement with ISO processes in support of standardisation of ISO 639 by TC37/SC2
• Establishment of Australian mirror committee of TC37

Deliverables:

• Recommended set of CMDI core components and ISO data categories onto which they map
• Recommended CMDI schema with mappings to metadata schemas currently accepted in the relevant domains


Impacts:

1. Improved practice in identification of all aspects of human communication in resource descriptions
2. Easier identification of language resources in other repositories where language is not the primary focus
3. Easier identification of non-language aspects (e.g. music) in repositories where language is the primary focus
4. Users of language archives will receive improved access to resources
5. Language archives will be able to more easily share resources
6. Increased researcher influence on standards that they use
7. Greater discovery (and hence potential for re-use) of resources
8. More explicit semantics around language and music resources, enabling more informed re-use and facilitating automated re-use

Engagement with existing work in the area

A brief review of related work and plan for engagement with any other activities in the area.

A number of existing initiatives are working in related areas. The following table shows these initiatives by impact area. This Working Group will liaise closely with these initiatives to avoid duplication of effort and to ensure coordination across this space.

| Initiative | Full name | Relevant Impacts |
|---|---|---|
| CMDI | Component MetaData Infrastructure | [Applied in all] |
| E-MELD | Electronic Metastructure for Endangered Languages Data | 1, 6, 8 |
| DOBES/LIBES | Dokumentation Bedrohter Sprachen (Documentation of Endangered Languages) | 1, 3, 4, 5, 6, 7, 8 |
| CLARIN ERIC | Common Language Resources and Technology Infrastructure | 1, 4, 5, 6, 7, 8 |
| ISO TC-37 | | 1, 8 |
| META-SHARE | Multilingual Europe Technology Alliance | All |
| FROLIC | Framework for the Organization of Language Identification Codes | 1, 6, 8 |
| RELISH | Rendering Endangered Language Lexicons Interoperable through Standards Harmonization | 1, 3, 4, 5, 6, 7, 8 |

   

Plan for engagement with any other activities

HuNI (Humanities Networked Infrastructure)

The HuNI Project is using Linked Open Data technology to integrate 28 of Australia’s most important cultural datasets into a ‘virtual laboratory’. Many of these datasets contain material relevant to research on human communication, although not all are primarily oriented to such material. Engagement with this project is therefore important in developing the WG’s aim to enable access to human communication resources outside of domain-specific repositories.

HCS vLab (Human Communication Science Virtual Laboratory)

The HCS vLab will connect HCS researchers, their desks, computers, labs, and universities, and so accelerate HCS research and produce emergent knowledge that comes from novel application of previously unshared tools to analyse previously difficult-to-access data sets. The HCS vLab infrastructure will overcome resource limitations of individual desktops; allow easy access to shared tools and data; and provide the guided use of workflow tools and options to allow researchers to cross disciplinary boundaries. One of the bases of this project is sharing of data; engagement with it will provide important input to the WG’s activities in improving interoperability of HCS datasets.

ISO TC37 (ISO Technical Committee 37: Terminology and other language and content resources)

TC37 is the body which has responsibility for overseeing the development of international standards for identifying codes for languages, language families and varieties within languages. Engagement with this committee is crucial to the planned activities of the WG in order to ensure that any recommendations align with proposed standards.


CLARIN (Common Language Resources and Technology Infrastructure)

CLARIN aims at providing easy and sustainable access for scholars in the humanities and social sciences to digital language data (in written, spoken, video or multimodal form) and advanced tools to discover, explore, exploit, annotate, analyse or combine them, independent of where they are located. The aims of the WG are closely aligned with those of CLARIN; engagement with the organisation will ensure the input of European expertise in the WG’s activity and will provide a platform for the adoption of outputs in European research communities.

LinguistList

Through its involvement in projects such as E-MELD and RELISH (see above), LinguistList has become a point of contact for US-based work on interoperability of language resources. Engagement with this organisation will ensure the input of North American expertise in the WG’s activity and will provide a platform for the adoption of outputs in international research communities (given the worldwide reach of LinguistList as a medium of communication).

Repositories/Aggregators

The endpoint of the WG’s activity should be the adoption of its recommendations by the repositories active in the relevant fields. This would also flow through to aggregators of metadata from repositories (such as OLAC – Open Language Archiving Community). Engagement with these stakeholders from the WG’s inception is therefore critical. Such engagement will be achieved through direct contact with repositories (WG members Drude, Barwick and Thieberger are staff members of relevant repositories) and through representative bodies such as DELAMAN (Digital Endangered Languages and Music Archives Network).

 

Work Plan

A specific and detailed description of how the CWG will operate including:

a) The form and description of final deliverables of the candidate Working Group,

• Recommended set of CMDI core components and an associated schema for use by repositories holding resources which include a language component;
• Recommended set of data categories onto which the CMDI components map, to be proposed as candidate standards within the ISOcat process;


• Mappings between common metadata schemas currently in use in relevant repositories and the CMDI schema.
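As an illustration only, the kind of mapping described in the last deliverable could be expressed as a simple crosswalk table from a repository's existing metadata fields to CMDI-style component fields, each tied to a data category. The sketch below is a minimal Python example under stated assumptions: the source field names, the target field paths and the data category identifiers are all hypothetical placeholders, not part of any schema recommended by the WG or registered in ISOcat.

```python
# Hypothetical crosswalk: source field -> (target component field, data category id).
# Every name and identifier here is a placeholder chosen for illustration.
CROSSWALK = {
    "title": ("GeneralInfo.Title", "DC-PLACEHOLDER-title"),
    "language": ("SubjectLanguage.Code", "DC-PLACEHOLDER-languageId"),
    "type": ("Content.Genre", "DC-PLACEHOLDER-genre"),
}


def map_record(record, crosswalk=CROSSWALK):
    """Split a flat metadata record into mapped and unmapped fields.

    Mapped fields are keyed by the target field path and carry the
    data category the crosswalk assigns to them; fields with no
    crosswalk entry are returned unchanged so they can be reviewed.
    """
    mapped = {}
    unmapped = {}
    for field, value in record.items():
        if field in crosswalk:
            target, category = crosswalk[field]
            mapped[target] = {"value": value, "data_category": category}
        else:
            unmapped[field] = value
    return mapped, unmapped


# Example record using the ISO 639-3 code "eng" (English) for the language field.
example = {"title": "Field recording 12", "language": "eng", "rights": "open"}
mapped, unmapped = map_record(example)
```

Returning the unmapped fields separately is a deliberate choice in this sketch: it makes gaps in a crosswalk visible, which is the kind of issue the discussion papers for each deliverable would need to canvass.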

b) The form and description of milestones and intermediate documents, code or other deliverables that will be developed during the course of the CWG’s work,

For each of the deliverables listed above:

i) Discussion papers detailing the issues to be addressed and canvassing possible solutions.
ii) Draft recommendations based on i) and input from relevant communities.

Additionally:

Establishment of an Australian mirror committee of ISO TC37 as an avenue for increased engagement of research communities in the ISO 639 processes.

Once agreed, it is expected that the deliverables will be progressively implemented in computer software and systems. See also Adoption Plan below.

c) a description of the Working Group’s mode and frequency of operation (e.g. on-line and/or on-site, how frequently will the group meet, etc.),

The WG will work mainly on-line; however, opportunities for physical meetings by members of the WG (e.g. alongside other conferences or meetings) will also be utilised.

d) a description of how the Working Group plans to develop consensus, address conflicts, stay on track and within scope, and move forward during operation, and

The WG will proceed by means of open processes in online environments. Representatives of key stakeholders are involved, which will ensure that proposals made by the WG have an excellent chance of acceptance by consensus across those groups. The Australian involvement is based on existing networks which have had an outstanding record of co-operative endeavour over at least the last five years (e.g. HCSNet). Similarly, the European involvement in the WG is based partially on the CLARIN network, which has an excellent record in fostering co-operative work. Wider co-operation between these two groups has been underway for the past year. Whilst the WG will call on this basis of co-operation to enable its work, the involvement of the stakeholder groups will also ensure sound governance and oversight of the progress of the group.


e) a description of the CWG’s planned approach to broader community engagement and participation

The interim documents detailed at b) above will be made available as widely as possible. The WG will have a distribution list of major stakeholders and documents will be sent to them for consideration; in addition, documents will be made available online for general consultation and this process will be publicised in relevant fora, e.g. LinguistList.

Adoption Plan

A specific plan for adoption / implementation of the CWG deliverables / outcomes within the organizations and institutions represented by CWG members, as well as plans for adoption / implementation of the deliverables / outcomes more broadly within the community. Such adoption/implementation should start within the 12-18 month timeframe, prior to the completion of the Working Group.

Initial implementation of the WG deliverables will mean use of the CMDI components and schema in description of resources in the two repositories represented in the WG (The Language Archive and PARADISEC) [approximate timescale: 12 months]. At the same time, the data categories associated with the CMDI schema will be proposed as standards within the ISOcat framework [approximate timescale: 12 months]. When these two initial stages are completed, the outcomes of the WG’s activities will be visible to relevant communities and activity will move to advocating the adoption of the schema and standards more widely. This will be accomplished by conference presentations, publications and demonstrations [approximate timescale: 12-18 months].

Initial Membership

A specific list of initial members of the CWG and a description of initial leadership of the CWG.

Linda Barwick, PARADISEC/USyd
Anna Belew, Linguist List/Eastern Michigan University
Steve Cassidy, HCS V-Lab/Macquarie
Sebastian Drude, TLA/MPI
Dominique Estival, MARCS/UWS
Ingrid Mason, HuNI/Intersect
Simon Musgrave, ARGILARE/Monash/AusNC (Initial leader)
Gary Simons, SIL/Graduate Institute of Applied Linguistics/OLAC
Nick Thieberger, PARADISEC/UniMelb
Michael Walsh, AIATSIS/USyd
Menzo Windhouwer, TLA/DANS

 
