Online Information 2010-11-30
Nelleke Aders Tweede Kamer der Staten-Generaal
From documents to data
House of Representatives of the States General
Nelleke Aders
Project Manager Linked Data
Who am I
• Background: Library & Information Science (LIS)
• Specialized in Information Access & Knowledge Organization Systems
• Department of Information Services, House of Representatives of the States General
• Project manager, Project Linked Data
Agenda
• Context
– Historical note
– Paper-based dissemination
– E-Parliament
– From documents to data
• Project Linked Data
– Goal:
• More powerful dissemination of parliamentary information
• Automation
– Organization
– Procedure
• Examples
Historical Note
Documents…
“Parliaments function through the medium of documents” (World E-Parliament Report, 2008)
• Dissemination of parliamentary information based on documents
But now…
The form of documents has changed: from paper to digital
New possibilities
More documents…
Still lots of paper!
Registering the parliamentary process
Paper-based dissemination
• No details; important information is hidden
• Time-consuming procedure
• Not what our users are looking for!
• Based on documents
• By means of manually adding metadata to documents as a whole
• Result of search: an unordered pile of documents
ICT & Parliament
E-Parliament
Use of information technology to improve the availability, accessibility and usability of parliamentary information.
Central principles:
• Outward oriented: orientation to the citizen
• Transparency of the parliamentary process
The House of Representatives is THE source of parliamentary information for all stakeholders.
That means that parliamentary information is:
• Available for all stakeholders
• Up to date
• Timely
• Complete
• Reliable
• Objective
• Safe
And also:
• Meaningful
• In context
Transparency of the parliamentary process
Availability
Dissemination
Project Linked Data (1)
• From documents to data
– Deeper dissemination of parliamentary information
– Automation of manual input
Parliamentary documents:
• Large quantity
• Unstructured textual data, but in reality very structured, containing implicit knowledge
• Metadata are sparse and at document level
• Metadata are hidden in the structure of documents
LOTS OF INFORMATION UNDER THE SURFACE!
Project Linked Data (2)
• Cooperation with the University of Amsterdam
– Political Mashup Project
– Making large quantities of textual data available for:
• large-scale automatic quantitative data and content analysis
• by scientists from the humanities and social sciences
AND, IN PRACTICE:
Transparency of the parliamentary process
Parliamentary documents: intrinsic quality
Of every word spoken in parliament we know…
• When it was spoken
• By whom
• In what function
• Speaking on behalf of which party
• In which context
• Who were present during the speech act
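Once these facts are made explicit, each speech act can be stored as a record rather than as running text, and a user's question becomes a simple filter. The Python sketch below illustrates the idea. The `SpeechAct` record, its field names, the `speeches_by` helper and all sample values are hypothetical; this is not the project's actual data model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class SpeechAct:
    """One speech act with the facts listed on the slide (hypothetical field names)."""
    spoken_on: date                # when it was spoken
    speaker: str                   # by whom
    role: str                      # in what function, e.g. "MP" or "minister"
    party: Optional[str]           # on behalf of which party (None if not applicable)
    context: str                   # in which context, e.g. the agenda topic
    present: frozenset = field(default_factory=frozenset)  # who were present
    text: str = ""

def speeches_by(records, party, topic_word):
    """Return the speech acts by `party` whose context mentions `topic_word`."""
    word = topic_word.lower()
    return [r for r in records if r.party == party and word in r.context.lower()]

# Invented sample data, for illustration only.
records = [
    SpeechAct(date(2010, 11, 2), "MP A", "MP", "Party X", "Budget debate",
              frozenset({"MP A", "MP B"})),
    SpeechAct(date(2010, 11, 2), "Minister C", "minister", None, "Budget debate"),
    SpeechAct(date(2010, 11, 3), "MP B", "MP", "Party Y", "Housing policy"),
]
```

For example, `speeches_by(records, "Party X", "budget")` returns only the first record. The result is a specific answer rather than a pile of documents.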
Project Linked Data goal and means
1. MAKE THE IMPLICIT STRUCTURE OF PARLIAMENTARY DOCUMENTS EXPLICIT
2. TURN IMPLICIT REFERENCES INTO HYPERLINKS, MAKING USE OF PERMANENT URIs (LINKED DATA)
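Step 2 can be sketched as a pattern-based rewrite: detect a textual reference and replace it with a link to a permanent URI. The Python sketch below assumes a hypothetical reference format (a dossier number, optionally written with a space, followed by ", nr. <n>") and a placeholder namespace under `example.org`. It does not reproduce the project's real reference patterns or URIs.

```python
import re

# Placeholder namespace (assumption): not a real resolver.
BASE = "http://example.org/id/document/"

# Hypothetical reference pattern, e.g. "32 123, nr. 5" or "32123, nr. 5".
REF = re.compile(r"\b(\d{2}) ?(\d{3}), nr\. (\d+)\b")

def link_references(text):
    """Replace each matched reference with an HTML hyperlink to its permanent URI."""
    def to_link(m):
        dossier = m.group(1) + m.group(2)        # normalise "32 123" to "32123"
        uri = f"{BASE}{dossier}-{m.group(3)}"    # one URI per document in the dossier
        return f'<a href="{uri}">{m.group(0)}</a>'
    return REF.sub(to_link, text)
```

Because both spellings of a dossier number normalise to the same URI, two texts that cite the same document end up linked to the same resource.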
How?
• Text analytics, and XML, database and information retrieval technology
• Named entity recognition
• Normalisation
• Data mining, machine learning
• Natural language processing, language models
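Normalisation is what allows different spellings of one speaker to be treated as a single entity. The minimal rule-based Python sketch below drops a trailing party label, strips diacritics, punctuation and a leading title, and then looks the result up in a register. The register, its identifiers and the title list are hypothetical. The project itself names NER and machine learning, so this is only a simplified stand-in for that step.

```python
import re
import unicodedata

# Hypothetical register of persons: normalised name -> identifier.
PERSONS = {"jansen": "person/001", "de vries": "person/002"}

# Titles to drop before matching (illustrative, not exhaustive).
TITLES = {"mr", "mrs", "ms", "dr", "minister", "de heer", "mevrouw"}

def normalise(raw):
    """Reduce a raw speaker label to a plain lowercase name."""
    name = re.sub(r"\s*\([^)]*\)\s*$", "", raw)  # drop "(Party X)" at the end
    name = unicodedata.normalize("NFKD", name)    # split letters from accents
    name = "".join(ch for ch in name if not unicodedata.combining(ch))
    name = re.sub(r"[^\w\s]", " ", name.lower())  # punctuation -> space
    name = " ".join(name.split())                 # collapse whitespace
    for title in sorted(TITLES, key=len, reverse=True):  # longest title first
        if name.startswith(title + " "):
            name = name[len(title) + 1:]
            break
    return name

def resolve_person(raw):
    """Map a raw speaker label to a person identifier, or None if unknown."""
    return PERSONS.get(normalise(raw))
```

With this sketch, "De heer Jansen (Party X)" and "Mr. Jansen" both resolve to the same identifier. Only whole leading titles are removed, so a surname such as "De Vries" is left intact.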
1. Make the implicit structure of parliamentary documents explicit
Meeting: (topic)+
Topic: (speech | stage-direction)+
Speech: (p | stage-direction)+
P: (#PCDATA | stage-direction)*
Stage-direction: (#PCDATA)
Meeting (1 day)
• Topic
• Stage direction
• Scene
• Stage direction
• Speech
• Paragraph
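The content model above can be checked mechanically. The Python sketch below rewrites each rule as a regular expression over the sequence of an element's children and walks an XML tree with the standard library. The lowercase element names and the sample fragment are assumptions, not the project's actual schema. A production setup would more likely validate against a proper DTD or XML Schema.

```python
import re
import xml.etree.ElementTree as ET

# The slide's content models as regexes over child names; each token ends
# with "," so names cannot run together. "#text" stands for non-blank #PCDATA.
CONTENT_MODELS = {
    "meeting": r"(topic,)+",
    "topic": r"((speech|stage-direction),)+",
    "speech": r"((p|stage-direction),)+",
    "p": r"((#text|stage-direction),)*",
    "stage-direction": r"(#text,)?",
}

def child_sequence(elem):
    """Encode an element's children, and any non-blank text runs, as 'name,' tokens."""
    tokens = ["#text,"] if elem.text and elem.text.strip() else []
    for child in elem:
        tokens.append(child.tag + ",")
        if child.tail and child.tail.strip():
            tokens.append("#text,")
    return "".join(tokens)

def validate(elem, path=None):
    """Return a list of content-model violations in the subtree rooted at elem."""
    path = path or elem.tag
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return [f"{path}: unknown element <{elem.tag}>"]
    errors = []
    seq = child_sequence(elem)
    if not re.fullmatch(model, seq):
        errors.append(f"{path}: children {seq!r} do not match {model}")
    for i, child in enumerate(elem):
        errors.extend(validate(child, f"{path}/{child.tag}[{i}]"))
    return errors

# Invented sample fragment of one meeting.
SAMPLE = """<meeting>
  <topic>
    <stage-direction>The chair opens the meeting.</stage-direction>
    <speech>
      <p>Madam Speaker, <stage-direction>Applause.</stage-direction> thank you.</p>
    </speech>
  </topic>
</meeting>"""
```

`validate(ET.fromstring(SAMPLE))` returns an empty list. A `<speech>` placed directly under `<meeting>` is reported, because a meeting must consist of topics.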
Structure of Hansards (1)
Structure of Hansards (2)
Meeting (1 day)
• Topic
• Stage direction
• Scene
• Stage direction
• Speech
• Paragraph
2. Turn implicit references into hyperlinks…
Procedure
… using permanent identifiers
PIDs in the parliamentary context:
• Published parliamentary documents
• Subunits in parliamentary documents
• Named entities
– Persons
– Parties
– Organizations
– Dossiers
– Controlled vocabulary terms
Needed:
1. Resolver
2. Namespace
3. Internal practice of giving unique names to entities
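The three requirements work together: names come from a namespace, a naming practice keeps them unique, and a resolver maps each permanent URI to wherever the resource currently lives. The in-memory Python sketch below illustrates this. The namespace URL, the entity-type path segments and the `slug` naming rule are all hypothetical.

```python
import re
import unicodedata

# Hypothetical namespace (assumption): the House would choose its own base URI.
NAMESPACE = "http://example.org/id/"

# Entity types from the slide; these path segments are illustrative only.
ENTITY_TYPES = {"document", "subunit", "person", "party",
                "organization", "dossier", "term"}

def slug(name):
    """Assumed internal naming rule: lowercase ASCII words joined by '-'."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")

class Resolver:
    """Minimal in-memory resolver: permanent URI -> current location of the resource."""

    def __init__(self):
        self._targets = {}

    def mint(self, entity_type, local_name, target):
        """Create a permanent URI in the namespace and register its target."""
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        uri = f"{NAMESPACE}{entity_type}/{slug(local_name)}"
        if uri in self._targets:
            # Unique names are a precondition: never reuse a URI for another entity.
            raise ValueError(f"URI already minted: {uri}")
        self._targets[uri] = target
        return uri

    def move(self, uri, new_target):
        """The resource may move; its permanent URI stays the same."""
        if uri not in self._targets:
            raise KeyError(uri)
        self._targets[uri] = new_target

    def resolve(self, uri):
        """Return the current location for a permanent URI (KeyError if unknown)."""
        return self._targets[uri]

# Usage with invented locations.
resolver = Resolver()
uri = resolver.mint("party", "Party X", "https://example.org/pages/party-x.html")
resolver.move(uri, "https://example.org/new/party-x")
```

After the move, `resolver.resolve(uri)` returns the new location, while every document that links to `uri` keeps working unchanged.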
From documents to data
• Dissemination based on data in documents
• By making use of:
– Data analysis
– XML
– Identification of entities
– Linking of entities
– Meaning (URIs, links)
• Result of search: specific answers to questions
• Detailed: deep dissemination
• Automatic: manual input is limited
• Users are looking for specific information, answers
New Possibilities
Dynamic report of parliamentary activity
Dynamic homepage for MPs
A picture of parliamentary debate
“Attaquograms”
Social network analysis (1)
• Normalized the names of the persons
• Transformed all data into GraphML format
• Computed basic social network statistics for the last six cabinet periods
• Visualized the networks
• Created an interactive page with all network data, summarizing co-submission of motions, amendments and written questions during the last six government periods
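The pipeline above can be sketched in a few lines once names are normalised. Each pair of co-submitters becomes an edge weighted by how often they submitted together, a simple statistic is computed, and the graph is written as GraphML. The Python sketch below uses only the standard library. The MP identifiers and submissions are invented, and node degree is just one example of a "basic statistic"; the slide does not say which statistics were computed.

```python
from collections import Counter
from itertools import combinations
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def co_submission_edges(submissions):
    """Count, for each pair of MPs, how many motions/amendments they co-submitted."""
    weights = Counter()
    for submitters in submissions:
        for a, b in combinations(sorted(set(submitters)), 2):
            weights[(a, b)] += 1
    return weights

def degree(weights):
    """Number of distinct co-submission partners per MP."""
    deg = Counter()
    for a, b in weights:
        deg[a] += 1
        deg[b] += 1
    return deg

def to_graphml(weights):
    """Serialize the weighted, undirected graph as a GraphML string."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    ET.SubElement(root, "key", {"id": "w", "for": "edge",
                                "attr.name": "weight", "attr.type": "int"})
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "undirected"})
    for node in sorted({n for pair in weights for n in pair}):
        ET.SubElement(graph, "node", {"id": node})
    for (a, b), w in sorted(weights.items()):
        edge = ET.SubElement(graph, "edge", {"source": a, "target": b})
        ET.SubElement(edge, "data", {"key": "w"}).text = str(w)
    return ET.tostring(root, encoding="unicode")

# Invented co-submissions; ids avoid spaces so they are valid GraphML ids.
submissions = [["mp-a", "mp-b"], ["mp-a", "mp-b", "mp-c"], ["mp-c", "mp-d"]]
weights = co_submission_edges(submissions)
```

Here `mp-a` and `mp-b` share an edge of weight 2, and `mp-c` has the highest degree (3). The GraphML output can be loaded by common network visualization tools.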
Not just textual data
Search engine connecting heterogeneous data:
Missed a debate?
Openkamer.tv
Wrap up
From documents to data
• Deeper dissemination
• Automation of manual input
• Unlimited possibilities for visualization and analysis
• A ‘document’ may take any form,
• and data may be anywhere!
Nelleke Aders
Specialist Information Access & Knowledge Organization Systems
House of Representatives of the States General
n.aders@tweedekamer.nl
Maarten Marx
Informatics Institute
University of Amsterdam
maartenmarx@uva.nl
Thank you!