Language Technology based on Big Data: Current Situation and Future Perspectives
Timo Honkela
30 October 2014
Department of Modern Languages
Centre for Preservation and Digitisation
Introductory remarks
HELSINKI
MIKKELI
Digital humanities
● Research within humanities with the help of computers
  – Digital resources
  – Computational models
● Basic motivation
  – One can already fly to the moon and build sophisticated industrial products
  – The most important open questions in the world are related to humanities and social sciences
Changing role of computers
● Machines are increasingly capable of performing pattern recognition and learning.
● Traditionally, ICT systems were programmed to perform their operations in a manner that made them predictable.
● Systems no longer repeat their actions in the same manner over and over; they evolve and can take contextual factors into account better than before.
Early personal experiences with rule-based natural language processing
● H. Jäppinen, T. Honkela, H. Hyötyniemi & A. Lehtola (1988): A Multilevel Natural Language Processing Model. Nordic Journal of Linguistics 11:69-87.
What is the turnover of the ten largest stock exchange companies in forestry?
Morphological analysis
Dependency parsing
Logical analysis
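The three levels listed above can be read as stages applied one after another to the example question. The following is a minimal sketch of that chaining only, not the model of the 1988 paper: every function name and every stage body is a hypothetical toy stand-in (whitespace tokenisation, a flat head assignment, a "what" check) chosen so that the example runs.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Stage = Callable[[State], State]


def morphological_analysis(state: State) -> State:
    # Toy stand-in (assumption): lowercase the words and strip punctuation.
    # A real analyser would return lemmas and morphological features.
    words = [w.strip("?.,!").lower() for w in str(state["text"]).split()]
    return {**state, "tokens": [w for w in words if w]}


def dependency_parsing(state: State) -> State:
    # Toy stand-in (assumption): attach every token to the first token.
    # This only shows the shape of the output (head, dependent) pairs.
    tokens = state["tokens"]
    arcs = [(0, i) for i in range(1, len(tokens))]
    return {**state, "arcs": arcs}


def logical_analysis(state: State) -> State:
    # Toy stand-in (assumption): flag the input as a query if it starts
    # with "what". A real system would build a logical form here.
    tokens = state["tokens"]
    return {**state, "is_query": bool(tokens) and tokens[0] == "what"}


def run_pipeline(text: str, stages: List[Stage]) -> State:
    # Each level consumes the state produced by the previous level.
    state: State = {"text": text}
    for stage in stages:
        state = stage(state)
    return state


STAGES: List[Stage] = [morphological_analysis, dependency_parsing, logical_analysis]
QUESTION = ("What is the turnover of the ten largest stock exchange "
            "companies in forestry?")
result = run_pipeline(QUESTION, STAGES)
```

Keeping each level as a separate function with a shared state makes it possible to replace one toy stage with a real analyser without touching the others.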
DIGITAL RESOURCES
● Texts
● Images
● Videos
● Speeches/conversations
● Multimedia documents
● Interactive systems
● Computer software
● Computational models
● Numerical data
Complexity of language as an object of study and as a means of representation and communication
More than 6000 languages, many more dialects
Billions of people
A large number of different cultures