www.morphologic.hu www.itk.ppke.hu
HLT in Hungary - 2009
Gábor Prószéky
MorphoLogic
http://www.morphologic.hu
Pázmány Péter Catholic University
Faculty of Information Technology
Basics On Hungarian
15 million speakers world-wide, 10 million in Hungary
Agglutinative language with Finno-Ugric roots (some points remain uncertain) and only a few small related languages
Since 896 in Central Europe: Turkish, Slavic, Romance and German areal influences
Complex formal descriptions have been needed, because simple CL methods (which work for English) don’t work
The first detailed and computationally usable morpho-syntactic description of Hungarian was made in 1991
History of Hungarian HLT
1960’s: Russian-Hungarian MT Group, periodical "Computational Linguistics" (Prof. Kiefer)
1970’s: A tergo (reverse) dictionary, basic language statistics (Debrecen University - Prof. Papp)
1980’s-: Speech applications (Technical University - Prof. Gordos, Németh, Olaszy, Vicsi)
AI applications (ALL – Gergely et al.)
1991-: Marketable NLP products (MorphoLogic - Prószéky et al.)
1990’s: Historical dictionary, corpus linguistics (Linguistics Institute of HAS - Váradi et al.)
2000’s-: Learning methods in NLP (Szeged University - Prof. Csirik)
Services combined with speech applications (AITIA – Tatai et al.)
2002-: Courses in HLT, PhDs in HLT (Pázmány University - Prof. Prószéky, Prof. Takács)
2003-: Series of Annual National HLT Conferences
Hungarian HLT Research
– MorphoLogic (Gábor Prószéky)
• Staff: 15
• Proofing tools, intelligent dictionaries, machine translation, a wide range of linguistic resources for various languages (incl. Hungarian WordNet), text processing tools, lexicographical activities
– Linguistics Institute (Tamás Váradi)
• Staff: 8
• Hungarian National Corpus, research activities in various CL projects (incl. Hungarian WordNet)
– Szeged University (János Csirik)
• Staff: 6
• Machine learning tools for NLP, speech research, activities in various CL research projects (incl. Hungarian WordNet)
Hungarian HLT Research (cont’d)
– Technical University of Budapest (TMIT)
• Staff: 28 (in 9 laboratories)
• Speech Technology Lab: speech information systems, e-mail/sms reader, tools for blind people (Géza Németh)
• Speech Acoustics Lab: speech databases, medical applications, speech correction, acoustic-phonetic research (Klára Vicsi)
• Speech Recognition Lab: speaker recognition, speech recognition, statistical modeling, multimedia indexing (Tibor Fegyó, Péter Mihajlik)
– Technical University of Budapest (MOKK)
• Staff: 5
• Corpus collection (mono- and bilingual), text aligning, audio/video archives, ontology modeling, POS-tagging (Péter Halácsy)
Hungarian HLT Research (cont’d)
– Pázmány Péter Catholic University, Faculty of Information Technology (Gábor Prószéky, György Takács)
• 4 researchers, 7 PhD students
• Language: WSD, semantic representation, anaphora resolution, text mining
• Speech: mobile applications (incl. mobile for the deaf!)
– Pécs University (Gábor Alberti)
• 4 researchers, 2 PhD students
• Computational semantics, machine translation, Prolog
– Other universities (with 1-2 researchers)
• Debrecen (literary computing)
• Miskolc (face modeling)
Hungarian HLT Research (cont’d)
– Applied Logic Laboratory (Tamás Gergely)
• 4 researchers, 5 PhD students
• AI tools for medical and pharmacological applications, cognitive systems
– AITIA (Gábor Tatai)
• 48 co-workers (a few of them in HLT)
• Speech technology applications, text mining, chat-robots
– Kilgray (Balázs Kis)
• 3 full-time employees
International Cooperation in HLT
Earlier in the 1990’s: MULTEXT-East, GLOSSER, GRAMLEX, ELSNET Goes East, SPECO, BABEL, TELRI, TRACTOR, …
EuroTermBank (MorphoLogic): common EU terminology
ImportNet (ALL): ontology generation
EASAIER (ALL): multimedia search
CACAO (Linguistics Institute): library applications with HLT
EuroMatrix (MorphoLogic): statistical MT for Europe
CLARIN (Linguistics Institute & others): resources
Hungarian HLT Platform (2008-2010)
Founders of the Platform:
4 industrial partners:
AITIA
Applied Logic Laboratory
Kilgray
MorphoLogic
4 academic partners:
Linguistics Institute, HAS
Technical University, Telecomm. & Media-informatics (TMIT)
Technical University, Center for Media Res. & Educ. (MOKK)
Szeged University, Res. Group of AI (RGAI)
New member:
Hungarian Education in NLP
Courses in CL/HLT/NLP:
Pázmány University: HLT (Prószéky + 5 PhD), speech (Takács + 2 PhD)
Szeged University: machine learning (Csirik, Alexin + 3 PhD)
Technical University: speech (Gordos, Németh, Olaszy, Vicsi + 3 PhD), artificial intelligence (Prószéky)
Others:
Debrecen University: general linguistics programme (Hunyadi)
ELTE University: theoretical linguistics programme (Kálmán, Oravecz), Dept. of Translation Theory (Prószéky + 3 PhD)
Pécs University: semantic representation (Alberti + 3 PhD)
Annual National Conferences in Computational Linguistics
2-day conferences, always in December:
1st (2003): 39 long and 20 short papers
2nd (2004): 46 papers (in 8 sections)
3rd (2005): 40 papers (in 7 sections), 13 posters & demos
4th (2006): 34 papers (in 7 sections), 16 posters & demos
5th (2007): 30 long papers (in 7 sections), 8 posters & demos
2008: Kick-off Conference of the Platform: 6 plenary presentations, 9 posters & demos
Hungary’s No. 1 HLT website: www.webforditas.hu
Website for various HLT applications: text & web translation, dictionaries, spell-checking, search with linguistic support
For "fordítás" (= "translation") it ranks 1st in Google (among nearly 20 million hits)
60,000 visitors/day
In 2008: 91 million pages translated (in 2007: 43 million pages)
81 million text translations + 2 million web translations + 7 million dictionary lookups
13.3 GB data traffic/year (at 1800 char/page this is 7.2 million A4 pages of translation)
… and the human translation market felt nothing
Translation between Hungarian and 33 other languages
Technically, it is rather easy to combine two existing web translation services: HU-EN + EN-X and X-EN + EN-HU
EN-X and X-EN language pairs for which commercial translation services are currently available:
Official EU languages to and from Hungarian:
1. Bulgarian-Hungarian/Hungarian-Bulgarian Magyar/български MorphoLogic & SkyCode
2. Czech-Hungarian/Hungarian-Czech Magyar/Čeština
3. Danish-Hungarian/Hungarian-Danish Magyar/Dansk MorphoLogic & GrammarSoft
4. Dutch-Hungarian/Hungarian-Dutch Magyar/Nederlands
5. English-Hungarian/Hungarian-English Magyar/English MorphoLogic (Hu-En: with LI & SU)
6. Finnish-Hungarian/Hungarian-Finnish Magyar/Suomi
7. French-Hungarian/Hungarian-French Magyar/Français MorphoLogic & ProMT
8. German-Hungarian/Hungarian-German Magyar/Deutsch MorphoLogic & ProMT
9. Greek-Hungarian/Hungarian-Greek Magyar/Ελληνικά
10. Italian-Hungarian/Hungarian-Italian Magyar/Italiano MorphoLogic & ProMT
11. Latvian-Hungarian/Hungarian-Latvian Magyar/Latviešu valoda MorphoLogic & Trident
12. Lithuanian-Hungarian/Hungarian-Lithuanian Magyar/Lietuvių kalba
13. Polish-Hungarian/Hungarian-Polish Magyar/Polski MorphoLogic & pwn.pl
14. Portuguese-Hungarian/Hungarian-Portuguese Magyar/Português MorphoLogic & ProMT
15. Romanian-Hungarian/Hungarian-Romanian Magyar/Română
16. Slovak-Hungarian/Hungarian-Slovak Magyar/Slovenčina
17. Slovene-Hungarian/Hungarian-Slovene Magyar/Slovenščina
18. Spanish-Hungarian/Hungarian-Spanish Magyar/Español MorphoLogic & ProMT
19. Swedish-Hungarian/Hungarian-Swedish Magyar/Svenska
20.
Other European languages to and from Hungarian:
Features of a general web translation service
Text translation techniques for any X language if X-En and En-X services are available
Translation of entire websites
Combination with various dictionaries (Web2, AJAX)
Virtual keyboard for all languages
Spell-checking for all languages
Integrated text-to-speech tools (and speech recognition, soon)
Language guesser tools integrated
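
The pivot idea behind the first feature (translating between Hungarian and any language X by chaining an X-En and an En-X service) can be sketched as follows. This is only an illustrative sketch, not the webforditas.hu implementation: `PivotRouter`, `register`, `translate` and `word_lookup` are hypothetical names, and the "services" are toy word-by-word dictionaries standing in for real MT engines.

```python
from typing import Callable, Dict, Tuple

# A translation service is modelled as a plain function from text to text.
Translator = Callable[[str], str]


class PivotRouter:
    """Hypothetical router: uses a direct service if one is registered,
    otherwise chains source->pivot and pivot->target services."""

    def __init__(self, pivot: str = "en") -> None:
        self.pivot = pivot
        self.services: Dict[Tuple[str, str], Translator] = {}

    def register(self, src: str, tgt: str, service: Translator) -> None:
        self.services[(src, tgt)] = service

    def translate(self, text: str, src: str, tgt: str) -> str:
        direct = self.services.get((src, tgt))
        if direct is not None:
            return direct(text)
        to_pivot = self.services.get((src, self.pivot))
        from_pivot = self.services.get((self.pivot, tgt))
        if to_pivot is None or from_pivot is None:
            raise LookupError(f"no route from {src} to {tgt} via {self.pivot}")
        # Two-step translation through the pivot language.
        return from_pivot(to_pivot(text))


def word_lookup(table: Dict[str, str]) -> Translator:
    """Toy stand-in for a real MT service: word-by-word dictionary lookup."""
    return lambda text: " ".join(table.get(w, w) for w in text.split())


router = PivotRouter(pivot="en")
router.register("hu", "en", word_lookup({"kutya": "dog", "fehér": "white"}))
router.register("en", "bg", word_lookup({"dog": "куче", "white": "бяло"}))

# No hu->bg service is registered, so the request is routed via English.
result = router.translate("fehér kutya", "hu", "bg")
print(result)
```

In this sketch, adding a language only requires registering its two English-paired services. The trade-off is that errors made in the first step carry over into the second.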
MT Service for All European Languages
Proposal for a new Pan-European cooperation
Remark: we have not lost interest in finding new approaches to MT (e.g. we are partners in EuroMatrix), and we are still working on new scientific methods as well,
BUT THIS PROPOSAL IS DIFFERENT:
it guarantees a usable translation service for a wide range of end-users on the basis of the existing service www.webforditas.hu,
the above pivot application is running and anybody can use it (for the time being, 60,000 users/day),
extending the existing application to any other local language requires only software engineering work,
usable final results can be guaranteed,
service providers for many languages are now available on the market,
cooperation has already started: both from the EU (HU, BG, DK, PL) and from non-EU countries (RU, UKR),
partners’ R&D activity is basically the improvement of their own service to have
Köszönöm
Thanks for your attention!