www.morphologic.hu   www.itk.ppke.hu

HLT in Hungary - 2009

Gábor Prószéky

MorphoLogic

http://www.morphologic.hu

Pázmány Péter Catholic University

Faculty of Information Technology

Basics On Hungarian

15 million speakers world-wide, 10 million in Hungary

Agglutinative language: Finno-Ugric roots (with some uncertain points), with only a few small related languages

Since 896 in Central Europe: Turkish, Slavic, Romance and German areal influences

Complex formal descriptions have been needed, since simple CL methods (which work for English) do not work for Hungarian

The first detailed and computationally usable morpho-syntactic description of Hungarian was made in 1991

History of Hungarian HLT

1960s: Russian-Hungarian MT Group, periodical "Computational Linguistics" (Prof. Kiefer)

1970s: A tergo (reverse) dictionary, basic language statistics (Debrecen University - Prof. Papp)

1980s-: Speech applications (Technical University - Prof. Gordos, Németh, Olaszy, Vicsi); AI applications (ALL - Gergely et al.)

1991-: Marketable NLP products (MorphoLogic - Prószéky et al.)

1990s: Historical dictionary, corpus linguistics (Linguistics Institute of HAS - Váradi et al.)

2000s-: Learning methods in NLP (Szeged University - Prof. Csirik); services combined with speech applications (AITIA - Tatai et al.)

2002-: Courses in HLT, PhDs in HLT (Pázmány University - Prof. Prószéky, Prof. Takács)

2003-: Series of Annual National HLT Conferences

Hungarian HLT Research

MorphoLogic

(Gábor Prószéky)

• Staff: 15

• Proofing tools, intelligent dictionaries, machine translation, a wide range of linguistic resources for various languages (incl. Hungarian WordNet), text processing tools, lexicographical activities

Linguistics Institute

(Tamás Váradi)

• Staff: 8

• Hungarian National Corpus, research activities in various CL projects (incl. Hungarian WordNet)

Szeged University

(János Csirik)

• Staff: 6

• Machine learning tools for NLP, speech research, activities in various CL research projects (incl. Hungarian WordNet)

Hungarian HLT Research

(cont’d)

Technical University of Budapest (TMIT)

• Staff: 28 (in 9 laboratories)

• Speech Technology Lab: speech information systems, e-mail/sms reader, tools for blind people (Géza Németh)

• Speech Acoustics Lab: speech databases, medical applications, speech correction, acoustic-phonetic research (Klára Vicsi)

• Speech Recognition Lab: speaker recognition, speech recognition, statistical modeling, multimedia indexing (Tibor Fegyó, Péter Mihajlik)

Technical University of Budapest (MOKK)

• Staff: 5

• Corpus collection (mono- and bilingual), text aligning, audio/video archives, ontology modeling, POS-tagging (Péter Halácsy)

Hungarian HLT Research

(cont’d)

Pázmány Péter Catholic University,

Faculty of Information Technology

(Gábor Prószéky, György Takács)

• 4 researchers, 7 PhD students

• Language: WSD, semantic representation, anaphora resolution, text mining

• Speech: mobile applications (incl. mobile for the deaf!)

Pécs University

(Gábor Alberti)

• 4 researchers, 2 PhD students

• Computational semantics, machine translation, Prolog

Other universities

(with 1-2 researchers)

• Debrecen (literary computing)

• Miskolc (face modeling)

Hungarian HLT Research

(cont’d)

Applied Logic Laboratory

(Tamás Gergely)

• 4 researchers, 5 PhD students

• AI tools for medical and pharmacological applications, cognitive systems

AITIA

(Gábor Tatai)

• 48 co-workers (a few of them in HLT)

• Speech technology applications, text mining, chat-robots

Kilgray

(Balázs Kis)

• 3 full-time employees

International Cooperations in HLT

Earlier in the 1990’s: MULTEXT-East, GLOSSER, GRAMLEX, ELSNET Goes East, SPECO, BABEL, TELRI, TRACTOR, …

EuroTermBank (MorphoLogic): common EU terminology

ImportNet (ALL): ontology generation

EASAIER (ALL): multimedia search

CACAO (Linguistics Institute): library applications with HLT

EuroMatrix (MorphoLogic): statistical MT for Europe

CLARIN (Linguistics Institute & others): resources

Hungarian HLT Platform

(2008-2010)

Founders of the Platform:

4 industrial partners:

AITIA

Applied Logic Laboratory

Kilgray

MorphoLogic

4 academic partners:

Linguistics Institute, HAS

Technical University, Telecomm. & Media-informatics (TMIT)

Technical University, Center for Media Res. & Educ. (MOKK)

Szeged University, Res. Group of AI (RGAI)

New member:

Hungarian Education in NLP

Courses in CL/HLT/NLP:

Pázmány University: HLT (Prószéky + 5 PhD), speech (Takács + 2 PhD)

Szeged University: machine learning (Csirik, Alexin + 3 PhD)

Technical University: speech (Gordos, Németh, Olaszy, Vicsi + 3 PhD), artificial intelligence (Prószéky)

Others:

Debrecen University: general linguistics programme (Hunyadi)

ELTE University: theoretical linguistics programme (Kálmán, Oravecz); Dept. of Translation Theory (Prószéky + 3 PhD)

Pécs University: semantic representation (Alberti + 3 PhD)

Annual National Conferences in Computational Linguistics

2-day conferences, always in December:

2003) 1st: 39 long and 20 short papers

2004) 2nd: 46 papers (in 8 sections)

2005) 3rd: 40 papers (in 7 sections), 13 posters & demos

2006) 4th: 34 papers (in 7 sections), 16 posters & demos

2007) 5th: 30 long papers (in 7 sections), 8 posters & demos

2008) Kick-off Conference of the Platform: 6 plenary presentations, 9 posters & demos

Hungary’s No. 1 HLT website:

www.webforditas.hu

Website for various HLT applications: text & web translation, dictionaries, spell-checking, search with linguistic support

For "fordítás" (= "translation") it is the 1st hit in Google (among nearly 20 million hits)

60,000 visitors/day

In 2008: 91 million pages translated (in 2007: 43 million pages)

81 million text translations + 2 million web translations + 7 million dictionary lookups

13.3 GB data traffic/year (at 1800 characters/page, this corresponds to 7.2 million translated A4 pages)

… and the human translation market felt nothing

Translation between Hungarian and 33 other languages

Technically, it is rather easy to combine two existing web translation services: HU-EN + EN-X and X-EN + EN-HU

EN-X and X-EN language pairs for which commercial translation services are currently available:

Official EU languages to and from Hungarian:

1. Bulgarian-Hungarian/Hungarian-Bulgarian (Magyar/български): MorphoLogic & SkyCode
2. Czech-Hungarian/Hungarian-Czech (Magyar/Čeština)
3. Danish-Hungarian/Hungarian-Danish (Magyar/Dansk): MorphoLogic & GrammarSoft
4. Dutch-Hungarian/Hungarian-Dutch (Magyar/Nederlands)
5. English-Hungarian/Hungarian-English (Magyar/English): MorphoLogic (Hu-En: with LI & SU)
6. Finnish-Hungarian/Hungarian-Finnish (Magyar/Suomi)
7. French-Hungarian/Hungarian-French (Magyar/Français): MorphoLogic & ProMT
8. German-Hungarian/Hungarian-German (Magyar/Deutsch): MorphoLogic & ProMT
9. Greek-Hungarian/Hungarian-Greek (Magyar/Ελληνικά)
10. Italian-Hungarian/Hungarian-Italian (Magyar/Italiano): MorphoLogic & ProMT
11. Latvian-Hungarian/Hungarian-Latvian (Magyar/Latviešu valoda): MorphoLogic & Trident
12. Lithuanian-Hungarian/Hungarian-Lithuanian (Magyar/Lietuvių kalba)
13. Polish-Hungarian/Hungarian-Polish (Magyar/Polski): MorphoLogic & pwn.pl
14. Portuguese-Hungarian/Hungarian-Portuguese (Magyar/Português): MorphoLogic & ProMT
15. Romanian-Hungarian/Hungarian-Romanian (Magyar/Română)
16. Slovak-Hungarian/Hungarian-Slovak (Magyar/Slovenčina)
17. Slovene-Hungarian/Hungarian-Slovene (Magyar/Slovenščina)
18. Spanish-Hungarian/Hungarian-Spanish (Magyar/Español): MorphoLogic & ProMT
19. Swedish-Hungarian/Hungarian-Swedish (Magyar/Svenska)
20.

Other European languages to and from Hungarian:

Features of a general web translation service

Text translation techniques for any X language if X-En and En-X services are available

Translation of entire websites

Combination with various dictionaries (Web 2.0, AJAX)

Virtual keyboard for all languages

Spell-checking for all languages

Integrated text-to-speech tools (and speech recognition, soon)

Language guesser tools integrated
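
The pivot idea behind the first feature (chaining an X-En service with an En-Hu service, or Hu-En with En-X) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `PivotTranslator`, `make_toy_service` and the tiny word lists are illustrative stand-ins, not the webforditas.hu implementation, and a real deployment would call the partners' translation services instead of dictionary lookups.

```python
from typing import Callable, Dict, Tuple

# A translation service is modelled as a plain function: text in, text out.
Translator = Callable[[str], str]


def make_toy_service(lexicon: Dict[str, str]) -> Translator:
    """Build a stand-in service that translates word by word (assumption:
    real services would be remote MT engines, not lexicon lookups)."""
    def service(text: str) -> str:
        # Unknown words are passed through unchanged.
        return " ".join(lexicon.get(word, word) for word in text.split())
    return service


class PivotTranslator:
    """Combine existing services through a pivot language (English here)."""

    def __init__(self, pivot: str = "en") -> None:
        self.pivot = pivot
        self._services: Dict[Tuple[str, str], Translator] = {}

    def register(self, src: str, tgt: str, service: Translator) -> None:
        self._services[(src, tgt)] = service

    def translate(self, text: str, src: str, tgt: str) -> str:
        if src == tgt:
            return text
        # Prefer a direct service if one is registered for this pair.
        direct = self._services.get((src, tgt))
        if direct is not None:
            return direct(text)
        # Otherwise chain src -> pivot and pivot -> tgt.
        first = self._services.get((src, self.pivot))
        second = self._services.get((self.pivot, tgt))
        if first is None or second is None:
            raise LookupError(f"no translation path from {src} to {tgt}")
        return second(first(text))


# Toy demo: Hungarian -> English -> Bulgarian.
demo = PivotTranslator(pivot="en")
demo.register("hu", "en", make_toy_service({"kutya": "dog", "macska": "cat"}))
demo.register("en", "bg", make_toy_service({"dog": "куче", "cat": "котка"}))
print(demo.translate("kutya macska", "hu", "bg"))
```

With this design, adding a new language X only requires registering its X-En and En-X services; no X-Hu engine is needed. Translation errors from the two steps can compound, which the sketch does not attempt to model.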

MT Service for All European Languages

Proposal for a new Pan-European cooperation

Remark: we have not lost interest in finding new approaches to MT (e.g. we are partners in EuroMatrix), and we are still working on new scientific methods as well,

BUT THIS PROPOSAL IS DIFFERENT:

it guarantees a usable translation service for a wide range of end-users on the basis of the existing service www.webforditas.hu,

the above pivot application is running and anybody can use it (currently 60,000 users/day),

extending the existing application to other local languages requires only software engineering work,

usable final results can be guaranteed,

service providers for many languages are now available on the market,

cooperation has already started, both with EU countries (HU, BG, DK, PL) and with non-EU countries (RU, UKR),

partners’ R&D activity is basically the improvement of their own service to have

Köszönöm!

Thanks for your attention!
