I'm sorry, Dave,
I'm afraid I can't do that:
Linguistics, Statistics, and Natural-Language
Processing in the Big Data Era
Lillian Lee
Professor, Computer Science
Why is this man smiling?
http://www.nature.com/nature/journal/v482/n7386/full/482440a.html
The Turing test:
Intelligence → human-level language use
http://bittersweetsage.blogspot.com/2010/01/comic-converse-turing-test.html
Turing predicted we'd be close in about 50 years.
http://ghostradio.files.wordpress.com/2011/03/blade_runner_fondo.jpg
Why is this man not smiling?
http://www.netbrawl.com/matchup.php?mid=11131&bracketid=497 http://4.bp.blogspot.com/_Qm9Cekv5Jj4/S8Iq3emehgI/AAAAAAAAARU/oBZ6Ih5J4fI/s200/2001-a-space-odyssey.jpg
Open the pod bay doors, Hal.
Goal: create systems that use human language as input/output
• speech-based interfaces
• information retrieval / question answering
• automatic summarization of news, emails, postings, etc.
• automatic translation
… and much more!
Interdisciplinary: computer science; linguistics, psychology,
communication; probability & statistics, information theory…
Recently deployed: Siri
State of the art: Watson
Credit: AP Photo/Jeopardy Productions Inc.
The Watson system beat human Jeopardy! champions (and didn't have internet access; it learned by reading …)
Why are these men smiling?
The session co-organizers
Real-life error (1)
Hey bunch of grapes
iStock | blankaboskov
A bunch of grapes.
http://randomhandprints.blogspot.com/2011_01_01_archive.html
Real-life error (2)
We can email you when you're fat.
We can email you when we're back.
http://catandgirl.com/?p=2678 iStock | blankaboskov
Real-life error (3)
[This U.S. city's] largest airport …
What is Toronto???
http://jeopardy.edogo.com/wp-content/uploads/2009/01/program-jeopardy1.jpg
List all flights on Tuesday
Challenge: ambiguity
List all flights on Tuesday = List all the flights leaving on Tuesday.
Retrieve all the local patient files
Baroque example
I saw her duck with a telescope.
[http://www.supercoloring.com/pages/duck-outline/]
http://www.clipartmojo.com/plugins/Clipart/ClipartStock1/star%20gazing.png http://www.geocities.ws/looneyebay/dell/bb040.jpg http://pokerfoldingtable.com/wp-content/uploads/2009/02/three-men-gambling-sitting-at-poker-table-playing-cards-betting-party-pen-ink-drawing-300x234.png
Conversation complications
[Grishman 1986]
Q: Do you know when the train to Boston leaves?
A: Yes.
Q: I want to know when the train to Boston leaves.
A: I understand.
Images: http://3.bp.blogspot.com/_o4kq5TNL0Z4/TUx0j6E5BLI/AAAAAAAAA5k/J7xjhvrcNlU/s1600/Trillian-hitchhikers-guide-to-the-galaxy-the-2005.jpg, http://www.tvacres.com/images/robots_androids_marvin_movie.jpg
[http://browse.deviantart.com/?qh=&section=&global=1&q=muscleduck#/d14nst5]
I'm sorry, Dave, I'm afraid I can't do that.
I'm afraid you might be right.
[http://selcouth.com/2011/03]
1940s – 50s:
From language to probability
The fundamental problem of communication is that of
reproducing at one point either exactly or approximately a message selected at another point ...
[For] the engineering problem, the significant aspect is that the actual message is one selected from a set of possible
messages.
Language, statistics, cryptography
Why is this man smiling?
http://artofrevolution.co.uk/new/index.php?main_page=product_info&cPath=1_3&products_id=239&zenid=vid94spfpa9vtr18sbtgug1h64
I can see Alaska from my house!
Encryption process
Two probabilities to infer
I can see Alaska from my house!
Encryption process
[Russian]
Prob. of generating this original message?
Prob. of doing this encryption of the original?
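The two questions on this slide combine through Bayes' rule: for a received message r and a candidate original e, P(e | r) = P(e) P(r | e) / P(r), and because P(r) is the same for every candidate, a decoder only needs to maximize P(e) P(r | e). The following is a minimal sketch under loudly stated assumptions: the candidate messages and every probability are invented, and the "encryption" channel is a plain lookup table rather than a real model of Russian.

```python
import math

# Hypothetical numbers for illustration only. A real system would estimate
# P(e) from a large English corpus and P(r | e) from translation data.

# Language model P(e): how plausible each candidate original is as English.
PRIOR = {
    "I can see Alaska from my house!": 0.010,
    "I can see Alaska from my horse!": 0.001,
    "house my from Alaska see can I!": 0.00001,
}

# Channel model P(r | e) for one fixed received (Russian) message r:
# how likely "encrypting" candidate e would produce exactly that r.
CHANNEL = {
    "I can see Alaska from my house!": 0.2,
    "I can see Alaska from my horse!": 0.3,
    "house my from Alaska see can I!": 0.2,
}

def decode(prior, channel):
    """Return argmax_e P(e) * P(r | e), summing logs to avoid underflow."""
    scores = {
        e: math.log(prior[e]) + math.log(channel[e])
        for e in prior
        if prior[e] > 0 and channel.get(e, 0.0) > 0
    }
    return max(scores, key=scores.get)

print(decode(PRIOR, CHANNEL))
```

With these made-up numbers the channel table on its own slightly favors "horse", but the language model's prior outweighs it, so the decoder picks "house": both probabilities matter.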
Another use of message probs:
speech recognition
(1) It's hard to recognize speech
(2) It's hard to wreck a nice beach
Both messages have almost the same acoustics, but
different likelihoods.
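To make "different likelihoods" concrete, here is a minimal sketch of the language-model half of a recognizer: a bigram model with add-one (Laplace) smoothing that scores the two candidate transcriptions. Everything here is an assumption for illustration: the four-sentence training corpus is invented, tokenization is plain whitespace splitting, and the acoustic model is left out entirely (both transcriptions are simply assumed to fit the audio equally well, so the language model breaks the tie).

```python
import math
from collections import Counter

# Hypothetical toy training corpus, invented for illustration only.
CORPUS = [
    "it's hard to recognize speech",
    "we recognize speech with statistics",
    "speech is hard to recognize",
    "a nice day at the beach",
]

def tokens(sentence):
    """Whitespace tokenization plus sentence-boundary markers."""
    return ["<s>"] + sentence.split() + ["</s>"]

def train_bigrams(corpus):
    """Return bigram counts, context counts, and smoothed vocabulary size."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        toks = tokens(sentence)
        vocab.update(toks[1:])  # every predicted token, including </s>
        for prev, word in zip(toks, toks[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    # +1 reserves probability mass for a single unknown-word type.
    return bigrams, contexts, len(vocab) + 1

def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a whole sentence."""
    bigrams, contexts, v = model
    toks = tokens(sentence)
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + v))
        for prev, word in zip(toks, toks[1:])
    )

model = train_bigrams(CORPUS)
for candidate in ["it's hard to recognize speech",
                  "it's hard to wreck a nice beach"]:
    print(f"{log_prob(candidate, model):8.2f}  {candidate}")
```

The candidate with the higher (less negative) log-probability is preferred. With this toy corpus, "recognize speech" wins because its bigrams were actually observed, while "to wreck", "wreck a", and "nice beach" were not.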
1950s-1980s: Breaking with statistics
(a) Colorless green ideas sleep furiously
(b) Furiously sleep ideas green colorless
N. Chomsky (1957):
The argument: Neither sentence has ever occurred in
the history of English. So any statistical model would
give them the same probability (zero).
The field moved to sophisticated non-probabilistic models
of language.
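The premise of the argument is easy to reproduce for the simplest statistical model, an unsmoothed maximum-likelihood bigram model, where P(w | v) = count(v, w) / count(v) and a sentence's probability is the product over its bigrams. This sketch assumes a tiny invented corpus as a stand-in for "the history of English". Even though that corpus happens to contain the bigram "ideas sleep", a single unseen bigram is enough to zero out the whole product, so (a) and (b) come out identical.

```python
from collections import Counter

# Hypothetical toy corpus standing in for "the history of English".
CORPUS = [
    "green leaves grow quickly",
    "new ideas sleep in old books",
    "the dog barked furiously",
]

def with_boundaries(sentence):
    return ["<s>"] + sentence.split() + ["</s>"]

def bigram_counts(corpus):
    """Count bigrams and how often each token appears as a bigram context."""
    bigrams, contexts = Counter(), Counter()
    for sentence in corpus:
        toks = with_boundaries(sentence)
        for prev, word in zip(toks, toks[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts

def mle_prob(sentence, bigrams, contexts):
    """Unsmoothed maximum-likelihood bigram probability of a whole sentence."""
    prob = 1.0
    toks = with_boundaries(sentence)
    for prev, word in zip(toks, toks[1:]):
        if contexts[prev] == 0:
            return 0.0  # context never observed: treat the sentence as impossible
        prob *= bigrams[(prev, word)] / contexts[prev]
    return prob

bigrams, contexts = bigram_counts(CORPUS)
for s in ["colorless green ideas sleep furiously",
          "furiously sleep ideas green colorless",
          "the dog barked furiously"]:
    print(mle_prob(s, bigrams, contexts), s)
```

The seen sentence "the dog barked furiously" gets a nonzero probability (1/3 with this corpus), which shows that the zeros come from unseen bigrams rather than from a bug.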
1990s: The empiricists strike back
• Huge amounts of data start coming online
• Advances in algorithms, models, and horsepower
Every time I fire a linguist, my [system's] performance
goes up -- F. Jelinek (apocryphal)
2000s and beyond:
integrating language insights and
statistical techniques
Why is this man smiling?
We may hope that machines will eventually compete with men in all purely intellectual fields. But which are the best ones to start with? Even this is a difficult decision.... I do not know what the right answer is, but I think [different] approaches should be tried.
We can only see a short distance ahead, but we can see plenty there that needs to be done.
Why is this man smiling?
C. Danescu-Niculescu-Mizil et al. ACL 2012
Beyond situational effects, phrasing also affects memorability:
• memorable movie quotes (in aggregate) are unusual word choices built on a scaffolding of common part-of-speech patterns
  ‣ shown via language models
• carries over to ad slogans
http://www.schwimmerlegal.com/2006/11/evidence-of-secondary-meaning-in-tv-catchphrases.html
Social interaction: who has the lead?
Communicative behaviors are patterned and coordinated, like a dance [Niederhoffer and Pennebaker, 2002]
http://minimalmovieposters.tumblr.com/post/16082323317/pulp-fiction-by-ana-balderramas
[Figure: an exchange between two speakers, with function words such as "to", "the", "at", "a", "of", "an", and "for" highlighted]
Those with less power tend to immediately match the function-word
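One simple way to put a number on this kind of immediate matching, offered here as an illustrative assumption rather than as the exact measure behind the slide, is to ask how much more likely a reply is to use a function-word class (say, articles) when the utterance it answers used that class, compared with how often replies use it overall. The word class, the exchanges, and the function names are all hypothetical. A positive value means the replier tends to echo the other speaker's function-word usage; comparing such values for lower- and higher-status speakers is how one would connect the measure to the claim about power.

```python
# Hypothetical function-word class (English articles), for illustration only.
ARTICLES = {"a", "an", "the"}

def uses(utterance, word_class):
    """True if any whitespace-separated token of the utterance is in word_class."""
    return any(tok in word_class for tok in utterance.lower().split())

def coordination(exchanges, word_class):
    """P(reply uses class | prompt used it) - P(reply uses class).

    exchanges: list of (prompt, reply) pairs, where every reply comes from
    the speaker whose matching behavior is being measured.
    """
    reply_uses = [uses(reply, word_class) for _, reply in exchanges]
    after_trigger = [uses(reply, word_class)
                     for prompt, reply in exchanges if uses(prompt, word_class)]
    if not after_trigger:
        raise ValueError("no prompt uses this word class")
    return (sum(after_trigger) / len(after_trigger)
            - sum(reply_uses) / len(reply_uses))

# Hypothetical toy exchanges (prompt from speaker A, reply from speaker B).
EXCHANGES = [
    ("did you read the report", "yes the report was long"),
    ("pass me a pen", "here is a pen"),
    ("how are you", "fine thanks"),
    ("see you tomorrow", "the meeting is at noon"),
]

print(coordination(EXCHANGES, ARTICLES))  # prints 0.25
```

Here both replies to article-using prompts also use articles (2/2), while replies overall use them 3 times out of 4, giving 1.0 - 0.75 = 0.25.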