
Academic year: 2021


"I'm sorry, Dave, I'm afraid I can't do that":

Linguistics, Statistics, and Natural-Language Processing in the Big Data Era

Lillian Lee

Professor, Computer Science


Why is this man smiling?

http://www.nature.com/nature/journal/v482/n7386/full/482440a.html


The Turing test:

Intelligence → human-level language use

http://bittersweetsage.blogspot.com/2010/01/comic-converse-turing-test.html

Turing predicted we'd be close in about 50 years.

[http://ghostradio.files.wordpress.com/2011/03/blade_runner_fondo.jpg]


Why is this man not smiling?

http://www.netbrawl.com/matchup.php?mid=11131&bracketid=497
http://4.bp.blogspot.com/_Qm9Cekv5Jj4/S8Iq3emehgI/AAAAAAAAARU/oBZ6Ih5J4fI/s200/2001-a-space-odyssey.jpg

Open the pod bay doors, Hal.


Goal: create systems that use human language as input/output

•  speech-based interfaces

•  information retrieval / question answering

•  automatic summarization of news, emails, postings, etc.

•  automatic translation

… and much more!

Interdisciplinary: computer science; linguistics, psychology,

communication; probability & statistics, information theory…


Recently deployed: Siri


State of the art: Watson

Credit: AP Photo/Jeopardy Productions Inc.

The Watson system beat human Jeopardy! champions (and didn't have internet access; it learned by reading


Why are these men smiling?

The session co-organizers


Real-life error (1)

Hey bunch of grapes

iStock | blankaboskov

A bunch of grapes.

http://randomhandprints.blogspot.com/2011_01_01_archive.html


Real-life error (2)

We can email you when you're fat.

We can email you when we're back.

http://catandgirl.com/?p=2678
iStock | blankaboskov


Real-life error (3)

[This U.S. city's] largest airport …

What is Toronto???

http://jeopardy.edogo.com/wp-content/uploads/2009/01/program-jeopardy1.jpg


List all flights on Tuesday

Challenge: ambiguity

List all flights on Tuesday = List all the flights leaving on Tuesday.


Retrieve all the local patient files


Baroque example

I saw her duck with a telescope.



[http://www.supercoloring.com/pages/duck-outline/]
http://www.clipartmojo.com/plugins/Clipart/ClipartStock1/star%20gazing.png
http://www.geocities.ws/looneyebay/dell/bb040.jpg
http://pokerfoldingtable.com/wp-content/uploads/2009/02/three-men-gambling-sitting-at-poker-table-playing-cards-betting-party-pen-ink-drawing-300x234.png


Conversation complications

[Grishman 1986]

Q: Do you know when the train to Boston leaves?

A: Yes.

Q: I want to know when the train to Boston leaves.

A: I understand.

Images: http://3.bp.blogspot.com/_o4kq5TNL0Z4/TUx0j6E5BLI/AAAAAAAAA5k/J7xjhvrcNlU/s1600/Trillian-hitchhikers-guide-to-the-galaxy-the-2005.jpg, http://www.tvacres.com/images/robots_androids_marvin_movie.jpg


[http://browse.deviantart.com/?qh=&section=&global=1&q=muscleduck#/d14nst5]

I'm sorry, Dave, I'm afraid I can't do that.

I'm afraid you might be right.

[http://selcouth.com/2011/03]


1940s – 50s:

From language to probability

The fundamental problem of communication is that of

reproducing at one point either exactly or approximately a message selected at another point ...

[For] the engineering problem, the significant aspect is that the actual message is one selected from a set of possible



Language, statistics, cryptography


Why is this man smiling?

http://artofrevolution.co.uk/new/index.php?main_page=product_info&cPath=1_3&products_id=239&zenid=vid94spfpa9vtr18sbtgug1h64

I can see Alaska from my house!

Encryption process


Two probabilities to infer

I can see Alaska from my house!

Encryption process


Prob. of generating this original message?

Prob. of doing this encryption of the
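
One common way to write these two probabilities down is Bayes' rule. This is a sketch only: the symbols m (a candidate original message) and c (the observed encrypted text) are labels assumed here, not notation from the slides.

```latex
\hat{m} \;=\; \arg\max_{m} P(m \mid c)
        \;=\; \arg\max_{m} \frac{P(m)\,P(c \mid m)}{P(c)}
        \;=\; \arg\max_{m}\;
              \underbrace{P(m)}_{\text{prob. of the message}}
              \;\underbrace{P(c \mid m)}_{\text{prob. of the encryption}}
```

The denominator P(c) can be dropped because it is the same for every candidate m, so it does not change which message scores highest.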



Another use of message probs:

speech recognition

(1) It's hard to recognize speech

(2) It's hard to wreck a nice beach

Both messages have almost the same acoustics, but

different likelihoods.
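
A minimal sketch of how a message probability can break the tie between two candidates that sound alike. All numbers below are invented for illustration, and the names `acoustic_logp`, `language_logp`, and `decode` are hypothetical; a real recognizer would estimate these quantities from data.

```python
import math

# Invented log-probabilities, for illustration only.
# acoustic_logp[m] stands in for log P(audio | message):
# nearly identical for the two candidates, slightly favoring (2).
acoustic_logp = {
    "it's hard to recognize speech": math.log(0.48),
    "it's hard to wreck a nice beach": math.log(0.52),
}

# language_logp[m] stands in for log P(message), i.e. how likely
# the word sequence is as a message in the first place.
language_logp = {
    "it's hard to recognize speech": math.log(1e-6),
    "it's hard to wreck a nice beach": math.log(1e-9),
}


def decode(candidates):
    """Return the candidate maximizing log P(audio | m) + log P(m)."""
    return max(candidates, key=lambda m: acoustic_logp[m] + language_logp[m])


best = decode(list(acoustic_logp))
print(best)
```

With these made-up values the acoustics alone would pick candidate (2), but the combined score picks candidate (1) because its message probability is much higher.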


1950s-1980s: Breaking with statistics

(a) Colorless green ideas sleep furiously

(b) Furiously sleep ideas green colorless

N. Chomsky (1957):

The argument: Neither sentence has ever occurred in

the history of English. So any statistical model would

give them the same probability (zero).

The field moved to sophisticated non-probabilistic models

of language.
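
The zero-probability argument can be illustrated with a small sketch, assuming an unsmoothed maximum-likelihood bigram model (one specific kind of statistical model, chosen here for illustration) and a tiny made-up corpus. The function names are hypothetical.

```python
from collections import Counter

# Tiny invented corpus; neither test sentence occurs in it.
corpus = [
    "green ideas are rare",
    "children sleep soundly",
    "the dog barked furiously",
]


def bigrams(sentence):
    """Lowercase, split on whitespace, and pair adjacent tokens."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return list(zip(tokens, tokens[1:]))


bigram_counts = Counter(b for s in corpus for b in bigrams(s))
context_counts = Counter(b[0] for s in corpus for b in bigrams(s))


def mle_probability(sentence):
    """Unsmoothed maximum-likelihood bigram probability of a sentence."""
    p = 1.0
    for prev, word in bigrams(sentence):
        if context_counts[prev] == 0:
            return 0.0  # the context word was never seen at all
        p *= bigram_counts[(prev, word)] / context_counts[prev]
    return p


a = mle_probability("Colorless green ideas sleep furiously")
b = mle_probability("Furiously sleep ideas green colorless")
seen = mle_probability("children sleep soundly")
print(a, b, seen)
```

Under this unsmoothed model both sentences score exactly zero, so the model cannot rank (a) above (b). Models that smooth their estimates, or that generalize over word classes, are not forced to assign zero to unseen word sequences.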


1990s: The empiricists strike back


Huge amounts of data start coming online


Advances in algorithms, models, and horsepower

Every time I fire a linguist, my [system's] performance

goes up -- F. Jelinek (apocryphal)

2000s and beyond:

integrating language insights and

statistical techniques


Why is this man smiling?

We may hope that machines will eventually compete with men in all purely intellectual fields. But which are the best ones to start with? Even this is a difficult decision.... I do not know what the right answer is, but I think [different] approaches should be tried.

We can only see a short distance ahead, but we can see plenty there that needs to be done.


Why is this man smiling?


 C. Danescu-Niculescu-Mizil et al. ACL 2012

Beyond situational effects, phrasing also affects memorability:

•  memorable movie quotes (in aggregate) are unusual word choices built on a scaffolding of common part-of-speech patterns

•  shown via language models

•  carries over to ad slogans

http://www.schwimmerlegal.com/2006/11/evidence-of-secondary-meaning-in-tv-catchphrases.html


Social interaction: who has the lead?

Communicative behaviors are patterned and coordinated, like a dance [Niederhoffer and Pennebaker, 02]

http://minimalmovieposters.tumblr.com/post/16082323317/pulp-fiction-by-ana-balderramas


Those with less power tend to immediately match the function-word


Why is this man smiling?

We may hope that machines will eventually compete with men in all purely intellectual fields. But which are the best ones to start with? Even this is a difficult decision.... I do not know what the right answer is, but I think [different] approaches should be tried.

We can only see a short distance ahead, but we can see plenty there that needs to be done.

