• No results found

at Huawei Noah’s Ark Lab

N/A
N/A
Protected

Academic year: 2021

Share "at Huawei Noah’s Ark Lab"

Copied!
45
0
0

Loading.... (view fulltext now)

Full text

(1)

Research on Deep Learning for Natural Language Processing

at Huawei Noah’s Ark Lab

Hang Li

Noah’s Ark Lab

Huawei Technologies

The Hong Kong Polytechnic University June 3, 2015

(2)

Talk Outline

• Research at Huawei Noah’s Ark Lab

• New Breakthrough in Natural Language Processing with Deep Learning

• Research on DL for NLP at Noah’s Ark Lab

• Summary

(3)

Noah’s Ark Lab

• Our future challenge is not about better chips, larger bandwidth, and more complex signal interference models,

 It is about data

• Have you seen the movie ‘2012’ ?

 The flood is coming, but this time it is the ‘data flood’

 Once here, it will never recede

• Noah’s Ark Lab will lead us to tackle the challenges in

 Data Mining

 Artificial Intelligence

Founder and President:

Mr. Ren, Zhengfei

(4)

Noah’s Ark Lab

Founded in 2012

Located in Hong Kong and Shenzhen

Collaborations with Universities in Hong Kong

A large number of

researchers earned PhD at the universities

Many interns

Strong connections with professors and students

• Research Areas

• Machine Learning

• Data Mining

• Speech and Language Processing

• Information and Knowledge Management

• Intelligent Systems

• Human Computer Interaction

(5)

We want to build

Intelligent Mobile Devices

Intelligent

Telecommunication Networks

Intelligent Enterprise

Data Mining

&

Artificial Intelligence

(6)

Software-defined Network

Network Maintenance

Network Planning and Optimization

Intelligent Telecommunication Networks

(7)

Intelligent Enterprise

Customer Relationship Management

Supply Chain Management

Information and Knowledge Management

Communication Human Resources

Management

(8)

Information Recommendation

Personal Information Management

Machine Translation Information

Extraction

Natural Language Dialogue(Question

Answering)

Intelligent Mobile Devices

(9)

Better Communications

(10)

Research Topics We Are Working on

• Machine Translation

• Natural Language Dialogue

• Automatic Speech Recognition

• Image Retrieval

• Deep Learning Platform

(11)

Talk Outline

• Research at Huawei Noah’s Ark Lab

• New Breakthrough in Natural Language Processing with Deep Learning

• Research on DL for NLP at Noah’s Ark Lab

• Summary

(12)

Distributional Linguistics

• Distributional semantics

• You shall know a word by the company it keeps - John Firth

• Sylvan?

• Companying words: woods, forest, tree, living, woodland, beautiful, grove

(13)

Representation of Word Meaning

cat

kitten dog

puppy

Using high-dimensional real-valued vectors to represent the meaning of words

(14)

Representation of Sentence Meaning

John loves Mary Mary is loved by John

Mary loves John

Using high-dimensional real-valued vectors to represent the meaning of sentences

New finding This is possible

(15)

Recent Breakthrough in Distributional Linguistics

• From words to sentences

• Representing syntax, semantics, even pragmatics

(16)

How Is Learning of Sentence Meaning Possible?

• Neural networks (complicated non-linear models) , i.e., deep learning

• Big Data

• Task-oriented

• Error-driven learning, gradient based

• Compositional

(17)

Natural Language Tasks

• Classification: assigning a label to a string

• Generation: creating a string

• Matching: matching two strings

• Translation: transforming one string to another

t s

R

t s,

c s

s

(18)

Natural Language Applications Can Be Formalized as Tasks

• Classification

– Sentiment analysis

• Generation

– Language modeling

• Matching

– Information retrieval – Question answering

• Translation

– Machine translation

– Natural language dialogue (single turn) – Text summarization

– Paraphrasing

(19)

Learning of Representations in Tasks

• Classification

• Generation

• Matching

• Translation

t r

s  

r R t

s,

c r

s  

) (r

s

(20)

Deep Learning Tools for Learning Sentence Representations

• Neural Word Embedding Techniques

• Recurrent Neural Networks

• Recursive Neural Networks

• Convolutional Neural Networks

(21)

Word Representation: Neural Word Embedding

) ( ) (

) , log (

c P w P

c w P

10 1 5

2 1

2.5 1

w1

w2

w3

c1 c2 c3 c4 c5

7 1 1

2 3

1 1.5 2

w1

w2

w3

t1 t2 t3

WCT

M matrix factorization

word embedding or word2vec

W M

(22)

Recurrent Neural Network (RNN) (Mikolov et al. 2010)

the cat sat …. mat

• On sequence

• Variable length

• Long dependency: long short term memory (LSTM)

the cat sat on the mat

) , ( t 1 t

t f h w

h

(23)

Recursive Neural Network (RNN) (Socher et al. 2011)

the cat sat on the mat

• Based on parse tree

• Auto-encoding and decoding

the cat sat on the mat

(24)

Convolutional Neural Network (CNN) (Hu et al. 2014)

• Robust parsing

• Shared parameter

• Fixed length, zero padding

the cat sat on the mat

……

(25)

Talk Outline

• Research at Huawei Noah’s Ark Lab

• New Breakthrough in Natural Language Processing with Deep Learning

• Research on DL for NLP at Noah’s Ark Lab

• Summary

(26)

Researchers Working on Deep Learning in Noah’s Ark Lab

Zhengdong Lu Lifeng Shang Lin Ma

(27)

Natural Language Dialogue

• Vast amount of

conversation data is available

• Powerful technologies like deep learning

developed

• Single turn vs multi- turn dialogue

• Two approaches

– Retrieval based – Generation based

Alan Turing

NTCIR: Short Text Conversation Contest

5.5 million message response pairs

(28)

Natural Language Dialogue System - Retrieval based Approach

index of messages and responses

matching ranking

message

retrieval

retrieved

messages and responses

ranked responses

matching models

ranking model

online

offline

best response matched

responses

(29)

Deep Match CNN - Archtecture I

• First represent two sentences, and then match

29

(30)

Deep Match CNN - Architecture II

• Represent and match two sentences simultaneously

• Two dimensional convolution and pooling

30

(31)

Retrieval based Approach:

Accuracy = 70%+

上海今天好熱,堪比 新加坡。

上海今天热的不一般。

想去武当山 有想同游 的么?

我想跟帅哥同游~哈哈

It is very hot in Shanghai today, just like Singapore . It is unusually hot.

I want to go to Mountain Wudang, it there anybody going together with me?

Haha, I want to go with you, handsome boy

(32)

Natural Language Dialogue System - Generation based Approach

• Encoding messages to intermediate

representations

• Decoding intermediate representations to

responses

• Recurrent Neural Network (RNN)

Message Response

Encoder

xT

x

x1 2

x

Decoder

yt

y

y1 2

y

c h

Generator

(33)

Generation based Approach

Accuracy = 70%+

占中终于结束了。

下一个是陆家嘴吧?

我想买三星手机。

还是支持一下国产的吧。

Occupy Central is finally over.

Will Lujiazui (finance district in Shanghai) be the next?

I want to buy a Samsung phone

Why not buy our national brands?

(34)

Image Retrieval System

Find the picture that I had dinner with my friends at an Italian restaurant in Hong Kong

• Task

– Image search on smartphone

– Key: matching text to images

• Technology

– Deep model for matching text and image

query representation

image representation

index of images Matching

model

(35)

Deep Match Model for Image and Text - CNN Based

• Represent text and image and then match the two

(36)

Deep Match Model for Image and Text - Top 10 Recall > 60%

Outperforms all existing methods in image retrieval

(37)

Machine Translation System

Bilingual Corpus

Phrase Alignment

Rule Extraction

Translation Rule Table

Monolingual Corpus

Language Model Training

Language Model

Decoder

Translation Model Training

Source Language Sentence

Target Language Sentence Training Data

Translation Model

Translation Template

Translation Dictionary

offline online

(38)

CNN based Translation Model

• Improved BLEU Score

1 point vs BBN model (ACL’14 best paper) on benchmark data

(39)

CNN based Language Model

Improved BLEU Score 1 point on benchmark dataset

Chile holds parliament and presidential election

(40)

Talk Outline

• Research at Huawei Noah’s Ark Lab

• New Breakthrough in Natural Language Processing with Deep Learning

• Research on DL for NLP at Noah’s Ark Lab

• Summary

(41)

Research Finding and Advantages

• Possible to learn sentence representations in different tasks

• Completely data-driven, no human knowledge

• With big data and power computer, we can really build human intelligence

(42)

Limitation and Challenges

• Black box: difficult to understand mechanism

• Naturalness vs factualness

– E.g. “Clinton was born in DDDD, and was educated in the university of edinburg”

• Negation

– E.g. “love” vs “does not love”

• Non-compositional expressions

– E.g. “kick the bucket”

(43)

Take-away Messages

• Noah’s Ark Lab is working on

– Intelligent telecommunication networks – Intelligent mobile devices

– Intelligent enterprise

• Recent breakthrough in representation learning of sentence meaning

– Neural networks, i.e., deep learning – Big Data

– Task-oriented

– Error-driven learning, back-propagation – Compositional

• Noah’s Ark Lab is working on DL for NLP

– Natural language dialogue – Machine translation

– Image retrieval

(44)

References

Mikolov, Tomas, Martin Karafiát, Lukas Burget, Jan Cernocký, and Sanjeev

Khudanpur. Recurrent Neural Network based Language Model. INTERSPEECH 2010, 2010.

Socher, Richard, Cliff C. Lin, Chris Manning, and Andrew Y. Ng. Parsing natural

scenes and natural language with recursive neural networks. ICML-11, pp. 129-136.

2011.

Baotian Hu, Zhengdong Lu, Hang Li, Qingcai Chen. Convolutional Neural Network Architectures for Matching Natural Language Sentences. NIPS'14, 2042-2050, 2014.

Lifeng Shang, Zhengdong Lu, Hang Li. Neural Responding Machine for Short Text Conversation. ACL-IJCNLP'15, 2015.

Mingxuan Wang, Zhengdong Lu, Hang Li, Wenbin Jiang, Qun Liu. GenCNN: A Convolutional Architecture for Word Sequence Prediction. ACL-IJCNLP'15, 2015.

Fandong Meng, Zhengdong Lu, Mingxuan Wang, Hang Li, Wenbin Jiang, Qun Liu.

Encoding Source Language with Convolutional Neural Network for Machine Translation. ACL-IJCNLP'15, 2015.

Lin Ma, Zhengodng Lu, Lifeng Shang, Hang Li . Multimodal Convolutional Neural Networks for Matching Image and Sentence, arXiv preprint arXiv:1504.06063, 2015.

(45)

Thank you!

Our Web Site :

http://www.noahlab.com.hk/

Our Email Addres [email protected]

References

Related documents

When the number ofunvisited URLs in the database is less than a threshold during the crawling process, intelligent Focus Crawlerperforms reverse searching of

“Our role is to bring our expertise to help create a bespoke family office, then administer this complex structure, ensuring that the wishes and needs of the family are

PPPM total health care costs and utilization after lung cancer diagnosis were significantly higher among patients diagnosed at Stage IV disease and lowest among patients diagnosed

This paper aims (1) to study the spatial and interannual dynamics of DOM optical characteristics in shelf waters of the eastern Arctic seas on the basis of multiyear summertime

The second and most important parameter for evaluating the performance is the meaning of posterior probabilities in physical domain (confidences) and their consistency with the

Results of the mean comparison of the effect of the two factors interaction (Table 5) showed similarity with the chlorophyll content and highest plant height (214.7 cm)

 Teacher Preparation Curriculum with designated number of credit hours in General Education, Knowledge of the Learner & Learning Environment, Focus Areas, Methodology and

Despite the fact that Java compiler produces trustworthy class files and it makes sure that Java source code doesn't violate the safety rules, when an application such as the