5. Learning by Imitation

(1)


Toyoaki Nishida Kyoto University

Copyright © 2016, Toyoaki Nishida, Atsushi Nakazawa, Yoshimasa Ohmoto, Yasser Mohammad, At ,Inc. All Rights Reserved.

Conversational Informatics, May 18th, 2016

(2)

Learning by imitation—Generic framework

Measurement Corpus Generalization Dialogue

patterns

[Nishida et al 2014]

Endow robots with the ability to autonomously imitate human behaviors.

(3)

Interactions from observation—General framework

[Figure: time series a_1, a_2, a_3 along a time axis t, annotated "Causes"]

[Nishida et al 2014]

(4)

Problem formulation:

Find approximately repeated subsequences in a longer time series.

(1) Motif Discovery – Finding Patterns of Interaction

[Nishida et al 2014]

(5)

K-Motif(n, R): Given a time series T, a subsequence length n and a range R, the most significant motif in T (called 1-Motif(n, R)) is the subsequence $C_1$ that has the highest count of non-trivial matches. The K-th most significant motif in T (called K-Motif(n, R)) is the subsequence $C_K$ that has the highest count of non-trivial matches and satisfies $D(C_K, C_i) > 2R$ for all $1 \le i < K$.

K-Motif(n, R, d): The K-th most d-significant motif in T is the subsequence $C_K$ that has the highest count of non-trivial matches and satisfies $D(C_K, C_i) > 2R$ for all $1 \le i < K$, where d (possibly non-contiguous) datapoints can be ignored when calculating the distance between $C_K$ and $C_i$. In general we have $d \le n$, and typically $d \ll n$.

[Chiu 2003]
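The 1-Motif definition can be illustrated with a brute-force sketch. It is only an illustration under simplifying assumptions: Euclidean distance stands in for D, a match is treated as trivial whenever it overlaps the candidate or a previously counted match, and the function names are hypothetical rather than taken from [Chiu 2003].

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length subsequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def count_nontrivial_matches(series, start, n, R):
    """Count subsequences within distance R of series[start:start+n].

    Simplification: a match is 'trivial' if it overlaps the candidate
    or overlaps a match that was already counted.
    """
    cand = series[start:start + n]
    count, last = 0, None
    for j in range(len(series) - n + 1):
        if abs(j - start) < n:
            continue  # overlaps the candidate itself
        if last is not None and j - last < n:
            continue  # overlaps the previously counted match
        if euclid(cand, series[j:j + n]) <= R:
            count += 1
            last = j
    return count

def one_motif(series, n, R):
    """Return (start index, match count) of the 1-Motif(n, R) candidate."""
    starts = range(len(series) - n + 1)
    best = max(starts, key=lambda i: count_nontrivial_matches(series, i, n, R))
    return best, count_nontrivial_matches(series, best, n, R)

series = [0, 1, 2, 1, 0, 5, 5, 0, 1, 2, 1, 0, 7, 7, 0, 1, 2, 1, 0]
print(one_motif(series, n=5, R=0.1))
```

The quadratic scan is only practical for short series; the probabilistic method of [Chiu 2003] exists precisely to avoid it.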

(1) Motif Discovery – Finding Patterns of Interaction

(6)

[Pavesi 2001]

The pattern-driven method

Given a set of k unaligned sequences, a distance measure d, and a threshold value for d, find all patterns that occur in at least q of the k sequences with a distance no greater than the threshold.

 

2 ,

 l 1 



(1) Motif Discovery – Finding Patterns of Interaction

# of (AA, A, A)’s

(7)

Planted (l, d)-motif problem. You are given t strings of length n, initially generated at random. Each string is planted with exactly one occurrence of an unknown motif y of length l, altered by exactly d substitutions (or defects). Find the unknown motif y.

E.g., Planted (15, 4)-motif problem on t=20 sequences of length 600

n=600

t=20 Motif of length l=15

Exactly d=4 substitutions
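A benchmark instance of this kind can be generated with a short sketch. The DNA alphabet, the seed handling and the function names are assumptions made for illustration; they are not part of the slides.

```python
import random

ALPHABET = "ACGT"  # assumed alphabet (DNA), as in the motif-finding literature

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def plant_instance(n, t, l, d, seed=0):
    """Generate t random strings of length n, each with one planted
    variant of a random motif of length l with exactly d substitutions."""
    rng = random.Random(seed)
    motif = "".join(rng.choice(ALPHABET) for _ in range(l))
    sequences, positions = [], []
    for _ in range(t):
        background = [rng.choice(ALPHABET) for _ in range(n)]
        variant = list(motif)
        for pos in rng.sample(range(l), d):  # d distinct positions
            variant[pos] = rng.choice([c for c in ALPHABET if c != variant[pos]])
        start = rng.randrange(n - l + 1)
        background[start:start + l] = variant
        sequences.append("".join(background))
        positions.append(start)
    return motif, sequences, positions

motif, sequences, positions = plant_instance(n=600, t=20, l=15, d=4)
```

Because every planted variant differs from the motif in exactly d positions, any two planted variants differ in at most 2d positions, which is the property the graph-based methods rely on.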

[Chiu 2003]

The sequence-driven method

(1) Motif Discovery – Finding Patterns of Interaction

y:

(8)

n=600

t=20 Motif of length l=15

Exactly d=4 substitutions

[Pevzner 2000]

The WINNOWER algorithm

(1) Motif Discovery – Finding Patterns of Interaction

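[Figure: segments of length l. Each planted instance differs from the motif in d or fewer positions, so two planted instances differ from each other in 2d or fewer positions.]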

(9)

n=600

t=20 Motif of length l=15

Exactly d=4 substitutions

[Pevzner 2000]

The WINNOWER algorithm

(1) Build a graph

Vertices correspond to l-mers (e.g., 15-mers) present in the t (e.g., 20) input sequences. An edge connects two vertices if and only if the corresponding l-mers differ in at most 2d positions and do not both come from the same input sequence.

(This schema illustrates the case where t=4)

(2) Look for a clique of size t in this graph.
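The two steps can be sketched as follows. This is a naive illustration: the clique is found by plain backtracking, not by the edge-pruning strategy that gives WINNOWER its name, and the function names are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def build_graph(sequences, l, d):
    """Vertices are (sequence index, position) pairs; edges join l-mers from
    different sequences that differ in at most 2d positions."""
    vertices = [(i, p) for i, s in enumerate(sequences)
                for p in range(len(s) - l + 1)]

    def lmer(v):
        return sequences[v[0]][v[1]:v[1] + l]

    adj = {v: set() for v in vertices}
    for u, v in combinations(vertices, 2):
        if u[0] != v[0] and hamming(lmer(u), lmer(v)) <= 2 * d:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def find_clique(adj, t):
    """Backtracking search for a clique with one vertex per input sequence."""
    def extend(clique, seq_index):
        if seq_index == t:
            return clique
        for v in adj:
            if v[0] == seq_index and all(v in adj[u] for u in clique):
                found = extend(clique + [v], seq_index + 1)
                if found is not None:
                    return found
        return None
    return extend([], 0)

toy = ["GGACGTGG", "TTTACGTT", "ACGTCCCC"]
print(find_clique(build_graph(toy, l=4, d=0), t=3))
```

On realistic inputs (t=20, n=600) the graph has thousands of spurious edges, so an exhaustive search like this one is only useful for checking small cases.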

(1) Motif Discovery – Finding Patterns of Interaction

(10)

Motif Discovery

n=600

t=20 Motif of length l=15

Exactly d=4 substitutions

The SP-STAR algorithm

(1) Find a median string W from samples S:

$W^{*} = \arg\min_{W} D(W, S)$, where $D(W, S) = \sum_{1 \le i < j \le t} d(W_i, W_j)$ and $W_1, \ldots, W_t$ are the best instances of W in the sample sequences $S = \{s_1, \ldots, s_t\}$.

(2) Repeat improvements.

- Compute the best instances $\mathcal{W} = \{W_1, \ldots, W_t\}$ of W in the sample sequences $S = \{s_1, \ldots, s_t\}$.

- Replace W with the majority string whose i-th letter is the most frequent i-th letter in $\mathcal{W}$.

[Pevzner 2000]
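The improvement loop can be sketched as below. It is a simplified reading of the slide, assuming Hamming distance for d, a caller-supplied starting string, and a stop rule (halt when the sum-of-pairs score no longer decreases) that is an illustrative choice; all function names are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def best_instance(w, s):
    """The substring of s (same length as w) closest to w."""
    l = len(w)
    return min((s[p:p + l] for p in range(len(s) - l + 1)),
               key=lambda x: hamming(w, x))

def majority_string(instances):
    """String whose i-th letter is the most frequent i-th letter."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*instances))

def sum_of_pairs(instances):
    """Sum of pairwise distances between instances."""
    return sum(hamming(a, b)
               for i, a in enumerate(instances) for b in instances[i + 1:])

def refine(w, samples, max_iter=20):
    """Alternate between best instances and majority string."""
    instances = [best_instance(w, s) for s in samples]
    score = sum_of_pairs(instances)
    for _ in range(max_iter):
        w = majority_string(instances)
        new_instances = [best_instance(w, s) for s in samples]
        new_score = sum_of_pairs(new_instances)
        if new_score >= score:
            break  # no further improvement
        instances, score = new_instances, new_score
    return w, instances, score

samples = ["TTACGTTT", "GGACGTGG", "CCACGTCC"]
print(refine("ACGA", samples))
```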

(11)

[Buhler 2002]

n=600

t=20 Motif of length l=15

Exactly d=4 substitutions

The PROJECTION algorithm

Locality-sensitive hashing: $f(s) = s[i_1], s[i_2], \ldots, s[i_k]$ (k of the l positions)

The $t(n - l + 1)$ samples of length l are projected into $|\Sigma|^k$ buckets.

Repeat m times

Most enriched bucket

Inferred planted motif (substring of inferred motif)

(1) Motif Discovery – Finding Patterns of Interaction

(12)

[Buhler 2002]

Consider two strings $s_1$ and $s_2$ of common length d over an alphabet $\Sigma$.

Fix r<d; we call $s_1$ and $s_2$ similar if they differ by at most r single-character substitutions.

To detect similarity between $s_1$ and $s_2$, we construct a randomized filter.

Locality-sensitive hashing: $f(s) = \langle s[i_1], s[i_2], \ldots, s[i_k] \rangle$

It concatenates characters from at most k distinct positions of s to form a length-k string called an LSH value. Buhler's filter accepts the pair $(s_1, s_2)$ if and only if $f(s_1) = f(s_2)$.

Locality-sensitive hashing
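The hash f and the bucketing step of PROJECTION can be sketched in a few lines. The names `make_projection`, `lsh_value` and `bucket_lmers` are hypothetical, and choosing the positions uniformly at random without replacement is an assumption of this sketch.

```python
import random
from collections import defaultdict

def make_projection(l, k, rng):
    """Choose k distinct positions out of 0..l-1."""
    return sorted(rng.sample(range(l), k))

def lsh_value(s, positions):
    """Concatenate the characters of s at the chosen positions."""
    return "".join(s[i] for i in positions)

def bucket_lmers(sequences, l, positions):
    """Hash every l-mer of every sequence; buckets map LSH value to
    a list of (sequence index, start position)."""
    buckets = defaultdict(list)
    for i, s in enumerate(sequences):
        for p in range(len(s) - l + 1):
            buckets[lsh_value(s[p:p + l], positions)].append((i, p))
    return buckets

# Two strings that agree on the sampled positions pass the filter.
print(lsh_value("ACGT", [0, 2]) == lsh_value("ATGC", [0, 2]))
```

Two similar strings collide only when none of the sampled positions falls on a substitution, which is why the procedure is repeated with fresh projections.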

(1) Motif Discovery – Finding Patterns of Interaction

(13)

The projection size k?

$k \ge \log_{|\Sigma|}\left(\frac{t(n - l + 1)}{E}\right)$

E: the allowed expected number of random sequences that hash into the same bucket.

[Buhler 2002]

$\hat{p}(l, d, k) = \binom{l - d}{k} \Big/ \binom{l}{k}$

The probability that a given planted motif instance hashes to the planted bucket.

The bucket threshold s?

Twice the average bucket size? $s = 2\,\frac{t(n - l + 1)}{|\Sigma|^k}$ ?

The number of independent trials m?

The probability of failure, in which fewer than s planted instances hash to the planted bucket in every one of the m trials, should be less than 1-q:

$1 - \left[B_{t,\hat{p}(l,d,k)}(s)\right]^{m} \ge q$, so $m \ge \frac{\log(1 - q)}{\log B_{t,\hat{p}(l,d,k)}(s)}$

where $B_{t,p}(s)$ is the probability of fewer than s successes in t independent trials with success probability p.

Parameter setting
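As a worked illustration of these formulas, the sketch below evaluates them for the (15, 4) example with t=20 and n=600. The values E=1, s=4 and q=0.95 are assumptions chosen for the illustration, and the function names are hypothetical.

```python
from math import comb, log, ceil

def projection_size(t, n, l, d, alphabet_size=4, E=1.0):
    """Smallest k with t(n-l+1)/|Sigma|^k <= E."""
    samples = t * (n - l + 1)
    k = 1
    while samples / alphabet_size ** k > E:
        k += 1
    if k > l - d:
        # p_hat would be zero: no projection can avoid all d substitutions
        raise ValueError("no admissible projection size")
    return k

def p_hat(l, d, k):
    """Probability that the k projected positions avoid all d substitutions."""
    return comb(l - d, k) / comb(l, k)

def prob_fewer_than(s, t, p):
    """B_{t,p}(s): probability of fewer than s successes in t trials."""
    return sum(comb(t, i) * p ** i * (1 - p) ** (t - i) for i in range(s))

def num_trials(t, l, d, k, s, q=0.95):
    """Smallest m with 1 - B^m >= q."""
    B = prob_fewer_than(s, t, p_hat(l, d, k))
    return ceil(log(1 - q) / log(B))

k = projection_size(t=20, n=600, l=15, d=4)
m = num_trials(t=20, l=15, d=4, k=k, s=4)
print(k, p_hat(15, 4, k), m)
```

With these assumed settings the average bucket holds fewer than one random l-mer, so a bucket that collects s planted instances stands out clearly.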

(1) Motif Discovery – Finding Patterns of Interaction

(14)

[Figure: signal with candidate window, comparison window and noise window; motif; candidates of motifs]

[Catalano 2006]

Catalano's algorithm

Each window of length $\bar{w}$ yields $\bar{w} - w + 1$ sub-windows of length $w$.

Rejection threshold

Make k best matches with the smallest distances;

Remove pairs whose distance is greater than the rejection threshold

Refine

(1) Motif Discovery – Finding Patterns of Interaction

(15)

[Mohammad 2009]

Problem formulation

• Given a time series X(t), find recurring patterns of length between $L_1$ and $L_2$ using distance function D subject to the constraint P(t), where P(t) is an estimate of the probability that a motif occurrence exists near time step t.

P(t)

unlikely likely

Change point discovery

Constrained motif discovery

(1) Motif Discovery – Finding Patterns of Interaction

(16)

Distance matrix:

0 33.4 14.5 34.5

33.4 0 22.43 2.31

14.5 22.43 0 17.43

34.5 2.31 17.43 0

Signal Motif

Constraint

Distance matrix after applying the constraint:

0 ∞ ∞ ∞

∞ 0 ∞ 2.31

∞ ∞ 0 ∞

∞ 2.31 ∞ 0

DGCMD (Distance Graph Constrained Motif Discovery)

[Mohammad 2009]

(1) Motif Discovery – Finding Patterns of Interaction

(17)

Parameterization

[Ide 2005]

Representative patterns are calculated from m and n subsequences at both sides of a time point (t).

Singular Spectrum Transformation

(1) Motif Discovery – Finding Patterns of Interaction

(18)

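[Figure: around time t, a Hankel matrix H is built from the past and a Hankel matrix G from the future; singular value decomposition is applied to each; the change angle is measured between the past and future patterns.]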

[Ide 2005]

(1) Motif Discovery – Finding Patterns of Interaction

Singular Spectrum Transformation

(19)

[Ide 2005]

Given time series: $T = \{\ldots, x(-2), x(-1), x(0), x(1), x(2), \ldots\}$

(1) Extraction of past patterns

A consecutive subsequence with length w: $s(t-1) = (x(t-w), \ldots, x(t-2), x(t-1))^{T}$

Hankel matrix for the past: $H(t) = [s(t-n), \ldots, s(t-2), s(t-1)]$

Calculate a representative pattern u at t: $u = c \sum_{i=1}^{n} v_i\, s(t-i) = c\, H(t)\, v$, where $v_i$ is the weight of column $s(t-i)$ and $u^{T} u = 1$.

Solve: $v = \arg\max_{\tilde{v}} \|H(t)\tilde{v}\|^{2}$ subject to $\tilde{v}^{T}\tilde{v} = 1$

Solution: $\frac{\partial}{\partial \tilde{v}}\left[\tilde{v}^{T} H(t)^{T} H(t)\tilde{v} - \lambda(\tilde{v}^{T}\tilde{v} - 1)\right] = 0$, where $\lambda$ is a Lagrange multiplier, i.e., solve: $H(t)^{T} H(t)\, v = \lambda v$

Note: $H(t) H(t)^{T} u = \lambda u$

l significant eigenvectors: $u_1, u_2, \ldots, u_l$

Obtain a hyperplane of l representative patterns: $U_l = [u_1, u_2, \ldots, u_l]$

Singular Spectrum Transformation

(1) Motif Discovery – Finding Patterns of Interaction

(20)

[Ide 2005]

(2) Extraction of the current pattern

A column vector of length w: $r(t+g) = (x(t+g), \ldots, x(t+g+w-1))^{T}$

Hankel matrix for the current: $G(t) = [r(t+g), r(t+g+1), \ldots, r(t+g+m-1)]$

Compute the normalized largest eigenvector $\beta(t)$ for: $G(t) G(t)^{T} u = \mu u$

(3) Compute the change-point score

$\alpha(t) = \frac{U_l U_l^{T} \beta(t)}{\|U_l^{T} \beta(t)\|}$, $z(t) = 1 - \alpha(t)^{T} \beta(t)$

(4) Singular spectrum transformation

$T = \{\ldots, x(-2), x(-1), x(0), x(1), x(2), \ldots\} \rightarrow T_C = \{\ldots, z(-2), z(-1), z(0), z(1), z(2), \ldots\}$

Singular Spectrum Transformation
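Steps (1) to (3) can be sketched with NumPy. This is a minimal illustration, assuming the series is stored in an array whose index t corresponds to x(0) of the current window; the left singular vectors of H and G are used as the eigenvectors of $HH^{T}$ and $GG^{T}$, and the function names are hypothetical.

```python
import numpy as np

def hankel(x, start, width, count):
    """Matrix whose i-th column is x[start+i : start+i+width]."""
    return np.column_stack([x[start + i:start + i + width] for i in range(count)])

def sst_score(x, t, w, n, m, l, g=0):
    """Change-point score z(t) = 1 - alpha(t)^T beta(t)."""
    x = np.asarray(x, dtype=float)
    H = hankel(x, t - w - n + 1, w, n)  # columns s(t-n), ..., s(t-1)
    G = hankel(x, t + g, w, m)          # columns r(t+g), ..., r(t+g+m-1)
    U = np.linalg.svd(H, full_matrices=False)[0][:, :l]   # l past patterns
    beta = np.linalg.svd(G, full_matrices=False)[0][:, 0]  # current pattern
    proj = U @ (U.T @ beta)             # projection onto the past hyperplane
    norm = np.linalg.norm(proj)
    if norm == 0.0:
        return 1.0                      # current pattern orthogonal to the past
    alpha = proj / norm
    return 1.0 - float(alpha @ beta)

past = [1, 0, 1, 0, 1, 0, 1]
same_rhythm = [0, 1, 0, 1, 0, 1, 0]
growing = [0, 2, 0, 4, 0, 8, 0]
print(sst_score(past + same_rhythm, t=7, w=5, n=3, m=3, l=2))
print(sst_score(past + growing, t=7, w=5, n=3, m=3, l=2))
```

Sliding t over the whole series and collecting z(t) gives the transformed series $T_C$ of step (4).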

(1) Motif Discovery – Finding Patterns of Interaction

(21)

Singular Spectrum Transform – Example (1)

(1) Motif Discovery – Finding Patterns of Interaction

$X_P = \{\ldots, x(-7)=1, x(-6)=0, x(-5)=1, x(-4)=0, x(-3)=1, x(-2)=0, x(-1)=1\}$

For the past

Hankel matrix: $H(0) = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}$

$H(0) H(0)^{T}$ has two significant eigenvectors

Representative patterns: $u_1 = \left(\tfrac{1}{\sqrt{3}}, 0, \tfrac{1}{\sqrt{3}}, 0, \tfrac{1}{\sqrt{3}}\right)^{T}$, $u_2 = \left(0, \tfrac{1}{\sqrt{2}}, 0, \tfrac{1}{\sqrt{2}}, 0\right)^{T}$, $U_2 = [u_1, u_2]$

(22)

Singular Spectrum Transform – Example (1A)

(1) Motif Discovery – Finding Patterns of Interaction

$X_P = \{\ldots, x(-7)=1, x(-6)=0, x(-5)=1, x(-4)=0, x(-3)=1, x(-2)=0, x(-1)=1\}$

For the future A: $X_{FA} = \{x(0)=0, x(1)=1, x(2)=0, x(3)=1, \ldots\}$

Hankel matrix: $G(0) = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$

$G(0) G(0)^{T}$ has the most significant eigenvector: $\beta(0) = \left(0, \tfrac{1}{\sqrt{2}}, 0, \tfrac{1}{\sqrt{2}}, 0\right)^{T}$

Computing change-point score

$U_2 U_2^{T} \beta(0) = \left(0, \tfrac{1}{\sqrt{2}}, 0, \tfrac{1}{\sqrt{2}}, 0\right)^{T}$, $\alpha(0) = \left(0, \tfrac{1}{\sqrt{2}}, 0, \tfrac{1}{\sqrt{2}}, 0\right)^{T}$

$z(0) = 1 - \alpha(0)^{T} \beta(0) = 0$

(23)

Singular Spectrum Transform – Example (1B)

(1) Motif Discovery – Finding Patterns of Interaction

$X_P = \{\ldots, x(-7)=1, x(-6)=0, x(-5)=1, x(-4)=0, x(-3)=1, x(-2)=0, x(-1)=1\}$

For the future B: $X_{FB} = \{x(0)=1, x(1)=2, x(2)=3, x(3)=5, \ldots\}$

Hankel matrix: $G(0) = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 5 \\ 3 & 5 & 8 \\ 5 & 8 & 13 \\ 8 & 13 & 21 \end{pmatrix}$

$G(0) G(0)^{T}$ has the most significant eigenvector: $\beta(0) = (0.11, 0.19, 0.30, 0.49, 0.79)^{T}$

Computing change-point score

$U_2 U_2^{T} \beta(0) = (0.20, 0.15, 0.20, 0.15, 0.20)^{T}$, $\alpha(0) = (0.49, 0.37, 0.49, 0.37, 0.49)^{T}$

$z(0) = 1 - \alpha(0)^{T} \beta(0) = 0.59$

(24)

Singular Spectrum Transform – Example (1C)

(1) Motif Discovery – Finding Patterns of Interaction

$X_P = \{\ldots, x(-7)=1, x(-6)=0, x(-5)=1, x(-4)=0, x(-3)=1, x(-2)=0, x(-1)=1\}$

For the future C: $X_{FC} = \{x(0)=1/2, x(1)=1/4, x(2)=3/8, x(3)=5/16, \ldots\}$

Hankel matrix: $G(0) = \begin{pmatrix} 1/2 & 1/4 & 3/8 \\ 1/4 & 3/8 & 5/16 \\ 3/8 & 5/16 & 11/32 \\ 5/16 & 11/32 & 21/64 \\ 11/32 & 21/64 & 43/128 \end{pmatrix}$

$\beta(0) = (0.50, 0.41, 0.45, 0.43, 0.44)^{T}$

Computing change-point score

$U_2 U_2^{T} \beta(0) = (0.16, 0.012, 0.16, 0.012, 0.16)^{T}$, $\alpha(0) = (0.58, 0.041, 0.58, 0.041, 0.58)^{T}$

$z(0) = 1 - \alpha(0)^{T} \beta(0) = 0.72$

(25)

Singular Spectrum Transform – Example (1D)

(1) Motif Discovery – Finding Patterns of Interaction

$X_P = \{\ldots, x(-7)=1, x(-6)=0, x(-5)=1, x(-4)=0, x(-3)=1, x(-2)=0, x(-1)=1\}$

For the future D: $X_{FD} = \{x(0)=1/2, x(1)=0, x(2)=1/2, x(3)=1, \ldots\}$

Hankel matrix: $G(0) = \begin{pmatrix} 1/2 & 0 & 1/2 \\ 0 & 1/2 & 1 \\ 1/2 & 1 & 1/2 \\ 1 & 1/2 & 0 \\ 1/2 & 0 & 1/2 \end{pmatrix}$

$\beta(0) = (0, 0.5, 0.71, 0.5, 0)^{T}$

Computing change-point score

$U_2 U_2^{T} \beta(0) = (0.24, 0, 0.24, 0, 0.24)^{T}$, $\alpha(0) = (0.58, 0, 0.58, 0, 0.58)^{T}$

$z(0) = 1 - \alpha(0)^{T} \beta(0) = 0.59$

(26)

Singular Spectrum Transform – Example (1E)

(1) Motif Discovery – Finding Patterns of Interaction

$X_P = \{\ldots, x(-7)=1, x(-6)=0, x(-5)=1, x(-4)=0, x(-3)=1, x(-2)=0, x(-1)=1\}$

For the future E: $X_{FE} = \{x(0)=0, x(1)=2, x(2)=0, x(3)=4, \ldots\}$

Hankel matrix: $G(0) = \begin{pmatrix} 0 & 2 & 0 \\ 2 & 0 & 4 \\ 0 & 4 & 0 \\ 4 & 0 & 8 \\ 0 & 8 & 0 \end{pmatrix}$

$\beta(0) = (0, 0.45, 0, 0.89, 0)^{T}$

Computing change-point score

$U_2 U_2^{T} \beta(0) = (0, 0.67, 0, 0.67, 0)^{T}$, $\alpha(0) = \left(0, \tfrac{1}{\sqrt{2}}, 0, \tfrac{1}{\sqrt{2}}, 0\right)^{T}$

$z(0) = 1 - \alpha(0)^{T} \beta(0) = 0.051$

(27)

Overview

After discovering basic motifs in both actions and commands and detecting their occurrences in all time series (figure: occurrences of Command 1 and Action 1 over time), use the natural delay between commands and actions calculated during the discovery phase.

For every command-action pair, calculate their joint activation as the number of occurrences of the action within the natural-delay interval of the command.

Use the joint-activation values to induce a Bayesian network describing the relation between actions and commands.
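The joint-activation count can be sketched as follows. The representation of occurrences as lists of time stamps and the symmetric tolerance around the mean delay are assumptions of this sketch, and `joint_activation` is a hypothetical name.

```python
def joint_activation(command_times, action_times, delay_mean, delay_tol):
    """Count action occurrences that fall inside the natural-delay
    interval [c + delay_mean - delay_tol, c + delay_mean + delay_tol]
    of some command occurrence c."""
    count = 0
    for c in command_times:
        lo = c + delay_mean - delay_tol
        hi = c + delay_mean + delay_tol
        count += sum(lo <= a <= hi for a in action_times)
    return count

commands = [10, 50, 90]   # time steps at which the command motif occurs
actions = [13, 54, 70]    # time steps at which the action motif occurs
print(joint_activation(commands, actions, delay_mean=3, delay_tol=2))
```

A high count relative to the number of command occurrences suggests a link from the command node to the action node in the induced network.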

[Mohammad 2009]

(2) Association

(28)

(2) Association

[Mohammad 2009]

Use Granger-causality to find the delay between $\hat{C}_G$ and $\hat{C}_A$

- Regress actions using actions & gestures: $\hat{C}_A(t) = c_1 + \sum_{i=1}^{p} a_i\, \hat{C}_A(t-i) + \sum_{i=1}^{p} b_i\, \hat{C}_G(t-i) + u(t)$

- Regress actions using actions only: $\hat{C}_A(t) = c_2 + \sum_{i=1}^{p} g_i\, \hat{C}_A(t-i) + e(t)$

- Compute residues: $SSR_1 = \sum_{t=1}^{T} u(t)^{2}$, $SSR_0 = \sum_{t=1}^{T} e(t)^{2}$

- Calculate g-causality statistic: $S = \frac{(SSR_0 - SSR_1)/p}{SSR_1/(T - 2p - 1)}$

- Find the delay that maximizes g-causality: $\tau_{op} = \arg\max_{\tau}(S)$

Example

Estimated delay
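The delay search can be sketched with NumPy least squares. This is an illustration under assumptions: the statistic is the standard Granger F-ratio with lag order p, and a candidate delay $\tau$ is tested by shifting the gesture series so that its lags start at $t-\tau$. How the original system parameterizes the delay is not recoverable from the slide, and the function names are hypothetical.

```python
import numpy as np

def lagged(x, p):
    """Rows t = p..len(x)-1; columns x[t-1], ..., x[t-p]."""
    return np.column_stack([x[p - i:len(x) - i] for i in range(1, p + 1)])

def ssr(X, y):
    """Sum of squared residuals of the least-squares fit of y on X."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    return float(np.sum((y - X @ coef) ** 2))

def granger_statistic(c_a, c_g, p):
    """F-type statistic comparing 'actions only' with 'actions & gestures'."""
    c_a = np.asarray(c_a, dtype=float)
    c_g = np.asarray(c_g, dtype=float)
    y = c_a[p:]
    T = len(y)
    X0 = np.hstack([np.ones((T, 1)), lagged(c_a, p)])  # actions only
    X1 = np.hstack([X0, lagged(c_g, p)])               # actions & gestures
    ssr0, ssr1 = ssr(X0, y), ssr(X1, y)
    return ((ssr0 - ssr1) / p) / (ssr1 / (T - 2 * p - 1))

def best_delay(c_a, c_g, p, max_delay):
    """Assumed search: for delay tau, gesture lags start at t - tau."""
    n = len(c_a)
    scores = {}
    for tau in range(1, max_delay + 1):
        shift = tau - 1
        scores[tau] = granger_statistic(c_a[shift:], c_g[:n - shift], p)
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(0)
gesture = rng.normal(size=300)
action = np.zeros(300)
action[4:] = 0.8 * gesture[:-4]           # action follows gesture by 4 steps
action += 0.05 * rng.normal(size=300)     # small observation noise
delay, _ = best_delay(action, gesture, p=1, max_delay=8)
print(delay)
```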

(29)

Result of the association phase:

Augmented Bayesian Network (ABN) representing the relation between gestures and actions (interaction protocol)

(2) Association

Bayesian network with two values added to each link:

- Mean of the delay between the activation of the cause and result nodes: $\mu_{n_1 n_2}$

- Variance of the delay: $\sigma^{2}_{n_1 n_2}$

[Mohammad 2009]

(30)

(3) Controller generation Overview

(a) Motor babbling:

Takes as input an action stream state and produces an increment function $F_i^{+}$ and a decrement function $F_i^{-}$, each of which produces a motor command from a given action stream.

(b) Piecewise linear controller generation:

Generates controllers from the piecewise mean approximation of the action nodes of the given ABN, $F_i^{+}$, and $F_i^{-}$.

(c) Producing the closed loop controller:

The difference between actual action stream as perceived and the piecewise mean approximation is calculated to correct for the error.

[Mohammad 2010]

(31)

(4) Accumulation Overview

Combine two already learned ABNs: AN ¹ and AN ² .

- Generalizing user modeling into task modeling.

- The main assumption: the action nodes will be more similar in the two ABNs than gesture nodes.

[Mohammad 2010]

The main steps:

(1) Create links between nodes in AN ¹ and AN ² if the nodes are similar to each other.

Dynamic Time Warping (DTW) is used to measure the distance between the means of the two nodes.

(2) Repeat the same processing for the gesture nodes in AN ¹ and AN ² .

(3) Remove all the conflicts in the two link lists to have at most one node in the second ABN connected to any node in the first ABN.

(4) After resolving all conflicts, the nodes in the two ABNs that are still linked are combined, and their HMM and motif mean are re-generated from the full set of motif occurrences used when creating the two ABNs.
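Steps (1) to (3) can be sketched as below. The DTW implementation is the textbook dynamic program with absolute difference as the local cost. The greedy nearest-first conflict resolution and the distance threshold are stand-ins chosen for illustration; they are not the Link Competence Index procedure of [Mohammad 2010], and the function names are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def link_nodes(means1, means2, threshold):
    """Link each node of the first ABN to at most one node of the second,
    taking the closest pairs first (assumed conflict-resolution rule)."""
    candidates = sorted((dtw_distance(m1, m2), i, j)
                        for i, m1 in enumerate(means1)
                        for j, m2 in enumerate(means2))
    links, used1, used2 = {}, set(), set()
    for dist, i, j in candidates:
        if dist <= threshold and i not in used1 and j not in used2:
            links[i] = j
            used1.add(i)
            used2.add(j)
    return links

means_an1 = [[0, 1, 2], [5, 5, 5]]        # motif means of nodes in AN1
means_an2 = [[5, 5, 5, 5], [0, 1, 1, 2]]  # motif means of nodes in AN2
print(link_nodes(means_an1, means_an2, threshold=0.5))
```

DTW tolerates the different lengths and local timing differences of the motif means, which is why it is preferred here over a point-by-point distance.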

(32)

(4) Accumulation

[Mohammad 2010]

[Figure: two ABNs with nodes a to f and nodes 1 to 6, and the combined ABN whose node labels include a1, b2, c3, d4, f5, e and 6]

(33)

(4) Accumulation

[Mohammad 2010]

ABN Combination

• Main assumption

- Action nodes are more compatible than gesture nodes

• Algorithm

1. Associate action nodes with similar stored pattern

   Output: set of action node association links $\{la^{12}_{ij}, v(la^{12}_{ij})\}$

2. Associate gesture nodes with similar stored pattern

   Output: set of gesture node association links $\{lg^{12}_{ij}, v(lg^{12}_{ij})\}$

3. Calculate Link Competence Index for association links

   Output: set of LCIs for gestures and actions $\{LCI(la^{12}_{ij}), LCI(lg^{12}_{ij})\}$

4. Resolve association link conflicts using LCIs

   Output: final ABN

(34)

(4) Accumulation

[Mohammad 2010]

 Compile AN ¹ and AN ² lists {every action node}

 Calculate

 Calculate for all nodes and order them

 Create a link iff



for any

 Set

Gesture association links are calculated the same way

2 1

j

la i

 ¹ ^, ² 

DTW i k

d m m

 

 ¹ ¹ 

min ,

i d DTW m m i k

 

 ¹ ^, ² 

DTW i j i

d m m ^ 

 ¹ ^, ²   ¹ ^, ² 

DTW i j DTW i k i

d m m  d m m  

 

2 2 2

: 1 ^l exists

k l i

m  m la

  ¹ ² ⁱ ^j ^DTW  ¹ ⁱ ^, ² ^j 

v la  d m m

Associating action/gesture nodes

(35)

(4) Accumulation

[Mohammad 2010]

LCI Calculation

(36)

(4) Accumulation

[Mohammad 2010]

Effect of accumulation -- example

(37)

Imitation, Simulation and Conversation

[Nishida et al 2014]

Meltzoff’s Active Intermodal Mapping (AIM) model

(38)

Imitation, Simulation and Conversation

[Nishida et al 2014]

(39)

Imitation, Simulation and Conversation

[Figure labels: Intention, Behavior, Reasoning, Simulation (repeated for each agent); Interaction]

[Nishida et al 2014]

(40)

Fluid imitation: Imitation in Social Context

The simplest fluid imitation system

[Nishida et al 2014]

(41)

Fluid imitation: Imitation in Social Context

An interaction oriented fluid imitation

[Nishida et al 2014]

(42)

Fluid imitation Engine (FIE)

[Mohammad 2016]

Perspective taking

Perceived behavior and environment state Self-Imitation Engine

Significance Estimator Imitation Engine Learned behaviors

Salience and relevance

Segmented demonstrations

Converts the perceived environment state and model behavior into the frame of reference of the model, and then maps the result into the learner’s frame of reference.

Calculates the significance of perceived motions/behaviors based on their top-down relevance to the learner’s current set of goals and on the bottom-up saliency of the behavior or of the environment context and objects.

Receives the transformed environmental and behavior signals, uses them to segment the action stream or perception stream, and combines the segmented motions into sets that are fed to the imitation engine.

Receives segmented action demonstrations and learns the corresponding action models (SAXImitate).

(43)

Why should we imitate robots?

[Mohammad 2015]

(H1) Back and mutual imitation both increase human’s perception of robot’s imitative skill over the control situation in which no back or mutual imitation is performed.

(H2) The increase in perceived imitative skill and overall imitative skill will only appear in first person subjective evaluations and will disappear in third person evaluations.

(H3) Back and mutual imitation conditions will increase the participant’s subjective evaluation of the robot’s imitative skill as well as her/his intention of future interaction with the robot.

(44)

Why should we imitate robots?

[Mohammad 2015]

(H3) Back and mutual imitation conditions will increase the participant’s subjective evaluation of the robot’s imitative skill as well as her/his intention of future interaction with the robot.

Fraction of subjects who preferred each robot according to each of the measured variables in the follow-up study

(45)

Summary

1. Autonomous learning is necessary to realize social artifacts that can exhibit natural and sophisticated interaction behaviors.

2. I have shown a framework of autonomous learning consisting of learning by watching and learning by doing.

3. The key algorithm for learning by watching is motif discovery.

4. Change point discovery is effective to improve both efficiency and robustness of the motif discovery algorithm.

5. Causality analysis is used to find plausible latency of reactions.

(46)

References

[Buhler 2002] J. Buhler and M. Tompa. Finding motifs using random projections. Journal of Computational Biology 9(2): 225–242

[Catalano 2006] Joe Catalano, Tom Armstrong, and Tim Oates. Discovering patterns in real-valued time series. In Knowledge Discovery in Databases: PKDD 2006, pages 462–469, 2006.

[Chiu 2003] B. Chiu, E. Keogh, and S. Lonardi, “Probabilistic discovery of time series motifs,” in KDD ’03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA: ACM, 2003, pp. 493–498.

[Furaoa 2006] Shen Furaoa, Osamu Hasegawa. An incremental network for on-line unsupervised classification and topology learning, Neural Networks 19 (2006) 90–106

[Ide 2005] T. Ide and K. Inoue, “Knowledge discovery from heterogeneous dynamic systems using change-point correlations,” in Proc. SIAM Intl. Conf. Data Mining, 2005.

[Mohammad 2009] Yasser Mohammad, Toyoaki Nishida, Shogo Okada, "Unsupervised Simultaneous Learning of Gestures, Actions and their Associations for Human-Robot Interaction," Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pp. 2537-2544, 11-15 Oct. 2009

[Mohammad 2015] Yasser Mohammad, Toyoaki Nishida. Why should we imitate robots? -- Effect of back imitation on judgment of imitative skill. International Journal of Social Robotics, (forthcoming)

[Mohammad 2009 PhDThesis] Yasser Mohammad, Autonomous Development of Natural Interactive Behavior for Robots and Embodied Agents, PhD Thesis, Kyoto University, September 2009

[Mohammad 2010] Yasser Mohammad, Toyoaki Nishida, Learning Interaction Protocols using Augmented Bayesian Networks Applied to Guided Navigation, Taipei, Taiwan, IROS 2010.

[Mohammad 2016] Yasser Mohammad and Toyoaki Nishida. Data Mining for Social Robotics: Toward Autonomously Social Robots, Springer, 2016 http://www.springer.com/us/book/9783319252308

[Nishida et al 2014] Toyoaki Nishida, Atsushi Nakazawa, Yoshimasa Ohmoto and Yasser Mohammad, Conversational Informatics, Springer 2014.

[Okada 2009] Shogo Okada and Toyoaki Nishida. Incremental clustering of gesture patterns based on a self organizing incremental neural network, in Proceedings of International Joint Conference on Neural Networks, Atlanta, Georgia, USA, June 14-19, pp. 2316-2322, 2009.

[Pevzner 2000] Pevzner, P. A. & Sze, S. H. (2000). Combinatorial approaches to finding subtle signals in DNA sequences. In proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology. La Jolla, CA, Aug 19-23. pp 269-278.

[Vahdatpour 2009] Alireza Vahdatpour, Navid Amini and Majid Sarrafzadeh. Toward unsupervised activity discovery using multi-dimensional motif detection in time series, in: IJCAI'09: Proceedings of the 21st International Joint Conference on Artificial Intelligence, pages 1261-1266, 2009.

[Xu 2009] Yong Xu, Kazuhiro Ueda, Takanori Komatsu, Takeshi Okadome, Takashi Hattori, Yasuyuki Sumi and Toyoaki Nishida, WOZ Experiments for Understanding Mutual Adaptation, AI&Society, Vol. 23, No. 2, Page 201-212, 2009.
