Automated Methods for Fuzzy Systems


Gradient Method

Adriano Joaquim de Oliveira Cruz

PPGI-UFRJ

Summary

1 Introduction
2 Training Standard Fuzzy System
3 Output Membership Function Centers Update Law
4 Input Membership Function Centers Update Law
5 Input Membership Function Spreads Update Law
6 Example

Bibliography

Kevin M. Passino, Stephen Yurkovich. Fuzzy Control, Chapter 5. Addison Wesley Longman, Inc, USA, 1998.

Timothy J. Ross. Fuzzy Logic with Engineering Applications. John Wiley and Sons, Inc, USA, 2010.

J. R. Jang, C. Sun, E. Mizutani. Neuro Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence.

Constructing fuzzy systems

How to construct a fuzzy system from numeric data?

Using data obtained experimentally from a system, it is possible to identify the model.

Find a model that fits the data by using fuzzy interpolation capabilities.

Introduction

We need to construct a fuzzy system $f(x|\theta)$ that approximates the function $g$ represented in the training data $G$.

There is no guarantee that it will succeed.

The System

Gaussian input membership functions with centers $c_j^i$ and spreads $\sigma_j^i$. Output membership function centers $b_i$.

Product for premise and implication. Center-average defuzzification.

It is described by

$$f(x|\theta) = \frac{\sum_{i=1}^{R} b_i \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j - c_j^i}{\sigma_j^i}\right)^2\right]}{\sum_{i=1}^{R} \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j - c_j^i}{\sigma_j^i}\right)^2\right]}$$
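The system described on this slide can be sketched in a few lines of Python. This is only an illustration, not the author's implementation: the function name `fuzzy_output`, the list-of-lists layout (one inner list of centers and spreads per rule), and the sample parameter values are assumptions made for the example.

```python
import math

def fuzzy_output(x, b, centers, spreads):
    """Center-average defuzzified output f(x|theta) with Gaussian inputs.

    x:       list of n inputs
    b:       list of R output membership function centers b_i
    centers: R lists of n input centers c_j^i (assumed layout)
    spreads: R lists of n input spreads sigma_j^i (assumed layout)
    """
    # Premise of each rule: product of the Gaussian membership values.
    mu = [
        math.prod(math.exp(-0.5 * ((xj - c) / s) ** 2)
                  for xj, c, s in zip(x, rule_centers, rule_spreads))
        for rule_centers, rule_spreads in zip(centers, spreads)
    ]
    # Center-average defuzzification: weighted average of the b_i.
    return sum(bi * mi for bi, mi in zip(b, mu)) / sum(mu)

# Hypothetical two-rule, one-input system, evaluated midway between the centers.
print(fuzzy_output([1.0], b=[0.0, 10.0], centers=[[0.0], [2.0]], spreads=[[1.0], [1.0]]))
```

Because the input sits midway between two rules with equal spreads, both rules fire equally and the output should be close to the average of the two $b_i$.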

Error

Suppose that you have the $m$-th training data pair $(x^m, y^m) \in G$.

The goal of the gradient method (GM) is to minimize the error between the predicted output value $f(x^m|\theta)$ and the actual output value $y^m$.

The equation for the error surface is:

$$e_m = \frac{1}{2}\left[f(x^m|\theta) - y^m\right]^2$$

We seek to minimize $e_m$ by choosing the parameters $\theta$, which are $b_i$, $c_j^i$ and $\sigma_j^i$, $i = 1, 2, \ldots, R$, $j = 1, 2, \ldots, n$.

$R$ rules, $n$ input variables.

$b_i$ Update Law

How do we adjust the $b_i$ to minimize $e_m$?

We will use

$$b_i(k+1) = b_i(k) - \lambda_1 \left.\frac{\partial e_m}{\partial b_i}\right|_k$$

where $i = 1, 2, \ldots, R$.

Gradient Descent

The update method moves $b_i$ along the negative gradient of the error surface.

The parameter $\lambda_1 > 0$ characterizes the step size.

If $\lambda_1$ is chosen too small, then $b_i$ is adjusted very slowly.

If $\lambda_1$ is chosen too big, then the update may step over the minimum value of $e_m$.

Some algorithms try to choose the step size adaptively: if the errors are big, increase $\lambda_1$, but if they are decreasing, take small steps.
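The effect of the step size can be seen on a toy problem. The sketch below is not the fuzzy system itself: it assumes a one-parameter quadratic error $e(b) = \frac{1}{2}(b - 3)^2$, and the function name `descend`, the target 3, and the three step sizes are hypothetical choices made only to illustrate slow convergence, reasonable convergence, and overshooting.

```python
def descend(step_size, start=0.0, target=3.0, iterations=20):
    """Run plain gradient descent on e(b) = 0.5 * (b - target)**2."""
    b = start
    for _ in range(iterations):
        gradient = b - target          # de/db for the quadratic above
        b = b - step_size * gradient   # move along the negative gradient
    return b

for step_size in (0.01, 0.5, 2.5):
    b_final = descend(step_size)
    print(f"lambda={step_size}: b={b_final:.4f}, "
          f"distance from minimum={abs(b_final - 3.0):.4g}")
```

With the smallest step the parameter has barely moved after 20 iterations, with the middle one it is very close to the minimum, and with the largest one each update jumps past the minimum by more than it corrected, so the distance grows.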

$b_i$ Update Formula I

Error:

$$e_m = \frac{1}{2}\left[f(x^m|\theta) - y^m\right]^2$$

Chain rule:

$$\frac{\partial e_m}{\partial b_i} = \left(f(x^m|\theta) - y^m\right)\frac{\partial f(x^m|\theta)}{\partial b_i}$$

Since

$$f(x^m|\theta) = \frac{\sum_{i=1}^{R} b_i \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j^m - c_j^i}{\sigma_j^i}\right)^2\right]}{\sum_{i=1}^{R} \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j^m - c_j^i}{\sigma_j^i}\right)^2\right]}$$

then

$$\frac{\partial e_m}{\partial b_i} = \left(f(x^m|\theta) - y^m\right)\frac{\prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j^m - c_j^i}{\sigma_j^i}\right)^2\right]}{\sum_{i=1}^{R} \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j^m - c_j^i}{\sigma_j^i}\right)^2\right]}$$

$b_i$ Update Formula II

Let

$$\mu_i(x^m, k) = \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j^m - c_j^i(k)}{\sigma_j^i(k)}\right)^2\right]$$

Let

$$\epsilon_m(k) = f(x^m|\theta(k)) - y^m$$

Then

$$b_i(k+1) = b_i(k) - \lambda_1 \epsilon_m(k) \frac{\mu_i(x^m, k)}{\sum_{i=1}^{R} \mu_i(x^m, k)}$$
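The $b_i$ update can be sketched as a short Python function. The name `update_b` and the numeric values are hypothetical; the function assumes the rule premise values $\mu_i(x^m, k)$ and the error $\epsilon_m(k)$ have already been computed.

```python
def update_b(b, mu, eps, step_size):
    """One gradient step on every output center b_i.

    mu holds the rule premise values mu_i(x^m, k);
    eps is f(x^m|theta(k)) - y^m.
    """
    total = sum(mu)
    return [bi - step_size * eps * mi / total for bi, mi in zip(b, mu)]

# Hypothetical values: two rules, the first one firing much more strongly.
b_next = update_b(b=[1.0, 5.0], mu=[0.9, 0.1], eps=0.2, step_size=1.0)
print(b_next)
```

Each center moves in proportion to its rule's share of the total firing strength, so the rule that contributed most to the output receives most of the correction.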

$c_j^i$ Update Law

We will use

$$c_j^i(k+1) = c_j^i(k) - \lambda_2 \left.\frac{\partial e_m}{\partial c_j^i}\right|_k$$

where $\lambda_2 > 0$, $i = 1, 2, \ldots, R$ and $j = 1, 2, \ldots, n$.

$c_j^i$ Update Formula I

$$\frac{\partial e_m}{\partial c_j^i} = \epsilon_m(k) \frac{\partial f(x^m|\theta(k))}{\partial \mu_i(x^m, k)} \frac{\partial \mu_i(x^m, k)}{\partial c_j^i}$$

Now

$$\frac{\partial f(x^m|\theta(k))}{\partial \mu_i(x^m, k)} = \frac{\left(\sum_{i=1}^{R} \mu_i(x^m, k)\right) b_i(k) - \left(\sum_{i=1}^{R} b_i(k) \mu_i(x^m, k)\right)(1)}{\left(\sum_{i=1}^{R} \mu_i(x^m, k)\right)^2}$$

So that

$$\frac{\partial f(x^m|\theta(k))}{\partial \mu_i(x^m, k)} = \frac{b_i(k) - f(x^m|\theta(k))}{\sum_{i=1}^{R} \mu_i(x^m, k)}$$

$c_j^i$ Update Formula II

Also we have

$$\frac{\partial \mu_i(x^m, k)}{\partial c_j^i} = \mu_i(x^m, k) \frac{x_j^m - c_j^i(k)}{\left(\sigma_j^i(k)\right)^2}$$

The update formula for $c_j^i$ is

$$c_j^i(k+1) = c_j^i(k) - \lambda_2 \epsilon_m(k) \left(\frac{b_i(k) - f(x^m|\theta(k))}{\sum_{i=1}^{R} \mu_i(x^m, k)}\right) \mu_i(x^m, k) \left(\frac{x_j^m - c_j^i(k)}{\left(\sigma_j^i(k)\right)^2}\right)$$
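One way to gain confidence in the derived gradient for the centers is to compare it with a finite-difference estimate of $\partial e_m / \partial c_j^i$. The sketch below does that; all function names and the parameter values are hypothetical, chosen only to exercise the check.

```python
import math

def memberships(x, centers, spreads):
    """Premise value mu_i of every rule (product of Gaussians)."""
    return [
        math.prod(math.exp(-0.5 * ((xj - c) / s) ** 2)
                  for xj, c, s in zip(x, rule_c, rule_s))
        for rule_c, rule_s in zip(centers, spreads)
    ]

def predict(x, b, centers, spreads):
    mu = memberships(x, centers, spreads)
    return sum(bi * mi for bi, mi in zip(b, mu)) / sum(mu)

def error(x, y, b, centers, spreads):
    return 0.5 * (predict(x, b, centers, spreads) - y) ** 2

def grad_center(x, y, b, centers, spreads, i, j):
    """Analytic d e_m / d c_j^i, following the chain-rule derivation."""
    mu = memberships(x, centers, spreads)
    f = predict(x, b, centers, spreads)
    eps = f - y
    return (eps * (b[i] - f) / sum(mu)
            * mu[i] * (x[j] - centers[i][j]) / spreads[i][j] ** 2)

def numeric_grad_center(x, y, b, centers, spreads, i, j, h=1e-6):
    """Central finite difference of e_m with respect to c_j^i."""
    plus = [row[:] for row in centers]
    minus = [row[:] for row in centers]
    plus[i][j] += h
    minus[i][j] -= h
    return (error(x, y, b, plus, spreads)
            - error(x, y, b, minus, spreads)) / (2 * h)

# Hypothetical parameters: two rules, two inputs.
x, y = [0.5, 1.5], 2.0
b = [1.0, 4.0]
centers = [[0.0, 1.0], [2.0, 3.0]]
spreads = [[1.0, 1.5], [0.8, 1.2]]
for i in range(2):
    for j in range(2):
        analytic = grad_center(x, y, b, centers, spreads, i, j)
        numeric = numeric_grad_center(x, y, b, centers, spreads, i, j)
        print(f"rule {i + 1}, input {j + 1}: analytic={analytic:.8f}, numeric={numeric:.8f}")
```

The two columns should agree to several decimal places; a mismatch would point to a sign or index error in the formula or in its implementation.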

$\sigma_j^i$ Update Law

We will use

$$\sigma_j^i(k+1) = \sigma_j^i(k) - \lambda_3 \left.\frac{\partial e_m}{\partial \sigma_j^i}\right|_k$$

where $\lambda_3 > 0$, $i = 1, 2, \ldots, R$ and $j = 1, 2, \ldots, n$.

$\sigma_j^i$ Update Formula I

$$\frac{\partial e_m}{\partial \sigma_j^i} = \epsilon_m(k) \frac{\partial f(x^m|\theta(k))}{\partial \mu_i(x^m, k)} \frac{\partial \mu_i(x^m, k)}{\partial \sigma_j^i}$$

We already calculated

$$\frac{\partial f(x^m|\theta(k))}{\partial \mu_i(x^m, k)} = \frac{b_i(k) - f(x^m|\theta(k))}{\sum_{i=1}^{R} \mu_i(x^m, k)}$$

$\sigma_j^i$ Update Formula II

Also we have

$$\frac{\partial \mu_i(x^m, k)}{\partial \sigma_j^i} = \mu_i(x^m, k) \frac{\left(x_j^m - c_j^i(k)\right)^2}{\left(\sigma_j^i(k)\right)^3}$$

The update formula for $\sigma_j^i$ is

$$\sigma_j^i(k+1) = \sigma_j^i(k) - \lambda_3 \epsilon_m(k) \left(\frac{b_i(k) - f(x^m|\theta(k))}{\sum_{i=1}^{R} \mu_i(x^m, k)}\right) \mu_i(x^m, k) \left(\frac{\left(x_j^m - c_j^i(k)\right)^2}{\left(\sigma_j^i(k)\right)^3}\right)$$
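The derivative of $\mu_i$ with respect to a spread can be checked by differentiating the single Gaussian factor that contains $\sigma_j^i$; the remaining factors of the product do not depend on it and carry through unchanged. A short worked step, written in the same notation (the argument $k$ is dropped for brevity):

```latex
\frac{\partial}{\partial \sigma_j^i}
\exp\left[-\frac{1}{2}\,\frac{(x_j^m - c_j^i)^2}{(\sigma_j^i)^2}\right]
= \exp\left[-\frac{1}{2}\,\frac{(x_j^m - c_j^i)^2}{(\sigma_j^i)^2}\right]
  \cdot \frac{\partial}{\partial \sigma_j^i}
  \left[-\frac{1}{2}\,(x_j^m - c_j^i)^2 (\sigma_j^i)^{-2}\right]
= \exp\left[-\frac{1}{2}\,\frac{(x_j^m - c_j^i)^2}{(\sigma_j^i)^2}\right]
  \cdot \frac{(x_j^m - c_j^i)^2}{(\sigma_j^i)^3}
```

Multiplying by the untouched factors of the product restores $\mu_i(x^m, k)$ in front, which gives the cubic power of the spread in the denominator of the update formula.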

Training Data Set

We will use the training data set of the table to illustrate the algorithm.

         x_1   x_2   y
x^1      0     2     1
x^2      2     4     5
x^3      3     6     6

Choosing the step size

The algorithm requires that a step size $\lambda$ be specified for each of the three parameters.

A large $\lambda$ converges faster but risks overstepping the minimum.

A small $\lambda$ means converging very slowly.

Choosing initial values

Initial values for the rules must be designated.

For the first rule, we choose $x_1^1, x_2^1, y^1$ as the input and output membership centers.

For the second rule, we choose $x_1^2, x_2^2, y^2$ as the input and output membership centers.

Select spreads equal to 1.

Rule 1: $[c_1^1(0), c_2^1(0)] = [0, 2]$, $[\sigma_1^1(0), \sigma_2^1(0)] = [1, 1]$, $b_1(0) = 1$

Rule 2: $[c_1^2(0), c_2^2(0)] = [2, 4]$, $[\sigma_1^2(0), \sigma_2^2(0)] = [1, 1]$, $b_2(0) = 5$

Plotting initial values

[Figure: initial input membership functions $\mu(x_1)$ and $\mu(x_2)$, plotted over the range 0 to 10 with the rule centers marked.]

Calculating predicted outputs

Calculate the membership values of the implication of each rule using:

$$\mu_i(x^m, k=0) = \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j^m - c_j^i(k=0)}{\sigma_j^i(k=0)}\right)^2\right]$$

Calculate the outputs using (defuzzification):

$$f(x^m|\theta(k=0)) = \frac{\sum_{i=1}^{R} b_i(0)\, \mu_i(x^m, k=0)}{\sum_{i=1}^{R} \mu_i(x^m, k=0)}$$

Membership degrees rule 1

$$\mu_1(x^1, 0) = \exp\left[-\frac{1}{2}\left(\frac{0-0}{1}\right)^2\right] \times \exp\left[-\frac{1}{2}\left(\frac{2-2}{1}\right)^2\right] = 1$$

$$\mu_1(x^2, 0) = \exp\left[-\frac{1}{2}\left(\frac{2-0}{1}\right)^2\right] \times \exp\left[-\frac{1}{2}\left(\frac{4-2}{1}\right)^2\right] = 0.0183156$$

$$\mu_1(x^3, 0) = \exp\left[-\frac{1}{2}\left(\frac{3-0}{1}\right)^2\right] \times \exp\left[-\frac{1}{2}\left(\frac{6-2}{1}\right)^2\right] = 3.72665 \times 10^{-6}$$

Membership degrees rule 2

$$\mu_2(x^1, 0) = \exp\left[-\frac{1}{2}\left(\frac{0-2}{1}\right)^2\right] \times \exp\left[-\frac{1}{2}\left(\frac{2-4}{1}\right)^2\right] = 0.0183156$$

$$\mu_2(x^2, 0) = \exp\left[-\frac{1}{2}\left(\frac{2-2}{1}\right)^2\right] \times \exp\left[-\frac{1}{2}\left(\frac{4-4}{1}\right)^2\right] = 1.0$$

$$\mu_2(x^3, 0) = \exp\left[-\frac{1}{2}\left(\frac{3-2}{1}\right)^2\right] \times \exp\left[-\frac{1}{2}\left(\frac{6-4}{1}\right)^2\right] = 0.082085$$

Defuzzification

$$f(x^1|\theta(0)) = \frac{b_1(0) \times \mu_1(x^1, 0) + b_2(0) \times \mu_2(x^1, 0)}{\mu_1(x^1, 0) + \mu_2(x^1, 0)} = \frac{1 \times 1 + 5 \times 0.0183156}{1 + 0.0183156} = 1.0719447$$

$$f(x^2|\theta(0)) = \frac{b_1(0) \times \mu_1(x^2, 0) + b_2(0) \times \mu_2(x^2, 0)}{\mu_1(x^2, 0) + \mu_2(x^2, 0)} = \frac{1 \times 0.0183156 + 5 \times 1}{0.0183156 + 1} = 4.928055$$

$$f(x^3|\theta(0)) = \frac{b_1(0) \times \mu_1(x^3, 0) + b_2(0) \times \mu_2(x^3, 0)}{\mu_1(x^3, 0) + \mu_2(x^3, 0)} = \frac{1 \times 3.72665 \times 10^{-6} + 5 \times 0.082085}{3.72665 \times 10^{-6} + 0.082085} = 4.999818$$
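The membership degrees and predicted outputs of the example can be reproduced with a short script. The data and initial parameters are taken from the slides; the variable names and the helper `memberships` are assumptions made for this sketch.

```python
import math

# Training inputs and initial parameters taken from the example.
X = [[0.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
centers = [[0.0, 2.0], [2.0, 4.0]]   # c_j^i(0), one row per rule
spreads = [[1.0, 1.0], [1.0, 1.0]]   # sigma_j^i(0)
b = [1.0, 5.0]                       # b_i(0)

def memberships(x):
    """Premise value of each rule for the input vector x."""
    return [
        math.prod(math.exp(-0.5 * ((xj - c) / s) ** 2)
                  for xj, c, s in zip(x, rule_c, rule_s))
        for rule_c, rule_s in zip(centers, spreads)
    ]

outputs = []
for m, x in enumerate(X, start=1):
    mu = memberships(x)
    f = sum(bi * mi for bi, mi in zip(b, mu)) / sum(mu)  # center average
    outputs.append(f)
    mu_text = ", ".join(f"{v:.6g}" for v in mu)
    print(f"x^{m}: mu = [{mu_text}], f = {f:.6f}")
```

The printed values should match the hand calculation up to rounding in the last digit.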

Calculating errors

$$e_m = \frac{1}{2}\left[f(x^m|\theta(k=0)) - y^m\right]^2$$

$$e_1 = \frac{1}{2}\left[1.0719447 - 1\right]^2 = 2.58802 \times 10^{-3}$$

$$e_2 = \frac{1}{2}\left[4.9280550 - 5\right]^2 = 2.58802 \times 10^{-3}$$

$$e_3 = \frac{1}{2}\left[4.9998180 - 6\right]^2 = 0.500182$$

The first two data points are mapped better than the third. The result can be improved by cycling through the model.
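The three error values can be checked directly from the predicted outputs listed on the slides. The list names below are assumptions made for this sketch.

```python
# Predicted outputs f(x^m|theta(0)) and targets y^m as listed in the example.
predicted = [1.0719447, 4.9280550, 4.9998180]
targets = [1.0, 5.0, 6.0]

# e_m = 0.5 * (f - y)^2 for each training pair.
errors = [0.5 * (f - y) ** 2 for f, y in zip(predicted, targets)]
for m, e in enumerate(errors, start=1):
    print(f"e_{m} = {e:.6g}")
```

The third error is roughly two hundred times larger than the other two, which reflects that no rule was initialized at the third data point.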

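The GM will update the rule-base parameters $b_i$, $c_j^i$ and $\sigma_j^i$ using the update formulas.

Updating ...

$$\epsilon_m(k=0) = f(x^m|\theta(k=0)) - y^m$$

Updating $b_i$

$$b_i(k) = b_i(k-1) - \lambda_1 \epsilon_k(k-1) \frac{\mu_i(x^k, k-1)}{\sum_{i=1}^{R} \mu_i(x^k, k-1)}$$

$$b_1(1) = b_1(0) - \lambda_1 \epsilon_1(0) \frac{\mu_1(x^1, 0)}{\mu_1(x^1, 0) + \mu_2(x^1, 0)} = 1 - 1 \times (0.0719447) \times \frac{1}{1 + 0.0183156} = 0.929349$$

$$b_2(1) = b_2(0) - \lambda_1 \epsilon_1(0) \frac{\mu_2(x^1, 0)}{\mu_1(x^1, 0) + \mu_2(x^1, 0)} = 5 - 1 \times (0.0719447) \times \frac{0.0183156}{1 + 0.0183156} = 4.998706$$

Updating $c_j^1$

$$c_j^i(k) = c_j^i(k-1) - \lambda_2 \epsilon_k(k-1) \left[\frac{b_i(k-1) - f(x^k|\theta(k-1))}{\sum_{i=1}^{R} \mu_i(x^k, k-1)}\right] \times \mu_i(x^k, k-1) \left(\frac{x_j^k - c_j^i(k-1)}{\left(\sigma_j^i(k-1)\right)^2}\right)$$

$$c_1^1(1) = c_1^1(0) - 1 \cdot \epsilon_1(0) \left[\frac{b_1(0) - f(x^1|\theta(0))}{\mu_1(x^1, 0) + \mu_2(x^1, 0)}\right] \times \mu_1(x^1, 0) \left(\frac{x_1^1 - c_1^1(0)}{\left(\sigma_1^1(0)\right)^2}\right) = 0$$

$$c_2^1(1) = c_2^1(0) - 1 \cdot \epsilon_1(0) \left[\frac{b_1(0) - f(x^1|\theta(0))}{\mu_1(x^1, 0) + \mu_2(x^1, 0)}\right] \times \mu_1(x^1, 0) \left(\frac{x_2^1 - c_2^1(0)}{\left(\sigma_2^1(0)\right)^2}\right) = 2$$

Both centers of rule 1 stay unchanged because $x^1$ coincides with them, so the factor $x_j^1 - c_j^1(0)$ is zero.

Updating $c_j^2$

$$c_1^2(1) = c_1^2(0) - 1 \cdot \epsilon_1(0) \left[\frac{b_2(0) - f(x^1|\theta(0))}{\mu_1(x^1, 0) + \mu_2(x^1, 0)}\right] \times \mu_2(x^1, 0) \left(\frac{x_1^1 - c_1^2(0)}{\left(\sigma_1^2(0)\right)^2}\right) = 2.010166$$

$$c_2^2(1) = c_2^2(0) - 1 \cdot \epsilon_1(0) \left[\frac{b_2(0) - f(x^1|\theta(0))}{\mu_1(x^1, 0) + \mu_2(x^1, 0)}\right] \times \mu_2(x^1, 0) \left(\frac{x_2^1 - c_2^2(0)}{\left(\sigma_2^2(0)\right)^2}\right) = 4.010166$$

Updating $\sigma_j^i$

$$\sigma_j^i(k) = \sigma_j^i(k-1) - \lambda_3 \epsilon_k(k-1) \left[\frac{b_i(k-1) - f(x^k|\theta(k-1))}{\sum_{i=1}^{R} \mu_i(x^k, k-1)}\right] \times \mu_i(x^k, k-1) \left(\frac{\left(x_j^k - c_j^i(k-1)\right)^2}{\left(\sigma_j^i(k-1)\right)^3}\right)$$

$$\sigma_1^1(1) = 1, \quad \sigma_2^1(1) = 1, \quad \sigma_1^2(1) = 0.979668, \quad \sigma_2^2(1) = 0.979668$$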
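The first update step of the example can be sketched as a single Python function that applies the three update formulas with the old parameter values on the right-hand side. The names `memberships` and `gradient_step` and the keyword arguments `lam1`, `lam2`, `lam3` are assumptions made for this sketch; the step sizes default to 1, as in the worked example, and the data are the first training pair and the initial parameters.

```python
import math

def memberships(x, centers, spreads):
    """Premise value mu_i of every rule (product of Gaussians)."""
    return [
        math.prod(math.exp(-0.5 * ((xj - c) / s) ** 2)
                  for xj, c, s in zip(x, rule_c, rule_s))
        for rule_c, rule_s in zip(centers, spreads)
    ]

def gradient_step(x, y, b, centers, spreads, lam1=1.0, lam2=1.0, lam3=1.0):
    """Apply the three update formulas once for the training pair (x, y)."""
    mu = memberships(x, centers, spreads)
    total = sum(mu)
    f = sum(bi * mi for bi, mi in zip(b, mu)) / total
    eps = f - y
    new_b, new_centers, new_spreads = [], [], []
    for bi, mi, rule_c, rule_s in zip(b, mu, centers, spreads):
        new_b.append(bi - lam1 * eps * mi / total)
        # Factor shared by the center and spread update formulas.
        common = eps * (bi - f) / total * mi
        new_centers.append([c - lam2 * common * (xj - c) / s ** 2
                            for xj, c, s in zip(x, rule_c, rule_s)])
        new_spreads.append([s - lam3 * common * (xj - c) ** 2 / s ** 3
                            for xj, c, s in zip(x, rule_c, rule_s)])
    return new_b, new_centers, new_spreads

# First training pair and initial parameters from the example.
b1, c1, s1 = gradient_step([0.0, 2.0], 1.0, [1.0, 5.0],
                           [[0.0, 2.0], [2.0, 4.0]],
                           [[1.0, 1.0], [1.0, 1.0]])
print("b(1)     =", [round(v, 6) for v in b1])
print("c(1)     =", [[round(v, 6) for v in row] for row in c1])
print("sigma(1) =", [[round(v, 6) for v in row] for row in s1])
```

Calling `gradient_step` again with the returned parameters and the next training pair would continue the cycle through the data.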
