Automated Methods for Fuzzy Systems
Gradient Method
Adriano Joaquim de Oliveira Cruz
PPGI-UFRJ
Summary
1 Introduction
2 Training Standard Fuzzy System
3 Output Membership Function Centers Update Law
4 Input Membership Function Centers Update Law
5 Input Membership Function Spreads Update Law
6 Example
Bibliography
Kevin M. Passino, Stephen Yurkovich. Fuzzy Control, Chapter 5. Addison Wesley Longman, Inc., USA, 1998.
Timothy J. Ross. Fuzzy Logic with Engineering Applications. John Wiley and Sons, Inc., USA, 2010.
J. R. Jang, C. Sun, E. Mizutani. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence.
Constructing fuzzy systems
How to construct a fuzzy system from numeric data?
Using data obtained experimentally from a system, it is possible to identify the model.
Find a model that fits the data by using fuzzy interpolation capabilities.
Introduction
We need to construct a fuzzy system $f(x|\theta)$ that approximates the function $g$ represented in the training data $G$.
There is no guarantee that it will succeed.
The System
Gaussian input membership functions with centers $c_j^i$ and spreads $\sigma_j^i$. Output membership function centers $b_i$.
Product for premise and implication. Center-average defuzzification.
It is described by
$$f(x|\theta) = \frac{\sum_{i=1}^{R} b_i \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j - c_j^i}{\sigma_j^i}\right)^2\right]}{\sum_{i=1}^{R} \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j - c_j^i}{\sigma_j^i}\right)^2\right]}$$
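The formula above can be sketched in a few lines of Python. This is only an illustration: the function names `rule_strengths` and `fuzzy_output` and the nested-list layout are assumptions, and the parameter values are the initial values of the example worked later in these slides.

```python
import math

def rule_strengths(x, centers, spreads):
    """Return mu_i for every rule: the product of Gaussian memberships."""
    strengths = []
    for c_i, s_i in zip(centers, spreads):
        mu = 1.0
        for x_j, c_ij, s_ij in zip(x, c_i, s_i):
            mu *= math.exp(-0.5 * ((x_j - c_ij) / s_ij) ** 2)
        strengths.append(mu)
    return strengths

def fuzzy_output(x, b, centers, spreads):
    """Center-average defuzzification: sum(b_i * mu_i) / sum(mu_i)."""
    mu = rule_strengths(x, centers, spreads)
    return sum(b_i * mu_i for b_i, mu_i in zip(b, mu)) / sum(mu)

# R = 2 rules, n = 2 inputs (initial values of the example in these slides).
b = [1.0, 5.0]
centers = [[0.0, 2.0], [2.0, 4.0]]   # centers[i][j] plays the role of c_j^i
spreads = [[1.0, 1.0], [1.0, 1.0]]   # spreads[i][j] plays the role of sigma_j^i

print(fuzzy_output([0.0, 2.0], b, centers, spreads))  # about 1.0719
```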
Error
Suppose that you have the $m$-th training data pair $(x^m, y^m) \in G$.
The goal of the gradient method (GM) is to minimize the error between the predicted output value $f(x^m|\theta)$ and the actual output value $y^m$.
The equation for the error surface is:
$$e_m = \frac{1}{2}\left[f(x^m|\theta) - y^m\right]^2$$
We seek to minimize $e_m$ by choosing the parameters $\theta$, which are $b_i$, $c_j^i$ and $\sigma_j^i$, $i = 1, 2, \ldots, R$, $j = 1, 2, \ldots, n$.
R rules, n input variables.
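As a small numeric illustration of the error definition, the sketch below evaluates $e_m$ for a single pair. The function name `squared_error` is an assumption; the prediction and target are those of the first training pair in the example later in these slides.

```python
def squared_error(prediction, target):
    """e_m = 1/2 * (f(x^m|theta) - y^m)^2 for one training pair."""
    return 0.5 * (prediction - target) ** 2

# Prediction and target of the first training pair in the example.
e_1 = squared_error(1.0719447, 1.0)
print(e_1)  # about 2.588e-3
```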
$b_i$ Update Law
How to adjust the $b_i$ to minimize $e_m$.
We will use
$$b_i(k+1) = b_i(k) - \lambda_1 \left.\frac{\partial e_m}{\partial b_i}\right|_k$$
where $i = 1, 2, \ldots, R$.
Gradient Descent
The update method moves $b_i$ along the negative gradient of the error surface.
The parameter $\lambda_1 > 0$ characterizes the step size.
If $\lambda_1$ is chosen too small, then $b_i$ is adjusted very slowly.
If $\lambda_1$ is chosen too big, then it may step over the minimum value of $e_m$.
Some algorithms try to adaptively choose the step size.
If the errors are big, increase $\lambda_1$; if they are decreasing, take small steps.
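The effect of the step size can be seen on a toy problem. The sketch below is an assumption made for illustration only: it replaces the fuzzy system by the one-parameter error $e(b) = \frac{1}{2}(b - 3)^2$ and runs the same kind of update with three hypothetical step sizes.

```python
def descend(step_size, start=0.0, target=3.0, iterations=20):
    """Gradient descent on the toy error e(b) = 1/2 * (b - target)^2."""
    b = start
    for _ in range(iterations):
        gradient = b - target          # de/db for the toy error
        b = b - step_size * gradient   # same form as the b_i update law
    return b

# Distance from the minimum after 20 steps for three step sizes.
for lam in (0.01, 0.5, 2.5):
    print(lam, abs(descend(lam) - 3.0))
```

With these values the smallest step barely moves toward the minimum, the middle one converges, and the largest one overshoots by more at every iteration.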
$b_i$ Update Formula I
Error: $e_m = \frac{1}{2}\left[f(x^m|\theta) - y^m\right]^2$
Chain rule:
$$\frac{\partial e_m}{\partial b_i} = \left(f(x^m|\theta) - y^m\right)\frac{\partial f(x^m|\theta)}{\partial b_i}$$
Since
$$f(x|\theta) = \frac{\sum_{i=1}^{R} b_i \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j - c_j^i}{\sigma_j^i}\right)^2\right]}{\sum_{i=1}^{R} \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j - c_j^i}{\sigma_j^i}\right)^2\right]}$$
then
$$\frac{\partial e_m}{\partial b_i} = \left(f(x^m|\theta) - y^m\right)\frac{\prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j^m - c_j^i}{\sigma_j^i}\right)^2\right]}{\sum_{i=1}^{R} \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j^m - c_j^i}{\sigma_j^i}\right)^2\right]}$$
$b_i$ Update Formula II
Let
$$\mu_i(x^m, k) = \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j^m - c_j^i(k)}{\sigma_j^i(k)}\right)^2\right]$$
Let
$$\epsilon_m(k) = f(x^m|\theta(k)) - y^m$$
Then
$$b_i(k+1) = b_i(k) - \lambda_1 \epsilon_m(k) \frac{\mu_i(x^m, k)}{\sum_{i=1}^{R} \mu_i(x^m, k)}$$
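A minimal sketch of the $b_i$ update formula, applied to one training pair. The function names and the step size `lam1 = 1.0` are assumptions chosen for illustration; the parameters and the training pair come from the example later in these slides.

```python
import math

def rule_strengths(x, centers, spreads):
    """mu_i(x, k): product of Gaussian memberships for each rule i."""
    strengths = []
    for c_i, s_i in zip(centers, spreads):
        mu = 1.0
        for x_j, c_ij, s_ij in zip(x, c_i, s_i):
            mu *= math.exp(-0.5 * ((x_j - c_ij) / s_ij) ** 2)
        strengths.append(mu)
    return strengths

def update_b(x, y, b, centers, spreads, lam1):
    """One step of b_i(k+1) = b_i(k) - lam1 * eps_m * mu_i / sum(mu)."""
    mu = rule_strengths(x, centers, spreads)
    total = sum(mu)
    f = sum(b_i * mu_i for b_i, mu_i in zip(b, mu)) / total
    eps = f - y                      # eps_m(k) = f(x^m|theta(k)) - y^m
    return [b_i - lam1 * eps * mu_i / total for b_i, mu_i in zip(b, mu)]

b = [1.0, 5.0]
centers = [[0.0, 2.0], [2.0, 4.0]]
spreads = [[1.0, 1.0], [1.0, 1.0]]

# Third training pair of the example; lam1 = 1.0 is an assumed step size.
new_b = update_b([3.0, 6.0], 6.0, b, centers, spreads, lam1=1.0)
print(new_b)
```

Rule 2 dominates at this input, so almost the whole correction goes to $b_2$, while $b_1$ is left practically unchanged.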
$c_j^i$ Update Law
We will use
$$c_j^i(k+1) = c_j^i(k) - \lambda_2 \left.\frac{\partial e_m}{\partial c_j^i}\right|_k$$
where $\lambda_2 > 0$, $i = 1, 2, \ldots, R$ and $j = 1, 2, \ldots, n$.
$c_j^i$ Update Formula I
Error: $e_m = \frac{1}{2}\left[f(x^m|\theta) - y^m\right]^2$
Chain rule:
$$\frac{\partial e_m}{\partial c_j^i} = \epsilon_m(k)\,\frac{\partial f(x^m|\theta(k))}{\partial \mu_i(x^m, k)}\,\frac{\partial \mu_i(x^m, k)}{\partial c_j^i}$$
Now
$$\frac{\partial f(x^m|\theta(k))}{\partial \mu_i(x^m, k)} = \frac{\left(\sum_{i=1}^{R} \mu_i(x^m, k)\right) b_i(k) - \left(\sum_{i=1}^{R} b_i(k)\,\mu_i(x^m, k)\right)(1)}{\left(\sum_{i=1}^{R} \mu_i(x^m, k)\right)^2}$$
So that
$$\frac{\partial f(x^m|\theta(k))}{\partial \mu_i(x^m, k)} = \frac{b_i(k) - f(x^m|\theta(k))}{\sum_{i=1}^{R} \mu_i(x^m, k)}$$
$c_j^i$ Update Formula II
Also we have
$$\frac{\partial \mu_i(x^m, k)}{\partial c_j^i} = \mu_i(x^m, k)\,\frac{x_j^m - c_j^i(k)}{\left(\sigma_j^i(k)\right)^2}$$
The update formula for $c_j^i$ is
$$c_j^i(k+1) = c_j^i(k) - \lambda_2\,\epsilon_m(k)\left(\frac{b_i(k) - f(x^m|\theta(k))}{\sum_{i=1}^{R} \mu_i(x^m, k)}\right)\mu_i(x^m, k)\left(\frac{x_j^m - c_j^i(k)}{\left(\sigma_j^i(k)\right)^2}\right)$$
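The update formula for the input centers can be sketched as follows. The function names and the step size `lam2 = 1.0` are assumptions; the parameters and the training pair are taken from the example later in these slides.

```python
import math

def rule_strengths(x, centers, spreads):
    """mu_i(x, k): product of Gaussian memberships for each rule i."""
    strengths = []
    for c_i, s_i in zip(centers, spreads):
        mu = 1.0
        for x_j, c_ij, s_ij in zip(x, c_i, s_i):
            mu *= math.exp(-0.5 * ((x_j - c_ij) / s_ij) ** 2)
        strengths.append(mu)
    return strengths

def update_centers(x, y, b, centers, spreads, lam2):
    """One step of the c_j^i update formula for the pair (x, y)."""
    mu = rule_strengths(x, centers, spreads)
    total = sum(mu)
    f = sum(b_i * mu_i for b_i, mu_i in zip(b, mu)) / total
    eps = f - y
    new_centers = []
    for i, (c_i, s_i) in enumerate(zip(centers, spreads)):
        df_dmu = (b[i] - f) / total          # d f / d mu_i
        new_centers.append([
            c_ij - lam2 * eps * df_dmu * mu[i] * (x_j - c_ij) / s_ij ** 2
            for x_j, c_ij, s_ij in zip(x, c_i, s_i)
        ])
    return new_centers

b = [1.0, 5.0]
centers = [[0.0, 2.0], [2.0, 4.0]]
spreads = [[1.0, 1.0], [1.0, 1.0]]

# Second training pair of the example; lam2 = 1.0 is an assumed step size.
new_centers = update_centers([2.0, 4.0], 5.0, b, centers, spreads, lam2=1.0)
print(new_centers)
```

A quick way to gain confidence in such a formula is to compare the change it produces with a finite-difference estimate of $\partial e_m / \partial c_j^i$.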
$\sigma_j^i$ Update Law
We will use
$$\sigma_j^i(k+1) = \sigma_j^i(k) - \lambda_3 \left.\frac{\partial e_m}{\partial \sigma_j^i}\right|_k$$
where $\lambda_3 > 0$, $i = 1, 2, \ldots, R$ and $j = 1, 2, \ldots, n$.
$\sigma_j^i$ Update Formula I
Error: $e_m = \frac{1}{2}\left[f(x^m|\theta) - y^m\right]^2$
Chain rule:
$$\frac{\partial e_m}{\partial \sigma_j^i} = \epsilon_m(k)\,\frac{\partial f(x^m|\theta(k))}{\partial \mu_i(x^m, k)}\,\frac{\partial \mu_i(x^m, k)}{\partial \sigma_j^i}$$
We already calculated
$$\frac{\partial f(x^m|\theta(k))}{\partial \mu_i(x^m, k)} = \frac{b_i(k) - f(x^m|\theta(k))}{\sum_{i=1}^{R} \mu_i(x^m, k)}$$
$\sigma_j^i$ Update Formula II
Also we have
$$\frac{\partial \mu_i(x^m, k)}{\partial \sigma_j^i} = \mu_i(x^m, k)\,\frac{\left(x_j^m - c_j^i(k)\right)^2}{\left(\sigma_j^i(k)\right)^3}$$
The update formula for $\sigma_j^i$ is
$$\sigma_j^i(k+1) = \sigma_j^i(k) - \lambda_3\,\epsilon_m(k)\left(\frac{b_i(k) - f(x^m|\theta(k))}{\sum_{i=1}^{R} \mu_i(x^m, k)}\right)\mu_i(x^m, k)\,\frac{\left(x_j^m - c_j^i(k)\right)^2}{\left(\sigma_j^i(k)\right)^3}$$
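The spread update has the same structure as the center update, with only the last factor changed. A sketch under the same assumptions (illustrative function names, assumed step size `lam3 = 1.0`, parameters and training pair from the example later in these slides):

```python
import math

def rule_strengths(x, centers, spreads):
    """mu_i(x, k): product of Gaussian memberships for each rule i."""
    strengths = []
    for c_i, s_i in zip(centers, spreads):
        mu = 1.0
        for x_j, c_ij, s_ij in zip(x, c_i, s_i):
            mu *= math.exp(-0.5 * ((x_j - c_ij) / s_ij) ** 2)
        strengths.append(mu)
    return strengths

def update_spreads(x, y, b, centers, spreads, lam3):
    """One step of the sigma_j^i update formula for the pair (x, y)."""
    mu = rule_strengths(x, centers, spreads)
    total = sum(mu)
    f = sum(b_i * mu_i for b_i, mu_i in zip(b, mu)) / total
    eps = f - y
    new_spreads = []
    for i, (c_i, s_i) in enumerate(zip(centers, spreads)):
        df_dmu = (b[i] - f) / total          # d f / d mu_i
        new_spreads.append([
            s_ij - lam3 * eps * df_dmu * mu[i] * (x_j - c_ij) ** 2 / s_ij ** 3
            for x_j, c_ij, s_ij in zip(x, c_i, s_i)
        ])
    return new_spreads

b = [1.0, 5.0]
centers = [[0.0, 2.0], [2.0, 4.0]]
spreads = [[1.0, 1.0], [1.0, 1.0]]

# Second training pair of the example; lam3 = 1.0 is an assumed step size.
new_spreads = update_spreads([2.0, 4.0], 5.0, b, centers, spreads, lam3=1.0)
print(new_spreads)
```

Nothing in this sketch keeps a spread positive; a practical implementation would probably need a guard for that, which is outside what the slides describe.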
Training Data Set
We will use the training data set of the table to illustrate the algorithm.
        x_1   x_2   y
x^1     0     2     1
x^2     2     4     5
x^3     3     6     6
Choosing the step size
The algorithm requires that a step size $\lambda$ be specified for each of the three parameters.
Selecting a large $\lambda$ will converge faster but may risk overstepping the minimum.
Selecting a small step means converging very slowly.
Choosing initial values
Initial values for the rules must be designated.
For the first rule, we choose $x_1^1$, $x_2^1$, $y^1$ as the input and output membership centers.
For the second rule, we choose $x_1^2$, $x_2^2$, $y^2$ as the input and output membership centers.
Set every spread equal to 1.
Choosing initial values
Rule 1: $[c_1^1(0), c_2^1(0)] = [0, 2]$, $[\sigma_1^1(0), \sigma_2^1(0)] = [1, 1]$, $b_1(0) = 1$
Rule 2: $[c_1^2(0), c_2^2(0)] = [2, 4]$, $[\sigma_1^2(0), \sigma_2^2(0)] = [1, 1]$, $b_2(0) = 5$
Plotting initial values
[Figure: initial input membership functions $\mu(x_1)$ and $\mu(x_2)$; the $x_1$ panel marks the centers $c_1^1$ and $c_1^2$.]
Calculating predicted outputs
Calculate the membership values of the implication of each rule using:
$$\mu_i(x^m, k=0) = \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j^m - c_j^i(k=0)}{\sigma_j^i(k=0)}\right)^2\right]$$
Calculate the outputs using (defuzzification):
$$f(x^m|\theta(k=0)) = \frac{\sum_{i=1}^{R} b_i(0)\,\mu_i(x^m, k=0)}{\sum_{i=1}^{R} \mu_i(x^m, k=0)}$$
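The two steps above can be sketched for the whole training set. The variable and function names are illustrative assumptions; the data and initial parameters are those of the example, so the printed values can be compared with the hand calculations that follow.

```python
import math

# Training pairs (x^m, y^m) and initial parameters of the example.
data = [([0.0, 2.0], 1.0), ([2.0, 4.0], 5.0), ([3.0, 6.0], 6.0)]
b = [1.0, 5.0]
centers = [[0.0, 2.0], [2.0, 4.0]]
spreads = [[1.0, 1.0], [1.0, 1.0]]

def rule_strengths(x, centers, spreads):
    """mu_i(x, 0): product of Gaussian memberships for each rule i."""
    strengths = []
    for c_i, s_i in zip(centers, spreads):
        mu = 1.0
        for x_j, c_ij, s_ij in zip(x, c_i, s_i):
            mu *= math.exp(-0.5 * ((x_j - c_ij) / s_ij) ** 2)
        strengths.append(mu)
    return strengths

memberships = []   # memberships[m][i] is mu_i for the (m+1)-th pair
outputs = []       # outputs[m] is the predicted output for the (m+1)-th pair
for x, _ in data:
    mu = rule_strengths(x, centers, spreads)
    memberships.append(mu)
    # Center-average defuzzification.
    outputs.append(sum(b_i * mu_i for b_i, mu_i in zip(b, mu)) / sum(mu))

for m, (mu, f) in enumerate(zip(memberships, outputs), start=1):
    print(m, mu, f)
```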
Membership degrees rule 1
$$\mu_1(x^1, 0) = \exp\left[-\frac{1}{2}\left(\frac{0-0}{1}\right)^2\right] \exp\left[-\frac{1}{2}\left(\frac{2-2}{1}\right)^2\right] = 1$$
$$\mu_1(x^2, 0) = \exp\left[-\frac{1}{2}\left(\frac{2-0}{1}\right)^2\right] \exp\left[-\frac{1}{2}\left(\frac{4-2}{1}\right)^2\right] = 0.0183156$$
$$\mu_1(x^3, 0) = \exp\left[-\frac{1}{2}\left(\frac{3-0}{1}\right)^2\right] \exp\left[-\frac{1}{2}\left(\frac{6-2}{1}\right)^2\right] = 3.72665 \times 10^{-6}$$
Membership degrees rule 2
$$\mu_2(x^1, 0) = \exp\left[-\frac{1}{2}\left(\frac{0-2}{1}\right)^2\right] \exp\left[-\frac{1}{2}\left(\frac{2-4}{1}\right)^2\right] = 0.0183156$$
$$\mu_2(x^2, 0) = \exp\left[-\frac{1}{2}\left(\frac{2-2}{1}\right)^2\right] \exp\left[-\frac{1}{2}\left(\frac{4-4}{1}\right)^2\right] = 1.0$$
$$\mu_2(x^3, 0) = \exp\left[-\frac{1}{2}\left(\frac{3-2}{1}\right)^2\right] \exp\left[-\frac{1}{2}\left(\frac{6-4}{1}\right)^2\right] = 0.082085$$
Defuzzification
$$f(x^1|\theta(0)) = \frac{b_1(0)\,\mu_1(x^1, 0) + b_2(0)\,\mu_2(x^1, 0)}{\mu_1(x^1, 0) + \mu_2(x^1, 0)} = \frac{1 \times 1 + 5 \times 0.0183156}{1 + 0.0183156} = 1.0719447$$
$$f(x^2|\theta(0)) = \frac{b_1(0)\,\mu_1(x^2, 0) + b_2(0)\,\mu_2(x^2, 0)}{\mu_1(x^2, 0) + \mu_2(x^2, 0)} = \frac{1 \times 0.0183156 + 5 \times 1}{0.0183156 + 1} = 4.9280550$$
$$f(x^3|\theta(0)) = \frac{b_1(0)\,\mu_1(x^3, 0) + b_2(0)\,\mu_2(x^3, 0)}{\mu_1(x^3, 0) + \mu_2(x^3, 0)} = \frac{1 \times 3.72665 \times 10^{-6} + 5 \times 0.082085}{3.72665 \times 10^{-6} + 0.082085} = 4.999818$$
Calculating errors
$$e_m = \frac{1}{2}\left[f(x^m|\theta(k=0)) - y^m\right]^2$$
$$e_1 = \frac{1}{2}\left[1.0719447 - 1\right]^2 = 2.58802 \times 10^{-3}$$
$$e_2 = \frac{1}{2}\left[4.9280550 - 5\right]^2 = 2.58802 \times 10^{-3}$$
$$e_3 = \frac{1}{2}\left[4.9998180 - 6\right]^2 = 0.500182$$
The first two data points are mapped better than the third. The result can be improved by cycling through the model.
The GM will update the rule-base parameters $b_i$, $c_j^i$ and $\sigma_j^i$ using the update laws.
Updating ...
$$\epsilon_m(k=0) = f(x^m|\theta(k=0)) - y^m$$
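To show how the three update laws fit together, here is a hedged sketch of a training loop over the example data. The slides do not state the step sizes or the number of passes, so `lam = 0.1` for all three laws and 20 passes are assumptions, as are all function names. All gradients for a pair are computed from the parameters at step $k$ before any of them is changed.

```python
import math

DATA = [([0.0, 2.0], 1.0), ([2.0, 4.0], 5.0), ([3.0, 6.0], 6.0)]

def strengths(x, c, s):
    """mu_i(x, k) for every rule."""
    out = []
    for c_i, s_i in zip(c, s):
        mu = 1.0
        for xj, cij, sij in zip(x, c_i, s_i):
            mu *= math.exp(-0.5 * ((xj - cij) / sij) ** 2)
        out.append(mu)
    return out

def predict(x, b, c, s):
    mu = strengths(x, c, s)
    return sum(bi * mi for bi, mi in zip(b, mu)) / sum(mu)

def total_error(b, c, s):
    """Sum of e_m over the training set."""
    return sum(0.5 * (predict(x, b, c, s) - y) ** 2 for x, y in DATA)

def gm_step(x, y, b, c, s, lam):
    """Apply the b, c and sigma update formulas once for the pair (x, y)."""
    mu = strengths(x, c, s)
    total = sum(mu)
    f = sum(bi * mi for bi, mi in zip(b, mu)) / total
    eps = f - y
    new_b, new_c, new_s = [], [], []
    for i in range(len(b)):
        common = eps * (b[i] - f) / total * mu[i]
        new_b.append(b[i] - lam * eps * mu[i] / total)
        new_c.append([cij - lam * common * (xj - cij) / sij ** 2
                      for xj, cij, sij in zip(x, c[i], s[i])])
        new_s.append([sij - lam * common * (xj - cij) ** 2 / sij ** 3
                      for xj, cij, sij in zip(x, c[i], s[i])])
    return new_b, new_c, new_s

# Initial values of the example.
b = [1.0, 5.0]
c = [[0.0, 2.0], [2.0, 4.0]]
s = [[1.0, 1.0], [1.0, 1.0]]

before = total_error(b, c, s)
for _ in range(20):                      # assumed number of passes
    for x, y in DATA:
        b, c, s = gm_step(x, y, b, c, s, lam=0.1)   # assumed step size
after = total_error(b, c, s)
print(before, after)
```

Using a single step size for all three parameter groups is a simplification; the slides allow a separate $\lambda_1$, $\lambda_2$ and $\lambda_3$.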