Neural Computing in Quaternion Algebra

by

Toshifumi Minemoto

A dissertation submitted in partial fulfillment

of the requirements for the degree of

Doctor of Engineering

University of Hyogo, Japan

Chapter 1 Introduction

Recently, the term ‘Artificial Intelligence’ has appeared frequently in news articles. In Japan, the working-age population is predicted to decrease sharply as the population ages and the birth rate falls, and artificial intelligence and robots are strongly expected to serve as a substitute for the labor force. Artificial intelligence is attracting attention not only in industrial fields suffering from a shortage of manpower but also as a means of creating new value in fields that have depended on human experience and intuition, e.g., Fintech (financial services that make full use of information technology) and predictive maintenance of production facilities in the manufacturing industry.

The driving force behind the expectations for industrial applications of artificial intelligence, which here actually means computational intelligence, is deep learning. Deep learning is a generic term for machine learning methods that use multi-layered neural network models, which are inspired by information processing in the brain. Machine learning has been developed as a field of artificial intelligence since the end of the 1950s, as research aiming to realize learning abilities equivalent to those of the brain on a computer without explicit programming. As shown by the success of DeepMind’s AlphaGo [1], deep learning has attracted a great deal of attention in the field of machine learning in recent years owing to its high performance.

information infrastructure including the Internet has been improved. The global data volume doubles every two years and is expected to exceed 40,000 exabytes in 2020 [2]. In particular, the development of social media and the evolution of various sensors have made it possible to acquire various kinds of information such as images and videos. That is, it is easier than before to collect large amounts of data that can be used for learning. The next factor is a dramatic improvement in computer performance. Owing to the evolution of both hardware and software, such as GPUs, multi-core CPUs, large-capacity memories, and parallel processing on computer clusters, large-scale neural network models that used to be difficult to evaluate can now be executed on a computer. Thus, the exponential improvement in computer performance, as indicated by the performance transition of the TOP500 [3] supercomputers, is accelerating research on machine learning. The last and most important factor is the steady accumulation of many years of solid research in neural networks. Deep learning is a neural network model with multiple layers, at least on the surface; its basic part has not changed much from conventional neural networks. Various basic algorithms and models for neural networks were proposed before 1990, e.g., the back-propagation algorithm, which enables learning in multi-layered neural networks, and the Boltzmann machine, which is the prototype of the deep belief network [4] that is cited as a breakthrough of deep learning. Moreover, the Neocognitron [5] and LeNet [6] are the prototypes of convolutional neural networks, which are frequently used in recent image recognition systems. It can be said that deep learning has developed as it has today because of these findings.
In addition, several approaches have been proposed for solving problems that arise in deep networks, such as the rectified linear unit as an activation function and effective methods for adjusting the learning rate in gradient descent learning, e.g., ADAM and AdaGrad. They are also main factors in the achievements of deep learning.

The extension of neural networks to hypercomplex number systems is one such research effort. In these types of neural networks, the input, output, and internal state of a neuron, which is the basic computational unit, are represented by hypercomplex numbers.

Complex numbers and quaternions, which are hypercomplex numbers, are useful number systems for various engineering problems. It is well known that complex numbers are employed in signal processing that deals with amplitude and phase information. Quaternions are widely employed in robotics and computer graphics because they provide a convenient mathematical notation for representing orientations and rotations of objects in three dimensions. From the viewpoint of information processing, the essential significance of neural networks and other machine learning methods is to acquire, through the learning process, representations of information that capture the intrinsic structure of the data. In that sense, it is considered very useful to represent neurons by complex numbers and quaternions, which can treat two- or three-dimensional information as a single unit. In fact, previous studies [7, 8, 9] suggest that complex-valued and quaternionic feedforward neural networks have a remarkable learning ability for affine transformation problems in two- or three-dimensional space. Hypercomplex-valued neural networks hold the potential for higher performance than conventional real-valued neural networks. Therefore, introducing complex numbers and quaternions into neural networks is expected to improve the performance of deep learning. However, the number of applications of neural networks employing quaternions is comparatively smaller than that of complex-valued neural networks.

In this study, we focus on quaternion-based neural computing, which can be expected to be effective in future application fields such as robot control and color image processing and recognition. We address several problems existing in mutually connected neural networks and multilayered neural networks, propose methods for solving these problems, and aim to gain a new understanding of quaternionic extensions of the existing methods.

Chapter 2 Quaternion Algebra

2.1 Definitions of Quaternions

In this section, we show the definitions and notations of quaternions. Quaternion algebra was first described by W. R. Hamilton in 1843 [10]. Quaternions form a class of hypercomplex numbers that consist of a real number and three kinds of imaginary units, $i$, $j$, $k$. Formally, a quaternion is defined as a vector in a four-dimensional vector space,

$$x = x^{(e)} + x^{(i)} i + x^{(j)} j + x^{(k)} k, \tag{2.1}$$

where $x^{(e)}$, $x^{(i)}$, $x^{(j)}$, and $x^{(k)}$ are real numbers. The multiplications between the three imaginary units obey the following rules:

$$i^2 = j^2 = k^2 = ijk = -1, \quad ij = -ji = k. \tag{2.2}$$

The multiplication rules of the quaternions are summarized in Table 2.1. In this table, the column on the left-hand side gives the first factor while the top row indicates the second factor. The order matters because quaternion multiplication is not commutative.

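Table 2.1: Multiplication rules between imaginary units $i$, $j$, $k$.

|     | 1   | i   | j   | k   |
|-----|-----|-----|-----|-----|
| 1   | 1   | i   | j   | k   |
| i   | i   | −1  | k   | −j  |
| j   | j   | −k  | −1  | i   |
| k   | k   | j   | −i  | −1  |

A quaternion is also written using 4-tuple or 2-tuple notations as follows:

$$x = (x^{(e)}, x^{(i)}, x^{(j)}, x^{(k)}) = (x^{(e)}, \vec{x}), \tag{2.3}$$

where $\vec{x} = (x^{(i)}, x^{(j)}, x^{(k)})$. In this representation, $x^{(e)}$ is the scalar part of $x$, and $\vec{x}$ forms the vector part.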

Now, we show the definitions of arithmetic operations between quaternions $p$ and $q$. The addition and subtraction of quaternions are defined in the same manner as those of complex numbers or vectors by

$$p \pm q = (p^{(e)} \pm q^{(e)},\ \vec{p} \pm \vec{q}) \tag{2.4}$$

$$\phantom{p \pm q} = (p^{(e)} \pm q^{(e)},\ p^{(i)} \pm q^{(i)},\ p^{(j)} \pm q^{(j)},\ p^{(k)} \pm q^{(k)}). \tag{2.5}$$

The product of $p$ and $q$, denoted as $pq$, is defined from Eq. (2.2) as

$$pq = (p^{(e)} q^{(e)} - \vec{p} \cdot \vec{q},\ p^{(e)} \vec{q} + q^{(e)} \vec{p} + \vec{p} \times \vec{q}), \tag{2.6}$$

where $\vec{p} \cdot \vec{q}$ and $\vec{p} \times \vec{q}$ denote the dot and cross products, respectively, between the three-dimensional vectors $\vec{p}$ and $\vec{q}$. The multiplication between a scalar $a = (a, \vec{0})$ and a quaternion $q$ is given by

$$aq = (a q^{(e)},\ a\vec{q}) = (a q^{(e)},\ a q^{(i)},\ a q^{(j)},\ a q^{(k)}). \tag{2.7}$$

The quaternion conjugate is defined as

$$q^* = (q^{(e)},\ -\vec{q}) = q^{(e)} - q^{(i)} i - q^{(j)} j - q^{(k)} k. \tag{2.8}$$

The conjugate of a product satisfies the relation $(pq)^* = q^* p^*$. The quaternion norm of $q$, denoted $|q|$, is defined by

$$|q| = \sqrt{q q^*} = \sqrt{q^{(e)2} + q^{(i)2} + q^{(j)2} + q^{(k)2}}, \tag{2.9}$$

and the reciprocal of a non-zero quaternion $q$ is defined as

$$q^{-1} = \frac{q^*}{|q|^2}. \tag{2.10}$$

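The quaternion involutions, which are self-inverse mappings, are defined [11] as:

$$q^{i} = -i q i = q^{(e)} + q^{(i)} i - q^{(j)} j - q^{(k)} k, \tag{2.11}$$

$$q^{j} = -j q j = q^{(e)} - q^{(i)} i + q^{(j)} j - q^{(k)} k, \tag{2.12}$$

$$q^{k} = -k q k = q^{(e)} - q^{(i)} i - q^{(j)} j + q^{(k)} k. \tag{2.13}$$

By using these involutions, the four components of a quaternion can be represented as:

$$q^{(e)} = \frac{1}{4}\left(q + q^{i} + q^{j} + q^{k}\right), \tag{2.14}$$

$$q^{(i)} = \frac{1}{4i}\left(q + q^{i} - q^{j} - q^{k}\right), \tag{2.15}$$

$$q^{(j)} = \frac{1}{4j}\left(q - q^{i} + q^{j} - q^{k}\right), \tag{2.16}$$

$$q^{(k)} = \frac{1}{4k}\left(q - q^{i} - q^{j} + q^{k}\right). \tag{2.17}$$

A quaternion and its conjugate can also be expressed as linear functions of the involutions as follows:

$$q = \frac{1}{2}\left(q^{i*} + q^{j*} + q^{k*} - q^{*}\right), \tag{2.18}$$

$$q^{*} = \frac{1}{2}\left(q^{i} + q^{j} + q^{k} - q\right). \tag{2.19}$$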
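The definitions in Eqs. (2.6) to (2.19) can be checked numerically. The following is a minimal Python sketch and is not part of the original text: the tuple layout `(e, i, j, k)` and the function names `qmul`, `conj`, `inverse`, and `involution` are illustrative assumptions.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as (e, i, j, k) tuples, Eq. (2.6)."""
    pe, pi, pj, pk = p
    qe, qi, qj, qk = q
    return (pe*qe - pi*qi - pj*qj - pk*qk,
            pe*qi + pi*qe + pj*qk - pk*qj,
            pe*qj - pi*qk + pj*qe + pk*qi,
            pe*qk + pi*qj - pj*qi + pk*qe)

def conj(q):
    """Quaternion conjugate, Eq. (2.8)."""
    e, i, j, k = q
    return (e, -i, -j, -k)

def norm(q):
    """Quaternion norm, Eq. (2.9)."""
    return math.sqrt(sum(c * c for c in q))

def inverse(q):
    """Reciprocal of a non-zero quaternion, Eq. (2.10)."""
    n2 = sum(c * c for c in q)
    return tuple(c / n2 for c in conj(q))

# Imaginary units as quaternions.
I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

def involution(q, unit):
    """Involution -unit * q * unit, Eqs. (2.11)-(2.13)."""
    return tuple(-c for c in qmul(qmul(unit, q), unit))

q = (1.0, 2.0, 3.0, 4.0)
qi, qj, qk = (involution(q, u) for u in (I, J, K))
# Right-hand side of Eq. (2.19): (q^i + q^j + q^k - q) / 2
rhs = tuple((a + b + c - d) / 2 for a, b, c, d in zip(qi, qj, qk, q))
print(qmul(I, J), qmul(J, I))   # ij and ji differ in sign
print(rhs == conj(q))           # checks Eq. (2.19) for this q
```

Storing a quaternion as a plain tuple keeps the sketch dependency-free; a real implementation would more likely use an array library.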

2.2 Quaternions in polar representation

A complex value $c = c^{(e)} + i c^{(i)}$ in Cartesian form can also be represented in polar form as $c = r \cdot e^{i\theta}$, where $r = \sqrt{c^{(e)2} + c^{(i)2}}$ and $\theta = \tan^{-1}\left(c^{(i)}/c^{(e)}\right)$. In a similar way, a quaternion $x$ in Cartesian form can be transformed into a polar representation. We adopt the representation defined by Bülow [12, 13] in this study. A quaternion $x$ can be represented in the polar form

$$x = |x| e^{i\varphi} e^{k\psi} e^{j\theta}, \tag{2.20}$$

where

$$e^{i\varphi} = \cos\varphi + i \sin\varphi, \quad e^{j\theta} = \cos\theta + j \sin\theta, \quad e^{k\psi} = \cos\psi + k \sin\psi. \tag{2.21}$$

The three kinds of angular phase $\varphi$, $\theta$, and $\psi$ are defined within the following intervals:

$$-\pi \le \varphi < \pi, \quad -\frac{\pi}{2} \le \theta < \frac{\pi}{2}, \quad -\frac{\pi}{4} \le \psi \le \frac{\pi}{4}. \tag{2.22}$$

The formulas for calculating these phases from the quaternion in Cartesian form are given as Algorithm 1.

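Algorithm 1

    begin
        $\psi \leftarrow -\arcsin\left(2\left(q^{(i)} q^{(j)} - q^{(e)} q^{(k)}\right)\right) / 2$
        if $\psi = \pm\pi/4$ then
            $\varphi \leftarrow 0$,  $\theta \leftarrow \arg_j\left(q^{k*} q\right) / 2$
        else
            $\varphi \leftarrow \arg_i\left(q\, q^{j*}\right) / 2$,  $\theta \leftarrow \arg_j\left(q^{i*} q\right) / 2$
        end
        if $e^{i\varphi} e^{k\psi} e^{j\theta} = -q$ then
            if $\varphi \ge 0$ then
                $\varphi \leftarrow \varphi - \pi$
            else
                $\varphi \leftarrow \varphi + \pi$
            end
        end
    end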
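Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch rather than the author's implementation. It assumes that the input is a unit quaternion stored as an `(e, i, j, k)` tuple, that $\arg_i(q) = \operatorname{atan2}(q^{(i)}, q^{(e)})$ and $\arg_j(q) = \operatorname{atan2}(q^{(j)}, q^{(e)})$, and that the arcsine is halved so that $\psi$ falls into the interval of Eq. (2.22). The helper names and the tolerance `eps` are hypothetical.

```python
import math

I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

def qmul(p, q):
    """Hamilton product of (e, i, j, k) tuples."""
    pe, pi, pj, pk = p
    qe, qi, qj, qk = q
    return (pe*qe - pi*qi - pj*qj - pk*qk,
            pe*qi + pi*qe + pj*qk - pk*qj,
            pe*qj - pi*qk + pj*qe + pk*qi,
            pe*qk + pi*qj - pj*qi + pk*qe)

def conj(q):
    return (q[0], -q[1], -q[2], -q[3])

def involution(q, unit):
    """-unit * q * unit, Eqs. (2.11)-(2.13)."""
    return tuple(-c for c in qmul(qmul(unit, q), unit))

def qexp(axis, angle):
    """cos(angle) + u sin(angle) with u = i (axis 1), j (axis 2) or k (axis 3)."""
    q = [math.cos(angle), 0.0, 0.0, 0.0]
    q[axis] = math.sin(angle)
    return tuple(q)

def from_polar(phi, theta, psi):
    """e^{i phi} e^{k psi} e^{j theta}, Eq. (2.20) with |x| = 1."""
    return qmul(qmul(qexp(1, phi), qexp(3, psi)), qexp(2, theta))

def to_polar(q, eps=1e-9):
    """Phases (phi, theta, psi) of a unit quaternion q, following Algorithm 1."""
    e, i, j, k = q
    s = max(-1.0, min(1.0, 2.0 * (i * j - e * k)))  # guard against rounding
    psi = -math.asin(s) / 2.0
    if abs(abs(psi) - math.pi / 4) < eps:            # singular case psi = +-pi/4
        phi = 0.0
        m = qmul(conj(involution(q, K)), q)
        theta = math.atan2(m[2], m[0]) / 2.0
    else:
        m = qmul(q, conj(involution(q, J)))
        phi = math.atan2(m[1], m[0]) / 2.0
        m = qmul(conj(involution(q, I)), q)
        theta = math.atan2(m[2], m[0]) / 2.0
    r = from_polar(phi, theta, psi)
    if all(abs(a + b) < 1e-6 for a, b in zip(r, q)):  # reconstructed value is -q
        phi = phi - math.pi if phi >= 0 else phi + math.pi
    return phi, theta, psi

print(to_polar(from_polar(2.0, -1.0, 0.3)))
```

A round trip through `from_polar` and `to_polar` is a convenient sanity check away from the singular case $\psi = \pm\pi/4$, where the decomposition is not unique.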

Chapter 3 Associative Memories in Quaternionic Neural Networks

A Hopfield network is an associative memory model based on fully connected neural networks in which information is embedded as connection weights between the neurons. A quaternionic Hopfield associative memory is expected to process three-dimensional multilevel values such as the intensities of RGB color components or body coordinates in three-dimensional space. In this associative memory, the state of a neuron is represented by three kinds of phases in distinct complex planes. However, the performance and properties of such quaternionic extended models have not been clarified.

We evaluate the performance of the quaternionic Hopfield associative memory, namely the storage performance of two learning rules, i.e., the Hebbian learning rule and the projection learning rule. We also examine the noise robustness of the stored memory patterns to evaluate the recall performance. Then, we propose two other types of quaternionic associative memory models for improving the performance of the quaternionic Hopfield associative memory. One is the quaternionic bipartite auto-associative memory, which has two types of neuron layers. The other is the quaternionic Hopfield associative memory with dual connections, which utilizes the non-commutative property of quaternions. Through several experiments, we show that these models have superior performance compared to the quaternionic Hopfield associative memory.

3.1 Introduction

Extensions of Hopfield-type associative memories (HAMs) [14] to complex and hypercomplex number systems have been extensively investigated. Representations of states in the networks are enriched owing to the degrees of freedom in these systems. Complex-valued extensions of Hopfield associative memories, called CHAMs, have been proposed [15, 16, 17, 18], and their learning schemes for embedding patterns in the networks have also been investigated [19, 20, 21]. Various CHAMs represent the state of a neuron as a discrete point in the complex plane, so multilevel values, such as the intensity of a pixel in a gray-scale image, can be naturally represented in these networks.

One difficulty in CHAMs is the existence of rotated patterns in the network, which are degenerate states of the embedded patterns. When a pattern is embedded in a CHAM network where a neuron takes $K$ quantization levels, $K - 1$ rotated patterns are also embedded in the network. These rotated patterns form mixture patterns, which become spurious patterns in the network and reduce the noise robustness when retrieving patterns from the network. One possible way to cope with rotated patterns is to compose a heterogeneous network in which real-valued neurons are incorporated in the complex or quaternionic network. A pattern in the real-valued HAM has only one rotated pattern, in which each element takes the inverted value. A combination of real-valued and complex-valued/quaternionic patterns therefore yields fewer rotated patterns. Such networks have been presented and investigated as the Complex-valued Bipartite Auto-Associative Memory (CBAAM) [22].

A quaternionic extension of HAMs is called a QHAM, in which the state of a neuron is represented by three kinds of phases in distinct complex planes [23]. QHAMs are expected to process three-dimensional multilevel values such as the intensities of RGB color components or body coordinates in three-dimensional space. QHAMs represent patterns in the network on a principle similar to that of CHAMs, and thus they also suffer from rotated and mixture patterns. However, the detailed performance of QHAM has not been reported in the previous studies.

In this chapter, we first evaluate the performance of QHAM, namely the storage performance of two learning rules, i.e., the Hebbian learning rule and the projection learning rule. We also examine the noise robustness of the stored memory patterns to evaluate the recall performance of QHAM. Then, we propose two other types of quaternionic associative memory models for improving the performance of QHAMs. One is the Quaternionic Bipartite Auto-Associative Memory (QBAAM) and the other is the Quaternionic Hopfield Associative Memory with Dual Connections (QHAMDC). QBAAM is a quaternionic extension of CBAAM. The network architecture of QBAAM has two layers, where one layer is composed of quaternionic neurons and the other layer is composed of real-valued neurons. QHAMDC is another approach for improving the performance of QHAM, which utilizes the non-commutative property of quaternions.

This chapter is organized as follows. In Section 3.2, we first recapitulate the QHAM model and its learning algorithms. QBAAM and QHAMDC are described in Section 3.3 and Section 3.4, respectively. Several experimental results comparing the learning algorithms and the recall performance are given in Section 3.5. We finish with conclusions in Section 3.6.

3.2 Quaternionic Hopfield Associative Memory

We assume a quaternionic Hopfield associative memory (QHAM) with $N$ quaternionic multistate neurons. All neurons are connected to each other in QHAM as shown in Fig. 3.1. The state of the $p$-th neuron in QHAM is represented in the polar form described in Sec. 2.2,

$$u_p = e^{i\varphi_p} e^{k\psi_p} e^{j\theta_p}, \tag{3.1}$$

where $|u_p| = 1$. The internal state of the $p$-th neuron at a discrete time $t$ is defined as

$$h_p(t) = \sum_{q=1}^{N} w_{pq} u_q(t) = \sum_{q=1}^{N} w_{pq} e^{i\varphi_q(t)} e^{k\psi_q(t)} e^{j\theta_q(t)}, \tag{3.2}$$

where $w_{pq} \in \mathbb{H}$ denotes the connection weight from the $p$-th neuron to the $q$-th neuron.

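We use the qsign function, which consists of three types of complex-valued multistate signum functions, as the activation function for quaternionic multistate neurons. Thus, the state of neuron $p$ at time $t$ is updated as

$$u_p(t+1) = \mathrm{qsign}\left(h_p(t)\right), \tag{3.3}$$

where

$$\mathrm{qsign}(h_p) = \mathrm{qsign}\left(e^{i\varphi_p} e^{k\psi_p} e^{j\theta_p}\right) \tag{3.4}$$

$$\phantom{\mathrm{qsign}(h_p)} = \mathrm{csign}_A\left(e^{i\varphi_p}\right) \mathrm{csign}_B\left(e^{k\psi_p}\right) \mathrm{csign}_C\left(e^{j\theta_p}\right). \tag{3.5}$$

The function $\mathrm{csign}_A$ is used for updating $\varphi$, and it is defined as

$$\mathrm{csign}_A\left(e^{i\varphi}\right) = \begin{cases} q^{(\varphi)}_0 & \text{for } -\pi \le \arg\left(e^{i\varphi}\right) < -\pi + \varphi_0 \\ q^{(\varphi)}_1 & \text{for } -\pi + \varphi_0 \le \arg\left(e^{i\varphi}\right) < -\pi + 2\varphi_0 \\ \quad\vdots \\ q^{(\varphi)}_{A-1} & \text{for } -\pi + (A-1)\varphi_0 \le \arg\left(e^{i\varphi}\right) < -\pi + A\varphi_0 \end{cases}, \tag{3.6}$$

where $A$ is the quantization level for $\varphi$, and $\varphi_0$ denotes a quantization unit defined by $\varphi_0 = 2\pi/A$. $q^{(\varphi)}_a$ is a distinct point on the unit circle, defined as

$$q^{(\varphi)}_a = \exp\left(i\left(-\pi + a\varphi_0 + \frac{\varphi_0}{2}\right)\right), \quad a = 0, \ldots, A-1. \tag{3.7}$$

Figure 3.1: Network structure of QHAM.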

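Therefore, the function $\mathrm{csign}_A$ outputs the point in $\{q^{(\varphi)}_0, \ldots, q^{(\varphi)}_{A-1}\}$ that is closest to the input. Similarly, the function $\mathrm{csign}_B$ for updating $\psi$ and the function $\mathrm{csign}_C$ for updating $\theta$ are defined as follows:

$$\mathrm{csign}_B\left(e^{k\psi}\right) = \begin{cases} q^{(\psi)}_0 & \text{for } -\frac{\pi}{4} \le \arg\left(e^{k\psi}\right) < -\frac{\pi}{4} + \psi_0 \\ q^{(\psi)}_1 & \text{for } -\frac{\pi}{4} + \psi_0 \le \arg\left(e^{k\psi}\right) < -\frac{\pi}{4} + 2\psi_0 \\ \quad\vdots \\ q^{(\psi)}_{B-1} & \text{for } -\frac{\pi}{4} + (B-1)\psi_0 \le \arg\left(e^{k\psi}\right) < -\frac{\pi}{4} + B\psi_0 \end{cases}, \tag{3.8}$$

$$\mathrm{csign}_C\left(e^{j\theta}\right) = \begin{cases} q^{(\theta)}_0 & \text{for } -\frac{\pi}{2} \le \arg\left(e^{j\theta}\right) < -\frac{\pi}{2} + \theta_0 \\ q^{(\theta)}_1 & \text{for } -\frac{\pi}{2} + \theta_0 \le \arg\left(e^{j\theta}\right) < -\frac{\pi}{2} + 2\theta_0 \\ \quad\vdots \\ q^{(\theta)}_{C-1} & \text{for } -\frac{\pi}{2} + (C-1)\theta_0 \le \arg\left(e^{j\theta}\right) < -\frac{\pi}{2} + C\theta_0 \end{cases}, \tag{3.9}$$

where $B$ and $C$ are the quantization levels for $\psi$ and $\theta$, respectively. The quantization units $\psi_0$ and $\theta_0$ are defined by $\psi_0 = \pi/2B$ and $\theta_0 = \pi/C$, respectively. In accordance with the imaginary units used in Eqs. (3.8) and (3.9), $q^{(\psi)}_b$ and $q^{(\theta)}_c$ are defined as follows:

$$q^{(\psi)}_b = \exp\left(k\left(-\frac{\pi}{4} + b\psi_0 + \frac{\psi_0}{2}\right)\right), \quad b = 0, \ldots, B-1, \tag{3.10}$$

$$q^{(\theta)}_c = \exp\left(j\left(-\frac{\pi}{2} + c\theta_0 + \frac{\theta_0}{2}\right)\right), \quad c = 0, \ldots, C-1. \tag{3.11}$$

Hence, a quaternionic neuron takes a total of $A \times B \times C$ states. An example of the quantized output points of the quaternionic neuron is shown in Fig. 3.2, where $A = 4$, $B = 2$, and $C = 3$.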
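The quantization performed by the three csign functions can be illustrated on the phase angles alone. The sketch below is a hedged illustration, not the author's code: it assumes that the phases $(\varphi, \psi, \theta)$ of the internal state have already been extracted, and the function names `csign_phase` and `qsign_phases` are hypothetical. Each phase is mapped to the centre of the quantization bin that contains it, as in Eqs. (3.6) to (3.11).

```python
import math

def csign_phase(angle, lower, unit, levels):
    """Return the centre of the quantization bin that contains `angle`.

    Bins are [lower + n*unit, lower + (n+1)*unit) for n = 0, ..., levels-1.
    """
    index = int(math.floor((angle - lower) / unit))
    index = min(max(index, 0), levels - 1)  # clamp the closed upper boundary
    return lower + index * unit + unit / 2.0

def qsign_phases(phi, psi, theta, A, B, C):
    """Quantize the three phases with A, B and C levels respectively."""
    phi0 = 2.0 * math.pi / A      # quantization unit for phi
    psi0 = math.pi / (2.0 * B)    # quantization unit for psi
    theta0 = math.pi / C          # quantization unit for theta
    return (csign_phase(phi, -math.pi, phi0, A),
            csign_phase(psi, -math.pi / 4.0, psi0, B),
            csign_phase(theta, -math.pi / 2.0, theta0, C))

# Example with the levels of Fig. 3.2: A = 4, B = 2, C = 3.
print(qsign_phases(0.1, 0.1, 0.1, 4, 2, 3))
print("number of neuron states:", 4 * 2 * 3)
```

Working on phases rather than on quaternions keeps the sketch short; composing the quantized phases back into a quaternion follows Eq. (3.1).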

The energy function of QHAM takes a real value when the connection weights $w_{pq}$ satisfy the following conditions:

$$w_{pq} = w^*_{qp}, \quad w_{pp} = w^*_{pp} = (w^{(e)}, \vec{0}), \quad w^{(e)} \ge 0. \tag{3.12}$$

The function monotonically decreases under the condition $|\Delta\varphi| < \varphi_0$, $|\Delta\psi| < \psi_0$, $|\Delta\theta| < \theta_0$, where $\Delta\varphi$, $\Delta\psi$, and $\Delta\theta$ are the phase differences between the state at time $t+1$ and the internal state at time $t$ for the neuron undergoing its update [23].

Two types of learning rules for QHAM are described. Let $\xi^p = (\xi^p_1, \ldots, \xi^p_N)^T$ be the $p$-th memory pattern ($p = 1, \ldots, P$), where $\xi^p_i \in \{q^{(\varphi)}_a q^{(\psi)}_b q^{(\theta)}_c\}$ and $P$ is the number of embedded memory patterns. Then, the training pattern matrix $X$ is represented as

$$X = (\xi^1, \xi^2, \ldots, \xi^P). \tag{3.13}$$

The connection weight matrix $W$, whose element at $(p, q)$ is denoted as $w_{pq}$, is given by the Hebbian learning rule as:

$$W = XX^* - \mathrm{diag}(XX^*), \tag{3.14}$$

where $X^*$ denotes the conjugate transpose of $X$, and $\mathrm{diag}(A)$ is the diagonal matrix whose diagonal elements are equal to the diagonal of $A$. This learning rule is simple and fast; however, it is difficult to achieve an effective storage capacity when the memory patterns are not orthogonal to each other.
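The Hebbian rule of Eq. (3.14) can be sketched with plain Python tuples. The following is an illustrative sketch under stated assumptions, not the author's implementation: quaternions are `(e, i, j, k)` tuples, `X[n][p]` holds element `n` of pattern `p`, and the names `hebbian_weights` and `internal_state` are hypothetical. With a single stored pattern of unit quaternions, the internal state of Eq. (3.2) is expected to equal $(N-1)$ times the pattern, which the example prints.

```python
ONE, I, J, K = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

def qmul(p, q):
    """Hamilton product of (e, i, j, k) tuples."""
    pe, pi, pj, pk = p
    qe, qi, qj, qk = q
    return (pe*qe - pi*qi - pj*qj - pk*qk,
            pe*qi + pi*qe + pj*qk - pk*qj,
            pe*qj - pi*qk + pj*qe + pk*qi,
            pe*qk + pi*qj - pj*qi + pk*qe)

def conj(q):
    return (q[0], -q[1], -q[2], -q[3])

def qadd(p, q):
    return tuple(a + b for a, b in zip(p, q))

def hebbian_weights(X):
    """W = X X* - diag(X X*), Eq. (3.14), for an N x P quaternion matrix X."""
    n_neurons, n_patterns = len(X), len(X[0])
    zero = (0.0, 0.0, 0.0, 0.0)
    W = [[zero for _ in range(n_neurons)] for _ in range(n_neurons)]
    for a in range(n_neurons):
        for b in range(n_neurons):
            if a == b:
                continue  # the diagonal is removed by -diag(X X*)
            acc = zero
            for p in range(n_patterns):
                acc = qadd(acc, qmul(X[a][p], conj(X[b][p])))
            W[a][b] = acc
    return W

def internal_state(W, u):
    """h_p = sum_q w_pq u_q, Eq. (3.2), with weights multiplied from the left."""
    h = []
    for row in W:
        acc = (0.0, 0.0, 0.0, 0.0)
        for w, uq in zip(row, u):
            acc = qadd(acc, qmul(w, uq))
        h.append(acc)
    return h

xi = [I, J, K, ONE]                      # one toy pattern with N = 4
W = hebbian_weights([[x] for x in xi])   # N x 1 pattern matrix
print(internal_state(W, xi))             # each entry is 3 times the stored element
```

The projection rule additionally requires the inverse of the quaternionic matrix $X^*X$, which is beyond the scope of this dependency-free sketch.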

Next, we describe the projection learning rule, which is a learning algorithm that can embed non-orthogonal (correlated) memory patterns in a network [24]. The key idea of the projection rule is that non-orthogonal patterns are first projected onto orthogonal ones by using an inverse matrix, and then the Hebbian rule is applied to these projected patterns. The connection weight matrix $W$ is given by the projection rule as follows:

$$W = A - \mathrm{diag}(A), \quad A = X(X^*X)^{-1}X^*. \tag{3.15}$$

With the projection rule, the number of patterns that can be stored in the network equals the number of neurons in the network.

Figure 3.2: An example of quantized output points in the quaternionic multistate neuron ($A = 4$, $B = 2$, $C = 3$).

Figure 3.3: Network structure of QBAAM. The visible layer consists of $N$ quaternionic neurons and the invisible layer consists of $M$ real-valued neurons; the two layers are connected by the weight matrices $W_{iv}$ and $W_{vi}$.

3.3 Quaternionic Bipartite Auto-Associative Memory

The model of the Quaternionic Bipartite Auto-Associative Memory (QBAAM) is described in this section. QBAAM is a quaternionic extension of the Complex-valued Bipartite Auto-Associative Memory (CBAAM) [22], in which the quaternionic multistate neurons defined in Section 3.2 are used in the network. QBAAM has two layers, called the visible layer and the invisible layer, as shown in Fig. 3.3. The visible layer contains quaternionic multistate neurons, and the invisible layer contains real-valued neurons, i.e., neurons whose state is +1 or −1. Connections are established between neurons in different layers and are encoded by quaternionic values; there are no connections between neurons within a layer. The outputs of the neurons in the visible layer are quaternionic values, and the connection weights from the visible layer to the invisible layer are also quaternionic values; the real-valued neurons in the invisible layer take only the real parts of these quaternionic values. The visible layer can process 3-tuples of multilevel signals, and the recall of rotated patterns of the memory patterns can be suppressed by utilizing the neurons in the invisible layer.

a combination of quaternionic values (for the visible layer) and real values (for the invisible layer), thus pairs of storing patterns are necessary. Suppose that the patterns for the visible layer are given by $\xi^1_v, \xi^2_v, \ldots, \xi^P_v$. For each of these patterns, the patterns for the invisible layer are also given by $\xi^1_i, \xi^2_i, \ldots, \xi^P_i$. The storing patterns for QBAAM are $(\xi^1_v, \xi^1_i), (\xi^2_v, \xi^2_i), \cdots, (\xi^P_v, \xi^P_i)$. The storing pattern matrices are

$$X = (\xi^1_v, \xi^2_v, \cdots, \xi^P_v), \tag{3.16}$$

$$Y = (\xi^1_i, \xi^2_i, \cdots, \xi^P_i). \tag{3.17}$$

The bidirectional connection weights are constructed from the matrices $X$ and $Y$. The most straightforward algorithm for the construction is known as the Hebbian learning rule:

$$W_{iv} = YX^*, \tag{3.18}$$

$$W_{vi} = XY^*, \tag{3.19}$$

where $W_{iv}$ is the connection matrix from the visible layer to the invisible layer and $W_{vi}$ is the connection matrix from the invisible layer to the visible layer. These matrices are conjugate transposes of each other, i.e., $W_{iv} = W^*_{vi}$ holds. This learning scheme makes the embedded patterns in the network unstable and leads to an extremely low storage capacity if the patterns are not orthogonal to each other. In order to improve the storage capacity, a generalized inverse matrix learning scheme was proposed in [25], given as

$$W_{iv} = Y(X^*X)^{-1}X^*, \tag{3.20}$$

$$W_{vi} = X(Y^*Y)^{-1}Y^*. \tag{3.21}$$

This scheme first orthogonalizes the embedded patterns, after which the orthogonalized patterns are embedded by the Hebbian rule. Though $W_{iv} = W^*_{vi}$ does not hold in this case, the scheme works effectively.

The embedded patterns are retrieved from the input patterns in a ping-pong manner between the neurons in the visible layer and the neurons in the invisible layer. Let $x(t)$ be the states of the neurons in the visible layer and $y(t) = (y_1(t), y_2(t), \ldots, y_M(t))$ be the states of the neurons in the invisible layer at time step $t$. The states of each layer are updated as follows:

$$y(t) = \mathrm{sign}\left(W_{iv}\, x(t)\right), \tag{3.22}$$

$$x(t+1) = \mathrm{qsign}\left(W_{vi}\, y(t)\right), \tag{3.23}$$

where $\mathrm{sign}(\cdot)$ and $\mathrm{qsign}(\cdot)$ are the activation functions for the neurons in the invisible layer and the visible layer, respectively, and these activation functions are applied to each neuron. $\mathrm{sign}$ is defined as

$$\mathrm{sign}(u) = \begin{cases} 1 & \mathrm{Re}(u) \ge 0 \\ -1 & \mathrm{Re}(u) < 0 \end{cases}, \tag{3.24}$$

and $\mathrm{qsign}(\cdot)$ has the same definition as in Eq. (3.3). In the initial state, it is necessary to set an input pattern on the neurons in the visible layer. The initial configuration of the visible layer and the connection weights determine the states of the neurons in the invisible layer, after which the neurons in the visible layer can be updated from the states of the neurons in the invisible layer and the connection weights from the invisible layer to the visible layer. The updating of the network continues until no change is detected in the neurons in the visible layer. Finally, we obtain the pattern $\xi_v$ in the visible layer that corresponds to the input pattern given to the network. This recall process is summarized in Fig. 3.4.

3.4 Quaternionic Hopfield Associative Memory with Dual Connections

Figure 3.4: Recall process of QBAAM. (a) An input pattern is given to the visible layer, and the states of the neurons in the invisible layer are updated by using the connection weights from the visible layer to the invisible layer. (b) The states of the neurons in the visible layer are updated by using the connection weights from the invisible layer to the visible layer. (c) After iterating step (a) and step (b), the states of the neurons in the visible layer are obtained as the output pattern.

This section presents the quaternionic Hopfield associative memory with dual connections (QHAMDC). Because of the non-commutative nature of quaternion algebra, we can consider two orders of multiplication for calculating the weighted input. All neurons are connected to each other, and they have two types of connection weights as shown in Fig. 3.5. In QHAMDC, the two types of connection weights are used for updating the internal states of neurons:

$$h_p(t) = \sum_{q=1}^{N} \left( w^L_{pq} u_q(t) + u_q(t) w^R_{pq} \right), \tag{3.25}$$

where $w^L_{pq}$ is a connection weight from the $p$-th neuron to the $q$-th neuron that is multiplied from the left, and $w^R_{pq}$ is one that is multiplied from the right. The connection weight matrix $W^L$, whose element is $w^L_{pq}$, is given by Eq. (3.15). The connection weight matrix $W^R$, whose element is $w^R_{pq}$, is given as follows:

$$W^R = B - \mathrm{diag}(B), \quad B = Y^*(YY^*)^{-1}Y, \quad Y = X^T, \tag{3.26}$$

where $X$ is the training pattern matrix. Thus, $W^R$ denotes the connection weights obtained by using the projection rule with the multiplications in the reverse order. The state of a neuron in QHAMDC is updated by Eq. (3.3) in the same manner as in QHAM.
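The role of the two multiplication orders in Eq. (3.25) can be seen in a small numerical example. The following Python sketch is illustrative only: the two-neuron weight matrices `WL` and `WR` are hypothetical toy values chosen by hand, not weights obtained from Eq. (3.15) or Eq. (3.26), and the function name `internal_state_dual` is an assumption.

```python
ZERO, ONE = (0, 0, 0, 0), (1, 0, 0, 0)
I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

def qmul(p, q):
    """Hamilton product of (e, i, j, k) tuples."""
    pe, pi, pj, pk = p
    qe, qi, qj, qk = q
    return (pe*qe - pi*qi - pj*qj - pk*qk,
            pe*qi + pi*qe + pj*qk - pk*qj,
            pe*qj - pi*qk + pj*qe + pk*qi,
            pe*qk + pi*qj - pj*qi + pk*qe)

def qadd(p, q):
    return tuple(a + b for a, b in zip(p, q))

def neg(q):
    return tuple(-c for c in q)

def internal_state_dual(WL, WR, u, p):
    """h_p = sum_q (wL_pq u_q + u_q wR_pq), Eq. (3.25)."""
    h = (0.0, 0.0, 0.0, 0.0)
    for q in range(len(u)):
        left = qmul(WL[p][q], u[q])    # weight multiplied from the left
        right = qmul(u[q], WR[p][q])   # weight multiplied from the right
        h = qadd(h, qadd(left, right))
    return h

# The same weight gives different contributions on the left and on the right.
print(qmul(I, J), qmul(J, I))

# Hypothetical two-neuron example; both matrices satisfy w_pq = conj(w_qp).
WL = [[ZERO, I], [neg(I), ZERO]]
WR = [[ZERO, J], [neg(J), ZERO]]
u = [K, ONE]
print(internal_state_dual(WL, WR, u, 0))
```

Because the left and right products generally differ, the second weight matrix gives the network additional degrees of freedom that a single left-multiplied weight matrix does not have.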

Figure 3.5: Network structure of QHAMDC. Each pair of quaternionic neurons $u_p$ and $u_q$ is connected by the two weights $w^L_{pq}$ and $w^R_{pq}$.

We define the energy function of the network with dual connections as

$$E(t) = -\frac{1}{2} \sum_{p=1}^{N} \sum_{q=1}^{N} \left( u^*_p(t)\, w^L_{pq}\, u_q(t) + u_q(t)\, w^R_{pq}\, u^*_p(t) \right), \tag{3.27}$$

where $E$ takes a real value ($E = E^*$) when $w^L_{pq}$ and $w^R_{pq}$ satisfy the condition of Eq. (3.12). Since $\mathrm{Re}(xy) = \mathrm{Re}(yx)$ holds for $x, y \in \mathbb{H}$, the energy function can be represented as

$$E(t) = -\frac{1}{2} \mathrm{Re} \sum_{p=1}^{N} \sum_{q=1}^{N} u^*_p(t) \left( w^L_{pq} u_q(t) + u_q(t) w^R_{pq} \right). \tag{3.28}$$

Suppose that the state of the $r$-th neuron is updated at time $t+1$ according to Eq. (3.25) and Eq. (3.3). The energy gap $\Delta E$ between $E(t+1)$ and $E(t)$ finally becomes as follows:

$$\Delta E = -\mathrm{Re} \sum_{r=1}^{N} \sum_{q=1}^{N} u^*_r(t+1) \left( w^L_{rq} u_q(t) + u_q(t) w^R_{rq} \right) + \mathrm{Re} \sum_{r=1}^{N} \sum_{q=1}^{N} u^*_r(t) \left( w^L_{rq} u_q(t) + u_q(t) w^R_{rq} \right). \tag{3.29}$$

$\Delta E$ becomes non-positive because $u^*_r(t)$ maximize $\mathrm{Re}\left(u^*_r(t) h_r(t)\right)$ according to the

3.5 Experiments

3.5.1 Storage Capacity

We first evaluate the storage capacity of QHAM by using the Hebbian learning rule and the projection learning rule. In this experiment, patterns with randomly generated values are used as memory patterns. The size of the patterns, which is the number of neurons in the network, was set to $N = 100$, and the quantization levels were set to $(A, B, C) = (8, 2, 4)$, $(16, 4, 8)$, and $(32, 8, 16)$. Under these conditions, the quantization units $\varphi_0$, $\psi_0$, and $\theta_0$ have the same size. The number of memory patterns, denoted by $P$, was varied as $P = 1, 2, \ldots, 100$.
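That the three quantization units coincide for these settings follows directly from $\varphi_0 = 2\pi/A$, $\psi_0 = \pi/2B$, and $\theta_0 = \pi/C$. A short Python check is given below as an illustration; the function name `quantization_units` is hypothetical.

```python
import math

def quantization_units(A, B, C):
    """Return (phi0, psi0, theta0) for the quantization levels (A, B, C)."""
    return 2.0 * math.pi / A, math.pi / (2.0 * B), math.pi / C

for levels in [(8, 2, 4), (16, 4, 8), (32, 8, 16)]:
    phi0, psi0, theta0 = quantization_units(*levels)
    same = math.isclose(phi0, psi0) and math.isclose(psi0, theta0)
    print(levels, phi0, psi0, theta0, same)
```

For example, $(A, B, C) = (8, 2, 4)$ gives $\varphi_0 = \psi_0 = \theta_0 = \pi/4$.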

The stability of the patterns is investigated by the following procedure. First, for given $A$, $B$, $C$, and $P$, memory patterns are generated and stored in the network. Next, each of the memory patterns is set to the network as its initial state, and then the states of all neurons are updated. If the network state does not change, the stored pattern can be regarded as stable.

Figure 3.6(a) shows the dependency of the retrieval success rate on $P$ for various quantization levels. The retrieval success rates are calculated from 1000 trials. From Fig. 3.6(a), we find that memory patterns can hardly be embedded in the network by using the Hebbian learning rule. The memory patterns tend to become more unstable as the quantization level and the number of stored patterns increase. If the quantization level $(A, B, C)$ is set to $(32, 8, 16)$, only one pattern is stable in the network, as shown in Fig. 3.6(a). That is, two or more memory patterns cannot be stable in the network when the Hebbian rule is used with a large quantization level. In contrast, the projection rule can store up to 99 patterns in the network regardless of the quantization level, as shown in Fig. 3.6(b). In this figure, the retrieval success rate is 0% in the case of $P = 100$. This is because the self-connection weights $w_{pp}$ are set to 0. If these weights are set to positive values, the retrieval success rate becomes 100%. From these results, all the stored patterns are local minima in the network when the projection learning rule is used. Therefore, the memory patterns stored by using the projection rule have higher stability than those stored by using the Hebbian rule in QHAM.

Figure 3.6: Stability of the stored memory patterns in QHAM ($N = 100$). (a) Hebbian learning rule. (b) Projection learning rule. Horizontal axis: number of memory patterns; vertical axis: retrieval success rate [%]; curves for $(A, B, C) = (8, 2, 4)$, $(16, 4, 8)$, and $(32, 8, 16)$.

3.5.2 Noise Robustness of Recall

We evaluate the recall performance by retrieving patterns from noisy patterns. The experiments for QHAMs, QBAAMs, and QHAMDCs are conducted with the following parameters:

1. The number of neurons in the network ($N$) is set to 100.

2. The number of patterns to be embedded in the network ($P$) is set to 10, 30, and 50.

3. The quantization levels for quaternionic neurons $(A, B, C)$ are set to $(8, 2, 4)$, $(16, 4, 8)$, and $(32, 8, 16)$.

4. The noise level ($r$) is varied over 0.00, 0.05, ..., 0.80.

5. The update of neurons in the network is repeated for 1000 iterations.

6. For each parameter setting, 1000 experiments are conducted with randomly generated patterns being embedded.

Figure 3.7: Noise robustness of recall performance for QHAM ($N = 100$). Left: $P = 30$ with $(A, B, C) = (8, 2, 4)$, $(16, 4, 8)$, and $(32, 8, 16)$. Right: $(A, B, C) = (16, 4, 8)$ with $P = 10$, $30$, and $50$. Horizontal axis: noise level; vertical axis: retrieval success rate [%].

The memory patterns are randomly generated and embedded in each associative memory by using the projection learning rule. A noisy input pattern is generated from one of the memory patterns by changing each pixel value to a random value at a specific ratio $r$ (the noise level). The retrieval is regarded as successful if the correct pattern corresponding to the input pattern is retrieved from the noisy input pattern.
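The generation of noisy input patterns can be sketched as follows. This Python sketch rests on assumptions that the text does not fix: each element is replaced independently with probability $r$, the replacement is drawn uniformly from all $A \times B \times C$ states, and a state is stored as an index triple `(a, b, c)`. The function names `random_pattern` and `add_noise` are hypothetical.

```python
import random

def random_pattern(n, levels, rng):
    """Generate n random neuron states as index triples (a, b, c)."""
    A, B, C = levels
    return [(rng.randrange(A), rng.randrange(B), rng.randrange(C))
            for _ in range(n)]

def add_noise(pattern, r, levels, rng):
    """Replace each element by a uniformly random state with probability r."""
    A, B, C = levels
    noisy = []
    for state in pattern:
        if rng.random() < r:
            state = (rng.randrange(A), rng.randrange(B), rng.randrange(C))
        noisy.append(state)
    return noisy

rng = random.Random(0)                  # fixed seed for reproducibility
levels = (16, 4, 8)
pattern = random_pattern(100, levels, rng)
noisy = add_noise(pattern, 0.3, levels, rng)
changed = sum(1 for a, b in zip(pattern, noisy) if a != b)
print("changed elements:", changed)
```

Note that a replaced element can coincide with its original value by chance, so the number of changed elements may be slightly below $rN$ on average.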

First, we examine the performance of QHAMs. Figure 3.7 shows the retrieval success rates against the noise level. In the left panel of Fig. 3.7, the number of memory patterns was fixed to $P = 30$ and the quantization level was set to $(A, B, C) = (8, 2, 4)$, $(16, 4, 8)$, and $(32, 8, 16)$. In the right panel of Fig. 3.7, the number of memory patterns was set to $P = 10$, $30$, and $50$ and the quantization level was fixed to $(A, B, C) = (16, 4, 8)$. The retrieval success rates for QHAMs become worse when the number of embedded patterns or the quantization level increases. The noise robustness of QHAM depends strongly on the quantization level. Thus, the quantization level is a particularly sensitive parameter for the performance of QHAMs.

(25)

0.0 0.2 0.4 0.6 0.8 0 20 40 60 80 100 Noise Level Retrie val Success Rate [ % ] P30,(A_,B_,C)(8_,2_,4) P30,(A_,B_,C)(16_,4_,8) P30,₍A_,B_,C₎₍32_,8_,16₎ 0.0 0.2 0.4 0.6 0.8 0 20 40 60 80 100 Noise Level Retrie val Success Rate [ % ] P10,(A_,B_,C)(16_,4_,8) P30,(A_,B_,C)(16_,4_,8) P50,₍A_,B_,C₎₍16_,4_,8₎

Figure 3.8: Noise robustness of recall performance for QBAAM (N 100_,M 100) experiment, the number of neurons in the invisible layer is set to M100. From these results, we find that the retrieval success rates are maintained when the quantization levels are increased. Thus, the noise robustness of QBAAM does not depend on the quantization levels. In QBAAMs, the noise robustness can be controlled by changing the number of neurons in the invisible layer. Figure 3.9 shows the dependency of the success rate on the number of neurons in the invisible layer. The parameters for the network were set to (A,B,C) (16_,4_,8) and P 30. The number of neurons in the invisible layer was changed from M 20 to M 180. This result shows that neurons in the invisible layer assist to retrieve the correct pattern.

Finally, the recall performance of QHAMDCs is shown in Figure 3.10. From these results, we find that, as with QBAAM, the retrieval success rates are maintained when the quantization levels are increased. That is, the noise robustness of QHAMDC also does not depend on the quantization levels.

[Figure 3.9 (plot): retrieval success rate [%] versus noise level (0.0 to 0.8) for M = 20, 40, 60, 80, 100, 120, 140, 160, 180.]

Figure 3.9: Dependency of recall performance on the number of neurons in the invisible layer for QBAAM (N = 100, P = 30, (A, B, C) = (16, 4, 8)).

[Figure 3.10 (plots): retrieval success rate [%] versus noise level (0.0 to 0.8). Left panel: P = 30 with (A, B, C) = (8, 2, 4), (16, 4, 8), (32, 8, 16). Right panel: P = 10, 30, 50 with (A, B, C) = (16, 4, 8).]

3.5.3 Image Retrieval Task

We have explored the noise robustness of three types of quaternionic associative memories by using random patterns in the previous experiments. In this section, we investigate their performance in storing and retrieving natural images, which have a higher intensity resolution in each pixel, from the viewpoint of practical applications such as color image databases.

Figure 3.11 shows the memory patterns to be embedded in the networks for this experiment. These are 200 images from the CIFAR-10 dataset [26]. Each image consists of 32 × 32 = 1024 pixels and each pixel value is represented by 24 bits of RGB color. The three channels (red, green, and blue) are assigned to φ, θ, and ψ of a quaternion in polar representation. Thus, the quantization level is set to (A, B, C) = (256, 256, 256). For example, a tuple of RGB pixel values (a, b, c) is represented as a quaternion $q^{(\varphi)}_a q^{(\psi)}_b q^{(\theta)}_c$ based on Eq. 3.7, Eq. 3.10, and Eq. 3.11. The number of neurons in the network is the same as the number of pixels of the images, i.e., N = 1024. In this experiment, the number of neurons in the invisible layer of the QBAAM was set to M = 1024, so that the degrees of freedom of the connection weights in the QBAAM and the QHAMDC are equal. The memory patterns are embedded in the networks with the projection learning rule. Then, noisy patterns as shown in Fig. 3.12 are set as the initial configuration of each network, and the output patterns are obtained after 1000 iterations of the update process.

Figures 3.13, 3.14, and 3.15 show the recalled output patterns for QHAM, QBAAM, and QHAMDC, respectively. From these figures, it is found that the QHAM cannot retrieve the stored pattern for any of the input patterns. In contrast, the recalled patterns of the QBAAM and QHAMDC are closer to the original images than those of the QHAM. These observations are confirmed by the peak signal-to-noise ratio (PSNR) shown in Table 3.1, which is defined as

$$\mathrm{PSNR} = 20 \log_{10}(255) - 10 \log_{10}(\mathrm{MSE}), \tag{3.30}$$

where MSE is the mean squared error between the original image and the recalled one. The averaged PSNR between the patterns retrieved from the QHAM and the corresponding original patterns is lower than that between the original patterns and the noisy patterns. This result is caused by the spurious patterns due to the rotation invariance of memory patterns in QHAMs. Retrieval of these spurious patterns is suppressed by the real-valued neurons in QBAAMs and by the non-commutative property of the quaternionic connection weights in QHAMDCs, so that higher PSNRs were achieved in the QBAAM and QHAMDC.

Figure 3.11: Memory patterns for image retrieval task (P = 200).

Figure 3.12: Input patterns for associative memories. 50% of all pixels in each image are altered by random values.
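The PSNR of Eq. (3.30) can be illustrated with a minimal Python sketch. It assumes that both images are given as flat sequences of 8-bit channel values; the function name `psnr` and the `peak` parameter are illustrative choices and do not come from the experiments above.

```python
import math


def psnr(original, recalled, peak=255.0):
    """PSNR in dB as in Eq. (3.30): 20*log10(peak) - 10*log10(MSE).

    `original` and `recalled` are flat sequences of channel values in
    the range 0..peak (an assumed data layout for this sketch).
    """
    if len(original) != len(recalled):
        raise ValueError("images must have the same number of values")
    # Mean squared error between the original and the recalled image.
    mse = sum((a - b) ** 2 for a, b in zip(original, recalled)) / len(original)
    if mse == 0.0:
        return math.inf  # identical images: the PSNR is unbounded
    return 20.0 * math.log10(peak) - 10.0 * math.log10(mse)
```

A larger value means that the recalled image is closer to the original, which is how the entries of Table 3.1 should be read.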


Figure 3.13: Output patterns of QHAM (N = 1024).

Figure 3.14: Output patterns of QBAAM (N = 1024, M = 1024).


Table 3.1: Averaged PSNR for all recalled images.

Model               Averaged PSNR [dB]
Noisy input         11.42
QHAM                10.77
QBAAM (M = 1024)    12.81
QHAMDC              16.86

3.6 Conclusion

In this chapter, we investigated the performance of quaternionic associative memory models. First, the storage capacity of QHAM with the Hebbian learning rule and with the projection learning rule was examined. The experimental results show that the Hebbian rule can hardly store the memory patterns as the quantization levels of the quaternionic neurons increase. In contrast, the projection rule can stabilize all the memory patterns in the network regardless of the quantization levels.

The noise robustness of pattern retrieval was also investigated, and it was found that the recall performance from noisy input depends on the quantization levels of the quaternionic neurons and the number of stored memory patterns. The quantization level is a particularly sensitive parameter for the recall performance of QHAMs.

In order to obtain better performance in retrieving patterns, we proposed the quaternionic bipartite auto-associative memory (QBAAM). The recall performance of QHAMs suffers from the spurious patterns caused by the rotation invariance of their representation of neuron states. The proposed model has a two-layered network structure: one layer, the visible layer, contains quaternionic neurons, and the other layer, called the invisible layer, contains real-valued neurons. The combination of real-valued neurons and quaternionic neurons suppresses these spurious patterns and leads to higher noise robustness in the networks.

We also proposed another type of quaternionic associative memory model, the quaternionic Hopfield associative memory with dual connections (QHAMDC). Two types of connection weights are used in QHAMDC: connection weights multiplied from the left and connection weights multiplied from the right. The noise robustness of the conventional quaternionic Hopfield associative memory (QHAM) is deteriorated by the spurious patterns caused by the rotational invariance of the training memory patterns. In QHAMDC, rotated patterns, which are typical spurious patterns, can be reduced by combining the two types of connection weights, utilizing the non-commutative nature of quaternions. The experimental results show that the noise robustness of pattern retrieval in QHAMDC is superior to that in QHAM. A more detailed analysis of spurious patterns in QHAM, QBAAM, and QHAMDC remains for our future work.


Chapter 4 Pseudo-Orthogonalization of Memory Patterns for Quaternionic Hopfield Neural Network

The Hebbian learning rule is well known as a memory storing method for associative memories. This method is simple and fast; however, its performance decreases when memory patterns are not orthogonal to each other. Pseudo-orthogonalization is one solution for embedding such non-orthogonal memory patterns into associative memories. By combining pseudo-orthogonalization with the Hebbian learning rule, the storage capacity of an associative memory for non-orthogonal patterns is improved without high computational cost. The memory patterns can also be retrieved from an external stimulus pattern by a simulated annealing method. By utilizing quaternions, we can extend the pseudo-orthogonalization scheme to quaternionic Hopfield neural networks. In this study, we propose an extended pseudo-orthogonalization scheme for associative memories based on quaternions. We show that the proposed scheme has stable recall performance on highly correlated memory patterns compared to the conventional real-valued scheme.


4.1 Introduction

The Hebbian learning rule is a well-known method for embedding patterns in associative memories, such as Hopfield neural networks [27, 14]. This method is simple and straightforward; however, it places a crucial requirement on the patterns to be embedded: they should be orthogonal to each other. When correlated patterns are embedded by this method, the storage performance of the network decreases significantly. Thus, many studies on storing correlated patterns are directed toward orthogonalization of these patterns, such as the pseudo-inverse matrix method [24] and the iterative learning method [28]. Though these methods enable all the correlated patterns to be stable local minima in the network, their computational costs grow with the network size and the number of patterns to be embedded.

A novel method has been proposed that enables correlated patterns to be embedded in associative memories with low computational cost [29]. The method, called pseudo-orthogonalization, first prepares a random pattern (mask pattern) whose length is the same as that of a memory pattern, and an element-wise exclusive not-or (XNOR) operation is applied between the random pattern and the memory pattern. The pattern to be embedded in the network is the concatenation of the XNORed pattern (masked pattern) and the corresponding random pattern. These pseudo-orthogonalized patterns have lower correlation than the original memory patterns; thus, they can be embedded by using the Hebbian learning rule without degradation of the storage capacity. The length of these patterns is doubled; however, the embedding process is simple and requires rather low computational cost.

Complex-valued or quaternionic extensions would be suitable for the pseudo-orthogonalization scheme; the pair of patterns (the masked pattern and its mask) can be naturally embedded by utilizing the imaginary part(s). In this study, we extend the pseudo-orthogonalization scheme by using quaternions. We investigate the performance of the quaternionic extension by storing binary memory patterns in quaternionic Hopfield networks [30], and we also examine the performance from the viewpoint of correlations in memory patterns and the loading rate (the ratio of the number of memory patterns to the number of units in the network). We show that the extended method has more stable recall performance on highly correlated memory patterns than the conventional real-valued method does.

This chapter is organized as follows. In Section 4.2, we summarize real-valued and quaternionic Hopfield neural networks. The quaternionic extension of the pseudo-orthogonalization scheme is described in Section 4.3. Several experimental results for evaluating the proposed scheme are given in Section 4.5. In Section 4.7, we discuss the superior performance of quaternionic pseudo-orthogonalization. We finish with conclusions in Section 4.8.

4.2 Preliminaries

In this section, we explain Hebbian learning rule and the network dynamics for real-valued and quaternionic Hopfield neural networks.

4.2.1 Real-Valued Hopfield Neural Network

Let $(\xi^{\mu}_1, \ldots, \xi^{\mu}_N)$, $\xi^{\mu}_m \in \{+1, -1\}$, be the $\mu$-th learning pattern. The Hebbian learning rule for the real-valued Hopfield neural network (RHNN) is represented as

$$w_{mn} = \frac{1}{N} \sum_{\mu=1}^{P} \xi^{\mu}_m \xi^{\mu}_n, \tag{4.1}$$

where $w_{mn}$ is the synaptic weight between the $m$-th and $n$-th neurons, which satisfies the conditions $w_{mm} = 0$ and $w_{mn} = w_{nm}$ for all $m$ and $n$, and $P$ is the number of learning patterns. The dynamics of the network is given as

$$x_m(t+1) = \mathrm{sgn}\left( \sum_{n=1}^{N} w_{mn} x_n(t) \right), \tag{4.2}$$

where $x_m(t) \in \{+1, -1\}$ denotes the output of the $m$-th neuron at the time step $t$ and $N$ is the total number of neurons in the network. The function $\mathrm{sgn}(\cdot)$ is an activation function defined by $\mathrm{sgn}(u) = 1$ when $u \geq 0$ and $\mathrm{sgn}(u) = -1$ when $u < 0$.
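As an illustration of Eqs. (4.1) and (4.2), the following Python sketch stores bipolar patterns with the Hebbian rule and recalls them by iterating the sign update. The function names, the synchronous update order, and the stopping rule (stop when the state no longer changes) are assumptions made for this sketch; the text above does not specify them.

```python
def hebbian_weights(patterns):
    """Eq. (4.1), with the stated condition w_mm = 0 enforced explicitly."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        for m in range(n):
            for k in range(n):
                if m != k:  # no self-connection
                    w[m][k] += xi[m] * xi[k] / n
    return w


def sgn(u):
    """Activation function: +1 when u >= 0, otherwise -1."""
    return 1 if u >= 0 else -1


def update(w, x):
    """One step of Eq. (4.2), applied to all neurons at once (assumed)."""
    return [sgn(sum(w_mn * x_n for w_mn, x_n in zip(row, x))) for row in w]


def recall(w, x, max_steps=100):
    """Iterate the update until a fixed point or max_steps is reached."""
    for _ in range(max_steps):
        nxt = update(w, x)
        if nxt == x:
            break
        x = nxt
    return x
```

For example, after storing a single pattern, starting `recall` from a copy of that pattern with one flipped element returns the stored pattern.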


4.2.2 Quaternionic Hopfield Neural Network

In the quaternionic Hopfield neural network (QHNN), all neuronal parameters in the network are encoded by quaternions [30, 31]. Let the $m$-th element of the $\mu$-th quaternionic learning pattern be $\xi^{\mu}_m = \xi^{\mu(e)}_m + \xi^{\mu(i)}_m i + \xi^{\mu(j)}_m j + \xi^{\mu(k)}_m k$, where $\xi^{\mu(e)}_m, \xi^{\mu(i)}_m, \xi^{\mu(j)}_m, \xi^{\mu(k)}_m \in \{+1, -1\}$. The Hebbian learning rule for the QHNN is represented as

$$w_{mn} = \frac{1}{4N} \sum_{\mu=1}^{P} \xi^{\mu}_m \xi^{\mu *}_n, \tag{4.3}$$

where the synaptic weights satisfy the conditions $w_{mm} \geq 0$ and $w_{mn} = w^{*}_{nm}$. The dynamics of the network is given as follows:

$$x_m(t+1) = \mathrm{qsgn}\left( \sum_{n=1}^{N} w_{mn} x_n(t) \right). \tag{4.4}$$

The function $\mathrm{qsgn}(\cdot)$ is an activation function for quaternionic neurons, which is defined by

$$\mathrm{qsgn}(s) = \mathrm{sgn}(s^{(e)}) + \mathrm{sgn}(s^{(i)}) i + \mathrm{sgn}(s^{(j)}) j + \mathrm{sgn}(s^{(k)}) k. \tag{4.5}$$
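The quaternionic rule of Eqs. (4.3) to (4.5) can be sketched in the same way. In this illustrative Python sketch a quaternion is represented as a tuple (e, i, j, k); this representation, the helper names, and the synchronous update are assumptions of the sketch rather than details given in the text.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (e, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def qconj(q):
    """Quaternion conjugate."""
    return (q[0], -q[1], -q[2], -q[3])


def qadd(p, q):
    """Component-wise sum of two quaternions."""
    return tuple(a + b for a, b in zip(p, q))


def qsgn(s):
    """Eq. (4.5): sign function applied to each component separately."""
    return tuple(1 if c >= 0 else -1 for c in s)


def q_hebbian_weights(patterns):
    """Eq. (4.3): w_mn = 1/(4N) * sum over patterns of xi_m * conj(xi_n)."""
    n = len(patterns[0])
    w = [[(0.0, 0.0, 0.0, 0.0) for _ in range(n)] for _ in range(n)]
    for xi in patterns:
        for m in range(n):
            for k in range(n):
                term = qmul(xi[m], qconj(xi[k]))
                w[m][k] = qadd(w[m][k], tuple(c / (4 * n) for c in term))
    return w


def q_update(w, x):
    """One step of Eq. (4.4); weights multiply the states from the left."""
    out = []
    for row in w:
        h = (0.0, 0.0, 0.0, 0.0)
        for w_mn, x_n in zip(row, x):
            h = qadd(h, qmul(w_mn, x_n))
        out.append(qsgn(h))
    return out
```

With a single stored pattern, the weighted sum for neuron m reduces to the stored element itself, because each product of an element with its own conjugate equals 4; the stored pattern is therefore a fixed point of `q_update`.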

4.3 Pseudo-Orthogonalization based on Quaternions

The purpose of the pseudo-orthogonalization is to randomize memory patterns so that they can be stored by Hebbian learning. We present a pseudo-orthogonalization based on quaternions in this section.

First, let us recapitulate the real-valued pseudo-orthogonalization [29]. The memory patterns are masked by using random mask patterns as shown in Fig. 4.1. Fig. 4.2 shows the generation method for the masked patterns. The masked patterns are obtained by element-wise multiplication of the memory patterns and the random patterns; for elements in {+1, −1}, this multiplication is equivalent to the XNOR operation. Thus, the original patterns can be reconstructed from the masked patterns and the random patterns as shown in Fig. 4.3. The pseudo-orthogonalized pattern is obtained as the concatenation of the random mask pattern and the masked pattern.


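[Figure 4.1 (diagram): memory pattern × random pattern → masked pattern; the random pattern and the masked pattern together form the pseudo-orthogonalized pattern.]

Figure 4.1: Schematic of pseudo-orthogonalization.

[Figure 4.2 (diagram): each element ξ_n of the original pattern is multiplied by the element r_n of the random pattern to give the element r_n ξ_n of the masked pattern, for n = 1, ..., N.]

Figure 4.2: Generation of masked patterns.

[Diagram: each element r_n ξ_n of the masked pattern is multiplied by the element r_n of the random pattern to give the element ξ_n of the original pattern, for n = 1, ..., N.]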


Let $(\xi_1, \ldots, \xi_N)$, where $\xi_m \in \{+1, -1\}$, be the original memory pattern and $(r_1, \ldots, r_N)$, where $r_m \in \{+1, -1\}$, be a random pattern corresponding to the original pattern. The $m$-th element of the real-valued pseudo-orthogonalized pattern is generated by

$$\eta^{r}_m = \begin{cases} r_n, & m = 2n - 1, \\ r_n \xi_n, & m = 2n. \end{cases} \tag{4.6}$$

This is the concatenation of the random pattern and the masked pattern between the original memory pattern and the random pattern (see Fig. 4.4(a)). Thus, the element $\eta^{r}_m$ takes either $+1$ or $-1$, and the length of the pseudo-orthogonalized pattern, denoted as $N'$, becomes twice that of the original one, i.e., $N' = 2N$.

Next, we show the extended pseudo-orthogonalization using quaternions. The $m$-th element of the quaternionic pseudo-orthogonalized pattern is defined by

$$\eta^{q}_m = r_{2m-1} + r_{2m-1} \xi_{2m-1} i + r_{2m} j + r_{2m} \xi_{2m} k. \tag{4.7}$$

That is, the odd-numbered elements of the random pattern are assigned to the real part and the rest are assigned to the imaginary part $j$. Also, the odd-numbered elements of the masked pattern are assigned to the imaginary part $i$ and the rest are assigned to the imaginary part $k$. Therefore, the original patterns can be reconstructed by

$$\xi_m = \begin{cases} \eta^{q(e)}_n \eta^{q(i)}_n, & m = 2n - 1, \\ \eta^{q(j)}_n \eta^{q(k)}_n, & m = 2n. \end{cases} \tag{4.8}$$

The length of the quaternionic pseudo-orthogonalized pattern is $N' = N/2$ (see Fig. 4.4(b)).
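The two encodings and the reconstruction can be made concrete with a short Python sketch. It follows Eqs. (4.6), (4.7), and (4.8) with 0-based list indices; the function names and the tuple layout (e, i, j, k) for a quaternion are assumptions of this sketch.

```python
import random


def random_mask(n, rng=random):
    """Random mask pattern with elements drawn uniformly from {+1, -1}."""
    return [rng.choice((-1, 1)) for _ in range(n)]


def real_pseudo_orthogonalize(xi, r):
    """Eq. (4.6): interleave r_n and r_n * xi_n; the output length is 2N."""
    eta = []
    for xi_n, r_n in zip(xi, r):
        eta.extend([r_n, r_n * xi_n])
    return eta


def quaternionic_pseudo_orthogonalize(xi, r):
    """Eq. (4.7): pack two consecutive elements into one quaternion."""
    if len(xi) % 2:
        raise ValueError("pattern length must be even")
    return [(r[n], r[n] * xi[n], r[n + 1], r[n + 1] * xi[n + 1])
            for n in range(0, len(xi), 2)]


def reconstruct_from_quaternions(eta):
    """Eq. (4.8): recover xi from the products (e * i) and (j * k)."""
    xi = []
    for e, i, j, k in eta:
        xi.extend([e * i, j * k])
    return xi
```

The reconstruction works because every mask element satisfies r * r = 1, so multiplying a mask element by the corresponding masked element returns the original element regardless of the mask.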

4.4 Retrieval Dynamics for Quaternionic Pseudo-Orthogonalization

[Figure 4.4 (diagram): (a) in the real-valued scheme, the elements r_1, r_1ξ_1, r_2, r_2ξ_2, ..., r_Nξ_N are assigned in order to η^r_1, η^r_2, η^r_3, η^r_4, ..., η^r_{2N}; (b) in the quaternionic scheme, each group of four elements r_{2m−1}, r_{2m−1}ξ_{2m−1}, r_{2m}, r_{2m}ξ_{2m} forms one quaternion η^q_m, for m = 1, ..., N/2.]

Figure 4.4: Pseudo-orthogonalized patterns generated from random patterns and masked patterns utilizing (a) real numbers and (b) quaternions.

In this section, we describe the retrieval dynamics for the pseudo-orthogonalization scheme. In a Hopfield network, a pattern serving as a clue for recall is set as the initial state of the network, and the memory pattern is then retrieved after iterative updates of the neuron states. However, the random mask pattern used in the pseudo-orthogonalization is unknown in the retrieval process, so the initial state of the network cannot be determined in the pseudo-orthogonalization scheme. Therefore, the network dynamics must be extended to retrieve memory patterns without the random mask patterns. By using a simulated annealing method, the recall dynamics for real-valued pseudo-orthogonalization is defined as follows [29]:

$$h_i(t) = \begin{cases} \sum_{j=1}^{N} w_{ij} x_j(t) + s z_k x_{2k}, & i = 2k - 1, \\ \sum_{j=1}^{N} w_{ij} x_j(t) + s z_k x_{2k-1}, & i = 2k, \end{cases} \tag{4.9}$$

$$\mathrm{Prob}(x_i = 1) = \frac{1}{1 + \exp(-\beta(t) h_i(t))}. \tag{4.10}$$

Similarly, we can extend the dynamics for quaternionic pseudo-orthogonalization, and it is formulated by the following equations:

$$h_m(t) = \sum_{n=1}^{N} w_{mn} x_n(t) + s z_{2m-1} \left( x^{(i)}_m(t) + x^{(e)}_m(t) i \right) + s z_{2m} \left( x^{(k)}_m(t) j + x^{(j)}_m(t) k \right), \tag{4.11}$$

$$\mathrm{Prob}(x^{(*)}_m = 1) = \frac{1}{1 + \exp(-\beta(t) h^{(*)}_m(t))}, \quad (*) \in \{(e), (i), (j), (k)\}, \tag{4.12}$$

where $z_m \in \{+1, -1\}$ denotes the $m$-th element of the external stimulus serving as the cue pattern, $s$ is the strength of the external stimulus, and $\beta(t)$ is the inverse temperature parameter, which increases with the time step $t$ as $\beta(t+1) = \gamma \beta(t)$, where $\gamma > 1$ is the increase rate of $\beta$. The states of the neurons $x_m(t)$ are initialized randomly at $t = 0$ and evolve stochastically according to Eq. (4.12). Here, the real part and the imaginary parts of the internal state $h_m(t)$ are updated separately.
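The annealed recall can be sketched in Python for the simpler real-valued dynamics of Eqs. (4.9) and (4.10); the quaternionic dynamics applies the same stochastic rule to each of the four components. The sketch uses 0-based indices, so neurons 2k and 2k+1 hold the mask element and the masked element of the k-th original element. The sequential sweep over neurons, the default parameter values, the function names, and the seeded generator are assumptions of this sketch.

```python
import math
import random


def local_field(w, x, z, s, i):
    """Eq. (4.9) for the 0-based neuron index i.

    The stimulus term couples neuron i to its partner in the same
    (mask, masked) pair, weighted by the stimulus element z[k].
    """
    k = i // 2
    partner = i + 1 if i % 2 == 0 else i - 1
    weighted = sum(w_ij * x_j for w_ij, x_j in zip(w[i], x))
    return weighted + s * z[k] * x[partner]


def prob_plus_one(beta, h):
    """Eq. (4.10), written so that large |beta * h| cannot overflow."""
    a = beta * h
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def anneal(w, z, s=1.0, beta0=1.0, gamma=1.001, steps=2000, seed=0):
    """Stochastic recall: random initial state, beta multiplied by gamma each step."""
    rng = random.Random(seed)
    n = len(w)
    x = [rng.choice((-1, 1)) for _ in range(n)]
    beta = beta0
    for _ in range(steps):
        for i in range(n):  # sequential sweep (an assumption)
            h = local_field(w, x, z, s, i)
            x[i] = 1 if rng.random() < prob_plus_one(beta, h) else -1
        beta *= gamma
    return x
```

The retrieved memory pattern is then read out by multiplying the two neurons of each pair, in the same way as the reconstruction of the original pattern from the mask and the masked pattern.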

4.5 Experiments

4.5.1 Recalling patterns by using extended dynamics

First, we show the recall capabilities of the networks from an external stimulus pattern by using the extended dynamics defined by Eq. (4.12). In this experiment, 6 kanji patterns are used as memory patterns, as shown in Fig. 4.5. Each pattern consists of 42 × 42 pixels. These images are strongly correlated because they share the same local pattern on their left-hand side. Thus, these patterns cannot be stabilized in Hopfield networks by using the Hebbian learning rule.

Figure 4.6 shows the retrieval results of the real-valued and quaternionic pseudo-orthogonalization schemes. Fig. 4.6(a) shows the evolution of the overlaps between the ground truth pattern (鮃) and the recalled pattern. The overlap is the element-wise correlation between a stored pattern and the network state, averaged over all neurons. The overlap $o^{r}$ for real-valued patterns and the overlap $o^{q}$ for quaternionic patterns are defined as

$$o^{r} = \frac{1}{N} \sum_{m=1}^{N} \xi^{\mu}_m x_m(t), \tag{4.13}$$

$$o^{q} = \frac{1}{4N} \sum_{m=1}^{N} \left( \xi^{\mu(e)}_m x^{(e)}_m(t) + \xi^{\mu(i)}_m x^{(i)}_m(t) + \xi^{\mu(j)}_m x^{(j)}_m(t) + \xi^{\mu(k)}_m x^{(k)}_m(t) \right). \tag{4.14}$$

In these results, the parameters for updating the states of the neurons in the network were set to β(0) = 1.0, γ = 1.001, and s = 1.0, and the pattern shown in Fig. 4.6(b) was used as the external stimulus. At the final step (t = 2000), the overlaps for both schemes become nearly 1.0, so that the original memory pattern is retrieved successfully. This can be confirmed from Figs. 4.6(c) and 4.6(d), which show the images reconstructed from the pseudo-orthogonalized patterns recalled in the network at each time step.
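The overlaps of Eqs. (4.13) and (4.14) can be computed as in the following Python sketch. Quaternionic elements are assumed to be tuples (e, i, j, k) with components in {+1, −1}; the function names are illustrative.

```python
def overlap_real(xi, x):
    """Eq. (4.13): normalized inner product of two bipolar patterns."""
    return sum(a * b for a, b in zip(xi, x)) / len(xi)


def overlap_quaternionic(xi, x):
    """Eq. (4.14): component-wise products averaged over all 4N components."""
    total = sum(a * b for p, q in zip(xi, x) for a, b in zip(p, q))
    return total / (4 * len(xi))
```

An overlap of 1.0 means that the network state matches the stored pattern in every component, and each mismatching component lowers the overlap by 2/N in the real-valued case and by 2/(4N) in the quaternionic case.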

Figure 4.7 shows the retrieval result when a noisy pattern is used as the external stimulus. In this result, the strength of the external stimulus is set to s = 0.5 to avoid attracting the neuron states excessively to the noise component. The noisy external stimulus is shown in Fig. 4.7(b). This pattern is generated by inverting the pixel values of 30% of the original pattern. We find that even if a noisy pattern is given to the network as the external stimulus, the pattern corresponding to the stimulus is successfully recalled, as shown by the overlap and the reconstructed images. Therefore, the quaternionic extension of the recall dynamics functions correctly.

4.6 Retrieval performance

Next, we investigate the retrieval performance by using the proposed scheme compared to the conventional real-valued scheme. For this experiment, random patterns are used as the original memory pattern which are obtained by using the following probability:

[Figure 4.6: (a) overlap versus time step (0 to 2000) for real-valued and quaternionic pseudo-orthogonalization; (b) external stimulus (input pattern); (c) retrieved patterns by the real-valued pseudo-orthogonalization scheme at t = 0, 200, ..., 2000; (d) retrieved patterns by the quaternionic pseudo-orthogonalization scheme at t = 0, 200, ..., 2000.]

[Figure 4.7: (a) overlap versus time step (0 to 2000) for real-valued and quaternionic pseudo-orthogonalization; (b) external stimulus (30% of the original pattern is inverted); (c) retrieved patterns by the real-valued pseudo-orthogonalization scheme at t = 0, 200, ..., 2000; (d) retrieved patterns by the quaternionic pseudo-orthogonalization scheme at t = 0, 200, ..., 2000.]

where $\zeta_m$ is the $m$-th element of a random pattern generated according to the uniform probability

$$\mathrm{Prob}(\zeta_m = \pm 1) = 1/2. \tag{4.16}$$

Here, $b$ is a correlation parameter for the random patterns, which satisfies $E[\mathrm{Corr}(\xi_m, \zeta_m)] = b$.

First, we investigated how the correlation in the original memory patterns affects the retrieval performance of the pseudo-orthogonalization scheme. Figure 4.8 shows the retrieval success rates for different values of the correlation parameter b of the original memory patterns. In this experiment, the length of the original memory patterns was set to 1000, so that the numbers of neurons in the RHNNs and QHNNs were 2000 and 500, respectively. The loading rate, which is the ratio of the number of stored patterns P to the number of neurons N, was fixed to 0.13, and the correlation parameter was varied from 0.0 to 0.5 with a step of 0.05. The parameters for the recall process were set as s = 0.9, γ = 1.002, and β(0) = 1.0. We obtained the retrieved pattern after 1000 iterations of updates, and 100 trials were conducted. For each trial, the initial configuration of the network was determined randomly, and one of the original memory patterns was used as the external stimulus. The recall was considered successful when the overlap between the retrieved pattern and its true pattern reached 0.95. From the figure, we find that the retrieval success rates decrease with increasing correlation of the original memory patterns in all types of networks. However, the success rates of the quaternionic pseudo-orthogonalization scheme decrease more slowly than those of the real-valued scheme.

Next, we show the dependency of the critical loading rate on the correlation in the memory patterns. The critical loading rate is defined as the loading rate at which the overlap between the retrieved pattern and its original pattern falls below a threshold. Figure 4.9 shows the critical loading rates against the correlations. The threshold was set to 0.95 in this result. From this figure, we find that the critical loading rates of the quaternionic pseudo-orthogonalization scheme also decrease more slowly than those of the real-valued scheme. A higher critical loading rate means that memory patterns can be embedded in and recalled from the network more stably at the same loading rate. Therefore, the pseudo-orthogonalization scheme utilizing quaternions is more robust to the correlation of the memory patterns than the real-valued scheme.

4.7 Discussion

We discuss the reasons why the retrieval performance of pseudo-orthogonalization is maintained by utilizing quaternions. First, we evaluate the dependence on the correlation of the original memory patterns to examine the stability of pseudo-orthogonalized patterns in RHNNs and QHNNs.

Figure 4.10 shows the difference in the stability of pseudo-orthogonalized patterns between RHNN and QHNN. The changes in the critical loading rates with increasing correlation of the original memory patterns are shown in Fig. 4.11. The number of neurons in RHNNs and QHNNs was set to 1000. The overlaps were obtained by averaging over 100 trials of 1000 updates each. In each trial, the initial configuration is set to one of the memory patterns (pseudo-orthogonalized patterns), and the neurons are updated by using Eqs. (4.2) and (4.4). From these figures, the memory patterns become more unstable with the increase of the correlation of the original memory patterns in RHNNs. In contrast, the memory patterns in QHNNs remain stable when the correlation is increased. Therefore, the retrieval performance of QHNNs is maintained even if the correlation in the memory patterns is increased.

[Figure 4.8 (plot): retrieval success rate [%] versus correlation b (0.0 to 0.5) for real-valued and quaternionic pseudo-orthogonalization.]

Figure 4.8: Retrieval success rates with changing the correlation in the original memory patterns. The parameters for recall process: s = 0.9, γ = 1.002, β(0) = 1.0.

[Figure 4.9 (plot): critical loading rate α_c versus correlation b (0.0 to 0.5) for real-valued and quaternionic pseudo-orthogonalization.]

Figure 4.9: Critical loading rates with changing the correlation in the original memory patterns.

4.8 Conclusion

In this chapter, we have investigated the embedding and retrieval performances for quaternionic extensions of the pseudo-orthogonalization scheme.

The numerical results show that the proposed scheme can embed the patterns better than the conventional (real-valued) scheme from the viewpoint of loading rates. This seems due to the improvement of orthogonalization achieved by the degree of freedom on the dimensions in quaternion number systems.

[Figure 4.10 (plots): overlap versus loading rate for RHNN (left) and QHNN (right), with b = 0.0, 0.1, 0.2, 0.3, 0.4, 0.5.]

Figure 4.10: The stability of the pseudo-orthogonalized patterns in Hopfield neural networks.

[Figure 4.11 (plot): critical loading rate versus correlation (0.0 to 0.5) for RHNN and QHNN.]

Figure 4.11: The critical loading rate against the correlation of the original memory patterns.

We have also investigated the stability and retrieval performances of the proposed scheme from the viewpoint of correlations in memory patterns. The experimental results show that the pseudo-orthogonalized patterns tend to be more unstable with the increase of the correlation in the original memory patterns in the conventional real-valued pseudo-orthogonalization scheme. In contrast, the patterns obtained by the extended pseudo-orthogonalization are stable even if the correlation of memory patterns is increased. Thus, the extended scheme can stabilize highly correlated memory patterns better than the conventional real-valued one. When the stored patterns are retrieved from an external stimulus pattern, the performance of the extended pseudo-orthogonalization scheme is maintained better than that of the real-valued scheme under the condition of high loading rate and strong correlation in the memory patterns. This is because the memory patterns stored by the proposed scheme are stable even if the correlation in memory patterns is increased.

Parameter dependencies for the storing and retrieval performances, such as the strength of input stimuli on the retrieval stage, should be explored in detail. Also, it is important to investigate the structure on basins of attractors for quaternionic networks, as compared to the real-valued networks. These remain for our future work.


Chapter 5 Feed Forward Neural Network with Random Quaternionic Neurons

A quaternionic extension of the feed forward neural network, for processing three- or four-dimensional signals, is proposed. This neural network is based on the three-layered network with random weights, called the Extreme Learning Machine (ELM), in which iterative least-mean-square algorithms are not required for training. All parameters and variables in the proposed network are encoded by quaternions, and operations among them follow quaternion algebra. Neurons in the proposed network are expected to handle multi-dimensional signals as single entities, whereas real-valued neurons deal with each element of the signals independently. The performance of the proposed network is evaluated through two types of experiments: classification and reconstruction of color images in the CIFAR-10 dataset. The experimental results show that the proposed networks achieve higher classification accuracy for input images than the conventional (real-valued) networks with similar degrees of freedom. Detailed investigations of the operations in the proposed networks are also conducted.


5.1 Introduction

Processing multi-dimensional signals, such as color images, is an important problem in artificial neural networks. Artificial neural networks consist of many interconnected neurons that accept only real-valued signals for their input, internal states, and output. Of course, these neural networks can cope with high-dimensional signals by configuring the neurons so that each of them covers one element of the signals. But this type of configuration would be unnatural, because the elements of a multi-dimensional signal are not independent of each other and the signal should be processed as a single entity. Thus, for over two decades, applications of complex values to neural networks have been extensively investigated, as summarized in the references [32, 33, 34]. Besides these studies, neural networks with more than two dimensions have also been explored. One motivation is the natural extension from real-valued neural networks to complex-valued ones. Another motivation and necessity arise from engineering applications in which multi-dimensional signals, such as the three components of color images (red, green, and blue) or body coordinates in three-dimensional space (X, Y, Z), should be processed. Although neural networks for these applications can be composed of real-valued or complex-valued neurons, it would be useful to introduce a number system with higher dimensions, the so-called hypercomplex number systems.

The quaternion is a four-dimensional hypercomplex number system introduced by W. R. Hamilton [10, 35]. This number system has been extensively employed in the fields of modern mathematics, physics, control of satellites, computer graphics, signal processing, and so on [36, 37, 38, 39, 40]. One of the benefits provided by quaternions is that quaternionic operators efficiently accomplish affine transformations in three-dimensional space, especially spatial rotations, with their compact representations. Thus, it is expected that neurons with quaternionic representation and operations would be useful for processing three- and four-dimensional signals.
