Abstract—In this paper, we propose a variant of the Grey Wolf Optimizer (GWO) that combines reinforcement learning principles with neural networks to enhance performance. The aim is to overcome, through reinforcement learning, the common challenge of setting the right parameters for the algorithm. In GWO, a single parameter controls the exploration/exploitation rate, which influences the performance of the algorithm. Rather than changing this parameter globally for all agents, we use reinforcement learning to set it on an individual basis. The adaptation of the exploration rate for each agent depends on the agent’s own experience and the current terrain of the search space. To achieve this, an experience repository is built on a neural network that maps a set of agents’ states to a set of corresponding actions that specifically influence the exploration rate. The experience repository is updated by all the search agents to reflect experience and to continuously improve future actions. The resulting algorithm is called the Experienced Grey Wolf Optimizer (EGWO), and its performance is assessed on feature selection problems and on finding optimal weights for neural networks. We use a set of performance indicators to evaluate the efficiency of the method. Results over various datasets demonstrate an advantage of EGWO over the original GWO and other meta-heuristics such as genetic algorithms and particle swarm optimization.
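As a rough illustration of the contrast drawn above, the following sketch compares a single global exploration parameter with a per-agent one. In the standard GWO formulation the parameter, usually written a, decays linearly from 2 to 0 over the iterations. The per-agent update rule, the three-valued action set and the step size are hypothetical stand-ins for the learned state-to-action mapping; they are assumptions made for illustration and are not taken from the paper.

```python
def global_a(iteration, max_iterations):
    """Standard GWO schedule: a decays linearly from 2 to 0 for every agent."""
    return 2.0 * (1.0 - iteration / max_iterations)


def adapt_a(current_a, action, step=0.1):
    """Hypothetical per-agent update of the exploration parameter.

    action is -1 (exploit more), 0 (keep) or +1 (explore more). In EGWO the
    action would come from the experience repository; here it is simply an
    input, and the multiplicative rule is an assumption of this sketch.
    """
    if action not in (-1, 0, 1):
        raise ValueError("action must be -1, 0 or +1")
    new_a = current_a * (1.0 + step * action)
    return min(2.0, max(0.0, new_a))  # keep a inside the usual GWO range [0, 2]


# Three agents start from the same value but receive different actions.
agents_a = [1.0, 1.0, 1.0]
actions = [-1, 0, 1]
agents_a = [adapt_a(a, act) for a, act in zip(agents_a, actions)]
```

With the global schedule every agent shares one value of a at a given iteration; with the per-agent rule the three agents above end up with different exploration rates after a single step.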
layered networks. In (Fiete, 2006) the gradient is estimated from random perturbations of membrane potentials in neurons, while in (Florian, 2007) it is estimated from random perturbations of firing thresholds. Razvan V. Florian (Florian, 2005) presented a mechanism in which learning is achieved by synaptic changes that depend on the firing of pre- and postsynaptic neurons and that are modulated by a global reinforcement signal. He also described a biologically plausible RL mechanism that works by modulating spike-timing-dependent plasticity (STDP) with a global reward signal. A learning method based on perturbation of synaptic conductance was proposed by Suszynski et al., but it was slow and had large variance (Suszynski, 2013). In another study, Seung considered the hypothesis that the randomness of synaptic transmission is harnessed by the brain for learning and demonstrated that a network of hedonistic synapses can be trained to perform a desired computation by administering reward appropriately (Sebastian, 2003). Brea et al. derived a tractable learning rule for the synaptic weights towards hidden and visible neurons that leads to optimal recall of the training sequences. They showed that learning synaptic weights towards hidden neurons significantly improves the storage capacity of the network (Brea et al., 2011). Christodoulou et al. investigated the application of RL to spiking neural networks in a demanding multi-agent setting and showed that their learning algorithms managed to exhibit ‘sophisticated intelligence’ in a non-trivial task (Christodoulou, 2010). They also showed that learning could be enhanced through strong memory for each agent and firing irregularity. Potjans et al. (2009) presented a spiking neural network model that implements actor-critic temporal-difference learning by combining local plasticity rules with a global reward signal.
This network could solve a nontrivial gridworld task with sparse rewards; it learned at a speed similar to that of its discrete-time counterpart and attained the same equilibrium performance (Potjans et al., 2009). Moreover, in the model of Frémaux et al., the critic learns to predict expected future rewards in real time in a continuous-time actor-critic framework with spiking neurons. They showed that such an architecture can solve a Morris water-maze-like navigation task, the acrobot and the cartpole problems (Frémaux et al., 2013).
3.5. Cascading Neural Networks Reinforcement Learning
The advantage of quantum computers over classical computers fuels the recent trend of developing machine learning algorithms on quantum computers, which can potentially lead to breakthroughs and new learning models in this area. The aim of our study is to explore deep quantum reinforcement learning (RL) on photonic quantum computers, which can process information stored in the quantum states of light. These quantum computers can naturally represent continuous variables, making them an ideal platform to create quantum versions of neural networks. Using quantum photonic circuits, we implement Q-learning and actor-critic algorithms with multilayer quantum neural networks and test them in the grid world environment. Our experiments show that 1) these quantum algorithms can solve the RL problem and 2) compared to one layer, using three-layer quantum networks improves the learning of both algorithms in terms of rewards collected. In summary, our findings suggest that having more layers in deep quantum RL can enhance the learning outcome.
the chess playing machine with a function being updated online. This was of major importance since, in the majority of cases of trial-and-error learning, updates would be made only after an episode of learning, meaning that the machine would perform the whole task and then be rewarded according to the chosen actions. One important difference between the studies in neural networks at the time and reinforcement learning is that reinforcement learning is not a supervised type of learning, meaning that it has no reference for comparison, such as error functions or examples against which to compare the actions taken, while the pattern classification methods widely used by Frank Rosenblatt and by Bernard Widrow and Marcian Hoff were supervised. In their research on artificial neural networks, they used concepts of reinforcement learning, even using terms such as reward and punishment, but the learning was done in a supervised manner. In the literature, as suggested by Sutton and Barto (1998), the term trial-and-error is related to unsupervised learning, but it is often erroneously used to describe a network that learns based on an error to correct the weights, which is in fact supervised learning since the error comes from a comparison to a given reference. The confusion between these types of learning, the advance in neural networks using supervised learning and, later, the disappointment regarding neural networks brought research on reinforcement learning almost to a halt. Still, some notable works deserve comment, as they brought to light new concepts that would be used in modern reinforcement learning.
Recurrent neural networks (RNNs) for reinforcement learning (RL) have shown distinct advantages, e.g., solving memory-dependent tasks and meta-learning. However, little effort has been spent on improving RNN architectures and on understanding the underlying neural mechanisms for performance gain. In this paper, we propose a novel, multiple-timescale, stochastic RNN for RL. Empirical results show that the network can autonomously learn to abstract sub-goals and can self-develop an action hierarchy using internal dynamics in a challenging continuous control task. Furthermore, we show that the self-developed compositionality of the network enables faster re-learning when adapting to a new task that is a re-composition of previously learned sub-goals than when starting from scratch. We also found that improved performance can be achieved when neural activities are subject to stochastic rather than deterministic dynamics.
It was about ten years after Hebb that the next major influence in the development of neural networks occurred, with research by Rosenblatt (1958). His main success was the creation of the perceptron network, which was able to perform pattern recognition. Not long after, Widrow and Hoff (1988) created an ADAptive LINEar neuron (ADALINE), which had a similar structure to Rosenblatt’s perceptron model. It was considered a pattern classification device and demonstrated key concepts of adaptive behaviour and learning (Hagan et al., 1996). However, Minsky and Papert (1969) discovered that the perceptron and ADALINE networks could not handle large classes of problems and performed well only on limited problems, thus not providing much real value (Sondak and Sondak, 1989).
The goal of the RL agent is to learn a policy that achieves the maximum expected return, so by definition it is natural to work directly with these expectations. However, this approach cannot render the whole picture of the randomness, as seen from the possibly multimodal distribution over returns. When an agent interacts with the environment in an RL problem, the state transitions, rewards, and actions can all carry certain intrinsic randomness. Distributional RL explicitly models the future random rewards as a full distribution, allowing more accurate actions to be learned. In order to introduce QR distributional Q-learning, which utilizes quantile regression to approximate the quantile function of the state-action return distribution, we need to introduce several concepts as background material.
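To make the role of quantile regression concrete, the following sketch fits a small set of quantiles to samples of a return by subgradient descent on the quantile regression (pinball) loss. It is a minimal illustration under stated assumptions rather than the algorithm introduced later: the sample returns, the learning rate, the number of epochs and the function names are all hypothetical, and no neural network or Bellman update is involved.

```python
def quantile_midpoints(n):
    """Quantile fractions tau_i = (2i - 1) / (2n) for i = 1..n."""
    return [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]


def pinball_loss(u, tau):
    """Quantile regression loss for the error u = target - estimate."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_quantiles(samples, n, lr=0.05, epochs=200):
    """Estimate n quantiles of `samples` by subgradient descent on the pinball loss."""
    taus = quantile_midpoints(n)
    theta = [0.0] * n
    for _ in range(epochs):
        for z in samples:
            for i, tau in enumerate(taus):
                # Subgradient of pinball_loss(z - theta_i, tau) with respect to theta_i.
                grad = -(tau - (1.0 if z < theta[i] else 0.0))
                theta[i] -= lr * grad
    return taus, theta


# Hypothetical bimodal return: half of the outcomes are 0, half are 10.
returns = [0.0] * 50 + [10.0] * 50
mean_return = sum(returns) / len(returns)
taus, theta = fit_quantiles(returns, n=4)
```

The mean of these returns is 5, a value that never actually occurs, whereas the fitted quantiles settle near the two modes; this is the sense in which an expectation alone hides a multimodal return distribution.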
Probabilistic topic models have been widely used in natural language processing (Li et al., 2016; Zeng et al., 2018). The fundamental principle is that words are assumed to be generated from latent topics, which can be inferred from data based on word co-occurrence patterns (Neal, 1993; Andrieu et al., 2003). In recent years, the Variational Autoencoder (VAE) has proved more effective and efficient at approximating deep, complex and underestimated variance in integrals (Kingma and Welling, 2013; He et al., 2017). However, VAE-based topic models focus on the construction of deep neural networks to approximate the
Deep reinforcement learning (DRL) has achieved significant breakthroughs in various tasks. However, most DRL algorithms struggle to generalise the learned policy, so that policy performance is strongly affected by even minor modifications of the training environment. In addition, the use of deep neural networks makes the learned policies hard to interpret. To address these two challenges, we propose a novel algorithm named Neural Logic Reinforcement Learning (NLRL) to represent the policies in reinforcement learning by first-order logic. NLRL is based on policy gradient methods and differentiable inductive logic programming, which have demonstrated significant advantages in terms of interpretability and generalisability in supervised tasks. Extensive experiments conducted on cliff-walking and blocks manipulation tasks demonstrate that NLRL can induce interpretable policies achieving near-optimal performance while showing good generalisability to environments of different initial states and problem sizes.
Big data development in biomedical and medical service networks supports research on the benefits of medical data, early ailment detection, patient care and network administration. e-Health applications are particularly important for patients who are unable to see a specialist or any health expert. The objective is to help clinicians and families predict disease using Machine Learning (ML) procedures. In addition, different regions exhibit distinct characteristics of certain regional ailments, which may hinder the forecasting of disease outbreaks. The objective of this work is to predict different kinds of diseases using Grey Wolf Optimization and an autoencoder-based Recurrent Neural Network (GWO+RNN). The features are selected using GWO and the diseases are predicted using the RNN method. First, the GWO algorithm removes irrelevant and redundant attributes; the selected features are then forwarded to the RNN classifier. The experimental results showed that the GWO+RNN algorithm performed better than an existing method, the Group Search Optimizer and Fuzzy Min-Max Neural Network (GFMMNN) approach. The GWO+RNN method was evaluated on medical datasets from the UCI database, namely Hungarian, Cleveland, PID, mammographic masses and Switzerland, and performance was measured with metrics such as accuracy, sensitivity and specificity. The proposed GWO+RNN method improved prediction accuracy by 16.82% on the Cleveland dataset.
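For reference, the three metrics named above follow directly from the entries of a binary confusion matrix. The sketch below uses the standard definitions; the counts are made-up values chosen only to show the arithmetic and are not results from any of the datasets mentioned.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from binary confusion-matrix counts.

    tp/fn count diseased cases predicted correctly/incorrectly;
    tn/fp count healthy cases predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("each class needs at least one example")
    return {
        "accuracy": (tp + tn) / total,        # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),        # true positive rate
        "specificity": tn / (tn + fp),        # true negative rate
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
```

Reporting sensitivity and specificity alongside accuracy matters for medical data because class imbalance can make accuracy alone misleading.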
An artificial neural network is an information processing structure built from interlinked elementary processing devices (neurons). Feed-forward and feedback neural networks are the general types of artificial neural network. The feedback neural network has a profound effect on performance in modeling nonlinear dynamic phenomena and on learning capacity. Determining the number of hidden layer units in an artificial neural network is a crucial and challenging task; overfitting and underfitting problems are caused by a random choice of hidden layer units. Previous work on determining the number of hidden layer neuron units includes the following: “Arai found the hidden units by means of TPHM”. “Jin-Yan Li, et al. searched optimal hidden units with the help of estimation theory”. “Proper hidden units for three and four layered feed-forward neural networks are defined by Tamura and Tateishi”. “Kanellopoulos and Wilkinson stated required amount hidden units”. “Osamu Fujita determined feed forward neural network sufficient hidden units”. “Three layer binary neural network need hidden units are found based on set covering algorithm (SCA) by Zhaozhi Zhang, et al.”. “Jinchuan Ke and Xinzhe Liu pointed out neural network proper hidden units for stock price prediction”. “Shuxiang Xu and Ling Chen described feed-forward neural network optimal hidden units”. “Multilayer perceptron network sufficient hidden units are defined by Stephen Trenn”. “Katsunari Shibata and Yusuke Ikeda stated large scale layered neural network sufficient hidden units and learning rate”. “David Hunter, et al. suggested the hidden units for multilayer perceptron, bridged multilayer perceptron and fully connected cascade”. “Gnana Sheela and Deepa pointed out ELMAN neural network hidden units based on three input neurons”. “Guo Qian and Hao Yong determined back propagation neural network hidden units”.
“Madhiarasan and Deepa estimated improved back propagation network hidden units by means of novel criterion”. “Madhiarasan and Deepa pointed out recursive radial basis function network proper hidden layer neurons estimation based on new criteria”.
Abstract. This paper addresses the identification of the parameters in Richards' equation, which models flow in an unsaturated porous medium. The mixed pressure head-moisture content formulation was used. The direct problem was solved using the Multiquadric Radial Basis Function (RBF-MQ) method, which is a meshless method. The Newton-Raphson method was used to linearize the equation. The cost function is built from the infiltration. The optimization method is a meta-heuristic called the Modified hybrid Grey Wolf Optimizer-Genetic Algorithm (HmGWOGA). A test on experimental data was carried out. We compared the results with those of genetic algorithms. The results showed that this new method performed better than genetic algorithms.
In a High Voltage Direct Current (HVDC) transmission system, a rectifier station converts the AC electrical power into DC. After transmission, an inverter converts the DC electrical power back to AC. These converters can be located in one place as a back-to-back HVDC system, or electrical power can be transmitted from one converter station to another over a long distance via an overhead transmission line or an underground cable. HVDC systems serve as ideal supplements to existing AC power networks. The advantages of using HVDC systems include providing economical and more efficient transmission of electrical power over long distances, solving synchronism-related problems by connecting asynchronous networks or networks which operate at different frequencies, providing controlled power supply in either direction and offering access for onshore and offshore power generation from renewable energy sources.
Long-term load forecasting data is important for grid expansion and power system operation. It is also important for ensuring that generation capacity meets electricity demand at all times. In this paper, a Least-Square Support Vector Machine (LSSVM) is used to predict the long-term load demand. Four inputs are considered: peak load demand, ambient temperature, humidity and wind speed. Total load demand is set as the prediction output of the LSSVM. In order to improve the accuracy of the LSSVM, the Grey Wolf Optimizer (GWO) is hybridized with it to obtain the optimal LSSVM parameters; the resulting method is named GWO-LSSVM. Mean Absolute Percentage Error (MAPE) is used as the quantitative measure of the prediction model, and the objective of the optimization is to minimize the value of MAPE. The performance of GWO-LSSVM is compared with that of other methods, namely LSSVM and Ant Lion Optimizer – Least-Square Support Vector Machine (ALO-LSSVM). From the results obtained, it can be concluded that GWO-LSSVM provides a lower MAPE value, 0.13%, than the other methods.
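Since MAPE is the objective that the optimizer minimizes, a short sketch of its standard definition may be useful: the mean of the absolute relative errors, expressed in percent. The load values below are hypothetical and serve only to show the calculation; they are not taken from the study.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    relative_errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(relative_errors) / len(actual)


# Hypothetical load values (arbitrary units) for illustration only.
actual_load = [100.0, 200.0]
predicted_load = [110.0, 180.0]
error_percent = mape(actual_load, predicted_load)  # each prediction is off by 10%
```

In a GWO-LSSVM setting, a function of this kind would play the role of the fitness evaluated for each candidate set of LSSVM parameters.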
The Grey Wolf Optimizer (GWO) is a new algorithm proposed by Mirjalili et al. in 2014. This algorithm mimics the leadership hierarchy and hunting technique used by grey wolves to pursue their prey until it stops moving. Like other population-based meta-heuristic algorithms, GWO simulates natural behavior, here that of grey wolves in their social life when searching for food; they follow a hierarchical structure in the group (Fig. 1). The first level, representing the leaders of the group, is called alpha; the second level in the hierarchy is beta, which helps alpha to make decisions. The next levels are delta and omega, the lowest ranks in the group, which have to eat after all other levels. Grey wolves hunt in groups, in three main steps: chasing, encircling and attacking. The algorithm starts with a given number of wolves whose positions are randomly generated.
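The hierarchy and hunting steps described above can be sketched in a few lines. The following is a minimal illustration of the commonly used GWO update rule, in which each wolf moves towards the average of three positions guided by alpha, beta and delta, with an exploration parameter a that decreases linearly from 2 to 0. The function name, default settings, bound clipping and the sphere test function are assumptions made for this sketch, not details taken from the text.

```python
import random


def gwo_minimize(objective, dim, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimal Grey Wolf Optimizer sketch for minimisation (illustrative only)."""
    rng = random.Random(seed)
    low, high = bounds
    # The pack starts from randomly generated positions.
    wolves = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_fit = None, float("inf")
    history = []

    for t in range(n_iter):
        # Rank the pack: the three fittest wolves act as alpha, beta and delta.
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]
        fit = objective(leaders[0])
        if fit < best_fit:
            best_pos, best_fit = list(leaders[0]), fit
        history.append(best_fit)

        a = 2.0 * (1.0 - t / n_iter)  # exploration parameter, decays from 2 to 0
        for wolf in wolves:
            for d in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a   # |A| > 1 favours exploration
                    C = 2.0 * rng.random()
                    distance = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * distance
                # Average of the three leader-guided moves, clipped to the bounds.
                wolf[d] = min(high, max(low, estimate / 3.0))

    # Evaluate the pack once more after the final move.
    final = min(wolves, key=objective)
    if objective(final) < best_fit:
        best_pos, best_fit = list(final), objective(final)
    return best_pos, best_fit, history


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_pos, best_fit, history = gwo_minimize(sphere, dim=2, bounds=(-10.0, 10.0))
```

The remaining wolves (omega) carry no special role in the update; they simply follow the three leaders, which is how the hierarchy enters the algorithm.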
High-performance controllers are currently being developed for mobile robots in order to cope with the three main navigation control problems, i.e., path planning (or following), point stabilization and tracking control (or tracking a reference trajectory, further divided into local and global tracking problems). The control of Nonholonomic Wheeled Mobile Robots (NWMRs) has received much research interest during the past two decades because of the effects of nonholonomic constraints on the feasible control signals of this class of nonsmooth or nonholonomic systems. Representative approaches to tracking control are backstepping, adaptive, fuzzy, periodic and neural network control. Multi-robot Path Planning (PaPl) problems are generally solved by centralized and decentralized algorithms. Decentralized algorithms independently generate a collision-free path for each robot while avoiding possible inter-robot collisions, whereas centralized algorithms treat each robot as a subsystem, which enables global optimization.
The unpredictability of the gold price in recent years has attracted much attention from many parties, including commodities traders, mining companies, investors and academia. Governed by highly nonlinear features, the price of gold has experienced a rapid increase during the last several years and is predicted to remain steady in 2015. Nonetheless, accurately predicting the price of gold is a great challenge. Given the uncertainties of the world economy and the various factors involved, this challenge has opened the way for the academic community to explore new methods for predictive analysis.
In 2016, Solmaz Abbasi and Farshad Tajeripour introduced a method for automatic brain tumor detection in 3D images. First, preprocessing was performed using bias field correction and histogram matching. Next, the ‘Region of Interest’ was identified and separated from the background of the FLAIR image. Local binary patterns and histograms of oriented gradients were then employed as the learning features. The experimental results showed that this approach identified brain tumors better than the other techniques.
We adopt the Transformer model with the transformer_big setting as defined in (Vaswani et al., 2017) for Zh-En and En-Zh translations, which achieves SOTA translation quality on several other datasets. For En-De translation, we utilize the transformer_base_v1 setting. These settings are exactly the same as those used in the original paper, except that we set layer_prepostprocess_dropout for Zh-En and En-Zh translation to 0.05. The optimizer used for MLE training is Adam (Kingma and Ba, 2015) with an initial learning rate of 0.1, and we follow the same learning rate schedule as in (Vaswani et al., 2017). During training, roughly 4,096 source tokens and 4,096 target tokens are paired in one mini-batch. Each model is trained using 8 NVIDIA Tesla M40 GPUs. For RL training, the model is initialized with the parameters of the MLE model (trained with only bilingual data), and we continue training it with learning rate 0.0001. As in (Bahdanau et al., 2017), to calculate the BLEU reward, we start all n-gram counts from 1 instead of 0 and
open up a wide variety of services for vehicular networks, for example, acting as a drop point for messages on sparsely populated roads, serving up geographically relevant information, or serving as a gateway to the Internet. One of the most significant aspects that determine the success of VANET is reliable message routing from a source node to a destination node. Routing in VANET depends on the presence of a sufficient number of VANET nodes that establish robust paths to allow the forwarding of messages in the network. These paths can be affected by the vehicles' mobility and traffic density, with rapid topology changes making them unsustainable and unreliable. Therefore, the design of an efficient routing protocol for VANET is considered a critical issue. VANETs are a form of mobile ad hoc networks (MANETs). MANETs have no fixed infrastructure and instead rely on ordinary nodes to perform message routing and network management functions. Thus, a number of research challenges need to be addressed for inter-vehicular communications to be widely deployed. For instance, routing in traditional mobile ad hoc networks is a difficult task because of the network's dynamic topology changes. Numerous studies and proposals of routing protocols have been conducted to relay data in such a dynamic context; however, these solutions cannot be applied to the vehicular environment because of the specific constraints and characteristics of VANETs. Figure 1 represents the vehicular social networks and Figure 2 represents the architecture of VANETs.