Accelerating the algorithm - Out of equilibrium Statistical Physics of learning

Although the number of MC steps is drastically reduced with respect to standard SA and scales well with the size N, the computational time required by the EdMC scheme can still be very high, because the convergence of the BP algorithm has to be awaited at each move proposal. We can therefore introduce some heuristic modifications that greatly boost the speed of the algorithm.

Fig. 5.6 Comparison between different Monte Carlo-based solver algorithms for one sample with N = 501, α = 1.2, L = 4 and f = 0.1. The curves show, in log-log scale, the number of errors of the system as a function of the number of iterations (note that while the number of errors is used as the energy throughout the rest of the paper, none of the algorithms shown here uses it as its objective function). The curves shown are labeled from worst to best: simulated annealing on E_Δ (gray curve, see equation (5.12), more than 10^6 iterations required to find a solution); EdMC starting from a random initial condition with zero-temperature dynamics (red curve, fewer than 10^4 iterations); EdMC using BP marginals as initial condition with zero-temperature dynamics (blue curve, fewer than 10^3 iterations); EdMC using BP marginals both as initial condition and to propose the Monte Carlo moves (green curve, fewer than 10^2 iterations). The local-entropy landscape is clearly

Instead of starting from a random configuration, which could raise numerical issues related to BP convergence (especially at high α), a good starting point can be found by using the information in the BP Replica Symmetric fixed point (which can be easily reached, in the absence of external fields, for all α < α_c): the initial configuration can thus be chosen as W̃ = sign(m_RS) in the binary Perceptron. In the Potts Perceptron case, one can instead assign to W̃_i the argmax of the RS state marginal probabilities u_i(W).
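The initialization rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the RS magnetizations and marginals are assumed to be already available as plain lists, and the tie-breaking choice for a vanishing magnetization is an assumption not specified in the text.

```python
def init_binary(magnetizations):
    """Binary Perceptron: reference synapse = sign of the RS magnetization.

    Assumption: a magnetization exactly equal to zero is mapped to +1.
    """
    return [1 if m >= 0 else -1 for m in magnetizations]


def init_potts(marginals):
    """Potts Perceptron: pick, for each synapse, the state with largest marginal.

    marginals[i][w] is assumed to hold the RS marginal u_i(w) of state w.
    Ties are resolved in favor of the lowest state index.
    """
    return [max(range(len(u)), key=u.__getitem__) for u in marginals]
```

For instance, `init_binary([0.3, -0.7])` gives `[1, -1]`, and `init_potts([[0.1, 0.7, 0.2]])` gives `[1]`.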

Moreover, it is also possible to use the BP fixed point messages to propose efficient MC moves, rather than performing them at random. The idea is that an extensive number of synaptic flips (in the direction of maximum free energy) can be done all at once in a single step. Each time BP reaches convergence, one can identify the set of synapses whose cavity marginals m_i, in the absence of the external field that plants the reference configuration, are not in agreement with the direction W̃_i of the field itself.

Once this set of synaptic indices is ranked, so that those associated with the largest difference between the external and the cavity fields come first, we can propose a collective move changing all the identified synapses and compute the new value of F. As in a normal Metropolis algorithm, the move is always accepted if there is an increase in the free energy, or with probability e^{yΔF} otherwise. If the collective flip is accepted, the heuristic procedure can move on, computing a new set of synaptic variables to be changed with the help of BP marginals. If, on the contrary, the collective move is rejected, the new proposal is made on the reduced set from which the last-ranked variable has been removed; this procedure can be repeated until the set is empty, at which point one goes back to the standard EdMC scheme. We observed that most of these collective moves are immediately accepted. The physical interpretation is that, in this way, one tries to maximize, at each step, the local contributions to the Bethe free energy associated with each variable W_i, in the presence of an external field γW̃_i.
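The collective move described above can be sketched as follows for the binary case. This is a minimal illustration under stated assumptions, not the thesis implementation: `free_energy` is a hypothetical callable standing in for "run BP with external fields γW̃ and return F", the fields are passed in as a plain list, and the selection rule Δ_i = W̃_i(γW̃_i − h_i) > 0 is the one written in Algorithm 1.

```python
import math
import random


def rank_candidates(x_ref, fields, gamma):
    """Indices with Delta_i > 0, sorted by decreasing Delta_i."""
    deltas = [x * (gamma * x - h) for x, h in zip(x_ref, fields)]
    candidates = [i for i, d in enumerate(deltas) if d > 0]
    candidates.sort(key=lambda i: deltas[i], reverse=True)
    return candidates


def collective_move(x_ref, fields, gamma, y, free_energy, rng=random):
    """Try to flip all ranked candidates at once, shrinking the set on rejection.

    Returns (configuration, accepted). If every proposal is rejected, the
    original configuration is returned with accepted == False, and the caller
    is expected to fall back to standard single-flip EdMC.
    """
    f_old = free_energy(x_ref)
    candidates = rank_candidates(x_ref, fields, gamma)
    while candidates:
        proposal = list(x_ref)
        for i in candidates:
            proposal[i] = -proposal[i]
        delta_f = free_energy(proposal) - f_old
        # Metropolis rule on the free energy: always accept an increase,
        # otherwise accept with probability exp(y * delta_f).
        if delta_f >= 0 or rng.random() < math.exp(y * delta_f):
            return proposal, True
        candidates.pop()  # drop the last-ranked variable and retry
    return list(x_ref), False
```

Each pass through the loop costs one call to `free_energy`, that is, one BP run in the real algorithm, which is why it matters that most collective moves are accepted at the first attempt.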

Finally, in case BP is unable to reach convergence, one can still obtain some information from F̄, obtained from the time average of the cavity marginals (over a few BP iterations), which can be useful for bootstrapping the EdMC to a region where convergence can be obtained more easily.

Algorithm 1

Input: problem sample; parameters t_max, t_step, y, γ, f_y and f_γ
Randomly initialize x̃_i^0. Alternatively, run BP with γ = 0 and set x̃_i^0 = sign(h_i)
Run BP with external fields γx̃_i^0
Compute free energy F^0 from BP fixed point (F̄^0 if BP does not converge)
t ← 0
while t ≤ t_max do
    Retrieve fields h_i^t (h̄_i^t if BP did not converge)
    for i = 1 to N do
        Δ_i ← x̃_i^t (γx̃_i^t − h_i^t)
    end
    Collect V = {i | Δ_i > 0} and sort it in descending order of Δ_i
    accepted ← FALSE
    while NOT accepted do
        Propose a flip of the x̃_i^t for all i ∈ V, producing x̃^{t+1}
        Run BP with new proposed external fields γx̃_i^{t+1}
        Compute free energy F^{t+1} from BP fixed point (F̄^{t+1} if BP does not converge)
        with probability e^{y(F^{t+1} − F^t)} do accepted ← TRUE
        if NOT accepted then
            Remove the last element from V
            if |V| = 0 then exit and run EdMC with x̃^t as initial configuration
        end
    end
    t ← t + 1
    Compute energy E of configuration x̃^t
    if E = 0 then retrieve solution x̃* = x̃^t and exit
    if t ≡ 0 (mod t_step) then
        Annealing: y ← y × f_y
        Scoping: γ ← γ × f_γ (run BP and update F^t)
    end
end
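The time-averaging fallback for a non-converging BP run can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: `bp_step` is a hypothetical callable that performs one BP sweep and returns the updated message state together with the current marginals, and the number of sweeps to average over is left to the caller. Computing F̄ from the averaged marginals is outside the scope of the sketch.

```python
def time_averaged_marginals(bp_step, state, n_iters):
    """Average the marginals over n_iters consecutive BP sweeps.

    bp_step(state) is assumed to return (new_state, marginals), where
    marginals is a list of floats of fixed length.
    """
    if n_iters < 1:
        raise ValueError("n_iters must be at least 1")
    total = None
    for _ in range(n_iters):
        state, marginals = bp_step(state)
        if total is None:
            total = list(marginals)
        else:
            total = [a + b for a, b in zip(total, marginals)]
    return [s / n_iters for s in total]
```

With messages that oscillate between two values, the average smooths the oscillation out and still gives a usable, if approximate, direction for each variable.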

With these heuristic modifications, the EdMC algorithm turns out to be extremely fast and capable of solving hard instances, achieving a higher algorithmic threshold α_U. The resulting algorithm is detailed in Algorithm 1.
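The annealing and scoping steps of Algorithm 1 amount to a geometric schedule on y and γ, applied every t_step iterations. A minimal Python sketch, with a hypothetical function name and with the BP rerun that follows a change of γ left to the caller:

```python
def update_schedule(t, y, gamma, t_step, f_y, f_gamma):
    """Multiply y by f_y and gamma by f_gamma whenever t is a multiple of t_step.

    After gamma changes, the caller is expected to rerun BP and refresh
    the stored free energy, as indicated in Algorithm 1.
    """
    if t % t_step == 0:
        return y * f_y, gamma * f_gamma
    return y, gamma
```

For example, with t_step = 10, f_y = 1.5 and f_γ = 2.0, the pair (y, γ) = (2.0, 0.5) becomes (3.0, 1.0) at t = 10 and is left unchanged at t = 11.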

5.6 Generalization to multi-layer continuous
