#!/usr/bin/env python
# coding: utf-8
# # SYDE 556/750: Simulating Neurobiological Systems
#
#
# ## Lecture 10: Learning
# ## Learning
#
# - What do we mean by learning?
# - When we use an integrator to keep track of location, is that learning?
# - What about the learning used to complete a pattern in the Raven's Progressive Matrices task?
# - Neither of these require any connection weights to change in the model
# - But both allow future performance to be affected by past performance
# - I suggest the term 'adaptation' to capture all such future-affected-by-past phenomena
#
# - So, we'll stick with a simple definition of learning
# - Changing connection weights between groups of neurons
# - Why might we want to change connection weights?
# - This is what traditional neural network approaches do
# - Change connection weights until it performs the desired task
# - Once it's doing the task, stop changing the weights
# - But we have a method for just solving for the optimal connection weights
# - So why bother learning?
# ### Why learning might be useful
#
# - We might not know the function at the beginning of the task
# - Example: a creature explores its environment and learns that eating red objects is bad, but eating green objects is good
# - what are the inputs and outputs here?
# - The desired function might change
# - Example: an ensemble whose input is a desired hand position, but the output is the muscle tension (or joint angles) needed to get there
# - why would this change?
# - The optimal weights we solve for might not be optimal
# - How could they not be optimal?
# - What assumptions are we making?
# ### The simplest approach
#
# - What's the easiest way to deal with this, given what we know?
# - If we need new decoders
# - Let's solve for them while the model's running
# - Gather data to build up our $\Gamma$ and $\Upsilon$ matrices
#
# - Example: eating red but not green objects
# - Decoder from state to $Q$ value (utility of action) for eating
# - State is some high-dimensional vector that includes the colour of what we're looking for
# - And probably some other things, like whether it's small enough to be eaten
# - Initially doesn't use colour to get output
# - But we might experience a few bad outcomes after red, and good after green
# - These become new $x$ samples, with corresponding $f(x)$ outputs
# - Gather a few, recompute decoder
# - Could even do this after every timestep
# - Example: converting hand position to muscle commands
# - Send random signals to muscles
# - Observe hand position
# - Use that to train decoders
# - Example: going from optimal to even more optimal
# - As the model runs, we gather $x$ values
# - Recompute decoder for those $x$ values
#
# ### What's wrong with this approach
#
# - Feels like cheating
# - Why?
# - Two kinds of problems:
# - Not biologically realistic
# - How are neurons supposed to do all this?
# - store data
# - solve decoders
# - timing
# - Computationally expensive
# - Even if we're not worried about realism
# ## Traditional neural networks
#
# - Traditionally, learning is the main method of constructing a model network
# - Usually incremental learning (gradient descent)
# - As you get examples, shift the connection weights slightly based on that example
# - Don't have to consider all the data when making an update
# - Example: Perceptron learning (1957)
# - $\Delta w_j = \alpha(y_d - y)x_i$
#
#
#
# - Problems with perceptron
# - Can't do all possible functions
# - Effectively just linear functions of $x$ (with a threshold; i.e. a linear classifier)
# - Is that a problem (X)OR not?
#
#
# ## Backprop and the NEF
#
# - How are nonlinear functions included?
# - Multiple layers
#
#
#
# - But now a new rule is needed
# - Standard answer: backprop
# - Same as perceptron for first (output) layer
# - Backprop adds: Estimate correct "hidden layer" input, and repeat
#
# - What would this be in NEF terms?
# - Remember that we're already fine with linear decoding
# - Encoders (and $\alpha$ and $J^{bias}$) are input layer of weights, decoders are output layer
# - Note that in the NEF, we combine many of these layers together
# - We can just use the standard perceptron rule for decoders
# - As long as there are lots of neurons, and we've initialized them well with the desired intercepts, maximum rates, and encoders we should be able to decode lots of functions
# - So, what might backprop add to that?
# - Think about encoders
# ## Biologically realistic perceptron learning
#
# - [(MacNeil & Eliasmith, 2011)](http://compneuro.uwaterloo.ca/publications/macneil2011.html) derive a simple, plausible learning rule starting with a delta rule
# - $E = 1/2 \int (x-\hat{x})^2 dx$
# - $\delta E/\delta d_i = (x-\hat{x})a_i$ (as usual for finding decoders)
# - So, to move down the gradient:
# - $\Delta d_i = -\kappa (x - \hat{x})a_i$ (NEF notation)
# - $\Delta d_i = \kappa (y_d - y)a_i$ (the standard perceptron/delta rule)
#
# - How do we make it realistic?
# - Decoders don't exist in the brain
# - Need weights
# - The NEF tells us:
# - $\omega_{ij} = \alpha_j d_i \cdot e_j$
# - $\Delta \omega_{ij} = \alpha_j \kappa (y_d - y)a_i \cdot e_j$
# - Let's write $(y_d - y)$ as $E$ (for error)
# - $\Delta \omega_{ij} = \alpha_j \kappa a_i E \cdot e_j$
# - $\Delta \omega_{ij} = \kappa a_i (\alpha_j E \cdot e_j)$
# - What's $\alpha_j E \cdot e_j$?
# - That's the current that this neuron would get if it had $E$ as an input
# - But we don't want this current to drive the neuron
# - Rather, we want it to change the weight
# - It's a *modulatory* input
#
# - This is the "Prescribed Error Sensitivity" PES rule
# - Any model in the NEF could use this instead of computing decoders
# - Requires some other neural group computing the error $E$
# - Used in Spaun for Q-value learning (reinforcement task)
# - Can even be used to learn circular convolution
# - Only demonstrated up to 3 dimensions in [(Bekolay et al, 2013)](http://compneuro.uwaterloo.ca/publications/bekolay2013.html)
# - Why not more? Patience.
#
# - Is this realistic?
# - Local information only
# - Need an error signal
# - Does it look like anything like this happens in the brain?
# - Yes
# - Retinal slip error is computed in oculomotor system
# - Dopamine seems to act as prediction error
# - Weight changes proportional to pre-synaptic activity and post-synaptic activity (Hebbian rule)
# In[1]:
#From the learning examples in Nengo - a Communication Channel
get_ipython().run_line_magic('pylab', 'inline')
import nengo
from nengo.processes import WhiteSignal
model = nengo.Network('Learn a Communication Channel')
with model:
stim = nengo.Node(output=WhiteSignal(10, high=5, rms=0.5), size_out=2)
pre = nengo.Ensemble(60, dimensions=2)
post = nengo.Ensemble(60, dimensions=2)
nengo.Connection(stim, pre)
conn = nengo.Connection(pre, post, function=lambda x: np.random.random(2))
inp_p = nengo.Probe(stim)
pre_p = nengo.Probe(pre, synapse=0.01)
post_p = nengo.Probe(post, synapse=0.01)
sim = nengo.Simulator(model)
#sim.run(10.0)
# In[9]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'configs/pre_learn.py.cfg')
# In[3]:
t=sim.trange()
figure(figsize=(12, 8))
subplot(2, 1, 1)
plot(t, sim.data[inp_p].T[0], c='k', label='Input')
plot(t, sim.data[pre_p].T[0], c='b', label='Pre')
plot(t, sim.data[post_p].T[0], c='r', label='Post')
ylabel("Dimension 1")
legend(loc='best')
title('Random function computation')
subplot(2, 1, 2)
plot(t, sim.data[inp_p].T[1], c='k', label='Input')
plot(t, sim.data[pre_p].T[1], c='b', label='Pre')
plot(t, sim.data[post_p].T[1], c='r', label='Post')
ylabel("Dimension 2")
legend(loc='best');
# In[2]:
#Now learn
with model:
error = nengo.Ensemble(60, dimensions=2)
error_p = nengo.Probe(error, synapse=0.03)
# Error = actual - target = post - pre
nengo.Connection(post, error)
nengo.Connection(pre, error, transform=-1)
# Add the learning rule to the connection
conn.learning_rule_type = nengo.PES()
# Connect the error into the learning rule
learn_conn = nengo.Connection(error, conn.learning_rule)
sim = nengo.Simulator(model)
sim.run(10.0)
# In[3]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'configs/simple_learn.py.cfg')
# In[7]:
t=sim.trange()
figure(figsize=(12, 8))
subplot(3, 1, 1)
plot(t, sim.data[inp_p].T[0], c='k', label='Input')
plot(t, sim.data[pre_p].T[0], c='b', label='Pre')
plot(t, sim.data[post_p].T[0], c='r', label='Post')
ylabel("Dimension 1")
legend(loc='best')
title('Learn a communication channel')
subplot(3, 1, 2)
plot(t, sim.data[inp_p].T[1], c='k', label='Input')
plot(t, sim.data[pre_p].T[1], c='b', label='Pre')
plot(t, sim.data[post_p].T[1], c='r', label='Post')
ylabel("Dimension 2")
legend(loc='best');
subplot(3, 1, 3)
plot(sim.trange(), sim.data[error_p], c='b')
ylim(-1, 1)
legend(("Error[0]", "Error[1]"), loc='best');
title('Error')
# In[4]:
#Turning learning on and off to test generalization
def inhibit(t):
return 2.0 if t > 10.0 else 0.0
with model:
inhib = nengo.Node(inhibit)
inhib_conn = nengo.Connection(inhib, error.neurons, transform=[[-1]] * error.n_neurons)
sim = nengo.Simulator(model)
#sim.run(16.0)
# In[5]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'configs/control_learn.py.cfg')
# In[9]:
t=sim.trange()
figure(figsize=(12, 8))
subplot(3, 1, 1)
plot(t, sim.data[inp_p].T[0], c='k', label='Input')
plot(t, sim.data[pre_p].T[0], c='b', label='Pre')
plot(t, sim.data[post_p].T[0], c='r', label='Post')
ylabel("Dimension 1")
legend(loc='best')
title('Learn a communication channel')
subplot(3, 1, 2)
plot(t, sim.data[inp_p].T[1], c='k', label='Input')
plot(t, sim.data[pre_p].T[1], c='b', label='Pre')
plot(t, sim.data[post_p].T[1], c='r', label='Post')
ylabel("Dimension 2")
legend(loc='best');
subplot(3, 1, 3)
plot(sim.trange(), sim.data[error_p], c='b')
ylim(-1, 1)
legend(("Error[0]", "Error[1]"), loc='best');
title('Error')
# In[6]:
#Compute a nonlinear functions
#model.connections.remove(err_fcn) #uncomment to try other fcns
#del err_fcn
model.connections.remove(inhib_conn)
del inhib_conn
model.nodes.remove(inhib)
model.connections.remove(learn_conn)
del learn_conn
def nonlinear(x):
return x[0]*x[0], x[1]*x[1]
with model:
err_fcn = nengo.Connection(pre, error, function=nonlinear, transform=-1)
conn.learning_rule_type = nengo.PES(learning_rate=1e-4)
# Connect the error into the learning rule
learn_conn = nengo.Connection(error, conn.learning_rule)
sim = nengo.Simulator(model)
#sim.run(26.0)
# In[7]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'configs/square_learn.py.cfg')
# In[28]:
t=sim.trange()
figure(figsize=(12, 8))
subplot(3, 1, 1)
plot(t, sim.data[inp_p].T[0], c='k', label='Input')
plot(t, sim.data[pre_p].T[0], c='b', label='Pre')
plot(t, sim.data[post_p].T[0], c='r', label='Post')
ylabel("Dimension 1")
legend(loc='best')
title('Learn a nonlinear function')
subplot(3, 1, 2)
plot(t, sim.data[inp_p].T[1], c='k', label='Input')
plot(t, sim.data[pre_p].T[1], c='b', label='Pre')
plot(t, sim.data[post_p].T[1], c='r', label='Post')
ylabel("Dimension 2")
legend(loc='best');
subplot(3, 1, 3)
plot(sim.trange(), sim.data[error_p], c='b')
ylim(-1, 1)
legend(("Error[0]", "Error[1]"), loc='best');
title('Error')
# - This rule can be used to learn any nonlinear vector function
# - It does as well, or better, than the typical NEF decoder optimization
# - It's a 'spike-based' rule... meaning it works in a spiking network
# - It has been used for 'constant supervision' as well as 'reinforcement learning' (occasional supervision) tasks (Spaun uses it for the RL task)
# - It moves the focus of learning research from weight changes or 'learning rules' to error signals
# - Backprop is one way of propagating error signals (unfortunately not bio-plausible)
# - Pretty much ignores encoders (which should maybe be about capturing all the incoming information, so as to compute any function over that information... though they can be optimized for a given fcn as well)
# ## Applications of PES
# ### Classical conditioning
#
# - Classical or Pavlovian conditioning uses an unconditioned stimuli (US) (meat for a dog) that ellicits an unconditioned response (UR) (salivating) to cause a conditioned response (CR) (salivating after learning) to be ellicited by a conditioned stimulus (CS) (ringing a bell).
#
# - The best known model of this is the Rescorla-Wagner model that states:
#
# $\Delta V_x = \alpha (\lambda - \sum_x V)$
#
# where $V_x$ is the value of conditioned stimulus $x$, $\alpha$ is a learning rate and salience parameter, $\lambda$ is the max value (usually 1).
#
# - In the model below there is only 1 element in $\sum V$ (or you can assume there is little association between other stimuli and the US). The difference in brackets is like a reward prediction error.
# In this model:
# - There are three different US that are provided
# to the model, one after the other.
# - Each has a different hardwired
# UR.
# - There is also a CS provided (a different
# one for each US)
# - The model attempts to learn to trigger the correct
# CR in response to the CS.
# - After learning, the CR should start to respond *before* the corresponding UR.
# In[8]:
import nengo
import numpy as np
D = 3
N = D*50
def us_stim(t):
# cycle through the three US
t = t % 3
if 0.9 < t< 1: return [1, 0, 0]
if 1.9 < t< 2: return [0, 1, 0]
if 2.9 < t< 3: return [0, 0, 1]
return [0, 0, 0]
def cs_stim(t):
# cycle through the three CS
t = t % 3
if 0.7 < t< 1: return [0.7, 0, 0.5]
if 1.7 < t< 2: return [0.6, 0.7, 0.8]
if 2.7 < t< 3: return [0, 1, 0]
return [0, 0, 0]
model = nengo.Network(label="Classical Conditioning")
with model:
us_stim = nengo.Node(us_stim)
cs_stim = nengo.Node(cs_stim)
us = nengo.Ensemble(N, D)
cs = nengo.Ensemble(N*2, D*2)
nengo.Connection(us_stim, us[:D])
nengo.Connection(cs_stim, cs[:D])
nengo.Connection(cs[:D], cs[D:], synapse=0.2)
ur = nengo.Ensemble(N, D)
nengo.Connection(us, ur)
cr = nengo.Ensemble(N, D)
learn_conn = nengo.Connection(cs, cr, function=lambda x: [0]*D)
learn_conn.learning_rule_type = nengo.PES(learning_rate=3e-4)
error = nengo.Ensemble(N, D)
nengo.Connection(error, learn_conn.learning_rule)
nengo.Connection(ur, error, transform=-1)
nengo.Connection(cr, error, transform=1, synapse=0.1)
stop_learn = nengo.Node([0])
nengo.Connection(stop_learn, error.neurons, transform=-10*np.ones((N, 1)))
# In[9]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'configs/learning2-conditioning.py.cfg')
# ### Cortical Consolidation
#
# - There is evidence that when you first learn a skill, it takes a lot of effort and you tend to perform fairly slowly.
# - We would think of this as requiring a lot of intervention from the basal ganglia in selecting actions.
# - As you get better at the skill you become much faster, and BG is used less because cortex 'takes over' cental aspects of that skill, consolidating it into cortico-cortical connections.
# - The next model shows a toy version of this kind of behaviour.
#
#
# In this model:
# - there is a slow mapping from pre->wm->target (because of long synaptic time constants)
# - there is a fast, direct connection from pre->post
# - the fast connection is trained using the error signal from the slow system
# - the fast system learns to produce the correct output before the slow system
# - if you change the 'context' the fast system will learn the output
# - there is a more complete model that uses this kind of thing (but with BG) in [Aubin et al., 2016](http://compneuro.uwaterloo.ca/publications/aubin2016a.html)
# In[16]:
import nengo
import numpy as np
tau_slow = 0.2
model = nengo.Network("Cortical Consolidation")
with model:
pre_value = nengo.Node(lambda t: np.sin(t))
pre = nengo.Ensemble(100, 1)
post = nengo.Ensemble(100, 1)
target = nengo.Ensemble(100, 1)
nengo.Connection(pre_value, pre)
conn = nengo.Connection(pre, post, function=lambda x: np.random.random(),
learning_rule_type=nengo.PES())
wm = nengo.Ensemble(300, 2, radius=1.4)
context = nengo.Node(1)
nengo.Connection(context, wm[1])
nengo.Connection(pre, wm[0], synapse=tau_slow)
nengo.Connection(wm, target, synapse=tau_slow,
function=lambda x: x[0]*x[1])
error = nengo.Ensemble(n_neurons=100, dimensions=1)
nengo.Connection(post, error, synapse=tau_slow*2, transform=1) #Delay the fast connection so they line up
nengo.Connection(target, error, transform=-1)
nengo.Connection(error, conn.learning_rule)
stop_learn = nengo.Node([0])
nengo.Connection(stop_learn, error.neurons, transform=-10*np.ones((100,1)))
both = nengo.Node(None, size_in=2) #For plotting
nengo.Connection(post, both[0], synapse=None)
nengo.Connection(target, both[1], synapse=None)
# In[17]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'configs/learning3-consolidation.py.cfg')
# ### Reinforcement Learning - Do evaluations!!!! Brain Day!
#
# - As mentioned in the last lecture, RL is a useful way to think about action selection.
# - You have a set of actions and a set of states, and you figure out the value of each action in each state, letting you construct a big table $Q(s,a)$ which you can use to pick good actions.
# - RL figures out what those values are through trial and error. (This is SARSA.)
#
# $\Delta Q(s,a) = \alpha (R + \gamma Q_{predicted} - Q_{old})$ where $R$ is reward, $\alpha$ is a learning rate and $\gamma$ is a discount factor.
#
#
# In the model:
# - the agent has three actions (go forward, turn left, and turn right)
# - its only sense are five range finders (radar)
# - initially it should always go forward
# - it gets reward proportial to its forward speed, but a large negative reward for hitting walls
# - the error signal is simply the difference between the computed utility and the instantaneous reward
# - $\Delta Q(s,a) = \alpha (R - Q_{current})$
# - this error will only be applied to whatever action is currently being chosen, which means it cannot learn to do actions that will lead to *future* rewards
# In[ ]:
import grid
mymap="""
#########
# #
# #
# ## #
# ## #
# #
#########
"""
class Cell(grid.Cell):
def color(self):
return 'black' if self.wall else None
def load(self, char):
if char == '#':
self.wall = True
world = grid.World(Cell, map=mymap, directions=4)
body = grid.ContinuousAgent()
world.add(body, x=1, y=3, dir=2)
import nengo
import numpy as np
def move(t, x):
speed, rotation = x
dt = 0.001
max_speed = 20.0
max_rotate = 10.0
body.turn(rotation * dt * max_rotate)
success = body.go_forward(speed * dt * max_speed)
if not success: #Hit a wall
return -1
else:
return speed
model = nengo.Network("Simple RL", seed=2)
with model:
env = grid.GridNode(world, dt=0.005)
#set up node to project movement commands to
movement_node = nengo.Node(move, size_in=2, label='reward')
movement = nengo.Ensemble(n_neurons=100, dimensions=2, radius=1.4)
nengo.Connection(movement, movement_node)
def detect(t):
#put 5 sensors between -45 to 45 compared to facing direction
angles = (np.linspace(-0.5, 0.5, 5) + body.dir ) % world.directions
return [body.detect(d, max_distance=4)[0] for d in angles]
stim_radar = nengo.Node(detect)
#set up low fidelity sensors; noise might help exploration
radar = nengo.Ensemble(n_neurons=50, dimensions=5, radius=4)
nengo.Connection(stim_radar, radar)
#set up BG to allow 3 actions (left/fwd/right)
bg = nengo.networks.actionselection.BasalGanglia(3)
thal = nengo.networks.actionselection.Thalamus(3)
nengo.Connection(bg.output, thal.input)
#start with a kind of random selection process, but like going fwd most
def u_fwd(x):
return 0.8
def u_left(x):
return 0.6
def u_right(x):
return 0.7
conn_fwd = nengo.Connection(radar, bg.input[0], function=u_fwd, learning_rule_type=nengo.PES())
conn_left = nengo.Connection(radar, bg.input[1], function=u_left, learning_rule_type=nengo.PES())
conn_right = nengo.Connection(radar, bg.input[2], function=u_right, learning_rule_type=nengo.PES())
nengo.Connection(thal.output[0], movement, transform=[[1],[0]])
nengo.Connection(thal.output[1], movement, transform=[[0],[1]])
nengo.Connection(thal.output[2], movement, transform=[[0],[-1]])
errors = nengo.networks.EnsembleArray(n_neurons=50, n_ensembles=3)
nengo.Connection(movement_node, errors.input, transform=-np.ones((3,1)))
#inhibit learning for actions not currently chosen (recall BG is high for non-chosen actions)
nengo.Connection(bg.output[0], errors.ensembles[0].neurons, transform=np.ones((50,1))*4)
nengo.Connection(bg.output[1], errors.ensembles[1].neurons, transform=np.ones((50,1))*4)
nengo.Connection(bg.output[2], errors.ensembles[2].neurons, transform=np.ones((50,1))*4)
nengo.Connection(bg.input, errors.input, transform=1)
nengo.Connection(errors.ensembles[0], conn_fwd.learning_rule)
nengo.Connection(errors.ensembles[1], conn_left.learning_rule)
nengo.Connection(errors.ensembles[2], conn_right.learning_rule)
# In[ ]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'configs/learning5-utility.py.cfg')
# ### Better RL
#
# - To improve our RL it would be good to predict future rewards more accurately.
# - It would be good to learn the function $Q(s,a)$.
# - Let's assume that your policy is fixed, so future actions are fixed.
# - As well, future rewards are 90% as good as current rewards (i.e. they are discounted).
# - Consequently, we have:
#
# $Q(s,t) = R(s,t) + 0.9 R(s+1, t+1) + 0.9^2 R(s+2, t+2) + ...$.
#
# - So also,
#
# $Q(s+1,t+1) = R(s+1,t+1) + 0.9 R(s+2, t+2) + 0.9^2 R(s+3, t+3) + ...$.
# $0.9 Q(s+1,t+1) = 0.9 R(s+1,t+1) + 0.9^2 R(s+2, t+2) + 0.9^3 R(s+3, t+3) + ...$.
#
# - Substituting this last equation into the first gives
#
# $Q(s,t) = R(s,t) + 0.9 Q(s+1, t+1)$.
#
# - This suggests an error rule:
#
# $Error(t) = Q(s-1) - (R(s-1) + 0.9 Q(s))$
#
#
# In this model:
# - the agent always moves randomly, it's not *using* what
# it learns to change its movement (it is just trying to anticipate future rewards)
# - the agent is given a reward whenever it is in the green square, and a
# punishment (negative reward) whenever it is in the red square
# - it learns to anticipate the reward/punishment as shown in the value graph
# - we convert the error rule into the continuous domain by using a long time constant for s-1
# and a short time constant for s (assuming we switch states at each time step).
# In[1]:
import grid
mymap="""
#######
# #
# # # #
# # # #
#G R#
#######
"""
class Cell(grid.Cell):
def color(self):
if self.wall:
return 'black'
elif self.reward > 0:
return 'green'
elif self.reward < 0:
return 'red'
return None
def load(self, char):
self.reward = 0
if char == '#':
self.wall = True
if char == 'G':
self.reward = 10
elif char == 'R':
self.reward = -10
world = grid.World(Cell, map=mymap, directions=4)
body = grid.ContinuousAgent()
world.add(body, x=1, y=2, dir=2)
import nengo
import numpy as np
tau=0.1
def move(t, x):
speed, rotation = x
dt = 0.001
max_speed = 20.0
max_rotate = 10.0
body.turn(rotation * dt * max_rotate)
body.go_forward(speed * dt * max_speed)
if int(body.x) == 1:
world.grid[4][4].wall = True
world.grid[4][2].wall = False
if int(body.x) == 4:
world.grid[4][2].wall = True
world.grid[4][4].wall = False
model = nengo.Network("Predict Value", seed=2)
with model:
env = grid.GridNode(world, dt=0.005)
movement = nengo.Node(move, size_in=2)
def detect(t):
angles = (np.linspace(-0.5, 0.5, 3) + body.dir) % world.directions
return [body.detect(d, max_distance=4)[0] for d in angles]
stim_radar = nengo.Node(detect)
radar = nengo.Ensemble(n_neurons=50, dimensions=3, radius=4, seed=2,
noise=nengo.processes.WhiteSignal(10, 0.1, rms=1))
nengo.Connection(stim_radar, radar)
def braiten(x):
turn = x[2] - x[0]
spd = x[1] - 0.5
return spd, turn
nengo.Connection(radar, movement, function=braiten)
def position_func(t):
return body.x / world.width * 2 - 1, 1 - body.y/world.height * 2, body.dir / world.directions
position = nengo.Node(position_func)
state = nengo.Ensemble(100, 3)
nengo.Connection(position, state, synapse=None)
reward = nengo.Node(lambda t: body.cell.reward)
value = nengo.Ensemble(n_neurons=50, dimensions=1)
learn_conn = nengo.Connection(state, value, function=lambda x: 0,
learning_rule_type=nengo.PES(learning_rate=1e-4,
pre_tau=tau))
nengo.Connection(reward, learn_conn.learning_rule,
transform=-1, synapse=tau)
nengo.Connection(value, learn_conn.learning_rule,
transform=-0.9, synapse=0.01)
nengo.Connection(value, learn_conn.learning_rule,
transform=1, synapse=tau)
# In[2]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model, 'configs/learning6-value.py.cfg')
# ### Adaptive control
#
# In this example we again use the PES rule to learn an unknown function:
# - the function is part of a controller for a system controlling a pendulum
# - desired position is blue, actual position is black
# - there is gravity, which makes the actual position go too far
# - the learning rule determines how to supplement a standard PID controller to get the two to align
# - If we turn learning off at the start, we'll notice that the two won't align.
#
# The PES rule takes the control output and treats it as the error
# - in this case we don't have an explicit difference between two values
# - the PES rule effectively learns any constant error (like the I term does)
# - the learning population gets the system state, and learns the error as a function of that state
#
# In[7]:
import pendulum as pd
import nengo
import numpy as np
model = nengo.Network(seed=3)
with model:
env = pd.PendulumNode(seed=1, mass=4, max_torque=100)
desired = nengo.Node(lambda t: np.sin(t*np.pi))
nengo.Connection(desired, env[1], synapse=None)
pid = pd.PIDNode(dimensions=1, Kp=1, Kd=0.2, Ki=0)
nengo.Connection(pid, env[0], synapse=None)
nengo.Connection(desired, pid[0], synapse=None, transform=1)
nengo.Connection(env[0], pid[1], synapse=0, transform=1)
nengo.Connection(env[3], pid[3], synapse=0, transform=1)
nengo.Connection(desired, pid[2], synapse=None, transform=1000)
nengo.Connection(desired, pid[2], synapse=0, transform=-1000)
state = nengo.Ensemble(n_neurons=1000, dimensions=1,
radius=1.5,
#neuron_type=nengo.LIFRate(),
)
nengo.Connection(env[0], state, synapse=None)
c = nengo.Connection(state, env[0], synapse=0,
function=lambda x: 0,
learning_rule_type=nengo.PES(learning_rate=1e-5))
stop_learning = nengo.Node(0)
error = nengo.Node(lambda t, x: x[0] if x[1]<0.5 else 0, size_in=2)
nengo.Connection(pid, error[0], synapse=None, transform=-1)
nengo.Connection(stop_learning, error[1], synapse=None)
nengo.Connection(error, c.learning_rule, synapse=None)
# In[8]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model, 'configs/pendulum.py.cfg')
# ### Unsupervised learning
#
# - Hebbian learning
# - Neurons that fire together, wire together
# - $\Delta \omega_{ij} = \kappa a_i a_j$
# - Just that would be unstable
# - Why?
#
#
# - BCM rule (Bienenstock, Cooper, & Munro, 1982)
# - $\Delta \omega_{ij} = \kappa a_i a_j (a_j-\theta)$
# - $\theta$ is an activity threshold
# - If post-synaptic neuron is more active than this threshold, increase strength
# - Otherwise decrease it
# - Other than that, it's a standard Hebbian rule
# - Where would we get $\theta$?
# - Need to store something about the overall recent activity of neuron $j$ so it can be compared to its current activity
# - Just have $\theta$ be a psc-filtered spiking of $a_j$
# In[31]:
get_ipython().run_line_magic('pylab', 'inline')
import nengo
model = nengo.Network()
with model:
sin = nengo.Node(lambda t: np.sin(t*4))
pre = nengo.Ensemble(100, dimensions=1)
post = nengo.Ensemble(100, dimensions=1)
nengo.Connection(sin, pre)
conn = nengo.Connection(pre, post, solver=nengo.solvers.LstsqL2(weights=True))
pre_p = nengo.Probe(pre, synapse=0.01)
post_p = nengo.Probe(post, synapse=0.01)
sim = nengo.Simulator(model)
sim.run(2.0)
# In[32]:
plot(sim.trange(), sim.data[pre_p], label="Pre")
plot(sim.trange(), sim.data[post_p], label="Post")
ylabel("Decoded value")
legend(loc="best");
# In[33]:
conn.learning_rule_type = nengo.BCM(learning_rate=5e-10)
with model:
trans_p = nengo.Probe(conn, 'weights', synapse=0.01, sample_every=0.01)
sim = nengo.Simulator(model)
sim.run(20.0)
# In[39]:
figure(figsize=(12, 8))
subplot(2, 1, 1)
plot(sim.trange(), sim.data[pre_p], label="Pre")
plot(sim.trange(), sim.data[post_p], label="Post")
ylabel("Decoded value")
ylim(-1.6, 1.6)
legend(loc="lower left")
subplot(2, 1, 2)
# Find weight row with max variance
neuron = np.argmax(np.mean(np.var(sim.data[trans_p], axis=0), axis=1))
plot(sim.trange(dt=0.01), sim.data[trans_p][..., neuron])
ylabel("Connection weight");
# In[48]:
def sparsity_measure(vector):
# Max sparsity = 1 (single 1 in the vector)
v = np.sort(np.abs(vector))
n = v.shape[0]
k = np.arange(n) + 1
l1norm = np.sum(v)
summation = np.sum((v / l1norm) * ((n - k + 0.5) / n))
return 1 - 2 * summation
print("Starting sparsity: {0}".format(sparsity_measure(sim.data[trans_p][0])))
print("Ending sparsity: {0}".format(sparsity_measure(sim.data[trans_p][-1])))
# - Result: only a few neurons will fire
# - Sparsification
# - What would this do in NEF terms?
# - Still represent $x$, but with very sparse encoders (assuming the function doesn't change)
# - This is still a rule on the weight matrix, but functionally seems to be more about encoders than decoders
# - What could we do, given that?
# ## The homeostatic Prescribed Error Sensitivity (hPES) rule
#
# - Just do them both [(Bekolay et al., 2013)](http://compneuro.uwaterloo.ca/publications/bekolay2013.html)
# - And have a parameter $S$ to adjust how much of each
#
# $\Delta \omega_{ij} = \kappa a_i (\alpha_j S e_j \cdot E + (1-S) a_j (a_j-\theta))$
#
# - Works as well (or better) than PES
# - Seems to be a bit more stable, but analysis is ongoing
# - Biological evidence?
# - Spike-Timing Dependent Plasticity
#
#
#
#
#
# - Still work to do for comparison, but seems promising
# - Error-driven for improving decoders
# - Hebbian sparsification to improve encoders
# - Perhaps to sparsify connections (energy savings in the brain, but not necessarily in simulation)
#
#
# In[ ]: