#!/usr/bin/env python
# coding: utf-8

# # Learning a sensorimotor model with a sensorimotor context
# 
# In this notebook, we will see how to use the Explauto library to learn and control local actions that depend on a sensory and motor context. We suppose that the reader is familiar with the main components of the Explauto library explained in another notebook ([full tutorial](http://nbviewer.ipython.org/github/flowersteam/explauto/blob/master/notebook/full_tutorial.ipynb)): the environment, the sensorimotor model and the interest model.
# 
# Another [tutorial](http://nbviewer.ipython.org/github/flowersteam/explauto/blob/master/notebook/learning_with_environment_context.ipynb) describes how to define (non-local) actions that only depend on a context provided by the environment.
# 
# Let's suppose we are in a motor state $m$ and a sensory state $s$.
# If the goal is to reach a sensory state $s'$ from $s$,
# a *local* action $\Delta m$ has to be found, and $m + \Delta m$ will be evaluated in the environment.
# The result of the command in the *environment* is defined as $(s, \Delta s)$, with $\Delta s = s' - s$.
# Section I will show how to define an environment suited to the control of local actions, starting from a usual Explauto environment.
# 
# In Section II we explain how the *sensorimotor model* is adapted to store and learn from tuples of $(m, \Delta m, s, \Delta s)$ instances.
# We will explain different possibilities to query the sensorimotor model.
# 
# To predict the result of an action in the environment, we can use a forward prediction of $\Delta s$ given $(m, \Delta m, s)$.
# To infer the motor command that should best reach a sensory state $s'$, we can query $\Delta m$ from the sensorimotor model given $(m, s, \Delta s = s' - s)$. This use case is explained in Section V.
# 
# Another use case is to query $(m, \Delta m)$ from $(s, \Delta s)$.
# This can be used when the robot is allowed to choose both the starting and end position of a movement $m \rightarrow m + \Delta m$ at each iteration of the learning algorithm. This use case is explained in Section IV.
# 
# An *interest model* is also used to estimate how useful a given action is for learning, and to sample the best ones.
# In our case, if we are in a state $(m, s)$, the local goal to be sampled is defined as $\Delta s$, and depends on $s$. In Section III we show how the interest models are adapted to this purpose.
# 
# In Sections IV and V, we explain how to automatically create the environment, sensorimotor model, interest model and the learning procedure adapted to local actions with the class 'Experiment'.

# # I. Environment
# 
# In this section we define an environment suited to the control of local actions, starting from a usual Explauto environment.
# 
# We will use the available SimpleArm environment (by default perturbed by a small random noise).

# In[1]:

from __future__ import print_function

from explauto.environment.simple_arm import SimpleArmEnvironment
from explauto.environment import environments

env_cls = SimpleArmEnvironment
env_conf = environments['simple_arm'][1]['low_dimensional']


# Now we use the class 'ContextEnvironment' to convert an Explauto environment that takes a motor position as input and outputs a sensory position into an environment that takes a motor command $\Delta m$ or $(m, \Delta m)$ and outputs the pair $(s, \Delta s)$.
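# Before instantiating it, here is a minimal sketch of the delta-action bookkeeping that such a wrapper performs: keep track of the current motor and sensory state, execute $m + \Delta m$, and return $(s, \Delta s)$ with $\Delta s = s' - s$. The function 'toy_forward' below is a hypothetical stand-in for an underlying environment, not part of Explauto.

# In[ ]:

import numpy as np

def toy_forward(m):
    # Hypothetical forward model: hand position of a 2-joint arm with unit-length segments.
    return np.array([np.cos(m[0]) + np.cos(m[0] + m[1]),
                     np.sin(m[0]) + np.sin(m[0] + m[1])])

m = np.zeros(2)                # current motor state
s = toy_forward(m)             # current sensory state
dm = np.array([0.1, -0.05])    # local motor action
s_new = toy_forward(m + dm)    # execute m + dm in the environment
ds = s_new - s                 # observed sensory change
print("(s, ds) =", s, ds)      # what the wrapped environment returns for the command (m, dm)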
# To instantiate such an environment, one must provide the class and configuration of the underlying environment, and define a simple config for local actions called 'context_mode'.
# The 'choose_m' parameter defines whether the robot is allowed to choose $m$ and $\Delta m$ at each iteration, instead of only $\Delta m$.
# A rest position also has to be specified, as well as the bounds for the delta motor actions and the delta sensory goals.

# In[2]:

from explauto.environment.context_environment import ContextEnvironment

context_mode = dict(mode='mdmsds',
                    choose_m=False,
                    rest_position=[0]*3,
                    dm_bounds=[[-0.2, -0.2, -0.2], [0.2, 0.2, 0.2]],
                    ds_bounds=[[-0.2, -0.2], [0.2, 0.2]])

environment = ContextEnvironment(env_cls, env_conf, context_mode)


# Here we sample and execute a few $\Delta m$ actions.

# In[3]:

# Create the axes for plotting:
get_ipython().run_line_magic('pylab', 'inline')
ax = axes()

for dm in environment.random_dm(n=10):
    m = environment.current_motor_position
    mdm = np.hstack((m, dm))
    environment.update(mdm, reset=False)
    environment.plot(ax)


# # II. Sensorimotor model
# 
# In this section we show how to store the motor and sensory signals in the database and how to predict the result of an action. The inference of a motor action given a sensory goal is detailed later in Sections IV and V.
# The adapted sensorimotor models are 'NN', 'LWLR-BFGS' and 'LWLR-CMAES', although CMAES' exploration sigma and bounds might need to be adapted.
# The database contains tuples of $(m, \Delta m, s, \Delta s)$, so we create the sensorimotor model with the dimensions and bounds of the environment.

# In[4]:

from explauto import SensorimotorModel

sm_model = SensorimotorModel.from_configuration(environment.conf, 'nearest_neighbor', 'default')


# In the following we randomly draw delta actions and update the sensorimotor model and the environment.

# In[5]:

# Create the axes for plotting:
get_ipython().run_line_magic('pylab', 'inline')
ax = axes()

for dm in environment.random_dm(n=1000):
    m = environment.current_motor_position
    mdm = np.hstack((m, dm))
    sds = environment.update(mdm, reset=False)
    sm_model.update(mdm, sds)
    environment.plot(ax, alpha=0.3)

print("Size of database:", sm_model.size())


# Now we can query the sensorimotor model given $(m, \Delta m)$ and the context on given dimensions.
# Let's say we want the hand (x, y) position to be considered as the context (but in more complex setups it could be the position of some objects in the environment).
# 
# In the plot, the black dot and the red x are the predicted $s$ and $s'$ given the motor position $m$ and delta $\Delta m$, in the context $c$.
# The corresponding reached arm positions are also represented.

# In[6]:

# Predict with sensory context
m = environment.current_motor_position
s = environment.current_sensori_position
dm = [0.1]*3

context = s      # the context is the current hand position
c_dims = [0, 1]  # hand dimensions

sds = sm_model.predict_given_context(np.hstack((m, dm)), context, c_dims)
s = sds[0:2]
ds = sds[2:4]
print("Predicted s=", s, "predicted ds=", ds)

ax = axes()
environment.plot(ax)
environment.update(np.hstack((m, dm)), reset=False)
environment.plot(ax, color='red')
ax.plot(*s, marker='o', color='k')
ax.plot(*list(np.array(s)+np.array(ds)), marker='x', color='red')
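# As a quick sanity check, we can compare the predicted end-point $s + \Delta s$ with the hand position actually reached after executing $(m, \Delta m)$ above. This cell is an addition to the original notebook and only reuses attributes already introduced in it.

# In[ ]:

s_predicted = np.array(s) + np.array(ds)                    # predicted end-point of the local action
s_reached = np.array(environment.current_sensori_position)  # hand position actually reached
print("Predicted end-point:", s_predicted)
print("Reached end-point:  ", s_reached)
print("Prediction error:", np.linalg.norm(s_predicted - s_reached))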
# # III. Goal babbling using interest models
# 
# In this section, we create an *interest model* that, given a context $s$, samples an interesting delta goal $\Delta s$ on the dimensions that are not part of the context.
# This feature is implemented in the Random and Discretized interest models.

# In[7]:

# Random interest model
from explauto.interest_model.random import RandomInterest

im_model = RandomInterest(environment.conf, environment.conf.s_dims)


# In[8]:

# Discretized interest model
from explauto.interest_model.discrete_progress import DiscretizedProgress, competence_dist

im_model = DiscretizedProgress(environment.conf, environment.conf.s_dims,
                               **{'x_card': 1000,
                                  'win_size': 10,
                                  'measure': competence_dist,
                                  'eps_random': 0.1})


# Sampling with context:

# In[9]:

c = [0.7, 0.6]   # context
c_dims = [0, 1]  # hand position's dimensions
ds = im_model.sample_given_context(c, c_dims)
print("Sampling interesting goal with hand position=", c, ": ds=", ds)


# # IV. Learning while choosing m
# 
# In this section, we consider that the agent can choose the motor position $m$ at each iteration (parameter 'choose_m'=True).
# 
# Here we run the whole procedure without resetting the arm to its rest position during the experiment, first using motor babbling and then goal babbling.
# 
# We also describe how to automatically create the environment, sensorimotor model, interest model and the learning procedure.

# In[10]:

context_mode = dict(mode='mdmsds',
                    choose_m=True,
                    rest_position=[0]*3,
                    dm_bounds=[[-0.2, -0.2, -0.2], [0.2, 0.2, 0.2]],
                    ds_bounds=[[-0.2, -0.2], [0.2, 0.2]])

environment = ContextEnvironment(env_cls, env_conf, context_mode)


# ### Motor Babbling

# In[11]:

# Random Motor Babbling
ax = axes()
environment.reset()
motor_configurations = environment.random_dm(n=500)

# Executing and plotting the 500 random delta motor commands:
for dm in motor_configurations:
    m = environment.current_motor_position
    environment.update(np.hstack((m, dm)), reset=False)
    environment.plot(ax)


# ### Goal Babbling

# In[12]:

# Random Goal Babbling
im_model = RandomInterest(environment.conf, environment.conf.s_dims)

# Reset environment
environment.reset()

# Reset sensorimotor model
sm_model = SensorimotorModel.from_configuration(environment.conf, 'nearest_neighbor', 'default')

c_dims = [0, 1]  # hand position's dimensions

# Add one point to bootstrap the sensorimotor model
sm_model.update([0.]*6, np.hstack((environment.current_sensori_position, [0., 0.])))

ax = axes()
for _ in range(500):
    # Get current context
    s = environment.current_sensori_position
    # sample a random sensory goal using the interest model:
    ds_g = im_model.sample_given_context(s, c_dims)
    # infer a motor command to reach that goal using the sensorimotor model:
    mdm = sm_model.inverse_prediction(np.hstack((s, ds_g)))
    # execute this command and observe the corresponding sensory effect:
    sds = environment.update(mdm, reset=False)
    # update the sensorimotor model:
    sm_model.update(mdm, sds)
    # update the interest model:
    im_model.update(np.hstack((mdm, s, ds_g)), np.hstack((mdm, sds)))
    # plot arm
    environment.plot(ax, alpha=0.3)
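# Just to visualize the goal space, we can draw a batch of delta goals from the interest model for the current hand position. With RandomInterest the samples are simply uniform within 'ds_bounds', so this cell (an addition to the original notebook) mostly illustrates the bounds of the sampled $\Delta s$ goals.

# In[ ]:

s = environment.current_sensori_position
samples = np.array([im_model.sample_given_context(s, c_dims) for _ in range(200)])

ax = axes()
ax.scatter(samples[:, 0], samples[:, 1], alpha=0.5)
ax.set_xlabel(r'$\Delta s_x$')
ax.set_ylabel(r'$\Delta s_y$')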
# Here we test the learned sensorimotor model on a given goal ds_goal in sensory context s_goal. The agent also chooses the starting $m$ position.
# 
# In the plot, the black dot and the red x are the goal $s$ and the goal $s + \Delta s$.
# The corresponding reached arm positions are represented.

# In[13]:

# Inverse without imposed context: (M, dM) <- i(S, dS)
sm_model.mode = "exploit"  # no exploration noise
print("Size of database:", sm_model.size())

s_goal = [0.8, 0.5]
ds_goal = [-0.1, 0.1]
mdm = sm_model.inverse_prediction(s_goal + ds_goal)
m = mdm[0:3]
dm = mdm[3:6]
print("Inverse without context: m =", m, "dm =", dm)

ax = axes()
environment.update(np.hstack((m, [0]*3)))
environment.plot(ax)
environment.update(np.hstack((m, dm)), reset=False)
environment.plot(ax, color='red')
ax.plot(*s_goal, marker='o', color='k')
ax.plot(*list(np.array(s_goal)+np.array(ds_goal)), marker='x', color='red')


# ## Using 'Experiment'

# In[14]:

import numpy as np

from explauto import Agent
from explauto import Experiment
from explauto.utils import rand_bounds
from explauto.experiment import make_settings

get_ipython().run_line_magic('pylab', 'inline')

context_mode = dict(mode='mdmsds',
                    choose_m=True,
                    rest_position=[0]*3,
                    dm_bounds=[[-0.2, -0.2, -0.2], [0.2, 0.2, 0.2]],
                    ds_bounds=[[-0.2, -0.2], [0.2, 0.2]])

goal_babbling = make_settings(environment='simple_arm',
                              environment_config='low_dimensional',
                              babbling_mode='goal',
                              interest_model='discretized_progress',
                              sensorimotor_model='nearest_neighbor',
                              context_mode=context_mode)

expe = Experiment.from_settings(goal_babbling)

# Evaluate at the given timesteps, on 50 random (s, ds) testcases drawn within these bounds:
expe.evaluate_at([50, 100, 150, 200, 500],
                 rand_bounds(np.vstack(([0.8, -0.1, -0.1, -0.2], [1., 0.1, 0.1, 0.2])), n=50))

expe.run()

ax = axes()
expe.log.plot_learning_curve(ax)
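# For comparison, the same settings can in principle be run with random motor babbling instead of goal babbling, and both learning curves plotted on the same axes. This comparison is an addition to the original notebook; it assumes that babbling_mode='motor' is also supported together with a 'context_mode'.

# In[ ]:

motor_babbling = make_settings(environment='simple_arm',
                               environment_config='low_dimensional',
                               babbling_mode='motor',
                               interest_model='random',
                               sensorimotor_model='nearest_neighbor',
                               context_mode=context_mode)

expe_mb = Experiment.from_settings(motor_babbling)
expe_mb.evaluate_at([50, 100, 150, 200, 500],
                    rand_bounds(np.vstack(([0.8, -0.1, -0.1, -0.2], [1., 0.1, 0.1, 0.2])), n=50))
expe_mb.run()

# Overlay the two learning curves (goal babbling from the previous cell, motor babbling here):
ax = axes()
expe.log.plot_learning_curve(ax)
expe_mb.log.plot_learning_curve(ax)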
# # V. Learning without choosing m
# 
# In this section, we consider that the agent can't choose the motor position $m$ at each iteration (parameter 'choose_m'=False). In that case, the environment can be reset to the rest position every N iterations if the parameter 'reset_iterations' is provided in 'context_mode'.
# 
# Here we run the whole procedure, first using motor babbling and then goal babbling.
# 
# We also describe how to automatically create the environment, sensorimotor model, interest model and the learning procedure.

# In[15]:

from explauto.environment.context_environment import ContextEnvironment
from explauto.environment.simple_arm import SimpleArmEnvironment
from explauto.environment import environments

env_cls = SimpleArmEnvironment
env_conf = environments['simple_arm'][1]['low_dimensional']

context_mode = dict(mode='mdmsds',
                    choose_m=False,
                    rest_position=[0]*3,
                    reset_iterations=20,
                    dm_bounds=[[-0.2, -0.2, -0.2], [0.2, 0.2, 0.2]],
                    ds_bounds=[[-0.2, -0.2], [0.2, 0.2]])

environment = ContextEnvironment(env_cls, env_conf, context_mode)


# ### Motor Babbling

# In[16]:

# Random Motor Babbling
ax = axes()
environment.reset()
motor_configurations = environment.random_dm(n=500)

for dm in motor_configurations:
    m = list(environment.current_motor_position)
    environment.update(np.hstack((m, dm)), reset=False)
    environment.plot(ax, alpha=0.3)


# ### Goal Babbling

# In[17]:

ax = axes()

# Random Goal Babbling
im_model = RandomInterest(environment.conf, environment.conf.s_dims)

# Reset environment
environment.reset()

# Reset sensorimotor model
sm_model = SensorimotorModel.from_configuration(environment.conf, 'nearest_neighbor', 'default')

# Add points to bootstrap the sensorimotor model
for i in range(10):
    sm_model.update([0.]*6, np.hstack((environment.current_sensori_position, [0., 0.])))

# The database stores (m, dm, s, ds) with dimensions m: 0-2, dm: 3-5, s: 6-7, ds: 8-9.
# We infer dm (out_dims) from (m, s, ds) (in_dims):
in_dims = list(range(3)) + list(range(6, 10))
out_dims = list(range(3, 6))

for i in range(500):
    if np.mod(i, context_mode['reset_iterations']) == 0:
        environment.reset()
    # Get the current motor position and sensory context:
    m = list(environment.current_motor_position)
    s = list(environment.current_sensori_position)
    # Sample a delta sensory goal given the current hand position (first half of the sensory space):
    ds_g = list(im_model.sample_given_context(s, range(environment.conf.s_ndims//2)))
    # Infer a delta motor command to reach that goal:
    dm = sm_model.infer(in_dims, out_dims, m + s + ds_g)
    mdm = np.hstack((m, dm))
    # Execute this command and observe the corresponding sensory effect:
    sds = environment.update(mdm, reset=False)
    # Update the sensorimotor model:
    sm_model.update(mdm, sds)
    # Update the interest model:
    im_model.update(np.hstack((mdm, s, ds_g)), np.hstack((mdm, sds)))
    # Plot arm
    environment.plot(ax, alpha=0.3)


# Here we test the learned sensorimotor model on a given goal ds_goal in the sensory context s_current.
# 
# In the plot, the black dot and the red x are the current $s$ and the goal $s + \Delta s$.
# The corresponding reached arm positions are represented.
# In[18]:

# Inverse with sensorimotor context: dM <- i(M, S, dS)
ax = axes()
dm = [0.1]*3
environment.update(np.hstack(([0]*3, dm)))
environment.plot(ax)

in_dims = list(range(3)) + list(range(6, 10))
out_dims = list(range(3, 6))

ds_goal = [-0.05, 0.1]
m_current = list(environment.current_motor_position)
s_current = list(environment.current_sensori_position)
print("current m = ", m_current)
print("current s = ", s_current)

dm = sm_model.infer(in_dims, out_dims, m_current + s_current + ds_goal)
print("Inverse with context: dm =", dm)

sds = environment.update(np.hstack((m_current, dm)), reset=False)
environment.plot(ax, color='red')
ax.plot(*s_current, marker='o', color='k')
ax.plot(*list(np.array(s_current) + np.array(ds_goal)), marker='x', color='red')

print("Goal ds=", ds_goal, "Reached ds=", environment.current_sensori_position - s_current)


# ## Using 'Experiment'

# In[19]:

import numpy as np

from explauto import Agent
from explauto import Experiment
from explauto.utils import rand_bounds
from explauto.experiment import make_settings

get_ipython().run_line_magic('pylab', 'inline')

n_dims = 3

context_mode = dict(mode='mdmsds',
                    choose_m=False,
                    reset_iterations=20,
                    rest_position=[0]*n_dims,
                    dm_bounds=[[-0.2]*n_dims, [0.2]*n_dims],
                    ds_bounds=[[-0.2, -0.2], [0.2, 0.2]])

goal_babbling = make_settings(environment='simple_arm',
                              environment_config='low_dimensional',
                              babbling_mode='goal',
                              interest_model='random',
                              sensorimotor_model='nearest_neighbor',
                              context_mode=context_mode)

expe = Experiment.from_settings(goal_babbling)

# Evaluate at the given timesteps, on 200 random (s, ds) testcases drawn within these bounds (s fixed at [1., 0.]):
expe.evaluate_at([10, 100, 200, 300, 400, 500],
                 rand_bounds(np.vstack(([1., 0., -0.1, -0.1], [1., 0., 0., 0.1])), n=200))

expe.run()

ax = axes()
expe.log.plot_learning_curve(ax)


# # Possible extensions
# 
# - choosing s at each iteration
# - multistep planning (a rough sketch is given below)
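# The following cell sketches the simplest form of multistep planning: greedily chaining local actions toward a distant hand position, reusing the sensorimotor model, environment, 'in_dims' and 'out_dims' from Section V. It is only an illustration added here, not an Explauto feature, and the goal position is an arbitrary point assumed to be reachable.

# In[ ]:

sm_model.mode = "exploit"  # no exploration noise, as in the test of Section IV
s_goal = [0.5, 0.7]        # an arbitrary distant hand position, assumed reachable

environment.reset()
ax = axes()
environment.plot(ax)

for _ in range(20):
    m = list(environment.current_motor_position)
    s = list(environment.current_sensori_position)
    # Clip the remaining error to the allowed delta sensory bounds:
    ds_goal = list(np.clip(np.array(s_goal) - np.array(s), -0.2, 0.2))
    # Infer and execute a local action toward the goal:
    dm = sm_model.infer(in_dims, out_dims, m + s + ds_goal)
    environment.update(np.hstack((m, dm)), reset=False)
    environment.plot(ax, alpha=0.3)

ax.plot(*s_goal, marker='x', color='red')
print("Goal:", s_goal, "Final hand position:", environment.current_sensori_position)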