Learning from Demonstration (LfD) in a Soft Robot
Kinesthetic teaching is an approach for providing demonstrations to a robot in Learning from Demonstration (LFD) whereby a human physically guides a robot to perform a skill. Soft robots have the advantage of performing complex behaviors while having a relatively simple design.
Usually, these behaviors are hardcoded in the controller and their customization is not user-friendly. When soft robots are deployed in real human environments, a new way of teaching
behavior is needed. In this paper, we present the implementation of kinesthetic LfD in a soft robot which despite its benefit have not been explored before in this kind of embodiment. A tendon driven soft robot with 3 DOF is taught the different bending behaviors using a keyframe kinesthetic method.
An evaluation criterion is used to check the effectiveness of the learned behavior in a soft robot based on the error between human demonstration and learn behavior.
The embodiment of the soft robot
We designed and manufactured a simple soft robot which can perform bending motion up to 90 degrees with three degrees of freedom. Our soft robot consists of an elastic cylindrical bar, motors, tendons, and acrylic circular disk. Circular acrylic disks are mounted on the elastic cylindrical bar at fixed interval distance for maintaining the weight balance of the robot and stability. Each motor is connected to the respective tendon which runs through.
For sensing, the soft robot has an IMU mounted on its upper face that communicates with an Arduino using I2C protocol. The IMU is a 9 DoF BNO055 sensor which outputs the robot absolute orientation in Euler angles (pitch, yaw, roll). LED’s and music are added as an enhancer to augment the interaction human and soft robot. All the processing is done in a computer running MATLAB. The computer interfaces with the Arduino with USB using serial protocol.
The software can be described in two major stages: Model Learning and Policy Derivation. The model learner takes a set of state and action pairs <s,a> and derives a model R using
regression. A state <s> is defined as the triplet of the Euler angles: <yaw, pitch, roll> and each action is defined as the input value of each motor <u1 ,u2 ,u3 >, where <u> is a value from
0-255 that represents the PWM input value of a servo. On the other hand, the policy derivation module takes the teacher demonstration discretized in keyframes of state and time pairs <s,t> to derive a policy for the trajectory of the behavior learnt by the robot described as a sequence of states <s1 >,<s2 >…<sn >. In addition, the system includes a motor controller that takes the model R and the sequence of states <si > to map them into three inputs <u1 ,u2 ,u3 >, corresponding to each motor. As the servo is continuous, it does not have its own position feedback control loop and it relies on the model R as a means of correspondence between the input and output.
In the probabilistic approach, the soft robot is modeled as a black box that has an input and an output. In this case, the input is the motor control signals U and the output is the motion that is represented by a state S, as shown in Fig. Learning the space state in the inertial state is a challenge when the robot is modeled as a black-box due to the model has no meaning in physics. To solve this issue, we define the space in terms of Euler angles using the IMU to output a unique triplet of Euler angles for every position of the robot. Therefore, the space state is defined by a simple exploration of all possible inputs of the servos, where s:< α, β, γ >.
Black box model for a soft robot
Implemented framework for LfD
The following figure represents the learned and demonstrated trajectory in Euler space S: α , β , γ . This graph is difficult to interpret because the representation of a trajectory in Euler space is unconventional and different from the 3D inertial space. The purpose of this graph is to see both of this two trajectory has co-relation in terms of shape.
Example of trajectory of the human demonstration and learnt behavior in Euler space
In the next figure, we showed the results of the trajectory of human demonstration and learned behavior for every n keyframe. To evaluate the results, we use cross-correlation coefficients between two trajectories for every Euler angle.
α, β, γ Trajectories vs time and cross correlation coefficients
This work was developed in collaboration with Krutarth Shah (firstname.lastname@example.org) under the supervision of Prof. Guy Hoffman as part of the final project of MAE6710: Human Robot Interaction: Algorithms & Experiments.