
7.2 Automation of camera motion

7.2.3 Value Function

The value function, as defined earlier in the thesis, quantifies how good or bad a given state is. It is in some sense the equivalent of a potential function in control theory, although in this case we control the camera so that it moves toward a state with higher value at all times.

The value function combines the reward function with the dynamics of the system over time. The policy, which is responsible for taking actions, chooses at each step the action whose next state has the highest value among the possible next states of the current state. Hence, to evaluate the behavior of the obtained policy, it suffices to examine the value function.
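The relationship between reward, dynamics, value, and the greedy policy can be sketched on a small example. The following is a minimal illustration, not the implementation used in the thesis: the 1-D state (gripper offset from the image center), the three camera actions, the Gaussian reward, the discount factor `gamma`, and the helper names `next_index`, `value_iteration`, and `greedy_action` are all assumptions made for this sketch.

```python
import numpy as np

# Hypothetical 1-D toy problem. The state is the gripper's horizontal offset
# from the image center, in grid cells. None of these numbers come from the thesis.
offsets = np.arange(-5, 6)                     # gripper offset in the image: -5..5
actions = (-1, 0, 1)                           # camera move that shifts the offset by this amount
reward = np.exp(-0.5 * (offsets / 1.5) ** 2)   # assumed reward, peaked at the center
gamma = 0.9                                    # assumed discount factor


def next_index(i, a):
    """Assumed deterministic dynamics: shift the offset, clipped to the image."""
    return int(np.clip(i + a, 0, len(offsets) - 1))


def value_iteration(n_iter):
    """Repeated Bellman backups: V(s) = R(s) + gamma * max_a V(s')."""
    v = reward.copy()                          # iteration 0: value equals reward
    for _ in range(n_iter):
        v = np.array([
            reward[i] + gamma * max(v[next_index(i, a)] for a in actions)
            for i in range(len(offsets))
        ])
    return v


def greedy_action(v, i):
    """Pick the action whose successor state has the highest value."""
    return max(actions, key=lambda a: v[next_index(i, a)])


v = value_iteration(10)
policy = [greedy_action(v, i) for i in range(len(offsets))]
print(dict(zip(offsets.tolist(), policy)))
```

In this toy setup the greedy action points toward the center from either side and is zero at the center, which is why inspecting the value function alone is enough to read off the behavior of the policy.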

Fig. 7.5 shows the value functions calculated for the corresponding reward functions shown in Fig. 7.4. In the final value function, the highest value for a stationary gripper is obtained when the gripper is at the center of the endoscope image. In other words, the algorithm moves the endoscope so that the gripper is at the center of the image, if it is not there already. This is the case for subtasks 1, 2, and 4.

RESULTS AND DISCUSSIONS

Figure 7.5: Value function computed by the inverse reinforcement learning algorithm for all subtasks (arranged by row). Columns show the value function for iteration 0, iteration 1, and iteration 10 of the algorithm.

Figure 7.6: Final value function obtained from IRL plotted for different speeds of gripper motion in the negative x direction: 0, 0.01, 0.05, and 0.1 units, in order, in columns for all subtasks (rows).

For subtask 3, however, the maximum value is at (0, 0.5), which is above the center of the image. This is the expected behavior given the corresponding reward function. Hence, if the gripper is stationary during this subtask, the camera moves so that the gripper is directly above the center of the image. Another observation from Fig. 7.5 concerns the radius of the yellow contour that marks the maximum of the function. For subtask 1, it is wider than for subtask 2. This implies that the camera allows a larger movement of the gripper without moving than it does for subtask 2, and hence is more stable for subtask 1. It would be even more stable for subtask 3 and subtask 4.
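The two observations above, the location of the maximum and the width of the region around it, can be read off a sampled value function numerically. The sketch below is only an illustration under stated assumptions: the Gaussian surfaces `wide` and `narrow` are synthetic stand-ins rather than the value functions of the thesis, the helper `plateau_radius` is hypothetical, and the 95% threshold `frac` is an arbitrary choice standing in for the yellow contour.

```python
import numpy as np


def plateau_radius(value, xs, ys, frac=0.95):
    """Return the peak location and the equivalent radius of the region
    where value >= frac * max(value). Assumes a positive-valued surface;
    `frac` is an arbitrary illustrative threshold."""
    iy, ix = np.unravel_index(np.argmax(value), value.shape)
    mask = value >= frac * value.max()
    cell_area = (xs[1] - xs[0]) * (ys[1] - ys[0])
    # Radius of a disc with the same area as the near-maximal region.
    radius = float(np.sqrt(mask.sum() * cell_area / np.pi))
    return (float(xs[ix]), float(ys[iy])), radius


# Normalized image coordinates on a regular grid (assumed range [-1, 1]).
xs = np.linspace(-1.0, 1.0, 201)
ys = np.linspace(-1.0, 1.0, 201)
X, Y = np.meshgrid(xs, ys)

# Synthetic stand-ins: a wide peak at the center and a narrow peak at (0, 0.5).
wide = np.exp(-(X ** 2 + Y ** 2) / (2 * 0.4 ** 2))
narrow = np.exp(-(X ** 2 + (Y - 0.5) ** 2) / (2 * 0.2 ** 2))

peak_wide, radius_wide = plateau_radius(wide, xs, ys)
peak_narrow, radius_narrow = plateau_radius(narrow, xs, ys)
print(peak_wide, radius_wide)
print(peak_narrow, radius_narrow)
```

A larger radius means the gripper can move further from the peak before the value drops noticeably, which corresponds to the camera holding still over a larger range of gripper motion.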

Turning to the movement of the gripper, it is important to analyze the value function when the gripper is moving as well, to get a sense of how the algorithm performs. Fig. 7.6 shows the value function for varying gripper speeds when the gripper is moving in the negative x direction. From this figure, it can be observed that the maxima shift in the positive x direction, which is the direction opposite to the movement of the gripper. The center of the image is therefore ahead of the gripper in the direction of its motion, and hence the algorithm moves the endoscope to look ahead of the gripper while it is moving. Even in the case of subtask 3, the maximum is shifted from where it was when the gripper was stationary. This suggests a continuous motion of the camera for a continuous motion of the gripper.

Figure 7.7: Final value function obtained from IRL plotted for different directions of gripper motion at a speed of 0.05 units: right, up, left, and down, in order, in columns for all subtasks (rows).

Fig. 7.6 shows the variation of the value function for speeds of 0, 0.01, 0.05, and 0.1 from left to right in each row. As the speed increases, the maxima shift further, implying that the endoscope looks further and further ahead of the gripper. It is almost as if the endoscope predicts where the gripper will stop and looks ahead, so that there are no sudden disruptive movements of the camera when the gripper moves. That is, if the gripper is moving and then stops, in all probability the camera is already positioned so that the gripper is at the center of the image, as suggested by the earlier figure.

This behavior is observed not only when the gripper is moving left but in any other direction as well, as can be seen from Fig. 7.7. The maximum always shifts in the direction opposite to the motion of the gripper, so that the endoscope provides a view ahead to where the gripper is moving.
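One way to quantify the look-ahead behavior described above is to track how far the maximum of the value function moves as the gripper velocity changes. The sketch below shows only the measurement, on a made-up surface: `synthetic_value` is a hypothetical toy whose peak is displaced opposite to the velocity by an assumed factor `lookahead`, so the numbers it produces are not results from the thesis. With real data, the learned value function evaluated at each velocity would take its place.

```python
import numpy as np

# Normalized image coordinates on a regular grid (assumed range [-1, 1]).
xs = np.linspace(-1.0, 1.0, 201)
ys = np.linspace(-1.0, 1.0, 201)
X, Y = np.meshgrid(xs, ys)


def synthetic_value(velocity, lookahead=4.0, sigma=0.3):
    """Toy value surface (assumption): a Gaussian whose peak is displaced
    opposite to the gripper velocity by lookahead * velocity."""
    vx, vy = velocity
    cx, cy = -lookahead * vx, -lookahead * vy
    return np.exp(-((X - cx) ** 2 + (Y - cy) ** 2) / (2.0 * sigma ** 2))


def peak_location(value):
    """Image coordinates of the maximum of a sampled value surface."""
    iy, ix = np.unravel_index(np.argmax(value), value.shape)
    return np.array([xs[ix], ys[iy]])


def peak_shift(velocity):
    """Displacement of the peak relative to the stationary-gripper case."""
    stationary = peak_location(synthetic_value((0.0, 0.0)))
    return peak_location(synthetic_value(velocity)) - stationary


# Speeds as in Fig. 7.6, with the gripper moving in the negative x direction.
speed_shifts = {s: peak_shift((-s, 0.0)) for s in (0.0, 0.01, 0.05, 0.1)}

# Directions as in Fig. 7.7, at a speed of 0.05 units.
directions = {"right": (1.0, 0.0), "up": (0.0, 1.0),
              "left": (-1.0, 0.0), "down": (0.0, -1.0)}
direction_shifts = {name: peak_shift((0.05 * dx, 0.05 * dy))
                    for name, (dx, dy) in directions.items()}

for name, shift in direction_shifts.items():
    # A negative dot product means the peak moved against the motion.
    print(name, shift, float(np.dot(shift, directions[name])))
```

A negative dot product between the shift and the direction of motion, together with a shift that grows with speed, is the numerical counterpart of the statement that the endoscope looks further ahead as the gripper moves faster.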

Hence the algorithm used to automate the endoscopic camera motion not only tracks the instruments but does so intelligently. It is aware of the subtask the user is performing and of the direction and speed of motion of the gripper, and changes its behavior accordingly. This is probably very similar to how we move our heads when picking and placing objects, which was the objective of this thesis. The comparison between the automated camera trajectory and the human camera trajectory obtained from head tracking in the user study is shown in the next section.
