Da Vinci Intelligent Surgical Assistance

Last updated: May. 8, 2014, 5:32 pm

Contact Information

Chris Paxton: cpaxton3 (at) jhu.edu


My goal is to develop tools to automate parts of a surgical procedure, so that the machine can make procedures faster and more efficient. Surgeries can be hours long, and contain many repetitive motions; automating parts of a task makes a surgeon's job easier and lets them focus on the patient. The Da Vinci robot has a third arm which is often used for supplementary tasks like cutting threads. A surgeon cannot control all three arms at once, so they need to clutch to switch which arm is being controlled. Automating small parts of the procedure that need the third arm would help reduce task complexity and decrease load on the user.

I want to focus on applying this to a simple collaborative peg-passing procedure first, then to a suturing example based on surgical training procedures, leveraging the large amount of information collected by the Language of Surgery project.

  • Students: Chris Paxton
  • Mentor(s): Jonathan Bohren, Kelleher Guerin, Prof. Greg Hager

Background, Specific Aims, and Significance

Human-robot collaboration is increasingly important as robots become more capable of contributing to skilled tasks in the workplace. Robotic Minimally Invasive Surgery (RMIS) is a part of this trend. Partial automation would decrease the load on surgeons during long procedures by automating repetitive sub-tasks, and it would improve surgeon performance if procedures are being performed over long distances in conditions of high latency.

This project uniquely fits our position at JHU: the Language of Surgery project here has collected a large amount of surgical data used for skill classification and for providing feedback to surgeons in training. Recent work has also looked into automatic segmentation of video and kinematic data from these surgical procedures. I worked with Amir Masoud on methods for learning from demonstration that can incorporate information about the environment into following a preset trajectory during a manipulation task; the models used for this work are pictured above.

  1. Stereo registration for the Da Vinci to perform 3D reconstruction of scenes and extract objects of interest.
  2. Design algorithms for collaborative control incorporating visual and object information.
  3. Complete a simple peg-passing task with cooperation between user-controlled and computer-controlled arms of the Da Vinci. In this task, the user takes a peg with one manipulator, hands it to a second manipulator, and then clutches to switch to a third manipulator arm and takes the peg. The approach described by this project should learn how to take control of the third arm and grab the peg, without the need for the user to take control.
  4. Assist suturing task in a test Da Vinci surgical procedure through control of the third arm. The robot should be able to grab a suture needle after a needle drive.


  • Minimum:
    1. Simple Stereo Registration and Reconstruction: calibrate the robot using chessboard images to perform 3D reconstruction of scenes, without finding tooltips. This is already possible, and will be useful if the proposed approach for stereo reconstruction fails.
    2. Adapt Formal Algorithmic Approach: The approach for motion and task modeling described below is well established, but needs to be formalized and adapted into an algorithm suited for the specific task. This is very important, and so it needs to be done as soon as possible. I have settled on an approach, described in the technical approach section. Actual implementation of my plan is another step.
    3. Model Task Components: Using this algorithmic approach and a stereo registration approach, I should be able to learn an IOC model for individual components of a task. Even if other components of the project fail, this model can still be useful for assessing surgical skill or providing real-time feedback if a user starts to deviate from the expected trajectory. Since I am planning on switching between arms in the peg task with an operation from the user, I will try two things in order:
      1. Learn a model based on
      2. If this fails, rely on the user “clutching” to activate a new arm to signal when the automation should take over.
    4. Simulation of Simple Training Tasks: Design and build components to test the robot in Gazebo. These can be used later on, and will help build a unified framework across multiple people working on inverse optimal control in LCSR. This turned out to be a much larger problem than anticipated; I have been working on modifying simulation code. I am using a 3DConnexion Space Navigator mouse to control the robots; one button switches which arm is being controlled and another tells the gripper to open or close. I still need to do the following, ideally by 4/4/2014:
      1. Finish debugging components for controlling two WAM arms simultaneously.
      2. Finish the models I will use for the peg task.
      3. Test Space Navigator for controlling the robot.
  • Expected: (Expected by mid-April)
    1. Tooltip-based Stereo Registration and Reconstruction: Develop a procedure for stereo calibration and reconstruction from collected surgical videos with different camera intrinsics. I have decided to delay the deadline for this component until after the peg task is done; while this is important for working on the Da Vinci robot, it is not necessary immediately.
    2. Peg Passing Task: Finish models to the point where a simple block-passing task can be accomplished through partial task automation. This may be done in simulation, if available, since using the simulator makes development easier.
  • Maximum: (Expected by end of April/early May)
    1. Suturing Task: Adapt approach to assist a user in a simple suturing task as described above. Use the Da Vinci to assist the user.
    2. Semi-Automation Toolkit: We are also interested in intention recognition and semi-automation for industrial robotics and for high-latency telemanipulation projects. If all other work is successful, I will adapt this project into a toolkit that can be applied to new tasks and robots such as the WAM arms and the Raven surgical robot.

Technical Approach

Stereo Registration and Reconstruction: Using the available video data for stereo reconstruction is difficult because the cameras' intrinsic parameters change from trial to trial as the focal distances change. In addition, trials are recorded on different robots.

Recent work has been able to identify the Da Vinci tooltips in video. I want to use this to find locations of the tooltips in all collected video data, then use this together with the available camera position and tooltip.
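To see why a changing focal length matters for reconstruction, here is a minimal sketch assuming an idealized, rectified pinhole stereo pair, where depth follows Z = f * B / d. The function name and all numbers are hypothetical and are not taken from the Da Vinci cameras.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified pinhole stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical numbers, chosen only for illustration.
baseline_m = 0.005     # distance between the two camera centers
disparity_px = 50.0    # horizontal pixel shift of a matched feature

# Depth computed with the focal length assumed at calibration time ...
assumed = depth_from_disparity(1000.0, baseline_m, disparity_px)
# ... versus the depth implied by a different focal length during the trial.
actual = depth_from_disparity(1200.0, baseline_m, disparity_px)

# Fraction by which the stale calibration underestimates depth.
relative_error = (actual - assumed) / actual
```

Under this model the depth estimate scales linearly with the focal length, so a stale calibration biases every reconstructed point by the same factor.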

Motion Model: Prior work from Prof. Hager's group used Gaussian Mixture Models to determine when rotations needed to occur. I am interested in using a different and hopefully more robust approach to model how interactions should occur. Some recent work in Inverse Optimal Control (IOC) has dealt with learning in continuous environments from locally optimal examples, and recent work submitted to IROS by Amir Masoud and myself under Prof. Hager has looked at learning how to incorporate new environmental information into demonstrated trajectories.

The approach I plan on using for modeling agents' motion is based on maximum-entropy IOC. In this case, we maximize the probability of each observed action a from each state s along the expert trajectories.
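As a rough illustration of this objective, the sketch below scores a small discrete set of candidate actions with a linear function of hypothetical features and evaluates the log-likelihood of the demonstrated choices under a softmax policy. This is a simplification: a full maximum-entropy IOC formulation would use soft value functions rather than immediate linear scores, and the feature vectors and function names here are assumptions.

```python
import math

def action_log_probs(theta, action_features):
    """Log P(a | s) for a softmax policy with linear scores theta . f(s, a)."""
    scores = [sum(t * f for t, f in zip(theta, feats)) for feats in action_features]
    top = max(scores)  # subtract the max for numerical stability
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_z for s in scores]

def demo_log_likelihood(theta, demos):
    """Sum of log P(a | s) over demonstrated (action_features, chosen_index) pairs."""
    return sum(action_log_probs(theta, feats)[chosen] for feats, chosen in demos)

# One hypothetical state with two candidate actions; the expert chose action 0.
demos = [([[1.0, 0.0], [0.0, 1.0]], 0)]
uniform = demo_log_likelihood([0.0, 0.0], demos)  # both actions equally likely
better = demo_log_likelihood([2.0, 0.0], demos)   # weights that favor the expert's choice
```

Learning then amounts to choosing the weights theta that maximize this likelihood, for example with a nonlinear optimizer.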

In this case, however, we also need to take into account noisy environmental features. Motions need to be in relation to observed features of interest (needle, suture points, peg being passed) and the tissue. Previous work has looked at this problem before through the use of hidden variable Markov Decision Processes for activity forecasting. This has also been used to predict the intention of an actor, which is useful for predicting when intervention should take place.

There are two possible approaches for this, based on work by two different groups. In the paper Continuous Inverse Optimal Control with Locally Optimal Examples by Levine and Koltun, the authors look at methods for learning inverse optimal control solutions from demonstrations that are only locally optimal. This is ideal for our application because we do not want to assume that the human demonstrations are globally optimal. The human can only control the end point of the arm, for example, and not the entire arm. Work by Pieter Abbeel's group in IOC has also recently looked into real-world examples. They map a demonstration scene onto a test scene, and then solve a trajectory optimization problem.

The Peg Transfer task requires two components: grabbing a ring from one hand and putting it on a peg. To solve this IOC problem, I need a concrete list of features. For placing a ring on a peg, these would be:

  • Relative position of pegs other than destination peg
  • Relative position of destination peg
  • Relative orientation of ring to destination peg

When grabbing a ring from another gripper, necessary features are:

  • Relative orientation of the ring as compared to the gripper
  • Relative position of the ring as compared to the gripper
  • Relative position of pegs to the ring and gripper (to avoid collisions)
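The relative position and orientation features listed above can be computed from world-frame poses. The following is a minimal sketch, assuming each pose is given as a 3x3 rotation matrix and a position vector in a common world frame; the helper names and the example poses are hypothetical.

```python
import math

def transpose(R):
    return [list(row) for row in zip(*R)]

def mat_vec(R, v):
    return [sum(r * x for r, x in zip(row, v)) for row in R]

def mat_mul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def relative_pose(R_ref, p_ref, R_obj, p_obj):
    """Pose of the object expressed in the reference frame (e.g. ring in gripper frame)."""
    R_ref_T = transpose(R_ref)  # inverse of a rotation matrix is its transpose
    rel_p = mat_vec(R_ref_T, [o - r for o, r in zip(p_obj, p_ref)])
    rel_R = mat_mul(R_ref_T, R_obj)
    return rel_R, rel_p

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical poses: gripper at (1, 0, 0) yawed 90 degrees, ring at (1, 1, 0) unrotated.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rel_R, rel_p = relative_pose(rot_z(math.pi / 2), [1.0, 0.0, 0.0], identity, [1.0, 1.0, 0.0])
```

In this example the ring lies one unit along the gripper's own x axis, which is the kind of frame-relative quantity a feature vector would record.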

Task Model: While the IOC component is capable of modeling individual segments of a complex task, we also need some idea of how different task components fit together. Luckily, our surgical data has already been manually labeled and segmented with a set of rigorously tested and well-defined definitions available on the Language of Surgery wiki. We also know that, when performing third-arm tasks, the user will clutch to switch arm control, providing an easy segmentation of which parts of the task the software will be responsible for handling. Previous work in temporal planning and hierarchical control has elaborated on how to combine multiple sub-tasks.

 Screenshot of the Barrett WAM arm with 7 degrees of freedom in simulation.

Simulation: I am using the ROS/OROCOS toolkit created by Jon Bohren, together with models and controllers for the Barrett WAM arm. This will allow me to develop methods that can be tested easily and quickly in simulation and on the physical WAM arm. I can pass simple trajectories and Cartesian coordinates to the controller, which greatly simplifies programming and testing IOC algorithms.

I also upgraded my workstation with a new graphics card (NVidia GTX760) and 8 GB of additional RAM to improve its simulation capabilities. These additional dependencies are an integral part of the project; unfortunately, it had previously been unclear what the best way to move forward would be.

I can issue commands to the robot by publishing on a ROS topic:

rostopic pub -r 1
  "{ positions: [0.0,-1.57,0,3.0,0,-0.8,0.0] }"

 IK controller showing command frame and joint coordinate frames for the Barrett WAM arm.

Another way to control the robot is by providing position commands to an inverse kinematics solver, like either MoveIt (developed by Willow Garage) or an IK controller for the ROS/Orocos integrated code described above. The ROS/Orocos IK controller follows a destination TF frame with the tip of the robot end effector. I can control the position of the destination frame with a Phantom Omni (haptic feedback controller) or with a SpaceNav 3D mouse.

As of 3/28, the code to control the arm with a 3DConnexion SpaceNavigator mouse works reliably. There are a few features to add, however, like controlling the Barrett hand and switching between arms.

Controlling multiple arms has proven to be a bigger problem than anticipated. Working with my mentor Jon Bohren, I now mostly have the two arms working together.

Using multiple arms required a change in the way the current Orocos/ROS integration setup handled components: there were a number of issues simply because it was not designed to launch multiple robots in the same context. Since ROS and Gazebo are multithreaded, there were a few race conditions where sometimes both arms would launch, sometimes only one arm would launch, and sometimes neither would launch and the whole system would crash. This problem was caused by a mutex in the Orocos Gazebo plugin: threads could access Gazebo before everything was initialized, or attempt to access the world state at the same time.
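The two failure modes described here (touching shared state before initialization finishes, and touching it concurrently) are commonly guarded against with a ready flag plus a lock. The sketch below shows that generic pattern with Python threads; it is not the actual change made to the Orocos Gazebo plugin, and the class and frame names are hypothetical.

```python
import threading

class SharedWorld:
    """Stand-in for shared simulator state (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ready = threading.Event()
        self.robots = []

    def finish_init(self):
        # Signal that the world is fully initialized.
        self._ready.set()

    def register(self, name):
        self._ready.wait()   # block until initialization is complete
        with self._lock:     # only one thread modifies world state at a time
            self.robots.append(name)

world = SharedWorld()
threads = [threading.Thread(target=world.register, args=(name,))
           for name in ("left_arm", "right_arm")]
for t in threads:
    t.start()
world.finish_init()          # arms may start before this; they simply wait
for t in threads:
    t.join()
```

With this structure the launch order of the two arm threads no longer affects whether both end up registered.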

As of April 4, both arms can be controlled by the 3DConnexion Space Navigator mouse in a simulated version of the stage set up in the LCSR Robotorium. While it would be possible to use either a Phantom Omni or the Da Vinci console, I am working with the Space Navigator mouse for the time being because it is simple to use, works cross platform, and because I can toggle controlling each of the two arms with a simple interface.

I can then use this setup to pick up and manipulate objects, like the cordless drill pictured here. The arm was struggling to pick up objects like this, so I tweaked the integral gains and integral bounds in the PID controller (the component responsible for compensating for gravity).
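For readers unfamiliar with integral gains and bounds, here is a minimal PID sketch in which the accumulated error is clamped to a fixed range. It assumes the bound applies to the accumulated error itself; real controllers (including the one used here) may instead bound the integral term after multiplying by the gain, and all gains below are hypothetical.

```python
class PID:
    """Textbook PID controller with a clamped integral (anti-windup)."""

    def __init__(self, kp, ki, kd, i_min, i_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.i_min, self.i_max = i_min, i_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        # Accumulate error, then clamp so a persistent load cannot wind it up forever.
        self.integral += error * dt
        self.integral = max(self.i_min, min(self.i_max, self.integral))
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# A constant error, such as the sag from holding a heavy object, saturates the integral.
pid = PID(kp=1.0, ki=2.0, kd=0.0, i_min=-0.5, i_max=0.5)
command = 0.0
for _ in range(100):
    command = pid.update(error=1.0, dt=0.1)
```

Raising the integral gain or widening the bounds lets the controller push harder against a steady load, which is the trade-off being tuned when the arm struggles with heavier objects.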

The Space Navigator mouse lets me move the end point of the WAM arm, and the buttons let me toggle which arm is being controlled and close or open the gripper. I can now use this to walk through the simulated peg transfer task, where both arms need to work together to move a torus modelled in Blender from one peg to another. I created the peg URDF models specifically for this task.

  • Arm 1 holding torus for Arm 2
  • Arm 2 grabs torus from Arm 1
  • Arm 2 places torus on peg

Here, you can see the handoff in action.

I made some changes to the UI. I wrote a plugin for Gazebo that publishes information to TF; the goal is to expose more of the necessary information about the world to my code. Right now, positions and orientations are published via TF as a tree of coordinate transforms. I can also record joint space positions and other information.
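A tree of coordinate transforms can be queried by walking from a frame up to the root and composing the transforms along the way. The sketch below shows the idea with planar (x, y, yaw) transforms for brevity; TF itself works with full 3D transforms, and the frame names and values here are hypothetical.

```python
import math

def compose(parent_T_child, pose_in_child):
    """Map a planar pose (x, y, yaw) from the child frame into the parent frame."""
    ax, ay, ath = parent_T_child
    bx, by, bth = pose_in_child
    c, s = math.cos(ath), math.sin(ath)
    return (ax + c * bx - s * by, ay + s * bx + c * by, ath + bth)

def pose_in_root(tree, frame):
    """Follow parent links from `frame` to the root, composing transforms."""
    pose = (0.0, 0.0, 0.0)  # origin of `frame`, expressed in `frame` itself
    while frame in tree:
        parent, transform = tree[frame]
        pose = compose(transform, pose)
        frame = parent
    return pose

# Each entry maps a frame to (parent frame, transform from child into parent).
tree = {
    "base": ("world", (1.0, 0.0, math.pi / 2)),
    "gripper": ("base", (0.5, 0.0, 0.0)),
}
gripper_in_world = pose_in_root(tree, "gripper")
```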

I can now control the arms through interactive TF frames, not just through the Space Navigator mouse. This gives me a slightly more precise way of controlling the arms.

Update 5/1/2014: The simulation works well, and I can use it to reliably manipulate objects. I can also use rosbag to record and replay trajectories. My next steps are to use the rosbag API to load and modify these trajectories, so that I can replay them given that an object is at a different location. To do this, I might want to have something publish a set of features I can record as well, and then modify the planned trajectory in response to the difference in feature counts.
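One simple way to replay a recorded trajectory when the object has moved is to blend in the object's displacement along the path, so the start stays where it was and the end lands on the new object position. This is only an illustrative stand-in under that assumption, not the feature-based modification planned above, and the function name and waypoints are hypothetical.

```python
def retarget(waypoints, recorded_obj, new_obj):
    """Shift a recorded path so it ends at the object's new position.

    The offset is ramped linearly from 0 at the first waypoint to its full
    value at the last, keeping the starting pose unchanged.
    """
    offset = [n - r for n, r in zip(new_obj, recorded_obj)]
    last = len(waypoints) - 1
    shifted = []
    for i, wp in enumerate(waypoints):
        weight = i / last if last > 0 else 1.0
        shifted.append([p + weight * o for p, o in zip(wp, offset)])
    return shifted

# Recorded end-effector positions reaching an object at (1, 0, 0).
recorded = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0]]
replayed = retarget(recorded, recorded_obj=[1.0, 0.0, 0.0], new_obj=[1.0, 0.2, 0.0])
```

A translation-only ramp ignores orientation and obstacles, so it is best read as a baseline to compare a learned model against.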


To develop intelligent assistance for surgical procedures, I need access to a robot, task models the robot can perform, and a set of training data. As of Feb. 2014, I already have access to the BB API necessary for read/write instructions to the Da Vinci robot, and I can use the robot in Hackerman for research and development. I also have access to collected surgical data already, and can collect more using the Da Vinci for specific tasks.

I will also be using the open-source CISST and OpenCV libraries. CISST has a number of useful tools for robotics, and it also has a video codec necessary for recent data collected at MISTIC. OpenCV has tools to perform stereo calibration and 3D reconstruction. Both of these systems are already set up on my laptop and workstation. I will use NLOPT, an efficient cross-platform nonlinear optimization library written in C++, to solve necessary parts of the IOC problem for modeling motions.

It may be possible to speed up development with the use of a simulator instead of performing all experiments on the actual Da Vinci; however, at present it is unclear when or if this will happen. The Mimic simulator in question (at Johns Hopkins Bayview) is capable of simulating deformable materials and threads, but we need to wait on the company itself to see whether we can access the position of the needle and thread during the task and to be able to read out the kinematics. Current plans assume I will not be able to use the simulator this semester.

Another option for a simulator would be to use the Gazebo simulator integrated with ROS. This would speed up development for the peg-passing task, but Gazebo cannot simulate deformable materials, so it would not be useful for the suturing task I described.

Update 03/12/2014: I am planning on using ROS/OROCOS and Gazebo to simulate the rigid manipulation task described above.

Training and Certification

I have completed the necessary training for laboratory safety and human subjects research:

Milestones and Status

  1. Milestone name: Formal Algorithmic Approach
    • Planned Date: 3/14/2014
    • Expected Date: 3/14/2014
    • Status: Plans set and described above. Algorithm and features may change as the simulation changes and as results from tests are complete. Approach is based on modifying proposed IOC methods to suit a simulated robotic manipulation task.
  2. Milestone name: Tooltip-based Stereo Registration and Reconstruction
    • Planned Date: 3/7/2014
    • Expected Date: 4/25/2014
    • Status: Major changes: Rescheduled. This component is less immediately important because I want to first develop the algorithms and software for shared automation before I begin to develop code for the Da Vinci. Registration will wait until the algorithm is proven and I can move on to working with the Da Vinci.
  3. Milestone name: Model Task Components
    • Planned Date: 3/14/2014
    • Expected Date: 3/14/2014
    • Status: Complete; implementation requires simulation. For peg transfer task, I can train a classifier based on the trajectories and actions of the user before the user switches to control the second arm. Another option is to let the user trigger the automated component manually; this is probably more likely in a “real world” application because human experts are unlikely to want automation to take initiative on its own. Plans are discussed above.
  4. Milestone name: Simulation
    • Planned Date: 4/4/2014
    • Expected Date: 4/10/2014
    • Status: Complete as of 4/10. I can perform the peg transfer task manually with two WAM arms, with one arm picking up a torus (the ring) and the other taking it and putting it down on another peg.
  5. Milestone name: Automate Peg Transfer Task
    • Planned Date: 4/11/2014
    • Expected Date: 5/7/2014
    • Status: In progress. Delays due to simulation set-up and UI issues. There are still some minor tweaks to the arm PID settings that may be necessary, but now I am at the point where I need to collect data and do my actual modeling. I have incorporated NLOPT and some preliminary Gaussian Process/IOC code into my setup, but there remains a lot to be done. I now have the software set up so that I can record and replay trajectories; I can use rosbag and its C++/Python API to modify these trajectories and read them in software.
  6. Milestone name: Suturing Task
    • Planned Date: 4/25/2014
    • Expected Date: 6/1/2014
    • Status: Waiting on other components. Has been pushed into the summer.

Reports and presentations

Project Bibliography

References and Background Reading:

  • S. Levine and V. Koltun. Continuous inverse optimal control with locally optimal examples. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, volume 1, pages 41–48, 2012.
  • Conor McGann, Eric Berger, Jonathan Bohren, Sachin Chitta, Brian Gerkey, Stuart Glaser, Bhaskara Marthi, Wim Meeussen, Tony Pratkanis, Eitan Marder-Eppstein, et al. Model-based, hierarchical control of a mobile manipulation platform. In 4th workshop on planning and plan execution for real world systems, ICAPS, 2009.
  • Austin Reiter, Peter K Allen, and Tao Zhao. Appearance learning for 3d tracking of robotic surgical tools. The International Journal of Robotics Research, page 0278364913507796, 2013.
  • John Schulman, Ankush Gupta, Sibi Venkatesan, Mallory Tayson-Frederick, and Pieter Abbeel. A case study of trajectory transfer through non-rigid registration for a simplified suturing scenario. In Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on, pages 4111–4117. IEEE, 2013.
  • Brian D Ziebart, Andrew L Maas, J Andrew Bagnell, and Anind K Dey. Maximum entropy inverse reinforcement learning. In AAAI, pages 1433–1438, 2008.
  • Sebastian Bodenstedt, Nicolas Padoy, and Gregory Hager. Learned partial automation for shared control in tele-robotic manipulation. In 2012 AAAI Fall Symposium Series, 2012.
  • Kris M Kitani, Brian D Ziebart, James Andrew Bagnell, and Martial Hebert. Activity forecasting. In Computer Vision–ECCV 2012, pages 201–214. Springer, 2012.

Other Resources and Project Files

  • Software resources used for simulating the Barrett WAM arms and other robots used in LCSR are available on the JHU LCSR GitHub page.
  • The ROS framework will be used for developing robot control software that can be applied in a number of different settings. It also has Matlab bindings for testing and algorithm development.
  • OpenCV is a cross-platform open source library implementing a range of computer vision techniques. These will be used in camera calibration and for object interaction.
  • NLOPT is a free and open-source library for nonlinear optimization; this may be useful for solving for inverse optimal control parameters.
  • Blender is an open-source 3D modeling program, used to create the torus (ring) for the peg task simulation.

Project Code:

  • Many of the changes have been merged with the projects on the JHU-LCSR GitHub.
  • Code to move the Barrett around with a 3D mouse and command the simulated WAM arm is available on GitHub (lcsr_spacenav).
  • Code to simulate objects and scenarios for collaborative tasks is also available on GitHub (lcsr_collab).
  • My plugin to Gazebo, which publishes transforms and provides some other functionality, is on GitHub (lcsr_construction_plugin) as well. This additionally contains code to attach/detach joints to simulate objects latching together in simulation, which is not used in this project.
  • My code has branched substantially from Jon's; my Barrett code configured to work with multiple arms is likewise available on GitHub (lcsr_barrett).
  • Da Vinci task assistance code, including inverse reinforcement learning code and Da Vinci video/data input code, is available on the TaskAssistance Bitbucket page.
  • Code to capture and replay trajectories and features is on the LCSR Replay Bitbucket page.
  • Documentation for installing and using the simulation is posted on the LCSR intranet wiki.
  • Barrett controller code configured to work with multiple arms is also on GitHub (lcsr_controllers). This code uses a modification to the Orocos-ROS integration stack to provide a parameter for a specific URDF model instead of automatically checking a name in the global namespace, among other changes.
courses/446/2014/446-2014-16/da_vinci_intelligent_surgical_assistance.txt · Last modified: 2014/05/09 18:39 by cpaxton3@johnshopkins.edu