AR-Assisted Medical Training: Tutorial Generation & Eye Gaze Tracking Analysis

Last updated: 9:00AM, 11 May 2018


The goal of this project is to create an AR app that semi-automates the creation of HMD tutorials. Gaze-tracking data collection is included both to aid tutorial creation and to evaluate performance.

  • Students: Prateek Bhatnagar, Allan Wang
  • Mentor(s): Ehsan Azimi, Chien-Ming Huang, Peter Kazanzides, and Nassir Navab

Background, Specific Aims, and Significance

Head-mounted displays (HMDs) have been employed in medical treatment, education, rehabilitation, and surgery. Graphical layers on an optical see-through HMD (OST-HMD) can display relevant data, images, text, or video, facilitating procedures that would otherwise require the practitioner to refer to a side display and turn their gaze away from the object of interest. HMDs not only facilitate these processes but are also relatively unobtrusive, since they can be controlled through voice commands. These devices are best used in controlled training environments because of potential dependencies on internet access and on lighting conditions appropriate for the optical display. Moreover, the virtual markers used in current training files rely on static visual landmarks that the device uses to determine the markers' locations: if a training file involves virtual markers, the user or trainer must ensure that the visual landmarks are placed consistently and remain visible to the HoloLens camera.

This project seeks to provide a tool for doctors and other health professionals to easily create their own HMD-based tutorials. In addition, eye-gaze tracking can provide an additional metric to compare a tutorial user's performance to the expert who created the tutorial.


Deliverables

  • Minimum: March 30 (complete)
    1. HoloLens app that records speech-to-text
    2. 2D heatmap generation
  • Expected: April 16 (in progress)
    1. HoloLens app that records image/text-based tutorials (complete)
    2. 2D & 3D heatmap generation (in progress)
  • Maximum: May 6 (in progress)
    1. HoloLens app that records image/text-based tutorials (complete)
    2. 2D & 3D heatmap generation (in progress)
    3. Virtual marker creation (in progress)

Technical Approach

This project consists of two primary modules for tutorial generation and eye-gaze tracking:

Tutorial generation

HoloLens development requires Visual Studio with the Windows 10 SDK, along with the Vuforia extension for Unity. The content generation modules are developed in Unity, with C# scripts controlling the behavior of the app. Our first step in development was to include initial directions that help the user navigate the interface and begin tutorial generation. The app is controlled using voice commands, since gestures in the current HoloLens libraries are relatively limited and most trainees will have their hands fully occupied with the task. Unity for the HoloLens offers built-in speech recognition modules that use either online or offline libraries to convert speech to text; while speech input is being recorded, a visual indicator shows that recording is active. Unity is also able to access the HoloLens camera directly, although the camera may only be used by one process at a time, which prevents our program from capturing video and images simultaneously. After image capture, the image is displayed to the user for approval, retake, or rejection.
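The voice-driven capture-and-approve flow described above can be sketched as a small state machine. This is an illustrative sketch only: the state names, voice commands, and image filename below are hypothetical, not the app's actual commands or API.

```python
class TutorialRecorder:
    """Minimal state machine for the recording flow:
    idle -> recording -> reviewing -> idle."""

    def __init__(self):
        self.state = "idle"
        self.steps = []       # completed, approved steps
        self._pending = None  # step awaiting image approval

    def handle_command(self, command, payload=None):
        if self.state == "idle" and command == "start step":
            self.state = "recording"        # show the recording indicator
        elif self.state == "recording" and command == "capture image":
            # pair the recognized speech text with the captured image
            self._pending = {"text": payload, "image": "capture.jpg"}
            self.state = "reviewing"        # display the image for approval
        elif self.state == "reviewing" and command == "approve":
            self.steps.append(self._pending)
            self._pending = None
            self.state = "idle"
        elif self.state == "reviewing" and command == "retake":
            self._pending = None
            self.state = "recording"        # discard the image, try again
        return self.state
```

Keeping the approval step as its own state mirrors the app's behavior: a captured image is never saved to a step until the user explicitly approves it.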

The recorded text is written to a JSON file, and images are saved as they are captured and approved. The path to each final image is written to the JSON alongside its step. The recorded audio is also saved so that any necessary corrections to the text can be made manually afterwards.
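A minimal sketch of the JSON output step, assuming a simple steps/index/text/image layout (the actual field names and file structure are not specified in the text):

```python
import json
from pathlib import Path


def save_tutorial(steps, out_path):
    """Write tutorial steps to a JSON file.

    Each step pairs the recognized speech text with the path of its
    approved image (None if the step has no image).
    """
    doc = {
        "steps": [
            {"index": i, "text": s["text"], "image": s.get("image")}
            for i, s in enumerate(steps, start=1)
        ]
    }
    Path(out_path).write_text(json.dumps(doc, indent=2))
    return doc
```

Writing the image path rather than embedding image data keeps the JSON small and lets images be retaken or replaced without rewriting the whole file.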

Eye Gaze Tracking

Eye Gaze Module Components

  • Pupil Service: Pupil Labs proprietary software required to interface with the Pupil Labs cameras. It stores calibration information and forwards gaze coordinate positions to Python scripts via a ZMQ networking paradigm.
  • Calibration Program for HoloLens: A Unity project provided by Pupil Labs for calibrating the HoloLens user's eye gaze with the Pupil service.
  • Gaze Streaming: A script that starts a ZMQ subscriber to listen for gaze coordinate positions from the Pupil service, then streams this data over a UDP connection to the Python scripts running in parallel.
  • Heatmap Server script: Receives raw gaze coordinate data and transforms it into screen-frame coordinates for the known HoloLens display size (approximately 720p). Coordinates are cleaned and then passed to the Heatmap Creation script.
  • Heatmap Creation script: Stores a history of gaze coordinate data and renders a heatmap image from it; the image is updated after a set number of entries.
  • Image Transfer script: Transfers the heatmap image to the HoloLens over a TCP/IP connection.
  • HoloLens Scene: All the code required to display the heatmap image, condensed into a Unity canvas and a few managing components for easy integration with other Unity applications.
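The Heatmap Server's transform and the Heatmap Creation script's binning step can be sketched as follows. The roughly 720p display size comes from the text above; the bottom-left origin convention for the normalized gaze coordinates and the grid cell size are assumptions for illustration.

```python
def to_screen(norm_xy, width=1280, height=720):
    """Map a normalized gaze coordinate (0..1 in each axis, origin
    assumed bottom-left) to pixel coordinates on the ~720p HoloLens
    display, clamping out-of-range samples to the screen edge."""
    x, y = norm_xy
    px = min(max(int(x * width), 0), width - 1)
    py = min(max(int((1.0 - y) * height), 0), height - 1)  # flip y to top-left origin
    return px, py


def accumulate_heatmap(points, width=1280, height=720, cell=40):
    """Bin screen-space gaze points into a coarse grid; each cell's
    count can later be colour-mapped into a heatmap image."""
    cols, rows = width // cell, height // cell
    grid = [[0] * cols for _ in range(rows)]
    for px, py in points:
        grid[py // cell][px // cell] += 1
    return grid
```

Binning into coarse cells (rather than per-pixel counts) is one simple way to keep the heatmap cheap to update after each batch of entries.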
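The Image Transfer step can be sketched with a simple 4-byte length-prefix framing, so the HoloLens side knows where each heatmap image ends on the TCP stream. The framing scheme is an assumption; the project's actual wire protocol is not specified in the text.

```python
import socket
import struct


def send_image(sock, image_bytes):
    """Send one image over TCP, prefixed with its length as a
    4-byte big-endian unsigned integer."""
    sock.sendall(struct.pack(">I", len(image_bytes)) + image_bytes)


def recv_image(sock):
    """Receive one length-prefixed image from the stream."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)


def _recv_exact(sock, n):
    """Read exactly n bytes, looping because TCP may deliver less
    than requested per recv() call."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before full message")
        buf += chunk
    return buf
```

Length-prefixing avoids ambiguity when consecutive heatmap updates are sent on the same connection, since TCP itself provides no message boundaries.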


Dependencies

  1. Access to HoloLens and Pupil Labs Add-on: Complete
  2. Access to existing codebase (GitHub): Complete
  3. Installation of toolkits and Unity: Complete
  4. Neurosurgeon to try demo: Complete

Milestones and Status

  1. User Interface: App accepts voice commands and the gaze-tracking camera is synced with the HoloLens camera.
    • Planned Date: 3/18
    • Status: Complete
  2. Speech-to-Text & 2D Heatmaps: Text tutorials can be created and 2D heatmaps generated.
    • Planned Date: 3/31
    • Status: Complete
  3. Image Capture & Working Demo: Text/image based tutorials can be created and heatmaps are fully functional.
    • Planned Date: 4/15
    • Status: Complete
  4. Final Report and Demo App
    • Planned Date: 5/6
    • Status: Almost Complete

Reports and Presentations

Project Bibliography

  • Evaluation of Optical See-Through Head-Mounted Displays in Training for Critical Care and Trauma.
  • Kato, H., & Billinghurst, M. (1999). Marker Tracking and HMD Calibration for a Video-Based Augmented Reality Conferencing System. In Proceedings of the 2nd IEEE and ACM International Workshop on Augmented Reality (p. 85–). Washington, DC, USA: IEEE Computer Society.
  • Birt, J., Cowling, M., & Moore, E. (2015). Augmenting distance education skills development in paramedic science through mixed media visualisation.
  • Armstrong, D. G., Rankin, T. M., Giovinco, N. A., Mills, J. L., & Matsuoka, Y. (2014). A heads-up display for diabetic limb salvage surgery: a view through the google looking glass. Journal of Diabetes Science and Technology, 8(5), 951–6.
  • Tai, B. L., Rooney, D., Stephenson, F., Liao, P.-S., Sagher, O., Shih, A. J., & Savastano, L. E. (2015). Development of a 3D-printed external ventricular drain placement simulator: technical note. Journal of Neurosurgery, 123(4), 1070–6.
  • Atkins, M. S., Tien, G., Khan, R. S. A., Meneghetti, A., & Zheng, B. (2013). What do surgeons see: capturing and synchronizing eye gaze for surgery applications. Surgical Innovation, 20(3), 241–8.
  • Kersten-Oertel, M., Jannin, P., & Collins, D. L. (2012). DVV: a taxonomy for mixed reality visualization in image guided surgery. IEEE Transactions on Visualization and Computer Graphics, 18(2), 332–52.
  • Eck, U., Stefan, P., Laga, H., Sandor, C., Fallavollita, P., & Navab, N. (2016). Exploring Visuo-Haptic Augmented Reality User Interfaces for Stereo-Tactic Neurosurgery Planning. In G. Zheng, H. Liao, P. Jannin, P. Cattin, & S.-L. Lee (Eds.), Medical Imaging and Augmented Reality (pp. 208–220). Cham: Springer International Publishing.

Other Resources and Project Files
