Vital Monitor and ID Detection through Machine Vision for Improving EMS Communication Efficiency

Last updated: February 18, 2021

Summary

Smart glasses that provide a live video feed are currently used by emergency medical technicians simply as cameras. Consequently, there is an opportunity to use the live feed, along with AI, to streamline certain processes. In particular, this project seeks to use this live feed to: 1) allow remote doctors to gain visual access to the information available from medical devices (such as ultrasound) in order to facilitate physician guidance, and 2) automatically pull key information from standard documents such as driver's licenses (by holding the identification up to the glasses) to fill in personal data in forms and databases, so that medics may spend less time on paperwork and more time treating patients. The basis of the first objective is to use deep learning, along with computer vision, to detect and deform medical data screens so that a person may remotely view them head-on. The basis of the second objective is to use deep learning to detect IDs, and then optical character recognition and generic text parsing to extract the personal information. Training sets include images of the desired monitors as well as the desired IDs. By the end of the project, at minimum, this work hopes to have developed the ID detection and extraction algorithm; at maximum, it hopes to have built and incorporated these algorithms into the workflow of a current model of smart glasses.

Background, Specific Aims, and Significance

Smart glasses, while not themselves a recent invention, are only just being implemented in healthcare environments. These glasses are most often used by surgeons or emergency medical workers to record and live-stream video from, respectively, surgical rooms and on-the-field operations [1] [2]. This feed is routed through a smartphone and then sent over the internet to remote healthcare workers, who can either provide advice and assistance for the current patient or, if they are trainees, learn from watching these healthcare procedures [1]. Currently, only the raw live footage is used. There is therefore an opportunity to run this feed through artificial intelligence in order to streamline healthcare processes and ultimately increase the efficiency of patient care. In particular, this project has two objectives.

The first objective is to provide direct visuals of on-the-field data monitors to remote healthcare workers; the second is to extract information from identification and enter it into digital medical notes. Providing visuals will improve on-the-field outcomes by increasing confidence and treatment speed and by reducing repeated procedures. Extracting information from identification will improve on-the-field outcomes by shifting time spent on documentation to patient care. Furthermore, the success of this project will serve as an initial assessment of the viability of artificial intelligence in smart glasses in a healthcare environment.

Deliverables

Table:

Technical Approach

The ideal overall workflow of the smart glasses is shown above. Video frames of size 1280×960 will be sent from the glasses over Wi-Fi. From there, the images will be sent to the remote healthcare provider's Zoom stream and then over the internet into the cloud, where the algorithm will process the video. For this initial project, video of the target objects will be recorded from the remote healthcare provider's Zoom stream and then used as the input to the algorithm.

Before reaching the algorithms, the video feed will be preprocessed with resizing, normalization, and a Gaussian blur to standardize the input and remove noise. Each algorithm will then first check whether its respective object is within the image; if the object is detected, the algorithm will process the image accordingly.
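As a rough illustration, this preprocessing stage could look like the minimal OpenCV sketch below. The 1280×960 frame size comes from the workflow above; the blur kernel size and the [0, 1] normalization range are assumptions for illustration, not fixed design choices.

```python
import cv2
import numpy as np

def preprocess_frame(frame, size=(1280, 960), blur_kernel=(5, 5)):
    """Resize, blur, and normalize a raw video frame before detection and OCR."""
    resized = cv2.resize(frame, size)                     # standardize resolution
    blurred = cv2.GaussianBlur(resized, blur_kernel, 0)   # suppress sensor noise
    normalized = blurred.astype(np.float32) / 255.0       # scale pixel values to [0, 1]
    return normalized
```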

After this individual processing, the results from both algorithms will be fed into a frame-weighting algorithm. Every frame generated while the object of interest is in view will contribute, in proportion to its weight, to a running overall estimate of the textual results. For monitors, weights are based on how in-focus the frame is and on the time since detection (so that, if a heart rate changes, its estimate changes too). For identification, weights are based only on how in-focus the frame is.

Finally, these values will be sent to their respective destinations: ID information will be sent to a digital medical memo, while vitals information will be appended to the stream.

Objective 1: Detection and Extraction of Identification

There are three main steps to detect and extract information from an ID.

  1. Detect the presence of an ID. To detect the presence of identification, a YOLO deep learning framework will be constructed and trained on the driver's license dataset. YOLO was chosen for its ability to process large amounts of data at many frames per second. Furthermore, for this problem it is only necessary to detect whether a single object class is present in the frame, which YOLO excels at.
  2. Detect the spatial orientation of the ID and deform it. To optimize optical character recognition, the ID's boundary will be found using Hough-transform edge detection and the image deformed so that the ID appears head-on with no rotation.
  3. Read and categorize the information on the ID. Once an ID is detected, and after some preprocessing, Tesseract OCR in Python will be used to extract text from the license. Generic text parsing will then be used to sort the information into desired categories, such as name, birth date, etc. (a sketch of this step follows the list).
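The sketch below illustrates step 3 under the assumption of a pytesseract setup. The regex patterns, field labels (e.g., "LN", "NAME"), and the `parse_id_text` helper name are illustrative assumptions, not the project's final parsing rules, since label formats vary by state.

```python
import re
import pytesseract

def parse_id_text(head_on_image):
    """Run OCR on a deskewed ID image and sort the text into basic categories."""
    raw_text = pytesseract.image_to_string(head_on_image)

    fields = {}
    # Assume dates in MM/DD/YYYY form include the birth date.
    dob = re.search(r"\b(\d{2}/\d{2}/\d{4})\b", raw_text)
    if dob:
        fields["birth_date"] = dob.group(1)
    # US licenses commonly label fields with short codes; these patterns are placeholders.
    name = re.search(r"(?:LN|NAME)\s+([A-Z ,'-]+)", raw_text)
    if name:
        fields["name"] = name.group(1).strip()
    return fields
```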

For each component, the target accuracy is 95%, with a decision speed of <8 ms (to support a camera running at roughly 60 frames per second). Ultimately, a combined accuracy of 95% with a decision speed of <16 ms is desired.

Objective 2: Detection and Deformation of Data Monitors

There are three main steps to obtain a head-on image of the data monitor.

  1. Detect the display of a vital-signs monitor or ultrasound image. To detect the presence of a medical monitor, the same YOLO deep learning framework will be constructed and trained on the medical-devices dataset. As above, YOLO was chosen for its ability to process large amounts of data at many frames per second and because only a single object class needs to be detected per frame.
  2. Deform the image so that the monitor appears head-on. To optimize optical character recognition, the monitor's boundary will be found using Hough-transform edge detection and the image deformed so that the screen appears head-on with no rotation (see the sketch after this list).
  3. Read the vital-signs data. Once a monitor is detected, and after some preprocessing, Tesseract OCR in Python will be used to extract text from the screen. The locations of the numbers will then be used to sort the information into desired categories, such as heart rate, blood pressure, and oxygen saturation.
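As an illustration of the deformation in step 2, the sketch below applies an OpenCV perspective warp to a detected quadrilateral (for example, the monitor bezel recovered from Hough-line intersections). The `warp_head_on` helper name and the fixed output size are assumptions for illustration only.

```python
import cv2
import numpy as np

def warp_head_on(image, corners, out_w=800, out_h=480):
    """Warp a detected quadrilateral (4 corner points, ordered TL, TR, BR, BL)
    to a head-on rectangle suitable for OCR."""
    src = np.float32(corners)
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    matrix = cv2.getPerspectiveTransform(src, dst)              # homography between the two quads
    return cv2.warpPerspective(image, matrix, (out_w, out_h))   # head-on view of the screen
```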

For each component, the target accuracy is 95%, with a decision speed of <8 ms (to support a camera running at roughly 60 frames per second). As with Objective 1, a combined accuracy of 95% with a decision speed of <16 ms is desired.
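One simple way to check these per-stage budgets, assuming each pipeline stage is wrapped in a plain Python function, is to time the stages over a batch of sample frames, as in the sketch below; the stage and variable names are placeholders.

```python
import time

def mean_stage_ms(stage_fn, frames):
    """Return the mean per-frame runtime (in milliseconds) of one pipeline stage."""
    start = time.perf_counter()
    for frame in frames:
        stage_fn(frame)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms / len(frames)

# Example usage (hypothetical stage and data names):
# assert mean_stage_ms(ocr_stage, sample_frames) < 8.0
```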

Frame Integration

To increase the accuracy of the program, reduce the effect of glare, and exploit the fact that multiple frames are captured per object, the program runs an integration algorithm that:

  1. Takes the string outputs of the OCRs of multiple frames with respect to an information category (e.g., birth date, heart rate). For instance, we could look at each frame's birth-date result.
  2. Creates weights for each frame's string output, depending on the focus/blurriness of the image of the string. For instance, we may have one blurry frame, one in-focus frame, and one in between, each with a reported birth date. The most weight would be given to the in-focus frame, some to the one in between, and the least to the blurry frame.
  3. Estimates a running value of the information using the weights and the string outputs. This estimate also accounts for frames that have read too many or too few characters.

In the case of the ID, once the object is no longer detected for some time, the running values in step 3 will be reported. In the case of the vitals monitors, since vitals information will change, frames that were taken in the past will gradually lose weight, and the running estimation of the vitals will continuously be reported.
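A minimal sketch of this weighting scheme is shown below. The use of the variance of the Laplacian as a focus score and the exponential time-decay constant are assumptions chosen for illustration, not the project's fixed design.

```python
import cv2
from collections import defaultdict

def focus_weight(gray_image):
    """Sharper frames get larger weights (variance of the Laplacian as a focus score)."""
    return cv2.Laplacian(gray_image, cv2.CV_64F).var()

def running_estimate(readings, decay=0.9):
    """Weighted vote over per-frame OCR strings for one category (e.g., heart rate).

    `readings` is a list of (ocr_string, focus_score, frames_since_capture) tuples.
    Older frames lose weight exponentially, so changing vitals update the estimate;
    for IDs, the age term can simply be set to zero.
    """
    votes = defaultdict(float)
    for text, focus, age in readings:
        votes[text] += focus * (decay ** age)
    return max(votes, key=votes.get) if votes else None
```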

Dependencies

Information regarding project dependencies is best organized in table form as shown below.

Milestones and Status

  1. Milestone name: Obtain Datasets 8-)
  2. Milestone name: Code YOLO 8-)
    • Start Date: 2/19
    • Planned Date: 2/22
    • Expected Date: 2/22
    • Status: **Completed**
  3. Milestone name: Train YOLO on IDs and Record Results 8-)
    • Start Date: 2/22
    • Planned Date: 2/25
    • Expected Date: 2/25
    • Status: **Completed**
  4. Milestone name: Code OCR 8-)
    • Start Date: 2/25
    • Planned Date: 2/29
    • Expected Date: 2/29
    • Status: **Completed**
  5. Milestone name: Test OCR on Camera Images and Record Results 8-)
  6. Milestone name: Code and Assess Deformation with Hough Transform 8-)
    • Start Date: 3/1
    • Planned Date: 3/4
    • Expected Date: 3/4
    • Status: **Completed**
  7. Milestone name: Create Ground Truth Set for Generic Text Parsing 8-)
  8. Milestone name: Code and Assess Generic Text Parsing and Record Results 8-)
  9. Milestone name: Combine all Individual Codes to Read IDs 8-)
  10. Milestone name: Duplicate YOLO and Train on Devices Dataset and Record Results 8-)
  11. Milestone name: Generate Test Set for Blob Feature Deform Algorithm Testing
    • Planned Date: 3/24
    • Expected Date: 3/25
    • Status: Removed
  12. Milestone name: Code Monitor Deformation Algorithm 8-)
    • Planned Date: 3/28
    • Expected Date: 3/28
    • Status: **Completed**
  13. Milestone name: Assess Monitor Deformation Algorithm 8-)
  14. Milestone name: Investigate How to Write to ePCR
    • Planned Date: 4/5
    • Expected Date: 4/5
    • Status: Removed
  15. Milestone name: Feed Video Back Into Smartphone
    • Planned Date: 4/10
    • Expected Date: 4/10
    • Status: Removed
  16. Milestone name: Incorporate all Algorithms into Workflow 8-)
  17. Milestone name: Generate Tests to Assess the Functionality of the Application 8-)
  18. Milestone name: Assess Overall Application Functionality
    • Planned Date: 4/26
    • Expected Date: 5/1
    • Status: Removed

Reports and presentations

Project Bibliography

Background and Motivation References

  1. Vuzix Corporation. “Vuzix Smart Glasses at the Chi Mei Medical Center, Taiwan.” Vuzix Corporation, 2020, ss-usa.s3.amazonaws.com/c/308483104/media/21105f5a523ce21ce43889049199725/Vuzix-Chi-Mei-Medical-Case-Study-2020.pdf.
  2. Schaer et al. “Using Smart Glasses in Medical Emergency Situations, a Qualitative Pilot Study.” 2016 IEEE Wireless Health (WH), 2016, doi:10.1109/wh.2016.7764556.
  3. GreenLight Medical. “Standardizing Medical Devices: Value Analysis.” GreenLight Medical, 17 Mar. 2020, www.greenlightmedical.com/standardizing-medical-devices-in-hospitals/.
  4. Crawford, S., Kushner, I., Wells, R., and Monks, S. “Electronic Health Record Documentation Times among Emergency Medicine Trainees.” Perspectives in Health Information Management, vol. 16, 2019, 1f.
  5. Lester, Laeben. “Inquiry into EMS Documentation Times.” 14 Feb. 2021.

Reading Materials

Other Resources and Project Files

Other project files associated with the project (datasets and source code) are available at the links below.

Dataset for Driver License YOLOv3 Training:

https://drive.google.com/drive/folders/1vHMG2MuN_UT3xDQW2D2M0xbONkEN8NgN?usp=sharing

Dataset for Medical Monitor YOLOv3 Training:

https://drive.google.com/drive/folders/1p2EYb_wphHtfJ2YzOVttm_TyCGae5OM2?usp=sharing

YOLOv3 Implementation Code, Deformation Code, OCR and Text Parsing Code:

https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ?usp=sharing