Vital Monitor and ID Detection through Machine Vision for Improving EMS Communication Efficiency
Last updated: February 18, 2021
Summary
Glasses that provide a live video feed are currently used by emergency medical technicians simply as a camera. Consequently, there is an opportunity to combine the live feed with AI to streamline certain processes. In particular, this project seeks to use the feed to: 1) give remote doctors visual access to the information displayed on medical devices (such as ultrasound) in order to facilitate physician guidance, and 2) automatically pull key information from standard documents such as driver's licenses (by holding the identification up to the glasses) to fill in personal data in forms and databases, so that medics may spend less time on paperwork and more time treating patients. The first objective uses deep learning and computer vision to detect medical data screens and deform them so that a remote viewer sees them head-on. The second objective uses deep learning to detect IDs, then optical character recognition and generic text parsing to extract the personal information. Training sets include images of the target monitors and of the target IDs. By the end of the project, at minimum, this work aims to develop the ID detection and extraction algorithm; at maximum, it aims to build these algorithms and incorporate them into the workflow of a current model of smart glasses.
Students:
Mentor(s):
Dr. Nick Dalesio: MD AirSupport Co-founder, Pediatric Anesthesiologist
Dr. Laeben Lester: MD AirSupport Co-founder, Cardiac Anesthesiologist and Emergency Physician
Dr. Mathias Unberath: Assistant Professor in the Department of Computer Science
Background, Specific Aims, and Significance
Smart glasses, while not themselves a recent invention, are only now being introduced into healthcare environments. These glasses are most often used by surgeons or emergency medical workers to record and live-stream video from, respectively, operating rooms and on-the-field operations [1] [2]. This feed is routed through a smartphone and sent over the internet to remote healthcare workers, who can either provide advice and assistance for the current patient or, if they are trainees, learn by watching these healthcare procedures [1]. Currently, only the raw live footage is used. There is therefore an opportunity to run this feed through artificial intelligence in order to streamline healthcare processes and ultimately increase the efficiency of patient care.
The main objectives of this project are therefore to provide direct visuals of on-the-field data monitors to remote healthcare workers and to extract information from patient identification and enter it into digital medical notes. Providing visuals will improve on-the-field outcomes by increasing provider confidence, speeding treatment, and reducing repeated procedures. Extracting information from identification will improve outcomes by shifting time spent on documentation toward patient care. Furthermore, the success of this project will serve as an initial assessment of the viability of artificial intelligence in smart glasses in a healthcare environment.
Deliverables
Minimum: (Starting by 2/19, Expected by 3/15)
Dataset of IDs and Medical Devices (2/19)
-
-
-
-
Documentation of Overall Code and Performances of Each Section (3/15)
-
-
Expected: (Starting by 3/15, Expected by 3/30)
Documentation of Code and Performance of YOLO on Devices (3/20)
-
-
Documentation of Color Threshold Edge Detection and Its Performance (3/30)
-
-
Maximum: (Starting by 3/30, Expected by 5/1)
Documentation of Incorporation. (4/15)
-
Testing procedures for performance assessment. (4/20)
-
Results of the Tests. (5/1)
-
Technical Approach
The ideal overall workflow of the smart glasses is shown above. The video feed, consisting of 1280×960 images, will be sent over Wi-Fi. From there, the images will be sent to the remote healthcare provider's Zoom stream and then over the internet to the cloud, where the algorithm will process the video. For this initial project, video feed of the target objects will be recorded from the remote healthcare provider's Zoom stream and then used as the input to the algorithm.
Before reaching the algorithms, the video feed will be preprocessed with resizing, normalization, and Gaussian blurring to standardize the images and remove noise. Each algorithm will then first check whether its respective object is present in the image; if the object is detected, the algorithm will process the image accordingly.
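As a rough illustration, this preprocessing stage could be implemented with OpenCV as in the sketch below; the target resolution and blur kernel size are assumptions rather than finalized parameters.

```python
import cv2

def preprocess(frame, target_size=(1280, 960), blur_kernel=(5, 5)):
    """Standardize an incoming frame before detection: resize to the expected
    resolution, normalize intensities to [0, 1], and apply a light Gaussian
    blur to suppress sensor and compression noise."""
    frame = cv2.resize(frame, target_size)
    frame = frame.astype("float32") / 255.0
    frame = cv2.GaussianBlur(frame, blur_kernel, 0)
    return frame
```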
If a data monitor is detected, the pixels of the monitor will be cropped out and the image deformed so that the monitor appears head-on. This image will then be sent back to the smartphone, appended to the video feed, and forwarded to the remote healthcare worker. At the same time, vitals information will be extracted from the image and categorized.
If identification is detected, the algorithm will crop out and deform the identification, extract the text from the resulting image, and categorize the text into meaningful fields (such as name, date of birth, etc.).
After this individual processing, the results from both algorithms will be fed into a frame weighting algorithm. Every frame generated while the object of interest is in view will contribute, according to its weight, to a running estimate of the textual results. For monitors, weights are based on how in focus the frame is and on the time since detection (so that, if a heart rate changes, the estimate changes with it). For identification, weights are based only on how in focus the frame is.
Finally, these values will be sent to their respective destinations: ID information will be written to a digital medical memo, while vitals information will be appended to the stream.
Objective 1: Detection and Extraction of Identification
There are three main steps to detect and extract information from an ID.
Detect the presence of an ID. To detect the presence of identification, a YOLO deep learning framework will be constructed and trained on the driver's license dataset. YOLO was chosen for its ability to process large amounts of data at many frames per second. Furthermore, the problem only requires detecting whether a single type of object is present in the frame, which YOLO excels at.
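A minimal sketch of what this detection step could look like at inference time is shown below, using OpenCV's DNN module to run a Darknet-format YOLO network; the configuration and weight file names and the input size are placeholders, not the project's actual trained model.

```python
import cv2

# Placeholder paths for the trained single-class (ID) YOLO network.
CFG, WEIGHTS = "yolo_id.cfg", "yolo_id.weights"

net = cv2.dnn.readNetFromDarknet(CFG, WEIGHTS)
output_layers = net.getUnconnectedOutLayersNames()

def detect_id(frame, conf_threshold=0.5):
    """Return the highest-confidence ID bounding box (x, y, w, h),
    or None if no ID is detected in the frame."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    detections = net.forward(output_layers)

    best_box, best_conf = None, conf_threshold
    for output in detections:
        for det in output:
            conf = float(det[5:].max())  # single class: take the max class score
            if conf > best_conf:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                best_box = (int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh))
                best_conf = conf
    return best_box
```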
Detect the spatial orientation of the ID and deform it. To optimize optical character recognition, the ID's boundary will be determined using Hough transform edge detection, and the image will be deformed so that the ID appears head-on with no rotation.
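The sketch below illustrates the two pieces of this step under simplified assumptions: Canny edges followed by a probabilistic Hough transform to recover candidate boundary lines, and a perspective warp to a head-on view once four corner points have been chosen (grouping and intersecting the lines to select those corners is omitted here). The default output size roughly matches the ID-1 card aspect ratio and is only an assumption.

```python
import cv2
import numpy as np

def detect_boundary_lines(gray):
    """Canny edge detection followed by a probabilistic Hough transform on a
    grayscale frame. The returned line segments would then be grouped and
    intersected to recover the four corners of the card or monitor."""
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=60, maxLineGap=10)
    return edges, lines

def rectify(frame, corners, out_w=856, out_h=540):
    """Warp the quadrilateral given by `corners` (top-left, top-right,
    bottom-right, bottom-left) to a head-on view."""
    src = np.array(corners, dtype=np.float32)
    dst = np.array([[0, 0], [out_w - 1, 0],
                    [out_w - 1, out_h - 1], [0, out_h - 1]], dtype=np.float32)
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(frame, H, (out_w, out_h))
```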
Read and categorize information on the ID. Once an ID is detected, and after some preprocessing, Tesseract OCR in Python will be used to extract text from the license. Generic text parsing will then be used to sort the information into the desired categories, such as name, date of birth, etc.
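As an illustration, the OCR and parsing step could look like the following sketch using the pytesseract wrapper; the regular expressions are illustrative stand-ins for the generic text parsing rules and are not tuned to any particular license layout.

```python
import re
import pytesseract

def extract_fields(card_image):
    """Run Tesseract on a rectified ID image and pull example fields with
    regular expressions (illustrative patterns only)."""
    text = pytesseract.image_to_string(card_image)

    dob = re.search(r"\b(\d{2}[/-]\d{2}[/-]\d{4})\b", text)
    name = re.search(r"(?:LN|NAME)[:\s]+([A-Z][A-Z ,'-]+)", text)

    return {
        "raw_text": text,
        "date_of_birth": dob.group(1) if dob else None,
        "name": name.group(1).strip() if name else None,
    }
```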
For each component, the target accuracy is 95%, with a decision speed of under 8 ms (to support a camera running at roughly 60 frames per second). Ultimately, a combined accuracy of 95% with a decision speed of under 16 ms is desired.
Objective 2: Detection and Deformation of Data Monitors
There are three main steps to obtain a head-on image of the data monitor.
Detect the display of a vital signs monitor or ultrasound image. To detect the presence of a medical monitor, the same YOLO deep learning framework will be constructed and trained on the medical devices dataset, for the same reasons given above: YOLO processes large amounts of data at many frames per second and excels at detecting whether a single type of object is present in the frame.
Deform the monitor so that it appears head-on. To optimize optical character recognition, the monitor's boundary will be determined using Hough transform edge detection, and the image will be deformed so that the monitor appears head-on with no rotation.
Read vital signs data. Once a monitor is detected, and after some preprocessing, Tesseract OCR in Python will be used to extract text from the monitor. The locations of the numbers will then be used to sort the information into the desired categories, such as heart rate, blood pressure, and oxygen saturation.
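A minimal sketch of this location-based categorization is shown below; the named regions and their coordinates are placeholders, since the real layout depends on the specific monitor models in the dataset.

```python
import pytesseract

# Named regions of interest on the rectified monitor image, given as
# (x, y, w, h) fractions of the image size. These coordinates are placeholders.
VITAL_ROIS = {
    "heart_rate": (0.70, 0.05, 0.25, 0.20),
    "blood_pressure": (0.70, 0.30, 0.25, 0.20),
    "oxygen_saturation": (0.70, 0.55, 0.25, 0.20),
}

def read_vitals(monitor_image):
    """Crop each named region from the head-on monitor image and OCR digits only."""
    h, w = monitor_image.shape[:2]
    vitals = {}
    for name, (fx, fy, fw, fh) in VITAL_ROIS.items():
        x, y = int(fx * w), int(fy * h)
        roi = monitor_image[y:y + int(fh * h), x:x + int(fw * w)]
        text = pytesseract.image_to_string(
            roi, config="--psm 7 -c tessedit_char_whitelist=0123456789/")
        vitals[name] = text.strip()
    return vitals
```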
For each component, the target accuracy is 95%, with a decision speed of under 8 ms (to support a camera running at roughly 60 frames per second). Ultimately, a combined accuracy of 95% with a decision speed of under 16 ms is desired.
Frame Integration
In order to increase the accuracy of the program, reduce the effect of glare, and exploit the fact that multiple frames are captured per object, the program runs an integration algorithm that:
1. Takes the string outputs of the OCR across multiple frames for a given information category (e.g., birth date, heart rate). For instance, we could look at each frame's birth date result.
2. Creates a weight for each frame's string output, depending on how in focus or blurry the image of the string is. For instance, we may have one blurry frame, one in-focus frame, and one in between, each with a reported birth date; the most weight would be given to the in-focus frame, some to the one in between, and the least to the blurry frame.
3. Uses the weights and the string outputs to maintain a running estimate of the actual value of the information. This estimate also accounts for frames that read too many or too few characters.
In the case of the ID, once the object has not been detected for some time, the running values from step 3 will be reported. In the case of the vitals monitors, since vitals information changes over time, older frames gradually lose weight, and the running estimate of the vitals is reported continuously.
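A minimal sketch of this weighting scheme is given below, assuming the variance of the Laplacian as the focus measure and a simple per-character weighted vote; the decay factor and voting rule are illustrative choices rather than the project's final design.

```python
import cv2
from collections import defaultdict

def focus_weight(gray_roi):
    """Variance of the Laplacian as a simple sharpness (focus) score for a frame."""
    return cv2.Laplacian(gray_roi, cv2.CV_64F).var()

class RunningEstimate:
    """Weighted, per-character vote over the OCR strings of successive frames.
    decay < 1 gradually downweights older frames (intended for vitals, which
    change over time); decay = 1 keeps all frames equally weighted (IDs)."""

    def __init__(self, decay=1.0):
        self.decay = decay
        self.votes = []  # list of (weight, ocr_string) pairs

    def add(self, text, weight):
        # Age the existing votes, then record the new frame's reading.
        self.votes = [(w * self.decay, s) for w, s in self.votes]
        self.votes.append((weight, text))

    def estimate(self):
        if not self.votes:
            return None
        # Pick the string length carrying the most total weight, which
        # discards frames that read too many or too few characters.
        lengths = set(len(s) for _, s in self.votes)
        length = max(lengths, key=lambda n: sum(w for w, s in self.votes if len(s) == n))
        # Weighted vote for each character position.
        result = []
        for i in range(length):
            tally = defaultdict(float)
            for w, s in self.votes:
                if len(s) == length:
                    tally[s[i]] += w
            result.append(max(tally, key=tally.get))
        return "".join(result)
```

For an ID field, the estimate would be reported once the ID has left view; for a vital, a decay below 1 (say, 0.9 per frame) would let the running estimate track changing readings continuously.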
Dependencies
Information regarding project dependencies is best organized in table form as shown below.
Milestones and Status
| Milestone | Start Date | Planned Date | Expected Date | Status |
| --- | --- | --- | --- | --- |
| Obtain Datasets | 2/18 | 2/19 | 2/19 | Completed |
| Code YOLO | 2/19 | 2/22 | 2/22 | |
| Train YOLO on IDs and Record Results | 2/22 | 2/25 | 2/25 | |
| Code OCR | 2/25 | 2/29 | 2/29 | |
| Test OCR on Camera Images and Record Results | | 3/1 | 3/1 | |
| Code and Assess Deformation with Hough Transform | 3/1 | 3/4 | 3/4 | |
| Create Ground Truth Set for Generic Text Parsing | | 3/4 | 3/5 | |
| Code and Assess Generic Text Parsing and Record Results | | 3/15 | 3/15 | |
| Combine all Individual Codes to Read IDs | | 3/15 | 3/15 | |
| Duplicate YOLO and Train on Devices Dataset and Record Results | | 3/17 | 3/20 | |
| Generate Test Set for Blob Feature Deform Algorithm Testing | | 3/24 | 3/25 | Removed |
| Code Monitor Deformation Algorithm | | 3/28 | 3/28 | |
| Assess Monitor Deformation Algorithm | | 3/30 | 3/30 | |
| Investigate How to Write to ePCR | | 4/5 | 4/5 | Removed |
| Feed Video Back Into Smartphone | | 4/10 | 4/10 | Removed |
| Incorporate all Algorithms into Workflow | | 4/14 | 4/15 | |
| Generate Tests to Assess the Functionality of the Application | | 4/18 | 4/20 | |
| Assess Overall Application Functionality | | 4/26 | 5/1 | Removed |
Reports and presentations
Project Bibliography
Background and Motivation References
Vuzix Corporation. "Vuzix Smart Glasses at the Chi Mei Medical Center, Taiwan." Vuzix Corporation, 2020, ss-usa.s3.amazonaws.com/c/308483104/media/21105f5a523ce21ce43889049199725/Vuzix-Chi-Mei-Medical-Case-Study-2020.pdf.
Schaer, et al. “Using Smart Glasses in Medical Emergency Situations, a Qualitative Pilot Study.” 2016 IEEE Wireless Health (WH), 2016, doi:10.1109/wh.2016.7764556.
-
Crawford S, Kushner I, Wells R, Monks S. “Electronic health record documentation times among emergency medicine trainees.” Perspect Health Inf Manag, 2019;16:1f.
Lester, Laeben. “Inquiry into EMS Documentation Times.” 14 Feb. 2021.
Reading Materials
Redmon, Joseph, et al. “You Only Look Once: Unified, Real-Time Object Detection.” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, doi:10.1109/cvpr.2016.91.
Shaifee, Mohammad Javad, et al. “Fast YOLO: A Fast You Only Look Once System for Real-Time Embedded Object Detection in Video.” Journal of Computational Vision and Imaging Systems, vol. 3, no. 1, 2017, doi:10.15353/vsnl.v3i1.171.
Wojna, Zbigniew, et al. “Attention-Based Extraction of Structured Information from Street View Imagery.” 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017, doi:10.1109/icdar.2017.143.
Llados, J., et al. “ICAR: Identity Card Automatic Reader.” Proceedings of Sixth International Conference on Document Analysis and Recognition, doi:10.1109/icdar.2001.953834.
Mikolajczyk K, Schmid C. Scale & affine invariant interest point detectors. International Journal on Computer Vision. 2004;60:63. doi: 10.1023/B:VISI.0000027790.02288.f2.
V. V. Arlazarov, K. Bulatov, T. Chernov and V. L. Arlazarov, “A dataset for identity documents analysis and recognition on mobile devices in video stream”, Comput. Opt., vol. 43, no. 5, pp. 818-824, 2019.
Y. S. Chernyshova, A. V. Sheshkus and V. V. Arlazarov, “Two-Step CNN Framework for Text Line Recognition in Camera-Captured Images,” IEEE Access, vol. 8, pp. 32587-32600, 2020, doi: 10.1109/ACCESS.2020.2974051.
S. Gould, R. Fulton, D. Koller. Decomposing a Scene into Geometric and Semantically Consistent Regions. Proceedings International Conference on Computer Vision (ICCV), 2009.
Other Resources and Project Files