======Vital Monitor and ID Detection through Machine Vision for Improving EMS Communication Efficiency======

**Last updated: February 18, 2021**

======Summary======

Glasses that provide a live video feed are currently used by emergency medical technicians simply as a camera. Consequently, there is an opportunity to use the live feed, along with AI, to streamline certain processes. In particular, this project seeks to use this live feed to: 1) allow remote doctors to gain visual access to the information available from medical devices (such as ultrasound) in order to facilitate physician guidance, and 2) automatically pull key information from standard documents like driver’s licenses (by holding up identification to the glasses) to fill in personal data in forms and databases so that medics may spend less time on paperwork and more time on treating patients. The first objective's basis is to use deep learning, along with computer vision, to detect and deform medical data screens such that a person may remotely view them head-on. The second objective's basis is to use deep learning to detect IDs, and then to use optical character recognition and generic text parsing to extract the personal information. Training sets include images of the target monitors as well as of the target IDs. By the end of the project, at minimum, this work hopes to develop the ID detection and extraction algorithm, and at maximum, this work hopes to have built and incorporated these algorithms into the workflow of a current model of smart glasses.

  * **Students:**
    * Robert Huang: Undergraduate student, Department of Biomedical Engineering, Senior
  * **Mentor(s):**
    * Dr. Nick Dalesio: MD AirSupport Co-founder, Pediatric Anesthesiologist
    * Dr. Laeben Lester: MD AirSupport Co-founder, Cardiac Anesthesiologist and Emergency Physician
    * Dr. Mathias Unberath: Assistant Professor in the Department of Computer Science

======Background, Specific Aims, and Significance======

Smart glasses, while not themselves a recent invention, are only just being implemented in healthcare environments. These glasses are most often used by surgeons or emergency medical workers to record and live-stream video from, respectively, surgical rooms and on-the-field operations [1][2]. This feed is routed through a smartphone and then sent over the internet to remote healthcare workers, who can either provide advice and assistance for the current patient or, if they are trainees, learn from watching these procedures [1]. Currently, only the raw live footage is used. There is therefore an opportunity to run this feed through artificial intelligence in order to streamline healthcare processes and ultimately increase the efficiency of patient care. In particular, there are two objectives:

  * **1. Automatically record information into a digital medical note from IDs**

During a typical emergency medical response, 25-65% of the time spent on a patient's care goes to documentation [4], with 1-10 minutes of it spent obtaining and recording 'simple' information about both the patient and the health care workers, such as names, birth dates, ages, addresses, or ID numbers [5]. Therefore, streamlining patient identification by pulling key information from standard documents like driver’s licenses means that medics spend less time on paperwork and more time treating patients.
Consequently, this project seeks to produce a deep learning and optical character recognition (OCR) algorithm that detects identification, extracts the relevant information, and writes it directly into the necessary documents, such as an electronic patient care report (ePCR).

  * **2. Provide View of Data Monitors Remotely with AI and Computer Vision**

During an emergency response in the field, it is often necessary to obtain advice, assistance, and/or clearance from a remote physician to carry out a procedure. To improve care, the remote physician should have direct access to the information available from the medical devices used by the medics, especially vital signs monitors and ultrasound scanners. With visual confirmation of vitals, physicians are 2-3x more confident and give treatments 2-3x faster [2]. Furthermore, without a clear view and recording of vitals, 43.4% of information can be lost, resulting in repeated procedures once the patient reaches the hospital [2]. It is therefore important to capture the displays of these data monitors. Unfortunately, while certain data monitors that come packaged with smart glasses provide physicians direct access to the data, interoperability standards across devices remain poor [3]. However, the human-readable displays of these devices are visible whenever they are in view of the smart glasses camera. This project therefore aims to take the video feed of these devices and apply machine vision to give physicians direct visual access to data monitors in the field.

In summary, the main objectives of this project are to provide direct visuals of on-the-field data monitors to remote health care workers and to extract and input information from identification into digital medical notes. Providing visuals will improve on-the-field outcomes by increasing confidence and treatment speeds and by reducing repeated procedures. Extracting information from identification will improve on-the-field outcomes by shifting time from documentation to patient care. Furthermore, this project's success will serve as an initial assessment of the viability of artificial intelligence in smart glasses in a health care environment.
======Deliverables======

  * **Minimum:** (Starting by 2/19, Expected by 3/15)
    - Dataset of IDs and Medical Devices (2/19)
      - {{ :courses:456:2021:projects:456-2021-11:readme_driver_license_yolov3_dataset_documentation.pdf |Documentation of IDs}}
      - [[https://drive.google.com/drive/folders/1vHMG2MuN_UT3xDQW2D2M0xbONkEN8NgN?usp=sharing|IDs Dataset Drive]]
      - {{ :courses:456:2021:projects:456-2021-11:readme_medical_device_yolov3_dataset_documentation.pdf |Documentation of Devices}}
      - [[https://drive.google.com/drive/folders/1p2EYb_wphHtfJ2YzOVttm_TyCGae5OM2?usp=sharing|Devices Dataset Drive]]
    - Documentation of Overall Code and Performance of Each Section (3/15)
      - {{ :courses:456:2021:projects:456-2021-11:readme_drivers_license_extraction_code_documentation.pdf |Documentation of Code}}
      - [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ?usp=sharing|Code Drive]]
  * **Expected:** (Starting by 3/15, Expected by 3/30)
    - Documentation of Code and Performance of YOLO on Devices (3/20)
      - {{ :courses:456:2021:projects:456-2021-11:readme_vitals_extraction_code_documentation.docx.pdf |Documentation of Code}}
      - [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ?usp=sharing|Code Drive]]
    - Documentation of Color Threshold Edge Detection and Its Performance (3/30)
      - {{ :courses:456:2021:projects:456-2021-11:readme_vitals_extraction_code_documentation.docx.pdf |Documentation of Code}}
      - [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ?usp=sharing|Code Drive]]
  * **Maximum:** (Starting by 3/30, Expected by 5/1)
    - Documentation of Incorporation (4/15)
      - {{ :courses:456:2021:projects:456-2021-11:readme_full_implementation_code_documentation.docx_1_.pdf |Documentation}}
    - Testing Procedures for Performance Assessment (4/20)
      - {{ :courses:456:2021:projects:456-2021-11:readme_full_implementation_code_documentation.docx_1_.pdf |Documentation}}
    - Results of the Tests (5/1)
      - {{ :courses:456:2021:projects:456-2021-11:readme_full_implementation_code_documentation.docx_1_.pdf |Documentation}}

Table: {{ :courses:456:2021:projects:456-2021-11:screen_shot_2021-05-07_at_5.05.40_am.png?600 |}}

======Technical Approach======

{{ :courses:456:2021:projects:456-2021-11:screen_shot_2021-05-07_at_1.11.16_am.png?600 |}}

The ideal overall workflow of the smart glasses is shown above. The glasses will send a video feed of 1280x960 images over Wi-Fi. From there, the images will be sent to the remote healthcare provider's Zoom stream and then over the internet into the cloud, where the algorithm will process the video. For this initial project, video of the target objects will be recorded from the remote health care provider's Zoom stream and then used as the input to the algorithm.

Before reaching the algorithms, each frame will be preprocessed through resizing, normalization, and Gaussian blur to standardize the input and remove noise. Each algorithm will then check whether its respective object is within the image. If the object is detected, the frame is processed as follows (a sketch of the preprocessing and detection gate follows this list):

  * If a data monitor is detected, the pixels of the monitor will be cropped out, and the image will be deformed such that the monitor appears head-on. This image will then be sent back to the smartphone, appended to the video feed, and sent to the remote health care worker. At the same time, vitals information will be extracted from the image and categorized.
  * If identification is detected, the algorithm will crop out and deform the identification, extract the text from the resulting image, and categorize the text into meaningful values (such as Name, Date of Birth, etc.).
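As a concrete illustration of the preprocessing and detection gate described above, below is a minimal sketch using OpenCV's DNN module to run a trained YOLOv3 network. The file names, the 416x416 input size, and the confidence threshold are illustrative assumptions, not the project's final configuration:

<code python>
import cv2

# Assumed file names for the trained detector; one network would be trained
# per dataset (IDs, medical devices), so the same gate serves both objectives.
NET = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

def preprocess(frame):
    """Standardize an incoming 1280x960 frame: resize and denoise.
    (Normalization to [0, 1] is applied by blobFromImage below.)"""
    frame = cv2.resize(frame, (416, 416))       # assumed network input size
    return cv2.GaussianBlur(frame, (3, 3), 0)   # light blur to suppress noise

def object_present(frame, conf_thresh=0.5):
    """Gate: return True if YOLO finds the target object in the frame."""
    blob = cv2.dnn.blobFromImage(preprocess(frame), 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    NET.setInput(blob)
    outputs = NET.forward(NET.getUnconnectedOutLayersNames())
    for out in outputs:
        # each row: [cx, cy, w, h, objectness, per-class scores...]
        for det in out:
            if det[4] * det[5:].max() > conf_thresh:
                return True
    return False
</code>

Everything downstream (cropping, deformation, OCR) would run only when object_present() returns True for the relevant detector.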
After the individual processing, the results from both algorithms will be input into a frame weighting algorithm. Every frame generated while the object of interest is in view will, depending on its weight, contribute to a running overall estimate of the textual results. For monitors, weights are based on how in-focus the frame is and on the time since its detection (so that, if a heart rate changes, the estimate changes with it). For identification, weights are based only on how in-focus the frame is. Finally, these values will be sent to their respective destinations: ID information will be sent to a digital medical note, while vitals information will be appended to the stream.

== Objective 1: Detection and Extraction of Identification ==

There are three main steps to detect and extract information from identification.

  - **Detect the presence of an ID.** To detect the presence of identification, a YOLO deep learning framework will be constructed and trained on the driver license dataset. YOLO was chosen for its aptitude at processing large amounts of data at many frames per second; moreover, the problem only requires detecting whether a single object is present in the frame, which YOLO excels at.
  - **Detect the spatial orientation of the ID and deform it.** To optimize optical character recognition, the ID's boundary will be determined using Hough transform edge detection, and the image will be deformed such that the ID appears head-on with no rotation.
  - **Read and categorize information on the ID.** Once an ID is detected, and after some preprocessing, Tesseract OCR in Python will be used to extract text from the license. Generic text parsing will then be used to sort the information into desired categories, such as name, birth date, etc. (A sketch of steps 2 and 3 appears after this section.)

{{ :courses:456:2021:projects:456-2021-11:screen_shot_2021-03-18_at_2.17.33_pm.png?600 |}}

For each component, the target accuracy is 95% with a decision speed of <8 ms. A ~60 frames per second camera delivers a new frame roughly every 16.7 ms, so two components at <8 ms each keep the combined decision speed under 16 ms; the combined accuracy target is likewise 95%.
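Below is a minimal sketch of steps 2 and 3, assuming the four corner points of the ID have already been recovered (e.g., from the intersections of the Hough-transform edge lines). pytesseract is used here as a common Python interface to Tesseract, and the regular expressions are illustrative examples of generic text parsing, not the project's final rules:

<code python>
import re

import cv2
import numpy as np
import pytesseract  # Python wrapper around the Tesseract OCR engine

# ID-1 cards (US driver's licenses) measure 3.370 x 2.125 inches,
# so rectify onto a canvas with that aspect ratio.
OUT_W, OUT_H = 674, 425

def deform_head_on(image, corners):
    """Warp the ID so it appears head-on. `corners` holds the four detected
    corner points in order: top-left, top-right, bottom-right, bottom-left."""
    src = np.float32(corners)
    dst = np.float32([[0, 0], [OUT_W, 0], [OUT_W, OUT_H], [0, OUT_H]])
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, M, (OUT_W, OUT_H))

def extract_fields(head_on):
    """OCR the rectified ID and parse fields with generic text rules."""
    gray = cv2.cvtColor(head_on, cv2.COLOR_BGR2GRAY)
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    text = pytesseract.image_to_string(gray)
    fields = {}
    dob = re.search(r"DOB[:\s]*(\d{2}[/-]\d{2}[/-]\d{4})", text)
    if dob:
        fields["Date of Birth"] = dob.group(1)
    name = re.search(r"^[A-Z]+,\s?[A-Z ]+$", text, re.MULTILINE)
    if name:
        fields["Name"] = name.group(0)
    return fields
</code>

A call such as extract_fields(deform_head_on(frame, corners)) would then yield a dictionary of per-frame readings for the frame weighting stage described above.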
== Objective 2: Detection and Deformation of Data Monitors ==

There are three main steps to obtain a head-on image of a data monitor.

  - **Detect the display of a vital signs monitor or ultrasound image.** To detect the presence of a medical monitor, the same YOLO deep learning framework will be constructed and trained on the medical devices dataset, for the same reasons given above.
  - **Deform the monitor such that it appears head-on.** To optimize optical character recognition, the monitor's boundary will be determined using Hough transform edge detection, and the image will be deformed such that the display appears head-on with no rotation.
  - **Read vital signs data.** Once a monitor is detected, and after some preprocessing, Tesseract OCR in Python will be used to extract text from the display. The locations of the numbers will then be used to sort the information into desired categories, such as heart rate, blood pressure, and oxygen saturation.

{{ :courses:456:2021:projects:456-2021-11:screen_shot_2021-04-30_at_1.11.38_pm.png?600 |}}

As with Objective 1, the target accuracy for each component is 95% with a decision speed of <8 ms, for a combined accuracy of 95% and a combined decision speed of <16 ms.

== Frame Integration ==

To increase the accuracy of the program, reduce the effect of glare, and exploit the fact that multiple frames are captured per object, the algorithm runs an integration step (sketched at the end of this section) that:

  - Takes the string outputs of the OCR over multiple frames for an information category (e.g., Birth Date, Heart Rate). For instance, we could look at each frame’s Birth Date result.
  - Creates a weight for each frame’s string output based on the focus/blurriness of the image of the string. For instance, given one blurry frame, one in-focus frame, and one in between, each reporting a Birth Date, the most weight goes to the in-focus frame, some to the one in between, and the least to the blurry frame.
  - Uses the weights and the string outputs to maintain a running estimate of the actual value. This estimate also accounts for frames that have read too many or too few characters.

In the case of the ID, once the object has not been detected for some time, the running values from step 3 are reported. In the case of the vitals monitors, since vitals information changes, frames taken in the past gradually lose weight, and the running estimate of the vitals is reported continuously.
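Below is a minimal sketch of this integration. It assumes focus is scored by the variance of the Laplacian (a common sharpness proxy, not something the project specifies) and that vitals use an exponential time decay so stale readings fade; the per-character weighted vote is likewise an illustrative choice:

<code python>
import math
import time
from collections import defaultdict

import cv2

def focus_weight(gray_patch):
    """Sharpness proxy: variance of the Laplacian (higher = more in focus)."""
    return cv2.Laplacian(gray_patch, cv2.CV_64F).var()

class RunningEstimate:
    """Weighted vote over OCR strings from multiple frames of one category."""

    def __init__(self, decay=None):
        self.decay = decay   # seconds; None = no time decay (ID fields)
        self.obs = []        # (timestamp, weight, string)

    def add(self, string, weight):
        self.obs.append((time.time(), weight, string))

    def value(self):
        if not self.obs:
            return ""
        now = time.time()
        weighted = []
        for t, w, s in self.obs:
            if self.decay is not None:
                w *= math.exp(-(now - t) / self.decay)   # old vitals fade out
            weighted.append((w, s))
        # Handle frames that read too many or too few characters: keep only
        # strings of the weighted-modal length, then vote per character slot.
        lengths = defaultdict(float)
        for w, s in weighted:
            lengths[len(s)] += w
        best_len = max(lengths, key=lengths.get)
        votes = defaultdict(lambda: defaultdict(float))
        for w, s in weighted:
            if len(s) == best_len:
                for i, ch in enumerate(s):
                    votes[i][ch] += w
        return "".join(max(votes[i], key=votes[i].get) for i in range(best_len))
</code>

An ID field would use RunningEstimate() and report value() once the ID leaves view; a vital such as heart rate would use something like RunningEstimate(decay=5.0) so the running estimate keeps tracking the monitor.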
======Dependencies======

Project dependencies are summarized in the table below.

{{ :courses:456:2021:projects:456-2021-11:screen_shot_2021-05-06_at_2.02.59_am.png?600 |}}

======Milestones and Status======

  - Milestone name: Obtain Datasets 8-)
    * Start Date: 2/18
    * Planned Date: 2/19
    * Expected Date: 2/19
    * Status: **Completed**
    * [[https://drive.google.com/drive/folders/1vHMG2MuN_UT3xDQW2D2M0xbONkEN8NgN?usp=sharing|Licenses]]
    * [[https://drive.google.com/drive/folders/1p2EYb_wphHtfJ2YzOVttm_TyCGae5OM2?usp=sharing|Medical Monitors]]
  - Milestone name: Code YOLO 8-)
    * Start Date: 2/19
    * Planned Date: 2/22
    * Expected Date: 2/22
    * Status: [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ?usp=sharing|**Completed**]]
  - Milestone name: Train YOLO on IDs and Record Results 8-)
    * Start Date: 2/22
    * Planned Date: 2/25
    * Expected Date: 2/25
    * Status: [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ?usp=sharing|**Completed**]]
  - Milestone name: Code OCR 8-)
    * Start Date: 2/25
    * Planned Date: 2/28
    * Expected Date: 2/28
    * Status: [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ?usp=sharing|**Completed**]]
  - Milestone name: Test OCR on Camera Images and Record Results 8-)
    * Planned Date: 3/1
    * Expected Date: 3/1
    * Status: [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ?usp=sharing|**Completed**]]
  - Milestone name: Code and Assess Deformation with Hough Transform 8-)
    * Start Date: 3/1
    * Planned Date: 3/4
    * Expected Date: 3/4
    * Status: [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ?usp=sharing|**Completed**]]
  - Milestone name: Create Ground Truth Set for Generic Text Parsing 8-)
    * Planned Date: 3/4
    * Expected Date: 3/5
    * Status: [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ?usp=sharing|**Completed**]]
  - Milestone name: Code and Assess Generic Text Parsing and Record Results 8-)
    * Planned Date: 3/15
    * Expected Date: 3/15
    * Status: [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ?usp=sharing|**Completed**]]
  - Milestone name: Combine All Individual Codes to Read IDs 8-)
    * Planned Date: 3/15
    * Expected Date: 3/15
    * Status: [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ?usp=sharing|**Completed**]]
  - Milestone name: Duplicate YOLO, Train on Devices Dataset, and Record Results 8-)
    * Planned Date: 3/17
    * Expected Date: 3/20
    * Status: [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ|**Completed**]]
  - Milestone name: Generate Test Set for Blob Feature Deform Algorithm Testing
    * Planned Date: 3/24
    * Expected Date: 3/25
    * Status: **Removed**
  - Milestone name: Code Monitor Deformation Algorithm 8-)
    * Planned Date: 3/28
    * Expected Date: 3/28
    * Status: [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ|**Completed**]]
  - Milestone name: Assess Monitor Deformation Algorithm 8-)
    * Planned Date: 3/30
    * Expected Date: 3/30
    * Status: [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ|**Completed**]]
  - Milestone name: Investigate How to Write to ePCR
    * Planned Date: 4/5
    * Expected Date: 4/5
    * Status: **Removed**
  - Milestone name: Feed Video Back Into Smartphone
    * Planned Date: 4/10
    * Expected Date: 4/10
    * Status: **Removed**
  - Milestone name: Incorporate All Algorithms into Workflow 8-)
    * Planned Date: 4/14
    * Expected Date: 4/15
    * Status: [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ|**Completed**]]
  - Milestone name: Generate Tests to Assess the Functionality of the Application 8-)
    * Planned Date: 4/18
    * Expected Date: 4/20
    * Status: [[https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ|**Completed**]]
  - Milestone name: Assess Overall Application Functionality
    * Planned Date: 4/26
    * Expected Date: 5/1
    * Status: **Removed**

======Reports and presentations======

  * Project Plan
    * {{ :courses:456:2021:projects:456-2021-11:cis_presentation_1.pdf |Project plan presentation}}
    * {{ :courses:456:2021:projects:456-2021-11:cis2_project_proposal.pdf |Project plan proposal}}
  * Project Background Reading
    * See Bibliography below for links.
  * Project Checkpoint
    * {{ :courses:456:2021:projects:456-2021-11:copy_of_cis_presentation_1.pdf |Project checkpoint presentation}}
  * Paper Review
    * {{ :courses:456:2021:projects:456-2021-11:cis_paper_review.pdf |Paper review presentation}}
    * {{ :courses:456:2021:projects:456-2021-11:weighted-combination-of-per-frame-recognition-results-for-text-recognition-in-a-video-stream.pdf |Reviewed paper}}
    * {{ :courses:456:2021:projects:456-2021-11:cis2_project_proposal_copy_.pdf |Paper review report}}
  * Project Final Presentation
    * {{ :courses:456:2021:projects:456-2021-11:group_11_teaser_slides.pptx |PPTX of Poster Teaser}}
    * {{ :courses:456:2021:projects:456-2021-11:cis_2_project_poster_template_copy.pdf |PDF of Poster}}
  * Project Final Report
    * {{ :courses:456:2021:projects:456-2021-11:cis2_final_report_1_.pdf |Final Report}}
    * {{ :courses:456:2021:projects:456-2021-11:driver_license_yolov3_dataset_documentation.pdf |Driver License Dataset Documentation}}
    * {{ :courses:456:2021:projects:456-2021-11:medical_device_yolov3_dataset_documentation.pdf |Vitals Monitor Dataset Documentation}}
======Project Bibliography======

===== Background and Motivation References =====

  - Vuzix Corporation. “Vuzix Smart Glasses at the Chi Mei Medical Center, Taiwan.” Vuzix Corporation, 2020, ss-usa.s3.amazonaws.com/c/308483104/media/21105f5a523ce21ce43889049199725/Vuzix-Chi-Mei-Medical-Case-Study-2020.pdf.
  - Schaer, et al. “Using Smart Glasses in Medical Emergency Situations, a Qualitative Pilot Study.” //2016 IEEE Wireless Health (WH)//, 2016, doi:10.1109/wh.2016.7764556.
  - GreenLight Medical. “Standardizing Medical Devices: Value Analysis.” //GreenLight Medical//, 17 Mar. 2020, www.greenlightmedical.com/standardizing-medical-devices-in-hospitals/.
  - Crawford S, Kushner I, Wells R, Monks S. “Electronic Health Record Documentation Times among Emergency Medicine Trainees.” //Perspect Health Inf Manag//, 2019;16:1f.
  - Lester, Laeben. “Inquiry into EMS Documentation Times.” 14 Feb. 2021.

===== Reading Materials =====

  * Redmon, Joseph, et al. “You Only Look Once: Unified, Real-Time Object Detection.” //2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)//, 2016, doi:10.1109/cvpr.2016.91.
  * Shafiee, Mohammad Javad, et al. “Fast YOLO: A Fast You Only Look Once System for Real-Time Embedded Object Detection in Video.” //Journal of Computational Vision and Imaging Systems//, vol. 3, no. 1, 2017, doi:10.15353/vsnl.v3i1.171.
  * Wojna, Zbigniew, et al. “Attention-Based Extraction of Structured Information from Street View Imagery.” //2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)//, 2017, doi:10.1109/icdar.2017.143.
  * Llados, J., et al. “ICAR: Identity Card Automatic Reader.” //Proceedings of Sixth International Conference on Document Analysis and Recognition//, 2001, doi:10.1109/icdar.2001.953834.
  * Mikolajczyk, K., and C. Schmid. “Scale & Affine Invariant Interest Point Detectors.” //International Journal of Computer Vision//, vol. 60, 2004, p. 63, doi:10.1023/B:VISI.0000027790.02288.f2.
  * Arlazarov, V. V., K. Bulatov, T. Chernov, and V. L. Arlazarov. “A Dataset for Identity Documents Analysis and Recognition on Mobile Devices in Video Stream.” //Comput. Opt.//, vol. 43, no. 5, pp. 818-824, 2019.
  * Chernyshova, Y. S., A. V. Sheshkus, and V. V. Arlazarov. “Two-Step CNN Framework for Text Line Recognition in Camera-Captured Images.” //IEEE Access//, vol. 8, pp. 32587-32600, 2020, doi:10.1109/ACCESS.2020.2974051.
  * Gould, S., R. Fulton, and D. Koller. “Decomposing a Scene into Geometric and Semantically Consistent Regions.” //Proceedings International Conference on Computer Vision (ICCV)//, 2009.

======Other Resources and Project Files======

Other project files (e.g., source code) associated with the project are hosted at the external links below.

Dataset for Driver License YOLOv3 Training: https://drive.google.com/drive/folders/1vHMG2MuN_UT3xDQW2D2M0xbONkEN8NgN?usp=sharing

Dataset for Medical Monitor YOLOv3 Training: https://drive.google.com/drive/folders/1p2EYb_wphHtfJ2YzOVttm_TyCGae5OM2?usp=sharing

YOLOv3 Implementation Code, Deformation Code, OCR and Text Parsing Code: https://drive.google.com/drive/folders/1GwAU4EtcKPWo7edGlc0yy3nBuqcTdaiZ?usp=sharing