The goal of this project is to evaluate human eye gaze behavior in response to robot errors. We hope to understand how human eye gaze may be incorporated into a robot error detection system.
In the field of human-robot interaction (HRI), researchers focus on how humans interact with robots and the conclusions we can draw from these interactions. HRI is a growing field with the potential for many applications in medicine and health care. Robots could assist with kitting in surgical settings, help patients in hospitals by providing bedside assistance, or even interact with COVID-19 patients in the ICU to help minimize the risk of exposure for healthcare workers. Nevertheless, robots are fallible and can make mistakes when executing their actions.
As the potential for robotic applications in the real world grows, it is crucial to study how humans react to robot errors to gain a better understanding of humans' interactions with robots. Studying a human's physical and social responses to robot errors can inform researchers and roboticists not only of how robot errors may affect the trust humans place in a robot's actions, but also of ways to predict future robot errors. If a robotic system does not have a built-in error detection system, recognizing from a human's reaction that an error has occurred provides critical information; the feedback from such a detection system can allow a robot to minimize the severity of an error, correct its error, or even stop early if the error is detected with enough notice.
A robot error detection system, created by Stiber, detects a robot error in real time by analyzing facial action units (AUs). Initial studies from Stiber show that people react very differently to physical mistakes executed by robots. Other previous works indicate that gaze can be a potential metric for detecting robot errors. Depending on the task at hand and the users present, there may be a variety of human reactions. See Figure 1. The overall workflow of this current system consists of using two cameras aimed at a human’s face, determining the current AUs present, detecting if an error has occurred based on these AUs, and logging the result with the corresponding timestamps. The ML algorithm that detects the error has been trained on a set of data of 19 participants reacting to robotic errors. The data collected was manually coded frame by frame in Microsoft PSI by two independent coders.
The goal of this project is to introduce human eye gaze as a potential metric of robot error detection to an automatic robot error detection system created by Maia Stiber. The current AU detection system will still function correctly if a user wears glasses. Pupil Labs has released a mobile gaze tracker in the form of a pair of eyeglasses for a human to wear. We intend to have users wear these glasses while interacting with a robot as a robot error occurs to gain a better understanding of their gaze patterns. We hope to learn whether gaze can be an informative metric in robot error detection scenarios. Across different types of errors, we want to investigate whether human eye gaze is consistent, that is, whether there is a noticeable pattern that we could use down the line. Furthermore, we are interested in discovering whether fixation points appear that can inform us in future scenarios. Thus, the aims of this project consist of collecting data to understand human gaze reactions associated with physical robot errors, adding the gaze tracker data as an additional component to the Microsoft PSI system pipeline, and creating an ML algorithm with the data collected so that it will automatically detect a robot error as it occurs.
We plan to integrate the gaze tracking component (shown in green) into the existing Microsoft PSI workflow. We hope to run this component independently first to determine how informative the gaze fixations are without the influence of AUs. Some gaze fixation data from the following user study may need to be manually coded; if necessary, two independent coders will be used.
We created an exploratory study design to analyze conceptual and physical robot errors during a human-robot collaborative packing task. We hoped to gather data on where people look during the task and where they look when the robot makes a mistake. While not a true factorial design, the breakdown of our study design conditions is shown in the figure below. We had four arrangements of two conditions, with a different type of robot error in each arrangement.
A total of 6 participants were recruited through convenience sampling for our pilot study. They were tasked with completing a packing task with the Kinova robot. The participant used voice commands to direct the robot to pack certain items in their designated boxes. There was a planned robot error that the participant was not aware of, allowing for a scenario that could generate a genuine human reaction to an unexpected event. The data was recorded via the Pupil Labs Invisible gaze tracker eyeglasses, a microphone, and a verbal questionnaire. The Invisible gaze tracker is automatically calibrated. The verbal questionnaire included questions about whether the participants witnessed an error, what the error was, and its corresponding severity.
The image below depicts the setup of the space where the user completed the packing task in collaboration with the robot. The items for packing can be seen to the left of the robot, and the boxes used can be seen to the right of the robot. The list of items was given via the tablet on the table.
The data was recorded via the Pupil Labs Invisible gaze tracker eyeglasses and a microphone. The Invisible gaze tracker is automatically calibrated, but it was recalibrated to each participant's gaze at the beginning of the study. While no paper questionnaire was used, the experimenters verbally asked the participants after the study whether they witnessed an error, what the error was, and its corresponding severity. The recorded data was downloaded from the Pupil Labs cloud software and manually annotated using the Datavyu software, as shown in the image below.
The coder used the fixation circle provided by the Pupil Labs software to determine where the participant was looking. The coder then identified which object the participant was looking at during a timestamp range and labeled that range with the object. The coder then matched the labeled timestamp ranges to the fixation IDs with the corresponding timestamps in the data provided by Pupil Labs.
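The matching step described above could be automated along the following lines. This is a minimal sketch, not the procedure used in the study: the `Annotation` and `Fixation` records, their millisecond fields, and the midpoint rule are all assumptions made for illustration, and the Datavyu and Pupil Labs exports would first need to be parsed into this shape. It also assumes that the coded intervals do not overlap.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Annotation:
    """One coded interval (hypothetical layout of a Datavyu export row)."""
    start_ms: int
    end_ms: int   # inclusive end of the coded interval
    label: str    # object the coder identified, e.g. "gripper"


@dataclass
class Fixation:
    """One fixation (hypothetical layout of a gaze tracker export row)."""
    fixation_id: int
    start_ms: int
    end_ms: int


def label_fixations(
    fixations: List[Fixation], annotations: List[Annotation]
) -> Dict[int, Optional[str]]:
    """Give each fixation the label of the interval containing its midpoint.

    Fixations whose midpoint falls outside every coded interval get None.
    """
    ordered = sorted(annotations, key=lambda a: a.start_ms)
    starts = [a.start_ms for a in ordered]
    labels: Dict[int, Optional[str]] = {}
    for fix in fixations:
        midpoint = (fix.start_ms + fix.end_ms) / 2
        # Index of the last interval that starts at or before the midpoint.
        i = bisect_right(starts, midpoint) - 1
        inside = i >= 0 and midpoint <= ordered[i].end_ms
        labels[fix.fixation_id] = ordered[i].label if inside else None
    return labels


if __name__ == "__main__":
    coded = [
        Annotation(0, 1500, "tablet"),
        Annotation(1501, 4000, "object"),
        Annotation(4500, 6000, "gripper"),
    ]
    fixes = [Fixation(1, 200, 600), Fixation(2, 1400, 1900),
             Fixation(3, 4100, 4300), Fixation(4, 4600, 5200)]
    print(label_fixations(fixes, coded))
```

Using the fixation midpoint is only one possible rule; a fixation that straddles two coded intervals could instead be assigned to the interval with the largest overlap.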
While debriefing our participants, we learned that they did not always perceive the robot errors as errors. In the wrong-object condition, they did not perceive the selection of the wrong object as a robot error; it seemed to them more like an error inherent to the study. Similarly, when the box could not close because the selected object was too large, participants did not perceive this as a robot error, but rather as an error by whoever requested those specific items to be packed in a box of that size.
In analyzing our manually coded video data, we confirmed that gaze appears to be goal-oriented, as the literature indicates. Furthermore, gaze appeared to linger in an area where an error occurred (back at the object if the error involved missing the object, or at the gripper if the object was dropped). There also appeared to be occasional gaze shifts during the study. These mostly consisted of shifts back to the set of instructions or the tablet after an error occurred, or between the object involved in the error and the robot/gripper.
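Observations such as lingering gaze and gaze shifts could be summarized numerically once each fixation carries an object label. The following is a minimal sketch under that assumption; the `(label, duration_ms)` input format and the function name are hypothetical and are not part of the study's tooling. It accumulates total dwell time per labeled area and counts shifts between different areas.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def dwell_and_transitions(
    labeled_fixations: List[Tuple[str, float]]
) -> Tuple[Dict[str, float], Dict[Tuple[str, str], int]]:
    """Summarize a chronological list of (label, duration_ms) fixations.

    Returns total dwell time per label and the number of gaze shifts
    between each ordered pair of different labels.
    """
    dwell_ms: Dict[str, float] = defaultdict(float)
    transitions: Counter = Counter()
    previous = None
    for label, duration_ms in labeled_fixations:
        dwell_ms[label] += duration_ms
        # Count a shift only when gaze moves to a different area.
        if previous is not None and label != previous:
            transitions[(previous, label)] += 1
        previous = label
    return dict(dwell_ms), dict(transitions)


if __name__ == "__main__":
    sequence = [("object", 300), ("gripper", 250), ("gripper", 900),
                ("tablet", 400), ("gripper", 350)]
    dwell, shifts = dwell_and_transitions(sequence)
    print(dwell)
    print(shifts)
```

Comparing dwell time on the error location before and after the error onset would be one way to test whether gaze lingers there, although the appropriate time window is a study design decision this sketch does not address.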
Since this analysis of results is qualitative, we decided our study could be improved by implementing a more quantitative metric for gaze analysis to better categorize the gaze fixations. Thus, we began looking into how to measure gaze velocity during gaze shifts in real time for our current workflow, which provides only 2D gaze data with respect to the camera.
In order to measure gaze velocity, we needed to perform 3D localization of gaze within the world coordinate system, and accurate detection of AR markers is essential for this process. However, we experienced several technical difficulties while learning how to measure gaze fixation velocity from the Pupil Labs gaze tracker. While first using AprilTags, we learned that they were not always detected during post-processing, and thus switched over to ArUco markers. While these resulted in better detection than the AprilTags, there were still some issues with detecting the ArUco markers despite their size. We also encountered some inconsistencies in our transformation matrices for the markers/tags across images while using individual markers/tags. Thus, we transitioned from detecting individual ArUco markers to detecting an ArUco Grid Board. We are currently in the process of setting up the ArUco Grid Board in our study layout for automatic gaze velocity detection across fixations and hope to conduct our future user studies using this technical approach.
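To illustrate why a camera pose from the marker board matters for gaze velocity, the sketch below converts 2D gaze points into directions in the world frame and computes the angular velocity between two samples. It is a simplified illustration under stated assumptions, not the pipeline used in this project: the intrinsics (`fx`, `fy`, `cx`, `cy`) are hypothetical values, lens distortion is ignored, `R` is assumed to be the world-to-camera rotation estimated from the board for each frame, and only gaze direction (not the 3D gaze point) is considered.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def pixel_to_ray(u: float, v: float, fx: float, fy: float,
                 cx: float, cy: float) -> Vec3:
    """Unproject a gaze pixel to a unit direction in the camera frame."""
    x, y, z = (u - cx) / fx, (v - cy) / fy, 1.0
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)


def camera_to_world(direction: Vec3, R: Mat3) -> Vec3:
    """Rotate a camera-frame direction into the world frame.

    R maps world to camera coordinates, so its transpose is applied.
    """
    return tuple(sum(R[r][c] * direction[r] for r in range(3))
                 for c in range(3))


def angle_between(a: Vec3, b: Vec3) -> float:
    """Angle in radians between two vectors, stable for small angles."""
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    dot = sum(x * y for x, y in zip(a, b))
    return math.atan2(math.sqrt(sum(c * c for c in cross)), dot)


def angular_velocity_deg_s(sample_a, sample_b, intrinsics) -> float:
    """Gaze angular velocity between two (t_seconds, u, v, R) samples."""
    t_a, u_a, v_a, R_a = sample_a
    t_b, u_b, v_b, R_b = sample_b
    dir_a = camera_to_world(pixel_to_ray(u_a, v_a, *intrinsics), R_a)
    dir_b = camera_to_world(pixel_to_ray(u_b, v_b, *intrinsics), R_b)
    return math.degrees(angle_between(dir_a, dir_b)) / (t_b - t_a)


if __name__ == "__main__":
    intrinsics = (1000.0, 1000.0, 800.0, 600.0)  # hypothetical fx, fy, cx, cy
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    a = (0.0, 800.0, 600.0, identity)
    b = (0.5, 1800.0, 600.0, identity)
    print(angular_velocity_deg_s(a, b, intrinsics))
```

Because each sample is rotated by its own frame's pose, a head turn that moves the gaze point in the image without changing where the participant is looking in the room contributes no velocity. This is also why inconsistent transformation matrices across frames would directly corrupt the measurement.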
\subsection{Limitations}
While our qualitative analysis provided some insightful results, we realized it was essential to have a quantitative metric for gaze fixations. Because the video coding is time-consuming (at least 3 hours of coding per participant), we determined that increasing the scale of our user study requires a quantitative metric that can automatically provide additional information about the fixation data. We hope to soon run the gaze component independently in Microsoft PSI to determine how informative the gaze fixations are without the influence of AUs. Once our automatic gaze velocity measure is working and integrated into Microsoft PSI, we can consider whether an ML algorithm that automatically detects the error would provide accurate results.
Our next steps, apart from finishing the physical setup for our new ArUco approach, involve finalizing our user study with errors that participants can perceive more clearly and explicitly as errors (replacing the objects that participants found confusing and clarifying that the robot is a supposed packing expert). We also intend to include a full written questionnaire with questions about basic demographic information, whether the participants witnessed an error, and its corresponding severity. We then hope to run a full-scale user study.