Human-Robot Interaction and Collaboration (HRI-C) Utilizing Top-View RGB-D Camera System

In this study, a smart and affordable system was developed that uses an RGB-D camera to measure the position of an operator with respect to an adjacent robotic manipulator. The technology was implemented in a simulated human operation with an automated manufacturing robot to achieve two goals: enhancing the safety measures around the robot by adding an affordable smart system for human detection and robot control, and developing a system that allows human-robot collaboration to finish a predefined task. The system used an Xbox Kinect V2 sensor/camera and a Scorbot ER-V Plus to model and mimic the selected applications. To achieve these goals, a geometric model of the Scorbot and Xbox Kinect V2 was developed, a robot joint calibration was applied, a background segmentation algorithm was used to detect the operator, a dynamic binary mask for the robot was implemented, and the efficiency of both systems was analyzed based on response time and localization error. The first application, the Add-on Safety Device, monitors the work-space and controls the robot to avoid collisions when an operator enters or gets closer. This application will reduce or remove physical barriers around robots, expand the physical work area, reduce proximity limitations, and enhance human-robot interaction (HRI) in an industrial environment while sustaining a low cost. The system was able to respond to human intrusion within 500 ms on average to prevent a collision, and the system's bottleneck was found to be the PC and robot inter-communication speed. The second application was a collaborative scenario between a robot and a human operator, in which the robot deposits an object on the operator's hand, mimicking real-life human-robot collaboration (HRC) tasks. The system was able to detect the operator's hand and its location and then command the robot to place an object on the hand. The object was placed with a mean error of 2.4 cm, and the limitation of this system was the speed of transmitting internal variables and data between the robot controller and the main computer. These results are encouraging, and ongoing work aims to experiment with different operations and implement gesture detection in real-time collaboration tasks while keeping the human operator safe and predicting their behavior.
Keywords: Robotics manipulator; robot end-effector; computer vision; human-robot interaction (HRI); human-robot collaboration (HRC); robotics safety; Scorbot; Kinect; RGB camera; industrial system modeling; manufacturing systems design


I. INTRODUCTION
The demands and trends of the current market require enhanced manufacturing systems with reduced delivery times, mass production, and product customization, which impose a greater need for system flexibility and adaptability. Collaboration between humans and robots is considered a promising technique to increase productivity and decrease the cost of production by combining the robot's fast repetition and high production capabilities with a human operator's ability to judge, react, and plan. Collaborative robots (Co-bots) represent an evolution that can resolve some of the challenges present in manufacturing and assembly environments. Co-bots allow physical interaction with humans within the work-space. Matheson and his team [1] described different ways a robot and an operator can work together: (1) Co-existence: the operator and robot are in the same work-space, but do not interact; (2) Synchronized: the operator and robot work within the same work-space, but at different times; (3) Cooperation: the operator and robot work together in the same work-space but have independent tasks; (4) Collaboration: the operator and robot work together to complete an assigned task. In a collaborative environment, it is important to note that any action by one entity has immediate consequences for the other.
According to the International Standard ISO 10218 (Parts 1 and 2), and more extensively Technical Specification ISO/TS 15066:2016 [2][3][4][5], four classes of safety requirements are defined for collaborative robots:
• Supervised stop: The movement of the robot is stopped before an operator enters the collaborative work-space to interact with the robot and complete the desired task.
• Manual guide: The operator uses a manually operated device located on or near the robot's end-effector to transmit movement commands to the robot's system.
• Monitoring speed and separation: The robot and operator can move within the collaborative work-space simultaneously. The reduction of risk is achieved by always maintaining a separation distance between the operator and the robot.
• Power and force limitation: The system must be designed to adequately reduce the risk to an operator by not exceeding the thresholds defined by the risk assessment.
Additionally, it is important to note that collaborative methods can be adopted even when using traditional robots. However, this requires several expensive safety devices such as laser sensors or vision systems. For these reasons, the team started evaluating and developing affordable and accurate sensory systems that can measure the distance between the operator and the robot. This study uses a low-cost RGB-D camera to measure the position of an operator with respect to the robotic manipulator. While this measurement configuration has been used to track human beings [6], to our knowledge and based on the conducted literature review, it has not previously been studied in the context of human-robot interaction and collaboration. Several researchers [7][8][9][10][11][12][13][14][15] reviewed the literature and found that RGB-D cameras were mostly used for human identification and tracking, human activity recognition, human behavior analysis for shopping and security purposes, intelligent health care systems, detecting defects in produce, and animal recognition; a database has also been developed to summarize these uses and algorithms. Top-view RGB-D cameras have been shown to work successfully in several applications where behaviors and interactions are analyzed, and they are attractive due to their affordability and the sufficient information extracted from the provided pictures or live feed.
The paper is organized as follows. Section II reviews the literature on robotics and its applications in industrial systems, robotics safety regulations and standards, and collaboration and interaction between humans and robots. Section III describes the robot and sensory system developed in this research. Section IV describes the methodology, including the geometric model of the robot-sensor system, the calibration process, and the detection of the operator. Section V evaluates the two methods of interaction between human and robot and reports the findings. Section VI concludes the paper and describes future plans.

II. LITERATURE REVIEW
The world has reached a point of rapid technological innovation in which the presence and use of robots are growing. Robots are now found in manufacturing, hospitals, personal use, services, and other settings. These robots improve the productivity of many tasks depending on their surrounding environment. In general, robots can be used in many different settings where their intended purpose is to help achieve a specific goal, complete a set of tasks that is difficult or tedious for a human, or simply make processes faster. Expediting services in industrial/manufacturing, health, or personal-use systems is a great enhancement, as it increases their efficiency. Safety standards are therefore essential and must be implemented to achieve safe operation of robots in certain areas and/or near human beings. Traditional robots have been separated from humans in workplaces to avoid any risk of injuries or fatal incidents. This separation was implemented in the form of physical barricades or by shutting off robots whenever a human is present. However, technological improvements have shown that robots no longer need to be separated and can be collaborative, working closely with humans, provided that new safety standards are developed for the design of collaborative robots to ensure human safety.
The existence of robots in industrial settings enhances production to meet the required demands while keeping costs low. A robot is considered a flexible cell within a manufacturing line, as it can be programmed to conduct different processes when needed. Safety is of the utmost priority when designing robots and placing them in such environments, and because of the rapid rise in the presence of robots, safety standards must be frequently developed and improved to keep up with new technology trends.
A few researchers and their teams [3][16][17][18][19][20][21] discussed different industrial environments, the safety approaches that should be followed, and some real-life case studies. It was shown that lead designers must develop and evaluate safe, human-centered, ergonomic, and efficient collaborative assembly workstations, for which operators provided feedback regarding occupational health and safety. Additionally, the Human Industrial Robot Collaboration (HIRC) workstation design process was evaluated through computer-based simulations based on performance and safety characteristics such as ergonomics, operation time, operational costs, maximum contact forces, and maximum energy density. This research illustrated how difficult it is to evaluate safety and performance characteristics without physical workstations.
Parigi-Polverini [18] developed a new safety assessment tool, the "Kinetostatic Safety Field", which identifies sources of danger such as an obstacle, a human body part, or another robot link. The main advantage of this tool is its suitability for real-time applications and real-time collision avoidance using a reactive control strategy. Another researcher suggested that robots no longer need to be separated from humans, proposing a kinematic control strategy that enforces safety while maintaining the robot's maximum level of productivity, which is reduced only when humans are present in the working area.
Incorporating industrial regulations such as the International Standard ISO 10218, Technical Specification ISO/TS 15066:2016, the American ANSI/RIA R15.06, the European EN 775 ISO 10218, and the national standards of the Spanish Association of Normalization and Certification is the main procedure followed by manufacturing systems. These standards are outdated and have not been improved in the last five years; therefore, some researchers introduced new concepts covering techniques for estimating and evaluating injuries to various areas of the human body, and stressed the importance of developing new devices to detect and minimize human-robot impact.
Risk assessment is a crucial tool that must be used to enhance safety for both humans and robot systems. The reviewed literature [22][23][24][25][26][27] discussed the history of operators and robots and how industrial robots have evolved, the differences between collaborative and non-collaborative robot cell safeguarding, voluntary industry consensus standards, and risk assessment. Risk assessment should include a quantitative head injury index for service robots, since mechanical risks and incidents such as robot throws or drops and trapping and crushing are more likely to happen with such robots. Another proposed method to address safety in the human-robot collaboration setting is Cooperative Collision Avoidance in dynamic environments [25]. This method computes a collision-free local motion for a short time horizon, which restricts the actuator motion but allows smooth and safe control. Modeling human behavior and errors is another proposed method [28]. This formal verification methodology was developed to analyze the safety of collaborative robotic applications with a rich nondeterministic formal model of operator behaviors that captures hazardous situations, which allows safety engineers to refine their designs until all plausible erroneous behaviors are considered and mitigated.
Other researchers [29][30][31] discussed different aspects of robot design and their relationship to safety ranking. Robot design principles should include robustness, fast reaction time, context awareness, and energy and power limitations. These principles facilitate features such as speech processing, vision processing, and robot control, which follow guidelines that allow the robot to recognize speech, gestures, and correlations, and eventually to learn in the long run while keeping humans safe. Predicting human behaviors, collision avoidance, collision reduction by data analysis, collision reduction by design, perceptions affecting design, boundaries, sensors, adaptability to the surrounding environment, path planning, statistical probability, and robotic decision making are some of the safeguards that can be implemented in industrial settings with high speeds and payload levels.

III. SYSTEM DETAILS AND SETUP
The system is developed based on available educational and off-the-shelf components to model real-life robotics tasks, which are explained below.

A. Robotic Manipulator
The robotic manipulator selected for this project was the Scorbot ER-V Plus shown in Fig. 1. This robot has five degrees of freedom; Fig. 2 shows the length of the links, the degrees of rotation, and the operating range that determine the work-space of the robot. The direct kinematics of this robot, which determine the pose of the tool {T} with respect to the base {B}, are resolved using equation (2). The robot is controlled using ACL, a language that can be used as a multitasking robotic programming environment [33,34]. MATLAB functions were created to establish bidirectional serial communication with the Scorbot controller. Both systems (robot and computer vision) run in MATLAB and issue ACL commands, which allow the robot to execute specific tasks, read and load pose data into the controller, and modify the manipulator's movement speed. Fig. 3 shows the flow of information between the system components.

B. Vision Sensory System
The Kinect V2 sensor (RGB-D sensor) is composed of two cameras, an RGB camera and an infrared (IR) camera. The IR camera is used to obtain depth maps, with a field of view of 70° horizontal and 60° vertical. The Kinect camera runs at 30 fps at a resolution of 512 × 424 pixels, and the operational range of the IR camera is 0.5 m to 4.5 m. The sensor operates based on the time-of-flight principle [35]. The depth data obtained in each pixel corresponds to the $Z_i$ coordinate measured along the optical axis of the IR camera, as illustrated in Fig. 4.

IV. METHODOLOGY

A. Kinect-Robot Modeling and Calibrating
The objective of this work is to provide a system that allows the robot to sense its surroundings and act accordingly. It is therefore necessary to represent the three-dimensional space around the manipulator. There are three important frames [36]: the center of the robot's base {B}, the robot's tool {T}, and the origin of the physical model of the Kinect's depth camera {K}, as shown in Fig. 3. The robot's task was defined in Cartesian coordinates referred to the base frame {B}.
For the geometric description of the system, homogeneous coordinates were used, based on [36]. The coordinates of a point $p$ with respect to the frame {K} are written as ${}^{K}p = (X_K, Y_K, Z_K, 1)^T$. To calculate the coordinates with respect to frame {B}, the expression ${}^{B}p = {}^{B}_{K}T \cdot {}^{K}p$ is used. The homogeneous matrix ${}^{B}_{K}T$ is given by equation (1), where ${}^{B}_{K}R$ is a rotation matrix that describes the orientation of the frame {K} with respect to the base {B}, and ${}^{B}t_K$ corresponds to the coordinates of the origin of {K} in frame {B}.
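Equation (1) is not reproduced here. Under the standard block form of a homogeneous transformation, which matches the description above, it would read as follows (the zero row and the unit entry are the usual convention and are an assumption, not taken from the source):

```latex
{}^{B}_{K}T =
\begin{bmatrix}
{}^{B}_{K}R & {}^{B}t_{K} \\
0_{1\times 3} & 1
\end{bmatrix},
\qquad
{}^{B}p = {}^{B}_{K}T \cdot {}^{K}p
```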
1) Robot Geometric Model: In a previous project [37], the direct and inverse kinematics of the Scorbot ER-V Plus were studied. The results allow the position (${}^{B}p = (X_B, Y_B, Z_B, 1)^T$) and orientation ($\alpha$ = yaw, $\beta$ = pitch, $\gamma$ = roll) of the tool {T} to be calculated as a function of the five joint angles of the robot ($\theta_1, \theta_2, \theta_3, \theta_4, \theta_5$). Equations (2) and (3) give the results, while Fig. 1 shows the parameters of the robot. Note that once $\theta_1$ is defined, the robot is contained in a plane that contains the Z-axis of the first joint. This observation is important because it allows the construction of a binary mask that follows the Scorbot as it moves.
2) RGB-D geometric model: The camera model used was of the pinhole type. The hypotheses are as follows: the origin of the frame {K} coincides in XY with the center of the image $(c_x, c_y)$, and the focal distance $f$ is the same in X as in Y. Spherical coordinates were employed to map the pixel coordinates $(u_i, v_i)$ and the depth $Z_i$ to the coordinates $(X_i, Y_i, Z_i, 1)^T$ in {K}, as shown in equations (4)-(6).
Thus, the three-dimensional point $p_i$ has coordinates in frames {K} and {B} given by (7)-(8).
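As a rough illustration of the chain from a depth pixel to base coordinates, the sketch below back-projects a pixel with a plain pinhole model and then applies a homogeneous transform. It is not the authors' MATLAB code, and it does not claim to reproduce equations (4)-(8). The image center (256, 212), the rotation (a 180° turn about X so that the camera looks down), and all function names are assumptions; the translation is only the starting value (0, 0, 2.40) m quoted in the calibration discussion, and the focal length of 362.8 pixels is the value reported in the text.

```python
F_PX = 362.8            # focal length reported in the text (pixels)
CX, CY = 256.0, 212.0   # ASSUMED image center of the 512 x 424 depth image

def pixel_to_camera(u, v, z):
    """Back-project pixel (u, v) with depth z (m) into frame {K} (pinhole model)."""
    x = (u - CX) * z / F_PX
    y = (v - CY) * z / F_PX
    return (x, y, z, 1.0)

def make_transform(rotation, translation):
    """Build the 4x4 homogeneous matrix [R t; 0 1] from a 3x3 R and a 3-vector t."""
    rows = [list(rotation[i]) + [translation[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows

def apply_transform(T, p):
    """Multiply a 4x4 matrix by a homogeneous 4-vector."""
    return tuple(sum(T[i][j] * p[j] for j in range(4)) for i in range(4))

# ASSUMED orientation: 180 degree rotation about X, so Z_K points down and Z_B up.
R_BK = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
T_BK = make_transform(R_BK, (0.0, 0.0, 2.40))  # starting translation from the text (m)

# A pixel at the image center seen 2.0 m below the camera.
p_k = pixel_to_camera(256.0, 212.0, 2.0)
p_b = apply_transform(T_BK, p_k)
```

With these assumed values, a point seen 2.0 m below the camera lands about 0.4 m above the base plane, which is the kind of sanity check that is easy to run before trusting a calibration.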
3) Geometric calibration of Kinect-Scorbot system: The geometric calibration of the robot was accomplished in [38]. The only intrinsic parameter of the Kinect considered unknown was the focal distance $f$. The extrinsic parameters that represent the pose of the camera {K} with respect to the robot {B} also needed to be calibrated.
In the experiment, the robot held a wooden cube in its gripper so that the center of mass of the block was aligned with the manipulator's tool frame. The Kinect was mounted on the ceiling of the lab, as seen in Fig. 3. Each measurement is identified by the index $i$, for a total of $N = 22$. As the reference for fitting the camera model, the coordinates ${}^{B}p_{exp,i}$ were obtained from the robot driver through serial port communication. Given the parameters $(f, {}^{B}_{K}T)$, the coordinates ${}^{B}p_{pred,i}$ can be predicted and a prediction error calculated, as defined by equation (9).
To simplify the optimization problem, it was assumed that the optical axis of the Kinect camera was perpendicular to the XY plane of the frame {B}, and that the Z axes of the two frames were parallel and opposite to each other. The optimized parameters were the focal length $f$ and the position of the camera {K} with respect to the frame {B}, ${}^{B}t_K$.
First, $f$ was adjusted so that the error in equation (9) was minimal, starting with ${}^{B}t_K = (0, 0, 2.40)^T$ m. This starting value corresponds to a perfect alignment in XY of the camera and the robot in Fig. 3, with the value of $Z$ taken from a depth image. With the focal distance optimized, the 22 points were re-projected and a mean error for each axis was computed. These deviations were introduced as corrections to ${}^{B}t_K$ to reduce the mean reprojection error. With the adjusted transformation, a new focal distance was computed. Once the mean error was negligible for each axis, the parametric adjustment was finalized.
The optimized focal distance was 362.8 pixels.
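The alternating adjustment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: equation (9) is taken here to be a sum of squared coordinate errors, the camera is assumed to look straight down with its X axis aligned with the base X axis, the image center (256, 212) is assumed, and the names `predict`, `fit_focal`, `mean_residual`, and `calibrate` are hypothetical. Each sample pairs a pixel measurement (u, v, z) with the base-frame point reported by the robot.

```python
CX, CY = 256.0, 212.0  # ASSUMED image center (pixels)

def predict(f, t, u, v, z):
    """Predicted base-frame point for pixel (u, v) with depth z, camera looking down."""
    return ((u - CX) * z / f + t[0],
            -(v - CY) * z / f + t[1],
            -z + t[2])

def fit_focal(t, samples):
    """Closed-form least-squares focal length for a fixed translation t."""
    num = den = 0.0
    for (u, v, z), (xb, yb, _zb) in samples:
        a, b = (u - CX) * z, -(v - CY) * z
        num += a * (xb - t[0]) + b * (yb - t[1])
        den += a * a + b * b
    return den / num  # f = 1 / g, where g = num / den minimizes the squared error

def mean_residual(f, t, samples):
    """Mean per-axis error (measured minus predicted) over all samples."""
    sums = [0.0, 0.0, 0.0]
    for (u, v, z), measured in samples:
        pred = predict(f, t, u, v, z)
        for k in range(3):
            sums[k] += measured[k] - pred[k]
    return [s / len(samples) for s in sums]

def calibrate(samples, t0=(0.0, 0.0, 2.40), iterations=20):
    """Alternate between fitting f and correcting t by the mean residual."""
    t = list(t0)
    f = fit_focal(t, samples)
    for _ in range(iterations):
        corr = mean_residual(f, t, samples)
        t = [t[k] + corr[k] for k in range(3)]
        f = fit_focal(t, samples)
    return f, t
```

Because the model is linear in 1/f and in the translation, each step is an exact least-squares update for one block of parameters, so the loop settles quickly when the calibration points are spread around the image center.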

B. Human Detection using Background-Foreground Technique
For human or foreign object detection in the scene, a Background-Foreground (B-F) technique [39] was used. A total of 100 depth frames were captured within 10 s, while making sure the scene stayed static, and used to form the background image.
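A minimal sketch of this kind of background model is shown below. The text does not state how the 100 frames were combined, so the per-pixel median used here is an assumption, as are the function names, the use of millimeters, and the convention that a depth of 0 marks an invalid reading. The 50 mm default threshold is the value quoted later for the collision prevention case.

```python
import numpy as np

def build_background(frames):
    """Per-pixel median of a stack of static depth frames (shape N x H x W, in mm)."""
    return np.median(np.asarray(frames, dtype=np.float64), axis=0)

def foreground_mask(depth, background, threshold_mm=50.0):
    """Mark pixels that are closer to the camera than the background by more than the threshold."""
    valid = depth > 0  # ASSUMPTION: 0 encodes an invalid depth reading
    return valid & ((background - depth) > threshold_mm)
```

The median is less sensitive than the mean to occasional depth dropouts in individual frames, which is the reason for choosing it in this sketch.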
As previously mentioned, rotating and fixed rectangular binary masks were generated so that the robot's movement would not be detected as foreground. A captured image was displayed on the screen, and the vertices of the two rectangles, as well as a fixed point about which one of them rotates, were selected with the mouse. The non-rotating rectangle was used to hide the base of the robot from the foreground. The rotating rectangle did the same for the maximum possible extension of the arm. The fixed point corresponded approximately to the robot's axis. The angle of rotation of this mask was computed by reading the status of the robot's encoders and applying direct kinematics as in (2), so that it could follow the movement of the plane occupied by the robot (Fig. 2).
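One way to build such a rotating rectangle is sketched below. It assumes the rectangle starts at the pivot pixel and extends along the direction given by the base angle, with the image u axis taken as the zero direction; the function names and the sign convention of the angle are hypothetical and would have to match the actual camera mounting.

```python
import numpy as np

def rotating_mask(shape, pivot, theta, length, half_width):
    """Boolean mask of a rectangle starting at `pivot` (u, v) and extending
    `length` pixels along angle `theta` (radians), `half_width` pixels to each side."""
    v, u = np.mgrid[0:shape[0], 0:shape[1]]
    du, dv = u - pivot[0], v - pivot[1]
    along = du * np.cos(theta) + dv * np.sin(theta)     # distance along the arm
    across = -du * np.sin(theta) + dv * np.cos(theta)   # distance across the arm
    return (along >= 0) & (along <= length) & (np.abs(across) <= half_width)

def hide_robot(foreground, *robot_masks):
    """Remove every pixel covered by any robot mask from the foreground."""
    covered = np.zeros_like(foreground, dtype=bool)
    for m in robot_masks:
        covered |= m
    return foreground & ~covered
```

In use, `theta` would be refreshed from the base encoder on every iteration, and the fixed rectangle over the robot base would be passed to `hide_robot` alongside the rotating one.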
Human detection scenarios differ slightly for the applications selected, and they are described below.
1) Collision prevention: Three zones, representing the severity of a potential collision, were defined for evaluation in the depth images. Starting from the robot base, and using the Kinect sensor calibration, two radii were established to delimit the red and yellow zones in the images. The radius was 660 mm for the red zone, which is 50 mm more than the maximum reach of the robot, and 1150 mm for the yellow zone. The green zone was everything outside the radius of the yellow zone. The behavior of the robot was modeled as a finite-state machine with three states: (1) green, normal speed; (2) yellow, medium speed; and (3) red, minimum/very slow speed, as illustrated in Fig. 5. At the beginning of each iteration, a depth image was captured and the value of the robot's base encoder was read. The captured image was then subtracted from the background to obtain the foreground, and the binary mask was applied to hide the robot. A morphological opening with a disk kernel of 5 pixels radius was applied to the resulting image to remove noise. Finally, a 50 mm depth threshold was used to binarize the image.
The B-F results were combined with the areas of interest to determine the robot's speed. If the foreground area within the red zone exceeded 100 pixels, the state turned red. If the area in the red zone did not exceed 100 pixels but the area in the yellow zone reached at least 500 pixels, the state turned yellow. If neither condition was met, the state was updated to green.
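The zone logic can be sketched as a small pure function. The radii and pixel thresholds are the values given above; treating the yellow zone as the ring between the two radii, passing foreground pixels as (x, y) positions in millimeters relative to the robot base, and the function names are assumptions made for this illustration.

```python
import math

RED_RADIUS_MM = 660.0      # from the text
YELLOW_RADIUS_MM = 1150.0  # from the text
RED_MIN_PIXELS = 100       # red if the red-zone area exceeds this
YELLOW_MIN_PIXELS = 500    # yellow if the yellow-zone area reaches this

def count_zone_pixels(foreground_xy_mm):
    """Count foreground pixels in the red disk and in the yellow ring around the base."""
    red = yellow = 0
    for x, y in foreground_xy_mm:
        r = math.hypot(x, y)
        if r <= RED_RADIUS_MM:
            red += 1
        elif r <= YELLOW_RADIUS_MM:
            yellow += 1
    return red, yellow

def next_state(red_pixels, yellow_pixels):
    """Return 'red', 'yellow', or 'green' following the rules in the text."""
    if red_pixels > RED_MIN_PIXELS:
        return "red"
    if yellow_pixels >= YELLOW_MIN_PIXELS:
        return "yellow"
    return "green"
```

Keeping the decision separate from image capture and robot communication makes the thresholds easy to test without hardware.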
2) Collaborative Scenario: Each iteration started with the robot taking an object from a pre-established location and with a request on the user's screen guiding the operator to position her/his hand where she/he wanted to receive the object from the robot. Subsequently, a binary mask was generated for the foreground: the background image was subtracted from the captured image, and a 15 mm depth threshold was applied to make the result binary. Only a radius of 200 pixels was analyzed, to avoid dealing with peripheral noise. According to the calibration, this radius is equivalent to 1.32 m at the height of the workbench. The foreground was cleaned with a morphological opening using a disk kernel of 4 pixels radius. Afterwards, a closing was applied with a disk kernel of 3 pixels radius to remove any imperfections remaining in the blobs. Blobs with an area smaller than 800 pixels were discarded, and a binary mask was generated with the remaining blobs. The resulting blobs were then separated into two height zones using Otsu's method [40].
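The height separation step can be illustrated with a compact version of Otsu's method applied to the depth or height values of the pixels inside a blob. This is a generic sketch of the cited technique, not the authors' implementation; the bin count of 64 and the function name are assumptions.

```python
import numpy as np

def otsu_threshold(values, bins=64):
    """Return the threshold that maximizes the between-class variance of `values`."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)                   # samples in bins 0..k
    w1 = w0[-1] - w0                       # samples in bins k+1..end
    s0 = np.cumsum(hist * centers)
    m0 = s0 / np.maximum(w0, 1)            # mean of the lower class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1) # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2
    k = int(np.argmax(between))
    return float(edges[k + 1])             # right edge of the last lower-class bin
```

Values above the returned threshold form one zone and the rest form the other, for example separating the head from the arms and torso when seen from above.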
Given the characteristics of the system, it is able to identify the head, torso, and arms. The portion of the blobs containing the torso and arms inside the robot's work area was isolated, and only the pixels within the radius of the work-space were kept from the processed mask. Finally, the coordinate $(u_i, v_i)$ of the pixel corresponding to the center of the hand was found. To find the center of the palm, a skeletonized binary image was obtained [39], and the largest circle, in pixels, that could fit inside the binary mask was sought. The coordinate of the pixel with the largest radius was kept, and the depth value $Z_i$ assigned to it was the most frequent value, in the originally captured image, within that largest circle centered at $(u_i, v_i)$.
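The palm search can be approximated by looking for the foreground pixel that is farthest from any background pixel, which is the center of the largest circle that fits in the mask. The sketch below swaps in a brute-force distance computation for the skeleton-based search described above, so it is only suitable for small cropped masks; a distance transform such as `scipy.ndimage.distance_transform_edt` would be the usual choice for full images. The function name and return layout are hypothetical.

```python
import numpy as np

def palm_center(mask, depth):
    """Return ((u, v), radius, z) for the largest circle inscribed in `mask`.
    `z` is the most frequent depth value inside that circle."""
    fg = np.argwhere(mask)    # (row, col) of foreground pixels
    bg = np.argwhere(~mask)   # (row, col) of background pixels
    # Distance from each foreground pixel to its nearest background pixel.
    diff = fg[:, None, :] - bg[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    best = int(np.argmax(dist))
    v, u = fg[best]
    radius = float(dist[best])
    # Most frequent depth among foreground pixels strictly inside the circle.
    inside = ((fg - fg[best]) ** 2).sum(axis=1) < radius ** 2
    vals, counts = np.unique(depth[fg[inside, 0], fg[inside, 1]], return_counts=True)
    return (int(u), int(v)), radius, float(vals[np.argmax(counts)])
```

Taking the most frequent depth inside the circle, rather than the single center pixel, mirrors the text's choice and makes the estimate less sensitive to isolated depth noise.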
The coordinates $(u_i, v_i)$ and $Z_i$ obtained were transformed, first to coordinates $(X_i, Y_i, Z_i, 1)^T$ in {K} using equations (4)-(6), and then to $(X_i, Y_i, Z_i, 1)^T$ in {B} using equation (10). At this position, the height $Z_B$ was increased by 7 cm to prevent collisions with the operator's hand. The adjusted height was then sent automatically over serial communication to an internal variable of the robot controller. This allowed the end effector to deposit the object at the desired position. As a result, the collaborative job was completed, as shown in Fig. 6.

V. RESULTS
This work is meant to develop an affordable prototype that can be added to industrial robots to increase robot safety, decrease the barriers between human operators and robots, and facilitate collaboration between them. The designed system addressed these goals as follows:

A. Collision Prevention System
The detection of an operator in the pre-established zones is exemplified in Fig. 5. To test the operation of the system, 10 tests were made in the areas of interest, in which a human operator introduced his/her hand into the robot's surroundings. The system first detected the operator entering the yellow zone (the operator's leg) and forced the robot to move at half of its original operating speed. The operator then introduced his/her hands into the red zone, forcing the robot to slow down to almost no movement. These actions were captured by the camera and highlighted with the associated colors in Fig. 5, which shows the areas of interest, the pixels detected as part of the foreground, and the outline of the Scorbot inside the binary mask.
In all test cases, the system behaved as intended, identifying the presence of the human operator and changing the robot's speed according to the distance between the human and the robot. The robot's response time to change the end-effector speed was recorded; the mean system update time was 0.45 s with a standard deviation of 0.30 s, which was considered a fast response.
The speed of the robot was modified through the ACL command 'CLRBUF', which was used to stop the robot instantly. Immediately afterwards, a new movement speed was set and a new trajectory was generated from the current pose to the next corresponding task, resuming the job. The team tried other methods to change the speed, such as changing the task priorities on the robot or sending speed-change commands during a test, but all failed because these commands could only take effect after the previous tasks were completed.

B. Collaborative Scenario
Collaboration between the robot and the human operator was simulated by having the automated system detect the operator's hand, estimate the spatial coordinates of the center of the hand, and then command the robot to pick up an object from a predefined location and place it on the operator's hand, as illustrated in Fig. 6. The blue region represents the Scorbot work-space, the orange lines show the skeletonization of the operator's arm, and the yellow area shows the largest circle fitting the mask, where the robot should place the object.
Experiments were conducted with 20 different hand positions within the robot's work-space. The system gave satisfactory results: the job was done correctly, and the mean error of the placement coordinates was 2.4 cm.

VI. CONCLUSION
This work showed that an overhead low-cost RGB-D camera can measure the position of an operator with respect to a robotic manipulator, and thus improve human-robot interaction safety and increase collaboration opportunities through 3D sensing of the robot's surrounding environment. The proposed system will allow manufacturing and industrial companies to update their existing robotics and automation systems by adding an affordable add-on safety and collaboration device, without disrupting their manufacturing lines and at a low investment cost.
In the collision prevention scenario, analysis of the captured video showed that the reaction time of the system was 500 ms, and that the system's bottleneck was the PC and robot inter-communication, which required relatively long times as well as added pauses and checkpoints to make sure it was reliable.
In the collaborative scenario, detecting the operator's hand and having the robot place an object on it was achieved. As in the other scenario, the speed of transmitting internal variables and data between the robot controller and the main computer was the main factor that defined the speed of the system.
The team is working on a few improvements to the proposed system, including enhancing the B-F algorithm's internal variables and data, and exploring dynamic methods that can assimilate changes in the scene on slower time scales. An RGB camera system is also being developed to detect a particular color or item of clothing as an activator for robot tasks. Additionally, more sophisticated moving-object classification techniques, such as convolutional neural networks, will be explored.