Alarm System using Image Processing to Prevent a Patient with Nasogastric Tube Feeding from Removing Tube

Abstract: Removal of a nasogastric (NG) tube by the patient is a critical problem, especially when the patient resists swallowing. The conventional approach, in which a personal caretaker watches the patient, is time-consuming and demands intense focus on the patient’s hands. Visual technology can reduce this burden by using image processing to evaluate the patient’s gestures and warn the personal caretaker when the patient adopts a risky pose. This work illustrates a feasible solution to prevent a patient with NG tube feeding from removing the tube by applying face detection using Haar cascades together with fiducial markers, which consist of color markers and ArUco markers. A Raspberry Pi 3 Model B and a camera module with Python and the OpenCV package are used to detect and evaluate the warning gestures over 648 measurements. Six detection methods that evaluate and warn when a patient in bed tries to remove a nasal feeding tube were tested and the results were analyzed. The results show that the detection method using ArUco markers is a good candidate for an alarm system that prevents NG tube removal by a patient.


I. INTRODUCTION
The insertion of a nasogastric (NG) feeding tube into the gastrointestinal (GI) tract is very irritating and carries high risk, and it can heavily damage the GI tract if the patient resists swallowing [1], [2], [3], [4], [5]. Patients with dementia typically resist and remove the NG tube. A common approach to prevent NG tube removal is therefore to tie the patient's hands to the bed or to have the patient wear hand mittens [6], [7], [8], [9]. These methods cause many problems, such as bruises on the patient's wrists, wounds on the patient's fingers and palms, or ankylosis. The preferred interventions to prevent tube removal or dislodgement are taping the NG tube to the face, applying hand mittens and inserting nasal loop systems (bridles) [10], [11], [12]. Unfortunately, these are the best available practices for keeping an NG tube in place. At present, these interventions are controversial among patients and their relatives because of GI damage, hand restraint, skin exacerbation, and diminished autonomy and justice [13].
According to the FOOD Trials, a family of three multicentre international randomized controlled trials addressing feeding issues for acute stroke patients, stroke patients frequently pulled out their tubes, up to 18 times per patient [14], [15]. Tube removal interrupts nutrition, hydration or medication. Moreover, dislodging the tube may result in feed fluid entering the respiratory tract and causing respiratory tract infections [16]. A long-term study on the accidental removal of endotracheal and nasogastric tubes shows that patients accidentally remove these tubes 13.1 and 41.0 times per 1000 days, respectively [17]. In terms of patient numbers, accidental removal occurred in 38 out of 289 endotracheal patients and 151 out of 368 NG patients. Many factors influence the accidental removal rate, such as age, intervention method, tube placement position and the patient's level of consciousness [18]. According to the review, the most used and taught securing technique is adhesive tape because it is feasible, convenient and fairly comfortable for patients. Its biggest drawback is the high risk of tube dislodgement or removal, which correlates with complications [19]. Evidently, there is no health system that assists endotracheal or NG patients beyond these conventional interventions.
Recently, the Internet of Things (IoT) has been deployed widely in medication, rehabilitation, and intensive care. Many applications for autonomous patient monitoring use image processing to evaluate patient status, such as postures, facial action units and expressions, head pose variation, and extremity movements [20]. The detection relies mainly on object recognition, object position and ambient evaluation using sensors. Another example of a vision-assisted healthcare system is fall detection. Many techniques are used for fall detection, such as multi-camera systems, monocular systems, infrared and range sensor based systems, and bioinspired vision sensor based systems [21], [22], [23], [24]. In monocular camera systems, the camera is mounted on the ceiling or the wall. The 3D space ellipsoid is projected onto the 2D image plane and the object is tracked with markers [25], [26]. Camera calibration and inverse perspective mapping are performed for the area of interest. By adopting vision technology, a risky pose can be detected to warn the personal caretaker.
Many technologies are available to evaluate the risky pose of a subject, such as gesture recognition [27], Bluetooth sensors [28], RFID sensors [29], gyroscopes [30], [31], face detection [32], [33], [34], and fiducial markers [35], [36]. Each technology has advantages and drawbacks depending on the application. Gesture recognition uses a stereo camera to collect 3D images. Its accuracy and sensitivity are very high, but so are its cost and computational demand. Moreover, the device is relatively large compared to the patient, so installation requires a large space. Bluetooth and RFID sensors work by calculating the distance between two sensors on the patient's body. They are fast, sensitive and accurate, but the sensors occupy a certain area and add a load, approximately 10 cm² and 50 g. Both the sensor size and weight are determined mostly by the battery. In contrast, the gyroscope gives the exact 3D position of the patient. A gyroscope sensor works on the principle of conservation of angular momentum. Therefore, the position resolution depends on the change in angular momentum and on the calibration. The image processing method, by comparison, uses an acquired image and algorithms to evaluate the patient's pose. It provides a fast, portable and cheap solution; however, the image is sensitive to light conditions, image resolution, perspective and background colors.
Face detection using Haar cascades was proposed by Paul Viola and Michael Jones [32], [33]. It is a machine learning based approach in which a cascade function is trained from many positive images (images of faces) and negative images (images without faces) and is then used to detect objects in other images. A new image is tested with the trained classifier to give the result. This method requires the subject's full face, including two eyes, one nose and one mouth. Moreover, it is sensitive to the angle of the subject's face with respect to the camera.
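As a minimal sketch of how a pre-trained Haar cascade might be applied with OpenCV in Python, the snippet below loads a frontal-face cascade and reports whether any face was found in a grayscale frame. The function names, the `scaleFactor` and `minNeighbors` values, and the use of the cascade file bundled with the pip `opencv-python` package are assumptions made for illustration; they are not taken from the authors' implementation.

```python
def detect_faces(gray_frame, cascade_path=None):
    """Return face bounding boxes (x, y, w, h) found in a grayscale frame."""
    import cv2  # imported lazily so face_missing() can be used without OpenCV

    if cascade_path is None:
        # Assumption: the pip 'opencv-python' layout, which bundles the cascades.
        cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    cascade = cv2.CascadeClassifier(cascade_path)
    # Illustrative parameter values; they would need tuning on real frames.
    return cascade.detectMultiScale(gray_frame, scaleFactor=1.1, minNeighbors=5)


def face_missing(faces):
    """True when the detector returned no face for the frame."""
    return len(faces) == 0
```

Because the frontal-face cascade expects both eyes, the nose and the mouth to be visible, `face_missing` would also return `True` when the subject simply turns the head away, which is the angle sensitivity noted above.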
Using fiducial markers is an alternative approach to detect the patient's pose. The markers are small and light, and they can be attached to the subject's face and hands. With color markers, the color in the image can be easily detected and evaluated. The process is to set a certain range for the selected color and transform everything else into black. Unfortunately, color detection depends heavily on the light conditions and the background color. Recently, Sergio Garrido and Rafael Muñoz proposed a marker library called ArUco to estimate pose [35], [37], [38]. An ArUco marker is a synthetic square marker composed of a black border and an inner binary matrix that determines its identifier (id). It is a fiducial marker designed for pose estimation in many vision applications such as robot navigation and Augmented Reality (AR). Therefore, it is largely insensitive to light, image resolution, background noise, and perspective distortion in 2D or 3D spaces [35], [36].
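The color-marker step described above (keep pixels inside a chosen color range, turn everything else black) can be sketched in plain Python on a tiny image stored as nested lists of RGB tuples. The function names and the red range used in the usage note are hypothetical; with OpenCV the same masking is commonly done with `cv2.inRange`, usually after converting the frame to HSV.

```python
def in_range(pixel, lower, upper):
    """True when every channel of the pixel lies within [lower, upper]."""
    return all(lo <= c <= hi for c, lo, hi in zip(pixel, lower, upper))


def mask_color(image, lower, upper):
    """Keep in-range pixels and replace all other pixels with black."""
    black = (0, 0, 0)
    return [[px if in_range(px, lower, upper) else black for px in row]
            for row in image]


def mask_center(image, lower, upper):
    """Centroid (x, y) of the in-range pixels, or None if there are none."""
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, px in enumerate(row):
            if in_range(px, lower, upper):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```

For example, with an assumed red range of `(150, 0, 0)` to `(255, 80, 80)`, `mask_center` gives the pixel position of a red marker. Any change in lighting that shifts the marker's pixels outside that range makes the marker disappear from the mask, which is the weakness of this approach.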
This work proposes an alarm system that prevents a patient in bed from removing a nasogastric (NG) tube and that can be processed and executed on a small device. The system uses visual technology to evaluate the patient's gestures and warns the personal caretaker when the subject adopts a risky pose. Face detection and marker techniques were therefore tested on a Raspberry Pi 3 with a camera module. The contributions of this paper are as follows:
• To detect and evaluate the risky pose of the subjects using visual technology.
• To compare and evaluate detection methods using visual technology.
• To demonstrate an alarm system preventing nasogastric (NG) tube removal by a patient.

II. METHODOLOGY
The subject is equipped with markers to estimate the warning gestures. A warning gesture is defined either by the distance between two markers or by detection of the subject's face. For the distance-based warning, the alert is raised when the distance between the two nearest markers is less than or equal to a set value. For the face-detection warning, the alarm is raised when the camera cannot detect the subject's face in the frame. Using both warning gestures and different marker types, six approaches are used to evaluate the warning gestures as follows:
a) Two ArUco markers (ArucoX2): This method uses ArUco markers to find the distance between the two nearest marker centers. One marker is placed on the subject's face and another is placed on the subject's hand.
b) Two color markers (ColorX2): This method uses red and blue markers to find the distance between the two nearest marker centers. A blue marker is placed on the subject's face and a red marker is placed on the subject's hand.
The proposed system aims at a small device that raises an alarm for the risky pose of a patient in the next phase of this research; therefore, this work chooses a Raspberry Pi with a camera module. A Raspberry Pi 3 Model B and a camera module V2 with Python and the OpenCV package are used to detect and evaluate the warning gestures. To find the best detection method, many test conditions are applied and scores are given to compare the six methods. A list of test conditions is given in Table I. Five test conditions are selected to represent performance close to the real environment as follows:
(i) The subject's face angle with respect to the camera is varied to find the limit when the patient turns around. This measurement also represents the perspective distortion on the markers.
(ii) The subject-to-camera distance represents many effects, such as perspective distortion, marker size, image resolution and background disturbance.
(iii) The marker size is varied to find the optimum marker size. Both ArUco and color markers are prepared as squares on white paper as shown in Fig. 1.
(iv) The light intensity also affects the detected images, mainly in color properties such as shade, tone, saturation and hue. Low intensity means the light in the patient's room is off, while high intensity means the light is on.
(v) The background color in the image usually disturbs the color marker method and sometimes misleads face detection or ArUco marker detection. Therefore, a multicolor background can provoke this disturbance.
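The distance-based warning rule described in this section can be sketched as follows. This is an illustrative sketch only: it assumes that each detected marker is available as four corner points in pixel coordinates (as an ArUco detector would typically report) and that the set value is expressed in pixels. The function names and the choice to raise no alarm when a marker is missing are assumptions, not the authors' code.

```python
import math


def marker_center(corners):
    """Center (x, y) of a marker given its corner points."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def nearest_distance(face_centers, hand_centers):
    """Smallest Euclidean distance between any face and hand marker center."""
    distances = [math.hypot(fx - hx, fy - hy)
                 for fx, fy in face_centers
                 for hx, hy in hand_centers]
    return min(distances) if distances else None


def distance_alarm(face_centers, hand_centers, threshold_px):
    """True when the two nearest markers are within the set value."""
    d = nearest_distance(face_centers, hand_centers)
    # Assumption: if either marker is not detected, no alarm is raised.
    return d is not None and d <= threshold_px
```

The same rule applies to the color-marker method if the marker centers are taken from the centroids of the color masks instead of from marker corners.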
This experiment consists of five test conditions and six approaches. Each method is tested with 108 measurements, giving 648 measurements in total across the six methods.
During the test, a subject lies on a patient's bed with a camera hung above the patient's head. Three different face angles are used in our experiment, as shown in Fig. 2. The subject starts with the normal gesture, with both hands lying parallel to the body, in order to check for false-positive results. Then, one hand is moved close to the face for the warning gesture. The markers are attached to the subject using transparent tape. For the monocolor background, green and yellow bed sheets are used, while the multicolor background has at least 7 colors on the bed sheets and the subject's clothes. The marker colors are red and blue. The distance between the camera and the subject is measured from the subject's forehead to the camera's lens. During the warning pose, a score of 1 is registered if the method detects the warning; otherwise, a score of 0 is registered.
Based on the test conditions described above, four attributes can be extracted from the performance of each method, namely speed, accuracy, resolution, and tolerance and reliability. A scoring system is used to quantify each method: a value of 1 is given for a successful detection and 0 for a failed detection. In this study, false-positive and false-negative results are neglected. The four attributes used to measure performance are as follows:
• The speed is acquired from the frames per second (FPS).
• The accuracy is calculated from a summation over all test conditions (108 measurements). It shows how good the method is relative to the others.
• The tolerance and reliability are acquired under the hardest condition, when the marker size is kept at 1 cm² (36 measurements). The score is the summation under this condition. It shows the robustness of the detection method.
• The resolution is calculated from the summation over the test conditions in which the subject's face angle is kept at 0° (36 measurements). This attribute compares the effects of image resolution on each method.
The performance of each attribute is shown as a percentage of the observables over the total success cases.
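The scoring scheme above can be sketched as a small aggregation over binary detection scores. This sketch assumes that each percentage is the number of successful detections divided by the number of measurements in the relevant subset, and that each measurement is recorded as a dictionary; the field names (`face_angle_deg`, `marker_size_cm2`, `detected`) and function names are hypothetical.

```python
def percentage(scores):
    """Share of successful detections (score 1) as a percentage, or None."""
    if not scores:
        return None
    return 100.0 * sum(scores) / len(scores)


def summarize(measurements):
    """Compute the three score-based attributes for one detection method."""
    all_scores = [m["detected"] for m in measurements]
    # Hardest condition: smallest marker size (1 cm²).
    small_marker = [m["detected"] for m in measurements
                    if m["marker_size_cm2"] == 1]
    # Frontal view: subject's face angle of 0 degrees.
    frontal = [m["detected"] for m in measurements
               if m["face_angle_deg"] == 0]
    return {
        "accuracy": percentage(all_scores),
        "tolerance_reliability": percentage(small_marker),
        "resolution": percentage(frontal),
    }
```

In the experiment, `measurements` would hold the 108 entries recorded for one method, of which 36 fall in each of the two subsets. Speed is measured separately from the frame rate and is not part of this aggregation.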

III. RESULT AND DISCUSSION
A set of Python scripts is implemented on the Raspberry Pi 3 to detect the warning gestures under several conditions. The results are recorded and analyzed according to the four attributes to estimate the best detection method. The experiment was conducted in a patient's room with two windows near the patient's bed. The performance of each attribute, expressed as a percentage of the observables over the total success cases, is shown in Table II. All methods show no difference in speed: the average frame rate is 2 frames per second. The two-color method (ColorX2) was expected to be the fastest because it demands the least calculation time. Surprisingly, the results show no significant difference. The explanation lies in the limits of the Raspberry Pi 3 and the camera. Moreover, the two-color method (ColorX2) is very sensitive to marker size, subject-to-camera distance, light intensity, perspective distortion, background noise and the subject's face angle, which gives it the lowest score [39], [40]. In general, it is the worst method of detection. The performance of all methods is represented using a radar plot, as shown in Fig. 3.
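One simple way to obtain a frames-per-second figure like the one reported here is to time a detection function over a batch of frames. The sketch below is an assumption about how such a measurement could be made, not the authors' procedure; `measure_fps`, `process_frame` and the injectable `clock` argument are hypothetical names.

```python
import time


def measure_fps(process_frame, frames, clock=time.perf_counter):
    """Average frames per second achieved by process_frame over frames."""
    start = clock()
    count = 0
    for frame in frames:
        process_frame(frame)  # capture-independent detection step
        count += 1
    elapsed = clock() - start
    return count / elapsed if elapsed > 0 else float("inf")
```

If frame capture dominates the loop time, as the camera and board limits suggest, every detection method measured this way would report a similar frame rate regardless of how light its own computation is.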
According to Fig. 3, the best overall performance is assigned to the face detection method (FaceOnOff). This method has the highest resolution score regardless of the image resolution and the highest tolerance to light intensity and background noise. On the other hand, this method requires the subject's full face, including two eyes, one nose and one mouth. Therefore, it is sensitive to the subject's face angle and perspective distortion. These two effects are reflected in its relatively low accuracy.
Second place in terms of overall performance goes to the two ArUco markers method (ArucoX2). The markers suffer from limited image resolution, although the resolution limit is still acceptable: 1 cm² markers can be detected only when the distance is 30 cm and are not detected at longer distances. In addition, the ArUco markers are very robust to light intensity, perspective distortion, background noise and the subject's face angle, which gives the highest accuracy.
Another group of detection methods combines color markers, ArUco markers and face detection (ColorAruco, FaceAruco, FaceColor). These three methods show relatively similar performance. Unfortunately, the combinations do not give better results than the original methods because they do not overcome the intrinsic problems of each detection method. They only combine those drawbacks and lower the detection performance.
Six detection methods to evaluate and warn when a patient in bed tries to remove a nasal feeding tube were performed and the results were analyzed. The detection method using ArUco markers is found to be a good candidate. This method is robust to practical disturbances on site and gives good reliability. Its only drawback is the minimum marker size, which is related to the image resolution. Hence, this problem can be solved by increasing the resolution at which the marker is imaged, either by changing the camera or by reducing the subject-to-camera distance. A summary of pros and cons for each method is given in Table III. Furthermore, the results also indicate that color markers are the worst method of detection. This is because no color quality correction is applied to the acquired image, in order to keep the speed as high as possible. An improvement would be to use image color correction tools to compensate for the light, saturation, tone and hue [41].
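The link between marker size, distance and image resolution can be illustrated with the standard pinhole camera approximation, in which a marker facing the camera appears with a side length of roughly f · s / d pixels (f: focal length in pixels, s: marker side, d: subject-to-camera distance). The sketch below is only an approximation under that model. The focal length used in the usage note is a hypothetical value, not a measured parameter of the camera module used in this work.

```python
def marker_side_px(focal_length_px, side_cm, distance_cm):
    """Approximate on-image side length (pixels) of a camera-facing marker."""
    return focal_length_px * side_cm / distance_cm
```

With an assumed focal length of 600 pixels, a marker with a 1 cm side would span about 20 pixels at 30 cm but only about 10 pixels at 60 cm. This is consistent with the observation that the smallest markers are detected only at the shortest distance. A camera with a higher pixel density or a shorter distance increases this figure in the same proportion.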

IV. CONCLUSION
To prevent a patient in bed from removing a nasogastric (NG) tube, the proposed alarm system uses visual technology to evaluate the patient's gestures and warns the personal caretaker when the subject adopts a risky pose. The system was evaluated using face detection and marker techniques on a Raspberry Pi 3 with a camera module. This work demonstrated the detection and evaluation of the subjects' risky pose using visual technology, as well as a comparison and evaluation of the detection methods. The experiment consists of five test conditions and six approaches, with 108 measurements for each of the six methods and 648 measurements in total.
The results showed that the color marker method, which was expected to be the fastest, gave no significant speed advantage and is very sensitive to marker size, subject-to-camera distance, light intensity, perspective distortion, background noise, and the subject's face angle. The face detection method has the highest resolution score regardless of the image resolution and the highest tolerance to light intensity and background noise. On the other hand, this method requires the subject's full face, including two eyes, one nose, and one mouth. The ArUco marker method is very robust to light intensity, perspective distortion, background noise, and the subject's face angle, which gives the highest accuracy. In contrast, it requires an image of good resolution. As a result, the detection method using ArUco markers is found to be a good candidate. This method is robust to practical disturbances on site and gives good reliability. Its only drawback is the minimum marker size, which is related to image resolution. Therefore, the ArUco marker is appropriate for use in an alarm system preventing nasogastric (NG) tube removal by a patient.

V. LIMITATIONS AND FUTURE WORK
For further work, a clinical study is needed to collect data and feedback from patients with various conditions. To implement this work in a clinical study, the correlation between ArUco marker size and the subject-to-camera distance has to be evaluated, as it is the most important factor in determining the warning. These factors can be compensated for with a high-resolution camera. Another crucial issue is the performance under low-light and dark conditions. An infrared light source and detectors can handle this issue.