An Approach for Requirements Engineering Analysis using Conceptual Mapping in Healthcare Domain

Healthcare systems aim to provide the best possible support for patient care and good medical care. Sound requirements analysis is essential to avoid crises. Eliciting the requirements of healthcare systems is a critical phase, and it is challenging to deal with stakeholder constraints and legal restrictions. In this research, an approach called "Conceptual Mapping for non-functional Health care Requirements" (CMHR) is proposed to analyze and evaluate the relationships between the clinical non-functional requirements of medical devices (ventilators are used as the example in this research) according to five attributes: prioritization, suitability, feasibility, achievability, and risk. The K-means++ algorithm is run over a range of cluster counts to find the optimal number of clusters. The requirements are then clustered to visualize the concept map. Clustering is applied to different combinations of the attributes to sort and visualize the requirements. Label names are assigned to the resulting classes so that each requirement belongs to an appropriate class. Consequently, the class of a new requirement can be predicted automatically. The approach resulted in less rework, faster project delivery with good quality, and a higher level of user satisfaction.

Keywords: Conceptual mapping; healthcare systems; clustering; requirement engineering analysis


I. INTRODUCTION
Healthcare systems aim to provide the best possible support for patient care and good medical care. Software engineering is an emerging field for healthcare systems. Clinical requirements are more complex than other types of requirements because they must be stated in specific terms. Requirements engineers must take care when eliciting requirements to avoid mistakes that could be disastrous in healthcare [1]. Software requirement analysis, also known as software requirement engineering, aims to ensure that systems meet the needs of stakeholders while also determining user expectations. Requirement analysis consists of activities that are important to the software development process [2]: elicitation, modeling and analysis, assurance, management, and evolution. First, requirements elicitation is a challenging task whose objective is to discover stakeholders' needs and to understand the context [2]. It is important to perform this step carefully to minimize later changes and thereby save development time [3]. Second, in requirements analysis and negotiation, requirements are identified and system modeling is performed. Third, in requirement specification, the requirements are documented in the format of a Software Requirements Specification (SRS). Fourth, the requirements are validated to ensure that they meet the needs of the stakeholders. Finally, requirements management governs all requirements-related activities [4] [5]. Because requirements are written in natural-language text, they should be thoroughly analyzed to produce the software requirements specification needed to validate and verify the final product. Software requirements are divided into functional and non-functional requirements [6].
Functional requirements are about the software service and its functional behavior, while non-functional requirements focus on the performance and the quality of requirements that are unrelated to software functionality [7].
The healthcare system is an emerging field, and its projects should be well analyzed to avoid crises. Poor requirement analysis leads to project failure, wasted effort and time, and repeated tasks and processes. In this research, an approach is presented that analyzes non-functional requirements by using unsupervised learning to arrange related requirements into clusters. The approach visualizes and sorts the non-functional requirements through conceptual mapping. Furthermore, it can predict the class of new requirements.
The rest of this paper is organized as follows: Section II explains the background of some related concepts, Section III reviews the literature, Section IV discusses the proposed approach, Section V presents the experimental results, and Section VI concludes the paper.

II. BACKGROUND
Machine learning consists of three different types, which are supervised learning, unsupervised learning, and reinforcement learning [8]. In this research, the k-means clustering method is used to group similar data into one group. The related data is combined to make it easier to find relationships between data.
Requirement engineering consists of a series of activities [4]. It begins with requirement elicitation and inception, which capture the needs and desires of the stakeholders. Next comes requirements analysis and negotiation, in which requirements are identified and system modeling is performed; this requires the product to be completely modeled and designed prior to construction. Then, in requirement specification, the requirements are documented in SRS format. The following activity is validation, which ensures that the requirements meet the needs of the stakeholders. Finally, requirements management governs all requirements-related activities [5].
A concept map is a diagram that visualizes the semantic relations between concepts [9]. Concept maps help to give a consistent evaluation. The process is mainly composed of the following steps: brainstorming; generating statements of needs; sorting and rating these statements; representing the statements on a map, which is called conceptual mapping analysis; and finally interpreting and utilizing the map [10] [11]. Conceptual mapping creates a structure of clustered concepts and provides a visual or graphical representation of data.

III. LITERATURE REVIEW
Al-Dahmash et al. [12] proposed SEMHTA (software engineering methodology for healthcare applications development), a methodology for building reliable and protected healthcare systems and applications. The methodology relies on what developers do when building a different system, and it does not consider the elicitation phase. Gausepohl et al. [13] proposed a methodology for healthcare systems that focuses on storytelling in the elicitation phase for medical device requirements. The method yielded a similar quantity and breadth of information in significantly less time, and participants contributed more distinct context-of-use information, with an emphasis on the social context, using ontology techniques. However, this methodology addresses the elicitation of medical device requirements rather than clinical requirements. Kaiya et al. [14] did similar work to enhance requirement elicitation using ontology techniques, web mining, and lightweight natural language processing, but it was not applied to healthcare systems. Widya et al. [15] proposed a methodology for eliciting requirements in the eHealth domain, in which developers build a scenario that reflects the treatment protocol and works with telemedicine treatment. Martin et al. [16] proposed a user-centered design approach for clinical systems that keeps users in the development cycle. They performed semi-structured, open-ended interviews to investigate the clinical need for the device as well as its expected effects on patients and clinical users. Regarding requirements engineering approaches in non-clinical systems, Laporti et al. [17] proposed the Athena approach, which elicits requirements by collecting stories from users and merging them into one story. The stories are then transformed into scenarios and then into use cases. Andreas et al. [18] explored and defined a requirement engineering methodology for machine learning systems. This methodology incorporates additional types of requirements, such as special legal requirements, explainability, and freedom from prejudice, in order to enhance the requirement engineering process. They improve the machine learning models by taking too many decisions. Kamal Rudin et al. [19] [20] created a simple method for gathering consistent requirements from client stakeholders. A library pattern that supports various application domains was developed to store the essential requirements following the essential use case. The library pattern, however, does not address healthcare requirements. Hamzeh Eyal Salman [21] designed an approach that clusters functional requirements automatically based on a semantic measure, using Agglomerative Hierarchical Clustering (AHC) to group similar functional requirements into clusters. The results achieved high performance according to a well-known measure, but the approach has not yet been applied to clinical systems. Zeng Zhen [22] applied the TA-ART algorithm to generate a concept map automatically from text by analyzing the text and grouping records into clusters. Nadiah Daud [23] created a methodology to enhance requirements elicitation. It is composed of three phases: 1) analysis, in which requirements are collected and analyzed, or the literature is read and requirements are gathered, in order to identify gaps; 2) design and development; and 3) evaluation and testing of the results. Rebecca Orsi [10] presented a study of concept map analysis that runs multiple cluster analyses to describe a quantitative validity analysis. She used R statistical software methods and packages to represent four clustering methods.

IV. PROPOSED APPROACH
This research presents a novel approach, "Conceptual mapping for health care non-functional requirements" (CMHR). The approach is applied to requirement engineering to enhance the analysis of the requirements. CMHR is composed of three phases, as illustrated in Fig. 1. The first phase is Requirements Gathering. In this phase, the non-functional requirements are extracted from the software requirements specifications, and the domain experts rank each requirement from 1 to 5 on the identified attributes (suitability, priority, stability, risk, achievability). The second phase is Data Preprocessing. In this phase, the data is cleaned and prepared for the next step. The third phase is Machine Learning and Visualization. In this phase, the Elbow method is first used to determine the number of clusters; the k-means++ algorithm then clusters related requirements based on their similarity across the identified attributes, which constitutes the conceptual mapping analysis. A visualization is then produced with a matplotlib plot, and the data is exported to a CSV file. The clusters and requirements can then be labeled. Labeling makes it possible to classify the requirements and to split the data into training and testing sets. Finally, the classification of new requirements is predicted. The details of these phases are presented in the following diagram.
Phase 1: Requirements Gathering. This phase consists of two main steps.
Step 1: Extract non-functional requirements from the SRS. The structured requirements are extracted, and the non-functional requirements are selected for analysis. In this research, 104 non-functional requirements about ventilators as a medical device were collected from the SCRIBD website. Table I illustrates a sample of these requirements.
Devices where the safety of the patients depends on an internal power supply must be equipped with a means of determining the state of the power supply.
12.3. Devices where the safety of the patients depends on an external power supply must include an alarm system to signal any power failure.
12.4. Devices intended to monitor one or more clinical parameters of a patient must be equipped with appropriate alarm systems to alert the user of situations which could lead to death or severe deterioration of the patient's state of health.
12.5. Devices must be designed and manufactured in such a way as to minimize the risks of creating electromagnetic fields which could impair the operation of other devices or equipment in the usual environment.
12.6. Devices must be designed and manufactured in such a way as to avoid, as far as possible, the risk of accidental electric shocks during normal use and in single fault condition, provided the devices are installed correctly.
12.7.1. Devices must be designed and manufactured in such a way as to protect the patient and user against mechanical risks connected with, for example, resistance, stability and moving parts.
12.7.2. Devices must be designed and manufactured in such a way as to reduce to the lowest possible level the risks arising from vibration generated by the devices, taking account of technical progress and of the means available for limiting vibrations, particularly at source, unless the vibrations are part of the specified performance.
Step 2: Requirements Evaluation. In this step, a group of qualified scientists evaluates the non-functional requirements from the SRS document against a set of attributes. The selected attributes are suitability, priority, stability, risk, and achievability. The following questions are asked of a domain expert about every requirement.
• Suitability: How complete, correct, and appropriate is this requirement?
• Priority: How important is this requirement?
• Stability: Has the requirement reached a certain level of stability?
• Risk: What is at stake if the requirement is not implemented?
• Achievable: To what extent can this requirement be achieved?
The mean of the survey answers is taken and used during the analysis.
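As a small illustration of this averaging step, the sketch below takes the 1-to-5 answers of several experts for one requirement and returns the mean per attribute. The function name `mean_ratings`, the three experts, and their scores are hypothetical and are not taken from the study's survey data.

```python
from statistics import mean

ATTRIBUTES = ("suitability", "priority", "stability", "risk", "achievable")

def mean_ratings(expert_answers):
    """Average the 1-5 answers of several experts, attribute by attribute."""
    return {
        attr: round(mean(answer[attr] for answer in expert_answers), 2)
        for attr in ATTRIBUTES
    }

# Hypothetical answers from three experts for a single requirement.
answers = [
    {"suitability": 4, "priority": 5, "stability": 4, "risk": 3, "achievable": 4},
    {"suitability": 5, "priority": 4, "stability": 4, "risk": 4, "achievable": 3},
    {"suitability": 3, "priority": 5, "stability": 4, "risk": 5, "achievable": 5},
]
requirement_scores = mean_ratings(answers)
print(requirement_scores)
```

Each requirement then carries one averaged score per attribute, which is the form of input the later clustering steps operate on.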
Phase 2: Pre-processing. Data cleaning is the process implemented in this phase. The kernel-based random method is used. This method finds the relations between concepts or data in order to perform data cleaning and normalization [24]. Data cleaning detects and removes errors and inconsistencies from the data in order to improve its quality [25]. This step therefore fills in missing data with the most relevant value, which is calculated as the mean value using machine learning.
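The mean-based filling of missing values can be sketched as follows. This is a minimal standard-library illustration and not the implementation used in the study; it does not reproduce the kernel-based random method. The function name `fill_missing_with_mean` and the sample ratings are hypothetical. In scikit-learn, `SimpleImputer(strategy="mean")` performs a comparable column-wise mean fill.

```python
from statistics import mean

def fill_missing_with_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = []
    for c in range(n_cols):
        observed = [row[c] for row in rows if row[c] is not None]
        col_means.append(mean(observed))
    # Build new rows so that the input data is left unchanged.
    return [
        [col_means[c] if row[c] is None else row[c] for c in range(n_cols)]
        for row in rows
    ]

# Hypothetical ratings; None marks a missing answer.
ratings = [
    [4.0, 3.5, None],
    [3.0, None, 4.0],
    [5.0, 4.5, 2.0],
]
cleaned = fill_missing_with_mean(ratings)
print(cleaned)
```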
Phase 3: Machine Learning and Visualization. This phase consists of five main steps.
Step 1: Compute the number of clusters. The Elbow method is used to determine the optimal number of clusters for the data. It runs the k-means++ algorithm on the dataset for values of k from 1 to 10 and calculates the WCSS for each value. WCSS stands for Within-Cluster Sum of Squares. It is the sum of the squared distances between each data point and the centroid of its cluster, taken over all clusters:
$$\mathrm{WCSS} = \sum_{k} \; \sum_{d_i \,\in\, \text{cluster } k} \lVert d_i - C_k \rVert^{2}$$
where $C_k$ is the centroid of cluster $k$ and $d_i$ is a data point in that cluster.
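The Elbow computation described above can be sketched as follows. This is a from-scratch illustration that uses only the Python standard library, whereas the study reports using sklearn. The names `sq_dist` and `kmeans_pp`, the fixed random seed, and the sample scores are hypothetical. In scikit-learn, `KMeans(init="k-means++")` exposes the same quantity as `inertia_`.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp(points, k, rng, iters=50):
    """k-means with k-means++ seeding; returns (centroids, labels, wcss)."""
    # Seeding: first centroid uniformly at random, the rest with probability
    # proportional to the squared distance to the nearest chosen centroid.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss

# Hypothetical (prioritization, suitability) scores forming three loose groups.
scores = [
    (1.0, 1.2), (1.3, 0.9), (0.8, 1.1), (1.1, 1.4),
    (3.0, 3.2), (3.3, 2.9), (2.8, 3.1), (3.1, 3.4),
    (4.8, 1.0), (5.0, 1.3), (4.7, 0.8), (4.9, 1.1),
]
rng = random.Random(0)
wcss_by_k = {k: kmeans_pp(scores, k, rng)[2] for k in range(1, 11)}
for k, value in wcss_by_k.items():
    print(k, round(value, 3))
```

Plotting `wcss_by_k` against k and looking for the point where the curve stops dropping sharply gives the "elbow" that is read as the number of clusters.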
Step 2: Clustering similar non-functional requirements. The k-means++ algorithm is used to produce the multidimensional scaling by measuring similarity, defined as the distance between data points; it calculates the sum of the distances between each point and its centroid. After the distances are calculated, related requirements are grouped into clusters. The requirements can then be visualized over the selected attributes through the multidimensional scaling of the conceptual map and exported to a CSV file. Requirements can be sorted dynamically based on any two selected attributes. A map that fits three attributes has also been developed.
Step 3: Labeling Requirements: Data labeling is the method of identifying the raw data. After clustering, the data is ready to be labeled, and each requirement can be assigned a label name. Assume there are four clusters; a label is then assigned to each cluster based on the data in that cluster.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 8, 2021, www.ijacsa.thesai.org
Step 4: Classification of labeled data: The data is now in the form of multiple labeled classes. Since the requirements were labeled in the previous step, the data is ready to be classified. The Naïve Bayes technique is applied in this research to classify the labeled data. The data is split into a training set and a testing set in order to train the classifier.
Step 5: Predicting the new requirement cluster: This is the final step. The developed approach can predict the classification of a new requirement or of an edited existing requirement.
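The classification and prediction steps can be illustrated with a small Naïve Bayes sketch. The text does not state which Naïve Bayes variant was used, so the Gaussian variant here is an assumption. The function names, the variance floor, the labels "critical" and "optional", and the sample scores are hypothetical as well.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-3):
    """Estimate class priors and per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small floor keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict(model, features):
    """Return the label with the highest log posterior for one feature vector."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(features, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)

# Hypothetical labeled requirements: (prioritization, suitability) scores.
X_train = [(4.8, 4.6), (4.5, 4.9), (4.7, 4.4), (2.1, 2.0), (1.8, 2.3), (2.2, 1.7)]
y_train = ["critical", "critical", "critical", "optional", "optional", "optional"]
model = fit_gaussian_nb(X_train, y_train)

# A new requirement is assigned to the class of the cluster it most resembles.
new_label = predict(model, (4.6, 4.5))
print(new_label)
```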

V. EXPERIMENTAL STUDY AND EVALUATION
The CMHR approach was applied to the non-functional requirements of a ventilator device. First, 104 requirements were extracted from the SRS. The domain experts then evaluated each requirement against the defined attributes, as explained in the previous section. The mean of the evaluations was calculated and assigned to each requirement. The file was then converted into a CSV file. A sample of the data is presented in Table II.
The second step performs the preprocessing phase, using the sklearn library in Python to fill in the missing data. The data is then ready for the machine learning phase. This phase starts by computing the number of clusters with the Elbow method to determine the optimal number of clusters. Fig. 2 depicts the result of the Elbow method when clustering is based on prioritization and suitability; the figure shows that the optimal number of clusters is five. Fig. 3 depicts the result of the Elbow method when prioritization and achievability are used to cluster the requirements; the figure shows that the optimal number of clusters is four.
The next step is to cluster similar non-functional requirements using the k-means++ algorithm. The multidimensional scaling of the requirements is presented in Fig. 4 and Fig. 5.
In Fig. 4, the requirements are sorted and visualized based on the prioritization and suitability attributes. They are grouped into five clusters, as the Elbow method indicated.
In Fig. 5, the requirements are sorted and visualized based on the prioritization and achievability attributes. They are grouped into four clusters, as the Elbow method indicated.
The CMHR approach is implemented dynamically. It also supports clustering the requirements based on three attributes and produces a three-dimensional representation. For example, assume the clustering is based on the three attributes prioritization, suitability, and achievability. Fig. 6 presents the result of the Elbow method for this case; the figure shows that the optimal number of clusters is five.
In the next step, similar non-functional requirements are clustered using the k-means++ algorithm, and the multidimensional scaling of the requirements is presented in Fig. 7.
Table II sample rows (requirement followed by its attribute ratings):
Devices must be designed and manufactured in such a way as to minimize the risks of creating electromagnetic fields which could impair the operation of other devices or equipment in the usual environment. 3.88 3.89 4.9 3.8 3.8
Devices must be designed and manufactured in such a way as to avoid, as far as possible, the risk of accidental electric shocks during normal use and in single fault condition, provided the devices are installed correctly.
In Fig. 7, the k-means++ algorithm runs on the evaluation of three attributes: suitability, prioritization, and achievability. The requirements are grouped into five clusters, as the Elbow method indicated, and are sorted and visualized in a three-dimensional matplotlib plot.
The Silhouette method is applied to evaluate the clustering. This technique is used to determine how well the chosen clustering technique has performed [26]. The silhouette score compares the distances between every object in a cluster and the objects in the other clusters [26]. The silhouette coefficient ranges from -1 to 1. A value of 1 implies that the clusters are well separated and that the objects within each cluster are closely related. A value of 0 implies that the clusters are not well separated and that the distance between them is too small. A value of -1 implies that objects have been assigned to the wrong clusters [27]. The silhouette coefficient is calculated for the requirements to determine the optimal number of clusters. It was tested for 2 to 6 clusters, and the results are presented in Fig. 8 to Fig. 12 and in Table III. The thickness of each silhouette plot represents the number of related requirements in each cluster. Fig. 11 shows that the silhouette score is 0.71.
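To make the coefficient concrete, the sketch below computes the mean silhouette score for a labeled set of points. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the coefficient is (b - a) / max(a, b). The function name and the four sample points are hypothetical, and at least two clusters are assumed. scikit-learn provides a library version as `sklearn.metrics.silhouette_score`.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # by convention a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so it does not affect the sum)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical points: two compact groups that are far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # groups match the labels
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(round(good, 3), round(bad, 3))
```

The well-matched labeling scores close to 1, while the labeling that cuts across the groups scores below 0, which mirrors the interpretation of the coefficient given above.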
Table III shows the silhouette score for the different numbers of clusters. Fig. 4 showed that there are five clusters. The labels of these clusters are shown in Table IV. Since the requirements were labeled, the data can be classified. The data was split into a training set and a testing set, with 80% for training and 20% for testing. The split is used to confirm the prediction of the class of any new element. The number of tested requirements is 21. The confusion matrix is a visual evaluation technique used in machine learning. It presents the actual class results against the predicted class results [28]. The confusion matrix is presented in Table V, and the accuracy score was 1 for the tested cases. In Table V, 21 requirements were tested with the Naïve Bayes classification technique. The table shows that eight requirements belong to class 1, three to class 2, six to class 3, two to class 4, and two to class 5.
The classification report is used to assess the accuracy of a classification algorithm's predictions. It presents the precision, recall, and F1-score, where Precision = TP/(TP+FP) and Recall = TP/(TP+FN). Here TP, FP, and FN denote the numbers of true positives, false positives, and false negatives for a class.
The classification report shows that the precision is one for the five classes. The recall is one for the five classes. The F1-Score is one for the five classes.
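The reported values can be checked with a short calculation. The sketch below derives per-class precision, recall, and F1 from a confusion matrix. The matrix is built from the per-class counts reported for the 21 tested requirements (8, 3, 6, 2, 2) under the assumption that it is purely diagonal, which is what an accuracy score of 1 implies. The function name is hypothetical, and F1 is taken as the harmonic mean of precision and recall.

```python
def classification_report(confusion):
    """Per-class (precision, recall, F1) from a square confusion matrix.

    confusion[i][j] counts items of actual class i predicted as class j.
    """
    n = len(confusion)
    report = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, actually other
        fn = sum(confusion[c]) - tp                       # actually c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report.append((precision, recall, f1))
    return report

# Diagonal matrix from the reported class counts, assuming no misclassification.
counts = [8, 3, 6, 2, 2]
confusion = [[counts[i] if i == j else 0 for j in range(5)] for i in range(5)]
report = classification_report(confusion)
accuracy = sum(confusion[i][i] for i in range(5)) / sum(map(sum, confusion))
print(report, accuracy)
```

With no off-diagonal entries, FP and FN are zero for every class, so precision, recall, and F1 all equal one, in line with the classification report above.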

VI. CONCLUSION AND FUTURE WORK
In this research, a novel approach, "Conceptual mapping for health care non-functional requirements" (CMHR), is applied to enhance the requirement engineering analysis of healthcare systems. The approach ranks requirements using a set of attributes and calculates the mean value of each requirement for every attribute. CMHR uses the K-means++ algorithm to gather related requirements into one group according to their semantic similarity, which is captured through conceptual mapping of the requirements. The CMHR approach can be extended to any number of clusters. Clusters can be defined based on two or three attributes in order to sort the data by those attributes and visualize it. Assigning labels to the clusters is essential to identify every cluster and requirement so that they can be classified. CMHR offers automatic classification of a new requirement. CMHR was applied to medical device requirements and generated five clusters based on prioritization and suitability, with a silhouette score of 0.71. It also generated four clusters based on prioritization and achievability, and five clusters based on three attributes. CMHR can work dynamically with any two or three attributes. In future work, CMHR will be applied to more than one project to measure the time saved in completing a project. More attributes will be identified to enhance the CMHR approach and to find relations between requirements. Swarm intelligence can also be applied to further enhance the proposed approach.