Quality In-Use of Mobile Geographic Information Systems for Data Collection

Mobile Geographic Information Systems (GIS) play a vital role in data collection, offering diverse functionalities for spatial data handling. Despite advancements, accurately determining the usage environment during development remains challenging. This study uses machine learning and natural language processing to automatically classify user reviews based on the ISO 25010 quality-in-use model. Motivated by the challenge of gauging user experience during development, stakeholders analyze user reviews for insights. An experimental study compares Support Vector Machine (SVM), Random Forest, Logistic Regression, and Naive Bayes classifiers, revealing superior performance by SVM and Random Forest, particularly in efficiency evaluation. Findings underscore the efficacy of SVM in classifying user reviews, emphasizing its effectiveness in evaluating efficiency within mobile GIS applications. Moreover, the study provides valuable insights for stakeholders, contributing to the enhancement of the software quality of mobile GIS apps.


I. INTRODUCTION
Mobile GIS has seen a significant rise in recent years as a method for data acquisition across diverse disciplines including, but not limited to, environmental monitoring [1], urban planning [2], and emergency management [3]. These systems allow users to efficiently capture, analyze, and store spatial data, resulting in an increase in productivity compared to traditional methods [4]. The implementation of mobile GIS can provide significant benefits in terms of cost-effectiveness and real-time data acquisition [5]. In fact, mobile GIS is widely considered for data collection purposes, primarily due to the set of sensors supported by mobile devices that enable capturing positions, especially the Global Positioning System (GPS) and the Global Navigation Satellite System (GNSS). In addition, mobile GIS enables orientation measurement through the compass sensor [6].
Moreover, from a data quality point of view, mobile GIS functionalities allow controlling data quality during collection activities [7], so that several aspects of data quality can be ensured. For instance, the accuracy of data is verified by implementing data validation rules that prevent users from inputting data when the positioning system provides values out of tolerance. Another aspect of data quality is the completeness of data, which can be achieved by ensuring that all required items are collected. Finally, the verification of data consistency is achieved through the application of spatial constraints. These constraints serve to alert the user when collected data conflicts with information from other data sources. For instance, an area may be collected as a building, whereas in another data source, it is classified as a farm.
These functionalities and features have the potential to influence the attractiveness of the application by partially or fully meeting user needs. In fact, multiple mobile GIS apps, specifically designed for data collection, are currently available for public use in app repositories [8]. These repositories allow users to provide feedback in the form of ratings and reviews, which are crucial for app developers and designers to improve their services and tailor the applications to meet user needs. However, due to the large volume of feedback and the diversity of wording used, reading and analyzing all reviews and ratings manually is time consuming, hence the need to automate this process. Moreover, evaluating the quality-in-use of these apps from the user's point of view with respect to the International Organization for Standardization (ISO) 25010 standard [9] can be a tedious and difficult task.
Besides, recent technological advancements have resulted in the proliferation of frameworks and libraries for natural language processing (NLP) [10], a specific area within the field of computer science and artificial intelligence that focuses on the comprehension, interpretation, and generation of human language by computers. One widely employed technique in NLP is Term Frequency-Inverse Document Frequency (TF-IDF) vectorization, which represents text as numerical vectors [11]. When combined with machine learning (ML) classification methods, this technique enables the automated categorization of natural language into predefined classes.
For software quality, the ISO 25010 standard provides two distinct models. The first is a software product quality model, which outlines eight characteristics pertaining to the static and dynamic properties of a given system or software product. The second is a quality-in-use model, which defines quality in use as the extent to which a product or system can be used by specific users to meet their needs and achieve specific goals with effectiveness, efficiency, freedom from risk, and satisfaction in specific contexts of use. In addition, the quality-in-use model defines five quality characteristics: (1) effectiveness, which refers to the accuracy and completeness with which users achieve their specified goals; (2) efficiency, which refers to the resources expended in relation to the accuracy and completeness with which users achieve their goals; (3) satisfaction, which refers to the degree to which user needs are satisfied when a product or system is used in a specified context of use; (4) freedom from risk, which refers to the degree to which a product or system mitigates potential risks to economic status, human life, health, or the environment; and (5) context coverage, which refers to the degree to which a product or system can be used with effectiveness, efficiency, freedom from risk, and satisfaction in both specified contexts of use and in contexts beyond those initially identified.
This study assesses the quality-in-use of mobile GIS for data collection by employing manual labeling, NLP techniques, and TF-IDF vectorization as pre-processing steps on collected reviews and ratings. Subsequently, ML classification techniques are applied to the pre-processed reviews through an experimental process to identify the most suitable classifier for the specific domain of mobile GIS data collection. The classification of reviews aligns with the quality-in-use model of the ISO 25010 standard.
The study's novel contributions in the field of mobile GIS for data collection can be summarized as follows:
1) Proposing a novel application of natural language processing techniques, specifically TF-IDF, for analyzing user reviews in the context of mobile GIS. This approach enables the extraction of valuable insights from a large volume of user-generated data.
2) Evaluating the performance of four machine learning techniques (Logistic Regression, Support Vector Machine, Random Forest, and Naive Bayes) in classifying user reviews based on the ISO 25010 quality characteristics, with a particular focus on the "efficiency" class (characteristic).
3) Comparing the performance metrics of SVM and Random Forest in identifying reviews belonging to the "efficiency" class, showcasing the superior performance of SVM.
4) Underlining the significance of SVM as a suitable classifier for classifying mobile GIS user reviews according to ISO 25010, offering better performance in accurately categorizing reviews related to "efficiency."
The paper is organized as follows: Section II provides an overview of the related work. Section III presents the method. Section IV outlines the experimental process, and Section V presents the results of the study. Section VI discusses the findings, and Section VII addresses potential threats to validity. Finally, Section VIII presents the conclusion and potential future work.

II. RELATED WORK
In order to identify the approaches used for analyzing and classifying user reviews and ratings in mobile GIS for data collection, an analysis of previous relevant studies was conducted, with a focus on the type of study (e.g., review or empirical study), the scope (e.g., mobile GIS applications for data collection, or mobile applications in general), the quality aspects (e.g., quality attributes from ISO 25010 or others), NLP techniques, and ML techniques.
The aforementioned relevant studies are presented in Table I, which indicates that diverse approaches have been employed to tackle the issue of software quality for both mobile apps in general and mobile GIS specifically for data collection purposes. For instance, Lew et al. [12] employed a modeling framework, 2Q2U (Internal/External Quality, Quality in Use, Actual Usability, and User Experience), to evaluate the quality of a desktop GIS application. This framework adopts a flexible approach to integrate and establish connections between usability and user experience in order to evaluate software applications. Rahman et al. [13] conducted a study to validate the reliability and validity of an instrument aimed at assessing the influence of GIS quality and user satisfaction on individual work performance. The researchers drew upon an extensive analysis of existing literature and sought input from experts to develop a comprehensive questionnaire consisting of 68 items specifically related to GIS quality, user satisfaction, and individual work performance. In addition, Moumane et al. [14] conducted an empirical study with the objective of assessing the usability of mobile applications on different mobile operating systems. The study aimed to evaluate a framework specifically designed for mobile environments, based on the usability characteristic outlined in the ISO 9126 Software Quality Standard. Meng et al. [15] conducted an assessment of the usability of a Web-based Public Participatory GIS (Web-PPGIS) in a practical application setting. The researchers administered a questionnaire to participants and discovered notable disparities in system usability. These variations were observed based on the users' levels of experience and education.
Other related studies have focused on the quality of data in mobile GIS as part of the system. Wang et al. [7] outlined the open architecture of field-based mobile GIS and emphasized the importance of spatial data quality considerations. The study further elucidated how spatial data quality issues were tackled within the mobile GIS context, in accordance with internationally recognized geoinformatics standards such as ISO and Open Geospatial Consortium (OGC) standards. Furthermore, in another study by Song et al. [16], a linear evaluation model utilizing Geographical Weighted Regression (GWR) and a nonlinear evaluation model based on Random Forest (RF) were developed. These models were employed to quantitatively assess the relationship between geographical factors and the positioning bias of mobile phone locations.
With respect to the application of ML classification and NLP, Oyebode et al. [17] used ML classification, NLP, and TF-IDF techniques to evaluate and classify 88,125 user reviews of 104 mental health apps based on predefined classes. Five techniques were involved in this study: RF, Multinomial Naive Bayes (MNB), Support Vector Machine (SVM), Logistic Regression (LR), and Stochastic Gradient Descent (SGD). Dos et al. [18] developed a user feedback classifier based on the ML techniques Decision Tree (DT), Naive Bayes (NB), LR, RF, and SVM for the classification of reviews on mobile apps across various domains. The classification was performed in accordance with the software quality characteristics defined by the ISO 25010 standard. In addition, Dias et al. [19] applied ML techniques and NLP in the context of software requirements classification. The study employed four algorithms: LR, SVM, MNB, and kNN. The results indicated that the use of TF-IDF in conjunction with LR produced the best classification results in differentiating requirements. The authors in [20] presented a measure of the external quality of mobile GIS for data collection by assessing the degree of impact of requirements related to mobile GIS for data collection on each external quality characteristic, aligned with ISO/IEC 25010. In a separate study, the authors in [21] presented a catalog of requirements for mobile GIS data collection and demonstrated how it can be used to evaluate such applications.
This study diverges from the aforementioned related work by integrating various dimensions. Notably, while prior studies have explored diverse aspects such as mobile GIS for data collection, algorithm development, and evaluation, the current study uniquely incorporates and merges these facets. Specifically, the investigation delves into the intersection of mobile GIS for data collection and the application of both machine learning and natural language processing techniques. In contrast to certain previous studies that addressed the scope of mobile GIS for data collection but refrained from employing machine learning techniques, this study bridges the gap by incorporating advanced methodologies to automatically classify user reviews based on the ISO 25010 quality-in-use model. This integration enables a more comprehensive understanding of the user experience, contributing a novel perspective to the existing body of literature in this domain. Through the integration of the mobile GIS scope for data collection with the refined application of machine learning techniques, this study presents a distinctive and valuable contribution to the field, laying the foundation for more refined insights and progress in the evaluation of software quality for mobile GIS applications.
To the best of our knowledge, there have been no prior assessments conducted on the quality in use of mobile GIS for data collection using the ISO 25010 standard, natural language processing (NLP), and machine learning (ML) techniques.

III. METHOD
The methodology employed in this study comprises five stages, as illustrated in Fig. 1: data collection, data preprocessing, data labeling, data vectorization, and automated classification and evaluation. The subsequent subsections offer a detailed overview of each stage.

A. Data Collection
During the data collection step, a two-fold approach is used to gather users' reviews on mobile GIS applications for data collection.
• First, a pre-existing list of apps obtained from [8] was utilized, and specific inclusion criteria were applied to determine their selection. Each app needed to satisfy the following inclusion criteria: (1) relevance to mobile GIS for data collection, (2) an update date of 2020 or later, and (3) a minimum of five user reviews.
• Second, a combination of the Google Play API [22] and a Java program, developed by the research team, was utilized to gather user reviews from the selected applications.
As a result, 19 apps were selected in the data collection step, with a total of 8,793 reviews collected from these apps (see Table II for the comprehensive list of the selected applications and details of the collected reviews).
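The three inclusion criteria can be expressed as a simple filter. The following is a minimal sketch only: the `App` record, its field names, and the sample apps are hypothetical and are not taken from the study's Java program or from the list in [8].

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class App:
    """Hypothetical record describing one candidate app."""
    name: str
    is_data_collection_gis: bool  # criterion 1, assumed to be judged manually
    last_updated: date            # criterion 2
    review_count: int             # criterion 3


def meets_inclusion_criteria(app: App) -> bool:
    """Return True when the app satisfies all three inclusion criteria."""
    return (
        app.is_data_collection_gis
        and app.last_updated.year >= 2020
        and app.review_count >= 5
    )


# Made-up candidates used only to exercise the filter.
candidates = [
    App("FieldMapperDemo", True, date(2022, 3, 1), 120),   # satisfies all criteria
    App("OldSurveyDemo", True, date(2018, 6, 15), 40),     # updated before 2020
    App("TinyGisDemo", True, date(2021, 1, 10), 3),        # fewer than five reviews
    App("PhotoEditorDemo", False, date(2023, 5, 5), 900),  # not a data collection GIS
]
selected = [app.name for app in candidates if meets_inclusion_criteria(app)]
print(selected)
```

Keeping the criteria in one predicate makes it easy to re-run the selection when the candidate list or the thresholds change.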

B. Data Preprocessing
Data preparation is a crucial step in natural language processing (NLP), involving the cleaning and preprocessing of raw text data to eliminate irrelevant information. To achieve this, the following well-known steps were followed [23]:
• Tokenization: In NLP, tokenization involves segmenting text into units called tokens based on certain rules such as removing punctuation or capitalization. The resulting tokens are intended to convey a semantic meaning. The tokenization of the collected reviews was achieved by removing punctuation marks, digits, and foreign (non-Latin) characters from the text data.
• Removing stop words: Stop words are commonly occurring words within text data that have little semantic value, such as "the" or "is", and are removed during preprocessing for NLP. The Natural Language Toolkit (NLTK) package contains a pre-built list of stop words that can be downloaded and used [24]. However, to ensure the inclusion of domain-specific terms in the data analysis, the authors of this study compiled a list of words related to mobile GIS to prevent them from being removed during preprocessing. This list included, for instance, "GPS", a widely known sensor used for positioning that facilitates data collection via mobile GIS. The term "Accuracy" was also included in the list, as it relates to the precision of positioning and, consequently, to the quality of data collected through mobile GIS. Additionally, "Map" was included in the list, as it is an important component of GIS that allows data presentation.
• Lemmatization: words are reduced to their base form. For instance, the words "running," "ran," and "run" are all reduced to the base form "run".
• Converting words to lowercase.
The aforementioned data preprocessing steps were carried out using a Python program developed by the authors of this study. For each review in the data set, the program successively executes the operations of tokenization, stop word removal, lemmatization, and conversion to lowercase. The output of these steps is then stored in a new column named 'pre-processed-review'.
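The four operations can be chained as in the sketch below. It is an illustration under stated assumptions, not the authors' program: the stop word set, the protected domain terms, and the lemma lookup are tiny hand-written stand-ins for the NLTK resources mentioned above, and the function name `preprocess` is hypothetical.

```python
import re

# Small stand-ins for the NLTK stop word list and lemmatizer (assumptions).
STOP_WORDS = {"the", "is", "a", "an", "and", "on", "my", "it", "of", "to"}
DOMAIN_TERMS = {"gps", "accuracy", "map"}  # terms protected from removal
LEMMAS = {"running": "run", "ran": "run", "maps": "map", "crashes": "crash"}


def preprocess(review: str) -> str:
    """Tokenize, drop stop words, lemmatize, and lowercase one review."""
    # 1) Tokenization: keep only runs of Latin letters, which discards
    #    punctuation, digits, and non-Latin characters.
    tokens = re.findall(r"[A-Za-z]+", review)
    # 2) Stop word removal, never removing a protected domain term.
    tokens = [
        t for t in tokens
        if t.lower() in DOMAIN_TERMS or t.lower() not in STOP_WORDS
    ]
    # 3) Lemmatization through a lookup table (a real lemmatizer would be
    #    used in practice).
    tokens = [LEMMAS.get(t.lower(), t) for t in tokens]
    # 4) Conversion to lowercase.
    return " ".join(t.lower() for t in tokens)


print(preprocess("The GPS accuracy is great!!! Maps ran slowly on my phone 24/7."))
```

In a table of reviews, the returned string would be what is stored in the 'pre-processed-review' column.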

C. Data Labelling
The data labelling step consists of the classification of the user reviews (resulting from step 2) through a manual process, which was carried out by the primary author, with respect to the quality characteristics specified in the ISO 25010 model for quality-in-use. For each review, the corresponding predefined quality characteristics are assigned by the primary author and then validated by the other authors for relevance and consistency. In cases of disagreement, a consensus was achieved through collective discussion among all authors. The manual process was conducted through a web application that was specifically developed by the research team for this purpose. Fig. 2 depicts the interface of this application, which enables users to navigate through reviews and manually assign quality characteristics to each review by clicking the button related to the corresponding quality. At the end of the data labelling, a comma-separated values (CSV) file that contains each pre-processed review with its corresponding label is generated using the CSV button. It is noteworthy that during the data labelling process, certain reviews were deemed ambiguous due to their unclear meanings or the presence of non-Latin characters that remained from the data preparation stage. As a result, these reviews were excluded from the data set, reducing the total number of reviews from 7322 to 6904. Table III shows the detailed results in terms of reviews and quality characteristics.

D. Data Vectorization
This step consists of transforming text reviews into numerical values, which can then be utilized as input for machine learning classification algorithms. TF-IDF [25], an extensively utilized technique in natural language processing, facilitates the transformation of text data into numerical vectors with a focus on classifying user reviews. This method multiplies the term frequency (TF) by the inverse document frequency (IDF) for each term present in the review, yielding a numerical representation of the significance and rarity of the terms. This numerical representation enables the detection of patterns and trends within user reviews and the subsequent categorization of these reviews according to specific quality characteristics. In order to apply the TF-IDF vectorization technique to the user reviews, the authors developed a Python script that makes use of the Scikit-learn library [26]. This script reads the CSV file generated during the preceding data labelling phase and computes the frequency of each term in the reviews along with their respective importance scores. The resulting TF-IDF matrix has the user reviews as rows and the overall terms as columns. Moreover, the script stores the quality characteristic of each review, obtained from the data labelling step, in an additional column labeled "labels". Finally, the output of the script is produced in a new CSV file named "TF-IDF.csv".
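To make the TF-by-IDF product concrete, the sketch below builds a small review-by-term matrix in plain Python. It assumes the textbook weighting, tf(t, d) multiplied by log(N / df(t)); Scikit-learn's vectorizer applies a smoothed IDF and normalizes each row by default, so its values would differ. The function name `tfidf_matrix` and the three sample reviews are made up for illustration.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, matrix): one row per review, one column per term."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of reviews in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = []
        for term in vocab:
            tf = counts[term] / len(tokens) if tokens else 0.0
            idf = math.log(n_docs / df[term])
            row.append(tf * idf)
        matrix.append(row)
    return vocab, matrix


# Made-up, already pre-processed reviews.
reviews = ["gps accuracy poor", "map load slow", "gps map great"]
vocab, matrix = tfidf_matrix(reviews)
for review, row in zip(reviews, matrix):
    print(review, [round(value, 3) for value in row])
```

In the first review, "accuracy" occurs in one review out of three while "gps" occurs in two, so "accuracy" receives the larger weight, which is the rarity effect described above.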

E. Automated Classification and Evaluation
The objective of this step is to identify the most suitable machine learning algorithm for classifying user reviews related to mobile GIS for data collection based on the quality-in-use characteristics of ISO 25010. To achieve this, the datasets generated through steps 1 to 3 were used as input for the classification methods. Given the impracticality of testing all potential combinations of classification techniques, an experimental study was conducted to automate the testing and evaluation process for each machine learning algorithm's performance.
To summarize, in this study, a dataset of user reviews related to a set of mobile GIS apps for data collection was obtained. These reviews were subjected to preprocessing utilizing natural language processing methodologies, followed by vectorization utilizing the TF-IDF technique. A manual labelling process was carried out to classify reviews based on the quality-in-use model of ISO 25010. A dataset with 6904 reviews was obtained and is used in the experimental study performed in the next section.

IV. EXPERIMENTAL STUDY
In this section, an experimental study is conducted to explore the application of machine learning (ML) classification techniques on the pre-processed reviews (obtained from steps 1 to 3 in the previous section). The objective is to identify the best classifier for mobile GIS data collection.

A. Dataset Preprocessing
As shown in Table III, a few quality characteristics within the quality-in-use model have limited or insignificant representation due to the small number of available samples. These qualities are Context Coverage (Flexibility), with only six reviews, and Freedom from Risk, with 2, 6, and 25 reviews for Environmental, Health and Safety, and Economic Risk Mitigation, respectively. To maintain the validity and reliability of the model, reviews associated with these particular qualities were subsequently excluded from further analysis. Thus, the total number of samples in the dataset was reduced from 6904 to 6815. Furthermore, Fig. 3 presents a statistical analysis of the data related to this study, revealing a notable discrepancy in the sample distribution across different quality characteristics. This discrepancy gives rise to an imbalanced data challenge. To mitigate the issue of imbalanced data, the Synthetic Minority Over-sampling Technique (SMOTE) [27] was utilized to generate synthetic samples.
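The core idea of SMOTE is to create each synthetic sample on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that interpolation step only; it is a simplified stand-in written for this explanation, not the implementation used in the study, and the function name `smote_like`, the neighbour count, and the sample points are assumptions.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Made-up two-dimensional minority samples (real inputs would be TF-IDF rows).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point is a convex combination of two existing minority samples, it stays within the region already occupied by that class.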

B. Experimental Process
The experimental process is summarized as follows:
• Four ML techniques are used: (1) Support Vector Machine (SVM), introduced by Vapnik and coworkers as "a training algorithm that maximizes the margin between the training patterns and the decision boundary" [28]. SVM classifiers can be improved by modifying the kernel function (linear, polynomial, etc.) and its parameters (C: regularization; gamma: kernel coefficient; etc.) [29]. (2) Logistic Regression, a statistical method applied to classification tasks that analyzes the relationship between a binary variable and one or more independent variables using a logistic function [30]. (3) Naive Bayes, a simple probabilistic model for classification that assumes that the features are conditionally independent given the class label [31]. The method models the probability of each class given the observed features using Bayes' theorem, and selects the class with the highest probability as the predicted class for a given input. (4) Finally, Random Forest, an ensemble learning method that performs classification by aggregating the predictions of multiple decision trees [32].
• A Grid Search [33] parameter tuning method with five-fold cross-validation was employed to identify the optimal set of hyper-parameters for each technique (see Table IV for the values of the Grid Search parameters).
• A Python script was developed using the Scikit-learn library to achieve optimal classifier performance. The script implements the procedure depicted in Algorithm 1 and is available upon email request to the author.

V. RESULTS ANALYSIS
Table V displays the performance of each classifier with respect to all the utilized performance metrics, along with the corresponding optimal values for the hyperparameters.
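The optimal hyperparameter values come from the grid search with five-fold cross-validation described in the experimental process. A minimal sketch of such a search with Scikit-learn is given below for the SVM case. The toy reviews, their labels, and the parameter grid are made up for illustration; the grid actually used is the one in Table IV, and this is not the authors' script.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Made-up reviews: five per class so that stratified five-fold CV is possible.
texts = [
    "gps accuracy poor", "position drift bad accuracy", "gps fix slow inaccurate",
    "location error large", "accuracy low under tree",
    "love app great", "very happy with app", "great tool enjoy use",
    "nice app love it", "happy great experience",
]
labels = ["efficiency"] * 5 + ["satisfaction"] * 5

# Vectorization and classification are chained so that TF-IDF is refitted
# on the training part of every fold.
pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", SVC())])

# Hypothetical grid, chosen only to keep the example small.
param_grid = {
    "clf__kernel": ["linear", "rbf"],
    "clf__C": [0.1, 1, 10],
    "clf__gamma": ["scale", "auto"],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1_weighted")
search.fit(texts, labels)
print(search.best_params_, round(search.best_score_, 2))
```

Repeating the same search with a different estimator and grid in the `clf` step would give the per-classifier comparison; the choice of `f1_weighted` as the selection metric is an assumption of this sketch.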

The results indicated that:
• The Random Forest classifier achieved a precision of 0.81, indicating that, out of all instances that were predicted as positive, 81% were actually positive. The classifier also achieved a recall of 0.79, indicating that, out of all true positive instances, 79% were correctly identified by the classifier. The overall accuracy of the classifier was found to be 0.79, indicating that 79% of the predictions made by the classifier were correct. The F1-score, which is the harmonic mean of precision and recall, was found to be 0.80, indicating that the precision and recall of the classifier were balanced.
• The SVM classifier obtained scores that were slightly different from those of the Random Forest classifier, with a precision score of 0.79, an accuracy score of 0.80, a recall score of 0.80, and an F1-score of 0.79.
• The Logistic Regression classifier performed slightly worse in terms of accuracy and recall, but obtained 0.81 in precision and 0.79 in F1-score.
• The Naive Bayes classifier had the lowest scores across all performance metrics, indicating that it performed less well than the other three classifiers.
Moreover, the confusion matrices of SVM and Random Forest were calculated and are presented in Table VI and Table VII, respectively. As depicted in the confusion matrices, both models demonstrate strong performance. This is evidenced by the majority of entries being located along the diagonal of the matrices. The per-class scores of both classifiers are presented in Table VIII. As demonstrated, the precision, recall, and F1-score vary across categories, providing valuable insights into the classification performance of each algorithm. In the following section, these outcomes are discussed in the context of the criteria for mobile GIS for data collection in order to select the best classifier between SVM and RF.
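Per-class precision, recall, and F1-score follow directly from a confusion matrix: the diagonal entry of a class is divided by its column sum for precision and by its row sum for recall. The sketch below shows that derivation on a made-up three-class matrix; the numbers are illustrative only and are not the values of Table VI or Table VII.

```python
def per_class_metrics(matrix, labels):
    """matrix[i][j] = number of reviews of true class i predicted as class j."""
    metrics = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted_k = sum(row[k] for row in matrix)  # column sum
        actual_k = sum(matrix[k])                    # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Toy confusion matrix (rows: true class, columns: predicted class).
labels = ["effectiveness", "efficiency", "satisfaction"]
toy_matrix = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
metrics = per_class_metrics(toy_matrix, labels)
accuracy = sum(toy_matrix[i][i] for i in range(3)) / sum(map(sum, toy_matrix))
for label, (p, r, f1) in metrics.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

A matrix dominated by its diagonal yields high values for all three per-class scores, which is the pattern described for both classifiers.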

VI. DISCUSSION
Table V illustrates that the accuracy metric for the SVM and RF classifiers achieved high values of 0.80 and 0.79, respectively. These results suggest that both classifiers were successful in correctly classifying a high proportion of instances, indicating that the vectorization process utilizing TF-IDF was successful in identifying relevant terms within the corpus of user reviews. Note that TF-IDF was previously identified in research as a strong vectorization method for user reviews [17,18].
The effectiveness of TF-IDF on reviews of mobile GIS for data collection can be explained by its adeptness at capturing term significance through frequency calculations. Within this domain, where reviews frequently incorporate specialized terminology and jargon pertaining to geographic information, mobile devices, and associated technologies, TF-IDF stands out by recognizing and assigning importance to these specific terms based on their frequency. This emphasis on the frequency of domain-specific terms contributes to a more precise representation of the data, aligning with the high accuracy metrics observed in the classifiers' performance as highlighted in Table V.
The SVM and Random Forest classifiers were evaluated using the precision, accuracy, recall, and F1-score metrics. The results revealed that the Random Forest classifier obtained scores of 0.81, 0.79, 0.79, and 0.80, respectively, while the SVM classifier obtained scores of 0.79, 0.80, 0.80, and 0.79, respectively. These results indicate that the Random Forest classifier performed slightly better in terms of precision and F1-score, while the SVM classifier performed better in terms of accuracy and recall.
The SMOTE technique was employed to mitigate the issue of class imbalance. However, a detailed analysis of class scores is still necessary to reveal any performance variations of the classifiers on specific classes and to provide a more comprehensive understanding of their capabilities. Although no significant differences were observed in the four performance scores of the two classifiers, Random Forest and SVM, an exhaustive evaluation of their performance was conducted, taking into account the specific domain of mobile GIS for data collection. In fact, the positioning accuracy requirement of mobile GIS for data collection is crucial, as it directly affects the quality of collected data [21], which subsequently impacts the overall data collection process. Moreover, GPS positioning accuracy in smartphones is a real challenge [35], and extensive investigations were conducted to identify factors that influence the accuracy of mobile GIS positioning [36][37][38]. In this light, various solutions have been adopted to enhance positioning accuracy in mobile mode [39,40]. Therefore, comparing the performance of the Random Forest and SVM classifiers on the efficiency class can aid in selecting the best classifier for mobile GIS data collection purposes.
Based on the evaluation of the classifiers' scores presented in Table VIII, the SVM classifier appears to be a more suitable option for identifying a maximum number of user reviews belonging to the "Efficiency" class in mobile GIS data collection. The SVM classifier exhibits a higher F1-score (0.77), recall (0.80), and precision (0.74) compared to the Random Forest classifier (F1-score: 0.75, recall: 0.78, precision: 0.73) for this class. These findings suggest that the SVM classifier has a greater ability to detect positive samples of the "Efficiency" class while maintaining a good balance between precision and recall. Furthermore, the SVM classifier has a higher precision score (0.74) than the Random Forest classifier (0.73) for this class, indicating that the SVM classifier generates fewer false positive predictions. Thus, the SVM classifier may be the optimal choice for this classification task in the mobile GIS data collection domain.
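As a quick consistency check, the F1-scores reported for the "Efficiency" class can be recomputed from the reported precision and recall using the harmonic mean, F1 = 2PR / (P + R). The sketch below uses only the values quoted above; the helper name `f1_score` and the dictionary layout are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) reported for the "Efficiency" class.
reported = {"SVM": (0.74, 0.80), "Random Forest": (0.73, 0.78)}
recomputed = {
    name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()
}
print(recomputed)
```

The recomputed values round to 0.77 for SVM and 0.75 for Random Forest, matching the reported F1-scores.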
In addition, the complexities inherent in user reviews within the mobile GIS for data collection domain introduce a level of intricacy marked by complex and nonlinear relationships between linguistic expressions and corresponding sentiments. These reviews serve as reflections of nuanced discussions prevailing in this specialized technical domain. Leveraging their unique capacity to define optimal hyperplanes within high-dimensional spaces, SVMs exhibit notable proficiency in capturing the nuanced patterns embedded in these reviews. The algorithm's adeptness in recognizing subtle differences and correlations within the technical language of user reviews establishes SVMs as a resilient and effective choice for classifying user-generated content within the intricate realm of mobile GIS for data collection. This underscores their efficacy in addressing the inherent complexities specific to mobile GIS for data collection.

VII. THREATS TO VALIDITY
Although objectivity was applied during the research process, there may still be limitations to this study:
• In the natural language processing phase, certain terms may have been erroneously categorized as stop words and consequently eliminated from the dataset. This could impact the construct validity of the study. To address this issue, a specialized GIS term dictionary was constructed to ensure that relevant terms are not automatically removed during the data preprocessing stage, thus improving construct validity.
• The automated classification in this study concerned mobile GIS user reviews, which could pose potential challenges to external validity. To address this concern, the set of studied reviews was carefully chosen to ensure a representative sample. This limitation may have slightly affected the performance metrics, but we remain optimistic that the results can be used in forthcoming studies related to mobile GIS.
• User reviews were assigned manual classifications based on the quality-in-use model. However, there is a possibility that a review may belong to more than one class, which impacts the internal validity. To address this issue, only the clearest classification was considered.

VIII. CONCLUSION AND FUTURE WORK
This study involved an experiment aimed at identifying the best classifier for analyzing user reviews of mobile GIS applications in the context of data collection. The process involved six steps: data collection, data preprocessing, data labeling, data vectorization, automated classification, and evaluation.
The evaluation of the classifiers yielded the following performance metrics. The Random Forest classifier showed balanced performance, with a precision of 0.81, a recall of 0.79, an accuracy of 0.79, and an F1-score of 0.80. The SVM classifier achieved comparable scores: a precision of 0.79, an accuracy of 0.80, a recall of 0.80, and an F1-score of 0.79. Likewise, the Logistic Regression classifier achieved a precision of 0.81, an accuracy of 0.79, a recall of 0.79, and an F1-score of 0.79, while the Naive Bayes classifier scored lower across the evaluation criteria. Notably, for the "efficiency" class, the SVM classifier outperformed the Random Forest classifier, with higher precision (0.74), recall (0.80), and F1-score (0.77) than the Random Forest classifier (precision: 0.73, recall: 0.78, F1-score: 0.75). These results underscore the effectiveness of combining the TF-IDF vectorizer with the SVM classifier in the specific domain of mobile GIS for data collection, and they emphasize the significance of efficiency requirements in this context. The implications of this study extend to developers and designers of mobile GIS applications, providing insights for automatic quality evaluation using the ISO 25010 quality-in-use model.
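As a consistency check on the efficiency-class scores reported above, the F1-score can be recomputed from precision $P$ and recall $R$. The check assumes the standard definition of F1 as the harmonic mean of the two; it applies only to per-class scores, because the global scores are averaged over classes and need not satisfy the same identity.

```latex
\begin{align*}
F_1 &= \frac{2PR}{P + R} \\
F_1^{\text{SVM}} &= \frac{2 \times 0.74 \times 0.80}{0.74 + 0.80} = \frac{1.184}{1.54} \approx 0.77 \\
F_1^{\text{RF}} &= \frac{2 \times 0.73 \times 0.78}{0.73 + 0.78} = \frac{1.1388}{1.51} \approx 0.75
\end{align*}
```

Both values agree with the reported F1-scores for the efficiency class.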
In future work, we aim to expand the scope of the study by increasing the number of experiments conducted, which will yield a broader set of relevant and accurate results. Additionally, we intend to investigate the correlation between external quality and the quality-in-use of mobile GIS applications designed for data collection, with the ultimate goal of developing a predictive model for quality-in-use. This may have practical implications for enhancing the user experience and satisfaction of mobile GIS applications for data collection by ensuring that external quality meets the requirements of quality-in-use.

Fig. 3. Distribution of the dataset into quality classes.

Algorithm 1: Grid Search for ML Algorithms
Initialize a model-params dictionary of the four ML algorithms and their parameters
Create an empty report array
For each model in model-params
    Create gcv, an instance of GridSearchCV, with the model parameters and five-fold cross-validation
    Fit gcv with the training set to find the best hyperparameters
    Test the fitted model on the test dataset
    Compute the confusion matrix
    Compute the evaluation metrics
    Add the confusion matrix and the evaluation metrics to the report
End
Display report

The performance of the four classifiers tested in this study was evaluated using four commonly used criteria [34]: (1) Precision, which quantifies the proportion of true positive predictions among all positive predictions made by the classifier. (2) Recall, which quantifies the proportion of true positive predictions among all actual positive instances. (3) Accuracy, which quantifies the proportion of correct predictions made by the classifier among all instances. (4) F-score, which combines precision and recall into a single score as their harmonic mean.
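The four criteria listed above can be computed directly from true and predicted labels. The sketch below is illustrative only: the paper does not state how its metrics were computed, the function names are hypothetical, and the label lists are invented placeholders rather than data from the study.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1 with `label` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Proportion of instances whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Invented placeholder labels, not results from the paper.
    y_true = ["efficiency", "efficiency", "satisfaction", "effectiveness"]
    y_pred = ["efficiency", "satisfaction", "satisfaction", "effectiveness"]
    print(per_class_metrics(y_true, y_pred, "efficiency"))
    print(accuracy(y_true, y_pred))
```

Per-class scores such as these can then be averaged across the quality classes to obtain global figures; which averaging scheme the study used is not specified.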

TABLE II. MOBILE GIS APPS SELECTED

TABLE III. DISTRIBUTION OF REVIEWS ACROSS QUALITY CHARACTERISTICS

Fig. 2. Screenshot of the data labelling web interface.

www.ijacsa.thesai.org

TABLE IV. VALUES OF GRID SEARCH PARAMETERS

TABLE V. GLOBAL CLASSIFICATION SCORES AND HYPERPARAMETERS

Furthermore, the performance scores related to SVM and Random Forest were calculated for each quality class and the results are presented in Table