Proposal of an Automated Tool for the Application of Sentiment Analysis Techniques in the Context of Marketing

Currently, the opinions and comments made by customers on e-commerce portals regarding different products and services have great potential for identifying customer perceptions and preferences. Consequently, there is a growing need for companies to have automated tools based on sentiment analysis through polarity analysis, which examine customer opinions and derive quantitative indicators from qualitative information to support decision-making in the context of marketing. In this article, we propose the construction of an automated tool for conducting opinion mining studies, which the marketing units of companies can use for decision-making without needing to deal with the underlying algorithmic process. The functionality of the proposed tool was verified through a case study, in which the opinions obtained from an electronic commerce website concerning one of the best-selling technological products were investigated.


I. INTRODUCTION
The opinions and comments that customers post in various digital media about products and services are subjective and descriptive, and therefore have great potential for identifying user perceptions and preferences, which companies can use to formulate appropriate marketing strategies [1]-[4]. Likewise, with the rapid growth of e-commerce platforms and social networks, potential buyers of products and services take into consideration the evaluations or reviews made by other clients before buying a product from a given company [5]-[8].
Given the above, and with the current evolution of artificial intelligence methods and the wide dissemination of machine learning models, affective computing has become one of the emerging areas of computer science. Through techniques such as opinion mining or sentiment analysis, it makes it possible to obtain quantitative indicators from subjective information that includes emotional content, such as opinions [9]-[13]. Sentiment analysis techniques belong to a subdomain of natural language processing (NLP), and their objective is to identify how sentiments are expressed in a text by determining the polarity values (positive, negative, neutral) present in an opinion [7], [14]-[17]. Because these techniques can analyze qualitative information such as customer opinions on social networks and electronic commerce portals, they become useful tools for decision-making in the context of marketing [14], [18], [19].
Based on the advantages provided by opinion mining techniques, companies and organizations have a growing need for automated tools that take structured and unstructured qualitative data extracted from different sources (social networks, electronic commerce, marketing campaigns, perception surveys, among others) and make it possible to identify, more objectively, the individual and group perception of customers concerning a product or service, and to make decisions regarding marketing campaigns [6], [20]-[23].
This article proposes the design and implementation of an automated tool for the application of sentiment analysis or opinion mining techniques in the context of marketing, specifically in the analysis of reviews made on e-commerce portals by clients regarding their perception of, or degree of satisfaction with, a product or service. The tool receives as input a .CSV file with the opinions or reviews of the clients regarding a product or service, and produces as output the polarity (positive, negative, neutral) associated with each of the opinions, as well as the statistical analysis of the polarities, the average percentage distribution of the polarities, and the percentage distribution of the dominant polarities over the total opinions. The proposed tool was implemented in the Python language using machine learning models associated with sentiment analysis techniques provided by the Python-compatible Paralleldots library. The proposed automated tool is intended to serve as a reference for conducting sentiment analysis studies in the context of marketing, to evaluate the impact of, and promote decision-making regarding, the products and services offered by companies through digital media.
The rest of the article is organized as follows: Section 2 reviews studies that are relevant to the problem addressed and the objectives to be achieved. Section 3 describes the methodology considered for the development of this research. Section 4 presents the results obtained in this work, which include the description of the design and implementation of the automated tool and its application in a case study in the context of marketing. Section 5 gives a discussion that summarizes the data and facts obtained to support the purpose of this investigation. Finally, Section 6 presents the conclusions and future work derived from this research.

www.ijacsa.thesai.org

II. RELATED WORK
Different studies have been carried out regarding the application of sentiment analysis and/or opinion mining in the business context. In [24], a study on the perception of users regarding airport service in 10 main airports in the world was carried out by applying sentiment analysis to the comments of 1224 passengers extracted from the Skytrax portal. In [25], a sentiment analysis study was developed on the opinions of tourists on social networks regarding the behavior and performance of this sector (airlines, tourism organizations, news, and tourism events) during the first months of the COVID-19 pandemic. In [26], a context-sensitive recommendation system was proposed, which extracted people's food preferences from their comments using sentiment analysis techniques in order to suggest restaurants according to these preferences. In [27], a comparative study was carried out on the effectiveness of sentiment analysis techniques and star ratings with a total of 900 customer reviews of different products and services. The study found that sentiment analysis techniques are effective in detecting the underlying tone of the analyzed content.
In [28], a new sentiment analysis method supported by a knowledge-based lexicon was proposed and used to evaluate the polarity of customer opinions after the launch of new products in the videogame industry. In [29], a sentiment and text analysis study of posts on Twitter related to Halal tourism was developed to identify the perception of users and the most popular tourist destinations. In [30], a sentiment analysis study was carried out on the opinions and comments of the inhabitants of India in relevant news articles regarding the economy and the financial market to identify their perception of the volatility of the stock market. In [23], the perception of customers of energy companies in the United Kingdom was analyzed based on the opinions they expressed on Twitter, using a sentiment analysis approach supported by a knowledge-based lexicon. In [31], a sentiment analysis study was developed on the opinions of Twitter users regarding autonomous vehicles and their related promotional events.
Although the aforementioned works each present a case of application of sentiment analysis in the business context, they have focused on a few of the challenges of this topic, such as improving precision and optimizing supervised learning models or polarity classification [5]. They do not address the development of tools that automate the process of opinion mining or sentiment analysis, which would make it easier for stakeholders to apply these techniques transparently in different contexts and to obtain objective indicators of the perception of clients or potential clients from subjective information [6], [20]-[23].

III. METHODOLOGY
For the development of this work, 4 methodological phases were considered: exploration and selection of technologies, design of the automated tool, construction of the automated tool, and case study (see Fig. 1).
In phase 1 of the methodology, a set of tools and technologies was explored to carry out the processes of text processing, sentiment and/or polarity analysis in text, statistical analysis, and data visualization. The Python pandas library was selected for obtaining and processing the text of opinions from .CSV or Excel files. For sentiment analysis, the Python Paralleldots library was selected, which enables the extraction of the value or distribution of the polarities (positive, negative, and neutral) associated with the text of an opinion. For the descriptive statistical analysis of the polarities obtained in the sentiment analysis process, the NumPy Python library was selected. For the visualization of statistical and graphic results, the Python libraries Tkinter and matplotlib were selected.
In phase 2 of the methodology, the functional modules of the tool were defined, together with the processes associated with these modules, which cover text processing, obtaining the polarities of the opinions, and statistical and graphical analysis of the results. Additionally, the high-level interfaces of the automated tool were designed from the defined modules and processes.
In phase 3 of the methodology, based on the high-level interfaces defined in phase 2, the automated tool for the application of sentiment analysis techniques in the context of marketing was implemented using the Python language and the libraries selected in phase 1. The tool performs, on a dataset of opinions, the functional processes mentioned in phase 2 (text processing, determination of the polarities of the opinions, and graphical and statistical analysis of the results).
Finally, in phase 4, the proposed automated tool was used to develop a case study on a dataset made up of comments about a technological product offered in the Latin American e-commerce portal "Mercado Libre". This case study was carried out in order to demonstrate the ease with which the tool can process and analyze opinion datasets obtained from different e-commerce portals.
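As an illustration of the text access step selected in phase 1, the following is a minimal sketch of how opinions could be read with pandas from a .CSV or Excel file. It is not the code of the proposed tool: the function names `extract_opinions` and `load_opinions`, the column name `opinion`, and the whitespace clean-up are assumptions introduced here for the example.

```python
from pathlib import Path

import pandas as pd


def extract_opinions(frame, column="opinion"):
    """Return the non-empty opinion texts of a DataFrame column.

    The column name "opinion" is an assumption of this sketch.
    """
    texts = frame[column].dropna().astype(str)
    # Collapse repeated whitespace and trim both ends of each opinion.
    texts = texts.str.split().str.join(" ")
    return texts[texts != ""].tolist()


def load_opinions(path, column="opinion"):
    """Read a .csv or Excel file and return its opinions as a list of strings."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        frame = pd.read_csv(path)
    elif suffix in {".xls", ".xlsx"}:
        # Reading Excel files requires an engine such as openpyxl to be installed.
        frame = pd.read_excel(path)
    else:
        raise ValueError(f"Unsupported file type: {suffix}")
    return extract_opinions(frame, column)
```

Separating `extract_opinions` from `load_opinions` keeps the cleaning logic independent of the file format, so the same step applies to both input types mentioned above.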

IV. RESULTS
This section describes the development of the different phases considered in the methodology presented in Section 3, which includes the design and implementation of the proposed automated tool, as well as the case study developed with the tool. At the tool design level, the functional modules and the processes developed in each of these modules are presented. Regarding the implementation, the interfaces and/or tabs that make up the tool are described. Regarding the case study, the results obtained from the sentiment analysis study performed on the opinions or reviews of a product offered on the "Mercado Libre" website are presented. Fig. 2 presents the six functional modules that make up the proposed automated tool: the GUI module, text processing module, sentiment analysis module, statistical analysis module, visualization module, and report module.
The GUI module was implemented with the Python Tkinter library. It manages the components, controls, and events that make up the graphical interface of the tool, allowing interaction with the user and the presentation of the results of the sentiment analysis study at the statistical and graphical level. The text processing module was implemented using the pandas library, and it provides access to the opinion data stored in a .CSV or Excel file. The sentiment analysis module was implemented with the Paralleldots library, and it obtains the polarity values (positive, negative, and neutral) of the different opinions extracted from the file loaded into the tool. The statistical analysis module was implemented with the NumPy library, and it applies descriptive statistics techniques to the polarities obtained for the different opinions. The visualization module was implemented using the Python matplotlib library, and it displays the percentage distribution of the polarities in each one of the opinions and over the total of opinions. Finally, the report module was implemented with the Python file management functionalities, and it generates a report in .CSV format with the polarities calculated for each of the opinions and the statistical analysis obtained from those polarities.
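The hand-off between the sentiment analysis module and the statistical analysis module can be sketched as follows. This is only an illustration under stated assumptions: `score_opinion` is a hypothetical toy scorer based on two small word lists, used here as a stand-in for the call to the Paralleldots service so that the example runs offline, and the function `polarity_matrix` and the column order (positive, negative, neutral) are choices made for this sketch rather than details taken from the tool.

```python
import numpy as np

# Toy word lists, invented for this sketch only.
POSITIVE_WORDS = {"good", "great", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "slow"}


def score_opinion(text):
    """Hypothetical stand-in for the external sentiment service.

    Returns (positive, negative, neutral) shares that sum to 1.
    """
    words = text.lower().split()
    if not words:
        # An empty opinion is treated as fully neutral.
        return (0.0, 0.0, 1.0)
    pos = sum(word in POSITIVE_WORDS for word in words)
    neg = sum(word in NEGATIVE_WORDS for word in words)
    neu = len(words) - pos - neg
    total = len(words)
    return (pos / total, neg / total, neu / total)


def polarity_matrix(opinions, scorer=score_opinion):
    """Score every opinion and stack the results in a float NumPy array.

    The result has one row per opinion and three columns
    (positive, negative, neutral).
    """
    rows = [scorer(text) for text in opinions]
    return np.array(rows, dtype=float).reshape(-1, 3)
```

Passing the scorer as a parameter means a function that calls the real service could be substituted without changing how the array is built.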
Taking into consideration the modules that make up the automated tool, Fig. 3 illustrates, through a flow diagram, the functional processes that the tool performs across those modules. Once the file with the opinions was loaded in either .CSV or Excel format, the tool obtained each of the opinions through the pandas library and calculated the polarities (positive, negative, and neutral) associated with them using the Paralleldots library, storing these polarities in a floating-point NumPy array. Once the polarities of the opinions in the file were obtained, the tool analyzed them with descriptive statistics using the NumPy library. The tool also generated a set of graphs with the distribution of polarities for every single opinion and overall. Finally, the tool generated a report in .CSV format with the polarities calculated for each opinion and their statistical analysis.
Considering the processes defined and presented in Fig. 3, Fig. 4 shows the graphical interface of the implemented tool, which consists of five tabs labeled "Opinion Analysis", "Statistical Analysis", "Polarities per Opinion", "Average Polarity Distribution", and "Dominant Polarities". By pressing the "Open" button in the "Opinion Analysis" tab, it is possible to load a .CSV or Excel file with the opinions or reviews of a product taken from an electronic commerce website. By pressing the "Process Opinion" button once the file is loaded, the tool calculates the polarity associated with each of the opinions and presents the results in the text area of this tab. This tab also makes it possible to generate a report in a .CSV file with the polarities obtained for each opinion and their statistical analysis. As an example, Fig. 4 presents the results obtained for an example file, which has 5 test opinions.
Thus, for the first opinion, the result was a positive polarity of 0.057, a negative polarity of 0.628, and a neutral polarity of 0.315. On the other hand, Fig. 5 shows the "Statistical Analysis" tab of the automated tool, in which it is possible to calculate, for each of the 3 polarities (positive, negative, neutral) over the total of the opinions, the average, the percentage distribution, the standard deviation, the maximum value, and the minimum value. Likewise, the tool obtains the percentage of dominant polarities, which is calculated by counting, across the different opinions, how often each polarity has the highest value.
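The statistics described above can be sketched with NumPy as follows. This is an illustrative reconstruction, not the code of the tool. The first row of `EXAMPLE` is the first test opinion reported above; the remaining four rows are invented values for this sketch. The use of the population standard deviation (NumPy's default) and the handling of ties are assumptions, since the text does not specify them.

```python
import numpy as np

LABELS = ("positive", "negative", "neutral")


def summarize_polarities(polarities):
    """Descriptive statistics and dominant-polarity shares.

    `polarities` has one row per opinion and one column per label,
    in the order given by LABELS.
    """
    scores = np.asarray(polarities, dtype=float)
    means = scores.mean(axis=0)
    # Percentage distribution: each mean as a share of the sum of means.
    percent = 100.0 * means / means.sum()

    stats = {}
    for i, label in enumerate(LABELS):
        column = scores[:, i]
        stats[label] = {
            "mean": float(means[i]),
            "percent": float(percent[i]),
            # Population standard deviation (ddof=0); an assumption here.
            "std": float(column.std()),
            "min": float(column.min()),
            "max": float(column.max()),
        }

    # Dominant polarity: the label with the highest value in each opinion.
    # On ties, argmax keeps the first label in LABELS order.
    winners = scores.argmax(axis=1)
    counts = np.bincount(winners, minlength=len(LABELS))
    dominant = {
        label: float(100.0 * counts[i] / len(scores))
        for i, label in enumerate(LABELS)
    }
    return stats, dominant


EXAMPLE = [
    [0.057, 0.628, 0.315],  # first test opinion, as reported in the text
    [0.60, 0.10, 0.30],     # invented for illustration
    [0.20, 0.25, 0.55],     # invented for illustration
    [0.15, 0.50, 0.35],     # invented for illustration
    [0.70, 0.05, 0.25],     # invented for illustration
]

stats, dominant = summarize_polarities(EXAMPLE)
```

With these illustrative rows, the positive and negative polarities are each dominant in 40% of the opinions and the neutral polarity in 20%.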
To illustrate these statistics, Fig. 5 shows that, for the 5 test opinions loaded in the example file from Fig. 4, the polarity with the greatest average over the total of opinions is the neutral one, with an average value of 0.376 and a percentage distribution of 37.6%. The minimum and maximum values for the neutral polarity are 0.315 and 0.511 respectively. On the other hand, if only the dominant polarity in each of the opinions is considered, the positive and negative polarities are dominant with 40% each.
Continuing with the tabs of the automated tool, Fig. 6 shows the "Polarities per Opinion" tab, which displays graphically the distribution of the 3 polarities (positive, negative, and neutral) in each of the opinions of the file loaded in the "Opinion Analysis" tab; for each opinion, the three values add up to 1. As an example, Fig. 6 presents the distribution of the polarities for the 5 test opinions loaded in the example file of Fig. 4. In two of the five opinions (1, 4) the dominant polarity is negative, in another two (2, 5) it is positive, and in the remaining one (3) it is neutral.
Fig. 7 shows the "Average Polarity Distribution" tab of the automated tool, which displays the average percentage distribution of the polarities over the total opinions of the file loaded in the "Opinion Analysis" tab. Fig. 7 shows this distribution for the 5 test opinions loaded in the example file of Fig. 4. The polarity with the greatest percentage over the total of the opinions is the neutral one with 37.6%, while the polarity with the smallest percentage is the positive one with 27.4%.
Finally, Fig. 8 shows the "Dominant Polarities" tab of the automated tool, which displays the percentage of dominant polarities over the total opinions in the file loaded in the "Opinion Analysis" tab. Fig. 8 shows these percentages for the 5 test opinions of the example file loaded in Fig. 4: the negative and positive polarities are each dominant in 40% of the opinions, while the neutral polarity is dominant in the remaining 20% of the opinions.
Having presented the different tabs of the proposed tool, the case study developed to verify its usefulness is presented below. It is based on the opinions of users of a Latin American electronic commerce website about a specific technological product. A total of 26 opinions made by users of the "Mercado Libre" portal were collected, covering the experience of buying and using one of the best-selling cell phones in the virtual store (Motorola Moto E7). A .CSV file with the 26 opinions was generated after eliminating unnecessary spaces and correcting the spelling of some opinions, to improve the precision of the results. Fig. 9 shows the polarity distribution obtained for the 26 opinions of the case study. From the results in Fig. 9, it is possible to observe that the polarity with the highest percentage of presence in the different opinions of the case study is the positive polarity, followed by the neutral polarity. This can be seen more clearly in the statistical results shown in Table I, which were extracted from the results that the tool produces in the "Statistical Analysis" tab.
According to the results of Table I, the polarity with the highest percentage of distribution in the 26 opinions of the case study is the positive one with 50.7%, followed by the neutral polarity with 36.6% and the negative one with 12.8%, which indicates that the distribution percentage of the positive polarity is approximately four times that of the negative one. In the same way, the polarity with the highest maximum value is the positive one with a value of 0.826, followed by the neutral polarity with a maximum value of 0.577. These results can be explained by the fact that most of the opinions highlighted the good relationship between the price and the quality of the product. On the other hand, if only the dominant polarity in each of the 26 opinions of the case study is counted, the results presented in Table II are obtained. According to the results of Table II, the positive polarity is dominant in 76.923% of the opinions while the neutral polarity is dominant in 23.077% of the opinions. The negative polarity is not dominant in any of the opinions of the case study.
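The percentages reported above can be checked with a short calculation. The counts of 20 and 6 opinions are not stated in the text; they are inferred here as the only whole-number counts out of 26 that reproduce the reported dominant-polarity percentages. The function name `dominant_share` is introduced for this sketch.

```python
TOTAL_OPINIONS = 26  # number of opinions in the case study


def dominant_share(count, total=TOTAL_OPINIONS):
    """Percentage of opinions in which a polarity is dominant, to 3 decimals."""
    return round(100 * count / total, 3)


# Inferred counts (an assumption of this sketch): 20 positive, 6 neutral, 0 negative.
positive_share = dominant_share(20)  # 76.923
neutral_share = dominant_share(6)    # 23.077
negative_share = dominant_share(0)   # 0.0

# Ratio between the positive and negative distribution percentages of Table I.
positive_to_negative = round(50.7 / 12.8, 2)  # 3.96, i.e. approximately four times
```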

V. DISCUSSION
The tool proposed in this article automates the process of sentiment analysis on datasets made up of customer reviews of products available on e-commerce portals. From the opinions contained in a .CSV file or in an Excel file, the proposed tool loads and processes the file in order to obtain the degree of polarity of each opinion, as well as the overall polarity distribution. Similarly, the proposed tool performs an analysis based on descriptive statistics to obtain the average, minimum value, maximum value, and standard deviation of each of the polarities, as well as the percentage of opinions in which each of the polarities is dominant. Finally, the proposed tool generates a graphic analysis, in which two pie charts show the percentage distribution of polarities in the opinions and the percentage distribution of dominant polarities.
The advantages provided by the tool can be seen more clearly in the case study developed on a dataset of product reviews from the "Mercado Libre" portal, in which the percentage distribution of the positive polarity is approximately four times that of the negative polarity. In this way, the tool, through sentiment analysis techniques, makes it possible to extract quantitative values from qualitative data, which complements and enriches the results of traditional valuation methods based on star ratings. In this sense, the application of these sentiment analysis techniques contributes to decision-making by service providers with respect to marketing campaigns. Finally, it should be noted that one of the major challenges and limitations of this study is the automated correction of typographical and orthographic errors in the opinions, which is needed to improve the accuracy of the polarities obtained. In the case study, this process was performed manually, since the dataset contained only 26 opinions; automating the pre-processing of the opinions is therefore left as future work.

VI. CONCLUSION
Based on the growing need for companies to have tools that take advantage of customer opinions about their products and services to obtain quantitative indicators of customer perception, this article proposed an automated tool for the application of sentiment analysis techniques to the opinions of customers on electronic commerce portals. The marketing units of companies can use the tool for decision-making without needing to deal with the underlying algorithmic process.
The proposed tool loads a .CSV or Excel file with the opinions of the clients and obtains, in an automated way, the distribution of the 3 polarities in each one of the opinions and overall. It also provides the results of applying descriptive statistics techniques to the consolidated polarities, as well as the percentage of the dominant polarities in the consolidated opinions. Through the statistical and graphical results, the automated tool provides quantitative indicators based on customer opinions that can be considered for decision-making in the context of marketing.
The Python language libraries used for the construction of the proposed automated tool proved to be adequate for the different processes associated with access to the text of opinions, analysis of sentiments and/or polarities, statistical analysis of polarities, and visualization of the results of the analysis. In this sense, this tool aims to serve as a reference to be replicated in other contexts in which it is intended to quantitatively identify the perception of a person, user, or client through the analysis of their opinion.
The case study developed with the automated tool allowed the researchers to conclude that the 26 opinions made on the "Mercado Libre" website about the cell phone show a positive percentage distribution that is approximately four times the negative percentage distribution. Likewise, it was possible to conclude that, at the level of the dominant polarities, none of the 26 opinions had the negative polarity as dominant. This is explained by the fact that most of the comments highlight the good relation between the quality and the price of the analyzed product.
To enrich the functionality of the proposed tool, future work should add an analysis of frequent words in the text, in order to identify the most common terms with which users or customers describe the products. In the same way, a comparative study is intended to be carried out in the future between the results of sentiment and/or polarity analysis and the star ratings provided by customers on electronic commerce websites.