Public Response to the Legalization of The Criminal Code Bill with Twitter Data Sentiment Analysis

The Criminal Code Bill, known in Indonesian as Rancangan Kitab Undang-undang Hukum Pidana (RKUHP), was passed by the House of Representatives (DPR) on December 6, 2022, and remains under debate because several of its provisions are considered problematic. This study determines the public's reaction to the ratification of the RKUHP by analyzing Twitter data. We apply sentiment analysis, a text-processing method, to tweets about the bill, using n-grams (unigrams, bigrams, and trigrams) together with three algorithms: Naïve Bayes, Classification and Regression Tree (CART), and Support Vector Machine (SVM). The analysis found that 51% of tweets were positive about the ratification of the RKUHP and 49% were negative. SVM achieved the best accuracy of the three algorithms, 0.81 with the unigram combination.


I. INTRODUCTION
The Criminal Code Bill, also known as RKUHP, was passed into law on December 6, 2022. This is a momentous occasion, as the law will replace the existing Criminal Code (KUHP) [1]. The current Criminal Code is a legacy of Dutch colonialism: it is a version of the Wetboek van Strafrecht voor Nederlandsch-Indie. Change is needed because the old Criminal Code has not kept up with the times [2]. In addition, earlier revisions were carried out piecemeal by passing new laws related to the KUHP, which produced regulations that proliferated without system or pattern, were inconsistent and problematic, and even damaged the basic structure of the old KUHP [3]. The RKUHP will take effect three years after the date of promulgation [4]. However, the RKUHP is still considered to contain problematic articles. Nurina S. (2022), in an interview with The Guardian, stated that at least 88 articles contain broad provisions that can be exploited and misconstrued by both the government and the general public to punish anyone and suppress freedom of expression [5].
The ratification of the RKUHP has drawn both support and opposition. Because opinions are essential determinants of human action, and because we seek others' opinions when making decisions, it is interesting to see the public's response to the RKUHP. Companies and governments need to know how the public feels about their products and services. The public often uses Facebook and Twitter to engage socially online, and web-based social networks increasingly engage the public [6]. Research conducted in the United States examined the public's reaction on Twitter to the Chicago Department of Public Health's regulation of electronic cigarettes. By finding patterns in how people responded to this policy, such data can help organizations predict, recognize, and respond to community reactions [7]. In addition, according to research conducted in Mexico, governments frequently use Twitter to interact with their citizens. As a result, Twitter has emerged as a valuable source of information for studying how governments interact with their constituents and how citizens respond to those communications. These insights can be used to help make public policies and to understand how the public sees them [8].
The ratification of the RKUHP has the same context. Because messages on Twitter are openly accessible, Twitter data can be used to gauge public opinion toward the ratification of the RKUHP. It can be viewed as raw data for extracting opinions and for analyzing policy through sentiment [9]. This will aid the government's ability to forecast, detect, and respond to the public's reaction before the law is fully implemented. Sentiment analysis is also known as "opinion mining" or "emotion Artificial Intelligence". It refers to applying natural language processing (NLP), text mining, computational linguistics, and biometrics to systematically identify, extract, assess, and study people's emotions and subjective information [6].
This research aims to identify the sentiments surrounding the ratification of the RKUHP. The analysis results are then processed further to determine which aspects of the RKUHP concern the public. Knowledge of public sentiment will help the government gauge the public's reaction to the ratification of the RKUHP and can serve as input for the planned socialization (public outreach). In addition, by comparing multiple algorithms, this research identifies the classification model best suited for government use when determining public responses from Twitter data. This paper consists of five sections. The first section is the introduction, which contains the research's context and objectives. The second section reviews previous research and the theoretical framework. The third section describes the research methodology. The fourth section presents the results and discussion. The fifth section is the conclusion.
www.ijacsa.thesai.org

A. Previous Research
This section reviews research that employs various methods for sentiment analysis, summarized in Table I. The first group of studies uses a single method. The authors in [10] investigated the capacity of the Naïve Bayes algorithm to classify public mood under the COVID-19 new normal. On 2,807 processed tweets, Naïve Bayes performed well, with an accuracy of 83% and an F1-score of 84%. The author in [11] studied sentiment analysis using the Support Vector Machine (SVM) in Weka (Waikato Environment for Knowledge Analysis) and tested it on three data sets with different label sets. The data set with the highest F1-score was the third one, which has only two labels: positive and negative.
The second group of studies compares two methods. The study in [12] tested sentiment analysis on YouTube comments using Naïve Bayes and Support Vector Machines (SVM). With a 7:3 data split, 70% for training and 30% for testing, the combination of Naïve Bayes and SVM gave higher accuracy and superior performance. The study in [13] compared Naïve Bayes and SVM on Twitter data about Tokopedia services. The SVM with a linear kernel, at an accuracy of 83.34%, surpassed Naïve Bayes. The study in [14] used Twitter data to assess sentiment about COVID-19 infection on Indonesian public transportation, comparing Naïve Bayes and decision trees. Naïve Bayes outperformed the decision tree with an accuracy of 73.59%.
The third group of studies uses more than two methods. The study in [15] analyzed the sentiment of tourists in Thailand during the COVID-19 pandemic using three methods: SVM, Classification and Regression Tree (CART), and random forest. SVM best identified the attitudes and intentions in English-language tweets that mentioned Phuket and Chiang Mai. For tweets mentioning Bangkok, which had more tweets than the other cities, CART was the most accurate, with an accuracy of 94.3%. The study in [16] examined customer reviews of Amazon products using four sentiment analysis methods: Naïve Bayes, SVM, Decision Tree, and K-Nearest Neighbor, and added TF-IDF and n-grams to its processing. It aimed to assess the impact of sentiment (positive, negative, and neutral) and of Amazon product reviews on sales performance. Among the TF-IDF and n-gram combinations, unigrams with SVM gave the highest accuracy for Amazon product customer reviews. The study also found that comments on Amazon products influence potential consumers' purchasing decisions. Both studies [15] [16] were conducted to determine the differences in accuracy between sentiment analysis methods.
Based on this previous research, this study uses Naïve Bayes [10] [13] [14], SVM [11] [12] [13] [15] [16], and CART [15] for sentiment analysis. In addition, n-grams and TF-IDF are used because they have been shown to increase accuracy [16]. The study uses positive and negative labels because research [11] established that two labels give higher accuracy than more than two labels.

B. Sentiment Analysis
According to Pang et al. (2002), opinion mining and sentiment analysis are two terms that refer to the same process. Sentiment analysis automatically analyzes, extracts, and processes textual material to derive the sentiment expressed in an opinion sentence. It can be used to determine an individual's perspective, that is, their predisposition to hold a positive or negative opinion about a particular issue or object [17] [12].

C. Data Preprocessing
Data preprocessing involves converting raw data into a format the user can understand. Raw data is frequently unstructured and inconsistent, lacks specific behaviors or patterns, and contains missing values, all of which contribute to errors. Consequently, it needs to be cleaned, integrated, transformed, and reduced. During cleaning, noise is eliminated and missing values are filled in [18] [19].

D. N-Gram
The word n-gram feature counts sequences of n consecutive words in each tweet, where n ranges from 1 up to a chosen maximum [20]. N-grams can be more informative than single words. With t distinct words, there could be up to t^2 bigrams. In practice, only some of these are generated, because not every term can follow every other term. N-grams are usually more distinctive than single words, and they form a larger, sparser feature space. A larger n increases both the information captured and the computational expense [21]. In this research, we combine the unigram, bigram, and trigram forms of the n-gram.
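The n-gram extraction described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; the function name `word_ngrams` and the sample tokens are assumptions made for the example.

```python
def word_ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical tokenized tweet, used only for illustration.
tokens = ["dpr", "sah", "rkuhp", "jadi", "undang"]

unigrams = word_ngrams(tokens, 1)  # five one-word features
bigrams = word_ngrams(tokens, 2)   # four two-word features
trigrams = word_ngrams(tokens, 3)  # three three-word features
```

A tweet with k tokens yields k - n + 1 n-grams, so a tweet shorter than n words contributes no features of that order.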

E. Term Frequency-Inverse Document Frequency (TF-IDF)
According to Jones (1972), Inverse Document Frequency (IDF) is a technique that can be combined with term frequency to lessen the influence of words that are implicitly common across the corpus. IDF gives greater weight to terms that appear in few documents and less weight to terms that appear in many, regardless of how often a term occurs within a single document [22] [23]. TF-IDF is now the most popular weighting scheme for text classification and document categorization [24] [21].
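As a rough illustration of this weighting, the sketch below computes one classic TF-IDF variant, tf(t, d) × log(N / df(t)). The paper does not state which variant it uses, and libraries often apply smoothed forms, so the formula, the function name `tf_idf`, and the sample documents are all assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf(t, d) = occurrences of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), with df(t) the number of documents containing t
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical preprocessed tweets.
docs = [["rkuhp", "sah"], ["rkuhp", "tolak"], ["rkuhp", "kritik", "tolak"]]
weights = tf_idf(docs)
```

In this toy corpus "rkuhp" appears in every document, so its weight is zero, while "kritik", which appears in only one document, receives the largest weight in the third tweet.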

F. Naïve Bayes Algorithm
This classification method is based on Bayes' theorem and makes strong (naive) assumptions about feature independence. A Naïve Bayes classifier assumes that the presence of one feature within a class is unrelated to the presence of any other feature. The Naïve Bayes algorithm is often used to divide texts into groups, and it has recently been used to classify data in sentiment analysis [6].
The algorithm relies on Bayes' theorem and presumes that, given the value of the class variable, all other variables are independent of one another. The Naïve Bayes classifier is simple to implement, performs well in supervised learning, and can be used in difficult real-world situations. It is easy to understand, needs only a training dataset to estimate its parameters, is insensitive to irrelevant features, and works well with correct data from a single source [25] [10].
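The independence assumption can be made concrete with a small multinomial Naïve Bayes sketch. It is a simplified illustration with add-one (Laplace) smoothing, which is an assumption here, as are the function names and the toy training tweets; it is not the sklearn model used in the study.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {t for tokens in docs for t in tokens}
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                # Each word contributes independently: the "naive" assumption.
                score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled training tweets.
docs = [["dukung", "sah"], ["sah", "baru"], ["tolak", "kritik"], ["tolak", "hina"]]
labels = ["positive", "positive", "negative", "negative"]
model = train_nb(docs, labels)
prediction = predict_nb(model, ["tolak", "rkuhp"])
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on longer tweets.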

G. Support Vector Machine (SVM) Algorithm
According to Han et al. (2012), the goal of the Support Vector Machine (SVM) algorithm is to locate the Maximum Marginal Hyperplane (MMH) by using margins and support vectors. The MMH is the best available hyperplane because it has the largest margin and therefore separates the data of each class as well as possible. The two sides of the margin are parallel to the hyperplane, and the margin is defined so that the shortest distance from the hyperplane to one side equals the shortest distance from the hyperplane to the other side [26] [24].
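For linearly separable data, the search for the MMH is commonly written as the optimization problem below. This is the standard hard-margin textbook form rather than a formula taken from the paper, and the symbols are notation assumed here: w is the weight vector, b the bias, x_i the feature vector of tweet i, and y_i in {-1, +1} its class label.

```latex
\min_{\mathbf{w},\, b} \ \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n
```

The constraints keep every training point on the correct side of the margin. The margin width equals 2 / ||w||, so minimizing ||w|| maximizes the margin, and the support vectors are the points for which the constraint holds with equality.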

H. Classification and Regression Tree (CART)
The classification and regression trees (CART) method is a systematic technique developed by Breiman et al. (1984) [27] [28]. CART builds decision trees from historical data. The type of the dependent variable determines whether a classification tree (for a categorical variable) or a regression tree (for a continuous variable) is built. The constructed tree can then be used to classify new observations (classification tree) or to predict their values (regression tree). Unlike classification trees, regression trees have no predetermined classes. Classification trees, on the other hand, allow the user to select or calculate the classes of the dependent variable based on an external criterion [27] [29] [30] [28]. The CART approach consists of three steps: (1) the construction of the full tree; (2) the selection of the ideal tree size; and (3) the use of the constructed tree to classify data or generate new information [28].
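Tree construction in step (1) proceeds by repeatedly choosing the split that makes the child nodes as pure as possible. The sketch below shows a single such split on one numeric feature using Gini impurity, the usual CART criterion for classification; the function names and the sample values are assumptions, and a full tree would apply this search recursively over all features.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini impurity."""
    best_threshold, best_impurity = None, float("inf")
    for threshold in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        if not left or not right:
            continue  # a valid split sends samples to both children
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity

# Hypothetical feature: the TF-IDF weight of one term in six tweets.
values = [0.0, 0.1, 0.2, 0.6, 0.7, 0.9]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
threshold, impurity = best_split(values, labels)
```

Here the split at 0.2 separates the two classes completely, so the weighted impurity of the children drops to zero.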

III. METHODOLOGY
The research consisted of several stages: data collection, data set creation, data labeling, data preprocessing, word grouping using n-grams, term weighting (TF-IDF), classification modeling, evaluation of the classification models, and, finally, the output of sentiment results and recommendations. These stages are shown in the research workflow figure.

A. Data Collecting
In this step, Python and the snscrape library are used to collect tweets. Data was gathered by searching for the keyword "RKUHP" in tweets posted between December 6, 2022, and December 31, 2022. Only tweets in Indonesian are taken, and identical tweets are deleted. Tweets that contain only the RKUHP hashtag or only advertisements are also excluded. Tweets are not translated into English, because translation could change their meaning during processing. All words in the sentiment results are therefore in Indonesian.
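The duplicate and hashtag-only filters described above might look like the following sketch. It operates on a plain list of tweet texts and does not call snscrape; the function name `filter_tweets` and the sample tweets are hypothetical, and detecting advertisements is left out because the paper does not describe how that was done.

```python
import re

def filter_tweets(tweets):
    """Drop exact duplicates and tweets that contain nothing but hashtags."""
    seen = set()
    kept = []
    for text in tweets:
        key = text.strip()
        if key in seen:
            continue  # identical tweet already kept
        seen.add(key)
        without_tags = re.sub(r"#\w+", "", key).strip()
        if not without_tags:
            continue  # only hashtags, no actual content
        kept.append(key)
    return kept

# Hypothetical raw tweets.
raw = ["RKUHP disahkan DPR", "RKUHP disahkan DPR", "#RKUHP", "Tolak #RKUHP"]
kept = filter_tweets(raw)
```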

B. Data Labelling
In this phase, the tweets are labeled manually as positive (pro) or negative (con) toward the ratification of the RKUHP. Irrelevant tweets are also deleted in this phase.

C. Data Preprocessing
In the preprocessing of tweet data, a series of operations is performed so that machine learning algorithms can read the tweets in a standardized form, as illustrated in Table II. The steps are as follows:
1) Case folding converts all capital letters in tweets into lowercase letters.
3) Tokenization breaks sentences into separate words.
4) Stop-word removal removes words that do not add any meaning.
5) Normalization unifies words that have the same meaning but different spellings.
6) Stemming reduces words that have affixes to their root words.
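The listed steps can be chained as in the sketch below. The stop-word set, the normalization dictionary, and the suffix list are tiny hypothetical stand-ins, and the suffix stripper is deliberately naive; real processing of Indonesian tweets would need complete resources and a proper stemmer.

```python
import re

# Hypothetical, minimal resources for illustration only.
STOP_WORDS = {"yang", "di", "dan", "itu"}
NORMALIZATION = {"gak": "tidak", "ga": "tidak"}
SUFFIXES = ("kan", "nya")

def preprocess(text):
    """Case folding, tokenization, stop-word removal, normalization, naive stemming."""
    text = text.lower()                                  # case folding
    tokens = re.findall(r"[a-z]+", text)                 # tokenization (letters only)
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    tokens = [NORMALIZATION.get(t, t) for t in tokens]   # normalization
    stemmed = []
    for t in tokens:                                     # naive suffix stripping
        for suffix in SUFFIXES:
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        stemmed.append(t)
    return stemmed

result = preprocess("RKUHP disahkan itu GAK adil")
```

Because the stripper only removes a suffix, "disahkan" becomes "disah" rather than the root "sah", which shows how easily rule-based stemming can produce imperfect roots.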

D. N-Gram
In this phase, the words are grouped into n-grams. Features are created using unigrams (one word), bigrams (two words), and trigrams (three words). Tweets that have at most three words are deleted.

E. Term Weighting
The next step is word feature extraction using term frequency-inverse document frequency (TF-IDF), a technique typically used to calculate the weight of a word in a given document. Term frequency (TF) describes how often a term appears in a document. Frequently occurring terms can obscure uncommon but informative words. The inverse document frequency (IDF), which reduces the weight of terms that appear often, gauges how significant a word is in a document [31].

F. Classification Modelling
In this step, classification models are built with three machine learning algorithms: Naïve Bayes, SVM, and CART. Modeling is done separately for each algorithm to produce accurate results. Each algorithm is tested on the features formed in the n-gram process after term weighting has been carried out. The classifiers use the sklearn library in Python. This study used 80% of the data for training and 20% for testing. This is so that machine learning algorithms can perform better, according to research by Pham
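The 80/20 split can be illustrated without any library, as in the sketch below. The study itself used sklearn; this stand-in only shows the proportions, and the function name, the fixed seed, and the placeholder tweets are assumptions.

```python
import random

def train_test_split_80_20(items, seed=42):
    """Shuffle a copy of the items and split it into 80% training and 20% testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed (an assumption) for repeatability
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

# Placeholder for the 9,079 preprocessed tweets reported in the results.
tweets = [f"tweet {i}" for i in range(9079)]
train, test = train_test_split_80_20(tweets)
```

With 9,079 tweets this gives 7,263 training and 1,816 test items, which matches the total number of tweets in each confusion matrix reported in the results (852 + 964 and 877 + 939).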

G. Classification Modelling Evaluation
In this phase, the performance of each machine learning algorithm in the previous step will be evaluated. Evaluation is conducted using a confusion matrix by looking at the value of the accuracy of each algorithm. Accuracy, precision, and recall are the evaluative test parameters whose computations are derived from the confusion matrix table [13].
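The following sketch shows how these measures follow from the four cells of a binary confusion matrix. The counts are the ones the paper reports for the SVM unigram model; the function name is an assumption, and precision, recall, and F1 are computed for the positive class only.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy plus precision, recall, and F1 for the positive class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Counts reported in the results for SVM with unigrams.
accuracy, precision, recall, f1 = classification_metrics(tp=753, fp=186, tn=711, fn=166)
rounded = [round(v, 2) for v in (accuracy, precision, recall, f1)]
```

Rounded to two decimals, these counts give the reported accuracy of 0.81 and the reported positive-class precision, recall, and F1-score of 0.80, 0.82, and 0.81.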

H. Sentiment Result and Discussion
This is the final stage, which produces the sentiment words for the word cloud. Following the n-gram phase, these consist of one-word, two-word, and three-word terms and are derived from the sentiment model with the highest accuracy. The discussion is then based on these findings.

IV. RESULTS AND DISCUSSION

A. Results of Classification Modeling Evaluation
The number of tweets extracted using the snscrape library was 17,107. After removing duplicate tweets, the number decreased to 10,763. Each tweet was then labeled manually. After preprocessing, 9,079 tweets remained. The classification algorithms and the n-gram combination method were then applied to these tweets. After preprocessing, the data set was split into training and test sets: 80% of the data set was used for training, and the remaining 20% for testing. The data set's features were produced using unigrams, bigrams, and trigrams, and the resulting terms were weighted using TF-IDF. Each combination of n-gram and term weighting produces a different feature set. The results are presented in Tables III, IV, and V. We calculated the performance of each algorithm using the confusion matrix. The confusion matrix, which measures the classification overlap, is an effective tool for performance evaluation. The multi-label classification task must establish the confusion matrix because each instance may be assigned to multiple classes [34]. The performance evaluation of the multi-label classifier is based on computing performance averages, including precision, recall, and F1-score [34]. Precision measures how many of the predictions made for a class are correct. Recall is the proportion of a class's actual instances that are predicted correctly. The F1-score is then used to combine precision and recall [35] [12].
For each n-gram combination, the precision, recall, and F1-score of the CART algorithm are displayed in Table III. The CART results do not differ much between unigrams, bigrams, and trigrams. In the bigram results, for example, the precision is 0.73 for negative and 0.75 for positive, and the recall is 0.70 for negative and 0.75 for positive. The F1 values for positive and negative are then 0.72 and 0.74, respectively. As shown in Fig. 2, of the 852 tweets classified as negative, 624 were true negatives (TN) and 228 were false negatives (FN). In contrast, of the 964 tweets classified as positive, 263 were false positives (FP) and 701 were true positives (TP).
Table IV shows the SVM algorithm's precision, recall, and F1-score for each n-gram combination. The unigram test had the best average results, with precision of 0.81 for negative and 0.80 for positive and recall of 0.76 for negative and 0.82 for positive. The F1-score is 0.78 for negative and 0.81 for positive. As shown in the corresponding confusion matrix figure, 711 of the 877 tweets classified as negative were true negatives and 166 were false negatives. Of the 939 tweets classified as positive, 186 were false positives and 753 were true positives.
The results of the Naïve Bayes algorithm are presented in Table V.
The confusion matrix is also used to calculate the accuracy of each method, shown in Table VI. The SVM built on unigrams has the highest accuracy, 0.81. SVM also has the highest accuracy among the algorithms for bigrams and trigrams, with values of 0.79 and 0.78, respectively. Naïve Bayes achieves its highest accuracy, 0.78, with the trigram combination. CART has the same accuracy for all n-gram combinations.
The results in Table VI are consistent with the research in [13] and [15], which shows that SVM has higher accuracy than Naïve Bayes and CART. The study in [13] achieved an accuracy of 83.34% with SVM and 75% with Naïve Bayes. According to research [15], the amount of data used by the random forest and CART algorithms determines the soundness of the decision trees, the complexity of the trees, and thus the algorithm's accuracy. This explains why CART has the same accuracy across the n-gram combinations: the amount of data is the same in each.
In line with the results of this investigation, the study in [16] found that SVM with the unigram combination had the highest accuracy compared to the other n-gram combinations. This is likely because SVM can more easily map single words weighted with TF-IDF than infer sentiment from multi-word features. This contrasts with Naïve Bayes, where accuracy rises as the n-grams contain more words. The n-gram combination therefore affects the accuracy of each algorithm. SVM is preferred over the other algorithms, Naïve Bayes and CART, because of its high accuracy. For Naïve Bayes, a higher-order n-gram is preferable. Because CART is sensitive to the amount of data, large data sets will affect its accuracy.

B. Content Sentiment Analysis
After processing with Python and Microsoft Excel, the data set contained 9,079 tweets. Duplicate and irrelevant tweets had been removed. Of these tweets, 51%, or 4,623, favor the ratification of the RKUHP, while 49%, or 4,455, oppose it, as can be seen in the sentiment distribution figure.
The word cloud for negative sentiment is displayed in the corresponding figure. Negative sentiment is associated with a wide variety of topics and concepts, including 'tolak', 'kontroversi', 'kritik', 'hina', 'koruptor', and 'penting kuasa'. The most common words are 'sah', 'tolak', 'rakyat', and 'negara', in that order. The RKUHP received a negative response because it was thought to contain several contentious articles. Words such as 'kritik', 'hina', and 'demokrasi' indicate that some articles are considered to silence criticism, specifically those regarding insulting the president. The words 'koruptor' and 'korupsi' refer to the articles on corruption, which reduce the minimum prison sentence for corruption. The phrase 'penting kuasa' and the word 'rakyat' appear because some people believe that many of the new RKUHP articles were written more for the authorities' interests than for the people's.
Besides negative sentiment, the word clouds also contain words representing positive sentiment. Words like 'sah', 'baru', 'hukum pidana', and 'tuju' support the ratification of the RKUHP. The RKUHP is considered significant because it strengthens Indonesia's current criminal code; the existing criminal law, which has been amended piecemeal, is no longer regarded as meeting legal needs. The positive word cloud is shown in its own figure.
From the positive and negative word clouds, we can deduce which words are at the center of people's conversations. The words that arise may serve as a first indication, for an organization, of the positive and negative aspects of the community's response. Organizations can use this as a resource to improve their understanding of how the policies they issue are received. This is in line with the findings of a study [8] on how the Mexican government uses Twitter to connect with the people.
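The term frequencies behind a word cloud can be obtained with a simple count, as in the sketch below. The function name `top_terms` and the sample tweets are hypothetical, and the paper does not say which tool rendered its word clouds.

```python
from collections import Counter

def top_terms(token_lists, k=3):
    """Count terms across tokenized tweets and return the k most frequent."""
    counts = Counter(term for tokens in token_lists for term in tokens)
    return counts.most_common(k)

# Hypothetical preprocessed tweets that were labeled negative.
negative_tweets = [["tolak", "rkuhp"], ["tolak", "kritik"], ["rakyat", "tolak", "kritik"]]
top = top_terms(negative_tweets, k=2)
```

Running the same count separately on the positive and negative tweets, and on bigrams or trigrams instead of single tokens, gives the per-sentiment term rankings that a word cloud visualizes.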
The government can therefore use these sentiment results as a foundation for socialization. Negative sentiment toward the ratification of the RKUHP is still relatively high, at 49% of the tweets. To address this, the government needs to carry out more socialization and listen to people's perspectives on the matters regarded as contentious. This is needed so that both the adoption of the RKUHP in 2025 and its passage into the Criminal Code happen smoothly. Words that elicit negative responses can be used as the primary focus of socialization, which will help mitigate the public's adverse reaction. This mitigation would be more effective if tweets were grouped by topic, as in research [36], which used BERTopic to classify tweets. BERTopic would help categorize tweets and make it easier for the government to address specific RKUHP-related topics. During processing, it was found that the stemming method had limitations: particular terms, such as "pengesahan", were mistakenly converted into the root word "kesah". This potentially limits the interpretation of the words in the word cloud.

V. CONCLUSION
By examining Twitter sentiment, this study identifies responses to the ratification of the RKUHP. The ratification drew 51% positive and 49% negative tweets. Although positive sentiment is ahead, the margin is only 2 percentage points. According to the negative tweets, the controversial articles are the article about insulting the president, the articles reducing punishments for corruption, and articles seen as not representing the people. These must be the emphasis of the government's efforts to socialize the RKUHP.
The evaluation of the three tested algorithms, CART, SVM, and Naïve Bayes, found that SVM had the highest accuracy and was the most reliable across the n-gram combinations. SVM produces an accuracy of 0.81 on the unigram, 0.79 on the bigram, and 0.79 on the trigram. This research is limited in that tweets were not grouped into specific topics and the stemming process was imperfect. Future research could categorize tweets by topics related to the RKUHP, so that the results go beyond grouping terms from the word cloud. It could also add more data to improve coverage of the topic, and improve the stemming algorithm.