Formulation of Association Rule Mining (ARM) for an effective Cyber Attack Attribution in Cyber Threat Intelligence (CTI)

In recent years, adversaries have refined the Tactics, Techniques and Procedures (TTPs) they use to launch cyberattacks, making attacks less predictable and more persistent, and making the adversaries themselves more resourceful and better funded. Consequently, many organizations have adopted Cyber Threat Intelligence (CTI) as part of their security posture to attribute cyberattacks effectively. However, to fully leverage the massive amount of data in CTI for threat attribution, an organization needs to focus on discovering the hidden knowledge behind the voluminous data in order to produce effective cyberattack attribution. Hence, this paper focuses on association analysis in the CTI process for cyberattack attribution. The aim of this paper is to formulate an association ruleset to perform the attribution process in CTI. The Apriori algorithm is used to formulate the association ruleset in the association analysis process, and the resulting ruleset is called the CTI Association Ruleset (CTI-AR). Interestingness measures, specifically support (s), confidence (c) and lift (l), are used to measure the practicality and validity of the CTI-AR and to filter it. The results show that CTI-AR effectively identifies the attributes, the relationships between attributes and the attribution level group of a cyberattack in CTI. This research has high potential to be extended into the cyber threat hunting process to provide a more proactive cybersecurity environment.

Keywords—Cyber threat intelligence (CTI); association rule mining; apriori algorithm; attribution; interestingness measures


I. INTRODUCTION
As the Tactics, Techniques and Procedures (TTPs) used by adversaries become more unpredictable, determined, imaginative, well funded, coordinated and financially motivated, acquiring useful information from threat information sharing is essential for cyberattack attribution. Cyber Threat Intelligence (CTI), one of the threat information sharing frameworks, has received considerable media attention as a means of mitigating and reducing cyberattack infection. However, a common issue in CTI is the quality of the voluminous data obtained from shared information, and there is scarce literature discussing the meaning of quality and the basic methods and tools for assessing it [1]. Much of the data in CTI is raw data without meaningful relationships between data items. This voluminous data makes it difficult to identify cyberattack attribution levels effectively because useful data from the various sources is lacking. The cyberattack attribution process can provide meaningful relationships between data by identifying the attribution level and the hidden knowledge behind the data to assist organizations in decision making [2]. However, current cyberattack attribution techniques are ineffective at handling the voluminous data in CTI because they rely heavily on manual processes performed by security analysts and depend strictly on the analyst's knowledge, which introduces human bias and is error-prone [3].
This paper highlights the use of data mining to address the voluminous data issue, helping security analysts find relationships within datasets and perform the cyberattack attribution process in CTI. The proposed study formulates an association ruleset for the cyberattack attribution process in CTI. This ruleset enables the discovery of hidden knowledge behind the raw data to identify the attribution level.
The remainder of the paper is organized as follows. Section II presents the research background and related work on association rule mining in CTI. Section III describes the proposed methodology, which covers data collection using CTI feeds, the dataset for CTI feeds, association rule mining in the CTI framework and the formulation of the association ruleset using the Apriori algorithm. Section IV presents the outcome of the association ruleset formulation in CTI and evaluates the generated ruleset using interestingness measures. Finally, Section V provides a brief conclusion.

A. Cyber Threat Intelligence (CTI) for Threat Attribution
There have been many studies in data mining on discovering insights from large groups of items or objects in transactional databases, relational databases and other information repositories using the Association Rule Mining (ARM) technique. ARM is an important branch of data mining research that has attracted many researchers because of its capability to discover useful and interesting patterns in extensive, noisy, fuzzy and stochastic data. The concept of ARM was introduced by Agrawal and Srikant [4]. ARM can be used as part of the cyberattack attribution process in CTI to discover the hidden knowledge behind raw data. A critical issue for cyberattack attribution in CTI is how to extract this hidden knowledge from voluminous data successfully and effectively, and how to feasibly create an association ruleset for cyberattack attribution that assists security analysts in decision making.
Since Agrawal et al. [5] first introduced the concept of ARM, a wide variety of efficient ARM algorithms for generating association rules have been proposed. Some of the best-known and most important are Apriori, AprioriTid, SETM, AprioriHybrid, AIS and FP-growth [6].
Currently, the most widely used ARM algorithm is the Apriori algorithm. Agrawal and Srikant developed it to study customers' purchasing behavior in supermarkets, where goods are often purchased together [4]. The Apriori algorithm has also been used successfully in many other areas, including energy, recruitment, communication protocols, monitoring and network traffic behavior [7]. Hence, applying the Apriori algorithm to malicious network traffic behavior can help security analysts study how attackers conduct cyberattacks.
The Apriori algorithm has been implemented in various fields. Khalili and Sami [8] proposed an industrial intrusion detection approach to mitigate threats to cyber-physical systems, using sequential patterns extracted by the Apriori algorithm to help experts identify critical states. The study showed that Apriori could be employed to extract sequential patterns for industrial process monitoring. Hsiao et al. investigated the use of the Apriori algorithm to track adversaries transitioning through sequences of hosts to launch an attack [9]; data were retrieved from network packets to determine the host sequence, and the Apriori algorithm proved suitable for this purpose. Meanwhile, Liu et al. used the Apriori and MS-Apriori algorithms to investigate relationships in network footprint (NFP) data, consisting of DPI data from ISPs and crawler data from the Web, for app usage analysis [7]. The results provide insights that help mobile application developers recommend other applications to users based on their interests and usage patterns. Adebayo and Abdul Aziz presented a novel knowledge-based database discovery model that uses an improved Apriori algorithm with Particle Swarm Optimization (PSO) to classify and detect malicious Android applications [10]. Using several rule detectors maximizes the true positive rate for detecting malicious code while minimizing the false positive rate. The use of the Apriori algorithm outside the cybersecurity domain has also been explored. Jung, Kim and Chung [11] used it for smart health services, applying the Apriori algorithm to a series of patient images acquired through surveillance technology to generate bio-sequential patient patterns. These bio-sequential patterns form a baseline, and any deviation from it could indicate a possible emergency.
The study demonstrated that the Apriori algorithm can be used to develop bio-sequential patterns, and such an approach could also be used to extract patterns from adversary SSH command sequences. In addition, the Apriori algorithm has been employed to discover the contributory crash-risk factors of hazardous material (HAZMAT) vehicle-involved crashes on expressways [12]. The findings of that study indicated that ARM is a feasible data mining technique for drawing correlations between HAZMAT vehicle-involved crashes and significant crash-risk factors, and that it has the potential to provide easy-to-understand findings and applicable lessons for improving expressway safety.
In this paper, we collect CTI data from current cyberattacks, covering network resources and attacker behaviour, and perform association rule analysis using Apriori to generate rules. These rules enable the discovery of hidden knowledge behind the raw data to identify the attribution level.

III. METHODOLOGY
In this section, the experimental design for generating the association ruleset in CTI for cyberattack attribution is presented. The input of the experiment was CTI feeds from OSINT. Data preprocessing techniques were used to clean the CTI feeds and produce meaningful data, which were then used to generate the association ruleset. Through this experiment, the association ruleset was produced to identify the hidden knowledge behind attributes in the CTI feeds and the attribution level for cyberattack attribution in CTI. The design of the experiment is shown in Fig. 1, which illustrates the entire process of association rule mining in the CTI framework. The process consists of (i) preprocessing network traffic data, (ii) generating logical rules using the Apriori algorithm and (iii) applying the generated rules to facilitate cyberattack attribution. The Apriori algorithm discovers groups of items that frequently occur together across many transactions; such groups are called frequent itemsets. The association rules generated from this process are measured using support, confidence and lift. Given a set of transactions, the problem of mining association rules is to generate all association rules whose support and confidence are greater than the user-specified minimum support (called minsup) and minimum confidence (called minconf), respectively.
To run the Apriori algorithm on our dataset, we used R to process the filtered data and visualize the results. R is a language and environment for statistical computing, data mining and graphics.
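For reference, the three measures can be stated formally using the standard textbook definitions, where T is the set of transactions and X and Y are disjoint itemsets forming a candidate rule X ⇒ Y. The notation below is ours and is a sketch of the usual formulation, not a reproduction of the paper's own equations:

```latex
\begin{aligned}
\mathrm{supp}(X) &= \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert}, \\
\mathrm{supp}(X \Rightarrow Y) &= \mathrm{supp}(X \cup Y), \\
\mathrm{conf}(X \Rightarrow Y) &= \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}, \\
\mathrm{lift}(X \Rightarrow Y) &= \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)}
  = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)}.
\end{aligned}
```

Under these definitions, lift equals 1 exactly when supp(X ∪ Y) = supp(X) supp(Y), which is why a lift of 1 is read as independence of X and Y.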

A. Data Collection for CTI Feeds
Data collection for this paper is limited to CTI feeds from OSINT that relate to network intrusion activities. OSINT CTI feeds from Shadowserver, Lebahnet and MITRE, as shown in Fig. 2, were chosen because they provide various types of useful information and Indicators of Compromise (IoCs) for cyberattack attribution [13]-[15]. The focus of this research was to gather CTI data comprising network resources and attacker behaviour from existing cyberattacks. Fig. 2 shows the data collection process for the CTI feeds. The API of each CTI feed was used to collect its data. In addition, a scraper was used to collect popular network resources from the CTI feeds, such as the domains of search engines or government websites, the IP addresses of common DNS servers and the MD5 hash values of notorious malware. Examples of the attributes collected from each CTI feed are listed in Table I. Shadowserver security feeds provide information about infected machines, drones and zombies captured by monitoring IRC command-and-control servers, IP connections to HTTP botnets and the IPs of spam relays. Lebahnet security feeds provide valuable supporting information, such as network trends and malicious activities, captured using a collection of distributed honeypots. Both feeds provide basic IoCs such as IP addresses, domain names, URLs, hash values, malware infection types and geolocation. In contrast, the MITRE knowledge base provides high-level IoCs related to the behaviour of cybercriminals. MITRE datasets contain the tactics, techniques, software or tools and attacker groups involved at different stages of a cyberattack, from infiltrating the network to exfiltrating data. The combination of basic IoCs from Shadowserver and Lebahnet with attacker behaviour from the MITRE knowledge base was essential for identifying the attribution level for cyberattack attribution in CTI.

B. Dataset for CTI Feeds
The domain of this research was limited to cyber threat intelligence related to network intrusion activities, and the datasets were limited to CTI feeds from OSINT. To be considered relevant cyberattack activity, the CTI feeds covered the three most prevalent infections from 2018 to 2019 [16]. A summary of each dataset (DS) is given in Table II. As Table II shows, four datasets were collected from Shadowserver, four from Lebahnet and four from MITRE, giving twelve datasets in total, named DS1 to DS12. DS1 to DS3 were used for training, as explained in Section III-C, while DS4 to DS12 were used for evaluation and validation; only the results for DS4 are explained in Section IV. The remaining datasets went through the same process, so the explanation for DS4 applies to them as well.

C. Association Rule Mining Algorithm in CTI Framework
After the CTI feeds have been preprocessed to produce clean and useful data, the results are used in association analysis to formulate an association ruleset. This association ruleset facilitates the cyberattack attribution process in the CTI framework to produce effective threat attribution. It can assist security analysts in identifying the origin of a cyberattack and its attribution level.
To obtain a general view of the results generated in R, we set the minimum support to 0.001 and the minimum confidence to 0.5. The overall association ruleset analysis classification in CTI is shown in Table III.
The attribution level was divided into three levels, namely Level 1, Level 2 and Level 3 [17]. The attributes in Level 1 consisted of IP address, malware type, hash value and port number; Level 2 was geolocation; and Level 3 required further analysis of the Level 1 and Level 2 attributes to identify the person or attack campaign used by an attacker to launch the cyberattack. However, if the acquired dataset contained TTPs describing attacker behaviour, such as the datasets from MITRE, Level 3 attribution was achievable without further analysis of the Level 1 and Level 2 association ruleset.
Based on the analysis in Table III, the three attribution levels can be used to identify the identity and location of an attacker, and they can be correlated with CTI types to ease decision making in an organization.
Table IV depicts the relationship between each attribution level, its attributes and the CTI types, which is useful for verifying the effectiveness of the proposed cyberattack attribution in CTI. Levels 1 and 2 are part of tactical intelligence; their outputs help an organization respond quickly and accurately to threat indicators and prioritize vulnerability patches. Level 3 is part of operational intelligence; its output can improve the detection rate and prevent future incidents because attacks can be seen in a clear context. The combined outputs of Levels 1, 2 and 3 are part of strategic intelligence, which drives an organization's decision making on security countermeasures and areas for improvement through an understanding of current attack trends and their financial impact on the organization. Based on the overall association ruleset analysis classification in Table III and the relationship between attribution levels, attributes and CTI types in Table IV, an Attribution Level Group for each ruleset (ALGR) is proposed, as shown in Table V.

TABLE V. ATTRIBUTION LEVEL GROUP RULESET

ALGR | Description
ALGR1 | Represents any ruleset under attribution Level 1
ALGR2 | Represents any ruleset under attribution Level 2
ALGR3 | Represents any ruleset under attribution Level 3

By using the association ruleset classification in Table III and the proposed ALGR in Table V, the general association ruleset can be defined as Equation (1).
where n denotes the attribution level (Level 1, Level 2 or Level 3); LHS.A is an attribute from attribution level n on the left-hand side of the rule; RHS.A is an attribute from attribution level n on the right-hand side; and ALGR is the attribution level group ruleset. While the ruleset representation from the general equation in (1) In this paper, ALGR, as illustrated in Table V and Equation (1), is used to perform cyberattack attribution in CTI.

D. Formulation of Association Ruleset in CTI
To prevent cybersecurity threats from significantly affecting business and daily life, actionable threat intelligence with clean data can help an organization make fast decisions for cyberattack attribution. Cyberattack attribution is defined as the process of identifying the location and identity of the attackers involved in a cyberattack. It is a demanding task that requires comprehensive intelligence or context to achieve the attribution levels, which are divided into three: (1) attribution to the specific hosts involved in the attack, (2) attribution to the primary controlling host, and (3) attribution to the actual human actor and to an organization with the specific intent to attack. These attribution levels can only be achieved when an effective threat intelligence framework is in place. To build such a framework, an organization needs to consider how to gain the hidden information behind the raw data in CTI to assist security analysts in performing cyberattack attribution. Hence, this research focused on formulating an association ruleset in a CTI framework to perform cyberattack attribution in CTI. Fig. 3 illustrates the Apriori algorithm technique used to formulate the association ruleset from the CTI OSINT feeds collected through the CTI framework. Table VI summarizes the purpose and process of each stage in the CTI framework; it shows that the meaningful data derived from the preprocessing process are used by the attribution analysis process to identify the attributes and the attribution level.

IV. RESULT
The objective of this section is to present the results of the CTI-AR implementation and its effectiveness in performing cyberattack attribution in CTI. The CTI-AR enables the discovery of hidden knowledge behind the raw data to identify the attribution level and helps security analysts make decisions for cyberattack attribution. Objective interestingness measures were used to filter and rank the large number of association rules (the CTI-AR) generated by the Apriori algorithm. This research applied three objective evaluation indicators frequently used with the Apriori algorithm, namely support (s), confidence (c) and lift (l), to measure and determine the interest of the ruleset [18]. Support reflects the practicality or usefulness of association rules, confidence reflects their validity or reliability, and lift complements the other two indicators by filtering out wrong and meaningless rules.

A. Association Rules Analysis for Dataset
The dataset used to mine the frequent itemsets was obtained from the Shadowserver security feed and named "ss_2019_3.csv". The dataset, dated from 01/05/2018 to 31/05/2018, consisted of malicious network transaction data in Malaysia. It comprised 462,885 rows and 35 columns of data, as shown in Fig. 4.
After data cleaning, which removed incomplete records and filled missing values, eight attribute columns were selected for discovering frequent itemsets, as described in Table VII. Fig. 5 shows a snippet of the preprocessed data for DS4.
The Apriori algorithm uses an iterative, level-wise search in which the frequent k-itemsets are used to discover the (k + 1)-itemsets. First, the dataset is scanned to identify all frequent 1-itemsets by counting each item and keeping those that satisfy the minimum support threshold. Each subsequent level requires another scan of the entire dataset, and the process repeats until no further frequent k-itemsets can be identified. For DS4, the minimum support threshold was 20% (0.2); therefore, only the attributes that reached a minimum support of 0.2 were included in the ruleset generation process.
Using the frequent itemset identification process in Fig. 1, the frequent itemsets obtained for DS4 with a minimum support of 0.2 were ['195.38.137.100', '22', 'AM', 'DE', 'MY', 'US', 'gamarue']. These frequent itemsets were then passed to the ruleset generation process in Fig. 1 to create the association ruleset with a predefined minimum confidence (minconf) of 50% (0.5). The values minsup = 0.2 and minconf = 0.5 were adjusted manually to discover specific, interesting rules among a large number of random rules [19]. As a result, eighty-one association rules met this threshold configuration. To give a realistic overview of the results, the association rules were plotted in a scatter plot, shown in Fig. 6, with support on the x-axis and confidence on the y-axis. For example, the first association rule is plotted at a support of 0.2 and a confidence of 1.0, which indicates that it meets both the minimum support threshold of 0.2 and the minimum confidence threshold of 0.5.
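The level-wise search and rule generation described above can be sketched in Python as follows. This is a minimal, unoptimized illustration of the textbook Apriori join-and-prune procedure, not the R implementation used in this study; the "type:value" item encoding and the toy transactions (using the documentation IP range 203.0.113.0/24) are hypothetical.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Level-wise Apriori: return {frozenset itemset: support} for itemsets with support >= minsup."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)

    def support(itemset):
        # Fraction of transactions containing every item in `itemset` (one full scan).
        return sum(1 for t in tsets if itemset <= t) / n

    # Level 1: count every single item and keep those meeting minsup.
    singletons = {frozenset([item]) for t in tsets for item in t}
    current = {s: support(s) for s in singletons}
    current = {s: v for s, v in current.items() if v >= minsup}
    frequent = dict(current)

    k = 1
    while current:
        # Join: union pairs of frequent k-itemsets that differ by exactly one item.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            # Prune: every k-subset of a (k+1)-candidate must itself be frequent.
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                candidates.add(union)
        # Scan the data again and keep only candidates that meet minsup.
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= minsup}
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent, minconf):
    """Return (lhs, rhs, support, confidence, lift) for every rule meeting minconf."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                # Subsets of a frequent itemset are frequent, so both lookups succeed.
                conf = supp / frequent[lhs]
                if conf >= minconf:
                    rules.append((lhs, rhs, supp, conf, conf / frequent[rhs]))
    return rules


# Hypothetical toy transactions ("type:value" items); not data from the study.
transactions = [
    {"port:22", "geo:DE", "ip:203.0.113.5"},
    {"port:22", "geo:DE", "ip:203.0.113.5"},
    {"port:22", "geo:MY"},
    {"port:80", "geo:US"},
]
frequent = apriori(transactions, minsup=0.2)
rules = generate_rules(frequent, minconf=0.5)
```

The thresholds mirror the minsup = 0.2 and minconf = 0.5 used for DS4. The support of each candidate is recomputed by a full scan, matching the description of one dataset scan per level; production tools typically use more efficient counting structures.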
To further analyze the relationships between attributes in these association rules, the top five rules were selected and presented in Tables VIII, IX and X based on three IMs: support, confidence and lift. Table VIII shows the top five association rules by support with the thresholds minsup = 0.2 and minconf = 0.5. Support measures the usefulness of an association rule based on how frequently its itemsets occur together in the data transactions [20], [21]. The top five rules in Table VIII combine port number 22, geolocations MY and DE, and IP 195.38.137.100, indicating a strong association among these four frequently co-occurring items. However, rules R1 to R4 did not meet the requirement for inclusion in an attribution level, as the implication between their antecedents and consequents did not provide meaningful information for decision making. In contrast, rule R5 indicated that IP 195.38.137.100 frequently appeared together with geolocation DE, giving security analysts grounds to deduce that the cyberattack possibly originated from this IP and country.

www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 4, 2021
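Selecting the top five rules per interestingness measure, as done for Tables VIII to X, amounts to sorting the rule list by one measure. The sketch below assumes a hypothetical `Rule` record holding the three measures; the rules in the usage lines are placeholders, not rules from the study.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    lhs: FrozenSet[str]
    rhs: FrozenSet[str]
    support: float
    confidence: float
    lift: float


MEASURES = ("support", "confidence", "lift")


def top_k(rules: List[Rule], measure: str, k: int = 5) -> List[Rule]:
    """Return the k rules with the highest value of the chosen interestingness measure."""
    if measure not in MEASURES:
        raise ValueError(f"measure must be one of {MEASURES}, got {measure!r}")
    # sorted() is stable even with reverse=True, so ties keep their generation order.
    return sorted(rules, key=lambda rule: getattr(rule, measure), reverse=True)[:k]


# Hypothetical placeholder rules for illustration only.
rules = [
    Rule(frozenset({"item:a"}), frozenset({"item:b"}), 0.5, 0.6, 1.2),
    Rule(frozenset({"item:b"}), frozenset({"item:c"}), 0.3, 0.9, 2.0),
    Rule(frozenset({"item:c"}), frozenset({"item:a"}), 0.4, 0.7, 0.8),
]
by_support = top_k(rules, "support")
```

Ranking by a single measure at a time keeps the three tables independent, which matches how the text reports support, confidence and lift separately.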
While support measures the usefulness of itemsets that occur together in data transactions, confidence indicates whether a generated association rule is strong enough to be reliable and valid for decision making [20]. Table IX presents the top five most reliable rules with the thresholds minsup = 0.2 and minconf = 0.5. The top five rules by confidence showed that high-confidence rules were usually related to port number 22, geolocations MY and DE, and IP 195.38.137.100. This ruleset indicated that this IP was used by an attacker to launch a cyberattack, most probably originating from country DE. However, strong association rules are not always effective: some are not what users are interested in, and some are even misleading [21]. Of these top five rules, only R3, R4 and R5 were reliable and were included in attribution Level 2.
Support and confidence provided information about useful rules based on the occurrence and reliability of rules in the dataset. Hence, the lift measure was needed to complement these two IMs by measuring the importance of a rule for the purpose of the research. Table X depicts the top five association rules by lift. Three categories were used to interpret the relationship between X and Y under the lift measure: if the lift equals 1, X and Y are independent; if the lift is greater than 1, X and Y are positively correlated; and if the lift is less than 1, X and Y are negatively correlated.
Based on Table X, the itemsets containing 22, DE, MY, 195.38.137.100 and gamarue showed a positive correlation. Thus, this IP was malicious, was infected by gamarue and most probably originated from MY or DE. All of these rules met the requirement for inclusion in attribution Level 2.
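The three lift categories can be expressed directly in code. This is a minimal sketch: `lift` takes precomputed support values, and the small tolerance used to decide "equal to 1" is our own assumption for handling floating-point rounding, not a value from the study.

```python
import math


def lift(supp_xy: float, supp_x: float, supp_y: float) -> float:
    """lift(X => Y) = supp(X u Y) / (supp(X) * supp(Y))."""
    return supp_xy / (supp_x * supp_y)


def interpret_lift(value: float, tol: float = 1e-9) -> str:
    """Map a lift value to the three categories used in the text."""
    # A small tolerance avoids treating floating-point noise around 1.0 as correlation.
    if math.isclose(value, 1.0, abs_tol=tol):
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"
```

For example, `interpret_lift(lift(0.5, 0.5, 0.5))` classifies a pair of items that always co-occur, each present in half the transactions, as positively correlated.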

B. Result of Evaluation and Validation for CTI-AR
This evaluation determined the capability of the proposed association ruleset for the cyberattack attribution process in CTI. However, the number of association rules generated by the proposed association rule mining can be massive, which makes it difficult for domain specialists to study them and summarize their meaning. It is also impractical to sift through a broad set of rules containing noise and irrelevant rules. Hence, interestingness measures can be used to filter or rank the association ruleset. This paper focused only on objective interestingness measures, namely support, confidence and lift, to identify the meaningful and reliable association rules used to guide security analysts in making decisions. The thresholds for minimum support (minsup) and minimum confidence (minconf) were predefined manually by trial and error [7], [19], [22]. A summary of the association rules generated by the Apriori algorithm for all datasets is given in Table XI.
Based on the association ruleset summary, the process of identifying the attributes in attribution level and classifying the ruleset into the respective attribution level group (ALGR) were conducted. Still, not all the generated ruleset met the requirement to be included in the respective ALGR because the ruleset must have at least one attribute from Level 1, Level 2 or Level 3 in both antecedents and consequents.
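The inclusion requirement above (at least one attribution-level attribute in both the antecedent and the consequent) can be checked mechanically. This sketch assumes hypothetical "type:value" item strings and a hypothetical type-to-level table built from the level descriptions in Section III-C; neither the encoding nor the names come from the study.

```python
from typing import Iterable, Optional

# Hypothetical type-to-level table: Level 1 = IP address, malware type, hash value,
# port number (domain name and URL are also listed as Level 1 in the text),
# Level 2 = geolocation, Level 3 = TTP attributes such as those from MITRE.
LEVEL_OF_TYPE = {
    "ip": 1, "malware": 1, "hash": 1, "port": 1, "domain": 1, "url": 1,
    "geo": 2,
    "tactic": 3, "technique": 3, "procedure": 3,
}


def level_of(item: str) -> Optional[int]:
    """Return the attribution level of an item encoded as 'type:value', or None."""
    attr_type = item.split(":", 1)[0]
    return LEVEL_OF_TYPE.get(attr_type)


def has_attribution_attribute(side: Iterable[str]) -> bool:
    """True if at least one item on this side of the rule belongs to an attribution level."""
    return any(level_of(item) is not None for item in side)


def qualifies(lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """A rule qualifies only if both antecedent and consequent carry an attribution attribute."""
    return has_attribution_attribute(lhs) and has_attribution_attribute(rhs)
```

This check is necessary but not sufficient: as the discussion of Table VIII shows, some rules that pass it were still judged not meaningful by the analysts.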
To further analyze the evaluation and validation findings for each association ruleset in Table XI, this paper summarizes the ALGR and IM ranges for DS1 to DS12 in Table XII. Table XII shows the range of IM values captured from the strongest association rules generated using the general Equation (1), the thresholds used to generate the ruleset and the ALGRs found in DS4 to DS12. The ranges of support, confidence and lift in Table XII were used to validate and verify which strong association rules should be included in an ALGR. Support measures the usefulness of an association rule based on how frequently its itemsets occur together in the data transactions. Confidence indicates whether a generated association rule is reliable and valid for decision making. Lift complements these two IMs by measuring the importance of a rule for the purpose of this research, namely performing the cyberattack attribution process in CTI. Once the strong association rules that met the minsup and minconf thresholds were identified, each was included in the respective ALGR based on the attributes present in it. The steps for classifying the association rules into ALGRs are explained in the following paragraphs. Table XII showed that the ruleset found in this research was effective in performing cyberattack attribution because it identified all ALGRs, each of which is mapped to a different CTI type, as discussed for Table IV and Table V. Each CTI type is used by an organization for a specific purpose in preventing cyberattacks. For example, ALGR1 and ALGR2 were mapped to tactical intelligence; hence, their outputs can help an organization deal with threat indicators and prioritize vulnerability patches quickly and accurately.
ALGR3 was mapped to operational intelligence, and its output can improve the detection rate and prevent future incidents because attacks can be seen in a clear context. The combined outputs of ALGR1, ALGR2 and ALGR3 were mapped to strategic intelligence, which drives an organization's decision making on security countermeasures and areas of improvement based on insights into current attack trends and their financial impact on the organization.
The results of the evaluation and validation from the experimental approach are presented in Table XIII, which lists the top five association rules for each interestingness measure (IM), namely support, confidence and lift, filtered and ranked into their respective ALGRs. The ALGR grouping reveals hidden information about the attribution level behind the rules, which can help security analysts perform the cyberattack attribution process in CTI.
The association rules in Table XIII show how the attributes on the LHS imply the attributes on the RHS. For example, the rule {195.38.137.100, gamarue} ⟹ {22} indicates that IP address 195.38.137.100 was infected by gamarue and had been used by an attacker to launch an attack using port 22. Such a rule reveals the relationship between attributes and guides security analysts on the role of each attribute in the cyberattack, which can help them plan mitigation actions.
Table XIII also shows how the association rules were divided into specific ALGRs through the IMs. Each rule was assigned to an ALGR based on the attributes it contained. Table IV describes the attributes in each attribution level and was used as the reference for detecting which attribution-level attributes are present in each rule. Identifying the attributes in a rule helps security analysts verify what type of attribution each rule achieves. For example, the rules in row four of Table XIII, ranked by confidence, reliably provided attribution information on IP address, malware type, hash value and port number. Because rules selected by confidence are valid and reliable, the attributes they contain can be used by a security analyst for further investigation.
In addition, Table XIII summarizes the association rules in their respective ALGRs. The classification into ALGRs followed the discussion of Table IV and Table V. For example, a rule was classified into ALGR1 based on the presence of a Level 1 attribute, such as an IP address, hash value, malware type, domain name or URL, on the LHS or RHS of the rule. ALGR2 required attributes from both attribution Level 1 and Level 2, where geolocation is the Level 2 attribute.
In contrast, classification into ALGR3 required attributes from attribution Levels 1, 2 and 3 to occur in the rule. However, there was one exception: TTPs alone were sufficient to place a rule in ALGR3, because TTPs provide context to the rule through the tactics, techniques and procedures used by an attacker to launch the cyberattack.
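The ALGR classification rules described above can be sketched as a small function. It assumes the same hypothetical "type:value" item encoding and type-to-level table as a labeling aid; these are illustrative choices, not the study's implementation. The first test rule is the example rule quoted from Table XIII.

```python
from typing import Iterable, Optional

# Hypothetical type-to-level table derived from the attribution level descriptions.
LEVEL_OF_TYPE = {
    "ip": 1, "malware": 1, "hash": 1, "port": 1, "domain": 1, "url": 1,
    "geo": 2,
    "tactic": 3, "technique": 3, "procedure": 3,
}


def classify_algr(lhs: Iterable[str], rhs: Iterable[str]) -> Optional[str]:
    """Assign a rule to ALGR1, ALGR2 or ALGR3 from the attribution levels of its items."""
    items = set(lhs) | set(rhs)
    levels = {LEVEL_OF_TYPE.get(item.split(":", 1)[0]) for item in items} - {None}
    if 3 in levels:
        # Levels 1, 2 and 3 together, or TTPs alone (the stated exception), give ALGR3.
        return "ALGR3"
    if {1, 2} <= levels:
        return "ALGR2"
    if 1 in levels:
        return "ALGR1"
    return None  # no qualifying combination of attribution-level attributes
```

Because the TTP exception makes any rule with a Level 3 attribute an ALGR3 rule, checking for Level 3 first covers both ALGR3 conditions, and the highest applicable group wins.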
The results in Table XIII indicate that the association ruleset formulated by the proposed CTI-AR can help security analysts make decisions about cyberattack attribution; the details of the validation result are characterized in Table XIV.
[Table XIV, partial row: Lift | 3 | Technique and tactic were found]
Using the characteristics shown in Table XIV, the CTI-AR was validated, as summarized in Table XV.
Table XV indicates that the proposed CTI-AR satisfied all of the characteristics. The proposed CTI-AR was capable of generating the association ruleset from the frequent itemset process, identifying the relationships between attributes in the association ruleset, and identifying the threat attribution level of each rule together with the attributes in that attribution level. Based on the association ruleset and attribution level, the proposed CTI-AR was capable of performing the cyberattack attribution process in CTI. These findings were then compared with the findings of association rule mining (ARM) in the existing CTI framework to validate the proposed CTI-AR, as discussed in Table XVI.
Table XVI compares association rule mining in the existing CTI framework with the proposed CTI-AR. Based on the characteristics, ARM in the existing CTI framework is able to identify the attribution level but is unable to classify and identify the complete list of attributes that belong to each attribution level. In contrast, the proposed CTI-AR is more capable of performing cyberattack attribution, not only by finding the relationships between attributes but also by providing additional information on the attribution level and the attributes at that level.