Early Detection and Defense Countermeasure Inference of Ransomware based on API Sequence

—Currently, ransomware attacks have become an important threat in the field of network security. The detection and defense of ransomware has become particularly important. However, due to the insufficient data and behavior patterns collected dynamically to detect variants and unknown ransomware, there is also a lack of specialized defense strategies for ransomware. In response to this situation, this article proposes a ransomware early detection and defense system (REDDS) based on application programming interface (API) sequences. REDDS first dynamically collects API sequences from the pre-encryption stage of the ransomware, and calculates the API sequences as feature vectors using the n-gram model and TF-IDF algorithm. Due to the limitations of dynamic data collection, API sequences were enhanced using Wasserstein GAN with Gradient Penalty (WGAN GP), and then machine learning classification algorithms were used to train the enhanced data to detect ransomware. By mapping the malicious API of ransomware to public security knowledge bases such as Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK), a Ransomware Defense Countermeasures Ontology (RDCO) is proposed. Based on the ontology model, a set of inference rules is designed to automatically infer the defense countermeasures of ransomware. The experimental results show that WGAN-GP can more effectively enhance API sequence data than other GAN models. After data augmentation, the accuracy of machine learning detection models has significantly improved, with a maximum of 99.32%. Based on malicious APIs in ransomware, defense countermeasures can be inferred to help security managers respond to ransomware attacks and deploy appropriate security solutions.


I. INTRODUCTION
In recent years, ransomware attacks have sharply increased, especially during the COVID-19 pandemic.This is because many companies and organizations require remote work, and their network security measures may not be sufficient to cope with the risks brought by remote work.In addition, ransomware developers are using increasingly complex technologies and means, making it more difficult to defend against ransomware.In recent years, some famous ransomware attacks have included WannaCry, Petya/NotPetya, and Ryuk.These attacks have caused significant economic losses and data breaches, affecting thousands of organizations and individual users.These events have elevated ransomware to a level of national security concern, prompting the US Department of Justice to classify such attacks as terrorist attacks [1].
Ransomware attacks can be divided into two categories: encrypted ransomware and locked ransomware.The characteristic of encryption ransomware is its ability to encrypt files, making it impossible for victims without decryption keys to move.Lock screen ransomware can lock the screen or operating system of the victim's host after infecting it [2].Currently, among the publicly available ransomware, encrypted ransomware is the most common type [3][4][5][6].Therefore, this article focuses on encrypted ransomware as the research object, and the "ransomware" mentioned below when not specifically mentioned are all encrypted ransomware.
After the ransomware completes the encryption of specific files on the victim host, these files on the victim host will lose availability and be difficult to decrypt without paying a ransom [7].Even if file recovery can be achieved through early backup, there may still be situations where information systems or files cannot be used normally during the recovery period, which affects the normal operation of enterprise business [8].Therefore, it is particularly important to accurately detect and defend against ransomware before encrypting files.After the ransomware infects the host, it will encrypt the files in the infected host as soon as possible, so in a short period of time, the ransomware will perform a large number of operations [9], calling a large number of application programming interfaces.Taking API sequences as the analysis object, detecting and responding to targeted defense strategies in the early stages of ransomware operation is a feasible technical means.However, traditional malware dynamic detection methods often only collect limited API sequences of ransomware, lacking sufficient data and behavior patterns to detect the increasing variety of ransomware and unknown ransomware.At present, there is a lack of defense measures against ransomware, and relying solely on data backup is not enough to cope with the continuous emergence of ransomware and its variants.To address this challenge, a combination of ATT&CK framework and D3FEND can be used to combat ransomware attacks.The ATT&CK framework is a knowledge base developed by MITRE company for describing and organizing network attack techniques.This framework provides rich information about the behavior of attackers, including the various stages and potential attack modes of ransomware attacks.By combining the knowledge of the ATT&CK framework, it is possible to better understand the behavior patterns of ransomware and establish more effective defense strategies.D3FEND is a framework used to describe and organize defensive measures.It provides defense www.ijacsa.thesai.orgstrategies and control measures for different ATT&CK attack technologies.By comparing with the D3FEND framework, it can be ensured that appropriate security measures have been taken and loopholes in existing defense strategies can be filled.
In response to the above issues, this article proposes a ransomware early detection and defense system (REDDS) based on API sequences.In order to address the limitations of the API dataset collected by ransomware for dynamic analysis of samples, this article innovatively applies WGAN-GP [10] technology to ransomware detection.WGAN-GP has better training stability and better generalization ability than other Generative adversarial network, and the data quality generated by WGAN-GP is higher.In addition, this paper conducts indepth research and analysis on the relationship between the specific API of ransomware and security knowledge sources such as ATT&CK, builds a ransomware defensive countermeasures ontology (RDCO), and designs Rule of inference to derive the defense countermeasures of ransomware.Starting from APIs to defend ransomware provides a new idea and method for the defense of ransomware, and has broad application prospects in practical applications.REDDS collects a certain number of preencrypted API sequences from normal software and ransomware, and generates feature vectors through n-gram [11] and Term Frequency Inverse Document Frequency (TF-IDF) [12].WGAN-GP is used to enhance API sequence data, and machine learning classification algorithms are used to train pre and post enhancement data to detect ransomware.And Rules of inference are used to infer the defense countermeasures against the malicious API of ransomware, as well as the timely response and defense of ransomware.The main contributions of this article are as follows: 1) A novel early detection and defense system for ransomware based on API sequences has been proposed.The system uses n-gram and TF-IDF to generate API feature vectors, and utilizes WGAN-GP for data augmentation, thereby improving the robustness and accuracy of early detection models.
2) REDDS constructs a ransomware defense ontology by analyzing the mapping relationship between malicious APIs in ransomware and attack techniques in ATT&CK.The ontology integrates multiple security knowledge bases, and designs Rule of inference from the malicious API to derive defense countermeasures against ransomware attacks.
3) In the experiment, 86 ransomware samples from 24 different families and 40 normal software were used to evaluate REDDS.The results indicate that WGAN-GP generates better API data quality than other GAN models, and data augmentation significantly improves the accuracy of the detection model.It can also infer defense countermeasures against malicious APIs.The remainder of this paper is organized as follows.Related work is presented in Section Ⅱ. Section Ⅲ presents the system design of REDDS.Section Ⅳ constructs the RDCO ontology and ontology-based reasoning defense countermeasures, and Section Ⅴ presents the experimental results and evaluation of the system.Finally, Section Ⅵ concludes the paper.

II. RELATED WORK
The huge losses and harms caused by ransomware to enterprises or organizations have become the main security threat to the cyberspace.In order to respond to ransomware attacks and ensure the security of the network environment, the detection and protection of ransomware has become a hot topic in the field of cyberspace security.Based on the theme of this article, this section briefly summarizes the application of early detection of ransomware, generation of adversarial networks in the field of security, and related research in the field of ontology-based malware analysis.

A. Early Detection of Ransomware
Due to the fact that ransomware quickly encrypts files after infecting a host, early detection is of great significance for file protection and timely defense of infected hosts.Morato et al. [13] used SMB traffic in network shared volumes for early detection of ransomware, using 19 different series of ransomware to test the practical application of algorithms.Chen Changqing et al. [3] proposed the concept of critical time period (CTP) for ransomware detection, using machine learning for early detection based on 78 API short sequences called by ransomware during execution within CTP.Al Rimy et al. [14] applied feature selection technology based on mutual information to early detection of ransomware.This method can obtain good detection accuracy by using features extracted and selected from the behavior data of limited ransomware at the initial stage of operation.Mehnaz et al. [15] proposed a ransomware detection mechanism that detects encrypted ransomware in real-time by deploying bait technology, carefully monitoring running processes and malicious activities in file systems, and evaluating the system using samples from 14 of the most popular ransomware families.Roy et al. [16] applied BiLSTM and fully connected layers to simulate the normal state of hosts in operating enterprise systems, and detected ransomware from a large amount of environmental host log data.Execute 17 ransomware attacks on the victim host, and use the infected host log to record data validation to verify the system.The above work achieved good detection accuracy in the early detection of ransomware, but lacked sufficient data to verify the behavior of ransomware in the pre-encryption stage, and did not involve defense measures against ransomware.

Field of Security
In the field of security, the limitations of dynamically collecting attack data can lead to limited or unbalanced datasets.Therefore, some studies have applied GAN for data augmentation in their work.Liu et al. [17] proposed an intrusion detection method based on the oversampling technology of WGAN-GP and feature selection, conducted experiments on three intrusion detection data sets, and improved the detection performance of the machine learning model, but their work did not evaluate the samples generated by WGAN-GP.Hu et al. [18] used the algorithm MalGAN based on generative adversarial networks (GANs) to generate www.ijacsa.thesai.orgadversarial malware examples, which can bypass detection models based on black box machine learning.Yilmaz et al. [19] proposed an intrusion detection method based on GAN and MLP.GAN was used to generate three types of attacks in the UGR 16 dataset to balance the dataset.The experimental results indicate that GAN's balanced attack sample dataset produces more accurate results than the imbalanced attack sample dataset.Wu Yangming et al. [20] used a deep learning intrusion detection method based on WGAN-GP data augmentation and combined deep belief networks and extreme learning machines.The proposed algorithm was tested using the CICIDS2017 dataset, and comparative experiments showed that the method had higher accuracy and faster convergence speed.In this paper, we build an API sequence generative model based on WGAN-GP ransomware to generate high-quality API sequence samples.Evaluate the generation ability of WGAN-GP by comparing it with other GAN models, and finally prove through comparative experiments that the detection results after using WGAN-GP are better.

C. The Use of Ontology in the Field of Malicious Software Research
Ontology is used to describe information objects for domain knowledge sharing and reuse [21].In the field of information security, ontology has been used to characterize the behavior of malware, and build the knowledge representation of malware [22], so as to realize the analysis and reasoning of malware.Rastogi et al. [23] designed a malware ontology called MALOnt, which includes concepts such as malware features, attack behavior, and detailed information about attackers.It supports collecting intelligence on malware threats from different online sources and constructs a knowledge graph framework based on MALOnt.Ding et al. [24] designed conceptual classes and object attributes for malware, and proposed methods to represent the semantics of malware behavior.And use the Apriori algorithm to classify the malware family.This article designs a ransomware defense strategy ontology, which includes information about the characteristics of ransomware malicious APIs and related defense strategies.Design inference rules based on semantic web rule language, use rule inference engines to infer the knowledge, determine which APIs are malicious, and generate corresponding defense strategies for these APIs.In this way, when a ransomware attack occurs, the system can defend against the attack based on the inferred countermeasures.

III. RANSOMWARE EARLY DETECTION AND DEFENSE SYSTEM
This article proposes an early detection and defense system for encrypted ransomware in the pre-encryption stage based on API sequences, as shown in Fig. 1.The system focuses on API calls during the pre-encryption phase after starting operation.In order to compensate for the insufficient data collected during the pre-encryption phase of ransomware attacks, WGAN-GP's data augmentation method is used to generate API sequence samples, and the enhanced API sequence dataset is used to train the detection model.For the defense of ransomware, by mapping the API of ransomware with ATT&CK technology, integrating ATT&CK, D3FEND, and other security knowledge sources, and constructing a ransomware defense strategy ontology, the defense strategy is inferred based on the API of ransomware for reference by security personnel.REDDS includes four modules: API sequence collection, feature calculation, data enhancement and detection, and defense strategy inference.Below are detailed explanations of these four modules.

A. The Pre-Encryption Phase of Ransomware Collects API Sequences
The attack process of ransomware can be divided into four stages: system infection, file traversal and encryption, ransom collection and file decryption [14], among which the encryption operation was previously referred to as the preencryption stage.In the system infection stage, the ransomware infected the victim host through phishing, web hanging horses, exploit and other ways.After infecting the system, ransomware usually performs a series of operations such as file search and identification, file encryption, etc. [6].Once these operations are completed, all specific files in the infected host will be encrypted; During the ransom stage, the victim is informed of how to pay the ransom through ransom text; After the victim user pays a ransom, they enter the file decryption stage, and the attacker releases the decryption device to decrypt the encrypted file.In reality, it would be meaningless to check out all files on the victim host after the ransomware completes encryption [25].Therefore, how to accurately detect ransomware during its pre-encryption stage is a key issue in ransomware defense.
According to the execution process of the ransomware mentioned above, before 2.2 in Fig. 2 is the pre-encryption stage.The ransomware will involve a large number of API calls during the pre-encryption stage, including file read and write, network communication, system information acquisition, etc.The number of APIs that normal software runs during the same time period is much smaller than the number of APIs called by ransomware.Therefore, this article believes that the number of APIs executed by software within a period of time after it starts running can be used to distinguish between normal software and ransomware.
In this module, REDDS will collect the API sequences called during the pre-encryption phase after the ransomware www.ijacsa.thesai.orgstarts running, as well as the API sequences of the benign software, and use them as inputs for the next phase.Run samples of ransomware and benign software in the CuCKoo sandbox or other sandbox, with a running time of 60 seconds to ensure that the API sequence of ransomware can be fully captured during the pre-encryption phase.And use API Monitor to monitor the API sequence of these software, record the API sequence of each ransomware before the first call to the encryption class API and the API sequence of benign software.Ransomware typically utilizes the encryption API provided by the Windows system to implement file encryption and decryption functions.The Windows system provides multiple encryption API libraries, and attackers can choose the appropriate encryption API library to implement encryption functions according to their own needs.There are five commonly used encryption API libraries: BCrypt, CryptoAPI, DPAPI, Ncrypt, and Schannel.These five encryption API libraries have different functions to implement various functions.The first called APIs in the five encryption API libraries are shown in Table I.REDDS takes these APIs as the pre-encryption boundaries and collects the API sequences in the pre-encryption stage.The collected API sequences are deduplicated, and the deduplicated API sequences are used as data sets for subsequent analysis and research.

B. API Sequence Feature Calculation
In this stage, REDDS uses the n-gram algorithm to segment API sequences and calculates corresponding eigenvalues using the TF-IDF algorithm.The application of the n-gram algorithm preserves the adjacency of APIs in the generated feature vectors.After processing API sequences through n-gram algorithm, TF-IDF algorithm in natural language processing is used to calculate eigenvalues to generate eigenvectors.Use (1) to calculate the TF value of the n-gram sequence for ransomware and normal software.
In summary, REDDS treats the API sequences collected in the pre-encryption stage of ransomware as a whole, and obtains the corresponding feature vectors through n-gram and TF-IDF algorithms.

C. Data Enhancement and Detection Based on WGAN-GP
The dynamically collected API sequences are limited and incomplete, which will affect the detection of ransomware variants and unknown ransomware.To this end, we propose a data enhancement algorithm based on WGAN-GP, which satisfies the Lipschitz restriction by applying an independent gradient penalty (GP) to each sample.This allows the weights of neurons to be distributed more evenly during the backpropagation process, helping the detection model to better understand the potential characteristics and behaviors of ransomware, effectively improving the accuracy, robustness and generalization ability of the ransomware detection model.
The API sequence data enhancement method based on WGAN-GP introduces Wasserstein distance on the basis of GAN model, and its distance formula is shown in equation ( 4).
In the formula: is the real sample, and is the generated sample.∏( ) is the joint probability distribution set of the real sample probability distribution and the generated sample probability distribution .
The use of Lipschitz weight pruning in WGAN limits the range of weight parameters in the discriminator sub network, resulting in extreme network parameters that greatly limit network performance and weaken the fitting ability of the neural network.Moreover, the scope of weight pruning is mainly based on expert experience, and setting network parameters through manual determination can easily lead to inappropriate parameter settings, leading to the phenomenon of gradient explosion and vanishing again.Therefore, WGANwww.ijacsa.thesai.orgGP introduces the Gradient Penalty (GP) term to improve the impact of the weight clipping constraint parameter used in WGAN to meet the 1-Lipschitz continuity condition on the network.Its loss function is expressed as follows: , ( ( ))- , ( ( ))-, ( )- In the formula: is the generator loss function, is the discriminator loss function, ( ) is the sample generated by the generator, is a prior distribution of random noise . is a gradient penalty term, Represents the penalty coefficient; represents the distribution of the entire sample space, that is, the sum of the generated and real sample distributions; ( ) represents the gradient of the discriminator, and || || is the P-norm.The schematic diagram of WGAN-GP structure is shown in Fig. 3. Through continuous training of the WGAN-GP network, the data obtained by the generator is closer to real data, but the final discriminator cannot recognize whether the input data comes from actual data or generated data, thus achieving Nash equilibrium and forming a good generation model [26].
The specific training process for data augmentation and detection based on WGAN-GP is as follows: 1) Input the feature vectors obtained from feature calculation into the generator; 2) Fix the parameters of generator G, and adjust the parameters of generator according to formula (5) to minimize the loss function of generator ； 3) Fix the parameters of generator D, and adjust the parameters of discriminator according to formula (6) to minimize the discriminator loss function ； 4) Cycle through the above two steps until the training network achieves Nash equilibrium, and then stop training.
5) Expand the original collected API sequence using WGAN-GP and train the detection model to detect ransomware.

IV. RDCO ONTOLOGY CONSTRUCTION AND DEFENSE COUNTERMEASURE REASONING
Starting from the API called by the ransomware, we map some specific malicious APIs to ATT&CK technology.Based on the work of integrating multi-source security knowledge [21], malicious APIs and multi-source security knowledge are mapped.However, the link between ransomware and defense countermeasures is implicit.Ontology can discover new relationships from known knowledge through the formal description of knowledge in a specific domain, and the design of reasoning rules based on the ontology.Therefore, this paper proposes a defense countermeasure reasoning method for ransomware based on semantic ontology and rule logic, and infers corresponding defense countermeasures for the APIs called in the ransomware.

A. Ransomware API mapping Multi-source Security Knowledge
The existing ransomware that can infect Windows systems is operated by calling a large number of Windows APIs.By analyzing the attack process of ransomware and the technologies under the Windows platform in enterprise ATT&CK, some common APIs are mapped to corresponding technologies in the ATT&CK framework.For example, the widely used defense avoidance technique in ransomwareprocess injection, which runs custom code in the address space of another process.Process injection improves invisibility, and some technologies also achieve persistence [27].There are many types of process injection techniques.This example analyzes the classic malicious dynamic link library (DLL) injection technique, and its corresponding ATT&CK technique is T1055.001.The injection process of DLL is shown in Fig. 4, where the ransomware calls VirtualAllocEx to obtain a space to write the path to its DLL.Then, ① call WriteProcessMemory to write the path to the allocated memory.In order for the code to execute in another process, ② will call APIs such as CreateRemoteThread, NtCreateThreadEx, or RtlCreateUserThread.Finally, ③ successfully injected the ransomware code into the target process.Through the above analysis, the T1055.001technology can be mapped to the API during the DLL injection process.Map 47 technologies with their corresponding APIs based on the www.ijacsa.thesai.orgdata components (operating system API execution) in the ATT&CK framework.At the same time, CAPEC focuses on the attack pattern, linking the tactics and techniques in ATT&CK with the weakness of the attack target, and using the weaknesses in CWE to link to vulnerabilities in CVE.And integrate defense technology in D3FEND through digital artifacts.Fig. 5 depicts the relationship links mapped according to the selected knowledge sources.Through this end-to-end link method, security researchers can not only analyze potential threats to the system, but also defend against ransomware attacks based on defense strategies.

B. Build Ontology and Design Inference Rules
Based on the mapping between security knowledge and malicious APIs of ransomware, a Ransomware Defensive Countermeasures Ontology (RDCO) is proposed.The ontology model consists of classes, attributes, and relationships between classes and individuals [28].RDCO includes seven categories: ransomware behavior, offensive techniques, digital artifacts, defensive countermeasures, attack patterns, weaknesses, and vulnerabilities.Classes and the relationships between classes are shown in Fig. 6.In order to better infer defense countermeasures and vulnerabilities from the RDCO ontology, this paper uses the semantic inference rule description language recommended by W3C to design inference rules for SWRL (semantic web rule language) [29].To implement SWRL rules in an ontology, it is necessary to write and describe the reasoning conditions (Antecedent) and the reasoning results (Consequence) in language.A single or multiple basic atoms (which are already knowledge constructed in the ontology) form each SWRL rule in the reasoning, and represent it as a logical "AND" relationship through "^".The connection between the reasoning conditions and results is made through "→", Create rules for expanding existing OWL languages, where the formal expression of SWRL rules is: Equation ( 8) ， is a fundamental atom, and their form can be represented as C (x), P (x, y), or (x, y), where C is an OWL description, P is an OWL attribute, and x and y can be OWL instances or OWL data values.Only when atoms have been constructed in the ontology and are all true can SWRL rules be effectively set.Using SWRL rules, new relationships can be discovered from known knowledge.As shown in Table Ⅱ, we designed three inference rules based on RDCO ontology to infer relevant defense strategies for ransomware.Behavior(?b)^mappingAttack(?b, ?t)^Offensive_Technique(?t)^TaxonomyMappings(?t, ?a)^Attack_Pattern(?a)^exploit(?a,?w)^Weakness(?w)^hasVulnerability(?w, ?v) ->exploitVulnerbility(?b, ?v)Behavior(?b)^mappingAttack(?b,?t)^Offensive_Technique(?t)^hasMiti gation(?t,?m) ->useMitigation(?b, ?m)Behavior(?b)^mappingAttack(?b,?t)^Offensive_Technique(?t)^hasArti fact(?t,?da)^Digital_Artifact(?da)^hasDefend(?da,?dt)^defensive_tec hnique(?dt)->useDefendTechniques(?b, ?dt)Rule : Using the attack pattern that CAPEC focuses on as a bridge, associate the attack technology used by the attacker with the weaknesses of the attack target.Due to the fact that weakness in CWE can be linked to vulnerabilities in CVE, it is possible to infer vulnerabilities exploited by ransomware.Knowing the exploited vulnerabilities can help to promptly fix them and prevent ransomware attacks.
Rule : When the malicious API of ransomware is known, by mapping the API to ATT&CK technology, each ATT&CK technology in the ATT&CK framework has its corresponding mitigation measures, thereby automatically extracting mitigation measures related to ransomware.
Rule : In addition to inferring mitigation measures in response to ransomware attacks, the API of ransomware is mapped to ATT&CK technology, using digital artifacts as a medium to automatically infer defense technologies in D3FEND corresponding to ransomware in response to ransomware attacks

V. EXPERIMENTS AND EVALUATION A. Experimental Dataset
There is currently no publicly available ransomware standard dataset in the academic community.This article builds a Cuckoo sandbox, in which the Windows 7 operating system is used as the software runtime environment.Run the sample for 60 seconds and collect the API sequence called during software runtime.The ransomware samples used in the www.ijacsa.thesai.orgexperiment were collected from Virusshare and Any.Run, and the normal samples were collected from 360 software stewards.See Table Ⅲ for the types and quantities of the experimental samples.The versions of different samples in the same ransomware sample family are different.The normal software covers antivirus software, office software, chat tools, download tools, video software, browsers, input methods and other common types.
This article truncates the collected API sequences and only studies the API sequences before encrypting them.After analysis, it was found that a file operation of ransomware often requires calling four APIs, such as CreatFile, WriteFile, HWrite, LClose, etc., to quickly create a file.Therefore, REDDS uses the 4-gram algorithm in feature calculation, and then calculates the eigenvalues through TF-IDF, which can make the generated feature vectors best match the features of the API sequence.After repeated debugging, it is determined that the discriminator uses a three-layer neural network structure, and each layer is followed by a hyperbolic tangent activation function (Tanh).The generator adopts a 4-layer neural network structure, and ReLu activation function is used behind each layer.The optimization solver uses Adam, the learning rate of the generator and discriminator is set to 0.0005, and the training round is 10000.The loss function curve of generator and discriminator during training is shown in Fig. 7.The API sequence data enhancement method based on WGAN-GP is used to enhance the API sequence of ransomware and the API sequence of normal software respectively.The API sample library before and after augmentation is shown in Table Ⅳ.The authenticity, probability fitting accuracy, and diversity of generated samples are important indicators for evaluating the generation ability of a model.This paper includes two generative model, BEGAN (boundary equilibrium GAN) [30] and LSGAN (Least Squares GAN) [31], as reference objects to evaluate the generation capability of WGAN-GP model.Taking the API sequence samples of ransomware generated by WGAN-GP, BEGAN, and LSGAN models as examples, t-distributed stochastic neighbor embedding (t-SNE) algorithm is used to perform dimensionality reduction analysis on real samples and partially generated samples of each model.The visualization results are shown in Fig. 8. From Fig. 8, it can be clearly seen that the samples generated by the BEGAN and LSGAN models are concentrated within a small range near the real sample, while WGAN-GP, due to considering the 1-Lipschitz condition limitation, generates scattered samples within the real sample distribution range, with good diversity.Kernel Perception Distance (KID) is used to evaluate the authenticity of the generated samples, and KID uses Gaussian kernel functions to measure the similarity between the real dataset and the generated dataset in the Perception feature space.The KID values of the samples generated by the three GAN models are shown in Table Ⅴ.It is evident from Table Ⅴ that the KID values of the WGAN-GP generated samples are significantly lower than those of the LSGAN and BEGAN models.The smaller the KID value, the closer the generated sample is to the real sample.

C. Detection Model
The expanded API sequence dataset was divided into a training set and a testing set in a ratio of 70% to 30%.The detection performance of five classification algorithms, namely Support Vector Machine (SVM) [32], RF [32], Naive Bayes (NB) [32], K-Nearest Neighbor (KNN) [33], and MLP [34], was further compared before and after using WGAN-GP data augmentation.This article uses commonly used accuracy, precision, recall, and F1 score (F1) in the field of malware detection as performance evaluation indicators [35].Fig. 9 shows the accuracy of the detection models before and after using WGAN-GP data augmentation.It can be seen that the accuracy of all five detection models has increased after using data augmentation.Among them, MLP's accuracy rate increased the most, by 10.12%.RF has the highest accuracy, reaching 99.23% after data enhancement.Table Ⅵ shows the P, R, and F1 values of the five detection models before and after using WGAN-GP data augmentation.From the results in Table Ⅵ, it can be seen that among the five classification algorithms used in this article, the detection results obtained using RF and MLP are superior to the other three algorithms.The experimental results indicate that using WGAN-GP data augmentation can significantly improve the performance of the target detection model.By using WGAN-GP data augmentation, we can utilize more data to train and optimize the model, thereby improving its accuracy and robustness.

D. Defense Countermeasure Reasoning
HermiT general rule inference engine is used to describe the proposed inference rule set, and it is integrated in the RDCO ontology model.The model is based on the security domain knowledge base and the condition pattern in the reasoning rule set for reasoning.Ransomware often exploits system or software security vulnerabilities to gain system privileges and install ransomware on the victim's computer.Finding and fixing these vulnerabilities can help us better defend against ransomware attacks.
After detecting the ransomware, the inference engine passes the inference rule Infer the associated security vulnerabilities based on the behavior of specific ransomware.Taking the API of ransomware calling ChangeServiceConfigW and CreatServiceW as an example, new implicit facts are inferred according to the rules.The inference engine associates the specific API of ransomware with the security vulnerabilities that may be exploited in the system through the object attribute exploitVulnerability.To clearly display the specific reasoning results, Neo4j is used to visualize them in the form of a knowledge graph.As shown in Fig. 10, the yellow arrow in the figure represents the inferred relationship exploitVulnerability. www.ijacsa.thesai.org

VI. CONCLUSION
This article proposes an early detection and defense system for ransomware based on API sequences.To address the limitations of collecting datasets, API sequences were enhanced based on Wasserstein GAN with Gradient Penalty (WGAN GP), and then machine learning classification algorithms were used to train the enhanced data to detect ransomware.The detection accuracy was significantly improved after using WGAN-GP data augmentation, with the RF algorithm having the highest accuracy of 99.23%.By mapping the malicious APIs of ransomware to public security knowledge bases such as ATT&CK, a Ransomware Defense Countermeasures Ontology (RDCO) is proposed.Based on the ontology model, a set of inference rules are designed, and corresponding defense Countermeasures are automatically inferred based on the malicious APIs in ransomware, helping security managers respond to ransomware attacks and deploy appropriate security solutions.In the follow-up work, we will be committed to the detection and defense of ransomware attacks in the field of industrial internet of things.
the frequency of n-gram appearing in n-gram sequence , where ( ) represents the number of times n-gram appears in n-gram sequence , ∑ ( ) represents the total number of n-grams in the ngram sequence .After calculating the frequency of an n-gram, calculate its IDF using equation (2). the rarity of n-gram in the entire n-gram sequence, and represents the set of all ngram sequences.* | + represents the number of ngram sequences containing n-gram .After calculating the ( ), use equation (3) to complete the calculation of TF-IDF.

Fig. 10 .
Fig. 10.Security vulnerability reasoning results.Map specific APIs called by ransomware to ATT&CK technology, and each ATT&CK technology has corresponding mitigation and defense measures.By inference rule and can automatically infer defense strategies targeting specific APIs, helping security managers respond to ransomware attacks and deploy appropriate security solutions.Taking the APIs called by the aforementioned ransomware, ChangeServiceConfigW and CreatServiceW, as examples, the specific APIs of the ransomware are associated with mitigation measures and defense technologies through the object properties useMigration and useDefendTechniques.The specific reasoning results are shown in Fig. 11, where the orange and red arrows represent the inferred relationships useMitigation and useDefendTechniques, respectively.

TABLE I .
API CALLED FOR THE FIRST TIME IN BCRYPT, DPAPI, NCRYPT, SCHANNEL, AND WINCRYPT ENCRYPTION API LIBRARIES

TABLE II .
INFERENCE RULE TABLE

TABLE IV .
API SAMPLE LIBRARY BEFORE AND AFTER DATA AUGMENTATION

TABLE V .
KID VALUES OF SAMPLES GENERATED BY THREE GAN MODELS

TABLE VI .
THE P, R, AND F1 VALUES OF THE FIVE DETECTION MODELS BEFORE AND AFTER USING WGAN-GP