Detection of Anomalous In-Memory Process based on DLL Sequence

The use of computer systems to keep track of day-to-day activities on single-user systems, and to implement business logic in enterprises, is the demand of the hour. Such systems make information available in one click, improve business processes, and influence profit or loss. At the same time, they are always exposed to threats from unauthorized users and from untrusted or unknown applications. Typically, a host is intended to run only a list of known or trusted applications chosen according to the user's preference. Any application outside this trusted list can be called an untrusted or unknown application, which is not expected to run on that host. Untrusted applications reach a host from sources such as websites, emails, and external storage devices. Such programs may be malicious or non-malicious, but their presence must be detected because they are not trusted from the user's point of view. They may target the system either to steal valuable information or to degrade its performance without the user's knowledge. Antimalware vendors help defend systems against malicious programs, but they do not take the user's trusted program list into consideration. Moreover, new instances of attacks are discovered very frequently. Hence there is a need for a system that can defend itself against anomalous activities with reference to a trusted program list. This paper proposes the design of an "Anomalous In-Memory Process detector based on the use of the DLL (Dynamic Link Library) sequence". The detector accounts for the trusted programs intended to run on a particular host. It builds a knowledge base of classes of processes using a learning approach that combines TF-IDF (Term Frequency-Inverse Document Frequency) weighting with multinomial logistic regression. This knowledge base is used to map a suspected In-memory process to a class of processes using its loaded DLLs.
Using a cross-validation approach, the suspected process is compared with the processes of its predicted class to conclude whether it is a trusted process, a variant of a trusted process, or an untrusted process for that host. An untrusted program is not necessarily malware; it may simply be a program that is not in the trusted program list of the specific host. Hence this work aims to detect anomalies with respect to the list of trusted applications chosen by the user, by performing a dynamic analysis of In-memory processes.


I. INTRODUCTION
In the 21st century, the use of computers has become commonplace in all fields, from the banking, education, and health sectors to e-commerce. The use of computers is not limited to such large domains. Individuals also use them at home, and small offices and retail counters use them to keep track of day-to-day activities. In large commercial sectors, at small retail counters, and in personal use alike, computer usage increases day by day with the availability of Internet facilities.
On the other hand, the risk of exploitation of the data and information kept on computers also increases day by day, because Internet connectivity exposes computers to the outside world. Skilled programmers manage to place a piece of code (a small program that is unknown or untrusted) on a target computer. Their intention is either to steal or misuse the data kept on it or to make it inoperable. Such programs are referred to as malware or potentially unwanted applications (PUA). There are many categories of malware, such as viruses, worms, spyware, adware, and ransomware. The adverse effect of malware on a computer system ranges from very small to extremely large. PUA do not fall into specific types, as they appear to be normal programs, but their presence may still pose a threat.
The Quick Heal Annual Threat Report 2019 states that the prediction that ransomware would become more vicious came true in 2018. In a single month of 2018, ransomware detections reached 2 million. The prediction that small and medium-sized businesses would be in the red zone also came true. Cryptojacking is a new buzzword that is overtaking ransomware. It refers to using someone else's computer to earn money by mining cryptocurrency. The only sign that a computer is being used for cryptojacking is slightly slower performance while executing programs. Cryptojacking can drive CPU usage up to 100%, which gradually leads to hardware faults, and the owner of the compromised computer remains unaware of being a victim. The report mentions more than 800k detected cases of cryptojacking in 2018. It also describes an infector named W32.Pioneer.CZ1, which injects files onto the disk. It then decrypts the malicious DLL present in the file and drops it to carry out malicious activities. Fig. 1 shows the frequency of attacks of various malware types per day, per hour, and per minute, according to the Quick Heal threat report 2019 [1]. The Symantec Internet Security Threat Report 2019 states that 69 million events were detected in 2018, four times the number of cases in 2017. The report also discusses PUA, which are not necessarily harmful but may lead to security risks. The existence of such PUA may also result in host-based exploits. The report further describes cases of cryptojacking and their consequences. It also mentions supply chain attacks, which target third-party software by injecting code into its libraries; these libraries are then integrated into larger software projects. Injecting code into libraries can be understood as DLL injection, which is a possible approach to exploiting a host [2].
www.ijacsa.thesai.org
The case of the infector W32.Pioneer.CZ1, supply chain attacks, and the possible threat posed by PUA together point to the need for a system that detects host-based exploitation in real time. Antimalware vendors do help detect malicious programs with signature-based static analysis, but they do not take user preferences into consideration. Hence some programs that are unwanted or unknown with respect to the user's preferred list remain undetected. Such unknown programs may be malware or PUA, which pose a possible threat to that host. These observations motivated a multi-class classification approach applied to the known or trusted processes of a host. The approach uses each process's list of loaded DLLs, with reference to the user's preferred list of programs. This knowledge helps detect a deviation from known In-memory processes. Such a deviation may be a malfunctioning known process, an unknown process, or an untrusted process using a potentially unwanted or malfunctioning DLL.
The remaining sections of this paper are organized as follows. Section 2 discusses related work on malware analysis considering In-memory processes and the injection of unwanted DLLs. Section 3 describes the design of the system for detecting anomalies or deviations with respect to In-memory processes and their loaded DLLs. It presents an Anomalous In-Memory Process detector based on the use of DLLs. The detector learns the trusted programs intended to run on a particular host and groups them into multiple classes according to their DLL usage. Using a cross-validation approach, a suspected process is validated against the processes of the class it is mapped to. It is then detected as trusted, a variant of trusted, or untrusted for the specific host. Section 4 describes the experimental setup for the empirical evaluation of the system. Section 5 presents the concluding remarks.

II. RELATED WORK
Analysis of the behavior of unknown programs such as PUA and malware has become highly diversified. Various forms of analysis are performed on a system to identify threats to the information stored on the computer. The analysis may target untrusted programs available on secondary storage or anomalous In-memory processes. Analysis approaches can be classified as static, dynamic, hybrid, or memory-based [3]. Static analysis uses opcodes, N-gram opcode sequences, and control flow graphs as features, without executing the programs. Dynamic analysis uses function calls, API calls, function parameters, instruction traces, and instruction flows as features, after executing the programs [4]. Hybrid analysis combines static and dynamic analysis [5]. Memory-based analysis is also a kind of dynamic analysis. It considers network connection information, changes in registry keys, and In-memory processes and their DLL sequences during the execution of programs [6,7,8]. String analysis of the run-time attributes of benign processes has been found effective for anomaly detection on the Android operating system [9]. Studying malware behavior with memory forensic techniques for malware injection and hidden processes is becoming popular [10]. DLL injection is a process in which a malicious DLL is injected into an In-memory process and control of execution is transferred to that code block [11]. Reflective DLL injection has also gained popularity; it performs malicious activities in memory only, without leaving any footprint [12,13].
A Windows application loads libraries from DLL files at run time. It locates a DLL through a hierarchy of searches: it first tries the given path and, if that fails, searches a predefined set of directories. Malware exploits this search order to load a malicious DLL at run time. In this context, DLL side-loading has become a very popular method for attacking Windows systems [14]. In such cases, the malware payload places a spoofed malicious DLL in a specific location so that it is loaded instead of the legitimate DLL. DLL side-loading bypasses signature-based static analysis. Abusing the DLL load order to load a malicious DLL at run time is also referred to as DLL hijacking. In a variant of this approach, a malware launcher compromises the memory of a victim process and loads the malicious DLL into it by creating a thread. This way of introducing a malicious DLL into a system is referred to as DLL injection. With these approaches, a program loads unintended DLLs because of the side-loading vulnerability of Windows side-by-side manifests [15].
Typically, when malware attacks, it places its payload physically on the system storage, and the payload is then loaded into memory to carry out the malicious activity. In such cases, either traditional static analysis using signature-based detection or dynamic analysis of the various run-time behaviors of processes is helpful. Fileless malware, however, has emerged as a new attack type. It does not save its payload on system storage but instead subverts trusted and legitimate processes of the operating system. It injects the malicious program directly into the memory of the compromised processes without dropping any file onto the file system. As no physical file is present, the sandbox detection approach fails. As no signature can exist, signature-based detection also fails. Hence the detection complexity is very high for fileless malware. Investigating fileless malware is limited to analyzing the behavior of the system using snapshots of In-memory processes, which is considered here as memory-based analysis [16]. Information retrieval theory has been applied with dynamic analysis to extract API calls and system calls for classifying malicious programs. These calls are stored in documents to which TF-IDF weighting is applied, achieving good malware classification accuracy [17].
In this paper, a novel approach is proposed that uses memory-based dynamic analysis of In-memory processes to identify any deviation from the trusted process list of a particular host. The DLL lists of In-memory processes are used to decide whether a suspected process is trusted, a variant of trusted, or untrusted for a specific host. The trusted In-memory processes are classified into multiple classes according to the DLL sequences they use at various instances. A suspected process is mapped into one of the trusted classes based on its DLL sequence. For this multi-class classification, DLL lists are represented as attribute vectors with the Vector Space Model (VSM), and a multinomial logistic regression classifier is trained on the TF-IDF weighted vectors. The objective of training is to prepare a knowledge base of process classes that are considered known, trusted, and legitimate from the viewpoint of a particular user. The system can take an In-memory process at any random instant and predict its class using the learned knowledge base. The cosine similarity metric then cross-validates the suspected process against all processes of the predicted class. Only after this step is the process concluded to be trusted, a variant of trusted, or untrusted for that specific host. In this work, a list of processes is declared trusted based on the user's regular use. A variant of a trusted process is understood as a process of an updated version of an application in the trusted list. Any process that is neither trusted nor a variant of a trusted process is considered untrusted.

A. System Overview
The anomalous In-memory process detection system can be divided into three parts: data preprocessing, the process class prediction model, and cross-validation of the predictor's result. Data preprocessing collects the DLL sequences loaded by all In-memory processes, with reference to a given list of trusted applications of the specific host. It uses the Windows Sysinternals process utilities Pslist.exe and Listdlls.exe [18]. Pslist.exe shows information about In-memory processes, and Listdlls.exe shows the list of DLLs loaded by a specific process at that time instant. A TF-IDF weight matrix is then generated, defining the weight of each DLL in the collected DLL sequences of the In-memory processes. The system applies multinomial classification to this data set, grouping In-memory processes into multiple classes with their DLL sequences as feature vectors. The process class prediction model is trained and tested on the generated data set. Three mechanisms are used for this: multinomial logistic regression, multinomial Naive Bayes, and a Support Vector Classifier (SVM-SVC for the multi-class problem). In the training phase, the model learns the usual activity of the host from In-memory processes and their DLL sequences, creating the knowledge base of process classes. In the testing phase, the knowledge gained in training is used to measure how accurately the system assigns In-memory processes to their classes. The cosine similarity measure is used to cross-validate the predictor's result. The DLL sequence of a suspected In-memory process is compared with the DLL sequences of the processes of the predicted class. This comparison verifies their similarity and decides whether the process is trusted, a variant of trusted, or untrusted.

B. Data Preprocessing
An In-memory process has various run-time attributes that describe its behavior. Examples are the process path, process name, process priority, number of threads, number of handles, private virtual memory, and the paths of all loaded DLLs. This system focuses on two run-time attributes: the path of the process and the paths of all its loaded DLLs. Pslist.exe is used to collect the names and process IDs of all processes. Fig. 2 shows a sample output, a list referred to as In_Memory_Process_List. Each element of this list is a 2-tuple Process_tuple (pname, pid) containing the process name and process ID of an In-memory process at a particular time instant.
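As a minimal sketch of this step, the Python function below parses PsList-style text into In_Memory_Process_List. The sample text, its exact column layout, and the function name `parse_pslist` are assumptions for illustration. The actual Pslist.exe output on the target host should be checked before reusing the parser.

```python
import re
from typing import List, Tuple

# Hypothetical excerpt of PsList output; the column layout is an assumption.
SAMPLE_PSLIST = """\
Process information for HOST:

Name                Pid Pri Thd  Hnd   Priv        CPU Time    Elapsed Time
Idle                  0   0   4    0     52     0:10:11.234     1:02:03.000
System                4   8 150 3000    200     0:00:45.100     1:02:03.000
notepad            1234   8   3  250   2400     0:00:00.120     0:05:12.500
"""


def parse_pslist(text: str) -> List[Tuple[str, int]]:
    """Return In_Memory_Process_List as a list of (pname, pid) 2-tuples."""
    processes: List[Tuple[str, int]] = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("Name") and "Pid" in line:
            in_table = True  # header row found; data rows follow
            continue
        if not in_table or not line.strip():
            continue
        # Names may contain spaces, so take the first whitespace-delimited
        # integer after the name as the PID.
        match = re.match(r"^(.*?)\s+(\d+)\s", line)
        if match:
            processes.append((match.group(1).strip(), int(match.group(2))))
    return processes
```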
Listdlls.exe is used to collect all DLLs loaded into memory for each element of In_Memory_Process_List at that time instant. Fig. 3 shows the absolute path of the program corresponding to one In-memory process, together with the absolute paths of all its loaded DLLs. There is one such record for each In-memory process at that time instant. The list of these records is referred to as the Database DLL List, DBDLLList[]. The algorithmic steps for generating a collection of DBDLLLists at various time instants are given in Algorithm 1, IMPDLLList. Fig. 3 relates to one record of DBDLLList[]. It shows that the absolute paths of the program and of the DLLs contain symbols such as the forward slash (/), hyphen (-), and dot (.), which are treated as special characters and separators on various platforms. To fit the collected data to the system, an encoding step (referred to in this paper as encryption) is carried out on processes and DLLs. A unique class label p_i is assigned to all instances of process i based on its absolute path. Each individual DLL in the DLL sequences is encoded with a unique ID, dll_i. This step is described in Algorithm 2, Encrypt_DBDLLList, and it prepares the data set for TF-IDF weight matrix construction. The step produces the encoded list EncrDBDLLList[] and a dictionary DictDBDLLList{}. When the key is some p_i, the value is the absolute path of the respective In-memory process. When the key is some dll_i, the value is the absolute path of the respective DLL. A sample of the expected output is shown in Fig. 5.
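The encoding in Algorithm 2 can be sketched as follows. The sketch assumes that each DBDLLList[] record has the form [process_path, dll1_path, ...], as in the record of Fig. 3. The function name `encrypt_dbdlllist` and the zero-based code format (`p_0`, `dll_0`) are hypothetical. The sketch only assigns stable IDs and builds the code-to-path dictionary; it is not the authors' implementation.

```python
from typing import Dict, List, Tuple


def encrypt_dbdlllist(db_dll_list: List[List[str]]) -> Tuple[List[str], Dict[str, str]]:
    """Encode records [process_path, dll1_path, ...] as 'p_i dll_a dll_b ...'.

    Returns the encoded records (playing the role of EncrDBDLLList[]) and a
    code -> absolute path dictionary (playing the role of DictDBDLLList{}).
    """
    proc_codes: Dict[str, str] = {}  # absolute process path -> p_i
    dll_codes: Dict[str, str] = {}   # absolute DLL path -> dll_i
    encoded: List[str] = []
    for process_path, *dll_paths in db_dll_list:
        # setdefault assigns the next free ID only the first time a path is seen.
        tokens = [proc_codes.setdefault(process_path, f"p_{len(proc_codes)}")]
        for dll_path in dll_paths:
            tokens.append(dll_codes.setdefault(dll_path, f"dll_{len(dll_codes)}"))
        encoded.append(" ".join(tokens))  # blank space separates the tokens
    # Reverse mappings: code -> absolute path.
    lookup = {code: path for path, code in proc_codes.items()}
    lookup.update({code: path for path, code in dll_codes.items()})
    return encoded, lookup
```

Paths are compared exactly here. Windows paths are case-insensitive, so real data may need normalization (for example, lowercasing) before encoding.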

C. Process Class Prediction Model
The records in EncrDBDLLList[] are tokenized using blank space as the separator. An In-memory process P is thus represented as a text $P = \{S_0, S_1, \ldots, S_n\}$, where each $S_i$ is a string of the text. Here $S_0$ represents the process class p_i of the In-memory process. $S_1$ to $S_n$ represent its DLL sequence, where each $S_i$, for $1 \le i \le n$, is some dll code. Let C be the set of text representations of m In-memory processes, $C = \{P_1, P_2, \ldots, P_m\}$, where each $P_j$ is the text representation of the j-th In-memory process. C is further split into two lists, $C_{tags}$ and $C_{docs}$. $S_0$ of each process is included in $C_{tags}$, and $S_1, \ldots, S_n$ are included in $C_{docs}$. The VSM is commonly used to represent textual documents algebraically as vectors in a multidimensional space [19]. The components of such a vector represent the importance of each term in a document. TF-IDF is a popular way to evaluate how important a word is to a document. The TF-IDF weighting schema is the most widely used approach for converting textual documents into a VSM [20].
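A minimal sketch of this tokenization step follows. It assumes that EncrDBDLLList[] holds space-separated strings such as "p_0 dll_0 dll_1". `split_tags_docs` is an illustrative name, and each entry of $C_{docs}$ is kept here as a token list rather than a single string.

```python
from typing import List, Tuple


def split_tags_docs(encr_db_dll_list: List[str]) -> Tuple[List[str], List[List[str]]]:
    """Split encoded records into C_tags (S_0) and C_docs (S_1..S_n)."""
    c_tags: List[str] = []
    c_docs: List[List[str]] = []
    for record in encr_db_dll_list:
        tokens = record.split()    # blank space is the separator
        c_tags.append(tokens[0])   # S_0: process class label p_i
        c_docs.append(tokens[1:])  # S_1..S_n: DLL sequence of the process
    return c_tags, c_docs
```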
In this context, $C_{tags}$ is the list of classes of In-memory processes, of which more than three were observed. $C_{docs}$ is the list of DLLs for the process classes in $C_{tags}$. $C_{docs}$ is treated as a textual document: the list of text representations of the DLL sequences of all In-memory processes. The TF-IDF weighting schema is applied to obtain the VSM view of the system for a particular host. TF-IDF is preferred over raw occurrence frequencies because it scales down the impact of terms that occur very frequently across the collection. Such terms are empirically less informative than less frequent ones. $C_{docs}$ is represented by a "Feature-DLL to In-Memory-Process" weight matrix, where element (i, j) expresses the association of the i-th DLL with the j-th In-memory process. Using the TF-IDF weighting schema, the weight of the i-th DLL for the j-th In-memory process is denoted $w_{i,j}$ and defined as given in (1).

$$w_{i,j} = tf_{i,j} \times idf_i \qquad (1)$$

In (1), $tf_{i,j}$ is the L2-normalized term frequency of the i-th DLL with respect to the j-th In-memory process, defined as given in (2).

$$tf_{i,j} = \frac{n_{i,j}}{\sqrt{\sum_{k} n_{k,j}^{2}}} \qquad (2)$$

Here $n_{i,j}$ is the number of occurrences of the i-th DLL in the j-th In-memory process. $\sqrt{\sum_{k} n_{k,j}^{2}}$ is the magnitude of the vector representation of the DLLs present in the j-th In-memory process.

In (1), $idf_i$ is the inverse document frequency of the i-th DLL in $C_{docs}$, defined as given in (3).

$$idf_i = \log \frac{|C_{docs}|}{|\{P_j \in C_{docs} : S_i \in P_j\}|} \qquad (3)$$

Here $|C_{docs}|$ is the total number of In-memory processes. $|\{P_j \in C_{docs} : S_i \in P_j\}|$ is the number of In-memory processes in $C_{docs}$ that contain the i-th DLL, $S_i$. Using (1), the weight matrix W of $C_{docs}$ is obtained for the "Feature-DLL to In-Memory-Process" matrix representation of the system. W is typically sparse. It indicates statistically how important a DLL is to an In-memory process within the collection of all In-memory processes.
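The following stdlib-only sketch computes W directly from (1) to (3), storing the column for each process as a sparse dictionary. It assumes the natural logarithm in (3), which the text does not specify; `tfidf_matrix` is an illustrative name. Library implementations differ from these equations. For example, scikit-learn's `TfidfVectorizer` by default uses a smoothed idf and applies L2 normalization to the final TF-IDF vector rather than to the term frequencies, so its weights will not match this sketch exactly.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_matrix(c_docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one sparse column of W per In-memory process: {dll: w_ij}."""
    n_docs = len(c_docs)  # |C_docs|
    # Document frequency: number of processes whose DLL list contains each DLL.
    df = Counter(dll for doc in c_docs for dll in set(doc))
    idf = {dll: math.log(n_docs / count) for dll, count in df.items()}  # eq. (3)
    columns: List[Dict[str, float]] = []
    for doc in c_docs:
        counts = Counter(doc)  # n_ij for every DLL i in process j
        norm = math.sqrt(sum(c * c for c in counts.values()))  # sqrt(sum_k n_kj^2)
        # eq. (2) gives c / norm; multiplying by idf gives eq. (1).
        columns.append({dll: (c / norm) * idf[dll] for dll, c in counts.items()})
    return columns
```

With this definition, a DLL loaded by every process (such as a core system library) gets an idf of 0. This is the down-weighting of very frequent terms described above.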
The weight matrix W is then split into training and testing data sets in a 3:1 ratio by random sample selection. Multinomial logistic regression, multinomial Naïve Bayes, and SVM-SVC (SVC) learning methods are applied to the proposed model. The objective is to choose the classifier that achieves the highest accuracy in predicting the process class, using the DLL sequence as the attribute vector. Fig. 6 shows the functional representation of the model.
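The sketch below illustrates the 3:1 random split and a multinomial (softmax) logistic regression trained by batch gradient descent on sparse TF-IDF vectors. It uses only the Python standard library. The learning rate, epoch count, L2 penalty, and the names `split_3_to_1` and `SoftmaxRegression` are assumptions; the paper does not state how its classifiers were implemented. The Naïve Bayes and SVC baselines are not shown.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Dict[str, float]  # sparse TF-IDF vector: DLL code -> weight


def split_3_to_1(xs: Sequence, ys: Sequence, seed: int = 0) -> Tuple[list, list, list, list]:
    """Randomly split samples into training and testing sets in a 3:1 ratio."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = (3 * len(idx)) // 4
    train, test = idx[:cut], idx[cut:]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in test], [ys[i] for i in test])


class SoftmaxRegression:
    """Multinomial logistic regression fitted by batch gradient descent."""

    def __init__(self, lr: float = 0.5, epochs: int = 300, l2: float = 1e-3):
        self.lr, self.epochs, self.l2 = lr, epochs, l2

    def predict_proba(self, x: Vector) -> Dict[str, float]:
        scores = {c: self.b[c] + sum(self.w[c].get(f, 0.0) * v for f, v in x.items())
                  for c in self.classes}
        top = max(scores.values())  # subtract the max score for numerical stability
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, x: Vector) -> str:
        proba = self.predict_proba(x)
        return max(proba, key=proba.get)

    def fit(self, xs: List[Vector], ys: List[str]) -> "SoftmaxRegression":
        self.classes = sorted(set(ys))
        self.w: Dict[str, Vector] = {c: {} for c in self.classes}
        self.b = {c: 0.0 for c in self.classes}
        n = len(xs)
        for _ in range(self.epochs):
            grad_w: Dict[str, Vector] = {c: {} for c in self.classes}
            grad_b = {c: 0.0 for c in self.classes}
            for x, y in zip(xs, ys):
                proba = self.predict_proba(x)
                for c in self.classes:
                    err = proba[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(score_c)
                    grad_b[c] += err
                    for f, v in x.items():
                        grad_w[c][f] = grad_w[c].get(f, 0.0) + err * v
            for c in self.classes:
                self.b[c] -= self.lr * grad_b[c] / n
                for f, g in grad_w[c].items():
                    w_old = self.w[c].get(f, 0.0)
                    self.w[c][f] = w_old - self.lr * (g / n + self.l2 * w_old)  # L2-penalized step
        return self
```

In practice a library classifier would normally be used instead. The one-vs-rest (OvR) variant named in the evaluation trains one binary classifier per class instead of a single softmax model.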

D. Cross Validation
The cosine similarity measure is used to cross-validate the suspected process against all processes of the predicted process class. It measures how close the suspected process is to the trusted processes of that class. Cosine similarity is the cosine of the angle between two vectors u and v, as defined in (4).

$$\cos\theta = \frac{u \cdot v}{\|u\|\,\|v\|} \qquad (4)$$

Here $u \cdot v$ is the dot product of u and v, and $\|u\|\,\|v\|$ is the product of their magnitudes. In this paper, the value of (4) is referred to as the cosine distance measure. An angle of 0° (a cosine value of 1) between two vectors means that they are similar. An angle close to 0° (a cosine value close to 1) indicates that they are closely similar, and a larger angle indicates that they are dissimilar. Because TF-IDF weights are non-negative, the value lies between 0 and 1. The point at which closely similar vectors should be accepted as similar depends on the field of application and on experimental results.
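Equation (4) can be computed as follows on sparse vectors stored as dictionaries that map a DLL code to its TF-IDF weight. This representation is an assumption of the sketch. The function name is illustrative. Returning 0.0 for an all-zero vector is also an assumption, since (4) is undefined in that case.

```python
import math
from typing import Dict


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Eq. (4) for sparse vectors stored as {feature: weight}."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)  # u . v over shared features
    norm_u = math.sqrt(sum(w * w for w in u.values()))   # ||u||
    norm_v = math.sqrt(sum(w * w for w in v.values()))   # ||v||
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar
    return dot / (norm_u * norm_v)
```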
The proposed system aims to detect anomalous In-memory processes on a specific host with reference to its trusted application list. With VSM and TF-IDF, any In-memory process can be represented as a weighted vector, using its DLL sequence as the attribute vector. Hence any suspected process can be applied to the model to find its process class. All processes of the predicted class can then be compared with the suspected process using the cosine similarity measure. The resulting cosine distance values are used to conclude whether the suspected process is trusted, a variant of trusted, or untrusted. The algorithmic steps of cross-validation are given in Algorithm 3, Cross_validate_suspected_process. The algorithm takes the Suspected_Process information in the same form as the sample record in Fig. 3, i.e. [process_path, dll1_path, dll2_path, ..., dllm_path]. It also uses EncrDBDLLList[] and DictDBDLLList{}, which are obtained in the learning stage by Algorithm 2 (Encrypt_DBDLLList). These are used to encode the Suspected_Process information so that it can be applied to the process class prediction model. β1 and β2 are the thresholds on the cosine distance measure for deciding that a process is trusted or a variant of some trusted program, respectively.
The Suspected_Process is trusted if enough processes of the predicted class have a cosine distance measure of 1 with the Suspected_Process: their number must reach the threshold β1. That is, β1 is the minimum number of processes of the predicted class that must have a cosine distance measure of 1 with the Suspected_Process. An optimal value of β1 is to be found experimentally for each specific host.
The Suspected_Process is a variant of some trusted process if the average cosine distance measure between the processes of the predicted class and the Suspected_Process reaches the threshold β2. That is, β2 is the minimum average cosine distance measure between the processes of the predicted class and the Suspected_Process. An optimal value of β2 is likewise to be found experimentally for each specific host.
A Suspected_Process that fails the threshold β1 and then also fails the threshold β2 is an untrusted process.
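The decision rules of Algorithm 3 can be sketched as follows. The sketch assumes that the class has already been predicted. It also assumes that the suspected process and the members of the predicted class are available as sparse TF-IDF vectors. The function name, the returned labels, and the floating-point tolerance used to test for a cosine value of 1 are assumptions. The sketch is self-contained, so it repeats a small cosine helper.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse TF-IDF vector: DLL code -> weight


def _cosine(u: Vector, v: Vector) -> float:
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify_suspected(suspect: Vector, class_members: List[Vector],
                       beta1: int, beta2: float, tol: float = 1e-9) -> str:
    """Apply the beta1 / beta2 rules to a suspect and the processes of its predicted class."""
    sims = [_cosine(suspect, member) for member in class_members]
    # Rule for beta1: enough members point in exactly the same direction (value 1).
    identical = sum(1 for s in sims if s >= 1.0 - tol)
    if identical >= beta1:
        return "trusted"
    # Rule for beta2: the average value over the whole predicted class is high enough.
    if sims and sum(sims) / len(sims) >= beta2:
        return "variant of trusted"
    return "untrusted"
```

For example, `classify_suspected(v, members, beta1=1, beta2=0.90)` applies the threshold values that the case study reports for its host.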

IV. EXPERIMENTAL SETUP AND EVALUATION
The following steps are taken for the experimental setup and evaluation of the proposed system.
• A questionnaire is used to collect the list of application programs of interest to specific users. This list is considered the trusted application list for the host, and any other application is assumed to be untrusted. Table I and Table II show sample lists of system processes and trusted processes of the host, respectively. The combination of such system processes and the processes of the trusted applications is considered the list of trusted processes on which anomalous activity is monitored. The model is trained and tested on these 10000 records, with training and testing sets in a 3:1 ratio selected by random sampling.
The proposed model addresses a multi-class problem: the In-memory processes of a host are classified into several classes, and a suspected process is predicted to belong to one of them. Three multinomial classification approaches are compared: multinomial Logistic Regression, multinomial Naïve Bayes, and SVM-SVC (referred to as SVC in the rest of this paper). The comparison uses accuracy, {Micro | Macro | Weighted} precision, {Micro | Macro | Weighted} recall, and {Micro | Macro | Weighted} F1-score. For binary classification, the performance metrics are evaluated with respect to a positive class and a negative class. For multinomial classification, they are evaluated on a One-vs.-Rest (OvR) basis. For each class in a multinomial classification, the following basic parameters are computed and then used to evaluate the overall performance metrics of the model. A True Positive (TP) is a sample of the class that the classifier correctly predicts as the class. A True Negative (TN) is a sample of another class that the classifier correctly predicts as not belonging to the class. A False Positive (FP) is a sample of another class that the classifier incorrectly predicts as the class. A False Negative (FN) is a sample of the class that the classifier incorrectly predicts as some other class. Table III illustrates how TP, FP, FN, and TN are counted for CLASS1 in a sample case of three classes in a multinomial classification scenario. In the proposed model, the performance metrics below are calculated for the three multi-class approaches, named OvR Logistic Regression, OvR Naïve Bayes, and OvR SVC. Recall for a class is the fraction of all samples of that class that are predicted correctly, evaluated as given in (5). Precision for a class is the fraction of all samples predicted as that class that are predicted correctly, evaluated as given in (6).

$$Recall = \frac{TP}{TP + FN} \qquad (5)$$

$$Precision = \frac{TP}{TP + FP} \qquad (6)$$
The F1 score of a class is the harmonic mean of its precision and recall, evaluated as given in (7).

$$F_1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \qquad (7)$$

For a multi-class scenario, micro precision, micro recall, and micro F1 score are calculated globally from the total TP, total FP, and total FN of the model, instead of from individual classes. Because every sample is assigned exactly one class, total FP and total FN both equal the number of misclassified samples. In this case, the accuracy of the model equals its micro precision, micro recall, and micro F1 score. Macro precision, macro recall, and macro F1 score are the unweighted means of the per-class precision, recall, and F1 score. Weighted precision, weighted recall, and weighted F1 score are the weighted means of the same per-class values, where the weight of each class is its total number of samples. Table IV shows the accuracy and all calculated micro, macro, and weighted performance metrics of the considered classifiers on the testing data.
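The following minimal sketch computes (5) to (7) and the micro, macro, and weighted averages from lists of true and predicted class labels. The function names are illustrative. scikit-learn's `precision_recall_fscore_support`, with `average` set to `'micro'`, `'macro'`, or `'weighted'`, computes the same quantities.

```python
from collections import Counter
from typing import Dict, List


def _div(a: float, b: float) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return a / b if b else 0.0


def _f1(p: float, r: float) -> float:
    return _div(2 * p * r, p + r)  # eq. (7): harmonic mean of precision and recall


def multiclass_report(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):  # one-vs-rest counting for every class
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted as p, but the sample belongs to another class
            fn[t] += 1  # a sample of class t was predicted as another class
    support = Counter(y_true)
    n = len(y_true)
    prec = {c: _div(tp[c], tp[c] + fp[c]) for c in labels}  # eq. (6)
    rec = {c: _div(tp[c], tp[c] + fn[c]) for c in labels}   # eq. (5)
    f1 = {c: _f1(prec[c], rec[c]) for c in labels}

    tot_tp, tot_fp, tot_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p, micro_r = _div(tot_tp, tot_tp + tot_fp), _div(tot_tp, tot_tp + tot_fn)

    def macro(d: Dict[str, float]) -> float:     # unweighted mean over classes
        return _div(sum(d.values()), len(labels))

    def weighted(d: Dict[str, float]) -> float:  # mean weighted by class support
        return _div(sum(d[c] * support[c] for c in labels), n)

    return {
        "accuracy": _div(tot_tp, n),
        "micro_precision": micro_p, "micro_recall": micro_r, "micro_f1": _f1(micro_p, micro_r),
        "macro_precision": macro(prec), "macro_recall": macro(rec), "macro_f1": macro(f1),
        "weighted_precision": weighted(prec), "weighted_recall": weighted(rec),
        "weighted_f1": weighted(f1),
    }
```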
Fig. 7 compares the accuracies of the three classifiers: OvR Logistic Regression, OvR Naïve Bayes, and OvR SVC. OvR SVC performs better than OvR Naïve Bayes, and OvR Logistic Regression performs better than OvR SVC. Fig. 8, Fig. 9, and Fig. 10 compare the precision, recall, and F1 score of the three classifiers, respectively. Considering all these metrics, the OvR Logistic Regression classifier is the most effective for the model. It outperforms the other two classifiers, with an accuracy of 97% in predicting a process's class and the highest precision, recall, and F1 score.
To confirm whether a suspected process is trusted, a variant of trusted, or untrusted, the system cross-validates it against the processes of its predicted class using the cosine distance measure. The OvR Logistic Regression classifier achieved higher accuracy than the other two, so it is chosen to predict the class of a suspected process at any time instant. As explained in Algorithm 3, Cross_validate_suspected_process, the cosine distance measure is computed between the DLL sequence vector of the suspected process and the DLL sequence vectors of all processes of the predicted class. The resulting list of values is used to conclude whether the suspected In-memory process is trusted, a variant of trusted, or untrusted for the specific host. To evaluate the system's performance, 300 processes of mixed samples are selected from the considered host. Of these, 200 processes are from the trusted process list and 40 processes are variants of trusted processes. The remaining 60 processes are neither in the trusted process list nor variants of any trusted process. Table V shows the confusion matrix for these 300 suspected processes. They are first applied to the process class prediction model and then cross-validated using the cosine distance measure. During cross-validation, the optimal threshold values β1 and β2 are determined over several iterations. For the considered host, the optimal value of β1 is 1: at least one process of the predicted class must have a cosine distance measure of 1 with the suspected process. The optimal value of β2 is 0.90, the minimum average cosine distance measure between the processes of the predicted class and the suspected process. Fig. 11 shows the precision and recall of the case study on the chosen host. In all cases, precision is above 95%. Recall is above 93% for trusted and untrusted processes, but 84% for variants of trusted processes.

V. CONCLUSION
The proposed system treats the trusted process list of a specific host as a multi-class problem, using DLL sequences as attribute vectors for In-memory processes. Its objective is to detect any deviation in the In-memory processes of the specific host. The system works in two stages. The first stage is the process class prediction model, which predicts the class of a suspected process from its DLL sequence attribute vector. The second stage cross-validates the suspected process against the processes of the predicted class. Three multinomial classification approaches were considered in evaluating the process class prediction model. OvR Logistic Regression proved to be the best performer, achieving 97% accuracy and more than 95% weighted precision, recall, and F1 score. The cosine distance measure was found to be very effective for identifying an anomaly or deviation in an In-memory process during cross-validation against the processes of the predicted class. The case study shows precision above 95% for trusted, variant of trusted, and untrusted processes. Recall is 84% for variants of trusted processes and above 93% for trusted and untrusted processes. These results are promising for finding deviations in the In-memory processes of the host under consideration. Optimal values of the thresholds β1 and β2 play a significant role in concluding whether a suspected process is trusted, a variant of trusted, or untrusted. In the case study, β1 = 1 and β2 = 0.9 gave the best performance on the host under consideration.
It is also observed that a higher β1 moves less frequently occurring trusted processes into the variant-of-trusted category. A higher β2 moves variants of trusted processes into the untrusted category.
Hence both β1 and β2 affect false negative cases. Conversely, a lower β2 moves untrusted processes into the variant-of-trusted category and variants of trusted processes into the trusted category, so β2 also affects false positive cases. Optimal values of β1 and β2 therefore have a significant impact on system performance. The model's performance also relies on the list of trusted processes agreed upon by the user of the specific host. Data for training the process class prediction model should be collected under proper supervision, as biased data may result in more false negatives or more false positives. Using memory-based dynamic analysis, the system is found effective in detecting anomalies or deviations from normal operation with reference to the known or trusted In-memory processes of a specific host. It may help achieve zero-day detection of anomalous In-memory processes on a specific host, whether they belong to an unknown program, a PUA, or malware. The system can be extended to detect anomalies in In-memory processes across a group of hosts that communicate with one another. An efficient protocol for exchanging information about processes may help reduce false negative and false positive cases. Evaluating such an extension would also require weighing its communication cost against the expected reduction in false positives and false negatives.