Cryptic Mining in Light of Artificial Intelligence

Abstract: Analysis of cipher text is an intractable problem for which there is no fixed algorithm. Intelligent cryptanalysis therefore requires the cooperative effort of a cryptanalyst and an inference engine. The information in the knowledge base is useful for mining tasks such as classifying cipher text by encrypting algorithm, clustering cipher text by similarity, and extracting association rules that identify weaknesses of cryptic algorithms. This categorization helps place a given cipher text into a specific difficulty level of cipher text-to-plain text conversion. This paper attempts to create a framework for an AI-enabled cryptanalysis system. The process depicted in the paper generalizes the idea for developing such a system from scratch. The paper also presents system design diagrams for the development of an extended AI-based cryptic cipher analysis tool.


I. INTRODUCTION
Data mining techniques were originally concerned with information extraction at the application level, or with the business and commercial needs of an individual or organization. The term "Cryptic-Mining" is used for the low-level information domain. This knowledge area increases the security level of information and the power of cryptic algorithms by helping the cryptanalyst. To strengthen a cryptosystem, automated tools can be developed that intelligently exploit patterns among cipher text, plain text, key size, key lifetime and the log of knowledge derived from partially recovered plain text-cipher text pairs. The cryptic mining domain assumes that cipher texts present in the network, or in stored encrypted files and logs, are not 100% random and exhibit some patterns. These patterns may be exploited by mining algorithms to expose weaknesses.
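The assumption that cipher text is not fully random can be illustrated with a simple letter-frequency and entropy measurement. The following is a minimal sketch, not part of the proposed system; the function names `letter_frequencies` and `shannon_entropy` and the sample cryptogram are chosen here for illustration only.

```python
import math
from collections import Counter

def letter_frequencies(text):
    """Relative frequency of each alphabetic character in `text`."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {ch: n / total for ch, n in counts.items()}

def shannon_entropy(freqs):
    """Shannon entropy in bits per letter of a frequency table."""
    return -sum(p * math.log2(p) for p in freqs.values())

# Illustrative sample: a short monoalphabetic cryptogram.
cipher = "q azws dssc kas dxznn dasnn"
freqs = letter_frequencies(cipher)
most_common = max(freqs, key=freqs.get)
entropy = shannon_entropy(freqs)

# Uniformly random letters would approach log2(26) bits per letter;
# a noticeably lower value hints at exploitable structure.
print(most_common, round(entropy, 2), round(math.log2(26), 2))
```

A skewed frequency table of this kind is one of the patterns a mining algorithm could use as a feature, although very short samples such as this one give only a rough indication.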
Imagine the perspective of a cryptanalyst who wants to know the type of enciphering algorithm. He is also interested in obtaining the plain text from the encrypted text by exploiting patterns or weaknesses. The obvious way to deal with these intractable situations is for a human mind to work through different theoretical and lengthy approaches. The alternative is to use AI and computational intelligence techniques that solve similar problems. In subsequent sections of this research work, a framework for an AI-enabled cryptic analysis system is presented. It performs cipher detection and successful conversion into plain text in an efficient way. This AI-enabled system would help us to understand and analyze the various problems of cryptanalysis excluding strength and weaknesses of cryptic algorithms. The system would accept cipher texts generated by some algorithms and would try to extract meaningful information using novel models or frameworks. The cipher text-to-plain text process is elucidated on a substitution cipher, in a manner that resembles the human approach to solving the same problem. Later this concept would be generalized.
The flow diagram for the schema of the AI-enabled cryptosystem is depicted in Fig. 3. It accepts a given cipher text (substitution cipher) and attempts to transform it back to the corresponding plaintext using a process similar to that of human experts. In subsequent sections of this paper, we describe the analysis of the research topic using different examples and lay out the system design based upon the proposed conceptual framework. This includes various class diagrams and data flow diagrams describing the "dashboard". System testing is then discussed, using different examples to check the functioning of each module. At the end, future enhancements and new directions for further research are discussed in detail.

II. BASIC TERMINOLOGIES
Cryptogram: A segment (word) of cipher text of length 1...n.
Cryptographic Algorithms: The procedures that transform messages (plain text) into cryptograms (cipher text) and vice versa.
Key Space: The set of possible keys K is called the keyspace.
Substitution Cipher: A method of encoding by which units of plain text are replaced with some other text.
Intractable Problem: A problem that is solvable in theory but in practice takes too long to provide useful solutions (e.g. deciphering cryptograms).

Different alphabets are used in order to better distinguish plaintext and ciphertext, respectively. In fact these alphabets are the same.
A cryptosystem "S" can be defined by a 7-tuple: $S = (M, C, K_d, K_e, F, E, D)$. Often the encrypting relations $e_{k_e}$ are one-to-one and onto.
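As an illustration of these definitions, a monoalphabetic substitution cipher can be written as a pair of lookup tables, one for encryption and its inverse for decryption. This is a minimal sketch under stated assumptions: the helper names `make_keys` and `apply_key` and the keyboard-order cipher alphabet are hypothetical, and plaintext is written in lowercase and cipher text in uppercase only to keep the two visually distinct.

```python
import string

def make_keys(cipher_alphabet):
    """Build an encryption table and its inverse for a substitution cipher.

    `cipher_alphabet` must be a permutation of the 26 uppercase letters,
    so the encrypting map is one-to-one and onto and can be inverted.
    """
    if sorted(cipher_alphabet) != list(string.ascii_uppercase):
        raise ValueError("cipher alphabet must be a permutation of A-Z")
    k_e = dict(zip(string.ascii_lowercase, cipher_alphabet))
    k_d = {c: p for p, c in k_e.items()}  # inverse mapping
    return k_e, k_d

def apply_key(key, text):
    """Map every character found in `key`; leave spaces and others as-is."""
    return "".join(key.get(ch, ch) for ch in text)

k_e, k_d = make_keys("QWERTYUIOPASDFGHJKLZXCVBNM")
message = "attack at dawn"
cipher = apply_key(k_e, message)     # encryption
recovered = apply_key(k_d, cipher)   # decryption with the inverse table
print(cipher, "->", recovered)
```

Because the encryption table is a bijection, decrypting with the inverse table always returns the original message, which is the round-trip property a cryptosystem requires.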

III. REVIEW OF LITERATURE
In [1], a cryptosystem is presented that records the ciphers generated, using information recording techniques. Features extracted from this information can then be used to distinguish one cipher from another, and also to transform future information into cipher text. In [2], cipher text analysis is performed by combining several algorithms simultaneously to transform cipher text into plaintext information, addressing problems such as block length detection, stream detection, entropy analysis, recurrence analysis, dictionary-based analysis and decision tree based problems.
In [3], pattern recognition is applied to enciphering algorithms, identifying patterns with different classification techniques such as SVM, Naive Bayes, ANN, instance-based learning, Bagging, AdaBoostM1, Rotation Forest and Decision Tree. It can be noted that these approaches require improvement in accuracy as the number of encryption keys increases.
In [4], methods are presented that apply tools such as the support vector machine to identify block ciphers for different inputs. The first method works on the cipher text, and the second takes as input partially decrypted text derived from a cipher text. The SVM-based method performs regression using a hetero-association model to derive the partially decrypted text.
Nuhn and Knight [5] worked towards automating the deciphering of ciphers. They analyzed a large number of encrypted messages found in libraries and archives, of which only a small and potentially interesting subset had been handled by human effort. Their work attempts to reduce human effort as well as errors in decryption. They were also interested in developing a distinguisher (first trained, then used to predict) that identifies which enciphering method was used to generate a given cipher text.
In [6], an ANN-based tool is used to decode cipher text by treating the task as a pattern classification problem.
A survey of AI techniques for cipher analysis is given in [7]. Its main objective was to investigate the use of advanced AI techniques in cryptography, and it found that AI-based security measures can be developed but that their performance depends on the data representation and problem formulation.
In [8], deciphering of encrypted messages using a genetic algorithm is presented. The algorithm searches the key space of the encrypted text. The authors identified the limitation that it did not work on a two-rotor problem in times comparable to those obtained using the iterative technique.
Frequency analysis of cipher text provides significant direction to the cryptanalyst. According to Ragheb Toemeh and colleague [9], frequency analysis is used for framing the objective function in cryptanalysis. They studied the applicability of other methods, such as genetic algorithms, for searching the key space of an encryption scheme and presented cryptanalysis of a polyalphabetic cipher using a genetic algorithm.
Another survey, based on parameters such as queries, heuristics, erroneous information, group key exchange and synaptic depths, was conducted by Chakraborty and team [10]. These parameters are suggested for improving the time complexity of algorithmic interception or decoding of the key during exchange.
In [11], Alallayah, AbdElwahed and Alhamami proposed a mathematical black-box model that builds the foundation for a Neuro-Identifier for determining the key from any given plain text-cipher text pair. System identification techniques combined with adaptive system techniques were used to create the model.
All of the above works and techniques follow the direction of established algorithms with long, fixed-size keys. These algorithms rely on the assumption that ciphers are secure enough if they are generated with longer keys. The literature, however, also contains ciphers generated with short fixed-length keys [12,13] that vary from session to session. Ciphers generated through this AVK mechanism [14,15] also have to be converted back into plain text.

IV. EXPERIMENTAL DESIGN
To design the experimental setup, it is first necessary to understand how the cipher analysis process works and how cryptanalysis applies the rules of English grammar.
Various grammar rules are applied to the given cryptogram at different stages, one for each replacement, which aids in obtaining the desired plain text.
The following example is used to develop the design model. Let us assume that the cryptanalyst has captured the cryptogram "q azws dssc kas dxznn dasnn". The cryptanalyst may now proceed according to the following steps:

1) To develop a model we take a hypothesis of solving a plain-text [Table 1] with one initial seed point. [Hint: w→V]
2) Secondly, the sentence is searched for the smallest word (the word with the least number of letters), which in this case is the one-letter word 'q'. This word is replaced by the plain letter 'A', as it has the highest priority for a one-letter word according to English grammar.
3) Next, the first occurrence of a double letter is searched for in the sentence, which is 'ss'. As it lies in the middle of consonants, it has to be a vowel according to English grammar, and 's' is replaced by the plain letter 'E', which has the highest priority in this case.
4) Further, the next smallest word is searched for, which is 'kaE'. With this pattern the word with the highest priority is 'THE'. Hence 'k' and 'a' are replaced by 'T' and 'H' respectively.
5) Now the word having the maximum number of letters replaced is 'HzVE', which can possibly be 'HIVE' ('HAVE' cannot be taken, as 'A' is already used). Therefore 'z' is replaced with the plain letter 'I'.
6) The next word, 'dEEc', can be 'SEEN', 'BEEN', 'FEEL' etc. This word will be a verb, so we replace it with 'SEEN', i.e. 'd' and 'c' are replaced by 'S' and 'N'.
7) Now our sentence includes 'A HIVE SEEN', which is not possible, as a hive cannot see. This indicates that one of the earlier assumptions was probably wrong. We backtrack to the first assumption, q→A, and change it to q→I to correct the sentence. The assumption z→I then also has to be changed to z→A.
8) Further, in the next word 'SxAnn', the double letter 'nn' will be a consonant according to the English language. Therefore 'n' is replaced by the plain letter 'L', which has the highest possibility in this case.
9) Now 'SxALL' can possibly be 'SMALL' or 'SHALL'. Observing the sentence structure, it can be a noun or an adjective, so 'SMALL' is used. Hence 'x' is replaced by the plain letter 'M'.
10) Finally we obtain the plaintext from the cryptogram given.
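The steps above can be replayed with a dictionary of cipher letter to plain letter assumptions. The sketch below only records the decisions described in the steps and applies them; it does not perform the grammar-based inference itself, and the names `apply_assumptions` and `assumption` are chosen here for illustration.

```python
def apply_assumptions(cryptogram, assumption):
    """Replace every cipher letter that has an assumed plain letter.

    Cipher letters are lowercase and plain letters uppercase, so solved
    and unsolved positions stay distinguishable.
    """
    return "".join(assumption.get(ch, ch) for ch in cryptogram)

cryptogram = "q azws dssc kas dxznn dasnn"
assumption = {}

assumption["w"] = "V"                     # step 1: seed hint
assumption["q"] = "A"                     # step 2: one-letter word
assumption["s"] = "E"                     # step 3: double letter between consonants
assumption.update({"k": "T", "a": "H"})   # step 4: 'kaE' read as 'THE'
assumption["z"] = "I"                     # step 5: 'HzVE' read as 'HIVE'
assumption.update({"d": "S", "c": "N"})   # step 6: 'dEEc' read as 'SEEN'
before_backtrack = apply_assumptions(cryptogram, assumption)

# step 7: 'A HIVE SEEN' is rejected, so two assumptions are revised
assumption["q"] = "I"
assumption["z"] = "A"
assumption["n"] = "L"                     # step 8: double consonant at word end
assumption["x"] = "M"                     # step 9: 'SxALL' read as 'SMALL'
plaintext = apply_assumptions(cryptogram, assumption)

print(before_backtrack)
print(plaintext)
```

Keeping the assumptions in a single dictionary makes backtracking cheap: revising an entry and re-applying the table is enough to undo an earlier guess.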
The above process can be summarized in Table I.

V. EXPERIMENTAL FINDING
It can be observed that a central place (like a dashboard) is needed to apply the sources of knowledge. It is useful for aligning with the assumptions made and for reasoning about their consequences. The knowledge base (a data structure) KS maintains a log of many different sources of knowledge, such as knowledge about grammar, spelling and vowels. At some points a specialization process (moving down, from general to specific) is followed, as in the replacement of the cryptogram with n=3 ending in "e" (for THE). At other points a generalization process (moving up, from specific to general) is followed, as in the processing of the cryptogram with n=4 and pattern "?ee?", which may come from {deer, beer, seen}; since the word at the third position must be a verb rather than a noun, "seen" should be the final choice.
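The "?ee?" example can be sketched as a two-stage lookup: first collect every word that fits the letter pattern, then narrow the candidates using a grammatical constraint. This is an illustrative sketch only; the tiny `LEXICON` with part-of-speech tags and the function `match_pattern` are assumptions made here and stand in for the grammar and spelling knowledge sources.

```python
from fnmatch import fnmatchcase

# Hypothetical miniature knowledge source: word -> part of speech.
LEXICON = {
    "deer": "noun",
    "beer": "noun",
    "seen": "verb",
    "shell": "noun",
}

def match_pattern(pattern, lexicon, pos=None):
    """Return lexicon words that fit `pattern` ('?' = unknown letter).

    If `pos` is given, keep only words with that part of speech.
    """
    return [
        word
        for word, tag in lexicon.items()
        if fnmatchcase(word, pattern) and (pos is None or tag == pos)
    ]

by_pattern = match_pattern("?ee?", LEXICON)              # spelling knowledge only
by_grammar = match_pattern("?ee?", LEXICON, pos="verb")  # plus grammar knowledge
print(by_pattern, by_grammar)
```

Separating the pattern filter from the grammar filter mirrors the idea of consulting different knowledge sources from one central place.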

VI. FLOW DIAGRAM
In order to build the system, the flow of information from one component of the system to another is depicted in Fig. 4, Fig. 5 and Fig. 6.

function1-def spell_check(word)
This module checks the spelling of the word and returns true if the spelling is correct.

function 2-def replacefunc(word, file_word)
This module replaces the word with a word from the file and adds the entry to assumption (a dictionary containing cipher letter-plain letter pairs).
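A possible shape for this module is sketched below. It is not the authors' implementation: the extra `sentence` and `assumption` parameters are added here so the example is self-contained (the original signature takes only `word` and `file_word`), and the consistency check is an assumption about how conflicting entries might be handled.

```python
def replacefunc(word, file_word, sentence, assumption):
    """Replace cipher `word` by plain `file_word` throughout `sentence`.

    Each cipher letter -> plain letter pair is recorded in `assumption`.
    Returns the updated sentence, or None if the new pairs contradict
    an assumption that was made earlier.
    """
    if len(word) != len(file_word):
        return None
    new_pairs = {}
    for cipher_letter, plain_letter in zip(word, file_word.upper()):
        known = assumption.get(cipher_letter, new_pairs.get(cipher_letter))
        if known is not None and known != plain_letter:
            return None  # conflicts with an existing assumption
        new_pairs[cipher_letter] = plain_letter
    assumption.update(new_pairs)
    # Lowercase letters are still cipher; uppercase letters are plain.
    return "".join(assumption.get(ch, ch) for ch in sentence)

assumption = {}
result = replacefunc("er", "OF", "dwer er ed", assumption)
conflict = replacefunc("ed", "AT", "dwer er ed", assumption)
print(result, assumption, conflict)
```

Returning `None` on a conflict leaves the dictionary untouched, so the caller can try the next candidate word from the file.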

-def transposition( )
This function displaces each cipher letter to a plain letter according to the displacement between a plain letter and its corresponding cipher letter (key) in the assumption dictionary. If the replaced words do not have correct spelling, the transposition is reverted and the plain letters are replaced again by the corresponding cipher letters that were added to the assumption dictionary.

If no pattern match is found for a word, that word is passed as the argument to backtrack, which replaces the plain letters with their corresponding original cipher letters, as the assumptions made before were not correct.

The double_letter function checks whether a word (input) contains any double letter. If it does, the function replaces the double cipher letter with an appropriate plain letter according to its position (i.e. in the middle it will be a vowel and at the end it will be a consonant, according to English grammar rules).

function 10-def one_letter( )
If the sentence contains a one-letter word in cipher, this function replaces that cipher letter with the possible plain one-letter word and makes an entry in assumption accordingly.
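The transposition step can be sketched as follows, reading "displacement" as a fixed alphabetic shift derived from one assumed cipher letter-plain letter pair; this reading is an interpretation made here, not something the text states explicitly. The explicit parameters, the helper `revert_if_misspelled` and the small word set are likewise assumptions for illustration.

```python
def transposition(sentence, cipher_letter, plain_letter):
    """Shift every lowercase cipher letter by the displacement implied
    by one assumed pair (cipher_letter -> plain_letter).

    Plain letters are emitted in uppercase; other characters are kept.
    """
    shift = (ord(plain_letter.lower()) - ord(cipher_letter)) % 26
    out = []
    for ch in sentence:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

def revert_if_misspelled(original, candidate, known_words):
    """Keep `candidate` only if every word is spelled correctly;
    otherwise revert to the original cipher sentence."""
    if all(word.lower() in known_words for word in candidate.split()):
        return candidate
    return original

known_words = {"i", "am", "a", "girl"}   # stand-in for a spelling source
sent_1 = "k co c iktn"

good = transposition(sent_1, "k", "I")   # assume one-letter word k -> I
bad = transposition(sent_1, "k", "A")    # assume one-letter word k -> A
kept = revert_if_misspelled(sent_1, good, known_words)
reverted = revert_if_misspelled(sent_1, bad, known_words)
print(kept, "|", reverted)
```

Checking the spelling of the whole sentence after the shift gives a cheap acceptance test, so a wrong one-letter assumption is discarded before any further replacement is attempted.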

function 11-def find_key(value)
This function finds the cipher letter (key) corresponding to the plain letter (value) given as argument, using the dictionary "assumption".

VIII. TEST CASE DEVELOPMENT
Test cases are developed to validate and verify the working of the system in two situations: Case-1 and Case-2.

Case-1: For testing transposition
Let the sentence given by the user be: sent_1 = "k co c iktn"

Case-2: For testing English grammar
Input sentence supplied by the user: sent = "dwer er ed"
Actions performed on the words of list two_w, hence on sent: replaces "er" with "OF" (first word in fil tw); sent = "dwOF OF Od"; OK.
12. replacefunc(er, OF): replacement done on sent; replaces "er" with "OF" (first word in fil tw); sent = "dwOF OF Od"; OK.

2) Incorporating plain text of multiple languages in the process is also desirable. The elucidation demonstrated in this work deciphers and outputs results in English. Most ciphers yield English plain text on decryption, but local, non-English languages are also exchanged over communication channels. To decrypt cipher text that yields plain text in another language, the grammar rules of that particular language have to be applied.
3) Extending the character set with special characters and symbols will make the current system more flexible. The amount of data transferred increases day by day, and encrypting it in a more complex way is necessary to secure information from unauthorized users. Hence special characters and numbers are used to generate more complex cipher patterns. Deciphering these ciphers will require an algorithm with conditions for checking these symbols together with the English alphabet.
4) Extension to n-grams (n>4) will increase the power of cipher analysis. Checking cipher words with a length of more than 4, and words that are not present in any knowledge source, still needs to be worked out. Currently the knowledge source includes files containing words of up to 4 letters. A more generalized approach is needed for words longer than 4 letters. This may require a tool that checks the spelling of every possible word and states whether the spelling is correct or not.

Fig. 2. Components of AI-enabled cipher text to plain text conversion

$S = (M, C, K_d, K_e, F, E, D)$ where:
$M$ = set of all possible plaintexts $m$, i.e. $M = \{m_1, m_2, \ldots\}$. Each message $m_i$ is the text to be encrypted (plaintext) and is usually written in the lowercase alphabet: $M = \{a, b, c, \ldots, x, y, z\}$.
$C$ = set of all possible cipher texts $c$, i.e. $C = \{c_1, c_2, \ldots\}$. Each encrypted message (cipher text) $c_i$ is usually written in the uppercase alphabet: $C = \{A, B, C, \ldots, X, Y, Z\}$.
$K_d$ = set of all possible decryption keys $k$, i.e. $K_d = \{k_1, k_2, \ldots\}$.
$K_e$ = set of all possible encryption keys $k'$, i.e. $K_e = \{k'_1, k'_2, \ldots\}$.
$F: K_d \to K_e$ is a mapping from a decryption key to the corresponding encryption key. For a symmetric cryptosystem, $K_d = K_e$ and $F = I$, where the encryption and decryption keys are the same.
$E: K_e \to (M \to C)$ is the relation that maps encrypting keys $k_e$ into encrypting relations $e_{k_e}: M \to C$. Each $e_{k_e}$ must be total and invertible, but need not be a deterministic function or onto.
$D: K_d \to (C \to M)$ is the mapping that maps decrypting keys $k$ into decrypting functions $d_k: C \to M$. Each $d_k$ must be a deterministic function and onto.
$E$ and $D$ are related in that $k_e = F(k)$, $D(k) = d_k = e_{k_e}^{-1} = E(k_e)^{-1}$, and $m = D[k](E[F(k)](m))$.

function 5-def trans_status( )
After the transposition is done, this function checks whether the transposition made was correct or not.

function 6-def revert_trans( )
If the transposition made was correct, this function displays the final sentence; otherwise it reverts all the changes made during the transposition process.

function 7-def pat_rep(lst, fil, cnt)
The pat_rep function replaces the words from the list with a suitable word from the file according to a condition. It has three arguments:
lst: list of specific words (i.e. 2-letter, 3-letter etc.) of the sentence containing cipher.
fil: text file containing the 2-letter, 3-letter etc. plain-letter words corresponding to the list.
cnt: counter that gives the position in the file.

function 8-def pattern(word, fil, cnt)
If the word contains one or more plain letters, the pattern function matches the word with every word in the file and replaces it if a pattern is matched. It has three arguments:
word: word from the sentence containing a capital letter.
fil: corresponding file (for example, the 4_word file for a 4-letter word).
cnt: counter that gives the position in the file.

function 9-def double_letter(word)

TABLE I. CRYPTANALYSIS STEPS WITH KNOWLEDGE SOURCE USED INTERFERENCE