Case Based Reasoning: Case Representation Methodologies

Case Based Reasoning (CBR) is an important technique in artificial intelligence that has been applied to many kinds of problems in a wide range of domains. Selecting a case representation formalism is critical for the proper operation of the overall CBR system. In this paper, we survey and evaluate the existing case representation methodologies. Moreover, case retrieval and future challenges for effective CBR are explained. Case representation methods are grouped into knowledge-intensive approaches and traditional approaches. The first group outweighs the second. Knowledge-intensive methods depend on ontologies and enhance all CBR processes, including case representation, retrieval, storage, and adaptation. Using a proposed set of qualitative metrics, the existing ontology-based methods for case representation are studied and evaluated in detail. All of these systems have limitations: no approach satisfies more than 53% of the specified metrics. The results of the survey explain the current limitations of CBR systems and show that ontology usage in case representation needs improvement to achieve semantic representation and semantic retrieval in CBR systems.
Keywords—Case based reasoning; Ontological case representation; Case retrieval; Clinical decision support system; Knowledge management


I. INTRODUCTION
Clinical Decision Support Systems (CDSSs) that bear similarities to human reasoning and explanation have benefits. They are often more easily accepted by physicians in the medical domain [1][2][3][4]. Many of the early AI systems attempted to apply pure Rule-Based Reasoning (RBR), as "reasoning by logic in AI", for decision support in the medical area. However, for broad and complex domains where knowledge cannot be represented by rules (i.e. IF-THEN), a pure rule-based system encounters several problems. Because medical knowledge evolves rapidly, updating large rule-based systems and proving their consistency is expensive. The knowledge acquisition bottleneck is one of the most critical problems in any knowledge-based system. There is a risk that medical rule-based systems become brittle and unreliable: one faulty rule may affect the whole system's performance [5]. Case-Based Reasoning (CBR) is a promising AI method that can be applied as "reasoning by experience in AI" for implementing CDSSs in the medical domain, since it learns from experience in order to solve a current situation [6]. Readers interested in CBR applications in healthcare can consult the reviews in [7,8,9].
CBR is based on remembering past experiences and using them to solve current situations that are similar to ones already solved and stored. CBR is especially suitable when domains are difficult to formalize. In CBR, experiences in the form of cases are used to represent knowledge. A case may be a patient record structured by symptoms, diagnosis, treatment and outcome, and clinicians often reason with cases by referring to and comparing with previous cases. Many other AI and statistical methodologies can be used to implement a CDSS. CBR has been reported to compare favourably with RBR, Artificial Neural Networks (ANNs) and other statistical and machine learning techniques [10]. For example, an ANN is a black box and cannot always explain why it arrived at a particular solution. Moreover, an ANN cannot always guarantee a completely certain solution, arrive at the same solution again with the same input data, or guarantee the best solution. Aamodt and Plaza [11] provided a scheme of the CBR working cycle comprising four phases: RETRIEVE, REUSE, REVISE and RETAIN. These phases depend on the existence of a Knowledge Base (KB) in the form of a case base. Case representation is a critical success factor in CBR because the reasoning capabilities of CBR depend mainly on the structure and content of cases. Cordier [12] and Finnie [13] added a case base building phase to the cycle in [11], which requires a case representation process. Cases can be represented as simple feature vectors, or they can be represented using any AI representational formalism such as frames, objects, predicates, semantic nets, or rules. The choice of a particular representational formalism is largely determined by the information to be stored within a case.
There is a lack of consensus within the CBR community on case contents and representational formalism. However, two pragmatic measures can be taken into account in deciding both the information to be stored in a case and the appropriate representational formalism: the intended functionality and the ease of acquisition of the information represented in the case [14].
Cases are the basis of any CBR system: a system without cases would not be a case-based system. Yet, a system using only cases and no other explicit knowledge (not even in the similarity measures) is difficult to distinguish from a nearest-neighbour classifier or a database retrieval system. In other words, such a system does not exploit the full power of CBR, which usually results in poor system performance due to inefficient retrieval based upon case-by-case search of the complete case base. Ontologies play an important role in enhancing the capabilities of CBR systems. They improve case indexing and retrieval, case representation and storage in the case base, case adaptation and case retention. They address the knowledge acquisition bottleneck by allowing the case base to be represented as an ontology and allowing discovery of cases from existing domain ontologies [15,16]. This facilitates the creation of Knowledge-Intensive CBR (KI-CBR) systems where cases, in some way or another, are enriched with explicit general domain knowledge [17]. The role of the general domain knowledge is to enable a CBR system to reason with semantic and pragmatic criteria, rather than purely syntactic ones. By making the general domain knowledge explicit, the CBR system is able to reason in a more flexible and contextual manner than if this knowledge is compiled into predefined similarity metrics or feature relevance weights [18,19]. What is more, ontologies can be used for case representation, which enhances the integration between the case base and domain knowledge.
www.ijacsa.thesai.org
Case based reasoning is applied in different fields ranging from non-medical domains [20] to the medical domain [21]. Since 1997, over 130 major companies worldwide have fielded CBR applications [22]. As this paper concentrates on the medical field, and because of space restrictions, we refer the reader to the surveys [8,9], which collect most medical case based reasoning systems. Case representation in medical domain applications differs from others in three respects: (1) the form of the ontology used, as medical systems use standard ontologies such as ICD [23], (2) the integration of Clinical Practice Guidelines (CPG) in the case base ontology, and (3) the usage of soft-CBR, because medical data are in most cases incomplete and vague [24].
Although case representation is the most critical decision in building CBR systems, there is a shortage of surveys on this aspect. Bergmann et al. [25] have provided a survey of traditional case representation methods. That survey is dated and does not discuss the semantic aspect of case representation using ontologies and rules.
This paper reviews the existing case representation formalisms in CBR, concentrating on the logical structure of cases in the case base. Cases can be physically stored in databases (relational or object oriented), XML files or even flat files. Cases can be represented using traditional methods such as feature vectors. Moreover, they can be represented in more intelligent ways using semantic mechanisms such as ontologies and rules. The paper concentrates on the role of ontology in CBR, named ontological case based reasoning. The databases ScienceDirect, IEEEXplore, and Springer have been used in our research. Moreover, we carried out an exhaustive literature search in some proceedings of yearly CBR conferences such as the European Conference "Advances in Case-Based Reasoning" and the International Conference on Case-Based Reasoning. Because there is very little research on case representation methodologies, our search strategy concentrated on collecting case based reasoning systems between 1990 and 2012 and studying their case representation strategies, then evaluating, grouping and comparing them. Screening was based on titles and abstracts containing combinations of the keywords "case representation strategy, methodology, model", "case base organization", "case based reasoning", "semantic CBR" + "ontological case base", "ontology based CBR", "case base ontology" + "case base storage model, ontology".
The rest of the paper is organized as follows. Section 2 discusses CBR definitions, models and the importance of case representation. Section 3 discusses traditional case representation methods. Section 4 discusses semantic case representation methods. A comparison between semantic case representation methods is given in section 5. Section 6 discusses semantic retrieval methods. Section 7 presents the CBR challenges. Finally, the conclusion is given in section 8.

II. CASE BASED REASONING
CBR is a problem solving methodology that aims at reusing previously solved and memorized problem situations, called cases. A case is a concrete problem solving experience. One of the main assets of CBR is its eagerness to learn. Learning in CBR can be as simple as memorizing a new case or can entail refining the memory organization or meta-learning schemes. CBR has developed from these premises and has been found suitable for a wide range of problems, particularly in the experimental sciences, where cases are readily available in the form of patients, living beings, or natural phenomena. Kolodner [26] defines a case as follows: "A case is a piece of knowledge in a particular context representing an experience that teaches an essential lesson to reach the goal of the reasoner." Cases may be kept as concrete experiences, or a set of similar cases may form a generalized case. There are many models of the CBR lifecycle, such as Hunt's [27], Allen's [28], Kolodner and Leake's [6], and Aamodt and Plaza's [11]. All have nearly the same phases. According to Aamodt and Plaza [11], the CBR working cycle can be described best in terms of four processing stages (the R4 model: retrieve, reuse, revise and retain), as shown in Fig. 1. Each of these steps can be decomposed into sub-steps as shown in [11]. In fact, the representation of cases is important for CBR because CBR is heavily dependent on the structure and content of its collection of cases. The previous models do not contain a separate phase for case representation and assume that the case base is ready for the first process (case retrieval). Finnie and Sun [13] propose a model that considers case base building as the first step. In their R5 model, repartition, retrieve, reuse, revise and retain are the main process steps in CBR. While the other process steps are the same as those in the R4 model, repartition is used to build the case base.
Cordier proposed a model [12] that is composed of five stages: (1) preparation, (2) memory retrieval, (3) reuse (adaptation), (4) revision, and (5) memorization (learning). This model also asserts a case base building step, preparation, where a set of cases is capitalized in the knowledge base (case base). This model is used by Maalel et al. [29] to build a CBR system to manage railroad accidents. In summary, building the case base is critical to the success of a CBR system. A case representation methodology must be selected, which determines the content and structure of the case base. A case typically consists of the following five parts: (1) a problem description (e.g. symptoms); (2) a solution (e.g. a diagnosis or a therapy) and, sometimes, the means of deriving it; (3) an outcome (e.g. the result of applying the solution); (4) explanations of results, if necessary and available, of why the solution might not have worked as well as expected; (5) lessons that can be learned from the experience. Besides the case base, CBR makes use of other types of knowledge such as the vocabulary, similarity measures, and adaptation knowledge [30].
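The five-part case structure listed above can be sketched as a simple record type. This is a minimal illustration only: the class name `Case`, its field names and the example values are assumptions introduced here, not a structure prescribed by any of the surveyed systems.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Case:
    """One stored experience, mirroring the five parts of a case."""
    problem: Dict[str, Any]                # (1) problem description, e.g. symptoms
    solution: str                          # (2) solution, e.g. a diagnosis or therapy
    outcome: Optional[str] = None          # (3) result of applying the solution
    explanation: Optional[str] = None      # (4) why the result fell short, if known
    lessons: List[str] = field(default_factory=list)  # (5) lessons learned


# Hypothetical example case; the values are illustrative, not clinical data.
example = Case(
    problem={"fever": True, "cough": True, "age": 42},
    solution="diagnosis: influenza",
    outcome="recovered",
)
```

Parts (3) to (5) are optional in this sketch because, as noted in the text, explanations and lessons are recorded only when they are necessary and available.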
Case representation in CBR makes use of familiar knowledge representation formalisms from AI to represent the experience contained in the cases for reasoning purposes. The two main categories of case representation are traditional methods, discussed in section 3, and semantic methods, discussed in section 4. Case representation and case retrieval are the most important steps in CBR [31]. The efficiency of the case retrieval algorithm is directly affected by the case representation method used [32]. As a result, it is more logical to introduce case retrieval methods after surveying the representation methods, so that the two can be linked together. Case retrieval methods are discussed in section 6.

III. CBR TRADITIONAL CASE REPRESENTATION METHODS
A CBR system should be organized around some basic elements: the knowledge representation, which depicts the cases, and the similarity measure, which defines how similar one case is to another. In a CBR system, the knowledge is in the case base. One case contains the knowledge of an experienced episode and the context in which the knowledge can be applied. Case representation needs to be studied from two points of view: first, the conceptual models that are used to design and represent cases, and second, the means of implementing cases in the computer [25]. Case representation in CBR involves three issues: defining which attributes describe a case, defining the structure for describing the case content, and deciding how to organize the cases in the case base [33]. The case base can take any format to physically store cases: it can be a relational or object oriented database, XML files or plain-text files. Cases must be indexed so that the retrieval step can use a structure to access them. Usually the case storage is kept separate from the indexing structure, because indexes can be built without knowing how and where the cases are stored.
Moreover, different indexes can be defined upon the same set of cases to allow the evaluation of different indexing techniques.This work concentrates on the logical structure of cases that can be stored in any format in the case base.A large variety of representation formalisms have been proposed such as feature vector (or propositional) cases, structured (or relational) cases, and textual (or semi-structured) cases.

A. Feature vector representation
This is the simplest form of case representation, where each case is represented as a set of features (attribute-value pairs) describing the problem, together with the associated solution (see Fig. 2 [34]). All cases have the same kind and the same number of features [33]. Similarity assessment is straightforward since each feature is compared with its corresponding one. There are no relationships or constraints between features. Moreover, only exact matching is performed (i.e. no semantic similarities are possible since there is no domain knowledge) [35]. We must first have a sufficiently complete conceptual model of the problem, then compare problem features between the new case and past cases in the case base, and finally obtain the most similar past case for reuse by similarity measuring. If the conceptual model is known only incompletely or ambiguously, then the subsequent steps cannot continue [34].
Fig. 2. The retrieval mechanism of CBR
The PROTOS system [36] uses a feature vector approach for domains with weak or intractable theories. A category is extensionally represented as a collection of cases called exemplars (see Fig. 3 [11]).
A new case is classified into a category if a match can be found between an exemplar and the new case. This matching process is knowledge intensive and tries to build an explanation that connects the features of the new case with an exemplar. Since each explanation is a path constructed inside a semantic net, retrieval is the process of explaining the (similarity) relation between a new case and an exemplar. Unlike most early CBR systems that use feature vector representations, PROTOS already uses a knowledge-intensive similarity measure. Features can be organized in a hierarchy where generic features are at the top of the tree/graph and specific features are at the bottom [37].
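The basic feature-vector retrieval described in this subsection (compare each query feature with its counterpart and reuse the most similar past case) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the exact-match similarity (fraction of identical feature values) and the toy case base are illustrative choices made here, and they do not reproduce the knowledge-intensive matching of PROTOS.

```python
from typing import Dict, List, Tuple

FeatureVector = Dict[str, object]


def exact_match_similarity(query: FeatureVector, case: FeatureVector) -> float:
    """Fraction of query features whose values are identical in the case."""
    if not query:
        return 0.0
    matches = sum(
        1 for name, value in query.items() if name in case and case[name] == value
    )
    return matches / len(query)


def retrieve(
    query: FeatureVector, case_base: List[Tuple[FeatureVector, str]]
) -> Tuple[float, str]:
    """Return (similarity, solution) of the most similar stored case."""
    scored = [
        (exact_match_similarity(query, problem), solution)
        for problem, solution in case_base
    ]
    return max(scored, key=lambda pair: pair[0])


# Hypothetical toy case base: (problem features, solution).
case_base = [
    ({"fever": True, "cough": True, "rash": False}, "influenza"),
    ({"fever": True, "cough": False, "rash": True}, "measles"),
]
best = retrieve({"fever": False, "cough": False, "rash": True}, case_base)
```

Because features are compared only for equality, two different but related values score zero, which is the limitation the text attributes to the absence of domain knowledge.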

B. Frame-based representation
Frames provide a natural way for the structured and concise representation of knowledge. In a single entity, a frame combines all necessary knowledge about a particular object or concept. A frame provides a means of organizing knowledge in slots to describe various attributes and characteristics of the object. Each frame has its own name and a set of attributes, or slots, associated with it [38]. In CBR terminology, a frame can represent a case and each frame slot is a case feature. A slot may contain a primitive value or a pointer to another frame. In the same way, in CBR, features can be primitive (simple) or complex (compound). Cases represented as frames can have semantic relationships because a case may have a feature (slot or attribute) whose value is a pointer to another frame. Moreover, as inheritance is an essential feature of frames, a hierarchy of cases connected by IS_A and PART_OF relationships can be formed. This case hierarchy enhances the semantic retrieval and indexing of cases and adaptation operations. Frame based representations have been (partially) formalized by description logics [39]. The notion of "cases as terms" [40] argues that viewing structured cases as terms in feature logics (a particular brand of description logics) helps to better understand several aspects of case-based reasoning. Domain knowledge can be integrated using a sort hierarchy, and the issue of composite cases (cases that group together other objects or sub-cases) is understood by the fact that a sub-term is also a term. Finally, the notion of similarity between two cases is linked to the concepts of subsumption and anti-unification of terms.
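The frame mechanics described above (named frames, slots holding primitive values or pointers to other frames, and IS_A inheritance) can be sketched in a few lines. The `Frame` class, its method names and the example frames are hypothetical and serve only to illustrate the idea; they do not correspond to any particular frame language.

```python
from typing import Any, Optional


class Frame:
    """A named frame with slots and an optional IS_A parent."""

    def __init__(self, name: str, is_a: Optional["Frame"] = None, **slots: Any):
        self.name = name
        self.is_a = is_a
        self.slots = dict(slots)

    def get(self, slot: str) -> Any:
        """Look up a slot locally, then inherit along the IS_A chain."""
        frame: Optional[Frame] = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.is_a
        raise KeyError(slot)

    def isa(self, other: "Frame") -> bool:
        """True if `other` is this frame or one of its IS_A ancestors."""
        frame: Optional[Frame] = self
        while frame is not None:
            if frame is other:
                return True
            frame = frame.is_a
        return False


# Hypothetical frames: a case inherits from a disease frame and points to a patient frame.
infection = Frame("Infection", treatment_type="antimicrobial")
pneumonia = Frame("Pneumonia", is_a=infection, organ="lung")
patient = Frame("Patient17", age=63)
case17 = Frame("Case17", is_a=pneumonia, patient=patient)
```

Here `case17.get("treatment_type")` is resolved by inheritance from `infection`, while `case17.get("patient")` returns another frame, which is what makes the feature a complex (compound) one.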

C. Object Oriented (OO) representation
The feature vector model is not suitable for building a complex case data structure. In this situation, OO representation works much better. The OO method needs less memory storage to represent each case. Furthermore, since OO is a natural way of representing IS-A, HAS-A and PART_OF relationships, the case representation is easier for users to understand. Cases are represented as collections of objects, each of which is described by a set of attribute-value pairs. The structure of an object is described by an object class [41]. CASUEL [42] is an early example in plain ASCII, but recent languages are XML compatible. Generally, with object-oriented case structures, the similarity measures follow the "local-global" principle [43].
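Under the local-global principle, a global similarity between two objects is obtained by aggregating local similarities computed per attribute. The sketch below assumes a weighted average as the aggregation function and two simple local measures; these choices, and all names and values, are assumptions made for illustration rather than the measures of a specific system.

```python
from typing import Callable, Dict

LocalMeasure = Callable[[object, object], float]


def numeric_similarity(value_range: float) -> LocalMeasure:
    """Local measure for numbers: 1 minus the distance scaled by the value range."""
    def measure(a: object, b: object) -> float:
        return 1.0 - abs(float(a) - float(b)) / value_range
    return measure


def symbolic_similarity(a: object, b: object) -> float:
    """Local measure for symbols: 1 if equal, otherwise 0."""
    return 1.0 if a == b else 0.0


def global_similarity(
    query: Dict[str, object],
    case: Dict[str, object],
    local_measures: Dict[str, LocalMeasure],
    weights: Dict[str, float],
) -> float:
    """Weighted average of the local similarities (assumed aggregation)."""
    total_weight = sum(weights.values())
    weighted = sum(
        weights[attr] * local_measures[attr](query[attr], case[attr])
        for attr in weights
    )
    return weighted / total_weight


# Hypothetical attributes and weights.
measures = {"age": numeric_similarity(100.0), "sex": symbolic_similarity}
weights = {"age": 1.0, "sex": 1.0}
score = global_similarity(
    {"age": 40, "sex": "F"}, {"age": 50, "sex": "F"}, measures, weights
)
```

For nested objects, the local measure of an object-valued attribute could itself be a global similarity over the sub-object, which is how the principle extends to PART_OF structures.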

D. Textual representation
Textual case-based reasoning is a form of CBR where some or all of the knowledge sources are available in textual format. It aims to use these textual knowledge sources in an automated or semi-automated way to support problem solving through case comparison [44]. Many techniques for textual case base representation are available. Burke et al. [45] developed FAQ-Finder, a question-answering system. It starts with a standard Information Retrieval (IR) approach based on the vector space model, where cases are compared as term vectors with weights based on a term's frequency in the case versus in the corpus. In addition, FAQ-Finder includes a semantic definition of similarity between words, which is based on the concept hierarchy in WordNet. Wilson [46] investigated cases that required mixed representations including both textual and non-textual features. Another group of projects focused on developing methods to map textually expressed cases into the kinds of structured representations used in CBR systems such as SPIRE [47]. In [48], textual case representations decompose the text that constitutes a case into information entities (IEs). An IE represents any basic knowledge item such as an attribute-value pair. As a result, a case consists of a unique case descriptor and a set of IEs linked to it. The case base is a network with nodes for the IEs observed in the domain and additional nodes denoting the particular cases. An IE is a word or a phrase contained in the text that is relevant to determining the reusability of the episode captured in the case. The set of cases that form the case base is organized in the form of a case retrieval net (CRN), which is a directed graph with nodes representing cases and their IEs. These nodes are linked according to their similarity. Hence, knowledge about similarity is encoded in the strength of the links between the nodes in the CRN. Case retrieval is similar to activation propagation in a neural network: the IEs that occur in the current problem are activated, and this initial activation is propagated through the case retrieval net according to the similarity-based link strength. A promising and highly ambitious approach, using natural language processing (NLP) to derive a deep, logical representation, has been proposed for the FACIT project [49]. It derives a first-order representation of the case texts. Weber et al. [50] introduced a semi-automated approach to populate case templates from textual documents. This method is based on knowledge engineering, NLP and data mining. The Bag-Of-Words (BOW) representation is introduced by Brüninghaus [51,52], where text classifiers are applied to automate the mapping from texts to structured case representations. Brüninghaus [53] argues that a text representation that combines some background knowledge and NLP with a nearest neighbour algorithm leads to the best performance. As a result, textual CBR will mostly require textual descriptions of cases to be mapped onto structural representations that facilitate computationally comparing cases [54].
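The activation propagation used in a case retrieval net can be sketched as a three-step procedure: activate the IEs of the query, spread activation to similar IEs, and collect activation at the case nodes. The sketch is a simplification under stated assumptions: the function name, the dictionary-based graph encoding, the use of the maximum when an IE is reached several times, the single propagation step and the example link strengths are all illustrative choices, not the formulation in [48].

```python
from typing import Dict, Iterable, List, Tuple


def crn_retrieve(
    query_ies: Iterable[str],
    similarity: Dict[str, Dict[str, float]],
    relevance: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Rank cases by the activation they receive from the query IEs."""
    # Step 1: the IEs that occur in the query get full activation.
    ie_activation: Dict[str, float] = {ie: 1.0 for ie in query_ies}

    # Step 2: spread activation once to similar IEs, scaled by link strength.
    for ie in list(ie_activation):
        for other, strength in similarity.get(ie, {}).items():
            ie_activation[other] = max(ie_activation.get(other, 0.0), strength)

    # Step 3: each active IE passes its activation on to the cases it is linked to.
    case_activation: Dict[str, float] = {}
    for ie, activation in ie_activation.items():
        for case_id, weight in relevance.get(ie, {}).items():
            case_activation[case_id] = (
                case_activation.get(case_id, 0.0) + activation * weight
            )

    return sorted(case_activation.items(), key=lambda item: item[1], reverse=True)


# Hypothetical net: IE-to-IE similarity links and IE-to-case links.
similarity = {"printer jam": {"paper stuck": 0.8}}
relevance = {
    "printer jam": {"case1": 1.0},
    "paper stuck": {"case2": 1.0},
    "no power": {"case3": 1.0},
}
ranking = crn_retrieve(["printer jam"], similarity, relevance)
```

In this toy net, a case that only mentions "paper stuck" is still retrieved for the query "printer jam", with a lower score, because the similarity link carries part of the activation.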

E. Hierarchical case representation
The previously discussed approaches typically represent cases at a single level of abstraction. However, cases can also be represented using multiple representations at different levels of abstraction. The basic idea behind these approaches is to represent a case at multiple levels of detail, possibly using multiple vocabularies. When a new problem must be solved, similar cases at appropriate levels of abstraction are retrieved from the case base, the solutions from these cases are combined, and the combined solutions may be refined [55]. Watson [56] asserted that as the problem space increases (number of case features), it becomes statistically less likely that a close matching case will exist. Thus, the CBR system will return a distant solution (see Fig. 4 [56]). A potential solution to this problem is that, where suitable, a large problem is divided into several smaller sub-problems, each of which can be solved separately using CBR (Fig. 5 [57]). The sub-solutions can then be combined to produce an accurate solution to the entire problem [57]. The advantage of this approach is that each individual sub-problem is represented by a case base that is significantly smaller (in terms of problem and solution space size) than if the whole problem were represented by a single case base. Because each sub-problem space has fewer case features, the theory predicts that each individual sub-case retrieval distance will be shorter than for the un-decomposed problem. Therefore, the adaptation distance will be shorter and a better sub-solution will be generated. Assuming there are no conflicting constraints, the re-composition of sub-solutions will produce a better solution than would have been obtained by using a single large case base. One way that has been suggested to reduce constraint problems with solution re-composition is to use contextual information to guide retrieval [58].

F. Predicate based case representation
A predicate is a relation among objects, and it consists of a condition part and an action part: IF (condition) and THEN (action). Predicates that have no condition part are facts. Cases can be represented as a collection of predicates [24]. The advantage of predicate representation is that it uses both rules and facts to represent a case, and it enables a case-base designer to build hybrid systems that integrate rule-based and case-based reasoning. Although the traditional data models described above are useful for representing and indexing cases, in many practical situations it is difficult to articulate the feature values precisely when specifying a case. This uncertainty may be caused by incomplete, missing or unquantifiable information, overlapping of the data regions, or user ignorance. Therefore, to make cases more expressive in dealing with such situations, soft computing techniques are introduced [24]. These techniques include fuzzy logic, neural networks, rough sets and data mining. They are outside the scope of this paper.

IV. CBR SEMANTIC CASE REPRESENTATION METHODS
The above case representations may be characterized as being knowledge-poor. They do not contain many (or any) structures that describe the relationships or constraints between case features. Moreover, these case representations usually describe relatively simple cases with few indexed features, perhaps in the order of ten to twenty. In many situations, additional knowledge (background knowledge) is required alongside the case base to cope with the requirements of an application. In [59], the author integrated two kinds of rules with the case base. The first kind is Completion Rules, which infer additional features from the known features of an old case or of the query. These rules complete the description of a case. The second kind is Adaptation Rules, which describe how an old case can be adapted to fit the current query. As shown in Fig. 6 [59], the general knowledge, in the form of rules, guides the CBR query and adaptation operations. However, creating a rule base, managing its execution in the form of forward or backward chaining, and integrating rules with CBR are challenging.
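Completion rules of the kind described above can be sketched as a small forward-chaining loop that keeps adding derived features until nothing new can be inferred. The rule encoding (condition, target feature, computation), the function name and the two example rules are assumptions introduced for illustration; they are not the rule language used in [59].

```python
from typing import Any, Callable, Dict, List, Tuple

Features = Dict[str, Any]
# A rule: (condition on known features, name of derived feature, how to compute it).
Rule = Tuple[Callable[[Features], bool], str, Callable[[Features], Any]]


def apply_completion_rules(features: Features, rules: List[Rule]) -> Features:
    """Forward-chain over the rules until no rule adds a new feature."""
    completed = dict(features)  # leave the caller's description untouched
    changed = True
    while changed:
        changed = False
        for condition, target, compute in rules:
            if target not in completed and condition(completed):
                completed[target] = compute(completed)
                changed = True
    return completed


# Hypothetical rules: derive a body mass index, then a flag that depends on it.
rules: List[Rule] = [
    (lambda f: "bmi" in f, "obese", lambda f: f["bmi"] >= 30),
    (
        lambda f: "weight_kg" in f and "height_m" in f,
        "bmi",
        lambda f: f["weight_kg"] / f["height_m"] ** 2,
    ),
]
query = {"weight_kg": 100, "height_m": 2.0}
completed = apply_completion_rules(query, rules)
```

The loop repeats until a fixed point is reached, so the order in which rules are listed does not matter: the second rule fires first here, and the first rule fires on the next pass.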
Knowledge-Intensive CBR (KI-CBR) assumes that cases are enriched and/or coupled with general domain knowledge [17,60,33]. In CREEK [18], cases are embedded within a general domain model. It provides a strong coupling between cases and general domain knowledge in that cases are submerged within a general domain model. This model is represented as a densely linked semantic network. Concepts are inter-related through multiple relation types, and each concept has many relations to other concepts. The network represents a model of that part of the real world which the system is to reason about, within which model-based reasoning methods are applied. From the view of case-specific knowledge, the knowledge-intensiveness of the cases themselves is also increased, i.e. the cases become more "knowledgeable", since their features are nodes in this semantic network [61]. Fig. 7 [18] shows the semantic network that integrates cases with the general domain knowledge. It illustrates the three main types of knowledge in CREEK: a top-level ontology of generic, domain-independent concepts, the general domain knowledge, and the set of cases. The retrieval of relevant cases is based on semantic and pragmatic criteria, rather than purely syntactic ones. By making the general domain knowledge explicit, the case-based reasoner is able to interpret a current situation in a more flexible and contextual manner than if this knowledge is compiled into predefined similarity metrics or feature relevance weights.
Fig. 6. Architecture for integrating general knowledge into CBR
Studer et al. [62] defined ontology as "a formal, explicit specification of a shared conceptualization." Ontologies can be useful for designing KI-CBR applications because they allow the knowledge engineer to use knowledge already acquired, conceptualized and implemented in a formal language, considerably reducing the knowledge acquisition bottleneck. Ontologies have powerful abilities in knowledge acquisition, representation, and semantic understanding [63]. Moreover, the reuse of ontologies from a library also benefits from their reliability and consistency. Ontologies may help in the creation of complex, multi-relational knowledge structures to support the CBR methods.
In CBR, knowledge is distributed among four knowledge containers: vocabulary, similarity measures, adaptation knowledge and case base. Ontology plays critical roles in representing all of these knowledge containers. For example, an ontology can serve as the vocabulary to describe cases and/or queries, as a knowledge structure where the cases are located, and as the knowledge source to achieve semantic reasoning methods for similarity assessment and case adaptation that are reusable across different domains [64]. Bergmann et al. [65] concluded that ontology-based knowledge management and CBR knowledge management complement each other very well. Most ontology-based systems utilize logic-based deductive inference, while CBR systems provide a search functionality that makes use of similarity measures for ranking results according to their utility with respect to a given query. On the one hand, logic deduction produces only correct and provable results, which are consequences of the ontology and metadata. On the other hand, CBR retrieval suggests results even when no exactly matching answers can be found. As a result, each method solves problems that the other method cannot solve.

A. Ontology as the CBR's domain vocabulary
This approach builds the case base using any traditional methodology, such as feature vectors, stores it in a relational database, and builds an ontology for the domain knowledge (domain vocabulary). The case structure is defined using types from the ontology even if the cases are not stored as individuals in the ontology. There are also simple types like strings or numbers that are handled in the traditional way [64].
Regarding the query vocabulary, there are two options to define the queries:
• Using exactly the same vocabulary used in the cases, i.e. the same types used in the case structure definition.
• Using the ontology as the query vocabulary, which allows richer queries and semantic retrieval. Users can express their requirements better if they can use a richer vocabulary to define the query. During the similarity computation, the ontology bridges the gap between the query terminology and the case base terminology.
In this approach, the case base is stored in a SQL database, and the retrieval and similarity computation methods are configured as Nearest Neighbour (NN) based on numeric and standard similarity functions, while adaptation is defined as a substitution method that relies on Description Logics (DLs) to find suitable substitutes in the domain model. Numerical similarity functions based on ontologies are used, where the similarity between cases is divided into two components that are aggregated: a concept-based similarity that depends on the location of the concept in the ontology, and a slot-based similarity that depends on the fillers of the common attributes between the compared objects.
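The two-component similarity described above can be sketched as follows. The source does not give the concrete formulas, so everything specific here is an assumption: the concept-based component uses a depth-based measure over a toy taxonomy (twice the depth of the deepest common ancestor divided by the sum of the two concept depths), the slot-based component is the fraction of common attributes with equal fillers, and the two are aggregated by a weighted sum with a hypothetical parameter `alpha`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy: concept -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Condition": None,
    "Infection": "Condition",
    "Injury": "Condition",
    "Pneumonia": "Infection",
    "Influenza": "Infection",
    "Fracture": "Injury",
}


def ancestors(concept: str) -> List[str]:
    """Path from the concept up to the root, starting with the concept itself."""
    path: List[str] = []
    current: Optional[str] = concept
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def concept_similarity(a: str, b: str) -> float:
    """Depth-based similarity from the location of both concepts in the taxonomy."""
    on_path_a = set(ancestors(a))
    # Walking up from b, the first concept also above a is the deepest common one.
    common = next(c for c in ancestors(b) if c in on_path_a)
    return 2 * len(ancestors(common)) / (len(ancestors(a)) + len(ancestors(b)))


def slot_similarity(slots_a: Dict[str, object], slots_b: Dict[str, object]) -> float:
    """Fraction of the common attributes whose fillers are equal."""
    shared = set(slots_a) & set(slots_b)
    if not shared:
        return 0.0
    return sum(1 for s in shared if slots_a[s] == slots_b[s]) / len(shared)


Instance = Tuple[str, Dict[str, object]]  # (concept, slot fillers)


def similarity(x: Instance, y: Instance, alpha: float = 0.5) -> float:
    """Weighted sum of the concept-based and slot-based components (assumed)."""
    return alpha * concept_similarity(x[0], y[0]) + (1 - alpha) * slot_similarity(x[1], y[1])


case_a: Instance = ("Pneumonia", {"fever": True, "age_group": "adult"})
case_b: Instance = ("Influenza", {"fever": True, "age_group": "child"})
total = similarity(case_a, case_b)
```

With this taxonomy, two infections are scored as closer than an infection and an injury even when their names differ, which is the kind of semantic comparison a purely syntactic feature match cannot make.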

B. Ontologies as case base and domain vocabulary
For better communication between the case base and the domain vocabulary, Assali et al. [66] created one ontology for the domain vocabulary and one for the case base. Their approach is based on a knowledge base that integrates domain knowledge along with cases in an ontological structure, which enhances its semantic reasoning capacities. Users can describe their cases using instances from the knowledge base. The resulting case base is heterogeneous: cases do not always share the same attributes (dynamic representation of cases). Inspired by Lamontagne and Lapalme [67], the COBRA architecture is composed of two main parts (see Fig. 8 [67]): processes and knowledge containers.
• Processes: This is the functional part of the system and consists of off-line and on-line processes.
• Knowledge containers: These follow Richter [19]. COBRA represents the case base and the domain knowledge in an ontological structure to allow better communication between the knowledge about the cases and the domain [68].
The domain knowledge comprises a core ontology, which contains generic concepts that provide the context, and a domain ontology, whose concepts specialize those of the core ontology. The case ontology consists of three main parts: a problem description part describing the context of the experience, a failure mode part describing the type of failure, and a cause part describing the different possible causes of this failure (see Fig. 9 [68]). Retrieval is guided by adaptability [69]; i.e. a case is retrieved if its solution can be reused to construct a solution for the target problem. To determine the adaptable cases for a given target problem, each attribute of the query must be compared to its corresponding attribute in each source case. In homogeneous case bases, all cases share the same predefined structure, so corresponding attributes are already identified. Heterogeneous case bases, on the other hand, contain cases that do not share a predefined structure (in terms of attributes), which complicates the retrieval process. The problems of heterogeneity are solved by case alignment [66]. This approach finds similarities, or mappings, between the attributes of the query and those of the source cases, with the support of the domain ontology and using the notions of similarity regions and attribute roles. The same scenario is followed by Maalel et al. [70] to develop an ontological CBR system for a railroad accident application. Their methodology builds on the ontology development methodologies of [71,72].
To enhance case retrieval and case adaptation, Maalel et al. [29] created a domain ontology, from which cases are instantiated in the case base, and an operational ontology in the form of decision rules that restrict the search space and guide case adaptation (see Fig. 10 [29]). The adaptation rules are not created in a standardized form suitable for ontologies, such as SWRL. In addition, the process of creating these rules is not straightforward. COBRA is a domain-dependent model, since it creates ontologies for a specific domain. JCOLIBRI (Cases and Ontology Libraries Integration for Building Reasoning Infrastructures) solved this problem and created a knowledge-intensive and domain-independent architecture for CBR [64,73,74].

C. Domain independent ontological CBR framework
Díaz-Agudo et al. [75] created a domain-independent architecture to help in the design of knowledge-intensive CBR systems. Problem Solving Methods (PSMs) describe the reasoning process of a knowledge-based system in an implementation- and domain-independent manner [76]. As shown in Fig. 11 [76], COLIBRI has a two-layered architecture. The lower layer provides domain-specific knowledge, while the top layer is used as a bridge between the domain knowledge and the generic PSMs. The Ontology Server contains all reusable and formal domain-specific ontological knowledge. In this way, the specific domain model is interchangeable and the same knowledge can play different roles within different contexts of problem resolution. Moreover, COLIBRI integrates different knowledge sources, ranging from general domain knowledge to CBROnto knowledge about tasks and methods.
To take advantage of the domain knowledge acquired by reusing ontologies from the Ontology Server, the knowledge needed by the CBR methods, or at least part of it, should be expressed in a similar way. CBROnto provides terminology about CBR that captures semantically important CBR terms and provides vocabulary for describing issues involved in the CBR methods. CBROnto includes CBR-dependent but domain-independent terms that make different types of CBR possible [77]. These terms are used as the junction between the domain knowledge and the PSMs, which are defined using CBR terminology but from a domain-independent perspective (Fig. 11). CBROnto aims to unify the representational needs of case-specific and general domain knowledge. All domain terms (concepts and relations) are classified according to the role they play in the CBR methods. CBROnto terminology serves as the syntactic and semantic "glue" between the domain terminology and the reusable, generic PSMs.
That mechanism allows the CBR methods to be domain independent because they refer only to CBROnto terms. The CBROnto ontology includes general terminology (Fig. 12 [17]). Note that the designer does not need to classify every domain term one by one: thanks to the inheritance mechanism, only the top-level terms in the hierarchies have to be classified.
The activities performed by the CBR application designer to model a domain and to formalize it as a knowledge base include:
• The designer determines which domain is to be modelled and selects from the library those ontologies that are potentially useful.
• The domain terminology from the ontologies has to be integrated as two term hierarchies: the concept hierarchy, rooted at CBROnto's Thing concept, and the relation hierarchy, rooted at CBROnto's Binary-Tuple relation (see Fig. 12).
Each case is described using CBROnto and the domain vocabulary. In this sense, the CBR processes are domain independent, but they are guided by the domain terminology organized below the CBROnto terms in the subsumption hierarchies. This model describes CBR processes using tasks and methods (PSMs), which are organized in global CBROnto task and method hierarchies. The model uses the task decomposition of [11] for the CBR processes. CBROnto includes the capabilities for describing a library of PSMs associated with the main CBR tasks. It describes CBR PSMs by relating them to terms and relations regarding tasks, requirements and domain characteristics, and it includes the terms of the method description language that are used to formalize PSMs.
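The inheritance argument above (only the top-level domain terms need to be classified, and terms below them inherit the classification) can be illustrated with a minimal Python sketch. Only the `Thing` root is taken from the text; the domain hierarchy, the role labels and the helper `cbronto_role` are hypothetical and do not reproduce the actual CBROnto vocabulary or the COLIBRI implementation.

```python
from typing import Dict, Optional

# Hypothetical domain concept hierarchy (child -> parent), rooted at Thing.
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Symptom": "Thing",
    "Fever": "Symptom",
    "HighFever": "Fever",
    "Treatment": "Thing",
    "Antibiotic": "Treatment",
}

# The designer classifies only the top-level domain terms.
# The role labels are illustrative placeholders, not actual CBROnto terms.
EXPLICIT_ROLE: Dict[str, str] = {
    "Symptom": "description-term",
    "Treatment": "solution-term",
}


def cbronto_role(term: str) -> Optional[str]:
    """Return the role inherited from the nearest classified ancestor."""
    node: Optional[str] = term
    while node is not None:
        if node in EXPLICIT_ROLE:
            return EXPLICIT_ROLE[node]
        node = PARENT[node]  # climb one level in the subsumption hierarchy
    return None  # no ancestor was classified
```

With this arrangement, adding a new specialization such as a subtype of `Fever` requires no extra classification work, which is the point the text makes about the inheritance mechanism.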

D. XML-based case representation with ontology
Recently, several XML-inspired case representation languages, such as CBML and OML, have been introduced into the CBR community. They are devised to facilitate case interchange on the web and can be viewed as structured representation languages that facilitate the encoding of case knowledge into web documents. The following issues should be taken into consideration:
• Standard vocabularies for case description are needed to ensure the success of case interchange and distributed case-based reasoning.
• Convenient means of integrating domain vocabularies should be provided.
• The web case language should be flexible enough to fulfil the needs of both unstructured and structured case representations.
Huajun et al. [78,79] propose CaseML, a web-oriented, RDF-based Case Markup Language for encoding case knowledge into web documents, which allows case bases to be used in the semantic web. To achieve a globally interpretable case base, CaseML addresses the first two issues above: standard vocabularies for case description and convenient integration of domain vocabularies (ontologies). The authors defined sets of standard classes (e.g. CaseBase, case, problem, solution) and properties (e.g. domainOntology, hasProblem, hasSolution) to define the structure of the case base ontology shown in Fig. 13 [78]. CaseML offers basic building blocks for publishing case knowledge on the web and facilitates the sharing and interchange of experience knowledge and the building of distributed CBR systems. In addition, it integrates domain ontologies with the case base ontology, which enhances the CBR processes. Furthermore, [78] provides a generic architecture for CBR in an open and distributed environment (OpenDisCBR). This architecture emphasises the integration of case knowledge with web ontologies: heterogeneous case bases cooperate with each other through domain web ontologies.
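To make the case base structure described above more concrete, the following minimal Python sketch stores a tiny case base as subject-predicate-object triples. The class and property names CaseBase, Case, Problem, Solution, domainOntology, hasProblem and hasSolution follow the text (with capitalisation normalised as an assumption); the `type` and `hasCase` predicates, all `ex:` identifiers and the helper functions are hypothetical, and the sketch does not reproduce the actual CaseML syntax.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# A tiny case base as (subject, predicate, object) triples.
# "type", "hasCase" and every "ex:" identifier are illustrative assumptions.
TRIPLES: List[Triple] = [
    ("ex:cb1", "type", "CaseBase"),
    ("ex:cb1", "domainOntology", "ex:medical-ontology"),
    ("ex:cb1", "hasCase", "ex:case1"),
    ("ex:case1", "type", "Case"),
    ("ex:case1", "hasProblem", "ex:problem1"),
    ("ex:case1", "hasSolution", "ex:solution1"),
    ("ex:problem1", "type", "Problem"),
    ("ex:solution1", "type", "Solution"),
]


def objects(subject: str, predicate: str) -> List[str]:
    """Return all objects linked to a subject through a predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def solution_of(case_id: str) -> Optional[str]:
    """Return the first solution attached to a case, if any."""
    found = objects(case_id, "hasSolution")
    return found[0] if found else None
```

Because the case base itself points to a domain ontology through `domainOntology`, several heterogeneous case bases could in principle share one vocabulary, which is the cooperation mechanism the text attributes to the distributed architecture.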

E. OWL based and medical domain case representation methodology
Juarez et al. [32] proposed a case representation ontology for medical domains. This method depends on heavyweight ontologies, to which temporal and context aspects are added (see Fig. 14 [32]). The case ontology is integrated with the domain ontology for semantic case retrieval. The domain ontology contains domain-specific concepts and standard concepts such as those of ICD-10. Context knowledge is collected from patient information, such as demographic information. Cases are instances of this ontology. This technique defines five kinds of cases: Complete, Valued, Solved, Contextualized, and Valid (and their opposites: Incomplete, Unvalued, Unsolved, Uncontextualized and Invalid). These kinds allow similarity at three levels of discrimination: (1) case representation ontology inference criteria, (2) top-level description criteria, and (3) problem similarity criteria.
As a result, using ontology in case representation, and as a domain vocabulary in reasoning, enhances CBR systems.
When the case base is in the form of an ontology, Description Logic inference can be used to find relationships between cases. Reusability and sharing are also greatly enhanced. Furthermore, the same ontology can be used in different systems and in different environments. Integration between the ontological case base and the domain ontology is improved, and cases can contain textual, numerical and concept features. Semantic similarity and retrieval of cases are achieved: users can express their requests in a variety of terminologies, and the system understands the user query through ontology terminology and ontological reasoning (i.e. description logic).

V. A COMPARISON BETWEEN ONTOLOGICAL CBR METHODS
To the best of our knowledge, no previous study has compared case representation methods, whether traditional or ontological. Previous works such as [9] concentrated on CBR systems as a whole and did not address case representation. As a result, this paper relies on the existing ontological CBR systems as a whole for comparison. A comparative study between various systems that use ontology is carried out. These systems are heterogeneous, so the comparison relies on the self-explanatory features of each system together with features that we introduce. This strategy is followed by many studies [8]. The purpose of this comparison is to discover the weak points and the challenges for future enhancements. The main focus is on:
• Whether the system uses only simple traditional features, such as textual and numerical features.
• Whether the system uses both traditional features and ontological features.
• Whether the system uses ontologies for the case base.
• The form of the integrated domain knowledge (not used, rules or ontology).
• Whether the case base ontology includes default knowledge, temporal knowledge and context knowledge.
• Whether the adaptation knowledge uses ontology or rules.
• Depending on the application domain, whether the system uses a standard domain ontology, such as SNOMED CT in the medical domain.
• The querying and retrieval capabilities of the system: whether it supports semantic retrieval, which indexing structures it uses, and whether it supports explanation of results.
• Whether the ontology semantics are enhanced using rules. This metric is very important in the medical domain because medical ontology logic can be improved by domain expert and CPG rules.
• Whether the system supports the representation of cases with different internal structures.
• Whether case updating and maintenance are supported.
These are the most important metrics for comparing CBR systems. Moreover, the usage of ontology affects all of these parameters, such as retrieval, querying and explanation. Other metrics, such as integration with other AI techniques, feature weighting methods, case mining, feature selection and/or extraction, and integration with other reasoning systems, are discussed in other works such as [8]; these aspects have little relation to case representation. One exception is the relationship between fuzzy logic and case representation. Recently, especially in the medical domain, the representation of vague knowledge in the case base has received great attention. The introduction of fuzzy ontology and fuzzy semantic rules [80] will enhance case representation. However, this aspect will be handled in future work. Because ontology-based CBR systems outweigh CBR systems that depend on traditional methods for case representation, the comparison involves four KI-CBR systems and nineteen measures (see Table I). The paper checks whether or not each system supports each feature. The most complete system is JCOLIBRI, but its completeness is only 52.6%, which is low. All of the methods have shortcomings. CBR systems face great challenges that need solutions in future research. A CBR system cannot succeed without the cooperation of all of these aspects, ranging from query creation to case base maintenance.
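The completeness score used in the comparison can be read as the share of the nineteen measures that a system supports. The short Python sketch below shows this arithmetic. The helper `completeness` is hypothetical, and the count of 10 supported measures is inferred from the reported 52.6% (10/19 ≈ 0.526); it is not read from Table I.

```python
def completeness(supported: int, total: int) -> float:
    """Share of comparison measures a system supports, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= supported <= total:
        raise ValueError("supported must lie between 0 and total")
    return 100.0 * supported / total


# Assumption: 10 of 19 measures supported, inferred from the reported 52.6%.
score = completeness(10, 19)
print(f"{score:.1f}%")
```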

VI. SEMANTIC RETRIEVAL METHODS
The key to case-based reasoning is to retrieve the most similar case quickly and accurately [81]. Case similarity measurement is therefore important, since it has a direct influence on the matching process. In some applications of CBR, it may be adequate to assess the similarity of the stored cases in terms of their surface features, where similarity is computed by the k-nearest-neighbour algorithm. In other applications, it may be necessary to use derived features obtained from a case's description by inference based on domain knowledge. In complex applications, cases are represented by complex structures (such as graphs), and structural similarity is required for retrieval. In surface feature retrieval, a CBR system retrieves the k cases with maximum similarity to the target problem. However, sequentially processing all cases in memory has complexity O(n), where n is the number of cases. Optimization techniques are therefore required, such as parallel processing, indices, or a binary tree that organizes the cases in the case base according to their similarity. Structural similarity is computationally expensive. One way to improve it is to combine surface and structural similarity, as in the MAC/FAC model [82], the Spread Activation Model [83], or the use of generalized cases [84]. Improvements to retrieval algorithms include techniques for improving the speed of retrieval and for improving solution quality. Problems likely to affect solution quality include the use of inadequate similarity measures, noise, missing values in cases, unknown values in the description of the target problem, and the so-called heterogeneity problem that arises when different attributes are used to describe different cases. Each case representation method has a suitable retrieval algorithm.
The paper concentrates on ontological, or semantic, similarity measurement. The similarity computation for two ontology concepts or instances can be divided into two components: a concept-based similarity (or inter-class similarity) that depends on the location of the concepts in the ontology, and a slot-based similarity (or intra-class similarity) that depends on the fillers of the attributes common to the compared objects. Let q and q' be two instances of the ontology. The concept-based similarity, $\mathrm{sim}_{cpt}$, is computed using the measure of Wu and Palmer [85], defined as follows (Eq. 1):

$$\mathrm{sim}_{cpt}(q, q') = \frac{2 \times \mathrm{prof}(\mathrm{LCS}(q, q'))}{\mathrm{prof}(q) + \mathrm{prof}(q')} \qquad (1)$$

Dendani [86] adds the attribute weights $w_q$ to enhance the similarity (Eq. 2). A weight can be represented as an attribute in the ontology, and there are many methods to calculate it.

$$\mathrm{sim}_{cpt}(q, q') = \dots \qquad (3)$$

Garrido [88] defined a simple method for semantic similarity with low semantics (Eq. 4). In these measures, prof is the depth of a concept or an instance in the ontology hierarchy (inheritance relations only), and LCS is the Least Common Subsumer concept of two instances. In the special case where q and q' represent the same instance in the ontology, prof(LCS(q, q')) = prof(q), and thus $\mathrm{sim}_{cpt}(q, q') = 1$.
The more specific the concept that subsumes the concepts being compared, the more similar those concepts are. The above approaches to concept similarity can be enhanced by adding context knowledge to the ontology. In this way, only concepts in the same context are valid for comparison.
Another possible improvement is to use the Similarity Region, a sub-hierarchy of the ontology in which concepts and instances are comparable with each other [66]. In addition, other relations, such as PART_OF and CAUSE, need to be considered in the similarity measurement in some way.
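As an illustration of the concept-based measure of Wu and Palmer [85] discussed above, the following minimal Python sketch computes prof, LCS and the resulting similarity on a toy is-a hierarchy. The hierarchy and its concept names are hypothetical, and the convention that the root has depth 1 is an assumption (it keeps the similarity above zero for concepts whose only common subsumer is the root).

```python
from typing import Dict, List, Optional

# Hypothetical toy is-a hierarchy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Disease": "Thing",
    "Diabetes": "Disease",
    "Type1Diabetes": "Diabetes",
    "Type2Diabetes": "Diabetes",
    "Hypertension": "Disease",
}


def ancestors(concept: str) -> List[str]:
    """Path from the concept up to the root, the concept itself included."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def prof(concept: str) -> int:
    """Depth of a concept; the root has depth 1 (an assumption)."""
    return len(ancestors(concept))


def lcs(a: str, b: str) -> str:
    """Least Common Subsumer: the deepest concept that subsumes a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):  # walks upward, so the first hit is the deepest
        if node in seen:
            return node
    raise ValueError("concepts do not share a root")


def sim_cpt(a: str, b: str) -> float:
    """Wu and Palmer similarity: 2 * prof(LCS) / (prof(a) + prof(b))."""
    return 2 * prof(lcs(a, b)) / (prof(a) + prof(b))
```

In this toy hierarchy the two diabetes subtypes share the specific subsumer `Diabetes` and therefore score higher than a diabetes subtype compared with `Hypertension`, whose least common subsumer is the more general `Disease`.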
The slot-based similarity measure, $\mathrm{sim}_{slt}$, is defined as follows (Eq. 5):

$$\mathrm{sim}_{slt}(q, q') = \frac{\sum_{s \in CS} \mathrm{sim}(q.s, q'.s)}{|CS|} \qquad (5)$$

where CS (Common Slots) is the set of common simple attributes (properties) of q and q', |CS| is its cardinality, q.s (or q'.s) represents the simple attribute s of q (or q'), and sim(q.s, q'.s) is the similarity between the two simple attributes. To compute this similarity, two calculation modes are defined that can be associated with attributes:
• ignore: for properties that must not be taken into account in the similarity.
• exact: a strict mode that verifies the equality of property values.
By considering these modes, the similarity sim(q.s, q'.s) is calculated as in Eq. 6, where $w_{q.s}$ is the calculation mode associated with the simple attribute q.s, and $v_{q.s}$ is the value of this attribute in q. This method does not handle four points: (1) if the attribute is itself a concept, a loop of local and global similarity computation may be needed; (2) the method deals only with exact quantitative attributes, not with attributes that have inexact values (which call for ontology-based fuzzy CBR) or whose values are text, (time) intervals or lists; (3) the two compared attributes may be concepts with different numbers of properties; and (4) the function only distinguishes between exact and non-exact matches. It also does not take into account the largest and smallest values of the measured attribute. The global similarity measure of q and q' is given by the following formula (Eq. 7):

$$\mathrm{sim}(q, q') = (1 - \alpha) \times \mathrm{sim}_{cpt}(q, q') + \alpha \times \mathrm{sim}_{slt}(q, q') \qquad (7)$$

where α is a parameter that controls the importance of the slot-based similarity in the calculation.
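The slot-based and global measures (Eqs. 5 to 7) can be sketched in Python as follows. The sketch assumes that the attribute similarity of Eq. 6 is binary (1 when the mode is exact and the two values are equal, 0 otherwise), that attributes in ignore mode are left out of the common slots, and that attributes without a declared mode default to exact. These are assumptions made for illustration, and the attribute names in the usage note are hypothetical.

```python
from typing import Any, Dict

IGNORE, EXACT = "ignore", "exact"


def sim_attr(mode: str, v1: Any, v2: Any) -> float:
    """Assumed form of Eq. 6: 1 for equal values under the exact mode, else 0."""
    return 1.0 if mode == EXACT and v1 == v2 else 0.0


def sim_slt(q: Dict[str, Any], q2: Dict[str, Any], modes: Dict[str, str]) -> float:
    """Eq. 5: mean attribute similarity over the common slots CS."""
    # Assumption: ignored slots are excluded; undeclared slots default to exact.
    common = [s for s in q if s in q2 and modes.get(s, EXACT) != IGNORE]
    if not common:
        return 0.0
    total = sum(sim_attr(modes.get(s, EXACT), q[s], q2[s]) for s in common)
    return total / len(common)


def sim_global(sim_cpt: float, sim_slot: float, alpha: float) -> float:
    """Eq. 7: alpha weights the slot-based part against the concept-based part."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1 - alpha) * sim_cpt + alpha * sim_slot
```

For example, with `q = {"age": 54, "sex": "F", "smoker": True, "id": 1}`, `q2 = {"age": 54, "sex": "M", "smoker": True, "id": 2, "bmi": 31}` and `modes = {"id": IGNORE}`, the common slots are age, sex and smoker, two of which match, so `sim_slt` returns 2/3. The binary attribute similarity makes the limitation noted in the text visible: ages 54 and 55 would count as completely different.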
The global similarity between two cases $C_1$ and $C_2$ can be calculated as in Eq. 8, where $C_1(q_1, q_2, \dots, q_n)$ and $C_2(q'_1, q'_2, \dots, q'_n)$ are the two compared cases and $w_i$ is the weight of attribute $q_i$. The above methodologies compare the query case with all cases in the case base ontology, but the case base keeps growing as new cases are retained. Case base clustering, multi-way indexing, context knowledge, a case classification ontology, and/or combination with RBR are critical for reducing the search space and increasing case retrieval speed, especially in time-critical settings such as the ICU. Moreover, the semantic relationships between the problem features of cases can be inferred using DL inference in addition to similarity functions. The semantic relationships between the solution features of cases can be used to discover solutions for unsolved cases. Furthermore, Eq. 8 assumes that the query and the retrieved cases have the same number and type of features. In practice, the number of features may differ between cases, features may not be comparable because of their semantics, and noise may exist in the query case. Adding defaults and benefiting from ontological reasoning can mitigate this problem. Similarity is also increasingly being combined with other criteria to guide the retrieval process, such as how effectively the solution space is covered by the retrieved cases, how easily their solutions can be adapted to solve the target problem, and how easily the proposed solution can be explained. In addition, a query can be represented as a small ontology. In this way, ontology matching between the query ontology and the case base ontology, with the support of the domain ontology and DL reasoning, can enhance semantic retrieval. These points require further research.
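A minimal Python sketch of case-level aggregation and retrieval is given below. It assumes that Eq. 8 is a weighted mean of per-attribute similarities normalised by the sum of the weights, which is a common choice but is not confirmed by the text. To cope with cases that have different numbers of features, the sketch restricts the computation to shared, weighted features; this is one possible workaround, not the method of the surveyed systems. All names and data are hypothetical.

```python
import heapq
from typing import Any, Callable, Dict, List, Tuple

Case = Dict[str, Any]


def case_similarity(query: Case, case: Case, weights: Dict[str, float],
                    local_sim: Callable[[Any, Any], float]) -> float:
    """Weighted mean of local similarities over shared features (assumed Eq. 8)."""
    shared = [f for f in query if f in case and f in weights]
    total_weight = sum(weights[f] for f in shared)
    if total_weight == 0:
        return 0.0
    weighted = sum(weights[f] * local_sim(query[f], case[f]) for f in shared)
    return weighted / total_weight


def retrieve(query: Case, case_base: Dict[str, Case], weights: Dict[str, float],
             local_sim: Callable[[Any, Any], float], k: int = 1) -> List[Tuple[float, str]]:
    """Return the k most similar cases as (similarity, case id) pairs."""
    scored = [(case_similarity(query, case, weights, local_sim), case_id)
              for case_id, case in case_base.items()]
    return heapq.nlargest(k, scored)
```

This linear scan compares the query with every stored case, so it has the O(n) cost discussed earlier; the indexing and clustering techniques mentioned above would replace the scan. Restricting the score to shared features also has a side effect: a case that shares only a few features with the query can reach a high score, which is one reason the text calls for defaults and ontological reasoning.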

VII. CBR FUTURE CHALLENGES
Ontological CBR faces many challenges before it can reach its full functionality. Challenges exist in all aspects and processes of the CBR system, such as case base creation, query building, semantic case retrieval, case adaptation, case retention, and case base update and maintenance. Here, the paper discusses some of these challenges.

1) Case solution adaptation has many techniques, ranging from manual to generative (which replays the method of deriving the retrieved solution on the new problem). Adaptation knowledge may be in the form of rules that are not fully compatible with ontological CBR. A deep analysis of the domain is required to determine which rules must be included in the system. Unfortunately, CBR is often applied to domains that are poorly understood or difficult to codify in the form of rules, so the leaders in the field have sometimes argued for postponing or avoiding automatic adaptation. One challenge is how to learn the adaptation knowledge automatically by discovering the semantic relationships between case description features (concepts) and formulating semantic rules in the same ontology to guide case adaptation (semi-)automatically. How to represent these rules in formats compatible with ontological case bases is another challenge. The best way to combine case adaptation rules in a CBR system is to use the ontology itself or a rule format designed for ontologies, such as SWRL, which is designed to add rule logic to OWL ontologies. Only some systems develop automatic adaptation strategies, whereas the majority of systems and projects provide manual or conventional adaptation [89,90,91]. Ontology can provide more intelligence in case adaptation algorithms [73,64].
2) Ontology engineering is critical in ontological CBR. Devising a suitable ontology construction methodology for the CBR case base and domain ontology, in connection with the patient medical record, is a critical research area. It will enhance the integration of case-based and ontology-based reasoning [92], and the resulting case base structure will require new indexing, semantic retrieval algorithms and similarity metrics. Until now, there has been no ontology engineering methodology specific to CBR in the medical domain. Such a methodology will differ from existing ones because of the complexity and richness of the medical domain: the existing standard terminologies such as UMLS, standard ontologies such as the Disease Ontology, upper ontologies (e.g. Basic Formal Ontology (BFO), DOLCE, General Formal Ontology (GFO), and Unified Foundational Ontology (UFO)), vagueness in data, integration with EHR, etc.
3) Data pre-processing steps are critical for preparing medical data to form case bases, because medical data are in most cases incomplete, inconsistent, vague, and detailed. Pre-processing includes data aggregation, summarization, normalization, fuzzification, coding, integration, cleaning, etc. AI and data mining techniques help in this field [93,94]. The selection, mining and extraction of relevant features for case representation, and of weights for these features, are open problems [95,96]. The problem has become more complicated in recent medical CBR systems because of complex data formats, where the data come from sensors, images, time series or free text. Solutions range from automatic ones, such as genetic algorithms, to manual work by a domain expert. The weights may be static for all situations or dynamic according to the context of execution. Adding default knowledge to class descriptions is critical, as it allows reasoners to perform default reasoning [17].
4) Cases are represented using simple or concept attributes. For medical domains, multimedia attributes such as images could add more semantics.
5) Reasoning with incomplete, inconsistent, vague and/or inaccurate data is to be expected in the medical domain. Soft computing can enhance the functionality of a CBR system [97]. For example, the use of fuzzy sets allows a flexible encoding of case characteristics as linguistic terms [98]. Cases are stored in a fuzzy database or fuzzy ontology. During retrieval, the fuzzy similarity of a case can be calculated using a fuzzy membership function and weighted fuzzy pattern matching. This similarity can enhance the semantic similarity achieved by using ontology. All numeric parameters of the CBR system (e.g., feature weights, the value of k in k-NN, the shape of fuzzy similarity membership functions) can be maintained using a genetic algorithm and ANN. Inductive methods can be used to cluster case bases and find representative and redundant cases, which can be used to direct case base maintenance. Moreover, creating queries in connection with a patient record that contains all of the patient's medical data, and with rule-based background knowledge, will enhance new case creation or enrichment.
6) In the medical domain, the domain ontology can benefit from and reuse existing standard ontologies such as SNOMED, UMLS, ICD, etc. [89]. These ontologies provide standardized terminology to represent findings, diseases, procedures, medications, sites, and organisms. Without these deep domain ontologies, CBR systems would not be able to provide acceptable clinical assistance. However, coding EHR data and extracting reference sets from these large ontologies is a big challenge. The open question is how to use these ontologies to achieve semantic interoperability between EHR systems [89,99] and to ease case collection from distributed databases.
7) Temporal data representation in the domain ontology, the case base, and the query is critical, especially for ICU and chronic disease patients, where temporal and continuous evaluation is essential. Temporal data are represented in the case and handled in the case retrieval algorithm [100,101,102]. Time representation in the case ontology is standardized in an OWL ontology [103] and requires temporal similarity functions for effective retrieval [104,105]. However, the application of temporal CBR requires more research [106]. Moreover, handling uncertainty in temporal data is critical, especially for medical data [107,108].
8) Distributed CBR on the web is critical for sharing, integrating and distributing knowledge. It will be advantageous to develop CBR systems as Web services that securely receive patient input data from the Internet, process them against several CBR systems, combine the results with those of non-CBR systems, and give back a consolidated result from several sources.
9) In the medical domain, there are two types of knowledge. (a) The general knowledge, including domain ontologies, standardized terminologies (e.g. SNOMED CT, ICD and UMLS) and CPGs. The domain ontology provides a ground service for specifying the meaning of the terms used in case descriptions. The challenges here are the encoding of EHR data using a selected ontology, the creation of a suitable subset of this ontology for the target domain, and the creation of an efficient semantic case retrieval algorithm [109]. CPGs can be represented in the form of rules [110]. In [111], clinical pathways are represented as prototypical cases. (b) The experience knowledge, which is represented in cases. CPGs can enhance the reasoning process of CBR because these rules can be represented in ontological form (using SWRL) and can enrich the knowledge in case bases and domain ontologies.
10) The number of initial cases in the case base affects the efficiency of the CBR system. Creating an ontology engineering methodology for extracting cases as ontology instances from the EHR is a critical issue; in other words, the case base ontology must be populated with cases from raw or prepared EHR data. The cases must have a standard structure, which may use the HL7 RIM data model, and standard content that uses standard terminologies. When CBR systems are able to take advantage of patient representations in electronic health records, they will become applicable to a wide range of diseases.
11) A heterogeneous case base contains cases with different structures, or with different numbers and types of attributes. This situation requires enhanced case retrieval algorithms [68]. Ontology greatly facilitates the creation of case bases with a dynamic structure [66].
12) Defining a Medical Context Ontology for the domain explicitly specifies a set of medical contexts, which are used to retrieve only the cases that are highly relevant to the new case [112]. A context can be defined as a set of attributes relevant to a given retrieval, that is, a set of constraints on the patient's clinical state.
13) No research has been done on establishing semantic relations between case problem attributes, between case solution attributes, and between the two. These relationships have benefits in query answering, complex case decomposition, case enrichment, case adaptation, etc.
14) Because of space restrictions, the paper will not discuss the challenges of soft CBR including the integration with fuzzy logic, statistics, neural networks, and data mining and how these technologies can enhance the functionality of CBR systems.
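The fuzzy encoding mentioned in challenge 5 can be illustrated with a minimal Python sketch: crisp values are mapped to linguistic terms through triangular membership functions, and two values are compared by the overlap of their memberships. The terms, their breakpoints and the max-min overlap measure are assumptions chosen for illustration. In particular, this simple overlap is a stand-in and is not the weighted fuzzy pattern matching referred to in the text.

```python
from typing import Dict, Tuple


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with a < b < c: 0 outside (a, c), 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms for body temperature in degrees Celsius.
TERMS: Dict[str, Tuple[float, float, float]] = {
    "normal": (35.5, 36.8, 37.8),
    "fever": (37.2, 38.5, 40.0),
    "high_fever": (39.0, 40.5, 42.0),
}


def fuzzify(x: float) -> Dict[str, float]:
    """Membership degree of a crisp value in every linguistic term."""
    return {term: triangular(x, *abc) for term, abc in TERMS.items()}


def fuzzy_sim(x: float, y: float) -> float:
    """Max-min overlap of the two fuzzified values (a simple stand-in measure)."""
    mx, my = fuzzify(x), fuzzify(y)
    return max(min(mx[t], my[t]) for t in TERMS)
```

Unlike the binary exact mode of the slot-based measure, this comparison yields graded values: 38.0 and 39.0 both belong partly to the term "fever" and therefore receive a similarity strictly between 0 and 1.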

VIII. CONCLUSION
This paper has reviewed the CBR case representation formalisms. They can be divided into two categories: traditional and ontological methods. The traditional methods have many limitations: the case features have no relations to each other, and users have to express their queries for new cases exactly as they are represented in the case base. The similarity and retrieval of cases are static and based on exact matching, and there are no inference mechanisms in the case base. On the other hand, integrating ontologies as domain terminologies with traditional case representation methods can enhance the sharing and querying capabilities. The best solution is achieved by using ontologies to represent both cases and domain knowledge, which creates what is called knowledge-intensive CBR and addresses sharing, semantic retrieval and case representation issues. This paper has also conducted a comparison between ontological CBR methods and has concluded that JCOLIBRI is the best approach. The paper has discussed semantic retrieval in case-based reasoning and has suggested challenges for future research in ontological CBR. As a result, CBR could be a valid approach for building CDSSs, but more investigation is needed. As future work, we will study case retrieval algorithms, soft-CBR techniques, and the integration between CBR and the EHR environment. We will also study how the results of this paper can be extended to other new systems or new metrics.
[11]: (a) RETRIEVE the most similar case(s). (b) REUSE the case(s) to attempt to solve the current problem. (c) REVISE the proposed solution if necessary. (d) RETAIN the new solution as a part of a new case.

Fig. 3. The structure of categories, features and exemplars

Fig. 9. Part of the case model

This model solves the knowledge acquisition bottleneck problem that all previous ontological methods faced. It is based on knowledge acquisition from a library of application-independent ontologies and on the use of CBROnto, an ontology with the common CBR terminology that guides case representation and allows the description of flexible, generic and reusable CBR Problem Solving Methods (PSMs).

Fig. 12. Fragment of the CBROnto hierarchies

The general terminology of CBROnto includes:
• Terms related to the task and method hierarchies.
• Terms related to the definition of the case structures, terms related to the different knowledge roles used in the PSMs, and terms used to organize and classify the domain knowledge.

TABLE I. COMPARISON BETWEEN CBR METHODS (√ = SUPPORTED, × = NOT SUPPORTED)