xMatcher: Matching Extensible Markup Language Schemas using Semantic-based Techniques

Schema matching is a critical step in data integration systems. Most recent schema matching systems require a manual double-check of the matching results to add missed matches and remove incorrect matches. Manual correction is labor-intensive and time-consuming; without it, however, the accuracy of the results is significantly lower. In this paper, we present xMatcher, an approach to automatically match XML schemas. Given two schemas S1 and S2, xMatcher identifies semantically similar schema elements between S1 and S2. To obtain correct matches, xMatcher first transforms S1 and S2 into sets of words; then, it uses a context-based measure to identify the meanings of words in their contexts; next, it captures semantic relatedness between sets of words in different schemas; finally, it uses WordNet information to calculate the similarity values between semantically related sets and matches the pairs of sets whose similarity values are greater than or equal to 0.8. The results show that xMatcher provides superior matching accuracy compared to state-of-the-art matching systems. Overall, our proposal can be a stepping stone towards decreasing human assistance and overcoming the weaknesses of current matching initiatives in terms of matching accuracy.

Keywords—Schema matching; matching accuracy; semantic similarity; semantic relatedness; WordNet


A. Motivation and Background
Schema matching aims at identifying semantic correspondences, called matches [1], [2], in multiple schemas. It is critical for applications that manipulate data across different data sources because, if done correctly, it gives the end user a unified view over the sources. We use an example to illustrate the schema matching problem. Let S1 (Listing 1) and S2 (Listing 2) be two XML schemas describing academic conferences. Our goal is to identify the matches in Fig. 1.
Although it is often desirable to manually define an integrated schema that represents all sources, this is often impossible for two main reasons: (1) the huge number of sources; and (2) their continuous updates. Thus, plenty of automatic schema matching systems have been developed (we refer the reader to [3], [4], [5], [6] for recent surveys and some state-of-the-art matching systems). However, the term automatic is quite relative: even when humans do not intervene during the matching process, they intervene at the end to correct the results, adding missed matches and removing erroneous matches. Therefore, improving the accuracy of the output matches can significantly reduce humans' workload and avoid possible mistakes humans might make. It can also save a considerable amount of time by leaving only a few results to correct. Furthermore, state-of-the-art schema matching systems often reach a very moderate (sometimes poor) matching accuracy [2] and require loads of manual assistance to correct the matching results [2]. In this paper, we introduce a new schema matching system that overcomes these limitations, as it is designed to achieve a high matching accuracy without any human assistance.

B. Challenges
Valuable as it is, producing high-accuracy matches is also very difficult. First, schemas often use different naming conventions, e.g. conference name (see Listing 1) and name (see Listing 2), or totally different words, e.g. publication (see Listing 1) and paper (see Listing 2). Second, schema elements are not fully independent from each other; consider, for example, nested elements in XML schemas. Third, a word can have multiple meanings. Finally, given a word W, the WordNet hierarchy [7] connects W to other words through a wide variety of relations (e.g. hypernyms, hyponyms, meronyms) that contribute unevenly to the definition of W. For example, according to WordNet the word conference has a direct hypernym (meeting) and five direct hyponyms (symposium, seminar, colloquium, Potsdam conference, and Yalta conference); both combined provide a more comprehensive definition than either of them combined with conference's meronym (conferee).

C. Contributions
In this paper, we introduce xMatcher, an approach to automatically match XML schemas. The key idea of xMatcher is to match XML schemas based on their semantics, with the objective of obtaining high-accuracy matches, which considerably reduces humans' workload and offers a reliable and unified view over a large number of data sources. In particular, we make the following contributions:
• We propose a context-based measure to determine the meanings of words according to their contexts.
• We propose an automatic strategy to capture semantic relatedness between sets of words in different schemas.
• We present a semantic similarity measure over WordNet to calculate the semantic similarity between semantically related sets of words.
• We evaluate our similarity measure on a popular dataset and show that it provides correct results and surpasses the state-of-the-art semantic measures and distances.
• We evaluate xMatcher on different real-world domains and show that it produces high-accuracy matches and outperforms state-of-the-art systems in terms of matching accuracy.
The rest of this paper is organized as follows. Section II first reviews the state of the art schema and ontology matching systems, then it presents the state of the art similarity measures. Section III defines the problem of schema matching. Section IV describes xMatcher. Section V evaluates both our similarity measure and xMatcher in terms of matching accuracy. Section VI concludes this paper and discusses future work.

A. Schema and Ontology Matching Systems
Although it is not in its infancy, schema and ontology matching is still an active research area. Indeed, the number of approaches available for schema and ontology matching increases continuously (we refer the reader to [3], [4], [5], [6], [8] for recent surveys and some existing matching systems). Also, the number of matching systems participating in the Ontology Alignment Evaluation Initiative (OAEI) is increasing significantly. Before we proceed with the description of our new matching system xMatcher, we first review the state-of-the-art matching systems that use WordNet as the matching space (e.g. ALIN [9]), and the top matching systems that participated in the 2018 edition of OAEI (e.g. Holontology [10], DOME [11], ALOD2Vec [12], and AgreementMakerLight [13]).
Holontology [10] is a modular holistic ontology matching system based on the Linear Program for Holistic Ontology Matching (LPHOM) system. It uses a combination of several similarity measures (Levenshtein, Jaccard, and Lin) to match two ontologies, or multiple ontologies at once, after converting them into an internal predefined format. Then, Holontology transforms the results into alignments exported in RDF.
ALIN [9] is an interactive ontology matching system which takes as input two ontologies and delivers as output a set of alignments between them. It proceeds in two major steps: (1) it generates the initial mappings; (2) it waits for feedback from the human expert and changes the mappings accordingly to improve the accuracy of the final results. The second step is repeated until the human expert has no more mapping suggestions.

DOME (Deep Ontology MatchEr) [11] is a scalable matcher which uses doc2vec and exploits large texts that describe the concepts of the ontologies. To deal with the main issue of matching similar large texts, DOME uses topic modelling techniques such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA).
ALOD2Vec [12] uses as external background knowledge source the WebIsALOD database of hypernym relations extracted from the Web. It also exploits element-based information and label-based information. In order to determine the similarity score between nodes of the knowledge graph (WebIsALOD is viewed as a knowledge graph), ALOD2Vec applies RDF2Vec.
AgreementMakerLight (AML) [13] is an ontology matching system derived from AgreementMaker [14]. AML consists of two main modules: the ontology loading module and the ontology matching module. The ontology loading module loads the ontology files along with the external resources and then generates the ontology objects. The ontology matching module's main goal is to align the generated ontology objects. The framework is extensible, as it allows the virtual integration of new matching algorithms.
The matching systems presented above achieve acceptable results. The goal of this paper is to surpass the aforementioned systems in terms of Precision, Recall, Overall, and F-Measure (we refer the reader to subsection V-A for a definition of these quality metrics).
B. Semantic Similarity
1) Similarity Measures and Distances: One of the many possible approaches to discovering matches is to compute the semantic similarity values between schema elements, which is the approach we adopted for our matching system. Semantic similarity measures are one of the most pressing challenges facing the improvement of schema matching. According to [15], [16], [17], [18], [19], [20], [21], [22], semantic similarity measures are grouped into four categories: edge-based measures, information content-based measures, feature-based measures, and hybrid-based measures.
• Edge-based measures (also known as path-based measures). They determine the similarity between two concepts by considering both the length of the path that links the concepts in the taxonomy and the position of the concepts in the taxonomy [15], [16], [18], [20]. Examples include the shortest-path-based measure [15].
• Information content-based measures. The main idea of these measures is that the more information two concepts have in common, the more semantically similar they are [15], [23]. Examples include Resnik [24], Jiang & Conrath [25], Lin [26], and Nababteh [27].
• Feature-based measures. They use the properties of the concepts in such a way that the more common features and the fewer non-common features two concepts have, the more semantically similar they are [15], [16], [18], [20], e.g. Tversky [28].
• Hybrid-based measures. They combine the three aforementioned categories [16], [18], [20]. Zhou's measure is an example of hybrid-based measures [29].
Since information content-based measures perform better than the other categories (they have the highest correlation coefficients when compared to matching results provided by human experts) [15], we decided to direct our attention to the aforementioned information content-based measures, which we will later compare to our semantic similarity measure. WordNet [7] is a lexical database for the English language created by a research team at Princeton University. It groups words into sets of synonyms called synsets, which are interlinked by means of semantic relationships, for instance the is-a relationship, which connects a hyponym to a hypernym. WordNet is commonly used by semantic similarity measures; indeed, the following measures all use it as an external resource.
Resnik's measure [24] computes the Information Content (IC) of the Least Common Subsumer (LCS) of two concepts a and b as follows:

sim_Resnik(a, b) = IC(LCS(a, b))

Where:
• Given a concept C, IC(C) = -log(p(C)).
• p(C) = frequency(C) / N refers to the probability of C.
• N refers to the total number of nouns.
The main issue of Resnik's measure is the following: any pair of concepts having the same LCS will have the same semantic similarity value [15]. Fortunately, Jiang & Conrath (J&C) and Lin found a way to overcome Resnik's problem [25], [26]: in addition to the IC of the LCS, both J&C and Lin consider the IC of each concept [25], [26]. J&C define the distance between two concepts as follows [25]:

dist_J&C(a, b) = IC(a) + IC(b) - 2 × IC(LCS(a, b))

A distance differs from a similarity measure in that the higher it gets, the less similar the two compared concepts are. Given J&C's distance, one can convert it into a similarity measure and vice versa; conversions are made using equation 3. In this paper, we use the similarity measure.
Lin describes the semantic similarity between two concepts as follows [26]:

sim_Lin(a, b) = 2 × IC(LCS(a, b)) / (IC(a) + IC(b))

The main issue with Lin's measure is the following: if the IC of LCS(a, b), a, or b is equal to 0, then the semantic similarity value is equal to 0 as well [27].
To deal with Lin's problem, Nababteh suggests dividing twice the IC of the LCS of the two compared concepts by the sum of the IC of the direct hypernym of the first concept and the IC of the direct hypernym of the second concept [27]:

sim_Nababteh(a, b) = 2 × IC(LCS(a, b)) / (IC(hypernym(a)) + IC(hypernym(b)))
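To make the information content-based measures concrete, the following sketch implements Resnik, J&C, and Lin over a toy taxonomy. The hierarchy and frequency counts are invented for illustration; the real measures run over WordNet with corpus-derived statistics.

```python
import math

# Toy taxonomy and frequencies (assumed values for illustration only).
# Hierarchy: entity > meeting > conference > symposium
freq = {"entity": 100, "meeting": 40, "conference": 10, "symposium": 2}
parent = {"symposium": "conference", "conference": "meeting", "meeting": "entity"}
N = sum(freq.values())  # total noun count

def ic(c):
    # Information Content: IC(C) = -log p(C), with p(C) = frequency(C) / N
    return -math.log(freq[c] / N)

def ancestors(c):
    # A concept's chain up to the taxonomy root, including itself
    chain = [c]
    while c in parent:
        c = parent[c]
        chain.append(c)
    return chain

def lcs(a, b):
    # Least Common Subsumer: the deepest ancestor shared by a and b
    anc_b = set(ancestors(b))
    for c in ancestors(a):
        if c in anc_b:
            return c

def sim_resnik(a, b):
    return ic(lcs(a, b))

def dist_jc(a, b):
    # J&C distance: higher means less similar
    return ic(a) + ic(b) - 2 * ic(lcs(a, b))

def sim_lin(a, b):
    return 2 * ic(lcs(a, b)) / (ic(a) + ic(b))
```

Note how Lin's measure yields 1 for identical concepts while J&C's distance yields 0, illustrating the distance-versus-similarity inversion discussed above.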
For the time being, the aforementioned semantic similarity measures are quite successful; there remain, however, some issues that require more attention. Indeed, according to [24], [25], [26], [27], the aforementioned measures might not always provide correct results: when compared to the reference similarity values on Miller and Charles' (M&C) benchmark dataset, their results were not promising.
2) Schema-Based Information and Instance-Based Information: The Rivalry to Dominate Schema Matching: One of the most important choices impacting the accuracy of the results returned by a similarity measure used by a schema matching system is the information used to find semantic correspondences between schemas. Besides external resources, a similarity measure may utilize schema-based information, instance-based information, or both. In Table I, we present the advantages and disadvantages of each approach.

Schema-based approach
Advantages:
- It uses the properties of the schema elements (e.g. labels, data types, integrity constraints).
- It is easy to implement.
- It is fast.
Disadvantages:
- It does not produce good results when the properties of the schema elements are not available.

Instance-based approach
Advantages:
- It exploits the data stored at a given time, which provides more details about the schema elements and hence improves the accuracy of the final results.
Disadvantages:
- Unavailable data may cause the matching system to stop functioning properly and exit.
- Incorrect data may lead to false matches or missed true matches.
- It operates slowly.
- It is more complicated to implement than schema-based approaches.
Based on the information presented in Table I, we decided to use schema-based information to define our solution.

III. PROBLEM STATEMENT
In this section, we present definitions related to the schema matching problem. In this paper, we consider only XML schemas and leave other data representations for future work.

Definition 1 (Entity). Let S be an XML schema. An entity e refers interchangeably to a complex type element, a simple type element, or an attribute.

Definition 2 (Set of Words). Let S be an XML schema and n be the number of entities (e1, e2, . . . , en) it contains. Given an entity e1 ∈ S, the set of words generated from e1 is denoted set_e1. Remark: all the sets of words generated from S are defined as SET_S = {set_e1, set_e2, . . . , set_en}.

Definition 3 (Semantic Relatedness). Let S1 and S2 be two schemas, and SET_S1 and SET_S2 be their respective sets of words. set1 ∈ SET_S1 and set2 ∈ SET_S2 are semantically related if they can be used together in the same schema. For example, {conference, paper, title} and {conference, paper, author} from Listing 2 are semantically related.

Definition 4 (Semantic Similarity). Let S1 and S2 be two schemas, and SET_S1 and SET_S2 be their respective sets of words. set1 ∈ SET_S1 and set2 ∈ SET_S2 are semantically similar if they share the same meaning. Also, semantically similar sets cannot be used together in the same schema. For example, {conference, publication, title} in Listing 1 and {conference, paper, title} in Listing 2 are semantically similar. Remark: if set1 ∈ SET_S1 and set2 ∈ SET_S2 are semantically similar, then they are semantically related as well, e.g. {conference, publication, title} and {conference, paper, title}. However, set1 and set2 being semantically related does not necessarily imply that they are similar, e.g. {conference, paper, title} and {conference, paper, author}.

Definition 5 (Problem Statement). Given n schemas S1, S2, . . . , Sn.
Our goal is to maximize the accuracy of the matches discovered between S1, S2, . . . , Sn and minimize the human workload traditionally needed to correct the matching results. In the next section, we introduce xMatcher, our solution to the schema matching problem described in Definition 5.

IV. THE XMATCHER APPROACH
The xMatcher architecture (see Fig. 2) consists of three main modules: pre-matching, matching, and post-matching. Given two XML schemas S1 and S2, the pre-matching module (µ : S1 × S2 → SET_S1 × SET_S2) uses WordNet along with a database of abbreviations and applies fuzzy string matching to generate, from each entity in S1 and S2, a set of words. The matching module (φ : SET_S1 × SET_S2 → [0, 1]) then identifies semantically related sets, for which it calculates similarity values. Finally, the post-matching module (θ : [0, 1] → Matches) matches the entities whose similarity values are greater than or equal to 0.8. It is important to note that all three modules run prior to any user request.
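The three-stage pipeline can be sketched as a composition of functions. This is a minimal skeleton assuming stand-in tokenizer, relatedness, and similarity callables; the real modules use WordNet, an abbreviation database, and fuzzy string matching as described below.

```python
THRESHOLD = 0.8  # similarity cutoff used by the post-matching module

def pre_matching(schema_entities, expand):
    # mu: map each entity name to a set of words. `expand` is a stand-in
    # tokenizer; the real module consults WordNet and an abbreviation DB.
    return {e: frozenset(expand(e)) for e in schema_entities}

def matching(sets1, sets2, related, similarity):
    # phi: score only the semantically related pairs of word sets
    return {(e1, e2): similarity(s1, s2)
            for e1, s1 in sets1.items()
            for e2, s2 in sets2.items()
            if related(s1, s2)}

def post_matching(scores, threshold=THRESHOLD):
    # theta: keep entity pairs at or above the threshold
    return {pair for pair, v in scores.items() if v >= threshold}
```

As a usage example with Jaccard overlap standing in for the semantic similarity measure, `conference_name` versus `name` scores 0.5 and is correctly held below the 0.8 cutoff pending the semantic enrichment performed by the later modules.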

A. The Pre-Matching Module
Before we proceed with the matching module, a pre-matching step is required, since schemas use different naming conventions. An entity name might be an expression that does not belong to WordNet; examples of such non-WordNet entities include abbreviations, concatenations of words, and words separated by underscores. Thus, we use two components, the sets generator and the entities combiner, to produce, for each entity, a set of words that helps clarify its meaning.

Entities combiner. Let c be a complex type element, e be a non-complex type element included in c, and set_c = set_c,WordNet ∪ set_c,expression and set_e = set_e,WordNet ∪ set_e,expression be their respective sets of words. We made the following observation: the more words set_e contains, the more meaning e conveys. Therefore, we decided to utilize the context of e, which is the complex element it belongs to, as follows: set_e ← set_e ∪ set_c. Algorithm 2 summarizes this.
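The two components above can be sketched as follows. The abbreviation database and the name-splitting rules are illustrative assumptions; the actual sets generator also consults WordNet and applies fuzzy string matching.

```python
import re

# Hypothetical abbreviation database (the paper uses an external DB)
DB_ABBR = {"conf": "conference", "pub": "publication"}

def split_name(name):
    # Split underscores, hyphens, whitespace, and camelCase boundaries
    # into candidate words
    parts = re.sub(r"([a-z])([A-Z])", r"\1 \2", name)
    return [w.lower() for w in re.split(r"[_\-\s]+", parts) if w]

def sets_generator(name):
    # Build the set of words for one entity, expanding known abbreviations
    words = set()
    for w in split_name(name):
        words.add(DB_ABBR.get(w, w))
    return words

def entities_combiner(set_e, set_c):
    # Enrich a nested entity's set with its complex-type context:
    # set_e <- set_e U set_c
    return set_e | set_c
```

For instance, `conf_name` expands to {conference, name}, and a nested `title` element absorbs the words of its enclosing complex type.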
Next, we use the sets of words to match schemas using relatedness matrices and a semantic similarity measure.

B. The Matching Module
The matching module consists of two major components: the relatedness determinator and the similarity calculator. The relatedness determinator uses relatedness matrices to capture semantic relatedness between different sets of words. Then, the similarity calculator exploits the WordNet hierarchy to calculate the similarity between every pair of semantically related sets.
1) Generating relatedness matrices: Prior to computing the semantic similarity values between different sets of words, we first must identify semantically related sets. This is very important for two main reasons. First, it narrows down the total number of computations, since we only calculate the semantic similarity values between related sets. Second, let e1 ∈ S1 and e2 ∈ S2 be two entities, and set_e1 and set_e2 be their respective sets of words. Suppose that neither e1 nor e2 is contained in any complex type element. The missing contexts imply that set_e1 and set_e2 convey poor meanings; thus, identifying whether or not they are semantically related will considerably improve their meanings. To this end, the relatedness determinator proceeds in two steps (Algorithm 3 summarizes this). First, it uses equation (6) to determine the meaning of a word according to the other words in the same set. Second, it employs fuzzy string matching and word synonyms available in WordNet to identify semantically related sets. In the following, we explain these steps in more detail.
Step 1: Identifying meanings of words. Let e be an entity and set_e be its set of words. Given that a word W ∈ set_e may have more than one meaning, we use set_e \ W to identify the meanings of W. Where:
• s_i and s_j,k are the i-th sense of W (meaning of W in WordNet) and the k-th sense of the j-th word in set_e \ W, respectively.
• n and n_j are the total number of senses of W and the total number of senses of the j-th word in set_e \ W, respectively.
• relatedness returns the number of overlapping phrases or words between s_i and s_j,k.
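A Lesk-style sketch of this step follows, scoring each sense of W against the best-matching sense of every other word in the set. The toy glosses are invented stand-ins for WordNet sense definitions, and word-level overlap stands in for the paper's phrase-and-word relatedness.

```python
def gloss_overlap(gloss_a, gloss_b):
    # relatedness: number of words shared by two sense glosses
    return len(set(gloss_a.split()) & set(gloss_b.split()))

def best_sense(word, context, glosses):
    # glosses: dict word -> list of sense glosses (toy WordNet stand-in).
    # For each sense s_i of `word`, sum the best overlap with any sense
    # of each context word, and return the index of the top-scoring sense.
    best, best_score = 0, -1
    for i, s_i in enumerate(glosses[word]):
        score = sum(max(gloss_overlap(s_i, s_jk) for s_jk in glosses[w])
                    for w in context)
        if score > best_score:
            best, best_score = i, score
    return best
```

With toy glosses for "paper" (the article sense vs. the material sense), the context word "conference" selects the article sense, mirroring how set_e \ W disambiguates W.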
Step 2: Identifying semantically related sets of words. Let e1 ∈ S1 and e2 ∈ S2 be two entities, and set_e1 = {W1,1, W1,2, . . . , W1,card(set_e1)} ∈ SET_S1 and set_e2 = {W2,1, W2,2, . . . , W2,card(set_e2)} ∈ SET_S2 be their respective sets of words. We use fuzzy string matching to determine the words contained in both set_e1 and set_e2. We display the results in a relatedness matrix F = (f_i,j), 1 ≤ i ≤ card(set_e1), 1 ≤ j ≤ card(set_e2) (see Table III), whose individual items f_i,j are defined in terms of o1_i,j and o2_i,j, where o1_i,j is equal to 1 if W1,j or one of its synonyms and W2,i or one of its synonyms appear together in set_e1, and 0 otherwise. Similarly, o2_i,j is equal to 1 if W1,j or one of its synonyms and W2,i or one of its synonyms appear together in set_e2, and 0 otherwise.
We generated relatedness matrices for different real-world schemas (Airfare, Automobiles, Books, Car Rentals, Hotels, Jobs, Movies, and Music Records) extracted from the Web interfaces in the TEL dataset of the UIUC Web Integration Repository. We noticed that semantically related sets (identified manually) are assigned matrices that contain more ones than zeros. Thus, we drew the following conclusion: two sets are semantically related if and only if the occurrences of 1 in F outnumber the occurrences of 0.
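The decision rule can be sketched as follows. Since the exact definition of f_i,j is only given indirectly, this sketch assumes f_i,j = 1 when the word pair, allowing synonyms, appears together in both sets; the synonym table is a toy stand-in for WordNet.

```python
def relatedness_matrix(set1, set2, synonyms):
    # Hedged reading of F: f_i,j = 1 when the pair (W1_j, W2_i), allowing
    # synonyms, co-occurs in both set1 and set2; 0 otherwise.
    def present(w, s):
        # w itself or one of its synonyms appears in s
        return w in s or bool(synonyms.get(w, set()) & s)

    def together(wa, wb, s):
        return present(wa, s) and present(wb, s)

    return [[1 if together(w1, w2, set1) and together(w1, w2, set2) else 0
             for w1 in sorted(set1)]
            for w2 in sorted(set2)]

def semantically_related(matrix):
    # The paper's rule: related iff ones outnumber zeros in F
    ones = sum(row.count(1) for row in matrix)
    zeros = sum(row.count(0) for row in matrix)
    return ones > zeros
```

With the Listing 1 / Listing 2 example, {conference, publication, title} and {conference, paper, title} come out related once paper/publication are known synonyms, while an unrelated set such as {car, rental} produces an all-zero matrix.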
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 8, 2020

Algorithm 3 RelatednessDeterminator(SET_S1, SET_S2)
Input: SET_S1, SET_S2
Output: SET_S
1: for each W in set1 ∈ SET_S1 do
2:   Identify the meaning of W using equation (6) /* Similarly, we identify the meanings of words in SET_S2 */
3: end for
4: for each set1 in SET_S1 do
5:   for each set2 in SET_S2 do
6:     Determine whether set1 and set2 are semantically related based on their relatedness matrix
7:     Add semantically related sets to SET_S
8:   end for
9: end for
10: return SET_S

Next, we calculate the similarity between semantically related sets of words.

2) Calculating similarity values between entities:
The similarity calculator operates in two steps (see Algorithm 4). First, it calculates the similarity between words. Then, it uses the results to calculate the similarity between sets of words.

Step 1: Calculating the semantic similarity between words. Given a word W ∈ WordNet, we noticed that both its hypernyms and its direct hyponyms can be used together to define it. Hence, we decided to utilize this information to determine how similar two words are. Given two words a, b ∈ WordNet, comparing a to b is equivalent to comparing {a, P_a, H_a} to {b, P_b, H_b}. Thus, the similarity calculator calculates the similarity between a and b (7), a and P_b (8), a and H_b (9), P_a and b (10), P_a and P_b (11), P_a and H_b (12), H_a and b (13), H_a and P_b (14), and H_a and H_b (15). P_a and P_b refer to the hypernyms of a and b, respectively. H_a and H_b refer to the direct hyponyms of a and b, respectively. Note that we consider only non-shared hypernyms, hence P_a ∩ P_b = ∅.
Where s_a refers to the sense of a and Sy_a refers to the synset (set of synonyms) of a.
We applied our measure (16) to the M&C benchmark dataset several times, each time with a different combination of SM_i, 1 ≤ i ≤ 9, and two parameters α and β. We then calculated, for each combination, the correlation coefficient between the reference results in M&C's experiment [24] and our similarity values. The selection of the most promising combination was based on the correlation r: we eliminated combinations with weak correlation (|r| < 0.5) and kept combinations with strong correlation (0.5 ≤ |r| ≤ 1).
The combination of SM_1, SM_5, and SM_9 is the one we decided to keep, because its correlation was the highest almost every time (in the range 0.88–1). This is due to the fact that, given that a and b are semantically similar, like-with-like relations (a with b (SM_1), hypernyms of a with hypernyms of b (SM_5), and hyponyms of a with hyponyms of b (SM_9)) are more likely to be similar than mixed relations (a with hypernyms of b (SM_2), a with hyponyms of b (SM_3), hypernyms of a with b (SM_4), hypernyms of a with hyponyms of b (SM_6), hyponyms of a with b (SM_7), and hyponyms of a with hypernyms of b (SM_8)). Thus, the similarity value between a and b is calculated as follows: words(a, b) = 1 if a and b are synonyms or one of them is a direct hyponym of the other.

Step 2: Calculating the semantic similarity between sets of words. The similarity calculator uses the similarity measure between words (16) to compute the similarity between sets of words. Given two entities e1 ∈ S1 and e2 ∈ S2, let set_e1 = {W1,1, W1,2, . . . , W1,card(set_e1)} and set_e2 = {W2,1, W2,2, . . . , W2,card(set_e2)} be their respective sets of words. The similarity calculator uses equation (17) to calculate the similarity between set_e1 and set_e2.
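The two steps above can be sketched as follows. Since equations (16) and (17) are not reproduced in full here, the combination of SM_1, SM_5, and SM_9 is approximated by a plain average, and the set-level score by a bidirectional best-match average; both are labeled assumptions, with toy dictionaries standing in for WordNet's synonym, hypernym, and hyponym relations.

```python
def words_sim(a, b, syn, hyper, hypo, base_sim):
    # Hedged sketch of equation (16): score 1 for synonyms or a direct
    # hypernym/hyponym pair; otherwise combine the like-with-like
    # comparisons SM1 (a, b), SM5 (hypernyms), SM9 (hyponyms). A plain
    # average stands in for the paper's alpha/beta weighting.
    if b in syn.get(a, set()) or b in hypo.get(a, set()) or a in hypo.get(b, set()):
        return 1.0
    sm1 = base_sim(a, b)
    sm5 = base_sim(hyper.get(a), hyper.get(b))  # base_sim must accept None
    sm9 = max((base_sim(x, y)
               for x in hypo.get(a, ())
               for y in hypo.get(b, ())), default=0.0)
    return (sm1 + sm5 + sm9) / 3

def sets_sim(set1, set2, word_sim):
    # Hedged sketch of equation (17): average best-match word similarity
    # in both directions between the two sets of words.
    def side(s, t):
        return sum(max(word_sim(u, v) for v in t) for u in s) / len(s)
    return (side(set1, set2) + side(set2, set1)) / 2
```

With exact string equality as the base word similarity, {conference, publication, title} versus {conference, paper, title} already scores 2/3 before synonyms are consulted, and reaches 1 once paper/publication are recognized as synonyms.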
Next, we define the matches based on the similarity values.

C. The Post-matching Module
We applied our similarity measure (17) to the semantically related sets of words from the TEL schemas. The results formed a set of similarity values, each representing the similarity between two sets. The selection of the threshold value was based on reference matches we defined manually in order to identify the range of similarity values generated for semantically similar sets. We noticed that most matching sets have a similarity value greater than or equal to 0.8. Hence, we set the threshold to 0.8, at or beyond which a pair of entities is matched.
The post-matching module consists mainly of one major component, the matches generator, which uses the threshold value to eliminate entity pairs with low similarity values and match only pairs with high similarity values (≥ 0.8). Algorithm 5 summarizes this.
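The matches generator reduces to a threshold filter; the threshold-selection helper below is a hypothetical reading of the manual calibration described above (taking the smallest similarity observed among verified matches), not the paper's exact procedure.

```python
def select_threshold(scores, reference_matches):
    # Hypothetical calibration: the smallest similarity among manually
    # verified matches (the paper settled on 0.8 this way on TEL data)
    return min(scores[p] for p in reference_matches)

def matches_generator(scores, threshold=0.8):
    # Algorithm 5 sketch: keep entity pairs at or above the threshold
    return {pair for pair, v in scores.items() if v >= threshold}
```

For example, given similarity scores of 0.92, 0.85, and 0.4 for three entity pairs, only the first two survive the 0.8 cutoff.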

A. Experimental Setup
Datasets: First, we evaluated our measure on the M&C dataset [24], which contains thirty word pairs (see Table IV). We then evaluated xMatcher on the Conference Track used in OAEI 2018 and available on the Web. The Conference Track involves 16 ontologies describing the domain of organizing academic conferences and has been used by the research community for over 13 years. It has 21 reference alignments composed from 7 out of the 16 real domain ontologies.
Implementation: In addition to our measure, we implemented four measures and distances (Resnik, J&C, Lin, and Nababteh) over WordNet. Then, we implemented xMatcher. Finally, since xMatcher was initially developed to take XML schemas as input and the Conference Track includes ontologies, we implemented the conversion process presented in [30] to transform the ontologies into XML schemas.
Measures: For the semantic similarity values (produced by all five measures), we used the correlation coefficient and the Mean Square Error (MSE) to compare the returned results with the reference results [24]. The correlation coefficient measures how strong the relationship is between the returned values and the reference results. The MSE measures the average of the squared errors between the returned values and the reference results; the lower the MSE, the better.
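Both evaluation metrics are standard and can be computed as follows (Pearson's correlation coefficient is assumed, as is usual for similarity-measure benchmarks on the M&C dataset):

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length series
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mse(xs, ys):
    # Mean Square Error: average squared gap to the reference values
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

A perfectly linear relationship between returned and reference values yields a correlation of 1, while an MSE of 0 means the returned values match the references exactly.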
While xMatcher matches both classes and properties, Lily and ALIN match only classes, which is why they failed to produce high-accuracy matches with ra1-M2, ra2-M2, and rar2-M2. SANOM, AML, LogMap, and XMap match some but not all properties, which explains their negative Overall with ra1-M2, ra2-M2, and rar2-M2. KEPLER, DOME, Holontology, FCAMapX, LogMapLt, and ALOD2Vec match very few properties, which justifies their negative Overall and low Precision, Recall, and F-Measure with ra1-M2, ra2-M2, and rar2-M2. We can conclude that (1) SANOM, AML, LogMap, XMap, KEPLER, ALIN, DOME, Holontology, FCAMapX, LogMapLt, ALOD2Vec, and Lily work well with the reference alignments that consider classes or both classes and properties, but fail to match correctly with the reference alignments that consider only properties; and (2) xMatcher achieves superior-accuracy matches regardless of the reference alignment it is compared to.
Overall, xMatcher obtained the highest-accuracy matches (see Fig. 4.j, which displays the average matching accuracy): Precision = 0.89 suggests that most matches are correct. To assess the scalability of xMatcher (note that due to space limitations, we do not display these results in figures), we applied xMatcher to more datasets, for instance the TEL (Travel, Entertainment and Living) datasets, which contain five different datasets that are publicly available on the Web. The Travel group includes two domains, Car Rentals and Airfare; the Entertainment group contains two domains as well, Movies and Books; and the Living group involves mainly one single domain, Jobs. The results show once again the capability of xMatcher to reach a high matching accuracy, which demonstrates that xMatcher is scalable.

VI. CONCLUSION
We have demonstrated that the use of WordNet combined with our semantic similarity measure is an effective way to capture semantic correspondences in XML schemas. Current matching systems are error-prone and human-dependent. Thus, we have developed xMatcher, an approach to automatically match XML schemas and provide accurate matches.
Given two XML schemas S1 and S2, our main idea is to first generate sets of words from S1 and S2, then determine semantically related sets, and finally identify semantic correspondences between related sets. We evaluated xMatcher over the Conference Track. The results show that xMatcher achieves better accuracy than twelve state-of-the-art matching systems. Future research includes the following:
• Improving the accuracy of the matches. An interesting direction is to achieve better correlation, MSE, Precision, Recall, Overall, and F-Measure.
• Considering other matching quality factors. In this paper, we focused on achieving high matching accuracy. A future direction is to propose techniques that consider other quality factors.
• Matching other data representations. xMatcher takes XML schemas as input. An interesting direction is to match different data representations.