An Ontology Driven ESCO LOD Quality Enhancement

The labor market is a complex system that is difficult to manage. To overcome this challenge, the European Union has launched the ESCO project, a common language that aims to describe this labor market. In order to support the spread of this project, its dataset was published as linked open data (LOD). For LOD to be usable and reusable, a set of conditions has to be met. First, LOD must be feasible and of high quality. In addition, it must provide the user with the right answers, and it has to be built according to a clear and correct structure. This study investigates the LOD of ESCO, focusing on data quality and data structure. Data quality is evaluated by applying a set of SPARQL queries. The study then proposes solutions to improve this quality via a set of rules written in first order logic. This process was conducted based on a newly proposed ESCO ontology.
Keywords: ESCO; linked open data; ontology; semantic web; data quality; SPARQL; OWL; metadata


I. INTRODUCTION
Labor market governance is one of Europe's top priorities. It is an important challenge because the job market is a complex network involving many diverse actors. Therefore, the European Commission has proposed the multilingual European Skills, Competences, Qualifications and Occupations classification (ESCO, https://ec.europa.eu/esco/portal) as a standard language of work. To enhance its use and reuse, ESCO has published its dataset as Linked Open Data (LOD). LOD makes it possible to provide intelligent services such as entity search and personalized recommendation [1] [2]. Furthermore, the ability to add a language tag to the different labels [3] that belong to one Uniform Resource Identifier (URI) enables the use of this system in different countries.
The financial crisis of 2007-2008 increased the rate of unemployment in Europe, especially in Spain, where youth unemployment exceeded 50 percent [4]. At the same time, in some economic sectors such as engineering and healthcare, companies were not able to find the workforce they needed [5]. The EU seeks to reduce this problem by achieving two objectives: 1) helping jobseekers find a suitable job in another European country, and 2) enabling people to refocus their careers with a future outlook [6]. On this basis, ESCO was created so that, for example, someone who studied in Germany and lived in Greece can work in Italy, relying on linked open data that achieves semantic interoperability throughout Europe.
Nevertheless, data diffusion is not the only priority for a good knowledge system on the labor market; data quality also has to be assessed. Data quality has always been a focus of researchers' attention because of the many challenges it faces [7] [8]. Several methodologies have been developed to enhance as well as to assess data quality [9]. For these reasons, any LOD dataset has to address these aspects before being published.
In order to solve these issues, this study seeks to make the ESCO LOD more structured and more accurate in providing search results. Section II addresses the concept of data quality, data quality dimensions and the related methods of evaluation. Section III explains the ESCO structure in detail. Section IV provides a proposal to redesign the structure of the ESCO ontology. Section V evaluates the new ontology.

II. LINKED OPEN DATA AND DATA QUALITY
LOD has been considered the cornerstone of the semantic web vision and the window through which data is published on the web. Nowadays, millions of LOD datasets of varying quality are published on the web [10]. Data quality is defined as the ability to use and reuse data in a particular application or use case [11]. Data with quality problems might still be useful in some cases, as long as the quality is within the required range [12]. Nevertheless, assessing data quality faces many challenges. First, as explained in [13], the data is published by different providers, so a question of data confidence might be raised. Second, data increases rapidly, making its quality difficult to assess. Third, the level of data quality is determined from the point of view of the system provider. In fact, when LOD is reused for a purpose different from the initial intention of the provider, difficulties are encountered because the data quality required for the new objective may not be met.
Data quality has multiple dimensions [14], which range from accessibility through comprehension to completeness. The quality dimensions pose certain challenges [15], such as: a) the quality of information depends solely on the data provider; b) the rapid increase in the amount of data makes its quality more difficult to assess; c) linked open data has to be prepared so that it can be reused by third parties in ways not expected by the provider; d) linked open data is a dynamic environment, which requires up-to-date changes to reflect the real world.
Although data quality cannot be assessed with an absolute measurement, it can be considered a useful indicator of an LOD dataset's fitness for reuse.
www.ijacsa.thesai.org
Multiple methodologies have been developed to improve the quality of linked open data, such as using statistical distributions to increase the quality of incomplete and noisy Linked Data sets [16]. A method has also been proposed to demonstrate the understandability problems of Resource Description Framework (RDF) data by using the different technologies provided by the semantic web.
The assessment of LOD quality can be divided into three categories: automated [17], semiautomatic [18], and manual [19]. This article adopts the methodology used in "Test-driven Evaluation of Linked Data Quality" [20] to assess the quality of LOD. The method defines query-based test cases implemented with SPARQL (the query language for RDF) query templates.
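As a rough illustration of the test-driven idea, the sketch below expresses one quality test case as a pattern over triples and reports the resources that violate it. It is written in plain Python over an in-memory list of triples rather than as a SPARQL template against an endpoint. The sample data, the `ex:` identifiers and the helper `resources_missing_property` are assumptions made for this example; they are not taken from [20] or from the ESCO dataset.

```python
# Toy triple store: each triple is (subject, predicate, object).
# All identifiers below are made up for illustration.
TRIPLES = [
    ("ex:skill1", "rdf:type", "esco:Skill"),
    ("ex:skill1", "skos:prefLabel", "operate machinery"),
    ("ex:skill2", "rdf:type", "esco:Skill"),  # no prefLabel: should be flagged
    ("ex:occ1", "rdf:type", "esco:Occupation"),
    ("ex:occ1", "skos:prefLabel", "footwear designer"),
]


def resources_missing_property(triples, class_iri, property_iri):
    """Test-case template: every instance of class_iri must have property_iri.

    Returns the sorted list of violating resources (empty list = test passes).
    """
    instances = {s for s, p, o in triples if p == "rdf:type" and o == class_iri}
    having = {s for s, p, o in triples if p == property_iri}
    return sorted(instances - having)


violations = resources_missing_property(TRIPLES, "esco:Skill", "skos:prefLabel")
print(violations)
```

The same template can be instantiated with other class and property pairs, which is what makes a pattern-based test suite reusable across datasets.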
This article focuses on the case of the LOD of the European Skills, Competences, Qualifications and Occupations (ESCO) project.

III. ESCO LOD ASSESSMENT
ESCO has published its ontology and its LOD. Fig. 1 depicts the ESCO ontology, while Fig. 2 exhibits the class structure of the ESCO LOD as represented by the Stardog server.
LOD is a new opportunity for sharing and reusing data; meanwhile, the ontology forms the main joint of this LOD that weaves the data together [21]. However, comparing these two structures, the ESCO ontology and the ESCO LOD, raises some questions. In order to assess the capability of the current ESCO ontology to be exploited to retrieve valuable information from the related LOD, we identified a number of assessment queries following [20].

A. Resource Description Framework Schema and Web Ontology Language Metadata in ESCO LOD are Missing
It can be argued that the concepts of class, subclass, data property, object property, and individual lack a clear definition. Binding two resources to indicate that the first resource is a sub-concept of the other depends on the two Simple Knowledge Organization System (SKOS) properties "broader" and "narrower". Meanwhile, one or both of these resources can be part of a classification. However, this approach does not differentiate between a concept that represents a certain level of classification and the individuals contained in this level. Retrieving all the skills that are not connected with an occupation should be a straightforward task; however, the query returns not only the skills which are not connected with an occupation, but also all the resources which represent the hierarchical structure of the skill concept.
The benefit of using this metadata is that it facilitates the reuse [22] and supports reasoning in all profiles of Web Ontology Language (OWL). Moreover, since query answering is reduced to OWL-QL query answering, this allows queries to be run over large ontologies [23].

B. Label and Description
The label properties altLabel, hiddenLabel and prefLabel are used to provide a label to a resource. Each property exists in two namespaces: the first is SKOS, which links the resource to a literal object; the second is the SKOS eXtension for Labels (SKOS-XL), which links the resource with one or more resources of type SKOS-XL:Label, each of which in turn has a "literalForm" feature with the same role as the corresponding SKOS property. If the resource is linked with more than one SKOS-XL:Label resource, each one holds the label written in a specific language. Additionally, definition and description are properties that provide a description to a resource, where the definition property is used only 54 times, always concurrently with the description property. Each resource is linked with one or more resources, each of which in turn has a "nodeLiteral" property containing a literal object that includes the description, together with a "language" property that indicates the language used to write the description. If the resource is linked with only one such resource, the description is written in English. However, if the resource is linked with more than one, each one holds the description written in a specific language. Consequently, the ESCO dataset includes duplicate information. Therefore, data exploration becomes more difficult and the required storage space increases.
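To make the duplication concrete, the following sketch models one label stored both ways and then drops the SKOS-XL variant. It is only an illustration under stated assumptions: the `ex:` resources, the prefixed names used as plain strings, and the helper `drop_skosxl_labels` are hypothetical, and the sketch assumes that every SKOS-XL label has an equivalent plain SKOS literal, as described above.

```python
# One label expressed twice: directly via SKOS, and indirectly via SKOS-XL.
# Literals are modelled as (text, language) tuples; identifiers are made up.
TRIPLES = [
    ("ex:occ1", "skos:prefLabel", ("footwear designer", "en")),
    ("ex:occ1", "skosxl:prefLabel", "ex:label1"),
    ("ex:label1", "rdf:type", "skosxl:Label"),
    ("ex:label1", "skosxl:literalForm", ("footwear designer", "en")),
]


def drop_skosxl_labels(triples):
    """Remove every triple that mentions a skosxl:Label node.

    Assumes the same literal is already available through the SKOS property.
    """
    label_nodes = {
        s for s, p, o in triples if p == "rdf:type" and o == "skosxl:Label"
    }
    return [
        (s, p, o)
        for s, p, o in triples
        if s not in label_nodes and o not in label_nodes
    ]


kept = drop_skosxl_labels(TRIPLES)
removed = len(TRIPLES) - len(kept)
print(kept, removed)
```

In this toy case three of the four triples carry no additional information, which is the kind of redundancy the paragraph above refers to.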

C. The Relationship between Skill and Occupation
The relationship between skills and occupations has been built with only two predicates, "relatedEssentialSkill" and "relatedOptionalSkill". At the same time, the skills in the ESCO dataset are divided into two types, "skill" and "knowledge", by a triple that has the skill as subject, the skill type as predicate and the type of the skill as object, where each skill belongs to only one type. The SPARQL query that returns the skills and the knowledge of an occupation is very complicated and is written in the following format:
[SPARQL query listing missing in the source]
This complexity in query formulation, a consequence of triple diversity, causes slow execution of the SPARQL query [24]. The principal impediment a user faces when trying to apply a query is that they mostly have no information about the underlying structure of the LOD.
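The join that such a query has to perform can be sketched in plain Python over an in-memory list of triples. This is not the authors' query: the `ex:` resources, the predicate name `esco:skillType`, the type values `ex:skill` and `ex:knowledge`, and the helper `group_skills` are assumptions introduced for illustration only.

```python
# Hypothetical triples mimicking the structure described above.
TRIPLES = [
    ("ex:occ1", "esco:relatedEssentialSkill", "ex:s1"),
    ("ex:occ1", "esco:relatedEssentialSkill", "ex:k1"),
    ("ex:occ1", "esco:relatedOptionalSkill", "ex:s2"),
    ("ex:s1", "esco:skillType", "ex:skill"),
    ("ex:k1", "esco:skillType", "ex:knowledge"),
    ("ex:s2", "esco:skillType", "ex:skill"),
]

RELATIONS = {
    "esco:relatedEssentialSkill": "essential",
    "esco:relatedOptionalSkill": "optional",
}


def group_skills(triples, occupation):
    """Join each occupation-skill link with the skill's type triple."""
    # Step 1: index the type of every skill resource.
    type_of = {s: o for s, p, o in triples if p == "esco:skillType"}
    # Step 2: walk the occupation's links and look up the type of each target.
    groups = {}
    for s, p, o in triples:
        if s == occupation and p in RELATIONS:
            groups.setdefault((RELATIONS[p], type_of[o]), []).append(o)
    return groups


result = group_skills(TRIPLES, "ex:occ1")
print(result)
```

Every answer requires combining two kinds of triples (the link and the type), so a user has to know both patterns before being able to ask the question at all.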

D. Skill and Occupation Structure
The structure of skills and occupations has been discovered within the linked open data of ESCO by applying some queries and by using the information represented in the class esco:Structure.
The occupation structure consists of six levels: the first four levels are based on the International Standard Classification of Occupations (ISCO), and the last two levels can be considered as instances of the fourth level. The relation between levels is managed by predicates such as skos:broader, skos:broaderTransitive and skos:narrower. The resources of the ESCO classification are of type skos:Concept. However, the occupation resources are of type skos:Concept, MemberConcept and Occupation.
On the other hand, the skills structure is not based on a standard classification and is not tied to a consistent classification, since the classification branches have different lengths. The first two levels of the classification can be considered as classes and the remaining levels can be considered as instances. The relation between one level and another is managed by predicates such as skos:broader, skos:broaderTransitive and skos:narrower.
All in all, this structure only complicates the data, making it difficult for the user to understand and manipulate.

E. The type of Concept, ConceptScheme and MemberConcept
OWL ontologies and LOD are increasing; thus, the need to give more accurate descriptions of their resources is becoming more pressing [25]. When a general class contains resources that belong only to this class together with resources that also belong to other classes, it becomes difficult for the user to discover their roles and their relations within the linked open data. For example, each resource that represents a skill is of type Skill, Concept and MemberConcept, whereas each resource that represents a skill reuse level is of type Concept only. SKOS classes can be considered as representatives that establish an "indirection role" between lexical entities and the "real world", but not as representatives of the "real world" itself [26].

IV. THE PROPOSED ONTOLOGICAL MODEL TO RECONSTRUCT THE ESCO LOD
Nowadays, information and systems are growing more rapidly and becoming more complex. As a result, a method is needed to improve information and systems with shorter lead times and at lower cost [27]. For semantic data, this method is represented by rules that define new concepts, relations and metadata, which provide a real definition of each resource in the LOD [28] [29] [30]. All the rules are included in the appendix "first order logic rules". Fig. 3 represents the proposed ontology for ESCO. This model was built by implementing a set of rules written in first order logic. Each set of these rules has a specific task in building the model, as follows.

A. Classification Building
The model consists of two classifications: one represents the occupation and the other represents the skill. In terms of occupation, the structure is divided into two parts: the first part displays the hierarchical structure represented by rules from 1 to 8, and the second part shows individuals represented by rules from 9 to 16. In terms of skill, the structure is divided into two parts: the first part presents the hierarchical structure represented by rules from 17 to 24, and the second part presents individuals represented by rules from 25 to 44.

B. Give Entities to Different Resources in LOD
The proposed model encompasses classes that did not exist in the ESCO ontology, in order to express the nature and the entity of some of the resources that were placed under general classes. Previously, these resources could only be identified through their relations. Rules 45 to 54 represent the process of creating new classes and adding individuals to each one.

C. Create the Object Properties of the Proposed Ontological Model
The proposed ontological model contains new object properties that represent the relations amongst the new classes. It also contains new relations that describe the relations amongst the existing classes in the ESCO ontology in a more accurate manner. Rules 55 to 82 describe the process of establishing these object properties.
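The rules themselves are given in the appendix and are not reproduced here. As a hedged sketch of the general shape such a rule could take, consider an implication of the form relatedEssentialSkill(o, s) AND skillType(s, knowledge) IMPLIES hasEssentialKnowledge(o, s). The Python below applies four rules of this shape to a small set of triples. The `new:` property names, the `esco:skillType` predicate, the `ex:` resources and the helper `derive_object_properties` are hypothetical and are not the names used in the proposed ontology.

```python
# (existing predicate, type of the target) -> derived object property.
# Every name in this table is an assumption made for illustration.
NEW_PROPERTY = {
    ("esco:relatedEssentialSkill", "ex:skill"): "new:hasEssentialSkill",
    ("esco:relatedEssentialSkill", "ex:knowledge"): "new:hasEssentialKnowledge",
    ("esco:relatedOptionalSkill", "ex:skill"): "new:hasOptionalSkill",
    ("esco:relatedOptionalSkill", "ex:knowledge"): "new:hasOptionalKnowledge",
}

TRIPLES = [
    ("ex:occ1", "esco:relatedEssentialSkill", "ex:k1"),
    ("ex:occ1", "esco:relatedOptionalSkill", "ex:s1"),
    ("ex:k1", "esco:skillType", "ex:knowledge"),
    ("ex:s1", "esco:skillType", "ex:skill"),
]


def derive_object_properties(triples):
    """Apply each rule once and return the newly derived triples."""
    type_of = {s: o for s, p, o in triples if p == "esco:skillType"}
    derived = []
    for s, p, o in triples:
        # A rule fires only when both the link and the target's type match.
        new_p = NEW_PROPERTY.get((p, type_of.get(o)))
        if new_p is not None:
            derived.append((s, new_p, o))
    return derived


derived = derive_object_properties(TRIPLES)
print(derived)
```

Materializing the derived triples once moves the cost of the join from query time to dataset construction time.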

D. Stay away from Duplicate Data that Achieve the Same Goal
As noted in Section III-B, the ESCO LOD uses both the SKOS vocabulary and the SKOS-XL vocabulary. The SKOS-XL vocabulary is used when more information needs to be added to a label or a description [31]. However, the ESCO LOD does not add any such metadata. For this reason, the SKOS-XL vocabulary has been excluded and only the SKOS vocabulary is used.

V. EVALUATION OF THE PROPOSED ESCO ONTOLOGY
The evaluation of the proposed ontology is based on three criteria:
1) The ability to know the contents of the dataset and the mechanism of linking these contents through the ontological schema.
Through the ontological schema we can understand the following: the individuals of class Skill have two different natures, so an individual can be either a skill or knowledge. To be able to perform an occupation, one needs to have some essential skills and knowledge and some optional skills and knowledge. Likewise, to acquire a skill or knowledge, one needs to have some essential skills and knowledge and some optional skills and knowledge.
2) Preventing information duplication and reducing dataset size.
As seen before, the ESCO LOD uses two ways to add labels to a resource, whereas the proposed ontology uses only the direct way. Accordingly, it prevents duplicate information and reduces the dataset size by more than three and a half million triples.
3) Easy retrieval of data through SPARQL queries.
The proposed ontology includes four object properties to connect an occupation or a skill with its essential or optional skills and knowledge. Consequently, it is easy to write a SPARQL query to know which skills or knowledge are essential and which ones are optional to perform an occupation or to obtain a new skill or knowledge.

A. Getting All Skills Which Are Not Connected with an Occupation
A SPARQL query that answers this question based on the new ESCO ontology is as follows:
[SPARQL query listing missing in the source]

B. Acquiring All Skills and Knowledge of an Occupation
When we add two other properties to represent the relation between skills and occupations in the new ESCO ontology, the query becomes clearer and simpler. For instance, the query can get all the skills and knowledge for the occupations "footwear production machine operator" and "footwear designer".
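The effect of dedicated properties on retrieval can be sketched as follows: once each relation has its own predicate, answering the question reduces to a single triple pattern per property, with no join on the skill type. The sketch is plain Python over made-up data; the `new:` property names, the `ex:` identifiers and the helper `objects` are hypothetical, and the listed skills are placeholders rather than actual ESCO content.

```python
# Hypothetical triples using dedicated object properties.
TRIPLES = [
    ("ex:footwearDesigner", "new:hasEssentialSkill", "ex:s1"),
    ("ex:footwearDesigner", "new:hasEssentialKnowledge", "ex:k1"),
    ("ex:footwearMachineOperator", "new:hasEssentialSkill", "ex:s2"),
    ("ex:footwearMachineOperator", "new:hasOptionalKnowledge", "ex:k1"),
]

PROPERTIES = [
    "new:hasEssentialSkill",
    "new:hasEssentialKnowledge",
    "new:hasOptionalSkill",
    "new:hasOptionalKnowledge",
]


def objects(triples, subject, predicate):
    """Single triple pattern: all objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


for occupation in ("ex:footwearDesigner", "ex:footwearMachineOperator"):
    for prop in PROPERTIES:
        print(occupation, prop, objects(TRIPLES, occupation, prop))
```

Compared with joining link triples against type triples, the user only needs to know the property names exposed by the ontology.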

VI. CONCLUSION
ESCO is one of the most important European projects aimed at modeling the labor market. Its LOD is one of the most qualified LOD datasets for reuse. Thus, it has to be as clear and as easy to use as possible. In the proposed ontological model, this study relied on a set of conditions to maintain clarity, such as:
1) Non-repetition of data.
2) Using OWL and RDFS to build classifications and to identify each resource and whether it represents a class, individual, object property or data property.
3) Determining the dependency of each resource on a specific class that illustrates the nature of this resource.
One of the more significant findings to emerge from this study is that the proposed ontological model could be a pillar of a new version of the ESCO LOD in the coming years since the European Union will adopt this data at the level of all member states. The current study makes several noteworthy contributions to improve the outputs of studies that aim to use ESCO LOD as a tool for search and job matching, career management, and labor market analysis.
The methods used for this study to improve the data quality and data structure of ESCO LOD may be applied to other datasets published as LOD elsewhere in the world.
The following conclusions can be drawn from the present study. The ESCO LOD could be not only one of the most important sources of information for building job applications, but also a basis for a recommendation system for building an effective training system in all member states. This is expected to yield several benefits arising from the advantages of hierarchical structure for classifications of some classes within the data. Another benefit will result from the advantages of horizontal structure arising from relationships between the classes, as well as qualifications issued by private awarding bodies.