Towards the Design of a Textile Chemical Ontology

Abstract. The main goal of this paper is to present the initial version of a Textile Chemical Ontology, to be used by textile professionals to conceptualise and represent the harmful chemical substances that are banned in this domain. After analysing different methodologies and determining that “Methontology” is the most appropriate for our purposes, this methodology is explored and applied to the domain. In this manner, an initial set of concepts is defined, together with their hierarchy and the relationships between them. This paper shows the benefits of using the ontology through a real use case in the context of Information Retrieval. The potential shown by the proposed ontology in this preliminary evaluation encourages extending it with a larger number of concepts and relationships, and validating it within other Natural Language Processing applications.


I. INTRODUCTION
The information available on the Web is increasing at a fast rate. This can be viewed from two perspectives: it is positive, since users can be better and better informed, but it is also negative, since the information can be overwhelming and users cannot manage it effectively and efficiently. This problem is exacerbated when dealing with information from a specific domain.
Specifically, this affects the professionals of the Textile Chemical domain, who constantly need to be up to date with all the directives and legislation about harmful substances applied to their domain. As shown in [9], general-purpose tools, such as the Google or Yahoo search engines, are not sufficient for the type of information they need to deal with. For this reason, specialised tools and resources capable of facilitating the processing and understanding of this type of information would be crucial. In this sense, Natural Language Processing (NLP) can provide tools and resources to retrieve, extract, classify, and summarise the information of interest in a specific domain. Moreover, when focusing on very specialised domains (e.g., medicine or chemistry), specific knowledge is also needed.
In order to represent semantic knowledge, ontologies can be developed. In [1], the term ontology is described as the basic concepts and relations comprising the vocabulary of a topic area, as well as the rules for combining concepts and relations to define extensions to the vocabulary. An ontology can also be defined as "a formal and explicit specification of shared conceptualization" [12].
The difference between ontologies and taxonomies is that taxonomies consist strictly of subclasses linked by the "is-a" relationship [22], whereas ontologies can also express other types of relations. Moreover, ontologies are a powerful semantic knowledge representation that has been used in a wide range of contexts. In particular, the use of ontologies has proven successful in several domains, such as health [20], pharmacology [22], tourism [21], agriculture [23], e-business [24], education [25], and chemistry [9,26], among others.
In the Textile Chemical domain, we want to represent the knowledge about banned and harmful substances. For this, it is necessary to develop an ontology containing these substances, because textiles are subject to strict controls under the current directives, which are constantly being updated. Additionally, to ensure the safety of the final consumer, when a laboratory detects any banned chemical substance, the product is removed from the market.
Therefore, the aim of this paper is to present the initial version of an ontology in the Textile Chemical domain. More specifically, the goal will be achieved by first gathering the expertise and knowledge of the professionals in the Textile Chemical domain, in order to determine the relevant concepts and relations for building the Textile Chemical Ontology. Then, with this knowledge, the ontology will be developed, further populated, and finally evaluated.
The validation of the initial version of the ontology will be carried out in the context of information retrieval, through a set of preliminary experiments with some relevant concepts and instances (without using the whole ontology). Searches with a general-purpose search engine, with and without the ontology concepts and relations, will be compared to determine whether using the ontology retrieves more specific results.
This paper is organised as follows. In Section II, the related work is briefly described. Section III describes the design of the Textile Chemical Ontology. Section IV presents the use case where the ontology will be validated in a preliminary manner. Finally, Section V concludes the paper and outlines the future work for developing the ontology completely.

II. RELATED WORK
This section focuses on existing ontologies that could be related to some extent to the one proposed in this paper. The analysis of the different methodologies employed can be found in Section III.A.
www.ijacsa.thesai.org
To the best of our knowledge, there is currently no ontology for the Textile Chemical domain. Only the SEAMLESS project, with TEX GLOB as a textile ontology [7], has been found. The TEX GLOB ontology consists of three parts: the TEXTILE TAXONOMY, the TEXTILE VOCABULARY, and the TEXTILE DATA MODEL. The TEXTILE VOCABULARY, also called the TEX GLOB vocabulary, is a tool where the relevant textile terms can be found together with their explanation, and it indicates whether a term belongs to the taxonomy or to the data model.
The terms and definitions were extracted from the CORE ontology, and one part is used for the TEX GLOB data model [27]. Like the CORE ontology, the TEX GLOB data model is split into several parts that represent the vocabulary and taxonomy concepts, respectively, as well as the main data types generated by the companies, namely company profiles, products/services, and business documents. One part is taken unchanged from the CORE ontology, whereas other parts are defined by importing the homologous CORE parts and then extending them with additional attributes and classes.
Other interesting ontologies are the chemical ontologies, among which CO [5] and ChEBI can be found. On the one hand, CO is a chemical ontology for the identification of functional groups and the semantic comparison of small molecules. It is a small ontology based on the assignment of functional groups through a computerised tool called Checkmol. One advantage worth noting is that the terms are assigned automatically by the program and connected with a chemical structure and a definition.
On the other hand, ChEBI contains a set of molecular entities focused on small chemical compounds. The concepts included in ChEBI represent natural and synthetic products that could be found in the internal processes of organisms, together with the types of entities. It also includes an ontological classification with the relationships between molecular entities and their parents and/or children [4], as well as the relations with other ontologies [2].
Interesting databases related to the Textile Chemical domain are ChemTop and Chem-BLAST. With ChemTop, it is possible to query the chemical and physical properties of chemical species. Much of the data is collected from the NIST Website. In this database, all the information is structured from the perspective of the chemical compounds.
In contrast, Chem-BLAST is an incomplete database of compounds where the user can find structures in 2-D. This could be of interest only in very specific Chemistry contexts.
As shown above, there is no Textile Chemical Ontology (nor a specific database) available for the professionals working in this domain, so they have to [...]. For this reason, a new Textile Chemical Ontology is proposed. Its aim is to model the knowledge of harmful and banned substances in the Textile Chemical domain. Not only will the Textile Chemical Ontology be novel, since it will focus on harmful substances and components that could be used in textiles, but it will also add value for the research community by representing all the related knowledge in an ontology that can be employed or integrated in NLP tasks.

III. DESIGN OF THE TEXTILE CHEMICAL ONTOLOGY
The aim of this section is to analyse and justify the methodology used for building the Textile Chemical Ontology, as well as to describe the concepts and the types of relationships our ontology covers in its initial version. Therefore, an analysis of the existing methodologies is first provided (Section III.A), then the methodology employed for the development of our Textile Chemical Ontology is explained (Section III.B), and finally the extent to which the information in the existing ontologies described in Section II can be reused, adapted, or extended is discussed (Section III.C).

A. Analysis of Existing Methodologies
In [3], different methodologies for building ontologies are described. Some of the most popular ones include:
- CyC [14]
- Uschold and King [18]
- Grüninger and Fox [13]
- Kactus [15]
- Methontology [19]
- Sensus [17]
- On-To-Knowledge [16]
The choice of one methodology or another will depend on different factors concerning the development of the ontology. Among these factors are knowledge acquisition, the verification and validation of the process, and the documentation of the whole process.
In the ontology development process, other issues to consider are the requirements, design, implementation, and maintenance, all of them related to the part of the world to be represented.
In [3], an extensive analysis of the different methodologies for designing and developing ontologies is carried out. Based on this analysis and our findings, the methodologies of CyC, Uschold and King, and Methontology were shortlisted, because they are the most complete of all the analysed methodologies. In these methodologies, the development process and the use of the ontology are independent, and therefore they could be the most appropriate for designing our Textile Chemical Ontology.
Next, the stages involved in each of the shortlisted methodologies are described in more detail, in order to see the common points and differences between them.
The stages involved in the CyC methodology are:
- Extract the necessary knowledge from all the available and relevant information sources for the represented domain.
- Acquire new knowledge using NLP tools.
- Develop and represent the ontology.
The method proposed by Uschold and King consists of four phases. As described above, the stages of each methodology show the process of building an ontology according to each model.
The method proposed in Methontology is more complete than the others, in the sense that it defines a life cycle for the ontology in which the developers can run tests until everything is finished and the ontology can be used. That is the main reason why this methodology was finally chosen for building the Textile Chemical Ontology.

B. Methodology Chosen: Methontology
From the analysis carried out in the previous section, "Methontology" was determined to be the most suitable methodology to start developing the ontology. This decision was also motivated by how Methontology is structured: its stages are very clear and better organised than those of other methodologies for building ontologies. Moreover, the flexibility of this methodology allows us to adapt the construction of the Textile Chemical Ontology to our needs.
Since we are interested in representing knowledge about harmful or banned chemicals applied to textiles in any form, the stages that this methodology defines need to be followed. Methontology has been chosen by different authors (e.g., [8,6,11]) for building their ontologies, and we will follow the same guidelines for building ours. Next, the stages for building our ontology are described:
- Specification. This stage identifies the purpose of the ontology, its domain of use and users, the degree of formality required, and the scope of the ontology, including the terms to be represented.
  In our particular case, the purpose of creating this ontology is to help researchers and professionals working in the Textile Chemical domain when looking for information concerning new legislation that could affect textiles. Currently, they need supporting tools to find specific information very quickly, which unfortunately are not available yet.
- Knowledge acquisition. This stage is carried out in parallel with the previous one. Any type of knowledge source and any method can be used to build the ontology, although expert interviews and the analysis of texts are particularly valuable. The knowledge required for developing the ontology will come from a professional of the Textile Chemical domain and from the information sources described in Section III.C.
- Conceptualisation. In this stage, the concepts, relations, and properties are identified and then represented using a suitable informal representation. Once the knowledge is conceptualised, the ontology can be populated with the corresponding instances. Further detail about this stage is provided in Sections III.C and III.D.
- Integration. In the event that more information is needed, the knowledge available in other ontologies can be reused to complete it. As described in Section II, several different types of ontologies related to either Textile or Chemistry can be found. They can be advantageous for this research in the sense that some information contained in existing ontologies could be reused or adapted. They can also be useful for extracting some concepts and making extensions when preparing the Textile Chemical Ontology. In this manner, we can take advantage of the taxonomy and vocabulary provided by the textile ontology (TEX GLOB).
  Moreover, we can also reuse some information about the structures included in the chemical ontology (ChEBI), since some of the concepts that we may need to include in our ontology could already be present there. The integration of other ontologies will increase the robustness of ours.
- Implementation. In this step, the ontology is represented in a formal language. In particular, for the proposed Textile Chemical Ontology, XML will be used, because it is a standard that can encode the knowledge in a form that can then be integrated into automatic processes. In addition, the ontology will be developed in English, because professionals working in the Textile Chemical domain normally use this language, although future extensions to other languages would also be possible.
- Evaluation. After creating the ontology, it is necessary to evaluate it by checking how complete or valid it is. At the moment, to verify the usefulness of the ontology, a use case in the context of information retrieval has been designed, which is explained in Section IV. This use case will serve to analyse the usefulness of the proposed Textile Chemical Ontology in a preliminary way.
- Documentation. In order to be reusable and understandable by the research community or the group of professionals working with it, the ontology must be documented using specific software for building ontologies, e.g., PROTÉGÉ. These documents will contain all the information about the design and development of the ontology.
All the previously mentioned steps, except integration, have already been completed. Thus, this constitutes the first cycle of the ontology. The ontology life cycle answers the previous questions by identifying the set of stages through which the ontology moves during its life [19]. By analogy, the ontology development process is to the ontology what a production chain is to the final product in a manufacturing domain. Later, we could add more information to the ontology if newer concepts or instances are obtained, so that a better and more complete ontology is created. This life cycle will be taken into account when building the Textile Chemical Ontology.
When all the steps are finished, the life cycle of the ontology is closed and the ontology is ready to be used.
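To make the Implementation stage more concrete, the following minimal sketch serialises a fragment of the concept hierarchy (taken from Section III.C) into XML with Python's standard library. The paper states only that XML will be used; the element names `ontology` and `concept` and the attributes `lang` and `name` are assumptions chosen for illustration, not the authors' actual schema.

```python
import xml.etree.ElementTree as ET

# Fragment of the hierarchy described in Section III.C:
# parent concept -> list of direct subclasses.
HIERARCHY = {
    "Chemical Substance": ["Heavy Metals", "Chemical Residues"],
    "Heavy Metals": [],
    "Chemical Residues": [],
}


def to_xml(hierarchy, root_name):
    """Serialise a concept hierarchy into nested <concept> elements.

    The tag and attribute names are hypothetical; the paper does not
    define the XML schema of the ontology.
    """
    def build(name):
        node = ET.Element("concept", {"name": name})
        for child in hierarchy.get(name, []):
            node.append(build(child))
        return node

    root = ET.Element("ontology", {"lang": "en"})
    root.append(build(root_name))
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml(HIERARCHY, "Chemical Substance")
print(xml_text)
```

Nesting subclasses inside their superclass keeps the is-a hierarchy explicit in the document structure, which is one simple way to make it consumable by automatic processes.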

C. Concepts and Properties Definition
At the moment, the initial version of our Textile Chemical Ontology is structured in concepts and relationships.
Concepts are the basic pieces of knowledge to be represented, and instances are more specific concepts represented in the ontology. Instances will be derived during ontology population, which is out of the scope of this research work. Relationships represent the interactions between concepts and shape the structure of the domain.
For extracting the concepts that will be part of the Textile Chemical Ontology, the knowledge sources shown in Table I are used. These Websites contain all the legislation from Europe and other countries applied to the textile domain, and therefore banned chemical compounds, such as Lead, also appear in them. In the case of the REACH Website, some chemical products from other domains, such as environment, food, plastics, and wood, can also be found, but only the chemical compounds applied to the Textile Chemical domain will be employed.
These information sources will be used to extract the necessary concepts for building our proposed ontology. However, to start with, a set of concepts was defined and later grouped into different levels. These initial concepts were chosen because all of them are already legislated and banned.
The proposed 3-level structure allows us to define properties common to all levels (in the superclass), as well as specific properties for the individual subclasses.
For instance, chemical substance is at the first level; heavy metals and chemical residues are subclasses of chemical substance (second level); and DMFu (dimethylfumarate) and SCCP (short chain chlorinated paraffins) are sub-subclasses of chemical substance (third level). Figure 1 shows a fragment of the concepts of the Textile Chemical Ontology, represented according to the three levels of hierarchy previously mentioned.
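The 3-level structure and its property inheritance can be sketched as follows. This is an illustrative model only: the `Concept` class, the `regulated` property, and the placement of DMFu under Chemical Residues are assumptions, since the text says only that DMFu sits at the third level.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A concept in the hierarchy; `parent` encodes the is-a link."""
    name: str
    parent: Optional["Concept"] = None
    properties: dict = field(default_factory=dict)

    def level(self) -> int:
        """Depth in the hierarchy: 1 for the root, 2 for its subclasses, and so on."""
        return 1 if self.parent is None else self.parent.level() + 1

    def all_properties(self) -> dict:
        """Own properties merged over those inherited from the superclasses."""
        inherited = self.parent.all_properties() if self.parent else {}
        return {**inherited, **self.properties}


# Level 1: the superclass carries a property shared by every concept
# ("regulated" is a hypothetical property name).
chemical_substance = Concept("Chemical Substance", properties={"regulated": True})

# Level 2: subclasses named in the text.
heavy_metals = Concept("Heavy Metals", parent=chemical_substance)
chemical_residues = Concept("Chemical Residues", parent=chemical_substance)

# Level 3: the parent chosen here is an assumption made for illustration.
dmfu = Concept(
    "DMFu",
    parent=chemical_residues,
    properties={"full_name": "dimethylfumarate"},
)

print(dmfu.level())
print(dmfu.all_properties())
```

Defining shared properties once on the superclass and resolving them by walking up the `parent` links mirrors the idea of common properties at the top level and specific ones in the subclasses.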

D. Relationships Definitions
After extracting the concepts from the different knowledge sources, the next step in the design process is to establish the relationships between them. Some relations that can be found in the domain are: is-a, part-of, contained-in, affects, related-to, and cause. In the current version of the ontology design, three types of relationships were identified: i) is-a; ii) cause; and iii) part-of.
Next, each of these types of relationship is illustrated with an example:
- LEAD is a Heavy Metal.
- Phthalates cause cancer.
- Chemical Residues part of Chemical Substance.
In the short term, we will extend the number of concepts and, therefore, identify more relations between them.
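The three example relationships above can be represented as subject-relation-object triples. The sketch below assumes this triple encoding and the helper names `build_index` and `objects_of`; the paper does not prescribe a storage format for relationships.

```python
from collections import defaultdict

# (subject, relation, object) triples taken from the three examples in the text.
TRIPLES = [
    ("Lead", "is-a", "Heavy Metal"),
    ("Phthalates", "cause", "Cancer"),
    ("Chemical Residues", "part-of", "Chemical Substance"),
]


def build_index(triples):
    """Index triples by (subject, relation) for direct lookup."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[(subject, relation)].append(obj)
    return index


def objects_of(index, subject, relation):
    """Return every object linked to `subject` through `relation`."""
    return index.get((subject, relation), [])


index = build_index(TRIPLES)
print(objects_of(index, "Lead", "is-a"))
print(objects_of(index, "Phthalates", "cause"))
```

Adding a new relation type, such as contained-in or affects, then only requires appending triples, without changing the lookup code.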

IV. USE CASE: INFORMATION RETRIEVAL
In this section, a use case where the Textile Chemical Ontology could add value to the final results is described. In particular, the scenario focuses on the information retrieval task. Firstly, it is explained how the ontology could be used, and then a preliminary comparison with and without the use of our ontology is conducted.
In [10], it was shown that general-purpose alert systems such as Google and Yahoo! Alerts were not suitable for searching highly specialised information. Several problems were encountered when using these systems, for instance, the problem of ambiguity. Concepts such as [...] or flame have different meanings, but Google and Yahoo! placed the documents referring to their most frequent word senses in the first positions, so very few results were provided about their meaning in the specific Textile Chemical domain. The results related to this domain, if retrieved at all, were always placed at the end of the list, which made them very difficult to find given the high number of results retrieved.
In light of these experiments, a preliminary evaluation is performed, where the ontology is used to expand the terms of the query in order to analyse to what extent the retrieved documents could be more accurate.This term expansion could not be done without the knowledge about the Chemical Textile domain represented in our ontology.
For these experiments, 10 concepts from different levels of the Textile Chemical Ontology were used. The general-purpose Google search engine was selected for performing the searches, and each of these concepts was first searched for individually (without exploiting the knowledge in our proposed ontology). Later, they were searched for again, this time using the "is-a" relationships of our ontology to expand the terms in the query, given an initial concept.
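The query expansion step can be sketched as follows. The paper does not state exactly how the expanded queries were built, so this is one plausible reading: the initial concept is combined with the concepts reached by following its "is-a" links upwards. The function names and the quoting of each term are assumptions.

```python
# is-a links (child -> parent) drawn from the hierarchy described in the text.
IS_A = {
    "lead": "heavy metals",
    "heavy metals": "chemical substance",
}


def ancestors(term, is_a):
    """Follow is-a links upwards and return the chain of parent concepts."""
    chain = []
    seen = {term}
    # The `seen` set guards against accidental cycles in the links.
    while term in is_a and is_a[term] not in seen:
        term = is_a[term]
        seen.add(term)
        chain.append(term)
    return chain


def expand_query(term, is_a):
    """Build a query string from the term plus its quoted is-a ancestors."""
    terms = [term] + ancestors(term.lower(), is_a)
    return " ".join(f'"{t}"' for t in terms)


print(expand_query("heavy metals", IS_A))
```

Under this reading, an ambiguous query such as "heavy metals" is sent together with "chemical substance", which steers a general-purpose engine away from unrelated senses of the term.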
Table II shows the results of the experiments.We show the differences between using and not using the Textile Chemical Ontology.In both cases, a general-purpose search engine is employed.
After analysing the data obtained, it can be observed that using the Textile Chemical Ontology for performing the search helps to reduce the number of retrieved results. Moreover, a detailed analysis of whether the results in the first 10 positions are related to the Textile Chemical domain showed that the ontology also helps to focus on more specialised documents. For this analysis, the first 10 documents returned by the general-purpose system were examined in more detail. In the case of phthalates, these documents are already of interest to us without using the ontology, but when the Textile Chemical Ontology is used, the number of documents retrieved is lower and they are more specific to our domain. In the case of other terms, such as Heavy Metals, the first 10 documents retrieved without the Textile Chemical Ontology refer to the musical genre, but when the ontology is used, more specific and specialised documents, and therefore more interesting for our purposes, are found.
When a search is performed with general-purpose search engines, the results obtained are general and are often unrelated to our domain. However, when the Textile Chemical Ontology is used for searching, the results obtained clearly belong to the specific domain. Therefore, the combination of NLP tasks, such as information retrieval, with ontologies is very appropriate, since it makes it possible to retrieve specific information for the Textile Chemical domain while also decreasing the time spent retrieving information of interest. This use case is just one example of a scenario where the ontology could be validated. Although the experiments conducted are very preliminary, the comparison reveals the potential of the ontology when applied to NLP tasks.

V. CONCLUSIONS AND FUTURE WORK
In this paper, the initial design of an ontology for the Textile Chemical domain is presented and discussed. After studying the related existing ontologies, no ontology specifically developed for the Textile Chemical domain was found. There are ontologies for the Chemistry and Textile domains independently, ChEBI and TEX GLOB, respectively, but there is none that combines both fields into the same ontology.
In order to start designing the Textile Chemical Ontology, a suitable methodology had to be selected. In this case, it was decided that the "Methontology" methodology would be used to develop the ontology. Having analysed other available methodologies, "Methontology" was found to be the best one for building the Textile Chemical Ontology, since it is very clear and provides a better organisation of the stages involved in the ontology development process.
Following the stages of this methodology, two key issues in any ontology are: i) the definition of concepts and ii) the definition of relationships. For the former, the sources of information from which the concepts constituting the ontology can be extracted (e.g., OEKO-TEX or REACH) were first analysed. For the latter, the different types of relations were also studied, three of them (is-a, cause, and part-of) being the most relevant ones.
Finally, a use case where the Textile Chemical Ontology was used in the context of the Information Retrieval task was presented. In this manner, it was shown how the problem of ambiguity for some concepts could be solved when employing general-purpose search engines (e.g., Google): when the Textile Chemical Ontology is used, the problem of ambiguity disappears, reducing the number of retrieved documents and yielding higher precision.
In the short term, as future work, the size of the ontology will be extended, increasing the number of concepts and relationships. One aspect planned for the development of the ontology is to take into consideration the multilinguality of the concepts, adapting those parts of the ontology where different languages may be involved. Moreover, its validation in the context of information retrieval will be broadened and improved, integrating it and experimenting with specialised crawlers that might improve the results for a specific domain.
In the medium and long term, the use of the ontology in the context of other NLP tasks, such as text summarisation or information extraction, will also be analysed.
The four phases of the method proposed by Uschold and King (Section III.A) are:
- Identify the purpose of the ontology
- Build the model
- Evaluate the model
- Document the ontology
Finally, for building ontologies with the Methontology methodology, it is necessary to:
- Specify the objectives and decide the domain
- Conceptualise the terms
- Formalise the model
- Implement the model
- Maintain and incorporate more information

Fig. 1. Fragment of the concepts of the Textile Chemical Ontology