An Approach based on Machine Learning Algorithms for the Recommendation of Scientific Cultural Heritage Objects

The Scientific Cultural Heritage (SCH) of the Drâa-Tafilalet region in south-eastern Morocco is a rich source of data testifying to the ingenuity of earlier generations that shaped the region's past. These data must be preserved for future generations, in particular by means of new technologies and the semantic web. Recommender systems (RS) are intended to assist prospective users by recommending the most suitable services based on their profile and expectations. RS based on collaborative filtering (CF), content-based filtering (CB) or hybrid filtering have shown promising results in addressing the problems encountered, especially in cultural heritage (CH). However, some limitations remain to be resolved, mostly concerning the ability of these methods to build a stable and complete framework that gives a complete picture of the user profile and suggests the most appropriate offers. This paper presents a hybrid recommender system for SCH data, a field little explored despite its historical importance and the value it generates. The results presented in this paper are based on data collected from the Drâa-Tafilalet region in southern Morocco.

Keywords: Cultural heritage; CIDOC-CRM; ontologies; OWL; recommender system; semantic web; RDF


I. INTRODUCTION
Given the many features and applications developed in the Big Data age, a Recommender System (RS) is an essential and effective user support tool. Without one, users lose a considerable amount of time because relevant information is difficult to access, and the value of the services offered is diminished. An RS is a data filtering method that identifies a collection of resources that are relevant to a particular user. Three forms of RS are usually distinguished: collaborative filtering, content-based filtering and hybrid filtering. Content-based RSs, for example, monitor user behavior and suggest new items based on the user's interests.
The motivation behind this work is the need to protect the heritage of Drâa-Tafilalet, a region with a rare richness and diversity in its tangible and intangible cultural heritage. So far, the use of this heritage has been restricted to a small number of projects carried out by some public institutions, without noticeable effects. Using emerging technologies will help to make this heritage known, preserve it and turn it into a source of added value for an area with practically non-existent economic activities. Semantic web technologies, Linked Data for example, will go a long way toward resolving some of the issues that have arisen in several applications of this type, in particular the scarcity and dispersion of data sources and the redundancy of information available on the web in data sources held by the public sector and private organizations. Using linking tools, we can expect to build a single, complete Moroccan heritage data source from multiple data sources. The stored data is then automatically processed to extract useful information for the end-user, adding a significant dimension to the proposed system's architecture [1].
This document is structured as follows: Section II provides a synopsis of how semantic recommender systems (SRS) work for CH. Section III deals with some terminology related to Drâa-Tafilalet's unique and rich SCH. Section IV describes the architecture of the proposed system and presents some of the results obtained, while Section V presents a conclusion and some future directions.

II. RELATED WORKS
The literature includes only a limited number of works that apply machine learning (ML) [13] algorithms to the issue of SCH conservation. This study focuses on the creation and testing of an SCH recommendation framework based on ML algorithms and semantic data through the use of the CIDOC CRM reference model. The use of SRS in CH has been widely discussed by researchers. Given the strong demand, especially for cultural tourism, many stakeholders have continued to express their interest, particularly over the last 10 years. A complete analysis of the semantic integration of the CH data of the Drâa-Tafilalet area, based on a generic Big Data architecture, was carried out and applied in [2]. It is the primary motivation for this work as well as a rich source of data for future works. The authors in [4] propose a novel methodology for implementing a route planner within cultural sites such as museums by combining recommendation facilities with agent-based planning techniques. However, no clear idea is given of the data collected and its nature, and SCH data is not covered in this case. In [5], the authors present INTHELEX, a first-order logic incremental learning system, and apply it to learning rules for the automatic identification of a wide range of essential document classes and their related components. When the set of documents is constantly expanding, incrementality plays a critical role. The Folksonomy-based Item Recommender System (FIRST) [24] is a content-based recommender system developed within CHAT. The goal is to propose a method for adaptive exploitation of digital libraries based on a flexible architecture that allows the recommendation of artworks located in the Vatican Picture Gallery (Pinacoteca Vaticana), providing users with a personalized tour of the museum based on their preferences.
CHAT-Bot [9] is a chatbot that recommends adaptive tourist itineraries with points of interest and associated services based on the tourist's profile and contextual factors. The authors of [10] present a Big Data architecture supporting popular applications for CH (query, content analysis, navigation). The objective is to propose a new user-focused approach for suggesting different cultural artifacts. SMARTMUSEUM [17] is a mobile application that offers customized, contextual knowledge and content suggestions to museum visitors using geo-location, Near Field Communication (NFC), and Radio Frequency Identification (RFID) techniques. In most of the works reviewed, CH is dealt with in a general manner or through a case study of a particular object that may be part of the tangible or intangible heritage. The processing of SCH objects such as zawyas, khizanas and oulamas is almost absent or, in most cases, treated only from a historical and cultural angle. This undervalues a heritage that is important in the history and civilization of generations that worked hard to make it known. Semantic web technologies such as RDF graphs and ontologies will give SCH data a useful semantic dimension and allow it to benefit from the advantages and advances that these technologies offer.
In this work, we focus on the SCH data. The number of objects belonging to this type of data and the dimension of each object are two important factors. This justifies the choice to focus on the following objects: Medersas, Zawyas and Oulamas. Each Zawya may contain a Khizana, or library, holding documents of several types (books, manuscripts, documents, maps, etc.). The SCH objects processed can be classified according to the structure given in Fig. 1.

A. Scientific Cultural Heritage
The Drâa-Tafilalet region in Morocco has a diverse and rich SCH. This heritage includes objects, buildings, and monuments from various periods and architectural styles, including religious architecture, funerary architecture, military architecture, and domestic architecture.

1) Zawya: Zawya is a Moroccan term that refers to the location where Sufi disciples congregate in the presence of Sufi sheikhs to "purify the soul and refine behavior." The Zawya is a scientific, social and religious entity that has received special attention because of the historical relationship between Morocco's rulers and the Zawya's sheikhs, who played an important role in the country's political stability and spiritual security. In general, every Zawya has a library that bears its name. The library contains books, old manuscripts and documents that reflect the country's history and culture. There are dozens of Zawyas in the region, many of which have been active for centuries and still are. Some of them still keep their extensive libraries.
2) Oulamas: The Oulamas are Muslim scholars who have worked in a variety of fields, including science, literature, and law. They have left an indelible mark on the history of Morocco and of all Islamic countries. Ultimately, the goal of this research is to identify the Oulamas associated with the Tafilalet region who lived within its walls, refined their talents and abilities, and improved their knowledge and achievements in Shari'ah and related sciences. They are therefore another type of CH that reflects the diversity of the Drâa-Tafilalet region.
3) Medersa: A "medersa", "Madrassa" or Koranic school is a Muslim theological school operated by a religious foundation known as "waqf". The Merinid medersa of Morocco, located in Fez, is the most notable among them, with impressive architecture. There are also those of the Drâa-Tafilalet region, whose main function is to teach the Koran and the Hadith.
Semantic web standards such as RDF/RDFS and OWL are used to represent the region's collected scientific heritage data. Several knowledge bases are included to enrich the information presented to the end user and to ensure semantic consideration of user queries, regardless of the formalism or language used. Table I illustrates a semantic representation of some SCH concepts.
These semantic equivalences are exploited to provide rich and optimized content since some descriptions of some commonly used cultural objects will derive from reference knowledge bases such as DBpedia. In Fig. 2, an overview of the representation of the Zawya NASSIRIYA library in Tamegroute is illustrated.

B. Basic Functionalities
In general, an SRS integrates the following main functionalities:
1) Acquisition/learning: Acquisition and learning are the first step of every SRS. It is a delicate and necessary step, since any system's output depends on the quality and volume of the input data, and it consumes the majority of the time spent on a project. The geographical dimension and constraints associated with the region of study, as well as the nature of the data collected, are also important considerations.
2) Creation of the user profile: The user profile is developed from the previous stage and from the interactions an actor has with the system, particularly if the SRS is of hybrid type, as in this study. One of the most difficult challenges for an SRS is the cold start problem, where the system has no details about the actor who is supposed to use it. This issue is addressed by assigning a default profile to each new user based on the semantic average of what is visited and what is most desired by other users, which is possible owing to the semantically close relationship between the objects manipulated by the system.
3) Extracting objects of interest: The next step is to extract the most interesting items for the user. Several techniques (ontologies, RDF graphs, SPARQL queries, etc.) are used in SRS to ensure the semantic side of extracting the items of interest, which depends on the capacity of the system to interpret and understand user interactions. The CIDOC-CRM standard was used for the integration of the CH data of the study area. CIDOC CRM is one of the most widely used ontologies for cultural heritage and offers several advantages that make it a reference, which justifies its use in our case. The ontology is used to identify objects that are semantically linked to the user's items of interest in order to ensure content that is as rich as possible. For example, the actor may be interested in the historical aspect of a KHIZANA from the area, but may be even more interested in its content, which is rich in historical and cultural data.

4) Classification: Classification organizes the extracted objects according to the user's continuously updated profile and their importance to the end user. A number of algorithms have been used in the literature to accomplish this task, depending on the expected features and performance. The advantages of ML and deep learning (DL) technologies have been demonstrated in a wide range of fields, especially in CH (CNN, Naive Bayes, SVM, etc.) (Fig. 3).

5) Recommendation: The recommendation process produces a list of items that may interest the end user. This list is typically presented in order of relevance based on the user's profile, which is constantly updated according to the user's interactions with the system. A semantic link depth is defined to prevent the end user from becoming lost in the massive amount of data presented. The best depth is currently chosen from what is observed in practice through user interactions, but it will be the subject of a more in-depth study.
In this paper, we represent each object O by a set of n features (f_1, f_2, ..., f_n). We denote by d the distance function used to compare objects. If D is a database (relational database and knowledge base) of objects, the k nearest neighbors of a query object q are the k objects of D that are closest to q according to d. The inputs are identified by the user through his or her interactions with the system. Newly connected users are automatically assigned a set of concepts extracted from their profile or from a generic profile. The algorithm is of the KNN type, and the number k of nearest neighbors shown in the output adapts to the display device used by the end user. The depth is limited to three levels of semantic links extracted from the semantic representation of SCH concepts, as shown in Fig. 2. This restriction was adopted following a series of experiments and tests. The semantic distance is calculated for each extracted concept, and if it is less than a certain threshold, the item is added to the output list.

Inputs
Target user: U
Set of concepts: C
Number of items to recommend: k ∈ ℕ
Link depth ⟵ 3
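The recommendation step described above can be illustrated by a minimal sketch. It is not the implementation used in this work: the toy concept graph, the function names and the use of the number of semantic links as the semantic distance are all assumptions made for illustration. Only the inputs (a set of concepts, a number k of items, a link depth of three) and the threshold test come from the description above.

```python
from collections import deque

# Hypothetical toy graph of semantic links between SCH concepts
# (undirected, unit-weight links); not taken from the actual ontology.
SEMANTIC_LINKS = {
    "Zawya": ["Khizana", "Oulama"],
    "Khizana": ["Zawya", "Manuscript", "Book"],
    "Oulama": ["Zawya", "Medersa"],
    "Manuscript": ["Khizana", "Map"],
    "Book": ["Khizana"],
    "Medersa": ["Oulama"],
    "Map": ["Manuscript", "Drawing"],
    "Drawing": ["Map", "Sketch"],
    "Sketch": ["Drawing"],
}


def semantic_distances(seeds, links, max_depth=3):
    """Breadth-first search starting from the profile concepts.

    Returns {concept: number of links to the closest seed} and never
    explores more than max_depth links away from a seed.
    """
    dist = {c: 0 for c in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if dist[current] == max_depth:
            continue  # depth limit reached, do not expand further
        for neighbour in links.get(current, []):
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist


def recommend(profile_concepts, links, k, max_depth=3, threshold=4):
    """Return up to k concepts closest to the profile, nearest first.

    A concept is kept only if its distance is strictly below threshold
    (assumption: distance = number of semantic links).
    """
    dist = semantic_distances(profile_concepts, links, max_depth)
    candidates = [
        (d, c) for c, d in dist.items()
        if c not in profile_concepts and d < threshold
    ]
    candidates.sort()  # by distance, then alphabetically for a stable order
    return [c for _, c in candidates[:k]]
```

For a user whose profile contains only the concept Khizana, `recommend(["Khizana"], SEMANTIC_LINKS, k=3)` returns the three directly linked concepts, and the concept Sketch, which lies four links away, is never proposed whatever the value of k.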

C. Algorithms of ML for CH
A preliminary study was carried out on the various ML algorithms used to process cultural heritage data (rule-based algorithms, genetic algorithms, SVM, KNN, etc.). The goal of this part is to find the most efficient algorithms to use for the CH of the study region and to increase the performance of the proposed system. This paper describes nine ML algorithms, which are as follows:
• SVMs (Support Vector Machines) [21]: This is a set of supervised learning algorithms that are frequently used for regression or classification. SVM is an algorithm based on statistical learning. These algorithms identify the data points of each class that lie closest to the other classes. These data points, known as support vectors, are used for the classification task; the other data points are ignored. The best dividing line (a hyperplane in higher dimensions), known as the decision boundary, is then defined so as to maximize the margin between the classes.
• Naive Bayes (NB): This is a statistical approach founded on Bayes' theorem. NB employs statistical functions to determine the likelihood that an input belongs to a specific predefined class. This algorithm returns the most likely class.
• Decision Tree (DT) [15,19]: As a logic-based algorithm, DT models data sets in hierarchical structures using a series of if/else comparisons. Each node in the tree is either a decision node that contains terms (or more complex objects) or a leaf that contains a class label prediction. The weight of each word or object is labeled on the branches.
• KNN (K-Nearest Neighbors) [6,19,21,22,30]: This is a statistical method that predicts the class of a new instance by measuring its similarity to the training data and locating the closest data points (or data objects) in the training dataset according to a given distance function. K denotes the number of closest data points (i.e., neighbors) considered. The value of K is frequently determined experimentally.
• Rule-Based (RB) [5] classifiers are algorithms in which collections of rules represent the data set. In contrast to DTs, which use a strictly hierarchical approach, RB classifiers allow for overlaps in the decision space. Each rule is divided into two parts: the left side is made up of conditions, and the right side is made up of classes. The dataset is used to generate these rules.
• CNN (Convolutional Neural Networks) [20,21,23,25,30,31] are widely used for image or video processing, including image classification. The term "convolutional network" refers to a mathematical operation known as the convolution product. In simple terms, a filter is applied to the input image, and the parameters of the filter are learned during training. The learned filters are then used to recognize and classify more complex images or objects. In [31], the authors apply pre-trained CNNs such as GoogLeNet, ResNet-18, and ResNet-50 to classify architectural heritage images. The goal is to improve image database management and make it easier to search for a specific element, thereby facilitating the study and analysis of the relevant heritage object.
• Genetic-Based (GB) [6,7]: GB algorithms are a subset of evolutionary algorithms. The goal is to use an optimization mechanism to approximate the solution to a hard optimization problem. The GB algorithm begins with a population of candidate solutions known as individuals, which evolves from generation to generation until the population contains the best solutions. Each individual has unique characteristics that can be altered by genetic operators (mutation, crossover, etc.). Each individual is evaluated, and its fitness value is used as a criterion for survival from generation to generation.
• Conditional Random Fields (CRF) [19] are statistical modeling techniques used in machine learning. An ordinary classifier predicts the label of a single sample without taking the context into account, whereas a CRF does take it into account. The prediction is represented as a graphical model, the type of which depends on the application, and it incorporates dependencies between predictions.
• Gaussian Mixture Models (GMM) [18,19] are probabilistic models that distribute points into different groups using a soft clustering approach. GMMs assume a finite number of Gaussian distributions, each of which represents a cluster. As a result, a Gaussian mixture model tends to cluster data points that belong to the same distribution.
• Sequential Minimal Optimization (SMO) [19,21]: In comparison to generic quadratic programming (QP) algorithms, SMO is a simple and efficient algorithm used to solve the learning problem in SVMs. At each step, SMO decomposes the global QP problem into QP sub-problems and solves the smallest possible optimization problem.
Fig. 3 summarizes the outcomes obtained by distinguishing between tangible and intangible heritage objects. SRS is a type of semantic information filtering that has benefited greatly from the significant progress made in the semantic web world (ontologies, RDF/RDFS, OWL, etc.). A study was carried out in order to gain a clear picture of the most recent advances and technologies used in SRS for CH and its algorithms. Table II summarizes the findings.
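To make one of the classifiers listed above concrete, the following minimal sketch implements a multinomial Naive Bayes classifier with add-one smoothing in plain Python. It is an illustration only: the toy descriptions, the two labels (tangible and intangible) and the function names are assumptions and do not reproduce the data or the implementation evaluated in this work.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Fit a multinomial Naive Bayes model.

    samples is a list of (list_of_words, label) pairs. The model is the
    tuple (class counts, per-class word counts, vocabulary).
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in samples:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict(model, words):
    """Return the most likely class for a list of words."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # log P(class) + sum over words of log P(word | class)
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for w in words:
            # add-one (Laplace) smoothing avoids zero probabilities
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training descriptions, for illustration only.
TOY_DATA = [
    (["manuscript", "library", "book"], "tangible"),
    (["building", "monument", "library"], "tangible"),
    (["oral", "tradition", "chant"], "intangible"),
    (["ritual", "chant", "teaching"], "intangible"),
]
```

With this toy data, `predict(train_naive_bayes(TOY_DATA), ["old", "manuscript", "library"])` returns the label tangible, since the words manuscript and library only occur in tangible descriptions.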
The following are the most relevant comparison criteria that have been identified in terms of the basic functionality that any SRS for CH must provide:
• The objects of CH used: cultural places, trips, museums, paintings, events, etc.
• Date: date of start or publication of the first works and prototypes. This gives an idea of the historical side of the system.
The developed system offers two options for support: a mobile application for Android systems and a web application. The end user can access the rich content via a simple graphical interface, which provides relevant information on the KHIZANAS (libraries) of the study region. Each KHIZANA contains a number of documents (manuscripts, books, maps, geometric drawings, etc.). The user can also run a query on the available data. Following that, three functions are carried out: keywords are extracted from the query (acquisition), the important concepts are detected, and these concepts feed the user profile or, if necessary, create a new profile. Several data sources (database, knowledge base, etc.) are queried based on the users' entries. Based on the CIDOC CRM reference model, an ontology was created at the start of this project to represent the SCH of the study region. The goals of this ontology are as follows:
• Integrate the semantic aspect into the system to guarantee complete handling of the user request.
• Have a controlled vocabulary for all the concepts used and their semantic links.
• Extract concepts semantically linked to those extracted from the user query (according to a predefined link depth) to provide richer and more relevant content.
Once this task is completed, the system provides adaptive content as recommendations that may interest the end user. Activating one of these contents helps to enrich the profile already created.
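The query handling described above (keyword extraction, detection of concepts through a controlled vocabulary, then feeding or creating the user profile) can be sketched as follows. The vocabulary, the stop-word list and the function names are hypothetical and serve only to illustrate the flow; in the described system the concepts come from the ontology rather than from a hard-coded dictionary.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: query keyword -> ontology concept.
VOCABULARY = {
    "khizana": "Khizana",
    "library": "Khizana",
    "libraries": "Khizana",
    "manuscript": "Manuscript",
    "manuscripts": "Manuscript",
    "zawya": "Zawya",
    "medersa": "Medersa",
    "madrassa": "Medersa",
}

# Hypothetical stop-word list, deliberately tiny.
STOP_WORDS = {"the", "of", "in", "a", "an", "and", "at", "for", "to"}


def extract_keywords(query):
    """Lower-case the query, split it into words and drop stop words."""
    tokens = re.findall(r"[^\W\d_]+", query.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def detect_concepts(keywords, vocabulary=VOCABULARY):
    """Map keywords to vocabulary concepts, ignoring unknown words."""
    return [vocabulary[k] for k in keywords if k in vocabulary]


def update_profile(profile, concepts):
    """Add one unit of interest per detected concept.

    A new profile is created when profile is None.
    """
    profile = Counter() if profile is None else profile
    profile.update(concepts)
    return profile
```

A first query such as "Old manuscripts in the library of the Zawya Nassiriya" creates a profile with one unit of interest for each of the concepts Manuscript, Khizana and Zawya; later queries increment the same counters, which is one simple way to let the profile evolve with the user's interactions.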
The proposed system's architecture (Fig. 4) allows a user to interact by formulating queries and collecting the responses directly as SCH articles. It also makes recommendations based on the user's previous activities and, if no profile has yet been created, on the history of visits by other users, which reduces the cold start problem that any SRS must deal with. This works because SCH data is already semantically close, so the recommended objects have a high likelihood of being accepted by the current user. Specific processing is done in the background to ensure the consistency of the recommendations provided. The back-end is primarily made up of algorithms for extracting relevant concepts, classifying them and producing recommendations. They rely on knowledge bases that provide a semantic representation of the items related to the SCH and of the various data collected, as well as on the various records stored in various formats. The system administrator creates profile prototypes that correspond to the preferences and functionalities of the developed system and are closely related to the various forms of SCH introduced above. The cold start problem, the main problem that recommender systems face, is solved by creating a default profile (SCH) for each new user. This profile is progressively customized to provide more targeted and relevant recommendations and services based on the user's interactions with the system (requests, consultation of proposed assets, comments, etc.) and on the history of other users.
Fig. 5 shows the results of integrating several semantic levels in the case of users interested in Books, resulting in a recommendation of objects with the same level of semantic depth (Books in this case) or higher levels (Manuscripts in this case) based on the semantic knowledge base already developed. Fig. 6 shows an example of a recommendation based on a user's interactions with objects of type Khizana (Libraries), in this case the Khizana Nassiriya, one of the world's oldest Khizanas, which houses incomparable historical documents. In this case, the user will see recommended Khizanas from the same category; this is the case where the semantic depth leads to objects from the same category.
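The default-profile strategy used against the cold start problem can be sketched as follows. The sketch assumes that a profile is a mapping from concept to interest weight and that the "semantic average" is the mean of the normalized profiles of existing users; both are illustrative assumptions, as are the function names, and the actual averaging used by the system may differ.

```python
from collections import Counter


def default_profile(existing_profiles):
    """Average the normalised concept weights of all existing users.

    Assumption: each profile is a dict {concept: non-negative weight}.
    Users without any recorded interaction are skipped.
    """
    totals = Counter()
    active_users = 0
    for profile in existing_profiles:
        weight_sum = sum(profile.values())
        if weight_sum == 0:
            continue
        active_users += 1
        for concept, weight in profile.items():
            totals[concept] += weight / weight_sum
    if active_users == 0:
        return {}
    return {c: w / active_users for c, w in totals.items()}


def profile_for(user_id, profiles):
    """Return the user's own profile, or the default one for a new user."""
    if profiles.get(user_id):
        return profiles[user_id]
    return default_profile(list(profiles.values()))
```

For instance, with one user weighted 3 to 1 toward Khizana over Manuscript and another split evenly between Khizana and Zawya, a new user starts with a default profile that is dominated by Khizana and is then refined through his or her own interactions.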

V. CONCLUSION
Until now, very few works in the literature have focused on the valorization of the SCH of a region that is very rich in heritage data but has remained ignored. The added value of this work lies largely in the use of ML algorithms to build an intelligent system for the recommendation of SCH objects. This will help to preserve this heritage and make it known to as many stakeholders as possible. To this end, a Semantic RS for SCH has been implemented and tested for the Drâa-Tafilalet region. The process of creating and assigning profiles considers end-user preferences. It enables semi-supervised (content-based) profiling, which is then refined using other available profiles (collaboration based on the history of other users). The system automatically adjusts the content to the user's terminal (mobile or web). The proposed system will enable the valorization of a CH that is little known despite its significant potential in several fields of the economy of a region whose primary source of income is tourism. The majority of the functions described in this paper are already operational, and test data was used to validate the final prototype. The next step is to incorporate this component into the overall system in order to achieve the primary goal of this study: to have a comprehensive system for preserving and presenting the CH of the study region. Another point, which will serve as the foundation for future work, is the incorporation of an interface that will allow cultural heritage specialists in the region to validate the available data before it is explored via the platform.