Numerical Representation of Web Sites of Remote Sensing Satellite Data Providers and Its Application to Knowledge Based Information Retrievals with Natural Language

A method for the numerical expression of web sites related to satellite remote sensing, and its application to a knowledge based information retrieval system that allows retrieval with natural language, is proposed and implemented. Through experiments with remote sensing related information, it is found that the proposed information retrieval system works, in particular for remote sensing satellite data retrieval with natural language.


I. INTRODUCTION
When one or more words are typed into a search engine, a list of web sites that contain those words is displayed. The words entered are known as a query [1]. Baeza-Yates and Ribeiro-Neto linked information retrieval to the user's information needs, which can be expressed as a query submitted to a search engine [2]. Search engines were also among the brightest stars of the Internet investing frenzy that occurred in the late 1990s [3]. Although search engines are programmed to rank web sites by popularity and relevancy, empirical studies indicate various political, economic, and social biases in the information they provide [4], [5].
A great amount of information is available for remote sensing satellite data retrieval. Directory, inventory, catalog, and guide information is available on a worldwide basis. A smart search engine is therefore needed for remote sensing satellite data retrieval. To realize a smart search engine, a knowledge base system has to be involved. A knowledge base system consists of a knowledge base, which holds objects and attributes, and an inference engine. In addition, users would like to search for remote sensing satellite data with natural language.
The next section describes the proposed knowledge based search engine, which allows users to search for appropriate URLs of remote sensing satellite data providers with natural language, followed by some experimental results. The conclusion is then given together with some discussion.

A. Knowledge Based System and Conventional Information Retrieval Systems
Figure 1 shows the configuration of a knowledge based system, which consists of an Inference Engine (IE), a Knowledge Base database (KB), a Knowledge Base Management System (KBMS), and a Knowledge Acquisition Module (KAM). When a user submits a query to the knowledge based system, previously acquired knowledge about remote sensing satellite data providers is used to output search results.
Figure 2 shows the query system, in which an expanded query generator turns a single query into multiple queries and distributes them to multiple databases that hold different types of remote sensing satellite data. Appropriate queries are thus submitted by the expanded query generator, database system by database system. The distributed remote sensing satellite database systems are created and managed by the data providers.
Figure 3 shows the assisted search module, which allows distributed database servers to be searched through the Internet. The only thing users have to do is access the assisted search module; the module then searches for the appropriate database server among the distributed servers.
GCDIS-ASK (Global Change Data and Information System of Assisted Search for Knowledge) has three basic components: a client module, an assisted search module, and a data collection module. Figure 4 shows the system architecture of the client module. When a query is submitted by a user, there are three options under the Graphical User Interface (GUI): direct search of the database, assisted search, and local search of the database.
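The role of the expanded query generator of Figure 2 can be illustrated with a minimal sketch. The text does not describe the actual query interfaces of the data providers, so the server names are used only as labels and every field name in `FIELD_MAPS` is hypothetical; the sketch only shows the idea of rewriting one query into one query per database system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpandedQuery:
    server: str   # label of the target database server
    params: dict  # query rewritten in that server's own field names


# Hypothetical per-server field names; the real interfaces are not given in the text.
FIELD_MAPS = {
    "NOAA": {"sensor": "instrument", "region": "area"},
    "USGS": {"sensor": "sensor_name", "region": "bbox_name"},
}


def expand_query(query):
    """Rewrite a single generic query into one query per database server."""
    expanded = []
    for server, field_map in FIELD_MAPS.items():
        # Keep only the fields this server understands, under its own names.
        params = {field_map[k]: v for k, v in query.items() if k in field_map}
        if params:
            expanded.append(ExpandedQuery(server, params))
    return expanded
```

Calling `expand_query({"sensor": "AVHRR", "region": "Antarctic"})` yields one `ExpandedQuery` per server, each carrying the same values under that server's assumed field names.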
One of the key features of the assisted search module is the Natural Language (NL) search engine. Search can be done with a combination of statistical search and concept based search. The former is based on statistical variables such as the frequency of the query words and the distance between query words. The latter, on the other hand, uses concepts derived from experts. Users can thus create concepts by using previously acquired knowledge and expertise in the knowledge base in order to improve search performance. An extendable knowledge base system makes such data and information search available. Under the extendable knowledge base system there is the NL search engine, which includes a dictionary. For this purpose, smart query servers and a smart query scheduler are prepared, as shown in Figure 5. There is a specific database server under each smart query server, and the scheduler monitors each smart query server. When a user submits a query, the client handler generates queries and sends them to the appropriate smart query servers, as shown in Figure 6.
Figure 7 shows a more detailed architecture of the assisted search module. The key issues here are the NL search engine and user profiling. Users may use natural language in their queries. Users' profiles are archived and used for choosing information access options. Every time a user accesses the database, the user profile is updated, so that information and data search can be done in a much more efficient way.
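The two statistical variables named for the statistical search, the frequency of the query words and the distance between query words, can be computed as in the following sketch. The tokenizer and the function names are assumptions made for illustration; how the system combines these variables into a score is not stated in the text and is therefore left out.

```python
import re
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequency(query_words, tokens):
    """Count how often each query word occurs in the token list."""
    return {word: tokens.count(word) for word in query_words}


def min_pair_distance(query_words, tokens):
    """Smallest token gap between any two distinct query words.

    Assumes the query words are distinct; returns None when no pair
    of different query words occurs in the text.
    """
    positions = {
        word: [i for i, token in enumerate(tokens) if token == word]
        for word in query_words
    }
    best = None
    for first, second in combinations(query_words, 2):
        for i in positions[first]:
            for j in positions[second]:
                gap = abs(i - j)
                if best is None or gap < best:
                    best = gap
    return best
```

A document in which the query words occur often and close together would be favored by a statistical search built on these variables.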

B. Proposed Information Retrieval Systems
The attribute information about a data provider consists of keywords about the URL, the URL itself, the data provider name, the available data period (from when to when), and information about the data provider. The attribute information about data is divided into atmosphere, hydrosphere, cryosphere, geosphere, and biosphere. Under the attribute information there are many sensor names, as shown in Table 1.
The attribute information about data includes observation target names, satellite names, sensor names, etc., as shown in Table 2. This is the example of NSIDC (National Snow and Ice Data Center).
It then becomes possible to plot every URL in the five dimensional vector space of attributes, as shown in Figure 9. In the figure, the x, y, and z axes are hydrosphere, cryosphere, and geosphere, respectively. The distance between URLs can then be defined as shown in Figure 10, and the angle between URLs can be calculated easily. Thus the existing URL with the smallest angle to the input search query can be found and sent to the user as the search result.
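The angle based matching can be sketched as follows. The angle is obtained from the normalized dot product of two attribute vectors; the ordering of the five axes, the function names, and the handling of an all-zero vector are assumptions made for illustration and are not taken from the implemented system.

```python
import math

# The five attributes named in the text; the axis ordering is an assumption.
AXES = ("atmosphere", "hydrosphere", "cryosphere", "geosphere", "biosphere")


def angle_between(u, v):
    """Return the angle in radians between two attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: a vector with no attributes is treated as orthogonal.
        return math.pi / 2
    # Clamp to [-1, 1] to guard against floating point rounding.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)


def rank_urls(query_vec, url_vectors, k=5):
    """Return up to k URLs ordered by increasing angle to the query."""
    ranked = sorted(
        url_vectors,
        key=lambda url: angle_between(query_vec, url_vectors[url]),
    )
    return ranked[:k]
```

With `url_vectors` mapping each URL to its five attribute values, `rank_urls(query_vec, url_vectors, k=1)[0]` is the URL with the smallest angle to the query, and a larger `k` gives an ordered list of the closest data providers.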
This attribute information can be classified as shown in Table 3. In Table 3, each number denotes the number of attributes, and the numbers can be normalized as shown in the bottom row of the table.
Figure 11 shows the architecture of the proposed remote sensing satellite data and information retrieval system. The query from a user is written in text format in natural language. The angle between the query and the existing URLs can then be calculated easily. Thus the existing URL with the smallest angle to the input search query can be found and sent to the user as the search result.
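How a natural language query might be turned into a normalized attribute vector can be sketched as follows. The keyword lexicon is hypothetical, since the dictionary of the actual system is not listed in the text, and the normalization shown (dividing each count by the total count) is only one possible reading of the bottom row of Table 3.

```python
import re

# The five attributes named in the text; the axis ordering is an assumption.
AXES = ("atmosphere", "hydrosphere", "cryosphere", "geosphere", "biosphere")

# Hypothetical keyword lexicon; the paper's actual dictionary is not given.
LEXICON = {
    "snow": "cryosphere",
    "iceberg": "cryosphere",
    "icy": "cryosphere",
    "ocean": "hydrosphere",
    "cloud": "atmosphere",
    "forest": "biosphere",
    "volcano": "geosphere",
}


def count_attributes(text):
    """Count, per attribute, the lexicon keywords that occur in the text."""
    counts = dict.fromkeys(AXES, 0)
    for word in re.findall(r"[a-z]+", text.lower()):
        axis = LEXICON.get(word)
        if axis is not None:
            counts[axis] += 1
    return counts


def normalize(counts):
    """Divide each count by the total so that the components sum to 1.

    Assumption: the paper does not state which normalization is used.
    """
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AXES)
    return [counts[axis] / total for axis in AXES]
```

Under this lexicon, a sentence that mentions an iceberg and an ocean contributes equally to the cryosphere and hydrosphere components; the resulting vector can then be compared with the attribute vectors of the existing URLs by angle.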

A. Implementation
The web interface of the system is implemented in PHP in a Netscape environment. The top page of the proposed search system is shown in Figure 12.

B. Search Example
In the example of Figure 12, the search request is made with the following natural language: "I would like to get images of areas suffered from heavy snow. I would like to know situation of iceberg in the Antarctic Ocean using data from Polar 1km AVHRR dataset. I would like to know about icy content mapped from space with RADARSAT." Users can refine the search results by reselecting more appropriate wording for the query, as shown in Figure 14, and can then get more suitable URLs.
Users' satisfaction is evaluated through a questionnaire given to ten students, and the result is compared with that of the conventional keyword search. As a result, all students prefer the proposed natural language search to the conventional keyword search. The hit ratio is also evaluated with the ten students and compared with that of the keyword search. An improvement of approximately 10 points is confirmed for the proposed search system in comparison with the conventional keyword search.

IV. CONCLUSION
A method for the numerical expression of web sites related to satellite remote sensing, and its application to a knowledge based information retrieval system that allows retrieval with natural language, is proposed and implemented. Through experiments with remote sensing related information, it is found that the proposed information retrieval system works, in particular for remote sensing satellite data retrieval with natural language.
Users' satisfaction is evaluated through a questionnaire given to ten students, and the result is compared with that of the conventional keyword search. As a result, all students prefer the proposed natural language search to the conventional keyword search. The hit ratio is also evaluated with the ten students and compared with that of the keyword search. An improvement of approximately 10 points is confirmed for the proposed search system in comparison with the conventional keyword search.

Fig. 1. Fundamental configuration of knowledge based system

Fig. 2. Expanded query generator
The database servers of data providers include NOAA (National Oceanic and Atmospheric Administration), EOSDIS (Earth Observing System Data and Information System), USGS (United States Geological Survey), DOE (Department of Energy), etc. The assisted search module is the fundamental function of GCDIS-ASK (Global Change Data and Information System of Assisted Search for Knowledge).

Fig. 10. Relation between input query and the existing URLs

Fig. 11. Architecture of the proposed remote sensing satellite data and information retrieval system

Fig. 12. Query input web page
When a user submits the query together with a user ID and the maximum number of search results, the search result is returned as shown in Figure 13. In this example, the top five data providers closest to the query are output as the search result, with URLs and detailed information. They are ordered according to the distance (angle) between the query and the attribute information about the data provider of each URL.

Fig. 13. Search result for the following natural language query: "I would like to get images of areas suffered from heavy snow. I would like to know situation of iceberg in the Antarctic Ocean using data from Polar 1km AVHRR dataset. I would like to know about icy content mapped from space with RADARSAT."

TABLE II. A VARIETY OF ATTRIBUTE INFORMATION ABOUT DATA WHICH ARE PROVIDED BY NSIDC

TABLE III. CLASSIFIED ATTRIBUTE INFORMATION AND THE NUMBER OF ATTRIBUTES

Fig. 9. URL distribution in the feature space of attributes (Hydrosphere, Cryosphere, and Geosphere)

TABLE I. SENSOR NAMES UNDER THE ATTRIBUTION INFORMATION