An Efficient Resource Discovery Methodology for Hpgrid Systems

— An efficient resource discovery mechanism is one of the fundamental requirements for grid computing systems, as it aids in resource management and scheduling of applications. Resource discovery activity involves searching for the appropriate resource types that match the user's application requirements. Classical approaches to Grid resource discovery are either centralized or hierarchical, and it becomes inefficient when the scale of Grid systems increases rapidly. On the other hand, the Peer-to-Peer (P2P) paradigm emerged as a successful model as it achieves scalability in distributed systems. Grid system using P2P technology can improve the central control of the traditional grid and restricts single point of failure. In this paper, we propose a new approach based on P2P techniques for resource discovery in grids using Hypercubic P2P Grid (HPGRID) topology connecting the grid nodes. A scalable, fault-tolerant, self-configuring search algorithm is proposed as Parameterized HPGRID algorithm, using isomorphic partitioning scheme. By design, the algorithm improves the probability of reaching all the working nodes in the system, even in the presence of non-alive nodes (inaccessible, crashed or nodes loaded by heavy traffic). The scheme can adapt to a complex, heterogeneous and dynamic resources of the grid environment, and has a better scalability I. INTRODUCTION Computational Grids and Peer-to-Peer (P2P) computing are the two popular distributed computing paradigms that have been converging in recent years. Computational Grid is an infrastructure that can integrate the computational resources of almost all kinds of computing devices to form a global problem-solving environment. On the other hand, P2P systems aim at resource sharing and collaboration through direct communication between computers without a centralized server as a medium. Computational Grids and P2P are both resource sharing systems having as their ultimate goal the harnessing of resources across multiple administrative domains. These two distributed systems have some commonalities as well as some conflicting goals as discussed in [4]. They have many common characteristics such as dynamic behavior and heterogeneity of the involved components. Apart


INTRODUCTION
Computational Grids and Peer-to-Peer (P2P) computing are the two popular distributed computing paradigms that have been converging in recent years.Computational Grid is an infrastructure that can integrate the computational resources of almost all kinds of computing devices to form a global problem-solving environment.On the other hand, P2P systems aim at resource sharing and collaboration through direct communication between computers without a centralized server as a medium.Computational Grids and P2P are both resource sharing systems having as their ultimate goal the harnessing of resources across multiple administrative domains.These two distributed systems have some commonalities as well as some conflicting goals as discussed in [4].They have many common characteristics such as dynamic behavior and heterogeneity of the involved components.Apart from their similarities, Grid and P2P systems exhibit essential differences reflected mostly by the behavior of the involved users, the dynamic nature of Grid resources (i.e., CPU load, available memory, network bandwidth, software versions) as opposed to pure file sharing which is by far the most common service in P2P systems.Although Grid and P2P systems emerged from different communities in order to serve different needs and to provide different functionalities, they both constitute successful resource sharing paradigms.It has been argued in the literature that Grid and P2P systems will eventually converge [12,17].The techniques used in each of these two different types of systems will result to a mutual benefit.
Resource discovery is the key requirements in large heterogeneous grid environments, and an effective and efficient resource discovery mechanism is crucial.Traditionally, resource discovery in grids was mainly based on centralized or hierarchical models.Resource discovery could be the potential performance and security bottleneck and single point of failure.The Peer-to-peer systems for discovering resources in a dynamic grid discussed in [19].Using P2P technology, the resource can be discovered quickly and effectively in grid environment, scalability and robustness can also be improved in P2P Grid.
In this paper, we propose a P2P based Grid resource discovery model, HPGRID system which uses Parameterized HPGRID algorithm, to optimize grid resource discovery and reaches all the grid nodes during searching process even in the presence of non-alive (crashed, inaccessible, experiencing heavy traffic, etc.).The model overcomes the defects of central resource discovery mechanism.The HPGRID model can adapt to the distributed and dynamic grid environments, and has a better scalability.The HPGRID nodes are partitioned isomorphically listing the available resources according to their zones which aids the user to select the needed resource to execute its job rather than traversing the whole grid nodes.
The rest of this paper is structured as follows.Section 2 surveys the related work.Section 3 describes the Hypercubic P2P grid topology.Section 4 describes the HPGRID resource discovery algorithm integrated with Isomorphic partitioning.http://ijacsa.thesai.org/Section 5 presents the performance evaluation.Finally, section 6 concludes the paper and presents the future work

II. RELATED WORK
The taxonomy of resource discovery discussed in [21] has identified four main classes of Resource Discovery systems namely centralized, distributed third party, multicast discovery and P2P resource discovery.P2P-based resource discovery systems allow nodes participating in the system to share both the storage load and the query load [18].In addition, they provide a robust communication overlay.P2P-based Grid resource discovery mechanisms that appear in the literature can be divided into two categories: structured and unstructured [11].Most proposed systems depend on a structured P2P underlying layer.A structured system however assumes that all pieces of information are stored in an orderly fashion according to their values in a DHT.This is the reason structured systems support efficient resource discovery.However, apart from static resources, Grids include dynamic resources whose values change over time.Whenever the value of a resource attribute stored in a structured system changes, it needs to be republished.If this occurs too often, the cost of republishing becomes prohibitively high.[7] based on an unstructured network similar to Gnutella combined with more sophisticated query of forwarding strategies which is taken from the Freenet overlay network.Requests are forwarded to one neighbor which are only based on experiences obtained from previous requests, thus trying to reduce network traffic and the number of requests per peer compared to simple query flooding as used by Gnutella.Iamnitchi improves the central control of the traditional grid and adapts fully the decentralized resource discovery in grid environments.However, the limitations are still there in this approach.

Iamnitchi et al. proposes resource discovery approach in
Felix Heine et al. propose grid resource discovery approach based ontology and structured P2P technologies in [6].The approach tackles the semantic problem, but the maintenance of Peer is too high cost because Peer joins and leaves dynamically in structured P2P grid environments.Moreover, the approach focuses on the inherited relationship among grid resource classes and have not discussed the unstructured P2P technologies.
Several P2P schemes, e.g.MAAN [2], NodeWiz [1] and SWORD [8], [16] have been proposed to index and discover Grid resources in a structured P2P network.By using appropriate routing schemes, search queries are routed to the nodes that are responsible for indexing the corresponding resources.Therefore, these schemes scale well to large number of participating nodes.On the other hand, their flat indexing structures pose a major challenge to the global resource monitoring in Grids due to its large-scale and decentralized nature The HyperCuP system used ontology to organize peers into groups of similar interests using a hypercube topology network [9].Search queries were forwarded to interest groups to produce a better hit rate and reduce redundant query messages.This approach required complex construction of the structured hypercube topology network.When joining the network, a peer declared its interest so that the network could put the peer into the cluster of its interest.As P2P is a dynamic environment, a peer might change its interest over time.Constantly updating the network would result in high cost.Furthermore, it would be more complicated if peers had more than one interest.A superpeer model for resource discovery services in large-scale grids discussed in [20].Zheng [22] describes a model for resource discovery among Grids based on the community categorized by application domain.Rozlina [23] discussed the issues related to matrix for measuring the cost and benefit for choosing the right resource discovery mechanism for a P2P systems.The main purpose of the resource discovery Strategy [24] is to improve the efficiency of the implementation of grid system.Abdelkader Hameurlain [25] provides a survey and a qualitative comparison of the most promising approaches (P2P techniques and agent systems) for RD.Viability of Grid systems relies mainly on efficient integration of P2P techniques and mobile agent (MA) systems to bring scaling and decentralized control properties to Grids.

A. Hypercubic P2P Grid Topology
The Hypercubic P2P Grid Topology is the hypercube structure with additional neighborhood links.In short, we refer Hypercubic P2P Grid as HPGRID.
The Hypercubic P2P Grid nodes have     be the layer of the HPGRID.Let d be the set of nodes at each layer of the HPGRID, then d = 0, 1, 2, 3. Also, the number of nodes in HPGRID is k n , and the number of edges are . The HPGRID Topology for n = 3 is depicted in Figure 1.There in, the dashed lines are the additional neighborhood links.
The HPGRID system can be represented by an undirected graph where p and q are the binary values of the nodes of HPGRID for a 3D HPGRID (000) denotes node 0and (001) denotes node 1 and so on., and p 1 denotes the number of ones in p.Here In E, the first four edges are the additional neighborhood links, and the remaining edges are the hypercubic edges.

B. Representation of HPGRID
Generally, the grid model integrated with P2P mode is composed of many Cubic GridPeers [13].Each Cubic Grid-Peer represents a super management domain.Each Cubic GridPeer controls the access of a group of local computing resources.It plays two roles: one as the resource provider and the other as resource consumer.The resource provider allows its free resources to other Cubic GridPeer (consumer), while the consumer arbitrarily uses its local resources or the free resources of other Cubic GridPeers to carry out its task.The resource discovery model for HPGRID is shown in figure 2. The bottom communities of the model using the traditional grid technologies, and the P2P mode are adapted to interact the information between Cubic GridPeers.Here, Cubic GridPeer (CGP) is equivalent to a super node.When they search resources, the users first query the resources in the domain of Cubic GridPeer.If no query result, the search will be carried out through Cubic GridPeer to query the other Cubic GridPeers with P2P way.In HPGRID, each node represents a CGP where each CGP is a collection of Grid Nodes GNs.The GN that belongs to a particular CGP is called Grid Community GC.Each Grid node is represented using its own identifier and the identifier of the corresponding CGP.That is, grid node & is represented as At each CGP in the HPGRID system, it contains a CGP Header in the format represented in figure 3.Each CGP contains a common resource table which has the details of all the available resources in its own GC.The Resource table format is described in Figure 4.It contains the resource name and number of resources corresponding to its resource number.

C. Parameter at Cubic Grid Peer
The parameters represent the state of a Cubic Grid Peer (CGP) that one must meet the following criteria: they must be small, they must facilitate identifying the fertility of a Cubic GridPeer and they must not divulge resource structure information.Based on parallel application characterization experience, we identified the following parameters.http://ijacsa.thesai.org/

1) Distance Factor (DF)
This gives an idea of how far the target CGP is from the home CGP.A home CGP is defined to be the CGP in which the program and input data are present and to which the output data will go.If it is separated by a large network distance, i.e., high latency and low bandwidth, the staging files and the arriving program and the input files to that CGP will be costly.Another reason why such a factor is important is that tasks in parallel programs might be scheduled on different CGP.Thus there will be some communication between CGP, even though such a situation will be reduced as far as possible by the search algorithm.For tightly coupled applications this may not always be possible and the scheduler might be forced to schedule them on different CGP.This parameter will make CGP between which there is large latency or low bandwidth less desirable to the CGP selector.A high value of this factor makes a CGP less desirable for scheduling.
where h(CGP) denotes home CGP and n(CGP) denotes neighbor CGP.

2) Resource Density (RD):
This parameter represents the intensity of computing power per unit communication bandwidth.The lower the value of RD, the more will be the bandwidth between every pair of nodes.This signifies that the resources in those CGPs are tightly coupled.For parallel programs that have a communicator in which a small group of processes communicate a lot, a CGP with a low value of RD is important.For example, a SMP will have low RD whereas a network of workstations will have high RD.A similar parameter has been used to represent the computation to communication ratio in schedulers of parallel programs.

3) Load Factor (LF)
This gives the overall load at some instant in that CGP.This is important to take care of the computation component of the parallel program.Thus parallel processes have a high computation aspect compared to communication which would prefer a better value for LF than for RD.
These parameters would be number calculated from information about the state of resources in a particular .CGP IV.RESOURCE DISCOVERY In this section, a HPGRID resource discovery model is proposed, and is described as isomorphic partitioning and resource search algorithm.

A. Isomorphic Partitioning
The basic idea of isomorphic partitioning is to partition the HPGRID into number of hyper cubes, where i is the partition step, where i = 1 for n = 3, i = 2 for n = 4, and so on.After Partitioning, the HPGRID is divided into 4 zones namely Z 1 , Z 2 , Z 3 and Z 4 .Each zones differ in their higher order bit as shown in figure 5.The processor space is partitioned into higher dimensional isomorphic sub-cubes and keeping the same order of dimension.Isomorphic partitioning strategy for HPGRID systems significantly improves the Subcube recognition capability, fragmentation, and complexity compared to existing methods as discusses in [3].
The following zones show the results of isomorphically partition of the 3D HPGRID into 4 zones containing the following nodes at each zone.
Thus, there is one bit difference between the neighboring zones.The partitioned HPGRID for 3D has been depicted in figure 6.The resulting partitioned sub-cubes are said to be isomorphic in the sense that they are also n-cubes, and for this reason, they retain many attractive properties of Hypercube networks, includes symmetry, low node degree (2n) and low diameter (kn).The 3D HPGRID is isomorphically partitioned into 4 zones containing the following nodes at each zone.

B. Resource discovery algorithms
In this section we present a scalable self configuring resource search algorithm (named Parameterized HPGRID http://ijacsa.thesai.org/Algorithm) that is able to adapt to complex environments.It is possible to initiate a search request from any of the live nodes.For the reasons of clarity, however the examples used from now on assume that node 0 is the start node, without a loss of generality.

1) The Search Procedure in an HPGRID using Parameterized HPGRID Algorithm
The search procedure starts when a consumer wants to discover a CGP service.The consumer connects to one of the CGP nodes of the system and requests a service (a resource or some resources).The service discovery is tried first inside the requester's own CGP depending on the parameters explained in Section 3. .In case the request cannot be satisfied and the node that receives the message is the start node (startNode is true), the list v d is initialized to v d = {0, 1,..., n -1} (the complete set of dimensions) and v a is initialized to v a = {} (an empty list).Otherwise, v d and v a are initialized to the lists received along with the request message.In both the cases the lists represent the set of dimensions (neighbors) along which the message must be propagated.
2. The node calls the statusneighbors function and reorders the list v d in such a way that the dimensions corresponding to the nonalive neighbors are located at http://ijacsa.thesai.org/ the last positions of the list.For example, if v d = {0,1,2} and the neighbor along dimension 0 and 1 is not responding, then v d is reordered to {2,1,0}.The statusneighbors also returns with two integer values n non-alive and d alive .The integer value n non-alive represents the number of non-alive nodes in the reordered list v d .
The integer value d alive represents the dimension of the last alive neighbor stored in v d .For example, if v d = {2,1,0} and its neighbors in dimensions 0 and 1 are non-alive nodes, n non-alive = {2} and d alive ={2}.
3. If the number of non-alive neighbors (n non-alive ) is more than one, the node calls the addToList(v a , d alive ) function.This function appends d alive to the end of the list v a and returns to the new list (v a2 ) else it make the d alive empty.
4. When a new service request message is received by a node in the HPGRID system, the function procRequest(message.request) is called.If the request included in the message cannot be satisfied, the node sets the value of satisfyRequest to false and the request will be propagated.Otherwise, satisfyRequest is set to true.Here, it calculates the Distance factor (DF), Resource Density (RD) and Load Factor (LF) to check whether resource is available at CGPs Grid Community.If so, no propagation is performed.The message forwarded is composed of the request (message.request)and two vectors of dimensions (message.vd and message.v a ).In case the request cannot be satisfied and the node that receives the message is the start node (startNode is true), the list v d is initialized to v d = {0, 1,..., n -1} (the complete set of dimensions) and v a is initialized to v a = {} (an empty list).Otherwise, v d and v a are initialized to the lists received along with the request message.In both the cases the lists represent the set of dimensions (neighbors) along which the message must be propagated.
5. The node calls the statusneighbors function and reorders the list v d in such a way that the dimensions corresponding to the nonalive neighbors are located at the last positions of the list.For example, if v d = {0,1,2} and the neighbor along dimension 0 and 1 is not responding, then v d is reordered to {2, 8. Finally, the node propagates the request to each of the neighbors along with v a dimensions only if the corresponding neighbor is not its parent node.Now this propagation takes place under two cases.
Case 1: If the number of elements in v d is not equal to 0 i.e., v d .size()!=0, then the request travels inside a message together with v a and v d as empty lists.10.Propagating the requests in this way, the effect of nonalive nodes is reduced.Consequently, the algorithm tries to isolate the nodes that are in a non-alive state so that they become leaf nodes (if it is possible) under such circumstances, each node has only one non-alive neighbor, and then all live nodes can be reached.On the other hand, the nodes that are unreachable because of inaccessible or crashed nodes along a path to them, can be reached eventually via other nodes -using the v a list.
V. PERFORMANCE EVALUATION In this section, we present the simulation results using the GridSim simulator [14].The Parameterized HPGRID algorithm has been implemented for three and four dimensional HPGRID system.This algorithm has been evaluated in comparison with the existing algorithm described in [5].In order to evaluate we describe the following test cases.
 BEST CASE: All the nodes in the network are in alive state and the request has been satisfied at the first hop itself.
 AVERAGE CASE: Some of the nodes are in alive and some are in non-alive state and the request is satisfied at the next hop.
 WORST CASE: Most of the nodes are in non-alive state and the request is satisfied at the last zone.Simulation has been done on a 3D HPGRID for the worst case keeping the nodes source nodes as (000) making the zeroth, first and second dimension nodes as non alive depicted in Figure 8, the following figure 9 gives the complete traversal example for 3D HPGRID system starting from node (000), having all its neighbor non alive namely (001,010,100) except the node present in the additional link node 110 is alive.Sample part of the gridsim output for resource discovery on a best cast 4D HPGRID is shown in Figure 10.Resource search path traversal for the best case in the 4D HPGRID system is depicted in figure 11. Figure 12 and 13 depicts that HPGRID algorithm outperforms the existing algorithm in comparison with the number of hops needed to complete the resource discovery process simulated using Gridsim [15] for both the 3D and 4D HPGRID systems in comparison with the HGRID system which does traversals in a normal hypercube.The hypercubic P2P grid approach for resource discovery has been compared with the existing approaches discussed in [10].The following table gives the comparison study of resource discovery algorithm described in Table I.

VI. CONCLUSIONS AND FUTURE WORK
Our resource discovery scheme in HPGRID system uses Parameterized HPGRID algorithm which reaches all the alive nodes with minimum number of hops.The proposed algorithm is scalable in terms of time, because it keeps the maximum number of time steps required to resolve a resource request, to a logarithmic scale with respect to the total number of nodes.Moreover, each node has knowledge of the overlay CGP using the parameters defined.Therefore, our approach is also scalable and reaches all the alive nodes even in the lesser dimension of its search.Furthermore, scalability is also maintained by querying each node only once at the most (if possible).This important property (scalability) also extends to the number of nodes in the CGP.By using the deep multidimensional interconnection of a hypercube with additional neighborhood links, we provide enough connectivity so that resource requests can always be propagated in spite of non alive nodes.This makes our proposed algorithm much more fault-tolerant when it is compared with other topologies such as centralized, hierarchical or trees.In the absence of non alive nodes, it is able to offer lookup guarantees.Using isomorphic partitioning scheme if the resource needed not in the start node zones, then the number of resources and the number of tasks under examination are reduced by a single hop, thereby reducing resource discovery time The future work could be integrating of other resource management issues in this topology which could be extended to generalized topology like k ary n-cube systems.It could be extended considering the scheduling, security, QoS issues and also design and maintenance of new protocols in HPGRID

Figure 3 .
Figure 3. CGP Header Format The CGP Header field description is as follows,  CGP ID : Cubic Grid Peer Identifier,  DF : Distance factor,  RD: Resource Density,  LF : Load Factor,  Flag = 0,CGP is non-alive, Flag = 1,CGP is alive,  Node ID i : Nodes Identifier where i = 1, 2 …n. Resrc_Avail: : The total number resources available at the node,  Ptr to RT :Pointer to ResourceTable,  No. of CPUs : Total number of processing element at the node.

Case 2 :
If the number of elements in v d is equal to 0 i.e., v d .size()==0,then the node calls the statusneighbors(v a ) function and reorders the list v a in such a way that the dimensions corresponding to the n non-alive neighbors are located at the last positions of the list.The status neighbors(v a ) also returns two integer values n non-alive and d alive .The integer value n non-alive represents the number of non-alive nodes in the reordered list v a .The integer value d alive represents the dimension of the last alive neighbor stored in v a .9. For each position k in the list v a that represents a live neighbor node, the node calls the createList(k, v a ) function which creates a new list composed of all the dimensions located after position k in the ordered list v a .Also, for each alive neighbor, the v d list is initialized as empty.The request, v d and v a are sent to the corresponding neighbor in the v d [k] dimension inside a new message by calling the sendToNeigbor(v a [k],message) function.For the remaining elements in the list v a represents non-alive neighbor node, the request travels inside the message together with v a and v d as empty lists.

Figure 7 .
Figure 7.A complete example using Parameterized HPGRID Algorithm.A request of resource started at node 000 in a three-dimensional hypercube

Figure 12 .
Figure 12.Performance Evaluation of 3D HPGRID 3. If there is no provider, then the request is redirected to other CGPs.Parameterized HPGRID Algorithm gives the pseudo-code in a node when a new resource request message arrives 1,0}.The statusneighbors also returns with two integer values n non-alive and d alive .The integer value n non-alive represents the number of non-alive nodes in the reordered list v d .

TABLE I .
COMPARISON OF RESOURCE DISCOVERY ALGORITHM