Polylogarithmic Gap between Meshes with Reconfigurable Row/Column Buses and Meshes with Statically Partitioned Buses

This paper studies the difference in computational power between mesh-connected parallel computers equipped with dynamically reconfigurable bus systems and those with static ones. The mesh with separable buses (MSB) is a mesh-connected parallel computer with dynamically reconfigurable row/column buses. The broadcast buses of the MSB can be dynamically sectioned into smaller bus segments by program control. We show that the MSB of size $n \times n$ can work with $O(\log^2 n)$ step slowdown even if its dynamic reconfigurable function is disabled. Here, we assume word-model broadcast buses, and use the relation between the word-model bus and the bit-model bus.


I. INTRODUCTION
The mesh-connected parallel computers equipped with dynamically reconfigurable bus systems have gained much attention due to their strong computational power [3,11,12,13,14]. The dynamic reconfigurable function enables the models to make efficient use of broadcast buses, and to solve many important, fundamental problems efficiently, mostly in constant or polylogarithmic time [13]. Such reconfigurability, however, makes the bus systems complex and has negative effects on the communication latency of global buses [2]. Hence, it is practically important to study this trade-off quantitatively.
In this paper, we investigate the impact of reconfigurable capability on the computational power of mesh-connected computers with global buses. Here, we deal with the meshes with separable buses (MSB) [3,12] and a variant of the meshes with partitioned buses called the meshes with multiple partitioned buses (MMPB) [4]. The MSB and the MMPB are the mesh-connected computers enhanced by the addition of broadcast buses along every row and column.
The broadcast buses of the MSB, called separable buses, can be dynamically sectioned into smaller bus segments by program control, while those of the MMPB, called partitioned buses, are statically partitioned in advance and cannot be dynamically reconfigured. In the MSB model, each row/column has only one separable bus, while in the MMPB model, each row/column has $L$ partitioned buses ( ). By comparing the relative power of these models, we clarify the difference in computational power between the parallel models equipped with reconfigurable bus systems and those with static ones. In this paper, we assume that the MSB and the MMPB are both of size $n \times n$. The case of different sizes was investigated in [8].
Here, we study how much slowdown is necessary when we deprive the MSB of its reconfigurable function. In [5,6], we have shown that the MSB of size $n \times n$ can be simulated time-optimally in ( ⁄ ) steps using the MMPB of size $n \times n$, where L is constant and the global buses are of word-model, i.e., the bus-width is the same as the number of bits in one word. From this result, it is natural to think that the slowdown may be at least polynomial. However, here we show that we can suppress the slowdown to polylogarithmic time, by making use of the relation between the word-model bus and the bit-model bus.
In this paper, we show that the MSB can work with $O(\log^2 n)$ step slowdown even if its reconfigurable function is disabled. Here, we assume that the broadcast buses are of word-model, and use the relation between the word-model bus and the bit-model bus. As a corollary, since we have shown that the MSB of size can simulate the reconfigurable mesh [1,11,14] (or PARBS, the processor array with reconfigurable bus systems) of size in steps [10], we can say that the reconfigurable mesh of size can also work with step slowdown even if its reconfigurable function is unused. In [7], we have proposed a more efficient algorithm, which exploits the pipeline technique heavily. Although the algorithm presented here is slower than the one in [7] by a factor of log n, the key ideas and explanations are much simpler than those in [7].
This paper is organized as follows: Section II describes the MSB and the MMPB models, and briefly explains how to solve the simulation problem of the MSB by using the MMPB. Section III shows that the MSB can work with $O(\log^2 n)$ step slowdown even if its reconfigurable function is disabled. Lastly, Section IV offers concluding remarks.

A. Models
An $n \times n$ mesh consists of $n^2$ identical processors or processing elements (PEs) arranged in a two-dimensional grid with n rows and n columns. We assume that all the meshes are synchronous. The PE located at the grid point (i, j), denoted as PE[i,j], is connected via bi-directional unit-time communication links to those PEs at $(i \pm 1, j)$ and $(i, j \pm 1)$, provided they exist ($0 \le i, j < n$). PE[0,0] is located in the top-left corner of the mesh. Each PE[i, j] is assumed to know its coordinates (i, j).

An $n \times n$ mesh with separable buses (MSB) and an $n \times n$ mesh with multiple partitioned buses (MMPB) are meshes enhanced with the addition of broadcast buses along every row and column. The broadcast buses of the MSB, called separable buses, can be dynamically sectioned through the PE-controlled switches during the execution of programs, while those of the MMPB are statically partitioned in advance by a fixed length. In the MSB model, each row/column has only one separable bus (Fig. 1), while in the MMPB model each row/column has L partitioned buses (Fig. 2). The MSB is essentially the same model as the horizontal-vertical reconfigurable mesh (HV-RM) described in [1,13]. The L partitioned buses of the MMPB are indexed as level-1, level-2, ..., level-L, respectively. We assume that the partitioned buses of the MMPB are equally partitioned by the same length if they belong to the same level. For each level-k, the value denotes the length of a bus segment of the partitioned bus in level-k. Without loss of generality, we assume .
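The static partitioning described above can be illustrated with a minimal sketch. It assumes that the bus of one level is cut into consecutive segments of a fixed length starting at column 0; that alignment, and the names `segment_index`, `shares_segment`, and `seg_len`, are hypothetical and do not come from the paper.

```python
def segment_index(col, seg_len):
    """Index of the bus segment that the PE in column `col` is attached to.

    Assumption: segments of length `seg_len` are aligned at column 0.
    """
    return col // seg_len


def shares_segment(col_a, col_b, seg_len):
    """True if two PEs of the same row can reach each other in one broadcast
    on a partitioned bus whose segments have length `seg_len`."""
    return segment_index(col_a, seg_len) == segment_index(col_b, seg_len)


if __name__ == "__main__":
    # With segments of length 4, columns 4..7 share a segment; 3 and 4 do not.
    print(shares_segment(4, 7, 4), shares_segment(3, 4, 4))
```

Under this assumption, two PEs that do not share a segment at one level must relay data through another level or through local links, which is what makes the choice of segment lengths matter for the simulation.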
We assume that the word size of a processor is $\lceil \log n \rceil$ for a mesh of size $n \times n$. As for the bus-width, we consider two types of bus-models: word-model and bit-model [13]. In the word-model, a broadcast bus consists of $\lceil \log n \rceil$ wires and conveys one word of data in one step; in the bit-model, a broadcast bus consists of a single wire and conveys only one bit of data in a step. We call the MSB (resp. MMPB) with word-model global buses the word-model MSB (resp. MMPB). The bit-model MSB and MMPB are termed similarly.
(Strictly speaking, the bit-model defined in [13] assumes that both the processor word-size and bus-width are constant. Here, we assume that only the bus-width is constant, and that the processor word-size is $\lceil \log n \rceil$ for the mesh of size $n \times n$.)
A single time step of the MSB and the MMPB is composed of the following three sub-steps:

1) Local communication sub-step:
Every PE communicates with its adjacent PEs via local links.

2) Broadcast sub-step:
Every PE changes its switch configurations by local decision (this operation is only for the MSB). Then, along each broadcast bus segment, several of the PEs connected to the bus send data to the bus, and several of the PEs on the bus receive the data transmitted on the bus.

3) Compute sub-step:
Every PE executes some local computation.

Here, we assume that a PE writes to only one bus at a time in the MMPB model. The bus accessing capability is similar to that of the Common-CRCW PRAM model. If there is a write-conflict on a bus, the PEs on the bus receive a special value (i.e., PEs can detect whether there is a write-conflict on a bus or not). If there is no data transmitted on a bus, the PEs on the bus receive a special value (i.e., PEs can know whether there is data transmitted on a bus or not).

The simulation of the broadcast sub-step can be achieved by connected-component labeling (CC-labeling) of a port-connectivity graph (pc-graph). See Fig. 3 (a) and (b) for an example. Vertices of the pc-graph correspond to read/write-ports of PEs, and edges stand for the port-to-port connections. Each vertex is initially labeled by the value that is sent through the corresponding port by the PE at the broadcast sub-step. If there is no data sent through the port, the vertex is labeled by a special no-data value. The CC-labeling is done in such a way that the vertices in each component C are labeled by the smallest initial label of all the vertices in C, with the no-data value regarded as the greatest value (Fig. 3 (c)). These labels are called component labels. Obviously, the broadcast sub-step of the MSB can be simulated in steps on the MMPB if the CC-labeling of the corresponding pc-graph can be solved in steps by the MMPB.

If a collision occurs on a bus segment, at least one of the senders can detect it by comparing the data it sent with the component label obtained by the CC-labeling algorithm (e.g., in Fig. 3 senses the collision). Then, by distributing such collision information using the CC-labeling algorithm, PEs can resolve the collision.
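The CC-labeling of a row's pc-graph can be sketched sequentially as follows. This is only an illustration under stated assumptions, not the parallel algorithm of the paper: each PE is assumed to have a left and a right port on the row bus, the two ports of a PE are joined when its sectioning switch is closed, and the right port of one PE is always wired to the left port of the next. The names `NO_DATA` and `cc_label_row` are hypothetical, and `float("inf")` stands in for the no-data label so that it compares as the greatest value.

```python
NO_DATA = float("inf")  # assumed stand-in for the "no data sent" label


def cc_label_row(switch_closed, sent):
    """Label every port of one row with the smallest value in its component.

    switch_closed[j] -- True if PE j joins its left and right ports
    sent[j]          -- (left_value, right_value) written by PE j, or NO_DATA
    Returns a list of (left_label, right_label) pairs, one per PE.
    """
    n = len(switch_closed)
    # Vertex 2j is the left port of PE j, vertex 2j+1 its right port.
    labels = [value for pair in sent for value in pair]
    out = list(labels)
    start = 0
    for v in range(2 * n):
        # A component ends at the last vertex, or at a left port whose
        # PE keeps its switch open (no edge to its own right port).
        ends_here = v == 2 * n - 1 or (v % 2 == 0 and not switch_closed[v // 2])
        if ends_here:
            smallest = min(labels[start:v + 1])
            out[start:v + 1] = [smallest] * (v + 1 - start)
            start = v + 1
    return [(out[2 * j], out[2 * j + 1]) for j in range(n)]


if __name__ == "__main__":
    closed = [True, False, True, True]
    sent = [(NO_DATA, 5), (NO_DATA, NO_DATA), (NO_DATA, 3), (7, NO_DATA)]
    print(cc_label_row(closed, sent))
```

In this example PE 3 writes 7 through its left port but obtains the component label 3, so it can tell that another PE wrote to the same bus segment, which mirrors the collision check described above.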

B. Port-Connectivity-Graph
In Section III, we solve the CC-labeling problem by the divide-and-conquer strategy composed of the following three phases:

Phase 1: {local labeling}
Divide the pc-graph into sub-graphs, and label vertices locally within each sub-graph. These labels are called local component labels. In each sub-graph, also check whether the two vertices located at the boundary of the sub-graph are connected to each other or not (this connectivity information is used in Phase 2).

Phase 2: {global labeling of boundary vertices}
Label the vertices located at the boundary of each sub-graph with component labels.

Phase 3: {local labeling for adjustment}
Update vertex labels with component labels within each sub-graph for consistency with Phase 2.
In the next section, we implement the above algorithm on the MMPB model.
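The three phases can be traced with a sequential sketch. It is a hedged illustration rather than the MMPB implementation: the pc-graph of a row is assumed to be a path in which vertex 2j and vertex 2j+1 are the left and right ports of PE j, Phase 2 is solved directly here instead of recursively on a smaller machine, and all names (`label_direct`, `label_three_phase`, `NO_DATA`, `w`) are hypothetical.

```python
NO_DATA = float("inf")  # assumed stand-in for the no-data label


def label_direct(closed, labels):
    """Reference CC-labeling of a path-shaped pc-graph (2 ports per PE)."""
    out, start, m = list(labels), 0, len(labels)
    for v in range(m):
        if v == m - 1 or (v % 2 == 0 and not closed[v // 2]):
            out[start:v + 1] = [min(labels[start:v + 1])] * (v + 1 - start)
            start = v + 1
    return out


def label_three_phase(closed, labels, w):
    """Same labeling, computed by the three-phase scheme with blocks of w PEs."""
    n = len(closed)
    assert n % w == 0, "sketch assumes w divides the row length"
    # Phase 1: label each sub-graph locally and record whether its two
    # boundary vertices are connected (every switch in the block is closed).
    local, joined = [], []
    for b in range(0, n, w):
        local.append(label_direct(closed[b:b + w], labels[2 * b:2 * (b + w)]))
        joined.append(all(closed[b:b + w]))
    # Phase 2: the boundary vertices form a pc-graph of width n / w.
    boundary = [x for loc in local for x in (loc[0], loc[-1])]
    glob = label_direct(joined, boundary)
    # Phase 3: push the boundary labels back into each sub-graph.
    out = []
    for i, loc in enumerate(local):
        block = closed[i * w:(i + 1) * w]
        open_pes = [k for k in range(w) if not block[k]]
        if not open_pes:
            loc = [glob[2 * i]] * (2 * w)
        else:
            left_end = 2 * open_pes[0] + 1      # left boundary component
            right_start = 2 * open_pes[-1] + 1  # right boundary component
            loc[:left_end] = [glob[2 * i]] * left_end
            loc[right_start:] = [glob[2 * i + 1]] * (2 * w - right_start)
        out.extend(loc)
    return out


if __name__ == "__main__":
    closed = [True, False, True, True]
    labels = [NO_DATA, 5, NO_DATA, NO_DATA, NO_DATA, 3, 7, NO_DATA]
    print(label_three_phase(closed, labels, 2))
```

Components that lie strictly inside a sub-graph never touch a boundary vertex, so their Phase 1 labels are already final; only the components that reach a boundary are rewritten in Phase 3.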

III. SIMULATION ALGORITHM
In this section, we show that the MSB can work with $O(\log^2 n)$ step slowdown even if its reconfigurable function is disabled. For clarity, let MMPB <L> denote the MMPB that has L partitioned buses per row/column. First, we prove that any step of the word-model MSB of size $n \times n$ can be simulated by the word-model MMPB <L> of size $n \times n$ in ( ⁄ ) steps even when L is non-constant. Next, we show that any step of the word-model MSB of size $n \times n$ can work with $O(\log^2 n)$ step slowdown even if we deprive the MSB of its reconfigurability, by considering the relation between the word-model bus and the bit-model one.

A. Simulation of the word-model MSB by the word-model MMPB
In [5,6], we have proved the following lemma, assuming that L is a fixed constant.

Lemma 1 [5, 6] Any step of the word-model MSB of size can be simulated by the word-model MMPB <L> of size in
In this section, we show that we obtain almost the same result as Lemma 1, even if we assume that L is non-constant.
In what follows, we mainly focus on how to simulate the broadcasts along a row of the simulated MSB by using the corresponding row of the simulating MMPB. The simulation for columns can be achieved similarly.
To begin with, we introduce two fundamental results. Then, we can prove the following lemma even if L is non-constant. From Lemma 2 and Corollary 1, there exist some constants and such that the following two inequalities hold:

Lemma 3 The broadcasts taken on the separable bus in
In what follows, we prove that the following equation holds for some constant c. The proof is done by mathematical induction on k ( ).
Here, without loss of generality, we assume , .
For the base case where = , from Eq. (1) and , we have and thus Eq. (3) holds.
For the inductive case where , we prove Eq. (3), assuming that the following inductive hypothesis holds.
Let and respectively denote PE[ ] of the MSB and PE[ ] of the MMPB <k> ( ). Now, we explain how to implement the algorithm defined in Section II-B. We divide the pc-graph corresponding to the broadcasts on the row separable bus into / disjoint sub-graphs , ,..., / of width . Here, we say that a sub-graph of the pc-graph is of width w if it contains 2w vertices corresponding to the read/write-ports of w consecutive PEs.
The CC-labeling of the pc-graph so defined is carried out on the MMPB <k> as follows. We divide the row of the simulating MMPB <k> into / disjoint blocks , ,..., / in a way that each consists of ( ). Note that each sub-graph is processed by block alone. Then, for each block , since the PEs in and a bus segment of the level-k partitioned bus can be seen as a linear processor array of PEs with a single broadcast bus of length , Phase 1 can be executed in V( ) steps.
As for Phase 2, the number of active PEs is n/ , and each of those PEs can communicate in a constant time with the next such PEs via either a local communication link or a bus segment of the level-k partitioned bus. Hence, by conveying the information of the boundary vertices of each to the leftmost PE in , and letting the information be processed by the leftmost PE in alone, Phase 2 is essentially the same problem as simulating the broadcast operation of the / MSB using the / MMPB <k-1> where each level-j partitioned bus is segmented by the length = ⁄ ( ). Here, it should be noted that holds for each l ( ). The operations required for such adjustment (data transmission to/from the leftmost PE of each , etc.) can be completed in a constant number of steps, and let be the time cost for them. Without violating the argument here, we assume that holds. (We can choose the constant c appropriately in advance so that c ≧ holds.) Then, from Eq. (4), Phase 2 can be completed in

( )
where level-j bus is segmented by steps. Phase 3 can be done in steps similarly to Phase 1. As a whole, the algorithm can be executed in

B. Simulation of the word-model MSB by the bit-model MMPB
In this section, we show that any step of the word-model MSB of size $n \times n$ can work with $O(\log^2 n)$ step slowdown even if we deprive the MSB of its reconfigurable function, by considering the relation between the word-model bus and the bit-model one.
First, we prove the following lemma.

Lemma 5 Any step of the word-model MMPB <L> of size $n \times n$ can be simulated by the bit-model MMPB <L> of size $n \times n$ in $O(\log n)$ steps.
Proof: The $\lceil \log n \rceil$ bits of one word of data can be conveyed sequentially in $\lceil \log n \rceil$ steps, one bit per step, in the bit-model MMPB <L> .
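The bit-serial transfer used in this proof can be sketched as follows. The sketch assumes a base-2 logarithm for the word size and a least-significant-bit-first order on the wire; neither detail is fixed by the paper, and the names `word_bits`, `send_bit_serial`, and `receive_bit_serial` are hypothetical.

```python
def word_bits(n):
    """Word size ceil(log2 n) for an n x n mesh (n >= 2), without floats."""
    return (n - 1).bit_length()


def send_bit_serial(word, width):
    """Yield the bits of `word` one per step, least significant bit first."""
    for step in range(width):
        yield (word >> step) & 1


def receive_bit_serial(bits):
    """Reassemble a word from the bits observed on a single-wire bus."""
    word = 0
    for step, bit in enumerate(bits):
        word |= bit << step
    return word


if __name__ == "__main__":
    n = 1024
    width = word_bits(n)                # number of bus steps per word
    wire = list(send_bit_serial(777, width))
    print(width, receive_bit_serial(wire))
```

One word-model broadcast thus costs `word_bits(n)` bit-model broadcasts, which is the logarithmic factor that Lemma 5 accounts for.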
We illustrate the results of Lemmas 4 and 5 in Fig. 4. Obviously, Fig. 4 implies the following lemma: By letting L = log n, we obtain the following corollary.
Thus, the conclusion follows.
Since the word-model MSB of size $n \times n$ has $\lceil \log n \rceil$ wires for each row/column, we can view a word-model bus as log n bit-model buses (Fig. 5). Hence, without increasing circuit complexity, we obtain the bit-model MMPB <log n> of size $n \times n$ from the word-model MSB of size $n \times n$. With this observation and Corollary 2, we can state the main theorem of this paper as follows:
Theorem 1 Any step of the word-model MSB of size $n \times n$ can work with $O(\log^2 n)$ step slowdown even if its reconfigurable capability is unused.
IV. CONCLUDING REMARKS
In this paper, we showed that the word-model MSB of size $n \times n$ can work with $O(\log^2 n)$ step slowdown even if its reconfigurable capability is unused. We obtain the result from these two facts: 1) every global bus of the word-model MSB of size $n \times n$ consists of $\lceil \log n \rceil$ wires, and 2) we can obtain the bit-model MMPB of size $n \times n$ with $L = \lceil \log n \rceil$ from the word-model MSB of size $n \times n$ without increasing circuit complexity. In [7], we have proposed a more efficient algorithm that exploits the pipeline technique heavily. Although the simulation algorithm presented here is slower than the one in [7] by a factor of $\log n$, the key ideas and explanations are much simpler than those in [7].
From a practical viewpoint, we expect that the communication latency of the broadcast buses of the MMPB is much smaller than that of the MSB. Each broadcast bus of the MSB of size $n \times n$ can form a broadcast bus whose length is n, and such a bus contains sectioning switch elements in it. As for the MMPB of size $n \times n$, the bus length is also at most n, but no switch element is inserted into the bus because it has no sectioning switch. Hence, compared to the MSB, the MMPB model has the advantage that its broadcast buses avoid the propagation delay introduced by switch elements inserted into the bus (i.e., device propagation delay), and thus our simulation algorithm is practically useful when the mesh size becomes so large that we cannot neglect the delay. In future work, we will study the effectiveness of our simulation algorithm by taking the propagation delay into account.