CNRS, France

This paper presents a novel Optical Network on Chip (ONoC), called “OMNoC”, that relies on the multi-level optical layer design paradigm. The proposed ONoC relies on multi-level microring resonators that allow efficient light coupling between superposed waveguides. Such microring resonators avoid the need for waveguide crossings, which helps reduce propagation losses. Preliminary experimental results demonstrate the potential of the multi-level optical layer for reducing power consumption and increasing scalability in the proposed ONoC.


I. INTRODUCTION
The constant progress in computer and smartphone architectures and the continuous growth in the number of transistors on a chip create an increasing need for high-performance, low-cost and low-power networks on chip. In the near future, electronic interconnects will no longer be able to fulfill these requirements.
Numerous advances in silicon photonic integration and the successful design and fabrication of micro-optical devices such as photodetectors [1,2,4], modulators [5,6,12], buffers [21] and optical switches [8,9,13,16,22] are driving studies of future generations of multiprocessor systems-on-chip (MPSoC) using optical interconnects. Increasing importance is given to the architectures and topologies of optical networks on chip. Some of them, such as those of Briere et al. [7] and Vantrease et al. [26], use passive optical networks. This choice allows a fully optical network built from passive (preconfigured) switches, but the challenges are to manage the complexity of static routing tables and to provide a large range of resonant wavelengths as the number of cores increases. Others, such as Shacham et al. [10] and Xu et al. [3], proposed integrating active optical components, which minimizes the number of optical resonators and waveguides but requires electrical control of the optical switches. Along the same lines, and to ensure reliable and efficient transmission of optical signals, many studies have developed packet-routing control protocols that provide QoS [19], as well as synchronization and organization protocols for the optical traffic inside the network [18,19,20].
The ONoC proposed in this paper relies on the multi-level optical layer design paradigm. As a main contribution, three waveguide levels are used to implement two subnetworks without any waveguide crossing: one is dedicated to payload data and the other to control flow. The data network is realized through two levels of superposed waveguides and combines two modified Fat-H-Trees [11,14].
Interactions between the waveguide levels are realized by a novel 3D microring resonator. The control network is Mesh-based and is located under the data network. Section II presents the overall architecture, including the ONoC and the control network. Results on optical power loss, design complexity, setup latency and power consumption are presented in Section III. Section IV concludes the paper and gives perspectives for this work.

II. NETWORK ARCHITECTURE
In this section, we present how multiple layers can be used to design an efficient ONoC. For this purpose, we exploit all the possibilities offered by optical structures such as superposed, crossing and curved waveguides.

A. Architecture Overview
The proposed 3D architecture is composed of a processing (electrical) layer and a multi-level optical layer dedicated to data and control flows, as illustrated in Fig. 1. In this example, 8x8 processors are considered, resulting in 4x4 circuit interfaces (CI) and 3x3 control units (CU). The circuit-switched data network is implemented by two adjacent layers of superposed waveguides, as illustrated in the side view in Fig. 1. The control network is composed of waveguides (implemented on a third, dedicated layer) interconnecting control units located on the electrical layer. Thanks to multi-layer silicon deposition technology, this optical network is implemented without any waveguide crossing and with the shortest waveguide length between CUs.
The electrical (i.e., processing) layer and both the payload data and control networks interact with each other through the Reduced Optical TurnAround Router (ROTAR).
www.ijacsa.thesai.org

B. ROTAR (Reduced Optical TurnAround Router)
Each ROTAR is composed of 4 waveguides superposed on two layers, 4 two-level switches (microring resonators), a Control Unit (CU) and 4 Circuit Interfaces (CI), as illustrated in Fig. 2. Since each circuit interface is shared by 4 processors, a ROTAR manages communications for 16 cores. The CU manages packets in the control network. If a source processor has to send data to a destination processor that does not share the same interface, the CU requests access to the ONoC. To do so, a setup packet P_setup is injected into the control network (i.e., into the waveguide located on the first level) and follows a static XY routing policy. Once the optical signal reaches the adjacent CU, it is converted into the electrical domain. If the destination has not been reached, the control packet is converted back into the optical domain and transmitted to the next CU, depending notably on the availability of optical resources. Similarly to the networks proposed in [3,10,15], optical resources in the data network are reserved all along the path of the setup packet. Once the destination CU is reached, an optical acknowledgment packet P_ack is sent back to the source processor (it is transmitted in the same way as P_setup). To avoid deadlock and to release optical resources as soon as possible, acknowledgment packets are given priority. Once P_ack is received by the source processor, payload data are injected into the data network; the optical signals then propagate along the waveguides located on the first and second levels, depending on the configuration of the circuit interface.
Fig. 2. Reduced Optical TurnAround Router (ROTAR)
ROTAR is a reduced version of OTAR, the optical router proposed by Huaxi et al. [3]. To simplify the topology, ROTAR considers only X- and Y-hops (diagonal communications are not allowed), which reduces the number of microrings by 30% compared to OTAR. This leads to a significant reduction in power consumption, as detailed in the results section of the paper. The CI is formed by 6 superposed waveguides on 2 layers and 8 microring resonators that interconnect 4 cores (Fig. 2). Only one of the 8 microring resonators must be turned on to eject the optical signal to its output port. To inject payload data, a processor sends the signal through one of the 4 ports located in the CI (see Fig. 2). The configuration of each microring resonator is driven through TSVs connected to the CU.
Fig. 3 illustrates a 2D view of the data network layout for a 13 mm × 13 mm chip; optical links between adjacent processor cores, which are assumed to be 1 mm × 1 mm, have an approximate length of 5 mm. This layout allows 64 processors to communicate, requiring 64 lasers and 64 photodetectors to be fully interconnected. To achieve the high modulation speeds that would make on-stack interconnects practical (typically 10 Gb/s data rates), one would need vertical-cavity surface-emitting lasers (VCSELs) for direct modulation. Since VCSELs are built using III-V compound semiconductors, they cannot be easily integrated in a CMOS-compatible process. On-stack mode-locked lasers combined with separate modulation are an interesting alternative, and on-stack silicon mode-locked lasers have been demonstrated [27].
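The setup/acknowledgment protocol described above reserves data-network resources hop by hop along a static XY route before any payload is sent. The sketch below illustrates that idea under simplifying assumptions of our own: routers are addressed by hypothetical (x, y) coordinates, a link is a directed pair of adjacent nodes, and blocking is modelled as an all-or-nothing check, whereas in the paper a blocked P_setup waits in a CU queue. The function names are illustrative, not the authors' implementation.

```python
from typing import Dict, List, Optional, Tuple

Node = Tuple[int, int]    # (x, y) coordinates of a router / control unit
Link = Tuple[Node, Node]  # directed link between two adjacent nodes


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Static XY routing: move along X first, then along Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def setup_circuit(src: Node, dst: Node, busy: Dict[Link, bool]) -> Optional[List[Link]]:
    """Reserve every link on the XY path (the point where P_ack would be returned).

    Returns the reserved links, or None if any link is already reserved.
    """
    path = xy_route(src, dst)
    links = list(zip(path, path[1:]))
    if any(busy.get(link, False) for link in links):
        return None  # blocked: the caller retries later
    for link in links:
        busy[link] = True  # reservation made as P_setup advances
    return links


def release_circuit(links: List[Link], busy: Dict[Link, bool]) -> None:
    """Free the optical resources once the payload has been transmitted."""
    for link in links:
        busy[link] = False
```

A second request that needs an already reserved link is refused until the first circuit is released, which is why acknowledgments are prioritized: they let the source transmit and free the path sooner.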

C. 3D switch design (microring resonator)
To validate the benefit of a multi-level optical layer for ONoC efficiency, a novel 3D implementation of the switching element (an active microring resonator) was designed and simulated using Comsol software. This switching element (Fig. 4) is composed of two crossing, superposed waveguides (Si in SiO2) with a width of 440 nm and a height of 300 nm, and a resonant ring with a diameter of 3.4 µm. The ring is located under the upper waveguide and adjacent to the lower one, with a gap of 100 nm between the ring and the guides. The microring resonator switches are configured to steer the signal toward its intended direction, as follows:
- ON state for a horizontal hop: the optical signal is coupled from one waveguide to the other;
- OFF state otherwise: the optical signal propagates along the same waveguide.
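The ring dimensions above set the spectral spacing between the switch's resonances. As an illustration only, the sketch below applies the standard approximation FSR ≈ λ²/(n_g·L), with circumference L = πD, to the 3.4 µm ring. The group index n_g = 4.2 is an assumed, typical-looking value for a 440 nm × 300 nm silicon wire and is not taken from the paper.

```python
import math


def ring_circumference_um(diameter_um: float) -> float:
    """Ring circumference L = pi * D, in micrometres."""
    return math.pi * diameter_um


def fsr_nm(diameter_um: float, wavelength_nm: float, n_group: float) -> float:
    """Free spectral range FSR ~= lambda^2 / (n_g * L), with L converted to nm."""
    length_nm = ring_circumference_um(diameter_um) * 1000.0
    return wavelength_nm ** 2 / (n_group * length_nm)


if __name__ == "__main__":
    # 3.4 um diameter from the text; n_g = 4.2 is an ASSUMED group index.
    print(round(ring_circumference_um(3.4), 2), "um circumference")
    print(round(fsr_nm(3.4, 1550.0, 4.2), 1), "nm FSR near 1550 nm")
```

Under these assumptions the ring has a circumference of about 10.7 µm and an FSR of roughly 54 nm near 1550 nm, so a single resonance can be selected within a broad spectral window.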

D. Multi-path communications
Depending on the location of the source and target processors in the network, communications occur through interlaced and external straight waveguide sections (Fig. 5). To reduce communication contention, each CI provides dedicated ONoC access to its four attached processors P1, P2, P3 and P4. This is achieved by providing access to P1 and P2 (resp. P3 and P4) through the top (resp. bottom) optical layer and by allocating a given waveguide to each processor. Similarly, each processor can receive data independently of the others.
The waveguides located on the top (resp. bottom) layer are dedicated to propagating data from north to south (resp. south to north). Hence, data propagate along the Y axis on the same waveguide, while communications along the X axis require routing the optical signal from one waveguide to another.
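The direction rules above can be captured in a few lines. This is a hedged sketch under our own assumptions: CIs are addressed by hypothetical (x, y) grid coordinates with y growing southward, purely horizontal paths are arbitrarily placed on the top layer, and the number of ON microrings adds one ejection ring in the destination CI to the X-hop switches, in line with the (X+1) resonant microrings per link quoted in the design complexity analysis. It does not model which of P1 to P4 injects the signal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathPlan:
    layer: str     # "top" carries north->south traffic, "bottom" south->north
    x_hops: int    # each X hop needs one microring switched ON
    y_hops: int    # Y hops stay on the same waveguide (no switching)
    rings_on: int  # X-hop switches plus one ejection microring in the destination CI


def plan_path(src: tuple, dst: tuple) -> PathPlan:
    """Plan a payload path between two CIs given as (x, y), y growing southward."""
    (sx, sy), (dx, dy) = src, dst
    # Southward (or purely horizontal) traffic uses the top layer in this sketch.
    layer = "top" if dy >= sy else "bottom"
    x_hops = abs(dx - sx)
    return PathPlan(layer, x_hops, abs(dy - sy), x_hops + 1)
```

For example, `plan_path((0, 0), (2, 3))` yields a top-layer path with 2 X hops, 3 Y hops and 3 microrings switched on, whereas a purely vertical northward path needs only the ejection ring.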

E. Control unit
The control unit is one of the key elements of a successful transmission. It is built from conventional CMOS transistors, uses electrical signals to drive the microring resonators, and handles only control packets. It is formed by 4 optical/electrical converters, 4 multiplexers and 4 schedulers. The first scheduler, for λ_setup, uses a FIFO policy, while the second always gives priority to λ_ack. Using optical waveguides in the control network directly affects the end-to-end (ETE) delay, which will be much smaller than with electrical buses. The control unit is divided into two computing units, one for P_setup and the other for P_ack. This strategy was adopted to reduce delay and packet queuing in the CU: since each kind of control packet uses a different wavelength, both can follow the same path simultaneously without any deadlock.
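The scheduling policy above (FIFO for setup packets, priority for acknowledgments) can be sketched as a single arbiter. The paper uses separate computing units and wavelengths for P_setup and P_ack; this sketch assumes, for illustration only, a point where both kinds of packet still compete for one shared resource. The class name and string packet labels are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class ControlScheduler:
    """Arbitration between acknowledgment and setup packets at one CU port.

    P_ack packets always win (strict priority); P_setup packets are served FIFO.
    """

    def __init__(self) -> None:
        self.ack: Deque[str] = deque()
        self.setup: Deque[str] = deque()

    def push(self, packet: str, is_ack: bool) -> None:
        (self.ack if is_ack else self.setup).append(packet)

    def pop(self) -> Optional[str]:
        if self.ack:
            return self.ack.popleft()    # acknowledgments release resources sooner
        if self.setup:
            return self.setup.popleft()  # first-come, first-served
        return None
```

Pushing two setup packets and then one acknowledgment returns the acknowledgment first, then the setup packets in arrival order.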
Each router is controlled by a control unit (CU) that processes electrical control packets. All control units are interconnected by an optical control network, and the CUs command the microring resonators through TSVs (through-silicon vias). The architecture of the control network for 256 cores is shown in Fig. 6.
The operation mode of the CU can be summarized as follows:
- Browse the "adjacent queue.Status" table, synchronize the control packets according to its content, and update the table by decrementing the current queue size and incrementing the queue size of the next control unit.
- Browse the "path.Status" table, update it according to the routing decision, and decrement the field of P_setup that holds the number of displacements (hops).
- Command the microrings at time slot "T_n" if it is a horizontal hop, and increment the "time delay" field of the control packet if the path is busy.
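The three CU steps above can be sketched as one processing function. This is a minimal illustration, not the authors' NS-2 module: the table layouts (dicts keyed by neighbour coordinates), the field names `hops_left` and `time_delay`, and the choice to reserve the outgoing link by marking the next CU busy are all our own assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Node = Tuple[int, int]


@dataclass
class ControlPacket:
    dst: Node
    hops_left: int       # "number of displacements" field of P_setup
    time_delay: int = 0  # incremented each slot the requested path is busy


@dataclass
class ControlUnit:
    pos: Node
    queue_status: Dict[Node, int] = field(default_factory=dict)  # "adjacent queue.Status"
    path_status: Dict[Node, bool] = field(default_factory=dict)  # "path.Status": next CU -> busy
    ring_commands: List[Tuple[int, Node]] = field(default_factory=list)  # (slot T_n, next CU)

    def next_hop(self, pkt: ControlPacket) -> Tuple[Node, bool]:
        """XY routing decision; the bool is True for a horizontal (X) hop."""
        (x, y), (dx, dy) = self.pos, pkt.dst
        if x != dx:
            return (x + (1 if dx > x else -1), y), True
        return (x, y + (1 if dy > y else -1)), False

    def process(self, pkt: ControlPacket, slot: int) -> Optional[Node]:
        """Handle one P_setup at time slot `slot`; return the next CU, or None if it must wait."""
        assert pkt.dst != self.pos, "packet already at its destination CU"
        nxt, horizontal = self.next_hop(pkt)
        if self.path_status.get(nxt, False):
            pkt.time_delay += 1  # path busy: wait and record the delay
            return None
        # 1) queue bookkeeping: the packet leaves this CU and joins the next one
        self.queue_status[self.pos] = max(0, self.queue_status.get(self.pos, 0) - 1)
        self.queue_status[nxt] = self.queue_status.get(nxt, 0) + 1
        # 2) path bookkeeping and hop counter
        self.path_status[nxt] = True
        pkt.hops_left -= 1
        # 3) only X hops need a microring switched ON, at slot T_n
        if horizontal:
            self.ring_commands.append((slot, nxt))
        return nxt
```

A vertical hop updates the tables but issues no microring command, which mirrors the rule that only horizontal hops require switching.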
A study of setup latency and maximum queue size was carried out with the NS-2 network simulator under the following rules:
- A routing protocol module, written in C++ for compatibility with NS-2, is implemented in each processor.
- The processor generates a control packet according to this algorithm.
- The control packet is updated each time it traverses a CU and is forwarded to the next CU until it reaches its destination.
- For an X hop, the CU switches the microring in the appropriate time slot.
The results of this study are given in Section III.

F. Circuit interface
The circuit interface (CI) connects the optical payload data network to the CMOS electrical layer through E/O and O/E interfaces. The CI is formed by 6 superposed waveguides and 8 microrings that interconnect 4 cores; only one microring must be turned on to direct the optical signal to its output port. A top view of the CI is shown in Fig. 7, while the complete 3D implementation is shown in Fig. 2.
The CI processes only payload data: once the path has been reserved by control packets, the processor sends the payload data through the CI. The microrings in the CI are driven by the control unit and are switched on only if the current CI is attached to the receiver of the packet's final destination processor. Communication between adjacent processors sharing a common CI is done electrically, owing to the short distance between them. Each processor has two ports, one for injection and one for ejection. Injection is done directly through the Y-junction waveguide, while ejection requires switching on one of the microring resonators (Fig. 8).
Fig. 8. Wave injection and ejection in CI
Fig. 8 shows the four possible paths that can be established between two different CIs. The injection port (processor 1, 2, 3 or 4) determines the path (path 1, 2, 3 or 4) toward the destination CI. To deliver the signal to the intended processor, the microring resonator corresponding to that processor's ejection port is switched on. For example, in scenario 1 of Fig. 8, path 1 is injected by processor 1 of the emitting CI and ejected to processor 1 or 2 of the receiving CI.

G. Discussions
Unlike the networks proposed in [3,10,15], the control flow here is carried optically. In addition to reducing communication latency, WDM is used to concurrently propagate setup and acknowledgment signals on the λ_setup and λ_ack wavelengths, respectively, which helps reduce contention in the network. With a single layer, waveguide crossings would increase the propagation losses across the ONoC. With a two-level waveguide network, waveguide crossings are avoided and coupling losses as well as crosstalk can be minimized, leading to a low-latency, power-efficient ONoC.
We believe that the multi-level optical layer design paradigm opens new research directions in ONoC topologies. Related works aim to decrease propagation loss by reducing the number of waveguide crossings, which has led to ring topologies and their associated serpentine-like layouts. However, optical signals then propagate along a single dimension, which increases the distance between the source laser and the destination photodetector and thus the power consumption. By revisiting n-dimensional routing networks such as Mesh or Torus, shorter distances can be achieved, leading to lower power consumption in the ONoC.

III. EXPERIMENTAL RESULTS
Maximum optical power loss and crosstalk are two critical figures of merit for an optical network on chip: they determine the feasibility and scalability of the ONoC, as well as the power consumed by the E/O and O/E interfaces to generate and detect the optical signals.

A. Optical Power Loss
The total optical loss in the network is the sum (in dB) of all waveguide losses and of the coupling losses with the transmitters and receivers:
$$L_{TOTAL} = L_{EW} + L_{LW} + L_{CW} + L_{BW} + L_{MR} + L_{WR}$$
where:
- L_EW is the coupling loss from the transmitter to the waveguide through the Y-junction in the CI (see Figs. 2 and 7);
- L_LW is the propagation loss of a straight waveguide, L_LW = α × (waveguide length), where α is the propagation loss per unit length;
- L_CW is the loss due to waveguide crossings, proportional to the number of crossings along the signal path;
- L_BW is the bending loss, which depends on the radius and angle of curvature of the bent waveguide;
- L_MR is the loss due to evanescent-field coupling between guides through the microresonators, proportional to the number of switches along the signal path;
- L_WR is the coupling loss from the waveguide to the receiver.
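The loss sum above can be sketched as a small function. This is a minimal illustration, not the authors' tool: the per-component values are parameters to be taken from Table 1, the bend term is simplified to a fixed loss per bend (the text notes it really depends on radius and angle), and the example numbers below are illustrative only.

```python
def total_loss_db(length_cm: float, n_crossings: int, n_bends: int, n_rings_on: int,
                  alpha_db_per_cm: float, crossing_db: float, bend_db: float,
                  ring_db: float, ew_db: float, wr_db: float) -> float:
    """Worst-case insertion loss L_TOTAL (dB) as the sum of the six loss terms."""
    l_lw = alpha_db_per_cm * length_cm  # straight-waveguide propagation
    l_cw = crossing_db * n_crossings    # proportional to the number of crossings
    l_bw = bend_db * n_bends            # fixed loss per bend (simplification)
    l_mr = ring_db * n_rings_on         # proportional to the number of ON switches
    return ew_db + l_lw + l_cw + l_bw + l_mr + wr_db
```

For instance, a 1 cm path with two bends, two ON switches and no crossing, using illustrative values of 6.5 dB/cm, 0.05 dB per bend, 4.5 dB per ON switch and 3 dB / 0.6 dB input/output coupling, gives 19.2 dB; each crossing avoided by the multi-level layout removes `crossing_db` from this total.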
The main loss in a straight waveguide is due to sidewall roughness; it depends on parameters such as the roughness correlation length and increases with the index contrast. The optical losses of both single-level and multi-level implementations of the elementary optical components, estimated using Comsol software, are summarized in Table 1.
We begin the performance evaluation by demonstrating the potential of multi-layer design in optimized standard network floorplans such as Mesh, Torus and passive lambda networks. Analytical expressions for the number of waveguide crossings on the longest path of M×N Mesh (CR_mesh) and optimized Torus (CR_optimized_Torus) networks are given by K. Feng et al. [23], while Le-Beux et al. [24] give the number of waveguide crossings in a multi-stage lambda network (CR_λ).
TABLE I. Optical Power Loss (optical device loss parameter: insertion loss)
- Waveguide propagation (L_LW): 6.5 dB/cm
- Waveguide crossing (L_CW): 0.5 dB
- Waveguide bend (L_BW): 0.05 dB
- Passing by a ring (OFF): 0.5 dB
- Coupling in a one-level switch (ON) (L_MR): 3.5 dB
- Coupling in a multi-level switch (ON) (L_MR): 4.5 dB
- Loss via Y-junction (L_YW): 0.5 dB
- Crosstalk in a one-level crossing: -13 dB
- Crosstalk in a two-level crossing: -23 dB
- Input coupling loss (L_EW): 3 dB
- Output coupling loss (L_WR): 0.6 dB
Results shown in Fig. 9 demonstrate that, by using multi-layer design and thus avoiding waveguide crossings, about 4 dB of optical power is saved in the lambda network and 6 to 8 dB in Mesh and Torus networks. For given photodetector and laser source devices, the maximum OMNoC size is limited by the worst-case propagation loss L_TOTAL.
Assuming a typical injected power of +5 dBm for on-chip integrable laser diodes and a receiver sensitivity of -20 dBm for 10 Gbit/s operation, the maximum insertion loss for any network size should be less than 25 dB (the available power budget), while the crosstalk should be much higher than this level in absolute value to obtain good reception quality.
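The 25 dB figure follows directly from the two assumed device values: budget = +5 dBm - (-20 dBm). The sketch below (function names are ours) turns a worst-case L_TOTAL into a margin against that budget; a negative margin means the path is too lossy for the assumed laser and photodetector.

```python
def power_budget_db(laser_dbm: float, sensitivity_dbm: float) -> float:
    """Available budget: injected power minus receiver sensitivity (dBm - dBm = dB)."""
    return laser_dbm - sensitivity_dbm


def link_margin_db(laser_dbm: float, sensitivity_dbm: float, loss_db: float) -> float:
    """Positive margin means the worst-case path still meets the receiver sensitivity."""
    return power_budget_db(laser_dbm, sensitivity_dbm) - loss_db
```

With +5 dBm and -20 dBm, a worst-case loss of 19.2 dB leaves about 5.8 dB of margin, while 26 dB would exceed the budget.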
Maximum optical power loss and crosstalk are estimated for different chip sizes in single-layer and multi-layer implementations (Fig. 10).
In a single-level optical layer implementation, network scalability is limited to 64 cores, whereas using superposed waveguides, and thus avoiding waveguide crossing losses, allows the number of cores on one chip to scale to 256.
Fig. 10. Worst case optical communications losses and crosstalk in our proposed multi-layer ONoC
The crosstalk of a communication is defined as the attenuation between the source processor and a processor that is not the addressed one. A higher crosstalk attenuation (in absolute value) means less signal interference at the receivers. As shown in Fig. 10, the multi-layer design paradigm satisfies this requirement, with crosstalk 25 dB lower than with a single-layer design.

B. Design complexity
Because all the constructed optical components are centrally symmetric, every waveguide in the ONoC satisfies the rules stated above. For X horizontal and Y vertical hops, there are 4X + 7Y - 4 crossings in total; the network uses N_r routers, 4(N_r + N/2) microrings and 2N/4 waveguides for N lasers and N photodiodes. On each optical link, at most (X+1) microrings are resonant, which can reduce the power consumption of the multi-level ONoC; details are given in Table 2. According to the parameter values given in Table 3, a Mesh-based network needs 14×8 microrings to interconnect 8 processors, and a Torus-based network needs 8×8 microrings for the same 8 processors. In contrast, the proposed multi-level ONoC requires only 36 microrings for 16 processors (18 for 8 processors), the lowest microring count of the compared architectures. These results indicate that OMNoC requires less chip area than Mesh- and Torus-based networks.
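The closed-form counts above can be evaluated directly. The functions below simply transcribe the stated expressions (names are ours); for integer results N is assumed to be even, and the 256-core example further assumes N_r = N/16, i.e. one ROTAR per 16 cores as described in Section II, which is our reading rather than a value stated in this section.

```python
def crossings(x_hops: int, y_hops: int) -> int:
    """Total crossings for X horizontal and Y vertical hops: 4X + 7Y - 4."""
    return 4 * x_hops + 7 * y_hops - 4


def microrings(n_routers: int, n_lasers: int) -> int:
    """Microring count 4 * (N_r + N/2); N is assumed even."""
    return 4 * (n_routers + n_lasers // 2)


def waveguides(n_lasers: int) -> int:
    """Waveguide count 2N/4."""
    return 2 * n_lasers // 4


def max_resonant_rings(x_hops: int) -> int:
    """At most X + 1 microrings are resonant on one optical link."""
    return x_hops + 1
```

With one router and 16 lasers, `microrings(1, 16)` gives the 36 microrings quoted for 16 processors; under the one-ROTAR-per-16-cores assumption, a 256-core chip would give `microrings(16, 256)` = 576.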

C. Maximum Queue size and packet set up delay
A study of packet setup latency and maximum queue size was carried out with the NS-2 network simulator. Contention may block packets, leading to path-setup latencies on the order of tens of nanoseconds; studying the maximum queue size is therefore key to avoiding contention in the network. Once a path is acquired, the transmission latency of the optical data is very short and depends only on the group velocity of light in a silicon waveguide: approximately 6.6 × 10^7 m/s, i.e., about 300 ps for a 2 cm path across a chip.
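The 300 ps figure is simply path length divided by group velocity. The sketch below checks it and, for comparison, adds the serialization time of a payload packet using the 1024-bit size and 12.5 Gb/s rate given in the energy analysis; the function names are ours.

```python
def time_of_flight_ps(path_m: float, group_velocity_m_s: float = 6.6e7) -> float:
    """Propagation delay of an already-established optical circuit, in picoseconds."""
    return path_m / group_velocity_m_s * 1e12


def serialization_ns(bits: int, rate_gbps: float) -> float:
    """Time to clock `bits` onto the link at `rate_gbps`, in nanoseconds."""
    return bits / rate_gbps
```

A 2 cm path gives about 303 ps, whereas serializing a 1024-bit payload at 12.5 Gb/s takes about 81.9 ns, so once the circuit is set up, the payload duration rather than time of flight dominates data transfer.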
We consider a scenario with 12 control units, where each link between two CUs supports a rate of 12.5 Gb/s. Six traffic generators each transmit 30 bits every 0.05 ms for 1.5 ms; since this traffic shares a common link, we focus on the maximum queue size in the common node that avoids packet drops.
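The scenario parameters above fix the offered traffic. The sketch below (function name ours) computes the total number of packets, the aggregate bit rate and the average utilization of the shared 12.5 Gb/s link; times are given in integer microseconds to keep the packet count exact.

```python
def offered_traffic(n_generators: int, bits_per_packet: int,
                    period_us: int, duration_us: int, link_gbps: float):
    """Return (total packets, aggregate bit rate in b/s, average link utilization)."""
    packets_per_gen = duration_us // period_us
    aggregate_bps = n_generators * bits_per_packet / (period_us * 1e-6)
    utilization = aggregate_bps / (link_gbps * 1e9)
    return n_generators * packets_per_gen, aggregate_bps, utilization
```

For 6 generators sending 30 bits every 50 µs over 1500 µs, this gives 180 packets, 3.6 Mb/s aggregate and an average utilization of about 2.9 × 10^-4. On average the shared link is therefore far from saturated, which suggests that queuing in this scenario comes from packets arriving close together at the common CU rather than from the average rate.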
The maximum queue size in the shared CU is found by comparing the maximum queue size with the number of lost packets and repeatedly decreasing the generated data rate until the loss curve reaches zero (no packet drop). Under these conditions, the CU can handle 19 packets without any network contention (Fig. 11).
The latency components are based on predicted latencies of individual electronic and silicon-photonic components in a future 45 nm process. Results show a minimum setup latency below 50 ns for offered loads below 0.2. Under heavier CU load, with an offered load of up to 0.5, the setup latency exceeds 350 ns per packet (Fig. 12).
In the energy model, n is the number of driven microring resonators in the network, including those in the circuit interfaces. P_mr is the average power consumed by one microring resonator; in the ON state it consumes 20 µW [17]. L_payload is the payload packet size, taken to be 1024 bits. R is the data rate of the interfaces, taken to be 12.5 Gb/s. D_l is the distance traveled by the optical packet in the ON state; it depends on the processor core size (here 1 mm × 1 mm) and on the number of processors connected to each circuit interface. v is the velocity of light. E_oeeo is the energy consumed for one bit of O/E-E/O conversion, assumed to be 1 pJ/bit, and J_p is the number of hops made in the control network. L_control is the combined size of the control and acknowledgment packets, 28 bits. E_CU is the energy consumed by the control unit, simulated with Cadence; it is the sum of the energy consumed by the buffers (0.003 pJ/bit), by the computation unit (1.5 pJ) and by the FIFO schedulers (0.12 pJ/bit). For example, for a 256-core MPSoC and 128-byte packets, OMNoC consumes 1.032 nJ/packet in the payload data network and 0.187 nJ/packet in the control network, using a 28-bit control packet.
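The energy expression that the definitions above refer to is not reproduced here, so the sketch below uses one plausible form of our own: microrings draw P_mr while held on for serialization plus flight time, every payload bit costs E_oeeo, and each control hop costs conversion, buffering and scheduling per bit plus a fixed computation energy. With n = 5 driven microrings and D_l = 5 mm (both our assumptions), the payload term gives about 1.032 nJ, consistent with the reported value. The hop count behind the reported 0.187 nJ control figure is not given, so the sketch stops at the per-hop control energy.

```python
def payload_energy_j(n_rings: int, p_ring_w: float, payload_bits: int, rate_bps: float,
                     distance_m: float, velocity_m_s: float, e_oeeo_j_per_bit: float) -> float:
    """Rings held ON for serialization plus flight time, plus O/E-E/O conversion."""
    on_time_s = payload_bits / rate_bps + distance_m / velocity_m_s
    return n_rings * p_ring_w * on_time_s + e_oeeo_j_per_bit * payload_bits


def control_energy_per_hop_j(control_bits: int, e_oeeo_j_per_bit: float,
                             e_buf_j_per_bit: float, e_sched_j_per_bit: float,
                             e_comp_j: float) -> float:
    """One CU traversal: per-bit conversion, buffering and scheduling, plus computation."""
    per_bit = e_oeeo_j_per_bit + e_buf_j_per_bit + e_sched_j_per_bit
    return control_bits * per_bit + e_comp_j
```

With the values from the text (20 µW, 1024 bits, 12.5 Gb/s, 1 pJ/bit, 28 bits, 0.003 and 0.12 pJ/bit, 1.5 pJ) and v = 6.6 × 10^7 m/s, one control hop costs about 32.9 pJ under this model.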

IV. CONCLUSION
In this paper, we propose an ONoC that relies on the multi-level optical layer design paradigm. As a main contribution, three levels of waveguides are used to efficiently implement the data and control networks. Interactions between waveguides located on different levels are realized by a novel 3D microring resonator. Simulation results demonstrate the lower coupling loss of this 3D microring resonator compared to related works. The energy required by the resulting ONoC to send a packet is estimated at 1.228 nJ, resulting in a low-power architecture.

Fig. 1. 4x4 3D optical network on chip (with a top view of the optical network on chip) and Mesh-based control network on chip architecture

Fig. 5. Possible communication paths between two different CIs

Fig. 9. Optical power saved by using the multi-layer design paradigm for different network architectures

TABLE II. Merits of the constructed N multi-level ONoC

TABLE III. Number of microrings and processors required in OMNoC and related architectures