# Design of an On-Chip Permutation Network Using a 3D Mesh Network-on-Chip

D. Jagadish Kumar
PG Student, Dept. of ECE
Kakinada Institute of Engineering and Technology
Kakinada

N. Gannanaga Prasad
Assistant Professor, Dept. of ECE
Kakinada Institute of Engineering and Technology
Kakinada

Abstract- Three-dimensional Networks-on-Chip (3D NoCs) have attracted interest as a way to meet the on-chip communication demands of future multipurpose systems, and a mesh-based 3D NoC is proposed here. Silicon integration technologies provide a new opportunity for three-dimensional (3D) Network-on-Chip (NoC) design. Communication in a NoC takes place through node routers. One way to meet these demands is to implement efficient router architectures capable of fast packet switching and routing for parallel and scalable NoCs. To reduce wire length in 3D ICs, a single-cycle router implementation for a 3D mesh NoC with two arbitration schemes is proposed.

*Index Terms*— Network on chip, On-chip communication, integrated circuits, 3D network.

# I. INTRODUCTION

Network-on-chip (NoC) has emerged as a promising interconnection architecture for multiprocessor system-on-chip (MPSoC) platforms. With the scaling of process technology, it has become feasible to integrate multiple cores, and eventually many cores, on a single chip. Chip multiprocessors (CMPs), which contain tens to hundreds of homogeneous cores on a chip, and multiprocessor systems-on-chip (MPSoCs), which are composed of many different types of processors for embedded systems, have been proposed as the architectures of future microprocessors.

Technology scaling has allowed System-on-Chip (SoC) designs to grow continuously in component count and complexity. This leads to challenging problems such as power dissipation and resource management. In particular, the interconnection network plays an important role in determining the performance and power of the entire chip. These challenges have made conventional bus-based systems unreliable architectures for SoCs, owing to their lack of scalability and parallelism, their high latency and power consumption, and their low throughput.

In this sense, three-dimensional (3D) NoCs have emerged to reduce the length and number of global interconnections and the number of hops that packets must pass through, and consequently to decrease network latency. This paper introduces OcNoC. The OcNoC router may be implemented according to two different arbitration models.

The remainder of this paper is organized as follows. Section II describes the OcNoC architecture with its switch network. Section III presents the router interface. Section IV presents the FSM-based arbitration algorithms. Section V discusses the different arbiter schemes. Section VI presents the results, and Section VII gives the conclusion.

# II. OCNOC ARCHITECTURE WITH SWITCH NETWORK

Each switch has four bidirectional ports connected to the corresponding neighboring switches, and one remaining port connected to the on-chip IP through a wrapper.



Fig 1: Switch architecture

The NoC structure consists of processing elements (PEs), network interfaces (NIs), routing nodes, and links. The OcNoC router implements the routing algorithm with wormhole switching, which reduces network latency and buffer depth.



Fig 2: 2D network topology

The mesh network considered here belongs to a family of multistage networks used to build scalable multiprocessors with thousands of nodes in macro systems. A three-stage mesh network is defined as c(m,n,p), where n is the number of inputs on each of the p first-stage switches and m is the number of second-stage switches. To provide a parallelism degree of 16 for practical MPSoCs, the designed network uses the c(4,4,4) topology. This network is rearrangeable: it can realize all possible permutations between its inputs and outputs.

A three-stage mesh network with a limited number of middle-stage switches is chosen to minimize implementation cost.
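As a rough check of the parameters above, the following sketch computes the number of inputs implied by c(m,n,p) and tests the condition m >= n. That condition is the classical rearrangeability result for three-stage Clos networks, whose notation c(m,n,p) resembles; applying it to this design is an assumption, and the function names are illustrative only.

```python
def port_count(m: int, n: int, p: int) -> int:
    """Total network inputs: p first-stage switches with n inputs each."""
    return n * p


def is_rearrangeable(m: int, n: int, p: int) -> bool:
    """Assumed condition (classical three-stage Clos result): m >= n."""
    return m >= n


if __name__ == "__main__":
    m, n, p = 4, 4, 4  # the c(4,4,4) configuration named in the text
    print("inputs:", port_count(m, n, p))
    print("rearrangeable (assumed condition):", is_rearrangeable(m, n, p))
```

For c(4,4,4) this gives 16 inputs, which matches the stated parallelism degree of 16.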



Fig 3: 2D and 3D mesh networks

One advantage of 3D NoC architectures is that 3D fabrication is simpler than for 2D architectures. The 3D mesh NoC implements the deterministic XYZ routing algorithm with wormhole switching.
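Deterministic XYZ routing can be illustrated with a minimal sketch. It assumes the usual dimension-order interpretation: a packet first corrects its X coordinate, then Y, then Z, moving one hop at a time. The coordinate representation and function names are hypothetical and are not taken from the OcNoC implementation.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]  # (x, y, z) position of a router in the 3D mesh


def xyz_next_hop(cur: Coord, dst: Coord) -> Coord:
    """Return the next router on the XYZ path, or cur if already at dst."""
    x, y, z = cur
    dx, dy, dz = dst
    if x != dx:  # resolve the X dimension first
        return (x + (1 if dx > x else -1), y, z)
    if y != dy:  # then the Y dimension
        return (x, y + (1 if dy > y else -1), z)
    if z != dz:  # finally the Z dimension (vertical links)
        return (x, y, z + (1 if dz > z else -1))
    return cur


def xyz_route(src: Coord, dst: Coord) -> List[Coord]:
    """Full list of routers visited from src to dst, inclusive."""
    path = [src]
    while path[-1] != dst:
        path.append(xyz_next_hop(path[-1], dst))
    return path


if __name__ == "__main__":
    for hop in xyz_route((0, 0, 0), (2, 1, 1)):
        print(hop)
```

Because the order of dimensions is fixed, every source and destination pair always uses the same path; in this example the packet takes four hops.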

# III. ROUTER INTERFACE

The router interface supports full-duplex communication over bidirectional links. Each link connects one source router to one destination router, and these routers have input and output ports.



Fig 4: control and data signals

The signals present on the input and output ports are Clock, creditin and creditout, and dataIn and dataOut, respectively.
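The text does not describe the handshake itself. The signal names suggest credit-based flow control, so the sketch below shows that common scheme purely as an assumption: the sender holds one credit per free slot in the receiver's buffer, spends a credit for each flit driven on dataOut, and regains one when creditin is pulsed. The class and method names are hypothetical.

```python
class CreditLink:
    """Sender-side view of one link under assumed credit-based flow control."""

    def __init__(self, buffer_depth: int) -> None:
        self.buffer_depth = buffer_depth
        self.credits = buffer_depth  # free slots in the receiver's buffer
        self.delivered = []          # flits that were driven on dataOut

    def send(self, flit) -> bool:
        """Send one flit if a credit is available; return True on success."""
        if self.credits == 0:
            return False             # receiver buffer full, sender must stall
        self.credits -= 1
        self.delivered.append(flit)
        return True

    def credit_in(self) -> None:
        """Model a pulse on creditin: the receiver freed one buffer slot."""
        if self.credits < self.buffer_depth:
            self.credits += 1


if __name__ == "__main__":
    link = CreditLink(buffer_depth=2)
    print(link.send("flit0"), link.send("flit1"), link.send("flit2"))
    link.credit_in()
    print(link.send("flit2"))
```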

# IV. FSM-BASED ARBITRATION ALGORITHMS

The arbitration consists of three different schemes.

- (i) Round robin priority scheme.
- (ii) Weighted/dynamic priority scheme.
- (iii) Fixed priority scheme.

The round-robin scheme is one of the most widely used algorithms in computing and networking. Time slices are assigned to each process in equal portions and in circular order, so all processes are treated equally. It is simple, easy to implement, and free from starvation.
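A minimal software model of a round-robin arbiter is sketched below. It assumes one request line per input port and grants the first requesting port found after the most recently granted one, which is what makes the scheme starvation-free. The class name and interface are illustrative and do not come from the OcNoC design.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Grants one requesting port per call, rotating priority circularly."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self.last = num_ports - 1  # so that port 0 is examined first

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the granted port index, or None if nobody is requesting."""
        for offset in range(1, self.num_ports + 1):
            port = (self.last + offset) % self.num_ports
            if requests[port]:
                self.last = port   # granted port gets lowest priority next time
                return port
        return None


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(4)
    for _ in range(5):
        print(arbiter.grant([True, True, True, True]))
```

When all four ports request continuously, the grants rotate through ports 0, 1, 2, 3 and then return to port 0.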

In the dynamic priority algorithm, every device has an opportunity to place a request for a grant to communicate with another device.

In the fixed priority scheme, the processor executes the highest-priority task among those that are ready to execute. The FSM that uses these algorithms has four states and is presented below.



Fig 5: FSM diagram

The four states of the router are:

S0: This is the initial state, from which the router starts its processes; it switches to the next state after one clock cycle.

S1: In this state the router waits for a packet from the input router so that it can be sent to the destination port.

S2: Here the packet's destination address is checked against the routing address. If the destination port is free, switching is performed and the router moves to S3; otherwise it returns to S1 to retry switching.

S3: In this state flit delivery and the switching process are completed by clearing the incoming port. The router then moves to S1 to handle the next switching request.
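The four states can be summarized as a small transition function. This is a sketch of the state descriptions above, not the authors' Verilog. The inputs `packet_waiting` and `dest_port_free` are hypothetical names, and the S1 to S2 transition on packet arrival is an assumption, since the text only says that S1 waits for a packet.

```python
from enum import Enum


class State(Enum):
    S0 = 0  # initial state
    S1 = 1  # wait for a packet
    S2 = 2  # check destination and try to switch
    S3 = 3  # deliver flits and clear the incoming port


def next_state(state: State, packet_waiting: bool, dest_port_free: bool) -> State:
    """Return the state for the next clock cycle."""
    if state is State.S0:
        return State.S1                                       # always after one cycle
    if state is State.S1:
        return State.S2 if packet_waiting else State.S1       # assumed transition
    if state is State.S2:
        return State.S3 if dest_port_free else State.S1       # retry via S1 if busy
    return State.S1                                           # S3: done, wait for next request


if __name__ == "__main__":
    state = State.S0
    # Each tuple is (packet_waiting, dest_port_free) for one clock cycle.
    for inputs in [(False, False), (True, False), (True, False), (True, True), (True, True)]:
        state = next_state(state, *inputs)
        print(state.name)
```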

# V. DIFFERENT ARBITER SCHEMES

The two approaches are distinguished by port availability and packet destination.

**1. Centralized scheme:** A single arbiter handles all switching requests from input to destination. If the request is granted, the packet is passed to the destination in one cycle.



Fig 6: Arbitration circuit

**2. Distributed scheme:** The only difference from the centralized scheme is that the priority-based switching requests are handled by three arbiters.



Fig 7: Distributed scheme

# VI. RESULTS

#### **Simulation Result of OCP**



#### **Schematic View of OCP**



# VII. CONCLUSION

The on-chip NoC is designed in Verilog and simulated with the Xilinx tool. Using the mesh network topology, the network can operate at a frequency of around 100 MHz with a bandwidth of approximately 30 Gbps.

By combining the circuit-switching approach with a dynamic path-setup scheme under a mesh network topology, the proposed design supports arbitrary traffic permutations at runtime with compact implementation overhead. With circuit switching, a dedicated path is established from the source node to the destination node.
