Universal switch blocks for three-dimensional FPGA design

G.-M. Wu, M. Shyu and Y.-W. Chang

Abstract: The authors consider the switch-block design problem for three-dimensional FPGAs. A three-dimensional switch block M with W terminals on each face is said to be universal if every set of nets satisfying the dimension constraint (i.e. the number of nets on each face of M is at most W) is simultaneously routable through M. A class of universal switch blocks for three-dimensional FPGAs is presented. Each of the switch blocks has 15W switches and switch-block flexibility 5 (i.e. \( F_S = 5 \)). It is proved that no switch block with less than 15W switches can be universal. The proposed switch blocks are compared with others of the topology associated with those used in the Xilinx XC4000 FPGAs. Experimental results demonstrate that the proposed universal switch blocks improve routability at the chip level. Further, the decomposition property of a universal switch block provides a key insight into its layout implementation with a smaller silicon area.

1 Introduction

A conventional field-programmable gate array (FPGA) (Fig. 2a) consists of an array of logic blocks that can be connected by routing resources [1]. The logic blocks contain circuits used to implement logic functions. The routing resources consist of wire segments and switch blocks. Figure 1b illustrates a switch block in which the programmable switches, denoted by dashed lines between terminals, are shown. There are many reports on four-sided switch blocks [2-8].

A three-dimensional FPGA architecture (Fig. 2a) is a generalisation based on the conventional 2-D FPGA; it stacks a number of 2-D FPAG blocks together by MCM fabrication techniques [9, 10], where each logic block has six adjacent neighbours, as opposed to four in the 2-D case [9]. The 3-D switch blocks are not the same as the conventional switch blocks (Fig. 2b). Each switch block is connected with six adjacent switch blocks. Therefore, they enable each channel segment to connect to some subset of the channel segments incident on the other five faces of the 3-D switch block. This unique architecture motivates our study of the 3-D switch blocks.

A three-dimensional switch block M with W terminals on each face is said to be universal if every set of nets satisfying the dimension constraint (i.e. the number of nets on each face of M is at most W) is simultaneously routable through M. This paper presents a class of universal switch blocks for three-dimensional FPGAs. Each switch block has 15W switches and switch-block flexibility 5 (i.e. \( F_S = 5 \)). We prove that no switch block with less than 15W switches can be universal. We also compare the proposed switch blocks with others of the topology associated with those used in the Xilinx XC4000 FPGAs. Experimental results demonstrate that the universal switch blocks improve routability at the chip level.

2 Switch-block modelling

This Section presents the modelling for 3-D switch blocks and their routing. It is shown that the 3-D switch-block design problem can be transformed into the six-sided one. A three-dimensional switch block is a cubic block with W terminals on each face of the block. The size of the 3-D switch block is referred W. Some pairs of terminals, on different faces of the block, may have programmable switches and thus can be connected by programming the switches to be ‘ON’. We represent a 3-D switch block by \( M_{3D}(T, S) \), where T is the set of terminals, and S the set of switches. Let the faces \( F_1, F_2, F_3, F_4, F_5 \), and \( F_6 \) represent the front, rear, left, right, top, and bottom faces, respectively (Fig. 3). Label the terminals \( t_1, t_2, ..., t_W \), \( t_{2,1}, t_{2,2}, ..., t_{2,6} \), \( t_{3,1}, t_{3,2}, ..., t_{3,6} \), \( t_{4,1}, t_{4,2}, ..., t_{4,6} \), \( t_{5,1}, t_{5,2}, ..., t_{5,6} \), and \( t_{6,1}, t_{6,2}, ..., t_{6,6} \) starting from the terminals on \( F_1 \) to those on \( F_6 \). Let \( T(F) = \{ t_{1,1}, ..., t_{1,W} \} \) (front terminals), \( T(H) = \{ t_{2,1}, ..., t_{2,W} \} \) (rear terminals), \( T(L) = \{ t_{3,1}, ..., t_{3,W} \} \) (left terminals), \( T(R) = \{ t_{4,1}, ..., t_{4,W} \} \) (right terminals), \( T(T) = \{ t_{5,1}, ..., t_{5,W} \} \) (top terminals), and \( T(B) = \{ t_{6,1}, ..., t_{6,W} \} \) (bottom terminals). Figure 3b shows the labelling of the terminals on \( F_1 \). Therefore, \( S = \{ t_{i,j}, t_{j,i} \} \) there exists a programmable switch between \( t_{ij} \) and \( t_{ji} \), and \( T = \cup_{i \in [1,6]} \cup_{j \in [1,6]} \{ t_{ij}, t_{ji} \} \). For convenience, we often refer to a switch block \( M_{3D}(T, S) \) simply as \( M_{3D} \) omitting \( T \) and \( S \), if there is no ambiguity about \( T \) and \( S \), or \( T \) and \( S \) are not of concern in the context.

A hexagonal switch block (HSB) is a six-sided switch block with V terminals on each side of the block. We say that the HSB is of size \( V \). We represent an HSB by \( M_{3D}(T_{HS}, S_{HS}) \), where \( T_{HS} \) is the set of terminals and \( S_{HS} \) the set of programming switches. Label the terminals \( t_{1,1}, t_{1,2}, ..., t_{1,3}, t_{2,1}, t_{2,2}, ..., t_{2,3}, t_{3,1}, t_{3,2}, ..., t_{3,3} \) starting from the rightmost terminal on the bottom side and proceeding clockwise (Fig. 3c). Let \( T_{HS}(i) = \{ t_{i,1}, ..., t_{i,3} \}, \)
where $i = 1, 2, 3, 4, 5,$ or 6. Therefore, $S_b = \{ (t_{m,n,t_{ax}}) \}$ there exists a programmable switch between terminal $t_{m,ax}$ and terminal $t_{n,ay}$, where $m \neq u, m,u = 1,2, \ldots, 6, n, v = 1,2, \ldots, V,$ and $T_b = \cup T_b(i)$, where $i = 1, 2, \ldots, 6$. For convenience, we often refer to $M_d(T_b, S_b)$ simply as $M_b$, omitting $T_b$ and $S_b$, if there is no ambiguity about $T_b$ and $S_b$, or $T_b$ and $S_b$ are not of concern in the context.

In the following, we transform the design problem for the 3-D switch blocks into that for the HSBs. For convenience, we modify the terminology isomorphism used in [3] as follows. Let $M(T, S)$ ($M'(T', S')$) be a 3-D or a hexagonal switch block. We have the following definition.

**Definition 1:** Two switch blocks $M(T, S)$ and $M'(T', S')$ are isomorphic if there exists a bijection $f:T \rightarrow T'$ such that $(t_{m,n,t_{ax}}) \in S$ if and only if $(f(t_{m,n}), f(t_{ax})) \in S'$ and, for any two terminals $t_{m,n}$ and $t_{ax}$, $t_{m,n}, t_{ax} \in T$ if and only if $f(t_{m,n}), f(t_{ax}) \in T'$.

In other words, $M(T, S)$ and $M'(T', S')$ are isomorphic if we can relabel the terminals of $M$ to be the terminals of $M'$, maintaining the corresponding switches in $M$ and $M'$; and for terminals on the same side (face) of $M$, their corresponding terminals are also on the same side (face) of $M'$. For any two isomorphic switch blocks, we have the following theorems.

**Theorem 1:** [3] Any two isomorphic switch blocks have the same routing capacity.

**Theorem 2:** For any $M_d$ of size $W$, there exists an $M_h$ of the same size such that $M_d$ and $M_h$ are isomorphic, and vice versa.

**Proof:** For an $M_d(T, S)$ of size $W$, we can construct an $M_d(S, T_d)$ of the same size such that $(t_{m,n,t_{ax}}) \in S$ if $t_{m,n} \in S$, where $m \neq u, m,u = 1,2, \ldots, 6$ and $n,v = 1,2, \ldots, V$. Let the mapping function $f:T \rightarrow T_d$ be $f(t_{m,n}) = t_{ax}$. Obviously, $(t_{m,n,t_{ax}}) \in S$ if and only if $(f(t_{m,n}), f(t_{ax})) \in S$. Therefore, by definition 1, $M_d$ and $M_h$ are isomorphic. For an $M_d(S, T_d)$ of size $W$, we can construct an $M_d(S, T)$ of the same size such that $(t_{m,n}, t_{ax}) \in S$ if $(t_{m,n}, t_{ax}) \in S$, where $m \neq u, m,u = 1, \ldots, 6$ and $n,v = 1,2, \ldots, V$. Similarly, there exists the bijection $f':T_d \rightarrow T$ such that $f(t_{m,n}) = t_{ax}$ and $(t_{m,n}, t_{ax}) \in S$ and only if $(f(t_{m,n}), f(t_{ax})) \in S$. Therefore, $M_d$ and $M_h$ are isomorphic.

By theorems 1 and 2, the design problem for the 3-D switch block is equivalent to that for the six-sided switch

![Fig. 1 Conventional FPGA and its switch block](image1)

a) Conventional FPGA architecture

b) Conventional four-sided switch block

![Fig. 2 3-D FPGA and switch block](image2)

a) 3-D FPGA

b) 3-D switch block

![Fig. 3 3-D switch block and corresponding six-sided switch block](image3)

a) Model of 3-D switch block

b) One face of 3-D switch block and its terminals

c) Corresponding six-sided switch block.

50

block. Therefore, we shall focus on the six-sided switch block in the rest of the paper. We can classify all connections passing through a switch block into a number of categories. For an HSB, connections can be of 15 types. See Fig. 4 for the type definition.

![Fig. 4 Fifteen types of connection in an HSB](image)

A routing requirement vector (RRV) \( n \) for an HSB is a 15-tuple \((n_1,2, \ldots, n_{1,6}, n_{2,3}, \ldots, n_{2,6}, n_{3,4}, \ldots, n_{3,6}, n_{4,5}, n_{4,6}, n_{5,6})\), where \( n_{ij} \) is the number of type-\((i,j)\) connections required to be routed through an HSB, \(0 \leq n_{ij} \leq V, i, j = 1,2, \ldots, 6, i \neq j\). An RRV \( n \) is said to be routable on an HSB \( M_b \) if there exists a routing for \( n \) on \( M_b \).

The routing capacity of a switch block \( M \) is referred to as the number of distinct routable vectors on \( M \); that is, the routing capacity of \( M \) is the cardinality \( |\{n | n \text{ is routable on } M\}| \). A switch block \( M \) with \( V \) terminals on each side is called ‘universal’ if every set of nets satisfying the dimension constraint (i.e. the number of nets on each side of \( M \) is at most \( V \)) can simultaneously be routed through \( M \). We have the following definition.

**Definition 2:** An HSB \( M_b \) of size \( V \) is called universal if the following set of inequalities is the necessary and sufficient conditions for an RRV \( n = (n_1,2, \ldots, n_{1,6}, n_{2,3}, \ldots, n_{2,6}, n_{3,4}, \ldots, n_{3,6}, n_{4,5}, n_{4,6}, n_{5,6}) \) to be routable on \( M_b \):

1. \( n_{1,2} + n_{1,3} + n_{1,4} + n_{1,5} + n_{1,6} \leq V \)
2. \( n_{1,2} + n_{2,3} + n_{2,4} + n_{2,5} + n_{2,6} \leq V \)
3. \( n_{1,3} + n_{2,3} + n_{3,4} + n_{3,5} + n_{3,6} \leq V \)
4. \( n_{1,4} + n_{2,4} + n_{3,4} + n_{4,5} + n_{4,6} \leq V \)
5. \( n_{1,5} + n_{2,5} + n_{3,5} + n_{4,5} + n_{5,6} \leq V \)
6. \( n_{1,6} + n_{2,6} + n_{3,6} + n_{4,6} + n_{5,6} \leq V \)

We refer to the dimension constraint as the set of inequalities which characterises a six-sided universal switch block of size \( V \). Therefore, the dimension constraint for an HSB is the set of (1)-(6) listed in definition 2. Note that the number of nets routed through each side of a switch block cannot exceed \( V \); therefore, a universal switch block has the maximum routing capacity.

### 3 Universal switch blocks

In this Section, we present an algorithm for constructing symmetric HSBs and it is proved that symmetric HSBs are universal. The symmetric HSB of size \( V \) has only 15 \( V \) switches. We prove that no HSB with less than 15 \( V \) switches can be universal. Based on isomorphism operations (theorem 1), we can identify a whole class of universal switch blocks.

#### 3.1 Symmetric switch blocks

**Algorithm Symmetric_Switch_Block**(V)

**Input:** \( V \) size of the hexagonal switch block.

**Output:** \( M_b(T_h, S_b) \) the hexagonal switch block of size \( V \);

- \( T_h \): set of terminals; \( S_b \): set of switches.

/* See Fig. 3c for the terminal labeling. */

1. \( T_h \leftarrow \{t_{i,j} | \forall i = 1,2, \ldots, 6, \forall j = 1,2, \ldots, V; \}
2. \( S_b \leftarrow \emptyset; \)
3. for \( k = 1 \) to \( \lfloor \frac{V}{2} \rfloor \) do
   4. for \( i = 1 \) to \( 6 \) do
   5. for \( j = 1 \) to \( 6 \) do
   6. \( S_b \leftarrow S_b \cup \{(t_{i,k} \cup t_{j,k+1})\}, i \neq j; \)
7. if \( V \) is odd
8. for \( i = 1 \) to \( 6 \) do
9. for \( j = 1 \) to \( 6 \) do
10. \( S_b \leftarrow S_b \cup \{(t_{i,j} \cup t_{j+i})\}, i \neq j; \)
11. **Output:** \( M_b(T_h, S_b) \).

**Fig. 5 Algorithm for constructing a six-sided symmetric switch block of size \( V \)**

as the ‘symmetric switch block’. Figure 6 shows two examples of symmetric switch blocks. For a symmetric switch block, it has the flexibility \((F_3)\) of 5; thus, the total number of switches in the symmetric switch block size \( V \) is

\[
6 \times V \times F_3 = 3 \times V \times 5 = 15 V
\]

For a symmetric switch block with an even \( V \) terminals on each side, it can be partitioned into \( V/2 \) sub-blocks of size two; and for an odd \( V \), it can be partitioned into \( \lceil V/2 \rfloor \) sub-blocks of size two and one sub-block of size one. Thus, we have the following property. A symmetric switch block of size \( V \) can be partitioned into \( \lceil V/2 \rfloor \) symmetric sub-blocks of size two and \((V \mod 2)\) symmetric sub-block of size one. (We call this property the decomposition property.)

Note that these sub-blocks are not interacting with each other; thus, each sub-block can be considered independently.

**Fig. 6 Two symmetric hexagonal switch blocks**

\( a \) Symmetric HSB of \( V = 3 \)
\( b \) Symmetric HSB of \( V = 2 \)

#### 3.2 Proof of universality

In this subsection, it is proved that the symmetric switch blocks constructed by algorithm Symmetric_Switch_Block are universal. To show that the symmetric switch blocks are universal, we first prove that the symmetric HSBs of size two are universal.
Lemma 1: The HSB $M_k$ of size two constructed by algorithm Symmetric Switch Block is universal.

Proof: By definition 2, we must prove that $n$ is routable on $M_k$ if and only if the following inequalities are simultaneously satisfied:

$$
\begin{align*}
\text{(7)} & \quad n_{1,1} + n_{1,3} + n_{1,4} + n_{1,6} + n_{1,8} \leq 2 \\
\text{(8)} & \quad n_{2,1} + n_{2,3} + n_{2,4} + n_{2,6} + n_{2,8} \leq 2 \\
\text{(9)} & \quad n_{3,1} + n_{3,3} + n_{3,4} + n_{3,6} + n_{3,8} \leq 2 \\
\text{(10)} & \quad n_{4,1} + n_{4,3} + n_{4,4} + n_{4,6} + n_{4,8} \leq 2 \\
\text{(11)} & \quad n_{5,1} + n_{5,3} + n_{5,4} + n_{5,6} + n_{5,8} \leq 2 \\
\text{(12)} & \quad n_{6,1} + n_{6,3} + n_{6,4} + n_{6,6} + n_{6,8} \leq 2
\end{align*}
$$

(II) It is not difficult to identify all of the RRVs satisfying (7)–(12). (In fact, there are 2578 such RRVs.) We verify the RRVs and conclude that they can all successfully be routed on the HSB of size two constructed by algorithm Symmetric Switch Block. In fact, based on the work in [11], we need to check only the RRVs in the corresponding dominating set (see [11] for the definition of dominating sets). The key insight is that the two terminals, say terminals $b$ and $c$, which connect to a terminal, say $a$, do not share any switch (Fig. 7b); thus the connections associated with them are non-interacting, except those associated with $a$.

(Only if) For an HSB $M_6$ of size two, the total number of connections routed through each side of $M_6$ cannot exceed two. Hence, if $n$ is routable on $M_6$, the six inequalities must be satisfied.

Let $U_V$ denote the set of RRVs which satisfy the dimension constraint for an HSB of size $V$. An RRV $\gamma \in U_V$ is called a maximal RRV (MRRV) if there exists no other RRV in $U_V$ that dominates $\gamma$. In the following, we show that all RRVs in $U_V$ can be decomposed into $U_{V-2}$ and $U_2$.

Similarly, we need to check only the RRVs in the corresponding dominating set (i.e. MRRVs). We have the following lemma.

Lemma 2: When an MRRV $\gamma \in U_V$ is routed on an HSB, all unused terminals, if any, must be on the same side and the number of unused terminals $\phi_{\text{unused}} = 2c$, $0 \leq c \leq \lfloor V/2 \rfloor$, $c \in \mathbb{Z}$.

Proof: If there are two unused terminals on different sides (say sides $i$ and $j$, $i < j$), we can increase $\gamma$ by one without violating the dimension constraint, implying that $\gamma$ is not maximal: a contradiction. Hence, all unused terminals, if any, must be on the same side. Note that the total number of terminals is $\phi_{\text{total}} = 6 V$, an even number. Assume that there are $\phi_{\text{unused}}$ used terminals. Obviously, $\phi_{\text{unused}}$ is even since each switch is incident on two terminals. Also, $\phi_{\text{unused}} = \phi_{\text{total}} - \phi_{\text{used}} \leq V$ since all unused terminals, if any, must be on the same side. Since $\phi_{\text{total}}$ and $\phi_{\text{used}}$ are even numbers and $0 \leq \phi_{\text{unused}} \leq V$, $\phi_{\text{unused}} = \phi_{\text{total}} - \phi_{\text{used}} = 2c$, $0 \leq c \leq \lfloor V/2 \rfloor$, $c \in \mathbb{Z}$.

Consider the MRRVs in $U_V$. By lemma 2, we can classify the MRRVs into two types. One is that all terminals are used (i.e. $\phi_{\text{unused}} = 0$), and we call an MRRV of this type a ‘complete MRRV’. The other is that an even number of terminals on the same side are unused (i.e. $\phi_{\text{unused}} = 2c$, $c \in \mathbb{Z}$), and we call an MRRV of this type a ‘degenerate complete MRRV’.

To show that an MRRV in $U_V$ can be decomposed into $U_{V-2}$ and $U_2$, we first construct a multiple graph and a weighted graph for the MRRV as follows: for any MRRV we construct a multiple graph $G_{\text{mul}}(V_m, E_m)$, where $V_m = \{v_1, v_2, \ldots, v_6\}$. If $n_{ij} = 1$, construct an edge between $v_i$ and $v_j$ with weight 1; if $n_{ij} \geq 2$, construct two edges between $v_i$ and $v_j$ with total weights equal to $n_{ij}$. (We call the two edges a ‘multi-edge’.) We induce a weighted graph $G_{\text{mul}}(V_m, E_m)$ from $G_{\text{mul}}(V_m, E_m)$ by substituting a weighted edge for a multi-edge. Figures 8b and 8c show a multiple graph $G_{\text{mul}}$ for

**Fig. 7** Two symmetric HSBs and their sub-blocks

a Decomposition of symmetric HSB of $V = 4$

b Decomposition of symmetric HSB of $V = 3$

52 IEE Proc.-Circuits Devices Syst., Vol. 151, No. 1, February 2004
exists an Weighted graph Fig. 8 Routing instance and corresponding multiple and weighted graphs

We define MRRV and connections associated with a complete MRRV for a side of an HSB, an edge e ∈ E_o represents a type of connection between two sides of the HSB, and weight(e) denotes the number of connections of the type associated with e.

Let C_k denote a connected component of k vertices in G_m. We have the following lemma.

Lemma 3: For a weighted graph G_m associated with a complete MRRV, there exists no isolated vertex in G_m and for k ≥ 3, C_k contains no degree-one vertex.

Proof: There exists no isolated vertex in G_m, since the total connections associated with a complete MRRV for a side of an HSB must be equal to V. Suppose there exists a G_m(V_m, E_m) (associated with a complete MRRV) for an HSB of size V with a connected component C_k and for k ≥ 3, C_k contains a degree-one vertex v_i. Let v_i connect to a vertex v_j by an edge e_{ij} = (v_i, v_j). Since G_m is associated with a complete MRRV and v_i, v_j is a degree-one vertex. The total number of connections associated with v_i, weight (e_{ij}) is equal to the dimension constraint V. Further, the total number of connections associated with a vertex can be V at most; since weight (e_{ij}) = V, v_i only connects to v_j. Hence, the degree of v_i must also equal one, and v_i and v_j form a C_2. A contradiction. Therefore, there exists no C_k (k ≥ 3) with degree-one vertex in G_m.

A hamiltonian subcycle of a multiple graph G_m(V_m, E_m) is a simple cycle that contains a subset of vertices in V_m. Two hamiltonian subcycles in a multiple graph are called independent if no vertex is shared by two subcycles. A multiple graph G_m(V_m, E_m) is said to be sub-hamiltonian if it contains a set of independent hamiltonian subcycles and all vertices in V_m are on the subcycles; otherwise, it is non-sub-hamiltonian. Also, we call a weighted graph sub-hamiltonian if its associated multiple graph is sub-hamiltonian. For three RRVs u, v, and x, we say u is a sub-RRV of v if there exists an x such that v = u + x. Let n_1 be a sub-RRV of n. We define Ω and Ω’ as follows:

\[ \Omega = \{ n | n \in U \} \]
\[ \Omega' = \{ n | n = n_1 + n_2, n_1 \in U \text{ and } n_2 \in U \} \]

where V ≥ 2

We have the following lemmas.

Lemma 4: If the multiple graph of an MRRV n is sub-hamiltonian, n has a complete sub-RRV n_2 ∈ U_2.

Proof: Let G_m be a sub-hamiltonian graph associated with an MRRV n. G_m has a set of independent hamiltonian subcycles S = {c_1, c_2, ..., c_k} that covers all vertices in G_m. We can choose a sub-RRV n_2 of n as follows. For any cycle c_i ∈ S, \( c_i = < v_{i_1}, v_{i_2}, ..., v_{i_m}, v_{i_1} > \). If k = 2, let

\[ n_2(v_i, v_j) = 2; \text{ otherwise, } n_2(v_i, v_j) = \ldots = n_2(v_i, v_j) = 1. \]

We traverse G_m based on S. S includes independent hamiltonian subcycles which contain all vertices in G_m. All vertices will be visited one time. A vertex in G_m corresponds to one side of an HSB associated with n. Every vertex contributes two degrees in S. Thus, the number of connections with one side of the HSB is equal to 2. n_2 satisfies the following inequalities:

\[ n_2(s_1(s_1 + 1)/2) \leq \frac{V}{2} \]

Thus, n_2 ∈ U_2 and it is a complete MRRV.

Lemma 5: (RRV decomposition property) \( \Omega = \Omega' \).

Proof: First, we show that \( \Omega' \subseteq \Omega \). By definition 2, an RRV n_1 ∈ U_{V'-2} if and only if the following inequalities are simultaneously satisfied:

\[ n_1(s_1) + n_1(s_1 + 1) \leq \frac{V}{2} \quad (13) \]

\[ n_1(s_1) + n_1(s_1 + 1) \leq \frac{V}{2} \quad (14) \]

\[ n_1(s_1) + n_1(s_1 + 1) \leq \frac{V}{2} \quad (15) \]

\[ n_1(s_1) + n_1(s_1 + 1) \leq \frac{V}{2} \quad (16) \]

\[ n_1(s_1) + n_1(s_1 + 1) \leq \frac{V}{2} \quad (17) \]

\[ n_1(s_1) + n_1(s_1 + 1) \leq \frac{V}{2} \quad (18) \]

and for an RRV n_2 ∈ U_2, the following inequalities are simultaneously satisfied:

\[ n_2(s_1(s_1 + 1)/2) \leq \frac{V}{2} \quad (19) \]

\[ n_2(s_1(s_1 + 1)/2) \leq \frac{V}{2} \quad (20) \]

\[ n_2(s_1(s_1 + 1)/2) \leq \frac{V}{2} \quad (21) \]

\[ n_2(s_1(s_1 + 1)/2) \leq \frac{V}{2} \quad (22) \]

\[ n_2(s_1(s_1 + 1)/2) \leq \frac{V}{2} \quad (23) \]

\[ n_2(s_1(s_1 + 1)/2) \leq \frac{V}{2} \quad (24) \]

Let n = n_1 + n_2, n_1 = n_1(s_1) + n_1(s_1 + 1), n_2 = n_2(s_1(s_1 + 1)/2), n_3 = n_3(s_1), n_4 = n_4(s_1), ..., n_6 = n_6(s_1), n_7 = n_7(s_1), n_8 = n_8(s_1). Combining (13) and (19), (14) and (20), (15) and (21), (16) and (22), (17) and (23), and (18) and (24), we obtain (25)-(30)

\[ n_1 + n_1 + n_1 + n_1 + n_1 + n_1 + n_1 + n_1 \leq V \quad (25) \]

\[ n_2 + n_2 + n_2 + n_2 + n_2 + n_2 + n_2 + n_2 \leq V \quad (26) \]

\[ n_3 + n_3 + n_3 + n_3 + n_3 + n_3 + n_3 + n_3 \leq V \quad (27) \]

\[ n_4 + n_4 + n_4 + n_4 + n_4 + n_4 + n_4 + n_4 \leq V \quad (28) \]

\[ n_5 + n_5 + n_5 + n_5 + n_5 + n_5 + n_5 + n_5 \leq V \quad (29) \]

\[ n_6 + n_6 + n_6 + n_6 + n_6 + n_6 + n_6 + n_6 \leq V \quad (30) \]

Thus, n ∈ U_V and we have Ω ⊆ Ω.

Next, we show that Ω ⊆ Ω'. It suffices to show that each MRRV n ∈ Ω is in Ω'. By lemma 2, all unused terminals for routing an MRRV on an HSB are on the same side, and the number of unused terminals is even. Without loss of
generality, assume that all unused terminals, if any, are on side 1 and the number of unused terminals is equal to \( c_1 \), \( 0 \leq c_1 \leq \lfloor V/2 \rfloor \). By definition 2, an MRRV \( n \in \Omega \) if and only if the following equalities are simultaneously satisfied:

\[
\begin{align*}
&n_{1,2} + n_{1,3} + n_{1,4} + n_{1,5} + n_{1,6} = V - 2c_1 \\
n_{1,2} + n_{2,3} + n_{2,4} + n_{2,5} + n_{2,6} = V \\
n_{1,3} + n_{3,4} + n_{3,5} + n_{3,6} = V \\
n_{1,4} + n_{4,5} + n_{4,6} + n_{5,6} = V \\
n_{1,5} + n_{5,6} + n_{3,5} + n_{4,6} + n_{5,6} = V \\
n_{1,6} + n_{2,6} + n_{3,6} + n_{4,6} + n_{5,6} = V
\end{align*}
\]

Algorithm RRV-Decomposition listed in Fig. 9 shows a method to decompose \( n \in \Omega \) into \( n_1 \) and \( n_2 \), where \( n_1 \in U_1 \) and \( n_2 \in U_2 \). It derives \( n \) based on the two cases: (i) \( n \) is a complete MRRV, and (ii) \( n \) is a degenerate complete MRRV.

**Algorithm: RRV-Decomposition (n)**

**Input:** \( n \) : An MRRV in \( \Omega \).

**Output:** \( n_1, n_2 \) : MRRVs such that \( n = n_1 + n_2 \), \( n_1 \in U_1 \), and \( n_2 \in U_2 \).

1. Lines 1-7 construct a multiple graph \( G_{n_1}(V_{n_1}, E_{n_1}) \).
2. Lines 8-12 decompose \( G_{n_1} \) into \( n_1 \) and \( n_2 \).
3. Lines 13-16 process the case where \( n \) is a degenerate complete MRRV.

**Fig. 9** Algorithm for RRV decomposition, assuming that all unused terminals, if any, are on side 1

We first consider the case where \( n \) is a complete MRRV. Let \( G_n \) be a weighted graph of \( n \) and \( C_k \) be a connected component with \( i \) vertices in \( G_n \). By lemma 3, there exists no isolated vertex or any \( C_k, k \geq 3 \), with a degree-one vertex in \( G_n \). Thus, the number of vertices in \( G_k \) could only be 2, 3, 4, or 6.

We classify all weighted graphs for complete MRRVs into four categories \( x, \beta, \gamma \), and \( \delta \), listed in Table 1. (Note that the weighted graphs, except \( C_2 \), contain no isolated vertex or degree-one vertex.) Categories \( x, \beta, \gamma \), and \( \delta \) represent the cases where \( G_n \) consists of three \( C_2 \)'s, one \( C_2 \) and one \( C_4 \), two \( C_2 \)'s, and one \( C_4 \), respectively. The total number of weighted graphs with complete MRRVs are 56, and twelve of them are illegal, which can be verified by checking if the total edge weights of the graphs equal 3 \( \ell \). (Note that all \( V \) terminals are used for a complete MRRV, and a connection is incident on two terminals.)

**Fig. 10** Weighted graphs for two complete MRRVs

\( a \) Illegal weighted graph; its total weight is equal to \( 2V \)

\( b \) Legal weighted graph with a hamiltonian subcycle \( c_1 = \langle v_1, v_2, \ldots, v_6, v_1 \rangle \)

![Diagram](image)

**Table 1:** Number of weighted graphs for complete MRRVs

<table>
<thead>
<tr>
<th>Category</th>
<th>Number of weighted graphs</th>
<th>Number of illegal graphs</th>
<th>Number of legal graphs</th>
</tr>
</thead>
<tbody>
<tr>
<td>( x )</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>( \beta )</td>
<td>3</td>
<td>0</td>
<td>3</td>
</tr>
<tr>
<td>( \gamma )</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>( \delta )</td>
<td>52</td>
<td>12</td>
<td>40</td>
</tr>
<tr>
<td>Total</td>
<td>56</td>
<td>12</td>
<td>44</td>
</tr>
</tbody>
</table>

Authorized licensed use limited to: National Taiwan University. Downloaded on January 14, 2009 at 01:49 from IEEE Xplore. Restrictions apply.
Obviously, \( n_i \in U_{V,2} \). Applying similar techniques, we can show that all multiple graphs associated with degenerate complete MRRVs are sub-hamiltonian. Thus, \( \Omega \subseteq \Omega' \). Based on the above discussion, we have \( \Omega' = \Omega \).

Two sub-blocks are said to be ‘independent’ if they do not share any terminals. By lemma 5, independent switch sub-blocks can be considered individually and merged to form a larger switch block, and the routable RRVs for the merged switch block are identical to those given by applying vector addition operations on the routable RRVs for the independent switch sub-blocks. With the decomposition properties of symmetric switch blocks and RRVs, we can prove the following theorem by using mathematical induction.

**Theorem 3:** The symmetric switch blocks are universal.

**Proof:** Using definition 2, we shall show that, for an HSB \( M_n \) of size \( V \) constructed by algorithm Symmetric Switch, Block, \( n \) is routable on \( M_n \) if and only if the following inequalities are simultaneously satisfied:

\[
\begin{align*}
    n_{1,2} + n_{1,3} + n_{1,4} + n_{1,5} + n_{1,6} &\leq V \\
    n_{1,2} + n_{2,3} + n_{2,4} + n_{2,5} + n_{2,6} &\leq V \\
    n_{1,3} + n_{2,3} + n_{3,4} + n_{3,5} + n_{3,6} &\leq V \\
    n_{1,4} + n_{2,4} + n_{3,4} + n_{4,5} + n_{4,6} &\leq V \\
    n_{1,5} + n_{2,5} + n_{3,5} + n_{4,5} + n_{5,6} &\leq V \\
    n_{1,6} + n_{2,6} + n_{3,6} + n_{4,6} + n_{5,6} &\leq V 
\end{align*}
\]

For the HSBs constructed by the algorithm, we have the following key observations (Fig. 7). For an HSB of an even \( V \), we can partition it into \( \lfloor V/2 \rfloor \) non-interacting sub-blocks (shown in Fig. 7a); each sub-block has the same topology as that of \( \beta \) shown in Fig. 7a. For an HSB of an odd \( V \), we can partition it into \( \lfloor V/2 \rfloor \) non-interacting sub-blocks, with each of the \( \lfloor V/2 \rfloor \) sub-blocks identical to \( \beta \) and one sub-block with a clique topology formed by the six terminals from the middle of each side (Fig. 7b). Because terminals in different sub-blocks are non-interacting, each sub-block can be considered independently (lemma 5). Therefore, each symmetric HSB consists of \( \lfloor V/2 \rfloor \) independent universal switch sub-blocks of size two, and each of size one if \( V \) is odd. Further, by lemma 1, an HSB of size two constructed by algorithm Symmetric Switch Block is universal. (If) For an even \( V \), by lemma 5, we can decompose a vector \( n = (n_{i,2}, n_{i,3}, \ldots, n_{i,6}, n_{j,2}, n_{j,3}, \ldots, n_{j,6}) \) into \( \lfloor V/2 \rfloor \) non-interacting sub-RRVs \( n_i = (n_{i,1,2}, n_{i,1,3}, \ldots, n_{i,1,6}, n_{i,2,3}, \ldots, n_{i,2,6}, n_{i,3,4}, \ldots, n_{i,3,6}, n_{i,4,5}, n_{i,4,6}, n_{i,5,6}),\) \( n_j = (n_{j,1,2}, n_{j,1,3}, \ldots, n_{j,1,6}, n_{j,2,3}, \ldots, n_{j,2,6}, n_{j,3,4}, \ldots, n_{j,3,6}, n_{j,4,5}, n_{j,4,6}, n_{j,5,6}),\) where \( i = 1, 2, \ldots, \lfloor V/2 \rfloor, \) such that each sub-RRV satisfies the following set of inequalities:

\[
\begin{align*}
    n_{i,1,2} + n_{i,1,3} + n_{i,1,4} + n_{i,1,5} + n_{i,1,6} &\leq V \\
    n_{i,1,2} + n_{i,2,3} + n_{i,2,4} + n_{i,2,5} + n_{i,2,6} &\leq V \\
    n_{i,1,3} + n_{i,2,3} + n_{i,3,4} + n_{i,3,5} + n_{i,3,6} &\leq V \\
    n_{i,1,4} + n_{i,2,4} + n_{i,3,4} + n_{i,4,5} + n_{i,4,6} &\leq V \\
    n_{i,1,5} + n_{i,2,5} + n_{i,3,5} + n_{i,4,5} + n_{i,5,6} &\leq V \\
    n_{i,1,6} + n_{i,2,6} + n_{i,3,6} + n_{i,4,6} + n_{i,5,6} &\leq V 
\end{align*}
\]

By lemma 1, each RRV \( n_i \), satisfying (49)-(54) must be routable on the HSB of size two, and by lemma 5, \( n \) is also routable on the symmetric HSB of size \( V \).

For an odd \( V \), by lemma 5, we can decompose a vector \( n \) into \( \lfloor V/2 \rfloor \) sub-RRVs \( n_i, n_j \), where \( i = 1, 2, \ldots, \lfloor V/2 \rfloor \), such that each of the sub-RRVs \( n_i, n_j \), \( i = 1, 2, \ldots, 2 \), \( \lfloor V/2 \rfloor \), satisfies the set (49)-(54), and the remaining one \( n_{i_1,2} \) satisfies the following set of inequalities:

\[
\begin{align*}
    n_{[V/2],1,2} + n_{[V/2],1,3} + n_{[V/2],1,4} + n_{[V/2],1,5} + n_{[V/2],1,6} &\leq 1 \\
    n_{[V/2],2,1} + n_{[V/2],2,3} + n_{[V/2],2,4} + n_{[V/2],2,5} + n_{[V/2],2,6} &\leq 1 \\
    n_{[V/2],1,3} + n_{[V/2],2,3} + n_{[V/2],3,4} + n_{[V/2],3,5} + n_{[V/2],3,6} &\leq 1 \\
    n_{[V/2],1,4} + n_{[V/2],2,4} + n_{[V/2],3,4} + n_{[V/2],4,5} + n_{[V/2],4,6} &\leq 1 \\
    n_{[V/2],1,5} + n_{[V/2],2,5} + n_{[V/2],3,5} + n_{[V/2],4,5} + n_{[V/2],5,6} &\leq 1 \\
    n_{[V/2],1,6} + n_{[V/2],2,6} + n_{[V/2],3,6} + n_{[V/2],4,6} + n_{[V/2],5,6} &\leq 1 
\end{align*}
\]

We have

\[
\begin{align*}
    n &= \sum_{i=1}^{\lfloor V/2 \rfloor} n_i 
\end{align*}
\]

By lemma 1, each RRV \( n_i \), satisfying the set (49)-(54) must be routable on the symmetric HSB of size two. Further, the last RRV is also routable on an HSB of size one (a clique of six terminals). Hence, by lemma 5, \( n \) is also routable on the symmetric HSB of size \( V \).

(Only if) For an HSB \( M_n \) of size \( V \), the total number of connections routed through each side of \( M_n \) cannot exceed \( V \). Hence, if \( n \) is routable on \( M_n \), (1)-(6) must be satisfied. \( \square \)

**Theorem 4:** No HSBs with less than \( 15 \) switches can be universal.

**Proof:** By definition 2, an RRV with only one non-zero component \( V \) such as \( (V, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) \) is routable on a universal HSB. Hence it needs at least \( V \) non-interacting switches for each type of connections to construct a universal HSB. Since there are \( 15 \) types of connections in an HSB, the smallest number of switches needed to construct a universal HSB is \( 15V \).

As mentioned in subsection 3.1, the total number of switches used in a six-sided symmetric switch block is \( 15V \). Thus, the symmetric HSBs are the ‘cheapest’ universal ones, i.e. it uses the minimum number of switches to construct a universal switch block. Note that the \( 15V \) switches requirement is quite small, compared to a fully connected HSB which has \( 15V^2 \) switches. By theorem 1, any two isomorphic switch blocks have the same routing capacity. Hence, we have following lemma.

**Lemma 6:** For any two isomorphic switch blocks \( M_n(T, S) \) and \( M_n(T', S') \), \( M_n(T, S) \) is universal if and only if \( M_n(T', S') \) is universal.

By lemma 6, we can obtain a whole class of universal switch blocks by performing isomorphism operations on a symmetric switch block.
4 Experimental results

To explore the effects of switch-block architectures on routing on a 3-D FPGA, we implemented a maze router in the C language and ran it on a SUN Ultra workstation. We tested the area performance of the router based on the CGE [2] and the SEGA [12] benchmark circuits. A logic-block pin was connected to any of the $W$ tracks in the adjacent routing channel. These circuits were routed on a two-layer 3-D FPGA and randomly assigned the layer for each terminal of a net. The switch-block architectures used were the symmetric switch blocks and clique-based (Xilinx XC4000-like) switch blocks. The quality of a switch block was evaluated by the area performance of the detailed router. Table 2 shows the results. For the results listed in this table, we determined the minimum number of tracks $W$ required for 100% routing completion for each circuit, using the two kinds of switch blocks. Because net ordering often affects the performance of a maze router, the benchmark circuits were routed by using the following three net-ordering schemes to avoid possible biases: (i) net order as given in the original benchmark circuits, (ii) shortest net first (non-decreasing order of net lengths), and (iii) longest net first (non-increasing order of net lengths). Also, since our main goal is to make fair comparisons for switch-block architectures, no optimisation such as rip-and-reroute phase, was incorporated in the maze router (optimisation might bias the comparison).

Results show that, between the two kinds of switch blocks, the symmetric switch blocks usually needed the minimum $W$s for 100% routing completion, no matter what net order was used. The results show that the performed symmetric switch blocks improve routability at the chip level. (An average of 6% improvement in area performance was achieved.) Thus also performed experiments to explore the effects of net density on the area performance of switch blocks. Connections were randomly generated on a $15 \times 15 \times 3$ (number of logic blocks in the three layers) 3-D FPGA. For this purpose, it was assumed that the number of pins on each logic blocks was unlimited (so that we could test denser circuits). As shown in Fig. 11, no matter how dense the circuit is (numbers of connections ranging from 400 to 1600), the symmetric switch blocks consistently outperform the clique-based switch blocks. (An average of 10% improvement in the area performance was achieved.)

5 Concluding remarks

We have proposed a class of the universal switch blocks for three-dimensional FPGAs. Each switch block has $15W$ switches and $FS = 5$. It has been proved that no switch block with less than $15W$ switches can be universal. The proposed switch blocks have been compared with those of

### Table 2: Minimum numbers of tracks needed for detailed-routing completion

<table>
<thead>
<tr>
<th>Circuit</th>
<th>Number of logic blocks</th>
<th>Number of connections</th>
<th>Original order</th>
<th>Shortest net first</th>
<th>Longest net first</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td>Symmetric</td>
<td>Sym.</td>
<td>Sym.</td>
</tr>
<tr>
<td>BUSC</td>
<td>$13 \times 12 \times 2$</td>
<td>392</td>
<td>7</td>
<td>7</td>
<td>8</td>
</tr>
<tr>
<td>DMA</td>
<td>$18 \times 16 \times 2$</td>
<td>771</td>
<td>9</td>
<td>8</td>
<td>10</td>
</tr>
<tr>
<td>BNRE</td>
<td>$22 \times 21 \times 2$</td>
<td>1257</td>
<td>9</td>
<td>8</td>
<td>10</td>
</tr>
<tr>
<td>DFSM</td>
<td>$23 \times 22 \times 2$</td>
<td>1422</td>
<td>9</td>
<td>8</td>
<td>9</td>
</tr>
<tr>
<td>Z03</td>
<td>$27 \times 26 \times 2$</td>
<td>2135</td>
<td>9</td>
<td>8</td>
<td>10</td>
</tr>
<tr>
<td>9symml</td>
<td>$11 \times 10 \times 2$</td>
<td>259</td>
<td>6</td>
<td>7</td>
<td>8</td>
</tr>
<tr>
<td>alu2</td>
<td>$15 \times 13 \times 2$</td>
<td>511</td>
<td>7</td>
<td>8</td>
<td>9</td>
</tr>
<tr>
<td>alu4</td>
<td>$19 \times 17 \times 2$</td>
<td>851</td>
<td>8</td>
<td>9</td>
<td>10</td>
</tr>
<tr>
<td>apex7</td>
<td>$12 \times 10 \times 2$</td>
<td>300</td>
<td>6</td>
<td>7</td>
<td>8</td>
</tr>
<tr>
<td>example2</td>
<td>$14 \times 12 \times 2$</td>
<td>444</td>
<td>8</td>
<td>8</td>
<td>9</td>
</tr>
<tr>
<td>k2</td>
<td>$22 \times 20 \times 2$</td>
<td>1256</td>
<td>11</td>
<td>10</td>
<td>12</td>
</tr>
<tr>
<td>term1</td>
<td>$10 \times 9 \times 2$</td>
<td>202</td>
<td>7</td>
<td>6</td>
<td>7</td>
</tr>
<tr>
<td>too_large</td>
<td>$15 \times 14 \times 2$</td>
<td>519</td>
<td>7</td>
<td>8</td>
<td>9</td>
</tr>
<tr>
<td>vda</td>
<td>$17 \times 16 \times 2$</td>
<td>722</td>
<td>7</td>
<td>9</td>
<td>10</td>
</tr>
<tr>
<td>Total</td>
<td>—</td>
<td>—</td>
<td>109</td>
<td>118</td>
<td>125</td>
</tr>
<tr>
<td>Comparison</td>
<td>—</td>
<td>—</td>
<td>1.00</td>
<td>1.08</td>
<td>1.00</td>
</tr>
</tbody>
</table>

![Fig. 11 Comparison of area performance using the symmetric switch blocks and clique-based switch blocks for different numbers of connections on a $15 \times 15 \times 3$ 3D-FPGA](image)
Xilinx XC4000-like FPGAs. Experimental results have shown that the proposed universal switch blocks improve routability at the chip level.

There are several significant future research directions:

• Exploration of the universal switch blocks of 3-D FPGAs with respect to multi-terminal pins: in this paper, the theoretical analysis is based on two-pin nets. Nevertheless, the benchmark circuits also contain significant numbers of multi-pin nets. The experimental results based on circuits with multi-pin nets conform to the theoretical findings based on two-pin nets. The approach can be extended to the design of universal switch blocks with respect to multi-pin nets. For example, one may first model the types of a specified multi-pin net (say, \( C(6, 3) = 20 \) types of nets for three-pin nets). After the modelling is established, similar procedures may be applied for further analysis.

• Development of global/detailed routers that can consider the universal switch-block architectures: to develop an FPGA router considering the switch-block architecture, we shall first develop a congestion metric associated with the switch block, and then perform the routing based on congestion control of switch-block density as well as classical channel density.

6 References


