Architecture Design of Full HD JPEG XR Encoder for Digital Photography Applications

Chia-Ho Pan, Ching-Yen Chien\(^1\), Wei-Min Chao, Sheng-Chieh Huang\(^1\), and Liang-Gee Chen, Fellow, IEEE

Abstract — To satisfy the requirements of high-quality image compression, the new JPEG XR compression standard has been introduced. This paper presents an analysis of JPEG XR and a VLSI architecture for a JPEG XR encoder that can smoothly encode 4:4:4 1920\(\times\)1080 high-definition photos. According to the simulation results, the proposed design achieves a throughput of 44.2 M samples/sec. The design can be used in digital photography applications to provide low computation, low storage, and high dynamic range.

Index Terms — JPEG XR, High Definition Photo, Joint Photographic Experts Group (JPEG), VLSI Architecture.

I. INTRODUCTION

Many advanced multimedia applications require image compression technology with higher compression ratios and better visual quality. High image quality, high compression ratios, and low computational cost are important factors in many areas of consumer electronics, ranging from digital photography to consumer display equipment such as digital still cameras and digital photo frames. These requirements usually involve computationally intensive algorithms that impose trade-offs among quality, computational resources, and throughput.

For high-quality digital imaging, an extended color range has become increasingly important for emerging consumer products. In the past, digital cameras and display equipment in the consumer market typically had 8 bits per channel. Today the situation is quite different: consumer digital cameras and desktop display panels provide 12 bits of information per channel. If each channel is still compressed into 8 bits, 4 bits of information per channel are lost and the quality of the digital image is limited. Following these improvements in display equipment, JPEG XR is designed for high dynamic range (HDR) and high-definition (HD) photo sizes. JPEG XR, which is being standardized by the ISO/IEC Joint Photographic Experts Group (JPEG) committee, is a new still image coding standard derived from Windows Media Photo [1]-[3]. The "XR" in JPEG XR stands for "extended range": JPEG XR supports an extended range of information per channel. The goal of JPEG XR is to support the greatest possible level of image dynamic range and color precision while keeping device implementations of the encoder and decoder as simple as possible.

For digital image compression, the JPEG standard [4] of the Joint Photographic Experts Group, the first international image coding standard for continuous-tone natural images, was defined in 1992. JPEG is well known today because of the popularity of digital still cameras and the Internet. Another image coding standard, JPEG2000 [5], was finalized in 2001. Unlike JPEG, which is a Discrete Cosine Transform (DCT) based coder, JPEG2000 uses a Discrete Wavelet Transform (DWT) based coder for better coding efficiency. JPEG2000 not only improves compression but also offers many new features, such as quality scalability, resolution scalability, region of interest, and lossy/lossless coding in a unified framework. However, JPEG2000 is much more complicated than JPEG. The core techniques and computational complexity of these two image coding standards are compared in [6].

In this paper, we analyze the image coding standards and propose an architecture design for JPEG XR. First, the compression performance of JPEG, JPEG2000, and JPEG XR is compared. Fig. 1 shows the peak signal-to-noise ratio (PSNR) results at several bitrates for the 512\(\times\)512 color Lena test image. The image quality of JPEG XR is very close to that of JPEG2000: the PSNR difference between them is under 0.5 dB. The coding performance of JPEG is significantly lower than that of the other two standards. Fig. 2 shows the subjective views at a compression ratio of 60. The blocking artifacts of the JPEG image in Fig. 2 are easily observed, while JPEG XR, with its pre-filter function, retains acceptable quality.
This paper is organized as follows. In Section II, we present the fundamentals of JPEG XR. Section III describes the characteristics of our proposed architecture design of JPEG XR encoder. Section IV shows the implementation results. Finally, a conclusion is given in Section V.

II. FUNDAMENTALS

The JPEG XR image compression standard provides many options for different purposes. The modules of JPEG XR are introduced in the following subsections.

A. Tiles

As the first step of JPEG XR compression, the tile size has to be chosen to optimize the compression result. Because different tile sizes yield different compression results, Table I lists the compression results obtained with the same quantization but different tile sizes for four 512x512 benchmark images. As shown in Table I, the encoded image size increases with the number of tiles, i.e., when smaller tiles are chosen.
Once the image has been divided into tiles, each tile is processed independently. This makes JPEG XR well suited to hardware implementation and keeps the memory buffers small. Pixels across tile boundaries must be processed without any data dependency: the Huffman table must be altered accordingly and retransmitted, and the adaptive scan order has to be rebuilt as well. Data independence also limits the impact of errors, so that part of the data can be preserved when an error occurs. Although tiling helps hardware implementation and error resilience, it decreases compression efficiency, so the choice of tile size is a trade-off between hardware design considerations and compression efficiency.
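The tile partition described above can be sketched as follows. This is a minimal illustration in which uniform square tiles measured in macroblocks are an assumption made for simplicity, and the function name `tile_rects` is hypothetical. The image is first padded to whole 16x16 macroblocks, and edge tiles are clipped to the padded grid.

```python
import math

MB_SIZE = 16  # macroblock edge in pixels


def tile_rects(width, height, tile_mbs):
    """Split an image into square tiles of tile_mbs x tile_mbs macroblocks.

    Returns (mb_x0, mb_y0, mb_x1, mb_y1) rectangles in macroblock units,
    in raster order; tiles on the right and bottom edges are clipped.
    """
    mbs_x = math.ceil(width / MB_SIZE)   # pad width to whole macroblocks
    mbs_y = math.ceil(height / MB_SIZE)  # pad height to whole macroblocks
    rects = []
    for y0 in range(0, mbs_y, tile_mbs):
        for x0 in range(0, mbs_x, tile_mbs):
            rects.append((x0, y0, min(x0 + tile_mbs, mbs_x), min(y0 + tile_mbs, mbs_y)))
    return rects


print(len(tile_rects(512, 512, 8)))    # 16 tiles of 8x8 MBs
print(len(tile_rects(1920, 1080, 32))) # 12 tiles, edge tiles clipped
```

An encoder organized this way would start each rectangle with fresh adaptive state (scan order, table selection), which is consistent with the text's note that the adaptive scan order has to be rebuilt for every tile.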

B. Color Conversion

The purpose of color conversion is to exploit the color sensitivity of the human eye. In the JPEG XR encoder, the color space is converted from RGB to YUV. The conversion is reversible; in other words, it is lossless. The color conversion equations are

\[ V = B - R \]
\[ U = -\left[ R - G + \left\lfloor \frac{V}{2} \right\rfloor \right] \]
\[ Y = G + \left\lfloor \frac{-U}{2} \right\rfloor - \text{offset} \]

where offset is 128.
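The lifting structure of these equations can be checked with a short sketch. It is an illustration, not a reference implementation: it takes the luma step as Y = G + floor(-U/2) - offset, so that Y approximates (R + 2G + B)/4 (with + floor(U/2), Y would weight G by about 1.5 and R and B negatively), and it uses the 8-bit offset of 128. Because each step adds a function of values that are still available, the inverse simply undoes the steps in reverse order, which is why the conversion is lossless.

```python
OFFSET = 128  # offset value given in the text (8-bit input assumed)


def forward_cc(r, g, b):
    """Reversible RGB -> YUV lifting steps (luma step as assumed in the lead-in)."""
    v = b - r
    u = -(r - g + v // 2)        # Python // is floor division, matching the floors
    y = g + (-u) // 2 - OFFSET   # assumed luma step: Y = G + floor(-U/2) - offset
    return y, u, v


def inverse_cc(y, u, v):
    """Undo the lifting steps in reverse order; each step is exactly invertible."""
    g = y + OFFSET - (-u) // 2
    r = g - u - v // 2
    b = v + r
    return r, g, b


print(forward_cc(200, 200, 200))              # gray: (72, 0, 0)
print(inverse_cc(*forward_cc(12, 240, 99)))   # (12, 240, 99)
```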

C. Pre-filter

The pre-filter function offers three overlapping choices: non-overlapping, one-level overlapping, and two-level overlapping. Fig. 4 shows the trade-offs among the three choices for the 512x512 color Lena test image. Non-overlapping gives the fastest encoding and decoding, but it is efficient only at low compression ratios or in lossless mode; at low bitrates it can introduce blocking artifacts. Compared with non-overlapping, one-level overlapping achieves a higher compression ratio at the cost of additional computation time. Two-level overlapping has the highest computational complexity, and its PSNR and image quality are better than those of the other two choices at low bitrates. The pre-filter function is therefore recommended at high quantization levels, for both image quality and compression ratio. For image quality, the pre-filter removes blocking artifacts, to which visual quality is sensitive. For compression ratio, overlapping saves up to 20% of the bitrate in the low-bitrate region of Fig. 4 for the Lena test image.
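How the three choices differ in the per-macroblock operation sequence can be sketched as below. The mapping is an assumption based on the two-stage PCT described in the next subsection: one-level overlapping is taken to pre-filter only before the first-stage PCT, and two-level overlapping before both stages. The operation names are hypothetical labels, and the sketch ignores the fact that the pre-filter straddles block and macroblock boundaries.

```python
from enum import IntEnum


class Overlap(IntEnum):
    """The three pre-filter choices named in the text."""
    NONE = 0       # non-overlapping: no pre-filter
    ONE_LEVEL = 1  # assumed: pre-filter before the first-stage PCT only
    TWO_LEVEL = 2  # assumed: pre-filter before both PCT stages


def mb_operations(mode):
    """Return the per-macroblock transform operations in execution order."""
    ops = []
    if mode >= Overlap.ONE_LEVEL:
        ops.append("prefilter_stage1")
    ops.append("pct_stage1")
    if mode >= Overlap.TWO_LEVEL:
        ops.append("prefilter_stage2")
    ops.append("pct_stage2")
    return ops


for mode in Overlap:
    print(mode.name, mb_operations(mode))
```

Each extra pre-filter pass is extra computation per macroblock, which matches the ordering of complexity described above.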

D. Photo Core Transform

The Photo Core Transform (PCT) transforms the pre-filtered pixel values into the lowest-frequency coefficient (DC), the low-frequency coefficients (AD), and the high-frequency coefficients (AC). The PCT is performed in two stages, as shown in Fig. 5. The macroblock (MB) is partitioned into sixteen 4x4 blocks. In each stage, the 4x4 blocks are pre-filtered and then transformed by the 4x4 PCT. In the first stage, a 2x2 transform is applied four times to groups of four coefficients in each 4x4 block, and the low-frequency result is placed in the top-left coefficient. After the first stage, the DC coefficients of the 16 4x4 blocks are collected into a 4x4 DC block, which is processed by the second stage. The second-stage 2D 4x4 PCT is built from three operators: the 2x2 $T_h$, $T_{\text{odd}}$, and $T_{\text{odd-odd}}$.
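The two-stage structure (16 first-stage block transforms, then a second transform of the collected DC block) can be sketched as below. To keep it short, an unnormalized 4x4 Walsh-Hadamard transform stands in for the PCT; it is not the PCT, whose integer lifting operators ($T_h$, $T_{\text{odd}}$, $T_{\text{odd-odd}}$) are omitted, and pre-filtering is skipped. The sketch only shows how each 16x16 MB of one channel yields 1 DC, 15 AD, and 240 AC coefficients.

```python
# Hypothetical stand-in for the PCT: an unnormalized 4x4 Walsh-Hadamard transform.
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul4(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform4x4(block):
    """Separable 2D transform H * X * H^T; coefficient [0][0] is the block DC."""
    h_t = [list(col) for col in zip(*H4)]
    return matmul4(matmul4(H4, block), h_t)


def off_dc(coeffs):
    """All coefficients of a 4x4 result except the top-left (DC) one."""
    return [coeffs[i][j] for i in range(4) for j in range(4) if (i, j) != (0, 0)]


def two_stage_transform(mb):
    """Two-stage transform of one 16x16 macroblock (16 rows of 16 ints).

    Returns (dc, ad, ac): 1 DC, 15 low-frequency AD and 240 high-frequency AC values.
    """
    dc_block = [[0] * 4 for _ in range(4)]
    ac = []
    for by in range(4):                      # first stage: each of the 16 4x4 blocks
        for bx in range(4):
            block = [row[4 * bx:4 * bx + 4] for row in mb[4 * by:4 * by + 4]]
            coeffs = transform4x4(block)
            dc_block[by][bx] = coeffs[0][0]  # collect block DCs into a 4x4 DC block
            ac.extend(off_dc(coeffs))
    second = transform4x4(dc_block)          # second stage: transform the DC block
    return second[0][0], off_dc(second), ac


dc, ad, ac = two_stage_transform([[3] * 16 for _ in range(16)])
print(dc, len(ad), len(ac))  # 768 15 240
```

For a flat MB, as in the usage line, every AD and AC value is zero and all of the energy ends up in the single DC value.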

Fig. 5. The 1st part and 2nd part process of PCT.

Fig. 6. DC prediction model.
PSEUDO CODE:

Hozi_weight = abs(lowpass_Y(1)) + abs(lowpass_Y(2)) + abs(lowpass_Y(3)) + abs(lowpass_U(2)) + abs(lowpass_V(2));
Verti_weight = abs(lowpass_Y(4)) + abs(lowpass_Y(5)) + abs(lowpass_Y(6)) + abs(lowpass_U(5)) + abs(lowpass_V(5));
if (4 × Hozi_weight < Verti_weight)
    then "predict from LEFT"
else if (4 × Verti_weight < Hozi_weight)
    then "predict from TOP"
else
    "NULL predict"

Fig. 7. (a) Prediction of AD low pass. (b) Prediction model and AC high pass.

E. Quantization

Quantization rescales the coefficients after the transform. Each PCT coefficient is divided by the quantization value and rounded to an integer. In lossless coding mode the quantization value is 1; in lossy coding mode it is greater than 1. Quantization in JPEG XR uses only integer operations, which preserves precision after scaling and allows the divisions to be performed with shift operations. The quantization parameter may differ among the high-pass (AC), low-pass (AD), and DC bands, so that it follows the sensitivity of human vision in each coefficient band.
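A common way to realize the division with integer operations is to multiply by a precomputed reciprocal and then shift. The sketch below uses that technique; it is not necessarily the exact JPEG XR quantizer, and the rounding rule (add half a step to the magnitude) and the 40-bit shift are assumptions. With a quantization value of 1 it returns the input unchanged, matching the lossless case.

```python
def make_quantizer(q, shift=40):
    """Return a function computing round(x / q) with one multiply and one shift.

    mult = ceil(2**shift / q) is exact for magnitudes up to 2**shift / q,
    far beyond the coefficient range of an image codec.
    """
    if q < 1:
        raise ValueError("quantization value must be >= 1")
    mult = -(-(1 << shift) // q)  # ceil(2**shift / q) using integer arithmetic

    def quantize(x):
        mag = abs(x) + q // 2         # assumed rounding: half a step, on the magnitude
        level = (mag * mult) >> shift  # division replaced by multiply + shift
        return -level if x < 0 else level

    return quantize


print(make_quantizer(1)(-37))  # -37 (lossless: value unchanged)
print(make_quantizer(4)(17))   # 4
```

Precomputing `mult` once per quantization value keeps the per-coefficient cost to one multiplication and one shift, which is the property the text relies on.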

F. Prediction

DC prediction in JPEG XR differs from that in other image/video standards. There are three DC prediction directions: LEFT, TOP, and LEFT and TOP. As shown in Fig. 6, JPEG XR decides the DC prediction direction from the DC coefficients of the left and top MBs. Because of the boundary constraints, the blue MBs predict only from the left MB and the gray MBs only from the top MB.
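The boundary rules above, together with one possible direction decision for interior MBs, can be sketched as follows. Since Fig. 6 is not reproduced here, the interior rule is hypothetical: it also consults the top-left MB, compares the change down the left column with the change along the top row using a 4x margin (mirroring the AC rule of Fig. 7(b)), and averages the left and top DC values when neither direction dominates. The top-left MB of the image is assumed to use no prediction, and the function name is illustrative.

```python
def dc_prediction(mb_x, mb_y, dc):
    """Return (direction, predictor) for the DC value of MB (mb_x, mb_y).

    dc maps (x, y) -> DC value of an already coded macroblock.
    """
    if mb_x == 0 and mb_y == 0:
        return "NONE", 0                      # first MB: no neighbour available
    if mb_y == 0:
        return "LEFT", dc[(mb_x - 1, mb_y)]   # top row: only the left MB exists
    if mb_x == 0:
        return "TOP", dc[(mb_x, mb_y - 1)]    # left column: only the top MB exists
    left = dc[(mb_x - 1, mb_y)]
    top = dc[(mb_x, mb_y - 1)]
    top_left = dc[(mb_x - 1, mb_y - 1)]
    # Hypothetical rule: little change down the left column suggests that the MB
    # above is similar, and little change along the top row favours the left MB.
    vert_change = abs(top_left - left)
    horiz_change = abs(top_left - top)
    if 4 * vert_change < horiz_change:
        return "TOP", top
    if 4 * horiz_change < vert_change:
        return "LEFT", left
    return "LEFT and TOP", (left + top) >> 1  # neither dominates: average both


dc = {(0, 0): 100, (1, 0): 104, (0, 1): 60}
print(dc_prediction(1, 1, dc))  # ('LEFT', 60)
```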

An AD/AC block can be predicted from the block on its TOP or LEFT, as shown in Fig. 7. Fig. 7(a) shows an example of the prediction relationship of AD coefficients when the prediction direction is LEFT. The AD prediction direction follows the DC prediction model described in Fig. 6, whereas the AC prediction direction is decided by the pseudo code in Fig. 7(b), which compares Hozi_weight with Verti_weight. Once the AC direction is decided, the computation is similar to that of AD and can be reused to reduce the coefficient values of the block. The top part of Fig. 7(b) shows an example of the prediction relationship of AC coefficients when the prediction direction is TOP.
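The pseudo code of Fig. 7(b) can be made runnable as below. The `lowpass_*` sequences are assumed to hold each channel's low-pass coefficients, accessed with the pseudo code's 1-based indices through a small helper. The second comparison is written in the symmetric form 4 × Verti_weight < Hozi_weight, so that NULL prediction covers exactly the case where neither weight dominates.

```python
def ac_direction(lowpass_y, lowpass_u, lowpass_v):
    """Decide the AC prediction direction of a macroblock (Fig. 7(b) pseudo code)."""
    def at(seq, i):
        return seq[i - 1]  # the pseudo code indexes coefficients from 1

    hozi_weight = (abs(at(lowpass_y, 1)) + abs(at(lowpass_y, 2)) + abs(at(lowpass_y, 3))
                   + abs(at(lowpass_u, 2)) + abs(at(lowpass_v, 2)))
    verti_weight = (abs(at(lowpass_y, 4)) + abs(at(lowpass_y, 5)) + abs(at(lowpass_y, 6))
                    + abs(at(lowpass_u, 5)) + abs(at(lowpass_v, 5)))
    if 4 * hozi_weight < verti_weight:
        return "LEFT"
    if 4 * verti_weight < hozi_weight:
        return "TOP"
    return "NULL"


zeros = [0] * 16
print(ac_direction([0, 0, 0, 5, 5, 5] + [0] * 10, zeros, zeros))  # LEFT
```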

G. Adaptive Encode and Packetize

Adaptive scanning, the first step of entropy coding, orders the coefficients according to the latest statistics of nonzero coefficients. Fig. 8 shows an example in which the previous scan order is updated according to these statistics. The positions most likely to be nonzero are scanned first, and this likelihood is estimated by counting the nonzero coefficients at each position. If the count of a position becomes larger than that of the position scanned before it, the two positions are exchanged in the scan order. In this way the nonzero coefficients are gathered together and processed in an orderly fashion.
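A simplified version of the count-and-swap rule can be sketched as follows. Each scan position keeps a nonzero count; after a block is scanned, a position whose count overtakes that of its predecessor moves one place earlier. The function name and the in-place update of `order` and `counts` are illustrative choices, and details such as separate orders per band or periodic count resets are not modeled.

```python
def scan_block(block, order, counts):
    """Scan one block with the current adaptive order, then update the order.

    block  : coefficients indexed by raster position
    order  : raster positions in the current scan order (updated in place)
    counts : nonzero counts parallel to `order` (updated in place)
    Returns the coefficients in the order they were scanned.
    """
    scanned = [block[pos] for pos in order]
    for i, value in enumerate(scanned):
        if value != 0:
            counts[i] += 1
            # Move this position ahead of its predecessor once it has seen more nonzeros.
            if i > 0 and counts[i] > counts[i - 1]:
                counts[i], counts[i - 1] = counts[i - 1], counts[i]
                order[i], order[i - 1] = order[i - 1], order[i]
    return scanned


order, counts = [0, 1, 2, 3], [0, 0, 0, 0]
for _ in range(2):
    scan_block([0, 0, 5, 0], order, counts)
print(order)  # [2, 0, 1, 3]: the frequently nonzero position now comes first
```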

After the coefficients of the current block are scanned in adaptive scan order, JPEG XR uses two entropy coding schemes. Fig. 9 shows an example of coding a 4x4 block when ModelBits is 4, meaning that each coefficient is represented by 4 bits. After the coefficients of the 4x4 block are scanned in adaptive scan order, ModelBits is updated according to the total number of overhead coefficients in each band of each channel. Coefficients that can be represented within ModelBits are encoded by the FlexBits table. A coefficient that needs extra bits, such as 17 in the example of Fig. 9, which cannot be represented in four bits, produces an extra part that is coded by Run-Level Encoding (RLE), and this RLE output adds to the bitstream length for each overhead coefficient. The Levels, the Runs before nonzero coefficients, and the number of overhead coefficients are counted for RLE. The RLE symbols are encoded first, using Huffman tables of different sizes to optimize the bit allocation. Finally, the RLE results and FlexBits are packetized.
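The split between FlexBits and the overhead part can be illustrated with a short sketch, assuming (as in the example of 17 with ModelBits = 4) that the low ModelBits bits of each magnitude go to FlexBits and the remaining high part forms the Level passed to RLE. Signs are kept in a separate list, which is a simplification, and the function names are hypothetical.

```python
def split_flexbits(coeffs, model_bits):
    """Split each magnitude into low FlexBits and a high Level (overhead) part."""
    mask = (1 << model_bits) - 1                      # keeps the low model_bits bits
    flex = [abs(c) & mask for c in coeffs]
    levels = [abs(c) >> model_bits for c in coeffs]   # shifted-out overhead part
    signs = [c < 0 for c in coeffs]
    return flex, levels, signs


def run_level(levels):
    """Turn the Level sequence into (zeros before it, level) pairs for nonzero levels."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


flex, levels, signs = split_flexbits([17, 3, 0, -2, -40], model_bits=4)
print(flex, levels)       # [1, 3, 0, 2, 8] [1, 0, 0, 0, 2]
print(run_level(levels))  # [(0, 1), (3, 2)]
```

With ModelBits = 4, the value 17 (binary 10001) contributes FlexBits 1 and Level 1, so it is one of the overhead coefficients counted for RLE.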

Fig. 8. Adaptive scan order updating example.

III. ARCHITECTURE DESIGN

In this section, a JPEG XR encoder architecture design is presented. The system requirements, the pipelining of functional blocks, and the architecture design of the pre-filter, transform, quantization, prediction, and entropy coding modules are discussed in the following subsections.

A. System Architecture and Functional Block Pipelining

The HD photo in our design specification has 2,073,600 pixels (1920x1080), and one MB consists of 256 pixels (16x16). Because the 1080-line height is padded to a multiple of 16 (68 MB rows), one frame contains 120 x 68 = 8160 MBs. For the 4:4:4 format, the entropy coding unit processes 53 blocks (1 DC_Y, 1 DC_UV, 3 AD, and 48 AC for Y, U, and V) to encode one MB. Each block uses 26 clock cycles, because in the worst case the number of processing cycles for encoding is about the same as that of the bitstream generation unit. Thus, encoding one MB takes 1378 clock cycles (53x26), and one frame takes 11,244,480 clock cycles (8160x1378). For 12-bit high dynamic range input, the throughput reaches 44.2 M samples/sec (1920 x 1080 x 7.11 frames/s x 3 samples = 44.2 M) in 4:4:4 baseline mode.
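The cycle budget above can be reproduced with a few lines of arithmetic. The sketch takes the quoted 7.11 frames/s as given, since this section does not state the clock frequency; the implied clock of roughly 80 MHz that it prints is derived from the other numbers, not a reported figure.

```python
import math

WIDTH, HEIGHT, MB_SIZE = 1920, 1080, 16
BLOCKS_PER_MB = 1 + 1 + 3 + 48   # DC_Y, DC_UV, 3 AD, 48 AC for 4:4:4
CYCLES_PER_BLOCK = 26            # worst-case cycles per block, from the text
FRAMES_PER_SEC = 7.11            # frame rate quoted in the text and Table III

mbs_per_frame = math.ceil(WIDTH / MB_SIZE) * math.ceil(HEIGHT / MB_SIZE)  # 120 x 68
cycles_per_mb = BLOCKS_PER_MB * CYCLES_PER_BLOCK
cycles_per_frame = mbs_per_frame * cycles_per_mb
samples_per_sec = WIDTH * HEIGHT * 3 * FRAMES_PER_SEC   # 3 samples per 4:4:4 pixel
implied_clock_hz = cycles_per_frame * FRAMES_PER_SEC    # derived, not reported

print(mbs_per_frame, cycles_per_mb, cycles_per_frame)   # 8160 1378 11244480
print(f"{samples_per_sec / 1e6:.2f} M samples/s")       # 44.23 M samples/s
print(f"{implied_clock_hz / 1e6:.2f} MHz")              # 79.95 MHz
```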

The pipeline of this JPEG XR encoder is divided into three stages, as shown in Fig. 10. Because the color conversion, pre-filter, and PCT modules operate on 4x4 block matrices without feedback information, they are placed together in the first stage. To reduce area and power, a single hardware structure processes Y, U, and V. The prediction unit forms the second stage, which compares prediction directions using processed data fed back for the DC/AD/AC regions. Finally, the adaptive encode and packetizer modules, which have high data dependency, form the third stage.
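The occupancy of the three-stage MB pipeline can be sketched as a schedule in which macroblock n is in stage s during time slot n + s. The sketch assumes that every stage finishes within one slot (1378 cycles), which the cycle budget above implies but does not spell out; the function name is illustrative.

```python
def pipeline_schedule(num_mbs, num_stages=3):
    """Return one row per time slot; entry s is the MB index in stage s, or None.

    Stage 0: color conversion / pre-filter / PCT + quantization
    Stage 1: prediction
    Stage 2: adaptive encode / packetizer
    """
    schedule = []
    for slot in range(num_mbs + num_stages - 1):
        row = []
        for stage in range(num_stages):
            mb = slot - stage                 # MB n occupies stage s during slot n + s
            row.append(mb if 0 <= mb < num_mbs else None)
        schedule.append(row)
    return schedule


for slot, row in enumerate(pipeline_schedule(4)):
    print(slot, row)
```

For N macroblocks the pipeline needs N + 2 slots, so for the 8160 MBs of a 1920x1080 frame the fill and drain overhead is only two slots.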

B. Pre-filter, PCT, and Quantization

Fig. 11 shows the data flow of stage 1. A 32-bit external bus can transmit two R/G/B samples at a time, so the R/G/B block buffer (48 samples, one 4x4 block per color component) is completely filled within 24 clock cycles.

This paper uses a memory-reuse method to reduce the memory access bandwidth of the pre-filter and PCT modules. The black line in Fig. 12(a) is the MB boundary. When the blue range is processed after the orange range, the pre-filter only has to load the new data rather than the entire block into the registers, because the previously stored column coefficients of blocks remain in the registers and can be reused as the coefficients shared with the orange range. Therefore, both the memory buffer size and the memory access of the pre-filter are reduced; otherwise, additional SRAM would be needed to store the entire block from off-chip memory, and the execution time would increase. The simulation results in Fig. 12(b) show that memory access is reduced from 50.13 MB/s to 33.42 MB/s (a 33.3% reduction) compared with the conventional method, without any clock cycle penalty.
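The register-reuse idea can be illustrated with a sliding window over block columns. This is a generic sketch: the window width of 6 columns and the step of 4 columns are hypothetical parameters, not the dimensions used in Fig. 12. Columns shared with the previous window stay in a register buffer, so only the new columns are loaded; for long rows the loads approach stride/window of the conventional count, which for these values is 2/3, the same ratio as 33.42/50.13 reported above.

```python
from collections import deque


def windows_with_reuse(columns, window, stride):
    """Slide a `window`-column filter across `columns` with step `stride`.

    Columns that overlap the previous window stay in a register buffer, so only
    `stride` new columns are loaded per step. Returns (windows, columns_loaded).
    """
    regs = deque(maxlen=window)  # appending beyond maxlen drops the oldest column
    loads, pos, windows = 0, 0, []
    while len(regs) < window:    # first window: fill the buffer completely
        regs.append(columns[pos])
        pos += 1
        loads += 1
    windows.append(list(regs))
    while pos + stride <= len(columns):
        for _ in range(stride):  # later windows: load only the new columns
            regs.append(columns[pos])
            pos += 1
            loads += 1
        windows.append(list(regs))
    return windows, loads


cols = list(range(22))
wins, loads = windows_with_reuse(cols, window=6, stride=4)
print(len(wins), loads, len(wins) * 6)  # 5 windows, 22 loads vs 30 without reuse
```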

Because the stage-1 functions operate on 4x4 block matrices without feedback information, stage 1 is itself a three-step pipeline: color conversion (CC), pre-filter, and PCT (including quantization). For memory allocation, color conversion requires 6 blocks per column, the pre-filter requires 5 blocks, and the PCT including quantization uses 4 blocks, as shown in Fig. 13. The DC block is processed at the end of the PCT, which removes one pipeline bubble when the next pipeline stage starts. Hence, this pipeline schedule avoids spending any extra clock cycles on the DC block.

The pipeline architecture of the PCT including quantization is shown in Fig. 14. Additional registers buffer the related coefficients for the next pipeline step. The left multiplexer selects the inputs for the two PCT stages: initially, the pre-filtered input coefficients are selected, and the orange block (DC) is processed after all 16 blocks have been computed. The quantization stage demultiplexes the DC, low-pass AD, and high-pass AC coefficients to the appropriate processing elements. The results are stored as quantized Y, U, and V coefficients in SRAM blocks for the prediction operation.

C. Prediction Unit

The quantized data stored in the Y, U, V SRAM blocks are processed with subtraction according to the prediction algorithm. Three SRAM blocks are used in this design. One 1440x4-byte SRAM buffers the 1 DC and 3 AD coefficients needed as TOP coefficients in the AD prediction decision, so these data need not be regenerated when they are selected by the prediction mode. As shown in Fig. 15, two 768x4-byte SRAMs store the quantized coefficients of the current block and the predicted coefficients of the current block.
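One reading of the SRAM depths, sketched below as arithmetic, is that the 1440-entry buffer holds 1 DC and 3 AD coefficients for each of Y, U, and V for every MB in a 120-MB row, and that each 768-entry buffer holds all 16x16 coefficients of one 4:4:4 MB. This interpretation and the 4-byte word are assumptions, but they reproduce both depths exactly.

```python
MB_SIZE = 16
MBS_PER_ROW = 1920 // MB_SIZE   # 120 macroblocks across a 1920-pixel row
CHANNELS = 3                    # Y, U, V in 4:4:4
BYTES_PER_WORD = 4              # assumed width of one "x4 bytes" entry

# Assumed content of the TOP buffer: 1 DC + 3 AD per channel for every MB of a row.
top_buffer_depth = MBS_PER_ROW * CHANNELS * (1 + 3)
# Assumed content of each current-MB buffer: all 16x16 coefficients of 3 channels.
mb_buffer_depth = MB_SIZE * MB_SIZE * CHANNELS

total_bytes = (top_buffer_depth + 2 * mb_buffer_depth) * BYTES_PER_WORD
print(top_buffer_depth, mb_buffer_depth, total_bytes)  # 1440 768 11904
```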

D. Adaptive Encode and Packetizer

There are three complex data dependency loops in the adaptive encode module, as shown in Fig. 16. The first is the adaptive scan order block, which refreshes the scan order. The second is the ModelBits update block, which decides how many bits are needed to represent one coefficient. The third is the adaptive Huffman encode block, which chooses the most efficient Huffman table.

The adaptive scan block counts the nonzero coefficients to decide whether the scan order should be exchanged. After the adaptive scan block, each coefficient takes one of two paths in the RLE architecture shown in Fig. 17(a). Coefficients that can be represented within ModelBits are coded by the FlexBits table. For the other coefficients, the absolute value is taken first, as shown in the grey block, and then the mask table and barrel shifter generate the FlexBits and Level. After RLE, the adaptive Huffman encode algorithm shown in Fig. 17(b) is applied. The table index starts from a default value, and an accumulated delta value is kept in the Discriminant Register. If the Discriminant Register exceeds the upper bound, the table index is incremented; if it falls below the lower bound, the table index is decremented.
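The table-switching rule can be sketched as follows. The delta values, the bounds, the number of tables, the reset of the discriminant after a switch, and the clamping at the first and last table are all hypothetical parameters and choices made for illustration.

```python
def adapt_table_index(symbols, delta, num_tables, lower, upper, start=0):
    """Return the Huffman table index in use for each symbol of a stream.

    delta[s] is added to the discriminant after symbol s is coded; exceeding
    `upper` selects the next table and falling below `lower` the previous one.
    """
    index, discriminant, used = start, 0, []
    for symbol in symbols:
        used.append(index)                 # table used to code this symbol
        discriminant += delta[symbol]
        if discriminant > upper and index < num_tables - 1:
            index += 1
            discriminant = 0               # assumed: restart accumulation after a switch
        elif discriminant < lower and index > 0:
            index -= 1
            discriminant = 0
    return used


# Hypothetical delta table: symbol 1 pushes toward the next table, symbol 0 back.
DELTA = {0: -1, 1: 2}
print(adapt_table_index([1, 1, 1, 0, 0, 0, 0, 0, 0, 0], DELTA, 3, -3, 3))
# [0, 0, 1, 1, 1, 1, 1, 1, 1, 0]
```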

Fig. 17. (a) RLE coding architecture (absolute-value module, barrel shifter, adaptive scan order; Level and Run outputs) and (b) adaptive Huffman encode algorithm (Discriminant register, delta table).

TABLE III
IMPLEMENTATION RESULTS OF FPGA PROTOTYPE SYSTEM

<table>
<thead>
<tr>
<th>Function Blocks</th>
<th>Adaptive Look-Up Tables (ALUTs)</th>
<th>Critical Path (ns)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Color conversion</td>
<td>484</td>
<td>51.01 for 4 coefficients</td>
</tr>
<tr>
<td>Pre-filter</td>
<td>4080</td>
<td>86.9 for 16 coefficients</td>
</tr>
<tr>
<td>PCT / quantization</td>
<td>5191</td>
<td>122.79 for 16 coefficients</td>
</tr>
<tr>
<td>Prediction</td>
<td>1952</td>
<td>10.2 for 1 coefficient</td>
</tr>
<tr>
<td>Adaptive encode</td>
<td>6262</td>
<td>18.215 for 1 coefficient</td>
</tr>
<tr>
<td>Packetizer</td>
<td>380</td>
<td>7.4 for 1 coefficient</td>
</tr>
<tr>
<td>On-chip SRAM</td>
<td>1440x4 Bytes (x1), 768x4 Bytes (x2)</td>
<td></td>
</tr>
<tr>
<td>Throughput</td>
<td></td>
<td>44.2 M samples/s</td>
</tr>
<tr>
<td></td>
<td></td>
<td>7.11 fps for 4:4:4 HDTV (1920x1080)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>42.66 fps for 4:4:4 SDTV (720x480)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>145.43 fps for 4:4:4 CIF (352x288)</td>
</tr>
</tbody>
</table>

Fig. 18. The architecture of (a) RLE Huffman encode and (b) codeword concentrate

IV. IMPLEMENTATION RESULTS

A prototype was implemented to verify the proposed VLSI architecture for JPEG XR. Table III shows the implementation results of each JPEG XR module on the FPGA prototype system, which is also used to test the conformance bitstreams for certification.

Area and power are reduced because a single hardware unit is shared for the pre-filter and PCT/quantization blocks. In addition, extra registers add pipeline stages in blocks such as color conversion, PCT/quantization, and adaptive encode to meet the specification. Finally, the on-chip SRAM blocks store the reused data for the prediction module, avoiding off-chip memory accesses.

V. CONCLUSION

Compared with JPEG2000, JPEG XR has a simpler coding flow and lower complexity while achieving similar PSNR at the same bit rate. Hence, JPEG XR is well suited to dedicated hardware that handles HD photo size images for HDR display requirements. In this paper, we first compared JPEG XR with other image standards and then proposed a three-stage MB pipeline to improve throughput and hardware utilization. The pre-filter and PCT modules were designed to reduce off-chip memory access by 33.3%, and the timing schedule and pipelining of the color conversion, pre-filter, and PCT & quantization modules were carefully arranged. To avoid fetching coefficients from off-chip memory, an on-chip SRAM buffers the coefficients for the prediction module with only a small area overhead. For entropy coding, we designed a codeword concentrating architecture to increase the throughput of the RLE algorithm, and the adaptive encode and packetizer modules efficiently provide the coding information required for packing the bitstream. As a result, we contribute a VLSI architecture for a 1920x1080 HD photo size JPEG XR encoder. The proposed design can be used in devices that need a powerful, advanced still image compression chip, such as next-generation HDR displays, digital still cameras, digital photo frames, digital surveillance systems, mobile phones, cameras, and other digital photography applications.

REFERENCES


Chia-Ho Pan was born in Tainan, Taiwan, R.O.C., in 1975. He received the B.S. and M.S. degrees in Electrical Engineering from National Central University, Taoyuan, Taiwan, R.O.C., in 1997 and 1999, respectively. He is currently pursuing the Ph.D. degree at the Graduate Institute of Electrical Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.

His major research interests include video coding, video codec VLSI architecture design and wireless multimedia systems.

Ching-Yen Chien was born in Taoyuan, Taiwan, R.O.C., in 1984. He received the B.S. degree in Electrical Engineering from National Chin-Yi University of Technology, Taichung, Taiwan, R.O.C., in 2006. He is currently pursuing the master's degree at the Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu, Taiwan, R.O.C.

His major research interests include image coding, image codec VLSI architecture design and human visual systems (HVS).

Wei-Min Chao was born in Taoyuan, Taiwan, R.O.C., in 1977. He received the B.S. and M.S. degrees from the Department of Electronics Engineering, National Taiwan University, Taipei, Taiwan, R.O.C., in 2000 and 2002, respectively. He is currently pursuing the Ph.D. degree at the Graduate Institute of Electrical Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.

His research interests include video coding algorithms and VLSI architecture for image and video processing.

Sheng-Chieh Huang was born in ChungHua, Taiwan, in 1967. He received the B.S. degree in Hydraulic Ocean Engineering from National Cheng Kung University in 1991, and the M.S. and Ph.D. degrees in Electrical Engineering from National Taiwan University in 1993 and 1999, respectively.

He is an Assistant Professor with the Department of Electrical and Control Engineering at National Chiao-Tung University. His current research interests are VLSI design in DSP/DIP architecture design, computer architecture, video coding system, and sense/Traditional Chinese Medicine (TCM) SOC design.
Liang-Gee Chen (S’84–M’86–SM’94–F’01) was born in Yun-Lin, Taiwan, R.O.C., in 1956. He received the B.S., M.S., and Ph.D. degrees in Electrical Engineering from National Cheng Kung University, Tainan, Taiwan, R.O.C., in 1979, 1981, and 1986, respectively. He was an Instructor (1981–1986) and an Associate Professor (1986–1988) in the Department of Electrical Engineering, National Cheng Kung University. While in military service during 1987–1988, he was an Associate Professor in the Institute of Resource Management, Defense Management College. In 1988, he joined the Department of Electrical Engineering, National Taiwan University (NTU), Taipei, Taiwan. During 1993–1994, he was a Visiting Consultant with the Digital Signal Processing (DSP) Research Department, AT&T Bell Labs, Murray Hill, NJ. In 1997, he was a Visiting Scholar with the Department of Electrical Engineering, University of Washington, Seattle. From 2001 to 2004, he was the first Director of the Graduate Institute of Electronics Engineering (GIEE), NTU. From 2004 to 2006, he was also the Director of the Electronics Research and Service Organization, Industrial Technology Research Institute, Hsinchu, Taiwan. Currently, he is a Professor with the Department of Electrical Engineering and GIEE at NTU. His current research interests are DSP architecture design, video processor design, and video coding systems.

Dr. Chen has served as an Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS for Video Technology since 1996, as Associate Editor of IEEE TRANSACTIONS ON VLSI SYSTEMS since 1999, and as Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II since 2000. He has been an Associate Editor of the Journal of Circuits, Systems, and Signal Processing since 1999, and a Guest Editor for the Journal of Video Signal Processing Systems. He is also an Associate Editor of the PROCEEDINGS OF THE IEEE. He was the General Chairman of the 7th VLSI Design/CAD Symposium in 1995 and of the 1999 IEEE Workshop on Signal Processing Systems: Design and Implementation. He is the Past-Chair of the Taipei Chapter of the IEEE Circuits and Systems (CAS) Society, and is a member of the IEEE CAS Technical Committee of VLSI Systems and Applications, the Technical Committee of Visual Signal Processing and Communications, and the IEEE Signal Processing Technical Committee of Design and Implementation of Signal Processing Systems. He is the Chair-Elect of the IEEE CAS Technical Committee on Multimedia Systems and Applications. During 2001–2002, he served as a Distinguished Lecturer of the IEEE CAS Society. He received the Best Paper Award from the R.O.C. Computer Society in 1990 and 1994. Annually from 1991 to 1999, he received Long-Term (Acer) Paper Awards. In 1992, he received the Best Paper Award of the 1992 Asia-Pacific Conference on Circuits and Systems in the VLSI design track. In 1993, he received the Annual Paper Award of the Chinese Engineer Society. In 1996 and 2000, he received the Outstanding Research Award from the National Science Council, and in 2000, the Dragon Excellence Award from Acer. He is a member of Phi Tau Phi.