As 40Gb/s optical communication systems enter the commercial stage, the transceiver, which is a key component of these systems, requires lower power dissipation, a size reduction, and a wider frequency range to meet the requirements of several standards, such as OC-768/STM-256 (39.8Gb/s), OTU-3 (43.0Gb/s), and 4-10GbE-LANPHY (44.6Gb/s). 40Gb/s transceivers have already been reported in SiGe-based technology. However, they dissipate more than 10W in total and do not support 39.8-to-44.6Gb/s wide-range operations [1-2]. There have been recent reports on CMOS transceivers, but their speed performance is still less than 40Gb/s and their output signal suffers from large jitter [3-5]. In this paper, 40Gb/s SFI-5-compliant TX and RX chips in 65nm CMOS technology consume 2.8W each. This low power dissipation allows for a small and low-cost plastic BGA package. The TX has a full-rate clock architecture that is based on a 40GHz VCO, a 40Gb/s retiming D-FF, and 40GHz clock-distribution circuits that lead to a low jitter of 0.57psrms and 3.1pspp at 40Gb/s. A 40/20GHz clock-timing-adjustment circuit based on a phase interpolator is used to ensure wide-range error-free operations (BER < 10-12) at 39.8 to 44.6Gb/s. A quadruple loop architecture is introduced in the CDR circuit of the RX, resulting in a 38Gb/s error-free operation (BER < 10-12) at 231-1 PRBS with a low rms jitter of 210fs in the recovered clock.