As in any other engineering activity, the design of semiconductor chips (ICs) encompasses several separate, but often closely coupled, design activities. Today's system-on-a-chip (SoC) development exemplifies such a design flow, comprising tasks such as design conceptualization, architectural design and optimization, logic synthesis, physical implementation, and design verification. Evolving design tools and design methodologies require changes in software design-tool capabilities to meet current and future chip-design needs.
Chip Complexity and Cost Are Rising
The job of SoC design is becoming far more difficult and more expensive with each generation in process technology. Chip complexity is increasing dramatically on many fronts: clock speed, logic density, core complexity, package pin count, and reduced power consumption all contribute to the manpower and time needed to produce a working chip. Photomasks are a significant factor in chip cost, with mask sets costing around $1M for a chip fabricated in a 130nm CMOS process and increasing with each process advancement. As process nodes shrink, designers must more accurately model physically driven electrical effects that influence performance. Such effects include crosstalk, signal integrity, interconnect and package-pin inductance, and electromigration. In addition, these effects are often interdependent, requiring optimization tools to deal with these performance-limiting factors simultaneously, rather than one at a time.
Chip Design Has Many Facets
When developing SoCs, the chip designer must move between several different design activities, going between several abstraction levels, transitioning from higher to lower design representations, and dealing with increasing design detail along the way. Each design task, such as register-transfer-level (RTL) optimization, logic synthesis, chip place and route (P&R), or logic and timing verification, requires the designer to use one or more design-software packages (EDA tools) to accomplish it (Figure 1). Shifting between design-abstraction levels requires an intricate interface between the EDA tools working at each level to guarantee integrity between the design-data formats each tool uses.
Figure 1: Chip design showing logic and physical flows
One clear line of demarcation within chip-design tasks is the separation of front-end (logic) and backend (physical) design activities. The front-end/backend "barrier" represents an important transition point for the designer, due to the importance of communicating design intent from logical- to physical-design representations. A successful transition from front-end to backend design is necessary for achieving timing closure, that is, meeting the chip's timing constraints without expensive and time-consuming logic-synthesis/physical-implementation iterations.
Working with Synthesis: The Early Days
A critical piece of the EDA-tool suite is logic synthesis, commercially developed in the mid-1980s. Virtually every SoC currently designed uses a logic-synthesis tool to go from an RTL design representation, an architectural design view, to a gate-level representation, a structural design view. Along with conversion from an RTL to a gate-level representation (an RTL to gate-level 'translation'), the logic-synthesis tool also optimizes the chip's design according to design constraints based on timing or area specifications. It is this optimization that has led to a serious problem: a discontinuity in the handoff from logic to physical design.
As originally developed, logic synthesis was a tool for abstracting a gate-level design representation from an RTL representation, according to timing constraints a designer would place on the design. The user of the logic-synthesis tool would identify a library of gate-level logic cells the synthesis tool would target, along with the chip's timing constraints. The tool would generate, or synthesize, a gate-level representation of the design for the process technology of the targeted library that, supposedly, would meet the intended timing constraints for that design.
To map the library into the design to meet timing constraints, the synthesis tool would use estimates of wiring loads on the various design nets, the so-called estimated wireload model, based on statistical wire (interconnect) lengths on each net. Since these net parasitic loads were only estimates, the chip's physical implementation, using P&R software tools, would yield an unpleasant surprise. The estimates were almost always much different from the timing delays the designer would get by extracting the wire lengths (and loads) from the placed-and-routed chip and using this data to simulate the chip's timing behavior. Along with poor initial timing estimation, logic optimization in synthesis also does a poor job handling incremental optimization changes: small changes can result in major design perturbations.
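The gap between a wireload estimate and a post-layout value can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the fanout-to-capacitance table, the lumped RC delay formula, and all numeric values are hypothetical and are not taken from any real cell library or tool.

```python
# Hypothetical statistical wireload table: fanout -> estimated wire capacitance (pF).
# The values are invented for illustration; a real library supplies its own.
WIRELOAD_PF = {1: 0.010, 2: 0.018, 3: 0.027, 4: 0.037}


def estimated_wire_cap_pf(fanout):
    """Look up the statistical wire capacitance, clamping fanout to the table range."""
    lo, hi = min(WIRELOAD_PF), max(WIRELOAD_PF)
    return WIRELOAD_PF[max(lo, min(fanout, hi))]


def net_delay_ns(drive_res_kohm, wire_cap_pf, pin_cap_pf):
    """Lumped RC delay (an assumed simplification): kOhm * pF gives ns."""
    return drive_res_kohm * (wire_cap_pf + pin_cap_pf)


def delay_error(fanout, drive_res_kohm, pin_cap_pf, extracted_wire_cap_pf):
    """Return (estimated delay, post-layout delay, difference) in ns for one net."""
    est = net_delay_ns(drive_res_kohm, estimated_wire_cap_pf(fanout), pin_cap_pf)
    act = net_delay_ns(drive_res_kohm, extracted_wire_cap_pf, pin_cap_pf)
    return est, act, act - est


if __name__ == "__main__":
    # A fanout-2 net whose routed wire turns out much longer than the statistical guess.
    est, act, err = delay_error(fanout=2, drive_res_kohm=2.0,
                                pin_cap_pf=0.020, extracted_wire_cap_pf=0.090)
    print(f"estimated {est:.3f} ns, extracted {act:.3f} ns, error {err:.3f} ns")
```

The table lookup depends only on fanout, so two nets with the same fanout receive the same estimate regardless of where placement and routing eventually put them. That is why the estimate and the extracted value can diverge.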
Non-compliance of chip performance to the timing specification required the designer to modify the design at the RT level, re-synthesize, and then redo the chip's layout, all expensive operations that took huge chunks of time. Adding to the design problem was the reality that you often needed more than one synthesis/P&R reiteration to meet timing specifications, in other words, to reach timing closure. With shrinking time-to-market cycles and profit margins, this design methodology has quickly become unacceptable.
Achieving Timing Closure: First Pass
To help alleviate the problem of poor interconnect timing models based on statistical wire loads, EDA vendors developed some techniques to obtain better timing information prior to chip P&R operations. The first of these techniques involved floorplanning: estimating the juxtaposition of major functional blocks on the physically implemented chip and deriving wireload models from the more accurate wire lengths (when compared to statistical wire lengths) between these blocks. However, including floorplanning in the chip design flow only lessened the timing-closure problem; it did not eliminate it. Designers still found they needed several synthesis/P&R iterations to meet timing specifications for complex designs such as SoCs.
Another approach was to include more "intelligence" in the P&R tools, allowing them to optimize the chip's layout to meet timing and area specifications. However, this has now led to the current problem in achieving first-pass timing closure: lack of a smooth handoff between front-end logic-synthesis and backend physical-design tools. This synthesis/physical-implementation barrier is due to several problems, including lack of a unified data model when moving from front-end to backend tools and an overlap of capabilities between synthesis and physical-design tools.
Synthesis Does Too Much
When originally designed, logic-synthesis tools needed certain optimization capabilities to help meet timing specifications. When synthesizing a structural representation (gate level) of a chip design from its RTL (architectural) representation, a logic-synthesis tool must choose a structure, often from several alternatives, that will meet timing and/or area requirements. The tool does this by optimizing the gate-level design to meet these specifications. The synthesis tool now comprises two sets of capabilities in the logical design space: compiling the gate-level design from an RTL representation, and optimizing the chip to meet design objectives.
However, logic synthesis makes a poor initial estimate of wire delay. When the physical-design tool takes the gate-level netlist from the synthesis tool, it must re-optimize the chip so that it meets design constraints in the physical design space. In other words, the P&R tools must "throw out" the optimization provided by logic synthesis and re-optimize, taking into account the more accurate physical layout. This wastes time in the logic-synthesis operation (optimization dramatically increases run time) and actually hinders the operation of the subsequent physical-implementation tool.
Inserting a software tool between synthesis and physical implementation to more closely couple the logical and physical tools may provide some additional design optimization for small and medium designs (Figure 2). However, this added physical-synthesis tool does not eliminate the need to re-optimize large SoC designs in the physical-layout environment. In addition, a pre-P&R physical-optimization tool may also use a logic-synthesis core, thus just redoing the optimization done by the logic-synthesis tool, which is another waste of time.
Figure 2: Chip design flow including a physical-synthesis tool
To illustrate the complexity and inefficiency of designing a chip using the current logic-synthesis/physical-synthesis/physical-implementation flow, consider a benchmark four-million-gate design that took 26 weeks of design time. Each iteration through this flow requires around 20 intermediate files, comprising command files, scripts, data-translation files, and data files such as netlist, DEF/PDEF, SDF, SPEF, and PLIB. For a 4M-gate design, with 20-30 blocks, that adds up to several hundred files. Adding to the complexity of the accompanying data management are tasks dealing with directory structure, version control, user environments, hardware platform, and design network.
Parasitic extraction on our 4M-gate design creates a 7 Gbyte file that takes 12 hours to read out from an extraction tool and 10 hours to read into a static-timing-analysis tool. A complete iteration through the RTL-to-GDSII flow with commonly used point tools requires over 27 hours of run time and around 15 Gbytes of disk space for all the necessary intermediate files (Figure 3). Finally, to reach timing closure you typically need between 6 and 10 full synthesis/physical-implementation iterations. You can see that a lot of time and effort are wasted due to an inefficient front-end/backend handoff. There are no advantages in having such long file I/O times and large file transfers; they don't add anything to design quality.
Figure 3: Current design flow is marked by large file sizes, long read and write times, and multiple synthesis/P&R iterations
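The benchmark figures above can be combined into a rough tally, sketched here in Python. The inputs are the numbers quoted in the text. The way they are combined is an assumption: the sketch treats the 20 intermediate files as a per-block count and treats 27 hours as a lower bound on tool run time per full iteration.

```python
# Figures quoted in the text for the 4M-gate benchmark design.
HOURS_PER_ITERATION = 27   # "over 27 hours" per RTL-to-GDSII pass, used as a lower bound
ITERATIONS = (6, 10)       # typical number of full iterations to reach timing closure
FILES_PER_BLOCK = 20       # "around 20 intermediate files" (assumed to be per block)
BLOCKS = (20, 30)          # number of design blocks


def span(per_unit, count_range):
    """Scale a per-unit figure by the low and high ends of a count range."""
    return tuple(per_unit * n for n in count_range)


runtime_hours = span(HOURS_PER_ITERATION, ITERATIONS)
file_count = span(FILES_PER_BLOCK, BLOCKS)

if __name__ == "__main__":
    print("tool run time to closure (hours):", runtime_hours)
    print("intermediate files to manage:", file_count)
```

Under these assumptions the tool run time alone comes to at least 162 to 270 hours, and the file count of 400 to 600 is consistent with the "several hundred files" cited above.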
Another problem with a physical-synthesis tool following logic synthesis is that such a tool is usually concerned only with timing specifications or, at most, also with some types of crosstalk optimization. Physical synthesis does not adequately address crosstalk and signal-integrity problems, since these problems are very layout dependent and the physical-synthesis tool does not have knowledge of the final chip layout. Furthermore, physical-synthesis tools do not address problems caused by interconnect inductance and do not understand the complex layout design-rules associated with complex SoCs. Even with physical synthesis, design issues such as these result in difficult-to-solve front-to-back design problems.
Size Is Important
Besides the issues already described concerning the handoff from traditional synthesis to physical layout, another problem with popular synthesis tools concerns tool capacity. A logic-synthesis tool can handle designs only up to a limited size. Large chip size, even down at the few-hundred-thousand gate level, requires you to partition the design into several blocks, with each block within the capacity limitation of the logic-synthesis tool. Current SoC sizes at 10 million gates and higher may require dozens of partitions, leading to problems reconciling the timing budgets between these blocks when considering overall chip timing. This is true even when the design includes several pre-designed silicon-IP cores.
Design partitioning requires guard-banding performance. This means that you must over-constrain interfaces to allow back-end tools to account for inaccuracies in front-end tools. You also have to define "registered" interfaces to allow constraints to be set, which results in extra logic in the design. In addition, you must derive the constraints and keep them consistent with any change in design-block constraints.
The need for a large number of design partitions places an additional resource load on both the design team and on logic-synthesis and other design tools. Partitioning adds complexity to resource allocation for the various serial and parallel design activities, and in accounting for the interdependencies between these activities. Reducing the number of partitions decreases the cost of design development and EDA-tool licenses, reduces time-to-market due to decreased tool run times and chip design time, and improves quality of results.
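As a rough illustration of guard-banding, the following Python sketch splits a clock period among the blocks a path crosses after reserving a margin. The proportional split, the 10% default margin, and the block names are all hypothetical choices made for this example, not a method described in the article or used by any particular tool.

```python
def block_budgets(clock_period_ns, shares, guard_band=0.10):
    """Split a clock period among blocks on a path, reserving a guard band.

    shares maps a block name to its relative share of the path delay.
    guard_band is the fraction of the period held back as margin (assumed value).
    """
    if not 0.0 <= guard_band < 1.0:
        raise ValueError("guard_band must be in [0, 1)")
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive value")
    usable = clock_period_ns * (1.0 - guard_band)
    return {name: usable * share / total for name, share in shares.items()}


if __name__ == "__main__":
    # Hypothetical path crossing three blocks with a 10 ns clock period.
    budgets = block_budgets(10.0, {"cpu": 2, "bus": 1, "mem": 1}, guard_band=0.2)
    for name, ns in budgets.items():
        print(f"{name}: {ns:.2f} ns")
    print(f"margin held back: {10.0 - sum(budgets.values()):.2f} ns")
```

The margin held back is time that no block is allowed to use, which is the performance cost of over-constraining the interfaces. Changing one block's share changes every other budget, which hints at why keeping the constraints consistent across many partitions is laborious.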
Timing Isn't Everything
Several EDA tools emphasize timing closure, but the real issue in chip design is reaching design closure. Meeting timing specifications is only one of many design goals. You also have to meet specifications dealing with power dissipation, signal integrity, antenna effects, and packaging and design-for-manufacturing (DFM) constraints.
The design-tool flow must deal with all of these constraints, not just those concerned with on-chip timing, at all phases of the design, especially at the front-end. Current design flows do not do a good job with this, requiring expensive design modifications late into the design.
What Is Needed
It is clear that the front-end tools and methodologies designers currently use have several drawbacks:
- They are inadequate for today's multimillion-gate SoCs
- They cannot adequately account for the process and physical details of ever-shrinking process technologies
- They hinder, rather than help, designers in achieving first-pass silicon success.
Overcoming the front-end (synthesis)/back-end (P&R) discontinuity requires several changes in the way chip EDA tools operate and how they interface with each other. One important change would be a logic-synthesis tool that does not have more capability than it needs to have to work with the newer, more efficient physical-implementation tools. For example, a front-end tool flow, based on logic synthesis, would comprise tools for tasks such as:
- Checking RTL code (linting)
- Selecting the best architecture to meet design specifications
- Logic optimization
- Technology mapping and library selection
- Inserting design-for-test (DFT) scan logic
- Performing feasibility analyses (area, timing, power, routing congestion).
The logic-synthesis tool would not do irreversible logic optimization to meet timing or area constraints; this is a task for the back-end P&R tool, which also does clock-tree insertion, design-block placement, and both global and detailed routing. Other back-end tasks that a physical-implementation tool should address include delay extraction, reliability analysis, and power and signal integrity based on a fully wired chip layout.
Front-end tools are critical for providing RTL synthesis along with design performance and chip-size estimates prior to the chip's physical implementation (a virtual prototype of the design), with fast area and timing estimates and an RTL-to-netlist flow that takes just a few hours, not days. It is at this point that the design can be handed off to an intelligent P&R tool for optimization and successful physical implementation.
SoC technology has changed. Along with this change is a need to review current EDA-tool capabilities and design methodologies. We need to make sure that the tools a chip designer uses provide an optimum environment for efficient and accurate design, even if that means changing the tools to meet advanced technology requirements.
About the Author
Jim Lipman is a consultant providing marketing, writing, and other electronics industry services, specializing in EDA tools and ASIC/SoC design methodologies. His job experience includes chip-design R&D, marketing, marcom, technical editing, and on-line publishing of technical content for engineers.