Under the Hood
December 04, 2002

Software Is Key to Choosing Dual- or Single-Processor SoCs

Jack Shandle
TechOnline

It's no secret that the cost of developing application software, and the time it takes, are having a growing impact on SoC designs. However, that consensus often does not extend to the critical architectural choices made early in the design cycle.

In particular, the choice between using DSP and RISC cores in tandem and getting by with a single core should treat software development as a major consideration. Because a growing number of applications require DSP functionality, many more design teams will confront the dual-core/single-core question in the next few years.

There is no easy answer; even experts disagree. Issues that belong in the design matrix include:

  • Number-crunching performance
  • Memory utilization
  • Estimating the sheer amount of code you must develop
  • Coping with two operating systems and their corresponding tools sets
  • Assuring the availability of software components such as signal-processing algorithms
  • Accurately determining how much assembly language code is needed to optimize performance
  • Tool integration
  • Risk tolerance

Computation Requirements
The obvious place to start is by assessing number-crunching requirements. If a RISC core with DSP extensions can't handle the computation load, the team has several options: it can add a coprocessor, replace the RISC core with a DSP core, or add a DSP core alongside the RISC core. Choosing the right option requires a careful analysis of the application, says Jeff Bier, general manager of Berkeley Design Technology (BDTi). Typical DSP applications consist of multiple functions, he says, such as MP3 decoding or speech compression. Each such function in turn typically consists of multiple algorithms; MP3 decoding, for example, includes a discrete cosine transform, filters, and so on. One aspect of choosing a processor architecture is determining how frequently each function runs under various operating conditions. A technique called application profiling is used to derive function performance from algorithm benchmark results. This technique is described in greater detail later.

In virtually all cases, you need to write some signal-processing code in assembly language. Therefore, beyond the top-level consideration of matching processor performance to the application's feature set, the design team has two additional tasks: first, estimating the number of lines of assembly language that must be written for the single- and dual-core solutions; second, estimating the impact that developing this code will have on the length of the design cycle. Beyond that, the team must determine whether the needed algorithms are available off the shelf (but customizable) for the single-core solution.

Ready to take the dual-core option as a matter of course? Think again. Even in a thoughtfully partitioned SoC, a dual-core architecture introduces a significant layer of complexity, driven largely by inter-processor communication. Inter-processor synchronization and communication must be planned carefully in both the hardware and the software design.
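To make the communication burden concrete, here is a minimal sketch of one common pattern: a single-producer, single-consumer mailbox (ring buffer) through which the RISC core posts messages to the DSP. It is written in Python purely for illustration, and the class and method names are hypothetical. On real silicon the buffer and its two indices would sit in shared memory, and each core would need memory barriers (and possibly cache maintenance) so that the payload becomes visible before the index update; this model ignores those details.

```python
class Mailbox:
    """Single-producer/single-consumer ring buffer (hypothetical sketch).

    The producer (e.g. the RISC core) only writes `head`; the consumer
    (e.g. the DSP) only writes `tail`. Keeping each index owned by one side
    is what lets a real implementation avoid a lock.
    """

    def __init__(self, capacity):
        # One slot is left unused so "full" and "empty" can be told apart.
        self.buf = [None] * (capacity + 1)
        self.head = 0  # next slot to write (producer side)
        self.tail = 0  # next slot to read (consumer side)

    def send(self, msg):
        nxt = (self.head + 1) % len(self.buf)
        if nxt == self.tail:
            return False            # full: producer must retry or wait
        self.buf[self.head] = msg   # write the payload first...
        self.head = nxt             # ...then publish it by advancing head
        return True

    def receive(self):
        if self.tail == self.head:
            return None             # empty: nothing posted yet
        msg = self.buf[self.tail]
        self.tail = (self.tail + 1) % len(self.buf)
        return msg
```

A mailbox created with `Mailbox(2)` accepts two messages, rejects a third until the consumer drains one, and returns messages in the order they were sent. Even this simple structure forces decisions (what happens when the buffer is full, who owns which index) that a single-core design never has to make.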

Staffing presents still another issue. DSP code developers are typically EEs who understand algorithms and hardware implications. They are also a small group compared to the available pool of RISC software developers.

Decision Criteria
Brian Carlson, DSP marketing manager for LSI Logic, has seen many design teams struggle with the dual-core/single-core dilemma. He agrees with Bier that the decision often comes down to whether it is feasible to have one design team or two. "If you have to combine these two types of guys on the same processor and do both types of processing on it, that quickly becomes more of a headache than if you have two cores."

On the other hand, Carlson does not think that a dual-core system typically entails significantly more code development, or more difficult code development, than a single-core system. "Unless there is a lot of inter-processor communication code to develop where you get into things like task switching, task synchronization and hard real-time execution," he says, "the single-core system could involve more code development because it's doing multiple other things besides the DSP functions."

The criteria Carlson has seen emerge combine performance, cost and power sensitivity, scalability requirements, and a determination of how tightly coupled the two cores have to be (Figure 1). Dual-core projects tend to have high performance requirements, relatively low cost and power sensitivity, loosely coupled processors, and a high scalability requirement.


Figure 1:  Applications go a long way in determining whether the design should have one processor core or two

LSI Logic's answer to the dual-core/single-core dilemma is to provide a unified development tool set, RTOS options, and other components that, at the very least, make the dual-core option less expensive. "It used to be if you were developing a dual-core," Carlson says, "you'd have two different sets of tools, two sets of controllers, and so forth. In other words, you have twice the expense." That's no longer the case with LSI's ZSP cores, and other companies are moving in the same direction.

High Abstraction Helps
From a software vendor's perspective, still more considerations come into play. Somewhat surprisingly, Gareth Noyes, corporate alliance manager at Wind River Systems, sees more advantages in the dual-core solution.

"DSP is nearly always autonomous," he says, which, in Carlson's terminology, means loosely coupled. Although code development can proceed virtually in parallel, debugging typically requires a common session. Noyes, who managed the launch of Wind River's VSPWorks product for developing signal-processing-intensive applications, contends that by raising software development to a higher level of abstraction with callable C libraries, "many clever things can be done that can't be with native DSP code."

An example is VSPWorks, in which inter-processor communication is handled by the operating system. Even the DSP kernel, which must be small and run in very limited memory, is treated as part of the OS (a microkernel). This approach leads back to two questions: How much middleware do you need (GUI code, network protocols, and so on for the CPU, and signal-processing algorithms for the DSP)? And can you get it off the shelf?

Nick Lethaby, DSP/BIOS Product Manager at Texas Instruments, agrees that two teams can develop the control and DSP software "in isolation but the inter-processor visibility comes in when you bring them together and see how they interact in real time." That's accomplished with an emulator that controls both the CPU and the DSP cores simultaneously.

Providing a common interface to set breakpoints on both the CPU and the DSP is much less important than the ability to simultaneously halt both processors to view precisely how they are interacting, he says.

Debugging Techniques
Debugging the application software that runs on SoCs is tricky, time-consuming, and error-prone because on-chip signals are inaccessible to the physical probes that software developers typically use for board-level debugging.

There is a solution, however, based primarily on the JTAG specification, IEEE Standard 1149.1 (Figure 2). Although JTAG was created as a means of placing scan registers on a chip's I/O lines, nothing in the specification prevents the insertion of scan registers anywhere in the logic. By inserting them in multiple internal signal lines, designers can apply specific data and read results inside the chip.


Figure 2:  The JTAG scan register interrupts a signal line with circuitry that allows it to capture the incoming signal, substitute a test signal, and move data to other scan registers in a serial chain

You can generate plenty of debugging data this way, but there is still just one set of I/O pins through which to access it. This means the results from every internal scan register must be daisy-chained together and then disassembled by the debugging host. The more scan registers, the more data to process.

"Cores make it more complicated," says BDTi's Bier, "because you typically have to get all your information about what's going on in the chip from one set of I/O pins. You may pull out a hundred thousand bits and you have to figure out which four or eight bits are the ones you want to analyze."
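The host's job of finding "the four or eight bits" amounts to a lookup into a map of the chain. In the sketch below, the register names, their offsets and widths, and the bit-ordering convention (the first bit of a field in the captured stream is its least significant bit) are all assumptions for illustration; in practice the map would come from the design's scan-insertion output.

```python
# Hypothetical chain map: register name -> (bit offset in captured stream, width).
CHAIN_MAP = {
    "dsp_status": (5, 4),
    "risc_irq_pending": (9, 8),
}


def extract_field(scan_bits, offset, width):
    """Return `width` bits starting at `offset` as an integer.

    Assumes the first bit of the field in the stream is its least
    significant bit; a real chain may use the opposite convention.
    """
    if offset < 0 or width <= 0 or offset + width > len(scan_bits):
        raise ValueError("field lies outside the captured scan stream")
    value = 0
    for i in range(width):
        value |= (scan_bits[offset + i] & 1) << i
    return value


def read_register(scan_bits, name, chain_map=CHAIN_MAP):
    """Look up a named register in the chain map and decode its bits."""
    offset, width = chain_map[name]
    return extract_field(scan_bits, offset, width)
```

Keeping the layout in a table rather than in the decoding logic matters because, as noted below, successive design iterations may reorder the chain; only the map should have to change.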

Complexity Rules
The fundamental problem associated with scan-based debugging is dealing with complexity. Every instance of register loading involves three steps: shifting the right number of bits into the scan chain at the right place so that they reach the right scan register; issuing a command to transfer the data into the scan register; and issuing a follow-up command to transfer the data into the processor's register. To recover data from the processor's registers, similar steps are followed in the opposite order.
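The three loading steps can be modeled with a toy scan chain. This is a hypothetical simulation, not a JTAG driver: the class, the function names, the processor-register dictionary, and the convention that index 0 is the end of the chain nearest the scan input are all assumptions made for the example.

```python
class ScanChain:
    """Toy model of daisy-chained scan registers (hypothetical).

    Index 0 of `shift` is the end nearest the scan input; the last index
    feeds the scan output.
    """

    def __init__(self, widths):
        self.widths = list(widths)             # bits per scan register, in chain order
        self.shift = [0] * sum(self.widths)    # serial shift stage for the whole chain
        self.latched = [0] * len(self.widths)  # parallel (updated) value of each register

    def offset(self, index):
        return sum(self.widths[:index])

    def shift_in(self, bits):
        # One clock per bit: the new bit enters at index 0, the last bit falls off the end.
        for b in bits:
            self.shift = [b & 1] + self.shift[:-1]

    def update(self):
        # Transfer shifted bits into every scan register (bit j of a field = value bit j).
        for i, w in enumerate(self.widths):
            pos = self.offset(i)
            field = self.shift[pos:pos + w]
            self.latched[i] = sum(bit << j for j, bit in enumerate(field))


def load_register(chain, index, value, cpu_regs, reg_name):
    """Load `value` into a processor register through scan register `index`.

    Bits of `value` beyond the scan register's width are silently dropped.
    """
    # Step 1: build an image of the whole chain and clock it in. The first bit
    # clocked in travels farthest, so the image is sent in reverse order.
    image = [0] * len(chain.shift)
    pos = chain.offset(index)
    for j in range(chain.widths[index]):
        image[pos + j] = (value >> j) & 1
    chain.shift_in(reversed(image))
    # Step 2: command the transfer from the shift stage into the scan registers.
    chain.update()
    # Step 3: command the transfer from the scan register into the processor register.
    cpu_regs[reg_name] = chain.latched[index]
```

For example, with a chain of 8-, 16-, and 4-bit scan registers, loading `0xBEEF` through the middle register requires clocking all 28 bits, even though only 16 of them carry the value. This simplified model overwrites every other register in the chain with zeros; real tools must preserve or bypass those registers, which is part of the bookkeeping that makes scan-based debug complex.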

Additional complications arise because successive design iterations do not necessarily have the same internal scan chain, especially if the SoC design team allows place-and-route tools to create the chain. Another drawback of scan-based debug is that the processor must be stopped in order to access memory or registers, set breakpoints, or change data values in memory. In some instances, stopping the processor renders the test useless, so another debug technology, background debug mode (BDM), is used to complement JTAG scan. BDM can generally perform these activities without stopping the processor. BDM also uses an instruction set, rather than bit manipulation, to probe processors, memory, and peripherals.

Not surprisingly, tools are available to help software developers manage these problems. Tools from Green Hills Software, Wind River Systems, and other vendors make the complexities just described practically transparent to the developer. Not least among their advantages, they allow developers to use high-level languages and to debug at the source-code level rather than in native machine code.

Software Probes
Combining scan-based debugging with BDM and newer techniques such as "software probes," available from several vendors, provides a complete debugging environment. In basic scan-based debugging, everything passes through the host's parallel port. Probes, on the other hand, speed up the process by using specialized logic in the target that controls and clocks the scan chain. They also use a higher data-rate connection, such as Ethernet, between the host and the target.

"The Green Hills Probe, for example, can achieve code/data download speeds in excess of 500 Kbytes/s," says John Carbone, vice president of marketing at Green Hills.

Dual-processor SoCs present an even more complex debugging environment, says Craig Franklin, Green Hills Software's vice president of engineering. In his view, the same development tool should be capable of debugging code running on both the RISC and DSP cores. The value of this approach is that the tool can halt both processors simultaneously when either one hits a breakpoint in its code. It can also create a trigger in another part of the SoC to halt the processors so that the developer can examine the data point and the code that controls it, he says.
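The cross-triggering behavior Franklin describes can be sketched as a debugger loop that halts every core as soon as any one of them reaches a breakpoint. This toy model, with hypothetical class and function names, assumes the cores advance in lockstep and that a halt takes effect in the same cycle. Real cores run at different clock rates and have some halt latency, which is exactly what on-chip cross-trigger logic tries to minimize.

```python
class Core:
    """Minimal stand-in for a processor core under debug (hypothetical)."""

    def __init__(self, name, breakpoints):
        self.name = name
        self.pc = 0                      # program counter, advanced one step per cycle
        self.breakpoints = set(breakpoints)
        self.halted = False

    def step(self):
        if not self.halted:
            self.pc += 1

    def at_breakpoint(self):
        return self.pc in self.breakpoints


def run_with_cross_trigger(cores, max_cycles):
    """Advance all cores in lockstep; when any core reaches a breakpoint,
    halt every core in that same cycle.

    Returns (cycle, names of cores that hit), or (None, []) if none hit.
    """
    for cycle in range(max_cycles):
        for core in cores:
            core.step()
        hit = [core.name for core in cores if core.at_breakpoint()]
        if hit:
            for core in cores:
                core.halted = True       # cross-trigger: stop every core, not just the one that hit
            return cycle, hit
    return None, []
```

With a RISC breakpoint at step 5 and a DSP breakpoint far later, both cores stop at step 5, so the developer can inspect the DSP's state at the moment the RISC code reached its breakpoint rather than some unknown number of cycles afterward.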

Application Profiling
SoC design teams faced with a project that includes both a RISC and a DSP core are almost certain to be more familiar with RISC architectures and developing software for them. But tools and techniques that exist almost entirely in the DSP space can be invaluable in the critical assessment of whether a DSP core has the right stuff to handle the applications that will run on it.

One such technique is application profiling, a very practical use of performance benchmarks generated either internally or by consultants such as BDTi.

Application profiling estimates the number of times key algorithms are executed when an application is run. This is straightforward when the application is written in C because algorithm kernels can be identified as subroutines. If only assembly code is available, it can be run on an instruction set simulator that has profiling capabilities, says BDTi's Bier.
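When the application is written in C and the kernels are identifiable subroutines, collecting the counts amounts to counting calls to those subroutines. The Python sketch below shows the idea with a counting decorator; the kernel names and the frame-processing chain are hypothetical stand-ins. On a real target, the same counts would come from a profiler or, for assembly code, from an instruction set simulator with profiling support, as Bier describes.

```python
from collections import Counter
from functools import wraps

call_counts = Counter()  # kernel name -> number of calls observed


def profiled(func):
    """Wrap a kernel so every call increments its entry in call_counts."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


@profiled
def fir_filter(samples, taps):
    # Direct-form FIR, "valid" outputs only: y[i] = sum_k taps[k] * x[i + n - 1 - k].
    n = len(taps)
    return [sum(taps[k] * samples[i + n - 1 - k] for k in range(n))
            for i in range(len(samples) - n + 1)]


@profiled
def decimate(samples, factor):
    # Keep every `factor`-th sample; the preceding FIR acts as the anti-alias filter.
    return samples[::factor]


def process_frame(frame, taps):
    # Hypothetical signal chain: filter, decimate by 2, filter again.
    return fir_filter(decimate(fir_filter(frame, taps), 2), taps)


if __name__ == "__main__":
    taps = [0.25] * 4
    for _ in range(10):                  # ten frames of a hypothetical input
        process_frame([1.0] * 32, taps)
    print(dict(call_counts))
```

Running ten frames records two `fir_filter` calls and one `decimate` call per frame; multiplying such counts by the expected frame rate gives calls per second for each kernel.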

The computational demands that the application places on the DSP core under various use scenarios can be estimated by combining standard benchmark data for kernels such as FFTs and filters with the application profile. This is done by multiplying each kernel's benchmark execution time by the number of times it runs in a given application-use scenario and summing the results.
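Written out, the estimate for one scenario is the sum over kernels k of (cycles per call of k) × (calls per second of k), which gives the clock rate in cycles per second that the DSP must sustain. All kernel names, cycle counts, call rates, and the 100 MHz clock below are invented for illustration and are not BDTi benchmark results. Because Bier cautions that a profile describes one point in time, the sketch evaluates every scenario and sizes the core for the heaviest one.

```python
# Hypothetical benchmark data: cycles per call for each kernel on a candidate DSP.
BENCHMARK_CYCLES = {
    "fft_256": 3_000,
    "fir_block": 9_000,
    "biquad_block": 1_500,
}

# Hypothetical application profiles: calls per second of each kernel, per scenario.
SCENARIOS = {
    "idle": {"fir_block": 100},
    "playback": {"fft_256": 1_000, "fir_block": 2_000, "biquad_block": 2_000},
}


def required_mhz(benchmarks, profile):
    """Sum of (cycles per call x calls per second) over all kernels, in MHz."""
    return sum(benchmarks[kernel] * calls for kernel, calls in profile.items()) / 1e6


def worst_case(benchmarks, scenarios, clock_mhz):
    """Return (scenario, required MHz, fraction of the core's clock) for the heaviest scenario."""
    loads = {name: required_mhz(benchmarks, prof) for name, prof in scenarios.items()}
    name = max(loads, key=loads.get)
    return name, loads[name], loads[name] / clock_mhz


if __name__ == "__main__":
    for name, prof in SCENARIOS.items():
        print(f"{name}: {required_mhz(BENCHMARK_CYCLES, prof):.1f} MHz")
    print(worst_case(BENCHMARK_CYCLES, SCENARIOS, clock_mhz=100.0))
```

With these made-up numbers, playback needs 24 MHz of the DSP (3 + 18 + 3 million cycles per second), about a quarter of a 100 MHz core, while idle needs under 1 MHz. The same per-kernel breakdown shows which kernel (here the FIR) dominates and is therefore the first candidate for hand-written assembly.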

 

For more information, read Application Profiling: One Processor or Two? by Jeff Bier and Adam Lins of Berkeley Design Technology.
 
It's important to remember, says Bier, that "application profiles are for one point in time and the processing load often changes from time to time. A PDA, for example, interacts with application traffic in a number of ways, but an MP3 player is either on or off." (See sidebar article: Application Profile: One Processor or Two?)

Besides being useful in determining if the core has the right stuff to handle the application, profiling can help the design team decide which parts of the application should be coded in assembly language for high-performance processing. Profiling can also indicate how the total computational load should be split between the DSP and RISC cores.

"It is equally relevant," says Bier, "to consider a memory profile, which interacts with the application profile in a number of ways." It helps the design team estimate how much off-chip memory is required by the DSP core. "If an application can't run without accessing off-chip memory," says Bier, "there's a price to be paid. External memory accesses slow down the processor, for example, and drive I/O pins that have implications for the design's power budget."
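A memory profile can be folded into the same kind of estimate. The sketch below uses hypothetical buffer sizes, a hypothetical on-chip capacity, and a fixed per-access stall penalty to check whether the DSP's working set fits on chip and, if it does not, how much external-memory stalls inflate the cycle count. A real memory system (caches, DMA, burst accesses) would need a more detailed model, and the power cost of driving external I/O pins is not captured here at all.

```python
def memory_fit(buffers, onchip_bytes):
    """buffers: name -> bytes in the DSP's working set.

    Returns (fits_on_chip, bytes that would spill to external memory).
    """
    total = sum(buffers.values())
    return total <= onchip_bytes, max(0, total - onchip_bytes)


def stall_adjusted_cycles(compute_cycles, offchip_accesses, stall_cycles_per_access):
    """Cycle estimate assuming each external access stalls the core for a
    fixed number of extra cycles (a deliberately simple, hypothetical model)."""
    return compute_cycles + offchip_accesses * stall_cycles_per_access


if __name__ == "__main__":
    # Hypothetical working set checked against a hypothetical 64 KB on-chip RAM.
    working_set = {"coeff_tables": 32_768, "frame_buffers": 16_384, "history": 32_768}
    fits, spill = memory_fit(working_set, onchip_bytes=65_536)
    print(fits, spill)
    # 24 million compute cycles/s plus 500,000 external accesses/s at 10 stall cycles each.
    print(stall_adjusted_cycles(24_000_000, 500_000, 10))
```

In this made-up case, 16 KB of the working set spills off chip, and the external accesses add 5 million cycles per second, roughly a 20% increase over the compute-only estimate.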

Trend Favors Dual-Core Solutions
As with most hardware/software integration problems, the trend is toward delivering solutions that are as close to complete as possible. At Texas Instruments, Lethaby says, the conclusion is that software complexity will increase dramatically in many SoC products.

"A 3G modem is four to five times more complex than a GSM modem," he says, adding that smart phones will also include calendars, games, digital cameras, and who knows what else, all increasing software's role in the solution. As a result, TI's goal is not just to provide the hardware but also to increase the amount of software it provides to customers by 10% each year.

Beyond supplying an ever-increasing number of software plug-ins, TI has built an architecture into its OMAP processor that makes integration easier (Figure 3). For example, OMAP provides dedicated hardware for microprocessor-DSP communication. This makes it possible to ship a driver and higher-level communication services (such as data streaming and DSP boot loading) with the chip, eliminating several months of work for the end customer, Lethaby says.


Figure 3:  Texas Instruments' OMAP architecture provides hardware support for inter-processor communication that, in turn, makes software integration easier

One strong economic driver is the escalating cost of integrating development tool chains, middleware, RTOSes, and other components. When Wind River surveyed its 10 largest customers, for example, it found that 50% of their composite R&D budget was being spent on tool and component integration; this figure covers both SoC and board-level development.

In early November, both Green Hills and Wind River announced fully integrated development suites and new pricing models to go with them. Wind River offers a choice of development platforms, each addressing a market segment: consumer, industrial, network equipment, server appliances, and aerospace/defense. Green Hills took a one-size-fits-all approach with its TotalDeveloper package, which includes an RTOS, C/C++ compilers, a source-level debugger, networking protocols, a hardware debug probe, development and analysis tools, on-site engineering services, and premium technical support.

The new offerings are nothing less than a paradigm shift, according to Paul Zorfass, senior analyst for International Data Corp. But as quickly as software tool chains move to higher levels of abstraction, the complexity of SoC design seems always to be one step ahead.


About the Author

Contributing writer Jack Shandle is a former chief editor of both Electronic Design magazine and ChipCenter.com. He holds a BSEE degree and has written hundreds of articles on all aspects of the electronics OEM industry. Jack is president of eContentWorks, a consultancy that creates high-value content for publishers, eOEM corporations, and industry associations. His email address is jshandle@earthlink.net.

 