The basics of using an enhanced TPU
Michal Hanak, Milan Brejl, Freescale Semiconductor
Embedded.com
The advanced timer modules available on a number of advanced processors,
such as the TPU and enhanced TPU (eTPU) on the ColdFire and PowerPC
CPUs, are powerful event-driven controllers that are useful in a wide
range of engine and machine control applications.
While their main purpose is to off-load the main CPU when processing
time-triggered or external pin-triggered events, and when controlling
the output pins in a timely manner, these event-driven timer modules
can also be used to configure an on-chip edge-driven "logic analyzer"
device.
Although programming an event-driven eTPU requires a slightly
different approach compared to standard microprocessor units, it is not
as complicated as it may seem at first sight.
In this article, we will discuss the little bit of low-level
programming needed to use the TPU in this way, using the MCF5235
ColdFire processor, and its eTPU and Fast Ethernet Controller.
Mostly for demonstration purposes, we will use the uClinux operating system
and its TCP/IP networking to implement communication between the device
and the host PC. The business card-sized evaluation board M5235BCC was
selected as the development platform (see Figure 1, below).
While the Logic Analyzer built on the eTPU and ColdFire MCF5235 cannot
compete with today's high-end analyzer devices, it has proven to be
useful in the development of many different applications as a zero-cost
replacement for such expensive equipment. We also see some potential for
this eTPU-based Logic Analyzer when developing and tuning other
eTPU-based applications, simply because it shows all the signals as
seen through eTPU eyes.
Figure 1: In addition to a 16-channel eTPU connector, the M5235BCC development
board contains an MCF5235 ColdFire processor, 16MB RAM / 2MB Flash, a
10/100 Ethernet transceiver, a BDM connector, and an RS232 console
connector.
Edge-driven Logic Analyzer
So, what is this edge-driven logic analyzer that we are going to build?
In short, it is a device that should behave like a standard logic
analyzer by means of measuring multiple digital input signals, saving
them and displaying the resultant waveforms on the screen.
Unlike the classic logic analyzer, which samples the input signal
states periodically at very high rates and saves the data to memory,
the event-driven device sits idle, waiting for an edge on any of its input
signals. Each such edge causes the value of a global free-running timer
to be saved in memory, together with the channel on which the edge
occurred and the polarity of the edge. Using such data, the
input signals can be fully reconstructed and the resultant waveforms
can be drawn, similar to a standard logic analyzer device.
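The reconstruction step can be sketched in C as follows. This is only an illustration: the edge_t structure and the level_at() function are hypothetical names, not part of the eTPU function or its API, and the sketch assumes the records are sorted by time and that the level of each channel before the first record is known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One recorded edge. Hypothetical in-memory form, not the eTPU layout. */
typedef struct {
    uint32_t time;     /* free-running timer value at the edge */
    uint8_t  channel;  /* input channel on which the edge occurred */
    uint8_t  rising;   /* 1 = low-to-high, 0 = high-to-low */
} edge_t;

/* Return the level (0 or 1) of `channel` at time `t`, given edge records
 * sorted by time and the level the channel had before the first record. */
int level_at(const edge_t *edges, size_t count, uint8_t channel,
             uint32_t t, int initial_level)
{
    int level = initial_level;

    assert(edges != NULL || count == 0);

    /* Replay every edge up to and including time t. */
    for (size_t i = 0; i < count && edges[i].time <= t; i++) {
        if (edges[i].channel == channel)
            level = edges[i].rising ? 1 : 0;
    }
    return level;
}
```

Drawing a waveform then amounts to evaluating the level between consecutive edge times of each channel, so no periodic samples are needed.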
Of course, drawing the signal waveforms is not the only feature
users look for in a logic analyzer. Today, modern devices offer a
wealth of signal analysis options, as well as advanced methods to
"trigger" the signal sampling to "catch" the moment of interest. On the
other hand, I personally worked on many applications where a simple
drawing of the signal waveforms, with a simple "wait-for-edge" trigger,
helped a lot in successfully completing the development.
The logic analyzer we are about to design does not aspire to replace
high-end devices. In the first place, this application should
demonstrate the capabilities of the eTPU unit and perhaps also free
it from the shackles of engine control. We hope that after reading this
article, you will realize that using a processor with an eTPU module may be
a reasonable single-chip solution in applications where the processor
requires some external logic or a PLA device to handle input and output
signals.
eTPU Logic Analyzer Function
The eTPU function Logic Analyzer (LA), assigned to 16 eTPU inputs,
utilizes the eTPU hardware for capturing input transitions and storing
their times to RAM.
The eTPU module (Figure 2, below)
has a micro-engine, which serves channel events. Each of the eTPU
channels includes double input capture hardware. When a transition is
detected on a channel, a 24-bit transition time is captured to the
channel capture register. At the same time, the channel asks the engine
to service this capture event.
If the engine is idle, it starts the service immediately. If the
engine is busy servicing another channel, the service is delayed. If a
second transition is detected on the same channel prior to servicing
the first, the second transition time is captured to the second capture
register available on the channel. This enables the capture of very
narrow pulses.
The main goals of the channel event service are storing a transition
record to RAM and releasing the capture hardware for following
transitions. The transition record is a 32-bit word consisting of the
following fields:
* transition time (24 bits)
* channel number (4 bits)
* transition polarity (1 bit)
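As an illustration, the three fields could be packed into and extracted from a 32-bit word as sketched below. The article does not give the bit positions, so the layout used here (time in the low 24 bits, channel above it, polarity in bit 28) is an assumption, and the rec_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (not confirmed by the article):
 *   bits  0..23  transition time
 *   bits 24..27  channel number
 *   bit  28      polarity (1 = low-to-high)
 *   bits 29..31  unused in this sketch                */
#define REC_TIME_MASK   0x00FFFFFFu
#define REC_CHAN_SHIFT  24
#define REC_CHAN_MASK   0x0Fu
#define REC_POL_SHIFT   28

/* Build one transition record from its three fields. */
uint32_t rec_pack(uint32_t time24, unsigned channel, unsigned rising)
{
    assert(channel <= REC_CHAN_MASK);
    return (time24 & REC_TIME_MASK)
         | ((uint32_t)channel << REC_CHAN_SHIFT)
         | ((uint32_t)(rising & 1u) << REC_POL_SHIFT);
}

/* Field accessors. */
uint32_t rec_time(uint32_t rec)    { return rec & REC_TIME_MASK; }
unsigned rec_channel(uint32_t rec) { return (rec >> REC_CHAN_SHIFT) & REC_CHAN_MASK; }
unsigned rec_rising(uint32_t rec)  { return (rec >> REC_POL_SHIFT) & 1u; }
```

With 24 + 4 + 1 = 29 bits in use, three bits of each word remain free for other purposes.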
One or more records can be stored to memory within one service. If
one transition is captured in the channel hardware, one record is
stored. If both channel capture registers hold transition times, two
records are stored. Moreover, the polarity of the input signal can
indicate that a third transition came, the time of which is not
captured. In this case, one additional record is stored, using the
current time instead of the lost transition time.
The transition records are stored in a circular buffer in the eTPU
Data RAM. Within the 1.5kB of eTPU Data RAM available on the MCF5235,
1kB is allocated as the buffer. This enables the storage of up to 256
transitions without CPU service. The eTPU buffer has two threshold
levels, one at the middle and one at the end of the buffer. Initially,
the buffer is filled from the bottom, checking for the first
threshold level.
When this level is reached, an interrupt is generated to the CPU.
The CPU needs to copy the first half of the eTPU buffer to a large CPU
buffer. Meanwhile, the eTPU uses the second half of the buffer to
store the following transition records. When the second eTPU buffer
threshold is reached, another interrupt is generated to the CPU so
that it copies the second half of the eTPU buffer. By then, the first half
of the eTPU buffer has already been copied, so the eTPU continues to
store from the bottom of the buffer again.
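The CPU side of this double-threshold scheme can be sketched as below. The sketch is a simplification under stated assumptions: cpu_buf_t and copy_half() are hypothetical names, the eTPU buffer is modeled as an ordinary array rather than memory-mapped eTPU Data RAM, and interrupt handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETPU_BUF_WORDS  256u                  /* 1kB of 32-bit records */
#define ETPU_HALF_WORDS (ETPU_BUF_WORDS / 2u) /* records per threshold */

/* Large CPU-side buffer that collects the copied halves. */
typedef struct {
    uint32_t *data;      /* storage for the records */
    size_t    capacity;  /* capacity in records */
    size_t    used;      /* records stored so far */
} cpu_buf_t;

/* Copy one half of the eTPU buffer (0 = lower, 1 = upper) to the CPU
 * buffer. Returns 0 on success, -1 if the CPU buffer would overflow. */
int copy_half(cpu_buf_t *dst, const uint32_t *etpu_buf, int half)
{
    assert(half == 0 || half == 1);

    if (dst->capacity - dst->used < ETPU_HALF_WORDS)
        return -1;

    memcpy(dst->data + dst->used,
           etpu_buf + (size_t)half * ETPU_HALF_WORDS,
           ETPU_HALF_WORDS * sizeof(uint32_t));
    dst->used += ETPU_HALF_WORDS;
    return 0;
}
```

The scheme works only if each copy finishes before the eTPU wraps around and starts overwriting the half being copied, which is why the CPU must respond to each threshold interrupt promptly.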
The frequency of the eTPU clock counter is 37.5MHz. Thus, the
transitions are captured with a resolution of about 27ns (1/37.5MHz = 26.7ns).
Figure 2: TPU module operation.
Recently, the basic functionality of the LA eTPU function has been
extended to support the following features:
* 32-bit time range
The 37.5MHz clock overflows the 24-bit range every 447ms. In order to
increase the range of total measurement time, the LA function checks
for clock counter overflows, and every time an overflow occurs, a
special record is stored in the buffer. This record means an update of
the 32-bit transition times' most significant byte. Thanks to this, the
total maximum measurement time is now almost two minutes.
* Maximum measurement time
The LA function enables limiting the maximum measurement time, and
notifies the CPU when it elapses.
* Trigger
The LA function supports a simple trigger algorithm. For each channel,
either a low-high, high-low, both, or no transition can be marked as
the triggering transition. The first triggering transition detected
generates an interrupt to the CPU, and its time is stored as the
trigger time.
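The 32-bit time extension can be illustrated with the following sketch. How the special overflow record is encoded is not described here, so the marker bit REC_OVERFLOW_FLAG is purely hypothetical, as is the assumption that each overflow record increments the most significant byte by one. The function names are also illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REC_TIME_MASK     0x00FFFFFFu
#define REC_OVERFLOW_FLAG 0x80000000u  /* hypothetical marker bit */

/* Expand the 24-bit times of `count` records to 32 bits, using overflow
 * records to advance the most significant byte. Overflow records produce
 * no output. Returns the number of times written to `out`. */
size_t extend_times(const uint32_t *recs, size_t count, uint32_t *out)
{
    uint32_t high = 0;  /* current most significant byte */
    size_t n = 0;

    assert(recs != NULL && out != NULL);

    for (size_t i = 0; i < count; i++) {
        if (recs[i] & REC_OVERFLOW_FLAG)
            high = (high + 1u) & 0xFFu;
        else
            out[n++] = (high << 24) | (recs[i] & REC_TIME_MASK);
    }
    return n;
}

/* Convert a tick count of the 37.5MHz counter to microseconds. */
double ticks_to_us(uint32_t ticks)
{
    return (double)ticks / 37.5;
}
```

The figures quoted above follow from the same arithmetic: 2^24 ticks at 37.5MHz last about 447ms, and 2^32 ticks last about 114.5s, which is just under two minutes.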
eTPU to CPU Interface
It has become a convention that the eTPU function developers create
interface routines to their eTPU functions. These API routines provide
an easy interface for the CPU application, enabling the initialization
and control of the eTPU functions. The API aids the CPU application
developer, avoiding the necessity of knowing any details of the eTPU
function implementation.
The same approach is applied to the LA function as well. The
interface routines are called the Logan API. The
API includes four easy-to-use run-time functions:
void LoganInit(void);
void LoganStartMeasure(void);
void LoganForceTrigger(void);
void LoganStopMeasure(logan_state_t why);
and two interrupt handlers:
void LoganBufferFilled(void);
void LoganTriggerOrStopDetected(void);
The API also creates a global data structure of type logan_measure_t.
The logan_measure_t
data structure consists of three parts. The measurement setup
substructure is filled according to user selection of active channels,
trigger condition, pre-trigger time, and total measurement time. When
these values are set, the function LoganInit() can be called in order to
initialize the eTPU module for measurement. The measurement itself is
started by the function LoganStartMeasurement().
The measurement state value monitors the Logan operation. At the
moment the measurement is started, the state turns from LOGAN_DISABLED
to LOGAN_WAITING_FOR_TRIGGER. When the trigger condition is detected,
the state reflects it by the value LOGAN_TRIGGER_DETECTED, directly
followed by LOGAN_START_LOCATED.
If no trigger is detected for a long time, the measurement can be
triggered manually by the function LoganForceTrigger(). Measurement is
stopped in one of three ways: when the total measurement time elapses,
when the CPU buffer is filled, or manually by the function
LoganStopMeasurement().
All of these possibilities are reflected by an appropriate state
value change. The third part of the logan_measure_t data
structure, the data substructure, is filled by the Logan API
after the measurement has finished. It includes the information necessary
to visualize the measured signals: the time when the trigger was
detected, the time boundaries of the whole measurement, and the
corresponding positions in the CPU buffer of the recorded edges.
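The state changes described above can be summarized as a small state machine. The sketch below is not the Logan API itself. The first four state names are taken from the text, while the three stop states, the event names, the numeric ordering of the states, and the function sketch_next() are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* The first four states are named in the article. The three stop states
 * and all event names are hypothetical, chosen for this sketch only. */
typedef enum {
    LOGAN_DISABLED,
    LOGAN_WAITING_FOR_TRIGGER,
    LOGAN_TRIGGER_DETECTED,
    LOGAN_START_LOCATED,
    SKETCH_STOPPED_TIME_ELAPSED,
    SKETCH_STOPPED_BUFFER_FULL,
    SKETCH_STOPPED_MANUALLY
} sketch_state_t;

typedef enum {
    EV_START,          /* measurement started */
    EV_TRIGGER,        /* trigger detected or forced manually */
    EV_START_LOCATED,  /* first transition located in the CPU buffer */
    EV_TIME_ELAPSED,   /* total measurement time elapsed */
    EV_BUFFER_FULL,    /* CPU buffer filled */
    EV_MANUAL_STOP     /* stopped by the application */
} sketch_event_t;

/* A measurement is in progress in any of these three states. */
static int is_running(sketch_state_t s)
{
    return s == LOGAN_WAITING_FOR_TRIGGER ||
           s == LOGAN_TRIGGER_DETECTED ||
           s == LOGAN_START_LOCATED;
}

/* Return the state that follows `s` when `ev` occurs. Events that do
 * not apply in the current state leave the state unchanged. */
sketch_state_t sketch_next(sketch_state_t s, sketch_event_t ev)
{
    assert(ev <= EV_MANUAL_STOP);

    switch (ev) {
    case EV_START:
        return (s == LOGAN_DISABLED) ? LOGAN_WAITING_FOR_TRIGGER : s;
    case EV_TRIGGER:
        return (s == LOGAN_WAITING_FOR_TRIGGER) ? LOGAN_TRIGGER_DETECTED : s;
    case EV_START_LOCATED:
        return (s == LOGAN_TRIGGER_DETECTED) ? LOGAN_START_LOCATED : s;
    case EV_TIME_ELAPSED:
        return is_running(s) ? SKETCH_STOPPED_TIME_ELAPSED : s;
    case EV_BUFFER_FULL:
        return is_running(s) ? SKETCH_STOPPED_BUFFER_FULL : s;
    case EV_MANUAL_STOP:
        return is_running(s) ? SKETCH_STOPPED_MANUALLY : s;
    }
    return s;
}
```

Keeping the transitions in one function makes it easy to see that a stop reason is recorded only once, by whichever of the three stop conditions occurs first.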
The interrupt handlers process the eTPU interrupts generated during
the measurement. The LoganBufferFilled() handler
is called on an eTPU channel interrupt, raised when either the lower
half or the upper half of the eTPU buffer has been filled. This
function copies the eTPU buffer data to the CPU buffer, and checks for
a buffer overflow.
The LoganTriggerOrStopDetected() handler
is called on an eTPU global interrupt, raised when the trigger
condition is met or when the total measurement time elapses. In the
case of trigger detection, the trigger time is stored, the measurement
start time is calculated, and the corresponding position of the first
measurement transition in the CPU buffer is located, so that the
pre-trigger time period is preserved. In the case of a total time
lapse, the measurement is stopped, and the stop time and the corresponding
position of the last transition in the CPU buffer are stored in the
data substructure.
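Locating the first transition of the measurement can be sketched as follows. This is a hedged illustration rather than the handler's actual code: locate_start() is a hypothetical function, and it assumes that the CPU buffer has already been converted to an array of 32-bit times sorted in ascending order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the measurement start time (trigger time minus pre-trigger
 * time, clamped at zero) and return the index of the first transition
 * at or after it. `times` must be sorted in ascending order. Returns
 * `count` if every recorded transition is older than the start time. */
size_t locate_start(const uint32_t *times, size_t count,
                    uint32_t trigger_time, uint32_t pretrigger,
                    uint32_t *start_time)
{
    uint32_t start = (trigger_time > pretrigger) ? trigger_time - pretrigger : 0;
    size_t lo = 0, hi = count;

    assert(times != NULL || count == 0);

    /* Binary search for the first element that is >= start. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (times[mid] < start)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (start_time != NULL)
        *start_time = start;
    return lo;
}
```

Records before the returned index fall outside the pre-trigger window and do not need to be displayed.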
CPU to Host Interface
Having done the eTPU work, the CPU programmer now has the Logic
Analyzer operation fully under control using a simple C-language API.
To really create the signal waveforms on the screen, however, two major
problems are still to be resolved.
The first one is the implementation of an Ethernet network-based
communication between the CPU and the host PC. The second challenge is
to write a PC-based application to process and display the waveform
data and to provide the user with some kind of graphical interface to
the Logic Analyzer configuration.
Network Interface
As none of us who participated in developing this application was an
expert in communications and networking, we decided to find a
standard, ready-to-use solution that would fit our needs. It turned
out that there are many solutions that enable standard TCP/IP
communication on the ColdFire platform, so we started to think about
which one would be the best.
Finally, we decided to use the full uClinux operating system and its
network capabilities to do the job. We thought this would be a great
opportunity to demonstrate the combination of low-level time-critical
tasks with high-level control software on a single chip, which is exactly what
we had wanted to achieve from the beginning. With only a vague idea of what
it means to bring up uClinux, and with minimal previous
experience, we were also looking forward to learning something new.
Please take the remainder of this article as a report on how we
progressed in the development, rather than an official guide to making
uClinux work on the ColdFire. An experienced embedded-Linux guru would
surely find some imperfections, but this is simply the route we took.
To start uClinux or any embedded-Linux development, it is always
better to have a standalone Linux-based workstation. In our case, this
was an ordinary PC running RedHat Linux 9.
As a starting point, it is worth reading the uClinux overview on the uClinux.org
web site, and especially the pages dedicated to the uClinux port for ColdFire
platforms.
Before starting, download the GCC compiler and other m68k (ColdFire)
build tools from the site as well. In our application, we have used
the Linux kernel 2.4, which can be
built using GCC version 2.95.3. If you want to be up-to-date
and use the newer 2.6 kernels, GCC version 3 will be required. Back
in March 2005, when we were working on this application, there were
known issues with GCC 3, so we decided to use the older and more
stable versions of both the kernel (2.4.27) and GCC (2.95.3).
eTPU in Linux
In Linux, just as in any other operating system, things do get
complicated when you want to access the processor resources at
low-level. In theory, the system separates and protects the memory
spaces of different running applications, as well as the memory space
of the operating system kernel.
The mechanism used to achieve such protection is called "virtual
memory" and it also utilizes a bit of the silicon (the Memory Management
Unit, or MMU). On high-end processors such as x86, PowerPC, or ColdFire
54xx, the system and the MMU ensure that the user processes live in
their own linear memory space, without direct access to the memory of
other processes or the kernel. All addresses are simply logical
("virtual") and do not refer directly to the physical memory. In fact,
the address values completely lose their meaning if taken outside of
the process context.
The Operating System kernel and the MMU manage the
virtual-to-physical address translation tables for each process and
they translate addresses on-the-fly as the process is running. The user
process has no way of turning the translation mechanism off, so even if
there are memory-mapped peripheral registers on a well-known fixed
(physical) address, the process cannot access them. In other words,
the operating system kernel is the only one that may access the device
peripherals.
So how can a standard (user-space) application make use of a
peripheral device? It always needs to go through its kernel-space ally,
the so-called "device driver". Using the software interrupt called a
"system call", the process activates the system call handler in the
kernel and passes parameters describing what it needs to do.
From the application programmer's perspective, however, this
complicated mechanism is pretty much hidden in the standard C library.
Function calls such as fopen, fork, socket, etc. always end up with a
system call to the operating system kernel and, where a device is
involved, are passed further on to the appropriate device driver.
There are several kinds of device drivers in Linux. The simplest one
is the so-called "character" device, which mimics the behavior of a
plain file in the file system. With a character driver, standard file
I/O operations are used to access the driver and to exchange data
between kernel space and user-space applications.
In our application, we will write a character device driver to act
as an interface between the low-level Logic Analyzer running on the
eTPU and the user application which implements the TCP/IP network
connection.
The ColdFire processors of the 5200 family, which also includes the
MCF5235 used in our application, do not have a Memory Management
Unit, so there is no concept of "virtual" memory. The Linux
operating system could not normally run on such a platform without
significant modification.
The uClinux patch, as it was originally called, enables Linux to run
from a single memory area, and to use it for both the kernel and user applications.
Despite this drastic MMU-surgery, it still retains a lot of its great
features and enables advanced operating system-based applications to
run on cheap and simple devices.
From our Logic Analyzer perspective, and because there is no memory
protection in uClinux, it would be easy to access the eTPU
unit directly from the user-space TCP/IP application. However, this
would not be "the right way" and would definitely close the door to
reusing the code on the PowerPC-based platforms. Obeying the Linux
rules, we have developed both the Logic Analyzer kernel driver and the
TCP/IP user-space application separately.
Logic Analyzer, the Big Picture
The Logic Analyzer application (see the schematic in Figure 3, below)
consists of the following parts:
* The eTPU unit, running the low-level Logic Analyzer function on
all its channels
* The kernel driver (logan_drv), managing the eTPU operation and
implementing the high-level C interface to the Logic Analyzer operation
* The TCP/IP server application (logan_srv) running in user-space and
accessing the kernel driver through the virtual file interface.
* The TCP/IP client application, which is the Microsoft Windows-based
ActiveX visualization component in our case (not covered by this
article).
Figure 3: Schematic view of the eTPU Logic Analyzer application.
There is also a web server running on the uClinux system which
presents the HTML infrastructure pages to the client. The main "Logic
Analyzer" HTML page also contains an auto-installing cabinet file of
the ActiveX visualization component, so there is no need to install it
separately on the client computer. When the page is opened, the ActiveX
self-installs, establishes its own TCP connection to the logan_srv
server, and starts "streaming" measured data.
With the web server in the system, there is also the possibility
of using a Common Gateway Interface (CGI) application and the
standard HTTP protocol to communicate with the Logic Analyzer. When
testing this approach, however, it turned out that the HTTP overhead degrades
the performance significantly.
The version of the "boa" web server
we used did not support "keep-alive" connections at that time, so the
visualization agent needed to re-establish a TCP connection each time
it wanted to retrieve the device status or data. Although we have not tried it, we
believe that with the latest version of the "boa" web server, or the
heavyweight "Apache" server, it would be possible to use CGI as a
solution comparable with the standalone "logan_srv" application.
All the communication between the kernel driver, TCP/IP server, and
TCP/IP client runs in a common plain text protocol:
* The "logan_drv" kernel driver accepts text-based commands on its input
(writing to the virtual /dev/logan file), and returns the status or data
also as plain text (reading the file until end-of-file).
* The TCP/IP server is a very simple text-based forwarding machine. Any
text message it receives from the TCP/IP client is written to the
/dev/logan file. In turn, the TCP/IP server reads the /dev/logan file and
sends its complete content back to the client.
Although not optimal with regard to communication
speed, using text-based communication eases the debugging and testing
of the driver and the system as a whole. The "echo" and "cat" commands
can be used from the uClinux console to write to and read the /dev/logan file,
and to exercise and diagnose the underlying eTPU Logic Analyzer
function.
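The same write-then-read exchange can be performed from a user-space C program, which is essentially what the TCP/IP server does for each client message. The sketch below is an assumption-laden illustration: logan_transact() is a hypothetical helper, not part of logan_srv, and the command strings understood by the driver are not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write `cmd` to the file at `path`, then reopen the file and read the
 * reply until end-of-file or until `reply` is full. The reply is always
 * NUL-terminated. Returns the number of reply bytes, or -1 on error. */
long logan_transact(const char *path, const char *cmd,
                    char *reply, size_t size)
{
    FILE *f;
    size_t n;

    assert(path != NULL && cmd != NULL && reply != NULL && size > 0);

    f = fopen(path, "w");             /* like: echo cmd > /dev/logan */
    if (f == NULL)
        return -1;
    if (fputs(cmd, f) == EOF) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;

    f = fopen(path, "r");             /* like: cat /dev/logan */
    if (f == NULL)
        return -1;
    n = fread(reply, 1, size - 1, f);
    reply[n] = '\0';
    fclose(f);
    return (long)n;
}
```

Called with "/dev/logan" as the path, the reply is whatever the driver chooses to return for the command. Called with an ordinary file, the function simply reads back the text it has just written, which is convenient for testing the helper on a development PC.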
Next in this two-part article: Building the uClinux application.
Michal Hanak is a Systems Engineer
and Milan Brejl, Ph.D., is a System Application Engineer at Freescale Semiconductor's Roznov Czech Systems Center.