By Andrew W. Davis
Video technology is playing an increasing role in a number
of real-time systems. While many early systems were tied in to
standards developed for the broadcast industry, video is now
moving out on its own. One of the areas where this is most
evident is in machine vision, where systems today use frame
sizes and frame rates totally distinct from common broadcast
industry standards.
Camera Overview
The video camera represents the interface electronics
between the optical imaging system and the viewing or image
analysis system, which, in machine vision applications, is the
computer.
In industrial applications, there is a wide variety of
problems to be solved, depending on the type and size of object
under study, whether the object is moving or stationary, the
type and size of measurement to be made or defect to be
detected, and so on. Hence the vision industry is populated
with a wide variety of lighting, optics, cameras, and computer
interface options.
The photosensitive devices used inside video cameras can be
solid state or non-solid state technology. Non-solid state
devices are the classical vacuum tube-based photosensors such
as vidicons and image orthicons found in older cameras. These
have gradually been replaced by solid state devices which are
continually improving in resolution, uniformity (pixel to pixel
variations), and cost. Solid state devices, as their name
implies, are based on silicon technology and offer many
advantages over their tube counterparts, including smaller size
and higher reliability. Solid-state photosensor devices
include photodiodes, phototransistors, charge injection devices
(CIDs), and charge-coupled devices (CCDs). CCDs have advantages
today in terms of both cost and performance, and are now almost
the exclusive technology used in machine vision cameras.
CCD-based cameras come in two basic formats, area arrays and
linear arrays. Each comes in different sizes to meet different
resolution needs. Some common area cameras are 256x256,
512x512, and 1kx1k. Linear cameras typically range from 512x1
to 4kx1. When scenes are not linear or orthogonal, special
camera formats may be available to meet special applications,
such as circular line cameras.
The CCD photosensor is a sampled device. The CCD itself is
composed of discrete photosites or pixels that accumulate
(analog) electrical charges based on the quantity of light
hitting each one. For area cameras, each photosite is scanned
out of the CCD device in a pixel-by-pixel, line-by-line
sequence, thereby creating an analog video signal. Typically
the analog voltage levels are sequenced in accordance with the
RS-170 or CCIR video standard, and the appropriate
synchronization signals are added by other pieces of the
camera's electronics. The result is a standard video signal
that is compatible with other standardized video devices.
For linear-array-based cameras, known as line-scan cameras,
the 2-D image is produced a line at a time, typically by moving
objects past a stationary camera, but occasionally by moving
the camera over stationary objects. Line-scan cameras generally
produce nonstandard video signals.
Video Standards
Video signals have long adhered to standards, a necessity in
the world of broadcast TV, where content providers and
broadcasting companies needed the support of multiple,
independent video equipment vendors and the availability of
consumer-level plug-and-play capabilities. With standards,
cameras, monitors, recorders, and signal generators can all
handle the same signals. Standards specify the
scan-rate timing, the number and order of lines in an image
frame, the image aspect ratio, synchronization signals that
indicate the beginning of each line and each frame, color
signal encoding, if any, and image brightness and color signal
voltage levels. Hence, until recently, most cameras available
in the market adhered to one of the video standards. Today, a
wide variety of non-standard cameras are also available to meet
various application needs, but the low-cost segment of the
market is dominated by standards-compliant units. While
numerous video signal standards have evolved, the most common
are those that have been adopted as national standards for
commercial broadcast television use.
EIA RS-170
In the United States, RS-170, produced by the Electronic
Industries Association, embodies the technical specifications
that were originally defined in the late 1930s in order to
standardize the black and white TV industry. RS-170 defines an
aspect ratio of 4:3, a 2:1 interlaced scan technique, and
horizontal and vertical sync pulses. An entire RS-170 frame is
made up of 525 lines; each frame is sequenced out every 33.33
milliseconds (ms). Hence each field contains 262.5 lines
sequenced every 16.67 ms, for a line time of 63.49 µs. This
divides out into a line frequency of 15.75 kHz, a commonly
referenced RS-170 line rate. The 262.5 lines/field, however, are
not all for video information. The vertical sync interval and
settling period chew up 20 line times, leaving 242.5 for image.
Similarly, the horizontal sync process uses up 10.9 µs, leaving
52.59 µs of active line time, which determines the sampling
rate needed to achieve any given number of pixels per line.
RS-170 also specifies electrical voltages. The overall range is
a 1V swing from -0.286V to +0.714V. The zero voltage level is
the blanking level. Sync pulses go from 0V to -0.286V.
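The line-timing arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the constants are the RS-170 figures quoted in this article, and `pixel_clock_hz` is a hypothetical helper name, not part of any frame grabber API.

```python
# RS-170 line timing from the figures quoted above (monochrome RS-170).
FRAME_TIME_S = 1 / 30       # one frame every 33.33 ms
LINES_PER_FRAME = 525
HSYNC_S = 10.9e-6           # horizontal sync interval quoted in the text

line_time_s = FRAME_TIME_S / LINES_PER_FRAME   # time for one scan line
line_freq_hz = 1 / line_time_s                 # horizontal (line) frequency
active_line_s = line_time_s - HSYNC_S          # time left for picture content


def pixel_clock_hz(pixels_per_line: float) -> float:
    """Sampling rate that spreads `pixels_per_line` samples over the active line."""
    return pixels_per_line / active_line_s


print(f"line time   : {line_time_s * 1e6:.2f} us")
print(f"line rate   : {line_freq_hz / 1e3:.2f} kHz")
print(f"active line : {active_line_s * 1e6:.2f} us")
print(f"640 px/line : {pixel_clock_hz(640) / 1e6:.2f} MHz")
```

Run as is, it reports a line time of about 63.49 µs, a line rate of 15.75 kHz, 52.59 µs of active line time, and a sampling rate of roughly 12.17 MHz for 640 pixels per line.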
With 242.5 active lines per field or 485 active lines per
frame and a specified 4:3 aspect ratio, RS-170 yields 646.67
square pixels per line. (Square pixels are easier to deal with
in machine vision applications, but not absolutely a
requirement.) The actual computer-based implementation is
typically 640x480. The brightness level of any pixel is
represented with a number which reflects the resolution of the
A/D converter, which is not part of the RS-170 specification.
Eight bits is typical, though 12-, 16-, and 32-bit digitizers
are also used in specialty applications.
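The square-pixel figure and the step from voltage to number can be sketched as follows. The mapping of blanking (0 V) to code 0 and peak white (+0.714 V) to full scale is an illustrative assumption; real frame grabbers normally provide adjustable offset and gain, and `quantize` is a hypothetical helper.

```python
# Square-pixel count and a simple A/D mapping for an RS-170 signal.
ACTIVE_LINES_PER_FRAME = 485
ASPECT_RATIO = 4 / 3

# Pixels per line needed for square pixels: width/height = 4/3.
square_pixels_per_line = ACTIVE_LINES_PER_FRAME * ASPECT_RATIO   # ~646.67

V_BLANK = 0.0     # blanking level (volts)
V_WHITE = 0.714   # peak white (volts)


def quantize(volts: float, bits: int = 8) -> int:
    """Map a video voltage linearly onto an A/D code in [0, 2**bits - 1].

    Assumes the digitizer spans blanking (0 V) to peak white (+0.714 V);
    sync tips and overrange inputs are clamped.
    """
    full_scale = (1 << bits) - 1
    fraction = (volts - V_BLANK) / (V_WHITE - V_BLANK)
    fraction = min(max(fraction, 0.0), 1.0)
    return round(fraction * full_scale)


print(f"square pixels per line: {square_pixels_per_line:.2f}")
print(quantize(0.714), quantize(-0.286), quantize(0.714, bits=12))
```

The bit depth only changes the number of codes the same voltage range is divided into; it is a property of the digitizer, not of the RS-170 signal.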
The RS-170 standard has several extensions. RS-343A defines
video signals of higher resolution containing between 675 and
1023 lines per image frame. This is based on the RS-170
specification with modified timing waveforms. Similarly, RS-330
defines additional video signal electrical performance
characteristics for the RS-170 signal.
NTSC
In the 1950s, the National Television System Committee adopted
a color standard widely known as NTSC, also known as RS-170A,
since it is a modification or superset of the RS-170 standard.
NTSC modifies the RS-170 standard to work with color video by
adding color information to the existing monochrome
brightness signal. The NTSC signal is a composite color signal
because it is created by combining color and brightness
information on a single signal. The alternative is to separate
the components onto separate signals (R-G-B for example). Color
signals representing hue and saturation are combined using
phase and amplitude modulation techniques into a single
chrominance signal. The chrominance signal is added to the
RS-170 brightness signal (luminance), together with a color
reference signal called color burst at the start of each
line.
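The way hue and saturation ride on a single chrominance signal can be illustrated with quadrature modulation: two color-difference values modulate the sine and cosine of the color subcarrier, so the resulting amplitude encodes saturation and the phase encodes hue. This is a simplified sketch; real NTSC modulates I and Q on axes rotated by 33 degrees, band-limits them, and adds the color burst, none of which is modeled here. The function names are hypothetical.

```python
import math

F_SC = 3.579545e6   # NTSC color subcarrier frequency (Hz)


def chroma(u: float, v: float, t: float) -> float:
    """Quadrature-modulate two color-difference values onto the subcarrier."""
    w = 2 * math.pi * F_SC
    return u * math.sin(w * t) + v * math.cos(w * t)


def composite(y: float, u: float, v: float, t: float) -> float:
    """Composite sample: luminance plus the modulated chrominance."""
    return y + chroma(u, v, t)


def saturation_and_hue(u: float, v: float) -> tuple[float, float]:
    """The chroma amplitude carries saturation; its phase (degrees) carries hue."""
    return math.hypot(u, v), math.degrees(math.atan2(v, u))


# Sample one subcarrier cycle: the peak chroma equals the amplitude.
period = 1 / F_SC
peak = max(abs(chroma(0.3, 0.4, k * period / 1000)) for k in range(1000))
print(saturation_and_hue(0.3, 0.4), round(peak, 3))
```

A gray pixel (both color-difference values zero) produces no chroma at all, which is why a monochrome receiver can simply ignore the subcarrier.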
The NTSC system allows the coexistence of monochrome and
color television, an important constraint at the time it was
introduced. Other schemes being considered at the time would
have required broadcasters to broadcast two signals, one for
monochrome, one for color. Today, NTSC is often derided as a
kludge. For many machine vision applications, NTSC is the low
price-low performance configuration.
CCIR
The CCIR video standard is the European equivalent of RS-170.
CCIR specifies a 625-line image with a frame period of 40 ms
(25 frames per second), a 2:1 interlaced scan, and a 4:3 aspect
ratio. The 50 field-per-second rate, like the 60 field-per-second
rate in RS-170, matches the frequency of the electrical power
system used in the respective countries. The CCIR standard was
also adopted for color; this is known as PAL (Phase Alternation
Line). However,
France and a few other countries use a third color standard
called SECAM.
CCIR-601
CCIR-601 is a standard for the digital encoding of component
color TV. It uses a 4:2:2 sampling scheme for Y, U, and V with
luminance (Y) sampled at 13.5 MHz and chrominance (U and V)
sampled at 6.75 MHz. These frequencies work for both 525/60
NTSC and 625/50 SECAM and PAL systems. CCIR-601 specifies 720
active luminance samples (pixels) on each line of video. CCIR-601 is a
digital video standard, different from the other standards
discussed above, which are analog. It also deals with component
signals rather than composite signals.
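The reason 13.5 MHz works for both systems is that it yields a whole number of samples per line in each. The sketch below checks this; note that it uses the color NTSC line rate (4.5 MHz / 286, about 15.734 kHz), which is slightly lower than the 15.75 kHz monochrome RS-170 figure, and that `samples_per_line` is a hypothetical helper.

```python
# Why 13.5 MHz: it gives a whole number of samples per line in both systems.
NTSC_LINE_HZ = 4.5e6 / 286   # color NTSC line rate, ~15,734.27 Hz
PAL_LINE_HZ = 625 * 25       # 625 lines x 25 frames/s (40 ms) = 15,625 Hz

LUMA_HZ = 13.5e6             # Y sampling rate
CHROMA_HZ = LUMA_HZ / 2      # 6.75 MHz for each of U and V (4:2:2)


def samples_per_line(sample_hz: float, line_hz: float) -> float:
    """Total samples (active plus blanking) taken during one line period."""
    return sample_hz / line_hz


for name, line_hz in (("525/60", NTSC_LINE_HZ), ("625/50", PAL_LINE_HZ)):
    y = samples_per_line(LUMA_HZ, line_hz)
    c = samples_per_line(CHROMA_HZ, line_hz)
    print(f"{name}: {y:.1f} Y samples and {c:.1f} U/V samples per line (720 active)")
```

The totals come out to 858 luminance samples per line for 525/60 and 864 for 625/50; in both cases 720 of them fall in the active picture.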
Y/C-Video
While component systems carry the R-G-B color information on
separate signals, and composite signals carry all the
information on one signal, an intermediate standard has
evolved. The Y/C component color standard conveys the color
video signal as a luminance (Y) signal identical to the
standard RS-170 monochrome video signal and a chrominance (C)
signal identical to the chrominance subcarrier defined in the
NTSC standard. Because the two are carried on separate signals,
picture quality is higher than with composite video. Y/C video is also known as S-Video,
super-video, and S-VHS.
Non-Standard Video
Some specialized nonstandard video formats have also emerged
over the years. These are especially relevant to the vision
industry. Some of these nonstandard formats use synchronization
timing established by the RS-170 or CCIR standards. For
example, some digital video cameras conform to RS-170 timing,
but transfer their image as a digital data stream rather than
as an analog signal. Because the image is digitized inside the
camera, the data avoids the noise an analog video cable picks up
and is typically of superior quality.
Cameras for Machine Vision
RS-170 was optimized for the human perceptual system and the
technology available to the broadcast TV industry decades ago.
Interlaced video reduces flicker for the human eye and 30
frames per second eliminates many noise problems associated
with 60 Hz power supplies. The 4:3 aspect ratio makes for a
pleasing TV image. But for machine vision applications like
metrology and inspection, where a computer and not a human eye
is the image recipient, these specifications make little sense.
For example, with interlaced lines, the computer has to reorder
the data to make a sensible image, whereas the human eye does this
automatically (through persistence of vision). And the non-square
aspect ratio typically results in non-square pixels. This
complicates calculations, since 4 pixels in the x direction
would represent a distance different from 4 pixels in the y
direction. And calculating the length of a line at an arbitrary
angle is more complicated still. For machine vision, square
pixels are a great advantage. Finally, for applications where
motion is involved, being locked into the 30 frames or 60
fields per second specified in RS-170 has no logical basis and
is a distinct disadvantage.
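The reordering a computer must do for interlaced video amounts to weaving the two fields back together line by line. A minimal sketch, assuming one field carries frame lines 0, 2, 4, ... and the other lines 1, 3, 5, ...; which field carries the top line varies by camera, and `weave` is a hypothetical name.

```python
def weave(first_field: list, second_field: list) -> list:
    """Interleave two interlaced fields into one progressive frame.

    Assumes first_field holds frame lines 0, 2, 4, ... and second_field
    holds lines 1, 3, 5, ...; the first field may have one extra line.
    """
    if len(first_field) - len(second_field) not in (0, 1):
        raise ValueError("field heights are inconsistent")
    frame = []
    for i, line in enumerate(first_field):
        frame.append(line)                  # even frame line
        if i < len(second_field):
            frame.append(second_field[i])   # following odd frame line
    return frame


print(weave(["line0", "line2"], ["line1", "line3"]))
```

Because the two fields are captured 16.67 ms apart, a moving object shows comb-like edges in the woven frame, one more reason progressive scan suits machine vision.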
Hence, the standard for vision cameras today is the
"nonstandard" variable scan camera. With variable scan there
are no fixed restrictions on the organization of the pixels or
the timing of the video. Rather, these are user defined and
application dependent. Variable scan cameras provide a level of
flexibility not afforded by RS-170 cameras: the image can be
sized to the application, so the data does not contain a
significant number of unwanted pixels, and the camera does not
require complicated lighting and exposure
solutions. Progressive scanning allows for full frame
resolution when images of moving objects are grabbed, and
images can be processed as they are acquired, rather than
having to wait until the entire image is available.
Almost all machine vision cameras today are based on CCD
sensor technology. The CCDs in turn can be either array (2-D)
or line scan (1-D) designs. A variation on line scan cameras is
TDI (time delay integration), which does multistage integration
for enhanced sensitivity.
Following are some factors to consider when evaluating a
camera technology for a machine vision application.
- Area Cameras Versus Line-Scan Cameras
Both architectures are readily available. Area cameras are
essentially made up of many line scan sensors stacked to form
a 2-D matrix. Linear sensors are easier to fabricate to very
tight tolerances than are matrix sensors and for applications
where absolute uniformity is a concern, line scan CCDs have
an advantage. In general, line scan cameras offer higher
resolution and speed than is possible with matrix
cameras.
Another major architectural concern is the use of multiple
output arrays, which provide faster access to the pixels (see the
speed discussion below) but increase the processing
complexity. Line-scan cameras are available with linear
output, in which case all pixels are read out from a single
serial CCD; bilinear output, which divides the image into
even and odd pixels; and multi-tapped output.
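For a bilinear line-scan output, the added processing amounts to interleaving the two streams back into pixel order. A minimal sketch, assuming even-numbered pixels arrive on one output and odd-numbered pixels on the other; `merge_bilinear` is a hypothetical name.

```python
def merge_bilinear(even_pixels: list, odd_pixels: list) -> list:
    """Rebuild one scan line from a bilinear readout.

    even_pixels holds pixels 0, 2, 4, ... and odd_pixels holds 1, 3, 5, ...
    """
    if len(even_pixels) - len(odd_pixels) not in (0, 1):
        raise ValueError("even/odd register lengths are inconsistent")
    line = [None] * (len(even_pixels) + len(odd_pixels))
    line[0::2] = even_pixels   # fill even positions
    line[1::2] = odd_pixels    # fill odd positions
    return line


print(merge_bilinear([10, 12, 14], [11, 13, 15]))
```

In practice this interleave is often done in frame grabber hardware, but the logic is the same.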
Area-scan cameras can be based on interline transfer, full
frame, and frame transfer architectures. With interline, each
column of pixels has a transfer gate and vertical shift
register to transfer the charge to the horizontal CCD for
read out. This format provides low image smear
(and fast shutter times) without an external
shutter. The full frame architecture utilizes the pixels both
to collect charge and to shift charge to the horizontal CCD
for read out. Because of the dual function, an external
shutter is required to block incident light during the
transfer period. Frame transfer is similar to full frame with
the addition of a light shielded frame storage region. While
one set of pixels is actively imaging, the other set is busy
transferring the previous frame, hence this format requires
twice the silicon area of a full frame device for a given
pixel size. Several protected pixel elements never see light
and are used for dark field calibration.
- Resolution
Matrix cameras typically offer 512 x 512 resolution, though
several vendors now offer 1K x 1K resolution. Line scan
cameras typically have 512 or 1K pixels per line, but can go
as high as 4K or even 8K in some high end applications.
Resolution in the other dimension is a function of how often
the lines can be grabbed and how fast the object is moving
past the camera lens. Practical resolution of course is a
function of the number of pixels and the size of the object
imaged.
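For a line-scan camera, the pixel size on the object across the line is set by the field of view and the number of sensor pixels, while along the direction of motion it is the distance the object travels between line grabs. A minimal sketch under those assumptions, with hypothetical function names and example numbers:

```python
def line_scan_pixel_size(fov_m: float, pixels: int,
                         speed_m_s: float, line_rate_hz: float) -> tuple[float, float]:
    """Object-space size of one pixel (across the line, along the motion)."""
    across = fov_m / pixels             # field of view divided over the sensor
    along = speed_m_s / line_rate_hz    # travel between successive lines
    return across, along


def line_rate_for_square_pixels(fov_m: float, pixels: int, speed_m_s: float) -> float:
    """Line rate at which the along-motion size matches the across-line size."""
    return speed_m_s * pixels / fov_m


# Hypothetical example: 1024-pixel camera viewing a 102.4 mm wide web moving at 1 m/s.
rate = line_rate_for_square_pixels(0.1024, 1024, 1.0)
print(f"{rate:.0f} lines/s", line_scan_pixel_size(0.1024, 1024, 1.0, rate))
```

Here 10,000 lines per second gives 0.1 mm pixels in both directions; a faster web or a finer pixel requires a proportionally higher line rate.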
Another measure of resolution is gray scale depth or tonal
resolution, rather than spatial resolution. Most machine
vision systems work with 8-bit grays (256 shades), but some
may work with 1-bit data (black and white only). Other
applications in medicine, biology, and astronomy may require
special cameras with far higher tonal resolution.
- Pixel Aperture
This is the ratio of the lengths of a pixel's two sides (width to height).
Square pixels have an aperture of 1:1 and are greatly favored
for machine vision applications.
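The extra step that non-square pixels impose on measurement is a per-axis scale factor before any distance is computed. A minimal sketch, assuming the physical pixel width and height are known; the 10 x 12 µm pixel below is a hypothetical example.

```python
import math


def distance(p0: tuple[float, float], p1: tuple[float, float],
             pixel_w: float, pixel_h: float) -> float:
    """Physical distance between two pixel coordinates (x, y).

    pixel_w and pixel_h are the pixel's width and height in physical units;
    with square pixels (aperture 1:1) they are equal and scaling is uniform.
    """
    dx = (p1[0] - p0[0]) * pixel_w
    dy = (p1[1] - p0[1]) * pixel_h
    return math.hypot(dx, dy)


# Hypothetical 10 x 12 um pixels: 4 pixels in x and 4 pixels in y differ.
print(distance((0, 0), (4, 0), 10e-6, 12e-6), distance((0, 0), (0, 4), 10e-6, 12e-6))
```

With square pixels the two scale factors collapse into one, and distances at any angle can be computed in pixel units directly.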
- Speed
Non-standard line scan (and area array) cameras can operate
at high speeds. Frame rates for RS-170 are fixed at 30 per
second; line-scan cameras have variable scan rates and can
produce thousands of lines per second, each one of which is
available for processing (in the order in which it is received)
almost immediately. This is in direct contrast to the RS-170
situation, where interlacing makes line ordering more
difficult. With RS-170, the computer typically waits until
the entire image is available before it begins any
processing. It is common to align a line scan camera with the
pixels perpendicular to the direction of motion for
applications where object motion is present. The integration
time can then be adjusted to suit the application.
Speed generally refers to how quickly the machine can get at
a given pixel in order to make some calculation or decision.
Typically, all the pixels have to be read out before any
pixel value can be accessed (some cameras now have ways
around this). For any given pixel output rate (say 40 million
pixels per second), speed then increases as the number of
pixels in an image decreases. Pixel output rate is one of the
fundamental bottlenecks in machine vision. If the camera can
deliver only 5 megapixels per second (or if the computer
interface can handle only 5 megapixels per second), then a
superfast imaging bus and imaging processor will be starved
for data. Parallel and multi-tap camera outputs are now
available to provide higher pixel output rates.
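The pixel-rate bottleneck reduces to simple arithmetic: the frame rate is the delivered pixel rate divided by the pixels per frame, and the delivered rate is set by the slower of the camera and the computer interface. A minimal sketch using the example rates from the text; `frames_per_second` is a hypothetical helper.

```python
def frames_per_second(width: int, height: int,
                      camera_px_per_s: float, interface_px_per_s: float) -> float:
    """Frame rate when every pixel must pass through camera and interface.

    The slower of the two links sets the pixel rate actually delivered.
    """
    delivered = min(camera_px_per_s, interface_px_per_s)
    return delivered / (width * height)


for w, h in ((512, 512), (1024, 1024)):
    print(f"{w}x{h}: {frames_per_second(w, h, 40e6, 40e6):6.1f} fps at 40 Mpixel/s, "
          f"{frames_per_second(w, h, 40e6, 5e6):5.1f} fps behind a 5 Mpixel/s interface")
```

At 40 Mpixel/s a 512x512 image arrives about 153 times per second; behind a 5 Mpixel/s interface the same camera manages only about 19 frames per second.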
- Precision
If the camera is being used as a measurement tool, then the
geometric repeatability will be of concern, as it would be
for any measurement device.
- Shutters
CCD cameras can (electronically) make short exposures
without mechanical shutter devices. Short exposures may
require brighter lighting in order to be able to capture an
image. Most cameras have asynchronous shuttering, the ability
to integrate or shutter at any time based on an external
signal. In essence, a trigger pulse resets the vertical sync.
With RS-170 compatible devices, shuttering often means
obtaining data from only one field, reducing the vertical
resolution by a factor of two. This problem is not present
with progressive scan cameras. Mechanical shutters are not
reliable enough for industrial camera use. An electronic
shutter allows a user to select the exact time to
expose.
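Choosing an exposure time for a moving part is a trade-off: the object moves during the exposure, so blur grows linearly with exposure time, while the collected light shrinks as the exposure gets shorter. A minimal sketch, assuming the pixel size is measured on the object (not on the sensor); the function names and numbers are hypothetical.

```python
def max_exposure_s(speed_m_s: float, pixel_size_m: float, max_blur_px: float = 1.0) -> float:
    """Longest exposure that keeps motion blur within max_blur_px pixels."""
    return max_blur_px * pixel_size_m / speed_m_s


def blur_px(speed_m_s: float, exposure_s: float, pixel_size_m: float) -> float:
    """Blur length, in pixels, for a given exposure."""
    return speed_m_s * exposure_s / pixel_size_m


# Hypothetical part moving at 0.5 m/s, imaged at 0.1 mm per pixel on the object.
print(max_exposure_s(0.5, 100e-6))    # seconds
print(blur_px(0.5, 1e-3, 100e-6))     # pixels of smear with a 1 ms exposure
```

In this example the exposure must stay at or below 200 µs for one pixel of blur; a 1 ms exposure would smear the part over about 5 pixels.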
- Rugged Mechanical Design
Shake and bake industrial applications may involve hot, wet,
oily, dirty environments. The camera needs to survive in all
of this. Size may also be of importance, not only for
reliability, but also because a camera may have to fit into
some tight spaces, like inside another machine. Video cameras
have shrunk considerably in the past few
years. The availability of standard lenses and optics for the
camera may also be of concern, though most vendors are moving
towards standard optical interfaces.
- Rugged Electronic Design
Noise is the enemy of all test and measurement devices and of
all sensors. Camera electronics should be noise immune. New
designs also have some measure of antiblooming capability,
which prevents bright spots on one array element from
corrupting the output of an adjacent element. An electronic
iris can account for overall brightness by sensing when to
increase or decrease exposure time.
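An electronic iris can be thought of as a feedback loop on integration time. The sketch below is a hypothetical proportional controller, not any vendor's algorithm: it assumes the sensor is in its linear range, so the mean gray level scales with exposure, and it clamps the result to assumed minimum and maximum exposures.

```python
def next_exposure_s(exposure_s: float, mean_level: float,
                    target_level: float = 128.0,
                    min_s: float = 10e-6, max_s: float = 33e-3) -> float:
    """One step of a simple proportional exposure control.

    Assumes mean gray level scales linearly with integration time. A saturated
    image under-reports the true level, so recovering from overexposure can
    take several frames.
    """
    if mean_level <= 0:
        return max_s                  # no signal at all: use the longest exposure
    proposed = exposure_s * target_level / mean_level
    return min(max(proposed, min_s), max_s)


# Simulated scene: mean level = 64,000 gray levels per second of exposure, clipped at 255.
exposure = 10e-3
for _ in range(4):
    mean = min(255.0, 64_000 * exposure)
    exposure = next_exposure_s(exposure, mean)
print(f"{exposure * 1e3:.2f} ms")
```

Starting from an overexposed 10 ms, the clipped readings only halve the exposure at first; once the image leaves saturation the loop lands on 2 ms in one step.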
- Time Delay Integration (TDI)
TDI is a technique designed into some cameras. For example, a
1024x64 line scan array would take 64 snapshots with a set
time delay between each snapshot and then read out the line
as a single image (with a light gain of 64). This "averaging"
process greatly improves the signal-to-noise ratio without
causing the blurring problems that would occur with large
integration times and moving objects. TDI is useful for
creating images for high speed processes with low light
levels.
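The signal-to-noise benefit of TDI can be estimated with a small simulation. This sketch assumes shot-noise-limited operation (modeled with a Gaussian approximation of Poisson statistics) and that charge from all stages is summed on chip before a single readout. Under those assumptions 64 stages raise the signal 64-fold and the noise only 8-fold (the square root of 64), so SNR improves by about 8. Read noise, which is added once per readout, is not modeled; where it dominates, the improvement is larger. The function names are hypothetical.

```python
import random
import statistics


def tdi_readout(stages: int, electrons_per_stage: float, rng: random.Random) -> float:
    """Charge collected by one TDI column after `stages` exposures.

    Each stage adds shot-noise-limited charge (Gaussian approximation of
    Poisson statistics); the charge is summed on chip and read out once.
    """
    total = 0.0
    for _ in range(stages):
        total += rng.gauss(electrons_per_stage, electrons_per_stage ** 0.5)
    return total


def snr(stages: int, electrons_per_stage: float = 100.0,
        trials: int = 4000, seed: int = 1) -> float:
    """Estimate signal-to-noise ratio as mean / standard deviation over many readouts."""
    rng = random.Random(seed)
    samples = [tdi_readout(stages, electrons_per_stage, rng) for _ in range(trials)]
    return statistics.mean(samples) / statistics.stdev(samples)


print(f"1 stage: SNR {snr(1):.1f}   64 stages: SNR {snr(64):.1f}")
```

With 100 electrons per stage, the single-stage SNR comes out near 10 and the 64-stage SNR near 80.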
- Dynamic Range
All CCD sensors have a sigmoidal curve when their output is
plotted against their input. This is also true for the human
eye, photographic film, and display monitors.
A fundamental noise source (thermally generated electrons, a
problem which can be mitigated by cooling the CCD) in any CCD
sensor provides some level of output, even when input is
absent. The saturation level occurs when the output level hits
a ceiling, and the CCD device cannot provide any more output,
no matter what the input level rises to. In between is a useful
range where the inputs and the outputs correspond nearly
linearly. This is known as the dynamic range and for a typical
CCD might be 1000:1. A problem in machine vision is to have the
lighting level of the application fall within the useful range
of the vision system. Techniques to adjust the available light
to the dynamic range include adding more or brighter lights or
an image intensifier; adjusting the mechanical aperture on the
lens to let in more or less light; adding a filter to cut
down the light level; and adjusting the shutter time, which on a
CCD is the integration time (this will be limited by the motion
of the object under study).
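A 1000:1 dynamic range can be expressed in decibels and in digitizer bits. This sketch uses the 20 log10 convention common in CCD specifications (some datasheets use other conventions), and the helper names are hypothetical.

```python
import math


def dynamic_range_db(ratio: float) -> float:
    """Express a saturation-to-noise-floor ratio in decibels (20 log10 convention)."""
    return 20 * math.log10(ratio)


def adc_bits_to_cover(ratio: float) -> int:
    """Smallest A/D word length whose code range spans the given ratio."""
    return math.ceil(math.log2(ratio))


print(f"{dynamic_range_db(1000):.0f} dB, {adc_bits_to_cover(1000)} bits")
```

A 1000:1 sensor corresponds to 60 dB and about 10 bits, so a typical 8-bit digitizer (256 levels) cannot represent the whole range in one linear mapping; matching the scene lighting to the useful range matters all the more.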
Camera Interfaces for the Computer
Each type of camera requires its own type of computer
interface in order to meet the signal requirements for sync,
strobe, and data control. Cameras can be grouped into
approximately four families, and board level vendors (companies
selling frame grabbers) typically produce products to support
these different camera types:
- Variable Scan Camera Interfaces
These usually support both area-scan and line-scan cameras.
The low end of the market might be described by cameras which
output data at up to 25 MHz digitizing rates (8-bit pixel
resolution) while the top end is 50 MHz and requires more
expensive interface electronics. Interface boards provide the
signals that define the pixel, line, and frame timing. Most
of the products in the market here also support RS-170 and
CCIR cameras and can either synchronize to the timing of the
camera or generate timing to synchronize multiple
cameras.
- Color Camera Interfaces
These digitize true color images in real time from video
sources compatible with NTSC, PAL, RGB, and S-VHS. Many of
the boards perform on-board color space conversion (HSI, YUV,
YIQ, YCrCb) with a 3x3 matrix multiplier.
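A 3x3 matrix multiply covers the linear color spaces (YUV, YIQ, YCrCb); HSI also needs a nonlinear step that a matrix alone cannot perform. As an illustration, this sketch applies the commonly published NTSC RGB-to-YIQ coefficients (rounded to three decimals) to one pixel; the function name is hypothetical, and a board's actual coefficients and scaling may differ.

```python
# NTSC RGB -> YIQ coefficients (commonly published, rounded to three decimals).
RGB_TO_YIQ = (
    (0.299, 0.587, 0.114),    # Y: luminance
    (0.596, -0.274, -0.322),  # I: in-phase chrominance
    (0.211, -0.523, 0.312),   # Q: quadrature chrominance
)


def convert(rgb: tuple[float, float, float],
            matrix=RGB_TO_YIQ) -> tuple[float, float, float]:
    """Apply a 3x3 color-space matrix to one RGB pixel (components in 0..1)."""
    r, g, b = rgb
    return tuple(m0 * r + m1 * g + m2 * b for m0, m1, m2 in matrix)


print(convert((1.0, 1.0, 1.0)))   # white: full luminance, (near) zero chrominance
print(convert((1.0, 0.0, 0.0)))   # red
```

Each output component costs three multiplies and two adds per pixel, which is why boards implement this step in dedicated hardware to keep up with real-time video.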
- Digital Camera Interfaces
These boards provide a direct interface to RS-422 or TTL
video sources. They provide the same function as traditional
frame grabbers, except that there is no need for A/D
conversion, since the camera is already digital. Flexibility
is very high, with support for line-scan and area-scan
cameras with 8-, 16-, or 24-bit single-ended TTL and 8- or
16-bit differential inputs being very common.
- Multi-Tap Cameras
Multi-tap cameras provide a performance (frames per second)
boost at the expense of some downstream programming
complexity, since the pixels have to be put back together to
form an image. Some cameras are available with up to 8 taps
that can be working simultaneously at 10 or 20 MHz each.
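Putting the pixels back together depends entirely on the camera's tap geometry, which must come from its documentation. The sketch below assumes one common arrangement: each tap delivers a contiguous segment of the line, and some taps shift out from the far end (right to left). The layout and the `reassemble_taps` name are hypothetical.

```python
def reassemble_taps(tap_segments, reversed_taps=frozenset()):
    """Join the per-tap segments of one line back into pixel order.

    Assumes tap k delivers the k-th contiguous segment of the line and that
    taps listed in reversed_taps shift their pixels out right-to-left.
    """
    line = []
    for k, segment in enumerate(tap_segments):
        line.extend(reversed(segment) if k in reversed_taps else segment)
    return line


# Hypothetical 2-tap, 8-pixel line: tap 1 reads out from the far end.
print(reassemble_taps([[0, 1, 2, 3], [7, 6, 5, 4]], reversed_taps={1}))
```

Interleaved tap layouts would instead need a merge like the even/odd case of a bilinear sensor; either way, the reordering runs once per line.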
Vendors today seem to be following one of two different
approaches to camera interface design. One school of thought is
that the lowest cost, optimized design is a frame grabber board
designed specifically for one type of camera or application.
Others design basic grabber boards with a modular approach to
the acquisition front-end: a family of acquisition modules
supporting different camera types can plug into
one or more "motherboards." The "dedicated" school argues
that the additional components and connectors required by a
modular approach reduce reliability and increase cost. The
counterpoint is that modularity optimizes flexibility, reduces
time-to-market, and minimizes troubleshooting since time-proven
modules can migrate across product lines. It also provides
OEMs, integrators, and end users with flexibility and with a
high level of insurance against obsolescence as new cameras
come into the market.