Digital signal processing is concerned with the problem of
transforming one sequence of numbers into another in a way that
meets certain practical objectives. For example, filtering involves
the separation of one sequence into two or more components (such
as, separating a telephone system voice signal from noise, echoes,
and other unwanted disturbances). Historically, much successful use
has been made of linear systems
that satisfy the principle of
superposition:
Furthermore, the most popular linear structures are
time-invariant, meaning that if the input sequence {xk} is shifted
j units in time to obtain {xk-j}, the output sequence {yk} is shifted by the same amount, to
{yk-j}. All linear,
time-invariant systems (abbreviated LTI) may be completely
characterized by their response, here denoted {hk}, to the impulse input
k (equal to
1 when k = 0 and zero otherwise). Given {hk}, the response {yk} for any input sequence {xk} is given by the convolution
relation
:
The practical utility of this class of systems arises from two
sources: first, the fact that LTI models can reasonably approximate
the dynamics of many important physical systems (at least over a
suitably restricted range of operating conditions) and second, that
the principle of superposition may be exploited to obtain many
useful system analysis and design results.
Why We Need Nonlinear Structures
Nonlinear systems are those for which the principle of
superposition does not hold and, as the following examples
illustrate, there are important DSP problems for which linear
solutions are either nonexistent or demonstrably inferior to
nonlinear solutions. One example is the use of chaotic signals for
secure communication
, since linear models cannot exhibit chaotic
responses to simple inputs. An example where nonlinear methods
yield better results than the best known linear approaches is that
of filtering to reduce interference terms in the discrete Wigner
distribution, an extension of the power spectrum for signals with
time-varying spectral characteristics
. Similarly, nonlinear filters are better suited
for edge preservation in noisy images than are linear filters
.
Another important situation where linear structures are
demonstrably inadequate is in treatment of impulsive noise:
Figure 1 shows 1024 physical property measurements from an
industrial manufacturing process, which exhibits spikes like the
points marked "?" at k
720, k
800, and k
870.
Figure 1: Industrial process data
sequences often contain outliers like those marked with "?" in this
figure.
Denoting this data sequence {Yk}, it may be described by the
additive outlier model:
Yk =
k + ok
where {
k} is the
nominal part of the data sequence that is of primary interest.and
{ok} is a sequence of outliers,
assuming the value 0 for most k but occasionally assuming large
values that may arise from a variety of sources, including gross
measurement errors. These outliers can profoundly influence the
results of subsequent analysis
, leaving us with two practical alternatives: the
development of special analysis procedures that are inherently
outlier-resistant, or the use of data cleaning filters to attenuate
the outliers, followed by standard analysis procedures. The first
of these approaches is usually more effective, but generally more
complex to implement, motivating practical interest in the
development of simple data-cleaning filters.
An ideal data cleaning filter
applied to the observed data sequence {Yk} would return the nominal data
sequence {
k} without
any modification, regardless of the magnitude or location of the
outliers {ok}. For a linear,
time-invariant filter
, the response to the contaminated data
sequence is given by
To reject outliers regardless of their position or magnitude,
the second sum should vanish, implying hi = 0 for all i, but to avoid
distortion, the first sum should be equal to
k, implying
h0 = 1and hi = 0 forall i
0. Since these requirements are in conflict,
it follows that ideal data-cleaning filters are necessarily
nonlinear. Further, nonlinear filters exist whose impulse response
is identically zero, thus making them effective against isolated
outliers, but which also leave a large class of root sequences
completely unmodified. A specific example is the standard median
filter, defined as follows: first, for each k, define the moving
data window wk as the
set of successive data samples xk-m through xk+m. Then, rank-order the values in
this data window from smallest to largest to obtain the set:
The output of the median filter is the middle value in this
list, w(0). Complete
characterizations of the median filter's root sequences are known
, and this set includes all monotonically
increasing or decreasing sequences.
Conversely, it is also known that the distortion introduced by
the median filter is unacceptable for some applications, motivating
interest in the Hampel filter



which belongs to the class of decision-based
filters
. This filter compares the central point in the
data window xk with the median
w(0) and, if this distance is
too large (relative to an outlier-resistant scale estimate S
computed from wk), the
filter output is Yk = w(0); otherwise, the filter output is
Yk = xk. This filter involves two tuning
parameters: the window width 2m + 1, and a threshold parameter t
that determines the filter's agressiveness (specifically, the
filter only modifies xk if
¦ xk - w(0) ¦ > tS). Two key
observations are first, that this filter reduces to the standard
median filter when t = 0 and second, that the root sequence set
increases as t increases, including essentially all input sequences
as t >
.
Nonlinear DSP Structures
Despite their utility, nonlinear systems are more difficult to
analyze and design than linear systems, in part because only
partial characterizations exist. These characterizations may be
either behavioral, obtained by replacing Equation 1, or
structural, obtained by replacing Equation 2. Because it is
more popular in practice, this article focuses primarily on the
structural approach, although the behavioral approach is considered
briefly at the end of this section to illustrate its general
nature.
For LTI systems, a significant disadvantage of Equation 2
is that {hi} represents an
infinite sequence of design parameters. Fortunately, most of the
LTI systems used in DSP applications may also be represented as
for some integers p > 0 and l < m, and some
real numbers ai and bi. Here, we consider the following
nonlinear generalization of this structure, which includes an
uncountably vast range of important special cases:
For linear systems, taking p = 0 in Equation 4 removes
the first sum and yields a system whose impulse response
coefficients are given by hi =
bi for
< i < m and zero
otherwise. In the digital filtering literature, these systems are
called finite-impulse-response (FIR) filters. Analogously, taking p
= 0 in Equation 5 removes the functional dependence of
yk on past outputs yk-j and defines the class of
nonlinear FIR structures. This class contains most of the nonlinear
filter structures in current use in DSP applications



. As specific examples, note that both the
standard median filter and the Hampel filter belong to this class,
with
= -m.
Taking p > 0 in Equation 5 defines the class of
recursive structures, which exhibit greater flexibility than the
FIR case but are also more difficult to analyze. For example,
Figure 2 shows the responses of two different systems to the
same input sequence, consisting of five steps, each of the same
amplitude. The smoother curve shows the response of the Hammerstein
structure:
consisting of the cascade connection of the memoryless
nonlinearity g(x)=x1/3 followed by a linear dynamic
model. For continuous functions g() and stable linear models,
the resulting Hammerstein structure is BIBO stable (refer to the
section "Qualitative Behavior" for a definition of BIBO
stability).
Figure 2: This comparison of
Hammerstein and Lur'e model responses to the same input sequence
illustrates the difference between linear and nonlinear recursive
terms
Similar results do not extend to Lur'e structures, also composed
from these same two components, but with the nonlinear function
g() appearing as a feedback element around the linear
system
. This point is illustrated by the less regular
curve in Figure 2, which shows the response of the Lur'e
structure:
Although both of these structures are recursive, the recursive
terms are linear in the Hammerstein case but nonlinear in the Lur'e
case; as a consequence, the general qualitative behavior of the
Hammerstein structure does not change dramatically with increasing
input amplitude, while that of the Lur'e structure does: the step
responses change from monotonic, to slightly oscillatory, to highly
oscillatory, to sustained oscillations, and finally chaotic.
Further amplitude increases eventually lead to unstable (in other words,
unbounded) responses.
Returning to the idea of behavioral characterizations of
nonlinear systems, consider the following two examples. First,
replacing Equation 1 with
defines the class of positive-homogeneous systems
, which includes both the standard median filter
and the Hampel filter. The second behavioral characterization
considered here is location invariance
and is also exhibited by both of these nonlinear
filters:
for any constant c. Both of these conditions are natural ones in
many filtering applications since, taken together, they imply that
unit conversions (such as, temperature conversion from Farenheit to
Celsius) applied to filter inputs result in the corresponding
changes in filter outputs. Consequently, it is particularly
interesting to note that the only nonlinear FIR filters based on
functions
() such that
(x, . . ., x) is differentiable with respect
to x are linear-weighted averages
. This observation means that nonlinear FIR
filters exhibiting this scale-invariance involve non-smooth
nonlinearities to which Taylor series expansions and related
notions like small-amplitude linearization are not applicable.
Choosing

()
Historically, polynomials represent one of the most popular
classes of functions because they are extremely well-behaved, being
infinitely differentiable everywhere and easy to compute. Taking
() as a multivariable polynomial leads
to the class of polynomial filters
but again, most applications use the FIR subclass
obtained for p = 0

, often called Volterra filters or Volterra
models
. A typical application is compensating for
loudspeaker nonlinearities
.
Another popular function class is based on artificial neural
networks (ANNs)
, combining weighted linear combinations with
smooth saturation nonlinearities. An extremely simple example is
the following three-layer structure:
Here, xk represents the
input sequence, h1(k) and
h2(k) define a two-node hidden
layer, Yk is the network
output, and d is an unspecified synaptic weight parameter. These
networks may be either recurrent structures corresponding to p >
0, as in this example, or nonlinear FIR structures. As in the case
of polynomials, the functions
() defined by ANNs are smooth (in other
words, infinitely differentiable) and hence continuous, but unlike
polynomials, these functions are also bounded, which can be a
significant advantage in some applications. More generally, the
functional flexibility of ANNs is comparable to that of
polynomials, as both of these function classes can approximate any
continuous function
() arbitrarily well on any closed and
bounded set
. Despite these observations, it is important to
note that there are certain forms of qualitative behavior that
cannot be exhibited by either of these function classes; in
particular, note that these functions cannot exhibit either
positive-homogeneity or location invariance properties.
A class of functions that can exhibit both of these forms of
behavior is the class of piecewise linear functions, leading to the
class of linear multimodels
. It has recently been shown that the FIR
restriction of this structure class includes all L-filters, whose
output Yk consists of a linear
combination of the rank-ordered elements w(j) defined in Equation 3
. Because this rank-ordering is invariant under
both scaling by
>0 and translation by a constant c, it
follows that L-filters are both positive-homogeneous and
location-invariant. A simple class of recursive linear multimodels
is the positive-homogeneous TARMA
or TARMAX
structure, defined by:
Although this structure is not location-invariant, its
simplicity of form facilitates behavioral analysis
and it is quite flexible, reducing to Equation
2 when ci = 0 and di = 0, including well-behaved
Hammerstein structures when ci
= 0 but di
0, and yielding chaotic responses for some
ci
0.
Qualitative Behavior
Stability may be defined in various ways, most of which are
equivalent for LTI systems but not for nonlinear systems. The
following stability notion is useful, both because it is a
generally desirable system characteristic and because it can
provide useful guidance in selecting nonlinear DSP structures. A
system with input xk and output Yk is bounded-input, bounded-output
stable, or BIBO stable if, given any finite M, ¦ xk¦< M for all k
implies ¦ Yk ¦
< N for all k for some finite N. It can be shown that any
nonlinear FIR structure based on a continuous function
() is BIBO stable
and, as an important special case, it follows
that Volterra filters are BIBO stable. Conversely, Figure 3
shows the response of the following rational FIR structure

:
to a noise-corrupted sinusoidal input sequence {xk} . The function
() on which this nonlinear FIR filter is
based is not continuous because Yk diverges when the denominator sum
goes to zero, so the previous BIBO stability result does not apply
and the character of the response shown in Figure 3 is a
consequence of this system's lack of BIBO stability. Similarly, the
Lur'e structure illustrates that recursive structures are generally
not BIBO stable, although they can be. For example, because ANNs
involve bounded nonlinearities, any ANN-based structure is BIBO
stable, even in the recurrent case, p > 0.
Figure 3: Not all filters are
BIBO stable, as illustrated by this response of a rational FIR
filter to a noisy sinusoid input.
Another general characteristic of nonlinear FIR structures is
that if the input is ultimately periodic, meaning xk+P = xk for all k > M, then so is
the output sequence. Specifically, note that for k > M +
m, we have
As a special case, taking P = 1 corresponds to sequences that
are ultimately constant, a class that includes both steps and
impulses. The practical implication of Equation 7 is that
nonlinear FIR structures cannot change the period of periodic
inputs, nor can they exhibit nonperiodic responses to such inputs.
Similarly, sustained oscillations or chaotic responses to pulse
inputs like impulses and steps are also impossible. The Lur'e
structure step responses shown in Figure 2 illustrate that
these conclusions do not extend to nonlinear recursive
structures.
More generally, typical behavior of recursive structures is
illustrated in Figure 4. The upper two plots show the
responses of the ANN defined in Equation 6 to the same step
input for two different values of the parameter d: when d = 3.9
(upper left plot), the response decays initially to a presistent,
regular oscillation, but when d = 3.0 (upper right plot), the
network's response to this same step input is persistent but highly
irregular. Qualitatively analogous behavior is seen in the lower
left plot, showing the response to a sinusoidal input for d = 3.0;
the key point here is that because of the nonlinear recursive terms
in this recurrent network structure, nonperiodic responses to
periodic inputs are possible, in marked contrast to the nonlinear
FIR case. Finally, the lower right plot shows the sinusoidal
response of the following Lur'e structure:
which is BIBO stable because it may be represented as a stable
linear model with a bounded, additive perturbation (in other words,
¦ 4sinYk-1¦
< 4 for all k).
Figure 4: Recurrent neural
network and Lur'e structures need not preserve either asymptotic
constancy or periodicity.
Most currently popular nonlinear DSP structures and infinitely
many as yet unexplored ones may be expressed as special cases of
Equation 5. Consequently, the design of nonlinear DSP
structures generally reduces to the specification of the integers
p,
, and m and the function
() appearing in this equation.
In some respects, the selection of the integers p,
, and m has more profound consequences:
specifically, setting p = 0 leads to the class of generally
well-behaved and extremely popular nonlinear FIR structures.
Alternatively, certain applications like the generation of chaotic
sequences require recursive structures with p > 0, which are
generally not as well behaved and more difficult to analyze.
Similarly, real-time applications require strict causality,
implying
> 0, but off-line signal and data
processing applications often benefit from non-causal structures,
for which
< 0; as a specific example, note that
= -m for the median and Hampel filters.