418 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 37, NO. 3, MAY 2007
Correspondence
A Real-Time Visual Inspection System for Railway
Maintenance: Automatic Hexagonal-Headed
Bolts Detection
Francescomaria Marino, Arcangelo Distante, Pier Luigi Mazzeo,
and Ettore Stella
Abstract—Rail inspection is a very important task in railway maintenance,
and it is periodically needed to prevent dangerous situations. Inspection is
currently performed manually by a trained human operator who walks along
the track searching for visual anomalies. This kind of monitoring is
unacceptably slow and lacks objectivity, since the results depend on the
ability of the observer to recognize critical situations. This correspondence
presents a patent-pending real-time Visual Inspection System for Railway
(VISyR) maintenance and describes how the presence or absence of the
fastening bolts that fix the rails to the sleepers is automatically detected.
VISyR acquires images from a digital line-scan camera. Data are
simultaneously preprocessed according to two discrete wavelet transforms and
then provided to two multilayer perceptron neural classifiers (MLPNCs). The
“cross validation” of these MLPNCs almost completely avoids false positives,
and reveals the presence or absence of the fastening bolts with an accuracy
of 99.6% in detecting visible bolts and of 95% in detecting missing bolts. A
field-programmable gate array-based architecture performs these tasks in
8.09 µs, allowing an on-the-fly analysis of a video sequence acquired at
200 km/h.
Index Terms—Machine vision, neural network applications, object
recognition, pattern recognition, rail transportation maintenance, real-time
systems.
I. INTRODUCTION
Railway maintenance is a particular application context in which the
periodic surface inspection of the rolling plane is required in order
to prevent any dangerous situation. Usually, this task is performed by
trained personnel who periodically walk along the railway network
searching for visual anomalies. However, this manual inspection is
slow, laborious, and potentially hazardous, and the results are strictly
dependent on the capability of the observer to detect possible anomalies
and to recognize critical situations.
With the growth of high-speed railway traffic, companies the world
over are interested in developing automatic inspection systems that
are able to detect rail defects, sleeper anomalies, and missing
fastening elements. These systems can increase the ability to detect
defects and reduce the inspection time, so that the railway network
can be maintained more frequently.
In this correspondence, we introduce a patented [1] real-time Visual
Inspection System for Railway (VISyR) maintenance that is able to
detect missing fastening bolts and other rail defects. For the sake of
conciseness, this correspondence deals only with automatic bolt
detection, while the hardware and software architecture of a second
block, devoted to other kinds of defects, is described in [2].
Manuscript received December 30, 2004; revised May 11, 2005. This work
was supported in part by the Italian Ministry of University and Research (MIUR)
under Research Project PON “RAILSAFE.” This correspondence was
recommended by Editor D. Zhang.
F. Marino is with the Dipartimento di Elettrotecnica ed Elettronica
(DEE), Facoltà di Ingegneria, Politecnico di Bari, 70125 Bari, Italy (e-mail:
marino@poliba.it).
A. Distante, P. L. Mazzeo, and E. Stella are with the Istituto di Studi sui
Sistemi Intelligenti per l’Automazione (ISSIA) CNR, 70126 Bari, Italy (e-mail:
distante@ba.issia.cnr.it; mazzeo@ba.issia.cnr.it; stella@ba.issia.cnr.it).
Digital Object Identifier 10.1109/TSMCC.2007.893278
Usually, two kinds of fastening elements are used to secure the rail to
the sleepers: hexagonal-headed bolts and hook bolts. They differ
essentially in shape: the former has a regular hexagonal shape with
random orientation, whereas the latter has a more complex hook shape
that can be found oriented in only one direction.
In this correspondence, the case of hexagonal-headed bolts is
discussed. As shown in our previous works [3], [4] and briefly recalled here,
detection of this kind of bolt is more difficult than that of more complex
shapes (e.g., hook bolts) because of the similarity of the hexagonal bolts
to the shape of the stones in the background. Nevertheless,
detection of hook bolts is also treated in Section VII-E.
Although several works deal with railway problems, such as track
profile measurement (e.g., [5]), obstruction detection (e.g., [6]),
braking control (e.g., [7]), rail defect recognition (e.g., [8] and [9]),
ballast reconstruction (e.g., [8]), switch status detection (e.g., [10]),
and control and activation of signals near stations (e.g., [11]), to the
best of our knowledge the literature contains no references to the
specific problem of fastening element recognition (except for our
works [3], [4]). The only available approaches are commercial vision
systems [8], which consider only fastening elements having a regular
geometrical shape (like hexagonal bolts) and use geometrical approaches
to pattern recognition to solve the problem. Moreover, these systems are
strongly interactive: in order to reach the best performance, they
require a human operator to tune every threshold. When a different
fastening element is considered, the tuning phase has to be re-executed.
By contrast, VISyR is completely automatic and needs no tuning
phase. The human operator has only the task of selecting images of the
fastening elements to manage. No assumption about the shape of the
fastening elements is required, since the method is suitable for both
geometric and generic shapes.
The processing core of VISyR is basically composed of a bolt detection
block (BDB) and a rail analyzer block (RAB) [2]. In order to avoid
(in practice, completely) false positive (FP) detection, BDB intersects
the results of two different classifiers. Accordingly, it comprises
two 2-D discrete wavelet transforms (DWTs) [12]–[16], which significantly
reduce the input space dimension, and two multilayer perceptron neural
classifiers (MLPNCs), which recognize the hexagonal-headed bolts on the
sleepers. BDB achieves an accuracy of 99.6% in detecting visible bolts
and of 95% in detecting missing bolts. Moreover, because of its crossed
detection strategy, BDB produces only one FP over 2250 lines of processed
video sequence.
An FPGA-based hardware implementation (performing BDB computations
in 8.09 µs), together with a simple but efficient prediction algorithm
(which exploits the geometry of the railway to extract from the long
video sequence the few windows to be analyzed), allows real-time
performance: a long sequence of images covering about 9 km has been
inspected at an average velocity of 152 km/h, with peaks of 201 km/h.
Moreover, because of the FPGA technology chosen for the devel-
opment, VISyR is characterized by a great degree of versatility. For
instance, detection of different kinds of bolts can be performed simply
by downloading onto the FPGA different neural weights (generated by
a proper training step) during the setup.
The correspondence is organized as follows.
1094-6977/$25.00 © 2007 IEEE
Fig. 1. Acquisition system.
In Section II, an overview of VISyR is presented. Section III
introduces the developed prediction algorithm. Section IV describes the
2-D DWT preprocessing. The MLPNC is illustrated in Section V. The
implemented hardware architecture is described in Section VI.
Experimental results and computing performance are reported in Section VII.
Concluding remarks and future perspectives are given in Section VIII.
II. SYSTEM OVERVIEW
VISyR acquires images of the rail by means of a DALSA PIRANHA 2
line-scan camera¹ with a resolution of 1024 pixels (maximum line
rate of 67 kLine/s) that uses the Cameralink protocol [17]. Furthermore,
it is provided with a PC-CAMLINK frame grabber (Imaging
Technology CORECO, St. Laurent, QC, Canada).² In order to reduce
the effects of variable natural lighting conditions, an appropriate
illumination setup equipped with six OSRAM 41850 FL light sources was
also installed. In this way, the system is robust against changes in
natural illumination. Moreover, in order to synchronize data acquisition,
the line-scan camera is triggered by the wheel encoder. This trigger sets
the resolution along y (main motion direction) at 3 mm, independently
of the train velocity; the pixel resolution along the orthogonal
direction x is 1 mm. The acquisition system is installed under a diagnostic
train during its maintenance route (see Fig. 1).
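The figures quoted above imply some simple timing bounds. The following is only a back-of-the-envelope sketch, assuming the 3 mm line pitch and the 67 kLine/s maximum line rate stated in the text; the function names are hypothetical and are not part of VISyR.

```python
# Back-of-the-envelope timing derived from the figures quoted in the text.
LINE_PITCH_M = 0.003        # encoder-triggered resolution along y: 3 mm per line
MAX_LINE_RATE_HZ = 67_000   # maximum camera line rate: 67 kLine/s


def lines_per_second(speed_kmh: float) -> float:
    """Line rate requested by the wheel encoder at a given train speed."""
    speed_ms = speed_kmh / 3.6          # km/h -> m/s
    return speed_ms / LINE_PITCH_M


def max_speed_kmh() -> float:
    """Train speed at which the maximum camera line rate would be reached."""
    return MAX_LINE_RATE_HZ * LINE_PITCH_M * 3.6


if __name__ == "__main__":
    rate = lines_per_second(200.0)
    print(f"line rate at 200 km/h:   {rate:.0f} lines/s")
    print(f"line period at 200 km/h: {1e6 / rate:.1f} us")
    print(f"camera-limited speed:    {max_speed_kmh():.1f} km/h")
```

At 200 km/h this gives about 18 519 lines/s, i.e., a line period of about 54 µs, which can be compared with the 8.09 µs reported for the BDB computations.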
The captured images are inspected in order to detect rail defects. In
particular, this correspondence focuses on the detection of the
hexagonal-headed bolts that fix the rail to the sleepers. This task is
crucial in the maintenance process, because it reveals whether any bolts
are missing.
VISyR’s bolt detection is based on MLPNCs. The computing performance
of MLPNCs is strictly dependent on:
• the prediction algorithm that identifies the candidate image areas
(windows) containing the patterns to be detected;
• the input space size (i.e., the number of coefficients describing
the pattern).
To predict the image areas that may contain the bolts,
VISyR calculates the distance between two consecutive hexagonal-headed
bolts and, based on this information, predicts the position of the
windows in which the presence of a bolt should be expected (see
Section III).
To reduce the input space size, VISyR uses a feature extraction
algorithm that preserves all the important information about the
input patterns in a small set of coefficients. This algorithm is
based on 2-D DWTs [12]–[16], since the DWT concentrates the significant
variations of the input patterns in a reduced number of coefficients
(see Section IV). Specifically, both a compact wavelet introduced by
Daubechies [12] and the Haar DWT (HDWT, also known as the Haar
transform [16]) are used simultaneously, since we have verified that,
for our specific application, the logical AND of these two approaches
almost completely avoids FP detection (see Section VII-B).
¹ http://vfm.dalsa.com/products/features/piranha2.asp
² http://www.coreco.com
The logical scheme of VISyR’s processing blocks is shown in Fig. 2.
A long video sequence captured by the acquisition system is fed
into the prediction algorithm block (PAB). PAB also receives
feedback from the BDB, as well as the coordinates of the railway
geometry from the rail detection and tracking block (RD&TB, a part of
the RAB). PAB exploits this knowledge to extract the 24 × 100 pixel
windows in which the presence of a bolt is expected (some examples are
shown in Fig. 3).
These windows are provided to the 2-D DWT preprocessing block
(DWTPB). DWTPB reduces each window to two sets of 150 coefficients
(i.e., D_LL2 and H_LL2), resulting, respectively, from a
Daubechies DWT (DDWT) and a Haar DWT (HDWT). D_LL2 and
H_LL2 are then provided, respectively, to the Daubechies classifier
(DC) and to the Haar classifier (HC). The outputs of DC and
HC are combined in a logical AND in order to produce the output of the
MLPN classification block (MLPNCB). This output reveals the
presence or absence of bolts and produces a pass/alarm signal that is
displayed online (see Fig. 4) and, in case of alarm (i.e., absence of
the bolts), recorded together with the position in a log file.
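The way the two classifier decisions are combined can be sketched in a few lines. This is only an illustration under stated assumptions: the classifiers are replaced by hypothetical threshold functions, and the names `mlpncb_output` and `inspect`, as well as the sample positions, are invented for the example.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], bool]   # True means "bolt detected"


def mlpncb_output(d_ll2: Features, h_ll2: Features,
                  dc: Classifier, hc: Classifier) -> bool:
    """Logical AND of the Daubechies (DC) and Haar (HC) classifier decisions."""
    return dc(d_ll2) and hc(h_ll2)


def inspect(windows: Sequence[Tuple[int, Features, Features]],
            dc: Classifier, hc: Classifier) -> List[int]:
    """Return the positions that raise an alarm (bolt reported absent)."""
    alarms: List[int] = []
    for position, d_ll2, h_ll2 in windows:
        if not mlpncb_output(d_ll2, h_ll2, dc, hc):
            alarms.append(position)   # would be written to the log file
    return alarms


if __name__ == "__main__":
    # Stand-in classifiers (hypothetical): threshold on the mean coefficient.
    dc = lambda feats: sum(feats) / len(feats) > 0.5
    hc = lambda feats: sum(feats) / len(feats) > 0.4
    windows = [
        (0, [0.9] * 150, [0.8] * 150),     # both classifiers report a bolt
        (600, [0.9] * 150, [0.1] * 150),   # only DC fires: rejected by the AND
        (1200, [0.1] * 150, [0.1] * 150),  # neither classifier fires
    ]
    print(inspect(windows, dc, hc))
```

With the AND, a detection is accepted only when both classifiers agree, so a spurious detection by a single classifier does not yield a false positive.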
BDB and RD&TB, which are the most computationally complex
blocks of VISyR, are implemented in hardware on an Altera Stratix
FPGA. PAB is a software tool developed in MS Visual C++ 6.0 on a
general-purpose host.
III. PAB
PAB extracts from the video sequence the candidate image areas that
contain the hexagonal-headed bolts, i.e., only those windows requiring
inspection.
Because of the rail structure (see Fig. 5), the distance Dx between
rail and fastening bolts is constant and known a priori. Consequently,
automatic railway detection and tracking is fundamental in determining
the position of the bolts along the x direction. VISyR performs this task
by using RD&TB [2].
In the second instance, PAB forecasts the position of the bolts along
the y direction. To reach this goal, it uses two kinds of search:
• exhaustive search;
• jump search.
In the first kind of search, a window slides exhaustively over the areas
at the (known) distance Dx from the rail location until it finds
simultaneously (at the same y) the first occurrence of the left and of
the right bolts. At this point, it determines and stores this position (A)
and continues in the same way until it finds the second occurrence of both
bolts (position B). It then calculates the distance along y between
B and A (Dy), and the process switches to the jump search. In fact, as
is well known, the distance along y between two adjacent sleepers is
fixed. Therefore, the jump search uses Dy to jump only to those candidate
areas that enclose the windows containing the hexagonal-headed
bolts, saving computational time and speeding up the performance
of the whole system. If, during the jump search, VISyR does not find
the bolts in the position where it expects them, it stores the position
of the fault (this is cause for alarm) in a log file and restarts
the exhaustive search. A pseudocode describing how exhaustive search
and jump search alternate is shown in Fig. 6.
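The alternation between the two searches can be illustrated with a simplified one-dimensional sketch along y. It is not the pseudocode of Fig. 6, which is not reproduced here. It assumes a hypothetical detector `has_bolts(y)` that returns true when both the left and the right bolt are found at line y, and a hypothetical parameter `min_gap` that skips a few lines after a hit so that the same pair of bolts is not found twice.

```python
from typing import Callable, List, Optional


def exhaustive_search(has_bolts: Callable[[int], bool],
                      start: int, end: int) -> Optional[int]:
    """Slide line by line from `start`; return the first y where both bolts are found."""
    for y in range(start, end):
        if has_bolts(y):
            return y
    return None


def scan_track(has_bolts: Callable[[int], bool], n_lines: int,
               min_gap: int = 1) -> List[int]:
    """Alternate exhaustive and jump search; return the y positions logged as faults."""
    faults: List[int] = []
    y = 0
    while y < n_lines:
        # Exhaustive search: locate two consecutive bolt pairs, A and B.
        a = exhaustive_search(has_bolts, y, n_lines)
        if a is None:
            break
        b = exhaustive_search(has_bolts, a + min_gap, n_lines)
        if b is None:
            break
        dy = b - a                      # measured spacing Dy along y
        # Jump search: visit only the positions where bolts are expected.
        y = b + dy
        while y < n_lines and has_bolts(y):
            y += dy
        if y < n_lines:
            faults.append(y)            # bolts missing where expected: alarm
            y += 1                      # fall back to the exhaustive search
    return faults


if __name__ == "__main__":
    bolts = {10, 210, 410, 810, 1010, 1210}   # the pair expected at y = 610 is missing
    print(scan_track(lambda y: y in bolts, 1300))
```

In this toy run the jump search visits y = 410 and y = 610, logs the fault at 610, and the exhaustive search then re-synchronizes on the bolts at 810 and 1010.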
Fig. 2. Functional diagram of VISyR. Rounded blocks are implemented in an FPGA-based hardware whereas rectangles are implemented in a software tool on
a general purpose host. [&] denotes logical AND.
Fig. 3. Examples of 24× 100 windows extracted from the video sequence
containing hexagonal-headed bolts. Resolutions along x and y are different
because of the acquisition setup.
IV. 2-D DWTPB
In pattern recognition, input images are generally preprocessed in
order to extract their intrinsic features.
The wavelet transform [12]–[16] is a mathematical technique that
decomposes a signal in the time domain by using dilated/contracted
and translated versions of a single finite duration basis function, called
the prototype wavelet. This differs from traditional transforms (e.g.,
Fourier transform, cosine transform, etc.), which use infinite-duration
basis functions. The 1-D continuous wavelet transform of a signal x(t)
is
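$$W(a, b) = \frac{1}{\sqrt{a}} \int x(t)\, \bar{\psi}\!\left(\frac{t-b}{a}\right) dt \tag{1}$$
where $\bar{\psi}\left(\frac{t-b}{a}\right)$ is the complex conjugate of the
prototype wavelet $\psi\left(\frac{t-b}{a}\right)$, $a$ is a time dilation,
and $b$ is a time translation.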
Due to the discrete nature (both in time and amplitude) of most
applications, different DWTs have been proposed according to the
nature of the signal, the time, and the scaling parameters.
The 2-D DWT [12]–[16] works as a multilevel decomposition tool.
A generic 2-D DWT decomposition level $j$ is shown in Fig. 7. It can be
seen as the further decomposition of a 2-D data set $LL_{j-1}$ ($LL_0$ being
the original input image) into four subbands $LL_j$, $LH_j$, $HL_j$, and
$HH_j$. The capital letters and their position are related, respectively, to
the applied monodimensional filters (L for low-pass filter, H for
high-pass filter) and to the direction (first letter for horizontal, second
letter for vertical). The band $LL_j$ is a coarser approximation of $LL_{j-1}$.
The bands $LH_j$ and $HL_j$ record the changes along horizontal and vertical
directions of $LL_{j-1}$, respectively, while $HH_j$ shows high-frequency
components. Because of the decimation occurring at each level along
both the directions, any subband at the level $j$ is composed of
$N_j \times M_j$ elements, where $N_j = N_0/2^j$ and $M_j = M_0/2^j$.
As an example, Fig. 8 shows how two decomposition levels are
applied on an image of a bolt.
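The size reduction produced by the decimation can be checked with a minimal sketch of the LL path only, assuming the orthonormal Haar filters (low-pass taps 1/√2, 1/√2) and windows with even dimensions. The function names are hypothetical, and the Daubechies case, which needs longer filters and a boundary-handling rule, is not covered here.

```python
from typing import List

Matrix = List[List[float]]


def haar_ll(image: Matrix) -> Matrix:
    """One level of the 2-D orthonormal Haar DWT, keeping only the LL subband."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("both dimensions must be even")
    # Low-pass filtering and decimation along both directions is equivalent
    # to summing each 2x2 block and dividing by 2.
    return [
        [(image[2 * i][2 * j] + image[2 * i][2 * j + 1]
          + image[2 * i + 1][2 * j] + image[2 * i + 1][2 * j + 1]) / 2.0
         for j in range(cols // 2)]
        for i in range(rows // 2)
    ]


def haar_ll2(window: Matrix) -> Matrix:
    """Two decomposition levels: LL0 -> LL1 -> LL2."""
    return haar_ll(haar_ll(window))


if __name__ == "__main__":
    window = [[1.0] * 100 for _ in range(24)]   # a 24 x 100 test window
    ll2 = haar_ll2(window)
    print(len(ll2), len(ll2[0]))                # subband size after two levels
```

For a 24 × 100 window, two levels give a 6 × 25 subband, i.e., 150 coefficients instead of 2400 pixels.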
Different properties of the DWT can be emphasized by using different
filters for L and H. Because of this flexibility, the DWT has been
successfully applied to a wide range of applications. Moreover, we have
found [3], [4] that the orthonormal bases of compactly supported wavelets
introduced by Daubechies [12] are excellent tools for characterizing
hexagonal-headed bolts with a small number of features containing the
most discriminating information, thereby saving computational time.
Due to the setup of VISyR’s acquisition, PAB provides DWTPB with
windows of 24 × 100 pixels to be examined (Fig. 3). Different DWTs,
varying the number of decomposition levels, have been tested
in order to reduce this number of coefficients without losing accuracy.
The best compromise has been reached with the LL2 subband, which consists
of only 6 × 25 coefficients. Using the classifier described in Section V,
it achieves an accuracy of 99.9% in recognizing bolts in the primitive
windows.
Simultaneously, the block also computes the LL2 subband of a
HDWT [16], since we have found that the cross validation of two
classifiers (processing, respectively, D_LL2 and H_LL2, i.e., the
outputs of DDWT and HDWT, see Fig. 2) practically avoids FP detection
(see Section VII-B).
V. MLPNC
Neural networks have proven to be useful tools for many applications,
such as extracting data from images (e.g., [18]) and classification
(e.g., [19]). For our classification task, we have focused on neural
networks for the following reasons:
• Neural network classifiers have a key advantage over geometry-based
techniques because they do not require a geometric model
for the object representation [20].
• Neural network classifiers separate the classes using curved
surfaces, thereby outperforming K-NN classifiers, which separate
the classes by means of linear surfaces. Moreover, K-NN
classifiers continuously iterate the training using as feedback the
results of the performed classifications, which makes them more
complex and computationally expensive.
• Contrary to the id-tree, neural networks have a topology perfectly
suitable for hardware implementation.
Fig. 4. VISyR’s online monitor. At the moment of this snapshot, VISyR is signalling the presence of the left and right bolts.
Fig. 5. Geometry of a rail. A correct forecast of Dx and Dy notably reduces the computational load.
Fig. 6. Pseudocode for the exhaustive search–jump search commutation.
Among neural classifiers, we have chosen MLP classifiers since,
in our previous works [3], [4], they proved more precise
than their RBF counterparts in the considered application.
VISyR’s BDB employs two MLPNCs (DC and HC in Fig. 2), trained,
respectively, for DDWT and HDWT. DC and HC have an identical
topology (they differ only in the values of the weights) and
consist of three layers of neurons (input, hidden, and output
layer). In the following, DC is described; the functionality of HC
can be straightforwardly derived. The input layer is composed of 150
neurons $D\_n'_m$ ($m = 0, \ldots, 149$) corresponding to the coefficients
$D\_LL2(i, j)$ of the subband D_LL2 according to
$$D\_n'_m = D\_LL2(\lfloor m/25 \rfloor,\ m \bmod 25). \tag{2}$$
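The index mapping in (2) is a row-major flattening of the 6 × 25 subband, with m/25 read as integer division. A small sketch, with the hypothetical helper name `flatten_ll2`:

```python
from typing import List


def flatten_ll2(ll2: List[List[float]]) -> List[float]:
    """Map the 6 x 25 subband to the 150 input neurons: m -> (m // 25, m % 25)."""
    return [ll2[m // 25][m % 25] for m in range(150)]


if __name__ == "__main__":
    # Fill the subband so that coefficient (i, j) holds its own row-major index.
    ll2 = [[float(i * 25 + j) for j in range(25)] for i in range(6)]
    inputs = flatten_ll2(ll2)
    print(inputs[0], inputs[25], inputs[149])
```

Neuron m = 25 therefore takes coefficient (1, 0), and neuron m = 149 takes coefficient (5, 24).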
The hidden layer of DC [HC] consists of 10 neurons $D\_n''_k$
($k = 0, \ldots, 9$); they derive from the propagation of the first layer
according to
$$D\_n''_k = f\left( D\_bias'_k + \sum_{m=0}^{149} D\_w'_{m,k}\, D\_n'_m \right) \tag{3}$$
Fig. 7. 2-D DWT: The jth level of subband decomposition. ↓ represents decimation by 2.
Fig. 8. Application of two levels of 2-D DWT on a subimage containing a
hexagonal-headed bolt.
while the unique neuron $D\_n'''_0$ at the output layer is given by
$$D\_n'''_0 = f\Big( D\_bias'' + \sum^{9} \cdots$$
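The layer equations can be illustrated with a minimal forward-pass sketch. Two points are assumptions rather than statements from the text: the activation f is taken to be a logistic sigmoid, and the output neuron is taken to combine the 10 hidden neurons in the same way that (3) combines the 150 inputs. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic activation; an assumption, since f is not specified here."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_layer(inputs: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 biases: Sequence[float]) -> List[float]:
    """Eq. (3): n''_k = f(bias'_k + sum over m of w'_{m,k} * n'_m)."""
    return [
        sigmoid(biases[k] + sum(weights[m][k] * inputs[m]
                                for m in range(len(inputs))))
        for k in range(len(biases))
    ]


def output_neuron(hidden: Sequence[float],
                  weights: Sequence[float], bias: float) -> float:
    """Assumed by analogy with eq. (3): f(bias'' + sum over k of w''_k * n''_k)."""
    return sigmoid(bias + sum(w * h for w, h in zip(weights, hidden)))


if __name__ == "__main__":
    inputs = [1.0] * 150                        # flattened LL2 coefficients
    w1 = [[0.0] * 10 for _ in range(150)]       # 150 x 10 input-to-hidden weights
    b1 = [0.0] * 10
    hidden = hidden_layer(inputs, w1, b1)
    print(output_neuron(hidden, [0.0] * 10, 0.0))
```

Under these assumptions DC and HC would share this code and differ only in the weight and bias values supplied to it, in line with the identical topology described above.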