2017 International Conference on Electronic and Information Technology (ICEIT 2017) ISBN: 978-1-60595-526-1
Detection of Event Using Trajectory Hyper-graphs Method
Jia KE 1,2 and Xiao-jun CHEN 2,3,a,*
1 School of Management, Jiangsu University
2 School of Computer Science and Communication Engineering, Jiangsu University
3 Affiliated Hospital of Jiangsu University, Jiangsu University
*Corresponding author [email protected]
Keywords: Detection of event, Hyper-graph, Trajectory hyper-graph, Diagonal matrix, Hausdorff distance
Abstract. Machine vision aims to improve the flexibility and automation of production, yet extracting semantic information from video data remains a difficult problem in the field. Video event detection covers the capture of surveillance video, the detection of targets, and the conversion of the results into data for processing and analysis. This paper presents a trajectory hyper-graph approach for recognizing events and sub-events in video, which strengthens the ability to classify events. In experiments on several monitoring video (MV) datasets from different scenes, the numbers of vertices and clusters produced by the trajectory and multi-label hyper-graph fusion method (TG-MLG) are larger than those of the two other methods compared, which indicates better descriptive performance. With event detection based on the trajectory hyper-graph, high-level semantic information can be used for video classification, search and forecasting.
Introduction
An appropriate representation of the relationships between video objects is crucial for video event detection. In the literature, numerous works using graphs or hyper-graphs have been proposed. For instance, Hakeem proposed a clustering method based on image segmentation for complex event detection. Based on the temporal relationships between simple events, this method clusters relevant events by graph cutting. Huang employed video hyper-graph partitioning to detect moving objects [1]. In many real-world problems, a traditional graph based on a single similarity function is insufficient for representing the relations among a set of objects [2]. In general, different pairwise graphs can be built from affinity functions computed on different features. A weighted similarity measure over all the features can then be established to combine these representations [3]. However, simply taking their weighted sum as the new affinity function may lose information that is crucial to the clustering task [4]. Alternatively, one may consider the relationship among three or more data points to determine whether they belong to the same cluster. For example, we may compute the probability that an object and its neighbors belong to the same category. Such a representation for data sets with higher-order relationships is termed a hyper-graph, which is defined on a set of vertices and a set of weighted hyper-edges [5].
of target is constructed. Meanwhile, multi-label semantic hyper-graphs are established to represent semantic concepts in video. These two hyper-graphs are segmented by spectral clustering method to yield clusters of trajectory and label. Finally, they are integrated to reflect the relevance between their vertices. Experiments are performed to detect complex events in video and the results confirm the effectiveness of our method.
Hyper-graph Theory
A hyper-graph builds on graph theory and set theory. Objects with common features belong to a set, and sets of sets capture different levels of abstraction. In this way, a structure based on the inclusion relation between sets can be established, and the hyper-graph has emerged as a useful tool for describing such a structure.
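As an illustration of this structure, the following minimal sketch stores a hyper-graph as a list of vertex sets and derives its incidence matrix. The function name `incidence_matrix`, the toy hyperedges and the weights are assumptions made for the example only; they do not come from the paper.

```python
import numpy as np

def incidence_matrix(num_vertices, hyperedges):
    """Return H with H[v, e] = 1 if vertex v belongs to hyperedge e, else 0."""
    H = np.zeros((num_vertices, len(hyperedges)), dtype=int)
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v, e] = 1
    return H

# Toy hyper-graph (hypothetical data): 5 vertices, 2 weighted hyperedges.
hyperedges = [{0, 1, 2}, {2, 3, 4}]
weights = np.array([1.0, 0.5])

H = incidence_matrix(5, hyperedges)
vertex_degree = H @ weights   # sum of the weights of the hyperedges containing each vertex
edge_degree = H.sum(axis=0)   # number of vertices in each hyperedge
```

Unlike an ordinary graph edge, each column of `H` may contain more than two non-zero entries, which is how a hyperedge relates three or more objects at once.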
Fusing Trajectory and Multi-label Hyper-graphs for Event Detection
According to the spatial-temporal characteristics of trajectory as well as the co-occurrence of semantic tags, this paper uses spectral segmentation to partition trajectory hyper-graph and multi-label hyper-graph, thus producing segmentation results of video events. From the fusion results of temporal dependencies between events, we can get the classification of complex events, thus achieving complex event detection.
Definition of Trajectory
As time varies, the motion of a target can be represented by a series of spatial changes across consecutive video frames. Therefore, in this paper a trajectory is defined as a form exhibiting simultaneous spatial and temporal variation. We improved the Gaussian mixture model in [6] to identify the moving target and used invariant contour moments to describe multiple moving targets in consecutive frames. Taking the center-of-mass coordinate of the moving target as the spatial position $(x_i, y_i)$ of $object_i$, we can extract the trajectory from the variation of the consecutive spatial positions of moving targets.
Definition 1. In a video, let $o_i$ stand for the $i$-th moving target, with trajectory $T_{object_i}$ expressed as a triple $(x_i, y_i, t_i)$. It means that at time point $t_i$, the spatial position of the moving target $o_i$ is $(x_i, y_i)$, where $x_i$ and $y_i$ denote the horizontal and vertical coordinates, respectively. Let $R^2$ and $R$ denote the spatial and temporal domains of target motion, respectively; then the temporal-spatial domain can be expressed as $R^2 \times R$.
Definition 2. The velocity vector of trajectory $TS$ at time point $t$ ($t_i \le t \le t_{i+1}$) is

$$(p_x, p_y) = \left( \frac{x_{i+1} - x_i}{t_{i+1} - t_i},\ \frac{y_{i+1} - y_i}{t_{i+1} - t_i} \right).$$

The velocity of the target between $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$ can be expressed as

$$\frac{\sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2}}{t_{i+1} - t_i}.$$
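Definitions 1 and 2 can be sketched in a few lines of NumPy. The function name `velocity_vectors` and the sample trajectory are hypothetical, and the sketch assumes the trajectory is given as $(x, y, t)$ triples with strictly increasing time stamps.

```python
import numpy as np

def velocity_vectors(trajectory):
    """trajectory: sequence of (x, y, t) triples sorted by strictly increasing t.

    Returns (velocities, speeds) for each pair of consecutive samples:
    velocities[i] = ((x[i+1]-x[i]) / (t[i+1]-t[i]), (y[i+1]-y[i]) / (t[i+1]-t[i]))
    speeds[i]     = Euclidean norm of velocities[i]
    """
    traj = np.asarray(trajectory, dtype=float)
    dxy = np.diff(traj[:, :2], axis=0)
    dt = np.diff(traj[:, 2])
    if np.any(dt <= 0):
        raise ValueError("time stamps must be strictly increasing")
    velocities = dxy / dt[:, None]
    speeds = np.linalg.norm(velocities, axis=1)
    return velocities, speeds

# Hypothetical trajectory of one target: three samples.
velocities, speeds = velocity_vectors([(0, 0, 0), (3, 4, 1), (3, 10, 3)])
```

For this sample the two velocity vectors are $(3, 4)$ and $(0, 3)$, with speeds 5 and 3.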
Similarity between Trajectories
trajectory is also a main characteristic of moving target. Therefore, we combine the space distance and motion feature to measure the similarity of trajectories, wherein the motion feature is represented by the velocity direction of the motion point on the trajectory.
In the set of trajectories, two trajectories $t_i$ and $t_j$ are selected. The spatial distance between them, $d_{sp}(t_i, t_j)$, is expressed as:

$$d_{sp}(t_i, t_j) = \min\{\, d(t_i, t_j),\ d(t_j, t_i) \,\} \qquad (1)$$

where $d(t_i, t_j) = \max_{a \in t_i} \min_{b \in t_j} \| a - b \|$. The more similar $t_i$ and $t_j$ are, the smaller the spatial distance between them is, and vice versa.
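The spatial distance of formula (1) can be sketched as follows, assuming each trajectory is an array of 2-D positions and that the norm is Euclidean (the text does not name the norm). The function names are hypothetical.

```python
import numpy as np

def directed_distance(ti, tj):
    """d(ti, tj): for each point a on ti take the distance to its nearest
    point b on tj, then return the largest of these nearest distances."""
    ti = np.asarray(ti, dtype=float)
    tj = np.asarray(tj, dtype=float)
    pairwise = np.linalg.norm(ti[:, None, :] - tj[None, :, :], axis=2)
    return pairwise.min(axis=1).max()

def spatial_distance(ti, tj):
    """Formula (1): the smaller of the two directed distances."""
    return min(directed_distance(ti, tj), directed_distance(tj, ti))

# Hypothetical trajectories given as (x, y) positions.
t1 = [(0, 0), (1, 0), (2, 0)]
t2 = [(0, 1), (1, 1)]
d12 = directed_distance(t1, t2)   # sqrt(2): the point (2, 0) is farthest from t2
d21 = directed_distance(t2, t1)   # 1.0
d_sp = spatial_distance(t1, t2)   # 1.0
```

Note that the classical Hausdorff distance takes the maximum of the two directed distances, whereas formula (1) as printed takes the minimum; the sketch follows the formula as printed.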
In addition to spatial distance, the velocity direction can also be used as a measure of similarity between trajectories. Let $a_p$ be any point on $t_i$ and $(p_x^{a_p}, p_y^{a_p})$ be the corresponding velocity vector. Similarly, let $b_q$ be any point on $t_j$ and $(p_x^{b_q}, p_y^{b_q})$ be the corresponding velocity vector. Then, the velocity direction can be defined as

$$\cos\theta = \frac{(p_x^{a_p}, p_y^{a_p}) \cdot (p_x^{b_q}, p_y^{b_q})}{\left\| (p_x^{a_p}, p_y^{a_p}) \right\| \, \left\| (p_x^{b_q}, p_y^{b_q}) \right\|} \qquad (2)$$

In terms of velocity direction, the distance between trajectories is given by

$$d_{ve}(t_i, t_j) = 1 - \cos\theta \qquad (3)$$

Combining trajectory spatial distance and velocity direction distance, the similarity between trajectories is given by
$$D(t_i, t_j) = k \cdot d_{sp}(t_i, t_j) + (1 - k) \cdot \frac{1}{n} \sum_{p \in t_i} d_{ve}(t_i, t_j) \qquad (4)$$

where $d_{ve}(t_i, t_j)$ is given by formula (3), $n$ is the number of points on $t_i$, and $k$ is a trade-off coefficient used to balance the influence of spatial distance and velocity direction distance. In this paper, we simply set the value of $k$ to 0.5, meaning that the two kinds of distance have equal influence on the similarity between trajectories. Then, the hyper-graph is adopted to perform clustering analysis.
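The combined measure can be sketched as below. Several details are assumptions, because the text does not fix them: velocities are estimated by forward differences (the last sample reuses the preceding velocity), each point $a_p$ on $t_i$ is paired with its spatially nearest point $b_q$ on $t_j$, and a small `eps` guards against zero-length velocity vectors. The names `point_velocities` and `trajectory_distance` are hypothetical.

```python
import numpy as np

def point_velocities(traj):
    """Forward-difference velocity at each sample of an (x, y, t) trajectory.
    Assumption: the last sample reuses the preceding velocity."""
    traj = np.asarray(traj, dtype=float)
    v = np.diff(traj[:, :2], axis=0) / np.diff(traj[:, 2])[:, None]
    return np.vstack([v, v[-1]])

def trajectory_distance(ti, tj, k=0.5, eps=1e-12):
    """Sketch of formula (4) for trajectories with at least two samples each."""
    ti = np.asarray(ti, dtype=float)
    tj = np.asarray(tj, dtype=float)
    # Pairwise Euclidean distances between the spatial positions.
    pair = np.linalg.norm(ti[:, None, :2] - tj[None, :, :2], axis=2)
    # Formula (1): smaller of the two directed distances.
    d_sp = min(pair.min(axis=1).max(), pair.min(axis=0).max())
    vi, vj = point_velocities(ti), point_velocities(tj)
    # Assumption: pair each point of ti with its nearest point on tj.
    vq = vj[pair.argmin(axis=1)]
    norms = np.linalg.norm(vi, axis=1) * np.linalg.norm(vq, axis=1)
    cos_theta = np.sum(vi * vq, axis=1) / (norms + eps)   # formula (2)
    d_ve = 1.0 - cos_theta                                # formula (3)
    return k * d_sp + (1.0 - k) * d_ve.mean()             # formula (4)

# Hypothetical targets: two move in parallel, one moves in the opposite direction.
a = [(0, 0, 0), (1, 0, 1), (2, 0, 2)]
b = [(0, 1, 0), (1, 1, 1), (2, 1, 2)]
c = [(2, 1, 0), (1, 1, 1), (0, 1, 2)]
same_direction = trajectory_distance(a, b)       # about 0.5
opposite_direction = trajectory_distance(a, c)   # about 1.5
```

Because the average runs over the points of $t_i$ only, this sketch is not guaranteed to be symmetric in its two arguments; a symmetric variant would be a further design choice.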
Establishing Trajectory Hyper-graph
Based on the hyper-graph theory discussed in Section 2, we construct a trajectory hyper-graph to represent the trajectories of multiple moving targets. Each vertex corresponds to a trajectory in a video. We consider a set of $n$ trajectories $\{t_1, \ldots, t_i, \ldots, t_n\}$, where $t_i$ and $t_j$ are two trajectories. To obtain the vertex set $S_V = \{v_1, v_2, \ldots, v_n\}$ ($|S_V| = n$) of the trajectory hyper-graph, we first compute the similarity measurement from the spatial and temporal feature vectors of each trajectory, and then calculate the affinity matrix $M_F$ of size $|S_V| \times |S_V|$ for the vertex set $S_V$. Finally, we take each vertex as a centroid and form a hyperedge $e$ containing that vertex and its $m-1$ closest vertices (in the sense of the affinity matrix $M_F$).
In summary, we can build the hyper-graph $G_T$ through the following steps:
Input: The set of trajectories $\{t_1, \ldots, t_i, \ldots, t_n\}$ extracted from a video.
Step 1. Determine the vertices of the trajectory hyper-graph $G_T$: each trajectory is viewed as a vertex, and the vertex set of $G_T$ is $S_V^t = \{v_1, v_2, \ldots, v_n\}$.
Step 2. Determine the hyperedge set of $G_T$, $S_E^t = \{e_1, e_2, \ldots, e_m\}$:
Step 2.1. Let $t_i$ and $t_j$ be denoted by $v_i$ and $v_j$ respectively. The affinity between them is

$$(M_F)_{ij} = \exp\left( -\frac{D(t_i, t_j)}{\sigma} \right)$$

where $\sigma$ is the standard deviation calculated from $D(t_i, t_j)$.
Step 2.2. According to the $|S_V| \times |S_V|$ affinity matrix $M_F$ of the vertex set $S_V^t$, construct the hyperedges: view each vertex as a centroid and form a hyperedge $e$ containing that vertex and its $m-1$ closest vertices (in the sense of the affinity matrix $M_F$). The resulting hyperedge set of the trajectory hyper-graph $G_T$ is $S_E^t = \{e_1, e_2, \ldots, e_m\}$.
Output: The trajectory hyper-graph $G_T$.
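The construction steps above can be sketched as follows, starting from a precomputed matrix of pairwise trajectory distances. The function name `build_trajectory_hypergraph`, the parameter `edge_size` (the number of vertices per hyperedge) and the choice to compute $\sigma$ over the off-diagonal distances are assumptions made for this illustration.

```python
import numpy as np

def build_trajectory_hypergraph(D, edge_size):
    """D: (n, n) matrix of pairwise trajectory distances D(t_i, t_j).
    edge_size: number of vertices in each hyperedge (centroid plus neighbours).
    Returns (M_F, hyperedges, H), where H is the n x n incidence matrix and
    column i of H is the hyperedge centred on vertex i."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Assumption: sigma is the standard deviation of the off-diagonal distances.
    sigma = D[~np.eye(n, dtype=bool)].std()
    if sigma == 0:
        sigma = 1.0  # guard for the degenerate case of identical distances
    M_F = np.exp(-D / sigma)  # Step 2.1: affinity matrix

    hyperedges = []
    H = np.zeros((n, n), dtype=int)
    for i in range(n):
        # Step 2.2: the centroid plus its edge_size - 1 highest-affinity vertices.
        ranked = [int(j) for j in np.argsort(-M_F[i], kind="stable") if j != i]
        members = [i] + ranked[:edge_size - 1]
        hyperedges.append(sorted(members))
        H[members, i] = 1
    return M_F, hyperedges, H

# Hypothetical distances for four trajectories forming two tight pairs.
D = [[0, 1, 5, 6],
     [1, 0, 5, 6],
     [5, 5, 0, 1],
     [6, 6, 1, 0]]
M_F, hyperedges, H = build_trajectory_hypergraph(D, edge_size=2)
```

In this sketch every vertex generates one hyperedge, so the number of hyperedges equals the number of trajectories; duplicate hyperedges (as in the toy data) are kept rather than merged.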
The above algorithm is the procedure for constructing the trajectory hyper-graph $G_T$. As we can see, the similarity of the multiple moving targets in both spatial distance and velocity direction distance influences the affinity matrix $M_F$ of the trajectory hyper-graph $G_T$. Based on this, we can calculate the correlation (incidence) matrix $M_H^t$ of $G_T$, the diagonal matrix $W_M$ of hyperedge weights, where $w(e_i) = \sum_{v_j \in e_i} (M_F)_{ij}$, the diagonal matrix $D_v$ of vertex degrees, and the diagonal matrix $D_e$ of hyperedge degrees.
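A minimal sketch of these matrices is given below. It assumes that hyperedge $e_i$ is the one centred on vertex $v_i$ (so column $i$ of the incidence matrix belongs to $e_i$) and uses the commonly used definitions of vertex degree (sum of the weights of the incident hyperedges) and hyperedge degree (number of member vertices), which the text does not spell out. The function name and the toy data are hypothetical.

```python
import numpy as np

def hypergraph_matrices(H, M_F):
    """H: (n, n_edges) incidence matrix, hyperedge i centred on vertex i.
    M_F: (n, n) affinity matrix.
    Returns the diagonal matrices (W, D_v, D_e)."""
    H = np.asarray(H, dtype=float)
    M_F = np.asarray(M_F, dtype=float)
    n_edges = H.shape[1]
    # w(e_i): sum of the affinities between centroid i and every member of e_i
    # (the centroid itself is a member, so (M_F)_ii is included).
    w = np.array([M_F[i, H[:, i] > 0].sum() for i in range(n_edges)])
    d_v = H @ w          # assumed vertex degree: total weight of incident hyperedges
    d_e = H.sum(axis=0)  # assumed hyperedge degree: number of member vertices
    return np.diag(w), np.diag(d_v), np.diag(d_e)

# Hypothetical example: e_0 = {v0, v1}, e_1 = {v0, v1}, e_2 = {v1, v2}.
H = [[1, 1, 0],
     [1, 1, 1],
     [0, 0, 1]]
M_F = [[1.0, 0.5, 0.1],
       [0.5, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
W, D_v, D_e = hypergraph_matrices(H, M_F)
```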
We use the hyper-graph spectral segmentation algorithm 1 to partition the hyper-graph $G_T$ into sub-hyper-graphs recursively until the value of $c(S)$ and the average weights of the hyperedges in each sub-hyper-graph are not less than a given threshold. In this way, we can obtain the trajectory clusters of the hyper-graph, denoted as $TC = \{TC_1, \ldots, TC_l, \ldots, TC_m\}$. It is easy to see from the characteristics of spectral partitioning that the correlation between trajectories from the same trajectory cluster is large, while it is small between trajectories from different trajectory clusters.
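The segmentation algorithm itself is not reproduced in this text, so the sketch below substitutes a widely used technique, a single two-way cut based on the normalized hyper-graph Laplacian $I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}$, rather than the authors' procedure. It performs one split only; the recursion and the stopping threshold on $c(S)$ are omitted because they are not defined here. The function name and the toy hyper-graph are hypothetical, and every vertex is assumed to have a positive degree.

```python
import numpy as np

def spectral_bipartition(H, w):
    """Split the vertices of a hyper-graph into two groups.
    H: (n, n_edges) incidence matrix; w: (n_edges,) positive hyperedge weights.
    Returns a boolean mask; vertices with the same value are in the same group."""
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    d_v = H @ w            # vertex degrees (must all be positive)
    d_e = H.sum(axis=0)    # hyperedge degrees
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    theta = dv_inv_sqrt @ H @ np.diag(w / d_e) @ H.T @ dv_inv_sqrt
    laplacian = np.eye(H.shape[0]) - theta
    # eigh returns the eigenvalues of a symmetric matrix in ascending order.
    _, eigvecs = np.linalg.eigh(laplacian)
    second = eigvecs[:, 1]  # eigenvector of the second-smallest eigenvalue
    return second >= 0

# Hypothetical hyper-graph: two tight groups joined by one weak hyperedge.
H = np.zeros((6, 3))
H[[0, 1, 2], 0] = 1   # e_0 = {v0, v1, v2}
H[[3, 4, 5], 1] = 1   # e_1 = {v3, v4, v5}
H[[2, 3], 2] = 1      # e_2 = {v2, v3}, the weak link
mask = spectral_bipartition(H, w=[1.0, 1.0, 0.1])
```

On this toy input the cut separates $\{v_0, v_1, v_2\}$ from $\{v_3, v_4, v_5\}$; which group receives `True` depends on the sign of the eigenvector returned by the solver.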
Experimental Analysis
Experiment Scene
We selected several monitoring video (MV) datasets from different scenes for the event detection experiments, including road traffic (RTMV), expressway traffic (ETMV), canal traffic (CTMV), dock (DMV), parking lot (PLMV) and residence (RMV). Some typical frames from these videos are shown in Table 1.
Table 1 shows detailed information about the experimental videos. In the experiments, our method analyzes and processes the following events: normal and abnormal traffic events in RTMV and ETMV, multi-ship sailing in a channel in CTMV, tanker loading and unloading events at the dock in DMV, car parking and vehicle access events in PLMV, and access events of residents, visitors and other personnel in RMV. The experimental environment is an Intel (R) Core (TM) i5 CPU, 8 GB RAM and a 5400 RPM IDE hard drive; the operating system is 32-bit Windows 2010 Server.
Comparing the Numbers of Vertices and Clusters
In the experiments, we compare the proposed trajectory and multi-label hyper-graph fusion method (TG-MLG) with two related methods, CGC and HG-LGC. CGC detects events based on an ordinary graph, while HG-LGC introduces a trajectory hyper-graph with a Hausdorff similarity measurement into the hyper-graph model established by multi-label semi-supervised learning. Figure 1 shows the number of vertices in the hyper-graphs that these methods generate on each dataset, where the vertices of TG-MLG and HG-LGC consist of those produced by the trajectory and multi-label hyper-graphs. Figure 2 compares the numbers of clusters generated, where the clusters of TG-MLG and HG-LGC comprise those produced by the trajectory and multi-label hyper-graphs. The figures show that the numbers of vertices and clusters of TG-MLG are larger than those of the other two methods.
[Figure 1: bar chart. x-axis: datasets RTMV, ETMV, CTMV, DMV, PLMV, RMV; y-axis: number of vertices in hyper-graph (0 to 300); series: HG-LGC, TG-MLG, CGC]
Figure 1. Number of vertices in hyper-graph with three different methods.
[Figure 2: bar chart. x-axis: datasets RTMV, ETMV, CTMV, DMV, PLMV, RMV; y-axis: number of clusters in hyper-graph (0 to 45); series: HG-LGC, TG-MLG, CGC]
Figure 2. Number of clusters in hyper-graph with three different methods.
Conclusion
semantic concept model of events. Based on hyper-graph theory, this paper performs event detection according to the spatial and temporal characteristics of the trajectories extracted from moving targets. At the same time, we propose constructing a multi-label hyper-graph to represent the semantic concepts of the video. By fusing the trajectory and multi-label hyper-graphs through the mapping relationship between them, complex events can be detected. The experimental results show that our method is better than the other methods in terms of precision and recall rate. In the future, we will apply this method to detect other types of events in video.
Acknowledgements
This research has partially been supported by National Natural Science Foundation of China under Grant No. 61773184, 61502206 and 61502208, College Natural Science Research of Jiangsu Province under Grant No. 14KJB520008, Senior Technical Personnel of Scientific Research Fund of Jiangsu University under Grant No. 13JDG126, Research Innovation Program for College Graduates of Jiangsu Province under Grant No. KYLX15_1078, New Technologies and Projects of Affiliated Hospital of Jiangsu University under Grant No. xjs2016035, Medical Research Project of Jiangsu Provincial Health and Family Planning Commission under Grant No. X2017003, Research on hospital management innovation of Jiangsu Hospital Association under Grant No. JSYJY-3-2017-216.
References
[1] Mezaris V, Scherp A, Jain R, Kankanhalli M S. Real-life events in multimedia: detection, representation, retrieval, and applications [J]. Multimedia Tools and Applications, 2014, (70): 1-6.
[2] Hongeng S, Nevatia R, Bremond F. Video-based event recognition: activity representation and probabilistic recognition methods [J]. Computer Vision and Image Understanding, 2004, 96: 129-162.
[3] Hakeem A, Shah M. Learning, detection and representation of multi-agent events in videos [J]. Artificial Intelligence, 2007, 171 (8-9): 586-605.
[4] Ruocco M, Ramampiaro H. A scalable algorithm for extraction and clustering of event-related pictures [J]. Multimedia Tools and Applications, 2014, (70): 55-88.
[5] Jiang Y, She Q Q, Li M, et al. A transductive multilabel text categorization approach [J]. Journal of Computer Research and Development, 2008, 45 (11): 1817-1822.