
In future work, we plan to improve the motion compensation scheme used in Chapter 3 with a novel RANSAC-based motion model. Our goal is to replace the binary mask that separates human from camera motion with a more sophisticated one that segments the background into several “super-planes” and produces a separate motion model for each one. These larger areas are approximately planar, and each is related across frames by a projective transformation (expressed by a homography matrix). The technique will apply superpixel segmentation (i.e. SLICO) to colour images in order to group pixels with similar colour cues, and will then merge the resulting superpixels into larger areas according to the motion flow they follow. Accurate motion flow can be obtained either by tracking densely sampled interest points or by matching SIFT descriptors, followed by RANSAC homography estimation for outlier rejection.

A motion saliency algorithm is currently being designed in order to index the objects that move within the video scene. Several Bag-of-Visual-Words (BoVW) representations, one for each salient region, can then be used to encode action descriptors in a more sophisticated way and to address the lack of geometric information in BoVW encoding schemes.

Finally, we plan to use depth maps from Kinect cameras to further reduce the computational cost of ADL detection and recognition, so that real-time processing can be achieved. In this scenario, we will segment the human subject from the background using depth map analysis and create regions of interest on the body parts using superpixel segmentation. By providing a skeleton for each visible body part, we plan to use histograms of depth gradients in conjunction with skeleton correspondences to describe the desired activity.

Original publications and references

This thesis has been partially published in the following papers:

• Journals

1. K. Avgerinakis, A. Briassouli, Y. Kompatsiaris, “Optimal trajectories and Motion Boundary Activity Areas for Activities of Daily Living (ADL) recognition”, Journal of Ambient Intelligence and Smart Environments (JAISE), accepted, 2013.

2. K. Avgerinakis, A. Briassouli, Y. Kompatsiaris, “Activity detection using Sequential Statistical Boundary Detection (SSBD)”, Computer Vision and Image Understanding (CVIU), submitted, 2014.

• Books

1. K. Avgerinakis, A. Briassouli, I. Kompatsiaris, “Activity Detection and Recognition of Daily Living Events”, book chapter in the Springer book Comprehensive Health Monitoring and Personalized Feedback Using Multimedia Data.

• Conferences

1. K. Avgerinakis, K. Adam, A. Briassouli, Y. Kompatsiaris, “Moving camera human activity localization and recognition with motionplanes and multiple homographies”, submitted to ICIP 2015

2. S. Poularakis, K. Avgerinakis, A. Briassouli, Y. Kompatsiaris, “Computationally efficient recognition of activities of daily living”, submitted to ICIP 2015


3. A. Moumtzidou, K. Avgerinakis, E. Apostolidis, F. Markatopoulou, K. Apostolidis, T. Mironidis, S. Vrochidis, V. Mezaris, Y. Kompatsiaris, I. Patras, “VERGE: A Multimodal Interactive Video Search Engine”, Proc. 21st Int. Conf. on MultiMedia Modeling (MMM15), Sydney, Australia, Jan. 2015.

4. N. Gkalelis, F. Markatopoulou, A. Moumtzidou, D. Galanopoulos, K. Avgerinakis, N. Pittaras, S. Vrochidis, V. Mezaris, I. Kompatsiaris, I. Patras, “ITI-CERTH participation to TRECVID 2014”, Proc. TRECVID 2014 Workshop, Orlando, FL, USA, November 2014.

5. A. Moumtzidou, K. Avgerinakis, E. Apostolidis, V. Aleksic, F. Markatopoulou, C. Papagiannopoulou, S. Vrochidis, V. Mezaris, R. Busch, I. Kompatsiaris, “VERGE: An Interactive Search Engine for Browsing Video”, Video Browser Showdown (VBS) 2014, Dublin, Ireland, January 2014.

6. F. Markatopoulou, A. Moumtzidou, C. Tzelepis, K. Avgerinakis, N. Gkalelis, S. Vrochidis, V. Mezaris, and I. Kompatsiaris, “ITI-CERTH participation to TRECVID 2013,” in TRECVID 2013 Workshop, Gaithersburg, MD, USA, 2013.

7. K. Avgerinakis, A. Briassouli, I. Kompatsiaris, “Activity detection and recognition of daily living events”, 1st ACM MM Workshop on Multimedia Indexing and Information Retrieval for Healthcare (MIIRH), held in conjunction with ACM MM 2013, Barcelona, Spain, Oct. 2013.

8. K. Avgerinakis, A. Briassouli, I. Kompatsiaris, “Robust Monocular Recognition of Activities of Daily Living for Smart Homes”, to be presented at the 9th International Conference on Intelligent Environments (IE2013), Athens, Greece, July 18-19, 2013.

9. K. Avgerinakis, I. Kompatsiaris, “DemCare action dataset for evaluating dementia patients in a home-based environment”, to be presented at the Ambient TeleCare session of Innovation in Medicine and Healthcare (InMed), Athens, 17-19 July 2013.

10. K. Avgerinakis, A. Briassouli, I. Kompatsiaris, “Recognition of Activities of Daily Living”, 5th International Symposium on Monitoring & Surveillance Research (ISMSR): Healthcare-Safety-Security, Athens, Greece, Nov. 9 2012.

11. K. Avgerinakis, A. Briassouli, I. Kompatsiaris, “Smoke Detection Using Temporal HOGHOF Descriptors and Energy Colour Statistics from Video”, Firesense Workshop, 8-9 November 2012, Antalya, Turkey.

12. K. Avgerinakis, A. Briassouli, I. Kompatsiaris, “Video monitoring for activities of daily living recognition”, AAL Forum, Eindhoven, The Netherlands, Sep. 24-27 2012.

13. K. Avgerinakis, A. Briassouli, I. Kompatsiaris, “Real Time Motion Changes for New Event Detection and Recognition”, ACCV 2010 Workshop - 10th International Workshop on Visual Surveillance VS 2010, Queenstown, New Zealand, November 2010, pp. 1-10.

14. K. Avgerinakis, A. Briassouli, I. Kompatsiaris, “Real Time Illumination Invariant Motion Change Detection”, ACM Multimedia 2010 Workshop - 1st ACM ARTEMIS2010 International Workshop, Firenze, Italy, October 2010, pp. 75-80.

15. K. Avgerinakis, A. Briassouli, I. Kompatsiaris, “Video processing for judicial applications”, ict4justice, September 2009 publication, Skopje, FYROM.

Clustering algorithms for Visual Vocabulary

Local approaches to activity representation require quantizing the activity feature space and creating visual vocabularies (i.e. K-means cluster centers, GMM means) in order to build fixed-size action descriptors. Here we present the two most common clustering methods: K-means and GMM.
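To illustrate how a visual vocabulary turns a variable number of local features into a fixed-size descriptor, the following minimal sketch quantizes each feature to its nearest cluster center and counts the votes. It is an illustration under stated assumptions, not the encoding pipeline used in this thesis: the function name `bovw_histogram`, the hard (nearest-center) assignment, the L1 normalisation and the toy data are all hypothetical choices made for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def bovw_histogram(features, centers):
    """Hard-assign each local feature to its nearest center and count votes.

    Assumption: features and centers are sequences of equal-length tuples.
    Returns an L1-normalised histogram with one bin per cluster center.
    """
    hist = [0.0] * len(centers)
    for x in features:
        # Index of the closest visual word for this feature.
        nearest = min(range(len(centers)), key=lambda j: dist(x, centers[j]))
        hist[nearest] += 1.0
    total = sum(hist)
    # Normalise so that videos with different feature counts are comparable.
    return [h / total for h in hist] if total else hist


# Toy 2-D vocabulary with two visual words (hypothetical data).
centers = [(0.0, 0.0), (10.0, 10.0)]
features = [(0.5, -0.2), (9.0, 11.0), (10.5, 9.5), (0.1, 0.3)]
descriptor = bovw_histogram(features, centers)
```

Whatever the number of input features, the descriptor length equals the vocabulary size, which is what makes it usable as input to a standard classifier.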

B.1 K-means clustering

K-means is the simplest and most common technique for constructing visual vocabularies [49], [12], [21], [61], [20]. Its most popular implementation is based on least squares quantization [104], but it is computationally expensive. Although improved versions [105] have succeeded in reducing this cost, the K-means variant of [105] still requires storage space proportional to the square of the number of cluster centers, making it impractical for large numbers of cluster centers.
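As a rough illustration of the least squares quantization scheme, the sketch below alternates between assigning each point to its nearest center and recomputing each center as the mean of its members, which is the textbook form of the procedure. The function names, the handling of empty clusters (the old center is kept) and the stopping rule (assignments no longer change) are assumptions made for this example, not details taken from [104] or [105].

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centers):
    """Assignment step: index of the nearest center for every point.

    Minimising the distance is equivalent to minimising its square.
    """
    return [min(range(len(centers)), key=lambda j: dist(x, centers[j]))
            for x in points]


def update(points, labels, centers):
    """Update step: each center becomes the mean of its assigned points."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [x for x, lab in zip(points, labels) if lab == k]
        if not members:
            new_centers.append(old)  # assumption: keep an empty cluster's center
            continue
        new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(len(old))))
    return new_centers


def kmeans(points, init_centers, max_iter=100):
    """Alternate assignment and update until the labels stop changing."""
    centers = [tuple(c) for c in init_centers]
    labels = assign(points, centers)
    for _ in range(max_iter):
        new_centers = update(points, labels, centers)
        new_labels = assign(points, new_centers)
        converged = new_labels == labels
        centers, labels = new_centers, new_labels
        if converged:
            break
    return centers, labels


# Toy data: two well-separated groups in 2-D (hypothetical).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers, labels = kmeans(points, init_centers=[(0.0, 0.0), (10.0, 10.0)])
```

Each iteration of this naive version costs on the order of L·K distance evaluations for L points and K centers, which gives a sense of why the basic scheme becomes expensive for large vocabularies.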

Given a set of feature vectors $X = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_L\}$, where $\bar{x}_i \in \mathbb{R}^D$, $i \in \{1, 2, \ldots, L\}$, derived from the training set videos, K-means looks for the $K$ optimal cluster centers $CC = \{\bar{c}_1, \bar{c}_2, \ldots, \bar{c}_K\}$, $\bar{c}_l \in \mathbb{R}^D$, $l \in \{1, 2, \ldots, K\}$, that partition the feature space in the best possible way. For an initial set of $K$ cluster centers $\{\bar{c}^1_1, \bar{c}^1_2, \ldots, \bar{c}^1_K\}$ at iteration $t = 1$, the method of [104] alternates between assignment and update steps. In the assignment step, at iteration $t$, each observation $\bar{x}_i$, $i \in \{1, \ldots, L\}$, is assigned to its closest cluster center $\bar{c}^t_k$, $k \in \{1, \ldots, K\}$, which yields the lowest within-cluster sum of squares (WCSS):

$$k = \arg\min_{j \in \{1, 2, \ldots, K\}} \left\| \bar{x}_i - \bar{c}^t_j \right\|^2, \tag{B.1}$$

so the set $S^t_k$ of observations assigned to each cluster center $\bar{c}^t_k$ at iteration $t$ is: