Success Rate

Full text

(1)Modular Decomposition and Analysis of Registration based Trackers Abhineet Singh, Ankush Roy, Xi Zhang and Martin Jagersand.

(2) Registration based Tracking • Find the optimal warp or geometric transformation that registers each image in a sequence with the template. 1.

(3) Registration based Tracking ଴. ‫ܠ‬૚. ‫ܠ‬૛. ‫ ܠ‬૚૚ ‫ ܠ‬૚૛. ‫ܠ‬૜. ‫ܠ‬૝. ‫ ܠ‬૚૜ ‫ ܠ‬૚૝. ‫ܠ‬૞. ௧. ‫ܠ‬૟. ‫ ܠ‬૚૞ ‫ ܠ‬૚૟. ‫ܠ‬ૠ. ‫ܠ‬ૡ. ‫ ܠ‬૚ૠ ‫ ܠ‬૚ૡ. ‫ૢܠ‬. ‫ ܠ‬૚૙. ‫ ܠ‬૚ૢ ‫ ܠ‬૛૙ 2.

(4) Motivation • Learning/detection based trackers are not suitable for tasks requiring fast and high precision tracking – Visual Servoing – Virtual reality – SLAM. 3.

(5) MTF Usage Example – Multi Target Tracking. UAV Trajectory Estimation Online Image Mosaicing 4.

(6) Motivation • Progress in registration based tracking has become fragmented since Lucas Kanade[Lucas81] – myriad of contributions that are not well connected. • An intuitive way exists to relate these by decomposing the tracking task into three modules – most contributions are confined to only one or two of these modules. • Modular Tracking Framework (MTF)[Singh16] to easily plug in new methods B. Lucas, T. Kanade, “An iterative image registration technique with an application to stereo vision”, 1981 A. Singh, M. Jagersand, “Modular Tracking Framework: A Unified Approach to Registration based Tracking”, 2016, available at: http://webdocs.cs.ualberta.ca/~vis/mtf/. 5.

(7) 6.

(8) Search Method • Optimization method that finds the SSM parameters corresponding to the warped patch that maximizes the AM similarity function. • Two main categories: – Gradient descent • Newton or Gauss Newton method. – Stochastic Search • Sampling based 7.

(9) Search Method – Examples (Gradient Descent) • Variants of Lucas Kanade (LK)[Baker01] method – Forward Additive (FALK) • ‫ = ܜ‬૙ , ‫ ܜ‬, ‫ିܜ‬૚ + ‫ܜ‬ ઢ‫ܜܘ‬. – ‫ିܜܘ = ܜܘ‬૚ + ઢ‫ܜܘ‬. – Inverse Additive (IALK). • uses constant approximation of ‫ ܜ‬computed from ૙. – Forward Compositional (FCLK). • ‫ = ܜ‬૙ , ‫. ܜ‬, ‫ ܜ‬, ‫ିܜ‬૚ ઢ‫ܜܘ‬. – ‫ିܜܘ = ܜܘ‬૚ ∘ ઢ‫ܜܘ‬. – Inverse Compositional (ICLK) • ‫ = ܜ‬૙ , ‫ ܜ‬, ‫ ܜ‬, ‫ିܜ‬૚ ઢ‫ܜܘ‬. – ‫ିܜܘ = ܜܘ‬૚ ∘ ઢ‫ି ܜܘ‬૚. • Efficient Second Order Minimization (ESM)[Benhimane04] – combines FCLK and ICLK 8. S. Baker, I. Matthews, “Equivalence and Efficiency of Image Alignment Algorithms”, 2001 S. Benhimane, E. Malis, “Real-time image-based tracking of planes using efficient second-order minimization”, 2004.

(10) Search Method – Examples (Stochastic) • Nearest Neighbor Search (NN) [Dick13] – generate samples by warping ૙ – find the nearest neighbor to ‫ ( ܜ‬, ‫ିܜ‬૚ ) and update ‫ିܜ‬૚ with the inverse of the corresponding ‫ܜ‬ – combined with ICLK for stability (NNIC). • Particle Filter (PF) [Kwon14] – generate samples by warping ‫ ( ܜ‬, ‫ିܜ‬૚ ) – compute weight for each and estimate ‫ ܜ‬as weighted average of samples T. Dick et. al, “Realtime Registration-Based Tracking via Approximate Nearest Neighbor Search”, 2013 J. Kwon, H. S. Lee, F. C. Park, K. M. Lee, “A Geometric Particle Filter for Template-Based Visual Tracking”, 2014. 9.

(11) Appearance Model. • A similarity measure between two image patches: – candidate warped patch from the current image – template extracted from the initial image. • Two main categories: – SSD like – Robust[Richa12] 10. R. Richa, R. Sznitman, G. Hager, “Robust Similarity Measures for Gradient-based Direct Visual Tracking”, 2012.

(12) Appearance Model – Examples • Sum of Squared Differences (SSD)[Baker01] ଵ ଶ. – ૙ , ‫ = ܜ‬− ∥ ૙ −‫∥ ܜ‬ଶ. • Sum of Conditional Variance (SCV)[Richa11] ଵ ଶ. – ૙ , ‫ = ܜ‬− ∥ ‫| ܜ‬૙ −‫∥ ܜ‬ଶ – Using several joint distributions computed from corresponding sub regions of ‫ ܜ‬and ૙ gives a variant called LSCV[Richa14]. • Reversed Sum of Conditional Variance (RSCV)[Dick13] – ૙ , ‫= ܜ‬. ଵ − ଶ. ∥ ૙ − ૙ |‫∥ ܜ‬ଶ. • Zero mean Normalized Cross Correlation (ZNCC)[Ruthotto10] ଵ ଶ. – ૙ , ‫ = ܜ‬− ∥. ۷૙ ିஜబ ஢బ. −. ۷‫ି ܜ‬ஜ౪ ஢౪. ∥ଶ 11. R. Richa, R. Sznitman, R. Taylor, G. Hager, “Visual Tracking Using the Sum of Conditional Variance”, 2011 R. Richa, et. al, “Direct visual tracking under extreme illumination variations using the sum of conditional variance”, 2014 L. Ruthotto, “Mass-preserving registration of medical images”, 2010.

(13) Appearance Model – Examples (cont’d) • Mutual Information (MI)[Dame10] – ૙ , ‫∑ = ܜ‬௜௝ ࡵ࢚ ࡵ૙ ( , )

(14) . ࡼࡵ࢚ ࡵ૙ ௜,௝ ࡼࡵ࢚ ௜ ࡼࡵ૙ ௝. • Cross Cumulative Residual Entropy (CCRE)[Richa12] – ૙ , ‫∑ = ܜ‬௜௝ ∗ࡵ࢚ ࡵ૙ ( , )

(15) . ࡼ∗ࡵ࢚ ࡵ ௜,௝ ૙. ࡼ∗ࡵ࢚ ௜ ࡼࡵ૙ ௝. • Normalized Cross Correlation (NCC)[Scandaroli12] – ૙ , ‫= ܜ‬. ۷૙ ିஜబ ۷‫ି ܜ‬ஜ౪ . ஢బ ஢౪ 12. A. Dame, E. Marchand, “Accurate Real-time Tracking Using Mutual Information”, 2010 G. G. Scandaroli, M. Meilland, R. Richa, “Improving NCC-Based Direct Visual Tracking”, 2012.

(16) State Space Model. • A warping function or geometric transformation that represents the set of allowable image motions of the object – embodies any constraints placed on the warp parameter space • search efficiency • alignment precision. – includes • degrees of freedom (DOF) of allowed motion • actual parameterization of the warping function 13.

(17) State Space Model – Examples. • Translation : S 2

(18) –

(19) ,

(20) . • Isometry/Euclidean : 3. • Similitude/Similarity: 4 –

(21) ,

(22) cos

(23) sin

(24) sin

(25) cos . • Affine : 6 –

(26) ,

(27) cos

(28) sin –

(29) , 1

(30)

(31)

(32) sin

(33) cos 1

(34)

(35) 14.

(36) State Space Model – Examples (cont’d) • Homography : ܵ = 8. •. ଵା௣భ ௫ೖ ା௣మ ௬ೖ ା௣య ଵା௣ర ௬ೖ ା௣ఱ ௫ೖ ା௣ల ் – ‫ܠ ܟ‬௞ , ‫= ܘ‬ , ଵା௣ళ ௫ೖ ା௣ఴ ௬ೖ ାଵ ଵା௣ళ ௫ೖ ା௣ఴ ௬ೖ ାଵ SL3 Homography[Benhimane04] : ܵ = 8. ‫ݔ‬௞ – ‫ ܠ ܟ‬௞ , ‫∗ ࡳ = ܘ‬ො ‫ݕ‬ ௞. • = ∑଼௜ୀଵ ௜ ௜ ∈ (3) , ௜ ∶ (3) basis. • Corner Homography : ܵ = 8 ‫ݔ‬௞ – ‫ ܠ ܟ‬௞ , ‫∗ ࡳ = ܘ‬ො ‫ݕ‬ ௞ • =. argmin ∑ସ௜ୀଵ ࡹ. ௜௫ ௜௫ + ଶ௜ିଵ ∗ − + ௜௬ ௜௬ ଶ௜. ଶ. ௜௫ • ௜ = 1 ≤ ≤ 4 : bounding box corners ௜௬ ‫ݔ‬ ࡳ ∗ො ‫= ݕ‬. ௚బబ ௫ା௚బభ ௬ା௚బమ ௚భబ ௬ା௚భభ ௫ା௚భమ ். ,. ௚మబ ௫ା௚మభ ௬ା௚మమ ௚మబ ௫ା௚మభ ௬ା௚మమ. ݃଴଴ with ࡳ = ݃ଵ଴ ݃ଶ଴. ݃଴ଵ ݃ଵଵ ݃ଶଵ. ݃଴ଶ ݃ଵଶ ݃ଶଶ. 15.

(37) State Space Model – Examples (Demo). 16.

(38) System design. • Search Method (SM) – Finds the warp that maximizes the similarity measure 5.

(39) System Design. 18.

(40) Evaluation Benchmarks TMT. PAMI. UCSB. LinTrack. 19.

(41) MJ Note CRV16 Results • Two datasets – Tracking for Manipulation Tasks (TMT)[Roy15] • 109 sequences, 70592 frames. – Visual Tracking Dataset provided by UCSB[Gauglitz11] • 96 sequences, 6889 frames. • Alignment Error ஺௅ used to compare tracking result with ground truth – =. !. "#$%

(42) − &". • Success Rate (SR) plot used to measure tracking performance – x axis : error threshold ' ∈ 0, 20 – y axis : fraction of frames with () < ' A. Roy, X. Zhang, N. Wolleb, C. P. Quintero, M. Jagersand, “Tracking Benchmark and Evaluation for Manipulation Tasks ”, 2015 S. Gauglitz, T. Hollerer, M. Turk. “Evaluation of interest point detectors and feature descriptors for visual tracking”, 2011. 20.

(43) Note CRV 16 results. 21.

(44) Evaluation Methodology - Datasets • 4 large publicly available datasets with a total of over 100K frames – TMT – UCSB – LinTrack – PAMI. Without Subsequences With Subsequences Dataset Trackable SubTotal Trackable Total Sequences sequences Frames Frames Frames Frames TMT 109 70592 70483 1090 390470 389380 UCSB 96 6889 6793 960 41170 40210 LinTrack 3 12477 12474 30 68700 68670 PAMI 28 16511 16483 280 91400 91120 Total 236 106469 106233 2360 591740 589380. • Each sequence tested from 10 different starting points for an effective total of nearly 600K frames 22.

(45) Evaluation Methodology – Performance Metric • Alignment Error ஺௅ – =. !. "#$%

(46) − &". • Success Rate (SR) – – – –. x axis : error threshold ' ∈ 0, 20 y axis : fraction of frames with () < ' each sequence tracked from 10 different starting points measures both accuracy and robustness. • Failure Rate (FR) – reinitialize whenever () exceeds 20 – count the number of such failures – additional metric for tracking robustness. 23.

(47) Success Rate. Results: Learning vs. 2DOF Registration Based Trackers (Accuracy) 0.9 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 0. 5. 10. 0.9 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45 0.4 0.35 FCLK:10.348 0.3 ICLK: 9.820 ESM:10.017 0.25 NN: 9.135 0.2 CMT: 9.076 0.15 RCT: 5.978 0.1 TLD: 6.686 0.05 MIL: 7.975 0 15 20 0 Error Threshold. NNIC: 9.353 LMS:13.825 LMES:10.599 DSST:11.650 KCF: 9.359 Struck:10.697 GOTURN: 1.514. 5. 10. 15. Euclidean pixel registration error. 20 24.

(48) Results: Learning vs. 2DOF Registration Based Trackers (Accuracy – VOT 2016). 25.

(49) Results: Learning vs. 2DOF Registration Based Trackers (Speed) 1469.91. FCLK. 2261.73. ICLK 1262.25. ESM. 1406.49. NN NNIC. 748.31 167.32. LMS LMES GOTURN. 135.80 51.80. MIL 15.39 KCF. 56.52. FRG 25.33 TLD 19.92 Struck 39.40 RCT DSST CMT 0. 83.93 55.84 76.18. 500. 1000. 1500. 2000. 2500. Speed in FPS 26.

(50) MTF vs. ViSP Speeds (FPS) Average speedup:. . . 31.75 691.74 43.72. 1898.09. 41.85 1652.96 19.28. 308.17 32.44. 655.92. 38.97 546.53 13.31. 52.23 40.88 650.14 49.01 548.61. 1.52. 46.83 34.72. 650.05. 33.51 0. 10. 10. 1. 2. 10 Speed in FPS. 599.64 3. 10. ESM/SSD:17.89 ESM/ZNCC:18.72 FALK/MI:30.81 FALK/SSD:11.19 FALK/ZNCC:15.90 FCLK/MI:3.92 FCLK/SSD:14.02 FCLK/ZNCC:20.22 ICLK/MI:15.98 ICLK/SSD:39.50 ICLK/ZNCC:43.41 Average:21.79. 10. 4 27.

(51) Results –Learning Based Trackers 0.9. 0.9. 0.8. 0.8. Speeds in Frames per Second 18. PF1K. 679. Success Rate. NNIC 0.7. 0.7. 0.6. 0.6. 2625. NN1K. 2656. ICLK. 3103. 0.4. 0.4. 0.3. 0.3 KCF DSST 0.2 FCLK ESM ICLK 0.1 FALK NNIC. 0.2 0.1. 5 10 15 Error Threshold. 0 20 0. 2150. IALK. 0.5. 0.5. 0 0. NN10K. RCT TLD CMT IALK NN1K NN10K PF1K. 5 10 15 Error Threshold. FCLK. 904. FALK. 891. ESM. 805. CMT. 38. RCT. 37 10. TLD KCF. 90. DSST. 89. 20 10. 0. ZNCC with Translation. 10. 2. 10. 28. 4.

(52) Results –Learning Based Trackers (Demo). ZNCC with Translation. 29.

(53) Results – Search Methods 0.9. 0.8. Success Rate. 0.7. 0.8. 0.8. 0.6. 0.7. 0.7. 0.5. 0.6. 0.6. 0.4. 0.5. 0.5. 0.3 0. 5. 10. 15. 200.40. 0.9. 0.9. LSCV. 5. 10. 15. RSCV. 0.4 20 0. 0.9. MI. 0.8. 0.8 Success Rate. 0.9. ZNCC. SSD. 5. 10. 15. 20. NCC. 0.8. 0.7 0.7. 0.7 0.6. 0.6. 0.6. FCLK ICLK ESM FALK IALK NNIC. 0.5 0.5. 0.4 0. 0.5. 0.4. 5. 10. 15. 20. 0.3 0. 5. 10 15 Error Threshold. 0.4 20 0. 5. 10. 15. 30 20.

(54) 31.

(55) Results – Search Methods (Demo) • The four variants of Lucas Kanade fail at different times • Sequences from TMT. RSCV with Homography 32.

(56) Results – Search Methods (Demo) • The four variants of Lucas Kanade fail at different times • Sequence from UCSB. 33.

(57) Results – Search Methods (Demo) • NN has more jitter than LK type SMs – decreases with more samples. 34.

(58) Results – Search Methods (Demo) • NNIC is more robust to motion blur • Sequence from UCSB. 35.

(59) Results – Appearance Models 0.9. 0.9. 0.9. FCLK. ICLK 0.8. 0.8. 0.8. Success Rate. 0.7. 0.7. 0.7 0.6. 0.6. 0.6 0.5. 0.5. 0.4 0 0.9. Success Rate. ESM. 0.5. 0.4. 5. 10. 15. 0.3 20 0 0.9. FALK. 5. 10. 15. IALK. 0.4 20 0 0.9. 0.8. 0.8. 0.8. 0.7. 0.7. 0.7. 0.6. 0.6. 0.6. 0.5. 0.5. 0.5. 0.4. 0.4. 0.4. 0.3 0. 5. 10. 15. 0.3 20 0. 5. 10 15 Error Threshold. 0.3 20 0. 5. 10. 15. 20. NNIC. SSD ZNCC SCV RSCV LSCV MI NCC. 5. 10. 15. 36. 20.

(60) 37.

(61) Results – Appearance Models (Demo). FCLK with Homography 38.

(62) Results – Appearance Models (Demo). 39.

(63) Results – State Space Models 0.9. Success Rate. 0.8 0.7 0.6 Translation Isometry Similitude Affine SL3 Homography Corner Homography Homography. 0.5 0.4 0.3 0. 5. 10 Error Threshold. ESM with ZNCC. 15. 20. 40.

(64) Results – State Space Models (Demo). 41.

(65) Results – State Space Models (Demo). 42.

(66) Conclusions? Questions • Tested different combinations of sub modules leading to several interesting observations that were missing in the original papers. – used two large datasets with over 77,000 frames in all to ensure statistical significance.. • Compared robust similarity metrics with traditional SSD type measures. • Compared formulations against online learning based trackers to validate their usability for precise tracking • Provided an open source tracking framework called MTF using which all results can be reproduced. Conclusions. – can also address practical tracking requirements with its efficient C++ implementation MTF is available at: http://webdocs.cs.ualberta.ca/~vis/mtf/ along with all datasets and this presentation. 43.

(67) 44.

(68) References • S. Baker and I. Matthews, “Equivalence and Efficiency of Image Alignment Algorithms”, CVPR 2001 • S. Benhimane and E. Malis, “Real-time image-based tracking of planes using efficient second-order minimization ”, IROS 2004 • A. Dame and E. Marchand, “Accurate Real-time Tracking Using Mutual Information”, ISMAR 2010 • T. Dick, C. Perez, A. Shademan and M. Jagersand, “Realtime Registration-Based Tracking via Approximate Nearest Neighbor Search”, RSS 2013 • S. Gauglitz, T. Hollerer and M. Turk. “Evaluation of interest point detectors and feature descriptors for visual tracking”, IJCV 2011 • J. Kwon, H. S. Lee, F. C. Park, and K. M. Lee, “A Geometric Particle Filter for Template-Based Visual Tracking”, TPAMI 2014 • R. Richa, R. Sznitman, R. Taylor and G. Hager, “Visual Tracking Using the Sum of Conditional Variance”, IROS 2011 45.

(69) References (cont’d) • R. Richa, R. Sznitman and G. Hager, “Robust Similarity Measures for Gradient-based Direct Visual Tracking”, CIRL Technical Report 2012 • R. Richa, M. Souza, G. Scandaroli, E. Comunello and A. Wangenheim, “Direct visual tracking under extreme illumination variations using the sum of conditional variance”, ICIP 2014 • A. Roy, X. Zhang, N. Wolleb, C. P. Quintero and M. Jagersand, “Tracking Benchmark and Evaluation for Manipulation Tasks ”, ICRA 2015 • L. Ruthotto, “Mass-preserving registration of medical images”, Thesis 2010 • G. G. Scandaroli, M. Meilland and R. Richa, “Improving NCCBased Direct Visual Tracking”, ECCV 2012 • A. Singh and M. Jagersand, “Modular Tracking Framework: A Unified Approach to Registration based Tracking”, arXiv:1602.09130, 2016 46.

(70) Results –Learning Based Trackers (Demo). 47.

(71)

No results found