Ph. D. Study Plan
Sune Keller
IT University, Copenhagen
Last Revision: March 2006
Project: PDE-based Video Processing
Starting date: 01-09-2004
Thesis submission date: 31-08-2007
Student
Sune Høgild Keller, 140275-2331
ITU, Department of Innovation, Image Group
Rued Langgaardsvej 7, DK-2300 Kbh. S.
Phone: +45 7218 5093
E-mail: [email protected]
Supervisors
Professor Mads Nielsen
ITU, Department of Innovation, Image Group
Rued Langgaardsvej 7, DK-2300 Kbh. S.
Phone: +45 7218 5075
E-mail: [email protected]
Ass. Professor François Bernard Lauze
ITU, Department of Innovation, Image Group
Rued Langgaardsvej 7, DK-2300 Kbh. S.
Phone: +45 7218 5070
E-mail: [email protected]
Changes April 2006
The following sections have been updated:
• Study Visit Abroad – I will most likely not do a longer stay.
• Educational Requirements – I have taken another course.
• Teaching and Student Project Supervision – I have taught and supervised more and begun registering other duty work.
• Papers, Reports, Presentations and Conferences – Talks – more activities.
• Leave – I might still go on leave to do industrial development of my project, but nothing is certain yet and thus it is left out of the plan.
Changes October 2005
The following sections have been updated:
• Educational Requirements – I have taken another course.
• Teaching and Student Project Supervision – I have taught and supervised more.
• Papers, Reports, Presentations and Conferences – more activities.
• Leave – a new section.
• Time Schedule – leave and switch of two milestones.
Project Description
Objective
Advanced model-based schemes for digital image and image sequence restoration, especially inpainting, have been developed over the last 5-10 years. Today, rather simple schemes are used for digital video processing in broadcasting chains. The objective of this project is to apply and further develop methods used for restoration and inpainting to the field of video processing, focusing on video format conversion involving enhancement of spatial and temporal resolution, work already begun in my master’s thesis [1].
Background
Digital technology has invaded the television and video media: the DVD has taken over from VHS, digital cameras and video input cards have made it possible to watch and edit video on PCs, and CRT displays are being replaced by modern, larger plasma and LCD displays and projectors. Conventional television sets, PC screens, flat panel displays, projectors and digital cameras have different display formats. They differ in screen height and width in pixels (spatial resolution), in the number of frames displayed each second (temporal resolution) and in the manner in which each frame is scanned. This gives rise to a number of different video formats, and conversion between the formats is needed, especially up-conversions that require enhancement in the form of more pixels of information than the original video sequence contains.
Schemes used for up-conversion in video processing today are often developed ad hoc and heuristically to fit a certain low-cost hardware platform, and the focus seems to be on solving the practical problem at hand without any thought for the underlying theory ([2], [3]).
Schemes developed for inpainting of regions of missing image (sequence) data take their starting point in an attempt to model images and image sequences as physical entities, taking into account that an image sequence is a projection of the real, physical world. By translating the causality, ordering and coherence of the physical world onto recordings of the world using mathematical models of variation, high quality digital inpainting can be done ([4]). This theoretical framework and its modelling are valid for video processing in general, and my master’s thesis shows that it can be applied with success to the video format conversion called deinterlacing. The master’s thesis was the first attempt to convert an inpainting scheme into a video format conversion scheme, and this Ph. D. project will continue that work.
Why has nobody else tried this before? Because inpainting is a rather novel discipline in image sequence processing and the schemes are computationally heavy. It has therefore never been introduced to the world of video processing, a world mainly rooted in electrical engineering, while inpainting has largely been researched in the worlds of mathematics and computer science.
compensated (MC) schemes giving substantial improvements over non-motion compensated schemes ([3], [5] and [6]). In video processing, very simple methods are used for ME and the subsequent MC schemes, whereas ME and MC in inpainting integrate the flow in the aforementioned advanced theoretical framework ([7], [8]).
Theoretical Framework
The theoretical framework uses Bayes’ inference as its first step. In Bayes’ formulation two probability terms are formulated, the likelihood and the prior. The likelihood term mathematically states the data already in existence: a set of pixels you want to keep. The prior term is a mathematical model of what you think the result is supposed to be: how, based on the known data, you can fill in the blanks between the kept original pixels. It is typically given as a variation that tries to model the causality, order and coherence in video recordings of the world, e.g. that changes over time are due to motion and lighting. Variations like Total Variation (TV) are such (weak) priors. Even though TV looks like a simple model mathematically, it is very complex and one of the most advanced priors currently applicable to image sequences.
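As a sketch of the two terms described above, and assuming the usual notation in which u_0 is the observed (kept) data and u the unknown full image sequence (the symbols u, u_0, \lambda and \Omega are illustrative and do not appear in this plan), Bayes’ rule with a TV prior could be written as:

```latex
p(u \mid u_0) = \frac{p(u_0 \mid u)\, p(u)}{p(u_0)},
\qquad
p(u) \propto \exp\!\Big( -\lambda \int_\Omega |\nabla u| \, dx \Big)
```

Here p(u_0 | u) is the likelihood, p(u) the prior, and the integral is the total variation of u over the domain \Omega. Writing the prior in this exponential (Gibbs) form is a common modelling assumption, not something this plan specifies.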
The posterior resulting from the likelihood and the prior in Bayes’ rule is then to be maximized (MAP: maximum a posteriori) to get a result as close to the real/correct solution as possible given your mathematical models. To do so, you reformulate the problem as an energy functional of the image sequence, where the task is to minimize the energy to get the optimal solution. The minimization is in turn rewritten as a set of Partial Differential Equations (PDEs), the Euler-Lagrange equations of the functional, to be solved. In a few cases the optimal solution can be found directly, but in most cases iterative methods are needed to get as close to the optimal solution as possible.
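A minimal sketch of the iterative route, assuming a single grayscale frame in which only some rows are known (as in deinterlacing) and a smoothed TV energy. The function names, the step size, the smoothing constant eps and the crude mean initialisation are all illustrative assumptions, not the scheme developed in the master’s thesis:

```python
import numpy as np

def tv_energy(u, eps=1.0):
    """Smoothed total variation: sum over pixels of sqrt(ux^2 + uy^2 + eps^2)."""
    ux = np.diff(u, axis=1, append=u[:, -1:])  # forward difference, zero in last column
    uy = np.diff(u, axis=0, append=u[-1:, :])  # forward difference, zero in last row
    return float(np.sum(np.sqrt(ux**2 + uy**2 + eps**2)))

def tv_fill_rows(frame, known_rows, n_iter=300, step=0.1, eps=1.0):
    """Fill the unknown rows of a frame by gradient descent on tv_energy.

    Known rows act as the likelihood (they are never changed); the TV
    energy acts as the prior on the rows that are filled in.
    """
    u = np.asarray(frame, dtype=float).copy()
    known = np.zeros(u.shape[0], dtype=bool)
    known[known_rows] = True
    u[~known, :] = u[known, :].mean()  # crude initialisation (assumption)
    for _ in range(n_iter):
        ux = np.diff(u, axis=1, append=u[:, -1:])
        uy = np.diff(u, axis=0, append=u[-1:, :])
        mag = np.sqrt(ux**2 + uy**2 + eps**2)
        px, py = ux / mag, uy / mag
        # Discrete divergence (negative adjoint of the forward differences);
        # the energy gradient is -div, so descent adds step * div.
        div = np.diff(px, axis=1, prepend=0.0) + np.diff(py, axis=0, prepend=0.0)
        u[~known, :] += step * div[~known, :]
    return u

# Usage: a vertical edge where only the even rows are known.
frame = np.zeros((9, 8))
frame[:, 4:] = 100.0
even_rows = np.arange(0, 9, 2)
filled = tv_fill_rows(frame, even_rows)
```

With eps = 1 the step size of 0.1 stays below the stability limit of this explicit scheme, so the energy decreases at every iteration, but convergence is slow where gradients are large, which is one reason the choice of iterative method matters.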
Research Questions
First, the key issues that need to be addressed. The resolution enhancements to be researched are:
• Deinterlacing – the conversion of interlaced scan video to progressive scan video by creating the missing lines – was the subject of my master’s thesis. A motion adaptive (MA) total variation PDE-based deinterlacer has been developed, implemented and tested. Using ME and MC instead of MA is expected to improve PDE-based deinterlacing significantly. So the question is: Will PDE-based MC deinterlacing work, and how much better than known methods will it be? Deinterlacing can be seen as an enhancement either spatially or temporally (see [1] or [3] for details) and is therefore closely related to the next two enhancements.
• Super Resolution (SR) is enhancement of the resolution in the 2D spatial dimensions only. Deinterlacing gives a doubling of the pixel density in a given image sequence, but the increase in SR can be less (e.g. PAL 576x720 to XGA 768x1024 pixels), the same or more (e.g. 576x1024 to 1600x2000), and the question is: How much can you increase the resolution and still get a high image quality? This also depends on whether the goal is to make stills, to increase a TV input to the resolution of an LCD screen, or something else. So will PDE-based (MC) SR work, and how good will it be in a given setting? This can be decided by testing and comparing to the outputs of known SR methods.
how many new frames can be inserted in a sequence without loss of quality and how does it compare to known methods?
These three enhancements can also be combined for certain uses, e.g. if an interlaced PAL signal is to be shown on a 768x1024 progressive scan plasma screen, then deinterlacing followed by SR is needed.
Besides these three resolution enhancements, other PDE-based applications can be investigated to the extent time permits:
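The size of the enhancements mentioned above can be checked with a little arithmetic. The sketch below uses a hypothetical helper, pixel_ratio, takes formats as rows x columns in the order they are written above, and treats a PAL field as 288 lines, which is an assumption derived from the 576-line frame:

```python
def pixel_ratio(src, dst):
    """Return (row factor, column factor, pixel-count ratio) for two (rows, cols) formats."""
    (src_rows, src_cols), (dst_rows, dst_cols) = src, dst
    return (dst_rows / src_rows,
            dst_cols / src_cols,
            (dst_rows * dst_cols) / (src_rows * src_cols))

# Deinterlacing: one field (assumed 288 lines) to a full 576-line frame.
deinterlace = pixel_ratio((288, 720), (576, 720))
# The two SR examples from the text.
pal_to_xga = pixel_ratio((576, 720), (768, 1024))
large_sr = pixel_ratio((576, 1024), (1600, 2000))

for name, (rows, cols, total) in [("deinterlacing", deinterlace),
                                  ("PAL to XGA", pal_to_xga),
                                  ("576x1024 to 1600x2000", large_sr)]:
    print(f"{name}: rows x{rows:.2f}, cols x{cols:.2f}, pixels x{total:.2f}")
```

Under these assumptions PAL to XGA increases the pixel count by a factor of about 1.9, slightly less than the doubling from deinterlacing, while 576x1024 to 1600x2000 is an increase of about 5.4 times.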
• Given two camera positions, e.g. for a football match, any camera angle in between the two can be generated by choice of the viewer.
• A scene is described by a high resolution 2D photographic image and a low resolution 3D depth map acquired by a laser scan. Combining the information from these two can give a high resolution 3D image of the scene. This can be applied to other multimodalities as well, e.g. in medical imaging to transfer information from a high resolution MR scan to a low resolution PET scan to get SR PET.
Other interesting areas of application can most likely be found. The key issue for these others including the two given just above is whether the problem can be described and solved using the theoretical framework outlined in this study plan.
Some additional issues and questions that are highly likely to be addressed as a part of the development of the resolution enhancement schemes are:
• Statistical image sequence analysis to detect whether a sequence is progressive or interlaced, in order to choose whether to deinterlace or not.
• Motion/optical flow: improvement and optimization of ME for the PDE-based MC schemes.
• As most ME methods are optimized for progressive image sequences incl. PDE-based ME, special care has to be taken when redesigning for use on interlaced image sequences.
• Total Variation is a rather primitive model of images and image sequences. Can the use of other distributions/variations than TV improve the schemes? Can these other priors then give solvable PDEs that improve the results?
• Can the schemes be improved by better numerical implementations?
• Can the schemes be improved by better data initialization?
• Collecting a set of image sequences making up a good general representation of video material to give realistic testing.
• Finding and using test sequences used by others for easy comparison of results.
• Can the commonly used gradient descent solution for PDEs be replaced by other iterative methods to improve the quality of the results and/or reduce the computational complexity?
• Can iterative methods be improved?
• Can iterative solutions be replaced by direct solutions?
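To illustrate the last questions, here is a minimal 1D sketch in which a direct solution does exist. It swaps the TV prior for a simple quadratic smoothness prior (an assumption made only for this example), because the resulting equations are then linear and can be solved in one step, and it compares that direct solution with plain gradient descent. The function names and parameters are hypothetical:

```python
import numpy as np

def smoothness_matrix(n):
    """Return A = D^T D, where D is the (n-1) x n forward-difference matrix."""
    D = np.diff(np.eye(n), axis=0)
    return D.T @ D

def fill_direct(signal, known):
    """Minimise sum_i (u[i+1] - u[i])^2 with known samples fixed, by one linear solve."""
    u = np.asarray(signal, dtype=float).copy()
    A = smoothness_matrix(len(u))
    unk = ~known
    # Stationarity on the unknowns: A[unk, unk] u_unk = -A[unk, known] u_known
    rhs = -A[np.ix_(unk, known)] @ u[known]
    u[unk] = np.linalg.solve(A[np.ix_(unk, unk)], rhs)
    return u

def fill_gradient_descent(signal, known, n_iter=2000, step=0.2):
    """Minimise the same energy iteratively, updating only the unknown samples."""
    u = np.asarray(signal, dtype=float).copy()
    A = smoothness_matrix(len(u))
    unk = ~known
    u[unk] = u[known].mean()  # crude initialisation (assumption)
    for _ in range(n_iter):
        grad = 2.0 * (A @ u)  # gradient of the quadratic energy
        u[unk] -= step * grad[unk]
    return u

# Usage: two known end points, five unknown samples in between.
signal = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 6.0])
known = np.array([True, False, False, False, False, False, True])
direct = fill_direct(signal, known)
iterative = fill_gradient_descent(signal, known)
```

Both routes give the same linear ramp between the known end points in this toy case, but the direct solve needs no iterations. For TV and other non-quadratic priors the equations are nonlinear, so this shortcut does not carry over as it stands.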
Plan from the beginning
The overall plan to understand and develop PDE based video processing is given here. It is a less structured parallel to the Time Schedule that follows later.
Phase 1:
Get a full and in-depth understanding of the theoretical framework outlined in this study plan, of the MC inpainting work done by Francois Lauze and Mads Nielsen ([7]) and of work on PDE-based motion estimation ([7], [8] and others).
Attain broader knowledge of image sequence resolution enhancement and motion through literature studies. Identification of research questions: By attaining further knowledge on the subject of PDE-based video processing I may well need to refine, add to and redefine the research questions given in this study plan to optimize the outcome of my work.
First MC PDE based scheme: A method for frame rate doubling.
Search for industrial/business partner, trying to make first contact based on co-work with CCBR (CCBR being a business partner but not yet in the field of tv/video/film).
Phase 2:
Develop, implement and test PDE-based MC deinterlacing, spatial super resolution and super slow/super resolution in time.
Do combinations of the three for specific task(s).
Phase 3:
Find other areas of application. Define the problems in the framework of Bayes’ inference and solution by PDEs, then develop, implement and test solutions.
Phase 4: Visit abroad.
Phase 5: Write thesis.
Study Visit(s) Abroad
The intention of one long and/or several smaller study visit(s) abroad, incl. conferences, workshops etc., is to stay in an external research environment and get other angles on one’s work. In the best case scenario it will also result in international research collaboration.
I will (pre)visit the Mathematical Image Analysis Group of Professor Joachim Weickert at the University of Saarland, Germany, for a few weeks in the spring or early summer of 2006 and also plan to go there for 1-2 months in the fall of 2006.
In January 2005 I visited the image group at the University of North Carolina (UNC) in Chapel Hill, my main host being Prof. Stephen Pizer. In a very packed two-day program, I learned a great deal about medical image analysis, segmentation and advanced video processing and analysis.
Educational Requirements
As a Ph. D. student I am required to obtain 30 ECTS by attending courses, summer schools, conferences, workshops etc. I am already in the process of fulfilling the requirement by having completed the following Ph. D. courses:
Foundations of Image Analysis, Ph.D. course at ITU held by Ole Fogh Olsen, Mads Nielsen, Kim Steenstrup Pedersen and Francois Lauze, fall 2004, 7.5 ECTS.
Pattern Recognition, Ph.D. study group at ITU headed by Marleen de Bruijne, fall 2004, 7.5 ECTS.
Statistical Models of Images, Ph.D. course at ITU held by Kim S. Pedersen and Martin Lilholm, spring 2005, 2.5 ECTS.
Non-Linear Shape Modelling, Ph.D. course at ITU hosted by Ole Fogh Olsen, lectured by Xavier Pennec, Sarang Joshi and Mads Nielsen, fall 2005, 4.5 ECTS.
Ongoing: Image Canon, Ph.D. seminar/study group at ITU organized by Ole Fogh Olsen, sessions headed by different members of the Image Group at ITU, 2006, 7.5 ECTS.
Total 22 ECTS so far (29.5 with ongoing).
I intended to take the pedagogical course for Ph.D. students at ITU but did not have the time in August 2005 when it was offered. Now I have almost no teaching left and don’t expect to take the course as earlier planned.
Independent Studies
Besides taking courses, an important part of a Ph. D. study is to follow up on the development within your area of research by conference attendance, reading papers and other literature. Also some of the background knowledge needed for my research might not be covered in available courses and must therefore be obtained by reading relevant literature. The references [2], [3] and [5] – [8] are examples of this.
Presentational Requirements
According to my contract I am to do a certain amount of duty work at ITU. 560 hours are to be spent teaching. Another 280 hours are to be spent on other non-administrative presentational work at ITU, possibly to relieve the scientific staff. So far I have been assigned to the committee that is to get the library at ITU up and running at full scale, but I don’t expect to spend much time on this, if the committee ever gets to do any work at all. The use of the remainder of the 280 hours will be decided by Mads Nielsen, a possible use being the MICCAI 2006 Conference, which is hosted by the Image Analysis Group (a bit of work has already been done). So far I have also spent time attending group meetings and being a member of the PhD study board as well as being the webmaster of the home page of the Image Group.
Teaching and Supervision of Student Projects
To meet the requirement of 560 hours of teaching I have taught and supervised:
• The exercise part of the ITU course Image Analysis in the fall semester of 2004. This amounts to 93 hours of teaching.
• In the spring 2005, 2.5 lectures, most of the exercises and a lot of the organization and planning of the course Introduction to Multimedia System (IM). 55% of the (1.152 x) 252 hours, in total 160 hours.
• In the fall 2005, 2.5 lectures, most of the exercises and a lot of the organization and planning of the course Multimedia Programming (MMMP). 55% of the (1.152 x) 252 hours, in total 160 hours.
• In the spring 2006, 3.5 lectures, 4/11 of the exercises and a lot of the organization and planning of the course Multimedia Programming (MMMP). 33% of the 270 hours, in total 90 hours planned (most of the hours already done by March 2006).
• In the fall semester of 2004 I supervised a Basic Programming student in the project Ray Tracer. 5 hours.
• The IM project Air Hockey in the spring 2005, 9 hours.
• The IM 2005 summer project Chat room, no hours, case of cheating.
• The project: Netgæt Live, 16-weeks, three students, 50%, 15 hours planned.
In total 427 hours so far (532 with planned hours).
I am thus very close to having all my teaching done. (I have 28 hours left for project supervision or a couple of ‘guest lectures’ at a course.)
Other Duty Work
Department and group meeting:
2005: 30 hours (PhD study board max, I have spent close to 50 hours). 2006 until April 1st: 7.5 hours
PhD study board meetings
2005: 34 hours (12 meetings) + other activities. 2006 until April 1st: 6 hours (3 meetings)
Image Group webmaster 2006 until April 1st: 9 hours.
15/3 2006: 3 hours.
Total: 91.5 hours
Papers, Reports, Presentations and Conferences
It is difficult to plan ahead which parts of your work will result in papers, but so far Francois Lauze, Mads Nielsen and I have filed patent applications covering the work done in my master’s thesis as well as its extension into motion compensated methods for deinterlacing and resolution enhancement in time (SS) and space (SR), which will also cover most of the work planned for this Ph. D. project. I am credited with 30% of the invention.
Further on, I plan to write articles on my work and discoveries whenever possible, and I will strive to attend as many relevant workshops and conferences as possible and present my work. Regarding conferences, I have attended the IEEE WACV, Motion and PETS 2005 Workshops held in Jan. 2005, the 2005 Scale Space Conference in April 2005 and IEEE MMSP in Oct/Nov 2005. Other conferences that could be relevant to visit are ECCV, ICCV, ICIP, EMMCVPR and others.
Publications
• September 2004: The Patent Application A Method of Adding Information to a Frame or a Field in a Video Sequence has been filed. Inventors: S. Keller, F. Lauze and M. Nielsen.
• April 2005: Sune Keller, Francois Lauze and Mads Nielsen: A Total Variation Motion Adaptive Deinterlacing Scheme, In: Proceedings of Scale Space 2005, 2005.
• October 2005: Sune Keller, Kim S. Pedersen and Francois Lauze: Detecting Interlaced or Progressive Source of Video, In: Proceedings of MMSP 2005.
• September 2005: The Patent Application Method of and Apparatus for Forming a Final Image Sequence has been filed (both US and PCT). Inventors: S. Keller, F. Lauze and M. Nielsen.
• Planned: Sune Keller and François Lauze: Variational Deinterlacing, Technical Report, ITU, 2006, final revision by co-author François Lauze is the only thing missing.
• Planned: Sune Keller, François Lauze and Mads Nielsen: Variational Motion Compensated Deinterlacing, Paper submitted for the 2006 ECCV workshop Statistical Methods in Multi-Image and Video Processing.
Talks
• Presentation of this study plan at the biweekly Foundations of Image Analysis Meeting, ITU, 30/11 2004.
• Presentation of work on a ‘Progressive or Interlaced Video Input Detector’ at the biweekly Foundations of Image Analysis Meeting, ITU, 1/2 2005.
• Presentation of work on Detecting Interlaced or Progressive Source of Video at the Ph.D. course Statistical Models of Images, ITU, May 2005.
• 4 presentations of our work on deinterlacing to get funding for industrial use of our patent. Presentations given to board members of CCBR A/S Claus Christiansen and Hervé Gisault and/or other representatives of CCBR. 1/3, 8/6, 17/6 and 9/8 2005.
• Presentation of initial work on Motion Compensated Deinterlacing (MCDI) at the biweekly Foundations of Image Analysis Meeting, ITU, 25/10 2005.
• November 2005: Oral presentation of the paper Detecting Interlaced or Progressive Source of Video at the 2005 IEEE Multimedia Signal Processing Workshop in Shanghai, China.
• December 14th 2005, DIKU: Oral presentation of my work on deinterlacing and planned work on super resolution and super time resolution to members of the Image Group at DIKU and researchers from Philips Research (Display technology).
• January 26th 2006, ITU: Oral presentation of my work on deinterlacing and planned work on super resolution and super time resolution and results of our deinterlacer to Torben Dalgaard, Manager, Technology & Innovation, Picture, Bang & Olufsen a/s.
Poster Presentations
• April 2005: Presentation at the 2005 Scale Space and PDE Methods in Computer Vision Conference of my paper: A Total Variation Motion Adaptive Deinterlacing Scheme.
Leave
Through the above-mentioned presentations to CCBR A/S, we were to be given funding to start a company developing the methods for motion compensated deinterlacing and super resolution and producing results. This has been postponed, as the investor CCBR A/S possibly wants B&O as a co-investor and negotiations are ongoing.
Time Schedule
Milestones
1/9 2004: Day 1.
By 1/4 2005:
• Patent application filed. ✓
• Completed Foundations of Image Analysis and Pattern Recognition. ✓
• Attended IEEE WACV, Motion & PETS ’05 workshop. Visited UNC. ✓
• Paper on master’s thesis work accepted at Scale Space 2005. ✓
• Work on an ‘interlaced or progressive’ input detector: initial testing completed and documented. ✓
By 1/10 2005:
• Go to the Scale Space 2005 Conference and do presentation of paper. ✓
• Gain insight into MC inpainting, motion and motion estimation. ✓
• Developing, implementing, testing and documenting a PDE-based motion compensated frame doubler. Working on super slow, that is redesigning the frame doubler to add more than one frame between each pair of existing frames. / Developing and implementing a PDE-based motion compensated deinterlacer. ✓
By 1/4 2006:
• Testing and documenting the PDE-based motion compensated deinterlacer. ✓
• Write technical report on deinterlacing and paper on motion compensated deinterlacing. ✓
• Start developing and implementing a PDE-based motion compensated super resolution 2x2 scheme. ✓
• ECTS: Taking the course Non-Linear Shape Modelling at ITU. ✓
• Do extensive presentation of work to industry to help get our technology used ‘for real’ and actually make our patenting worthwhile. ✓
• Oral presentation at IEEE MMSP 2005 of my paper: Detecting Interlaced or Progressive Source of Video. ✓
By 1/10 2006:
• Testing and documenting a PDE-based motion compensated super resolution 2x2 scheme.
• Continue developing and implementing PDE-based motion compensated super resolution schemes.
• Document and test work on PDE-based motion compensated super resolution. Further develop the 2x2 super resolution scheme to other magnification factors and possibly to point spread functions other than square. Implement, test and document.
• Writing paper on motion compensated super resolution, most likely for the EURASIP Journal on Applied Signal Processing, Special Issue on Super-Resolution Enhancement of Digital Video.
• ECTS: Follow the Image Canon study group to get final ECTS (I might squeeze in another course).
• Teaching hours: Do teaching to get final 28 teaching hours done.
• Other duty work: Continue duty work (webmaster, study board, MICCAI conference etc.).
• Pre-visit at the Mathematical Image Analysis Group, University of Saarland, Germany.
• Present paper on motion compensated deinterlacing if accepted at conference.
By 1/4 2007:
• Developing, implementing, testing and documenting a PDE-based motion compensated frame doubler. Working on super slow, that is redesigning the frame doubler to add more than one frame between each pair of existing frames.
• Do combination(s) of the three for specific tasks.
• Other video/image processing problems that can be solved using PDE-based methods and the theoretical framework given in the section Background of this study plan.
• Shorter study visit abroad. Start writing dissertation.
• Other duty work: Continue duty work to get final hours (webmaster, study board etc.).
By 31/8 2007:
• Possible improvements of PDE-based MC deinterlacing, super resolution and super resolution in time.
• Write dissertation.
Signatures
_____________________________ _______________________________
Sune Høgild Keller Mads Nielsen
_____________________________ François Bernard Lauze
References
[1] Keller, Sune Høgild: PDE-based deinterlacing, Master Thesis, ITU, June 2004.
[2] Wang, Yao; Ostermann, Jörn; Zhang, Ya-Qin: Video Processing and Communications. Upper Saddle
River, NJ, Prentice Hall, 2002. ISBN: 0-13-017547-1.
[3] Bellers, E. B.; De Haan, G.: Deinterlacing: A Key Technology for Scan Rate Conversion, Elsevier,
Amsterdam, 2000, ISBN: 0-444-50594-6.
[4] Nielsen, Mads: Bayes inference and regularization, 2002, http://www.itu.dk/courses/DFB/E2003/Lectures/filmNote.pdf.
[5] Biswas, Mainak; Nguyen, Truong: A novel de-interlacing technique based on phase correlation motion
estimation, International Symposium on Circuits and Systems, ISCAS, 2003.
[6] Thomas, G. A.: A Comparison of Motion-Compensated Interlace-to-Progressive Conversion Methods.
In: Signal Process.: Image Commun., vol. 12, no. 3, 1998.
[7] Lauze, Francois; Nielsen, Mads: A Variational Algorithm For Motion Compensated Inpainting, in:
Proceedings of BMVC (British Machine Vision Conference), 2004.
[8] Brox, Thomas; Bruhn, Andrés; Papenberg, Nils; Weickert, Joachim: High Accuracy Optical Flow