Albert-Ludwigs-Universität Freiburg
Robot Learning Proseminar/Seminar
SS 2021
Robot Learning Lab
Evaluation
▪ Abstract → At most 2 pages
▪ Presentation → At most 20 minutes
▪ Summary → At most 7 pages, excluding bibliography and figures
▪ Final grade → Abstract, Presentation, Summary, Seminar participation
Evaluation              Bachelor’s Due Date    Master’s Due Date
Paper Abstract          18/06/2021             18/06/2021
Seminar Presentation    23/07/2021             16/07/2021
Paper Summary           30/07/2021             30/07/2021
Bachelor’s Seminar: https://rl.uni-freiburg.de/teaching/ss21/robotlearning-bachelor/
Enrollment Procedure
Select three papers in decreasing order of preference
Register for the seminar in HISInOne
Students selected based on HISInOne Priority Suggestion
Students assigned a paper based on their paper preference
Bachelor’s Form Master’s Form
On 29/04/2021 On 29/04/2021
● Tremendous progress on complex, high-dimensional data
○ Speech Recognition
○ Computer Vision
○ Natural Language Understanding
● Autonomous systems smart enough to operate in the real world
Robot Learning
Perception
● Complex environments
● Noisy observations and sensors
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow, Waleed Abdulla, 2017
Rohit Mohan and Abhinav Valada, "EfficientPS: Efficient Panoptic Segmentation", arXiv preprint arXiv:2004.02307, 2020.
Patil, Ashok Kumar & Kumar, G Ajay & Kim, Tae-Hyoung & Chai, Young-Ho. (2018). Hybrid approach for alignment of a pre-processed three-dimensional point cloud, video, and CAD model using partial point cloud in retrofitting applications. International Journal of Distributed Sensor Networks. 14. 155014771876645. 10.1177/1550147718766452.
Unknown, Open World
● Unknown world → Many unlabelled samples
● Uncertainty estimation
● Adversarial attacks
Towards Open Set Recognition, Scheirer et al., 2012
Explaining and Harnessing Adversarial Examples, Goodfellow et al., 2014
Autonomous Decision Making
● Reinforcement learning for short- and long-term decision making
Reinforcement Learning
● Model-free RL
○ Adapts to complex scenarios
○ Directly optimises the policy
○ Data intensive
● Model-based RL
○ Learns a world model
○ Promise of better generalisation
Learning Latent Dynamics for Planning from Pixels, Hafner et al., 2019
Expensive Real World Data
● Sim2Real
○ Domain adaptation
○ Action and dynamics noise
● Offline RL
○ Large amounts of unstructured data
○ Little annotated / expert data
Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection, Sergey Levine, Peter Pastor, Alex Krizhevsky, Deirdre Quillen
D4RL: Datasets for Deep Data-Driven Reinforcement Learning, Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine
Weak- and Self-Supervision
● Provide labels for simpler tasks
○ Object presence and absence
○ Consistency over time
○ Viewpoint invariance
● Reduce oversight
○ Automatic resets
○ Reward labelling
Time-Contrastive Networks: Self-Supervised Learning from Video, Sermanet et al., 2018
TossingBot: Learning to Throw Arbitrary Objects, Zeng et al., 2019
Bachelor Seminar Topics
1. PolarNet: An Improved Grid Representation for
Online LiDAR Point Cloud Semantic Segmentation
● Uses polar representation instead of spherical or BEV
● Proposes a novel ring-based CNN for use with the polar representation
● Outperforms the state of the art while using fewer parameters
Supervisor: Nikhil Gosala - Link to paper
Source: Zhang et al. 2020
▪ Point clouds quantised and projected into a polar grid
▪ Ring CNN applied to polar grid to learn the semantic mapping.
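The quantisation step above can be sketched as follows. This is a minimal illustration of binning points into a polar grid, not the paper's implementation: the function names, the grid size (`n_rings`, `n_sectors`) and `max_range` are all assumptions chosen for the example.

```python
import math

def polar_cell(x, y, max_range=50.0, n_rings=4, n_sectors=8):
    """Return the (ring, sector) index of the polar cell containing (x, y),
    or None if the point lies at or beyond max_range (assumed cut-off)."""
    r = math.hypot(x, y)
    if r >= max_range:
        return None
    theta = math.atan2(y, x) % (2 * math.pi)  # angle wrapped to [0, 2*pi)
    ring = min(int(r / max_range * n_rings), n_rings - 1)
    sector = int(theta / (2 * math.pi) * n_sectors) % n_sectors
    return ring, sector

def quantise(points, **grid):
    """Group (x, y, z) points by the polar cell they fall into."""
    cells = {}
    for x, y, z in points:
        cell = polar_cell(x, y, **grid)
        if cell is not None:
            cells.setdefault(cell, []).append((x, y, z))
    return cells
```

Each cell then holds the points that a per-cell feature extractor would summarise before the grid is passed to a CNN.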
2. DroNet: Learning to Fly by Driving
● Addresses the challenge of drone navigation in urban environments
● Uses data from cars and bicycles to address lack of drone-relevant data
● Learns the steering angle and collision probability to control the drone
Supervisor: Nikhil Gosala - Link to paper
Source: Loquercio et al.
Video Link
Supervisor: Juana Valeria Hurtado - Link to paper
3. Let there be Color: Joint End to End Learning of
Global and Local Image Priors for Automatic Image
Colorization with Simultaneous Classification
● Approach to introduce colour into grayscale images
● Merges patch-level local features with image-level global features
Source: Iizuka et al.
● Local features capture texture or objects
● Global features capture time of day, location (indoors/outdoors)
● Changing the global features can help re-imagine the image
Supervisor: Juana Valeria Hurtado - Link to paper
4. Context Encoders - Feature Learning by
Inpainting
● Approach to hallucinate missing / blacked out regions in image
● Validates the ability of a network to understand structure in scenes
● L2 loss → Captures overall structure
● Adversarial loss → Chooses one mode out of several possible ones
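How the two losses above might be combined can be sketched as a weighted sum. This is a hedged illustration only: the function names and the default weights `w_rec` and `w_adv` are placeholders, and the adversarial term uses a generic generator loss on a discriminator score in (0, 1] rather than the paper's exact formulation.

```python
import math

def l2_reconstruction_loss(predicted, target):
    """Mean squared error over the masked-out pixels (flattened lists)."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def adversarial_loss(discriminator_score):
    """Generator-side loss: small when the discriminator rates the
    inpainted region as real (score close to 1)."""
    return -math.log(discriminator_score)

def joint_loss(predicted, target, discriminator_score, w_rec=0.9, w_adv=0.1):
    """Weighted sum of both terms; the weights are illustrative placeholders."""
    return (w_rec * l2_reconstruction_loss(predicted, target)
            + w_adv * adversarial_loss(discriminator_score))
```

The L2 term alone tends to average over plausible completions, which is why a second term that rewards a single realistic completion is added.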
Supervisor: Dr. Daniele Cattaneo - Link to paper
5. R2D2: Repeatable and Reliable Detector and
Descriptor
● Approach to generate reliable and repeatable keypoints
● Detect-and-Describe instead of Detect-then-Describe
● Good keypoints:
○ Repeatable (generate strong cues, like corners)
○ Reliable (ability to uniquely match across images)
● Predict discriminativeness along with keypoints
Source: Revaud et al.
Ground Truth Keypoints Keypoint Repeatability Keypoint Reliability
Supervisor: Dr. Daniele Cattaneo - Link to paper
6. SegMatch: Segment Based Place Recognition
in 3D Point Clouds
● A place recognition approach for 3D point clouds for use in SLAM
● Generates and matches 3D segments:
○ Alleviates the need for accurate object detection or notion of “object”
○ More informative as opposed to keypoint descriptors
Video Link
Segment Matching Loop Closure Detection
Supervisor: Eugenio Chisari - Link to paper
7. Sim-to-Real Transfer of Robotic Control with
Dynamics Randomization
● Bridge the “reality-gap” between simulation and real-world
● Randomizes the robot simulator dynamics for policy generalisation
● Policies trained in simulation can directly be used in the real world!
● LSTM used to model the memory nodes of the DeepRL network
Source: Peng et al.
Policy Network Value Network
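The idea of randomising simulator dynamics can be sketched as drawing a fresh set of physical parameters at the start of every training episode. The parameter names and ranges below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import random

# Hypothetical (low, high) ranges; not from the paper.
PARAM_RANGES = {
    "link_mass_scale": (0.5, 1.5),
    "joint_damping_scale": (0.5, 2.0),
    "action_latency_s": (0.0, 0.04),
}

def sample_dynamics(rng, ranges=PARAM_RANGES):
    """Draw one set of simulator dynamics parameters uniformly at random."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

def training_episodes(n, seed=0):
    """Yield (episode index, dynamics parameters) with new dynamics per episode."""
    rng = random.Random(seed)
    for episode in range(n):
        yield episode, sample_dynamics(rng)
```

A policy trained across many such draws never sees one fixed simulator, which is the intuition behind its robustness to the real robot's unknown dynamics.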
Video Link
Supervisor: Eugenio Chisari - Link to paper
8. Continuous Control for High-Dimensional State
Spaces: An Interactive Learning Approach
● Introduces human corrective feedback into the policy optimisation phase
● No reward function needed
● Sample size greatly reduced because of expert intervention
● Human actively corrects the errors of the policy and physically interacts with
the system during the learning process
Source: Perez-Dattari et al.
Supervisor: Daniel Honerkamp - Link to paper
9. Playing Atari with Deep Reinforcement
Learning
● CNN to directly learn a control policy from high-dimensional input using RL
● Network is trained with a variant of Q-Learning
● The same architecture and hyperparameters are used across multiple Atari games
● This work preceded Google's acquisition of DeepMind, reportedly for around $500 million
Source: Mnih et al.
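For reference, the basic Q-learning update that the paper builds on can be sketched in tabular form. This is a simplification: the paper approximates Q with a CNN over raw frames, whereas the dictionary-based table, the function name and the default `alpha` and `gamma` here are assumptions for illustration.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.99, done=False):
    """Apply one Q-learning step to the table q and return the new value.

    q maps (state, action) pairs to estimated returns; missing entries are 0.
    """
    best_next = 0.0 if done else max(q.get((next_state, a), 0.0) for a in actions)
    target = reward + gamma * best_next          # bootstrapped return estimate
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]
```

The target uses the best action in the next state regardless of the action actually taken, which is what makes the method off-policy.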
Supervisor: Daniel Honerkamp - Link to paper
10. Recurrent Models of Visual Attention
● CNNs have high computational complexity
● CNNs replaced with RNNs to control the computational complexity
○ RNN adaptively selects only the most relevant features for a given task
○ The next location to be attended to depends on the past regions and the task
● The computation time is independent of the input size
● Model is non-differentiable and is trained using RL for a specific task
Source: Mnih et al.
Source: kevinzakka GitHub
● The network incrementally combines the different regions to learn a
dynamic representation of the image
Supervisor: Dr. Tim Welschehold - Link to paper
11. Efficiency and Equity are Both Essential: A
Generalized Traffic Signal Controller with Deep
Reinforcement Learning
● An RL method to learn a traffic signal control policy to optimise traffic flow
● Reward function accounts for both Equity (Variance) and Efficiency (Mean)
● Adaptive discounting used to reduce the reward for transition phases
Source: Yan et al.
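One way a reward could account for both the mean and the variance is sketched below. This is a hypothetical form, not the paper's reward: the use of per-vehicle travel times, the function name and the `equity_weight` trade-off are all assumptions.

```python
from statistics import mean, pvariance

def equity_efficiency_reward(travel_times, equity_weight=1.0):
    """Negative cost combining efficiency (mean travel time) and equity
    (variance of travel times across vehicles). Hypothetical form."""
    return -(mean(travel_times) + equity_weight * pvariance(travel_times))
```

With the same mean, a signal plan that makes a few vehicles wait much longer than the rest scores lower than one that spreads the delay evenly.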
Supervisor: Dr. Tim Welschehold - Link to paper
12. Use the Force, Luke! Learning to Predict
Physical Forces by Simulating Effects
● Approach to infer forces exerted on objects from a video
● Predicts physical information that can be used to recreate the interaction
● Force and contact point prediction used to deform object mesh to recreate
the action
Source: Ehsani et al.
Video Link
Master Seminar Topics
Supervisor: Nikhil Gosala - Link to paper
1. Perceive, Attend, and Drive: Learning Spatial
Attention for Safe Self-Driving
Source: Wei et al.
● A lot of computation in perception is wasted on regions that do not matter
● An approach to automatically attend to important regions of the image for
use with motion planning
● The attention mask increases safety by allowing focused computation
Video Link
Supervisor: Nikhil Gosala - Link to paper
2. Orthographic Feature Transform for Monocular
3D Object Detection
● An approach to convert a 2D frontal perspective view to a 3D orthographic
view for 3D object detection in monocular images
● Orthographic Feature Transform maps features in frontal view to features in
the Bird’s eye view
● Generates a holistic view where object scale and distance are meaningful
Source: Roddick et al.
Video Link
Supervisor: Juana Valeria Hurtado - Link to paper
3. Neural Scene Graphs for Dynamic Scenes
Source: Ost et al.
● A neural rendering approach for accurate view synthesis of dynamic
scenes using scene graph representations
● The network decomposes the scene into static and dynamic objects and
encodes their representations using scene graphs
● Novel scenes can be generated by manipulating the learnt scene graphs
4. Semantic Object Prediction and Spatial Sound
Super-Resolution with Binaural Sounds
Supervisor: Juana Valeria Hurtado - Link to paper
● Approach to perform semantic object prediction using binaural sound
● Network trained using a teacher-student approach
○ A vision network acts as a teacher
○ Audio network is the student and learns to generate the same results as the teacher
● Depth estimation and spatial sound super resolution used to boost results
Source: Vasudevan et al.
5. xMUDA: Cross-Modal Unsupervised Domain
Adaptation for 3D Semantic Segmentation
Supervisor: Dr. Daniele Cattaneo - Link to paper
● Transfer features learnt on one dataset to another dataset
● Exploits multiple modalities to improve the result of unsupervised domain
adaptation between datasets
● Each modality learns to mimic the better qualities of the other, making the
overall domain adaptation objective better
Source: Jaritz et al.
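The mutual mimicry between modalities can be sketched as a pair of KL-divergence terms between the class distributions predicted by a 2D (image) branch and a 3D (point cloud) branch. This is a simplified sketch under assumptions: the function names are illustrative, and the real model's network heads and training details are omitted.

```python
import math

def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_modal_losses(logits_2d, logits_3d):
    """Return (loss for the 2D branch, loss for the 3D branch).

    Each branch is penalised for disagreeing with the other's prediction.
    """
    p2d, p3d = softmax(logits_2d), softmax(logits_3d)
    return kl_divergence(p3d, p2d), kl_divergence(p2d, p3d)
```

Because these terms need no labels, they can be applied on the unlabelled target dataset as well as on the labelled source dataset.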
6. STaR: Self-Supervised Tracking and
Reconstruction of Rigid Objects in Motion with
Neural Rendering
Supervisor: Dr. Daniele Cattaneo - Link to paper
● Self-supervised tracking and dynamic scene reconstruction approach from
multi-view RGB videos
● Explicitly handles rigid motion, which is the main cause of failure in
competing approaches
● NeRFs and rigid poses are jointly optimised to achieve the goal
Source: Yuan et al.
Video Link
7. Self-Supervised Correspondence in Visuomotor
Policy Learning
Supervisor: Eugenio Chisari - Link to paper
● Approach to improve generalisation of visuomotor policy learning using
self-supervised correspondences
● Self-supervised dense correspondence training generalises across
different objects using very little data
● Also proposes a technique for multi-camera time-synchronised dense
spatial correspondence learning
Source: Florence et al.
Video Link
8. Accelerating Reinforcement Learning with
Learned Skill Priors
Supervisor: Eugenio Chisari - Link to paper
● Approach taps into previously learned skills to learn a prior over skills
● Deep Latent Variable model jointly learns embedding space of skills and
skill priors
● Skill priors accelerate downstream policy learning on complex tasks
Source: Pertsch et al.
Source: Pertsch et al. (figure panels: Competing approaches, Their approach)
9. Asymmetric Self-Play for Automatic Goal
Discovery in Robotic Manipulation
Supervisor: Daniel Honerkamp - Link to paper
● A zero-shot generalisation approach to solve multiple unseen tasks using
asymmetric self-play
● Two agents play against one another - Alice and Bob
○ Alice proposes a challenge to Bob
○ Bob tries to solve that challenge
● Bob learns a policy by observing Alice’s trajectory and cloning its behaviour
● ASP always gives an achievable goal and comes up with many novel goals
Video Link
10. Language as a Cognitive Tool to Imagine
Goals in Curiosity-Driven Exploration
Supervisor: Daniel Honerkamp - Link to paper
● A DeepRL approach that automatically imagines out-of-distribution goals
for goal discovery and open-ended learning
● The model interprets the OOD goal using separation of:
○ Learned goal-achievement reward function
○ Policy relying on deep-sets, gated attention and object-centered representation
Source: Colas et al.
Source: Colas et al.
Figure legend: Goals provided by the social partner; Imagined goals (some goals are meaningless)
11. Reinforcement Learning with Videos:
Combining Offline Observations with Interaction
Supervisor: Dr. Tim Welschehold - Link to paper
● Model learns a policy using human
demonstration videos along with online data collected by the robot
● Human experience videos don’t have reward or
motion cues
○ Model estimates a reward using domain-invariant representation of the images
● The model successfully learns challenging vision-based
skills with much less data
Source: Schmeckpeper et al.
12. A GMM Layer Jointly Optimized with
Discriminative Features within a Deep Neural
Network Architecture
Supervisor: Dr. Tim Welschehold - Link to paper
● Proposes a NN with a GMM as its final layer,
trained jointly using Asynchronous SGD
● The best of both worlds:
○ Superior feature extraction of NNs
○ Well-studied and structured classification of GMMs
● Evaluate the improvement obtained by:
○ Joint training of the GMM and the other layers
○ Changing network depth in the GMM network and the vanilla one
○ Using a GMM model instead of a vanilla NN
Source: Variani et al.
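What a GMM does as a classification layer can be sketched as follows: each class owns a mixture of diagonal Gaussians over the feature vector, and the predicted class is the one with the highest log-likelihood. The parameters below are fixed toy values and the names are assumptions; in the paper the GMM parameters are trained jointly with the feature layers, which this sketch does not attempt.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_gmm(x, components):
    """Log-likelihood of x under a mixture given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian(x, m, v) for w, m, v in components]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))  # log-sum-exp

def classify(x, class_gmms):
    """Return the class label whose mixture assigns x the highest likelihood."""
    return max(class_gmms, key=lambda label: log_gmm(x, class_gmms[label]))

# Toy one-dimensional example with made-up parameters.
CLASS_GMMS = {
    "a": [(0.5, [0.0], [1.0]), (0.5, [1.0], [1.0])],
    "b": [(1.0, [5.0], [1.0])],
}
```

In a joint setup, `x` would be the output of the network's last hidden layer rather than a hand-picked feature.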