Lecture Series on Cognitive Technical Systems
In this lecture series, renowned international experts are invited to give talks on their current research. The lectures are open to all researchers and students at the University of Freiburg.
||Thursday, 5.3.2015 14:00
Faculty of Engineering, Building 101, Room 00-010/14
Prof. Thomas Pock, TU Graz
Efficient block optimization methods for computer vision
| ABSTRACT: In this talk I will discuss recent advances in block optimization
methods for solving non-smooth optimization problems in computer
vision and image processing. It turns out that a large class of 2D and
3D total-variation regularized problems can be reduced to an algorithm
that computes exact solutions with respect to certain subsets of the
variables in each iteration. For example, if the subsets are 1D total
variation problems, we can efficiently compute their solutions based on
dynamic programming. Furthermore, we can make use of gradient
acceleration techniques to additionally speed up the algorithms.
I will show applications to computing globally optimal minimizers of
total variation regularized stereo problems.
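For readers who want to experiment with the idea, here is a small self-contained sketch (our illustration, not the speaker's implementation) of the exact 1D subproblem solver the abstract alludes to: dynamic programming for a discretized 1D total variation problem. The label set, the quadratic data term, and the weight are illustrative choices.

```python
def tv1d_dp(y, labels, lam):
    """Minimize sum_i (x_i - y_i)^2 + lam * sum_i |x_{i+1} - x_i|
    over x_i restricted to a discrete label set, by Viterbi-style DP."""
    n, L = len(y), len(labels)
    INF = float("inf")
    # cost[k] = best energy of a prefix ending with label index k
    cost = [(labels[k] - y[0]) ** 2 for k in range(L)]
    back = []
    for i in range(1, n):
        new, arg = [0.0] * L, [0] * L
        for k in range(L):
            best, bj = INF, 0
            for j in range(L):
                c = cost[j] + lam * abs(labels[k] - labels[j])
                if c < best:
                    best, bj = c, j
            new[k] = best + (labels[k] - y[i]) ** 2
            arg[k] = bj
        cost, back = new, back + [arg]
    # backtrack the optimal labelling
    k = min(range(L), key=lambda k: cost[k])
    x = [labels[k]]
    for arg in reversed(back):
        k = arg[k]
        x.append(labels[k])
    return list(reversed(x))
```

With the weight set to zero the data term decides each sample independently; with a large weight the solution is forced to be constant.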
|SHORT BIO: Thomas Pock received his MSc (1998-2004) and his
PhD (2005-2008) in Computer Engineering (Telematik) from Graz
University of Technology. After a Post-doc position at the
University of Bonn, he moved back to Graz University of Technology
where he has been an Assistant Professor at the Institute for
Computer Graphics and Vision. In 2013 he received the START
Prize of the Austrian Science Fund (FWF) and the German Pattern
Recognition Award of the German Association for Pattern Recognition
(DAGM), and in 2014 he received a Starting Grant from the
European Research Council (ERC). Since June 2014, he has been a
Professor of Computer Science at Graz University of Technology
(AIT Stiftungsprofessur "Mobile Computer Vision") and a
principal scientist at the Department of Safety and Security
at the Austrian Institute of Technology (AIT).
The focus of his research is the development of mathematical
models for computer vision and image processing in mobile scenarios
as well as the development of efficient algorithms to compute these models.
Prof. Paolo Favaro, University of Bern
Total Variation Blind Deconvolution: The Devil is in the Details
| ABSTRACT: In the past decade, a renewed major effort has been devoted to the problem of blind deconvolution.
Many current approaches are essentially built on iterative alternating energy minimization, where
at each step either the sharp image or the blur function is reconstructed. Much of the success of these
algorithms can be attributed to the use of sparse gradient priors. However, recent work of Levin et al.
has shown that this class of algorithms suffers from a major shortcoming: they favor the no-blur solution,
where the sharp image is the blurry input and the blur is a Dirac delta. In contrast, one can observe
experimentally that these alternating minimization algorithms converge to the desired solution even
when initialized with the no-blur one.
We will show both analysis and experiments to resolve this apparent paradox. We find that both
observations are right. Our analysis is based on the most basic of these algorithms, which was
already introduced in the early work of You and Kaveh in 1996 and later by Chang and Wong in 1998.
We show that the procedure of Chang and Wong does not minimize the cost function it sets out to minimize.
In particular, the delayed scaling (normalization) in the iterative step of the blur kernel is fundamental to the convergence of the algorithm. We show that this small implementation detail is what allows the original procedure of Chang and Wong, rather than its current variants, to elude
the no-blur solution.
We also introduce our own adaptation of this algorithm and show that, in spite of its extreme simplicity,
it is very robust and achieves a performance on par with the state of the art.
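As a toy illustration of the alternating scheme analyzed in the talk (our sketch, not the authors' code), the following 1D blind deconvolution loop takes gradient steps in the image and the kernel, and normalizes the kernel only after its unconstrained update, mimicking the delayed scaling discussed above. Signal sizes, step sizes, and iteration counts are arbitrary choices.

```python
def conv(a, b):
    # full 1D convolution
    n = len(a) + len(b) - 1
    out = [0.0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def blind_deconv(f, n_u, n_k, iters=200, tu=0.01, tk=0.01):
    u = [1.0] * n_u                      # flat image initialization
    k = [1.0 / n_k] * n_k                # flat, "no-blur"-like kernel
    for _ in range(iters):
        # residual r = k*u - f, gradient step in the image u
        r = [a - b for a, b in zip(conv(k, u), f)]
        gu = [sum(k[j] * r[i + j] for j in range(n_k)) for i in range(n_u)]
        u = [ui - tu * gi for ui, gi in zip(u, gu)]
        # unconstrained gradient step in the kernel k ...
        r = [a - b for a, b in zip(conv(k, u), f)]
        gk = [sum(u[i] * r[i + j] for i in range(n_u)) for j in range(n_k)]
        k = [max(kj - tk * gj, 0.0) for kj, gj in zip(k, gk)]
        # ... and only then the delayed normalization onto the simplex
        s = sum(k) or 1.0
        k = [kj / s for kj in k]
    return u, k
```

On a synthetic blurry signal the loop keeps the kernel on the simplex while reducing the data residual from its flat initialization.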
|SHORT BIO: Paolo Favaro received the Laurea degree (BSc+MSc) from Università di Padova, Italy, in 1999, and the M.Sc. and Ph.D. degrees in electrical engineering from Washington University in St. Louis in 2002 and 2003, respectively.
He was a postdoctoral researcher in the computer science department of the University of California, Los Angeles and subsequently at Cambridge University, UK. Between 2004 and 2006 he worked in medical imaging at Siemens Corporate Research, Princeton, USA. From 2006 to 2011 he was Lecturer and then Reader at Heriot-Watt University and Honorary Fellow at the University of Edinburgh, UK. In 2012 he became full professor at Universität Bern, Switzerland.
His research interests are in computer vision, computational photography, machine learning, signal and image processing, estimation theory, inverse problems and variational techniques.
Prof. Stefan Roth, TU Darmstadt
Locally Rigid Models for 3D Scene Flow
| ABSTRACT: 3D scene flow estimation -- simultaneously recovering geometry and 3D
motion from stereo video sequences -- remains a challenging task, despite much progress in both classical disparity and 2D optical flow
estimation. To overcome the limitations of existing techniques, we introduce a novel model that represents the dynamic 3D scene by a
collection of planar, rigidly moving, local segments. Scene flow estimation then amounts to jointly estimating the pixel-to-segment
assignment, and the 3D position, normal vector, and rigid motion parameters of a plane for each segment. The proposed model combines an
occlusion-sensitive data term with appropriate shape, motion, and segmentation regularizers. Inference is carried out using discrete
fusion moves. I will demonstrate the benefits of our model on different real-world image sets, including the challenging KITTI
benchmark. In particular, the locally rigid scene representation enables 3D scene flow to outperform dedicated optical flow techniques
at 2D motion estimation, thus for the first time realizing the theoretical advantage of having multiple views. This is joint work with Christoph Vogel and Konrad Schindler.
|SHORT BIO: Stefan Roth received the Diplom degree in Computer Science and Engineering from the University of Mannheim, Germany, in 2001. In 2003 he received the ScM degree in Computer Science from Brown University, and in 2007 the PhD degree in Computer Science from the same institution. Since 2007 he has been on the faculty of Computer Science at Technische Universität Darmstadt, Germany (Juniorprofessor 2007-2013, Professor since 2013). His research interests include probabilistic and statistical approaches to image modeling, motion estimation, human tracking, and object recognition. He has received several awards, including honorable mentions for the Marr Prize at ICCV 2005 (with M. Black) and ICCV 2013 (with C. Vogel and K. Schindler), the Olympus Prize 2010 of the German Association for Pattern Recognition (DAGM), and the Heinz Maier-Leibnitz Prize 2012 of the German Research Foundation (DFG). He serves as an associate editor for the International Journal of Computer Vision (IJCV) and IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
Prof. Guy Gilboa, Technion
A transform-based variational framework
| ABSTRACT: A new framework is proposed for variational analysis and processing. It defines a functional-based nonlinear transform and its inverse transform. The framework is developed in the context of total variation (TV), but it can be generalized to other one-homogeneous functionals. An eigenfunction with respect to the subdifferential of the functional, such as a disk in the TV case, yields an impulse in the transform domain. This can be viewed as a generalization of known spectral approaches, based on linear algebra, which are extensively used in image processing, e.g. for segmentation. Following the Fourier intuition, a spectrum can be computed to analyze dominant scales in the image. Moreover, new nonlinear low-pass, high-pass and band-pass filters can be designed with very precise scale selection. Relations to sparse signals and to nonlocal TV will be discussed. An example of a texture processing application will be shown, illustrating possible benefits of this new framework.
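A very crude 1D sketch of the spectral idea (our own illustration, with an epsilon-regularized TV flow standing in for the exact evolution): evolve the flow, differentiate the solution twice in time, and read off a scale spectrum. Step sizes, the regularization, and the test signal are all arbitrary choices.

```python
import math

def tv_flow_spectrum(f, steps=200, dt=0.005, eps=1e-3):
    u = list(f)
    snaps = [list(u)]
    for _ in range(steps):
        new = list(u)
        for i in range(1, len(u) - 1):
            # smoothed TV flow: u_t = d/dx( u_x / sqrt(u_x^2 + eps^2) )
            dr = u[i + 1] - u[i]
            dl = u[i] - u[i - 1]
            new[i] = u[i] + dt * (dr / math.hypot(dr, eps)
                                  - dl / math.hypot(dl, eps))
        u = new
        snaps.append(list(u))
    # phi(t) = t * u_tt (finite differences), spectrum S(t) = ||phi(t)||_1
    spec = []
    for t in range(1, steps):
        phi = [(snaps[t + 1][i] - 2 * snaps[t][i] + snaps[t - 1][i])
               * t * dt / dt ** 2 for i in range(len(f))]
        spec.append(sum(abs(p) for p in phi))
    return spec
```

On a box signal the spectrum concentrates its mass around the scale at which the box collapses, mirroring the "impulse in the transform domain" behavior of TV eigenfunctions.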
|SHORT BIO: Guy Gilboa received his PhD from the Electrical Engineering Department, Technion – Israel Institute of Technology in 2004. He was a postdoctoral fellow at UCLA, hosted by Prof. Stanley Osher (2004-2007). He later joined a start-up company developing 3D sensors (a pioneering technology at that time) which was sold to Microsoft. In Microsoft he conducted research within the Kinect project. In 2011 he moved to Philips Healthcare as a senior researcher in the field of medical imaging. In March 2013 he returned to academia and was appointed an assistant professor at the Electrical Engineering Dept., Technion. His interests are PDE and variational methods for image processing and computer vision. He has received several prizes, including the Eshkol Prize of the Ministry of Science, the Technion outstanding PhD thesis award, a Vatat scholarship, and the Gutwirth Prize.
Prof. Vittorio Ferrari
University of Edinburgh
Large-scale object localization in ImageNet
| ABSTRACT: ImageNet is a large hierarchical database of object classes containing 15 million images. Unfortunately, only a small fraction of them are manually annotated with bounding boxes, and none with pixelwise segmentations. This prevents useful developments, such as learning object detectors for thousands of classes. Our goal is to automatically populate ImageNet with many more bounding boxes and segmentations, by leveraging existing manual annotations and by transferring knowledge between classes across the semantic hierarchy of ImageNet. I will present results of our large-scale knowledge transfer approach on half a million images, covering more than 500 object classes. These auto-annotated bounding boxes and segmentations are available for download on our website.
|SHORT BIO: Vittorio Ferrari is a Reader at the School of Informatics of the University of Edinburgh, which he joined in December 2011. He leads the CALVIN research group on visual learning. He received his PhD from ETH Zurich in 2004 and was a post-doctoral researcher at INRIA Grenoble in 2006-2007 and at the University of Oxford in 2007-2008. Between 2008 and 2012 he was Assistant Professor at ETH Zurich, funded by a Swiss National Science Foundation Professorship grant. In 2012 he received the prestigious ERC Starting Grant, and the best paper award from the European Conference on Computer Vision for his work on large-scale image auto-annotation. He is the author of over 60 technical publications, most of them in the highest ranked conferences and journals in computer vision and machine learning. He regularly serves as an Area Chair for the major vision conferences.
Prof. Mubarak Shah
University of Central Florida
Discovering Motion Primitives for Unsupervised Grouping and One-shot Learning of Human Actions, Gestures, and Expressions
| ABSTRACT: Automatic analysis of videos is one of the most challenging problems in computer vision. In this talk I will introduce the problem of action, event, and activity representation and recognition from video sequences. I will begin by giving a brief overview of a few interesting methods to solve this problem, including representations based on trajectories, volumes, and local interest points.
The main part of the talk will focus on a newly developed framework for the discovery and statistical representation of motion patterns in videos, which can act as primitive, atomic actions. These action primitives are employed as a generalizable representation of articulated human actions, gestures, and facial expressions. The motion primitives are learned by hierarchical clustering of observed optical flow in a four-dimensional space of spatial position and motion flow, and a sequence of these primitives can be represented as a simple string, a histogram, or a Hidden Markov model.
I will then describe methods to extend the framework of motion pattern estimation to the problem of multi-agent activity recognition. First, I will talk about similarity-invariant matching of motion patterns in order to recognize simple events in surveillance scenarios. I will end the talk by presenting a framework in which a motion pattern represents the behavior of a single agent, while a multi-agent activity takes the form of a graph, which can be compared to other activity graphs by attributed inexact graph matching. This method is applied to the problem of recognizing American football plays.
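As a toy stand-in for the clustering step described above (our sketch; the actual method's features and linkage are more elaborate), one can greedily agglomerate 4D points combining pixel position and flow until no two clusters are closer than a threshold:

```python
import math

def cluster_flow(points, thresh):
    """Single-linkage agglomeration of 4D (x, y, u, v) points:
    pixel position plus optical flow vector."""
    clusters = [[p] for p in points]

    def dist(a, b):
        # single linkage: distance between the closest pair of members
        return min(math.dist(p, q) for p in a for q in b)

    while True:
        best, pair = thresh, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if d < best:
                    best, pair = d, (i, j)
        if pair is None:
            return clusters
        i, j = pair
        clusters[i] += clusters.pop(j)
```

Two spatially separated groups of flow vectors are kept apart, while nearby vectors with similar motion merge into one primitive candidate.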
|SHORT BIO: Dr. Mubarak Shah, Agere Chair Professor of Computer Science, is the founding director of the Center for Research in Computer Vision at the University of Central Florida (UCF). He is a co-author of three books (Motion-Based Recognition (1997), Video Registration (2003), and Automated Multi-Camera Surveillance: Algorithms and Practice (2008)), all published by Springer. He has published extensively on topics related to visual surveillance, tracking, human activity and action recognition, object detection and categorization, shape from shading, geo-registration, visual crowd analysis, etc. Dr. Shah is a fellow of the IEEE, IAPR, AAAS and SPIE. In 2006, he was awarded the Pegasus Professor award, the highest award at UCF, given to a faculty member who has made a significant impact on the university. He is an ACM Distinguished Speaker. He was an IEEE Distinguished Visitor speaker for 1997-2000, and received the IEEE Outstanding Engineering Educator Award in 1997. He received the Harris Corporation's Engineering Achievement Award in 1999; the TOKTEN awards from UNDP in 1995, 1997, and 2000; the SANA award in 2007; an honorable mention for the ICCV 2005 Where Am I? Challenge Problem; and was nominated for the best paper award at the ACM Multimedia Conference in 2005 and 2010. At UCF he received the Scholarship of Teaching and Learning (SoTL) award in 2011; the College of Engineering and Computer Science Advisory Board award for faculty excellence in 2011; Teaching Incentive Program awards in 1995 and 2003; Research Incentive Awards in 2003 and 2009; Millionaires' Club awards in 2005, 2006, 2009, 2010 and 2011; and University Distinguished Researcher awards in 2007 and 2012. He is an editor of the international book series on Video Computing, editor-in-chief of the Machine Vision and Applications journal, and an associate editor of the ACM Computing Surveys journal.
He was an associate editor of the IEEE Transactions on PAMI, and a guest editor of the special issue of International Journal of Computer Vision on Video Computing. He was the program co-chair of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
Jun.-Prof. Björn Ommer
University of Heidelberg
A Compositional Approach to Shape for Visual Recognition of Objects and Abnormalities
| ABSTRACT: Shape is a natural, highly prominent characteristic of objects that human vision exploits every day. But despite its expressiveness, shape poses significant challenges for category-level object detection in cluttered scenes: Object form is an emergent property that cannot be perceived locally but becomes only available once the whole object has been detected and segregated from the background. Thus we address the detection of objects and the assembling of their shape simultaneously. We learn a dictionary of meaningful contours by contour co-activation, and a joint, consistent placement of all contours in an image yields a robust shape-based detection of objects in a multiple instance learning framework. The compositional grouping of object parts can be extended to the parsing of complete scenes and videos, and it provides a feasible approach to abnormality detection. To this end, video frames are parsed by establishing a set of hypotheses that jointly explain all the foreground while, at the same time, trying to find normal training samples that explain the hypotheses. Consequently, a direct detection of abnormalities can be avoided. This is crucial since the class of all irregular objects and behaviors is infinite and thus no (or by far not enough) training samples are available. Time permitting, I will also talk about recent extensions to shape matching and multiple instance learning.
|SHORT BIO: Björn Ommer is an assistant professor for Scientific Computing and leads the Computer Vision Group at the University of Heidelberg. He studied computer science at the University of Bonn, Germany, and in 2003 was awarded a diploma (~M.Sc.) in computer science. After that he pursued his doctoral studies at ETH Zurich, Switzerland, and received his Ph.D. degree from ETH Zurich in 2007. His dissertation "Learning the Compositional Nature of Objects for Visual Recognition" was awarded the ETH Medal. Thereafter, he held a post-doctoral position at UC Berkeley. He serves as an associate editor for the journal Pattern Recognition Letters. Björn is one of the directors of the HCI, a member of the extended board of directors of the IWR, principal investigator in the research training group 1653 ("Spatio/Temporal Graphical Models and Applications in Image Analysis"), and a member of the executive board and scientific committee of the Heidelberg Graduate School HGS MathComp. He has received the Outstanding Reviewer Award at CVPR 2010 and CVPR 2011.
Dr. Felix Bießmann
Linear and nonlinear methods for decoding myoelectric and neural signals
| ABSTRACT: The field of machine learning offers powerful non-linear and non-parametric methods. However, a number of recent studies have shown empirically that in many biomedical or neuroscientific applications simple linear methods are sufficient. This talk discusses two examples.
The first part will focus on myoelectric control of hand prostheses (joint work with Otto Bock). Here, a linear decoder predicts the 2D hand position from muscle activity with accuracy competitive with state-of-the-art non-linear methods.
The second part (joint work with MPI Tuebingen) will review some attempts to quantify how much neural information can be decoded from functional magnetic resonance imaging (fMRI) signals. Using non-linear and linear methods we decode intracranially measured neural data from simultaneously recorded high-resolution fMRI. We investigate which neural features are reflected in the fMRI signal and which fMRI features carry neural information. These estimates can help to guide parameter selection in fMRI studies for optimal decoding of stimulus information. In line with other studies, we find that linear methods predict neural activity as well as non-linear decoders. Moreover, our results indicate that simple second-order mutual information estimators capture the neural information contained in fMRI signals as well as estimators that take higher-order moments into account.
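A minimal example of the kind of linear decoder the talk argues is often sufficient (our sketch, not the speaker's pipeline): ridge regression fitted in closed form via the normal equations. The data shapes and the regularizer value are illustrative.

```python
def ridge_fit(X, y, lam=1e-2):
    """Solve (X^T X + lam*I) w = X^T y by Gaussian elimination."""
    d = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    for c in range(d):                      # forward elimination with pivoting
        p = max(range(c, d), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, d):
            m = A[r][c] / A[c][c]
            A[r] = [a - m * ac for a, ac in zip(A[r], A[c])]
            b[r] -= m * b[c]
    w = [0.0] * d
    for c in reversed(range(d)):            # back substitution
        w[c] = (b[c] - sum(A[c][j] * w[j] for j in range(c + 1, d))) / A[c][c]
    return w

def ridge_predict(X, w):
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]
```

With a tiny regularizer, noiseless data generated by a linear model is recovered almost exactly.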
Prof. Michael Unser
Multi-dimensional steerable wavelets for bioimage analysis and processing
| ABSTRACT: We present an extended family of wavelet transforms that are self-reversible (tight frame property) and that can be spatially rotated
in a data-adaptive fashion by forming suitable linear combinations (steerability property) in any number of dimensions. Our construction takes advantage of the remarkable invariance properties (with respect to translation, scaling and rotation) of the Riesz transform and its higher-order variant. Our leading example provides a gradient-like multiresolution decomposition of the image. This representation is well suited for the extraction of directional features in 2-D or 3-D. We demonstrate that it can yield a concise (and, for the most part, reversible) representation of 3-D biomedical images by a wavelet sketch (in the spirit of David Marr's primal sketch). We also introduce higher-order Riesz wavelets that behave like multi-scale derivatives. These wavelets can be combined to construct application-specific detectors, while preserving the self-reversibility property. In particular, we present a signal-adapted design based on principal components, which performs remarkably well for image denoising. We also demonstrate the usefulness of these tools for decomposing micrographs into morphological components (e.g. spots vs. filaments) and for detecting keypoints and junctions in cellular arrays.
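The steerability property the construction relies on can be seen already in its simplest first-order form (our illustration, not the talk's wavelet design): the response of a directional derivative filter at any angle is a cosine/sine combination of the x- and y-derivative responses, so two fixed filters suffice to synthesize all orientations.

```python
import math

def derivative_responses(img):
    """Central-difference d/dx and d/dy responses of a 2D image
    (list of rows), with clamped one-sided differences at borders."""
    h, w = len(img), len(img[0])
    rx = [[(img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
           for x in range(w)] for y in range(h)]
    ry = [[(img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
           for x in range(w)] for y in range(h)]
    return rx, ry

def steer(rx, ry, theta):
    """Directional-derivative response at angle theta by steering
    the two basis responses."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * a + s * b for a, b in zip(ra, rb)]
            for ra, rb in zip(rx, ry)]
```

On a linear ramp the steered response peaks when theta points along the gradient, with magnitude equal to the gradient norm.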
|SHORT BIO: Michael Unser is Professor and Director of EPFL's Biomedical Imaging Group, Lausanne, Switzerland. His main research area is biomedical image
processing. He has a strong interest in sampling theories, sparsity, multiresolution algorithms, wavelets, and the use of splines for image processing. He has published about 200 journal papers on those topics. From 1985 to 1997, he was with the Biomedical Engineering and Instrumentation Program, National Institutes of Health, Bethesda USA, conducting research on bioimaging and heading the Image Processing Group. Dr. Unser has held the position of associate Editor-in-Chief (2003-2005) for the IEEE Transactions on Medical Imaging and has served as Associate
Editor for the same journal, the IEEE Transactions on Image Processing, and the IEEE Signal Processing Letters. He is currently a member of the editorial boards of Foundations and Trends in Signal Processing, and Sampling Theory in Signal and Image Processing. He co-organized the first IEEE International Symposium on Biomedical Imaging (ISBI 2002) and was the founding chair of the technical committee on Bio Imaging and Signal Processing (BISP) of the IEEE Signal Processing Society. Dr. Unser is a fellow of the IEEE (1999), a EURASIP fellow (2009), and a member of the Swiss Academy of Engineering Sciences. He is the recipient of several international prizes including three IEEE-SPS Best Paper Awards and two Technical Achievement Awards from the IEEE (SPS 2008 and EMBS 2010).
Prof. Richard Hartley
Australian National University
Riemann Geometry Approach to Computational Problems in Geometric Vision
| ABSTRACT: Many optimization topics in Computer Vision geometry involve determining an optimal point on a manifold. Examples are rotation estimation, in which an optimal point in rotation space is sought for the relative placement of a pair of cameras, or finding an Essential matrix, represented by a point on the so-called "Essential Manifold" of all allowable essential matrices. This talk examines the topology and geometry of such manifolds with a view to discovering results about convergence, optimality, and convexity of algorithms such as estimation or averaging on these manifolds. As a specific example, a new Riemannian metric for the essential manifold is introduced, inherited from the Euclidean metric of its embedding in the space of 3x3 matrices. The (somewhat complicated) form of the geodesics is computed, along with exponential and logarithm maps on this manifold. The application to rotation or essential matrix averaging is explained, in particular using robust L1 estimation techniques.
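The simplest analogue of the exponential/logarithm maps and averaging discussed above lives on SO(2), i.e. angles on a circle (our sketch; the essential-manifold construction in the talk is considerably richer). The angles and iteration count below are our own choices.

```python
import math

def log_map(base, theta):
    """Tangent vector at `base` pointing toward `theta`:
    the angle difference wrapped into (-pi, pi]."""
    return (theta - base + math.pi) % (2 * math.pi) - math.pi

def exp_map(base, v):
    """Move along the geodesic: add the tangent vector, wrap to [0, 2*pi)."""
    return (base + v) % (2 * math.pi)

def karcher_mean(angles, iters=50):
    """Intrinsic (Karcher) mean on the circle: repeatedly average the
    log-mapped residuals and step along the exponential map."""
    mu = angles[0]
    for _ in range(iters):
        step = sum(log_map(mu, a) for a in angles) / len(angles)
        mu = exp_map(mu, step)
    return mu
```

Unlike a naive arithmetic mean, this handles the wrap-around at 2*pi correctly; an L1 (robust) variant would replace the tangent-space mean with a median or Weiszfeld step.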
|SHORT BIO: Professor Richard Hartley is head of the computer vision group in the Department of Information Engineering, at the Australian National University, where he has been since January, 2001. He is also the Program Leader for the Autonomous Systems and Sensor Technology Program of National ICT Australia, a research centre set up in 2002 with funding from the Australian Government. Dr. Hartley worked at the General Electric Research and Development Center from 1985 to 2001. During the period 1985-1988, he was involved in the design and implementation of Computer-Aided Design tools for electronic design and created a very successful design system called the Parsifal Silicon Compiler. In 1991 he was awarded GE's Dushman Award for this work. He became involved with Image Understanding and Scene Reconstruction working with GE's Simulation and Control Systems Division. This division built large-scale flight-simulators. Dr. Hartley's projects in this area were in the construction of terrain models and texture mosaics from aerial and satellite imagery. This involved research in camera modelling, stereo matching and scene reconstruction. In 1991, he began an extended research effort in the area of applying projective geometry techniques to reconstruction using calibrated and semi-calibrated cameras. This research direction was one of the dominant themes in computer vision research throughout the 1990s. In 2000, he co-authored (with Andrew Zisserman) a book for Cambridge University Press, summarizing the previous decade’s research in this area. From 1995 he was GE project leader for a shared-vision project with Lockheed-Martin involving design and implementation of algorithms for an AFIS (fingerprint analysis) system being developed under a Lockheed-Martin contract with the FBI. This involved work in feature extraction, interactive fingerprint editing and fingerprint database matching. He also investigated application of fingerprint scanners to point of sale systems. 
Under this contract he also led work on applications of DNA database technology.
Dr. Michael Gienger
Honda Research Institute Europe
Movement Learning and Control for Robots in Interaction
| ABSTRACT: In this talk, I will introduce our work in the field of movement generation for redundant robots, such as humanoid or complex industrial
robots. The talk will cover our perspective on movement representations and control, and draw the line from reactive control approaches towards more integral schemes using optimization and planning. I will then report on our work in the field of imitation learning and present an interactive control and learning system that allows a humanoid robot to learn new movement skills from a human tutor. Finally, I will show a set of experiments in which a humanoid robot interactively learns and then performs the learned movement skills bimanually and in different situations.
|SHORT BIO: Michael Gienger received the diploma degree in Mechanical Engineering from the Technical University of Munich, Germany, in 1998. He was then a
research assistant at the Institute of Applied Mechanics of the Technical University of Munich, addressing issues in the design and realization of biped robots. He received his Ph.D. degree with a dissertation on "Design and Realization of a Biped Walking Robot". After this, Michael Gienger joined the Honda Research Institute Europe in Germany in 2003. Currently, he works as a principal scientist in the field of robotics. His research interests include mechatronics, robotics, control systems, imitation learning and cognitive systems. He is also a scientific coordinator for the Research Institute for Cognition and Robotics (CoR-Lab) of Bielefeld University.
Prof. Dr. Daniel Cremers
Convex Optimization for Computer Vision
| ABSTRACT: Numerous computer vision problems can be solved by variational methods and partial differential equations. Yet, many traditional approaches correspond to non-convex energies giving rise to suboptimal solutions and often strong dependency on appropriate initialization. In my presentation, I will show how problems like image segmentation, multiview stereo reconstruction and optic flow estimation can be formulated as variational problems. Subsequently, I will introduce methods of convexification which allow the computation of globally optimal or near-optimal solutions. The resulting algorithms provide robust solutions, independent of initialization and compare favorably to spatially discrete graph theoretic approaches in terms of computation time, memory requirements and accuracy.
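On its simplest instance, the 1D ROF model min_u 0.5*||u - f||^2 + lam*TV(u), the convex route can be sketched with a primal-dual iteration of Chambolle-Pock type (our toy example, not the speaker's solvers; step sizes and iteration counts are illustrative):

```python
def rof_1d(f, lam, iters=500, tau=0.25, sigma=0.25):
    """Primal-dual iteration for min_u 0.5*||u - f||^2 + lam*TV(u)."""
    n = len(f)
    u = list(f)
    p = [0.0] * (n - 1)                 # dual variable, one per edge
    ubar = list(u)
    for _ in range(iters):
        # dual ascent on the gradient of ubar, projected onto |p_i| <= lam
        p = [max(-lam, min(lam, p[i] + sigma * (ubar[i + 1] - ubar[i])))
             for i in range(n - 1)]
        # primal descent: proximal step of the quadratic data term
        u_old = list(u)
        for i in range(n):
            div = (p[i] if i < n - 1 else 0.0) - (p[i - 1] if i > 0 else 0.0)
            u[i] = (u[i] + tau * div + tau * f[i]) / (1.0 + tau)
        # over-relaxation of the primal variable
        ubar = [2 * ui - uo for ui, uo in zip(u, u_old)]
    return u
```

Because the problem is convex, the iteration converges to the global minimizer regardless of initialization: a constant input is a fixed point, and for a strong regularizer a step signal is flattened to its mean.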
|SHORT BIO: Daniel Cremers received Bachelor degrees in Mathematics (1994) and Physics (1994), and a Master's degree in Theoretical Physics (1997) from the University of Heidelberg. In 2002 he obtained a PhD in Computer Science from the University of Mannheim. Subsequently he spent two years as a postdoctoral researcher at the University of California at Los Angeles (UCLA) and one year as a permanent researcher at Siemens Corporate Research in Princeton. From 2005 until 2009 he was associate professor at the University of Bonn. Since 2009 he has held the chair for Computer Vision and Pattern Recognition at the Technical University of Munich. His publications have received several awards, including the award of Best Paper of the Year 2003 by the International Pattern Recognition Society and the 2005 UCLA Chancellor's Award for Postdoctoral Research. In December 2010 the magazine Capital listed Prof. Cremers among "Germany's Top 40 Researchers Below 40".