Welcome — this is a landing page for the Image Processing part of the lecture Image Processing and Computer Graphics. Here we are providing additional information, helpful materials and hints for the exercises, and perhaps occasional errata for the lecture.
If you have any questions, if anything isn't (but should be) working, or if you want to present your results – just ask us tutors and assistants; after all, that's why we are here. Come see us at our offices or shoot us an email.
For the structure tensor, reuse your code for the Lucas-Kanade flow from Exercise 2!
Considering what the structure tensor really does, can you think of false positives (FP)? In this case a FP would be some image structure that gets a high "cornerness" score but does not actually contain a corner.
Extra For the eigendecomposition of a 2x2 matrix, check out this blog entry.
The nicest way to visualize the keypoint correspondences you found is to
make a new image twice as wide as the original images,
copy both original images side by side into the new image, and then
draw lines between the two image parts.
Our rudimentary image classes do not offer line-drawing builtins, but it's not hard to do:
Point origin = ...
Point target = ...
Point increment = (target - origin) / 1000
Point current_point = origin
repeat 1001 times:
set pixel at "current_point" in image to red
current_point = current_point + increment
Extra Want better lines? Check out the famous Bresenham algorithm! Subtle hint: You will have to master the Bresenham algorithm in the Computer Graphics part of the lecture anyway, so why not do it now and get a head start before the Christmas break?
Extra To parallelize a for loop with OpenMP, put a "pragma" in front of the loop header:
#pragma omp parallel for
for ( ... ) { ...
Add -fopenmp to the CXXFLAGS and LDFLAGS variables in the makefile. The compiler and runtime will do the rest for you. This pragma also allows for runtime-conditional multithreading:
#pragma omp parallel for if(CONDITION)
for ( ... ) { ...
This loop will be parallelized if and only if CONDITION evaluates to true.
Beware of premature optimization! Not all solvers are easily parallelizable.
The nice aspect of the Lucas-Kanade method is that every pixel of the flow field can be treated separately. There are two straightforward ways to solve the system $$\left(\begin{matrix} K_\rho * I_x^2 & K_\rho * I_xI_y\\ K_\rho * I_xI_y & K_\rho * I_y^2 \end{matrix}\right) \left(\begin{matrix} u\\ v \end{matrix}\right) = \left(\begin{matrix} -K_\rho * I_xI_t\\ -K_\rho * I_yI_t \end{matrix}\right)$$ for $u,v$:
Write out the equations: $$\left(K_\rho * I_x^2\right) \cdot u + \left(K_\rho * I_xI_y\right) \cdot v = -K_\rho * I_xI_t$$ $$\left(K_\rho * I_xI_y\right) \cdot u + \left(K_\rho * I_y^2\right) \cdot v = -K_\rho * I_yI_t$$ Now manually isolate $v$ from one equation, then use that to solve the other equation for $u$, then use the result to get $v$ from the first equation. Just like you used to solve a linear system in high school. Remember that convolution ($*$) is not the same as multiplication ($\cdot$)!
Rewrite the system: $$\left(\begin{matrix} u\\ v \end{matrix}\right) = \left(\begin{matrix} K_\rho * I_x^2 & K_\rho * I_xI_y\\ K_\rho * I_xI_y & K_\rho * I_y^2 \end{matrix}\right)^{-1} \left(\begin{matrix} -K_\rho * I_xI_t\\ -K_\rho * I_yI_t \end{matrix}\right)$$ Inverting a 2x2 matrix is easily done by hand, and afterwards you can directly compute $u,v$.
When implementing the Horn-Schunck method, make use of the extreme sparsity of the system matrix (all elements are zero, except for 7 diagonals — for a 640 by 480 image, that makes about four million useful entries out of almost four hundred billion, or roughly 0.001%). For example, if you use the Jacobi method, the update step $$x^{k+1} = D^{-1}\left(b-Mx^k\right)$$ is painfully slow if you step through every entry of $M$. Using the facts that at most 6 entries of any row in $M$ (actually 5 in practice) are nonzero and that we know their position will make your code faster by many orders of magnitude.
It is advisable to implement your own sparse matrix class instead of using a CMatrix for $M$ ... unless you happen to have more than 100 GB of spare RAM when processing the Yosemite sequence.
Understanding the previous point is essential to this exercise, even if you use a faster linear solver. Slide 16 in class 6 can be very difficult to understand if you don't get the right intuition, but grasping how and why this works is very important.
Extra Finding out whether a given matrix has a certain attribute is sometimes simple and straightforward (symmetry, diagonality), other times it's simple but a bit of work (strict diagonal dominance), and then there are cases where it can seem like an arcane art (invertibility, positive/negative (semi-)definiteness). Here there is no substitute for experience. As an example, for the denoising matrix on slide 8 in class 4 you might want to check out the Gershgorin circle theorem (dt. Gerschgorin-Kreise).
A CVector is a one-dimensional data structure of arbitrary size. A CMatrix is a two-dimensional data structure, e.g. a grayscale image. A CTensor is a three-dimensional data structure; its layers can be retrieved and set as CMatrix instances, and its pixels can be retrieved and set as CVector instances.
All of these are templated. In our exercises we usually use float data, i.e. CMatrix<float> etc.
Access pixels of a CMatrix or CTensor using their operator():
CMatrix<float> my_matrix;
...
float pixel_entry = my_matrix(x, y);
my_matrix(x+1, y+1) = 3.14f;
CTensor<unsigned char> my_tensor;
...
unsigned char pixel_blue_channel = my_tensor(x, y, 2);
my_tensor(x+1, y+1, 0) = (unsigned char)255;
If compiled in debug mode (make debug), CVector/CMatrix/CTensor instances will check whether you access memory out of bounds and throw exceptions if you do. If your program fails with a segmentation fault (dt. Speicherzugriffsfehler) or SIGSEGV, this is usually very helpful.
In our exercises, PGM files are grayscale images and PPM files are RGB images. When loading a .ppm file, the CTensor will encode the color channels in its z dimension. For example, myCTensor(x,y,2) accesses the blue component of the RGB pixel at position (x,y). It may, however, be a good idea to use a one-layer CTensor for grayscale images as well (the class provides the necessary methods) so that you won't have to rewrite your codebase if you want to switch to color images later!
For proper std::rand() usage, #include the <cstdlib> and <ctime> headers and seed the random number generator using the current system time:
std::srand(std::time(NULL));
The seeding has to be done only once, and it should happen before the first call to std::rand().
The Box-Muller method for generating Gaussian distributed random numbers works as follows:
First we need two random numbers uniformly distributed in $[0,1]$. C++'s std::rand() generates uniformly distributed randoms, but they are integer values in the interval $[0,$ RAND_MAX $]$. We have to convert them:
double r1 = std::rand() / ((double) RAND_MAX);
double r2 = std::rand() / ((double) RAND_MAX);
Now we can create two Gaussian distributed randoms $R_1, R_2$ with mean $\mu=0$ and a given standard deviation $\sigma$ from $r_1, r_2$ with $R_1 = \sqrt{-2 \sigma^2 \ln(r_1)} \cos(2 \pi r_2)$ and $R_2 = \sqrt{-2 \sigma^2 \ln(r_1)} \sin(2 \pi r_2)$. Use std::log() from the <cmath> header for the natural logarithm $\ln$.
Note that while we need to generate two initial random numbers to use this method, we do not have to use both $R_1$ and $R_2$. If you want, you can simply ignore one of them.
Extra Use the new random number framework that comes with C++11 instead of Box-Muller. You will need std::random_device, std::mt19937 and std::normal_distribution from the <random> header. A good usage example can be found here.
The CMatrix class has methods for the maximum (max()) and minimum (min()) pixel values that you need for the PSNR computation. The logarithm with base 10 can be computed with std::log10() in <cmath>.
It's your choice whether you implement the Gaussian and box filters as 2d filters, or use the fact that they are separable and implement them as a combination of two 1d filter sweeps. Both variants have pros and cons:
The 2d method allows more filter designs (e.g. asymmetric), although we don't need that for this exercise. On the other hand, keeping boundary conditions in check is more complex.
The two-times-1d method is much faster, especially for large filter sizes; the complexity class is linear in the filter width instead of quadratic! However, the code is usually a bit more involved since you either need two filter functions (one for each direction), a single function that can switch directions, or additional work such as transposing the image between filter sweeps.
Do not be fooled by the “recursive” filter: Actually implementing it as a purely recursive function is horribly inefficient! Use the dynamic programming paradigm: Compute $f_0$, then $f_1$, then $f_2$ etc., each time reusing the previous results.
Extra The original [Deriche 1990] paper for the recursive filter assumes that “the original signal is null” outside the image boundaries, and for this case it derives (assuming the image boundaries are $[0,N]$) that $f_{-2} = f_{-1} = g_{N+1} = g_{N+2} = 0$, which gives us the necessary termination cases. As the lecture states, it is “Hard (impossible?) to implement Neumann boundary conditions”.
Extra Filters can do many surprising and/or useful things. Can you guess what each of the following filter masks does, just by looking at it?
$$\left[\begin{matrix} 0 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0 \end{matrix}\right] \left[\begin{matrix} 0 & 0.333 & 0\\ 0 & 0.333 & 0\\ 0 & 0.333 & 0 \end{matrix}\right] \left[\begin{matrix} 0 & 1 & 0\\ 0 & 1 & 0\\ 0 & 1 & 0 \end{matrix}\right] \left[\begin{matrix} 0 & 1 & 1\\ 0 & 1 & 0\\ 0 & 1 & 0 \end{matrix}\right]$$ $$\left[\begin{matrix} 0 & 0 & 0\\ 1 & 0 & -1\\ 0 & 0 & 0 \end{matrix}\right] \left[\begin{matrix} 0 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 0 \end{matrix}\right] \left[\begin{matrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 1 & 0 \end{matrix}\right] \left[\begin{matrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 1 & 0 & 0 \end{matrix}\right]$$ Implement these filters. Do the results match your expectations? Which of these filters are separable? Why? What do the separated 1d-filter masks look like?