Segmentation of Image Sequences for Object Oriented Coding
Introduction
Widespread coding techniques such as MPEG-1 and MPEG-2 operate on rectangular blocks,
which produces visually noticeable artefacts (e.g. blocking) at high compression ratios.
Second-generation coding techniques such as MPEG-4 therefore work on the basis of objects:
the image is partitioned into objects instead of blocks.
Because the image segmentation takes the human visual system into account,
visual artefacts are reduced. In addition, object-based data
access is supported.
The temporal stability of the segmentation is of major importance,
both for efficient predictive coding of objects and for
object tracking combined with content-dependent quality.
For example, the speaker in a scene might be transmitted in
high quality, whereas the background may be coded with higher loss.
Segmentation
The segmentation relies on centroid linkage region growing [1] and DRF edge
detection [2] in order to combine good global stability with high local
accuracy.
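As a rough, one-dimensional illustration of the recursive exponential filtering that Shen and Castan's edge detection builds on, the sketch below runs a causal and an anticausal first-order recursive filter along a scanline and takes their difference, whose magnitude peaks at step edges. This is a simplified sketch under stated assumptions, not the full 2D detector of [2]; the function name and the smoothing parameter `b` are illustrative choices.

```python
def recursive_edge_response(signal: list[float], b: float = 0.5) -> list[float]:
    """Difference of an anticausal and a causal first-order recursive filter.

    Simplified 1D sketch (an assumption, not the detector of [2]):
    b in (0, 1) controls the amount of smoothing; large |response|
    values mark likely step edges.
    """
    n = len(signal)
    causal = [0.0] * n
    anticausal = [0.0] * n
    state = signal[0]               # start from the border value to avoid a fake edge
    for i in range(n):              # left-to-right exponential smoothing
        state = (1 - b) * signal[i] + b * state
        causal[i] = state
    state = signal[-1]
    for i in range(n - 1, -1, -1):  # right-to-left exponential smoothing
        state = (1 - b) * signal[i] + b * state
        anticausal[i] = state
    return [a - c for a, c in zip(anticausal, causal)]


if __name__ == "__main__":
    print(recursive_edge_response([0, 0, 0, 0, 1, 1, 1, 1]))
```

For the step above the response is largest at the two samples adjacent to the step (indices 3 and 4); it is positive for a rising step and negative for a falling one.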
It has been developed with regard to
- the characteristics of the human visual system,
- the temporal stability required for object-based data access and for
predictive coding, and
- the subjective image partition.
To take advantage of the characteristics of the human visual system, the
segmentation process uses not only the luminance information but also the
chrominance information, which is often neglected in the literature [3]
(see Figure 1).
Figure 1: Segmentation with and without chrominance information
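A minimal sketch of centroid linkage region growing [1] on (Y, U, V) pixels is given below. It assumes a raster scan in which each pixel joins the already-scanned upper or left neighbour region whose running mean (centroid) is closest, provided that distance is below a threshold, and otherwise starts a new region; unlike the full method, it never merges two existing regions. The weight `chroma_weight` is a hypothetical parameter: setting it to 0 reduces the test to luminance only, which shows how isoluminant colours become indistinguishable without chrominance.

```python
from math import sqrt

Pixel = tuple[float, float, float]  # (Y, U, V)


def yuv_distance(p: Pixel, q: Pixel, chroma_weight: float = 1.0) -> float:
    """Weighted Euclidean distance; chroma_weight=0 means luminance only."""
    dy, du, dv = p[0] - q[0], p[1] - q[1], p[2] - q[2]
    return sqrt(dy * dy + chroma_weight * (du * du + dv * dv))


def grow_regions(image: list[list[Pixel]], threshold: float,
                 chroma_weight: float = 1.0) -> list[list[int]]:
    """Simplified centroid linkage region growing (no region merging)."""
    h, w = len(image), len(image[0])
    labels = [[-1] * w for _ in range(h)]
    sums: list[list[float]] = []  # running (Y, U, V) sums per region
    counts: list[int] = []        # pixel count per region
    for y in range(h):
        for x in range(w):
            p = image[y][x]
            best, best_d = -1, threshold
            # Compare with the centroids of the upper and left neighbour regions.
            for ny, nx in ((y - 1, x), (y, x - 1)):
                if ny >= 0 and nx >= 0:
                    lab = labels[ny][nx]
                    mean = tuple(s / counts[lab] for s in sums[lab])
                    d = yuv_distance(p, mean, chroma_weight)
                    if d < best_d:
                        best, best_d = lab, d
            if best < 0:  # no close neighbour region: start a new one
                best = len(counts)
                sums.append([0.0, 0.0, 0.0])
                counts.append(0)
            labels[y][x] = best
            for c in range(3):  # update the centroid of the chosen region
                sums[best][c] += p[c]
            counts[best] += 1
    return labels
```

With two isoluminant colours in one row, the YUV distance separates them into two regions, while `chroma_weight=0.0` lets them grow into a single region.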
The second of the requirements listed above (temporal stability) is difficult to achieve when
each image is segmented separately. Therefore, unless a scene change has been detected,
the processing of an image takes into account the segmentation of the preceding image.
The segmentation works in two different modes: intraframe segmentation and
interframe segmentation.
In intraframe mode, processing is purely 2D, whereas in interframe mode
the segmentation is additionally based on the segmentation of the preceding
image.
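The choice between the two modes can be sketched as follows. The source does not say how scene changes are detected; this sketch assumes a very simple test, the mean absolute luminance difference between consecutive frames compared against a hypothetical threshold `cut_threshold`.

```python
def mean_abs_diff(prev: list[list[float]], cur: list[list[float]]) -> float:
    """Mean absolute luminance difference between two equally sized frames."""
    total, n = 0.0, 0
    for row_p, row_c in zip(prev, cur):
        for a, b in zip(row_p, row_c):
            total += abs(a - b)
            n += 1
    return total / n


def select_mode(prev_luma, luma, cut_threshold: float = 30.0) -> str:
    """Return "intra" for the first frame or after an assumed scene cut, else "inter"."""
    if prev_luma is None or mean_abs_diff(prev_luma, luma) > cut_threshold:
        return "intra"
    return "inter"
```

In interframe mode the previous segmentation would then be reused as a starting point; in intraframe mode the image is segmented from scratch.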
Compared with other existing approaches, temporal stability is improved by
also including motion information in the segmentation process.
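One way motion information can enter interframe segmentation is to shift each region of the preceding segmentation by its motion vector and use the result as the initial label map for the current image. The sketch below assumes integer per-region motion vectors and a simple occlusion rule (faster regions overwrite slower ones); pixels left uncovered would then be segmented as in intraframe mode. It illustrates the idea only and is not presented as the author's method.

```python
def project_labels(prev_labels: list[list[int]],
                   motion: dict[int, tuple[int, int]],
                   unknown: int = -1) -> list[list[int]]:
    """Shift each region of the previous label map by its integer motion vector.

    motion maps label -> (dx, dy); labels without an entry are treated as static.
    Pixels not covered by any shifted region receive `unknown`.
    """
    h, w = len(prev_labels), len(prev_labels[0])
    out = [[unknown] * w for _ in range(h)]

    def speed(p: tuple[int, int]) -> int:
        dx, dy = motion.get(prev_labels[p[0]][p[1]], (0, 0))
        return dx * dx + dy * dy

    # Write slow regions first so that faster-moving regions overwrite them
    # (assumed occlusion rule: moving objects lie in front of the background).
    for y, x in sorted(((y, x) for y in range(h) for x in range(w)), key=speed):
        lab = prev_labels[y][x]
        dx, dy = motion.get(lab, (0, 0))
        ty, tx = y + dy, x + dx
        if 0 <= ty < h and 0 <= tx < w:
            out[ty][tx] = lab
    return out
```

For example, a one-pixel object with label 1 moving one pixel to the right leaves an uncovered pixel (`-1`) behind it, which the intraframe procedure would then have to label.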
Results can be seen in Figure 2 or as MPEG video,
with and without boundary adaptation. The best solution
seems to lie somewhere in between.
Figure 2: Temporal Segmentation: Images 1, 2, and 15 (with and without
boundary adaptation)
In order to achieve a segmentation that conforms to a subjective image partition,
regions are merged hierarchically. This goal is difficult to reach because no
information on the image semantics is available. Nevertheless, motion information has
already proven useful for semantic segmentation [4, 5].
In our contribution, chrominance information is used as a second semantic
feature; it can be regarded, for example, as a property of the object's material.
Thus a three-layer hierarchy is built up. The first layer represents
the segmentation described above. In the second layer, regions of the first layer
with similar chrominance and motion are merged. The highest layer corresponds
to a semantic segmentation, in which regions of the second layer are merged if
their motion is similar (see Figure 3).
Figure 3: Three Segmentation Layers
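The layer construction can be sketched with union-find over the region adjacency graph. Assumed here: each first-layer region is summarised by its mean chrominance, mean motion vector and area, and "similar" means a Euclidean distance below hypothetical tolerances `chroma_tol` and `motion_tol`. Layer 2 merges adjacent regions that are similar in both features; layer 3 merges adjacent layer-2 regions with similar motion.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Region:
    chroma: tuple[float, float]  # mean (U, V)
    motion: tuple[float, float]  # mean motion vector (dx, dy)
    area: int                    # size in pixels


def _find(parent: list[int], i: int) -> int:
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def _weighted_mean(members: list[Region], attr: str) -> tuple[float, float]:
    area = sum(m.area for m in members)
    return tuple(sum(getattr(m, attr)[j] * m.area for m in members) / area
                 for j in range(2))


def merge_layer(regions, adjacency, similar):
    """Merge adjacent regions a, b whenever similar(a, b) holds.

    Returns (labels, merged): labels[i] is the new label of region i and
    merged[k] is the area-weighted Region describing new label k.
    """
    parent = list(range(len(regions)))
    for a, b in adjacency:
        if similar(regions[a], regions[b]):
            parent[_find(parent, a)] = _find(parent, b)
    compact: dict[int, int] = {}  # root -> consecutive label
    labels = [compact.setdefault(_find(parent, i), len(compact))
              for i in range(len(regions))]
    merged = []
    for k in range(len(compact)):
        members = [r for r, lab in zip(regions, labels) if lab == k]
        merged.append(Region(_weighted_mean(members, "chroma"),
                             _weighted_mean(members, "motion"),
                             sum(m.area for m in members)))
    return labels, merged


def build_hierarchy(regions, adjacency, chroma_tol, motion_tol):
    """Return the layer-2 and layer-3 label of every layer-1 region."""
    def chroma_and_motion(a, b):
        return (dist(a.chroma, b.chroma) < chroma_tol
                and dist(a.motion, b.motion) < motion_tol)

    def motion_only(a, b):
        return dist(a.motion, b.motion) < motion_tol

    layer2, regions2 = merge_layer(regions, adjacency, chroma_and_motion)
    # Adjacency between layer-2 regions, derived from layer-1 adjacency.
    adjacency2 = sorted({tuple(sorted((layer2[a], layer2[b])))
                         for a, b in adjacency if layer2[a] != layer2[b]})
    layer3_of_2, _ = merge_layer(regions2, adjacency2, motion_only)
    layer3 = [layer3_of_2[lab] for lab in layer2]
    return layer2, layer3
```

Within one layer the similarity test uses the attributes of the two regions being linked, not updated running means, so a chain of small differences can merge dissimilar ends; a more careful variant would re-check candidates against the merged means.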
This hierarchy has also been temporally stabilized.
When the hierarchy is built up for the current image, the system first tries to reconstruct
the relations of the hierarchy of the preceding image. This reconstruction takes into account
how long a particular constellation has persisted in the past.
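A hysteresis rule is one plausible way to take the duration of a constellation into account. In the sketch below (all names and parameters hypothetical), a merge relation inherited from the preceding hierarchy is kept with a tolerance that grows with the number of frames it has already existed, so long-lived constellations are not broken by small fluctuations, while new relations must pass the base tolerance.

```python
def stabilize_relations(prev_ages: dict[tuple[int, int], int],
                        distances: dict[tuple[int, int], float],
                        base_tol: float,
                        bonus_per_frame: float,
                        max_bonus: float) -> dict[tuple[int, int], int]:
    """Decide which merge relations (region pairs) hold in the current frame.

    prev_ages:  {pair: number of consecutive frames the pair was merged}
    distances:  {pair: feature distance measured in the current frame}
    Returns {pair: updated age} for the relations kept or newly formed.
    """
    ages = {}
    for pair, d in distances.items():
        age = prev_ages.get(pair, 0)
        # Older relations get a larger (capped) tolerance: hysteresis.
        tol = base_tol + min(age * bonus_per_frame, max_bonus)
        if d < tol:
            ages[pair] = age + 1
    return ages
```

Here a relation that has existed for five frames survives a distance of 1.2 with a tolerance of 1.5, whereas a new relation at the same distance is rejected by the base tolerance of 1.0.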
The sequence can be coded on the basis of these hierarchy levels.
The layers are suitable both for efficient prediction of previously seen objects and for supporting
object-based data access, up to the selection of semantic objects.
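Object-based access to a semantic object then amounts to selecting all first-layer regions mapped to it. A minimal sketch, assuming a per-pixel first-layer label map and a list `layer_of` that maps each first-layer label to its label in the chosen layer (both hypothetical names):

```python
def object_mask(label_map: list[list[int]], layer_of: list[int],
                wanted: int) -> list[list[bool]]:
    """True where a pixel's first-layer region belongs to object `wanted`."""
    return [[layer_of[lab] == wanted for lab in row] for row in label_map]
```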
Object Oriented Coding of Chrominance Information
The second segmentation layer has been used successfully for efficient coding of
chrominance information. Since the human eye is less sensitive to chrominance information
than to luminance information, chrominance can be coded very lossily, whereas a more
accurate technique should be used for luminance coding.
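A minimal sketch of this idea, assuming each second-layer region is represented by a single (U, V) pair, namely its rounded mean chrominance, while luminance is coded separately (function names hypothetical):

```python
from collections import defaultdict


def encode_chroma(u, v, labels):
    """Map every region label to its mean (U, V), rounded to integers."""
    sums = defaultdict(lambda: [0, 0, 0])  # label -> [sum_u, sum_v, count]
    for row_u, row_v, row_l in zip(u, v, labels):
        for cu, cv, lab in zip(row_u, row_v, row_l):
            s = sums[lab]
            s[0] += cu
            s[1] += cv
            s[2] += 1
    return {lab: (round(su / n), round(sv / n)) for lab, (su, sv, n) in sums.items()}


def decode_chroma(labels, table):
    """Rebuild U and V planes by filling each region with its chrominance pair."""
    u = [[table[lab][0] for lab in row] for row in labels]
    v = [[table[lab][1] for lab in row] for row in labels]
    return u, v
```

If every second-layer region contributes one pair, the number of transmitted pairs equals the number of regions; the region shapes themselves come from the contour coding that is needed for the greyscale image anyway.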
Compression factors of 1000 (relative to full chrominance information) were reached
for QCIF sources, assuming that object contours and positions have to be coded for
the greyscale images anyway.
As can be seen in
Figure 4, the colours become less brilliant, but chrominance
edges stay sharp; only detailed information is lost, e.g. in the sky in the background.
Figure 4: Original image and image coded with only 22 different
chrominance pairs (instead of 25344)
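A back-of-the-envelope check of the reported compression factor, under assumptions not stated in the text: the full-chrominance QCIF reference carries one (U, V) pair per pixel (176 × 144 = 25344, the count in the Figure 4 caption), each component uses 8 bits, and only the 22 transmitted pairs are counted because contours and positions are coded for the greyscale image anyway:

```latex
\frac{25344 \times 2 \times 8\,\text{bit}}{22 \times 2 \times 8\,\text{bit}}
  = \frac{405504\,\text{bit}}{352\,\text{bit}}
  = 1152
```

This ignores the bits needed to assign each region one of the 22 pairs, so the achievable factor is somewhat lower, which is consistent with the order of 1000 reported above.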
References
[1] R. M. Haralick and L. G. Shapiro: Image segmentation techniques, Computer Vision, Graphics, and Image Processing, vol. 29, pp. 100-132, 1985.
[2] J. Shen and S. Castan: Further results on DRF method for edge detection, in 9th ICPR, Rome, 1988.
[3] P. Salembier, L. Torres, F. Meyer, and Ch. Gu: Region-based video coding using mathematical morphology, Proceedings of the IEEE, vol. 83, no. 6, pp. 843-857, June 1995.
[4] M. Hötter: Objektorientierte Analyse-Synthese-Codierung basierend auf dem Modell bewegter, zweidimensionaler Objekte [Object-oriented analysis-synthesis coding based on the model of moving two-dimensional objects], PhD thesis, 1992, published as VDI-Fortschrittbericht (series 10, no. 217), VDI-Verlag.
[5] F. Fechter: Konturgesteuerte Bildmischung durch Bewegungssegmentierung [Contour-controlled image mixing by motion segmentation], Fernseh- und Kino-Technik, vol. 49, no. 11, pp. 651-660, November 1995.