Pattern Recognition and Image Processing Group
Institute of Visual Computing and Human-Centered Technology
Former (1990-2021)
PRIP
TU Wien Informatics

Augmented Inpainting of Videos with a Cache (AIV+C)

Type of work:
Bachelor thesis
Author:
Aron Ingruber
Contact:
aingruber(at)prip.tuwien.ac.at

Abstract

Image inpainting refers to the task of recreating missing data in an image using the remaining information in that image. It can be used to restore digitized old photographs, to improve the visual quality of an image, or to remove unwanted objects or people from an image. Since videos are essentially image sequences, video inpainting can be seen as an extension of image inpainting into a third, temporal dimension. The goal of image and video inpainting methods is to recreate the missing data so that a human observer cannot tell whether the image or video has been manipulated. This work presents a video inpainting method that can remove objects or people from videos under certain conditions. The object is first removed from the first frame, and the missing data is replaced using an image inpainting method. The resulting image is then written into a cache. This cache contains a video mosaic (essentially a panoramic image) composed of all previous frames. By detecting and matching SURF (Speeded-Up Robust Features) keypoints, each frame can be aligned with the previous frames, and the missing data in the current frame can be replaced with the data available in the cache. Under certain conditions the proposed method achieves high-quality inpainting results. However, it also has limitations, most notably because depth information is not taken into account.
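The cache lookup described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation. It assumes that the alignment between the current frame and the mosaic has already been estimated (for example from matched SURF features) as a 3×3 homography `H` that maps frame coordinates to cache coordinates, and it uses nearest-neighbour sampling. `fill_from_cache` and its parameters are hypothetical names.

```python
import numpy as np

def fill_from_cache(frame, mask, cache, cache_valid, H):
    """Copy missing pixels of `frame` from an aligned mosaic `cache`.

    frame:       (h, w) or (h, w, c) array, the current frame
    mask:        (h, w) bool, True where data is missing
    cache:       (hc, wc) or (hc, wc, c) array, the mosaic
    cache_valid: (hc, wc) bool, True where the mosaic holds data
    H:           3x3 homography, frame (x, y, 1) -> cache coordinates (assumed)
    Returns the filled frame and a bool mask of pixels that are still missing.
    """
    out = frame.copy()
    ys, xs = np.nonzero(mask)
    # Homogeneous coordinates of all missing pixels, shape 3 x N.
    pts = np.stack([xs, ys, np.ones_like(xs)]).astype(float)
    mapped = H @ pts
    # Dehomogenise and round: nearest-neighbour lookup in the cache.
    cx = np.rint(mapped[0] / mapped[2]).astype(int)
    cy = np.rint(mapped[1] / mapped[2]).astype(int)
    inside = (cx >= 0) & (cx < cache.shape[1]) & (cy >= 0) & (cy < cache.shape[0])
    ok = np.zeros_like(inside)
    ok[inside] = cache_valid[cy[inside], cx[inside]]
    out[ys[ok], xs[ok]] = cache[cy[ok], cx[ok]]
    still_missing = mask.copy()
    still_missing[ys[ok], xs[ok]] = False
    return out, still_missing
```

Pixels that map outside the mosaic, or onto mosaic pixels without data, stay marked as missing and would have to be filled by other means.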

Implementation details

The proposed method was implemented in MATLAB R2016b. The implementation is not optimized (not fully vectorized), so the processing times could be reduced either by optimizing the MATLAB code or by implementing the method in a lower-level programming language (e.g. C/C++). All tests were run on a desktop PC with a 4 GHz Intel i7 CPU and 16 GB of RAM running Windows 10.

Results

Image inpainting results

The Pillar image

An example of a successful inpainting. It succeeds because the background to be inpainted is not very complex and the only edge crossing the inpainting area is straight.

Image size: 378×504 px
Number of missing pixels: 5,647 px
Processing time: 179 s
Performance: 31.5 px/s
Figures: Original, Mask, Original and mask combined, Result

The Street sign 1 image

An example of a mostly successful inpainting with some minor artifacts at the top of the inpainting area. Apart from these artifacts, the result is visually plausible. This is because a large part of the inpainting area is rather thin, which favors the correct propagation of incoming edges into the inpainting area.

Image size: 378×504 px
Number of missing pixels: 3,250 px
Processing time: 89 s
Performance: 36.5 px/s
Figures: Original, Mask, Original and mask combined, Result

The Street sign 2 image

An example of an unsuccessful inpainting. The proposed method fails to achieve a convincing result because there is not enough reference data for the edge of the pillar. The lower part of the inpainting is worse because the shadow of the street sign changes the color of the pillar, and no other region of the pillar lies in shadow.

Image size: 378×504 px
Number of missing pixels: 3,704 px
Processing time: 103 s
Performance: 36.0 px/s
Figures: Original, Mask, Original and mask combined, Result

The Street sign 3 image

An example of an unsuccessful inpainting. The same reasons apply as for the Street sign 2 image.

Image size: 378×504 px
Number of missing pixels: 5,376 px
Processing time: 146 s
Performance: 36.8 px/s
Figures: Original, Mask, Original and mask combined, Result

The Billboard image

An example of a mostly successful inpainting. The result is largely convincing; on closer inspection, however, the fence in the middle of the inpainted area is not.

Image size: 378×504 px
Number of missing pixels: 5,594 px
Processing time: 152 s
Performance: 36.8 px/s
Figures: Original, Mask, Original and mask combined, Result

The Street lamp image

An example of a successful inpainting. The middle of this image is especially difficult to inpaint because the inpainting area there is highly structured. Despite this, the proposed method produces a visually convincing result.

Image size: 378×504 px
Number of missing pixels: 9,640 px
Processing time: 282 s
Performance: 34.2 px/s
Figures: Original, Mask, Original and mask combined, Result
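The Performance values listed for the six test images are consistent with the number of missing pixels divided by the processing time. The short check below recomputes them from the listed measurements; the dictionary and the `performance` helper are illustrative names, not part of the thesis code.

```python
# Performance = number of missing pixels / processing time,
# using the values listed for the six test images.
measurements = {
    "Pillar": (5647, 179),
    "Street sign 1": (3250, 89),
    "Street sign 2": (3704, 103),
    "Street sign 3": (5376, 146),
    "Billboard": (5594, 152),
    "Street lamp": (9640, 282),
}

def performance(missing_px: int, seconds: float) -> float:
    """Pixels inpainted per second, rounded to one decimal as in the listings."""
    return round(missing_px / seconds, 1)

for name, (px, s) in measurements.items():
    print(f"{name}: {performance(px, s)} px/s")
```

All six recomputed values match the listed ones (31.5 to 36.8 px/s), so in these tests the processing time grows roughly linearly with the number of missing pixels.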

Video inpainting results

Comparison Herling 1

A comparison between the proposed method and the method proposed by Herling [1]. This comparison is of particular interest because the proposed method is based on Herling's method, and its image inpainting component is almost identical to Herling's. It is therefore notable that the two methods produce very different results. The most likely reason is a difference in the mask images: the original masks were not available and had to be drawn by hand.

Comparison Herling 2

Another comparison between the proposed method and the method proposed by Herling [1]. In this example the differences between the two methods are less pronounced than in the previous example, and both produce visually plausible results.

The Wall video

An example of a mostly successful inpainting. The hole in the wall is filled in plausibly and propagated correctly through the video. However, towards the end of the video, slight misalignments and a mismatch in brightness (especially on the right side of the inpainting area) become noticeable. Misalignments are a general problem of the proposed method. The brightness mismatch is caused by camera settings (exposure, aperture, ISO) that changed automatically during recording. In general, such a mismatch could be compensated with automatic gain compensation.
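Automatic gain compensation is only suggested as a possible fix here; the thesis does not describe an implementation. A minimal sketch, assuming a single global multiplicative gain per frame, estimated by least squares over pixels that are known both in the current frame (outside the mask) and in the aligned cache. `estimate_gain`, `apply_gain` and the `overlap` mask are hypothetical names.

```python
import numpy as np

def estimate_gain(frame, warped_cache, overlap):
    """Least-squares scalar gain g minimising ||frame - g * cache||^2 over `overlap`.

    frame, warped_cache: arrays of the same shape (grayscale or colour)
    overlap: (h, w) bool mask of pixels known in both frame and aligned cache
    A single global gain is an assumption made for this sketch.
    """
    f = frame[overlap].astype(float).ravel()
    c = warped_cache[overlap].astype(float).ravel()
    denom = float(np.dot(c, c))
    # Fall back to "no correction" if the cache holds only zeros in the overlap.
    return 1.0 if denom == 0.0 else float(np.dot(f, c)) / denom

def apply_gain(warped_cache, gain, max_value=255.0):
    """Scale cached pixels by `gain` (clipped to the valid range) before pasting them."""
    return np.clip(warped_cache.astype(float) * gain, 0.0, max_value)
```

A single global gain cannot correct local effects such as vignetting; more elaborate schemes estimate gains per region or jointly for all overlapping frames.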

The Cars video

An example of a mostly successful inpainting. All moving objects (the cars) are removed from the video, leaving an almost static background. However, three problems occur in this video:

  • The inpainting of the first frame differs from the actual background. When the actual background is revealed later in the video, a visible border appears.
  • The silhouette of the mask of the largest car becomes noticeable as the car gets closer to the camera. This is most likely caused by slight misalignments at the bottom of the inpainting area.
  • When the mask touches the right border of the video, artifacts (black bars 1 to 3 pixels wide) are visible. They are caused by interpolation at the border of the valid data in the cache when the cache is warped.
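Interpolation at the border of the valid cache region can blend valid pixels with empty (black) ones, which produces thin dark bars. One common remedy, sketched here as an assumption rather than the thesis's approach, is to interpolate a binary validity map with the same bilinear weights and accept a sample only if its interpolated validity is 1, i.e. all contributing neighbours hold data. `bilinear_sample` and `sample_with_validity` are hypothetical names.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Bilinearly sample a 2-D array at float coords (x, y); out-of-range neighbours count as 0."""
    h, w = img.shape
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    ax, ay = x - x0, y - y0

    def at(yy, xx):
        inside = (xx >= 0) & (xx < w) & (yy >= 0) & (yy < h)
        vals = np.zeros(np.shape(xx), dtype=float)
        vals[inside] = img[yy[inside], xx[inside]]
        return vals

    return ((1 - ax) * (1 - ay) * at(y0, x0) + ax * (1 - ay) * at(y0, x1)
            + (1 - ax) * ay * at(y1, x0) + ax * ay * at(y1, x1))

def sample_with_validity(cache, valid, x, y, tol=1e-6):
    """Sample the cache and flag samples whose interpolation touched invalid pixels."""
    values = bilinear_sample(cache.astype(float), x, y)
    weight = bilinear_sample(valid.astype(float), x, y)
    usable = weight >= 1.0 - tol  # every neighbour with non-zero weight is valid
    return values, usable
```

Samples flagged as unusable can then be treated as still missing instead of being pasted into the frame.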

The street sign video

An example of an unsuccessful inpainting. The inpainting of the first frame is acceptable. However, since this scene contains multiple depth planes, it is not possible to project all frames onto a single plane, which is what the proposed method does. The proposed method may therefore only work for scenes with a single depth plane, or for scenes in which all but one of the depth planes have been removed.
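Why a single mosaic cannot accommodate several depth planes can be seen from the standard plane-induced homography. Assume a pinhole camera with fixed intrinsics K, a second view related to the first by X_2 = R X_1 + t, and a scene plane n^T X_1 = d expressed in the first camera's coordinates:

```latex
X_2 = R X_1 + t = R X_1 + t\,\frac{n^\top X_1}{d}
    = \Bigl(R + \frac{t\,n^\top}{d}\Bigr) X_1
\quad\Longrightarrow\quad
x_2 \simeq K \Bigl(R + \frac{t\,n^\top}{d}\Bigr) K^{-1} x_1
```

Here x_1 and x_2 are homogeneous image points and ≃ denotes equality up to scale. Whenever t ≠ 0 the mapping depends on the plane parameters n and d, so points at different depths require different homographies and cannot all be registered onto one mosaic. Only for a purely rotating camera (t = 0) does the mapping reduce to K R K^{-1}, which is independent of depth.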

The Tram video

An example of a mostly successful inpainting. The tram is removed successfully, but there are some misalignments at the border of the inpainting area. Some of them are subpixel misalignments, a general problem of the proposed method. Others are caused by moving background objects (the second wagon of the tram) that lead to an incorrect estimate of the geometric transformation, because some of the SURF features used for the estimation are located on the moving objects. These misalignments are clearly visible in front of the second wagon of the tram.
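Features on moving objects act as outliers for the camera-motion model. A standard mitigation is robust estimation with RANSAC: fit a homography to random minimal samples of four matches and keep the hypothesis with the most inliers. The sketch below is a generic illustration, not the thesis's code. It uses an unnormalised DLT for brevity (Hartley normalisation would improve numerical conditioning), and all function names are hypothetical. RANSAC only helps while the moving features are a minority; a large, rigidly moving object such as a tram wagon can contribute a competing set of mutually consistent matches.

```python
import numpy as np

def homography_dlt(src, dst):
    """Direct linear transform: 3x3 H with dst ~ H @ [x, y, 1] from >= 4 pairs (N x 2 arrays)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def project(H, pts):
    """Apply H to N x 2 points and dehomogenise."""
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=500, thresh=2.0, seed=0):
    """Return (H, inlier_mask); `thresh` is the reprojection error in pixels."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=4, replace=False)
        # Degenerate samples may yield inf/NaN; NaN errors then count as outliers.
        with np.errstate(divide="ignore", invalid="ignore"):
            H = homography_dlt(src[idx], dst[idx])
            err = np.linalg.norm(project(H, src) - dst, axis=1)
            inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("no consensus set with at least 4 matches")
    # Refit on all inliers of the best hypothesis.
    return homography_dlt(src[best], dst[best]), best
```

In the test below, ten of forty matches are displaced as if they lay on a moving object; they are rejected and the remaining matches determine the homography.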

References

  1. Jan Herling. Advanced Real-Time Manipulation of Video Streams. Springer Vieweg, Nov 2014.
Contact: Mail: webmaster(at)prip.tuwien.ac.at | Tel: +43.1.58801.193301 | Fax: no longer available
2014-2020 PRIP, Imprint / Privacy Policy
This page is maintained by Webmaster ( webmaster(at)prip.tuwien.ac.at ) and was last modified on 27. August 2019 09:25