Detection of swapped views in S3D movies

Introduction

Channel mismatch (the result of swapping the left and right views) is a 3D video artifact that can cause major viewer discomfort. The artifact is hard to detect but easy to fix: swapping the views back eliminates it. We have developed a high-accuracy neural-network-based method for detecting scenes with channel mismatch.

“Stalingrad” movie (2013): inverted depth because of swapped views

Proposed method

Our channel-mismatch detection algorithm is based on five features: perspective, disparity-distribution, binocular half-occlusion, and motion-occlusion criteria, alongside the result obtained by the neural network.
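One way the five features could be merged into a single decision is sketched below. This is an illustration only: it assumes the logistic regression model mentioned later on this page is what combines the features, and the feature values, weights, and the function name `swap_probability` are hypothetical rather than taken from the published method.

```python
import math


def swap_probability(features, weights, bias):
    """Logistic regression: sigmoid of the weighted sum of feature scores."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical feature scores; positive values suggest swapped views.
# Order: perspective, disparity distribution, binocular half-occlusion,
# motion occlusion, neural network.
features = [0.4, 0.1, 0.6, 0.2, 0.9]
weights = [1.0, 0.5, 1.5, 1.0, 3.0]  # hypothetical, not trained values
p = swap_probability(features, weights, bias=0.0)
```

With all features at zero and a zero bias, the sketch returns 0.5, that is, no evidence in either direction.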

These features require a disparity map or an optical flow map. To obtain these maps, we use a fast local block-matching algorithm followed by filtering based on the left-right consistency (LRC) check and image-gradient confidence (uniform areas are considered to have low confidence).
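The idea behind LRC-based filtering can be sketched on a single scanline. This is a minimal pure-Python illustration, not the authors' implementation: the function name `lrc_confidence`, the sign convention (a left-view pixel at column x matches the right-view pixel at column x - d), and the tolerance value are all assumptions.

```python
def lrc_confidence(disp_left, disp_right, tolerance=0.5):
    """Return a per-pixel mask for one scanline (True = consistent).

    disp_left[x]  : disparity of left-view pixel x (matches column x - d
                    in the right view under the assumed convention)
    disp_right[x] : disparity stored for right-view pixel x
    """
    width = len(disp_left)
    mask = []
    for x, d in enumerate(disp_left):
        xr = int(round(x - d))  # matching column in the right view
        if 0 <= xr < width:
            # Consistent when both maps agree on the disparity of the pair
            mask.append(abs(d - disp_right[xr]) <= tolerance)
        else:
            mask.append(False)  # the match falls outside the image
    return mask


left = [1.0, 1.0, 1.0, 1.0, 4.0, 1.0]   # pixel 4 carries an outlier disparity
right = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
confident = lrc_confidence(left, right)
```

Pixels that fail the check would be treated as low-confidence and excluded from the features that are computed afterwards.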

The network accepts the left view and the disparity map as input. The architecture is a convolutional encoder consisting of five blocks of convolution and max-pooling layers, followed by two fully connected layers of sizes 512 and 2. The network outputs the probability that the views are swapped.
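To get a feel for the size of such an encoder, the sketch below traces the feature-map shape through five convolution and max-pooling blocks. The input resolution, the channel widths, the "same" padding, and the 2x2 pooling are assumptions made for illustration; only the number of blocks and the fully connected sizes 512 and 2 come from the description above.

```python
def encoder_output_shape(height, width, channels=(16, 32, 64, 128, 256)):
    """Trace (channels, height, width) through conv + max-pool blocks.

    Assumptions (not stated in the source): each convolution preserves the
    spatial size ("same" padding) and each max-pool halves it (2x2 window,
    stride 2, floor division). Channel widths are hypothetical placeholders.
    """
    for _ in channels:
        height //= 2  # the convolution keeps the size, the pooling halves it
        width //= 2
    return channels[-1], height, width


c, h, w = encoder_output_shape(128, 256)  # hypothetical input resolution
flattened = c * h * w                     # input size of the first FC layer
fc_sizes = (flattened, 512, 2)            # 512 and 2 are given in the text
```

Under these assumptions, five pooling stages reduce each spatial dimension by a factor of 32 before the fully connected layers.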

The neural network's architecture

The algorithm consists of the following steps:

We trained the convolutional network for 180,000 iterations on a dataset of 113,000 samples, each consisting of a left view and a disparity map, taken from 7 movies with randomly swapped views. The logistic regression model was also trained on 30,000 samples from these films.
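Labeled training samples with randomly swapped views could be generated along the following lines. This is a hedged sketch: the function name `make_training_pair`, the 50% swap probability, and the string placeholders standing in for frames are assumptions, and the disparity estimation that would run on the (possibly swapped) pair is omitted.

```python
import random


def make_training_pair(left, right, rng, swap_probability=0.5):
    """Return (first_view, second_view, label); label 1 means swapped."""
    if rng.random() < swap_probability:
        return right, left, 1
    return left, right, 0


rng = random.Random(0)  # fixed seed for reproducibility
# "L" and "R" are placeholders for the left and right frames of a stereo pair.
samples = [make_training_pair("L", "R", rng) for _ in range(1000)]
swapped_share = sum(label for _, _, label in samples) / len(samples)
```

Because the label is produced by the swap itself, no manual annotation is needed as long as the source footage is known to have the correct view order.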

Experiments

The results of channel-mismatch (CM) detection in S3D movies are listed below. The analysis showed that swapped views occur even in high-budget movies (e.g., Avatar).

How often channel mismatch occurs

Movie name | Release year | Budget, $M | Scenes with CM | Total CM duration, sec | Movie duration, sec | CM percentage
The Child’s Eye | 2010 | 4.5 | 15 | 57.5 | 5823 | 0.9875%
The Nutcracker in 3D | 2010 | 90 | 9 | 28.9 | 6480 | 0.447%
3D Sex and Zen: Extreme Ecstasy | 2011 | 2.5 | 9 | 23.1 | 6775 | 0.341%
Spy Kids 3D: Game Over | 2003 | 39 | 5 | 10.3 | 5063 | 0.203%
Sharks 3D | 2004 | 5 | 1 | 8.9 | 3073 | 0.290%
Avatar | 2009 | 237 | 1 | 3.3 | 9702 | 0.034%
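The CM percentage column appears to be the total CM duration divided by the movie duration. A small check of that reading, using two rows of the table (the function name `cm_percentage` is ours):

```python
def cm_percentage(cm_duration_sec, movie_duration_sec):
    """Share of the movie's running time affected by channel mismatch, in %."""
    return 100.0 * cm_duration_sec / movie_duration_sec


avatar = cm_percentage(3.3, 9702)       # rounds to 0.034
childs_eye = cm_percentage(57.5, 5823)  # rounds to 0.9875
```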

Comparison with other algorithms

Adding the neural network feature noticeably improves quality over previously developed algorithms: AUC ROC rises from 0.957 (Bokov et al. [2]) to 0.9963, and accuracy rises from 0.8946 to 0.9784.

Algorithm | AUC ROC | Accuracy | F-measure
Proposed method [1] | 0.9963 | 0.9784 | 0.9789
Bokov et al. [2] | 0.957 | 0.8946 | 0.8928
Shestov et al. [3] | 0.901 | 0.8378 | 0.8409
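For reference, the three metrics in the table can be computed as sketched below. The counts and scores are made up for illustration and do not reproduce the reported numbers, and the sketch assumes that "F-measure" means the balanced F1 score.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of scenes classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def f_measure(tp, fp, fn):
    """Balanced F1 score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_roc(scores, labels):
    """Probability that a swapped scene scores higher than a correct one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical confusion counts and detector scores (1 = swapped views).
acc = accuracy(tp=90, fp=5, tn=95, fn=10)
f1 = f_measure(tp=90, fp=5, fn=10)
auc = auc_roc([0.9, 0.8, 0.3, 0.6, 0.2], [1, 1, 1, 0, 0])
```

AUC ROC is independent of the decision threshold, whereas accuracy and F-measure depend on where the swap probability is cut.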

In place of a conclusion

Channel mismatch perceptibility

A subjective study of channel mismatch perceptibility was conducted.

We composed a test sequence of 56 scenes with swapped views. The sequence also included the scenes immediately preceding and following each of them. A total of 59 people took part in the experiment; each participant rated the perceptibility of swapped views in every scene on a scale from 1 to 5.

The following two graphs show how channel mismatch perceptibility depends on the release date and on the movie budget, respectively.

The trends indicate that the situation is improving, owing to the large number of movies without channel mismatch, but "outliers" (even among 2D-to-3D conversions) still occur.

Scenes with swapped views are more common in low-budget movies.

Publications

[1] Sergey Lavrushkin and Dmitriy Vatolin, “Channel-mismatch detection algorithm for stereoscopic video using convolutional neural network,” in Proceedings of the 2018 3DTV Conference: The True Vision — Capture, Transmission and Display of 3D Video (3DTV-CON), 2018, pp. 1–4.

[2] Alexander Bokov, Sergey Lavrushkin, Mikhail Erofeev, Dmitriy Vatolin, and Alexey Fedorov, “Toward fully automatic channel-mismatch detection and discomfort prediction for S3D video,” in 2016 International Conference on 3D Imaging (IC3D). IEEE, 2016, pp. 1–7.

[3] Alexey Shestov, Alexander Voronov, and Dmitriy Vatolin, “Detection of swapped views in stereo image,” in 22nd GraphiCon International Conference on Computer Graphics and Vision, 2012, pp. 23–27.

28 May 2019