Occlusion Detection and Motion Estimation via Convex Optimization
[Abstract] \newpage\appendix
\section{Ambient-Lambert model}
\label{app-lambert}

In this section we show how to go from the assumptions (a)-(c) in sect. \ref{sect-intro} to eq. (\ref{eq-psi-data}).

Let the scene $\{S, \rho\}$ be described by its shape $S \subset \real^3$ (a collection of piecewise smooth surfaces) and its reflectance $\rho: S \rightarrow \real^k$ (diffuse albedo). Deviations from diffuse reflectance (inter-reflection, sub-surface scattering, specular reflection, cast shadows) are not modeled explicitly and are lumped into the error term. Coarse illumination changes are modeled as a contrast transformation of the image range, and all other illumination effects are lumped into the additive error. Because such an error aggregates a large number of independent phenomena, it is suitable to be modeled as a Gaussian random process (eq. (\ref{eq-i})-ii). Under these assumptions, the radiance $\rho$ emitted by an area element around a point $p\in S$ is modulated by a monotonic continuous transformation $m$ to yield the irradiance $I$ measured at a pixel element $x$, except for the discrepancy $n:D \rightarrow \real_+^k$. The correspondence between the point $p\in S$ and the pixel $x\in D$ is due to the motion of the viewer $g\in SE(3)$, the special Euclidean group of rotations and translations in three-dimensional (3-D) space:
\begin{equation}
\begin{cases}
I(x,t) = m(t) \circ \rho(p) + n(x,t); ~~~~ p\in S \\
x = \pi(g(t) p); ~~~ x \in \pi(g(t)S) \\
I(x,t) = \nu(x,t); ~~~~ x ~ | ~ g^{-1}(t)\pi^{-1}(x) \notin S
\end{cases}
\end{equation}
where $\pi:\real^3 \rightarrow \real^2; x \mapsto [x_1/x_3, \ x_2/x_3]^T$ is a central perspective projection. Away from the co-visible portion of the scene $S$, the image can take any value $\nu(x,t)$.
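
As a small worked illustration of the projection $\pi$ (the numbers here are chosen purely for illustration and do not come from the source), take $g(t)$ to be the identity and $p = [2, \ 4, \ 2]^T \in S$. Then
\[
x = \pi(p) = [2/2, \ 4/2]^T = [1, \ 2]^T ,
\]
and every point $\lambda p$ with $\lambda > 0$ projects to the same pixel $x$. The pre-image $\pi^{-1}(x)$ is therefore a ray rather than a single point, so a depth value along that ray is needed to recover $p$ from $x$.
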
Without loss of generality, the co-visible portion of the scene $S$ can be parametrized as the graph of a function (depth map), $p(x_0) = \bar x_0 Z(x_0)$. Then\cutTwo{ \cite{sundaramoorthiPVS09} show that} the composition of maps
\begin{equation}
w: D \rightarrow \real^2; \ x_0 \mapsto x = w(x_0) \doteq \pi(g \bar x_0 Z(x_0))
\label{eq-motion-field}
\end{equation}
spans the entire group of diffeomorphisms. This is the {\em motion field}, which is approximated by the optical flow when assumptions (a)-(c) are satisfied. Here a bar $\bar x \in {\mathbb P}^2$ denotes the homogeneous (projective) coordinates of the point with Euclidean coordinates $x\in \real^2$. Combining the image-formation model above with the motion field (\ref{eq-motion-field}), we have the two equivalent representations: $I(w(x_0)) = m \circ \rho(x_0) + n(w(x_0)), ~~~ x_0 \in w^{-1}(D \backslash \Omega)$, or
\begin{equation}
{I(x) = m \circ \rho(w^{-1}(x)) + n(x), ~~~ x \in D \backslash \Omega}
\label{eq-lambert-ambient}
\end{equation}
with a slight abuse of notation, since we have parametrized $\rho: S \rightarrow \real^k$ with one of the image planes, via $\rho(x) \leftarrow \rho(p(x))$, and we have re-defined $n(x) \leftarrow n(w^{-1}(x))$. Here $D$ is the domain of the image, and $\Omega$ is the subset of the image where the object of interest is not visible (partial occlusion).

It can be shown that $m$ can be eliminated via pre-processing by designing a representation that is a complete invariant statistic, that is, a function of the image that is equivalent to it but for the effects of a contrast transformation \cite{morel}. There are several such functionals, including the curvature of the level sets of the image, or its dual (the gradient direction), or a normalization of contrast and offset of the image intensity, or spectral ratios if color images are available. In any case, we indicate this pre-processing via
\begin{equation}
\phi(I) = \phi(m\circ I).
\end{equation}
%\subsection{Correspondence}
\label{sect-correspondence}
Correspondence between two image regions can be established when they back-project onto the same portion of the scene $S$, that is, when that portion of the scene is {\em co-visible}. Therefore, establishing correspondence essentially means finding a scene (a shape $S$ and an albedo $\rho$) that, under proper viewing conditions including a motion $w$ and a contrast transformation $m$, yields a portion of each of the (two or more) images. This can be posed as an optimization problem, which under the assumptions (a)-(b) can be successively reduced to problems in fewer and fewer unknowns:
\begin{eqnarray}
\arg\min_{m,\rho, g, S} \int_{D \backslash \Omega } | I(x,t) - m\circ\rho\circ \pi(g S)| dx = \text{ (thm.\ 7.4, p.\ 269 of \cite{robert})} \nonumber \\
= \arg\min_{\rho,w} \int_{D\backslash\Omega} | \phi(I(x,t)) - \phi(\rho\circ w) | dx = \text{ (thm.\ 1, p.\ 4 of \cite{soattoY02cvpr})} \nonumber \\
= \arg\min_{w} \int_{D \backslash \Omega } | \phi(I(x,t)) - \phi(I(x,t-dt)\circ w) | dx \nonumber \cutTwo{\label{eq-cost2}}
\end{eqnarray}
Of course, the (possibly multiply-connected) region $\Omega$ is also unknown, and can be represented via its characteristic function:
\begin{equation}
e_1(x) = \chi(\Omega)
\end{equation}
where $\chi:D \rightarrow \real^+$ is such that $\chi(x) = 1$ if $x\in \Omega$, and $\chi(x) = 0$ elsewhere. To ease the notational burden, we will assume that contrast has been eliminated via pre-processing, and drop the use of the function $\phi$, so we re-define $I \leftarrow \phi(I)$. Writing explicitly the dependency of the ``next image'' on the occlusion domain, we have
\begin{equation}
\arg\min_w \int_{D\backslash \Omega} | I(x,t+dt) - I(w(x,t),t) |^2 dx
\end{equation}
which is the ${\mathbb L}^2$ component of $\psi_{\rm data}$ in (\ref{eq-psi-data}).
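
One way to read the ${\mathbb L}^2$ data term on a pixel grid, offered as a sketch under assumptions that are not in the source (pixels $x_i$, $i = 1, \dots, N$, of unit area, and the occlusion indicator $e_1$ taken as $1$ on $\Omega$ and $0$ elsewhere), is to replace the integral over $D \backslash \Omega$ by a masked sum over all of $D$:
\[
\int_{D\backslash \Omega} | I(x,t+dt) - I(w(x,t),t) |^2 dx \ \approx \ \sum_{i=1}^{N} \bigl(1 - e_1(x_i)\bigr) \, | I(x_i,t+dt) - I(w(x_i,t),t) |^2 .
\]
Occluded pixels then contribute nothing to the sum, whatever the residual there. In practice $w(x_i,t)$ generally falls between grid points, so evaluating $I(w(x_i,t),t)$ requires some interpolation scheme, which is likewise left unspecified here.
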
We tackle the classical problem of simultaneously detecting occlusions and estimating optical flow. We show that, under standard assumptions of Lambertian reflection and static illumination, the task can be posed as a convex optimization problem. Therefore, the solution, computed using efficient algorithms, is guaranteed to be unique and globally optimal, for any number of independently moving objects, and any number of occlusion layers. We test the proposed algorithm on benchmark datasets, expanded to enable evaluation of occlusion detection performance, in addition to motion estimation. We also discuss the shortcomings and limitations of our approach.
[Publication date] [Publishing institution] UCLA Henry Samueli School of Engineering and Applied Science
[Validity level] [Subject classification] Computer Science (General)
[Keywords] [Timeliness]