CSC2515 :: Machine Learning :: Final Project

On Learning Surface Light Fields

You can download a pdf version of the paper here.



GMM estimate image for scene with high-frequency lighting Ground truth image for scene with high-frequency lighting VMBF estimated color values for scene with high-frequency lighting
Table 1: Estimated color values for GBF (left) and VMBF (right) and correct image for a scene with high-frequency effects. The average error rates over all vertices are 0.0302 for the GBF and 0.0322 for the VMBF.

Introduction

The five-dimensional surface light field function [Miller 1998] represents the exitant radiance for all surface points in a scene. Its value incorporates all light transport effects such as shadows, reflection and refraction. In the past, tabulated or re-sampled color values [Miller 1998] [Wood 2000] have been used to approximate the surface light field function and the exitant radiance for new views has been interpolated from the stored values.
We propose the use of a radial basis function (RBF) network to approximate the surface light field. The parameters of the RBF network are obtained by supervised learning. At runtime these parameters are used to generate the exitant radiance for surface points for previously unknown view directions. The von Mises function [Arnold 1941] is employed as basis function for the RBF network. It is particularly well suited for the given problem as it is defined on the surface of the sphere which is the domain of the surface light field function. In the past, Gaussian basis functions have been used to approximate view depend functions [Green 2006] but such functions defined in Cartesian space suffer from the artifacts well known from cartography when mapping the surface of the earth onto a planar map; e.g. the distortion near the poles and the loss of the periodicity on the sphere.
In this work, we implemented an radial basis function network with von Mises basis functions (VMBF) to evaluate the suitability of this architecture for approximating surface light fields. An RBF network with Gaussian basis functions (GBF) has been implemented for comparison purpose. Our results show that, for a small number of basis functions, the VMBF outperforms the GBF but for 16 and more basis functions the GBF is superior. However, our results are limited in that only two scenes have been examined and, therefore, general conclusions are hardly possible.

Related Work

Related work to our approach can be found in the Machine Learning and the Computer Graphics literature.
In Computer Graphics, surface light fields, which are an image-based rendering techniques (IBR), and precomputed radiance transfer (PRT) are similar to our approach. Surface light fields have been introduced by Miller et al. [Miller 1998] who adapted the lumigraph [Gortler 1996] but only represented the light field on the surface of the scene objects. Wood et al. [Wood 2000] extended this approach and used least-squares optimal lumispheres, subdivided octahedron whose vertices are used as directional samples, on a dense but discrete set of surface sample points to approximate the surface light field. The color values at the sample points have been obtained using learning techniques and compressions schemes such as function quantization and principal function analysis; generalizations of vector quantization and principal component analysis, respectively; are used to obtain the necessary reduction of the storage size. Zickler et al. [Zickler 2005] trained a combination of lower-order polynomials and locally weighted regression to approximate a surface light field. Their model uses every visible surface point as training sample but they reported that, in practice, only a subset of these points is used to reduce the computational complexity. This is similar to our work.
PRT differs from these techniques and our approach in that transferred and not exitant radiance is stored. This permits to change the lighting environment at runtime but, in contrast to our work, prohibits local light sources. Most PRT techniques approximate transferred radiance by projecting the information into a set of basis functions so that only the coefficients have to be stored, which reduces the memory requirements significantly. A similarity with our work is the use of spherical basis functions, e.g. Spherical Harmonics [MacRobert 1948]. In fact, this work motivated us to explore the VMBF to approximate surface light fields. Recently, Green et al. [Green 2006] used machine learning techniques to approximate transferred radiance for high-frequency, view-dependent effects such as specular highlights. They employed a Gaussian mixture model which, compared to previous techniques, reduced the necessary number of coefficients significantly. In this work a semi-parametric model is used, training one RBF network for each view direction of each vertex. Our approach is similar to [Green 2006] in that the same RBF architecture is employed but differs from their work as we train only one RBF network per vertex, each approximating the information for all view directions.
Besides the vast amount of work on radial basis functions in general (see [Bishop 1995] for a good introduction to RBF networks), there is only little related work on problems defined on the surface of the sphere in the Machine Learning community. The most relevant one is those by Jenison and Fissel [Jenison 1995] [Jenison 1996]. In [Jenison 1995] the superiority of the VMBF over the GBF for the problem examined in this paper is demonstrated. [Jenison 1996] applies the VMBF to an regression problem defined on the surface of the sphere. 


Model Specification

Approximating the surface light field is a regression problem. In this section the proposed model to learn this function is introduced.

Von Mised Basis Functions

The VMBF is based on the Mises-Arnold-Fisher distribution [Arnold 1941], a spherical probability density function (pdf) which is the equivalent to the normal distribution [Fisher 1987] on the surface of the unit sphere. The normalization factor of the pdf can be absorbed into the weights of the RBF network ([Bishop 1995], p. 168) which yields the following basis function

Von Mised basis function

where \bgroup\color{blue}$\theta$\egroup and \bgroup\color{blue}$\phi$\egroup are the function arguments in spherical polar coordinates for azimuth and elevation, respectively, \bgroup\color{blue}$\alpha$\egroup and \bgroup\color{blue}$\beta$\egroup determine the center of the function on the surface of the unit sphere and \bgroup\color{blue}$\kappa$\egroup is a concentration parameter which determines the function width. Using such a function defined on the sphere is desirable as this naturally enforces the periodicity of the spherical domain and the singularity at the poles. A detailed introduction and a discussion of the properties of this function can be found in [Fisher 1987].

Gaussian Basis Functions

The well known Gaussian basis function is used for comparison purpose. First, because it is the standard basis functions for RBF networks ([Bishop 1995], p. 165) and second because it has been used in related work [Green 2006]. As for the VMBF, the normalization factor of the GBF can be absorbed into the weights of the RBF network. Following [Green 2006] we use a spherical covariance matrix which yields

\begin{displaymath}\bgroup\color{blue}
G\left(x,\mu,\sigma\right)=\exp\left(-\fr...
...a\cdot\left(x- \mu\right)^{T}\left(x-\mu\right)\right),
\egroup\end{displaymath}

where \bgroup\color{blue}$x = (\theta,\phi)^T$\egroupis the argument of the function, \bgroup\color{blue}$\mu$\egroup is the two-dimensional center of the Gaussian and \bgroup\color{blue}$\sigma$\egroup is its variance.

Radial Basis Function Network

The Radial Basis Function network is given by
\begin{displaymath}\bgroup\color{blue}
\hat{y}\left(x\right) = \sum_{k=0}^K \omega_k \cdot h_k\left(x\right)
\egroup\end{displaymath}

where \bgroup\color{blue}$h_k$\egroup is the \bgroup\color{blue}$k$\egroup-th basis function and \bgroup\color{blue}$\omega_k$\egroup is its weight. All basis functions are considered as independent, i.e. no parameters are shared. This architecture can also be interpreted as a generalized linear model.


Methodology

The surface light field is a five-dimensional function; three dimensions determine a position in space and two the direction of the exitant radiance in spherical polar coordinates. Following previous work [Wood 2000], we tabulate the spatial domain at each vertex and learn the view-dependent exitant radiance for each surface sample using an RBF network. For the VMBF, this yields a model with \bgroup\color{blue}$s=\sum_{i=0}^N \left( (3 + 1) \cdot K_i \right)$\egroupparameter, where the first term in the sum models the number of parameters for the non-normalized von Mises basis functions and the weights for each of those, \bgroup\color{blue}$K$\egroup is the number of basis functions employed for sample \bgroup\color{blue}$i$\egroup and \bgroup\color{blue}$N$\egroup is the total number of vertices. The tabulated surface samples are treated as independent which simplifies the training significantly because in this case only \bgroup\color{blue}$(3 + 1) \cdot K_i$\egroupparameters have to be optimized simultaneously. However, each optimization still remains non-trivial as determining the parameter for an RBF network is a nonlinear and non-convex problem ([Hastie 2001], page 36).
We employ numerical optimization to find parameters which minimize the sum of squared errors, the error function used in this work. Weight decay is used to regularize the optimization and avoid overfitting ([Hastie 2001], p. 356). The objective function is therefore given by

\begin{displaymath}\bgroup\color{blue}
\Phi= \sum_{n=1}^N \left(\hat{y} - y \right)^2 + \lambda \cdot \sum_{i=1}^Kw_k^2
\egroup\end{displaymath}

where \bgroup\color{blue}$\hat{y}$\egroup is the estimated function value and \bgroup\color{blue}$y$\egroup is its true value. The second term is the weight decay, where \bgroup\color{blue}$\lambda$\egroup is a tuning parameter. The training set consists of color value-view direction pairs. Due to the periodicity of the sphere, no constraints on \bgroup\color{blue}$\alpha$\egroup, \bgroup\color{blue}$\beta$\egroupand \bgroup\color{blue}$\kappa$\egroupare necessary for the VMBF. For the GBF, the variance has to be positive. We enforce this by computing the gradient w.r.t. \bgroup\color{blue}$\log\left(\sigma\right)$\egroupin the learning phase.
The gradients of the objective function, which are necessary for the numerical optimization, can be derived using the chain rule and found in [Jenison 1996] for the VMBF and in text books for statistics for the GBF.


Implementation

The learning procedure described in the previous section has been implemented in Matlab and C++. The training data has been generated using Mental Ray, a commercial high-quality renderer. The particular choice has been made because the Mental Ray integration in Maya allows to directly store vertex colors from pre-rendered scenes. We also experimented with generating images and extracting the color values using image processing techniques. However, in our tests this approach was highly error prone. The view directions are randomly drawn from the set of all possible view directions, for our test scenes in the upper hemisphere above the scene origin. Stratified sampling has been used to reduce the variance of the sample directions [Veach 1997]. 
One of the main issues which arose during the implementation were the long optimization times. To reduce these, only monochrome surface light fields have been examined this report. In general, different wavelength of light can be considered as independent, so that the results obtained can be generalized for images with multiple color channels. Other strategies to reduce the amount of computations are discussed in the next section.
The vertex color for a new images can be obtained by evaluating the RBF architecture for a particular view direction  \bgroup\color{blue}$(\theta_0,\phi_0)$\egroup with the parameters found in the training phase. Given the estimated vertex color values, the model can be rendered using standard renderer. Our current implementation uses Maya for this.


Results


Increasing \bgroup\color{blue}$V$\egroup reduces the test error rate but the decrease shows an asymptotic behavior, so that, beyond a certain point, the error can not be lowered significantly by using more training data. This makes it possible to bound the training set size without degrading the quality. For most problems also an optimum for the model complexity, this is the number of basis functions for an RBF network, exists. Next to these two parameter, also an optimal value for the weight decay tuning parameter \bgroup\color{blue}$\lambda$\egroup had to be found.
A particular difficulty was that different optimal values \bgroup\color{blue}$K_i$\egroup, \bgroup\color{blue}$V_i$\egroupand \bgroup\color{blue}$\lambda_i$\egroupmight exist for each vertex; so essentially we were faced with \bgroup\color{blue}$N$\egroup regression problems instead of one. However, using \bgroup\color{blue}$N$\egroup different sets of optimal parameters is not tractable. First, because a full cross-validation over the three dimensional parameter space would be necessary for each vertex; this was prevented by the long optimization times; and second, because using different parameters would prohibit an efficient implementation of the re-rendering.
Next to the already mentioned restriction to monochrome images, further simplification were necessary to reduce the computational complexity and obtain the results of the exploration in a reasonable time. We used only a randomly generated subset of the vertices for the experiments to achieve this.
The test scenes employed in our experiments contained 1689 (Table 1) and 190 vertices (Table 4); 16 and 17 vertices have been used for the exploration of the hyperparameter space, respectively. The three-dimensional search space was spanned by  \bgroup\color{blue}$K = \{ 1, 2, 4, 8, 16, 32\}$\egroup \bgroup\color{blue}$\lambda = \{0.05, 0.15, 0.25\}$\egroupand  \bgroup\color{blue}$V = \{ 200, 500, 750, 1000\}$\egroup.
The sum of the error rates over all vertices on a test set of size 200 for the VMBF and the GBF are shown in Table 2 and Table 3. It can be seen that all graphs show an asymptotic behavior for increasing values of \bgroup\color{blue}$K$\egroup. In Table 3, this behavior is more significant whereas the error rates in Table 2 decrease over the whole range of \bgroup\color{blue}$K$\egroup. Unfortunately, no optimum for \bgroup\color{blue}$K$\egroup exist. For the VMBF, values of 8 or 16 for \bgroup\color{blue}$K$\egroup are sufficient as the error does not decrease substantially for higher values; however, for the GBF such a close-to-optimal value of \bgroup\color{blue}$K$\egroup does not exist. Table 2 and Table 3 also show that, for different values of the ridge regression tuning parameter \bgroup\color{blue}$\lambda$\egroup, the error rates do not differ significantly. In contrast to this, for both basis functions the error rates for 200 views are higher than those for 500. For the VMBF, a training set size of 500 is sufficient to achieve an almost optimal error rate. This also holds for the GBF for the scene with 1689 vertices (Table 3). For the results reported in Table 2, this close-to-optimal value is reached for 750 views for the GBF. The graphs in Table 2 show that the error rates for 1000 views are higher than those for 750 views. We assume the reason for this is the non-convexity of the optimization.
In both scenes, for \bgroup\color{blue}$K>8$\egroup the GBF achieves lower error than the VMBF. For \bgroup\color{blue}$K=8$\egroup the error rates are roughly equivalent and for lower values of \bgroup\color{blue}$K$\egroup the VMBF outperforms the GBF. The error rates for \bgroup\color{blue}$K=1$\egroup are not reported in Table 2 and Table 3 but here, the error rates for the GBF are an order of magnitude higher than those for the VMBF. An interesting observation is that, for the VMBF, the vertices clearly form clusters w.r.t. to the error rates; a behavior which does not exist for the GBF. A comparison of the error rates of the two scenes shows that those reported in Table 3 are significantly below those in Table 2. This is surprising as both scene have almost the same materials and lighting environment; the only main difference is the spatial resolution of the sample points. In this section, the results obtained during the exploration of the hyperparameter space are discussed. The goal of the experiments was to find a number of basis functions \bgroup\color{blue}$K$\egroup and the training set size \bgroup\color{blue}$V$\egroup which provide efficient pre-computation as well as accurate re-rendering. For an efficient pre-computation, both, \bgroup\color{blue}$K$\egroup and \bgroup\color{blue}$V$\egroup, should be as small as possible. However, this conflicts with the desired accuracy.
Using the results obtained through the exploration of the hyperparameter space, we trained an RBF networks with the VMBF and the GBF with a training set of size 500, \bgroup\color{blue}$K=8$\egroup and \bgroup\color{blue}$\lambda=0.1$\egroup for both of our test scenes. The parameters have been chosen based on the optimal values for the VMBF. Here, we decided to use \bgroup\color{blue}$K=8$\egroup instead of \bgroup\color{blue}$K=16$\egroup because the error rates do not differ significantly but the optimization time is much higher for \bgroup\color{blue}$K=16$\egroup. In terms of re-rendering, \bgroup\color{blue}$K=8$\egroup is also a better choice for the experiments as it is an upper limit for an efficient implementation of the re-rendering on the GPU.
The results for one view of the test set for the scene with 1690 vertices can be seen in the images in Table 1. For the chosen parameter values, both basis functions are not capable of capturing the hard shadow. Even there is a visual difference between the two approximated images, e.g. in the uniformity of the shadows, no one clearly performs better. This observation is supported by the error rates, which are roughly equivalent for both basis functions. The same results have been obtained for other re-renderings of the same scene as well as for re-renderings of the scene with 190 vertices.

GBF Ground Truth VMBF
Table 4: Estimated color values for the GBF (left) and the VMBF (right) and     correct image. The average error rates over all vertices are 0.0097 for the GBF and 0.0087 for the VMBF.


Limitations and Future Work

Our work is limited in many respects. We want to address this in future work. First of all, the experiments described in the previous section have to be repeated on much large data sets and scenes with more complicated lighting environments. One question is how much the optimal values for the number of basis functions and the necessary training set size depend on the scene characteristics and if these can be estimated efficiently, for example based on the variance of the color values in the training set.
A crucial prerequisite for performing more experiments is to investigate more efficient ways to obtain the optimal parameters. As already mentioned, the convergence rate of the numerical minimizer we used so far is not optimal, especially for the VMBF. Here, it has to been explored if an adaption of a minimization technique to a function defined on the sphere could provide a better performance. An alternative would be to employ the EM algorithm for our problem. Originally, we discarded this possibility as the literature does not provide a consistent view if this could lead to improved performance [Ungar 1995] [Orr 1998] but given the current situation, we might consider this option again. In [Zickler 2005] different other efficient techniques for obtaining the parameter of an RBF are mentioned. It might be worth to explore these. Another idea is to implement the optimization process on a graphics processing units (GPU). The results of Hillesland et al. [Hillesland 2003] and the advance in GPU technology suggest a speedup up to a factor of 20. The simplicity of the RBF network, given the number of basis functions is constant and sufficiently small, suggests to also implement the re-rendering on the GPU. Here, we expect real-time performance even for complex scenes.
In the longer term more general questions regarding our approach can be addressed. One is the investigation of other spherical basis functions such as Spherical Wavelets [Schröder 1995] or splines defined on the sphere [Wahba 1981]. Improvements might also be possible when relaxing the assumption that the vertices are independent. For example instead of computing the error per sample point one could compute the error for a neighborhood in a rendered image. In [Green 2006] it is denoted that interpolating Gaussians between the vertices instead of the final colors leads to a significantly improved visual appearance, similar to the difference between Gouraud and Phong Shading. We think this can be adapted for our technique.
Another area of future work can be to move beyond the tabulation of the spatial domain. Learning the whole five-dimensional surface light field function function would be one possibility. Interesting would also to employ the VMBF to learn the transfer function and compare the results to [Green 2006].
Further work is also necessary to understand why we cannot see the superiority of the VMBF over the GBF reported in [Jenison 1995].


Conclusion

We presented a learning technique for the surface light field function based on radial basis function networks. We showed that Gaussian basis functions outperform the von Mises basis function for high numbers of basis functions \bgroup\color{blue}$K$\egroup but both perform equally well for \bgroup\color{blue}$K=8$\egroup, which has the most relevance in practice. Therefore, our particular spherical basis function does not provide an advantage over basis functions defined in cartesian space for approximating surface light fields. Unfortunately, both, VMBF and GBF, are not capable to reconstruct high frequency effects such as the sharp shadow.
Compared to existing techniques for surface light fields, our approach has the advantage of a low memory consumption without additional compression techniques. However, this comes to the prize that we do not achieve the quality of previous techniques, such as [Wood 2000], for high-frequency effects. The current approach is mainly limited by the training time for all vertices of a scene and due to the limited amount of data used for this report, no general applicability of our results can be assumed. 


Figures


Average Error Rates for GBF during hyperparameter exploration Average Error Rates for VMBF during hyperparameter exploration
Table 2:  Log average squared error for the VMBF (right) and the GBF (left) for a scene with 190 vertices (Table 4) and training sets with 200 (red), 500 (green), 750 (blue) and 1000 (magenta) views. The different values of lambda; 0.05, 0.15 and 0.25; are reported as solid, dashed and dash-dotted lines, respectively.

Average Error Rates for GBF during hyperparameter exploration Average Error Rates for VMBF during hyperparameter exploration
Table 3: Log average squared error for the VMBF (right) and the GBF (left) for a scene with 1689 vertices (Table 1) and training sets with 200 (red), 500 (green), 750 (blue) and 1000 (magenta) views. The different values of lambda; 0.05, 0.15 and 0.25; are reported as solid, dashed and dash-dotted lines, respectively.

Here, we present some additional graphs which could not be included in pdf version of the report.

VMBF performance for different training set sizes and different number of basis functions
Figure 1: Performance of RBF with VMBF for all samples of the hyperparameterspace exploration. Clearly visible are the clusters of vertices.

Comparision VMBF / GBF for scene estimate
Figure 2: Spatial distribution of the test error in the scene shown in Table 2. The region with the high error is the ground plane where the color value changes because of shadows.

Perspective Orthographic
Figure 4: Estimated color value and ground truth for a cube vertex of the scene in table 1. The fall off of the surface for increasing color values is caused by the cosine-term of the shading model.

Perspective Orthographic
Figure 5: Estimated color value and ground truth for a ground plane vertex of the scene in Figure 1. The fall off of the surface for increasing color values is caused by the cosine-term in the shading model. The valley is caused by the moving light so that a shadow occurs only for a certain view direction.

VMBF
Figure 3: Ground truth and estimated color value for the training data for one vertex in the ground plane in Figure 1.


References:
Arnold 1941
K. J. Arnold, On spherical probability distributions, PhD Thesis (unpublished), Massachusetts Institute of Technology, 1941
Bishop 1995
C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995
Fisher 1987
N. I. Fisher, Statistical analysis of spherical data, Cambridge University Press, 1987
Gortler 1996
S. J. Gortler, R. Grzeszczuk, R. Szeliski, M. Cohen, The Lumigraph, SIGGRAPH 1996, August 1996
Green 2006
P. Green, J. Kautz, W. Matusik, F. Durand, View-Dependent Precomputed Light Transport Using Nonlinear Gaussian Function Approximations, Proceedings of ACM 2006 Symposium in Interactive 3D Graphics and Games, 2006
Hastie 2001
T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer-Verlag, 2001
Hillesland 2003
K. E. Hillesland, S. Molinov, R.Grzeszczuk, Nonlinear optimization framework for image-based modeling on programmable graphics hardware, SIGGRAPH 2003, August 2003
Jenison 1995
R. Jenison, K. Fissell, A Comparison of the von Mises and Gaussian Basis Function for Approximating Spherical Acoustic Scatter, IEEE Transactions on Neural Networks, 6, 5, 1995, pp. 1284-1287
Jenison 1996
Rick L. Jenison and Kate Fissell, A Spherical Basis Function Neural Network for Modeling Auditory Space, Neural Computation, Vol. 8, Issue 1, pp. 115-128, January 1996
MacRobert 1948
T. MacRobert, Spherical harmonics; an elementary treatise on harmonic functions, with applications, Dover Publications, 1948
Miller 1998
G. Miller, S. Rubin, and D. Ponceleon, Lazy Decompression of Surface Light Fields for Precomputed Global Illumination, Eurographics Workshop on Rendering 1998, June 1998
Orr 1998
M. J. L. Orr, An EM algorithm for regularized RBF networks, International Conference on Neural Network and Brain, 1998
Schröder 1995
Peter Schröder, Wim Sweldens, Spherical wavelets: efficiently representing functions on the sphere, SIGGRAPH 1995, September 1995
Ungar 1995
L. Ungar, R. De Veaux, EMRBF: A statistical basis for using radial basis function for control, Proceedings of ACC 95, 1995
Veach 1997
Eric Veach, Robust Monte Carlo Methods for Light Transport Simulation, PhD dissertation, Stanford University, December 1997
Wahba 1981
G. Wahba, Spline interpolation and smoothing on the sphere, SIAM Journal on Scientific and Statistical Computing, 2, 5-16, 1981
Wood 2000
D. N. Wood, D. I. Azuma, K. Aldinger, B. Curless, T. Duchamp, D. H. Salesin, W. Stuetzle, Surface light fields for 3D photography, SIGGRAPH 2000, July 2000
Zickler 2005
T. Zickler, S. Enrique, R. Ramamoorthi, P. Belhumeur, Reflectance Sharing: Image-based Rendering from a Sparse Set of Images, Eurographics Symposium on Rendering. 2005, June 2005

lessig (at) dgp.toronto.edu 2005/12/22