On Learning Surface Light Fields

The five-dimensional surface light field function [Miller 1998] represents the exitant radiance for all surface points in a scene. Its value incorporates all light transport effects such as shadows, reflection and refraction. In the past, tabulated or re-sampled color values [Miller 1998] [Wood 2000] have been used to approximate the surface light field function and the exitant radiance for new views has been interpolated from the stored values.
We propose the use of a radial basis function (RBF) network to approximate the surface light field. The parameters of the RBF network are obtained by supervised learning. At runtime these parameters are used to generate the exitant radiance for surface points for previously unknown view directions. The von Mises function [Arnold 1941] is employed as basis function for the RBF network. It is particularly well suited for the given problem as it is defined on the surface of the sphere which is the domain of the surface light field function. In the past, Gaussian basis functions have been used to approximate view depend functions [Green 2006] but such functions defined in Cartesian space suffer from the artifacts well known from cartography when mapping the surface of the earth onto a planar map; e.g. the distortion near the poles and the loss of the periodicity on the sphere.
In this work, we implemented an radial basis function network with von Mises basis functions (VMBF) to evaluate the suitability of this architecture for approximating surface light fields. An RBF network with Gaussian basis functions (GBF) has been implemented for comparison purpose. Our results show that, for a small number of basis functions, the VMBF outperforms the GBF but for 16 and more basis functions the GBF is superior. However, our results are limited in that only two scenes have been examined and, therefore, general conclusions are hardly possible.

Related Work

Related work to our approach can be found in the Machine Learning and the Computer Graphics literature.
In Computer Graphics, surface light fields, which are an image-based rendering techniques (IBR), and precomputed radiance transfer (PRT) are similar to our approach. Surface light fields have been introduced by Miller et al. [Miller 1998] who adapted the lumigraph [Gortler 1996] but only represented the light field on the surface of the scene objects. Wood et al. [Wood 2000] extended this approach and used least-squares optimal lumispheres, subdivided octahedron whose vertices are used as directional samples, on a dense but discrete set of surface sample points to approximate the surface light field. The color values at the sample points have been obtained using learning techniques and compressions schemes such as function quantization and principal function analysis; generalizations of vector quantization and principal component analysis, respectively; are used to obtain the necessary reduction of the storage size. Zickler et al. [Zickler 2005] trained a combination of lower-order polynomials and locally weighted regression to approximate a surface light field. Their model uses every visible surface point as training sample but they reported that, in practice, only a subset of these points is used to reduce the computational complexity. This is similar to our work.
PRT differs from these techniques and our approach in that transferred and not exitant radiance is stored. This permits to change the lighting environment at runtime but, in contrast to our work, prohibits local light sources. Most PRT techniques approximate transferred radiance by projecting the information into a set of basis functions so that only the coefficients have to be stored, which reduces the memory requirements significantly. A similarity with our work is the use of spherical basis functions, e.g. Spherical Harmonics [MacRobert 1948]. In fact, this work motivated us to explore the VMBF to approximate surface light fields. Recently, Green et al. [Green 2006] used machine learning techniques to approximate transferred radiance for high-frequency, view-dependent effects such as specular highlights. They employed a Gaussian mixture model which, compared to previous techniques, reduced the necessary number of coefficients significantly. In this work a semi-parametric model is used, training one RBF network for each view direction of each vertex. Our approach is similar to [Green 2006] in that the same RBF architecture is employed but differs from their work as we train only one RBF network per vertex, each approximating the information for all view directions.
Besides the vast amount of work on radial basis functions in general (see [Bishop 1995] for a good introduction to RBF networks), there is only little related work on problems defined on the surface of the sphere in the Machine Learning community. The most relevant one is those by Jenison and Fissel [Jenison 1995] [Jenison 1996]. In [Jenison 1995] the superiority of the VMBF over the GBF for the problem examined in this paper is demonstrated. [Jenison 1996] applies the VMBF to an regression problem defined on the surface of the sphere.

Model Specification

Von Mised Basis Functions

The VMBF is based on the Mises-Arnold-Fisher distribution [Arnold 1941], a spherical probability density function (pdf) which is the equivalent to the normal distribution [Fisher 1987] on the surface of the unit sphere. The normalization factor of the pdf can be absorbed into the weights of the RBF network ([Bishop 1995], p. 168) which yields the following basis function

where $\bgroup\color{blue}$\theta$\egroup$ and $\bgroup\color{blue}$\phi$\egroup$ are the function arguments in spherical polar coordinates for azimuth and elevation, respectively, $\bgroup\color{blue}$\alpha$\egroup$ and $\bgroup\color{blue}$\beta$\egroup$ determine the center of the function on the surface of the unit sphere and $\bgroup\color{blue}$\kappa$\egroup$ is a concentration parameter which determines the function width. Using such a function defined on the sphere is desirable as this naturally enforces the periodicity of the spherical domain and the singularity at the poles. A detailed introduction and a discussion of the properties of this function can be found in [Fisher 1987].

Gaussian Basis Functions

The well known Gaussian basis function is used for comparison purpose. First, because it is the standard basis functions for RBF networks ([Bishop 1995], p. 165) and second because it has been used in related work [Green 2006]. As for the VMBF, the normalization factor of the GBF can be absorbed into the weights of the RBF network. Following [Green 2006] we use a spherical covariance matrix which yields

where $\bgroup\color{blue}$x = (\theta,\phi)^T$\egroup$ is the argument of the function, $\bgroup\color{blue}$\mu$\egroup$ is the two-dimensional center of the Gaussian and $\bgroup\color{blue}$\sigma$\egroup$ is its variance.

Radial Basis Function Network

where $\bgroup\color{blue}$h_k$\egroup$ is the $\bgroup\color{blue}$k$\egroup$ -th basis function and $\bgroup\color{blue}$\omega_k$\egroup$ is its weight. All basis functions are considered as independent, i.e. no parameters are shared. This architecture can also be interpreted as a generalized linear model.

Methodology

The surface light field is a five-dimensional function; three dimensions determine a position in space and two the direction of the exitant radiance in spherical polar coordinates. Following previous work [Wood 2000], we tabulate the spatial domain at each vertex and learn the view-dependent exitant radiance for each surface sample using an RBF network. For the VMBF, this yields a model with $\bgroup\color{blue}$s=\sum_{i=0}^N \left( (3 + 1) \cdot K_i \right)$\egroup$ parameter, where the first term in the sum models the number of parameters for the non-normalized von Mises basis functions and the weights for each of those, $\bgroup\color{blue}$K$\egroup$ is the number of basis functions employed for sample $\bgroup\color{blue}$i$\egroup$ and $\bgroup\color{blue}$N$\egroup$ is the total number of vertices. The tabulated surface samples are treated as independent which simplifies the training significantly because in this case only $\bgroup\color{blue}$(3 + 1) \cdot K_i$\egroup$ parameters have to be optimized simultaneously. However, each optimization still remains non-trivial as determining the parameter for an RBF network is a nonlinear and non-convex problem ([Hastie 2001], page 36).
We employ numerical optimization to find parameters which minimize the sum of squared errors, the error function used in this work. Weight decay is used to regularize the optimization and avoid overfitting ([Hastie 2001], p. 356). The objective function is therefore given by

where $\bgroup\color{blue}$\hat{y}$\egroup$ is the estimated function value and $\bgroup\color{blue}$y$\egroup$ is its true value. The second term is the weight decay, where $\bgroup\color{blue}$\lambda$\egroup$ is a tuning parameter. The training set consists of color value-view direction pairs. Due to the periodicity of the sphere, no constraints on $\bgroup\color{blue}$\alpha$\egroup$ , $\bgroup\color{blue}$\beta$\egroup$ and $\bgroup\color{blue}$\kappa$\egroup$ are necessary for the VMBF. For the GBF, the variance has to be positive. We enforce this by computing the gradient w.r.t. $\bgroup\color{blue}$\log\left(\sigma\right)$\egroup$ in the learning phase.
The gradients of the objective function, which are necessary for the numerical optimization, can be derived using the chain rule and found in [Jenison 1996] for the VMBF and in text books for statistics for the GBF.

Implementation

The learning procedure described in the previous section has been implemented in Matlab and C++. The training data has been generated using Mental Ray, a commercial high-quality renderer. The particular choice has been made because the Mental Ray integration in Maya allows to directly store vertex colors from pre-rendered scenes. We also experimented with generating images and extracting the color values using image processing techniques. However, in our tests this approach was highly error prone. The view directions are randomly drawn from the set of all possible view directions, for our test scenes in the upper hemisphere above the scene origin. Stratified sampling has been used to reduce the variance of the sample directions [Veach 1997].
One of the main issues which arose during the implementation were the long optimization times. To reduce these, only monochrome surface light fields have been examined this report. In general, different wavelength of light can be considered as independent, so that the results obtained can be generalized for images with multiple color channels. Other strategies to reduce the amount of computations are discussed in the next section.
The vertex color for a new images can be obtained by evaluating the RBF architecture for a particular view direction $\bgroup\color{blue}$(\theta_0,\phi_0)$\egroup$ with the parameters found in the training phase. Given the estimated vertex color values, the model can be rendered using standard renderer. Our current implementation uses Maya for this.

Results

Increasing $\bgroup\color{blue}$V$\egroup$ reduces the test error rate but the decrease shows an asymptotic behavior, so that, beyond a certain point, the error can not be lowered significantly by using more training data. This makes it possible to bound the training set size without degrading the quality. For most problems also an optimum for the model complexity, this is the number of basis functions for an RBF network, exists. Next to these two parameter, also an optimal value for the weight decay tuning parameter $\bgroup\color{blue}$\lambda$\egroup$ had to be found.
A particular difficulty was that different optimal values $\bgroup\color{blue}$K_i$\egroup$ , $\bgroup\color{blue}$V_i$\egroup$ and $\bgroup\color{blue}$\lambda_i$\egroup$ might exist for each vertex; so essentially we were faced with $\bgroup\color{blue}$N$\egroup$ regression problems instead of one. However, using $\bgroup\color{blue}$N$\egroup$ different sets of optimal parameters is not tractable. First, because a full cross-validation over the three dimensional parameter space would be necessary for each vertex; this was prevented by the long optimization times; and second, because using different parameters would prohibit an efficient implementation of the re-rendering.
Next to the already mentioned restriction to monochrome images, further simplification were necessary to reduce the computational complexity and obtain the results of the exploration in a reasonable time. We used only a randomly generated subset of the vertices for the experiments to achieve this.
The test scenes employed in our experiments contained 1689 (Table 1) and 190 vertices (Table 4); 16 and 17 vertices have been used for the exploration of the hyperparameter space, respectively. The three-dimensional search space was spanned by $\bgroup\color{blue}$K = \{ 1, 2, 4, 8, 16, 32\}$\egroup$ , $\bgroup\color{blue}$\lambda = \{0.05, 0.15, 0.25\}$\egroup$ and $\bgroup\color{blue}$V = \{ 200, 500, 750, 1000\}$\egroup$ .
The sum of the error rates over all vertices on a test set of size 200 for the VMBF and the GBF are shown in Table 2 and Table 3. It can be seen that all graphs show an asymptotic behavior for increasing values of $\bgroup\color{blue}$K$\egroup$ . In Table 3, this behavior is more significant whereas the error rates in Table 2 decrease over the whole range of $\bgroup\color{blue}$K$\egroup$ . Unfortunately, no optimum for $\bgroup\color{blue}$K$\egroup$ exist. For the VMBF, values of 8 or 16 for $\bgroup\color{blue}$K$\egroup$ are sufficient as the error does not decrease substantially for higher values; however, for the GBF such a close-to-optimal value of $\bgroup\color{blue}$K$\egroup$ does not exist. Table 2 and Table 3 also show that, for different values of the ridge regression tuning parameter $\bgroup\color{blue}$\lambda$\egroup$ , the error rates do not differ significantly. In contrast to this, for both basis functions the error rates for 200 views are higher than those for 500. For the VMBF, a training set size of 500 is sufficient to achieve an almost optimal error rate. This also holds for the GBF for the scene with 1689 vertices (Table 3). For the results reported in Table 2, this close-to-optimal value is reached for 750 views for the GBF. The graphs in Table 2 show that the error rates for 1000 views are higher than those for 750 views. We assume the reason for this is the non-convexity of the optimization.
In both scenes, for $\bgroup\color{blue}$K>8$\egroup$ the GBF achieves lower error than the VMBF. For $\bgroup\color{blue}$K=8$\egroup$ the error rates are roughly equivalent and for lower values of $\bgroup\color{blue}$K$\egroup$ the VMBF outperforms the GBF. The error rates for $\bgroup\color{blue}$K=1$\egroup$ are not reported in Table 2 and Table 3 but here, the error rates for the GBF are an order of magnitude higher than those for the VMBF. An interesting observation is that, for the VMBF, the vertices clearly form clusters w.r.t. to the error rates; a behavior which does not exist for the GBF. A comparison of the error rates of the two scenes shows that those reported in Table 3 are significantly below those in Table 2. This is surprising as both scene have almost the same materials and lighting environment; the only main difference is the spatial resolution of the sample points. In this section, the results obtained during the exploration of the hyperparameter space are discussed. The goal of the experiments was to find a number of basis functions $\bgroup\color{blue}$K$\egroup$ and the training set size $\bgroup\color{blue}$V$\egroup$ which provide efficient pre-computation as well as accurate re-rendering. For an efficient pre-computation, both, $\bgroup\color{blue}$K$\egroup$ and $\bgroup\color{blue}$V$\egroup$ , should be as small as possible. However, this conflicts with the desired accuracy.
Using the results obtained through the exploration of the hyperparameter space, we trained an RBF networks with the VMBF and the GBF with a training set of size 500, $\bgroup\color{blue}$K=8$\egroup$ and $\bgroup\color{blue}$\lambda=0.1$\egroup$ for both of our test scenes. The parameters have been chosen based on the optimal values for the VMBF. Here, we decided to use $\bgroup\color{blue}$K=8$\egroup$ instead of $\bgroup\color{blue}$K=16$\egroup$ because the error rates do not differ significantly but the optimization time is much higher for $\bgroup\color{blue}$K=16$\egroup$ . In terms of re-rendering, $\bgroup\color{blue}$K=8$\egroup$ is also a better choice for the experiments as it is an upper limit for an efficient implementation of the re-rendering on the GPU.
The results for one view of the test set for the scene with 1690 vertices can be seen in the images in Table 1. For the chosen parameter values, both basis functions are not capable of capturing the hard shadow. Even there is a visual difference between the two approximated images, e.g. in the uniformity of the shadows, no one clearly performs better. This observation is supported by the error rates, which are roughly equivalent for both basis functions. The same results have been obtained for other re-renderings of the same scene as well as for re-renderings of the scene with 190 vertices.

Limitations and Future Work

Our work is limited in many respects. We want to address this in future work. First of all, the experiments described in the previous section have to be repeated on much large data sets and scenes with more complicated lighting environments. One question is how much the optimal values for the number of basis functions and the necessary training set size depend on the scene characteristics and if these can be estimated efficiently, for example based on the variance of the color values in the training set.
A crucial prerequisite for performing more experiments is to investigate more efficient ways to obtain the optimal parameters. As already mentioned, the convergence rate of the numerical minimizer we used so far is not optimal, especially for the VMBF. Here, it has to been explored if an adaption of a minimization technique to a function defined on the sphere could provide a better performance. An alternative would be to employ the EM algorithm for our problem. Originally, we discarded this possibility as the literature does not provide a consistent view if this could lead to improved performance [Ungar 1995] [Orr 1998] but given the current situation, we might consider this option again. In [Zickler 2005] different other efficient techniques for obtaining the parameter of an RBF are mentioned. It might be worth to explore these. Another idea is to implement the optimization process on a graphics processing units (GPU). The results of Hillesland et al. [Hillesland 2003] and the advance in GPU technology suggest a speedup up to a factor of 20. The simplicity of the RBF network, given the number of basis functions is constant and sufficiently small, suggests to also implement the re-rendering on the GPU. Here, we expect real-time performance even for complex scenes.
In the longer term more general questions regarding our approach can be addressed. One is the investigation of other spherical basis functions such as Spherical Wavelets [Schröder 1995] or splines defined on the sphere [Wahba 1981]. Improvements might also be possible when relaxing the assumption that the vertices are independent. For example instead of computing the error per sample point one could compute the error for a neighborhood in a rendered image. In [Green 2006] it is denoted that interpolating Gaussians between the vertices instead of the final colors leads to a significantly improved visual appearance, similar to the difference between Gouraud and Phong Shading. We think this can be adapted for our technique.
Another area of future work can be to move beyond the tabulation of the spatial domain. Learning the whole five-dimensional surface light field function function would be one possibility. Interesting would also to employ the VMBF to learn the transfer function and compare the results to [Green 2006].
Further work is also necessary to understand why we cannot see the superiority of the VMBF over the GBF reported in [Jenison 1995].

Conclusion

We presented a learning technique for the surface light field function based on radial basis function networks. We showed that Gaussian basis functions outperform the von Mises basis function for high numbers of basis functions $\bgroup\color{blue}$K$\egroup$ but both perform equally well for $\bgroup\color{blue}$K=8$\egroup$ , which has the most relevance in practice. Therefore, our particular spherical basis function does not provide an advantage over basis functions defined in cartesian space for approximating surface light fields. Unfortunately, both, VMBF and GBF, are not capable to reconstruct high frequency effects such as the sharp shadow.
Compared to existing techniques for surface light fields, our approach has the advantage of a low memory consumption without additional compression techniques. However, this comes to the prize that we do not achieve the quality of previous techniques, such as [Wood 2000], for high-frequency effects. The current approach is mainly limited by the training time for all vertices of a scene and due to the limited amount of data used for this report, no general applicability of our results can be assumed.

Figures

Here, we present some additional graphs which could not be included in pdf version of the report.