Research on Neural Radiance Fields (NeRF) has advanced rapidly, driven by their impressive ability to reconstruct 3D scenes from perspective camera images alone. Recently, other modalities, such as LiDAR point clouds and satellite imagery, have also been successfully explored for NeRF models. Although each of these data sources can in principle yield accurate reconstructions, that potential is only realized when the available data sufficiently covers the scene of interest, a condition that is hard to satisfy in practice for any of these sensor modalities in isolation. To tackle this issue, this work studies the previously unexplored task of training NeRFs by combining ground-based and satellite-based data, two sources with complementary coverage characteristics. We propose CaLiSa-NeRF, a novel NeRF model that simultaneously integrates perspective camera images, satellite images with Rational Polynomial Coefficients (RPCs), and LiDAR point clouds to better represent urban environments. We introduce several techniques to harmonize these heterogeneous sensor inputs for NeRF training, and the resulting model can render both side (street-level) and top (nadir) views, unlike methods restricted to a single data source. We demonstrate the effectiveness of the proposed method by training and evaluating it on a real-world dataset collected in Riyadh.
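To make the idea of "harmonizing heterogeneous sensor inputs" concrete, the sketch below illustrates one plausible way such inputs could be reduced to a common ray representation and mixed in a single NeRF training batch. This is not the paper's implementation: all helper names (`pinhole_rays`, `satellite_rays`, `rpc_localize`, `lidar_depth_rays`, `mixed_batch`) and the RPC interface are hypothetical assumptions made only for illustration.

```python
import numpy as np

def pinhole_rays(K, c2w, H, W):
    """World-space rays for a perspective (pinhole) camera.
    K: (3,3) intrinsics, c2w: (4,4) camera-to-world pose."""
    i, j = np.meshgrid(np.arange(W), np.arange(H))
    dirs_cam = np.stack(
        [(i - K[0, 2]) / K[0, 0],
         (j - K[1, 2]) / K[1, 1],
         np.ones_like(i, dtype=float)], axis=-1)
    dirs = dirs_cam.reshape(-1, 3) @ c2w[:3, :3].T
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    origins = np.broadcast_to(c2w[:3, 3], dirs.shape)
    return origins, dirs

def satellite_rays(pixels, rpc_localize, h_min, h_max):
    """Approximate rays for an RPC satellite image.
    `rpc_localize(row, col, h)` (assumed interface) maps a pixel plus a
    height to a 3-D scene point; two heights per pixel define a ray."""
    top = np.array([rpc_localize(r, c, h_max) for r, c in pixels])
    bottom = np.array([rpc_localize(r, c, h_min) for r, c in pixels])
    dirs = bottom - top
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    return top, dirs

def lidar_depth_rays(points, sensor_origin):
    """Turn LiDAR returns into rays with known termination depth,
    usable as depth supervision alongside photometric losses."""
    dirs = points - sensor_origin
    depths = np.linalg.norm(dirs, axis=-1, keepdims=True)
    origins = np.broadcast_to(sensor_origin, points.shape)
    return origins, dirs / depths, depths.squeeze(-1)

def mixed_batch(ray_sets, batch_size, rng):
    """Draw one training batch mixing rays from all modalities."""
    origins = np.concatenate([o for o, d in ray_sets])
    dirs = np.concatenate([d for o, d in ray_sets])
    idx = rng.choice(len(origins), size=batch_size, replace=False)
    return origins[idx], dirs[idx]
```

Under these assumptions, every modality contributes rays in the same world frame, so a single NeRF can be supervised jointly: camera and satellite rays carry colour targets, while LiDAR rays additionally carry depth targets.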
Key Findings: