Creating City-Level High-Resolution 3D Models

Research on Neural Radiance Fields (NeRF) has advanced rapidly because of their ability to reconstruct 3D scenes from perspective camera images alone. Recently, other modalities, such as LiDAR point clouds and satellite imagery, have also been successfully explored for NeRF models. Although each of these data sources can yield accurate reconstructions, this potential is realized only when the available data sufficiently covers the scene of interest, a condition that is hard to satisfy in practice for any single sensor modality. To tackle this issue, this work studies the unexplored task of training NeRFs by combining ground-based and satellite-based data, two data sources with complementary coverage. We propose CaLiSa-NeRF, a novel NeRF model that jointly integrates perspective camera images, satellite images with Rational Polynomial Coefficients (RPCs), and LiDAR point clouds to better represent urban environments. We introduce several techniques to harmonize these heterogeneous sensor inputs for NeRF training, and the resulting methods can represent both side and top views, unlike methods restricted to a single data source. We demonstrate the effectiveness of the proposed methods by training and evaluating them on a real dataset collected in Riyadh.

Key Findings:

  • Comprehensive Scene Representation: Integrating LiDAR, RGB images, and satellite data enables accurate omnidirectional rendering, with better coverage and representation than single-modality approaches.
  • Improved Reconstruction: CaLiSa-NeRF renders side views better than satellite-only models and rooftops better than ground-only models, although its side-view quality is slightly below that of ground-only models.
  • Experimental Validation: Results on the Riyadh dataset validate CaLiSa-NeRF's capability for omnidirectional rendering, with solid rendering quality across side and rooftop views and only minimal degradation in ground-view quality.