New from Google: CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
Published: 2025/2/24 22:07:43
Recovering a high-quality 3D scene from a single RGB image is a challenging task in computer graphics. Current methods often suffer from domain-specific limitations or low-quality object generation. To address these issues, we propose CAST (Component-Aligned 3D Scene Reconstruction from a Single RGB Image), a novel method for 3D scene reconstruction and recovery. CAST first extracts object-level 2D segmentation and relative depth information from the input image, then uses a GPT-based model to analyze inter-object spatial relationships; this captures how objects relate to one another within the scene and supports a more coherent reconstruction. CAST then employs an occlusion-aware large-scale 3D generation model to generate each object's complete geometry independently, using MAE and point-cloud conditioning to mitigate occlusions and partial observations, so that each object remains aligned with the source image's geometry and texture. To place each object in the scene, an alignment generation model computes the transformation that registers the generated mesh against the scene's point cloud. Finally, CAST applies a physics-aware correction step that derives a constraint graph from a fine-grained relation graph; this constraint graph guides the optimization of object poses toward physical consistency and spatial coherence. By using Signed Distance Fields (SDFs), the model addresses occlusions, object interpenetration, and floating objects, so the generated scene reflects real-world physical interactions. Experimental results demonstrate that CAST significantly improves the quality of single-image 3D scene reconstruction, offering greater realism and accuracy in scene recovery tasks.
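To make the correction step concrete, here is a minimal, self-contained sketch of the idea of deriving a placement order from a relation graph. The object names, the `supports` relation list, and the `placement_order` helper are invented for illustration; CAST's actual relation graph, produced by its GPT-based analysis, is richer than this toy example.

```python
from collections import defaultdict, deque

# Toy relation graph: "x supports y" edges of the kind a GPT-based model
# might extract from an image. Names and relations are illustrative only.
supports = [
    ("floor", "table"),
    ("table", "book"),
    ("table", "lamp"),
    ("book", "cup"),
]

def placement_order(edges):
    """Topologically sort objects so each supporting object is processed
    before the objects resting on it -- one sensible order in which a
    pose optimizer could resolve constraints."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))
    # Start from objects nothing rests on top of (indegree zero).
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in sorted(children[node]):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return order

print(placement_order(supports))
# ['floor', 'table', 'book', 'lamp', 'cup']
```

Processing supporters before supported objects means each pose optimization step can treat the already-placed geometry below as fixed, which is one simple way a constraint graph can structure the optimization.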
CAST has practical applications in virtual content creation, such as immersive game environments and film production, where real-world setups can be seamlessly integrated into virtual landscapes. Additionally, CAST can be leveraged in robotics, enabling efficient real-to-simulation workflows and providing realistic, scalable simulation environments for robotic systems.
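The SDF-based handling of interpenetration mentioned in the abstract can be illustrated with a toy example. The sketch below uses analytic sphere SDFs and a fixed-step push along the line of centers; this is a stand-in for intuition only, not CAST's actual pose optimizer, and all names here are invented.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance from each point to a sphere surface (negative = inside)."""
    return np.linalg.norm(points - center, axis=-1) - radius

def resolve_penetration(center_a, radius_a, center_b, radius_b,
                        step=0.05, max_iters=200):
    """Translate sphere B away from sphere A until B's surface no longer
    penetrates A. A toy stand-in for SDF-guided pose correction."""
    center_b = center_b.astype(float)
    for _ in range(max_iters):
        direction = center_b - center_a
        dist = np.linalg.norm(direction)
        if dist == 0.0:  # degenerate: pick an arbitrary push direction
            direction, dist = np.array([1.0, 0.0, 0.0]), 1.0
        # Deepest penetration of B into A occurs at the point of B's
        # surface closest to A's center.
        surface_point = center_b - (direction / dist) * radius_b
        if sphere_sdf(surface_point[None, :], center_a, radius_a)[0] >= 0:
            break  # no penetration left
        center_b = center_b + (direction / dist) * step  # push B outward
    return center_b

# Two unit spheres whose centers are 1.2 apart, so they interpenetrate.
a = np.zeros(3)
b = np.array([1.2, 0.0, 0.0])
b_fixed = resolve_penetration(a, 1.0, b, 1.0)
print(np.linalg.norm(b_fixed - a))  # >= 2.0: the spheres no longer overlap
```

Real scenes use SDFs of arbitrary meshes and optimize full rigid poses under the constraint graph, but the core signal is the same: negative signed distance flags penetration, and poses are adjusted until it is eliminated.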
Bringing the vibrant diversity of the real world into the virtual realm, this collection reimagines open-vocabulary scenes as immersive digital environments, capturing the richness and depth of each unique setting. For each scene, the images display as follows: the top-left shows the input image, the top-center displays the rendered geometry, and the right presents the rendered image with realistic textures.
Paper page: https://sites.google.com/view/cast4
