Recent advances in sparse voxel representations have significantly improved the quality of 3D content generation, enabling high-resolution modeling with fine-grained geometry. However, existing frameworks suffer from severe computational inefficiencies due to the quadratic complexity of attention mechanisms in their two-stage diffusion pipelines. In this work, we propose Ultra3D, an efficient 3D generation framework that significantly accelerates sparse voxel modeling without compromising quality. Our method leverages the compact VecSet representation to efficiently generate a coarse object layout in the first stage, reducing token count and accelerating voxel coordinate prediction. To refine per-voxel latent features in the second stage, we introduce Part Attention, a geometry-aware localized attention mechanism that restricts attention computation within semantically consistent part regions. This design preserves structural continuity while avoiding unnecessary global attention, achieving up to 6.7× speed-up in latent generation. To support this mechanism, we construct a scalable part annotation pipeline that converts raw meshes into part-labeled sparse voxels. Extensive experiments demonstrate that Ultra3D supports high-resolution 3D generation at 1024 resolution and achieves state-of-the-art performance in both visual fidelity and user preference.
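To make the Part Attention idea concrete, below is a minimal sketch of attention restricted to part groups, assuming each voxel token carries an integer part label; the tensor shapes and function name are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of part-restricted self-attention (not the official code).
import torch
import torch.nn.functional as F

def part_attention(q, k, v, part_ids):
    """q, k, v: (N, H, D) per-voxel query/key/value projections.
    part_ids: (N,) integer part label for each voxel token (assumed input).
    Attention is computed independently inside each part group, so a token
    never attends to tokens belonging to a different part."""
    out = torch.empty_like(q)
    for pid in part_ids.unique():
        idx = (part_ids == pid).nonzero(as_tuple=True)[0]
        # reshape to (1, H, n_p, D) so scaled_dot_product_attention runs per head
        qg = q[idx].transpose(0, 1).unsqueeze(0)
        kg = k[idx].transpose(0, 1).unsqueeze(0)
        vg = v[idx].transpose(0, 1).unsqueeze(0)
        og = F.scaled_dot_product_attention(qg, kg, vg)  # full attention within the part
        out[idx] = og.squeeze(0).transpose(0, 1)
    return out
```

Because each group is far smaller than the full token set, the quadratic attention cost is paid only within parts, which is where the reported speed-up in latent generation comes from.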
Overview of the Ultra3D framework. Ultra3D is a two-stage framework that first generates a sparse voxel layout via VecSet and then refines it by generating per-voxel latents. The core of Ultra3D is Part Attention, an efficient localized attention mechanism that performs attention computation independently within each part group. In addition, when the input condition is an image, each part group performs cross-attention only with the image tokens onto which its voxel tokens project.
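The image-conditioned variant can be sketched in the same spirit: a hedged example assuming each voxel token also carries the index of the image token it projects onto (here called pix_ids); the helper name and index construction are assumptions for illustration only.

```python
# Hypothetical sketch of part-restricted cross-attention with image tokens.
import torch
import torch.nn.functional as F

def part_cross_attention(q_vox, k_img, v_img, part_ids, pix_ids):
    """q_vox: (N, H, D) voxel-token queries; k_img, v_img: (M, H, D) image tokens.
    part_ids: (N,) part label per voxel token (assumed input).
    pix_ids: (N,) index of the image token each voxel token projects onto (assumed input).
    Each part group attends only to the image tokens its own voxels cover."""
    out = torch.empty_like(q_vox)
    for pid in part_ids.unique():
        vox_idx = (part_ids == pid).nonzero(as_tuple=True)[0]
        img_idx = pix_ids[vox_idx].unique()                 # image tokens seen by this part
        qg = q_vox[vox_idx].transpose(0, 1).unsqueeze(0)    # (1, H, n_p, D)
        kg = k_img[img_idx].transpose(0, 1).unsqueeze(0)    # (1, H, m_p, D)
        vg = v_img[img_idx].transpose(0, 1).unsqueeze(0)
        og = F.scaled_dot_product_attention(qg, kg, vg)
        out[vox_idx] = og.squeeze(0).transpose(0, 1)
    return out
```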
Comparison With Other Methods. Our method produces higher fidelity and richer surface details. As highlighted in the red boxes, our results align more closely with the input image compared to other methods.
Select a method from the dropdown menu to compare.
● Scroll to zoom in/out
● Drag to rotate
● Press "shift" and drag to pan
@misc{chen2025ultra3defficienthighfidelity3d,
title={Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention},
author={Yiwen Chen and Zhihao Li and Yikai Wang and Hu Zhang and Qin Li and Chi Zhang and Guosheng Lin},
year={2025},
eprint={2507.17745},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2507.17745}
}