Deep Learning-based 3D Oral Cavity Reconstruction Using 2D Intraoral Images

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Medical Imaging & 3D Modeling · Depth: Advanced, long

Summary

A deep learning framework is proposed for reconstructing the 3D oral cavity from only ten 2D intraoral images, without expensive hardware such as intraoral scanners or Cone Beam CT. This software-based approach targets three problems of conventional methods: patient discomfort, errors from material deformation, and high equipment costs. The model, trained on 950 upper jaw samples from the Dental3DS dataset, uses MobileNetV2 for image encoding and Multi-head Attention for multi-view feature fusion. It directly predicts 50,000 3D vertex coordinates and reaches an accuracy of 77.49%, measured by nearest-neighbor matching with a distance threshold of 0.035. One identified limitation is that the predicted vertices are unevenly distributed: they tend to concentrate in regions where the ground truth is dense, an effect attributed to the Chamfer Distance loss function.
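The accuracy figure depends on how "nearest-neighbor matching" is defined, which the summary does not spell out. The sketch below shows one plausible reading, assuming Euclidean distance and assuming that a predicted vertex counts as correct when its nearest ground-truth vertex lies within the threshold. The function name `nn_accuracy`, the matching direction, and the toy points are illustrative assumptions, not the authors' evaluation code.

```python
import math

def nn_accuracy(pred, gt, threshold=0.035):
    """Fraction of predicted points whose nearest ground-truth point
    lies within `threshold` (assumed matching direction: pred -> gt)."""
    if not pred or not gt:
        raise ValueError("both point sets must be non-empty")
    hits = 0
    for p in pred:
        # Brute-force nearest neighbour; fine for a toy example.
        nearest = min(math.dist(p, g) for g in gt)
        if nearest <= threshold:
            hits += 1
    return hits / len(pred)

# Toy data: three predictions land close to a ground-truth vertex, one does not.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pred = [(0.01, 0.0, 0.0), (1.0, 0.02, 0.0), (0.0, 1.0, 0.03), (0.5, 0.5, 0.5)]

acc = nn_accuracy(pred, gt)
print(f"accuracy = {acc:.2%}")
```

The brute-force search costs one distance computation per pair of points, so for clouds of 50,000 vertices a spatial index would be the more practical choice.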

Key takeaway

For dental practitioners or AI scientists developing accessible 3D modeling solutions, this research demonstrates a viable software-only approach. You can achieve 3D oral cavity reconstruction from just ten 2D intraoral images, significantly lowering equipment costs and patient discomfort. However, you must address the current limitation of uneven point distribution caused by Chamfer Distance to ensure clinical utility. Future work should focus on refining loss functions for more uniform vertex prediction.

Key insights

3D oral models can be reconstructed from ten 2D images using deep learning, which reduces cost and patient discomfort, but the predicted points are still unevenly distributed.

Principles

Software-based 3D reconstruction reduces hardware dependency.
Multi-head Attention effectively fuses multi-view features.
Chamfer Distance can cause uneven point distribution.
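The last principle can be made concrete with a small experiment. The sketch below assumes a common form of Chamfer Distance (the mean squared nearest-neighbor distance in both directions, summed); the source does not state which variant the authors use. It illustrates one plausible mechanism behind the density bias: because each term is an average over points, the same per-point error costs more when it occurs in a region that holds more points. The point sets and helper names are made up for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def chamfer_distance(pred, gt):
    """Symmetric Chamfer Distance (assumed variant: mean of squared
    nearest-neighbour distances, pred->gt plus gt->pred)."""
    pred_to_gt = sum(min(sq_dist(p, g) for g in gt) for p in pred) / len(pred)
    gt_to_pred = sum(min(sq_dist(g, p) for p in pred) for g in gt) / len(gt)
    return pred_to_gt + gt_to_pred

def shift_y(points, delta):
    """Move every point by `delta` along the y axis."""
    return [(x, y + delta, z) for x, y, z in points]

# Ground truth: a dense region (8 points) and a sparse region (2 points).
dense = [(0.1 * i, 0.0, 0.0) for i in range(8)]
sparse = [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
gt = dense + sparse

# Two predictions with the same per-point error, placed in different regions.
delta = 0.01
error_in_dense = shift_y(dense, delta) + sparse
error_in_sparse = dense + shift_y(sparse, delta)

cd_dense = chamfer_distance(error_in_dense, gt)
cd_sparse = chamfer_distance(error_in_sparse, gt)
print(cd_dense, cd_sparse, cd_dense / cd_sparse)
```

Here the dense region holds four times as many points as the sparse one, and the loss penalizes the identical displacement about four times as heavily there, so an optimizer has more incentive to fit dense regions well.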

Method

An encoder-decoder model extracts features from ten 2D images with MobileNetV2, fuses them across views with Multi-head Attention, and decodes the fused representation into 50,000 3D vertex coordinates.
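The data flow described above can be traced with a toy, dependency-free sketch. It is not the authors' implementation: random vectors stand in for MobileNetV2 features, the weights are untrained, the dimensions are tiny, and the mean pooling and single linear decoder are assumptions, since the summary does not describe those steps. Only the ten views and the idea of attention across views come from the source; a real model would use a deep learning framework.

```python
import math
import random

NUM_VIEWS, EMBED_DIM, NUM_HEADS = 10, 8, 2  # ten views as in the summary; dims are toy values
NUM_VERTICES = 5                            # toy value; the summary's model predicts 50,000
HEAD_DIM = EMBED_DIM // NUM_HEADS
rng = random.Random(0)

def rand_matrix(rows, cols):
    """Untrained stand-in for a learned weight matrix."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_head_self_attention(tokens, wq, wk, wv, wo):
    """Each view attends to all views; heads operate on slices of the embedding."""
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    fused = []
    for i in range(len(tokens)):
        heads = []
        for h in range(NUM_HEADS):
            s = slice(h * HEAD_DIM, (h + 1) * HEAD_DIM)
            scores = [sum(a * b for a, b in zip(q[i][s], k[j][s])) / math.sqrt(HEAD_DIM)
                      for j in range(len(tokens))]
            weights = softmax(scores)
            # Weighted sum of value slices, concatenated across heads.
            heads += [sum(w * v[j][d] for j, w in enumerate(weights))
                      for d in range(s.start, s.stop)]
        fused.append(matvec(wo, heads))
    return fused

# Stand-in for the image encoder: one feature vector per view.
view_features = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(NUM_VIEWS)]

wq, wk, wv, wo = (rand_matrix(EMBED_DIM, EMBED_DIM) for _ in range(4))
fused = multi_head_self_attention(view_features, wq, wk, wv, wo)

# Assumed pooling and decoder: average over views, then one linear layer to N x 3 coordinates.
pooled = [sum(tok[d] for tok in fused) / NUM_VIEWS for d in range(EMBED_DIM)]
w_dec = rand_matrix(NUM_VERTICES * 3, EMBED_DIM)
flat = matvec(w_dec, pooled)
vertices = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
print(len(vertices), len(vertices[0]))
```

At the scale in the summary, the final layer would emit 50,000 × 3 = 150,000 values per sample, which is why the choice of decoder and loss matters for how the points end up distributed.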

In practice

Use 2D intraoral images for cost-effective 3D modeling.
Explore MobileNetV2 and Multi-head Attention for multi-view tasks.
Consider how the choice of loss function affects point cloud uniformity.

Topics

3D Reconstruction
Deep Learning
Oral Cavity Modeling
Intraoral Imaging
MobileNetV2
Multi-head Attention
Chamfer Distance

Best for: Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.