Crystal Structure Prediction • Foundation Models • Superconductors

Siamese Foundation Models for Crystal Structure Prediction

DAO unites a structure generator (DAO-G) and an energy predictor (DAO-P) in a pretrain–finetune framework, achieving state-of-the-art crystal structure prediction and over 2000× speedup versus DFT software on real-world superconductors.

Liming Wu1,2,3†   Wenbing Huang1,2,3†✉   Rui Jiao4,5   Jianxing Huang6   Liwei Liu6   Yipeng Zhou6  
Hao Sun1,2,3   Yang Liu4,5   Fuchun Sun4   Yuxiang Ren7✉   Ji-Rong Wen1,2,3✉

1Gaoling School of Artificial Intelligence, Renmin University of China  
2Beijing Key Laboratory of Research on Large Models and Intelligent Governance  
3Engineering Research Center of Next-Generation Intelligent Search and Recommendation, MOE  
4Department of Computer Science and Technology, Tsinghua University  
5Institute for AI Industry Research, Tsinghua University  
6Advanced Computing and Storage Lab, Huawei Technologies  
7School of Intelligence Science and Technology, Nanjing University

Motivation & Overview

Predicting crystal structures from chemical compositions is a fundamental challenge in materials discovery — analogous to protein folding but with far more complex 3D geometries.

High Computational Cost

Traditional CSP methods — first-principles calculations, stochastic sampling, and evolutionary optimization — are inherently limited by high computational costs and poor scalability with system complexity.

🧩

Limited Generalizability

Existing deep generative models are trained on small, domain-specific datasets, which limits their generalizability to unseen structures and yields unsatisfactory performance on widely recognized CSP benchmarks such as MPTS-52.

🔭

Missing CSP-Specific Foundation Models

Prior crystal foundation models target either force-field prediction (GNoME, MACE-MP-0) or general-purpose generation (MatterGen); none is designed and thoroughly investigated specifically for CSP.

Our Solution: Siamese Foundation Models

We propose Diffusion-based Crystal Omni (DAO), a pretrain–finetune framework comprising two complementary foundation models: DAO-G for generating stable crystal structures and DAO-P for predicting energy and assisting DAO-G. Both are built upon Crysformer, a geometric graph Transformer that respects the O(3) and periodic translation symmetries of crystal structures.

DAO Framework Overview

The DAO framework: pretraining pipeline and downstream validation of DAO-G and DAO-P.

Key Contributions

Six principal advances that collectively push CSP forward.

1

Siamese Foundation Model Framework

First foundation model framework specifically designed for CSP, comprising DAO-G (generator) and DAO-P (predictor) that synergistically cooperate: DAO-P relaxes data and guides generation for DAO-G, while DAO-G augments structural data for DAO-P.

2

CrysDB: ~940K Crystal Pretraining Dataset

Curated from Materials Project and OQMD, comprising ~940K entries of stable and unstable crystals with energy annotations, enabling large-scale pretraining with rigorous deduplication to prevent data leakage.

3

Two-Stage Pretraining with Dataset Relaxation

Stage I pretrains DAO-G on all crystals; Stage II refines on a dataset where unstable structures are relaxed by DAO-P using L-BFGS, mitigating bias toward unstable energy landscapes.

4

Energy-Guided Sampling via Boltzmann Distribution

DAO-P provides energy-based guidance during DAO-G's sampling, steering generated structures toward lower-energy, more thermodynamically stable configurations using a principled exponential energy loss.

5

SOTA on CSP Benchmarks

Pretraining consistently improves performance across multiple backbone architectures. DAO-G (Crysformer + FlowMM) achieves the best Match Rates of 74.17% on MP-20 and 42.01% on MPTS-52.

6

Real-World Superconductor Validation

On Cr6Os2, DAO achieves 100% match rate with RMSE 0.0012 and over 2000× speedup per iteration vs. DFT. DAO-P predicts critical temperatures with errors as low as 0.04 K.

CrysDB Statistics

Statistics of CrysDB: source distribution, stable/unstable proportions, and feature distributions.

Method / Framework

DAO's pretrain–finetune pipeline with two Siamese foundation models and a two-stage pretraining strategy.

1

CrysDB Construction

CrysDB compiles ~940K crystal entries from the Materials Project (94,779 entries) and OQMD (848,105 entries), each with 3–30 atoms and Ehull < 1.0 eV/atom. After deduplication against downstream benchmarks, the final CrysDB contains 919,258 entries; the OQMD portion is 29% stable and 71% unstable, while the MP portion is 55% stable and 45% unstable.
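
To make the filtering and deduplication step concrete, here is a minimal sketch assuming pymatgen's StructureMatcher; the exact tooling and tolerances used to build CrysDB are not specified on this page, so treat the helper below as illustrative.

# Sketch of the CrysDB filtering/deduplication step (illustrative only).
from pymatgen.core import Structure
from pymatgen.analysis.structure_matcher import StructureMatcher

matcher = StructureMatcher()  # default tolerances; CrysDB's actual settings may differ

def keep_entry(structure: Structure, e_hull: float, benchmark_structures: list) -> bool:
    """Apply the size/energy filters described above and drop entries that
    duplicate downstream benchmark test structures (to avoid data leakage)."""
    if not (3 <= len(structure) <= 30):   # 3-30 atoms per cell
        return False
    if e_hull >= 1.0:                     # keep Ehull < 1.0 eV/atom
        return False
    return not any(matcher.fit(structure, ref) for ref in benchmark_structures)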

2

Stage I: Pretrain DAO-G on Full CrysDB

DAO-G is pretrained via a diffusion process (DiffCSP) to predict lattice noise and fractional coordinate scores. Training on both stable and unstable crystals enables learning from a broader distribution. Simultaneously, DAO-P is pretrained with a mix-supervised loss: the diffusion CSP loss (self-supervised) plus an exponential energy loss (supervised) that provably converges to ground-truth intermediate energies under Boltzmann-constrained modeling.
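
As a rough illustration of the mix-supervised objective, the sketch below combines a DiffCSP-style denoising loss with a Boltzmann-factor regression term; the helper names (csp_diffusion_loss, predict_energy), the weighting, and the exact form of the exponential term are assumptions for illustration, not the paper's implementation.

import torch

def mix_supervised_loss(model, batch, beta=1.0, lambda_energy=0.5):
    # Self-supervised term: denoising loss over lattice noise and fractional
    # coordinate scores, as in DiffCSP (assumed helper).
    loss_diffusion = model.csp_diffusion_loss(batch)

    # Supervised term: regress Boltzmann factors instead of raw energies so that
    # low-energy (more stable) structures dominate the training signal.
    e_pred = model.predict_energy(batch)        # assumed helper, shape [batch]
    e_true = batch["energy_per_atom"]           # ground-truth energy labels
    loss_energy = torch.mean((torch.exp(-beta * e_pred) - torch.exp(-beta * e_true)) ** 2)

    return loss_diffusion + lambda_energy * loss_energy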

3

Dataset Relaxation via DAO-P

DAO-P predicts energy gradients (force fields) for unstable structures (0.08 < Ehull ≤ 0.5 eV/atom) and relaxes them toward more stable configurations using the L-BFGS optimizer — replacing expensive DFT calculations with a fast ML-based alternative.
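
A minimal sketch of this relaxation loop, assuming a predict_energy_and_forces callable that wraps DAO-P and ignoring lattice degrees of freedom for brevity; the actual pipeline may relax lattices and fractional coordinates jointly.

import numpy as np
from scipy.optimize import minimize

def relax_positions(positions, predict_energy_and_forces):
    """Relax atomic positions by minimizing the ML-predicted energy with L-BFGS."""
    shape = positions.shape  # (n_atoms, 3)

    def objective(x):
        energy, forces = predict_energy_and_forces(x.reshape(shape))
        # L-BFGS needs the gradient of the energy, i.e. the negative of the forces.
        return energy, -forces.reshape(-1)

    result = minimize(objective, positions.reshape(-1), jac=True, method="L-BFGS-B")
    return result.x.reshape(shape)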

4

Stage II: Refine DAO-G on Relaxed Dataset

Pretraining of DAO-G continues on the relaxed dataset with a reduced learning rate, refining the denoising process on the improved data and mitigating bias toward unstable regions.

5

Energy-Guided Sampling

During generation, DAO-P steers the sampling of DAO-G via energy guidance: ∇_{M_t} log p_t(M_t) = ∇_{M_t} log q_t(M_t) − β ∇_{M_t} E_t(M_t, t). The Boltzmann-weighted distribution promotes thermodynamically stable structures.
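
In code, the guidance amounts to shifting the generator's score by the predictor's energy gradient at every reverse step. The sketch below is illustrative: generator.score and predictor.energy are assumed interfaces, and the real sampler applies this jointly to lattices and fractional coordinates following the DiffCSP/FlowMM update rules.

import torch

def guided_score(M_t, t, generator, predictor, beta):
    # Unconditional score from DAO-G: ∇ log q_t(M_t)
    score = generator.score(M_t, t)                  # assumed helper

    # Energy gradient from DAO-P, obtained via autograd: ∇ E_t(M_t, t)
    M_req = M_t.detach().requires_grad_(True)
    energy = predictor.energy(M_req, t).sum()        # assumed helper
    grad_energy = torch.autograd.grad(energy, M_req)[0]

    # Boltzmann-weighted guidance: ∇ log p_t = ∇ log q_t − β ∇ E_t
    return score - beta * grad_energy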

6

Finetune for Downstream Tasks

DAO-G is directly finetuned for CSP without architecture modification. DAO-P is finetuned for energy/property prediction with specialized heads across eight distinct datasets.

Crysformer Architecture

Both DAO-G and DAO-P are built on Crysformer, a geometric graph Transformer with four modules: (1) an embedding module with CGCNN embeddings and Fourier-Transform-based invariant edge features; (2) an invariant graph attention module with separate parametric networks for keys, values, and edge features; (3) a gated addition module for flexible residual connections; (4) noise and energy prediction heads. Crysformer ensures O(3) equivariance for noise output and O(3) invariance for energy output, along with periodic translation invariance — critical symmetries for crystal structures.
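
For intuition, the Fourier-transform edge embedding can be sketched as follows: both endpoints of an edge shift by the same amount under a periodic translation, so their fractional-coordinate difference is unchanged, and sine/cosine features at integer frequencies remove the remaining mod-1 ambiguity. The number of frequencies and the normalization here are illustrative assumptions.

import torch

def fourier_edge_features(frac_i, frac_j, num_freq=8):
    """Periodic-translation-invariant edge features from fractional coordinates.

    frac_i, frac_j: [num_edges, 3] fractional coordinates of edge endpoints.
    Returns: [num_edges, 3 * 2 * num_freq] features.
    """
    diff = frac_j - frac_i                               # unchanged under periodic translation
    k = torch.arange(1, num_freq + 1, dtype=diff.dtype, device=diff.device)
    angles = 2 * torch.pi * diff.unsqueeze(-1) * k       # [num_edges, 3, num_freq]
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1).flatten(start_dim=1)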

Crysformer Architecture

Crysformer: embedding, invariant graph attention, gated addition, and prediction heads.

Experiments & Results

Evaluation on two well-recognized CSP benchmarks: MP-20 (≤20 atoms, 45,231 crystals) and MPTS-52 (≤52 atoms, 40,476 crystals).

74.17%
Best Match Rate on MP-20 (1-shot)
42.01%
Best Match Rate on MPTS-52 (1-shot)
919K
Deduplicated CrysDB Entries
2000×
Speedup vs. DFT (per iteration)

CSP Performance (1-shot) on MP-20 & MPTS-52

Category | Model | Size | MP-20 MR (%) ↑ | MP-20 RMSE ↓ | MPTS-52 MR (%) ↑ | MPTS-52 RMSE ↓
Non-Pretrained | CDVAE | – | 33.90 | 0.1045 | 5.34 | 0.2106
Non-Pretrained | DiffCSP | – | 51.49 | 0.0631 | 12.19 | 0.1786
Non-Pretrained | EquiCSP | – | 57.39 | 0.0510 | 14.85 | 0.1169
Non-Pretrained | FlowMM | – | 61.39 | 0.0560 | 17.54 | 0.1726
Non-Pretrained | Crysformer + DiffCSP | – | 51.55 | 0.0915 | 17.65 | 0.1428
Pretrained | DiffCSP | 12.3M | 51.23 | 0.0552 | 18.50 | 0.0825
Pretrained | DiffCSP-large | 26.2M | 64.04 | 0.0433 | 30.77 | 0.0640*
Pretrained | MatterGen | 25.3M | 67.40 | 0.0332* | 30.28 | 0.0703
Pretrained | FlowMM-large | 25.7M | 69.95 | 0.0378 | 33.78 | 0.0951
Pretrained | Crysformer + DiffCSP (DAO-G Stage I) | 25.2M | 65.60 | 0.0411 | 32.52 | 0.0731
Pretrained | Crysformer + FlowMM | 25.2M | 74.17* | 0.0400 | 42.01* | 0.1083

* marks the best value in each column. All pretrained models are trained on CrysDB. Results are averaged over three runs.
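
For reference, Match Rate and RMSE in this line of work are typically computed with pymatgen's StructureMatcher; the sketch below uses the tolerances common in the CSP literature (ltol=0.3, stol=0.5, angle_tol=10), which may differ from the paper's exact evaluation settings.

from pymatgen.analysis.structure_matcher import StructureMatcher

matcher = StructureMatcher(ltol=0.3, stol=0.5, angle_tol=10.0)

def match_rate_and_rmse(predictions, ground_truths):
    """predictions, ground_truths: parallel lists of pymatgen Structure objects."""
    rms_values = []
    for pred, gt in zip(predictions, ground_truths):
        result = matcher.get_rms_dist(pred, gt)   # None when the structures do not match
        if result is not None:
            rms_values.append(result[0])          # normalized RMS displacement
    match_rate = len(rms_values) / len(ground_truths)
    rmse = sum(rms_values) / len(rms_values) if rms_values else float("nan")
    return match_rate, rmse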

Ablation Studies

Ablation studies: two-stage pretraining, polymorph generation, energy guidance, and stability rates.

Key Findings

  • Impact of Pretraining: Large-scale pretraining boosts DAO-G's Match Rate from 51.55% to 65.60% on MP-20. FlowMM also benefits substantially from pretraining.
  • Efficacy of Crysformer: DAO-G outperforms DiffCSP-large across nearly all metrics. Crysformer + FlowMM also surpasses FlowMM-large in Match Rate.
  • Advantage on Larger Systems: While MatterGen slightly outperforms DAO-G (Stage I) on MP-20, DAO-G achieves a higher MR on MPTS-52 (32.52% vs. 30.28%), demonstrating better scaling to systems with more atoms.
  • Flow Matching Advantage: Replacing diffusion with flow matching yields the best Match Rates of 74.17% on MP-20 and 42.01% on MPTS-52.

Ablation: Two-Stage Pretraining & Energy Guidance

📊

Stage I vs. Stage I+II

Including unstable data in pretraining (Stage I) outperforms stable-only pretraining. Adding Stage II (data relaxation) further improves MR and reduces RMSE on MP-20, and significantly reduces RMSE variance on MPTS-52.

🔋

Energy Guidance Benefits

Energy-guided sampling increases stability rate from 85.99% → 87.42% on MP-20 and 73.75% → 75.05% on MPTS-52. It reduces RMSE on MPTS-52 (0.0695 → 0.0688).

🔮

Polymorph Generation

DAO-G generates all polymorphs in 72.2%, 54.5%, and 81.8% of the 2-, 3-, and 4-polymorph cases, respectively. For Ni6O2F10 (4 conformations), all four are matched, with RMSEs of 0.0063, 0.0305, 0.0309, and 0.0049.

DAO-P Energy Prediction Accuracy (Zero-Shot)

Without finetuning on MP-20 or MPTS-52, DAO-P achieves MAEs of 0.0260 eV/atom on MP-20 and 0.0514 eV/atom on MPTS-52 test sets — accuracy considered acceptable for materials science. DAO-P also achieves SOTA results on four out of eight crystal property prediction datasets.

Real-World Superconductor Analysis

Validating DAO on three real-world superconductors unseen during pretraining and finetuning: Cr6Os2, Zr16Rh8O4, and Zr16Pd8O4.

Superconductor Results

Superconductor experiments: structure prediction, Tc estimation, and speed comparison with DFT.

100%
Match Rate on Cr6Os2 (20-shot)
0.0012
RMSE on Cr6Os2 (20-shot best)
0.04 K
Tc Error on Zr16Pd8O4

Cr6Os2 (A15 Structure)

DAO-G achieves 100% Match Rate and RMSE = 0.0012 over 20 runs. The DFT-computed Ehull of the generated structure is 0.02918 eV/atom vs. 0.02916 eV/atom for the experimental structure, a difference of only 0.00002 eV/atom.

Although unstable Cr6Os2 structures existed in pretraining data, DAO-G generates the stable superconducting structure — not merely memorizing training examples.

By comparison, the Quantum ESPRESSO (QE) optimizer reaches 75% MR with an average RMSE of 0.1310 and is >2000× slower per iteration.

Zr16Pd8O4 (η-carbide, Fd3̄m)

Features rigid Wyckoff site occupancy and geometrically frustrated stella quadrangula lattices. DAO-G generates the structure with RMSE = 0.0172. Ehull difference: 0.0003 eV/atom.

Zr16Rh8O4

A minor lattice change (~0.5%) substituting Rh for Pd significantly affects superconducting properties (Tc: 2.73 K → 3.73 K). DAO-G resolves this with RMSE = 0.0212.

DAO-P Tc errors: 2.02 K (Cr6Os2), 0.26 K (Zr16Rh8O4), 0.04 K (Zr16Pd8O4).

Data Augmentation Improves Tc Prediction

Using DAO-G to generate structures for 748 superconductors without structural data consistently improves DAO-P's Tc prediction across all 5 cross-validation folds, reducing the average MAE (in log K) from 0.761 → 0.714.
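
A schematic of the 5-fold evaluation behind these numbers, with the MAE computed on log-scaled critical temperatures; train_tc_model and predict_tc stand in for DAO-P finetuning and inference and are assumptions for illustration.

import numpy as np
from sklearn.model_selection import KFold

def cross_validated_mae(features, log_tc, train_tc_model, predict_tc, n_splits=5, seed=0):
    """Average MAE (in log K) over K folds; `features` may or may not include
    DAO-G-generated structures, enabling an augmented vs. non-augmented comparison."""
    maes = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True, random_state=seed).split(features):
        model = train_tc_model(features[train_idx], log_tc[train_idx])   # assumed helper
        predictions = predict_tc(model, features[test_idx])              # assumed helper
        maes.append(np.mean(np.abs(predictions - log_tc[test_idx])))
    return float(np.mean(maes))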

Speed Advantage

QE optimizer: ~138 minutes over 38 iterations (≈218 s per iteration). DAO-G: 1000 sampling iterations in 1.5 minutes (≈0.09 s per iteration), over 2000× faster per iteration.

Conclusion & Impact

DAO demonstrates the significant potential of Siamese foundation models for advancing materials science research and development.

🎯

SOTA Crystal Structure Prediction

The pretrained Crysformer + FlowMM variant of DAO-G achieves state-of-the-art results on both MP-20 and MPTS-52, with Match Rates of 74.17% and 42.01%. Pretraining consistently benefits multiple backbone architectures.

🔄

Synergistic Generator–Predictor Interaction

DAO-P enhances DAO-G via dataset relaxation and energy-guided sampling; DAO-G augments structural data for DAO-P when structural information is unavailable.

🧪

Practical Superconductor Analysis

DAO accurately predicts structures and critical temperatures for real-world superconductors, outperforming DFT in both efficiency and accuracy — a promising step toward designing novel high-temperature superconductors.

Future Directions

  • Scaling to larger systems: Expanding the pretraining dataset beyond 30 atoms could improve MPTS-52 performance (currently 42.01% MR in 1-shot, 46.78% in 20-shot).
  • Advanced generative models: Integrating more advanced generative approaches into the pretraining process.
  • Novel superconductor design: Moving beyond structure prediction and Tc estimation toward property-guided design of novel high-temperature superconductors.

BibTeX Citation

If you find our work useful, please consider citing:

@article{wu2026dao,
	title = {Siamese foundation models for crystal structure prediction},
	issn = {2041-1723},
	doi = {10.1038/s41467-026-72362-3},
	journal = {Nature Communications},
	author = {Wu, Liming and Huang, Wenbing and Jiao, Rui and Huang, Jianxing and Liu, Liwei and Zhou, Yipeng and Sun, Hao and Liu, Yang and Sun, Fuchun and Ren, Yuxiang and Wen, Ji-Rong},
	year = {2026},
}

Contact

If you have any questions, feedback, or collaboration ideas, feel free to reach out:

📧 Email: wlm155@126.com