De novo design of functional nucleic acids of aptamers
Summary
InstructNA is a framework for the de novo generation of functional nucleic acids (FNAs), such as transcription factor-binding DNA and protein-binding aptamers, that does not require structural information. It integrates nucleic acid large language models (NA-LLMs) with high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX) data. The framework continually pretrains an existing NA-LLM on HT-SELEX data to create a domain-adapted FNA-LLM and then trains a lightweight decoder. A key component is the HC-HEBO (hill climbing–heteroscedastic and evolutionary Bayesian optimization) algorithm, which refines FNA designs in a continuous latent space. Compared with traditional HT-SELEX, InstructNA generated 100% and 200% more strong aptamer binders for the LOX1 and CXCL5 protein targets, respectively, with sequence similarity to the original aptamers as low as 38%.
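The headline figures are easier to read as fold changes: "100% more" binders than HT-SELEX means twice as many, and "200% more" means three times as many. The summary does not say how sequence similarity was measured, so the Python sketch below assumes one common choice, edit (Levenshtein) distance normalized by the longer sequence length. The function names are illustrative and not taken from InstructNA.

```python
def fold_from_percent_more(percent_more: float) -> float:
    """Convert 'X% more than a baseline' into a fold change relative to that baseline."""
    return 1.0 + percent_more / 100.0


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute (free when bases match)
            ))
        prev = curr
    return prev[-1]


def sequence_similarity(a: str, b: str) -> float:
    """Assumed metric: 1 minus edit distance divided by the longer sequence length."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


print(fold_from_percent_more(100))  # 2.0: "100% more" means twice as many
print(fold_from_percent_more(200))  # 3.0: "200% more" means three times as many
print(sequence_similarity("ACGTACGT", "ACGAACGT"))  # 0.875: one mismatch in 8 positions
```

Under this metric, 38% similarity between two equal-length sequences would mean that roughly 62% of positions must be edited to turn one into the other.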
Key takeaway
For AI researchers and computational biologists working on molecular design, InstructNA offers a way to overcome limitations of traditional FNA discovery. Your teams should consider integrating NA-LLMs with HT-SELEX data and Bayesian optimization to accelerate the development of novel aptamers and other functional nucleic acids, potentially yielding higher-affinity binders with greater sequence diversity than conventional methods.
Key insights
InstructNA combines NA-LLMs and HT-SELEX with Bayesian optimization for efficient de novo functional nucleic acid design.
Principles
- Integrate LLMs with high-throughput experimental data.
- Iterative refinement in latent space enhances design.
- Local optimization (HC-HEBO) improves binding specificity.
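Iterative refinement in a latent space can be illustrated with plain hill climbing, the local-search half of the HC-HEBO name. This is a minimal sketch and not HC-HEBO itself: it assumes a hypothetical `score` function that maps a latent vector to a predicted binding score (in InstructNA that role would involve the decoder plus model-based or experimental evaluation), and it uses a toy quadratic objective in place of real data.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def hill_climb(
    score: Callable[[Vector], float],
    start: Vector,
    step: float = 0.1,
    n_neighbors: int = 16,
    n_iters: int = 200,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Greedy local search in a continuous latent space.

    Each iteration samples Gaussian perturbations of the current point and moves
    to the best one only if it strictly improves the score.
    """
    rng = random.Random(seed)
    best = list(start)
    best_score = score(best)
    for _ in range(n_iters):
        neighbors = [[x + rng.gauss(0.0, step) for x in best] for _ in range(n_neighbors)]
        candidate = max(neighbors, key=score)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Hypothetical toy objective standing in for a predicted binding score:
# it peaks (at 0.0) at the latent point TARGET and falls off with distance.
TARGET = [1.0, -2.0, 0.5]


def toy_score(z: Vector) -> float:
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, TARGET))


z_best, s_best = hill_climb(toy_score, start=[0.0, 0.0, 0.0])
print(z_best, s_best)
```

Because only strict improvements are accepted, the score never decreases, which makes this kind of search good at polishing a promising starting point but prone to stalling in a local optimum on rugged objectives.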
Method
InstructNA continually pretrains NA-LLMs with HT-SELEX data, trains a decoder, and uses the HC-HEBO algorithm for iterative, function-guided optimization of FNA sequences in a continuous latent space.
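For the Bayesian-optimization side of the method, the sketch below shows one generic latent-space BO loop: fit a Gaussian process to the latent points evaluated so far and propose the random candidate with the highest upper confidence bound (UCB). It is a deliberately simplified stand-in, not HEBO, which adds input and output transformations aimed at heteroscedastic, non-stationary objectives and optimizes its acquisition functions with an evolutionary search. The kernel length scale, noise level, `beta`, bounds, and `toy_objective` are all assumptions chosen for illustration.

```python
import numpy as np


def rbf(A: np.ndarray, B: np.ndarray, length: float = 1.0) -> np.ndarray:
    """Squared-exponential (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)


def gp_posterior(X, y, Xq, noise=1e-3, length=1.0):
    """Posterior mean and std of a zero-mean GP (fit on standardized y) at query points Xq."""
    y_mean, y_std = y.mean(), y.std() + 1e-9
    ys = (y - y_mean) / y_std
    K = rbf(X, X, length) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ys))  # (K + noise*I)^{-1} ys
    Ks = rbf(X, Xq, length)  # shape (n_train, n_query)
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - (v**2).sum(axis=0), 1e-12, None)  # RBF prior variance is 1
    return mu * y_std + y_mean, np.sqrt(var) * y_std


def propose_next(X, y, rng, n_candidates=2000, beta=2.0, bounds=(-3.0, 3.0)):
    """Return the random candidate latent point with the highest UCB = mean + beta * std."""
    cand = rng.uniform(bounds[0], bounds[1], size=(n_candidates, X.shape[1]))
    mu, sd = gp_posterior(X, y, cand)
    return cand[np.argmax(mu + beta * sd)]


def toy_objective(z):
    """Hypothetical stand-in for a measured or predicted binding score; peaks at (1, -1)."""
    return -float(np.sum((z - np.array([1.0, -1.0])) ** 2))


rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(5, 2))  # initial latent points (e.g. encoded seeds)
y = np.array([toy_objective(x) for x in X])
for _ in range(20):
    x_next = propose_next(X, y, rng)
    X = np.vstack([X, x_next])
    y = np.append(y, toy_objective(x_next))
print("best latent point:", X[np.argmax(y)], "score:", y.max())
```

The `beta` term trades exploration against exploitation: a larger value favors uncertain, unvisited regions of the latent space, while a smaller one concentrates proposals near the current best designs.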
In practice
- Apply InstructNA for designing novel aptamers.
- Utilize HC-HEBO to optimize binding affinity.
- Incorporate diverse seeding sequences for better results.
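Seeding with diverse sequences can be approximated by a simple greedy filter: rank candidate sequences (for example, HT-SELEX reads) by some quality score and keep a sequence only if it differs from every seed already chosen by at least a minimum number of positions. The scores, the Hamming-distance criterion, and the threshold are assumptions for illustration, as is the equal-length requirement (typical of a fixed-length random library); the summary does not describe how InstructNA actually picks its seeds.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def pick_diverse_seeds(scores: Dict[str, float], k: int, min_dist: int) -> List[str]:
    """Walk candidates from highest to lowest score, keeping one only if it is at
    least `min_dist` mismatches away from every seed already kept."""
    seeds: List[str] = []
    for seq in sorted(scores, key=scores.get, reverse=True):
        if all(hamming(seq, s) >= min_dist for s in seeds):
            seeds.append(seq)
        if len(seeds) == k:
            break
    return seeds


# Hypothetical candidate pool with made-up scores, for illustration only.
pool = {"ACGTACGT": 9.1, "ACGTACGA": 8.7, "TTGCAACG": 6.2, "ACGAACGT": 5.0, "GGCCTTAA": 4.4}
print(pick_diverse_seeds(pool, k=3, min_dist=3))
# ['ACGTACGT', 'TTGCAACG', 'GGCCTTAA']
```

The two near-duplicates of the top sequence are skipped even though they score highly, which is the point: each seed starts the optimization from a different region of sequence space.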
Topics
- Nucleic Acid LLMs
- Aptamer Design
- De Novo Design
- Bayesian Optimization
- HT-SELEX
Best for: AI Researcher, AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.