BioMiner: A Multi-modal System for Automated Mining of Protein-Ligand Bioactivity Data from Literature

Source: Artificial Intelligence · Field: Science & Research (Life Sciences & Biology, Artificial Intelligence & Machine Learning, Health & Medical Research) · Depth: Expert, quick

Summary

BioMiner is a multi-modal extraction framework that automates the mining of protein-ligand bioactivity data from scientific literature, addressing the bottleneck of manual curation. Its central design choice is to separate bioactivity semantic interpretation from ligand structure construction. Bioactivity semantics are inferred through direct reasoning. Chemical structures are resolved with a chemical-structure-grounded visual semantic reasoning paradigm: multi-modal large language models process chemically grounded visual representations to infer inter-structure relationships, while exact molecular construction is handled by domain chemistry tools. For evaluation, the authors established the BioVista benchmark, comprising 16,457 bioactivity entries from 500 publications. On this benchmark, BioMiner achieved an F1 score of 0.32 for bioactivity triplets. Two applications illustrate its utility: a pre-training database built from 82,262 extracted data points improved downstream model performance by 3.9%, and BioMiner-assisted annotation of protein-ligand complex bioactivity was 5.59 times faster.
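To make the triplet-level F1 figure concrete, here is a minimal sketch of how such a score could be computed over sets of (protein, ligand, activity) triplets. It assumes exact-match comparison, which may differ from the matching rules BioVista actually uses; the `Triplet` type, the `triplet_f1` function, and the example values are illustrative and not taken from the paper.

```python
from typing import Iterable, NamedTuple


class Triplet(NamedTuple):
    """Hypothetical representation of one bioactivity triplet."""
    protein: str
    ligand: str    # e.g. a canonical SMILES string
    activity: str  # e.g. "IC50=12 nM"


def triplet_f1(predicted: Iterable[Triplet], gold: Iterable[Triplet]) -> float:
    """F1 over exact-match triplets (assumed matching rule, not BioVista's)."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up data: three gold triplets, two predictions, one of them correct.
gold = [
    Triplet("EGFR", "CCO", "IC50=12 nM"),
    Triplet("EGFR", "CCN", "IC50=40 nM"),
    Triplet("EGFR", "CCC", "IC50=95 nM"),
]
predicted = [
    Triplet("EGFR", "CCO", "IC50=12 nM"),
    Triplet("EGFR", "CCN", "IC50=400 nM"),  # wrong value, so no match
]
score = triplet_f1(predicted, gold)
```

Because a triplet counts as correct only when the protein, the ligand structure, and the activity value are all right at once, an error in any one component discards the whole entry, which is one reason triplet-level scores tend to be lower than per-component scores.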

Key takeaway

For AI Scientists and Research Scientists working on drug discovery, BioMiner offers a framework for speeding up the extraction of protein-ligand bioactivity data from literature. Consider integrating multi-modal extraction systems of this kind to build larger pre-training datasets and to make bioactivity annotation workflows faster, the two uses for which the work reports measurable gains.

Key insights

BioMiner automates protein-ligand bioactivity data extraction by separating semantic interpretation from chemical structure construction.

Principles

Method

BioMiner uses direct reasoning for bioactivity semantics and multi-modal LLMs on chemically grounded visual representations for structure relationships, delegating exact construction to chemistry tools.
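The decoupled design can be pictured as two independent stages whose outputs are joined afterwards. The sketch below is a simplified illustration under stated assumptions, not BioMiner's implementation: it assumes each stage keys its output by a compound label such as "5a", and the `ActivityRecord` type, the `join_triplets` function, and all example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityRecord:
    """Hypothetical output of the bioactivity-semantics stage."""
    protein: str
    ligand_label: str  # label used in the paper, e.g. "5a"
    measurement: str   # e.g. "IC50"
    value: float
    unit: str


def join_triplets(records, structures):
    """Join activity records with resolved structures by compound label.

    `structures` stands in for the structure stage's output: a mapping from
    compound label to a structure string built by a chemistry tool.
    Returns (resolved triplets, records whose structure is missing).
    """
    resolved, unresolved = [], []
    for record in records:
        smiles = structures.get(record.ligand_label)
        if smiles is None:
            unresolved.append(record)  # keep for manual review
        else:
            activity = f"{record.measurement}={record.value} {record.unit}"
            resolved.append((record.protein, smiles, activity))
    return resolved, unresolved


# Made-up stage outputs for illustration only.
records = [
    ActivityRecord("EGFR", "5a", "IC50", 12.0, "nM"),
    ActivityRecord("EGFR", "5b", "IC50", 40.0, "nM"),
]
structures = {"5a": "CCO"}  # "5b" was not resolved by the structure stage

resolved, unresolved = join_triplets(records, structures)
```

Keeping the stages separate means a failure in structure resolution leaves the activity record intact and flagged, rather than silently producing a wrong triplet.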

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.