[AINews] The End of Finetuning
Summary
OpenAI has deprecated its finetuning APIs, signaling a shift in the AI engineering landscape even though the company previously promoted finetuning as a key part of the toolkit. The change comes as Anthropic's valuation potentially surpasses OpenAI's and amid an extreme GPU crunch. While finetuning may be declining across the general industry, top-tier companies like Cursor and Cognition are increasing their use of RLFT on open models. The article also covers advances in AI research, including new reasoning benchmarks such as Soohak's 439 math problems and Medmarks v1.0, agentic systems such as Google DeepMind's AI Co-Mathematician, and specialized retrieval models. Further sections detail progress in training optimization, scaling laws, and inference systems (e.g., Blackwell racks for MoE serving), along with new model releases such as Perceptron Mk1 for video reasoning and Jina's jina-embeddings-v5-omni. The Mini Shai-Hulud supply-chain attack on AI developer tooling highlights operational security concerns.
Key takeaway
For AI Architects evaluating model deployment strategies, OpenAI's finetuning deprecation suggests a pivot towards prompt engineering or larger, more capable base models for general use cases. However, if your team is building frontier applications or leveraging open-source models, investing in RLFT and custom ASIC solutions may still yield significant performance and cost advantages, especially for long-context or specialized tasks. You should assess whether your specific application truly benefits from finetuning or if alternative methods like advanced prompting or agentic orchestration are more efficient.
Key insights
Finetuning is declining for general AI engineering but remains critical for top-tier applications and open models.
Principles
- Benchmarks require continuous evolution to challenge frontier models.
- Small, specialized models excel in retrieval tasks when paired with generators.
Method
Agentic systems can decompose complex problems into specialized tasks, iteratively refining queries and leveraging external tools for enhanced performance in science and math.
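As a rough illustration of the decompose, refine, and call-tools pattern described above, here is a minimal Python sketch. Everything in it is hypothetical: the `calculator` and `lookup` tools, the `refine` heuristic, and the hard-coded `plan` stand in for what an LLM planner and real external tools would do in a production system. It is not taken from the original article or from any specific agent framework.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; a real system would call search engines, solvers, etc.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def calculator(query: str) -> str:
    """Evaluate a simple 'a op b' integer expression."""
    a, op, b = query.split()
    return str(OPS[op](int(a), int(b)))

def lookup(query: str) -> str:
    """Toy fact store; returns an empty string when nothing matches."""
    facts = {"sides of a hexagon": "6"}
    return facts.get(query.strip().lower(), "")

TOOLS: Dict[str, Callable[[str], str]] = {"calc": calculator, "lookup": lookup}

def refine(query: str) -> str:
    """Hypothetical refinement step: simplify a query that returned nothing."""
    return query.lower().replace("number of ", "").strip(" ?")

def run_subtask(tool: str, query: str, max_rounds: int = 2) -> str:
    """Call one tool, refining the query between failed attempts."""
    for _ in range(max_rounds):
        answer = TOOLS[tool](query)
        if answer:
            return answer
        query = refine(query)
    return ""

def solve(plan: List[Tuple[str, str]]) -> List[str]:
    """Run subtasks in order; later queries may reference earlier results."""
    results: List[str] = []
    for tool, template in plan:
        query = template.format(*results)
        results.append(run_subtask(tool, query))
    return results

# Hard-coded decomposition of "interior angle sum of a hexagon, in degrees",
# using the shortcut sides * 180 purely to keep the example small.
plan = [("lookup", "Number of sides of a hexagon?"), ("calc", "{0} * 180")]
answers = solve(plan)
```

The first subtask fails on its initial query, is refined once, and then succeeds; the second subtask consumes the first result. In practice the plan, the refinement step, and the tool selection would each be model-driven rather than fixed.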
In practice
- Consider aggressive GPU power caps for efficiency in local inference.
- Use small, distilled models as routers for larger LLMs to optimize costs.
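The routing idea in the last bullet can be sketched as follows. This is a minimal sketch under stated assumptions: `difficulty_score` is a hypothetical keyword-and-length heuristic standing in for a small distilled router model, and the `Route` entries, costs, and threshold are illustrative values, not figures from the article.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Route:
    name: str
    cost_per_call: float           # illustrative relative cost, not real pricing
    handler: Callable[[str], str]  # stand-in for a model call

def difficulty_score(prompt: str) -> float:
    """Hypothetical stand-in for a distilled router model; returns a score in [0, 1]."""
    hard_markers = ("prove", "derive", "refactor", "multi-step", "debug")
    hits = sum(marker in prompt.lower() for marker in hard_markers)
    length_term = min(len(prompt.split()) / 200.0, 1.0)
    return min(1.0, 0.3 * hits + 0.5 * length_term)

def route(prompt: str, small: Route, large: Route, threshold: float = 0.3) -> Route:
    """Send easy prompts to the cheap model and hard ones to the large model."""
    return large if difficulty_score(prompt) >= threshold else small

# Placeholder handlers; real ones would call the respective models.
small = Route("small", 1.0, lambda p: f"[small] {p}")
large = Route("large", 20.0, lambda p: f"[large] {p}")

easy = route("What is the capital of France?", small, large)
hard = route("Prove that the algorithm terminates and debug the loop", small, large)
```

With these assumed values, the short factual question goes to the small model and the proof-and-debug request goes to the large one. The threshold sets the trade-off between cost and answer quality and would need tuning against real traffic.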
Topics
- Finetuning Deprecation
- AI Agent Systems
- LLM Inference Optimization
- Research Benchmarking
- Multimodal AI Models
Best for: CTO, AI Architect, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Latent.Space (www.latent.space).