Data Machina #262

2019-03-12 · Source: Data Machina · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Advanced, quick

Summary

The latest AI Radar update highlights several advances in large language models and AI systems. Mistral AI, in collaboration with NVIDIA, released Mistral NeMo 12B, a 12-billion-parameter, multi-lingual model with a 128k-token context length under the Apache 2.0 license, though its reported benchmark accuracy is under community scrutiny. Stanford introduced TextGrad, a framework that performs automatic "differentiation" via textual feedback from LLMs to improve the zero-shot capabilities of compound AI systems. Tencent proposed patch-level training, a technique that compresses multiple tokens into a single patch to reduce LLM training costs by 50%. Additionally, Stanford's STORM project released an open-source generative writing system that simulates expert conversations and curates information into an outline, then produces grounded, long-form articles comparable to Wikipedia pages.

Key takeaway

AI Architects evaluating new LLM technologies should investigate Mistral NeMo 12B for its multi-lingual capabilities and 128k context, while noting the community's questions about its benchmarks. Consider Stanford's TextGrad for improving the zero-shot performance of compound AI systems through textual feedback, and Tencent's patch-level training for potentially halving the computational cost of LLM training. These advances offer pathways to more efficient and capable AI deployments.

Key insights

New models and frameworks are advancing LLM efficiency, reasoning, and long-form content generation.

Principles

Context length is critical for multi-lingual models.
Textual feedback can drive AI system improvement.
Token compression reduces LLM training costs.
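
The second principle, improvement driven by textual feedback, can be illustrated with a toy loop. This is a minimal sketch and not the TextGrad library's API: `optimize_text`, `stub_critic`, and `stub_editor` are hypothetical names, and the two stubs are deterministic stand-ins for what would be LLM calls in a real system.

```python
from typing import Callable


def optimize_text(
    variable: str,
    critic: Callable[[str], str],
    editor: Callable[[str, str], str],
    steps: int,
) -> str:
    """Revise a piece of text using textual feedback in place of a numeric gradient."""
    for _ in range(steps):
        feedback = critic(variable)  # the textual "gradient"
        if not feedback:
            break  # the critic has nothing left to criticize
        variable = editor(variable, feedback)  # the "optimizer step"
    return variable


def stub_critic(text: str) -> str:
    """Hypothetical stand-in for an LLM that critiques a prompt."""
    if "step by step" not in text:
        return "Ask for step-by-step reasoning."
    if "final answer" not in text:
        return "Ask for a clearly marked final answer."
    return ""


def stub_editor(text: str, feedback: str) -> str:
    """Hypothetical stand-in for an LLM that rewrites a prompt given feedback."""
    if "step-by-step" in feedback:
        return text + " Think step by step."
    if "final answer" in feedback:
        return text + " State the final answer on its own line."
    return text


improved = optimize_text("Solve the problem.", stub_critic, stub_editor, steps=5)
print(improved)
```

In a real compound system the critic and editor would be model calls, and the variable being revised could be a prompt, an intermediate answer, or any other text-valued component.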

Method

TextGrad treats textual feedback from an LLM as the analogue of a gradient and backpropagates it through a compound AI system to improve the system's individual components. Tencent's patch-level training compresses multiple consecutive tokens into a single patch, then trains the model to predict the next patch.
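
The patch construction described above can be sketched in a few lines. This is an illustration under stated assumptions, not Tencent's implementation: averaging the token embeddings within a patch is an assumed compression rule, trailing tokens that do not fill a patch are simply dropped, and `to_patches` and `next_patch_targets` are hypothetical helper names.

```python
from typing import List, Tuple


def to_patches(token_embeddings: List[List[float]], patch_size: int) -> List[List[float]]:
    """Average each run of `patch_size` consecutive token embeddings into one patch.

    Averaging is an assumption made for this sketch.
    """
    usable = len(token_embeddings) - len(token_embeddings) % patch_size
    patches = []
    for start in range(0, usable, patch_size):
        group = token_embeddings[start:start + patch_size]
        dim = len(group[0])
        patches.append([sum(vec[d] for vec in group) / patch_size for d in range(dim)])
    return patches


def next_patch_targets(token_ids: List[int], patch_size: int) -> List[Tuple[int, List[int]]]:
    """Pair each input patch index with the token ids of the patch that follows it."""
    usable = len(token_ids) - len(token_ids) % patch_size
    groups = [token_ids[i:i + patch_size] for i in range(0, usable, patch_size)]
    return [(i, groups[i + 1]) for i in range(len(groups) - 1)]


embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 9.0]]
patches = to_patches(embeddings, patch_size=2)
targets = next_patch_targets([10, 11, 12, 13, 14, 15, 16], patch_size=2)
print(len(embeddings), "tokens ->", len(patches), "patches")
print(targets)
```

With a patch size of K the model processes roughly 1/K as many positions per sequence, which is where the compute saving in this sketch comes from.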

In practice

Explore Mistral NeMo 12B for multi-lingual applications.
Investigate TextGrad for enhancing LLM zero-shot performance.
Consider patch-level training for cost-efficient LLM pre-training.

Topics

Mistral NeMo
LLM Training Optimization
Compound AI Systems
Generative Writing
MLOps

Best for: AI Scientist, Research Scientist, AI Architect, AI Engineer, Machine Learning Engineer, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by Data Machina.