Import AI 444: LLM societies; Huawei makes kernels with AI; ChipBench
Summary
Recent research shows both advanced capabilities and persistent limits in AI systems. Researchers at Google, the University of Chicago, and the Santa Fe Institute found that large language models (LLMs) like DeepSeek-R1 and QwQ-32B simulate "societies of thought" with distinct personalities and expertise to solve complex problems, a phenomenon not seen in base pre-trained models. Concurrently, a new benchmark called ChipBench, developed by UC San Diego and Columbia University, reveals that frontier LLMs struggle with real-world chip design tasks, including Verilog coding, debugging, and reference model generation, achieving low pass@1 scores. Separately, Google DeepMind's Gemini-based LLM, Aletheia, solved two previously open Erdős mathematical problems, demonstrating AI's potential in scientific discovery but also underscoring the significant human effort required to filter and validate AI-generated solutions. Huawei and Nanjing University also developed AscendCraft, an LLM-guided system that automates kernel design for Huawei's Ascend chips, which are programmed in AscendC. It achieves 98.1% compilation success and 90.4% functional correctness, with 46.2% of generated kernels matching or exceeding PyTorch eager execution performance.
Key takeaway
Research scientists evaluating AI capabilities should recognize that, while LLMs can achieve breakthroughs in abstract reasoning and even solve open mathematical problems, their performance on practical, domain-specific engineering tasks like chip design remains limited. Focus on developing robust, real-world benchmarks and specialized scaffolding techniques to bridge the gap between general LLM intelligence and industrial application, and plan for significant human oversight when validating AI-generated scientific discoveries.
Key insights
Advanced LLMs simulate internal "societies of thought" for complex reasoning and can solve open math problems, yet struggle with real-world chip design.
Principles
- Enhanced reasoning emerges from implicit multi-agent interactions.
- AI for science requires substantial human validation.
- LLMs need scaffolds for obscure hardware design.
Method
AscendCraft uses a two-stage pipeline: an LLM first generates a high-level DSL program describing the kernel computation, and that program is then transcompiled into AscendC code through a series of structured LLM-based lowering passes.
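The two-stage shape described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not AscendCraft's actual implementation: the `LLM` callable type, the function names, the prompt wording, and the list of pass instructions are all hypothetical, and the real DSL and lowering passes are not specified in this summary.

```python
from typing import Callable, List

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


def generate_dsl(llm: LLM, task: str) -> str:
    """Stage 1 (assumed shape): ask the model for a high-level DSL program."""
    return llm(f"Write a high-level DSL program for this kernel:\n{task}")


def lower_to_target(llm: LLM, dsl_program: str, passes: List[str]) -> str:
    """Stage 2 (assumed shape): apply a fixed sequence of LLM-based lowering passes.

    Each pass receives its instruction followed by the output of the previous pass.
    """
    code = dsl_program
    for instruction in passes:
        code = llm(f"{instruction}\n\n{code}")
    return code


def build_kernel(llm: LLM, task: str, passes: List[str]) -> str:
    """Run both stages: DSL generation, then structured lowering."""
    return lower_to_target(llm, generate_dsl(llm, task), passes)
```

Passing a stub function as `llm` makes it possible to exercise the control flow without calling a real model. Splitting the work into small, ordered passes keeps each model call narrowly scoped, which is one plausible reason a pipeline like this would help on a niche target language.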
In practice
- Use multi-perspective prompting for complex LLM tasks.
- Develop specialized benchmarks for real-world AI applications.
- Implement DSL-guided LLM pipelines for niche hardware.
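When scoring a specialized benchmark, results are often reported as pass@k, as with the pass@1 scores cited for ChipBench. The sketch below uses the standard unbiased pass@k estimator from the code-generation literature, pass@k = 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them pass. Whether ChipBench computes its scores exactly this way is an assumption, and the function names are illustrative.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: attempt budget.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k includes a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n, c) for one problem."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, the fraction of samples that pass, so a low pass@1 score means most single attempts fail.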
Topics
- LLM Reasoning
- AI Chip Design
- Mathematical Discovery
- Hardware Optimization
- LLM Benchmarking
Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Import AI.