Import AI 444: LLM societies; Huawei makes kernels with AI; ChipBench

2025-10-13 · Source: Import AI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Mathematics & Computational Sciences · Depth: Intermediate, long

Summary

Recent research highlights advanced capabilities and persistent challenges in AI. Google, University of Chicago, and Santa Fe Institute researchers found that reasoning-focused large language models (LLMs) such as DeepSeek-R1 and QwQ-32B simulate "societies of thought", with distinct personalities and areas of expertise, to solve complex problems, a phenomenon not seen in base pre-trained models. Concurrently, a new benchmark called ChipBench, developed by UC San Diego and Columbia University, reveals that frontier LLMs struggle with real-world chip design tasks, including Verilog coding, debugging, and reference model generation, where they achieve low pass@1 scores. Separately, Google DeepMind's Gemini-based LLM, Aletheia, solved two previously open Erdős mathematical problems, demonstrating AI's potential in scientific discovery but also underscoring the significant human effort required to filter and validate AI-generated solutions. Huawei and Nanjing University also developed AscendCraft, an LLM-guided system that automates kernel design in AscendC, the programming language for Huawei's Ascend chips. It achieves 98.1% compilation success and 90.4% functional correctness, with 46.2% of generated kernels matching or exceeding PyTorch eager execution performance.
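The pass@1 metric mentioned above can be made concrete with a short sketch. The summary does not say how ChipBench computes its scores, so this is an assumption: it uses the unbiased pass@k estimator that is common in code-generation benchmarks, where n samples are drawn per task and c of them pass the tests. The function name and parameters are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes.

    n: samples generated for a task
    c: how many of those samples passed the tests
    k: sample budget being scored (k=1 gives pass@1)

    Assumption: this is the widely used unbiased estimator,
    1 - C(n - c, k) / C(n, k); ChipBench's exact scoring may differ.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For k = 1 the estimator reduces to c / n, the fraction of sampled solutions that pass, so a low pass@1 means most single attempts fail.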

Key takeaway

Research scientists evaluating AI capabilities should recognize that, while LLMs can achieve breakthroughs in abstract reasoning and even solve open mathematical problems, their performance on practical, domain-specific engineering tasks such as chip design remains limited. Focus on developing robust, real-world benchmarks and specialized scaffolding techniques to bridge the gap between general LLM capability and industrial application, and plan for significant human oversight when validating AI-generated scientific discoveries.

Key insights

Advanced LLMs simulate internal "societies of thought" for complex reasoning and can solve open math problems, yet struggle with real-world chip design.

Principles

Enhanced reasoning emerges from implicit multi-agent interactions.
AI for science requires substantial human validation.
LLMs need scaffolds for obscure hardware design.

Method

AscendCraft uses a two-stage pipeline: an LLM first generates a high-level DSL program describing the kernel computation, and that program is then transcompiled into AscendC code through structured LLM-based lowering passes.
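The control flow of such a two-stage pipeline might look roughly like the sketch below. It is not AscendCraft's implementation: the `LLM` callable, the `LoweringPass` type, the prompt wording, and every function name are hypothetical, and the sketch only shows the shape of "generate a DSL program, then lower it through a sequence of LLM passes".

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


@dataclass(frozen=True)
class LoweringPass:
    """One structured rewriting step (name and instruction are illustrative)."""
    name: str
    instruction: str


def generate_dsl(llm: LLM, kernel_spec: str) -> str:
    # Stage 1: ask the model for a high-level DSL program for the kernel.
    return llm(f"Write a high-level DSL program for this kernel:\n{kernel_spec}")


def lower(llm: LLM, dsl_program: str, passes: Sequence[LoweringPass]) -> str:
    # Stage 2: apply the lowering passes in order, feeding each pass
    # the output of the previous one.
    code = dsl_program
    for p in passes:
        code = llm(f"[{p.name}] {p.instruction}\n\n{code}")
    return code


def build_kernel(llm: LLM, kernel_spec: str, passes: Sequence[LoweringPass]) -> str:
    # Chain both stages: specification -> DSL program -> target code.
    return lower(llm, generate_dsl(llm, kernel_spec), passes)
```

Splitting the work this way lets the model first reason about the computation in a compact form and only afterwards deal with target-specific details, one pass at a time.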

In practice

Use multi-perspective prompting for complex LLM tasks.
Develop specialized benchmarks for real-world AI applications.
Implement DSL-guided LLM pipelines for niche hardware.
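As a minimal sketch of the first item, multi-perspective prompting can be approximated by asking the same question under several personas and then asking the model to reconcile the answers. This is an explicit prompting pattern chosen for illustration, not the internal mechanism the researchers observed in reasoning models; the `llm` callable, the persona strings, and the prompt wording are all assumptions.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


def perspective_prompt(persona: str, question: str) -> str:
    # Ask for an answer from a single, clearly stated point of view.
    return (
        f"You are {persona}. Answer from that perspective only.\n\n"
        f"Question: {question}"
    )


def synthesis_prompt(question: str, answers: Mapping[str, str]) -> str:
    # Present every perspective's answer and ask for one reconciled reply.
    parts = [f"Question: {question}", "", "Perspectives:"]
    for persona, answer in answers.items():
        parts.append(f"- {persona}: {answer}")
    parts.append("")
    parts.append("Reconcile the disagreements and give one final answer.")
    return "\n".join(parts)


def multi_perspective_answer(llm: LLM, question: str, personas: Sequence[str]) -> str:
    if not personas:
        raise ValueError("at least one persona is required")
    # One call per persona, then a final call that merges the results.
    answers = {p: llm(perspective_prompt(p, question)) for p in personas}
    return llm(synthesis_prompt(question, answers))
```

The pattern costs one model call per persona plus one for synthesis, so it is best reserved for tasks where a single pass tends to miss considerations.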

Topics

LLM Reasoning
AI Chip Design
Mathematical Discovery
Hardware Optimization
LLM Benchmarking

Code references

zhongkaiyu/ChipBench

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Import AI.