How we built an internal data analytics agent

2026-06-19 · Source: The GitHub Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, short

Summary

GitHub has built Qubot, an internal Copilot-powered analytics agent that gives its employees (known as Hubbers) self-serve access to data. According to a GitHub Blog post published on June 19, 2026, Hubbers can ask natural-language questions about GitHub's data warehouse from Slack, VS Code, or the Copilot CLI and receive answers within seconds. The architecture comprises a user interface, a federated context layer, and a query engine that connects to Kusto and Trino. The context layer, enriched by product and analytics teams, is central to Qubot's accuracy and speed and is credited with making the agent three times faster. An offline evaluation framework built on curated test cases and automated runs checks accuracy and catches regressions. Qubot has seen wide adoption, with hundreds of users running thousands of queries, which has reduced reliance on dedicated analytics support and centralized previously distributed data knowledge.

Key takeaway

For AI engineers and MLOps teams weighing internal data analytics solutions, a Copilot-powered agent like Qubot can broaden data access across the organization. Letting colleagues answer exploratory questions themselves reduces the load on data analysts. Prioritize a robust, federated context layer and an evaluation framework: together they keep answers accurate and fast, centralize distributed knowledge, and support faster decision-making.

Key insights

A Copilot-powered analytics agent can provide self-serve data access, reducing reliance on dedicated support.

Principles

Federated context layers enhance agent accuracy and speed.
Standardized templates streamline context contribution.
Offline evaluation frameworks are critical for agent quality.
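
To illustrate the idea of a standardized template for context contributions, here is a minimal sketch of a check that could run before a contribution is accepted. The field names (`table`, `owner`, `description`, `example_questions`) and the `validate_entry` helper are assumptions made for this example; the source does not describe Qubot's actual template.

```python
# Hypothetical template: the required fields are illustrative assumptions,
# not the fields used by Qubot's real context layer.
REQUIRED_FIELDS = ("table", "owner", "description", "example_questions")


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one context-layer contribution."""
    problems = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if field not in entry
    ]
    # A description that is present but blank gives the agent nothing to use.
    if "description" in entry and not str(entry["description"]).strip():
        problems.append("description is empty")
    return problems
```

An empty list means the entry fits the template; anything else can be reported back to the contributor before the change is merged.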

Method

Build an analytics agent with a UI, a federated context layer (bronze, silver, gold data), and a query engine (Kusto/Trino) that automatically selects the appropriate backend.
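
The backend-selection step above can be sketched as follows. The source does not say how Qubot chooses between Kusto and Trino, so this example assumes one simple approach: each context-layer entry records which backend hosts the table, and the router picks the backend shared by all referenced tables. The `TableContext` schema, the `CATALOG` entries, and `select_backend` are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    KUSTO = "kusto"
    TRINO = "trino"


@dataclass(frozen=True)
class TableContext:
    """One context-layer entry (hypothetical schema)."""
    name: str
    backend: Backend
    tier: str  # "bronze", "silver", or "gold"
    description: str


# Hypothetical catalog; real entries would come from the context layer.
CATALOG = {
    "events_raw": TableContext(
        "events_raw", Backend.KUSTO, "bronze", "Raw product telemetry"
    ),
    "daily_active_users": TableContext(
        "daily_active_users", Backend.TRINO, "gold", "Curated DAU rollup"
    ),
}


def select_backend(table_names: list[str]) -> Backend:
    """Return the single backend that hosts every referenced table."""
    backends = {CATALOG[name].backend for name in table_names}
    if len(backends) != 1:
        # Zero tables or tables split across engines: refuse to guess.
        raise ValueError(
            f"tables span {len(backends)} backends; cannot route one query"
        )
    return backends.pop()
```

Routing on catalog metadata rather than on the wording of the question keeps the decision deterministic, which is one possible design; a production agent might instead let the model propose a backend and validate it against the catalog.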

In practice

Integrate agents into Slack, VS Code, and CLI for accessibility.
Use pull requests for context layer changes.
Benchmark agent performance with curated test cases.
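
Benchmarking against curated test cases can be sketched as a small offline harness. This is an illustration under stated assumptions, not GitHub's evaluation framework: `EvalCase`, `run_eval`, `is_regression`, and the exact-match scoring rule are hypothetical, and the agent is modeled as any function that maps a question to an answer string.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """A curated question with its known-good answer (hypothetical shape)."""
    question: str
    expected: str


def run_eval(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases the agent answers with an exact match."""
    if not cases:
        raise ValueError("at least one evaluation case is required")
    passed = sum(
        1 for case in cases if agent(case.question).strip() == case.expected
    )
    return passed / len(cases)


def is_regression(accuracy: float, baseline: float, tolerance: float = 0.0) -> bool:
    """Flag a run whose accuracy falls below the stored baseline."""
    return accuracy < baseline - tolerance


def stub_agent(question: str) -> str:
    """Stand-in for the real agent so the harness can run offline."""
    canned = {"How many repos were created last week?": "1200"}
    return canned.get(question, "unknown")
```

Running `run_eval` on every change to the context layer and comparing the result with the last accepted score through `is_regression` gives a simple automated gate. Exact string matching is the crudest possible scorer; comparing query results or numeric values within a tolerance would be a natural refinement.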

Topics

AI Agents
Data Analytics
GitHub Copilot
Self-serve Data
Evaluation Frameworks
Trino
Kusto

Best for: AI Engineer, MLOps Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by The GitHub Blog.