Query Tags: The Context Your Warehouse Queries Have Been Missing
Summary
Databricks is introducing Query Tags in Public Preview, a feature that lets users attach business context to every SQL execution as custom key-value pairs (e.g. "project": "finance_planning"). This fills a gap in standard query logs, which often show only who ran a query, on which warehouse, and from which tool. Query Tags are recorded in the Query History System Table, where workloads can be grouped, filtered, and analyzed by tag. The feature has already seen strong adoption, with hundreds of customers tagging millions of queries weekly. It supports three primary scenarios: automatically propagating identifiers from partner tools like dbt, Power BI, and Tableau; attaching metadata such as "customerid" or "applicationname" to queries from custom applications via APIs and connectors; and letting analysts label ad-hoc work with dimensions like "dev vs. prod environment" or "cost center" directly in the Databricks UI.
Key takeaway
For MLOps Engineers or Data Engineers managing shared Databricks SQL warehouses, Query Tags make accurate cost allocation and performance monitoring practical. You can attribute spend to specific teams or projects and trace a performance regression back to its source, such as a slow dbt model or Power BI report. Start by configuring automatic tagging for partner tools, or add tags via connectors and SQL statements, to gain visibility into your workloads.
Key insights
Query Tags provide granular business context to SQL executions, enhancing workload traceability and cost attribution.
Principles
- Contextualize queries with key-value pairs.
- Automate tagging via partner tools/connectors.
- Centralize tag data in system tables.
Method
Attach custom key-value pairs to SQL executions at the connection or statement level, then analyze them in the Query History System Table.
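The method above can be illustrated with a small in-memory model. This is only a sketch and does not call any Databricks API: `TaggedSession`, `QueryRecord`, and `filter_by_tag` are hypothetical names, the `history` list merely stands in for the Query History System Table, and the rule that statement-level tags override connection-level tags on a key conflict is an assumption made for the example, not behavior confirmed by the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QueryRecord:
    """One row of a hypothetical, in-memory stand-in for a query history table."""
    statement: str
    tags: Dict[str, str]


@dataclass
class TaggedSession:
    """Hypothetical session that carries connection-level tags."""
    connection_tags: Dict[str, str]
    history: List[QueryRecord] = field(default_factory=list)

    def execute(self, statement: str,
                statement_tags: Optional[Dict[str, str]] = None) -> QueryRecord:
        # Assumption for this sketch: a statement-level tag wins when the
        # same key is also set at the connection level.
        merged = {**self.connection_tags, **(statement_tags or {})}
        record = QueryRecord(statement=statement, tags=merged)
        self.history.append(record)
        return record


def filter_by_tag(history: List[QueryRecord], key: str, value: str) -> List[QueryRecord]:
    """Return the recorded queries whose tag `key` equals `value`."""
    return [r for r in history if r.tags.get(key) == value]


# Connection-level tags apply to every statement run in the session.
session = TaggedSession(
    connection_tags={"project": "finance_planning", "environment": "prod"}
)
session.execute("SELECT 1")
# A statement-level tag relabels a single query.
session.execute("SELECT 2", statement_tags={"environment": "dev"})

dev_queries = filter_by_tag(session.history, "environment", "dev")
```

In a real warehouse the filtering step would be a query against the system table rather than a list comprehension; the point of the sketch is only that tags travel with each execution and become a filterable dimension afterwards.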
In practice
- Trace dbt model performance regressions.
- Allocate shared warehouse costs by team.
- Identify dev vs. prod workloads.
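As a rough illustration of the cost-allocation use case, the sketch below splits a shared warehouse bill across teams in proportion to tagged query runtime. Everything here is hypothetical: the rows, the `team` tag key, the `allocate_cost` helper, and the total cost figure are invented for the example, and using runtime as the allocation basis is an assumption rather than a method described in the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical query history rows: (tags, duration in seconds).
rows: List[Tuple[Dict[str, str], float]] = [
    ({"team": "finance", "environment": "prod"}, 120.0),
    ({"team": "marketing", "environment": "prod"}, 60.0),
    ({"team": "finance", "environment": "dev"}, 20.0),
    ({}, 40.0),  # an untagged query
]


def allocate_cost(rows: List[Tuple[Dict[str, str], float]],
                  tag_key: str, total_cost: float) -> Dict[str, float]:
    """Split total_cost across values of tag_key, weighted by query runtime."""
    seconds: Dict[str, float] = defaultdict(float)
    for tags, duration in rows:
        # Queries without the tag are grouped under "untagged" so that
        # unattributed spend stays visible instead of being dropped.
        seconds[tags.get(tag_key, "untagged")] += duration
    total_seconds = sum(seconds.values())
    if total_seconds == 0:
        return {}
    return {name: total_cost * s / total_seconds for name, s in seconds.items()}


shares = allocate_cost(rows, "team", total_cost=240.0)
for name, cost in sorted(shares.items()):
    print(f"{name}: {cost:.2f}")
```

Passing "environment" as `tag_key` instead of "team" gives the dev vs. prod split using the same function.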
Topics
- Databricks SQL
- Query Tags
- Cost Allocation
- Performance Monitoring
- dbt Integration
- Data Warehousing
Best for: CTO, VP of Engineering/Data, Director of AI/ML, Data Engineer, Analytics Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Databricks.