HybridCodeAuthorship: A Benchmark Dataset for Line-Level Code Authorship Detection

2026-06-10 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

HybridCodeAuthorship is a benchmark dataset introduced to address the growing challenge of detecting AI-generated code in industry codebases that increasingly blend human and AI contributions. Existing benchmarks often feature academic problems or assume that an entire code snippet is either human- or AI-authored. HybridCodeAuthorship instead provides Python code files with authentically interleaved human- and AI-authored lines. It is constructed with a pipeline that leverages CodeSearchNet to simulate real-world AI code assistant usage. Initial benchmarking with state-of-the-art algorithms, including AIGCode Detector, showed the benchmark to be challenging: the top algorithm achieved F1 scores of 0.48 on the chunk-level and 0.56 on the line-level detection task.

Key takeaway

Machine learning engineers developing code authorship detection systems should integrate the HybridCodeAuthorship benchmark into their evaluation pipelines. The dataset offers a more realistic assessment of algorithm performance on interleaved human- and AI-generated code, reflecting how AI assistants are actually used in industry. Because the top algorithm currently reaches F1 scores of only 0.48 (chunk-level) and 0.56 (line-level), research should focus on improving fine-grained detection to meet practical risk management and productivity analysis needs.

Key insights

Industry codebases require fine-grained, line-level detection of AI-generated code for risk and productivity analysis.

Principles

Existing code authorship benchmarks are insufficient for hybrid AI/human code.
Authentic AI assistant usage results in interleaved human and AI code.
Line-level AI code detection remains a challenging task.

Method

A dataset construction pipeline leverages CodeSearchNet to create Python files with interleaved human- and AI-authored lines, simulating real-world AI assistant usage.
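
One way to picture such a file is as a sequence of lines, each tagged with its author. The sketch below is illustrative only: the `LabeledLine` type, the `"human"` and `"ai"` label strings, and the `to_chunks` helper are assumptions rather than the dataset's actual schema, and treating a chunk as a maximal run of consecutive lines by the same author is just one plausible reading of "chunk-level".

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class LabeledLine:
    text: str
    author: str  # hypothetical label: "human" or "ai"


def to_chunks(lines):
    """Group consecutive lines by the same author into (author, [texts]) chunks."""
    return [
        (author, [ln.text for ln in run])
        for author, run in groupby(lines, key=lambda ln: ln.author)
    ]


# A toy hybrid file: a human-written function with an AI-suggested guard clause.
example = [
    LabeledLine("def mean(xs):", "human"),
    LabeledLine("    if not xs:", "ai"),
    LabeledLine("        return 0.0", "ai"),
    LabeledLine("    return sum(xs) / len(xs)", "human"),
]
chunks = to_chunks(example)
```

Under this assumed representation, a line-level detector predicts one label per `LabeledLine`, while a chunk-level detector predicts one label per run returned by `to_chunks`.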

In practice

Evaluate AI code detection algorithms on realistic hybrid code scenarios.
Develop new algorithms for line-level AI code authorship detection.
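
To make the evaluation step concrete, here is a minimal sketch of scoring a detector's per-line predictions with precision, recall, and F1. It assumes each line carries a `"human"` or `"ai"` label and that `"ai"` is the positive class; the function name `line_level_f1` is hypothetical, and the summary does not specify the benchmark's exact scoring protocol (for example, how scores are averaged across files).

```python
def line_level_f1(true_labels, pred_labels, positive="ai"):
    """Return (precision, recall, F1) for the positive class over per-line labels."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("true_labels and pred_labels must have the same length")

    pairs = list(zip(true_labels, pred_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy example: five lines, with one missed AI line and one false alarm.
truth = ["human", "ai", "ai", "human", "ai"]
preds = ["human", "ai", "human", "ai", "ai"]
precision, recall, f1 = line_level_f1(truth, preds)
```

In this toy case two AI lines are found, one is missed, and one human line is flagged, so precision, recall, and F1 all equal 2/3.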

Topics

HybridCodeAuthorship
Code Authorship Detection
AI Code Assistants
Benchmark Datasets
Python
CodeSearchNet

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.