AI Radar Research

Daily research digest for developers: Monday, May 25, 2026

arXiv

LLM Code Smells: A Taxonomy and Detection Approach

This paper presents a taxonomy of code smells specific to Large Language Models (LLMs) and proposes a detection approach to identify these smells in software systems.

Why it matters: Understanding and detecting code smells in LLMs can help developers improve the integration and maintainability of AI in software projects.
arXiv

Security of LLM-generated Code: A Comparative Analysis

This study examines the security implications of code generated by Large Language Models (LLMs), comparing it against traditional code to identify potential vulnerabilities.

Why it matters: Ensuring the security of LLM-generated code is crucial for safe deployment in real-world applications.
arXiv

Evaluating Large Language Models in a Complex Hidden Role Game

This research evaluates the reasoning, persuasion, and deception capabilities of LLMs within a complex hidden role game, providing insights relevant to AI safety.

Why it matters: Understanding LLMs' capabilities in complex scenarios is essential for developing safe and reliable AI systems.
arXiv

Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems

This paper introduces a new metric for measuring energy consumption in agentic AI systems, focusing on energy per successful goal rather than per model invocation.
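
The digest does not give the paper's exact formula, so the following is only a minimal sketch of how a goal-level metric might differ from a per-invocation one. It assumes that energy spent on failed attempts is charged to the successful goals; the `GoalRun` record and both function names are hypothetical and used for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class GoalRun:
    """One attempt at a goal (hypothetical record type for this sketch)."""
    invocation_energy_joules: list[float]  # energy of each model call in the attempt
    succeeded: bool


def energy_per_invocation(runs: list[GoalRun]) -> float:
    """Average energy per model call, ignoring whether goals succeeded."""
    calls = [e for run in runs for e in run.invocation_energy_joules]
    return sum(calls) / len(calls) if calls else 0.0


def energy_per_successful_goal(runs: list[GoalRun]) -> float:
    """Total energy spent on all attempts, divided by the number of successes.

    Assumption: energy spent on failed attempts is charged to the successes.
    """
    total = sum(sum(run.invocation_energy_joules) for run in runs)
    successes = sum(1 for run in runs if run.succeeded)
    return total / successes if successes else math.inf


runs = [
    GoalRun([10.0, 20.0], succeeded=True),
    GoalRun([30.0], succeeded=False),
    GoalRun([40.0], succeeded=True),
]
print(energy_per_invocation(runs))
print(energy_per_successful_goal(runs))
```

In this toy data, 100 J is spent across four model calls, giving 25 J per invocation, but only two goals succeed, giving 50 J per successful goal. The failed attempt is invisible in the first number and doubles the second.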

Why it matters: Optimizing energy efficiency in AI systems can lead to more sustainable and cost-effective deployments.
arXiv

EVE-Agent: Evidence-Verifiable Self-Evolving Agents

EVE-Agent introduces a framework for self-evolving agents that generate and verify their own learning data without human intervention, enhancing autonomy and scalability.

Why it matters: Self-evolving agents can reduce the need for human oversight, making AI systems more autonomous and scalable.
arXiv

On the Reliability of Code Comprehension Proxies

This study investigates the reliability of various proxies used to assess code comprehension, providing insights into their effectiveness and limitations.

Why it matters: Reliable code comprehension proxies are essential for evaluating and improving AI-assisted code review tools.
arXiv

The Impact of AI Coding Assistants on Software Engineering: A Longitudinal Study

This longitudinal study examines the effects of AI coding assistants on software engineering practices, with attention to task focus, developer experience, and productivity.

Why it matters: Understanding the impact of AI coding assistants can guide their integration into software development workflows.
arXiv

AI Assurance: A Comprehensive Testing Strategy for Enterprise AI Systems

This paper proposes a comprehensive testing strategy for enterprise AI systems, addressing the unique risks associated with large language models and autonomous agents.

Why it matters: Effective testing strategies are crucial for ensuring the reliability and safety of enterprise AI systems.
arXiv

Philosophical Dispositions as Behavioral Constraints for AI-Assisted Code Review: An Empirical Study

This empirical study explores the use of philosophical dispositions as constraints on AI-assisted code review tools, aiming to produce more context-sensitive and varied analyses.

Why it matters: Incorporating philosophical dispositions can enhance the contextual sensitivity of AI code review tools.
arXiv

A measurement substrate for agentic Kubernetes operations: Methodology and a case study in retrieval-compounding falsification

This paper presents a methodology for measuring agentic operations in Kubernetes, with a case study in retrieval-compounding falsification, aimed at improving the reliability of empirical claims.

Why it matters: Reliable measurement methodologies are essential for validating the performance of autonomous systems in cloud environments.