AI News

⚡ 13 minutes ago
1. Qift: Shift-Friendly No-Zero W2 Post-Training Quantization for Rotated W2A4/KV4 LLM Inference (arxiv.org)
2. ClinicalMC: A Benchmark for Multi-Course Clinical Decision-Making with Large Language Models (arxiv.org)
3. Contrastive Neural Algorithmic Reasoning for Graph Coloring (arxiv.org)
4. When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning (arxiv.org)
5. Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks (arxiv.org)
6. Consistency Training Can Entrench Misalignment (arxiv.org)
7. Large AI Models in Dental Healthcare: From General-Purpose Systems to Domain-Specific Foundation Models (arxiv.org)
8. BotDirector: Robot Storytelling Across the Symmetrical Reality with Multi-modal Interactions (arxiv.org)
9. Inducing Reasoning Primitives from Agent Traces (arxiv.org)
10. AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification (arxiv.org)
11. Binary Road Surface Classification Using Machine Learning on Production Vehicle Signals During Cruising (arxiv.org)
12. Representational Capacity: Geometric Limits on Feature Representation in Transformer Language Models (arxiv.org)
13. What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents (arxiv.org)
14. Mitigating Spurious Correlations with Memorization-Guided Dataset De-Biasing (arxiv.org)
15. Learning Coherent Representations: A Topological Approach to Interpretability (arxiv.org)
16. WISE-HAR: A Generalizable Ensemble Deep Learning Framework for WiFi-Based Human Activity Recognition (arxiv.org)
17. RRISE: Robust Radius Inference via a Surrogate Estimator (arxiv.org)
18. RESCAST-100K: A Comprehensive Dataset for Cross-Domain Residential Load and Indoor Temperature Forecasting (arxiv.org)
19. ToolGate: Token-Efficient Pre-Call Control for Tool-Augmented Vision-Language Agents (arxiv.org)
20. RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases (arxiv.org)
21. Solipsistic Superintelligence is Unlikely to be Cooperative (arxiv.org)
22. CORE: Conflict-Oriented Reasoning for General Multimodal Manipulation Detection (arxiv.org)
23. FRED: A Multi-Modal Autonomous Driving Dataset for Flooded Road Environments (arxiv.org)
24. DELTAMEM: Incremental Experience Memory for LLM Agents via Residual Trees (arxiv.org)
25. Fully Automated Identification of Lexical Alignment and Preference-Stage Shifts in Large Language Models (arxiv.org)
26. AI-Generated Traces for Novice Programmers: Learning Effects and Learner Differences in a Multi-Institutional Study (arxiv.org)
27. A Nonmonotone Gradient-Based Algorithm for Symmetric Nonnegative Matrix Factorization and Graph Clustering (arxiv.org)
28. Human-in-the-Loop Contextual Bandits for Short-Term Rental Dynamic Pricing: Structural Equivalence of Historical Warm-Up and Approval-Gated Live Learning (arxiv.org)
29. Distilling Answer-Set Programming Rules from LLMs for Neurosymbolic Visual Question Answering (arxiv.org)
30. Uncertainty-Aware Clarification in LLM Agents with Information Gain (arxiv.org)
31. MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents (arxiv.org)
32. Think-Before-Speak: From Internal Evaluation to Public Expression in Multi-Agent Social Simulation (arxiv.org)
33. LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks (arxiv.org)
34. What Makes Interaction Trajectories Effective for Training Terminal Agents? (arxiv.org)
35. Towards Non-Monotonic Entailment in Propositional Defeasible Standpoint Logic (arxiv.org)
36. Diagnosing Knowledge Gaps in LLM Tool Use: An Agentic Benchmark for Novel API Acquisition (arxiv.org)
37. Conditional Latent Diffusion Model with Fourier-based Motion Modelling for Virtual Population Synthesis (arxiv.org)
38. Fairness Definitions and Metrics in Deep Reinforcement Learning for Drug Discovery in Healthcare: A Rapid Evidence Review (arxiv.org)
39. Constitutional On-Policy Safe Distillation (arxiv.org)
40. Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward (arxiv.org)
41. $\mathbb{R}^{2k}$ is Theoretically Large Enough for Embedding-based Top-$k$ Retrieval (arxiv.org)
42. The DeepSpeak-Agentic Dataset (arxiv.org)
43. Synthetic Hallucinations, Real Gains: Hard Negatives from Frontier Models for FIM Hallucination Mitigation (arxiv.org)
44. Evaluating LLMs' Effectiveness on Real-World Consumer Device Repair Questions (arxiv.org)
45. SkillPyramid: A Hierarchical Skill Consolidation Framework for Self-Evolving Agents (arxiv.org)
46. Dynamic Objective Selection with Safeguards and LLM Oversight for Financial Decision-Making (arxiv.org)
47. HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models (arxiv.org)
48. WebRISE: Requirement-Induced State Evaluation for MLLM-Generated Web Artifacts (arxiv.org)
49. Enhancing Protein-Protein Interaction Prediction with Hierarchical Motif-based Multimodal Protein Embedding (arxiv.org)
50. Towards Compact Autonomous Driving Perception with Balanced Learning and Multi-sensor Fusion (arxiv.org)