AI News

⚡ 13 minutes ago
1
1
From AGI to ASI (arxiv.org)
2
1
Deployment-Centered Evaluation: Predicting Query-Level Rejection Risk in a Clinical LLM System (arxiv.org)
3
1
DailyReport: An Open-ended Benchmark for Evaluating Search Agents on Daily Search Tasks (arxiv.org)
4
1
HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness (arxiv.org)
5
1
Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior (arxiv.org)
6
1
Benchmarking AI Agents for Addressing Scientific Challenges Across Scales (arxiv.org)
7
1
Reducing the Complexity of Deep Learning Models for EEG Analysis on Wearable Devices (arxiv.org)
8
1
Prefill Awareness in Large Language Models (arxiv.org)
9
1
Constructing Evaluation Datasets for Procedural Reasoning: Balancing Naturalness, Grounding, and Multi-Hop Coverage (arxiv.org)
10
1
Teach-and-Repeat: Accurately Extracting Operational Knowledge from Mobile Screen Demonstrations to Empower GUI Agents (arxiv.org)
11
1
GeoNatureAgent Benchmark: Benchmarking LLM Agents for Environmental Geospatial Analysis Across Frontier and Open-Weight Foundation Models (arxiv.org)
12
1
Topical Phase Transitions in Artificial Intelligence Research: Large-Scale Evidence and an Early-Warning Signature for Emerging Topics (arxiv.org)
13
1
Fantastic Scientific Agents and How to Build Them: AgentBuild for Rietveld Refinement (arxiv.org)
14
1
(Human) Attention Is (Still) All You Need: Human oversight makes AI-assisted social science reliable (arxiv.org)
15
1
Can I Buy Your KV Cache? (arxiv.org)
16
1
IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing (arxiv.org)
17
1
A Quantitative Experimental Repeated Measures Study of Training Dynamics in a Small Llama Style Language Model Under a Compute-Aware Token Budget (arxiv.org)
18
1
MiniMax Sparse Attention (arxiv.org)
19
1
Optimizing Appliance Scheduling for Solar Energy Management Using Metaheuristic Algorithms (arxiv.org)
20
1
Evaluation Sovereignty in Metadata-Driven Classification: A Multi-Track Framework for Weakly Supervised Information Systems (arxiv.org)
21
1
Why Sampling Is Not Choosing: Intentionality, Agency, and Moral Responsibility in Large Language Models (arxiv.org)
22
1
CloudCons: A Comprehensive End-to-End Benchmark for Cloud Resource Consolidation (arxiv.org)
23
1
Uncertainty-Aware Hybrid Retrieval for Long-Document RAG (arxiv.org)
24
1
Is It You or Your Environment? A Bayesian Inference Framework for Genomically-Anchored Personalized Physiological Interpretation (arxiv.org)
25
1
A Three-Layer Framework for AI in Scientific Discovery (arxiv.org)
26
1
Multiagent Protocols with Aggregated Confidence Signals (arxiv.org)
27
1
Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch (arxiv.org)
28
1
Reasoning as Pattern Matching: Shared Mechanisms in Human and LLM Everyday Reasoning (arxiv.org)
29
1
AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility (arxiv.org)
30
1
Beyond Runtime Enforcement: Shield Synthesis as Defensibility Analysis for Adversarial Networks (arxiv.org)
31
1
Before You Think: System 0, AI-Mediated Cognition and Cognitive Colonization (arxiv.org)
32
1
EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery (arxiv.org)
33
1
Agents-K1: Towards Agent-native Knowledge Orchestration (arxiv.org)
34
1
Automated reproducibility assessments in the social and behavioral sciences using large language models (arxiv.org)
35
1
AI SciBrief as a Gateway to Research: A Framework for Onboarding Students into New Research Areas (arxiv.org)
36
1
GeoDial: A Multimodal Conversational Tutoring Dataset for Geometry Problem-Solving with Visual Tutor Turns (arxiv.org)
37
1
The AI Legal Specialist: A Juridically Autonomous Professional Profile for AI Governance (arxiv.org)
38
1
Eigenism: Ethics for a Human-AI Future (arxiv.org)
39
1
Creating and Evaluating K-12 GenAI Assessment Graders Through Context Engineering (arxiv.org)
40
1
The Challenges of Balancing AI Compliance and Technological Innovations in Critical Sectors: A Systematic Literature Review (arxiv.org)
41
1
AI-Automation Tooling in Computer Engineering Education: Mixed-Methods TAM/UTAUT Evidence for a General Acceptance Attitude (arxiv.org)
42
1
Boosting Direct Preference Optimization with Penalization (arxiv.org)
43
1
Mapping AI Programs in the U.S: A Status Report from Early 2026 and an Analysis of AI Majors and Minors (arxiv.org)
44
1
Muse Spark Safety & Preparedness Report (arxiv.org)
45
1
Will AI Agents Free Us From Meaningless Work? A Human-Centered Analysis (arxiv.org)
46
1
Algorithmic Constitutionalism (arxiv.org)
47
1
Position: Generative Engine Optimization Creates Underexamined Risks, Governance Must Target Concentration, Disclosure, and Academic Blind Spots (arxiv.org)
48
1
MP3: Multi-Period Pattern Pre-training for Spatio-Temporal Forecasting (arxiv.org)
49
1
NaturalFlow: Reducing Disruptive Pauses for Natural Speech Flow in Simultaneous Speech-to-Speech Translation (arxiv.org)
50
1
Select and Improve: Understanding the Mechanics of Post-Training for Reasoning (arxiv.org)