AI News

⚡ 6 minutes ago
1. ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics (arxiv.org)
2. Trace2Policy: From Expert Behavior Traces to Self-Evolving Decision Agents (arxiv.org)
3. Soul Computing: A Theoretical Framework and Technical Architecture for Intelligent Agents with Independent Consciousness (arxiv.org)
4. A Unified Multi-Modal Framework for Intelligent Financial Systems: Integrating Reinforcement Learning, High-Frequency Trading, and Game-Theoretic Approaches with Cross-Modal Sentiment Analysis (arxiv.org)
5. STAGE-Claw: Automated State-based Agent Benchmarking for Realistic Scenarios (arxiv.org)
6. Reasoning or Memorization? Direction-Aware Diversity Exploration in LLM Reinforcement Learning (arxiv.org)
7. Self-Distillation Policy Optimization via Visual Feedback: Bridging Code and Visual Artifacts (arxiv.org)
8. Mobility Anomaly Generation using LLM-Driven Behavior with Kinematic Constraints (arxiv.org)
9. What Spatial Memory Must Store: Occlusion as the Test for Language-Agent Memory (arxiv.org)
10. From Context-Aware to Conflict-Aware: Generalizing Contrastive Decoding for Knowledge Conflict in LLMs (arxiv.org)
11. Sim2Schedule: A Simulator-Guided LLM Framework for Autonomous Open-Pit Mine Scheduling (arxiv.org)
12. Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction (arxiv.org)
13. RealMath-Eval: Why SOTA Judges Struggle with Real Human Reasoning (arxiv.org)
14. Regimes: An Auditable, Held-Out-Gated Improvement Loop Demonstrated on LongMemEval with ActiveGraph (arxiv.org)
15. Minimalist Genetic Programming (arxiv.org)
16. Operator Fusion for LLM Inference on the Tensix Architecture (arxiv.org)
17. Less Context, Better Agents: Efficient Context Engineering for Long-Horizon Tool-Using LLM Agents (arxiv.org)
18. Instruction Finetuning DeepSeek-R1-8B Model Using LoRA and NEFTune (arxiv.org)
19. Does Normalization Choice Matter for Causal Large Time-Series Models? (arxiv.org)
20. Aesthetic Perspectives in Information Systems Research: A Hermeneutic Analysis (arxiv.org)
21. Human-AI Teaming Through the Lens of Calibration (arxiv.org)
22. RAG over Thinking Traces Can Improve Reasoning Tasks (arxiv.org)
23. The hyper-scaled NLP bound for maximum-entropy remote sampling (arxiv.org)
24. AI Application Gives Users Real-Time Feedback on the Level of Peace in the Social Media Videos They Watch (arxiv.org)
25. SAFE: An LLM-as-Verifier Framework for Evidence-Grounded Multi-Hop Reasoning (arxiv.org)
26. Attention Expansion: Enhancing Keyphrase Extraction from Long Documents with Attention-Augmented Contextualized Embeddings (arxiv.org)
27. Effective Reinforcement Learning for Agentic Search by Recycling Zero-Variance Queries During Training (arxiv.org)
28. Using the YOLOv12 Model for Verifying the Correct Color Sequence of Wires in Network Cables (Patch Cords) on the Production Line (arxiv.org)
29. UniDexTok: A Unified Dexterous Hand Tokenizer from Real Data (arxiv.org)
30. Decentralized Multi-Agent Systems with Shared Context (arxiv.org)
31. Constructing coherent spatial memory in LLM agents through graph rectification (arxiv.org)
32. Position: The ML Community Must Build an AI-Augmented Peer-Review Ecosystem (arxiv.org)
33. A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications (arxiv.org)
34. A Survey on Semantic Modeling for Building Energy Management (arxiv.org)
35. Belief Acquisition as Stochastic Filtering (arxiv.org)
36. EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents (arxiv.org)
37. Piper: A Programmable Distributed Training System (arxiv.org)
38. Flaws in the LLM Automation Narrative (arxiv.org)
39. Provenance-Grounded Gating and Adaptive Recovery in Synthetic Post-Training Data Curation (arxiv.org)
40. Towards Autonomous Accelerator Design: FPGA Accelerator Generation with SECDA (arxiv.org)
41. FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model (arxiv.org)
42. PhantomBench: Benchmarking the Non-existential Threat of Language Models (arxiv.org)
43. RoboNaldo: Accurate, Stable and Powerful Humanoid Soccer Shooting via Motion-Guided Curriculum Reinforcement Learning (arxiv.org)
44. Modeling Complex Behaviors: Multi-Personality Composition and Dynamic Switching in Vision-Language Models (arxiv.org)
45. T1-Bench: Benchmarking Multi-Scenario Agents in Real-World Domains (arxiv.org)
46. Assessment of Personality Dimensions Across Situations in Dyadic Role-Play Scenarios (arxiv.org)
47. LLM-Aided Joint Secrecy Precoding and Trajectory for RSMA-Based Heterogeneous UAV Networks (arxiv.org)
48. A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design (arxiv.org)
49. ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark (arxiv.org)
50. Attacks on Machine-Text Detectors Retain Stylistic Fingerprints (arxiv.org)