AI News

1. The Reliability Gap in Benchmark Auditing: Distribution Shift and Scale as Failure Modes of Contamination Detection (arxiv.org)
2. A formal definition and meta-model for a machine theory of mind (arxiv.org)
3. Self-Soupervision: Cooking Model Soups without Labels (arxiv.org)
4. Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression (arxiv.org)
5. A Training-Free Mixture-of-Agents Framework for Multi-Document Summarization using LLMs and Knowledge Graphs (arxiv.org)
6. PieArena: Ranking and Profiling Language Agents in Realistic Negotiation Scenarios (arxiv.org)
7. Toward a Modular Architecture for Embedded AI Agent Systems at the Edge (arxiv.org)
8. Low-Frequency Shortcuts in Texture-Driven Visual Learning (arxiv.org)
9. Typhoon: Towards an Effective Task-Specific Masking Strategy for Pre-trained Language Models (arxiv.org)
10. Plan, Verify and Fill: A Structured Parallel Decoding Approach for Diffusion Language Models (arxiv.org)
11. Thinking Past the Answer: Evaluating Harmful Overthinking in Large Reasoning Models (arxiv.org)
12. Grasp-Then-Plan with Failure Attribution: A Closed Two-Stage Framework for Precise and Generalizable Robotic Manipulation (arxiv.org)
13. Correcting Neural Operator Spectral Bias via Diffusion Posterior Sampling with Sparse Observations (arxiv.org)
14. Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection (arxiv.org)
15. Testing the Test: Score-Direction Instability in Class-Split Anomaly Detection (arxiv.org)
16. SketchSong: Hierarchical Song Generation with Sketch Planning and Fine-Grained Multi-Track Modeling (arxiv.org)
17. Spatial Transcriptomics-Guided Alignment Enhances Molecular Profiling in Pathology Foundation Model (arxiv.org)
18. Causal Preference Elicitation (arxiv.org)
19. Echo-POSED: Geometric Self-Distillation for Echocardiography Guidance (arxiv.org)
20. A Scoping Review of the Ethical Perspectives on Anthropomorphising Large Language Model-Based Conversational Agents (arxiv.org)
21. Aligning Data-Driven Predictors with Allocation: A Decision-Focused Approach to Survival Analysis (arxiv.org)
22. Visual Graph Scaffolds for Structural Reasoning in Large Language Models (arxiv.org)
23. Building Better Activation Oracles (arxiv.org)
24. ReLoRA: Knowledge-Reusing Adaptation for Fast Rollout of Evolving LLM Services (arxiv.org)
25. Cross-Modal Contrastive Learning of ECG and Angiography Representations for Severe Stenosis Classification (arxiv.org)
26. ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning (arxiv.org)
27. ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning (arxiv.org)
28. Geometry-Aware Tabular Diffusion (arxiv.org)
29. Auditable Climate Risk Intelligence from Fragmented ESG Data: Deterministic Orchestration and Imbalance-Aware Learning for Scope 1-3 Validation (arxiv.org)
30. Improvise, Adapt, Overcome: An On-The-Fly Multifidelity Algorithm for Efficient Machine Learning (arxiv.org)
31. Hallucination Is Linearly Decodable from Mid-Layer Hidden States in Quantized LLMs (arxiv.org)
32. Evaluating Transformer and LSTM Frameworks for Prediction in Ungauged Basins (arxiv.org)
33. PerchRL: Vision-Based Agile Perching on Inclined Platforms under Rapid and Irregular Motion (arxiv.org)
34. AdaWeather: Adaptively Mixing Probabilistic Weather Forecasts with Logarithmic Regret (arxiv.org)
35. Sample-Size Scaling of the African Languages NLI Evaluation (arxiv.org)
36. Assessing Region-Level EEG Contributions to Cognitive Workload Prediction (arxiv.org)
37. Graph Mamba Survival Analysis Based on Topology-Aware ordering (arxiv.org)
38. AURA: Action-Gated Memory for Robot Policies at Constant VRAM (arxiv.org)
39. BehaviorBench: Modeling Real-World User Decisions from Behavioral Traces (arxiv.org)
40. Making Brain-Computer Interfaces More Secure (arxiv.org)
41. How Visible Are Silent Manipulation Failures? An Observability Study of False-Success Detection in Simulated Robot Episodes (arxiv.org)
42. PyraMathBench: Evaluating and Improving Mathematical Capability in Large Language Models (arxiv.org)
43. NeuroArmor: Safe-Variant-Guided Representation Consistency for Selective Re-Anchoring in Jailbreak Defense (arxiv.org)
44. The Ringelmann Effect in Multi-Agent LLM Systems: A Scaling Law for Effective Team Size (arxiv.org)
45. Reasoning Structure of Large Language Models (arxiv.org)
46. Regime-Arrival Uncertainty in Generalization Bounds under Distribution Shift (arxiv.org)
47. Before Fusion, Ask What to Keep: Contextual Calibration of Multimodal Signals (arxiv.org)
48. Locality Does Not Imply Reachability: Boundary Repair in Block-Sparse Causal Attention (arxiv.org)
49. Generalizing Graph Foundation Models via Hyperbolic Retrieval-Augmented Generation (arxiv.org)
50. Critical evaluation of PINN for FWD inverse analysis and differentiable FEM as an alternative (arxiv.org)