Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation (9 views, 3 months ago)
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control (7 views, 4 months ago)
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design (8 views, 4 months ago)
GOEDEL-PROVER-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction (2 views, 4 months ago)
DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis (1 view, 4 months ago)