Training AI Co-Scientists Using Rubric Rewards

Goel, Shashwat, et al.

AI "co-scientists" (LLM-based assistants) can help researchers by generating detailed research plans from given goals and constraints. However, current models often produce plans that violate implicit requirements, because scientific planning is open-ended and, unlike code generation, has no fast, cheap feedback signal such as program execution. This paper proposes a scalable, unsupervised training method: reinforcement learning (RL) with "rubric rewards" extracted automatically from existing scientific papers, which lets models self-improve plan quality without human labeling or experiment execution.
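
To make the idea of a rubric reward concrete, here is a minimal sketch, not the paper's implementation. It assumes a rubric is a list of checkable requirements and that the reward is the fraction of items a plan satisfies. The names `RubricItem`, `keyword_grader`, and `rubric_reward` are hypothetical, and the keyword match is only a toy stand-in for whatever grader a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RubricItem:
    """One checkable requirement (hypothetical structure, not from the paper)."""
    description: str
    keywords: List[str]  # toy stand-in for a real grader's judgment


def keyword_grader(plan: str, item: RubricItem) -> bool:
    """Toy grader: an item is satisfied if any of its keywords appears in the plan."""
    text = plan.lower()
    return any(k.lower() in text for k in item.keywords)


def rubric_reward(
    plan: str,
    rubric: List[RubricItem],
    grader: Callable[[str, RubricItem], bool] = keyword_grader,
) -> float:
    """Return the fraction of rubric items the plan satisfies, in [0, 1]."""
    if not rubric:
        return 0.0  # assumption: an empty rubric yields no reward
    satisfied = sum(1 for item in rubric if grader(plan, item))
    return satisfied / len(rubric)


# Illustrative data only; these items are invented for the example.
rubric = [
    RubricItem("Includes a control condition", ["control group"]),
    RubricItem("Names an evaluation metric", ["accuracy"]),
    RubricItem("States a compute budget", ["gpu hours"]),
]
plan = "We compare against a control group and report accuracy."
reward = rubric_reward(plan, rubric)  # two of three items are satisfied
```

In an RL loop, a scalar like `reward` would score each sampled plan. Passing the grader as a parameter keeps the scoring rule separate from how each rubric item is judged.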

DOI: 10.48550/arXiv.2512.23707