AI "co-scientists" (LLM-based assistants) can help researchers by generating detailed research plans from given goals and constraints. However, current models often produce plans that violate implicit requirements due to the open-ended nature of scientific planning and the lack of fast, cheap feedback (unlike code execution). This paper proposes a scalable, unsupervised training method using reinforcement learning (RL) with automatically extracted "rubric rewards" from existing scientific papers, enabling models to self-improve plan quality without human labeling or experiment execution.
Authors: Shashwat Goel et al.
Published: December 2025
Core Idea:
AI "co-scientists" (LLM-based assistants) can help researchers by generating detailed research plans from given goals and constraints. However, current models often produce plans that violate implicit requirements due to the open-ended nature of scientific planning and the lack of fast, cheap feedback (unlike code execution). This paper proposes a scalable, unsupervised training method using reinforcement learning (RL) with automatically extracted "rubric rewards" from existing scientific papers, enabling models to self-improve plan quality without human labeling or experiment execution.
Key Contributions
ResearchPlanGen Dataset: A large, diverse corpus automatically extracted from scientific papers across domains (primarily ML, extended to medicine and recent arXiv preprints). Each entry pairs a research goal (aims and constraints) with a goal-specific grading rubric (structured criteria for evaluating plans).
Rubric-Based RL Training: Finetune LLMs (e.g., Qwen3-30B-A3B) via RL where:
The training policy generates plans.
A frozen copy of the initial model acts as a "verifier/grader," scoring plans using privileged rubric access and general guidelines.
This creates a generator-verifier gap that enables stable self-improvement without external supervision (see the reward sketch after this list).
No Need for Execution Feedback: Unlike domains with simulators, this works in settings (e.g., medical research) where verifying plans via real experiments is infeasible.
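The generator-verifier setup reduces to a reward function in which a frozen grader model scores each sampled plan against the paper-derived rubric. Below is a minimal Python sketch of that reward signal; the dataclass, prompt wording, 0-10 score scale, and weighted-average aggregation are illustrative assumptions rather than the paper's exact implementation, and `frozen_grader` stands in for any call into the frozen copy of the initial model.

```python
# Minimal sketch of rubric-based self-grading rewards (names and prompt are assumptions).
import json
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RubricCriterion:
    description: str   # e.g. "The plan specifies a held-out evaluation protocol"
    weight: float = 1.0


def build_grading_prompt(goal: str, plan: str, rubric: List[RubricCriterion]) -> str:
    """The grader sees the rubric; the plan generator never does (privileged information)."""
    criteria = "\n".join(f"{i + 1}. {c.description}" for i, c in enumerate(rubric))
    return (
        "Grade the following research plan against the rubric.\n\n"
        f"Research goal:\n{goal}\n\nPlan:\n{plan}\n\nRubric:\n{criteria}\n\n"
        'Reply with JSON: {"scores": [<0-10 per criterion>]}'
    )


def rubric_reward(
    goal: str,
    plan: str,
    rubric: List[RubricCriterion],
    frozen_grader: Callable[[str], str],  # frozen copy of the initial model
) -> float:
    """Score one generated plan; the scalar in [0, 1] serves as the RL reward."""
    raw = frozen_grader(build_grading_prompt(goal, plan, rubric))
    try:
        scores = json.loads(re.search(r"\{.*\}", raw, re.DOTALL).group(0))["scores"]
    except (AttributeError, KeyError, ValueError):
        return 0.0  # unparseable grader output gets zero reward
    total_weight = sum(c.weight for c in rubric)
    return sum(c.weight * min(max(s, 0), 10) / 10.0
               for c, s in zip(rubric, scores)) / total_weight
```

In training, this scalar would feed a standard RL update (e.g., a GRPO-style policy-gradient step) on the plan generator while the grader's weights stay fixed, preserving the generator-verifier gap.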
Methods
Data Extraction: Use LLMs to parse papers and extract goals plus domain-specific rubrics (e.g., novelty, feasibility, rigor); see the extraction sketch after this list.
Training Setup: Self-grading RL with rubrics as privileged information to the grader; rewards based on rubric scores.
Models: Tested on open-source bases like Qwen3; scalable to larger models.
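As a companion to the reward sketch above, here is a minimal sketch of how goals and rubrics might be pulled from a paper with an extractor LLM. The prompt wording, JSON schema, and `ResearchPlanExample` record are assumptions for illustration, not the dataset's actual extraction pipeline.

```python
# Minimal sketch of goal/rubric extraction from a paper (schema and prompt are assumptions).
import json
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ResearchPlanExample:
    goal: str                                         # aims + constraints stated by the paper
    rubric: List[str] = field(default_factory=list)   # goal-specific grading criteria


EXTRACTION_PROMPT = (
    "Read the paper text below. Extract (1) the research goal, phrased as aims and "
    "constraints an assistant could plan against, and (2) 5-10 goal-specific rubric "
    "criteria covering aspects such as novelty, feasibility, and rigor.\n"
    'Return JSON: {"goal": str, "rubric": [str, ...]}\n\nPaper:\n'
)


def extract_example(paper_text: str,
                    extractor_llm: Callable[[str], str]) -> ResearchPlanExample:
    """Turn one paper into a (goal, rubric) training example."""
    parsed = json.loads(extractor_llm(EXTRACTION_PROMPT + paper_text))
    return ResearchPlanExample(goal=parsed["goal"], rubric=parsed["rubric"])
```

Each extracted example then supplies the goal shown to the plan generator and the rubric handed, as privileged information, to the frozen grader in the reward function sketched earlier.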
Results
Human Evaluation (ML Domain): 225 hours of review by expert machine learning researchers. The finetuned model was preferred over the baseline in 70% of cases, and experts approved 84% of the auto-extracted rubrics.
Automated Evaluation: 12–22% relative improvements in rubric scores across ML, medical papers, and new arXiv preprints.
Cross-Domain Generalization: Strong transfer to unseen domains (e.g., medicine) without domain-specific training.
Ablations: Confirm rubric access and self-grading are critical for gains.
Discussion and Implications
Demonstrates a fully automated, scalable recipe for training better AI research assistants using the vast existing literature—no costly human feedback loops needed.
Potential step toward general AI co-scientists that can reliably brainstorm ideas or outline executable research plans.
Limitations: Relies on quality of auto-extracted rubrics; may inherit biases from source papers; evaluated mostly via proxies (rubrics/human prefs), not real-world execution success.
Future: Expand to more domains and multi-step planning, or integrate with execution tools.
Overall, the work provides a promising unsupervised path to improving open-ended scientific reasoning in LLMs; the dataset is released for community use.