EACL 2026 (European Chapter of the Association for Computational Linguistics)
1Stanford University, 2Chan Zuckerberg Biohub Network, 3KTH Royal Institute of Technology
Search agents are LLMs that interleave reasoning and retrieval to answer questions. Rather than relying on fixed retrieval pipelines, they learn search strategies through RL, supervised only on final answer correctness. This capability is essential for building AI systems that can autonomously navigate and reason over scientific literature.
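To make the loop concrete, here is a minimal Python sketch of how such an agent might interleave reasoning, search calls, and a final answer. The tag names and helper objects (`llm`, `retriever`) are illustrative assumptions, not the exact interface used in the paper.

```python
# Minimal sketch of a search-agent loop (hypothetical helper names; the trained
# agents follow a Search-R1-style recipe, so details may differ).
def answer_question(question: str, llm, retriever, max_turns: int = 4) -> str:
    """Interleave reasoning and retrieval until the model emits a final answer."""
    context = f"Question: {question}\n"
    for _ in range(max_turns):
        step = llm.generate(context)              # model emits reasoning plus an action
        context += step
        if "<answer>" in step:                    # final answer found, stop searching
            return step.split("<answer>")[1].split("</answer>")[0].strip()
        if "<search>" in step:                    # model issued a search query
            query = step.split("<search>")[1].split("</search>")[0].strip()
            docs = retriever.search(query, k=3)   # retrieve candidate abstracts
            context += "\n<information>\n" + "\n".join(docs) + "\n</information>\n"
    return ""  # no answer within the turn budget
```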
We release PaperSearchQA, an RLVR (reinforcement learning with verifiable rewards) training environment for scientific paper QA.
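Under RLVR, the only training signal is whether the final answer is correct. A minimal sketch of such a verifiable reward (normalized exact match; the environment's actual scoring may differ) could look like this:

```python
import re
import string

def exact_match_reward(predicted: str, gold: str) -> float:
    """Return 1.0 if the normalized final answer matches the reference, else 0.0."""
    def normalize(text: str) -> str:
        text = text.lower()
        text = "".join(ch for ch in text if ch not in string.punctuation)
        return re.sub(r"\s+", " ", text).strip()
    return 1.0 if normalize(predicted) == normalize(gold) else 0.0
```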
We also release a data creation pipeline for constructing QA training data from paper abstracts. It requires only a corpus of abstracts and access to an LLM.
Our question categories were defined with domain experts in biomedicine. To extend this pipeline to other fields, define new categories relevant to your domain.
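As a rough sketch of what such a pipeline can look like (the prompt, category names, and helper calls below are placeholders for illustration, not the released implementation):

```python
import json

# Placeholder categories; the paper's categories were defined with biomedical experts.
CATEGORIES = ["mechanism", "methodology", "finding"]

PROMPT = (
    "Read the abstract below and write one {category} question that is answerable "
    "from the abstract, plus its short answer.\n\nAbstract:\n{abstract}\n\n"
    "Return JSON with keys 'question' and 'answer'."
)

def build_qa_dataset(abstracts, llm, categories=CATEGORIES):
    """Generate (question, answer, category) examples from a corpus of abstracts."""
    examples = []
    for abstract in abstracts:
        for category in categories:
            raw = llm.generate(PROMPT.format(category=category, abstract=abstract))
            qa = json.loads(raw)  # assumes the LLM returns valid JSON
            examples.append({"question": qa["question"],
                             "answer": qa["answer"],
                             "category": category})
    return examples
```

Extending the pipeline to a new field then amounts to swapping in domain-appropriate categories and a corpus of abstracts from that field.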
RL-trained search agents (Search-R1) substantially outperform retrieval baselines. With Qwen2.5-7B: 51.0% on PaperSearchQA vs 36.5% for RAG (+14.5 pts). Agents trained on PaperSearchQA also generalize to BioASQ, a human-created biomedical QA benchmark: 44.8% vs 29.7% for RAG (+15.1 pts). We release a reformatted version of BioASQ compatible with the Search-R1 codebase.
Through qualitative analysis of reasoning traces, we observe agents learning to plan searches, reason before retrieving, and verify their own knowledge.
@misc{burgess2026papersearchqalearningsearchreason,
title={PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR},
author={James Burgess and Jan N. Hansen and Duo Peng and Yuhui Zhang and Alejandro Lozano and Min Woo Sun and Emma Lundberg and Serena Yeung-Levy},
year={2026},
eprint={2601.18207},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2601.18207},
}