University of Waterloo

Welcome to the 📖 Reading to Learn (R2L) Lab 🤖. We are in the Cheriton Department of Computer Science at the University of Waterloo. The R2L Lab builds intelligent generalist agents that read, reason, and act across digital and physical environments. Our work is organized around a small number of long-running research threads, each grounded in concrete artifacts: papers, benchmarks, models, and frameworks.

Building and evaluating generalist agents. Capable agents and credible evaluations advance together: a benchmark that fails to capture real workflows produces methods that fail to transfer. We build both: agents for complex workflows and the platforms that keep them honest. Recent work includes Spider 2 for enterprise text-to-SQL, OSWorld and Computer Agent Arena for computer-use evaluation, OpenCUA for open computer-use foundations, and SynQuE for ranking training data without annotations. Our platforms serve as primary evaluations for OpenAI, Anthropic, Google, and Salesforce.

Generalization, adaptation, and reasoning. Frontier agents fail in characteristic ways when they meet environments they were not trained on. We study how agents reason, generalize, and adapt — at training time, at deployment time, and over long horizons — so that capability degrades gracefully. Recent work includes Test-Time Adaptation via Environment Interaction, From Atomic to Composite on how RL and SFT enable complementary reasoning, and ASH on self-improvement from unlabeled long-horizon experience.

Language as supervision. Hand-engineered rewards and dense demonstrations do not scale. We study how natural language — instructions, critiques, explanations, and an agent's own narrations — can replace or augment traditional supervision. Earlier work on Language Feedback Models and Text2Reward framed the broader question; more recent work studies what feedback large models can reliably provide in grounded environments.

Information access for autonomous reasoners. Retrieval, memory, and evidence systems were designed for human users; agents are a different kind of consumer. We study how information access should be reconceived when the user is an autonomous reasoner — what counts as the right unit of evidence, what signals drive retrieval, and what evaluation looks like when downstream task success is the metric that matters. Recent work includes AgentIR on reasoning-aware retrieval for deep research agents, alongside ongoing work on natural-language querying over heterogeneous data lakes.


We are always looking for strong students. For openings, see here. Companies interested in research collaborations can email Victor directly.