The R2L Lab at the University of Waterloo's Cheriton Department of Computer Science builds intelligent generalist agents that read, reason, and act across digital and physical environments. Our work is organized around four long-running research threads: building and evaluating generalist agents; generalization, adaptation, and reasoning; language as supervision; and information access for autonomous reasoners. We are recruiting students and postdocs across all four.
Please read this before applying.
We typically do not hire MS students, barring exceptional circumstances.
We do not take remote undergraduate collaborators.
Our lab is great for students curious about graduate school and open-ended research. It is not the right fit for most students looking to transition into engineering jobs in industry.
An LLM summary of our recent papers is insufficient preparation.
As a PhD student at R2L, you will own a research project within one of our threads from framing through publication, publish at top AI/ML/NLP venues (NeurIPS, ICML, ICLR, ACL, EMNLP), and contribute to the intellectual life of the lab. We admit 1–2 PhD students every year.
Required:
Strong background in computer science, mathematics, or a related field.
Demonstrated experience in machine learning and deep learning.
Proficiency in Python and ML frameworks (PyTorch, JAX).
Prior research experience, including publications at relevant venues.
Experience training and systematically running inference with large language models.
Foundations in theoretical machine learning, NLP on real-world text, and reinforcement learning in realistic environments.
Agents
We are hiring a postdoc to lead research within our agents threads (computer-use, generalization and adaptation, language as supervision, or information access). You will define and lead ambitious projects, co-mentor PhD students, and help shape the lab's research direction. Apply on AcademicJobsOnline.
AI for Chemistry
We are also hiring a postdoc focused on agents for natural-science discovery, in collaboration with industrial and academic partners. This position is suited to candidates with a foothold in either ML/NLP or computational chemistry who want to build at the interface. Apply on AcademicJobsOnline.
Required for either postdoc:
PhD in Computer Science or a related field.
Strong publication record at top AI/ML/NLP venues (or, for the chemistry posting, equivalent record in computational chemistry).
Demonstrated ability to conduct research independently.
Each semester we have 1–2 slots for undergraduate researchers. We prioritize students through the full-time URF program, followed by the part-time URA program. Students who are not at the University of Waterloo can apply through URF and through Vector Internship. At R2L, undergrads are expected to lead their own investigation: identify a compelling research problem, formulate a hypothesis, design experiments, and drive the project toward a meaningful outcome such as a publication at a top-tier conference. This is not a task-execution role.
Required:
Outstanding academic record in Computer Science.
Strong programming skills in Python and experience with a deep learning framework (PyTorch).
Genuine curiosity about AI and a proactive, self-motivated mindset.
Prior research experience is a plus, but a strong demonstration of initiative matters more.
A few representative R2L-led projects, to give a sense of the work:
GTTA: Test-Time Adaptation for LLM Agents via Environment Interaction (ICLR 2026). Agents fail at deployment because of two distinct mismatches: a syntactic gap (unfamiliar observation formats) and a semantic gap (unknown state-transition dynamics). GTTA addresses both with lightweight online adaptation and a persona-driven exploration phase that probes the environment before task execution. On WebArena multi-site, this raises agent success from 2% to 23%.
AgentIR: Reasoning-Aware Retrieval for Deep Research Agents (Preprint). Deep research agents emit explicit natural-language reasoning before each search call, a signal that existing retrievers ignore. AgentIR jointly embeds the reasoning trace with the query and trains a 4B retriever that, on BrowseComp-Plus, reaches 68% accuracy with an open-weight agent versus 50% from conventional embedding models twice its size.
SynQuE: Synthetic Dataset Quality Estimation Without Annotations (TMLR 2026). Synthetic data is abundant but quality varies. SynQuE ranks synthetic datasets by their expected real-world task performance using only limited unannotated real data. On text-to-SQL parsing, training on the top-3 synthetic datasets selected by SynQuE proxies raises accuracy from 30.4% to 38.4% on average.
ASH: Agents that Self-Hone via Embodied Learning (Preprint). Long-horizon embodied learning typically requires hand-engineered rewards or action-labeled demonstrations. ASH learns from unlabeled, noisy internet video via a self-improvement loop: when stuck, it trains its own Inverse Dynamics Model and uses it to extract supervision from relevant video. On Pokemon Emerald and Zelda: Minish Cap, ASH sustains progression across 8-hour evaluations where strong baselines (e.g. VPT) plateau.
In addition to our core threads, we are particularly looking for graduate students and postdocs in the following areas. These are some of our most differentiated research directions.
Efficient relational ML over databases. Bringing learned components into the database stack and bringing database thinking into ML retrieval and indexing. Active projects include LakeQuest for natural-language querying over heterogeneous data lakes, work on hybrid vector search, and adaptive indexing for vector databases.
AI for natural-science discovery (chemistry, materials). Building agents that read scientific literature, plan experiments, and reason over multi-modal scientific data. Active projects include unified multimodal models for chemistry. Suited to candidates with backgrounds in ML/NLP, scientific computing, or computational chemistry.
Systems and algorithms for scaling ML deployments. Infrastructure-level research to make large-scale agent training and inference tractable, including data and compute scheduling for long-horizon agentic learning. Suited to candidates with strong systems backgrounds who want to apply that lens to modern ML workloads.
If you do not have traditional ML/NLP training but want to work on ML/NLP, we may still be a good fit.
Questions after reading this page? Reach out to Victor Zhong.