Welcome to the Reading to Learn (R2L) Lab! We are part of the Cheriton School of Computer Science at the University of Waterloo. Our lab explores how to improve ML efficiency and generalization using language understanding. Our work contributes toward a vision of intelligent, generalist agents that can understand, reason, and act effectively across a wide range of digital and physical environments. Our recent research focuses on developing natural language agents that can autonomously perform complex, multi-step tasks across diverse computing environments, and spans the following areas:
State-of-the-art agents
We have created benchmarks such as Spider 2 and Spider2-V to evaluate agents' abilities to automate professional data science workflows and enterprise-level text-to-SQL tasks involving large, heterogeneous databases. Extending beyond code generation, our OSWorld and Computer Agent Arena platforms assess agent performance in interactive, real computer environments spanning multiple operating systems and applications, and are used as primary evaluation suites by leading companies including OpenAI, Anthropic, Google, and Salesforce. These efforts highlight current limitations of multimodal agents, particularly in GUI grounding and operational knowledge, and provide standardized, scalable evaluation frameworks to guide future improvements.
Learning from open language feedback
Another core area of our work is learning from language feedback to improve agent behavior and reinforcement learning. We have introduced Language Feedback Models that extract actionable feedback from large language models to enhance imitation learning, and Text2Reward, a framework that automatically synthesizes dense, interpretable reward functions from natural language goals to facilitate policy learning in robotics and locomotion. Our evaluations demonstrate that these approaches improve task success and generalization while reducing the need for manually engineered reward functions or extensive demonstrations. We also study how LLMs provide feedback across symbolic, language, and continuous control tasks, informing the design of more effective interactive learning systems.
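To make the Text2Reward idea concrete, below is a minimal sketch of the kind of dense, interpretable reward code that could be synthesized from an instruction like "push the cube to the goal region." The observation keys (cube_pos, goal_pos, gripper_pos), the distance thresholds, and the weighting coefficients are illustrative assumptions for this sketch, not the framework's actual interface or output.

```python
import numpy as np

def push_cube_reward(obs: dict) -> float:
    """Hypothetical dense reward for 'push the cube to the goal region'.

    Illustrative sketch only: observation keys and coefficients are assumed.
    """
    cube_pos = np.asarray(obs["cube_pos"])       # assumed observation layout
    goal_pos = np.asarray(obs["goal_pos"])
    gripper_pos = np.asarray(obs["gripper_pos"])

    # Stage 1: encourage the gripper to approach the cube.
    reach_dist = np.linalg.norm(gripper_pos - cube_pos)
    reach_reward = 1.0 - np.tanh(5.0 * reach_dist)

    # Stage 2: encourage the cube to move toward the goal.
    push_dist = np.linalg.norm(cube_pos - goal_pos)
    push_reward = 1.0 - np.tanh(5.0 * push_dist)

    # Sparse bonus once the cube is inside the goal region.
    success_bonus = 2.0 if push_dist < 0.02 else 0.0

    return 0.3 * reach_reward + 0.7 * push_reward + success_bonus
```

Because the reward is plain, readable code over named environment attributes, it can be inspected, edited, and iteratively refined from further language feedback before being used to train a policy.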
Learning from synthetic data
We aim to advance the automatic evaluation of generative models and improve data efficiency in training through synthetic data and interactive curriculum learning. The first thread of work here is on quantifying distributional differences between synthetic and real data. The second is on sample-efficient automatic curriculum learning from expert (human or machine) teachers. The third is on natural language querying over heterogeneous data lakes to facilitate scalable, user-friendly access to complex, unstructured data in real-world applications.
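As one concrete, generic way to quantify a synthetic-versus-real distributional gap, the sketch below computes a kernel maximum mean discrepancy (MMD) between two sets of feature vectors. The RBF kernel, fixed bandwidth, and toy data are assumptions made for illustration rather than the specific metric used in our papers.

```python
import numpy as np

def rbf_mmd2(real: np.ndarray, synth: np.ndarray, sigma: float = 1.0) -> float:
    """Biased estimator of squared MMD with an RBF kernel.

    Small values suggest the two samples are hard to tell apart.
    """
    def rbf(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # Pairwise squared Euclidean distances, then the Gaussian kernel.
        d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
        return np.exp(-d2 / (2.0 * sigma**2))

    k_rr, k_ss, k_rs = rbf(real, real), rbf(synth, synth), rbf(real, synth)
    return k_rr.mean() + k_ss.mean() - 2.0 * k_rs.mean()

# Toy usage: compare feature embeddings of real vs. model-generated samples.
rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(200, 8))
synth_feats = rng.normal(0.3, 1.2, size=(200, 8))
print(rbf_mmd2(real_feats, synth_feats))
```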