TL;DR: We are developing Real Work, the first open-source LLM RL environment framework for real work.
We believe the current frontier for advancing LLM capabilities is scaling reinforcement learning environments. Research has shown that LLMs trained with RL on agentic tasks can generalize beyond those tasks.
However, no open-source framework exists that simplifies trajectory generation for long-horizon tasks. Existing RL frameworks such as Atropos and ART focus on providing thin wrappers around your environment for RL training; you still have to design every element of the environment yourself.
We’re developing Real Work to fill this gap. It provides tools such as an agent harness running in a Docker environment, so you can quickly generate trajectories for your RL training runs. This makes designing environments easier, and we hope to grow a modular ecosystem of RL environment tooling around it.
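To make the pattern concrete, here is a minimal sketch of the loop the harness automates: an LLM agent that executes shell commands inside a Docker container until it completes a task, with a grader scoring the final state. It uses the real `docker` and `openai` Python packages (`pip install docker openai`); the model name, endpoint URL, and the toy task are placeholders, not part of our API.

```python
# Minimal sketch of the agent-in-a-container loop: the model emits shell
# commands, we run them in the container, and feed the output back.
import docker
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
container = docker.from_env().containers.run(
    "python:3.11", command="sleep infinity", detach=True
)

messages = [{"role": "user", "content":
             "You are in a Linux container. Reply with exactly one shell "
             "command per turn, or DONE when /tmp/hello.txt contains "
             "'hello'. Task: create that file."}]
try:
    for _ in range(10):  # cap the episode length
        reply = client.chat.completions.create(
            model="my-model", messages=messages
        ).choices[0].message.content.strip()
        messages.append({"role": "assistant", "content": reply})
        if reply == "DONE":
            break
        # Run the model's command in the container and return its output.
        result = container.exec_run(["sh", "-c", reply])
        messages.append({"role": "user",
                         "content": result.output.decode(errors="replace")})
    # Grade the final state: did the agent actually produce the file?
    reward = 1.0 if b"hello" in container.exec_run(
        ["cat", "/tmp/hello.txt"]).output else 0.0
finally:
    container.remove(force=True)

print({"reward": reward, "num_steps": len(messages) // 2})
```

A real environment adds setup commands, richer tools, and timeouts on top of this loop; the point of the harness is that you do not write this plumbing yourself.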
Our focus is on generating trajectories. We will add modules for RL training later; most researchers want fine-grained control over that step, so we have deliberately left it out for now.
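Decoupling generation from training only requires agreeing on what a stored trajectory looks like. The record below is an illustrative sketch, not a spec: a chat transcript plus a scalar reward, dumped as JSONL for whatever trainer you plug in downstream.

```python
# Illustrative trajectory record for handing off to a training step.
import json
from dataclasses import dataclass, asdict, field

@dataclass
class Trajectory:
    task: str                      # natural-language task description
    messages: list[dict]           # full prompt/response/tool transcript
    reward: float                  # scalar score from the grader
    metadata: dict = field(default_factory=dict)  # e.g. model, env image

traj = Trajectory(
    task="Create /tmp/hello.txt containing 'hello'.",
    messages=[{"role": "user", "content": "..."},
              {"role": "assistant", "content": "..."}],
    reward=1.0,
    metadata={"model": "my-model", "image": "python:3.11"},
)

# JSONL is a convenient interchange format for downstream RL training.
with open("trajectories.jsonl", "a") as f:
    f.write(json.dumps(asdict(traj)) + "\n")
```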
Real Work can be used with any OpenAI-compatible endpoint and supports parallel rollouts. Because inference runs on the remote endpoint, you can scale trajectory generation from a GPU-less machine.
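The client side of this is plain concurrency. As a sketch (endpoint URL and model name are placeholders for any OpenAI-compatible server, and a single completion stands in for a full multi-step episode), parallel rollouts with the real `openai` async client look like:

```python
# Run many rollouts concurrently against a remote endpoint; the client
# machine needs no GPU since all inference happens server-side.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def rollout(task: str) -> dict:
    # In the real harness this would be a full multi-step episode.
    resp = await client.chat.completions.create(
        model="my-model",
        messages=[{"role": "user", "content": task}],
    )
    return {"task": task, "output": resp.choices[0].message.content}

async def main() -> list[dict]:
    tasks = [f"Summarize document {i}." for i in range(64)]
    sem = asyncio.Semaphore(16)  # bound concurrency to protect the endpoint

    async def bounded(t: str) -> dict:
        async with sem:
            return await rollout(t)

    return await asyncio.gather(*(bounded(t) for t in tasks))

results = asyncio.run(main())
print(len(results), "rollouts collected")
```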
We are especially optimistic about this approach given the advent of off-policy LLM RL algorithms such as QRPO and SPO, which can learn from previously collected trajectories rather than requiring fresh on-policy rollouts.
And if you would like custom-built RL environments, please get in touch with us.
Ritser Team