November 22, 2025
1 bookmark curated on this day.
Vending-Bench 2 | Andon Labs (andonlabs.com)
We're releasing Vending-Bench 2, a benchmark for measuring AI model performance on running a business over long time horizons. Models are tasked with running a simulated vending machine business over a year and scored on their bank account balance at the end.
cool
AI
Dev Tools