
I coded it with Bun and openrouter(dot)ai. I have an array of benchmarks, and each benchmark has a grader (for example, checking whether the answer equals a certain string, or grading the answer automatically using another LLM). Then I save all the results to a file and render the percentage correct as a graph.
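For anyone who wants to try something similar, here is a rough TypeScript sketch of that setup: an array of benchmarks, a grader per benchmark, and a percentage at the end. All of the names (`Benchmark`, `exactMatch`, `llmJudge`, `runBenchmarks`, `percentCorrect`) are made up for illustration and are not taken from the actual code. The model and the judge are passed in as plain async functions, so the OpenRouter call itself is left out, and the yes/no judging prompt is only an assumption about how an LLM grader might work.

```typescript
// Hypothetical shapes; none of these names come from the original project.
type Model = (prompt: string) => Promise<string>;
type Grader = (answer: string) => Promise<boolean>;

interface Benchmark {
  name: string;
  prompt: string;
  grade: Grader;
}

interface Result {
  name: string;
  correct: boolean;
}

// Grader that passes when the trimmed answer equals an expected string.
const exactMatch = (expected: string): Grader =>
  async (answer) => answer.trim() === expected;

// Grader that asks a second model to judge the answer.
// Assumption: the judge is prompted to reply "yes" or "no".
const llmJudge = (judge: Model, rubric: string): Grader =>
  async (answer) => {
    const verdict = await judge(
      `${rubric}\n\nAnswer:\n${answer}\n\nReply yes or no.`
    );
    return verdict.trim().toLowerCase().startsWith("yes");
  };

// Run every benchmark against the model, one at a time, and grade each answer.
async function runBenchmarks(
  model: Model,
  benchmarks: Benchmark[]
): Promise<Result[]> {
  const results: Result[] = [];
  for (const b of benchmarks) {
    const answer = await model(b.prompt);
    results.push({ name: b.name, correct: await b.grade(answer) });
  }
  return results;
}

// Percentage of benchmarks graded correct (0 for an empty result list).
function percentCorrect(results: Result[]): number {
  if (results.length === 0) return 0;
  const passed = results.filter((r) => r.correct).length;
  return (100 * passed) / results.length;
}
```

In a real run, `model` would wrap an HTTP request to the provider, and the results array could be serialized with `JSON.stringify` and saved to a file before plotting.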

