"Google said in a statement: “Quality raters are employed by our suppliers and a...

creddit · 2025-09-13T12:54:30 1757768070

Because they are doing it to compute quality metrics not to implement RLHF. It’s not training data.

visarga · 2025-09-13T16:51:31 1757782291

Every decision they take based on evals influences the model.

creddit · 2025-09-13T19:44:00 1757792640

/"directly"/

yobbo · 2025-09-13T14:30:21 1757773821

> For this to be true they would have to throw away labeled training data.

That's how validation works.

jfengel · 2025-09-13T16:03:44 1757779424

Is there a reason not to use validation data in your next round of training data? Or is it more efficient to reuse validation and instead get more training data?

parineum · 2025-09-13T16:53:47 1757782427

You'd have to recreate your validation if you trained your model on it every iteration and then they wouldn't be consistent enough to show any trends

jfengel · 2025-09-14T15:13:08 1757862788

I'd have thought that if you kept the same validation you'd risk over fitting.

Clearly that does make it hard to measure. I'd think you'd want "equivalent" validation (like changing the SATs every year), though I imagine that's not really a meaningful concept.

teiferer · 2025-09-13T13:01:36 1757768496

Key word: "directly"

It does so indirectly, so it's a true albeit misleading statement.

skybrian · 2025-09-13T17:40:30 1757785230

It's not part of the inner feedback loop. It's part of the outer feedback loop that they use to decide if the inner loop is working.

Gracana · 2025-09-13T12:48:36 1757767716

They probably don’t do it at a scale large enough to do RLHF with it, but it’s still useful feedback the people working on the projects / products.

zozbot234 · 2025-09-13T12:55:44 1757768144

More recent models actually use "reinforcement learning from AI feedback", where the task of assigning a reward is essentially fed back into the model itself. Human feedback is then only used to ground the training, on selected examples (potentially even entirely artificial ones) where the AI is most highly uncertain about what feedback should be given.