
They are not meaningless, but when you work a lot with LLMs and know them VERY well, a few varied, complex prompts tell you all you need to know about things like EQ, sycophancy, and creative writing.

I like to compare them in ChatHub, using the same prompts for each model.

Gemini still calls me "the architect" in half of the prompts. It's very cringe.



    Gemini still calls me "the architect" in half of the prompts. It's very cringe.
Can't say I've ever seen this in my own chats. Maybe it's something about your writing style?


It absolutely does. And human employees don't call me "the architect." That's the point.


I wonder if under the covers it uses your word choices to infer your Myers-Briggs personality type and you are INTJ so it calls you "The Architect"?? Crazy thought but conceivable...


Possible :) GPT 5.1-thinking, Sonnet 4.5-thinking, and Grok 4.1 don't make up names like that.


It’s very different to get a “vibe check” for a model than to get an actual robust idea of how it works and what it can or can’t do.

This exact thing is why people strongly claimed that GPT-5 Thinking was strictly worse than o3 on release, only to change their minds later once they'd had more time to use it and learn its strengths and weaknesses. It takes time to really get to grips with a new model; a few prompt comparisons, where luck and prompt selection play a big role, are not enough.


I get that one can perhaps have an intuition about these things, but doesn't this seem like a somewhat flawed attitude to have, all things considered? That is, saying something to the effect of "well, I know it's not too sycophantic, no measurement needed, I have some special prompts of my own and it passed with flying colors!" just sounds a little suspect on first pass, even if it's not like totally unbelievable, I guess.



