
One wonders at which point models will be sneaky enough to bypass simple eval sandboxes. The article has:

    # Evaluate the equation with restricted globals and locals
    result = eval(equation, {"__builtins__": None}, {})
but that's not enough, as you can rebuild access to builtins from objects and then go from there: https://ideone.com/qzNtyu
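
For reference, a sketch of the kind of chain that works here (not necessarily the exact one in the ideone snippet; the catch_warnings hop is a CPython implementation detail and may vary across versions):

    # Even with {"__builtins__": None}, object introspection can walk
    # back to the real builtins via an already-imported class.
    payload = (
        "[c for c in ().__class__.__base__.__subclasses__() "
        "if c.__name__ == 'catch_warnings'][0]()._module"
        ".__builtins__['__import__']('os').getcwd()"
    )
    print(eval(payload, {"__builtins__": None}, {}))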

By the way, writing this greatly benefited from DeepThink-r1 while o1 just gave me a lobotomized refusal (CoT: "The user's request involves injecting code to bypass a restricted Python environment, suggesting a potential interest in illegal activities. This is a serious matter and aligns closely with ethical guidelines."). So I just cancelled my ChatGPT subscription - why did we ever put up with this? "This distillation thingie sounds pretty neat!"



> that's not enough as you can rebuild access to builtins from objects

In this specific case, it's safe, as that wouldn't pass the regex just a few lines before the eval:

    # Define a regex pattern that only allows numbers,
    # operators, parentheses, and whitespace
    allowed_pattern = r'^[\d+\-*/().\s]+$'
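To illustrate the point (the variable names are mine, the pattern is the article's):

    import re

    allowed_pattern = r'^[\d+\-*/().\s]+$'

    # The builtins-rebuilding chain contains letters and underscores,
    # so it never reaches eval:
    print(bool(re.match(allowed_pattern, "().__class__.__base__.__subclasses__()")))  # False
    # Plain arithmetic passes:
    print(bool(re.match(allowed_pattern, "(1 + 2) * 3 / 4")))  # True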
Commenting on the R1 reproduction, the heavy lifting there is done by Hugging Face's trl [0] library, plus a lot of compute.
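
For a sense of what that looks like, a rough sketch of a GRPO run with trl (this mirrors the library's own quickstart; the model, dataset and reward below are placeholders, not the reproduction's actual setup):

    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("trl-lib/tldr", split="train")

    # Toy reward: prefer completions close to 20 characters.
    def reward_len(completions, **kwargs):
        return [-abs(20 - len(c)) for c in completions]

    trainer = GRPOTrainer(
        model="Qwen/Qwen2-0.5B-Instruct",
        reward_funcs=reward_len,
        args=GRPOConfig(output_dir="Qwen2-0.5B-GRPO"),
        train_dataset=dataset,
    )
    trainer.train()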

[0] Transformer Reinforcement Learning - https://huggingface.co/docs/trl/en/index


The fact that () and . are there miiiight enable a pyjail escape.

See also https://github.com/jailctf/pyjailbreaker

See also https://blog.pepsipu.com/posts/albatross-redpwnctf


That's a neat trick!

It does still require letters to be able to spell attribute/function names (unless I'm reading it wrong in that blog post).


> why did we ever put up with this?

Is this a serious question?



