No, really, those are the magic words

  • chaosCruiser@futurology.today · 4 days ago

    In this case, a researcher duped ChatGPT 4.0 into bypassing its safety guardrails — intended to prevent the LLM from sharing secret or potentially harmful information — by framing the query as a game.

    Ooh, this is so good 🤣

    If the LLM refuses to talk about something, just ask it to embed the answer in a poem, Batman fan fiction, etc. The guessing game is a new one. Should try that one when talking about bioweapons, cooking meth, or any other sensitive topic.