Really interesting interview with Rohin on AI safety. I'm more pessimistic on the risks and don't agree with all of it, but I like how he frames his thinking. Where I disagree most: keeping alignment research out of models' pretraining data, on the logic that we shouldn't hand AI systems the full playbook for how we plan to contain them. History suggests this fails. We ran the same debate in cryptography. Security by obscurity doesn't work. You need open schemes, open standards, open research. Basically Kerckhoffs's principle: hide the keys, not the algos. The same should apply to alignment. Train models on our best safety research, don't shield them from it. Honest question for people working on safety: is there a real shift toward excluding this data from pretraining? Is that becoming consensus? Genuinely curious.
