Why Care About AI Safety?
May 19, 2025 AI Safety
Sooner or later, we will create artificial general intelligence (AGI), because we know the brain holds intelligence, and we know the brain isn’t magic.
This statement, or at least the belief in it, has driven much of the progress in AI. I’ve written before about how McCarthy and others wrote up a proposal to ‘solve’ AI in two months. Although that was 70 years ago, it now looks like we might actually be quite close. In my opinion, that’s concerning, and I’ll explain why.
We’re creating something intelligent, or for the skeptical, something that can achieve goals increasingly well, and we have no clue how to handle this properly.
Current AI is fundamentally built on scaling deep learning, which produces systems whose behavior no one, not a single person on this planet, including the researchers who built them, truly understands. Two examples: alignment faking (i.e., AIs pretending to be aligned with human values or instructions) and emergent misalignment (i.e., AIs fine-tuned to output insecure code that then go a step further and state that all humans should be enslaved by AI).
I have many uncertainties about how AI will progress. It’s uncertain whether AI will keep progressing at the same rate, whether that will be enough to become ‘truly intelligent’, whether it will really be so hard to align these systems as they get more intelligent, and much, much more. However, none of these uncertainties should stop us from making safety the number one priority when designing AI.
A good analogy often comes up when discussing the risks of AI and how to deal with the uncertainties laid out above.
Absolute safety is often unrealistic, and so the International Air Transport Association is proud to report only one accident for every 1.26 million flights. This makes flying the safest mode of transport, and I’m decently sure people would not readily accept riskier odds. As a percentage, that is roughly a 0.0001% chance of an accident each time you board a plane. Those are quite good odds.
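To make the arithmetic explicit, here is the back-of-the-envelope conversion:

$$
P(\text{accident per flight}) = \frac{1}{1{,}260{,}000} \approx 7.9 \times 10^{-7} \approx 0.00008\%,
$$

which, rounded up, gives the one-in-a-million (0.0001%) figure used above.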
With AI progressing as it currently is, the odds look far less good, and we need to fix that. AI is already spreading fake news, and it could be used to design biochemical weapons or to deploy autonomous weapons. The current safety issues alone are enough to justify halting this rapid pace of development, and that is before we even take misaligned AGI into consideration. Combine these risks with the aforementioned nonexistent knowledge of how to properly align AI, and our odds look worse, far worse, than 0.0001%.
Perhaps also important to mention is that we only get one shot at designing AGI. It is truly unlike any technology humanity has produced so far. Once an entity is more intelligent than you and not aligned with your interests, there is nothing you can do about it. I’m decently sure all factory-farmed animals would, if they could, agree with me.
If we don’t actively try to change the way AI is currently progressing, nothing will change. Let’s try.