Why AI Works


AI keeps surprising us. It writes, it paints, it drives cars. It does all this using the same basic approach: repeating simple learning loops billions—sometimes trillions—of times.

It sounds too simple to be this powerful. Smarter people than me keep predicting that we’ll hit a wall—that we’re missing something.

But we haven’t.

GPT-4o is smarter than GPT-4, which was smarter than GPT-3, which was smarter than GPT-2.

The thing we’re missing isn’t in the AI. It’s in how we think about intelligence.

I believe we’ve mirrored something fundamental in nature. The problem isn’t that AI is “just predicting the next word.” The problem is that our understanding of computation is wrong.

This isn’t magic. It’s iteration. And the way AI learns mirrors something much, much older.


Why does everything work?

Why does one type of learning system work across so many different fields? Writing text, spotting objects in pictures, generating videos, moving robots—it seems like these should need totally different approaches.

And actually, they used to. We had AlphaGo for board games, OpenAI Five for Dota. Each problem required a bespoke solution created by a team of experts.

Take transformers—not the robots, the AI model (Vaswani et al., 2017). They were designed for language. But researchers found that when scaled and fine-tuned, they worked for images, videos, even robotics.

The same core idea—iterative optimization via gradient descent—powers success across all domains.
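
To make that concrete, here’s a toy sketch of what gradient descent actually is, fitting a single parameter to a handful of points. Every name and number is illustrative, not how any production model is trained:

```python
# A toy illustration of gradient descent: the same loop that, scaled up to
# billions of parameters, trains modern models. All values here are illustrative.

def loss(w, x, y):
    # Mean squared error for a one-parameter linear model y ≈ w * x.
    return sum((w * xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

def gradient(w, x, y):
    # Derivative of the loss with respect to w.
    return sum(2 * (w * xi - yi) * xi for xi, yi in zip(x, y)) / len(x)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 8.1]      # roughly y = 2x

w = 0.0                       # start from a bad guess
learning_rate = 0.01

for step in range(1000):
    w -= learning_rate * gradient(w, x, y)   # a small tweak, repeated many times

print(w, loss(w, x, y))       # w converges toward ~2, loss toward ~0
```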

I don't think this has been fully appreciated yet. Why does one idea work across so many varied fields? And why do we assume it's so different from our own intelligence?

AI doesn't need a new playbook for every problem. It just needs bigger, better loops.


Why does scaling work?

For many years after GPT-3, very smart academics were predicting that as AI models grew, they'd get messier, less efficient. More size, more noise, right?

Wrong.

Some skeptics, like Yann LeCun, argued that adding more compute would just create more paths to nowhere—infinite branches, dead ends. In theory, that should mean bigger AI gets worse, not better.

But in the real world it doesn’t work that way.

Instead of sprawling into chaos, AI locks onto the “golden path”—the best way to solve a problem. The bigger the model, the better it filters out noise.

Why? We don’t fully understand yet. But what we do know is that AI follows a structured learning curve. Instead of randomness, we see predictable scaling laws—bigger models get smarter, not dumber.
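
Those scaling laws even have a simple mathematical shape. Here’s a sketch assuming the power-law form from Kaplan et al. (2020); the constants are roughly the ones reported there for language models, but treat the specific numbers as illustrative rather than gospel:

```python
# Illustrative only: scaling laws say test loss falls as a smooth power law in
# model size -- no wall, just diminishing (but predictable) returns.
# Form and constants roughly follow Kaplan et al. (2020); treat this as a sketch.

def loss_from_params(n_params, n_c=8.8e13, alpha=0.076):
    # L(N) = (N_c / N) ** alpha
    return (n_c / n_params) ** alpha

for n in [1e8, 1e9, 1e10, 1e11, 1e12]:
    print(f"{n:.0e} params -> predicted loss {loss_from_params(n):.2f}")
```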

And that raises an even bigger question:

If AI were just “predicting the next word,” why does scaling make it better at reasoning, problem-solving, and planning? Why does telling the AI to think step-by-step result in better answers?
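
For a purely illustrative example of that step-by-step effect, this is what zero-shot chain-of-thought prompting (Kojima et al., 2022) looks like in practice. `call_model` is a hypothetical stand-in for whatever LLM API you happen to use:

```python
# Purely illustrative: zero-shot chain-of-thought prompting (Kojima et al., 2022).
# `call_model` is a hypothetical placeholder, not a real API.

question = "A train leaves at 3:40pm and the journey takes 2h 35m. When does it arrive?"

direct_prompt = question
cot_prompt = question + "\n\nLet's think step by step."

# answer_direct = call_model(direct_prompt)  # models often slip on multi-step problems
# answer_cot = call_model(cot_prompt)        # spelling out intermediate steps usually helps

print(cot_prompt)
```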

Maybe it’s not just next-word prediction. Maybe we’ve stumbled onto something deeper.


What is the power of compute?

Ilya Sutskever’s mantra: The models just want to learn.

Small tweaks, applied at scale, lead to massive breakthroughs.

This isn’t brute force. It’s evolution in action.

AI doesn’t memorise. It optimises. It keeps improving, even when trained on synthetic data. If AI were just a clever word predictor, scaling shouldn’t work like this. We should have hit a wall.

Instead, the opposite happens: bigger === better.

Why?

Because this might be how intelligence in nature actually works, and we've distilled it.


How close are we to AGI?

Hold on to your lunchboxes, kids.

Leo and Situational Awareness are right. We’re just a few Orders of Magnitude (OOMs) from AGI.

Some of these OOMs will come from better models—DeepSeek has already shown we can make our current models more efficient. Some will come from raw compute power, e.g. the $500M AI clusters being built. And some will just be sensible user-experience improvements—like the AI choosing the right model for you based on your question, instead of letting you pick.

But this isn't my question. I'm not asking when we'll hit AGI. I'm asking why this even works at all.

If intelligence were just loops and weights, why doesn’t it collapse into noise?

The answer, I feel in my bones, is evolution.


Is this simply evolution?

AI’s learning process is evolution.

Think about how humans learn to walk. Babies don’t start with a perfect step. They wobble, fall, adjust. Trial and error. Iteration. Feedback.

One of LeCun's big arguments for why we'll hit a wall is data: we're going to run out of it, we're already giving AIs far more of it than a baby ever gets, and yet the AI still makes simple mistakes a child wouldn't make.

I think this is the wrong way to think about it.

Because babies aren’t starting from scratch.

They’re fine-tuning on top of a pre-trained base model: evolution.

Think about it:

  • The universe is the training loop
  • Humans are the weights
  • The reward function is survival—adapting, outcompeting, reproducing
  • The loss function is the loss of individuals/species
  • The compute is millions of years of life

Culture, laws, knowledge? That’s fine-tuning on top of this species-level base model.
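
If you want to see that analogy as actual code, here’s a cartoon of evolution as an optimisation loop. The fitness function, population size, and mutation rate are all made up for illustration; real biology is vastly messier:

```python
# A cartoon of the analogy above: evolution as an optimisation loop.
# The "fitness" function, population size, and mutation rate are made up
# for illustration; real biology is not this tidy.

import random

TARGET = 42.0                                   # what the "environment" rewards

def fitness(genome):
    # Reward function: survival means being close to what the environment demands.
    return -abs(genome - TARGET)

population = [random.uniform(-100, 100) for _ in range(50)]   # the "weights"

for generation in range(200):                   # the "compute": many generations
    # Selection: the fittest half survives...
    population.sort(key=fitness, reverse=True)
    survivors = population[:25]
    # ...and reproduces with small random mutations: small tweaks, applied at scale.
    children = [parent + random.gauss(0, 1.0) for parent in survivors]
    population = survivors + children

print(max(population, key=fitness))             # converges toward ~42
```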

And AI does the same thing—just faster.

It tests, fails, adjusts. It converges toward better solutions over time.

The fact that AI keeps improving suggests that we might have hit upon the secret to intelligence itself.

And the only reason AGI isn’t here yet? We still don’t have enough compute.


On the future

If intelligence is just loops and iteration, how far can it go?

If gradient-based optimization unlocks emergent behaviour, what happens when we push further?

AI is already making scientific discoveries, generating art, solving problems that took humans decades. And it’s getting exponentially faster.

The skeptics said scaling wouldn’t work. They said more compute would mean more garbage, not more intelligence.

They were wrong.

The best ideas in nature are simple. Evolution, learning, intelligence—they all follow the same core process:

Try, fail, adjust, repeat.


Final thoughts

The big secret is that intelligence is simple.

It learns like we do.

Simple loops, repeated endlessly. Iteration. Scaling. Refinement.

That’s how it writes, paints, drives, and solves.

And it’s only getting smarter.

What next?

I've spent the last few years curating 1,000+ prompts for GPT-4 in everything from engineering to art to sales. Get a copy for free.


Alex Hughes © 2025