
If AI Writes the Code, Does Anyone Need to Understand It?

A pragmatic framework for deciding when human comprehension still matters in an age where AI writes 41% of all code



A question keeps surfacing in conversations with developers, CTOs, and tech leads: if AI is writing most of the code now, why should humans bother understanding it?

It's not a ridiculous question. The numbers are striking: 41% of all code is now AI-generated or AI-assisted. Microsoft and Google report that roughly a quarter of their code comes from AI. Anthropic's CEO predicted that 90% of code would be AI-written by the end of 2025. Tools like Cursor, Copilot, and Claude Code have moved from novelty to daily workflow for most developers.

The pitch is seductive: describe what you want in plain English, let AI write the implementation, ship it. Why spend years learning syntax when you can just ask for what you need?

At Skelpo, we use AI extensively. It's transformed how quickly we can move from idea to working software. But we also build finance applications. And there's code in those systems that no AI touches without a human understanding exactly what it's doing, and why.

That tension between velocity and vigilance is what this article explores. Not a blanket answer, but a framework for thinking about when human code comprehension still matters, and when it might not.


The 70-Year Journey Toward Human-Readable Code

Programming languages exist because humans couldn't work with machine code.

In the 1940s, early computers like ENIAC were programmed in raw binary: sequences of ones and zeros that the processor could execute directly. It worked, technically. But writing binary by hand was tedious, error-prone, and nearly impossible to debug. A single misplaced digit could break everything, and finding the mistake meant scanning walls of numbers.

Assembly language improved things slightly. Instead of raw binary, programmers could use mnemonics like ADD or MOV: still tightly coupled to the hardware, but at least vaguely readable. It was a step toward human comprehension, but only a small one.

The real shift came in the late 1950s. FORTRAN, COBOL, and LISP introduced something radical: code that read more like human notation than machine instructions. You could write something resembling logic instead of hardware instructions. A programmer reading the code could understand its intent, not just its mechanics.

Every major language innovation since then has followed the same trajectory. C freed programmers from hand-managing registers and jump instructions. Python prioritized readability so heavily it made whitespace syntactically meaningful. The entire history of programming languages is a 70-year project to make code easier for humans to understand.

But here's the crucial point: readability was never just about writing code. It was about maintaining it.

Software outlives the people who write it. The developer who built a system in 2019 may have left the company by 2024. The person debugging a production incident at 2am probably didn't write the code that's breaking. Readable code was an investment in the future: a recognition that someone else would eventually need to understand this, modify it, fix it.

Now we're questioning whether that entire premise still holds.


What the Data Actually Shows

Before we can answer whether humans need to understand code, we should look at what's actually happening: not the headlines, but the numbers.

The claim that AI writes "80-90% of code" gets repeated often, usually by people selling AI tools. The reality is more modest. As of 2025, 41% of all code is AI-generated or AI-assisted. That's significant, but it's not the revolution the marketing suggests.

More telling is what happens after AI generates code. Among developers who use AI tools daily, only about 24% of their merged code is fully AI-authored, meaning it went in without major human rewrites. The rest gets modified, corrected, or thrown out entirely. GitHub Copilot offers suggestions 46% of the time, but developers only accept around 30% of those suggestions. The majority get rejected.

Then there's the quality question. A recent analysis of hundreds of GitHub pull requests found that AI-generated code contains 1.7 times more issues than human-written code. The breakdown is sobering:

  • 75% more logic and correctness errors
  • 64% more code quality and maintainability problems
  • 57% more security vulnerabilities
  • 42% more performance issues

Another study found that 62% of AI-generated code contains security flaws or design problems, even when using the latest models. Common issues include missing input validation, improper password handling, and insecure authentication patterns. AI doesn't understand your threat model. It pattern-matches from training data, and that training data includes plenty of insecure code.
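To make one of those patterns concrete, here's a minimal sketch (in Python, using the standard-library sqlite3 module; the table and function names are invented for illustration) of the injection-prone query construction that generated code often reproduces, next to the parameterized version a reviewer should expect to see:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('alice@example.com')")

def find_user_unsafe(email: str):
    # Insecure: user input is spliced directly into the SQL string,
    # so input like "' OR '1'='1" rewrites the query itself.
    query = f"SELECT id, email FROM users WHERE email = '{email}'"
    return conn.execute(query).fetchall()

def find_user_safe(email: str):
    # Safer: the driver binds the value as data, never as SQL.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()

print(find_user_unsafe("' OR '1'='1"))  # returns every row: injection
print(find_user_safe("' OR '1'='1"))    # returns nothing: treated as a literal
```

Both versions pass a happy-path test. Only a reviewer who knows what to look for catches the difference.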

Perhaps most striking: only 3.8% of developers report both low hallucination rates and high confidence shipping AI code without human review. That's not a trust problem with a few skeptics. That's an industry-wide recognition that AI output requires oversight.

The gap between perception and reality matters here. If you believe AI writes 90% of production-ready code, you'll build processes assuming minimal human review. If you understand that AI produces first drafts that frequently need correction (especially for security and logic), you'll build very different processes.


The Vibe Coding Trap

In early 2025, Andrej Karpathy, OpenAI co-founder and former Tesla AI director, coined the term "vibe coding." The idea: describe what you want in natural language, let AI write the code, and iterate based on whether it works. Don't read the implementation. Don't try to understand it. Just accept the output and move on.

It sounds efficient. For certain use cases, it is. But a pattern has emerged among developers who lean too heavily on this approach.

One developer described it this way: "For a while, I barely bothered to check what Claude was doing because the code it generated tended to work on the first try. But now I realize I've stopped understanding my own codebase."

This is the vibe coding trap. It works fine until it doesn't. And when it fails, it fails in ways that are uniquely difficult to recover from.

The failure modes are predictable. When AI-generated code breaks in production, developers who never understood it can't debug it. They're stuck prompting the AI for fixes, hoping it can solve a problem without any memory of why it made the original choices. When security researchers find vulnerabilities (and they do, regularly), no one on the team can assess the severity or implement a proper fix because no one knows how the system actually works.

The skill atrophy is gradual and invisible. Developers report that after months of heavy AI reliance, they struggle with tasks that used to come naturally. The fundamentals get rusty. Pattern recognition fades. When they finally encounter a problem AI can't solve, they discover their own capabilities have degraded.

For junior developers, the trap is especially dangerous. They're building careers on a foundation of code they've never truly understood. They can generate output, but they can't evaluate it. They can't distinguish secure code from code that looks secure. They're accumulating years of experience without developing the underlying judgment that makes experience valuable.

The most uncomfortable cases are public. A startup founder who built his entire product with AI ("zero handwritten code") posted proudly about it on social media. Within days, attackers had bypassed his payment system, exploited his API, and filled his database with garbage. He'd shipped code he couldn't secure because he'd never understood it.

Vibe coding isn't inherently wrong. For throwaway scripts, personal tools, and quick prototypes, it's genuinely useful. The trap is mistaking it for a general-purpose approach: believing that because it works for low-stakes projects, it'll work for everything.


A Pragmatic Framework: When Understanding Matters

The honest answer to "should humans understand the code?" is: it depends. But "it depends" isn't useful without a framework for deciding.

At Skelpo, we've developed a simple mental model. The higher the stakes, the deeper the human understanding required. That sounds obvious, but the practice requires being specific about what "stakes" actually means.

Lower scrutiny: AI-assisted, human-reviewed

Some code genuinely doesn't need deep human comprehension. Not everything is critical infrastructure.

Internal tools and admin scripts fall into this category. If a script that reformats log files has a bug, someone will notice and fix it. The blast radius is small. Prototypes and proofs of concept are similar โ€” they exist to test ideas, not to run in production. The goal is speed, not durability.

Boilerplate and scaffolding rarely need line-by-line understanding. Setting up a standard project structure, generating config files, writing repetitive CRUD operations: AI handles these well, and the patterns are familiar enough that a quick review catches obvious problems.

Well-isolated components with clear boundaries are also good candidates. If a function takes defined inputs, produces defined outputs, and has solid test coverage, understanding its internals matters less. The tests verify behaviour. The isolation limits damage if something's wrong.
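As a rough illustration of what "defined inputs, defined outputs, solid test coverage" can look like (the function and numbers here are hypothetical): a small pure function whose contract is pinned by tests, so the internals can be AI-drafted and still be reasonable to accept after a quick review.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the discounted price in whole cents, rounding down."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100

def test_apply_discount():
    # The tests describe the behaviour we care about, including the edges.
    assert apply_discount(1000, 0) == 1000
    assert apply_discount(1000, 25) == 750
    assert apply_discount(999, 10) == 899   # rounds down
    assert apply_discount(1000, 100) == 0

test_apply_discount()
```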

The common thread: these are situations where failure is visible, recoverable, and limited in scope.

Higher scrutiny: human-understood, AI-assisted at most

Other code demands genuine comprehension. AI can help draft it, but a human needs to understand what shipped and why.

Authentication and authorisation sit at the top of this list. Getting login flows, permission checks, or session handling wrong doesn't just cause bugs; it creates security vulnerabilities that attackers actively hunt for. AI-generated auth code is 88% more likely to contain improper password handling than human-written code. That's not a typo.
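For a sense of what "improper password handling" usually means in practice, here's a minimal sketch using only Python's standard library (a real system would more likely use bcrypt or argon2 via a vetted library; the function names are illustrative):

```python
import hashlib
import hmac
import os

def hash_password_weak(password: str) -> str:
    # The pattern to reject: fast, unsalted hash, trivial to brute-force.
    return hashlib.md5(password.encode()).hexdigest()

def hash_password(password: str) -> tuple[bytes, bytes]:
    # Per-user salt plus a slow key-derivation function (PBKDF2 here).
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, digest)
assert not verify_password("wrong guess", salt, digest)
```

Both approaches "work" in the sense that logins succeed. Only one survives a leaked database.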

Payment processing and financial calculations require the same rigour. A subtle rounding error or currency conversion bug might not crash anything. It'll just quietly lose money: yours or your customers'. These are systems where "it seems to work" is nowhere near good enough. You need to know why it works, and what assumptions it's making.
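A tiny example of the kind of quiet error we mean (the 19% tax rate below is arbitrary): binary floating point drifts on currency arithmetic in ways that round-number tests won't catch, which is why money code typically uses integer cents or Decimal.

```python
from decimal import Decimal, ROUND_HALF_UP

# Float arithmetic: each step looks harmless, the total quietly drifts.
print(sum([0.10] * 3) == 0.30)            # False (0.30000000000000004)

# Decimal keeps exact cents and makes the rounding rule explicit.
price = Decimal("19.99")
vat = (price * Decimal("0.19")).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(vat)                                 # 3.80

print(sum(Decimal("0.10") for _ in range(3)) == Decimal("0.30"))  # True
```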

Data handling in regulated contexts (healthcare, finance, legal) carries compliance implications beyond just functionality. Auditors don't accept "the AI wrote it and the tests pass" as documentation. Someone needs to understand and be able to explain the data flows.

Security-critical pathways need human eyes regardless of what AI drafted. Input validation, SQL queries, file handling, anything that touches user-supplied data: these are the places where AI most reliably reproduces insecure patterns from its training data.
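File handling shows the same failure class. A sketch (the upload directory and function names are hypothetical): user-supplied file names have to be confined to the directory you intend, a check that generated code frequently omits.

```python
from pathlib import Path

UPLOAD_DIR = Path("/srv/app/uploads")  # hypothetical storage root

def read_upload_unsafe(filename: str) -> bytes:
    # Insecure: a name like "../../etc/passwd" escapes the upload directory.
    return (UPLOAD_DIR / filename).read_bytes()

def read_upload(filename: str) -> bytes:
    # Resolve the path and confirm it still lives under the intended root.
    target = (UPLOAD_DIR / filename).resolve()
    if not target.is_relative_to(UPLOAD_DIR.resolve()):
        raise ValueError(f"illegal path: {filename!r}")
    return target.read_bytes()
```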

Core business logic that will live for years deserves understanding because someone will need to modify it later. The person maintaining this code in 2028 might not have access to the AI conversation that generated it. They'll have only the code itself. If no one ever understood it, it becomes legacy code on day one.

The question to ask

When deciding how much human understanding a piece of code requires, we ask ourselves:

"If this breaks at 2am and the AI that wrote it has no memory of the context, can someone on our team diagnose and fix it?"

If the answer is no (if we'd be stuck prompting an AI to fix its own mysterious output), then a human needs to understand that code before it ships.


What Skills Matter Now

If AI handles syntax, what's left for humans?

This question keeps junior developers up at night. It's worth answering clearly: the skills that matter are shifting, not disappearing. And in some ways, they're becoming more demanding, not less.

Architectural judgment is the big one. Understanding how systems fit together (where to draw boundaries, how components communicate, what happens when one part fails) requires the kind of holistic thinking AI currently lacks. AI can generate a function. It struggles to design a system. The developers who can see the whole board, not just the next move, become more valuable as AI handles the individual pieces.

Security intuition matters more when you're reviewing code you didn't write. You need to recognise red flags: user input flowing into SQL queries, authentication checks that can be bypassed, data being logged that shouldn't be. AI reproduces insecure patterns constantly because its training data is full of them. The human reviewer needs to know what "wrong" looks like even when the code runs without errors.

Debugging capability becomes critical in an AI-assisted world. When something breaks, you're often tracing through code you didn't author, looking for logic you didn't design. This requires reading code carefully, understanding execution flow, forming hypotheses about what might be wrong. Developers who've never practiced this, who've only ever prompted their way to solutions, find themselves helpless when the prompts stop working.

Specification precision is the new bottleneck. The quality of AI output depends heavily on how clearly you describe what you need. Vague prompts produce vague code. Developers who can translate business requirements into precise technical specifications (including edge cases, error handling, and security constraints) get dramatically better results from AI tools than those who can't.
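One lightweight way to practice this is to write the specification as a typed signature and docstring before asking for an implementation. A sketch, with everything below invented for illustration and the body deliberately left as a stub:

```python
from datetime import date

# Vague prompt: "write a function that parses user-entered dates"

# Precise specification: the signature and docstring carry the edge cases
# and constraints, which is most of what determines whether the generated
# implementation is actually usable.
def parse_user_date(raw: str) -> date:
    """Parse a user-entered date.

    Accepts ISO 8601 ("2025-03-31") and European "31.03.2025" input.
    Rejects ambiguous forms such as "03/04/2025" by raising ValueError
    rather than guessing day/month order. Leading and trailing whitespace
    is ignored; empty input raises ValueError. Dates only, no time zones.
    """
    ...  # implementation intentionally omitted; the spec is the point
```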

Evaluation skill ties it all together. Can you look at a piece of code and judge whether it's good? Not just whether it runs, but whether it's maintainable, secure, performant, and correct at the edges? This skill requires having written enough code to recognise patterns, encountered enough bugs to know where they hide, and reviewed enough pull requests to develop taste.

Here's the uncomfortable paradox: AI coding tools are most powerful in the hands of people who need them least. Experienced developers use AI to accelerate work they already know how to do. They catch mistakes because they recognise them. They guide the AI toward better solutions because they know what better looks like.

Novice developers get something different. They get plausible-looking output they can't evaluate, confident-sounding suggestions they can't verify, and a growing codebase they don't understand. The tool amplifies capability for those who have it and masks the absence of capability for those who don't.

The path forward isn't avoiding AI tools. It's building the judgment to use them well.


Our Approach at Skelpo

We're not AI skeptics. Anyone who's worked with us recently knows that AI tools are part of our daily workflow. They've genuinely changed how fast we can move from concept to working software, and we'd be foolish to ignore that.

But we're not naive either.

We build finance applications. Systems where a subtle bug in a calculation could mean real money: our clients' money. Where a security flaw could expose sensitive financial data. Where regulatory compliance isn't optional. In these contexts, "it seems to work" isn't an acceptable standard.

So we've landed on a simple principle: the higher the stakes, the deeper the human understanding required.

For internal tooling, prototypes, and isolated utilities, we let AI do the heavy lifting. A developer reviews the output, checks that it behaves correctly, and moves on. We're not precious about understanding every line of a script that reformats data for a one-time migration.

For anything touching money, authentication, or sensitive data, the calculus changes completely. AI might draft the code. AI might suggest approaches. But before it ships, a human being understands what that code does: not just that it passes tests, but why it works, what assumptions it makes, and how it could fail.

This isn't about distrusting AI. It's about understanding its current limitations and building processes around them. AI doesn't know our threat model. It doesn't understand our compliance requirements. It can't weigh the business consequences of a subtle bug in payment logic. Those judgments require humans who understand the code and the context it operates in.

We've also been deliberate about skill development. It would be easy to let AI handle everything and watch our team's fundamentals erode. Instead, we treat AI as a tool that handles the tedious parts while humans focus on the hard parts: architecture decisions, security review, debugging complex issues, understanding the systems we're responsible for.

The goal isn't to use AI less. It's to use it well, which means knowing when to trust it and when to verify.


The Judgment Era

The question we started with (if AI writes the code, does anyone need to understand it?) turns out to be the wrong question.

The right question is: which humans need to understand which code, and how do we make that judgment well?

The era of every developer understanding every line is probably ending. Codebases are too large, development is too fast, and AI is too useful at handling the routine work. That's not a loss worth mourning.

But the era of nobody understanding the code is not something to aspire to. It's a recipe for systems that can't be debugged, security flaws that can't be assessed, and technical debt that accumulates silently until it becomes unmanageable.

Human-readable code isn't becoming obsolete. What's changing is who needs to read it, when, and why. The 70-year project of making code comprehensible to humans still matters, just not uniformly, and not for the same reasons it used to.

The skill that matters now isn't syntax memorisation. It's judgment. Knowing when to trust AI output and when to verify it. Knowing which code demands deep understanding and which doesn't. Knowing how to evaluate what you didn't write.

That judgment is what separates teams that use AI effectively from teams that are one production incident away from discovering they've built systems nobody understands.

The code will keep getting easier to generate. The judgment to use it well won't.

Questions or feedback about this article? We'd love to hear from you.

Tags: AI · software-development · code-quality · security · best-practices