Something has shifted in software development.
There is production code running today, in ordinary companies, that nobody can fully account for from end to end. The engineer who merged it cannot explain all of it. The team responsible for the service cannot either. In some cases leadership has even less visibility. The system stays up, tests pass, features go live, and the numbers look fine. That can hide the real problem for a while.
Ask what a given section is doing, why a certain behaviour emerges, or what breaks if one part is removed, and the answers get thin very quickly.
That is the territory people are starting to describe as dark code. The term fits. The software works, but human understanding is incomplete because much of it was produced with AI assistance and moved into production before anyone built a solid mental model of it.
That is not just an engineering annoyance. It changes what ownership means.
What dark code is
Dark code is not the same as bad code.
It is not simply old code, ugly code, or code with too much debt hanging off it. Those categories are familiar. Teams know what they are dealing with, even when the cleanup is painful. Dark code is different because the gap is not mainly about quality. It is about comprehension.
The defining feature is simple: the code was never properly understood by a human at any stage that mattered.
That usually happens in a pattern that now feels normal. Someone writes a prompt. A model generates the implementation. The tests pass, or at least enough of them pass. The feature is shipped. The missing piece is the older discipline where somebody had to understand the logic well enough to explain it under pressure, change it safely, and predict how it would behave when conditions changed.
The software may be reliable enough in day-to-day use. That does not mean it is well understood.
Why it is spreading
The growth of dark code is not mysterious. It comes from how software is now being produced.
One reason is built into AI-assisted development itself. When a person writes a system piece by piece, they are forced through the logic. They meet the awkward edges, the strange dependencies, the little failure cases that reveal how the thing really works. That process is slow, but it builds understanding. Generated code cuts through a lot of that. You get the result faster. You often do not get the same depth of familiarity. Tools like Cursor, Windsurf and Claude Code all accelerate the generation step — the step that used to do much of the teaching.
The other reason is pressure. Teams are expected to ship faster, iterate faster, and operate with less slack. AI makes that possible, which is why adoption has moved so quickly. But speed changes behaviour. When a feature can be produced in minutes, the time spent really understanding it starts to look optional. In many firms, it becomes optional in practice even if nobody says so directly.
So the codebase grows. The output is real. The comprehension layer gets thinner.
Why the obvious fixes do not solve it
A lot of proposed answers miss the point because they treat dark code as a tooling gap.
Observability helps. Teams need logs, traces, metrics, alerts. None of that is in dispute. But being able to see what a system is doing in production is not the same as understanding why it was built that way, what assumptions it relies on, or what hidden dependencies sit underneath apparently normal behaviour. You can instrument a black box very well and still have a black box.
More automation does not solve it either. Better pipelines, better agents, more structured orchestration, stricter evaluation layers — all of that can reduce certain classes of error. It can also make the system harder to reason about because there are now more layers between the human and the output. When something goes wrong, you are not only looking at the code. You are looking at prompts, tool calls, generated patches, evaluation logic, and whatever workflow wrapped around the whole process. The growing use of MCP servers and agent plumbing adds capability, but it also adds indirection.
Testing helps, but only up to a point. Passing tests tells you the code met the conditions you checked. It does not tell you whether anyone truly understands what has been shipped. Some teams are disciplined enough to push far with strong evals and tight process. Most are not. In many places the code comes from several directions at once and nobody has an accurate picture of the whole.
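The gap between a passing test and real understanding is easy to make concrete. A deliberately small sketch, with an invented `apply_discount` function, of how code can satisfy every condition that was checked while carrying assumptions nobody examined:

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the discounted price in cents, rounding down."""
    return price_cents * (100 - percent) // 100

# The one condition the team checked passes, so the code ships.
assert apply_discount(1000, 10) == 900

# The edges nobody tested reveal decisions nobody made:
# a negative percentage silently becomes a surcharge,
assert apply_discount(1000, -10) == 1100
# and a percentage over 100 produces a negative price.
assert apply_discount(1000, 150) == -500
```

The tests were green the whole time. The behaviour at the edges was never decided by anyone; it simply fell out of the arithmetic.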
That is where the risk sits.
The real problem inside the organisation
Dark code is not only a software issue. It is a problem of institutional competence.
A company can keep operating while its internal understanding degrades. That is part of what makes this dangerous. Delivery continues. The roadmap moves. Customers may not notice anything unusual. But inside the organisation, fewer people can explain why the system behaves as it does, what its failure boundaries are, or how much of its present state is intentional versus accidental.
That matters when something breaks. It also matters when nothing is visibly broken yet.
| Organisational function | With human-written code | With unchecked dark code |
|---|---|---|
| Risk review | Teams can describe how the system behaves under stress | Reviewers sign off on systems they cannot explain |
| Compliance | Controls map to understood logic | Paperwork exists; the underlying evidence is thin |
| Incident response | On-call can reconstruct intent quickly | Engineers reverse-engineer their own code during an outage |
| Vendor / supply trust | Integrations are deliberate and legible | Dependencies accumulate without a clear owner |
Risk review gets weaker when teams cannot explain their own systems clearly. Compliance turns into paperwork instead of evidence. Incident response slows down because people are reconstructing logic they never really had. Vendor trust becomes harder to assess. The organisation starts relying on software it cannot properly interrogate.
That is not a stable position.
What does help
The answer is not to stop using AI for software development. That is unrealistic, and it misses the point. The gains in speed and output are real. Most teams are not going backwards.
The question is how to keep understanding from collapsing while development speeds up.
Force clarity before code generation. Before a team asks a model to produce anything important, it should be able to state what the system is meant to do, what constraints matter, what success looks like, and what cannot be allowed to happen. That does not require bloated documentation. It does require actual precision. If intent is vague, the code may still work, but the team has very little basis for judging whether it is correct beyond surface behaviour.
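One lightweight way to force that precision is to write the intent down as a structured artifact that exists before any generation happens. A minimal sketch, not tied to any particular tool; the `IntentSpec` dataclass, its field names, and the payment example are all invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class IntentSpec:
    """What the code is meant to do, stated before a model generates it."""
    purpose: str                 # what the system is for
    constraints: list[str]       # what must hold while it runs
    success_criteria: list[str]  # how the team will judge it worked
    must_never: list[str]        # outcomes that cannot be allowed

    def is_precise_enough(self) -> bool:
        # A crude gate: every field must be filled in before anyone
        # asks a model to produce the implementation.
        return bool(self.purpose and self.constraints
                    and self.success_criteria and self.must_never)

spec = IntentSpec(
    purpose="Retry failed payment captures up to three times",
    constraints=["idempotent per capture id", "exponential backoff"],
    success_criteria=["no duplicate charges in replayed logs"],
    must_never=["retry a capture that already succeeded"],
)
assert spec.is_precise_enough()
```

The point is not the data structure. It is that an empty `must_never` list is visible in a way that a vague prompt never is.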
Make the system easier to read after it exists. Code should carry context with it. Modules should be identifiable in purpose. Dependencies should be visible and defensible. Interfaces should say more than what shape of data passes through them. Expected behaviour, failure conditions, and operational assumptions should not live only in chat threads or in somebody's head.
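In code, carrying context can mean stating the operational assumptions where they are enforced, not just where they are described. A hypothetical sketch; the function, its single-writer assumption, and the size limit are invented for illustration:

```python
def enqueue_batch(queue: list, items: list, max_size: int = 10_000) -> None:
    """Append items to the in-memory queue.

    Expected behaviour: items are processed in arrival order.
    Failure condition: raises rather than silently dropping work.
    Operational assumption: a single writer; this is not thread-safe.
    """
    if len(queue) + len(items) > max_size:
        # Fail loudly at the documented boundary instead of letting an
        # undocumented limit surface later as mysteriously lost messages.
        raise OverflowError(
            f"queue would exceed max_size={max_size}; apply backpressure"
        )
    queue.extend(items)

# Usage: the limit and the failure mode are visible at the call site.
queue: list[int] = []
enqueue_batch(queue, [1, 2, 3])
assert queue == [1, 2, 3]
```

Whoever reads this later does not need the original chat thread to learn that the queue has a ceiling, or what happens when it is hit.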
Require a comprehension check before production. Not code review as a box-ticking ritual. A real attempt to answer the questions a strong engineer would ask when trying to work out whether this system is safe to depend on. Why is this dependency here? What assumptions are buried in this implementation? What breaks if this part fails? What has been optimised, and what has been quietly traded away? If nobody can answer those questions cleanly enough, the code is not ready just because it passes tests.
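That check can even be made partly mechanical: refuse the merge until the questions have substantive answers. A deliberately simple sketch; the question list, the `comprehension_gate` function, and the 20-character minimum are invented, and a real version would live in CI or a pull-request template:

```python
COMPREHENSION_QUESTIONS = [
    "Why is each new dependency here?",
    "What assumptions are buried in this implementation?",
    "What breaks if this part fails?",
    "What was optimised, and what was quietly traded away?",
]

def comprehension_gate(answers: dict[str, str]) -> list[str]:
    """Return the questions still lacking a substantive answer."""
    return [q for q in COMPREHENSION_QUESTIONS
            if len(answers.get(q, "").strip()) < 20]  # crude minimum length

unanswered = comprehension_gate({
    "Why is each new dependency here?":
        "redis: shared rate-limit state across pods",
})
# Three questions remain unanswered, so the change is not ready to merge.
assert len(unanswered) == 3
```

A length check cannot measure understanding, but it does make the absence of an answer impossible to wave past, which is where most box-ticking review fails.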
Why this matters now
This problem is not speculative. It is already here, and it will get worse as code generation becomes easier.
As models improve, the temptation to accept output at face value will increase. Teams will be asked to handle more with fewer people. Generated code will continue to pile up in production systems. Some of it will be good. Some of it will be fragile. Much of it will sit in the uncomfortable middle where it works well enough but is poorly understood.
Companies that ignore this can still move quickly. They will just be moving with less visibility than they think.
Companies that take the problem seriously have a better chance of keeping control of their systems as those systems become harder to see clearly. The same argument applies at the business layer: firms swapping SaaS for AI-driven stacks are quietly inheriting the same legibility risk — only now it sits in the workflows running the company.
The actual advantage
For a while, speed alone looked like the prize.
It is not enough now. Plenty of teams can generate software quickly. The harder thing is to keep that software legible to the people responsible for it. That is where the advantage will be. Not in producing more code than everyone else, but in being able to explain what has been built, change it without fear, and stand behind it when the easy answers run out.
Dark code is not inevitable. It is what happens when output outruns understanding for long enough that people stop treating the gap as a problem.
The risk is not simply that machines are writing more of the code. The risk is that organisations start accepting systems they no longer really know.