Vibe Code Cleanup and the Discipline of Letting Go

by Pushker K June 27, 2026 23 min read

TLDR

Vibe coding carries a product to about 80 percent fast, and the last 20 percent is a separate kind of work that rarely shows up in a demo. The maintenance phase, not the build, holds most of the lifetime cost of software. The model that produced the mess cannot reliably clean it up, because it adds code more readily than it removes it and it cannot hold a large codebase in working memory at once. The real cause sits upstream, in a specification that was never fully stated and an architecture the model was never given.

The cleanup follows one method we apply to this work, the Containment Method, in 3 movements. Characterize the current behavior with static analysis and characterization tests before you change a line. Contain the work to one vertical slice and keep every file small enough for the model to hold whole. Convert the slice with a second model reviewing the first, then govern the result with a constitution file so the disorder cannot return.

Two things to keep in view. Run the green light test on any prototype, since a login, a payment, or real customer data each turn a harmless demo into something that can lose money or leak identity. And rebuild in place of repair when the code holds no boundary you can point the model at, because at that point the prototype is the specification and a clean rebuild moves faster than the patch.

Cole sits 30 minutes of zazen most mornings before he opens his laptop. There is a small Zen center 4 blocks from his apartment in the South Bay, and he has kept the same seat for 10 years. The practice has worn down a habit most engineers carry their whole careers, the habit of gripping. He has trained himself to look at a thing as it is before he decides what it should become. That one quality, more than any framework he knows, is what makes him good at the work that landed in his inbox this week.

A founder he contracts for had spent the better part of a year and around $4,000 in model credits building the company’s product. It demos beautifully. 3 clients are ready to sign. The founder handed the whole thing to Cole to make production ready, then turned to the next idea on the roadmap.

Cole opened the backend. One file. 30,000 lines. He asked the agent to break it into modules, and the work stalled inside a minute. Each pass surfaced 10 problems, then 12 more, then 8 more. The codebase had grown past what any model can hold in its context window at once. The size was the wall. The tool was not, and switching tools would move nothing, because the same file would overflow the same working memory in any of them.

Two part panel: a clean glowing application surface beside one solid block marked 30,000 lines with no internal seams

Fig 1. The Inheritance – What shipped looks clean. What sits underneath is one 30,000 line block with no seams.

He felt the old reflex rise, the urge to grab the file and force it into shape that night. He watched the reflex, and he set it down.

This situation is no longer rare. It is becoming a defined role, and Cole has just been handed it.

The eighty twenty that no one mentions

Cole is standing at a wall every builder reaches. It helps to say plainly what got him here, because the wall is not a sign that anyone failed.

Vibe coding carries a builder to roughly 80 percent of a product with real speed, and that speed is genuine. A working prototype in 2 hours where the older road took 2 weeks is a true economic gain, and pretending otherwise insults the reader’s experience. The quiet truth is that the last 20 percent is a separate kind of work, and vibe code cleanup is the name for most of it. It moves slower. It rarely shows up in a demo. And the most expensive part of software has always been the years of maintenance that follow the launch, which held true long before models could write a line of code. The O’Reilly 60/60 rule, one of the few proposed laws of software maintenance, puts 60 percent of a system’s lifecycle cost in the maintenance phase and 40 percent in the original build. IEEE software maintenance research going back to Lientz and Swanson places the maintenance share between 60 and 80 percent of total lifecycle cost. The bill for software comes due in the years after the demo.

A horizontal bar where the first 80 percent fills fast and the final 20 percent reads dense and steep, marked production, the real work

Fig 2. The Eighty Twenty Wall – The first 80 percent ships in hours. The last 20 percent is where the real work lives.

There is a 4 person startup Cole knows by reputation. Call them the Atlas team. They ship at remarkable speed and they are now 200,000 lines deep. One member understands software architecture. The other 3 keep shipping. The one who knows what technical debt is watches it compound every week and cannot keep ahead of it alone. Cole’s single 30,000 line file is the seed of that same tree. He can see his own next year inside the Atlas codebase if he chooses speed over structure now.

Starting got easy. Staying alive is the whole job now.

What Cole finds when he looks

Cole does the thing his practice trained him to do. He looks at the system as it is, without flinching and without blaming the founder, and he writes down what he finds.

The first discovery sets the tone. A payment function is copied across 3 files, because the founder had prompted the same feature on 3 separate days and the model carried no memory of the first 2 builds. A founder in another city had lived the consequence of the same root cause. Her billing fired 3 times on a single click, her webhooks failed on the first real subscription, and she spent that evening emailing every early customer to apologize and hand back free credit. The model forgets between sessions, and the forgetting lands hardest on money and identity.

A timeline of three prompting sessions on Day 1, Day 4, and Day 9, each producing its own copy of one payment function in a different file

Fig 3. Why The Same Code Appears Three Times – The model forgets between sessions, so one feature gets built three times in three files.
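
Copies like these can be surfaced mechanically before anyone reads the code by hand. What follows is a minimal sketch, assuming a Python codebase for illustration (the founder's stack is not stated). It catches only functions whose bodies are structurally identical under different names. Near duplicates need a dedicated tool. The function names and the sample files are hypothetical.

```python
import ast
import hashlib
from collections import defaultdict


def function_fingerprints(source: str) -> dict[str, str]:
    """Map each function name in `source` to a hash of its body structure."""
    fingerprints = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Hash only the body, so renaming the function does not hide a copy.
            body_dump = "".join(ast.dump(stmt) for stmt in node.body)
            fingerprints[node.name] = hashlib.sha256(body_dump.encode()).hexdigest()
    return fingerprints


def find_duplicates(files: dict[str, str]) -> list[list[str]]:
    """Group 'path:function' labels whose bodies are structurally identical."""
    groups = defaultdict(list)
    for path, source in files.items():
        for name, digest in function_fingerprints(source).items():
            groups[digest].append(f"{path}:{name}")
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Hypothetical sample: the same charge logic prompted on two separate days.
example_files = {
    "checkout.py": "def charge(user, cents):\n    total = cents * 1\n    return total\n",
    "billing.py": "def charge_customer(user, cents):\n    total = cents * 1\n    return total\n",
    "utils.py": "def noop():\n    return None\n",
}
duplicates = find_duplicates(example_files)
# duplicates == [["billing.py:charge_customer", "checkout.py:charge"]]
```

The output is a list to read, not a list to act on. Which copy survives is a decision for the person who understands the billing flow.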

The rest of what Cole finds follows a familiar shape, the same set of failures that send most vibe coded products into trouble as they try to cross from launch to scale. The agent built the path where everything goes right and left the failures silent, so a broken call shows the user a blank screen and writes nothing to the logs. It built a login page and treated that as access control, which means any signed in user can reach another user’s records. It kept adding code and almost never removed any, because adding carries less risk for a model than deleting. And the disorder feeds on itself, because a sprawling codebase drains the model’s working memory on understanding the sprawl before it can fix anything inside it.
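
The gap between a login page and access control is easier to see in code than in prose. The sketch below is an illustration under stated assumptions, not the founder's code: the `Invoice` type, the in-memory `INVOICES` store, and both function names are hypothetical. The first function checks only that someone is signed in. The second also checks that the record belongs to the person asking.

```python
from dataclasses import dataclass
from typing import Optional


class Forbidden(Exception):
    """Raised when a request must not see the record it asked for."""


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner_id: int
    amount_cents: int


# Hypothetical in-memory store standing in for a real database table.
INVOICES = {
    1: Invoice(invoice_id=1, owner_id=100, amount_cents=4_900),
    2: Invoice(invoice_id=2, owner_id=200, amount_cents=12_000),
}


def get_invoice_login_only(current_user_id: Optional[int], invoice_id: int) -> Invoice:
    """Checks that someone is signed in and nothing more."""
    if current_user_id is None:
        raise Forbidden("not signed in")
    return INVOICES[invoice_id]  # any signed-in user can read any invoice


def get_invoice_with_access_control(current_user_id: Optional[int], invoice_id: int) -> Invoice:
    """Checks the login, then checks ownership of the specific record."""
    if current_user_id is None:
        raise Forbidden("not signed in")
    invoice = INVOICES.get(invoice_id)
    # Treat "missing" and "not yours" the same, so the error leaks nothing.
    if invoice is None or invoice.owner_id != current_user_id:
        raise Forbidden("no access to this invoice")
    return invoice
```

With user 100 signed in, the first function hands over invoice 2, which belongs to user 200. The second raises `Forbidden` for the same request.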

None of this is hypothetical, and the public record now carries the proof. In May 2025 the security researcher Matt Palmer disclosed CVE-2025-48757, an authorization flaw across applications generated on Lovable, one of the most widely used vibe coding platforms. More than 170 production apps had shipped with their database tables readable by anyone holding the public client key, because row level security sat off by default and the generated code never turned it on. One analysis found that around 10 percent of the apps it examined exposed data this way. The failure Cole is looking at, a login page mistaken for access control, is the same failure that put real user records on the open internet at scale.

The cost of untangling this is not small. One rescue of this kind ran 6 weeks of senior engineering time and 3 months of company runway to repair the billing faults and the access control alone.

Why you cannot prompt your way through vibe code cleanup

Cole feels the obvious pull, the same one every builder feels, to point the agent at the whole mess and tell it to clean itself. Most people reach for vibe code cleanup this way, as one prompt that fixes everything. His practice makes him pause, and the pause is the lesson.

The Atlas team had already run that experiment at full scale. They tried every autonomous agent harness they could find. Each one surfaced 10 problems, then 12 more, then 8 more, and none of them held its ground when the architect pushed back on a bad call. After 16 hours, the most productive thing the architect had done all day was test the software by hand, one step at a time.

The reason sits in how the tools work. The model that produced the mess cannot see past it. It reaches for more code because adding is safer for it than removing. And it writes code that passes every test while still falling apart under maintenance, because passing a test and surviving a year of changes are 2 different properties. The research backs the caution. In the first large study of its kind, published at the 2022 IEEE Symposium on Security and Privacy, Pearce and colleagues generated 1,689 programs with an AI coding assistant and found roughly 40 percent of them vulnerable. 3 years later the finding held. Veracode’s 2025 review of AI generated code across more than 100 models reported that 45 percent of the samples introduced a known class of security weakness. A separate user study found something more unsettling for anyone planning to trust the output, that developers working with an AI assistant wrote less secure code while believing it was more secure. The confidence rises faster than the quality does.

Cole reads all of this as attachment, the engineering form of the thing his teacher warns about. The Atlas architect gripped the codebase and drowned in it. The way through asks for the opposite hold, close in understanding and loose in the hands.

A short bar marked vibe coded, 1 day beside a bar ten times longer marked cleanup, about 10 days

Fig 4. The Ten To One Reality – Take the time spent vibe coding and multiply by ten to estimate the cleanup.

There is a number worth setting expectations against. Take the time the founder spent vibe coding the product, and multiply it by 10 to estimate the cleanup. The builders who do this for a living keep landing near that figure.

The cause sits one step earlier

Every vibe code cleanup job traces back to the same root. The mess Cole is reading began as a specification problem that hardened into an architecture problem.

The founder asked for a login page and received a login page, with no access control under it. He asked for features and received features, with no foundation beneath them. Architecture is the place where human intent and risk tolerance live, and a model carries none of that in its training. A person has to supply it, in the asking and in the review.

Two columns, login page and payment handler on the left, access control and verified billing on the right, with the space between marked where architecture begins

Fig 5. Asked Against Meant – What you asked for and what you meant are separated by the gap where architecture begins.

This is where the wishful thinking has to end. A folder of well written markdown files does not turn an unowned codebase into a clean one. Governance files help the agent stay on course, and they earn their place. They do not stand in for a human who understands what the code does. The belief that enough instruction files remove the need to understand is the most common form of learned helplessness in the vibe coding world. Cole understands the founder’s product because he sat with it and characterized it. That understanding is the asset. The files only record it.

The reframe is simple to state and hard to live. Architecture literacy matters more than the ability to write code. A person does not have to type the system to govern it. A person has to recognize when the model has built it wrong. This is also the engineering shape of Cole’s practice. He holds the whole loosely and reads it clearly, and the clear reading is what lets him act.

The Containment Method for vibe code cleanup

There is proof that disciplined cleanup pays, and it is worth meeting before the method.

A solo builder named Theo let an agent rewrite his application in full. The first attempt broke real behavior. He stopped, built a set of tests around what the application already did, and let the rewrite proceed inside that frame. The finished version ran cleaner than his original hand written one, measured on crashes, on logged errors, and on user reports. The model did not get smarter between the 2 attempts. Theo built the structure it needed to work inside.

A vertical flow of three movements, Characterize, Contain, Convert, each a labeled node with an arrow carrying down to the next

Fig 6. The Containment Method – Characterize, Contain, Convert. Each movement makes the next one safe.

That is the whole teaching in one story. A human builds the frame, and the model works within it. Cole already knows the shape of this, because it is how his practice works, a held form inside which things can move freely. The method we apply to this work carries the same shape, and we call it the Containment Method. It rests on one premise drawn from decades of legacy code research. You reduce disorder in a system only after you have made its current behavior observable and bounded the radius of any change. 3 movements follow from that premise.

Movement one, characterize

Cole makes the current behavior observable before he changes anything. You cannot reduce disorder you have not measured.

He runs static analysis across the whole codebase first, in one sweep, to surface dead code, near duplicate logic, unused imports, and dependency faults together. He writes a short document that orders the work by the files eating the most context, largest first, because those files block the model before anything else does. Then he writes characterization tests against the live behavior. The term comes from Michael Feathers and his book on working with legacy code, and the idea is exact. A characterization test documents what the system does today, including the broken parts, and freezes that behavior into a contract that any rewrite has to honor. He holds back the urge to fix while he maps, because mapping and fixing are 2 movements and joining them is how cleanups fail.
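
A characterization test looks odd the first time you see one, because it asserts behavior you may believe is wrong. The sketch below shows the shape. The function `legacy_invoice_total` is a hypothetical stand-in for inherited code, and its quirks are invented for the example. The tests record what it does today and make no claim about what it should do.

```python
import unittest


def legacy_invoice_total(prices_cents, discount_percent):
    """Hypothetical inherited code. The discount is truncated by integer
    division, and a discount over 100 percent produces a negative total."""
    total = sum(prices_cents)
    return total - total * discount_percent // 100


class CharacterizeInvoiceTotal(unittest.TestCase):
    """Pins current behavior. These tests describe, they do not judge."""

    def test_plain_total(self):
        self.assertEqual(legacy_invoice_total([1000, 250], 0), 1250)

    def test_discount_rounding_as_it_is_today(self):
        # 999 * 15 // 100 is 149, so today's answer is 850.
        self.assertEqual(legacy_invoice_total([999], 15), 850)

    def test_known_bug_discount_over_100_goes_negative(self):
        # Almost certainly wrong, but it is current behavior, so it is recorded.
        self.assertEqual(legacy_invoice_total([1000], 150), -500)
```

When the negative total is later fixed on purpose, that third test fails on purpose, and updating it becomes a recorded decision in place of a silent change.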

Do this. Characterize.

1. Run a static analysis sweep across the whole repository before you touch anything. On a JavaScript or TypeScript project, knip surfaces dead code and unused exports, and npm audit surfaces dependency faults. Save the output as your starting map.
2. Order the work by file size. List every file over 500 lines, largest first. These are the files the model cannot hold whole, so they sit at the top of the queue.
3. Write characterization tests against current behavior. A prompt that works: “Write integration tests that document exactly what this billing flow does right now, including any broken behavior. Do not fix anything. The tests should pass against the current code.”
4. Turn on a scan at every commit so the mess cannot rebuild in silence. 20 minutes of setup here saves the next full sweep.
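
The file size ordering needs no special tooling. Here is a minimal sketch of it, using the 500 line threshold from the steps above. The skipped directory names and the list of source suffixes are assumptions to adjust for your own stack, and `oversized_files` is a name chosen for this example.

```python
from pathlib import Path

# Assumed names. Change both sets to match your repository.
SKIP_DIRS = {".git", "node_modules", "dist", "build", "__pycache__"}
SOURCE_SUFFIXES = {".py", ".js", ".jsx", ".ts", ".tsx"}


def count_lines(path: Path) -> int:
    """Count lines without loading the whole file into memory."""
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def oversized_files(root: str, limit: int = 500) -> list[tuple[int, str]]:
    """Return (line_count, relative_path) for source files over `limit`, largest first."""
    root_path = Path(root)
    found = []
    for path in root_path.rglob("*"):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        relative = path.relative_to(root_path)
        # Skip vendored and generated code, which is not yours to refactor.
        if SKIP_DIRS.intersection(relative.parts):
            continue
        lines = count_lines(path)
        if lines > limit:
            found.append((lines, relative.as_posix()))
    # Largest first, with the path as a tie breaker so the order is stable.
    return sorted(found, key=lambda item: (-item[0], item[1]))
```

Calling `oversized_files(".")` from the repository root returns the queue for the Contain movement. The same function can run at every commit and fail the build when a new file crosses the limit.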

Movement two, contain

Cole bounds the radius of every change. The science here is the context window as working memory. A model holds a small file whole and reasons well across it, and it loses the thread on a large one, so the size of the thing you ask it to change decides how well it changes it.

He chooses one vertical slice that carries real user value, the billing flow, and he treats everything outside it as untouchable until that slice stands clean and tested. He keeps every file under 500 lines. The characterization tests from the first movement catch any regression the moment it shows, which lets him work the slice with a steady hand.

Do this. Contain.

1. Pick one slice that a real user touches and that you can name in a sentence. Billing. Login. The flow most likely to lose money or leak data goes first.
2. Declare everything else off limits in writing. Tell the agent which directories it may read and the single directory it may change this session.
3. Split any file over 500 lines before you edit it. A prompt that works: “Split this file into modules under 500 lines each. Change no behavior. The characterization tests must still pass.” Run the tests after.
4. Make one change at a time and run the test suite before every commit. A green suite is your permission to proceed.
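
A written boundary holds better when something checks it. The sketch below assumes you collect the changed paths yourself, for example from the output of `git diff --name-only`, and pass them in as plain strings. The function name and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_paths: list[str], allowed_dir: str) -> list[str]:
    """Return the changed paths that fall outside the one directory in scope."""
    allowed = PurePosixPath(allowed_dir)
    violations = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # is_relative_to compares whole path segments, so "src/billing_old"
        # does not count as being inside "src/billing".
        if not path.is_relative_to(allowed):
            violations.append(raw)
    return violations


# Hypothetical session: the agent was told it may change only src/billing.
changed = [
    "src/billing/charge.py",
    "src/billing/tests/test_charge.py",
    "src/auth/login.py",
    "src/billing_old/legacy.py",
]
violations = out_of_scope(changed, "src/billing")
# violations == ["src/auth/login.py", "src/billing_old/legacy.py"]
```

An empty list means the session stayed inside the slice. Anything else is a reason to revert before the commit, however reasonable the extra change looks.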

Movement three, convert

Cole reduces the disorder inside the contained slice. The science here is that a model cannot see past what it produced, so a second model trained by a different company flags what the first one accepted as ordinary.

He runs the build model to do the work, then a separate model with a large context window to review it. He exports the slice to a single document, asks for an unsparing review with the praise turned off, and brings the findings back as a task list ranked by severity, with the invented issues struck out by hand. He refactors the billing flow, and where a slice has no recoverable shape he rebuilds it from the characterization tests. Then he governs. A constitution file in the root holds the conventions and the architecture decisions so the next session does not reinvent them, and a short list of protected zones marks the code the agent may not touch.

Do this. Convert.

1. Export the contained slice to one document and hand it to a second model with a large context window. A prompt that works: “Review this code for security faults, duplicated logic, and missing error handling. I want a holistic critique, not praise. Rank every issue by severity.”
2. Filter the review by hand. Models invent issues, so read the list and strike the ones that do not hold before you act on any of them.
3. Feed the surviving list back to your build model as a ranked task file and work it top down, running the tests after each fix.
4. Where a slice has no shape worth saving, rebuild it from the characterization tests, since the tests already hold the behavior you need to preserve.
5. Write a constitution file. Put your stack, your conventions, your architecture decisions, and a short list of files the agent may not touch in one file at the root that the agent reads every session. This is what stops the next session from undoing this one.
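
The filter and rank steps can be kept honest with a small amount of structure. This is a sketch only. The four level severity scale, the `Finding` fields, and the sample findings are all assumptions made for the example, and the `confirmed` flag is meant to be set by a person who has read the code.

```python
from dataclasses import dataclass

# Assumed severity scale. Lower number means work it sooner.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    title: str
    severity: str
    confirmed: bool  # set by a human after reading the code, never by the model


def ranked_task_list(findings: list[Finding]) -> list[str]:
    """Drop findings a human struck out, then order the rest by severity."""
    kept = [finding for finding in findings if finding.confirmed]
    # sort is stable, so findings of equal severity keep the reviewer's order.
    kept.sort(key=lambda finding: SEVERITY_ORDER[finding.severity])
    return [f"{i}. [{f.severity}] {f.title}" for i, f in enumerate(kept, start=1)]


# Hypothetical review output, after a human pass over each item.
review = [
    Finding("Duplicate charge function in three files", "high", True),
    Finding("SQL injection in export endpoint", "critical", False),  # invented by the reviewer model
    Finding("Any signed-in user can read any invoice", "critical", True),
    Finding("Failed webhook writes nothing to logs", "medium", True),
]
tasks = ranked_task_list(review)
```

Here `tasks` holds 3 lines, with the access control fault first, and the invented finding never reaches the build model.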

Characterize, Contain, Convert. By the time the billing slice stands clean and green, Cole can say the 3 movements without looking, and so can the reader.

The green light test, and five things to anchor on

Before the method ever comes into play, one question tells a builder whether their prototype can hurt anyone. It is the most useful 10 seconds in this whole craft, and it costs nothing to apply.

Ask 3 things. Does it have a login. Does it take payments. Does it hold real customer data. If the answer to all 3 is no, ship it freely and learn in public, because the worst case is an embarrassing bug. If the answer to any one is yes, a human reviews the slice that touches it before it goes live, because that slice can now lose money, leak an identity, or expose a person. The Lovable disclosure happened entirely inside the third question.

Three questions stacked as a gate, a login, payments, real customer data, with one path to ship and learn in public and one to a human reviews that slice first

Fig 6B. The Green Light Test – A login, a payment, or real customer data each turn a harmless demo into real risk.
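
The test is small enough to write down as a function, which makes it easy to put in a checklist or a release script. A minimal sketch follows, with a function name chosen for this example. It returns the slices a human should review, and an empty list is the green light.

```python
def slices_needing_review(has_login: bool, takes_payments: bool, holds_customer_data: bool) -> list[str]:
    """Return the slices a human must review before launch. Empty means ship."""
    answers = {
        "login": has_login,
        "payments": takes_payments,
        "customer data": holds_customer_data,
    }
    # Any single yes is enough to require review of that slice.
    return [name for name, yes in answers.items() if yes]
```

A static landing page returns an empty list. A prototype with a login and stored customer records returns `["login", "customer data"]`, and those are the 2 slices to read before anything goes live.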

5 anchors hold up everything above. A builder who keeps these in view stays out of most of the trouble in this post.

1. Decide what the thing is before you build it, because the model builds what you ask for, and a vague ask returns confident disorder.
2. Write tests that lock current behavior before you change anything, because a test you can run is the only honest signal that you broke something.
3. Keep files small and work one slice at a time, because a model reasons well over a file it can hold whole and loses the thread on one it cannot.
4. Clean at the end of every session, because the disorder you leave today is the context the model wades through tomorrow, and the cost accelerates.
5. Understand what was built even when you did not write it, because durability lives in your grasp of the system and not in the code alone.

Cole holds all 5. He holds them as a way of working he has practiced for years, on the cushion and at the keyboard both, the kind of discipline that lives deeper than a rule taped to a wall.

The method above is the work. If you would rather hand it off, we run vibe code cleanup for a living, and we are happy to start with a short diagnostic.

When the honest move is to rebuild

The fastest way to fix a vibe coded app is sometimes to stop trying to fix it. Cleanup is not always the right answer, and pretending it is would cost the reader real money.

A builder spent 3 days vibe coding a product, then 2 days failing to patch what he had, then deleted everything except the interface and rebuilt the rest piece by piece. The rebuild took less total time than the failed patch and came out at higher quality. The signal that sent him there is the one Cole watches for, and it is the hardest non attachment of all. When you cannot point the model at a boundary because the code holds none, the slice has no shape worth saving.

A decision gate asking can you point to a boundary in the code, a yes path to Refactor and a no path to Rebuild from the prototype as specification

Fig 7. Refactor Or Rebuild – If you cannot point the model at a boundary, the slice has no shape worth saving.

The prototype still has worth in that moment. It is the specification. It already showed the flows, the edge cases, and the true requirements, and the characterization tests captured that behavior in a form a rebuild can honor. The uncertainty cost of early work is already paid, and a clean rebuild from that base moves faster than the first build did. Cole lets the old code go without ceremony, the way he lets most things go.

Who does vibe code cleanup well

Cole got free of it. He rebuilt the interface against a real component system, brought the billing flow under tests, gave the agent a defined structure to work inside, and shipped a product that holds under real users. The work was real, and it asked for the discipline this post described. None of it came from a clever prompt. His practice gave him the one thing the job required, the capacity to hold the system closely in his understanding and loosely in his grip.

A mirror of the opening frame where the single block is now opened into clean labeled modules with visible seams

Fig 8. The Inheritance, Resolved – The same block, now opened into clean labeled modules with visible seams.

The market around this work sits crooked right now. The person who generates the demo collects the credit, and the person who makes it real collects the friction, and at times the blame. One engineer at a large company asked for documentation and proper requirements before cleaning up a vibe coded project, and got removed from the project for asking.

The trade is forming right at this seam anyway. Cleanup as a service already advertises in the same forums where founders ask for help. The question in every thread, who can fix this, is the question that built the older business of rescuing failed minimum viable products for the decade before vibe coding had a name. The work is old and the volume is new. It does not come cheap either, because the engineers who can do it know the true cost, and most of them decline a codebase they did not build. The ones who do it well bring architectural judgment and move in sequence, and they use the model as a tool inside a structure they defined.

The shape of the next few years reads clearly enough. Product teams will vibe code the demo, engineering will make it real, and that handoff will settle into a defined stage of how software gets built. It is the same gap that sinks most AI agent projects, the distance between a prototype that demos well and a system that holds under real load.

Who answers for the system

A slide attributed to IBM internal training, circulated widely since 2017, reads:

A computer can never be held accountable, therefore a computer must never make a management decision.

A typographic rendering of the line a computer can never be held accountable, therefore a computer must never make a management decision

Fig 9. A Computer Cannot Be Held Accountable – The model built what it was asked to build. It cannot answer for what it built.

The line predates vibe coding by decades and names the exact gap that makes this a real job. The model built what it was asked to build. It cannot answer for what it built. The accountability stays with the person who directs it, the person who maintains it, and most often the person who inherits it.

That person is the mechanic. It has always been the job. It arrives faster now, larger, and more often, and it is the same job underneath. Cole knew that before the founder ever called. He sat with it each morning, held it lightly, and let the work be the work.

Start with the map

The hardest part of vibe code cleanup is the first movement, because you cannot fix what you have not made visible. So start there, and start free.

Send us your repository and we will run the Characterize step for you at no cost. You get back a static analysis sweep, a ranked map of the files eating the most context, and a plain read on the 3 places that decide whether your product is safe to run, the login, the payments, and the customer data. The map is yours to keep and act on, whether you hand the rest to us or take it from there yourself.

If you want the Contain and Convert movements run by people who do this every week, we do that too. We bring the slice under tests, rebuild what has no shape worth saving, and leave you with a governed codebase and the constitution file that keeps it clean.

Request your free codebase map

Written By

Pushker K Chief Executive Officer @ Clixlogix

About the Author:

Pushker is the founder of Clixlogix. Give him a messy operation and he finds the leverage point, then builds the fix himself. He works at the edge of what AI can actually do inside a business, and writes about what he finds there.
