Theodore Lowe, Ap #867-859 Sit Rd, Azusa New York
TL;DR. The question to settle before “RAG or fine-tuning” is architectural, not technical. Where does the AI sit inside the application? Three answers cover most cases. A bolt-on keeps the AI outside the application boundary as a service the legacy app calls. An embed wires the AI into the application’s main workflows. A rebuild rearchitects around the model. Each carries a distinct cost curve, rollback story, and data demand. Path first, technique second, platform downstream.
Most teams come to the AI-and-legacy conversation with the wrong question on the table. They ask whether to use RAG or fine-tuning, whether to host the model themselves or call a frontier API, whether to start with one vendor or shop the market. These are all real questions, and in almost every case they are the second question. The first one is architectural. Where does the AI sit inside the application? The answer determines the cost curve, the rollback story, the data work that has to happen first, and what good even looks like when the feature ships. The technique question follows the architecture question, and a surprising number of AI initiatives stall because the order gets reversed.
This piece is for engineering leaders weighing how to put AI into a system that already runs the business. It walks through the three integration paths we see most often, the tradeoffs that come with each, the questions that help a team pick correctly, and the techniques that sit underneath all three.
The reasons companies are taking this on now are not mysterious, but they are worth saying out loud. Inference costs have dropped by close to an order of magnitude in under two years. Vector databases have moved from research curiosity to production grade infrastructure. The major model providers and the cloud platforms have introduced enterprise tiers with data isolation, audit trails, and the kind of governance that makes legal and security teams willing to sign off.
The legacy applications running most companies were built before any of this existed. They were not designed with model inference on the critical path, or with the data shape that retrieval needs, or with the feedback loops that agentic behavior depends on. The gap between what those applications do today and what their users now expect is growing fast enough to be a competitive cost. (Thoughtworks have written about the related question of using AI to understand legacy codebases, which sits underneath all three of the integration paths we’re about to walk through.)
We see this across our clients in shapes that are starting to feel familiar. A logistics dispatch tool adds document extraction over freight paperwork. A CRM gains meeting summarization and follow up drafting. A field service platform picks up predictive scheduling. None of these features were the original product, but the absence of them is becoming a reason customers leave.
Before deciding on RAG or fine-tuning, before sizing a vector database, before picking a frontier model, we need to answer a simpler question. Where does the AI sit inside the application? There are essentially three answers (bolt-on, embed, and rebuild), and they differ less in their AI techniques than in their architectural footprint, their cost profile, and what they ask of the legacy system underneath. Figure 1 sets the three paths side by side.
Figure 1 – Three integration paths for AI in legacy applications
A bolt on treats the AI as an external service that the legacy application calls. The application keeps its existing architecture, its data model, and its deployment cadence. The AI features live in a sidecar service or a separate microservice, reached through an API, and they return text, structured output, or a decision that the legacy app then displays or acts on.
This is the path most teams should start with, especially the ones who have not yet built much production experience with LLMs. Time to first value is usually four to eight weeks. The blast radius is small, because the AI component can be turned off without breaking the underlying workflows. Rollback is straightforward. Cost stays predictable, since the AI inference is paid for separately and can be capped.
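The small blast radius described above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not a reference implementation: `BoltOnConfig`, `summarize_ticket`, and the `call_ai_service` client are hypothetical names, and the character limit stands in for whatever cost cap a team actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BoltOnConfig:
    enabled: bool = True         # kill switch: set to False to turn the AI off
    max_input_chars: int = 4000  # crude cap on what is sent to the model


def summarize_ticket(
    ticket_text: str,
    call_ai_service: Callable[[str], str],  # hypothetical client for the AI sidecar
    config: BoltOnConfig,
) -> Optional[str]:
    """Return an AI summary, or None so the legacy screen renders as before."""
    if not config.enabled:
        return None
    try:
        return call_ai_service(ticket_text[: config.max_input_chars])
    except Exception:
        # Any failure in the AI service degrades to "no summary",
        # never to a broken legacy workflow.
        return None
```

The legacy code path treats `None` as "render the screen without the AI panel", which is what keeps rollback a configuration change rather than a deployment.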
The constraint is that the AI is only as good as the data and UI the legacy app can hand it. If the legacy schema does not capture the right context, or the existing screens have no surface to display an AI response, the feature ends up attached to the side of the product in a way that users learn to ignore. The bolt on path fits copilots, summarization, triage, draft generation, and other adjacent features where the AI augments an existing task without changing the workflow.
A common failure mode is that the team ships a chat panel which lives in a tab no one opens, because the workflow it serves was never the workflow users came to the app to do.
An embedded integration makes the AI a core participant in the application’s main workflows. Retrieval, inference, or agent logic gets wired into the application’s runtime, and the data flows are reshaped so the AI components have what they need to behave well. The legacy app’s existing screens, APIs, and event paths get rebuilt around the new behavior.
This is heavier work, typically three to six months to a meaningful release, with a real engineering investment in evaluation, observability, and the data plumbing that feeds the model. The payoff is that the AI becomes part of how the product behaves rather than a feature on the side. Decisioning, search, recommendations, and workflows that lean on documents at scale tend to need embedded integration to deliver on the user’s expectations, because the AI’s output is the product, not a sidebar.
One concrete shape we’ve written about elsewhere is a Nordic legal workflow CRM where multimodal document understanding became the spine of the application. Case routing, contract risk flagging, and the attorney chatbot all depend on the model’s reading of the documents rather than on a separate extraction step that hands a character stream to a downstream reasoner. The AI is the workflow, not a feature attached to it.
The risk is real. Embedded work touches the parts of the system that the business depends on, which means the cost of being wrong shows up directly in user trust, in compliance posture, and in operations. Teams that succeed here invest early in evaluation harnesses, fallback paths, and the kind of observability that catches model drift before a customer does.
The typical failure here is that the team ships the embedded behavior without an evaluation harness, then spends the next quarter debugging quality regressions in production.
The third path rearchitects the application around model inference, vector stores, events as the spine of the data layer, and agentic orchestration. The legacy app is not extended but replaced, or carved up and replaced piece by piece, with a new architecture whose first principles assume that probabilistic components sit on the critical path.
This is the modernization equivalent of moving from a monolith to a set of microservices, and it carries the same kind of cost profile. Nine to eighteen months is a reasonable expectation for a first production release. The investment is substantial, the architectural risk is real, and the work overlaps with the broader modernization questions a team would face even without AI in the picture.
A rebuild is the right call when the legacy architecture cannot support the latency, the data shape, or the feedback loops that the target AI experience requires. If the application’s reason for existing is being reshaped by AI, attempting a bolt on or an embed is an exercise in fighting the architecture. We see this most often in search experiences, in workflows where the model has to reason across many documents through a graph of retrieval and inference rather than a single call, and in any product where agentic behavior is the user value.
The recurring trap here is that the team commits to a rebuild before the product question is settled, then discovers six months in that a bolt on would have answered the same user need.
The differences are easier to see in a single table than across paragraphs. The numbers are typical ranges from delivery work and the broader industry conversation, not promises.
| Dimension | Bolt on | Embed | Rebuild |
|---|---|---|---|
| Time to first value | 4 to 8 weeks | 3 to 6 months | 9 to 18 months |
| Investment range | Low | Medium | High |
| Architectural risk | Low | Medium | High |
| Data leverage | Limited | Strong | Full |
| Reversibility | Easy | Moderate | Low |
| Best fit | Adjacent features | Core workflows | New product surface |
Once the path is settled, the technique conversation gets simpler, because the path narrows what makes sense. RAG, fine-tuning, classical machine learning, agentic orchestration, and prompt-only LLM calls can all appear in any of the three paths. The path decides the architectural footprint of the AI work. The technique decides how the model behaves. Figure 2 lays the two dimensions on the same grid.
Figure 2 – Architecture path and AI technique are orthogonal choices
A bolt on copilot that pulls answers from a SharePoint index is RAG sitting inside a bolt on architecture. An embedded fraud decisioning module built on a fine tuned classifier is a different technique sitting inside a different architecture. A rebuilt support platform that combines RAG, fine tuning, and agentic routing is a third technique stack sitting inside a third architectural choice.
This separation also makes it easier to evolve. Most teams do not need to commit to a technique stack on day one if they have committed to the path. A bolt on can start as a prompt only LLM call and graduate to RAG once the retrieval shape is understood. An embed can start with a classical model and add fine tuning when the data signal is rich enough. The path is the long lived decision, and the technique is the part that changes over time.
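One way to picture a bolt-on graduating from a prompt-only call to RAG is to keep the calling interface fixed and swap what sits behind it. The sketch below assumes a hypothetical `llm` callable (prompt in, text out) and uses a toy keyword-overlap ranking purely for illustration; a real system would use embeddings or a search index.

```python
from typing import Callable, List

LLM = Callable[[str], str]       # hypothetical: prompt in, text out
Answerer = Callable[[str], str]  # the stable interface the legacy app calls


def prompt_only(llm: LLM) -> Answerer:
    """Day-one technique: send the question straight to the model."""
    def answer(question: str) -> str:
        return llm(f"Question: {question}")
    return answer


def with_retrieval(llm: LLM, documents: List[str], top_k: int = 2) -> Answerer:
    """Later technique: prepend the best-matching documents as context."""
    def answer(question: str) -> str:
        words = set(question.lower().split())
        # Toy ranking by shared words; stands in for a real retriever.
        ranked = sorted(
            documents,
            key=lambda doc: len(words & set(doc.lower().split())),
            reverse=True,
        )
        context = "\n".join(ranked[:top_k])
        return llm(f"Context:\n{context}\n\nQuestion: {question}")
    return answer
```

Because both functions return the same `Answerer` shape, the application code that calls the bolt-on does not change when the technique does.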
Treating them as a single decision is where teams lose six months and a budget cycle.
Five questions usually settle the path, asked in order.
The first is whether the AI behavior is central to the user’s reason for opening the application. If the answer is no, a bolt on almost always wins. If the answer is yes, the conversation moves on.
The second is whether the legacy data model already contains what the model needs. Most of the time the answer is partly yes and partly no, and the shape of that partial answer is what decides between an embed and a rebuild. If the data is largely there but needs to be re plumbed, an embed is usually the right shape. If the data layer would need to be rebuilt regardless, the rebuild becomes the honest path.
The third is what the cost of being wrong looks like in production. A summarization that occasionally hallucinates is usually recoverable, while a pricing decision that occasionally hallucinates can do real damage before anyone notices. The higher the cost of error, the more the path tilts toward embed or rebuild, because both create room for the evaluation harnesses, fallback paths, and observability that a bolt on can only loosely accommodate.
The fourth is how often the AI logic will need to change. If the team expects the model behavior to evolve every few weeks, an embed gives the surface area to do that without touching the application’s core. A bolt on works for the same reason at a smaller scale. A rebuild is the wrong choice when the AI behavior is still being discovered.
The fifth is whether the legacy stack is on a modernization trajectory anyway. If a replatforming program is already funded and underway, folding the AI work into it can change the math on the rebuild path. If the legacy stack is meant to stand for another five years, a rebuild is hard to justify on AI alone.
The five answers cluster, and the cluster points at one of the three paths.
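To make the clustering concrete, the five questions can be written as a small decision helper. This is a deliberately simplified sketch: the field names, the boolean framing, and the exact precedence of the rules are our assumptions for illustration, and a real decision will weigh the answers rather than branch on them.

```python
from dataclasses import dataclass
from enum import Enum


class Path(Enum):
    BOLT_ON = "bolt-on"
    EMBED = "embed"
    REBUILD = "rebuild"


@dataclass
class Answers:
    ai_is_central: bool                    # Q1: is the AI why users open the app?
    data_layer_needs_rebuild: bool         # Q2: would the data layer be rebuilt regardless?
    high_cost_of_error: bool               # Q3: can a wrong output do real damage?
    behavior_still_being_discovered: bool  # Q4: is the AI behavior still in flux?
    modernization_already_funded: bool     # Q5: is a replatforming program underway?


def suggest_path(a: Answers) -> Path:
    """Illustrative mapping from the five answers to a path (assumed precedence)."""
    if not a.ai_is_central:
        # Q1 favors a bolt-on; Q3 tilts toward an embed when errors are expensive.
        return Path.EMBED if a.high_cost_of_error else Path.BOLT_ON
    if not a.data_layer_needs_rebuild:
        # Q2: the data is largely there and needs re-plumbing.
        return Path.EMBED
    if a.behavior_still_being_discovered or not a.modernization_already_funded:
        # Q4 and Q5 both argue against committing to a rebuild yet.
        return Path.EMBED
    return Path.REBUILD
```

For example, a product where the AI is central, the data layer has to be rebuilt anyway, and a funded replatforming program exists lands on `Path.REBUILD`, while the same product with the AI behavior still being discovered lands on `Path.EMBED`.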
The work that sits in front of the code is what determines whether any of this ships. Our approach has settled into a few stages that map to the failure modes we have seen most often.
Feasibility and ROI sizing comes first, before any architectural commitment. The path decision should be made on the basis of expected value and expected cost, not on the basis of which AI demos looked good in a vendor meeting.
A data readiness audit follows, because the gap between what the legacy data layer contains and what the AI components need is almost always larger than the team expected at the start. The audit usually surfaces work that would have surfaced anyway, except later and at higher cost.
The build itself is staged, with measured fallback at each step. The first production release is rarely the most ambitious version of the feature. It is the version that proves the path is correct.
An evaluation harness gets built alongside the feature, not retrofitted afterwards. This is the single most reliable difference between an AI initiative that holds up in production and one that becomes a quarterly incident review.
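An evaluation harness does not need to be elaborate to be useful. The sketch below is one minimal shape, assuming a hypothetical `generate` callable that wraps the model and a substring check as the scoring rule; real harnesses usually add graded rubrics, human review, or model-based scoring on top.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str
    must_contain: List[str]  # substrings a correct answer is expected to include


def run_eval(generate: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose output contains every expected substring."""
    if not cases:
        raise ValueError("at least one eval case is required")
    passed = 0
    for case in cases:
        output = generate(case.prompt).lower()
        if all(expected.lower() in output for expected in case.must_contain):
            passed += 1
    return passed / len(cases)


def gate_release(pass_rate: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Allow a release only if quality has not regressed beyond the tolerance."""
    return pass_rate >= baseline - tolerance
```

Running `run_eval` on every prompt or model change and passing the result through `gate_release` is what turns a quality regression into a failed build instead of a production incident.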
Production observability covers cost, latency, and model drift from day one. Each of these will trend in unexpected directions, and a team that cannot see the trends will respond to them late.
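As a rough illustration of tracking those three signals together, the sketch below keeps one record per model call and compares recent quality against earlier quality. The names are hypothetical, and the `quality` field is an assumed per-call score (for example an eval score or a user acceptance flag); production systems would normally send these values to an existing metrics stack rather than hold them in memory.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class CallRecord:
    latency_s: float
    cost_usd: float
    quality: float  # assumed per-call quality score between 0 and 1


@dataclass
class ModelMonitor:
    records: List[CallRecord] = field(default_factory=list)

    def record(self, latency_s: float, cost_usd: float, quality: float) -> None:
        self.records.append(CallRecord(latency_s, cost_usd, quality))

    def total_cost(self) -> float:
        return sum(r.cost_usd for r in self.records)

    def mean_latency(self) -> float:
        return mean(r.latency_s for r in self.records)

    def quality_drift(self, window: int) -> float:
        """Mean quality of the latest `window` calls minus the mean of earlier calls."""
        if window <= 0 or len(self.records) <= window:
            raise ValueError("need more records than the window size")
        recent = self.records[-window:]
        earlier = self.records[:-window]
        return mean(r.quality for r in recent) - mean(r.quality for r in earlier)
```

A negative `quality_drift` that keeps growing is the kind of trend worth alerting on before a customer reports it.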
Selection follows the path and the data constraints, not the other way around. A few categories are worth keeping in mind.
Frontier model providers like OpenAI, Anthropic, and Google offer the broadest general capability and the fastest access to new model families. They are usually the right starting point when the AI behavior is the main user value and the team needs to move quickly.
Cloud hosted model services like Amazon Bedrock, Google Vertex, and Azure AI Foundry trade some capability flexibility for tighter integration with the rest of the cloud estate and stronger governance defaults. They tend to win when the legacy app already lives inside that cloud and when compliance is heavy.
Open weight models like Llama, Mistral, and Qwen become relevant when cost control matters, when data residency rules out hosted inference, or when the team wants the option to fine tune deeply without going through a provider’s process.
Vector and retrieval infrastructure is its own category. Pgvector, Pinecone, Weaviate, OpenSearch, and a handful of others each have their place, and the right choice depends on existing data stores, latency targets, and the size of the corpus the AI needs to reason over.
The honest version of this section is that the platform choice is downstream of the path. A bolt on can be done well on almost any stack. An embed narrows the choices. A rebuild starts to depend on commitments the team will live with for years.
One principle is worth carrying through every path, regardless of platform. Build the model itself as a replaceable component. Store the model identifier as a configuration value, keep prompts and output schemas decoupled from the specific model behind them, and a deprecation email becomes a configuration change rather than an engineering project. We have seen the same migration take a day for teams that built this way, and weeks for teams that did not.
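A minimal sketch of that principle follows, under our own assumptions: the configuration format, the `summarize` function, and the adapter registry are hypothetical, and each adapter stands in for a thin wrapper around whichever provider SDK a team uses.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ModelConfig:
    provider: str  # key into the adapter registry
    model_id: str  # stored as configuration, never hard-coded in call sites


def load_config(raw: str) -> ModelConfig:
    data = json.loads(raw)
    return ModelConfig(provider=data["provider"], model_id=data["model_id"])


# The prompt and the output schema live apart from any specific model.
SUMMARY_PROMPT = "Summarize the following ticket in one sentence:\n{ticket}"
REQUIRED_OUTPUT_KEYS = {"summary"}

# Hypothetical adapter shape: (model_id, prompt) in, JSON string out.
Adapter = Callable[[str, str], str]


def summarize(ticket: str, config: ModelConfig, adapters: Dict[str, Adapter]) -> dict:
    prompt = SUMMARY_PROMPT.format(ticket=ticket)
    raw = adapters[config.provider](config.model_id, prompt)
    result = json.loads(raw)
    missing = REQUIRED_OUTPUT_KEYS - set(result)
    if missing:
        raise ValueError(f"model output missing keys: {sorted(missing)}")
    return result
```

With this shape, moving to a different model means editing the configuration document and, at most, registering one more adapter; the prompt, the schema check, and every caller of `summarize` stay untouched.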
The bolt on, embed, rebuild framing is the same disciplined architectural conversation that has guided modernization decisions for two decades. AI does not change the framing. It changes the techniques that sit inside it, and it changes the urgency of having the framing conversation at all. The teams that get this work right are the ones that pick the path first, pick the technique second, and treat the platform choice as a downstream consequence rather than an upstream commitment.
Want a second pair of eyes on which path fits your situation?
Our team offers an architecture review at no cost. We will sit with your engineering lead, look at the legacy system, and tell you whether bolt on, embed, or rebuild is the honest path.
The three answers describe where the AI sits in relation to the application, not what AI technique is used. A bolt on keeps the AI outside the application boundary as a separate service the legacy app calls. An embed wires the AI into the application’s main workflows and reshapes data flow around it. A rebuild rearchitects the product around model inference, vector stores, and agentic orchestration as first principles.
Time to first value varies sharply by path. A bolt on usually ships in four to eight weeks, because it touches the application surface but not its internals. An embed is heavier work and tends to take three to six months to a meaningful release. A rebuild lines up with broader modernization timelines and runs nine to eighteen months for a first production release.
A bolt-on can often evolve into an embed later, and this is one of the reasons bolt-on is the default starting path for teams new to production LLM work. A bolt-on serves as a low-risk way to learn the data shape, the evaluation needs, and the user response to the feature. When the team has enough signal to justify a deeper investment, the bolt-on becomes a working prototype that informs the embed design. The migration is rarely zero cost, but the bolt-on usually pays for itself before that conversation begins.
The decision comes down to whether the legacy data layer and architecture can support the AI experience the team is targeting. If the data is largely there and needs to be re plumbed, an embed is usually the honest choice. If the data layer would need to be rebuilt regardless, or if probabilistic components have to sit on the critical path that the legacy architecture cannot accommodate, the rebuild becomes the path. The cost of being wrong on this question is months, not days.
RAG is a technique. It can appear inside any of the three integration paths. A bolt on copilot that retrieves from a SharePoint index is RAG inside a bolt on architecture. An embedded support tool that uses retrieval and reranking on the application’s main workflows is RAG inside an embed. The technique sits underneath the path, and conflating the two is one of the more common ways AI initiatives lose time.
The most common failure across all three paths is shipping the feature without an evaluation harness. Teams move from prototype to production on the strength of demo-level outputs, then spend the next quarter debugging quality regressions that no one can characterize. The fix is to build the eval harness alongside the feature, not after the first incident. Bolt-on, embed, and rebuild all fail in similar ways when evaluation lags the build.
Build the model itself as a replaceable component from day one. Store the model identifier as a configuration value, keep prompts and output schemas decoupled from the specific model behind them, and a provider deprecation email becomes a configuration change rather than an engineering project. Teams that build this way move between providers in days. Teams that hard wire the model assumptions into the code spend weeks or months when they need to switch.
Frontier models from OpenAI, Anthropic, and Google offer the broadest general capability and the fastest access to new model families, which makes them the usual starting point when the team needs to move quickly. Open weight models like Llama, Mistral, and Qwen become relevant when cost control matters at scale, when data residency rules out hosted inference, or when the team wants the option to fine tune deeply without going through a provider’s process. The choice is best made downstream of the path decision and the data constraints, not before either is settled.
Akhilesh leads architecture on projects where customer communication, CRM logic, and AI-driven insights converge. He specializes in agentic AI workflows and middleware orchestration, bringing a “less guesswork, more signal” mindset to each project and ensuring every integration is fast, scalable, and deeply aligned with how modern teams operate.