Technical services
We polished Cerrajería Carlos Rodríguez's quotes with AI — without taking their data off the server
An AI assistant inside their management software now rewrites the quote texts at Cerrajería Carlos Rodríguez, and the data never leaves their own server.
When quotes show they were written in a hurry
In a technical-services company, the quote and the invoice are the written face of the brand. The end customer might not notice the lock that was installed, but they do read the text they receive. And the moment that text gets written by different people at different times, it stops sounding like the same company.
At Cerrajería Carlos Rodríguez, quotes sometimes went out in a hurry, sometimes with detail, almost never with consistency. It wasn’t a lack of judgement on the team’s part: it was the invisible cost of having to write a professional paragraph at the end of a day spent handling lockouts, vandalism repairs and emergencies. The customer received uneven descriptions, came back with questions, and the company’s image kept landing below the actual quality of the work being done on the street.
What we decided before building
The natural answer was to embed an AI assistant inside their management software, connected to one of the big cloud providers. It works, plenty of teams do it, and it’s probably what anyone else would have proposed to them.
But before we started, we walked them through what that decision actually means when the data belongs not to you but to your customers. Every time someone presses the button to improve a text, the quote content (the customer’s name, their address, what was done at their front door at 3 am, what they were charged) gets sent to a provider we don’t control, subject to its terms, its subcontractors and its training-data policies.
Alejandro hadn’t thought about it. Few clients do, and it’s not their job to. The moment he heard it, the answer was immediate: “no, not that.” That’s where the decision to host the assistant on their own server was born.
What the assistant actually does
Next to the description field of every quote and invoice, inside the management software they already use every day, a button appeared: improve text. That button is the whole visible change.
- The user types the description however they can, in whatever time they have, without fighting the prose.
- They press the button.
- The assistant returns the same idea, rewritten in a professional tone consistent with the rest of the company’s quotes.
- The user approves or edits the result before saving. The last word is always human.
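The steps above can be sketched as a small state model. This is an illustration only: the `Description` class and its method names are hypothetical, not taken from the actual management software. The point it shows is that the assistant's output sits in a separate pending slot and only becomes the saved text when a person approves it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Description:
    """Description field of a quote or invoice (hypothetical model)."""

    saved_text: str                   # what is stored and sent to the customer
    suggestion: Optional[str] = None  # pending AI rewrite, never sent as-is

    def propose(self, rewritten: str) -> None:
        # The assistant only fills the pending slot; saved_text is untouched.
        self.suggestion = rewritten

    def approve(self, edited: Optional[str] = None) -> None:
        # The user accepts the suggestion, optionally after editing it.
        if self.suggestion is None:
            raise ValueError("nothing to approve")
        self.saved_text = edited if edited is not None else self.suggestion
        self.suggestion = None

    def discard(self) -> None:
        # The user rejects the suggestion; the original text stays.
        self.suggestion = None
```

Keeping the suggestion apart from the saved text means a failed or poor rewrite can never overwrite what the user typed.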
Zero training. Zero new tabs to open. Zero changes to the workflow. The AI appears exactly where the friction was.
How it’s built
The model. Qwen 2.5 3B. A small, efficient model, more than enough to rewrite a paragraph in Spanish with a professional tone. A 70B model would be overkill for this task, consume more RAM and take longer per response without delivering proportional value. Picking the right model for the problem is part of the craft.
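To put rough numbers on the RAM difference, here is a back-of-the-envelope estimate of the memory the weights alone would need. It assumes nominal parameter counts (3 and 70 billion) and a flat number of bits per weight. Real figures run higher, because the runtime also needs memory for context and overhead, and quantized formats typically store a little more than their nominal bit width.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Nominal sizes only; actual parameter counts differ slightly.
for name, size in [("3B", 3.0), ("70B", 70.0)]:
    print(
        f"{name}: about {weight_memory_gb(size, 4):.1f} GB at 4 bits per weight, "
        f"{weight_memory_gb(size, 16):.1f} GB at 16 bits"
    )
```

At 4 bits per weight that is about 1.5 GB for the 3B model against about 35 GB for a 70B one: the first fits on an ordinary office server, the second does not.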
The engine. Ollama. It’s one of the most solid local-inference runtimes of 2025-26: it exposes an HTTP API at localhost:11434, handles downloading and loading models, and runs like any other Linux service. The technical barrier to running a small local model has collapsed in the last two years.
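As a sketch of what "exposes an HTTP API" looks like in practice, an application could check that the engine is up and the model is downloaded before showing the button. `/api/tags` is Ollama's endpoint for listing local models. The tag `qwen2.5:3b` and the helper names are assumptions for illustration, not details of this deployment.

```python
import json
from urllib.request import urlopen

OLLAMA_URL = "http://localhost:11434"  # address mentioned in the text
MODEL_TAG = "qwen2.5:3b"               # assumed tag for Qwen 2.5 3B


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]


def is_model_available(
    base_url: str = OLLAMA_URL, tag: str = MODEL_TAG, timeout: float = 5.0
) -> bool:
    """Return True if the local engine answers and lists the model."""
    with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return tag in model_names(payload)
```

If the engine is not running, `urlopen` raises an error that the caller can catch to hide or disable the button.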
The integration. The management software makes an HTTP request to the Ollama engine running on the same server, with the user’s text and a tone instruction. Ollama returns the rewritten text. The application shows it to the user for approval or editing. Everything happens inside the customer’s network.
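A minimal sketch of that request, using only Python's standard library. It assumes Ollama's `/api/generate` endpoint with streaming turned off. The model tag, the wording of the tone instruction and the function names are our illustration, not the production code.

```python
import json
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434"  # same server as the application
MODEL_TAG = "qwen2.5:3b"               # assumed tag for Qwen 2.5 3B

# Hypothetical tone instruction; the real one is tuned to the company.
TONE_INSTRUCTION = (
    "Rewrite the following quote description in Spanish, in a professional, "
    "consistent tone. Keep every fact and figure. Return only the rewritten text."
)


def build_request(user_text: str) -> dict:
    """Build the JSON body: the user's text plus the tone instruction."""
    return {
        "model": MODEL_TAG,
        "system": TONE_INSTRUCTION,
        "prompt": user_text,
        "stream": False,  # ask for one JSON object instead of a token stream
    }


def improve_text(user_text: str, timeout: float = 60.0) -> str:
    """Send the text to the local engine and return the proposed rewrite."""
    body = json.dumps(build_request(user_text)).encode("utf-8")
    req = Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"].strip()
```

The returned string is only a proposal: the application shows it to the user, who approves or edits it before anything is saved.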
The consequence. No end-customer data ever leaves Cerrajería’s server. No cloud AI provider has access to their quotes. Zero per-token cost, no variable billing. And if some provider changes its policy tomorrow, the assistant keeps working exactly the same: it doesn’t depend on anyone.
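One way to keep that guarantee from eroding through a later configuration change is to refuse any engine address that is not on the same machine. The check below is a sketch under that assumption. The function name is hypothetical, and it is an application-level safeguard rather than a substitute for firewall rules.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_endpoint(url: str) -> bool:
    """Return True only if the URL points at this same machine."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        # Assumes the system resolves "localhost" to a loopback address.
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is rejected rather than resolved.
        return False
```

Called at startup with the configured engine URL, it accepts `http://localhost:11434` and `http://127.0.0.1:11434` and rejects anything pointing at a remote host.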
Three things we learned
1. You don’t always need the biggest model. There’s a reflex, fed by the headlines, to reach for the largest available model for any AI task. But a local 3B model handles this job comfortably. Picking a model proportional to the task saves resources, improves latency and opens the door to on-prem deployment on hardware companies already own.
2. The user has the last word. The assistant suggests; it doesn’t auto-publish. That design choice does two things: it means no AI-written text reaches the customer without a person having read it, and it raises the team’s confidence in the tool. An AI that decides on its own is an AI nobody uses; an AI that assists and lets you review is an AI that sticks around.
3. On-prem isn’t what it used to be. Five years ago, serving a local model required expensive hardware and niche DevOps know-how. In 2025-26, with Ollama and small efficient models, serving AI on your own server is operationally similar to calling a cloud API. The technical barrier that justified “just send everything to OpenAI” has collapsed. For companies handling sensitive data, that changes the conversation.
What’s next
The assistant has been in production since 2026, and we keep iterating on it. Immediate roadmap:
- Extending the assistant to other texts in the management software: customer emails, descriptions of closed jobs, internal communications.
- Moving to Qwen 2.5 7B if the 3B falls short in some use case, keeping the deployment local.
- Templates per job type (locks, security upgrades, vandalism, 24h emergencies) so the assistant can adapt the tone to the context.
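The per-job templates could be as simple as a lookup that appends one extra line to the tone instruction. The sketch below is hypothetical throughout: the keys mirror the job types listed above, but the wording of each template and the function name are our assumptions.

```python
# Hypothetical base instruction shared by every job type.
BASE_TONE = "Rewrite the description in Spanish in a professional, consistent tone."

# Hypothetical per-job additions; keys follow the job types on the roadmap.
JOB_TEMPLATES = {
    "locks": "Name the parts installed or replaced in plain terms.",
    "security_upgrades": "Explain briefly what the upgrade protects against.",
    "vandalism": "Describe the damage factually, without speculation.",
    "emergency_24h": "State the time of the call-out and the work done on site.",
}


def tone_instruction(job_type: str) -> str:
    """Return the base instruction, extended when the job type is known."""
    extra = JOB_TEMPLATES.get(job_type)
    return BASE_TONE if extra is None else f"{BASE_TONE} {extra}"
```

An unknown job type falls back to the base tone, so adding a new category never breaks the button. Switching models would be a similar one-line change, for instance pointing the request at a `qwen2.5:7b` tag, assuming that is how the 7B model is named on the server.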
“All I wanted was for the quotes not to look the way they sometimes did. The data side hadn't crossed my mind. But they were right to bring it up — that's what you expect from a partner. Today we have both: professional texts, and our data where it belongs.”