Small Language Models (SLMs) as the gold standard for trust in AI

Stephen Edginton of Dext explains why Small Language Models (SLMs) should be the next stage in your evolving AI strategy

The trust gap in AI is growing. This is particularly true in the high-stakes world of finance, where the margin for error is zero and a single hallucination can lead to financial loss and compliance nightmares. A wrong supplier name or an invoice total with an extra 0 on the end isn't a quirky model behaviour – it's a misstatement in someone's accounts.

Our recent data shows that half of accountants have seen businesses lose money due to mistakes made by large language model (LLM) chatbots. This isn’t a story about AI failing, but about businesses using the wrong shape of AI for the job at hand.

Bolting a general-purpose LLM onto a financial workflow and hoping for accuracy is far too dangerous a gamble. The bottom line is that hope is not an architecture.

The failure of frontier AI alone in finance

Frontier large language models excel at language and reasoning, but they often fall short of complete accuracy. For example, if I asked a bot to read 500 invoices and extract the data with 100% accuracy, it would eventually make a mistake.
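The arithmetic behind "eventually" is unforgiving. As a simplifying assumption (real extraction errors are rarely independent), treat each invoice as an independent extraction that is correct with probability p; the chance that an entire batch of n invoices comes back error-free is then p^n. A minimal sketch:

```python
def batch_success_probability(per_item_accuracy: float, n_items: int) -> float:
    """Probability that every item in a batch is correct, assuming
    independent errors with the same per-item accuracy."""
    if not 0.0 <= per_item_accuracy <= 1.0:
        raise ValueError("per_item_accuracy must be between 0 and 1")
    return per_item_accuracy ** n_items


# Compare two per-invoice accuracy levels over a batch of 500 invoices.
for p in (0.99, 0.999):
    print(f"p={p}: P(all 500 correct) = {batch_success_probability(p, 500):.4f}")
```

At 99% per-invoice accuracy, the chance of a clean batch of 500 is well under 1%; even at 99.9% it is only around 61%.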

This is because LLMs are probabilistic – they're trained to give an answer, no matter what. While a human may admit they're unsure of the answer to a complex question, LLMs are designed to respond confidently, whether or not the response is accurate. Nothing in their training rewards abstention. The result is plausible-sounding guesses, which only widen the trust gap.

Another issue with using frontier models on their own is that they can change overnight. A provider ships an update, the weights move, and a workflow that was 92% accurate on Monday is something different on Tuesday. If your product is a thin wrapper around someone else's API, you don't own your accuracy – you're renting it, and the rent keeps changing.
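One way to regain some ownership of accuracy is to keep a fixed, labelled "golden set" of documents and re-score every model version against it before it reaches production. The sketch below is illustrative only: the golden set, the stand-in "model versions" (plain lookup tables) and the `tolerance` parameter are hypothetical, not any provider's API.

```python
from typing import Callable, Mapping, Optional

# Hypothetical type: an extractor maps a document ID to an extracted value (None if missing).
Extractor = Callable[[str], Optional[str]]


def accuracy(extract: Extractor, golden_set: Mapping[str, str]) -> float:
    """Share of golden-set documents where the extractor matches the label exactly."""
    if not golden_set:
        raise ValueError("golden_set must not be empty")
    correct = sum(1 for doc_id, label in golden_set.items() if extract(doc_id) == label)
    return correct / len(golden_set)


def should_promote(candidate: Extractor, baseline_accuracy: float,
                   golden_set: Mapping[str, str], tolerance: float = 0.0) -> bool:
    """Accept a new model version only if it does not regress beyond the tolerance."""
    return accuracy(candidate, golden_set) >= baseline_accuracy - tolerance


# Toy golden set plus two stand-in "model versions" (lookup tables, not real models).
golden = {"inv-1": "Acme Ltd", "inv-2": "Globex", "inv-3": "Initech"}
monday_outputs = {"inv-1": "Acme Ltd", "inv-2": "Globex", "inv-3": "Initech"}
tuesday_outputs = {"inv-1": "Acme Ltd", "inv-2": "Globex Corp", "inv-3": "Initech"}

baseline = accuracy(monday_outputs.get, golden)
print(baseline, should_promote(tuesday_outputs.get, baseline, golden))
```

Here Tuesday's version drops to two out of three correct, so the gate rejects it instead of letting the regression reach customers silently.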

This is why pure-LLM accounting tools, for example, tend to ship without an accuracy claim: there isn't one they can credibly defend. Auditing and sampling outputs at scale is hard work, and without it, you're shipping a performance of confidence rather than a verified system.
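Sampling does not have to mean checking everything. A common statistical approach, shown here as an assumption rather than any vendor's documented process, is to draw a random sample of outputs, have humans mark each one right or wrong, and report a confidence interval rather than a single number. This sketch uses the Wilson score interval; the sample sizes and counts are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple


def sample_for_audit(output_ids: List[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Pick k distinct outputs uniformly at random for human review."""
    return random.Random(seed).sample(output_ids, k)


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = correct / total
    denom = 1 + z**2 / total
    centre = (p_hat + z**2 / (2 * total)) / denom
    margin = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2))
    return centre - margin, centre + margin


# Hypothetical numbers: 10,000 outputs, 400 audited, 396 judged correct by reviewers.
audit_ids = sample_for_audit([f"doc-{i}" for i in range(10_000)], k=400, seed=7)
low, high = wilson_interval(correct=396, total=400)
print(len(audit_ids), f"{low:.3f} to {high:.3f}")
```

With 396 of 400 sampled outputs correct, the point estimate is 99%, but the interval runs from roughly 97.5% to 99.6%. A stated interval like this is the kind of accuracy claim that can actually be defended.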

A smaller solution

To bridge this trust gap, we need to move away from a one-size-fits-all approach to AI. Enter small language models, or SLMs – localised, domain-specific language models which are designed for precision.

While LLMs have billions of parameters and are trained on vast swathes of the public internet, SLMs are trained on narrow, high-quality datasets. By stripping away the wider general knowledge that often leads to hallucinations in LLMs, these specialised SLMs can focus purely on accuracy.

Because they are highly targeted and work with much smaller, local datasets, SLMs are faster, cheaper to run, and can deliver a much higher degree of accuracy in their output. They're trying to be correct, not clever.

The best of both worlds – orchestrating the reasoning layer

Businesses often make the mistake of thinking that AI adoption is a binary choice between one kind of model and another, but real operational resilience comes from combining the best elements of different architectures.

While LLMs provide real value in natural language processing and high-level reasoning, SLMs provide mastery of specific, narrow domains. Combined, each plays to its strengths: LLMs act as the interface and reasoning layer, while SLMs act as the deterministic verifier.

Getting these models to communicate requires a secure bridge, such as the Model Context Protocol (MCP). The reasoning layer's role, though, is to provide guidance, not just answers. It mines the muscle memory of how a professional handles a specific client and codifies that into natural language rules. The system then uses these rules to steer the specialists.

When processing financial documents, for example, the LLM handles the complex reasoning and conversation while the SLMs handle calculation and the recognition of different tax rules, with MCP securely routing each query from the LLM to the correct specialist model. This ensures that the final output is grounded in fact, not probability.
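The routing idea can be sketched in plain Python. This is not MCP and does not use any MCP SDK: the task names, the `SPECIALISTS` registry, the `route` helper and both specialist functions are hypothetical, with simple deterministic checks standing in for the specialist models. What the sketch shows is the shape of the pattern: the reasoning layer decides which specialist to ask, and the specialist's answer, not the reasoning layer's guess, becomes the output.

```python
from decimal import Decimal
from typing import Any, Callable, Dict

Payload = Dict[str, Any]


# Hypothetical specialists: deterministic functions standing in for narrow SLMs.
def check_vat_total(payload: Payload) -> Payload:
    """Verify that net + VAT equals the stated gross, using exact decimal arithmetic."""
    net, vat, gross = (Decimal(payload[key]) for key in ("net", "vat", "gross"))
    return {"consistent": net + vat == gross, "expected_gross": str(net + vat)}


def match_supplier(payload: Payload) -> Payload:
    """Map a raw supplier string onto a known supplier name (toy lookup table)."""
    known = {"acme ltd": "Acme Ltd", "acme limited": "Acme Ltd"}
    raw = payload["supplier"].strip().lower()
    return {"supplier": known.get(raw), "matched": raw in known}


SPECIALISTS: Dict[str, Callable[[Payload], Payload]] = {
    "vat_check": check_vat_total,
    "supplier_match": match_supplier,
}


def route(task: str, payload: Payload) -> Payload:
    """Dispatch a task chosen by the reasoning layer to the matching specialist."""
    specialist = SPECIALISTS.get(task)
    if specialist is None:
        raise KeyError(f"no specialist registered for task {task!r}")
    return specialist(payload)


# In the architecture described, an LLM would choose the task; here it is hard-coded.
print(route("vat_check", {"net": "100.00", "vat": "20.00", "gross": "1200.00"}))
print(route("supplier_match", {"supplier": "  ACME Limited "}))
```

The VAT check catches exactly the "extra 0 on the end" error: 100.00 + 20.00 does not equal 1200.00. Using `Decimal` rather than floats avoids rounding surprises in currency arithmetic.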

As with any technology, SLMs have their limits. To maintain such a high level of accuracy, small language models must stay narrow, which means a suite of SLMs is essential to cover all bases in complex areas like finance. These suites of SLMs form what is called an 'AI swarm': individual specialists, often with fewer than a billion parameters each, covering different areas such as tax and receipt extraction.

Creating a feedback loop

Human verification should be a core design feature, not a fallback. In a high-stakes financial environment, trust is built on a system that acknowledges its own uncertainty by giving every prediction a confidence score. If a score falls below a set threshold, the prediction is routed to human reviewers.
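A minimal sketch of that threshold rule, assuming each prediction carries a confidence score between 0 and 1. The `Prediction` type, its field names and the 0.95 threshold are illustrative assumptions, not Dext's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Prediction:
    document_id: str
    field: str
    value: str
    confidence: float  # the model's own score, assumed to lie in [0, 1]


def triage(predictions: List[Prediction],
           threshold: float = 0.95) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into auto-accepted ones and ones queued for human review."""
    accepted = [p for p in predictions if p.confidence >= threshold]
    needs_review = [p for p in predictions if p.confidence < threshold]
    return accepted, needs_review


# Hypothetical predictions for two invoices.
preds = [
    Prediction("inv-1", "supplier", "Acme Ltd", 0.99),
    Prediction("inv-1", "total", "1200.00", 0.62),
    Prediction("inv-2", "supplier", "Globex", 0.97),
]
accepted, review_queue = triage(preds)
print([p.value for p in accepted], [p.value for p in review_queue])
```

Low-confidence predictions are not discarded; they become the review queue. Reviewers' corrections could, for example, be stored as new labelled examples, which is one plausible way to close the feedback loop.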

The architecture of trust

Over the next year, the market will be flooded with impressive AI tools and demos, but these tools often crumble when they hit the reality of a complex workflow.

The next era of finance won’t be won by whoever has the biggest model, but by those who build the right architecture around these models: the specialist foundations, human audits, and feedback loops – all on top of proprietary data and infrastructure that the frontier labs can’t see. This is the result of years of work.

The goal is to enable professionals to spend less time on manual inputs and more time on high-impact advisory work. Trust is vital in every industry, now more than ever. A specialised, connected AI architecture is the way to ensure it.

Stephen Edginton is chief product technology officer at Dext.
