How developers can build cost-effective AI models

Making affordable models to match Big Tech’s capabilities

We’re at the end of an era, one in which giant tech companies race one another to build the largest language model they can. For a long time, we’ve chased ever-larger models with an increasing number of parameters, culminating in OpenAI’s GPT-4, rumoured to contain over 1 trillion parameters.

Large language models (LLMs) like GPT-4 and LLaMA 2, produced by Big Tech giants like Microsoft-backed OpenAI and Meta, have significant and wide-ranging applications for enterprises and individuals alike. However, the cost incurred by training and running these models has become impossible to ignore, writes Victor Botev, CTO and co-founder of Iris.ai.

We need smarter, optimised model architectures to make AI something that all businesses can access for their own needs. Fortunately, there are a variety of ways for smaller organisations to build cost-effective AI that competes with the capabilities of the largest models.

Tackling the cost issue

For most enterprises, but especially for small to medium-sized operations, the amount of spare capital available for investment in AI can be small. If we want AI to be accessible for the vast majority of organisations, we must make its training and running costs feasible.

Right now, fine-tuning and instruction set tuning a model like GPT-4 for a business to apply to a specific use case might cost tens of thousands of dollars. Quite aside from the training costs, the more complicated your model, the more expensive it is to run over time. At the extreme end, there are estimates that it can cost around $700,000 per day to keep ChatGPT’s servers running.

These issues make customising the largest models for domain-specific use cases totally out of reach for many organisations. So, let’s examine how developers can overcome this issue and create truly cost-effective models.

Training and tuning your AI models

One of the most common approaches to creating a language model is to take an existing, open-source model and train it to fit an organisation’s requirements. When we think about how we can customise these models to reduce running and training costs, it’s essential to know the minimum number of parameters and tuning required for each use case.

Companies today must be able to do three types of model tuning to make a model fit for purpose. From the most intricate to the most high-level, these are fine-tuning, instruction set tuning, and prompt tuning.

Fine-tuning

Fine-tuning means making minor adjustments to the way models understand and encode the language related to a particular domain. If certain tokens are underrepresented, fine-tuning helps us change this and improve the model’s contextual understanding. Say, for instance, that a model has been created with the goal of recognising and categorising scientific papers. This may make it a good candidate to fine-tune for patent research.

With a carefully selected data set, often involving a business’s own proprietary data, the new model can be far more accurate than the generic model from which it evolved. By focusing on quality over quantity, we can improve accuracy whilst keeping total training time down.

Instruction set tuning

Instruction set tuning and prompt tuning are less expensive and data-intensive than fine-tuning but require us to take more care in how we formulate the instructions and prompts. Another challenge for both of these approaches is working out how we automate data gathering to make this a scalable process.

Instruction set tuning is a slightly newer technique, introduced in 2021 with the ‘Finetuned Language Models Are Zero-Shot Learners’ paper from Google researchers. It involves training the model on tasks phrased as natural-language instructions, so that it learns to follow certain instructions by itself instead of the user having to tell it how to follow an instruction step-by-step.

This approach has limitations of its own, chiefly performance losses due to counterproductive or overlapping instructions. To overcome this, you need highly tailored, well-curated datasets with instructions that don’t overlap. Normally, you would need to create and curate these datasets manually. However, by using a ‘swarm’ of smarter, specialised language models, you can automatically generate high-quality datasets and save human labour in the process.
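As a rough illustration of what checking for overlapping instructions could look like, the sketch below flags pairs of instructions that share most of their words. The word-overlap (Jaccard) measure, the 0.6 threshold and the sample instructions are all assumptions made for this example, not a method described in the paper or used by any particular vendor. A production pipeline would more likely compare sentence embeddings.

```python
import re
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lowercase word set used as a crude fingerprint of an instruction."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two instructions have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def overlapping_pairs(instructions: list[str], threshold: float = 0.6):
    """Return (i, j, score) for instruction pairs at or above the threshold."""
    sets = [tokens(t) for t in instructions]
    return [
        (i, j, round(jaccard(sets[i], sets[j]), 2))
        for i, j in combinations(range(len(sets)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    ]


# Hypothetical instructions, invented for illustration only.
instructions = [
    "Summarise the abstract of this patent in two sentences.",
    "Summarise the abstract of this patent in three sentences.",
    "List the chemical compounds mentioned in the text.",
]
print(overlapping_pairs(instructions))  # [(0, 1, 0.8)]
```

Pairs flagged this way are candidates for a human, or another model, to merge or rewrite. The check itself says nothing about whether two instructions are actually counterproductive.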

Prompt tuning

When it comes to prompt tuning, we can turn our attention to how we extract the exact knowledge we want from a model based on how it’s been encoded. A useful parallel may be how you phrase a search engine query to get the right results. As we zoom out to this high-level optimisation, bear in mind that prompt tuning’s effectiveness depends on how much fine-tuning and instruction tuning remain to be done.

If the knowledge you want to extract from the model has been encoded properly, then no fine-tuning will be required. But language has different meanings in different contexts, so fine-tuning will often be required to optimise a model for specialised domains. Similarly, if a model is able to carry out multi-step instructions and present knowledge to the user in an easily understandable way, then no instruction tuning will be needed.
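To make the search-query parallel concrete, here is a minimal sketch of comparing prompt phrasings against a handful of questions with known answers. It takes prompt tuning in the sense used above, that is, how the request is worded. Everything in it is an assumption for illustration: `toy_model` is a hypothetical stand-in for a real model call, and the templates, questions and scoring rule are invented.

```python
from typing import Callable


def score_template(
    template: str,
    examples: list[tuple[str, str]],
    model: Callable[[str], str],
) -> float:
    """Fraction of examples where the model's answer contains the expected text."""
    hits = 0
    for question, expected in examples:
        answer = model(template.format(question=question))
        hits += expected.lower() in answer.lower()
    return hits / len(examples)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; swap in your own client."""
    # Pretends the model only answers usefully when asked to be concise.
    if "one word" in prompt:
        return "Paris" if "France" in prompt else "Berlin"
    return "That is an interesting question."


examples = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Germany?", "Berlin"),
]
templates = {
    "plain": "{question}",
    "concise": "Answer in one word. {question}",
}
scores = {name: score_template(t, examples, toy_model) for name, t in templates.items()}
print(scores)  # {'plain': 0.0, 'concise': 1.0}
```

With a real model behind `model`, the same loop gives a cheap way to see whether rewording alone closes the gap, or whether the shortfall points back to fine-tuning or instruction tuning.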

How many parameters do you need?

Defining a model as having ‘X billion parameters’ means simply that, in a conventional dense model, this many parameters are activated every time it responds. Whilst, in an ideal world, a higher count correlates to a better, more fine-tuned response, we must consider how many parameters are needed for each specific use case.

To make models cost-efficient, we cannot fall into the trap of treating size and capability as a linear progression. We must search for smarter, optimised architectures, not just apply brute force to each use case.

Developers should examine the task, or set of tasks, for which they’re creating an AI model. If it’s a language model, they should consider if it needs to be particularly good at a certain element of natural language processing, such as sentence boundary disambiguation, factual validation, or part-of-speech tagging. This information will reveal which areas of the architecture demand additional focus, and which areas can be simplified.

To make a viable asset, you need to be able to do fine-tuning, instruction set tuning, and prompt tuning relatively inexpensively. This creates a ‘temperate zone’ for the required number of parameters. Any lower than a couple of billion parameters and performance will suffer. Go far above that and the price point gets out of reach for smaller businesses. As with any resource-intensive project, balance is key.
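One way to see why parameter count drives the price point is to estimate the memory needed just to hold a model’s weights: the number of parameters multiplied by the bytes used to store each one. The sketch below does that arithmetic. The model sizes are illustrative, and the figures are a lower bound under a stated assumption: they cover the weights only, and ignore activations, caches and the extra optimiser state that training requires.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Illustrative model sizes, chosen for this example only.
for billions in (2, 7, 70):
    row = {p: weight_memory_gb(billions * 1e9, p) for p in BYTES_PER_PARAM}
    print(f"{billions}B parameters: {row}")
```

On this estimate, a 7-billion-parameter model stored at 16-bit precision needs about 14 GB for its weights, whereas a 70-billion-parameter model needs about 140 GB, which is more than a single typical accelerator holds.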

Make the data work for you, not against you

The idea of ‘data-centric AI’, championed by industry figureheads such as Andrew Ng, posits that we must focus on data quality over quantity. Now that our algorithms have progressed and we have access to incredible open-source LLMs on which to base our own models, it’s time to focus on how we engineer the data to build a cost-efficient model without sacrificing performance. Corporations like Microsoft are already working in this direction with Phi-1.

Focus on the collection of high-quality, curated datasets for fine-tuning, ensuring high accuracy and low chances of hallucination whilst cutting down on total training time. In the future, synthetic datasets may become a viable option, allowing us to source the required amount of data for even the most niche domains.
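As a small, hedged example of what curation can mean in practice, the sketch below applies two of the simplest possible filters to a fine-tuning set: it drops records that are too short to carry meaning and removes exact duplicates once case and spacing are normalised. The five-word minimum and the sample records are assumptions for illustration. Real curation would add domain-specific checks on top.

```python
def curate(records: list[str], min_words: int = 5) -> list[str]:
    """Drop near-empty records and exact duplicates, keeping first-seen order."""
    seen: set[str] = set()
    kept: list[str] = []
    for text in records:
        cleaned = " ".join(text.split())  # collapse runs of whitespace
        key = cleaned.lower()             # case-insensitive duplicate check
        if len(key.split()) < min_words or key in seen:
            continue
        seen.add(key)
        kept.append(cleaned)
    return kept


# Hypothetical records, invented for illustration only.
raw = [
    "A method for  coating steel with a zinc alloy.",
    "a method for coating steel with a zinc alloy.",
    "See figure 2.",
    "The catalyst improves yield at lower temperatures.",
]
print(curate(raw))
```

Here the second record is dropped as a duplicate of the first and the third as too short, leaving two of the four records.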

To make AI financially viable for smaller organisations, we must create smarter language models that only consume the necessary amount of power. Once we do that, we’re well on the way to democratising AI and making it accessible for everyone, no matter how specialised or complex their domain.