Over 40% of marketing, sales and customer service organizations use generative AI, second only to IT and cybersecurity. Of all generative AI technologies, conversational AI is poised for rapid adoption in these sectors because of its ability to bridge the communication gap between businesses and their customers.
But many of the marketing business leaders I speak with are at a crossroads on how to start implementing the technology: They are unsure which of the available large language models (LLMs) to choose and whether to go open source or closed source, and they are worried about overspending on a new and unfamiliar technology.
Companies can buy off-the-shelf conversational AI tools, or if it’s core to their business, they can build their own in-house.
To help ease some of that anxiety for those who choose to build, I would like to share some of the internal research my team and I did to find the best LLM on which to build conversational AI. We spent time looking at different LLM providers and estimating how much each would cost given its inherent pricing and the expected usage from our target users.
We decided to compare GPT-4o (OpenAI) with Llama 3 (Meta), as these are two of the leading LLMs most companies will weigh against each other, and we consider them to be among the highest-quality models available. The comparison also pits a closed-source LLM (GPT) against an open-source one (Llama).
How do you calculate LLM costs in conversational AI?
The two main financial points to consider when choosing an LLM are the set-up costs and the final processing costs.
The setup costs include everything required to get the LLM up and running toward your end goal, including development and operational costs. The processing costs are the actual cost of each conversation once the tool is live.
When it comes to setup, the cost-to-value ratio depends on what you want your LLM to do and how often you will use it. If you need to deploy the product as quickly as possible, you might be willing to pay a premium for a model like GPT-4o that requires little to no setup. Setting up Llama 3 might take a few weeks, time during which you could already have been fine-tuning a GPT-based product for the market.
However, if you manage a large number of clients or want more control over your LLM, it may be worth incurring larger setup costs up front to reap greater benefits down the line.
When it comes to conversation processing costs, we look at token usage, as it is the most directly comparable metric. LLMs like GPT-4o and Llama 3 operate on a basic unit called a "token": a chunk of text these models can process as input and output. There is no universal standard for how a token is defined across different LLMs; some count tokens per word, per subword, per character, or other variations.
All these factors make an exact comparison of LLMs difficult, but we approximated one by simplifying the inherent costs of each model as much as possible.
While GPT-4o is cheaper in terms of initial outlay, we found that over time Llama 3 is substantially more cost-effective. Let's look at why, starting with setup considerations.
What is the basic cost for each LLM?
Before we dive into the cost per conversation for each LLM, we need to understand how much it will cost to get there.
GPT-4o is a closed-source model hosted by OpenAI, so setup is minimal – all you need to do is configure your tool to ping GPT’s infrastructure and data libraries through simple API calls.
Llama 3, on the other hand, is an open-source model and must be hosted on your own private server or cloud infrastructure provider. Companies can download the model components for free, but it’s up to you to find a host.
The thing to consider here is hosting costs: unless you buy your own servers (which is relatively unlikely to begin with), you’ll need to pay to use a cloud provider’s infrastructure, and each provider may have different ways of customizing their pricing structure.
Most hosting providers “rent” instances and charge for computing power by the hour or second. For example, AWS’ ml.g5.12xlarge instances charge per server hour. Other providers might bundle your usage into different packages and charge a flat annual or monthly fee based on various factors, such as your storage needs.
Amazon Bedrock, by contrast, calculates costs based on the number of tokens processed, which can make it a cost-effective solution even at low usage. Bedrock is a serverless platform managed by AWS that also simplifies LLM deployment by handling the underlying infrastructure.
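To make that trade-off concrete, here is a minimal sketch of the break-even point between renting a dedicated instance by the hour and per-token serverless billing. The hourly rate and per-conversation cost below are illustrative assumptions, not quoted prices:

```python
# Sketch: break-even between a dedicated hourly instance and per-token billing.
# Both prices are illustrative assumptions, not quoted provider rates.

HOURLY_INSTANCE_RATE = 7.09            # assumed $/hour for a dedicated GPU instance
SERVERLESS_COST_PER_CONVERSATION = 0.08  # assumed $/conversation on per-token billing

def monthly_instance_cost(hours: float = 730) -> float:
    """Cost of keeping a dedicated instance running for a full month."""
    return HOURLY_INSTANCE_RATE * hours

def breakeven_conversations() -> float:
    """Conversations per month above which the dedicated instance is cheaper."""
    return monthly_instance_cost() / SERVERLESS_COST_PER_CONVERSATION

print(f"Break-even: ~{breakeven_conversations():,.0f} conversations/month")
```

Below the break-even volume, per-token billing wins because you only pay for what you process; above it, the flat hourly instance amortizes better.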
Beyond the direct costs, making conversational AI work with Llama 3 means allocating more time and money to operations, including the initial selection and setup of a server or serverless option and ongoing maintenance. You will also need to spend more on development, such as building error-logging tools and system alerts for issues that may arise with the LLM server.
Key factors to consider when calculating a basic cost-to-value ratio include the time it will take to deploy, the level of usage the product will see (if you're processing millions of conversations per month, the per-conversation savings will quickly outweigh the setup costs), and the level of control you need over the product and your data (an open-source model is best here).
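As a rough sketch of that calculation, the payback period for self-hosting can be estimated from a one-time setup cost and the per-conversation saving. Both figures below are illustrative assumptions, not numbers from our study:

```python
# Sketch: months to recoup a one-time setup cost through per-conversation savings.
# The setup cost and saving below are illustrative assumptions.

SETUP_COST = 30_000              # assumed one-time engineering cost for self-hosting
SAVING_PER_CONVERSATION = 0.076  # assumed $ saved per conversation vs a closed model

def payback_months(conversations_per_month: int) -> float:
    """Months until accumulated savings cover the setup cost."""
    return SETUP_COST / (SAVING_PER_CONVERSATION * conversations_per_month)

print(f"{payback_months(1_000_000):.1f} months at 1M conversations/month")
print(f"{payback_months(100_000):.1f} months at 100K conversations/month")
```

The point is the shape of the curve: at high volume the setup cost becomes negligible within weeks, while at low volume it may never pay off.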
What is the cost per conversation for a major LLM?
Now we can work out the base cost per conversation for each model.
In our modeling, we used the following heuristic: 1,000 words = 7,515 characters = 1,870 tokens.
We assumed that the average consumer conversation would total 16 messages between the AI and the human, which equates to 29,920 input tokens and 470 output tokens, for a total of 30,390 tokens (the input count is much higher because prompt rules and logic are included with the messages).
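A minimal sketch of the heuristic above, using the stated ratio of 1,000 words to 1,870 tokens. The 16,000-word figure is our own back-calculation from the 29,920-token input, not a number from the study:

```python
# Sketch of the token heuristic: 1,000 words ≈ 7,515 characters ≈ 1,870 tokens.

def words_to_tokens(words: float) -> float:
    """Estimate token count from a word count using the article's heuristic."""
    return words * 1870 / 1000

# The 29,920 input tokens of the benchmark conversation correspond to roughly
# 16,000 words of accumulated input (prompt rules plus message history).
print(words_to_tokens(16_000))  # 29920.0
```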
GPT-4o is priced at $0.005 per 1,000 input tokens and $0.015 per 1,000 output tokens, which puts a "benchmark" conversation at about $0.16.
| GPT-4o | Number of tokens | Price per 1,000 tokens | Cost |
| --- | --- | --- | --- |
| Input tokens | 29,920 | $0.00500 | $0.14960 |
| Output tokens | 470 | $0.01500 | $0.00705 |
| Total cost per conversation | | | $0.15665 |
Llama 3-70B on AWS Bedrock is priced at $0.00265 per 1,000 input tokens and $0.00350 per 1,000 output tokens, which puts a "benchmark" conversation at approximately $0.08.
| Llama 3-70B | Number of tokens | Price per 1,000 tokens | Cost |
| --- | --- | --- | --- |
| Input tokens | 29,920 | $0.00265 | $0.07929 |
| Output tokens | 470 | $0.00350 | $0.00165 |
| Total cost per conversation | | | $0.08093 |
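The two tables above reduce to one small cost function, shown here as a sketch using the token counts and per-1,000-token prices already given:

```python
# Per-conversation cost from token counts and per-1,000-token prices,
# using the benchmark conversation figures quoted above.

def conversation_cost(in_tokens: int, out_tokens: int,
                      in_price_per_1k: float, out_price_per_1k: float) -> float:
    """Total cost of one conversation at the given per-1,000-token rates."""
    return in_tokens / 1000 * in_price_per_1k + out_tokens / 1000 * out_price_per_1k

gpt4o = conversation_cost(29_920, 470, 0.005, 0.015)
llama3 = conversation_cost(29_920, 470, 0.00265, 0.0035)

print(f"GPT-4o:  ${gpt4o:.5f}")                 # $0.15665
print(f"Llama 3: ${llama3:.5f}")                # $0.08093
print(f"Savings: {1 - llama3 / gpt4o:.0%}")     # ~48%
```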
In summary, once the two models are fully set up, a conversation run on Llama 3 costs almost 50% less than the equivalent conversation run on GPT-4o, though server costs must be added to the Llama 3 total.
Please note that this is only a snapshot of the full cost of each LLM; many other variables will come into play when building a product for your unique needs, such as whether you use a multi-prompt or single-prompt approach.
For companies that plan to leverage conversational AI as a core service but not as a fundamental element of their brand, the investment in building AI in-house may not be worth the time and effort when compared to the quality they can get from an off-the-shelf product.
Whatever path you choose, integrating conversational AI can be extremely beneficial, but always make sure it fits your situation and your customers’ needs.
Sam Oliver is a Scottish technology entrepreneur and serial startup founder.