Authors:
(1) Geraldo Xexéo, Programa de Engenharia de Sistemas e Computação – COPPE, Universidade Federal do Rio de Janeiro, Brasil;
(2) Filipe Braida, Departamento de Ciência da Computação, Universidade Federal Rural do Rio de Janeiro;
(3) Marcus Parreiras, Programa de Engenharia de Sistemas e Computação – COPPE, Universidade Federal do Rio de Janeiro, Brasil and Coordenadoria de Engenharia de Produção - COENP, CEFET/RJ, Unidade Nova Iguaçu;
(4) Paulo Xavier, Programa de Engenharia de Sistemas e Computação – COPPE, Universidade Federal do Rio de Janeiro, Brasil.
Table of Links
2.2 An anecdotal model from industry
A Model for Commercial Operations Based on a Single Transaction
Modelling of a Binary Classification Problem
Abstract
Selecting language models in business contexts requires a careful analysis of the final financial benefits of the investment. However, academic and industry analyses of LLMs focus almost exclusively on performance. This work introduces a framework to evaluate LLMs that focuses on the earnings and return on investment (RoI) aspects that should be taken into account in business decision making. We use a decision-theoretic approach to compare the financial impact of different LLMs, considering variables such as the cost per token, the probability of success in the specific task, and the gains and losses associated with LLM use. The study reveals how the superior accuracy of more expensive models can, under certain conditions, justify a greater investment through larger earnings, but not necessarily through a larger RoI. This article provides a framework for companies looking to optimize their technology choices, ensuring that investment in cutting-edge technology aligns with strategic financial objectives. In addition, we discuss how changes in operational variables influence the economics of using LLMs, offering practical insights for enterprise settings. We find that the models are most sensitive to the predicted gain and loss and to the probabilities of success and failure.
1 Introduction
This article proposes different approaches to large language model (LLM) evaluation by analyzing the financial impact that the adoption of these technologies can have on a business operation. Although the vast majority of work discusses only the performance of LLMs on a set of tasks [Chang et al., 2023], here we assume that selecting an LLM for specific tasks within a business context must go beyond performance assessment, also looking at operational aspects and taking into account the expected earnings and return on investment (RoI).
Our motivation is that selecting an appropriate LLM has become a strategic decision for companies looking to improve and optimize operations and maximize RoI [Gupta, 2024]. With the continuous evolution of artificial intelligence technologies, LLMs offer innovative solutions for a variety of business applications, from virtual assistants to complex data analysis systems. However, choosing between the available models can be challenging, given their varying costs and their impact on the performance of business tasks. Furthermore, the state of the art keeps evolving across different performance and cost levels, so the choice of LLM in use at any given moment must be questioned regularly, which requires constant evaluation of new models [Shekhar et al., 2024].
The paper presents different models to analyze the adoption of different LLMs based on their impact on earnings and return on investment. Each model assumes a business-operations scenario that includes a business task based on the execution of an LLM task. For example, LLMs can be used in a recommender system associated with an online sale [Zhao et al., 2024], where the system proposes a product to be acquired by the client. The business task is to actually sell the recommended product, while the LLM task is to find a correct recommendation. Usually, both tasks can be trained, fine-tuned, or tested using an existing data set. However, actual business results can only be measured while in operation.
Moreover, in each business scenario some information is easy to retrieve, such as the cost per token, while other information is available only as estimates or averages, or is difficult to find. Different scenarios can also have different results. Each scenario can lead to the creation of a new model to explain the expected RoI, together with a sensitivity analysis that evaluates how changes in input parameters affect the RoI.
In this paper, we model scenarios in which the gains and benefits are the direct result of one business operation. This is, for example, the case for a recommender system acting during an online sale, which could bring short-term gains that are very quantifiable. In other scenarios, listed as future work, an operation can have only an indirect impact. This would be the case for the detection of fake news in a social network that could enhance the perception of trust in its users, resulting in long-term benefits.
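As a rough illustration of the single-operation scenario described above, the following Python sketch computes expected earnings and RoI for one LLM-backed operation. It is not the formulation developed in the later sections: the formulas (expected earnings as probability-weighted gain minus probability-weighted loss minus token cost, and RoI as earnings divided by cost), the `Scenario` class, its field names, and all numeric values are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical inputs for one LLM-backed business operation."""
    p_success: float       # probability that the business task succeeds
    gain: float            # amount earned when the operation succeeds
    loss: float            # amount lost when it fails (positive number)
    tokens: int            # tokens consumed per operation
    cost_per_token: float  # price paid per token


def operation_cost(s: Scenario) -> float:
    """Token cost of running the LLM once (assumed to be the only cost)."""
    return s.tokens * s.cost_per_token


def expected_earnings(s: Scenario) -> float:
    """Assumed form: expected gain minus expected loss minus token cost."""
    return s.p_success * s.gain - (1.0 - s.p_success) * s.loss - operation_cost(s)


def expected_roi(s: Scenario) -> float:
    """Assumed form: expected earnings divided by the token cost."""
    return expected_earnings(s) / operation_cost(s)


# Made-up numbers: a cheaper, less accurate model and a pricier, more accurate one.
cheap = Scenario(p_success=0.5, gain=10.0, loss=2.0, tokens=1000, cost_per_token=0.001)
premium = Scenario(p_success=0.75, gain=10.0, loss=2.0, tokens=1000, cost_per_token=0.002)

for name, s in (("cheap", cheap), ("premium", premium)):
    print(f"{name}: earnings={expected_earnings(s):.2f}, RoI={expected_roi(s):.2f}")
```

With these invented values, the more expensive model yields higher expected earnings per operation but a lower RoI, which is the kind of trade-off the abstract refers to. Changing one field at a time and recomputing gives a crude view of how sensitive each measure is to that input.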
The following section makes some considerations and assumptions for the models. The third section shows the model for the scenario based on the RoI of a single operation that can succeed or not. The fourth section shows the model for the scenario based on a binary classification task. The fifth section describes the model to be used when it is not possible to work with a single operation but actually with the general result of using an LLM in the business. The sixth section discusses some next steps, while the last section shows the conclusions.
This paper is