In a recent interview for Apiumhub, Ben Evans, renowned software architect, expert in JVM technologies, and principal engineer at Red Hat, shared his views on the current challenges and potential problems that artificial intelligence (AI) could face in the near future.
As someone deeply involved in the world of software architecture, Ben offers opinions that carry real weight, and his analysis sheds light on some critical issues that AI is facing or is about to face.
In this article, we delve into Ben Evans' interview, in which he discusses several challenges that, in his view, large language models (LLMs) such as ChatGPT will face around costs and sustainability, accuracy, downtime, data ownership, narrowly focused models, and feedback loops.
The challenges for AI according to Ben Evans
Costs and sustainability
“ChatGPT alone could cost about $700,000 per day to operate.”
According to Ben, the substantial costs associated with operating large language models (LLMs) like GPT-3 can stretch a company’s budget to the limit, which could make them financially unsustainable in the long run.
Take OpenAI, the company that has revolutionized the technology world with its well-known image and text generation AIs, DALL-E and ChatGPT. According to the American outlet The Information, before the launch of ChatGPT's paid tier, the company's revenue reached $28 million, while its losses amounted to $540 million.
Although, for a company as well known as this one, the launch of the paid version of ChatGPT may improve those numbers, it is foreseeable that costs will keep climbing as the platform's infrastructure has to support ever more users. According to an estimate by SemiAnalysis, keeping the free version of ChatGPT running costs around $700,000 per day.
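To get a feel for how such estimates scale, here is a purely illustrative back-of-envelope calculation. The query volume, token count, and per-token price are hypothetical placeholders chosen only to land near the quoted figure; they are not SemiAnalysis's actual inputs.

```python
# Illustrative back-of-envelope estimate of daily LLM serving cost.
# All figures are hypothetical placeholders, not sourced numbers.

def daily_inference_cost(queries_per_day: int,
                         tokens_per_query: int,
                         cost_per_1k_tokens: float) -> float:
    """Rough daily serving cost: traffic volume times price per token."""
    return queries_per_day * tokens_per_query / 1_000 * cost_per_1k_tokens

# Hypothetical inputs: 10M queries/day, ~1,000 tokens each, $0.07 per 1k tokens.
cost = daily_inference_cost(10_000_000, 1_000, 0.07)
print(f"Estimated daily cost: ${cost:,.0f}")  # ~ $700,000
```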
In addition to the economic cost, there are concerns about environmental sustainability. Energy is needed not only to power the servers: the data sets used to train the models are often extremely large and take a great deal of energy to process and store. Continuing with the example above, OpenAI alone is estimated to draw around 2.7 megawatts of electricity, equivalent to the power consumption of some 11,000 homes.
Accuracy
Although they have surprised everyone, LLMs still make accuracy errors that limit their use in commercial settings.
Ben also discusses his concerns about the accuracy of LLMs. Although the published models have impressed everyone, there is still room for improvement in generating accurate and reliable results.
These models process vast amounts of text and learn language patterns, which means their answers are based on probability rather than on the veracity of the information, as in these recent examples from ChatGPT.
*If you overtake the eighth rider, you become the eighth, taking his place.
*Although the request has its complexity, there are many examples of monovocalic texts.
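Mistakes like these stem from the way the models generate text. Below is a minimal, purely illustrative sketch of probability-based next-token generation; the tiny vocabulary and the probabilities are invented for the example and do not come from any real model.

```python
import random

def sample_next_token(probabilities: dict[str, float]) -> str:
    """Pick the next token by sampling from the model's probability distribution."""
    tokens, weights = zip(*probabilities.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# Hypothetical distribution for the prompt:
# "If you overtake the rider in eighth place, you are now in ... place"
next_token_probs = {"eighth": 0.55, "seventh": 0.35, "first": 0.10}

print(sample_next_token(next_token_probs))
# The model picks whatever is most probable under its training data,
# not what is verified to be true, so a plausible-but-wrong answer can win.
```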
Other platforms built on these models have at times produced controversial answers, as in this screenshot, which became quite well known at the time, in which Bing's AI suggests that a user say "Heil Hitler!".
*A screenshot of Bing showing anti-Semitic recommendations, with the problem reported by the user.
Other inaccurate responses from the same tool have also been reported, such as the one in the following screenshot.
*Bing Chatbot goes into a loop when offering its response.
This type of problem led Microsoft to cap Bing's chatbot at five questions per session, a limit that has been progressively raised as issues with the platform have been resolved.
These issues can make for amusing anecdotes, but they can also be genuinely problematic depending on the context. Think, for example, of the customer service uses that LLMs are being put to. A model that cannot generate correct information, or that offers biased information, could damage the company's reputation, frustrating customers and pushing them toward competitors. U.S. companies alone lose $136.8 billion a year to avoidable customer churn, according to CallMiner.
Narrowly Focused Models
“Narrowly Focused Models” refers to artificial intelligence or machine learning models that are designed to perform specific or limited tasks rather than a variety of general tasks.
Ben Evans suggests that an alternative to many of the accuracy failures of LLMs could be a shift to specialized language models, and that these models could deliver highly accurate results within specific domains such as programming or machine translation.
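As a rough sketch of what that shift could look like in practice, the snippet below routes each request to a domain-specific model instead of one general-purpose LLM. The model names and the keyword-based classifier are hypothetical placeholders, not something Ben describes in the interview.

```python
# Hypothetical router: send each prompt to a model specialised for its domain.
SPECIALISED_MODELS = {
    "code": "hypothetical-code-model",
    "translation": "hypothetical-translation-model",
    "general": "hypothetical-general-model",  # fallback
}

def classify_domain(prompt: str) -> str:
    """Naive keyword heuristic; a real router could itself be a small model."""
    lowered = prompt.lower()
    if any(kw in lowered for kw in ("def ", "class ", "compile", "stack trace")):
        return "code"
    if "translate" in lowered:
        return "translation"
    return "general"

def route(prompt: str) -> str:
    return SPECIALISED_MODELS[classify_domain(prompt)]

print(route("Translate 'software architecture' into Spanish"))
# -> hypothetical-translation-model
```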
Downtime
The downtime experienced by some large language models is another issue Ben points out. Downtime directly affects the usability of the model, which in turn hits user productivity and company profits. These costs are difficult to calculate because they are also context-dependent: an outage during a period of peak usage can mean far greater losses than one at a quieter time.
According to a study by the Ponemon Institute, the average cost of these downtime incidents has kept rising since the studies began more than a decade ago: the average cost of a data center outage grew from $690,204 in 2013 to $740,357 in 2016 (the date of the most recent study).
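As a simple illustration of why these figures vary so much with context, the sketch below prices the same one-hour outage at peak and off-peak traffic; the revenue figures are invented for the example.

```python
# Illustrative only: the same outage costs more at peak traffic than off-peak.
def outage_cost(minutes: float, revenue_per_minute: float) -> float:
    return minutes * revenue_per_minute

peak_cost = outage_cost(minutes=60, revenue_per_minute=9_000)      # busy hour
off_peak_cost = outage_cost(minutes=60, revenue_per_minute=1_500)  # quiet hour

print(f"Peak-hour outage:  ${peak_cost:,.0f}")
print(f"Off-peak outage:   ${off_peak_cost:,.0f}")
```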
Data Ownership
Who owns the data used for internet-based LLM training?
Data protection is another major concern about these models. Determining who owns the data used for training can become a legal and ethical challenge.
When an LLM generates content based on a wide range of internet sources, it is difficult to identify and attribute those sources, which can lead to copyright and privacy conflicts. Once the machine has aggregated and combined the data, it can be extremely hard to determine which pieces come from which sources, and therefore who owns the rights to them and whether they were used with the original owner's permission. As a result, there is a risk that data is used without its owner's consent, opening up legal questions that have yet to be resolved.
Without going any further, a group of artists has filed a class action lawsuit against Stability AI (maker of Stable Diffusion), Midjourney, and the art platform DeviantArt, arguing that these image-generating AIs are built on the prior work of thousands of artists, much of it hosted on platforms like DeviantArt. The plaintiffs contend that, although the AI-generated images are new, the platforms that generate them were trained on (and therefore profited from) copyrighted images without permission, compensation, or credit.
And that is not all: GitHub Copilot is also facing a class action lawsuit for similar reasons. The Codex model on which the platform is based was trained on code from "millions of public repositories", which the plaintiffs argue violates the open-source licenses that code was released under.
The plaintiffs also argue that this self-interested use by private companies threatens the very existence of open-source communities, which now see their openly available code being used to build fully commercial software, at odds with the philosophy of those communities.
The judicial resolution of this conflict remains to be seen. Ethical considerations are also open to discussion.
Artificial Intelligence Feedback Loops
Finally, Ben Evans discusses the problem of managing feedback loops between the training data and the text generated by LLMs. These loops can result in undesirable behavior and degrade the model over time.
Until now, LLMs have been trained almost entirely on human-generated text, but since their release for general public use, more and more LLM-generated text has been published on the web, increasing the risk that the models end up being trained, in turn, on text they themselves produced. That scenario would perpetuate biases in the generated content, creating a feedback loop of misinformation that degrades the quality of the model's responses.
The LLM cannot tell whether the content it produces is truthful, so faced with this problem it would simply keep creating "junk" content that, for the moment, it is not trained to recognize.
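As a toy illustration of that dynamic, the simulation below tracks the accuracy of successive model "generations", each retrained on a mix of human text and the previous generation's own output. The error rates and mixing ratio are invented purely to show the shape of the decay, not measured values.

```python
# Toy feedback-loop simulation: each generation is retrained on a blend of
# human-written text and the previous model's output. Because the model's
# own errors go unrecognised, effective accuracy drifts downwards.

def next_generation_accuracy(accuracy: float, self_trained_fraction: float) -> float:
    human_part = (1 - self_trained_fraction) * 0.98       # human data stays ~98% reliable
    model_part = self_trained_fraction * accuracy * 0.95   # model output loses a bit each round
    return human_part + model_part

accuracy = 0.95
for generation in range(1, 6):
    accuracy = next_generation_accuracy(accuracy, self_trained_fraction=0.6)
    print(f"Generation {generation}: effective accuracy {accuracy:.2%}")
```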
Ben’s considerations shed light on the multifaceted challenges facing AI, emphasizing the need for constant technical improvement as well as encouraging ethical considerations. While AI presents tremendous potential, it also requires meticulous scrutiny and responsible development to overcome these obstacles.
Ben Evans discusses these AI challenges as part of a broader interview. You can watch the full interview here.
Ben will also be a featured speaker at the Global Software Architecture Summit (GSAS 2023), a 3-day event that brings together software architects and enthusiasts in the industry to learn and connect. You can purchase your ticket here.