Integrating LLMs into existing web applications is becoming the norm, and more and more AI-native companies are emerging. These companies build autonomous agents that put the LLM at the center and give it tools that let it perform actions on different systems.
In this post I will present a new project called Offload, which lets you move all of that processing to your users' devices, increasing their data privacy and reducing your inference costs.
The 2 problems
There are two big concerns when integrating AI into an application: cost and user data privacy.
1. Cost. The typical way to connect to an LLM is through a third-party API such as OpenAI, Anthropic, or one of the many alternatives on the market. These APIs are very practical: with just an HTTP request you can easily integrate an LLM into your application (see the sketch after this list). However, they are expensive at scale. Providers are putting big efforts into reducing the cost, but if you make many API calls per user per day, the bill becomes huge.
2. User data privacy. Using third-party APIs for inference is not the best option if you work with sensitive user data. These APIs may use the data you send to continue training the model, which can expose your confidential data. The data could also become visible at some point once it reaches the third-party provider (for example, in a logging system). This is not just a problem for companies, but also for consumers who may not want to send their data to those API providers.
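As a reference point, this is roughly what the "typical" integration looks like: a single HTTP request to a hosted inference API. The sketch below assumes an OpenAI-style chat completions endpoint; the model name and prompt are placeholders.

```typescript
// Minimal sketch of the typical integration: one HTTP request to a
// third-party inference API (OpenAI-style endpoint used as an example).
// Every call like this is billed per token and sends user data off-device.
async function summarize(text: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: `Summarize this:\n${text}` }],
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content;
}
```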
Addressing them
Offload addresses both problems at once. The application "invokes" the LLM through an SDK that, behind the scenes, runs the model directly on each user's device instead of calling a third-party API. This cuts the inference bill, because you no longer pay for API usage, and it keeps the user's data on their own device, since it never needs to be sent to any API.
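To make that concrete, here is a rough sketch of what an on-device call could look like from the application's point of view. The `offload` package name, the `runInference` function, and its options are hypothetical illustrations for this post, not the actual Offload API.

```typescript
// Hypothetical sketch only: the "offload" package name, runInference(),
// and its options are illustrative, not the real Offload SDK API.
import { runInference } from "offload";

async function summarizeLocally(text: string): Promise<string> {
  // The SDK would select a model that fits the user's device and run it
  // locally, so the text never leaves the device and no API usage is billed.
  const result = await runInference({
    prompt: `Summarize this:\n${text}`,
    maxTokens: 256,
  });
  return result.text;
}
```

The application code stays a simple async call; the difference is that inference happens on the user's hardware rather than on a remote server.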
If this interests you and you want to stay in the loop, check out the Offload website here