Oh My God, the Pile of AI Models Is Killing Us!
The number of AI models needed in practice can quickly explode and overwhelm the available IT resources. Is there a way out of this trap?
Building software systems that use AI-based analytics methods is no longer a problem. There are many ready-made building blocks. However, depending on the type of application, a single user may need many individual AI models. This happens, for example, if a user runs different projects and a dedicated AI model has to be trained and used for each one. This is not uncommon with project-specific classifiers. Over time, this results in many different models, and the user wants all of them to be available at the same time.
The software system may also be licensed to several customers in parallel, e.g. in the form of a SaaS solution. In this case, the number of AI models needed quickly multiplies to several hundred, if not a thousand.
Of course, analysis requests to the system should be answered in milliseconds. Unfortunately, machine learning and deep learning models require many resources and a non-negligible ramp-up phase before they are usable. Without further measures, a service would have to be started for every possible model and kept running continuously. That would be the only way to meet the demand for a response within a few milliseconds for every request.
From a business point of view, this is a no-go: the enormous consumption of resources results in extreme costs per transaction, and it is also an ecological disaster. Taken individually, each service or AI model is used only rarely. However, since they all have to be kept running all the time, the result is high energy consumption with little productive benefit. This is really not green IT.
… and the solution?
One way to avoid this trap is to quickly swap models in and out as they are needed. This sounds easier than it is. After an AI model is loaded, in many cases a ‘working internal structure’ is created during the ramp-up phase. This internal structure is normally not relocatable in working memory. This means that, without an extension to the system that loads and processes the models, swapping running AI models is almost impossible to build.
First, the model needs an internal structure that does not depend on a specific position in working memory. For example, it must not contain pointers to memory areas.
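To illustrate what a position-independent structure can look like, here is a minimal sketch. It assumes a tiny decision tree as the model; the layout and the names (`feature`, `threshold`, `left`, `right`, `value`, `predict`) are hypothetical and not taken from any particular AI system. Child nodes are referenced by row index rather than by memory address, so the arrays stay valid wherever they end up in memory.

```python
from array import array

# Hypothetical flat layout for a tiny decision tree: one entry per node.
# Children are referenced by node index (an offset), never by memory address.
# feature[i] == -1 marks a leaf; its prediction is stored in value[i].
feature = array("i", [0, -1, 1, -1, -1])
threshold = array("d", [0.5, 0.0, 2.0, 0.0, 0.0])
left = array("i", [1, -1, 3, -1, -1])
right = array("i", [2, -1, 4, -1, -1])
value = array("d", [0.0, 10.0, 0.0, 20.0, 30.0])


def predict(x):
    """Walk the tree from node 0 by following indices until a leaf is reached."""
    node = 0
    while feature[node] != -1:
        if x[feature[node]] <= threshold[node]:
            node = left[node]
        else:
            node = right[node]
    return value[node]
```

Because the five arrays contain only numbers and indices, they can be copied, written to disk, or mapped at a different address without any fix-up step.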
During the ramp-up phase, the trained model is loaded and converted into this new internal structure. This puts the AI system in a runnable state. Once this initialization step is complete, a dump of the internal structure can be generated and saved. This requires another extension to the system.
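The dump step could be sketched roughly as follows. This is only an illustration under assumptions: the "model" is reduced to a flat list of weights, and the file format (magic bytes `MDL0`, a count, raw float64 values) as well as the names `dump_model` and `LoadedModel` are invented for this example. The dump is mapped into memory with Python's standard `mmap` module and used directly, without a rebuild step.

```python
import mmap
import struct
from array import array

# Hypothetical dump format: 4-byte magic, uint32 count, then raw float64 values.
HEADER = struct.Struct("<4sI")
ITEM = struct.calcsize("d")


def dump_model(path, weights):
    """Write the ready-to-run structure as one contiguous block of bytes."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(b"MDL0", len(weights)))
        # Native byte order: the dump is meant for the same machine type.
        f.write(array("d", weights).tobytes())


class LoadedModel:
    """Maps a dump into memory and reads the weights in place."""

    def __init__(self, path):
        with open(path, "rb") as f:
            self._mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        magic, count = HEADER.unpack_from(self._mm, 0)
        if magic != b"MDL0":
            self._mm.close()
            raise ValueError("not a model dump")
        start = HEADER.size
        self._raw = memoryview(self._mm)
        # A typed view onto the mapped bytes; the values are not copied.
        self.weights = self._raw[start:start + ITEM * count].cast("d")

    def score(self, x):
        """Toy inference step: dot product of the weights with the input."""
        return sum(w * v for w, v in zip(self.weights, x))

    def close(self):
        # Views must be released before the mapping can be closed.
        self.weights.release()
        self._raw.release()
        self._mm.close()
```

A real system would dump a far richer structure, but the principle is the same: what is written to disk is already the runnable form.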
A further extension of the AI system then swaps the required models in and out depending on the workload. The runnable dumps no longer need a ramp-up phase, so they are immediately usable after the swap. Multiple caching layers can be used to speed up the swapping even further.
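One possible shape for the swapping logic is a small in-memory cache with least-recently-used eviction in front of the slower dump store. The sketch below is an assumption about how such a layer might look, not a description of a specific product; `ModelSwapper` and the `load_dump` callable are hypothetical names.

```python
from collections import OrderedDict


class ModelSwapper:
    """Keeps at most `capacity` runnable models in memory (LRU eviction)."""

    def __init__(self, capacity, load_dump):
        self.capacity = capacity
        # Hypothetical callable: model_id -> runnable model, e.g. read from a dump.
        self.load_dump = load_dump
        self.resident = OrderedDict()
        self.loads = 0  # counts how often the slower layer had to be used

    def get(self, model_id):
        if model_id in self.resident:
            # Cache hit: mark the model as most recently used.
            self.resident.move_to_end(model_id)
            return self.resident[model_id]
        # Cache miss: swap the model in from the slower layer.
        model = self.load_dump(model_id)
        self.loads += 1
        self.resident[model_id] = model
        if len(self.resident) > self.capacity:
            # Swap out the least recently used model.
            self.resident.popitem(last=False)
        return model
```

Further layers, such as a local disk cache in front of remote storage, could be stacked in the same way by passing another cache's `get` method as `load_dump`.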
With these changes to the AI system, swapping models within milliseconds becomes possible. A few instances of the AI system can serve many different models, which fits well into a microservices architecture. In addition, resources are saved and, of course, energy consumption is reduced: green IT.