Artificial intelligence (AI) and machine learning (ML) can be invaluable tools to spur innovation, but they have different management requirements than typical enterprise IT applications that run at moderate CPU and memory utilisation rates. Because AI and ML tend to run intense calculations at very high utilisation rates, power and cooling costs can consume a higher proportion of the budget than an IT group might expect.
It's not a new problem, but the impact is intensifying.
As more CPU-heavy applications such as data warehousing and business intelligence became prevalent, IT was often oblivious to the electric bill it was racking up – particularly since the bill usually went to the ops department, not IT.
"The data-science team leaders often have carte blanche to just process anything, anytime," says Mark Swartz, CEO and founder of AI developer Neural. "The days of these luxurious approaches to solving heavy compute requirements will start to trend down in the next five years."
One reason for greater scrutiny of power and cooling costs is that AI often relies on high-performance computing (HPC), while data warehousing and business intelligence applications can be run on standard systems. HPC and AI run much hotter, and no one should be blindsided by the increased bill, says Addison Snell, CEO of Intersect360, a research firm specialising in HPC issues.
"There are costs associated with any type of IT effort that can run hot. If you are not prepared for AI, you might be surprised at the cost of power and cooling if you thought it would be the same as [regular] enterprise IT servers," Snell says.
So what can be done to avoid sticker shock? Here are six steps to take.
1) Shop around for less expensive power options
If you have the option of placing your data center outside of the corporate office, look for good sources of renewable energy, starting with hydroelectric. Hydroelectric power is one of the least expensive sources of electrical power. "There is a reason Microsoft and Google have their data centers located near large sources of water," says Steve Conway, senior advisor for HPC market dynamics at Hyperion Research.
Wind power is also less expensive than fossil fuels, which is why many data centers are located in the Midwest. And electricity is cheaper in rural areas than in large cities. The majority of data centers are in major cities for reasons of necessity – northern Virginia is the largest data center market due to its proximity to the federal government – but it's not unheard of to place data centers in Iowa (Microsoft, Google, Facebook), Oklahoma (Google), and New Mexico (Facebook).
In addition, try to run compute-intensive applications at night, when the power rates tend to drop during off-peak hours, Conway says.
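As a rough illustration of the off-peak idea, the following sketch picks the cheapest contiguous window in which to run a job. The hourly prices, the 8-hour job length, and the 10 kW load are made-up assumptions for illustration only, not real tariff data, and the function names are hypothetical.

```python
# Hypothetical hourly electricity prices in $/kWh, index = hour of day (0-23).
# Illustrative assumption: cheap overnight, dearer by day, peak in the evening.
HOURLY_RATE = [0.08] * 6 + [0.14] * 12 + [0.18] * 4 + [0.08] * 2


def cheapest_start_hour(rates, job_hours):
    """Return the start hour of the cheapest contiguous window.

    The window may wrap past midnight, so hours are taken modulo len(rates).
    """
    n = len(rates)
    best_start, best_total = 0, float("inf")
    for start in range(n):
        total = sum(rates[(start + h) % n] for h in range(job_hours))
        if total < best_total:
            best_start, best_total = start, total
    return best_start


def job_cost(rates, start, job_hours, load_kw):
    """Cost in dollars of running a constant load_kw for job_hours from start."""
    n = len(rates)
    return sum(rates[(start + h) % n] * load_kw for h in range(job_hours))


if __name__ == "__main__":
    start = cheapest_start_hour(HOURLY_RATE, job_hours=8)
    night = job_cost(HOURLY_RATE, start, job_hours=8, load_kw=10)
    day = job_cost(HOURLY_RATE, 9, job_hours=8, load_kw=10)
    print(f"start at {start}:00, night ${night:.2f} vs daytime ${day:.2f}")
```

With these assumed prices the same job costs a little over half as much overnight as it does during business hours; real savings depend entirely on the actual tariff.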
2) Use AI to optimise power use
It may seem counterintuitive, but one of the best ways to manage your data center computers is AI itself. It can optimise power and cooling, improve workload distribution, and perform predictive maintenance to warn of impending hardware failure. This is a different type of AI, one of monitoring rather than machine learning, and it isn't as taxing on the system. The servers could also use sensors to watch for peaks in power supply units and CPUs and inform clients if systems are running higher than the norm, says Swartz.
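The sensor-based monitoring described above can be approximated, in its simplest form, by comparing each power reading against a recent baseline. The sketch below is a plain statistical threshold, not a trained model and not any vendor's product; the function name, the window size, and the sample readings are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def flag_power_spikes(readings_watts, window=5, k=3.0):
    """Return indices of readings that sit well above the trailing baseline.

    A reading is flagged when it exceeds the mean of the previous `window`
    readings by more than k standard deviations.
    """
    flagged = []
    for i in range(window, len(readings_watts)):
        history = readings_watts[i - window:i]
        baseline = mean(history)
        # Use at least 1 W of spread so a perfectly flat history
        # does not flag every tiny fluctuation.
        spread = max(stdev(history), 1.0)
        if readings_watts[i] > baseline + k * spread:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical power-supply readings in watts, one per polling interval.
    readings = [200, 202, 198, 201, 199, 200, 340, 201]
    print(flag_power_spikes(readings))
```

A production system would likely use longer histories and per-server baselines; the point here is only that the monitoring side is far cheaper to run than model training.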
"Just by using AI properly, it can help to reduce energy. There are many, many applications out there which can run more efficiently if people start applying AI," says Jo De Boeck, CSO at imec, a research and development facility focused on digital technologies.
3) Use lower power chips where you can
Machine learning is a two-step process: training and inference. The training portion involves teaching the system to recognise something, such as images or usage patterns. That's the most processor-intensive part. Inference is a simple yes/no question: Is this a match to the model? Significantly less processing power is needed to find a match than to train the system to recognise one.
A GPU is the best option for training, but a GPU consumes up to 300 watts of power. You can use a GPU for inference, but why do that when a much lower-power part will do the trick? Intel had a special inference chip, the Nervana, which it has since discontinued in favor of the Habana chip. Nervana in early tests used between 10 and 50 watts of power to perform inference.
The solution is to develop more application-specific hardware, De Boeck says. "So instead of using just CPUs, or GPUs, which are still general purpose, you see more and more specialisation coming in hardware. Special functional unit building blocks are added to the hardware to make the machine-learning algorithms learn more efficiently."
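To put the wattage figures in this section in perspective, here is a back-of-the-envelope calculation of annual energy for a part that runs around the clock. The 300 W and 10 to 50 W figures come from the text above; the $0.10/kWh price is an assumed placeholder, and cooling overhead is deliberately left out.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_kwh(watts, utilisation=1.0):
    """Energy in kWh for a device drawing `watts` for a fraction of the year."""
    return watts * HOURS_PER_YEAR * utilisation / 1000


def annual_cost(watts, price_per_kwh, utilisation=1.0):
    """Electricity cost for one year at a flat price per kWh."""
    return annual_energy_kwh(watts, utilisation) * price_per_kwh


if __name__ == "__main__":
    ASSUMED_PRICE = 0.10  # $/kWh, illustrative assumption only
    for label, watts in [("GPU", 300), ("inference chip, high", 50),
                         ("inference chip, low", 10)]:
        kwh = annual_energy_kwh(watts)
        cost = annual_cost(watts, ASSUMED_PRICE)
        print(f"{label}: {kwh:.1f} kWh/year, ${cost:.2f}/year")
```

Per device the difference looks small, but it scales linearly with the number of parts deployed, and cooling adds to it.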
4) Reduce training time
Another way to skirt the power-sucking effects of training is to do less of it. As you gain experience with training, revisit your training algorithms and see what you can shave off without losing accuracy.
"State-of-the-art inferencing requires lots of training to do simple tasks. People are working on improving inferencing, so as the machine gets smarter, less training is needed to carry it out. Adding more intelligence to inferencing means less training," Conway says.
Training is usually done with single- (32-bit) or double-precision (64-bit) math. The higher the precision, the slower the processing, but the power consumption is unchanged, so a higher-precision run uses more energy to finish the same job. What many AI developers, including Nvidia and Google, have been saying for some time now is you don't need such precision in most instances, except perhaps image and video processing, where fine graphics accuracy is important.
"There is still a lot of work happening to try to reduce, for example, the number of operations that are required, trying to make these networks as compact as possible, or exploiting specific properties of the algorithms. Companies are trying to exploit the specific features of neural networks by reducing or figuring out that many of the parameters are actually zero and then not executing the computation. So that's a process called pruning," De Boeck says.
Reduced-precision computation has slowly garnered interest over the past few years. The bfloat16 format is a 16-bit floating point format developed by Google and used in Intel's AI processors, Xeon processors and FPGAs, and Google's TPUs and TensorFlow framework. It has become popular because in most cases it is accurate enough.
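The two ideas in this section, pruning and reduced precision, can be illustrated with a small standard-library sketch. It is a toy under stated assumptions, not how any framework implements either technique: the function names and the pruning threshold are hypothetical, and the bfloat16 conversion simply truncates a 32-bit float to its top 16 bits, whereas hardware typically rounds.

```python
import struct


def to_bfloat16(x):
    """Round-trip x through bfloat16 by keeping the top 16 bits of its float32 form.

    bfloat16 keeps float32's sign bit and 8 exponent bits but only 7 mantissa
    bits, so the range is preserved while precision drops.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits &= 0xFFFF0000  # drop the low 16 mantissa bits (truncation, not rounding)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def prune(weights, threshold):
    """Zero out weights whose magnitude is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def sparse_dot(weights, inputs):
    """Dot product that skips the multiplication for zeroed weights."""
    return sum(w * x for w, x in zip(weights, inputs) if w != 0.0)


if __name__ == "__main__":
    print(to_bfloat16(3.14159265))  # close to pi, with fewer significant digits

    weights = [0.5, -0.01, 0.0, 0.25, 0.003]  # hypothetical layer weights
    pruned = prune(weights, threshold=0.05)
    print(pruned, sparse_dot(pruned, [1, 2, 3, 4, 5]))
```

In this toy case three of the five multiplications are skipped after pruning, at the price of a small change in the result; whether that trade is acceptable has to be checked against model accuracy.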
5) Always be optimising your training
It's also important to redo the training regularly to improve and optimise the algorithms, De Boeck says. "In theory, you can run training only a few times in practice, but you cannot say 'I think it's done permanently,'" he says. "These companies are constantly trying to improve the performance of these AI algorithms, so they continuously train or retrain them as well."
Swartz says that in his ML/AI experience, his teams follow a process in which they all agree on thresholds in training sets and the "bake time" for building or rebuilding models. Because only the new training information is added, less time is spent retraining the models.
"All models must incorporate transfer learning, which is a form of locating the delta between two models and only adding the 'new' data into the next training set to be processed. This was manually done by our teams for years, while now we have algorithms that can locate this itself," Swartz says.
6) Look to the cloud
All of the major cloud providers have an AI offering, with Google at the forefront with its Tensor Processing Unit (TPU) AI processor. That may prove more economical, Snell says, especially if you have to start from scratch.
"People often look to cloud to offset on-prem costs. Whether that is profitable depends on utilization and the provider. Power is consumed somewhere. You pay a cloud provider's power bill as part of the cost. It's not automatically cheaper. And you might want to outsource if you are lacking in skillsets, like data science," he says.