IBM speeds deep learning by using multiple servers

IBM's Distributed Deep Learning spreads model training across any number of hardware nodes—as long as they’re IBM nodes

For everyone frustrated by how long it takes to train deep learning models, IBM has some good news: It has unveiled a way to automatically split deep-learning training jobs across multiple physical servers -- not just individual GPUs, but whole systems with their own separate sets of GPUs.

Now the bad news: It's available only in IBM's PowerAI 4.0 software package, which runs exclusively on IBM's own OpenPower hardware systems.

Distributed Deep Learning (DDL) doesn't require developers to learn an entirely new deep learning framework.

It repackages several common machine learning frameworks: TensorFlow, Torch, Caffe, Chainer, and Theano. Deep learning projects that use those frameworks can then run in parallel across multiple hardware nodes.
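To make concrete what "running in parallel across nodes" means here, the sketch below shows the generic synchronous data-parallel pattern that systems like DDL automate: each worker computes gradients on its own slice of the batch, the gradients are averaged, and every replica applies the same update. It is a single-process NumPy illustration of the idea, not IBM's code.

```python
# Conceptual sketch (plain NumPy, one process) of synchronous data-parallel
# training: each "node" computes gradients on its shard of the batch, the
# gradients are averaged, and every replica takes the same step. Not IBM code.
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(3)                        # shared model weights (replicated on every node)
X = rng.normal(size=(64, 3))           # one global batch
y = X @ np.array([1.0, -2.0, 0.5])     # synthetic regression targets

n_nodes, lr = 4, 0.1
shards_X = np.array_split(X, n_nodes)
shards_y = np.array_split(y, n_nodes)

# Each node computes the gradient of mean squared error on its own shard.
local_grads = [
    2 * Xs.T @ (Xs @ w - ys) / len(ys)
    for Xs, ys in zip(shards_X, shards_y)
]

# "All-reduce": average the gradients so every replica updates identically.
global_grad = np.mean(local_grads, axis=0)
w -= lr * global_grad
print("weights after one synchronized step:", w)
```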

IBM claims the speed-up gained by scaling across nodes is nearly linear. One benchmark, training the ResNet-101 model on the ImageNet-22K data set, needed 16 days to complete on a single IBM S822LC server. Spread across 64 such systems, the same job finished in seven hours, or 58 times faster.
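As a rough sanity check on those numbers (using the rounded figures above, so the result differs slightly from IBM's 58x):

```python
# Rough check of the scaling claim using the rounded figures in the article.
single_server_hours = 16 * 24   # 16 days on one S822LC
cluster_hours = 7               # same job on 64 servers
speedup = single_server_hours / cluster_hours
efficiency = speedup / 64       # fraction of ideal linear scaling
print(f"speed-up ~{speedup:.0f}x, parallel efficiency ~{efficiency:.0%}")
# prints: speed-up ~55x, parallel efficiency ~86%
```

The small gap between roughly 55x and IBM's 58x figure presumably comes down to rounding of the single-server run time; either way, the result is in line with the "nearly linear" characterization.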

IBM offers two ways to use DDL. One, you can shell out the cash for the servers it's designed for, which sport two Nvidia Tesla P100 units each, at about $50,000 a head.

Two, you can run the PowerAI software in a cloud instance provided by IBM partner Nimbix, for around $0.43 an hour.

One thing you can't do, though, is run PowerAI on commodity Intel x86 systems. IBM has no plans to offer PowerAI on that platform, citing tight integration between PowerAI's proprietary components and the OpenPower systems designed to support them.

Most of the magic, IBM says, comes from a machine-to-machine software interconnection system that rides on top of whatever hardware fabric is available.

Typically, that's an InfiniBand link, although IBM says it can also work over conventional gigabit Ethernet; IBM admits it won't run anywhere near as fast that way.
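This kind of fabric-agnostic gradient exchange is conventionally built on an MPI-style all-reduce, which itself runs over InfiniBand or plain Ethernet depending on what the cluster provides. The sketch below shows that step with mpi4py; it illustrates the general technique, not IBM's proprietary implementation.

```python
# Illustrative mpi4py all-reduce: the gradient-averaging exchange a
# fabric-agnostic communication layer performs between nodes. MPI rides on
# InfiniBand or Ethernet, whichever the cluster offers. Not IBM's DDL code.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Each rank (one per node or GPU) holds its own locally computed gradients...
local_grad = np.random.rand(1000).astype(np.float32)

# ...which are summed across all ranks and averaged, so every replica
# applies an identical weight update.
global_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
global_grad /= comm.Get_size()

if comm.Get_rank() == 0:
    print(f"averaged gradients across {comm.Get_size()} ranks")
```

Launched with, say, mpirun -np 64 python train.py, the same script runs unchanged whether the ranks sit on one machine or on sixty-four.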

It's been possible to do deep-learning training on multiple systems in a cluster for some time now, although each framework tends to have its own set of solutions.

With Caffe, for example, there's the Parallel ML System or CaffeOnSpark. TensorFlow can also be distributed across multiple servers, but again any integration with other frameworks is something you'll have to add by hand.
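As a concrete example of the per-framework setup DDL aims to hide, here is roughly what stock multi-worker TensorFlow asks of you on every node: a TF_CONFIG description of the whole cluster plus that node's own index. The snippet uses the current TensorFlow 2.x MultiWorkerMirroredStrategy API rather than the 1.x mechanisms available when this article was written, and lists a single localhost worker so it runs standalone; a real cluster would list one address per node. It is not IBM's DDL.

```python
# Stock multi-node TensorFlow: every machine must be told about every other
# machine via TF_CONFIG before training starts. Shown with a single localhost
# worker so the script runs standalone.
import json
import os

os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["localhost:23456"]},   # one entry per node in real use
    "task": {"type": "worker", "index": 0},       # this machine's slot in that list
})

import tensorflow as tf

strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Variables created here are replicated; gradients are all-reduced
    # across workers during training.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Stand-in data; each worker would normally read its own shard of the real set.
x = tf.random.normal((512, 784))
y = tf.random.uniform((512,), maxval=10, dtype=tf.int32)
model.fit(x, y, epochs=1, batch_size=64)
```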

IBM's claimed advantages are that DDL works across multiple frameworks and that it takes far less heavy lifting to set up. But those advantages come at the cost of running on IBM's own iron.

This article originally appeared on InfoWorld.

