Menu
IBM speeds deep learning by using multiple servers

IBM speeds deep learning by using multiple servers

IBM's Distributed Deep Learning spreads model training across any number of hardware nodes—as long as they’re IBM nodes

For everyone frustrated by how long it takes to train deep learning models, IBM has some good news: It has unveiled a way to automatically split deep-learning training jobs across multiple physical servers -- not just individual GPUs, but whole systems with their own separate sets of GPUs.

Now the bad news: It's available only in IBM's PowerAI 4.0 software package, which runs exclusively on IBM's own OpenPower hardware systems.

Distributed Deep Learning (DDL) doesn't require developers to learn an entirely new deep learning framework.

It repackages several common frameworks for machine learning: TensorFlow, Torch, Caffe, Chainer, and Theano. Deep learning projecs that use those frameworks can then run in parallel across multiple hardware nodes.

IBM claims the speed-up gained by scaling across nodes is nearly linear. One benchmark, using the ResNet-101 and ImageNet-22K data sets, needed 16 days to complete on one IBM S822LC server. Spread across 64 such systems, the same benchmark concluded in seven hours, or 58 times faster.

IBM offers two ways to use DDL. One, you can shell out the cash for the servers it's designed for, which sport two Nvidia Tesla P100 units each, at about $50,000 a head.

Two, you can run the PowerAI software in a cloud instance provided by IBM partner Nimbix, for around $0.43 an hour.

One thing you can't do, though, is run PowerAI on commodity Intel x86 systems. IBM has no plans to offer PowerAI on that platform, citing tight integration between PowerAI's proprietary components and the OpenPower systems designed to support them.

Most of the magic, IBM says, comes from a machine-to-machine software interconnection system that rides on top of whatever hardware fabric is available.

Typically, that's an InfiniBand link, although IBM claims it can also work on conventional gigabit Ethernet (still, IBM admits it won't run anywhere nearly as fast).

It's been possible to do deep-learning training on multiple systems in a cluster for some time now, although each framework tends to have its own set of solutions.

With Caffe, for example, there's the Parallel ML System or CaffeOnSpark. TensorFlow can also be distributed across multiple servers, but again any integration with other frameworks is something you'll have to add by hand.

IBM's claimed advantage is that it works with multiple frameworks and without as much heavy lifting needed to set things up. But those come at the cost of running on IBM's own iron.

This article originally appeared on InfoWorld.


Follow Us

Join the newsletter!

Or

Sign up to gain exclusive access to email subscriptions, event invitations, competitions, giveaways, and much more.

Membership is free, and your security and privacy remain protected. View our privacy policy before signing up.

Error: Please check your email address.

Tags CloudIBMstorageserver

Brand Post

Featured

Slideshows

Reseller News Platinum Club celebrates leading partners in 2019

Reseller News Platinum Club celebrates leading partners in 2019

The leading players of the New Zealand channel came together to celebrate a year of achievement at the annual Reseller News Platinum Club lunch in Auckland. Following the Reseller News Innovation Awards, Platinum Club provides a platform to showcase the top performing partners and start-ups of the past 12 months.

Reseller News Platinum Club celebrates leading partners in 2019
Reseller News hosts alumnae breakfast for Women in ICT Awards

Reseller News hosts alumnae breakfast for Women in ICT Awards

Reseller News hosted its second annual alumnae breakfast for the Women in ICT Awards in New Zealand, designed to showcase the leading female leaders in the industry. Held at The Cordis in Auckland, attendees came together to hear inspiring keynotes and panel discussions, alongside high-level networking among peers. Photos by Gino Demeer.

Reseller News hosts alumnae breakfast for Women in ICT Awards
Reseller News Innovation Awards 2019: meet the winners

Reseller News Innovation Awards 2019: meet the winners

Reseller News honoured the standout players of the New Zealand channel in front of more than 480 technology leaders in Auckland on 23 October, recognising the achievements of top partners, emerging entrants and innovative start-ups.

Reseller News Innovation Awards 2019: meet the winners
Show Comments