A sub-category of machine learning, deep learning uses multi-layered neural networks to automate historically difficult machine tasks—such as image recognition, natural language processing (NLP), and machine translation—at scale.
TensorFlow, which emerged out of Google in 2015, has been the most popular open source deep learning framework for both research and business. But PyTorch, which emerged out of Facebook in 2016, has quickly caught up, thanks to community-driven improvements in ease of use and deployment for a widening range of use cases.
PyTorch is seeing particularly strong adoption in the automotive industry, where it can be applied to pilot autonomous driving systems from the likes of Tesla and Lyft Level 5. The framework is also being used for content classification and recommendation at media companies and to help support robots in industrial applications.
Joe Spisak, product lead for artificial intelligence at Facebook AI, told InfoWorld that although he has been pleased by the increase in enterprise adoption of PyTorch, there’s still much work to be done to gain wider industry adoption.
“The next wave of adoption will come with enabling lifecycle management, MLOps, and Kubeflow pipelines and the community around that,” he said. “For those early in the journey, the tools are pretty good, using managed services and some open source with something like SageMaker at AWS or Azure ML to get started.”
Disney: Identifying animated faces in movies
Since 2012, engineers and data scientists at the media giant Disney have been building what the company calls the Content Genome, a knowledge graph that pulls together content metadata to power machine learning-based search and personalisation applications across Disney’s massive content library.
“This metadata improves tools that are used by Disney storytellers to produce content; inspire iterative creativity in storytelling; power user experiences through recommendation engines, digital navigation and content discovery; and enable business intelligence,” wrote Disney developers Miquel Àngel Farré, Anthony Accardo, Marc Junyent, Monica Alfaro, and Cesc Guitart in a blog post in July.
Before that could happen, Disney had to invest in a vast content annotation project. It turned to its data scientists to train an automated tagging pipeline that uses deep learning image recognition models to identify people, characters, and locations across huge quantities of images.
Disney engineers started out by experimenting with various frameworks, including TensorFlow, but decided to consolidate around PyTorch in 2019.
Engineers shifted from a conventional histogram of oriented gradients (HOG) feature descriptor and the popular support vector machines (SVM) model to a version of the object-detection architecture dubbed regions with convolutional neural networks (R-CNN). The latter was more conducive to handling the combinations of live action, animations, and visual effects common in Disney content.
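For readers unfamiliar with the older pipeline, the core of a HOG descriptor is easy to sketch. The pure-Python function below computes the gradient-orientation histogram for a single cell of a grayscale image. It is a simplified illustration, not Disney's code: full implementations such as scikit-image's `hog` add interpolated voting across bins and block normalisation, both omitted here. In the classic pipeline, histograms from many cells are concatenated into a feature vector that an SVM then classifies.

```python
import math


def hog_cell_histogram(cell, num_bins=9):
    """Histogram of oriented gradients for one cell (a 2D list of grayscale values).

    Simplified sketch: central-difference gradients on interior pixels only,
    unsigned orientations in [0, 180) degrees, and each pixel votes its full
    gradient magnitude into a single bin (no interpolation, no block normalisation).
    """
    bin_width = 180.0 / num_bins
    hist = [0.0] * num_bins
    rows, cols = len(cell), len(cell[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal [-1, 0, 1] filter
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical [-1, 0, 1] filter
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            # The extra modulo guards against float rounding pushing the index to num_bins.
            hist[int(angle // bin_width) % num_bins] += magnitude
    return hist
```

A left-to-right brightness ramp, for example, puts all of its gradient magnitude into the 0-degree bin, since every gradient points horizontally. Cartoon faces with flat colour fills give such hand-designed features little to work with, which fits the team's move to a learned detector.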
“It is difficult to define what is a face in a cartoon, so we shifted to deep learning methods using an object detector and used transfer learning,” Disney Research engineer Monica Alfaro explained to InfoWorld. After just a few thousand faces were processed, the new model was already broadly identifying faces in all three use cases. It went into production in January 2020.
“We are using just one model now for the three types of faces and that is great to run for a Marvel movie like Avengers, where it needs to recognise both Iron Man and Tony Stark, or any character wearing a mask,” she said.
Because the engineers are dealing with such high volumes of video data, they also wanted to train and run the model in parallel on expensive, high-performance GPUs when moving into production.
The shift from CPUs allowed engineers to retrain and update models faster. It also sped up the distribution of results to various groups across Disney, cutting processing time for a feature-length movie from roughly an hour to between five and 10 minutes today.
“The TensorFlow object detector brought memory issues in production and was difficult to update, whereas PyTorch had the same object detector and Faster-RCNN, so we started using PyTorch for everything,” Alfaro said.
That switch from one framework to another was surprisingly simple for the engineering team too. “The change [to PyTorch] was easy because it is all built-in, you only plug some functions in and can start quick, so it’s not a steep learning curve,” Alfaro said.
When they did run into issues or bottlenecks, the vibrant PyTorch community was on hand to help.
Blue River Technology: Weed-killing robots
Blue River Technology has designed a robot that uses a heady combination of digital way-finding, integrated cameras, and computer vision to spray weeds with herbicide in near real time while leaving crops alone, helping farmers conserve expensive and potentially environmentally damaging herbicides.
The Sunnyvale, California-based company caught the eye of heavy equipment maker John Deere, which acquired it in 2017 for $305 million with the aim of integrating the technology into its agricultural equipment.
Blue River researchers experimented with various deep learning frameworks while trying to train computer vision models to recognise the difference between weeds and crops, a massive challenge when you are dealing with cotton plants, which bear an unfortunate resemblance to weeds.
Highly trained agronomists were drafted to label images by hand, and that labelled data was used to train a convolutional neural network (CNN) in PyTorch “to analyse each frame and produce a pixel-accurate map of where the crops and weeds are,” Chris Padwick, director of computer vision and machine learning at Blue River Technology, wrote in a blog post in August.
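The article doesn't detail Blue River's network, but the final step of a semantic segmentation model generally looks like this: the CNN emits a score per class for every pixel, and each pixel takes the highest-scoring class. The sketch below assumes a three-class setup (background, crop, weed) and a nested-list layout `scores[class][row][col]`; both are hypothetical, as is the `weed_pixels` helper that lists candidate spray targets.

```python
# Hypothetical class indices, assumed for this sketch only.
BACKGROUND, CROP, WEED = 0, 1, 2


def label_map(scores):
    """Turn per-pixel class scores into a pixel-level label map.

    scores[c][y][x] is the (assumed) score for class c at pixel (y, x).
    Each pixel is assigned the class with the highest score (argmax over c).
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def weed_pixels(labels):
    """Hypothetical helper: (row, col) coordinates of pixels labelled as weeds."""
    return [(y, x) for y, row in enumerate(labels) for x, c in enumerate(row) if c == WEED]
```

Because every pixel gets its own label, a weed growing right next to a cotton plant can in principle be separated from it, which a per-image classification could not do.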
“Like other companies, we tried Caffe, TensorFlow, and then PyTorch,” Padwick told InfoWorld. “It works pretty much out of the box for us. We have had no bug reports or a blocking bug at all. On distributed compute it really shines and is easier to use than TensorFlow, which for data parallelisms was pretty complicated.”
Padwick says the popularity and simplicity of the PyTorch framework give him an advantage when it comes to ramping up new hires quickly. That being said, Padwick dreams of a world where “people develop in whatever they are comfortable with. Some like Apache MXNet or Darknet or Caffe for research, but in production it has to be in a single language, and PyTorch has everything we need to be successful.”
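Padwick's point about data parallelism is easier to see with the mechanics stripped down. In synchronous data-parallel training, each worker computes gradients on its own shard of the batch, the gradients are averaged across workers (an all-reduce), and every replica applies the same update. PyTorch packages this pattern as `torch.nn.parallel.DistributedDataParallel`. The pure-Python sketch below simulates the workers one after another on a toy one-parameter linear model; the model, data, and helper names are assumptions for illustration, not Blue River's setup.

```python
# Toy setup for illustration only: the model is y_hat = w * x with one weight w.


def shard(data, num_workers):
    """Split data into num_workers equal, contiguous shards (one per worker)."""
    if len(data) % num_workers:
        raise ValueError("this sketch assumes the batch divides evenly across workers")
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def local_gradient(w, batch):
    """Gradient of mean squared error w.r.t. w, computed on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def all_reduce_mean(values):
    """Stand-in for the collective step: every worker receives the average."""
    return sum(values) / len(values)


def data_parallel_step(w, data, num_workers, lr):
    # In a real cluster these gradients would be computed concurrently.
    grads = [local_gradient(w, s) for s in shard(data, num_workers)]
    return w - lr * all_reduce_mean(grads)  # identical update on every replica


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]  # toy data with true w = 3
    w = 0.0
    for _ in range(100):
        w = data_parallel_step(w, data, num_workers=4, lr=0.01)
    print(round(w, 6))
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so splitting a batch across workers divides the work without changing the update.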
Datarock: Cloud-based image analysis for the mining industry
Founded by a group of geoscientists, Australian startup Datarock is applying computer vision technology to the mining industry. More specifically, its deep learning models are helping geologists analyse drill core sample imagery faster than before.
Typically, a geologist would pore over these samples centimetre by centimetre to assess mineralogy and structure, while engineers would look for physical features such as faults, fractures, and rock quality. This process is both slow and prone to human error.
“A computer can see rocks like an engineer would,” Brenton Crawford, COO of Datarock, told InfoWorld. “If you can see it in the image, we can train a model to analyse it as well as a human.”
Similar to Blue River, Datarock uses a variant of the R-CNN model in production, with researchers turning to data augmentation techniques to stretch their limited training data in the early stages.
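Data augmentation stretches a small labelled set by adding transformed copies of each image. A common starting point is the eight rotations and reflections of an image, sketched below in pure Python on a 2D grid of pixel values. Which transforms Datarock actually used isn't stated, and whether a given transform preserves the label is a judgment call for each dataset. For an object detector such as R-CNN, bounding-box coordinates must be transformed along with the pixels, which this sketch omits.

```python
def rotate90(img):
    """Rotate a 2D grid of pixel values 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror a 2D grid of pixel values left to right."""
    return [row[::-1] for row in img]


def dihedral_variants(img):
    """All 8 rotations and reflections of an image, the original included."""
    variants = []
    current = img
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants
```

Each labelled image thus yields up to eight training examples, although for images with an inherent orientation only a subset may be appropriate.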
“Following the initial discovery period, the team set about combining techniques to create an image processing workflow for drill core imagery. This involved developing a series of deep learning models that could process raw images into a structured format and segment the important geological information,” the researchers wrote in a blog post.
Using Datarock’s technology, clients can get results in half an hour, as opposed to the five or six hours it takes to log findings manually. This frees up geologists from the more laborious parts of their job, Crawford said. However, “when we automate things that are more difficult, we do get some pushback, and have to explain they are part of this system to train the models and get that feedback loop turning.”
Like many companies training deep learning computer vision models, Datarock started with TensorFlow, but soon shifted to PyTorch.
“At the start we used TensorFlow and it would crash on us for mysterious reasons,” Duy Tin Truong, machine learning lead at Datarock, told InfoWorld. “PyTorch and Detectron2 was released at that time and fitted well with our needs, so after some tests we saw it was easier to debug and work with and occupied less memory, so we converted,” he said.
Datarock also reported a 4x improvement in inference performance after moving from TensorFlow to PyTorch and Detectron2 when running the models on GPUs, and a 3x improvement on CPUs.
Truong cited PyTorch’s growing community, well-designed interface, ease of use, and better debugging as reasons for the switch and noted that although “they are quite different from an interface point of view, if you know TensorFlow, it is quite easy to switch, especially if you know Python.”