Cloud platforms, managed service providers, and organisations undertaking digital transformations are beginning to reap the benefits of an emerging IT trend: the use of AI-powered IT operations technology to monitor and manage the IT portfolio automatically.
This emerging practice, known as AIOps, is helping enterprises head off potential outages and performance issues before they negatively impact operations, customers, and the bottom line. But the more advanced deployments are beginning to use AI systems not just to identify issues, or to predict issues before they happen, but to react to events with intelligent, automated mitigation.
But what exactly is AIOps and how are organisations putting it to use today? Here we take a deeper look at the technologies, strategies, and challenges of AI-assisted IT operations.
What is AIOps?
AIOps is an emerging IT practice that applies artificial intelligence to IT operations to help organisations intelligently manage infrastructure, networks, and applications for performance, resilience, capacity, uptime, and, in some cases, security. By shifting traditional, threshold-based alerts and manual processes to systems that take advantage of AI and machine learning, AIOps enables organisations to better monitor IT assets and anticipate negative incidents and impacts before they take hold.
Carhartt CIO John Hill uses AIOps at the work-apparel retailer in three main areas: service management, performance management, and IT automation. Thanks to intelligent monitoring, Carhartt can now spot problems before they impact users or customers.
"It's the whole process of monitoring your environment and understanding what's going on — and taking actions based on those indicators," he says. "Previously, you would rely on an outage or some indication that something isn't working" to know when a fix was needed — events likely to have already degraded customer experience before you knew of them.
Many AIOps platforms have been built on monitoring systems with a long history. Others began in AI labs and grew outwards. Good AIOps tools forecast machine load and then watch for anything that deviates from those estimates. Anomalies can be turned into alerts that generate emails, Slack posts, or, if the deviation is large enough, pager messages. Sophisticated AIOps tools also offer “root cause analysis,” which creates flowcharts to track how problems can ripple through the various machines in a modern enterprise application. Anyone considering an AIOps platform will want to evaluate how well each offering integrates with their particular databases and services. The following AIOps tools are among the best available today:
- IBM Watson Cloud Pak for AIOps
- New Relic One
For a deeper look at these tools, see “Top 10 AIOps platforms.”
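To make the forecast-and-deviation idea described above concrete, here is a minimal Python sketch. It is not how any particular AIOps product works: the moving-average forecast, the function names, and the thresholds of two, three, and four standard deviations are all assumptions chosen for illustration. Real platforms use far richer models and their own alert-routing rules.

```python
from statistics import mean, stdev


def forecast_next(history, window=5):
    """Naive forecast (an assumption for this sketch): the mean and
    sample standard deviation of the last `window` load readings."""
    recent = history[-window:]
    return mean(recent), stdev(recent)


def route_alert(observed, expected, spread, email=2.0, slack=3.0, page=4.0):
    """Pick a notification channel from the size of the deviation,
    measured in standard deviations. Thresholds are hypothetical."""
    if spread == 0:
        deviation = 0.0 if observed == expected else float("inf")
    else:
        deviation = abs(observed - expected) / spread
    if deviation >= page:
        return "pager"   # large deviation: wake someone up
    if deviation >= slack:
        return "slack"
    if deviation >= email:
        return "email"
    return None          # within the expected range: no alert


# Illustrative CPU load readings (percent); not real data.
cpu_load = [41.0, 43.0, 40.0, 42.0, 44.0]
expected, spread = forecast_next(cpu_load)

for reading in (43.0, 46.0, 60.0):
    print(reading, route_alert(reading, expected, spread))
```

The point of the sketch is the shape of the logic: the alert depends on how far a reading strays from what was predicted, not on a fixed threshold set by hand.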
AIOps use cases
AIOps may already be at work in your IT portfolio without you even knowing it. Advanced CRM or ERP systems often have intelligent management built in. Most major cloud platforms make use of machine learning–powered monitoring and management tools as well.
But relying on built-in functionality within point solutions has its downsides. Sixty-five percent of IT organisations in an AIOps Exchange survey said they still rely on monitoring approaches — whether intelligent or not — that are either siloed, rules-based or don’t cover the needs of their entire IT environment. Moreover, according to a recent BigPanda survey, 42 percent of IT organisations use more than 10 different monitoring tools for their IT environments.
That was how Carhartt started with AIOps. "Previously, for the different environments, we’d have to monitor them independently," Hill says. To manage this complexity, Hill opted to combine monitoring onto two platforms, settling first on AppDynamics for application performance monitoring, and later adding Turbonomic to keep tabs on Carhartt’s infrastructure.
Performance issues on the company's website during the Black Friday and Cyber Monday shopping rushes forced a change. By the time the company saw the problems, customers had already felt the service degradation, Hill says.
Since Carhartt deployed AppDynamics in the fall of 2017, spikes during Black Friday and Cyber Monday have been met with zero downtime.
"We had record growth," he says. "We grew double the rate of the industry as a whole, without any of the outages or performance degradation that we had experienced previously."
Carhartt added Turbonomic in early 2019 for resource management of both on-prem and cloud environments. With the new system, utilisation has increased from 70 to 92 percent, he says. "It probably saved us 25 percent of infrastructure costs."
Requests for additional capacity are processed automatically, without human intervention, while decreases in capacity still require human approval.
"It sees that we've got a capacity challenge and it puts a change request through to ServiceNow," Hill says. "When we have too much capacity, it creates a ticket in ServiceNow, and someone looks at it first. It's a quick review — just a click. For now, I don't need to automate it."
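The asymmetric policy Hill describes, with scale-ups applied automatically and scale-downs held for a one-click review, can be sketched roughly as follows. This is an illustrative Python outline only: the utilisation thresholds, the `CapacityAction` type, and the ticket labels are hypothetical, and no real Turbonomic or ServiceNow API is called.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CapacityAction:
    direction: str    # "scale_up" or "scale_down"
    auto_apply: bool  # True: applied without waiting for human review
    ticket: str       # kind of record filed in the ticketing system (hypothetical labels)


def plan_capacity_action(utilisation: float,
                         high: float = 0.90,
                         low: float = 0.50) -> Optional[CapacityAction]:
    """Decide what to do for a given utilisation (0.0 to 1.0).
    The `high` and `low` thresholds are assumptions for this sketch."""
    if utilisation >= high:
        # Capacity shortfall: file a change request and apply it automatically.
        return CapacityAction("scale_up", True, "change_request")
    if utilisation <= low:
        # Excess capacity: file a ticket and wait for a person to approve.
        return CapacityAction("scale_down", False, "review_ticket")
    return None  # utilisation is in the acceptable band: do nothing


for u in (0.95, 0.70, 0.30):
    print(u, plan_capacity_action(u))
```

Keeping a person in the loop only for reductions reflects the different risks: adding capacity too eagerly costs money, while removing it too eagerly can cause an outage.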
The next step for the company is automating business tasks, such as processing customer orders using text recognition and natural language processing.