“Observability” might prove to be the buzzword of the year, and logging and monitoring specialist Splunk is joining other software tools developers to address the market with its latest product, Observability Cloud.
The managed cloud service aims to bring together infrastructure, application performance, digital experience, and synthetic monitoring capabilities with log investigation and “smart” incident response for a holistic observability platform.
“Observability” is a broad term with roots in control theory, a traditional engineering discipline, where it is a measure of how well the internal state of a system can be inferred using only its external outputs.
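For readers who want the control-theory definition, the standard textbook formulation (not part of Splunk's announcement) considers a linear time-invariant system with state x, input u, and output y. Such a system is called observable when the rank condition below holds, meaning the state can be reconstructed from the measured outputs and known inputs:

```latex
\[
\dot{x}(t) = A x(t) + B u(t), \qquad y(t) = C x(t) + D u(t), \qquad x(t) \in \mathbb{R}^n
\]
\[
\operatorname{rank}
\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} = n
\]
```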
For software specifically, the term has been co-opted to mean bringing together raw outputs such as metrics, events, logs, and traces so that software developers can gain a real-time picture of how their systems are performing and where issues might be occurring, at scale and without any data sampling.
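As a rough illustration of what "bringing together" these outputs can mean in practice, the sketch below groups different signal types by a shared request identifier and then asks which services logged errors for each request. It is a minimal sketch, not Splunk's implementation: the `Signal` class, the `trace_id` field, and both helper functions are hypothetical names invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Signal:
    """One raw telemetry record (hypothetical shape, for illustration only)."""
    kind: str       # "metric", "event", "log", or "trace"
    trace_id: str   # assumed shared correlation key across signal types
    service: str
    payload: dict = field(default_factory=dict)


def correlate(signals):
    """Group raw signals by trace_id so one request can be viewed end to end."""
    grouped = defaultdict(list)
    for s in signals:
        grouped[s.trace_id].append(s)
    return dict(grouped)


def failing_services(grouped):
    """Return, per trace, the sorted services that emitted an error-level log."""
    result = {}
    for trace_id, items in grouped.items():
        bad = sorted({
            s.service for s in items
            if s.kind == "log" and s.payload.get("level") == "error"
        })
        if bad:
            result[trace_id] = bad
    return result


# Made-up sample data: two requests, one of which hit an error in "payments".
signals = [
    Signal("trace", "t1", "checkout", {"duration_ms": 950}),
    Signal("log", "t1", "payments", {"level": "error", "msg": "card timeout"}),
    Signal("metric", "t1", "payments", {"latency_ms": 900}),
    Signal("log", "t2", "checkout", {"level": "info", "msg": "ok"}),
]
grouped = correlate(signals)
print(failing_services(grouped))  # → {'t1': ['payments']}
```

The point of the shared key is that a slow trace, a latency metric, and an error log from different services can be read as one story about a single request, which is hard to do when each signal lives in a separate tool.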
The rise of applications built from microservices spread across containers orchestrated by Kubernetes has made it harder than ever for software developers, devops teams, and site reliability engineering (SRE) teams to keep track of how their complex, distributed systems are performing, and even harder to pinpoint issues and resolve them before they affect the user.
Splunk, alongside rival vendors such as Sumo Logic, Dynatrace, Datadog, New Relic, and newcomer Honeycomb, is increasingly looking to offer a comprehensive observability suite out of the box, rather than relying on developers to bring together the various tools and instrumentation techniques required to gain any semblance of observability today.
With Observability Cloud, Splunk is providing a single interface for various core logging, monitoring, and incident-response dashboards and tools, including machine learning-powered alerting. The key elements of the product include:
- Splunk Infrastructure Monitoring
- Splunk APM for application performance monitoring
- Splunk Log Observer
- Splunk Real User Monitoring (RUM)
- Splunk Synthetic Monitoring for APIs, service endpoints, business transactions, and user flows
- Splunk On-Call for incident response
Observability Cloud is built on OpenTelemetry, an open source project that came about as a merger between OpenCensus and OpenTracing and aims to create an open industry standard for telemetry data from cloud-native software and applications. The sandbox project currently sits with the Cloud Native Computing Foundation (CNCF) and has gathered broad support.
“We realised that proprietary instrumentation is bad for users because they don’t have control of that data and it creates data silos that aren’t connected at the source. Now we have standardised and made that open source,” said Spiros Xanthos, VP of product management, observability, and IT operations at Splunk.
Now, by connecting those data sources, Splunk can better layer on “smart” machine-learning-enabled analytics, which the company promises will spot performance issues automatically and “maybe even go further to find the root cause,” Xanthos said.
One early customer is the computer and personal electronics manufacturer Lenovo, which used Observability Cloud in beta. The company says it has been able to reduce its mean time to resolution for incidents from 30 minutes to under five minutes, and that it maintained 100 per cent uptime during peak traffic loads over the very busy Black Friday online shopping period.
Splunk Observability Cloud is now generally available; it is priced by host, based on infrastructure size.