Analytics are core to all modern software-as-a-service (SaaS) applications. There is no way to successfully operate a SaaS application without monitoring how it is performing, what it’s doing internally, and how successful it is at accomplishing its goals.
However, there are many types of analytics that modern applications need to monitor and examine. The purpose, value, accuracy, and reliability of those analytics vary greatly depending on how they are measured, how they are used, and who makes use of them.
There are essentially three classes of analytics with radically different use cases.
Class A analytics
Class A analytics are metrics that are mission-critical to the application. Without them, your application could fail in real time. These metrics are used to evaluate how the application is operating and to make the dynamic adjustments that keep it functioning.
The analytics are part of a feedback loop that constantly monitors and improves the operational environment of the application.
A prime example of Class A analytics is the set of metrics used for autoscaling. These metrics are used to dynamically resize your infrastructure to meet current or expected demand as the load on the application fluctuates.
When a specified metric meets defined criteria, such as crossing a threshold, AWS Auto Scaling adds Amazon EC2 instances to an application or removes them, automatically adjusting the resources used to operate it. It adds instances when additional resources are needed and removes them when the metrics indicate those resources are no longer needed.
AWS Auto Scaling lets you define a service made up of any number of EC2 instances and automatically add or remove servers based on traffic and load. When traffic is lower, fewer instances are used; when traffic is higher, more instances are used.
For example, AWS Auto Scaling might use an Amazon CloudWatch metric that measures the average CPU load across all the instances serving a service. Once the CPU load rises above a certain threshold, AWS Auto Scaling adds another server to the service pool.
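The threshold logic described above can be sketched in a few lines of Python. This is a minimal illustration, not how AWS Auto Scaling is implemented: the function name, the 70% and 30% thresholds, and the size limits are all hypothetical. In practice you would configure this behavior as a scaling policy (for example, a step scaling or target tracking policy) rather than write it yourself.

```python
from statistics import mean


def desired_instance_count(
    cpu_by_instance: list[float],
    current: int,
    scale_out_at: float = 70.0,  # hypothetical: add a server above 70% average CPU
    scale_in_at: float = 30.0,   # hypothetical: remove a server below 30% average CPU
    min_size: int = 1,
    max_size: int = 10,
) -> int:
    """Return the instance count a simple threshold policy would request.

    cpu_by_instance holds the latest CPU utilization (percent) of each
    instance in the service pool. All thresholds and limits are illustrative.
    """
    if not cpu_by_instance:
        # No metric data: this sketch holds the current size rather than guess.
        return current
    avg_cpu = mean(cpu_by_instance)
    if avg_cpu > scale_out_at:
        target = current + 1  # add one server to the pool
    elif avg_cpu < scale_in_at:
        target = current - 1  # remove one server from the pool
    else:
        target = current
    # Keep the pool within its configured bounds.
    return max(min_size, min(max_size, target))
```

For example, `desired_instance_count([85.0, 90.0, 75.0], current=3)` sees an average of about 83% CPU and requests 4 instances. The empty-input branch is one arbitrary choice for the case where metric data is missing; a policy fed no data, or wrong data, has nothing sound to act on.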
Note that if those Amazon CloudWatch metrics are unavailable or inaccurate for any reason, the algorithm cannot function correctly. It may add too many instances to the service, which wastes money, or too few, which could cause the application to brown out or fail outright.
Clearly, these metrics are truly essential. The very operation of the application is jeopardised if they are not available and correct. As such, they are Class A metrics.
AWS Elastic Load Balancing is another good example. AWS automatically adjusts the size and number of the instances that run each load balancer, based on the amount of traffic currently flowing to it.
As traffic increases, the load balancer is automatically moved to larger or more instances; as traffic decreases, it is automatically moved to smaller or fewer instances. All of this is driven by internal algorithms that use specific CloudWatch metrics. If those metrics are unavailable or incorrect, the load balancer won't be sized appropriately, and its ability to handle the traffic load could suffer.
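AWS does not publish the algorithm Elastic Load Balancing uses to size itself, so the following is only a hypothetical illustration of metric-driven sizing. The node sizes, per-node capacities, headroom factor, and the `size_load_balancer` function are invented for the example: it picks the smallest node size that can carry the observed traffic, plus headroom, within a fixed node limit.

```python
import math

# Hypothetical requests-per-second capacity of one node of each size.
# These figures are invented; they are not AWS values.
NODE_CAPACITY_RPS = {"small": 1_000, "medium": 4_000, "large": 16_000}
MAX_NODES = 4  # hypothetical cap on node count before moving to a bigger size


def size_load_balancer(observed_rps: float, headroom: float = 0.25) -> tuple[str, int]:
    """Pick the smallest node size (and node count) that covers traffic plus headroom."""
    required = observed_rps * (1 + headroom)
    # Dicts preserve insertion order, so sizes are tried smallest first.
    for size, capacity in NODE_CAPACITY_RPS.items():
        nodes = max(1, math.ceil(required / capacity))
        if nodes <= MAX_NODES:
            return size, nodes
    # Traffic exceeds even the largest configuration: use the maximum available.
    largest = list(NODE_CAPACITY_RPS)[-1]
    return largest, MAX_NODES
```

With these made-up numbers, 20,000 requests per second yields `("large", 2)`. If the traffic metric wrongly reported 2,000 requests per second, the same function would return `("small", 3)`, a configuration far too small for the real load.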
Class B analytics
Class B analytics are metrics that are not mission-critical but serve as early indicators of impending problems or help solve problems when they arise. They can be important for preventing system outages and for recovering from them.
Class B metrics typically give insight into the internal operation of the application or service, or into the infrastructure that runs it. These insights can be used proactively or reactively to improve how the application or service operates.
Proactively, Class B metrics can be monitored for trends indicating that an application or service might be misbehaving. Those trends can then trigger alerts telling the operations team to examine the system and find out what might be wrong.
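One simple way to alert on a trend, rather than on a single reading, is to fit a straight line to recent samples and alert if the projected value would cross a limit soon. The sketch below is a hedged illustration, assuming evenly spaced samples of a hypothetical metric such as disk usage in percent; the function names, limit, and horizon are illustrative, and real monitoring systems provide their own alerting features.

```python
from statistics import mean


def slope(values: list[float]) -> float:
    """Least-squares slope of evenly spaced samples, in units per sample."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def should_alert(values: list[float], limit: float, horizon: int) -> bool:
    """Return True if the current trend would reach `limit` within `horizon` samples."""
    if len(values) < 2:
        return False  # not enough data to establish a trend
    projected = values[-1] + slope(values) * horizon
    return projected >= limit
```

For samples `[50, 52, 54, 56, 58]` the slope is 2 points per sample, so `should_alert(samples, limit=90, horizon=20)` returns `True` (58 + 2 * 20 = 98), while a horizon of 10 returns `False` (58 + 2 * 10 = 78).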
Reactively, during a system failure or performance degradation, historical Class B metrics can be examined to determine what might have caused the problem and how to solve it. These metrics are often used during site failure events and afterward in postmortem examinations.
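As an example of this kind of historical analysis, the sketch below ranks metrics by how far their values in the few samples just before a failure drifted from their earlier baseline, measured as an absolute z-score. It is an illustrative technique, not a standard tool: the `rank_suspects` name, the data layout, and the window size are assumptions, and a high score only suggests where to look first; it does not prove a cause.

```python
from statistics import mean, pstdev


def rank_suspects(
    history: dict[str, list[float]],
    incident_index: int,
    window: int = 5,  # hypothetical: number of samples just before the failure
) -> list[tuple[str, float]]:
    """Rank metrics by drift from baseline just before an incident.

    history maps a metric name to evenly spaced samples; incident_index is the
    sample at which the failure began. Returns (name, |z-score|) pairs, most
    anomalous first.
    """
    start = max(0, incident_index - window)
    scores = []
    for name, samples in history.items():
        baseline = samples[:start]               # earlier, "normal" behavior
        recent = samples[start:incident_index]   # samples just before the failure
        if len(baseline) < 2 or not recent:
            continue  # too little history to judge this metric
        spread = pstdev(baseline) or 1e-9  # avoid dividing by zero for a flat baseline
        z = (mean(recent) - mean(baseline)) / spread
        scores.append((name, abs(z)))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For instance, a database-latency series that hovered around 10.5 ms and then averaged 50 ms in the five samples before an outage would rank far above a CPU series that stayed near its usual level.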
During a failure event, Class B metrics are used to quickly determine what went wrong and how to fix it. Afterward, they are used to reduce the Mean Time To Detection (MTTD), the average time it takes to find a problem during an outage, and the Mean Time To Repair (MTTR), the average time it takes to fix a problem during an outage. Reducing both is a critical goal for high-performance SaaS applications.
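MTTD and MTTR are simple averages over incident records. The sketch below assumes a hypothetical record with three timestamps per incident and measures repair time from detection to resolution; some teams measure MTTR from the start of the failure instead, so treat that choice as an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:  # hypothetical incident record
    started: datetime   # when the problem began
    detected: datetime  # when the team identified the problem
    resolved: datetime  # when the fix was in place


def mean_delta(deltas: list[timedelta]) -> timedelta:
    """Average a list of durations."""
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean Time To Detection: average of (detected - started)."""
    return mean_delta([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean Time To Repair, assumed here to run from detection to resolution."""
    return mean_delta([i.resolved - i.detected for i in incidents])
```

For two incidents detected after 10 and 20 minutes and fixed 30 and 40 minutes after detection, `mttd` returns 15 minutes and `mttr` returns 35 minutes.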
Yet these metrics are not as critical as Class A metrics. If a Class A metric fails, your application could fail; if a Class B metric fails, your application won't. However, if your application does have an issue, it might take longer to find and fix the problem when your Class B metrics aren't functioning correctly.