Intrusions in which hackers compromise the infrastructure of software developers and Trojanise their legitimate updates are hard for users of the affected software products to detect, as multiple incidents over the past several years have highlighted.
Researchers agree there is no silver bullet solution, but network defenders can use a combination of techniques to detect subtle changes in how critical software and the systems it's deployed on behave.
Researchers from security analytics firm Splunk have recently analysed several such techniques that rely on building unique fingerprints to identify which software applications establish HTTPS connections.
The premise is that malware programs, regardless of how they're delivered, often come with their own TLS libraries or TLS configuration, and their HTTPS handshakes would be identifiable in traffic logs when compared with the TLS client hashes of pre-approved applications.
Leveraging the JA3 standard
The idea of using SSL/TLS client fingerprints to identify malicious traffic on networks is not new. Researchers from Salesforce developed a standard called JA3 that builds an MD5 hash out of several attributes of a client's TLS configuration that are sent in the so-called “client hello”, the starting message of a TLS handshake.
Attributes such as SSL version, accepted ciphers, list of extensions, elliptic curves and elliptic curve formats are concatenated and an MD5 hash is calculated from the result. While it's possible for different client applications to have similar TLS configurations, the combinations of multiple attributes are believed to be unique enough to be used to identify programs.
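As a rough illustration of the concatenate-then-hash step, the sketch below follows the publicly documented JA3 layout: decimal values joined by hyphens within each field, and the five fields joined by commas. The function names and the sample values are hypothetical, and the sketch leaves out details a real implementation handles, such as parsing the client hello and ignoring GREASE values.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input string from decimal client hello values."""
    fields = [
        str(version),
        "-".join(str(v) for v in ciphers),
        "-".join(str(v) for v in extensions),
        "-".join(str(v) for v in curves),
        "-".join(str(v) for v in point_formats),
    ]
    # Empty lists produce empty fields, so the commas are always present.
    return ",".join(fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Return the MD5 hex digest of the JA3 string."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Hypothetical client hello values, for illustration only.
print(ja3_string(771, [4865, 4866, 49195], [0, 10, 11], [29, 23], [0]))
print(ja3_hash(771, [4865, 4866, 49195], [0, 10, 11], [29, 23], [0]))
```

Two clients that differ in any one of the five fields, even just in the order of their cipher list, end up with different hashes.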
A similar approach can be used to build fingerprints for TLS servers based on attributes in the “server hello” messages. This is called JA3S but can be less reliable because servers support multiple TLS versions and configurations and will adapt their response based on what the client supports and requests. So, the same server will respond with a different set of TLS configuration attributes to different clients.
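The server-side variant can be sketched the same way. JA3S, as publicly documented, hashes three fields from the server hello: the version, the single cipher the server selected, and its extensions. The function name and the sample values below are hypothetical; they only show why one server can produce more than one JA3S hash.

```python
import hashlib


def ja3s_hash(version, cipher, extensions):
    """Return the MD5 hex digest of a JA3S string built from server hello values."""
    raw = ",".join([
        str(version),
        str(cipher),
        "-".join(str(e) for e in extensions),
    ])
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# The same hypothetical server answering two clients that offered
# different ciphers and extensions.
reply_to_client_a = ja3s_hash(771, 49199, [65281, 11])
reply_to_client_b = ja3s_hash(771, 4865, [43, 51])
print(reply_to_client_a == reply_to_client_b)  # the two hashes differ
```

This is why a JA3S value is usually more useful when read alongside the client or the server name it was observed with than on its own.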
Since its creation, JA3S support has been added to many network monitoring and security tools, both open source and commercial. The Splunk team has investigated the effectiveness of several methodologies for using JA3 in the context of the Splunk data analysis platform with several types of queries.
"In our testing using real-world enterprise data, along with data generated from our testing environments, the results showed it is highly probable anomalous activity can be detected via abnormal JA3S hashes," the researchers said in a newly released paper. "However, your mileage may vary depending on many factors. In all likelihood, an allow list will be required to limit the number of perceived false positives."
The best results will be for detecting malicious traffic from critical servers that run a limited number of applications that are allowed to initiate outbound connections.
For example, this was the case in the SolarWinds supply-chain attack, where hackers delivered a Trojanised binary as part of updates for the company's Orion infrastructure monitoring and management platform. Because its purpose is to monitor infrastructure, SolarWinds Orion often needs unrestricted network access and usually runs on a dedicated machine.
First seen and rarest JA3 fingerprints
Two types of queries that the Splunk team investigated are meant to detect "first seen" and "rarest" occurrences of JA3 fingerprints inside network traffic from a particular host system.
The results of both these queries could indicate potentially unauthorised applications, such as backdoors and Trojans, making HTTPS connections. "Detecting abnormal activity via a first seen query proved helpful when the analyst was familiar with network activity and leveraged an allow list [of JA3S hashes and/or server names to filter out known entities]," the researchers said.
The time window for the queried traffic data is also important, because if it's too wide or too narrow the malicious activity could be missed. The researchers found that a time window of seven days gave the best results with this type of query; the malicious requests appeared in the top 20 query results.
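A first seen query can be approximated outside Splunk as follows. This is a minimal sketch, not the researchers' query: it assumes each event is a dictionary with hypothetical `time`, `ja3s` and `server_name` keys, and that the allow list is a set mixing known-good hashes and server names.

```python
from datetime import datetime, timedelta


def first_seen(events, now, window=timedelta(days=7), allow_list=frozenset(), top=20):
    """Rank (ja3s, server_name) pairs by first sighting inside the window, newest first."""
    cutoff = now - window
    earliest = {}
    for ev in events:
        if not cutoff <= ev["time"] <= now:
            continue  # outside the queried time window
        if ev["ja3s"] in allow_list or ev["server_name"] in allow_list:
            continue  # known-good hash or server name
        key = (ev["ja3s"], ev["server_name"])
        if key not in earliest or ev["time"] < earliest[key]:
            earliest[key] = ev["time"]
    ranked = sorted(earliest.items(), key=lambda item: item[1], reverse=True)
    return [key for key, _ in ranked[:top]]
```

Pairs that appeared most recently for the first time come out at the top, which is where an analyst would look for a newly introduced TLS client or destination.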
An appropriate time frame and allow list are even more important for rarest type queries, which are meant to highlight the least frequently occurring JA3S hashes by server_name in a data set. The results for this type of query were inconsistent across various time windows, with many false positives, leading the researchers to conclude that the results of such queries should only be used in combination with the results of other queries to identify suspicious connections.
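The rarest type of query amounts to counting and sorting in ascending order. The sketch below is an assumption-laden stand-in for the Splunk search: the event keys `ja3s` and `server_name` and the shape of the allow list are hypothetical.

```python
from collections import Counter


def rarest(events, allow_list=frozenset(), top=20):
    """Return the least frequent (ja3s, server_name) pairs with their counts."""
    counts = Counter(
        (ev["ja3s"], ev["server_name"])
        for ev in events
        if ev["ja3s"] not in allow_list and ev["server_name"] not in allow_list
    )
    # Sort by count first, then by the pair itself so ties have a stable order.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))[:top]
```

Because any rarely contacted but legitimate server also lands at the top of this list, the output is better treated as one input among several than as a verdict.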
The researchers then used a native Splunk command, anomalydetection, that filters the results of queries and computes a probability score for each event. This proved effective but required tweaking the probability threshold value depending on the amount of data collected and the desired sensitivity.
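The idea of scoring each event by probability and keeping those under a threshold can be illustrated with a deliberately simplified model. This is not Splunk's implementation: the sketch assumes the score is the product of the relative frequencies of an event's field values, and the field names and the default threshold are hypothetical.

```python
from collections import Counter
from math import prod


def score_events(events, fields=("ja3s", "server_name")):
    """Pair each event with a naive probability based on field value frequencies."""
    total = len(events)
    freq = {f: Counter(ev[f] for ev in events) for f in fields}
    return [
        (prod(freq[f][ev[f]] / total for f in fields), ev)
        for ev in events
    ]


def anomalies(events, threshold=0.01, fields=("ja3s", "server_name")):
    """Keep only the events whose probability falls below the threshold."""
    return [ev for p, ev in score_events(events, fields) if p < threshold]
```

Raising the threshold returns more events and lowering it returns fewer, which mirrors the tuning trade-off between data volume and sensitivity described above.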
"Leveraging the anomaly detection command proved to be highly effective at identifying malicious abnormal activity over a 24- to 48-hour period," the researchers said.
"Periods longer than this reduced the effectiveness of the query. In experiments of smaller networks with a single /24 netblock, the known malicious activity was consistently identified without an allow list in the top 30 events. However, in networks with multiple or more extensive net-blocks, this was not the case."
Improving triage of malicious activity
To improve the speed of the analysis, the researchers also devised a technique for storing and ordering the query results in a lookup table and then running other queries against that table. While this did not improve the anomaly detection accuracy, it made it much more scalable and around 100 times faster to use for day-to-day operations.
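The lookup table idea can be sketched as a summary that is built once, updated incrementally, and then consulted instead of rescanning raw events. The structure below is an assumption for illustration, not the researchers' table layout; the keys `time`, `ja3s` and `server_name` are hypothetical.

```python
def update_lookup(lookup, events):
    """Fold events into a summary table keyed by (ja3s, server_name)."""
    for ev in events:
        key = (ev["ja3s"], ev["server_name"])
        row = lookup.get(key)
        if row is None:
            lookup[key] = {"first_seen": ev["time"], "last_seen": ev["time"], "count": 1}
        else:
            row["first_seen"] = min(row["first_seen"], ev["time"])
            row["last_seen"] = max(row["last_seen"], ev["time"])
            row["count"] += 1
    return lookup


def unseen(lookup, events):
    """Return the events whose (ja3s, server_name) pair is not in the table yet."""
    return [ev for ev in events if (ev["ja3s"], ev["server_name"]) not in lookup]
```

A day-to-day check then becomes a dictionary membership test per new event, followed by `update_lookup` to fold the new events in, which is where the speed-up over repeatedly searching the full history would come from.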
Finally, combining the JA3 query results with data from Sysmon, the Windows system monitoring service, allowed much more powerful triaging of potentially malicious activity. That's because Sysmon adds information about process execution.
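One way to picture the correlation is a join between Sysmon network connection events (EventCode 3) and TLS fingerprint records on the connection's addresses and ports. The Sysmon field names below (`Image`, `SourceIp`, `SourcePort`, `DestinationIp`, `DestinationPort`) are the ones that event carries; the TLS record keys (`src_ip`, `src_port`, `dest_ip`, `dest_port`, `ja3s`, `server_name`) are assumptions that will differ by sensor.

```python
def correlate(sysmon_events, tls_events):
    """Attach the originating process image to each TLS fingerprint record."""
    image_by_flow = {}
    for s in sysmon_events:
        flow = (s["SourceIp"], int(s["SourcePort"]),
                s["DestinationIp"], int(s["DestinationPort"]))
        image_by_flow[flow] = s["Image"]

    results = []
    for t in tls_events:
        flow = (t["src_ip"], int(t["src_port"]), t["dest_ip"], int(t["dest_port"]))
        results.append({
            "image": image_by_flow.get(flow),  # None when no Sysmon event matches
            "ja3s": t["ja3s"],
            "server_name": t["server_name"],
        })
    return results
```

A production version would also need to match on time, since source ports are reused, and on the host that produced each Sysmon event.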
"This will allow for correlating Windows processes with JA3S hashes along with the server_name," the researchers explained.
"For instance, we will be able to identify a powershell.exe process connecting to an external host. In order to collect the relevant data, Sysmon must be configured to collect network connection initiated (EventCode 3) events. Olaf Hartong has written and open-sourced a utility to modularly configure Sysmon, which may be the easiest way to collect the required data quickly."
Limitations of TLS fingerprinting
While these methodologies were researched by the Splunk team in the context of the company's data analysis platform, they are not limited to it and can be adapted to be used with other data collection and traffic analysis tools.
One big limitation, however, is that these techniques are prone to generating a lot of false positives when used to detect anomalies across endpoints where general browsing is allowed or that generate a large volume of HTTPS traffic from different applications to different servers.
In other words, these anomaly detection techniques work best when applied to high-risk, high-value application servers, or systems that run in the most critical network segment.
"In this case, what we're looking at is using something like SolarWinds, which will be a high risk to your assets and your network," Ryan Kovar, distinguished security strategist at Splunk, tells CSO. "Can you actually look at changes coming in? Now, does that mean that you might not be detecting everything across your Adobe Reader and all these different [widely deployed applications in the environment]?
"One hundred per cent and we understand that, but you can actually start making incremental progress on some of these high-risk application servers. Maybe you are looking at something like Microsoft, but you're not looking at every single desktop host.
"You're looking at your Windows Server 2016. Maybe you're looking at those assets that are specifically in your tier zero networking area. That is where we've looked at being able to reduce the scale and have done testing with customers and various partners who've been able to look at this and see value."
Of course, software supply-chain attacks don't impact just server-type applications that are deployed on a limited number of systems. In 2017, hackers managed to compromise the infrastructure of CCleaner, a system optimisation and clean-up tool, and delivered a Trojanised update to 2.2 million computers belonging to both consumers and companies.
The second-stage payload was only delivered to a limited number of high-value targets that included tech companies. That same year, Microsoft reported an attack where hackers compromised the update mechanism for a third-party editing tool used by enterprises. These are the types of products that can be widely deployed on workstations and employee laptops inside an organisation.
However, no single product or technique can completely detect and block software supply-chain attacks, because these attacks exploit the legitimate privileges that installed applications have on computers and their update distribution channels. Going back to the days when software updates were not deployed for months or years, leaving systems vulnerable to publicly known exploits, is not a solution either.
Another problem is that many companies don't have up-to-date software and IT asset inventories to know which applications are running on which computers on their network at all times.
This is especially true now, with many employees working from home and from their own devices that are allowed to connect to corporate networks. Even with up-to-date configuration management databases (CMDBs), building an allow list of software update servers for all possible applications that exist on a network is impossible because software vendors don't openly publish this information. Even if they do, their domain names and IP addresses change frequently.
Companies have to start addressing this problem somewhere, and focusing on high-risk applications and servers that should otherwise have fairly limited outbound connections to the internet sounds like a good place to start.
"Your first step is creating that asset list and then you can apply research, like what we've done here, around detecting supply chain attacks using something like JA3S," Kovar says.
"You can start broadly, and we point out in the paper that you can just run this and you'll get results immediately. But you're not going to have the granularity and accuracy unless you have a bit of a whitelist, or allow list, rather, of exactly what those assets are that you're trying to defend.
"So, in this case, we're saying: 'Hey, look specifically at these things that you want to see the output or the changes of rather than everything on your network.' Do you have an asset inventory of all your SolarWinds hosts? That's what we're going to be looking at, rather than the 20,000 different systems on your network."