When Windows bug fixes go bad, IT can now roll back individual changes
- 05 March, 2021 06:00
Microsoft has announced a new enterprise-only flexibility in Windows servicing that lets IT professionals roll back individual non-security elements of an update when a change breaks something.
The feature, dubbed "Known Issue Rollback," aka KIR, is an unusually frank admission that the company's nearly six-year-long experiment of forcing customers to either accept everything in an update or pass on the update entirely is flawed.
"Even as quality has improved over the last five years, we do acknowledge that sometimes things can and do go wrong," Namrata Bachwani, principal program manager lead, said in a March 2 session video from Microsoft's all-virtual Ignite conference. "In the past, you had two choices: all or nothing," Bachwani continued. "You either take it all, so you install the update and you get all the great fixes that you want and the problem, which is causing an issue for your customers. Or you take nothing.
"So you either don't install the update because you've heard that it causes a problem, or you uninstall it, which means the problem goes away but you also don't get all the other great fixes in that package, which has changes that you want and need," she said.
If Bachwani's summary sounds familiar, it should: Essentially, it was the argument made by critics of Windows 10's practice of bundling fixes, both security and non-security, into one package that was not only cumulative — it included all prior fixes as well as the newest — but was indivisible.
Windows 10's approach was in stark contrast to previous editions of the OS, which had offered each fix as a separate, discrete update that could be deployed ... or not.
Customers, including enterprise IT personnel, could, as Bachwani pointed out, either forgo an update because of a known (or suspected) problem or accept the update, even though it contained one or more flaws. The dilemma caused many to decry Microsoft's take-it-or-leave-it attitude, which broke with decades of past practice. In the end, customers did what they almost always did in the face of a Microsoft move: they accepted it, since they had little recourse.
But apparently someone kept complaining, someone Microsoft listened to.
"We have been listening to you and working on how to handle such a scenario in a targeted, nondestructive way," Bachwani said.
In with the new, but keep the old around — just in case
KIR was functional as of Windows 10 2004 (also known as 20H1 after another Microsoft name change for Windows 10's feature upgrades), with about 80 per cent of the changes in that version capable of rollback. But some past versions — Microsoft explicitly mentioned 1809 and 1909 — partially support the feature.
Because Windows 10 Enterprise customers receive 30 months of support for the year's second-half upgrade, it's most likely that they'll first encounter KIR with Windows 10 20H2 and, if not then, with this year's 21H2, due out in the fall. KIR also boosts the case for enterprises moving to 20H2 with all due speed.
As Microsoft's software engineers tackle a non-security bug, they write the fix but, unlike in the past, retain the old code impacted by the changes. According to Eric Vernon, principal program manager lead, those changes are "contained" using KIR capability. When the update is released and users deploy it, each KIR-enabled fix runs normally.
But if the OS encounters a specific group policy, the code in the change "container" is ignored and the original code — the part retained by the engineer when she wrote the fix — runs instead. Each individual fix is assigned a different group policy. "If a fix turns out to have a serious problem, Azure-hosted services and Windows work in tandem to update this policy-setting on the device and disable the problematic fix," wrote Vernon.
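The containment idea described above resembles a per-fix feature flag. The following is a minimal sketch of that pattern in Python, not Microsoft's implementation: the `rollback_policies` dictionary, the `run_contained` helper and the fix identifier are all hypothetical stand-ins for the Group Policy lookup and the retained code paths.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")

# Hypothetical policy store: fix ID -> True when a rollback policy is present.
# The article says each fix gets its own Group Policy; a dict stands in here.
rollback_policies: Dict[str, bool] = {}


def run_contained(fix_id: str,
                  new_code: Callable[[], T],
                  old_code: Callable[[], T]) -> T:
    """Run the new fix unless a rollback policy exists for this fix ID."""
    if rollback_policies.get(fix_id, False):
        # Policy found: ignore the contained change, run the retained code.
        return old_code()
    # No policy: the fix runs normally.
    return new_code()


def legacy_behavior() -> str:
    """The original code the engineer kept alongside the fix."""
    return "legacy"


def fixed_behavior() -> str:
    """The new code shipped in the update."""
    return "fixed"


FIX_ID = "example-fix-0001"  # hypothetical identifier, for illustration only

if __name__ == "__main__":
    print(run_contained(FIX_ID, fixed_behavior, legacy_behavior))
    rollback_policies[FIX_ID] = True  # simulate the policy being deployed
    print(run_contained(FIX_ID, fixed_behavior, legacy_behavior))
```

Because the check is keyed by fix ID, disabling one fix leaves every other change in the same update untouched, which is the point of the feature as Microsoft describes it.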
Enterprise IT is in charge
There are two ways KIR can be triggered to roll back a bad update.
For consumers and small businesses, Microsoft itself manages KIR. "We make a configuration change in the cloud," said Vernon, referring to the action the Redmond, Wash., company would take once it has decided to roll back a bug fix issued by a recent update. "Devices connected to Windows Update or Windows Update for Business are notified of this change and it takes effect with the next reboot."
In this scenario, users would be unaware that Microsoft had kicked in KIR. Microsoft would know, however, because users' PCs would tell the firm, via Windows' telemetry, which code (the new but buggy fix or the old, hopefully stable code) they were running. "This data helps us learn how well the rollback is succeeding in the ecosystem," said Vernon.
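The cloud-driven sequence for unmanaged PCs (notification, activation at the next reboot, then a telemetry report) can be modeled as a small state machine. This is a toy illustration only: the `Device` class, its method names and the "old"/"new" labels are assumptions made for the example and do not reflect how Windows Update or Windows telemetry actually work.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Device:
    """Toy model of an unmanaged PC receiving a cloud-side rollback."""
    fixes: List[str]
    pending: Set[str] = field(default_factory=set)
    active: Set[str] = field(default_factory=set)

    def notify(self, fix_id: str) -> None:
        # The configuration change arrives but is not applied yet.
        self.pending.add(fix_id)

    def reboot(self) -> None:
        # Pending rollbacks take effect with the next reboot.
        self.active |= self.pending
        self.pending.clear()

    def telemetry(self) -> Dict[str, str]:
        # Report which code path each fix is currently using.
        return {f: ("old" if f in self.active else "new") for f in self.fixes}


if __name__ == "__main__":
    pc = Device(fixes=["fix-a", "fix-b"])  # hypothetical fix IDs
    pc.notify("fix-a")
    print(pc.telemetry())  # rollback is still pending
    pc.reboot()
    print(pc.telemetry())  # rollback is now active for fix-a only
```

Aggregating reports like these across many devices is one plausible way to gauge, in Vernon's words, "how well the rollback is succeeding in the ecosystem."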
For managed machines, KIR will be under control of the IT staff. Microsoft will publish information about the known issue in the update's documenting bulletin, the KB, under the "mitigations" section, along with a link to Microsoft's Download Center, where the appropriate Group Policy will be posted. IT personnel would then deploy the policy to the organization's PCs using the usual tools.
Microsoft made a point of stressing that IT will be in charge of KIR on its managed systems. "In the KB article, we describe the issue and related information that would help IT pros make informed choices," said Vatsan Madhavan, principal software engineer, in an Ignite session focused entirely on KIR.
Normally, the KIR Group Policies don't need to be retracted or removed by the IT staff, Madhavan said, because each is valid only for its own known issue; once that issue has been addressed, the policy becomes moot. "Once the underlying problem has been fixed, the Group Policy has outlived its usefulness. It becomes a benign setting and can be undeployed safely," Vernon wrote in a March 2 blog post.
Microsoft has further work on KIR already outlined, including integrating it with Intune, the cloud-based mobile device management platform, so that organisations that no longer use Group Policy will be able to leverage the functionality.