Date: 2023-05-18
GitHub owns up to service issues, multiple outages

Over the past four months, GitHub has experienced 16 disruptions in its services, blog posts from the company showed.

Anirban Ghoshal (InfoWorld)

Microsoft-owned GitHub, which provides a code hosting platform for version control and collaboration, faced three disruptions in its services last week, following 13 such incidents in the past three months.

“Last week, GitHub experienced several availability incidents, both long-running and shorter duration. We have since mitigated these incidents and all systems are now operating normally,” Mike Hanley, chief security officer at GitHub, said in a blog post.

“The root causes for these incidents were unrelated but in aggregate, they negatively impacted the services that organisations and developers trust GitHub to deliver. This is not acceptable nor the standard we hold ourselves to,” Hanley added.

The three incidents, which occurred on May 9, May 10, and May 11, affected a majority of the critical services that GitHub provides, the company said.

Incidents take out critical GitHub services

The incident that occurred on May 9 disrupted GitHub’s databases due to a configuration change, according to the company.

“On May 9, we had an incident that caused 8 of the 10 services on the status portal to be impacted by a major (status red) outage. The majority of downtime lasted just over an hour,” Hanley said in the blog post.

At the time of the outage, many services could not read newly written Git data, causing widespread failures, Hanley explained, adding that recovery of some pull request and push data took an extended period after the outage.

The outage, according to Hanley, was triggered by a configuration change to the internal service serving Git data.

“The change was intended to prevent connection saturation and had been previously introduced successfully elsewhere in the Git backend. Shortly after the rollout began, the cluster experienced a failover. We reverted the config change and attempted a rollback within a few minutes, but the rollback failed due to an internal infrastructure error,” Hanley said.

The May 10 incident, caused by degraded issuance of GitHub App authentication tokens, affected six of the ten critical GitHub services.

“On May 10, the database cluster serving GitHub App auth tokens saw a 7x increase in write latency for GitHub App permissions (status yellow). The failure rate of these auth token requests was 8-15% for the majority of this incident, but did peak at 76% for a short time,” Hanley said in the blog post.

The issue with token issuance was a result of “inefficient implementation” of an API for managing GitHub App permissions, the chief security officer explained, adding that the company was updating the API to check for the shift in installation state.

GitHub’s database was hit again on May 11 due to a loss of read replicas, the company said.

“In the Git database incidents, Git reads and writes are at the core of many GitHub scenarios, so increased latency and failures resulted in GitHub Actions workflows unable to pull data or pull requests not updating,” Hanley said in the blog post.

GitHub working on avoiding similar incidents in the future

To avoid similar incidents in the future, Hanley said the company was taking several steps, including reviewing its internal processes and adjusting them so that changes are deployed more safely.

“In addition to the standard post-incident analysis and review, we are analysing the breadth of impact these incidents had across services to identify where we can reduce the impact of future similar failures,” Hanley said, adding that GitHub was working to improve the observability of high-cost, low-volume query patterns and its general ability to diagnose and mitigate this class of issue quickly.

Other measures include addressing the database failover issues to ensure that failover always recovers fully without intervention and understanding the multiple Git database crash incidents.

Although the company says it is working to address outages, GitHub has continued to face disruptions over the last four months: last week’s three incidents followed four in April, six in March, and three in February.

