Historians and future generations of developers will be able to unearth early lines of open source Linux, Ruby, or Python code buried 250 meters deep in the Arctic permafrost and, now, in three historic libraries in Oxford, Egypt, and California, thanks to GitHub’s expanding Archive Program.
By printing historically relevant open source repositories onto reels of piqlFilm (digital photosensitive archival film), GitHub—which was acquired by Microsoft in 2018—hopes to preserve the open source software movement for future generations.
The program includes a code archive held at the Arctic World Archive in Svalbard, Norway, just one mile from the famous Global Seed Vault: 186 reels of piqlFilm holding 21TB of repository data, stored this summer in a decommissioned coal mine 250 meters deep in the permafrost.
Run in partnership with the Long Now Foundation, the Internet Archive, the Software Heritage Foundation, Arctic World Archive, and Microsoft Research, the program looks to preserve both “warm” and “cold” versions of the code so that multiple copies and formats of the software survive, an approach archivists call “LOCKSS,” or Lots Of Copies Keeps Stuff Safe.
Now, the project is expanding by donating reels of hardened microfilm to the 400-year-old Bodleian Library at Oxford University in England, the Bibliotheca Alexandrina in Egypt, and the Stanford Libraries in California, as well as storing a copy in the library at GitHub’s headquarters in San Francisco.
Preserving the GitHub stars
GitHub is preserving its most popular repositories by the number of “stars” given by the community, including projects like Linux and Android and programming languages like Ruby and Go. The company is also preserving 5,000 repositories picked at random.
“The idea behind that is when you go back in history we want to preserve the work of individual developers, students, and small, lesser known developers and their open source projects,” Thomas Dohmke, vice president of strategic programs at GitHub, told InfoWorld.
By its very nature, open source software is not a static thing to be preserved; it is collaborative and always in flux. The intention is not to store copies that can be booted and run in the future, although that may be possible. Instead, the idea is to preserve a moment in time, when open source became the premier mode of software development, and to chart the cultural significance of that movement.
“A platform like GitHub can paint a picture of a broad spread of the software developer community across the globe at a moment in time,” Richard Ovenden, Bodley’s Librarian and president of the Digital Preservation Coalition, told InfoWorld.
“We think it is worth preserving software and how people worked together across the world to contribute and review source code. There is something culturally there which is worth preserving,” GitHub’s Dohmke added.
According to Dohmke, the archive is being built for two types of people: “historians and future software developers curious about how software was developed during this era.”
Each donation is specially encased using a combination of 3D printing and AI-generated art by the engineer and artist Alex Maki-Jokela. You can read more about his work on Medium.
All archived code will also include technical guides to QR decoding, file formats, character encodings, and other critical metadata so that future developers can decode it. “Storage is not the same thing as preservation, you have to do other things,” Ovenden said.