Tweag

Software Identifiers through the eyes of Nix

12 March 2024 — by Matthias Meschede

This is an answer to a recent request for comments issued by CISA, the United States “Cybersecurity and Infrastructure Security Agency”, about software identifiers. Unfortunately, I learned of the request too late to comment officially, but CISA encouraged me to publish my answer as a separate blog post. The Guix team similarly published their own answer.


Dear CISA team,

I appreciate your effort to gather comments about your recently released “Software Identification Ecosystem Option Analysis” white paper. As you say in the Executive Summary, “Organizations of all sizes must track what software they own and operate to perform user support, inventory administration, and vulnerability management”. I would go further and claim that for any software system that will be modified and reassembled (which is basically always), precise knowledge of and control over the components, and how they should be put together, is crucial. Precise names, that is, identifiers, are the basis of that. In that light, I would like to bring to your attention a noteworthy technology that is not mentioned in the study and that achieves exactly this: Nix, together with its sister project Guix.

Nix is a powerful package manager, offering a very distinctive approach to deploying software. It achieves very high levels of reproducibility and provenance tracking of software artifacts by design, utilizing a functional and declarative language to describe software builds and their dependencies.

In fact, the levels of reproducibility it achieves are so high that Nix can robustly rely on an “input-addressed” storage, an identification model that names software artifacts by hashing everything required to build them, as opposed to hashing their content once assembled. This unique input-addressed approach is very powerful because it allows computing the identifier of a software asset without assembling it.
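The difference between the two identification models can be illustrated with a minimal Python sketch. It is not Nix's actual hashing scheme: the recipe fields, the JSON serialization, and the truncation to 32 hex characters are assumptions made for illustration. The point is only that the input-addressed identifier is available before anything has been built.

```python
import hashlib
import json


def input_address(recipe: dict) -> str:
    """Identifier derived from the build description alone; nothing is built."""
    # Serialize deterministically so that key order does not affect the hash.
    canonical = json.dumps(recipe, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]


def content_address(artifact: bytes) -> str:
    """Identifier derived from the bytes of an already assembled artifact."""
    return hashlib.sha256(artifact).hexdigest()[:32]


# Hypothetical recipe: every field name here is an illustrative assumption.
recipe = {
    "name": "hello",
    "version": "1.0",
    "source_sha256": hashlib.sha256(b"int main(void) { return 0; }").hexdigest(),
    "build_command": ["cc", "-O2", "main.c", "-o", "hello"],
}

# Known before the compiler has run.
id_before_build = input_address(recipe)

# Changing any ingredient, here a compiler flag, yields a different identifier.
id_changed_flags = input_address(
    {**recipe, "build_command": ["cc", "-O0", "main.c", "-o", "hello"]}
)
```

A content address, by contrast, needs the artifact's bytes, so it can only be computed once the build has finished.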

“Software artifacts” in Nix can be anything from sources and data assets to executable binary packages. And, importantly, “everything required to build” is not limited to source code and data assets, as in most intrinsic identification models, but comprises all commands, configuration, and recursively identified dependencies that are required to assemble the asset in a very strict sandbox. This controlled environment ensures that the description and the identifier of a software asset (called “package closure” in Nix) are complete, capturing all ingredients that went into the final output.
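The recursive part of this idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Nix's implementation: the package names, the `build` strings, and the helper functions `identify` and `closure` are hypothetical, and the dependency graph is assumed to be acyclic.

```python
import hashlib
import json


def identify(name, packages, cache=None):
    """Return an identifier covering a package and, recursively, its dependencies."""
    cache = {} if cache is None else cache
    if name not in cache:
        pkg = packages[name]
        # The identifiers of the dependencies are themselves inputs to the hash.
        dep_ids = sorted(identify(dep, packages, cache) for dep in pkg["deps"])
        payload = json.dumps(
            {"name": name, "build": pkg["build"], "deps": dep_ids}, sort_keys=True
        )
        cache[name] = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]
    return cache[name]


def closure(name, packages):
    """Return the set of all packages reachable from `name`, including itself."""
    seen, stack = set(), [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(packages[current]["deps"])
    return seen


# Hypothetical package set (assumed acyclic).
packages = {
    "libfoo": {"build": "make libfoo", "deps": []},
    "toolchain": {"build": "make toolchain", "deps": ["libfoo"]},
    "app": {"build": "make app", "deps": ["toolchain", "libfoo"]},
    "unrelated": {"build": "make unrelated", "deps": []},
}

# Same set, but with a changed build instruction deep in the dependency graph.
patched = {**packages, "libfoo": {"build": "make libfoo CFLAGS=-O3", "deps": []}}
```

In this model, `identify("app", packages)` and `identify("app", patched)` differ even though the description of `app` itself is unchanged, while the identifier of `unrelated` stays the same.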

The reproducibility of core packages of the Nix distribution NixOS is very high and automatically tested. It can in principle be used for the independent and decentralized verification of the content of software artifacts, as demonstrated by this implementation developed within the European Commission’s Next Generation Internet program.

In addition, these advantageous properties allowed the Nix community to construct Nixpkgs, the largest and most up-to-date open software library available. As required by the Nix model, this enormous repository of software assets comprises not only a description of the components used to assemble them but also everything else necessary to produce the final packages: build instructions, configuration, and a global dependency graph of all these assets. Beyond the completeness and reproducibility guarantees that Nix furnishes by design, the packages go through automatic tests executed by an associated CI and pass through the hands of tens of thousands of regular users.

The screenshot below shows, as an example, the CI build of the open source computer game EmptyEpsilon. Omitting some details, rja769qkxhiha7mbhq5bjmkjd0d5l1v0-empty-epsilon-2023.06.17 in the Derivation store path field is the unique identifier of this particular version of the software package, realized in the context of all the dependencies and configuration defined in the specific version (commit) of the Nixpkgs library that this CI build is attached to. “release.nix” is the entry point into the Nixpkgs library of software assets, where EmptyEpsilon is defined using the Nix language; the .drv file shown there contains an intermediate, raw build recipe that was generated from the Nix expressions. Software assets can come with attached metadata such as license information or short descriptions, and some data, such as the closure size comprising the software and all its dependencies (build- and run-time), can easily be computed. More details on this can be found here or here.

CI build details of EmptyEpsilon
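To make the structure of such an identifier concrete, here is a small Python sketch that splits the name quoted above into its parts. The digest length of 32 characters and the base-32 alphabet match what Nix store paths use; the rule for separating package name from version (the first dash followed by a digit) is a simplifying assumption of this sketch, and `split_store_name` is a hypothetical helper, not part of any Nix tooling.

```python
import re

# Base-32 alphabet used for digests in Nix store paths (no e, o, t, u).
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"
STORE_NAME = re.compile(rf"^(?P<digest>[{NIX_BASE32}]{{32}})-(?P<name>.+)$")


def split_store_name(basename):
    """Split '<digest>-<name>' and heuristically separate the name from a version."""
    match = STORE_NAME.match(basename)
    if match is None:
        raise ValueError(f"not a store path name: {basename!r}")
    name = match["name"]
    # Assumption: the version starts after the first dash that precedes a digit.
    dash = re.search(r"-(?=\d)", name)
    if dash is None:
        pname, version = name, ""
    else:
        pname, version = name[: dash.start()], name[dash.end():]
    return {"digest": match["digest"], "pname": pname, "version": version}


parsed = split_store_name("rja769qkxhiha7mbhq5bjmkjd0d5l1v0-empty-epsilon-2023.06.17")
# parsed["pname"] is "empty-epsilon" and parsed["version"] is "2023.06.17"
```

Only the digest carries the identity. The human-readable suffix is a convenience and does not, by itself, distinguish two builds of the same version made with different dependencies or configuration.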

The exceptional completeness, robustness and precision of this approach uniquely position Nix (and Guix) as highly valuable tools for automatically creating and managing accurate, reliable and reproducible software bills of materials (SBOMs) that can be employed to address the challenges outlined in the CISA notice. In fact, several projects exist (e.g. 1, 2, 3) that aim to automatically generate SBOMs from Nix expressions, or to connect Nix packages to NIST’s National Vulnerability Database. The Nix model for identifiers also works very well together with the SoftWare Hash IDentifiers (SWHIDs) for the full state of version control system repositories, which are developed by Software Heritage.
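As a rough illustration of why a fully identified dependency graph makes SBOM generation mechanical, the following Python sketch turns a list of identified packages into a minimal SBOM-like document. It is not how any of the projects mentioned above work. The field names are modelled on the CycloneDX JSON format, the output is not validated against that schema, and the package entries and their identifiers are made up for the example.

```python
import json


def to_sbom(packages):
    """Build a minimal SBOM-like document from already identified packages."""
    components = [
        {
            "type": "application",
            "bom-ref": pkg["id"],  # the unique identifier doubles as the reference
            "name": pkg["name"],
            "version": pkg["version"],
        }
        for pkg in packages
    ]
    dependencies = [
        {"ref": pkg["id"], "dependsOn": sorted(pkg["deps"])} for pkg in packages
    ]
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": components,
        "dependencies": dependencies,
    }


# Hypothetical packages with made-up identifiers.
packages = [
    {"id": "id-libfoo", "name": "libfoo", "version": "2.1", "deps": []},
    {"id": "id-app", "name": "app", "version": "1.0", "deps": ["id-libfoo"]},
]

sbom = to_sbom(packages)
sbom_json = json.dumps(sbom, indent=2)
```

Because every component and every edge is already known from the build description, nothing has to be inferred by scanning the assembled artifact.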

Finally, and certainly most importantly, I would like to emphasize that tens of thousands of users are demonstrating every day that applying this model comes without overhead. In fact, the precision and robustness of these software identifiers come with a multitude of additional benefits. None of this is magic: it is enabled by a tool that follows a rigorous deployment model, the outcome of decades of academic experimentation and research. This is why the user bases of Nix and Guix, still emerging technologies, are rapidly growing.

About the authors
Matthias Meschede
Matthias initiates and coordinates projects at Tweag. He says he's a generalist "half scientist, half musician, and a third half of other things" but you'll have to ask him what that means exactly! He lives in Paris, and is regularly in Tweag's Paris office where you'll most likely find him in a discussion, writing long-form texts in vim or reading and occasionally writing code.


This article is licensed under a Creative Commons Attribution 4.0 International license.
