It is a truth universally acknowledged that the Python packaging ecosystem is in need of a good dependency checker.
At the very least, we hope to convince you that Tweag’s new dependency checker, FawltyDeps, can help you maintain a minimal and reproducible environment for your Python project by ensuring that required dependencies are explicitly declared and by detecting unused dependencies.
If you work with Python, and care about keeping your projects lean and repeatable, then this is for you.
Why do we need a dependency checker?
Say you’re working on a new project that uses Python, and you want to leverage Python’s ecosystem of 3rd-party libraries. So you decide to import something. For that something to work, you must first install it into your development environment (typically using pip install something).
Nothing wrong with that… Or is there?
The dependency is now installed in your Python virtual environment or on your system. But what about the next user of your project, for example:
- your colleague;
- your CI environment;
- yourself on a different laptop in 6 months?
This is where declaring your dependencies becomes important.
Contrary to an oft-quoted principle from the Zen of Python[1], there is more than one obvious way to declare your dependencies in Python.
For now, though, let’s say that you declare the project’s dependencies in requirements.txt or pyproject.toml.
You can go wrong in either (or both!) of the following ways:
- You declare too little. You might forget one of the imports you used in your code. Imagine someone running a long computation in a notebook only for it to fail when it reaches an import that you forgot to declare!
- You declare too much. While working on your project, you jumped between a couple of frameworks before deciding on the one you’re going to use. Along the way, you have declared some dependencies in requirements.txt that you no longer use. The project configuration is now “bloated” and will install things that are not actually needed.
What if there was a tool to check the match between what you declare and what you use?
Enter FawltyDeps
FawltyDeps is a tool that gives insight into a Python project’s imports, declared dependencies, and how they match up. Specifically, the main purpose of FawltyDeps is to report imports that you have forgotten to declare (undeclared dependencies), as well as packages that you’ve declared to use but that are not imported in your code (unused dependencies).
The goal of FawltyDeps is to help you ensure the reproducibility of the project and help save resources by not installing unnecessary packages.
What does FawltyDeps do?
FawltyDeps proceeds in three steps:
- It reads your Python code and Jupyter notebooks and discovers all imports from packages outside the standard library and the project itself (aka 3rd-party imports).
- It extracts dependencies that are declared by your project. Those declarations may come from one of the following files: requirements.txt, setup.py, setup.cfg, pyproject.toml.
- It compares the imports to the declared dependencies found in your project.
FawltyDeps then reports:
- Undeclared dependencies: imports of packages that are not found in the project configuration.
- Unused dependencies: packages that you declare, but that are not imported by your code. These may point to dependencies that should be removed[2].
You may think, “Hmmm, my linter can do that!” But as far as we know there is currently no tool that does exactly this: a linter will only tell you if the package is missing from your local environment, not if the package is missing from your project configuration. Similarly, a linter can identify when an import in your code is no longer used, but it will typically not tell you when the corresponding package can be removed from your project configuration.
Some editors and IDEs may offer checkers that go a bit further in discovering undeclared or unused dependencies[3], but these will depend on the specific editor/IDE you have chosen to work with, and they will likely not integrate nicely with your CI.
The goal of FawltyDeps is to offer its functionality in a package that works easily both in your local development environment and in your CI.
How to use FawltyDeps? An example
FawltyDeps is available from PyPI, and works with any Python project based on Python v3.7+.
Here is a small animation that shows FawltyDeps in use on a project called detect-waste:
Let’s take a closer look at how you would use FawltyDeps to analyze dependencies in a Python project. The following example collects some common issues into a small project that we can easily analyze in a few paragraphs.
Assuming that you’re already inside the development environment for the Python project[4], you can install FawltyDeps into this environment with your preferred tool:
pip install fawltydeps
Once installed, you can run fawltydeps to get your first report:
fawltydeps
This should give a list of undeclared and/or unused dependencies. In our small example project we get this:
These imports appear to be undeclared dependencies:
- 'requests'
- 'tomli'
These dependencies appear to be unused (i.e. not imported):
- 'black'
- 'tensorflow'
For a more verbose report re-run with the `--detailed` option.
Fixing undeclared dependencies
Let’s start by taking a closer look at the undeclared dependencies, specifically the imports that FawltyDeps is referring to:
fawltydeps --check-undeclared --detailed
These imports appear to be undeclared dependencies:
- 'requests' imported at:
my_script.py:3
- 'tomli' imported at:
my_script.py:8
Looking at my_script.py, we can see the relevant imports:
import sys

from requests import Request, Session

if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib
...
- For requests, this is clearly a 3rd-party dependency that was simply never declared. Maybe it is installed system-wide, or maybe it was pip installed at some point, but in either case, someone apparently forgot to add it to requirements.txt. Good catch!
- For tomli, this is a conditional import that depends on the current Python version[5]. It is preferable to declare it conditionally if the configuration format allows this.
So in this example we can solve both undeclared dependencies by adding the following lines to requirements.txt[6]:
requests
tomli; python_version < "3.11"
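In the second line above, everything after the semicolon is an environment marker, which installers such as pip evaluate to decide whether the requirement applies. As a rough sketch of how such a line separates into a package name and a marker, here is a hypothetical helper (split_requirement is not part of FawltyDeps or pip, and it ignores version specifiers and extras; for real parsing, the packaging library's Requirement class is the safer choice):

```python
from typing import Optional, Tuple


def split_requirement(line: str) -> Tuple[str, Optional[str]]:
    """Split 'name; marker' into (name, marker); marker is None if absent."""
    spec, separator, marker = line.partition(";")
    return spec.strip(), (marker.strip() if separator else None)


# The two lines added to requirements.txt in the example:
plain = split_requirement("requests")
conditional = split_requirement('tomli; python_version < "3.11"')

print(plain)        # ('requests', None)
print(conditional)  # ('tomli', 'python_version < "3.11"')
```

The marker mirrors the sys.version_info check in my_script.py: tomli is only requested on Python versions where tomllib is not in the standard library.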
Fixing unused dependencies
Now let’s look at the unused dependencies that FawltyDeps complains about. We can re-run with --detailed for FawltyDeps to report where unused dependencies come from:
fawltydeps --check-unused --detailed
These dependencies appear to be unused (i.e. not imported):
- 'black' declared in:
dev-requirements.txt
- 'tensorflow' declared in:
requirements.txt
- For tensorflow, this was probably intended to be imported at some point, but there is currently no import tensorflow statement or similar anywhere in the code. This is a costly dependency to ask users to install, especially when it’s completely unnecessary. It should simply be removed from requirements.txt.
- For black, this is clearly a tool used in this project’s development environment, and it is not the intention of the project to ever import this. Since it’s declared in a separate dev-requirements.txt file, it is likely more appropriate for FawltyDeps to focus only on the dependencies declared in the main requirements.txt file. This can be done by using the --deps requirements.txt argument. (We could also ask FawltyDeps to specifically ignore this dependency with --ignore-unused black.)
Recap
This example illustrates what FawltyDeps can do for your project: while the project probably worked just fine on the developer’s machine, FawltyDeps identified a couple of issues that would become apparent if someone else tried to install this project. On top of that, it identified an unnecessary dependency that would waste time and space for users.
There are of course more options to customize FawltyDeps for your use case, documented in our README, or by running fawltydeps --help.
What FawltyDeps cannot do
It is still early days for the FawltyDeps project, and there are already several things that are either in development or on our roadmap, but not yet released:
- We are still figuring out many of the corner cases when mapping between dependency names and import names. For now, we rely on running FawltyDeps inside the same environment as your project. In the future we should be able to loosen this requirement[7].
- At this point in time we do not differentiate between the main dependencies of a project and other, more optional, dependencies (often called “extras”).
The above are things that we think we can solve to a large degree, but we have also identified some challenges that will be harder to solve automatically.
For example, Python allows imports to happen dynamically or conditionally. This is sometimes impossible to resolve with static analysis: How can we know for sure whether something is going to be imported, or even what will be imported?
In cases like this, we try to give useful and actionable information, and we provide the knobs for you to help FawltyDeps where needed.
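A small sketch shows the difficulty. The snippet analyzed below builds a module name at runtime and imports it via importlib.import_module; a purely static scan of import statements (the static_imports helper here is hypothetical and much simpler than a real checker) never sees the module that is actually loaded:

```python
import ast

SOURCE = '''
import importlib
name = "js" + "on"  # the module name only exists at runtime
module = importlib.import_module(name)
'''


def static_imports(source: str) -> set:
    """Collect top-level module names from import statements only."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])
    return names


# Static analysis only finds the import statement it can see.
print(static_imports(SOURCE))  # {'importlib'}

# Executing the code reveals what was really imported.
namespace = {}
exec(SOURCE, namespace)
print(namespace["module"].__name__)  # json
```

Here the dynamically imported module happens to be part of the standard library, but the same pattern with a 3rd-party package would hide a real dependency from any static check.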
Conclusion
We asked ourselves whether there was a tool in the Python ecosystem that could find undeclared and unused dependencies, and that did exactly this, and no more (following the UNIX philosophy). There wasn’t any tool like that, and that is how FawltyDeps came to be.
FawltyDeps will help you find undeclared and unused dependencies in your projects. Fixing these issues will make your projects leaner, more lightweight, and more reproducible[6]. In short, it will help you combat the “works on my machine” syndrome!
FawltyDeps is currently available on PyPI. We hope you will give it a try, and we’ll be happy to receive your feedback! Please reach out to us with any problems you have by opening an issue on the FawltyDeps repository.
1. “There should be one - and preferably only one - obvious way to do it.”
2. It’s worth noting that not all dependencies are necessarily meant to be imported. A common category is tools that you run as part of your development workflow, but that you never intend to import per se. Common examples include black, pylint, and mypy. A way to deal with this is to keep your main dependencies in one file (e.g. requirements.txt) and your development dependencies in another (e.g. dev-requirements.txt), and then use the --deps option to point FawltyDeps at the first file only.
3. For example, PyCharm offers some impressive tooling for working with requirements.txt files. Most other editors (e.g. VS Code) will at most help you create a virtualenv from the project configuration, but all subsequent interaction is based on what packages are available in your venv, not what dependencies you declare in your project configuration.
4. In our small example project, we can quickly create an ad hoc development environment with these commands: python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt -r dev-requirements.txt
5. Python v3.11 added tomllib to the standard library; for earlier Python versions tomli is the recommended alternative.
6. At Tweag, we’re all about making software and development environments more reproducible. It is worth noting that to improve the reproducibility of Python projects you should seriously consider pinning your dependencies (in addition to declaring them). That topic deserves a blog post of its own, however, and is currently also outside the scope of FawltyDeps. But stay tuned, it’s in the making!
7. More details about the improvements we are considering in this area can be found here.
About the authors
Johan is a Developer Productivity Engineer at Tweag. Originally from Western Norway, he is currently based in Delft, NL, and enjoys this opportunity to discover the Netherlands and the rest of continental Europe. Johan has almost twenty years of industry experience, mostly working with Linux and open source software within the embedded realm. He has a passion for designing and implementing elegant and useful solutions to challenging problems, and is always looking for underlying root causes to the problems that face software developers today. Outside of work, he enjoys playing jazz piano and cycling.
Nour is a data scientist/engineer who recently made the leap of faith from Academia to Industry. She has worked on Machine Learning, Data Science and Data Engineering problems in various domains. She has a PhD in Computer Science and currently lives in Paris, where she stubbornly tries to get her native Lebanese plants to live on her tiny Parisian balcony.
Maria, a mathematician turned Senior Data Engineer, excels at blending her analytical prowess and software development skills within the tech industry. Her role at Tweag is twofold: she is not only a key contributor to the innovative AI projects but also heavily involved in data engineering aspects, such as building robust data pipelines and ensuring data integrity. This skill set was honed through her transition from academic research in numerical modelling of turbulence to the realm of software development and data science.
Vince works on software development projects, some internal and some for clients. Initially self-taught in basic Java programming, Vince was led by course of study and general curiosity to programming in other languages. Eventually he was hooked on Scala and interested by Haskell, which of course naturally led him to Tweag. At Tweag Vince enjoys interacting with colleagues who come from a broad range of places of origin and current residence, and the array of perspectives that follow naturally from that.