
Why using Poetry is better than PIP

Third-party software is essential to develop any modern software product. How we manage external packages can have a big impact on the robustness of our systems. In this article I will outline how packages are managed and contrast the Python and JavaScript package ecosystems.
The good thing about open source is that developers don't need to start from scratch with every new project. If I want to write a JS web app, I add the React library to my source code and import the functions that I need.
If I want to run my code on another computer, I don't transfer the source code of the React library along with it. I just send the bits of code I wrote, plus a note which tells the other machine, "you'll need to download React if you want this code to work".
The thing is, React wasn't built "from scratch" either: it is also built on other third-party libraries. React keeps its own note of its dependencies, and if my software uses React, it also needs to download all of React's dependencies. These are called transitive dependencies.
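To make the idea of transitive dependencies concrete, here is a minimal sketch in Python. The package names and the dependency notes are invented for illustration; they are not React's real dependencies, and real package managers do considerably more than this.

```python
from collections import deque

# Hypothetical dependency notes: each package lists its direct dependencies.
# These names are made up for illustration only.
DIRECT_DEPENDENCIES = {
    "my-app": ["ui-library"],
    "ui-library": ["scheduler-lib", "js-helpers"],
    "scheduler-lib": ["js-helpers"],
    "js-helpers": [],
}


def transitive_dependencies(package, graph):
    """Return every package that `package` needs, directly or indirectly."""
    seen = set()
    queue = deque(graph.get(package, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already recorded via another path
        seen.add(current)
        # Follow this package's own note of dependencies.
        queue.extend(graph.get(current, []))
    return seen


print(sorted(transitive_dependencies("my-app", DIRECT_DEPENDENCIES)))
```

Even though my-app only names ui-library in its own note, it ends up needing all three packages.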
Importantly, the React library is updated over time by Facebook, and old versions become obsolete. This means that I also need to keep track of the version(s) which are compatible with my code. My project might run fine using React v18, but break if you try to run it with v16.
It's essential that I can track these dependencies accurately, so that whenever someone new wants to run my code, they can download the exact same libraries I have so that my software doesn't break for them.

A better example of dependency management: JavaScript

The node ecosystem has a good way of consistently ensuring that the correct versions of third-party modules are always used. (Granted there are some downsides, like a node_modules folder more massive than a black hole).
All node packages are published on the node package registry npmjs.com. It is the job of the registry to securely store all the code for each version of each library and allow people to download specific versions of packages.
Any node project that uses third-party packages from npmjs contains a package.json file which stores a list of the package versions it requires.
Any computer looking to download node packages needs a package manager. There are two main ones for node: npm (node package manager - developed by npmjs) and yarn (developed by Facebook). Both are command line interfaces (CLIs) which go to npmjs.com and download packages into a local folder called node_modules. The node_modules folder always lives in the folder of the project you're working in, so if you have two React projects, you will download React twice, once into each project folder.

An extra complexity

What if two of the packages you depend on both depend on a common package? For instance, one depends on some_library v1.2 and the other depends on some_library v1.0-1.9 (any version between 1.0 and 1.9 is acceptable to use).
In this case, it makes sense to download the dependency only once and use a version that both libraries are happy to use.
It is the job of the package manager to look at the entire dependency tree and answer the question, "given these shared requirements, with these acceptable versions, which package versions does the project require?". The answer depends on the combination of all the project dependencies.
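As a rough sketch of that question for a single shared package, the Python below picks the newest version that every dependent accepts. The list of published versions is hypothetical, ranges are simplified to inclusive (low, high) pairs, and real resolvers also have to backtrack across the whole tree, which this does not attempt.

```python
def parse(version):
    """Turn '1.2' into (1, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(available, constraints):
    """Pick the newest available version inside every (low, high) range."""
    candidates = [
        version
        for version in available
        if all(parse(low) <= parse(version) <= parse(high) for low, high in constraints)
    ]
    if not candidates:
        raise ValueError("no version satisfies every constraint")
    return max(candidates, key=parse)


# Hypothetical published versions of some_library.
available = ["1.0", "1.2", "1.5", "1.9", "1.10", "2.0"]

# One dependent pins exactly 1.2; the other accepts anything from 1.0 to 1.9.
constraints = [("1.2", "1.2"), ("1.0", "1.9")]

print(resolve(available, constraints))
```

With these inputs the only version both dependents accept is 1.2, so it is downloaded once and shared.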
Package managers use lockfiles (package-lock.json and yarn.lock) to save this dependency resolution. If a lockfile is present and sufficiently describes valid dependencies in the package.json, then the versions specified in the lockfile are installed. If there is no lockfile, or the versions in the lockfile no longer satisfy the package.json, then the dependencies are resolved again (to the latest valid versions) and saved to a new lockfile.
Lockfiles mean that dependency resolutions are deterministic - every user will always get the same dependencies every time. This avoids the "it works on my machine" conversation when chasing bugs, as you can ensure you are running the same software on every machine.
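The lockfile rule described above can be sketched in a few lines of Python. This is a simplified model and not how npm or yarn are actually implemented: the manifest, lockfile and registry are plain dictionaries, and the function name versions_to_install is my own.

```python
def parse(version):
    """Turn '1.2' into (1, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, allowed):
    low, high = allowed
    return parse(low) <= parse(version) <= parse(high)


def versions_to_install(manifest, lockfile, available):
    """Return (versions, lockfile_changed).

    manifest:  package name -> (lowest, highest) acceptable version
    lockfile:  package name -> pinned version, or None if there is no lockfile
    available: package name -> versions published on the registry
    """
    lock_is_valid = (
        lockfile is not None
        and lockfile.keys() == manifest.keys()
        and all(in_range(lockfile[name], allowed) for name, allowed in manifest.items())
    )
    if lock_is_valid:
        return dict(lockfile), False  # reuse the pinned versions as they are

    # No usable lockfile: resolve again to the latest valid versions.
    resolved = {}
    for name, allowed in manifest.items():
        matching = [v for v in available[name] if in_range(v, allowed)]
        if not matching:
            raise ValueError(f"no published version of {name} is acceptable")
        resolved[name] = max(matching, key=parse)
    return resolved, True  # the caller should save a new lockfile


# Hypothetical project data.
manifest = {"some_library": ("1.0", "1.9")}
available = {"some_library": ["1.0", "1.2", "1.9", "2.0"]}

print(versions_to_install(manifest, {"some_library": "1.2"}, available))
print(versions_to_install(manifest, None, available))
```

The first call keeps the pinned 1.2 even though 1.9 is newer, which is exactly what makes installs repeatable. The second call has no lockfile, so it resolves to 1.9 and signals that a new lockfile should be written.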

Why PIP struggles

The registry for Python modules is called PyPI (Python Package Index at pypi.org - analogous to npmjs.com with node) and its default package manager is PIP (recursive acronym: Pip Installs Packages - analogous to npm with node). PIP was released in 2008 (initially as pyinstall), and can struggle to consistently manage packages for two main reasons.

1. Python dependencies are installed globally

Whereas node modules are installed in a node_modules folder for each project, PIP by default installs packages to one shared place on your computer (mine are saved at /usr/local/lib/python3.9/site-packages). This means that if you are developing more than one Python project on your computer, they can't use two different versions of the same package at the same time (PIP overwrites the installed version).
In practice this means fiddling with reinstalling packages whenever you switch between projects that need different versions.
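If you want to see where packages land on your own machine, a small Python script using the standard library's site and sys modules will tell you. The helper name install_locations is my own, and the paths printed will differ from machine to machine.

```python
import site
import sys


def install_locations():
    """Return the site-packages directories used by the running interpreter."""
    return site.getsitepackages()


print("Interpreter:", sys.executable)
for directory in install_locations():
    print("Packages install to:", directory)
```

Every project run with this interpreter shares those directories, which is why only one version of a given package can be installed there at a time.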

2. No lockfiles

PIP also doesn't have any way of deterministically resolving transitive dependencies, such as a lockfile. This means that when someone installs your project for the first time, they will probably get some variation of the packages that you have, so the code they run is different to yours and could behave differently.

How does Poetry solve these issues?

Poetry was developed to address some of these issues and improve package management and environment isolation in the Python world. Poetry uses a pyproject.toml file to keep track of dependencies, just like package.json for node.
You can create a new project with Poetry with poetry new or add Poetry to an existing project using poetry init.

1. Isolate environments for each project

The global installation of packages is a constraint imposed by the language. In the Python world, this is solved by creating virtual environments, each with its own Python interpreter and its own set of installed packages. You can create a new virtualenv called venv in your project like so:
python3 -m venv venv
This is essentially a folder in your project which acts as the "root folder" of a file system. Now Python packages sit in ./venv/lib/python3.9/site-packages. When you want to activate this environment you run
source venv/bin/activate
Now your shell will look at your venv when you run python (this points at Python v3 rather than v2). You can leave this venv by running deactivate.
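The same thing can be done from Python itself using the standard library's venv module, which is a handy way to see what a virtual environment actually is. This is only a sketch: the helper names are my own, and it builds the environment in a temporary folder so that nothing is left behind.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_environment(project_dir):
    """Create a virtual environment in <project_dir>/venv and return its path."""
    env_dir = Path(project_dir) / "venv"
    # with_pip=False keeps this sketch fast and offline; the `python3 -m venv`
    # command line also bootstraps pip into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


def running_inside_venv():
    """True when the current interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


with tempfile.TemporaryDirectory() as project_dir:
    env_dir = create_environment(project_dir)
    print("Created:", env_dir)
    print("Config file present:", (env_dir / "pyvenv.cfg").exists())

print("Currently inside a venv:", running_inside_venv())
```

The pyvenv.cfg file is what marks the folder as a virtual environment, and comparing sys.prefix with sys.base_prefix is a common way to check whether one is active.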
Poetry makes the creation and management of virtualenvs very easy.
poetry shell # create and activate a new venv
poetry env list # list all project venvs
poetry env remove venv # delete the env called venv
exit # deactivate the current venv

2. Add lockfiles

After running poetry shell, you can install dependencies to your environment:
poetry install # install dependencies in pyproject.toml
poetry search dj # search PyPI for packages containing 'dj'
poetry add Django # add Django as a dependency to pyproject.toml
Just as with node package managers, Poetry uses a lockfile: it saves the resolved versions of all dependencies, including transitive ones, in a poetry.lock file. This allows us to recreate project dependencies exactly.
Make sure to commit your lockfile to the source control of your project - when someone comes to run your project on a new machine, they get the same dependencies as you.

Conclusion

Poetry has some other nice features, but hopefully this article has outlined the two main advantages it has over using PIP in your project.