In a previous post, I discussed how to do repeatable Python builds. I also mentioned it was too much of a hassle to check dependency hashes at install time. According to the pip documentation on the subject, hash checking does the following:
> This protects against a compromise of PyPI or the HTTPS certificate chain. It also guards against a package changing without its version number changing (on indexes that allow this).
For the most part, I'm not too concerned about these things, but while researching this I was surprised by how difficult the `pip hash` command is to use. `pip hash` requires you to have already downloaded a dependency using `pip download`, and it only gives you a single hash for the exact dependency file you downloaded. That means you can't easily use it to generate a single `requirements.txt` or `constraints.txt` file that will work on different architectures or operating systems, since each could have a different hash, especially for packages written in C or Fortran. The way to deal with this is to provide multiple allowed `--hash` arguments to `pip install`, but there doesn't seem to be an easy way to automate this using `pip` directly.
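For reference, a multi-hash pin in a requirements file looks something like this (the package name, version, and hash values below are placeholders, not real digests):

```
# Hypothetical example: pip accepts the file if it matches ANY listed hash,
# so one line can cover the Linux wheel, the macOS wheel, the sdist, etc.
somepackage==1.2.3 \
    --hash=sha256:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa \
    --hash=sha256:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
```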
I was curious about this, so I did a little digging and discovered that there is a convenient way to find all the hashes for a given package version using the PyPI JSON API.
For any given package version, you can hit `https://pypi.org/pypi/$package/$version/json` and get a response that includes the `sha256` hash of each wheel or tarball available via PyPI. The hashes are inside the various `releases` keys in the JSON response, under `digests` and then `sha256`. Using this information, it's straightforward to emit a `--hash=sha256:$mysha` for each of those and generate a fully hashed `requirements.txt` that works across operating systems and architectures.
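Here's a minimal Python sketch of that approach, not a polished tool. It fetches the JSON for one package version and prints a hashed requirements line. One caveat: in current PyPI responses, the version-specific endpoint lists the per-version files under a `urls` key (each with its `digests.sha256`), so the sketch reads from there; the `hashed_requirement` helper name is my own.

```python
import json
import sys
from urllib.request import urlopen


def hashed_requirement(package: str, version: str) -> str:
    """Build a requirements.txt line with a --hash for every file on PyPI."""
    url = f"https://pypi.org/pypi/{package}/{version}/json"
    with urlopen(url) as resp:
        data = json.load(resp)
    # Each entry in "urls" describes one wheel or sdist for this version,
    # with its checksums under "digests".
    hashes = [f["digests"]["sha256"] for f in data["urls"]]
    parts = [f"{package}=={version}"] + [f"--hash=sha256:{h}" for h in hashes]
    return " \\\n    ".join(parts)


if __name__ == "__main__":
    print(hashed_requirement(sys.argv[1], sys.argv[2]))
```

Running something like `python hash_pins.py requests 2.31.0` (the script name is hypothetical) would print a pin with one `--hash` per published wheel and sdist, ready to paste into `requirements.txt`.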
Given this, I might rethink just how much of a hassle hash checking really is.