If you’re like me, you probably have a couple of Python scripts lying around that you reuse for every project. They’re filled with useful functions and classes that help you be more productive. At the start of a new project, you copy the scripts, and you’re on your way.
Now, it’s time to convert your scripts into real Python packages. Converting them will make them easier to use and share with others. Stop copying and pasting!
I also have a couple of Python scripts I reuse when working on Kaggle competitions: for example, two scripts that implement CutMix and MixUp data augmentation in TensorFlow.
In this article, I will convert these scripts into a single Python package. We’ll take it in small steps so that you can follow along. By the end of the article, we’ll have uploaded a package to PyPI, ready to be used by others!
If you’re only interested in the source code, you can find it in the GitHub repository.
Convert Code to Local Packages
The first thing you have to do is create a specific folder structure for your package. Let’s call this package tensorflow_helpers, and create the following directory structure:
tensorflow_helpers/
    __init__.py
    tensorflow_helpers.py
In the root of the repository, create a tensorflow_helpers directory. In this folder, place a special file called __init__.py. This file marks this directory as a Python package — the file itself is empty. The other file tensorflow_helpers.py contains the package code.
Add two sub-packages called augmentation and training_preparation. This makes it easier to organize the code. You add sub-packages by creating additional folders in the main package folder, each with its own __init__.py file. The folder structure now looks like this:
tensorflow_helpers/
    __init__.py
    augmentation/
        __init__.py
        cutmix_imagedatagenerator.py
        mixup_imagedatagenerator.py
    training_preparation/
        __init__.py
        training_preparation.py
1. Adding Documentation:
With the directory structure in place, we have to talk about documentation. Yup, it is not the most fun part, but it is crucial because it helps users understand and use your code.
We have to document each function, class, and class method. Before we start, we have to choose a documentation style. There are four styles that you can use: Google, NumPy, reST, and Javadoc — they don’t differ a lot.
Let’s use reStructuredText (reST), the style recommended by PEP 287.
An easy way to start documenting is by using pyment. This is a Python program that can create, update or convert docstrings in an existing Python file. It supports the previously mentioned styles.
You install pyment using pip install pyment. Then you can use it to generate a documentation template, like this:
pyment -w .\training_preparation.py -o reST
The -w flag tells pyment to write the changes back to the existing file. The -o flag sets the output documentation style, reST in our case. Of course, pyment won’t write the documentation content for you, but the generated template gives you an excellent start.
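For illustration, here is what that looks like for a small, hypothetical helper function (the function is made up; the template format is roughly what pyment produces for the reST style). Before running pyment, the function has no docstring:

def normalize_image(image, max_value=255.0):
    return image / max_value

After running pyment, it contains an empty reST template for you to fill in:

def normalize_image(image, max_value=255.0):
    """
    :param image:
    :param max_value: (Default value = 255.0)
    :returns:
    """
    return image / max_value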
2. Structuring Imports:
With the documentation in place, it’s time to think about how you want the users to import your package. By default, the following does not work:
import tensorflow_helpers
help(tensorflow_helpers.CutMixImageDataGenerator)
Executing this results in the following error:
Traceback (most recent call last):
 File "main.py", line 3, in
   help(tensorflow_helpers.CutMixImageDataGenerator)
AttributeError: module 'tensorflow_helpers' has no attribute 'CutMixImageDataGenerator'
If we want to use CutMixImageDataGenerator right now, we have to import it like this:
import tensorflow_helpers.augmentation.cutmix_imagedatagenerator
help(tensorflow_helpers.augmentation.cutmix_imagedatagenerator.CutMixImageDataGenerator)
This is a lot of typing. To make the package easier to use, we fix this with internal imports.
First, we have to add two relative imports to the __init__.py file in the augmentation folder.
from .cutmix_imagedatagenerator import CutMixImageDataGenerator
from .mixup_imagedatagenerator import MixupImageDataGenerator
Second, we add a single relative import to the __init__.py in the tensorflow_helpers folder.
from . import augmentation
Now we can use CutMixImageDataGenerator like this, which looks a lot nicer and requires far less typing:
from tensorflow_helpers.augmentation import CutMixImageDataGenerator
help(CutMixImageDataGenerator)
Convert Local Packages to Installable Packages
Until now, the source of our package has lived in a subfolder of our project folder. Because the package is a subfolder, we can import it directly. However, if we move our code to another location, the import breaks.
We have to make the package installable and then install the package. After installation, we can use it from everywhere, just like any other package.
You make a package installable by adding the file setup.py.
1. Adding the setup script:
The setup script contains additional metadata for your package. This file is vital if you want to publish your package.
Before we can add setup.py, we have to restructure the code, because the setup script should not be part of the package’s source code. Therefore, we create a new top-level folder, so that the package folder tensorflow_helpers sits inside an outer folder that is also called tensorflow_helpers, like this:

tensorflow_helpers/
    setup.py
    tensorflow_helpers/
        __init__.py
        augmentation/
        training_preparation/

The setup.py file goes into the outer tensorflow_helpers folder.
The setup script contains things such as the author, name, description, and version of the package.
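A minimal sketch of such a setup script, using setuptools (the metadata values are placeholders, not definitive ones):

from setuptools import setup, find_packages

setup(
    name="tensorflow_helpers",
    version="0.1.0",
    author="Your Name",
    description="CutMix and MixUp data augmentation helpers for TensorFlow",
    packages=find_packages(),
)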
Once you’ve created the setup.py script, you can install the package using pip. We navigate to the folder that contains setup.py and execute the following command:
pip install -e .
The . tells pip to install the package from the current directory. The -e flag installs the package in editable mode; without it, you would have to reinstall the package every time you made a change during development.
If everything goes well, pip finishes with a success message.
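The exact output depends on your pip version, but it should end with a confirmation line similar to this one (the version is whatever setup.py specifies):

Successfully installed tensorflow-helpers-0.1.0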
2. Handling Dependencies:
Usually, your package uses other packages, such as NumPy or Pandas. These packages are the dependencies of your package.
To ensure users of your package automatically install these dependencies, we add the install_requires parameter to the setup script.
We also have to be sure that they use the correct Python version, which you can specify with the python_requires parameter.
So, our setup.py script looks like this:
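Here is a sketch of the extended script; the TensorFlow and NumPy version ranges are illustrative assumptions, so match them to the versions your code actually supports:

from setuptools import setup, find_packages

setup(
    name="tensorflow_helpers",
    version="0.1.0",
    author="Your Name",
    description="CutMix and MixUp data augmentation helpers for TensorFlow",
    packages=find_packages(),
    # Dependencies that pip installs automatically with the package.
    install_requires=["tensorflow>=2.0", "numpy>=1.15"],
    # Refuse to install on unsupported Python versions.
    python_requires=">=3.6",
)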
Try to allow as wide a range of versions for your dependencies as possible. If you restrict the version numbers too much, your users may be unable to install your package alongside their other packages.
3. Developer Environment:
It is also good practice to include an environment for package developers. Your package co-authors’ development environments all need the exact same versions of all the dependencies.
You can find the exact versions you’re using by executing the pip freeze command, which generates a list of all installed packages and their versions. You can write that list to a requirements.txt file like this:
pip freeze > requirements.txt
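The resulting file pins one exact version per line; the packages and version numbers below only illustrate the format:

numpy==1.19.4
pandas==1.1.5
tensorflow==2.4.0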
This makes sure that anyone can recreate the same environment by simply executing pip install -r requirements.txt.
4. Adding A License:
If you share your package and code online, you have to include a license file. If you don’t, you’re not giving other people permission to share, modify or use the code. Most Python packages are open source and may be freely modified and shared by other users.
A website that can help you choose an appropriate license is choosealicense.com. You copy the contents of the license from choosealicense.com and place it in a file called LICENSE.
5. Adding A README:
Another important file for your project is the README. This file acts as the front page of your package. If you host your package online on GitHub or PyPI, your README will be displayed there.
What you include is up to you. A good README will include:
- The package title.
- A description of the package.
- How to install the package.
- Examples to get started.
- How to contribute to the package code.
- A note on the type of license used.
Take a look at the README for the tensorflow_helpers package.
6. Adding MANIFEST.in:
The last file that you need to create before you can release your package is the MANIFEST.in file. This file lists all extra files that you want to distribute with your package.
By default, the distribution doesn’t include the LICENSE and README files, so we need to list them in MANIFEST.in like this:
include README.md
include LICENSE
Publishing Your Package
At this point, your package is ready to be published. When you install packages using pip, you download them from the Python Package Index, known as PyPI.
PyPI is an online code repository that anyone can upload packages to. You only need to register for a free account.
Before you can upload your package, you have to create a distribution of your package. There are two kinds of distributions: source distributions and wheel distributions.
A source distribution contains the Python source files you have written as part of the package. A wheel distribution is a built version of the package that is smaller and faster to install; pip prefers it and will use it when available.
However, when you upload distributions to PyPI, it is good practice to upload both the wheel and the source distribution.
Creating The Distribution:
You create the distribution with this command:
python setup.py sdist bdist_wheel
The arguments sdist and bdist_wheel indicate that we want to build both the source and the wheel distribution. The command produces a dist folder containing both. It also creates build and .egg-info directories, but you can ignore those.
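For a package named tensorflow_helpers at version 0.1.0, the dist folder would contain something along these lines (the exact file names depend on your version number and Python tags):

dist/
    tensorflow_helpers-0.1.0-py3-none-any.whl
    tensorflow_helpers-0.1.0.tar.gz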
All that is left is to upload the files in the dist folder to PyPI. You can use twine to upload the packages, like this:
twine upload dist/*
Twine will ask you for your PyPI username and password. There is also a test version of PyPI, called TestPyPI, that you can use to make sure everything looks and works as intended. You upload your package to TestPyPI with the following command:
twine upload -r testpypi dist/*
Note that you need a separate account for the TestPyPI site. When you upload the package it’s immediately visible on the site.
Increase the Quality of Your Package
Now that we have uploaded our package, we can look at increasing its quality. We can do that by adding automated tests and validating the consistency of our code.
1. Automatic Testing:
Many open-source packages include a set of tests that you can run automatically. They even show on the project page how much of the code these tests cover. The pandas GitHub project page, for example, displays a codecov badge that reads 88%, meaning pandas has automated tests that cover 88% of its source code.
Ideally, you add a test for every function in your package, and you organize the tests the same way as the package source code: for every script file in your package, you create a corresponding test file. The structure of our code then looks like this, with the package source mirrored by the test code:

tensorflow_helpers/
    augmentation/
        cutmix_imagedatagenerator.py
        mixup_imagedatagenerator.py
tests/
    augmentation/
        test_cutmix_imagedatagenerator.py
        test_mixup_imagedatagenerator.py
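Each test file then holds one or more test functions. As a minimal, illustrative sketch using pytest, a smoke test for the CutMix generator could look like this (real tests would exercise the generator’s actual behavior):

# tests/augmentation/test_cutmix_imagedatagenerator.py
from tensorflow_helpers.augmentation import CutMixImageDataGenerator

def test_cutmix_imagedatagenerator_is_importable():
    # The simplest possible smoke test: the class can be imported.
    assert CutMixImageDataGenerator is not None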
This article does not go into detail about creating the tests themselves – that is a whole other topic.
2. Code Consistency:
As you know, source code is read more times than it is written. So to help the reader, it’s important to have a consistent code style. This style should cover naming variables and functions, source code layout, and general rules of thumb.
Instead of creating your own style, let’s use the predefined one: PEP 8. We enforce PEP 8 with flake8, a static code checker, which means it analyzes your source code without running it. You can install flake8 using the following command:
python -m pip install flake8
You can run flake8 from the terminal, but if you use Visual Studio Code, there is a plugin that analyzes the source code directly while you type.
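Running flake8 against the package folder prints one line per violation, showing the file, line, column, and error code. The command and output below are illustrative:

flake8 tensorflow_helpers

tensorflow_helpers/augmentation/cutmix_imagedatagenerator.py:12:80: E501 line too long (88 > 79 characters)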
Using Cookiecutter
In the previous paragraphs, we built our package from scratch. It is also possible to use a generator such as Cookiecutter. Cookiecutter is a command-line tool that creates packages from templates.
You can use it to create an empty Python package. The templates create all of the files your package needs, so you can focus on the code and not worry about forgetting something.
Before you can use it you have to install it:
python3 -m pip install cookiecutter
After installation, start Cookiecutter with the following command:
cookiecutter https://github.com/audreyr/cookiecutter-pypackage
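Cookiecutter then asks a series of questions and fills in the template with your answers. The exact prompts depend on the template version, but the session looks roughly like this:

full_name [Audrey Roy Greenfeld]: Your Name
email [audreyr@example.com]: you@example.com
project_name [Python Boilerplate]: tensorflow_helpers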
If you look at the files that Cookiecutter generated, you’ll see two extra files, CONTRIBUTING.md and HISTORY.md.
CONTRIBUTING.md describes how other developers can help with developing the package. This file is the first place a developer will look if they are interested in helping with your package.
The CONTRIBUTING.md that Cookiecutter generates is a great place to start. It starts with the following sentence:
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
1. HISTORY.md:
HISTORY.md contains the release notes. It’s a markdown file that describes all the changes from one version to another. It tells your users the important things that have changed between the previous and new releases, so they can figure out which versions of your package they should use.
There is no official guide on how to structure this file, but most packages use a structure like this:
Release History
===============
2.25.1 (2020-12-16)
-------------------
**Bugfixes**
- Requests now treats `application/json` as `utf8` by default. Resolving inconsistencies between `r.text` and `r.json` output. (#5673)
**Dependencies**
- Requests now supports chardet v4.x.
2.25.0 (2020-11-11)
-------------------
**Improvements**
- Added support for NETRC environment variable. (#5643)
2. Version Numbering:
You saw that the HISTORY.md contains version numbers. If you release a new version of your package, you have to increase the version number. A version number consists of three parts: the major, minor, and patch numbers.
As you develop the package, you will increment these numbers. A well-known strategy for doing so is Semantic Versioning, which dictates how version numbers are assigned and incremented. It states:
Given a version number MAJOR.MINOR.PATCH, increment the: MAJOR version when you make incompatible API changes, MINOR version when you add functionality in a backward-compatible manner, and PATCH version when you make backward-compatible bug fixes. Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
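For example, starting from version 1.4.2: a backward-compatible bug fix bumps it to 1.4.3, a new backward-compatible feature bumps it to 1.5.0, and a breaking API change bumps it to 2.0.0.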
3. The bumpversion Tool:
You can update version numbers automatically with the bumpversion tool, which you run from the top level of your package on the command line.
You pass it the current version, the part to increment (major, minor, or patch), and the name of the file that contains the version number. It then increases that part of the version number in the file:
bumpversion --current-version 0.2.0 patch setup.py
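Assuming setup.py contains the version string 0.2.0, this command replaces it in place with 0.2.1, a patch increment.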
4. Using A Makefile:
Cookiecutter also generates a Makefile, which makes it easier to perform the various terminal commands we used before, and you can always add more targets to it. The default Makefile that Cookiecutter generates contains the following commands:
clean          remove all build, test, coverage and Python artifacts
clean-build    remove build artifacts
clean-pyc      remove Python file artifacts
clean-test     remove test and coverage artifacts
lint           check style with flake8
test           run tests quickly with the default Python
test-all       run tests on every Python version with tox
coverage       check code coverage quickly with the default Python
docs           generate Sphinx HTML documentation, including API docs
servedocs      compile the docs watching for changes
release        package and upload a release
dist           builds source and wheel package
install        install the package to the active Python's site-packages
For example, instead of executing python setup.py sdist bdist_wheel to generate a distribution, you can use make dist, which is easier to remember.
Conclusion
If you followed along, you saw that we started by converting Python scripts into a local package. We added documentation and organized the imports, which made the package easier to use.
We converted the local package to an installable package by creating the setup script. We managed our dependencies and added a LICENSE and a README file. After making a source and wheel distribution, we published them to PyPI.
After publishing, we looked at how adding tests and improving code consistency could increase the quality of our package.
Finally, we looked at using Cookiecutter to generate a Python package with a template. Cookiecutter generated extra files such as CONTRIBUTING.md and HISTORY.md. Cookiecutter also generated a Makefile to make interacting with the package easier.