Day 25
For Next Time
- Work on next round of website edits incorporating feedback
- In-class work on creating video/animated content for website and presentation
On Day 22 we talked about “good code” from a quantitative perspective, including correctness and performance. Today, we’ll discuss the more qualitative but equally important aspects of “good code”: organization, documentation, and style.
Organizing large projects
As projects grow in scope and complexity, organization becomes more important. Fortunately, Python makes it fairly easy to add structure as you go.
Interactive
In Python it’s quick and easy to get started and test your ideas interactively in the shell. Unfortunately these explorations are ephemeral, and it’s difficult to make changes and run the program again, so we quickly move beyond this stage.
Script
Your first Python script may be simply code from an interactive session pasted into a .py
file so that you can edit and re-run it (as well as share it with team mates).
Program
We consider a Python program to be a more evolved version of a script, with defined functions, classes, documentation, etc.
Modules
As your program gets bigger, grows beyond what can comfortably fit in one file.
To create a module in Python, you simply write a .py
file as always.
Code in the module can then be imported from other modules using the filename (minus .py
).
Packages
For really big projects, you may even grow beyond a single directory. Python allows you to organize your modules into directories and refer to them as packages.
All you need to do is place a (possibly empty) file called __init__.py
in the folder along with your Python module files.
Read about Python modules for all the details.
We’ve created an example repository showing the evolution of one project through these steps. Browse through the git history to see the changes over time.
Exercise:
Reorganize your project code into modules that group related code.
As a caveat, this type of major refactoring can create lots of merge conflicts if you are not careful. It is best to make sure everything is checked in first, do the exercise together as a team, and have test cases before and after the restructuring to make sure nothing breaks.
Style and Lint
As you move to more polished code that other people will read and contribute to, style and consistency count.
For a poetic take on “Pythonic” code, try typing import this
in Python.
The most definitive Python style guide is PEP8 (Python Enhancement Proposals are the mechanisms by which the language grows). It has a list of guidelines along with concrete examples - definitely worth a read.
Checking style rules is a good job for a computer, and one tool you can use to do this for you is pylint. “Linting” is the process of checking code without actually running it (think about picking pieces of lint off your sweater). You can even set this up to run continuously in your text editor as you write code.
Linting can even help you catch bugs. Check out this output for “Bad Kangaroo” from Think Python chapter 17.
...
C: 1, 0: Missing module docstring (missing-docstring)
W: 4, 4: Dangerous default value [] as argument (dangerous-default-value)
C: 12, 8: Variable name "t" doesn't conform to snake_case naming style (invalid-name)
...
Using mutable types as a default argument? Dangerous indeed - watch out!
Exercise:
Run pylint
on your project code and analyze the results.
You don’t need to fix everything, but it’s best to know what rules you’re breaking and why.
Documentation
In case we haven’t said it enough, proper documentation is vital to your final project. As a general guideline, you should have:
- Docstring/header comment at the top of each module (file) explaining what it does
- Docstring for each class, method, and function
- Comments for sections of code that are complex enough to need explanation
One cool feature of docstrings is that Python can use them to automatically generate documentation.
The tool that does this is called
pydoc,
and it is what is used when you call help()
in Python.
Exercise:
Autogenerate documentation for your project using pydoc:
pydoc path/to/my_project.py
This will open a help file based on your docstrings (use q to quit). Make sure this help file would be adequate for someone using your code - try showing it to someone not on your team. Add to your documentation based on this feedback.
pydoc can also generate HTML documentation, which you can add to your project website if you like (not required). If you want to generate truly beautiful online documentation, check out Sphinx (the tool used to generate the Python documentation).
Capturing dependencies
Your project may require 3rd party Python packages that are not part of the standard library. In your README file (e.g. “Getting Started” or Install section), you must at minimum list these extra packages necessary to run your code with links and versions.
You should know what modules are needed since you wrote an import statement for each, but in case you forget or you’re looking at someone else’s code: command line tools to the rescue! In your project directory, type:
grep -hr "^import" . | sort | uniq
This means 1) search recursively in the current directory for lines that start with “import” (without printing which file you found them in), 2) sort those results alphabetically, and 3) don’t show me duplicates.
(Technically "^\(from .\+ \)\?import"
would be a better search expression, since it also catches from foo import bar
style imports.)
In order to make things easier for your users, in addition to just listing the required packages you can give them a listing that they can automatically install using their package manager. To generate this list, you can run:
pip freeze > requirements.txt
or
conda list --export > requirements.txt
depending on which package manager you are using. This will create a file called requirements.txt
that you can include in your project repository.
Users can then install the needed dependencies by simply running:
pip install -r requirements.txt
or
conda install --file requirements.txt
as appropriate. Important caveat: this process will dump every package you’ve installed, not just the ones needed for this project. It is best practice to edit down the list to just those necessary for your project.
Exercise:
Make sure the README file for your project has up-to-date installation instructions including required packages. Generate a requirements.txt file for your project.