The following are guidelines and suggestion that might help produce code that is easier to integrate. These are based on and may be biased by Jon Crall’s opinions.
Carefully think about inputs and outputs. Try to write functions in such a way that the inputs are easy to construct. If there is a bug, it helps to be able to quickly define a simple entrypoint into the code that exercies the behavior in question. Doctests are great for this!
Don’t try to do too much in a single step. The outputs should be a reasonably simple transformation of the inputs. If you are doing a lot of different things to the inputs and have many intermediate forms, perhaps it’s better to split up the functionality into multiple stages. This is not always true. Use best judgement.
Think about serializability. If you have a lot of nested custom Python classes, it may be difficult to reconstruct states in debugging, or refactor later if a better way of doing something is discovered.
Try make code easy for static analysis. Try to avoid decorators and other code that is run at import time. When possible it’s best to simply define classes and functions at import time, and then leave any complex logic to the functions themselves. As always, there are exceptions to this rule.
Try to avoid polymorphism in cases where mixin classes would work well. It’s usually better to start with something non-polymorphic and then refactor it to be polymorphic later if it turns out that that strategy is useful. Often mixin classes are enough. In general, try to keep code structures flat when possible.
Use pathlib (or ubelt.Path) instead of os.path methods.
Don’t import pyplot in the global scope. It has runtime side effects we would like to avoid.
Use
python -m
to invoke scripts via their module name, instead of specifying the path to them directly.
Valuable Code Properties¶
Code Patterns¶
Idempotence - https://en.wikipedia.org/wiki/Idempotence
Examples and Entry Points - Try to make it easy to test small units of code. See the doctests section for one good way to do this.
When writing configurations use default values of “auto”, if you can infer a reasonable default behavior at runtime. Using None also works, depending on the context.
Testing¶
Doctests - https://github.com/Erotemic/xdoctest
“Demo Data” - see kwcoco demodata
Continuous Integration - CI Server
Filesystems¶
- User-agnostic or relative paths - Dont hard code your paths. Use things
like “~” or “$HOME” environment variables. Use the
python -m
mechanism to invoke code by module name rather than using the absolute path (some exceptions apply).
- Content addressable data - IPFS, DVC, Hasing - Make dedup easy, this may
be the future of data storage.
fpath
for file-paths,dpath
for directory-paths,path
for anunspecified path type. If possible, avoid
dir
because it conflicts withbuiltins.dir
.
Misc¶
Statically parsable declarative configurations are useful, this is why I like scriptconfig over argparse. I would like to add jsonargparse integration as the mechanism of handling nested configurations, so it should be compatible with jsonargparse powered things like LightningCLI in the future.
When you are writing a python package, give your module a distinctive name. Don’t name it “lib” or “model” or “net”, do something that wont conflict with other python packages. This is a tip based off of observing this anti-pattern in research repos.