Typing in numpy
I recently updated my archeryutils package to use numpy typing ‘properly’. Whilst I was already using type checking, it was in a non-ideal fashion for numpy due to a lack of experience/understanding when originally writing the code. I also find the numpy typing documentation hard to read when it comes to practical implementation, so summarise what I found here.
There is already a lot out there around numpy typing, so I’ll keep this brief and focus on communicating what I felt was missing from those. I would advise reading the numpy typing documentation alongside this and asume you are familiar with type hinting in Python .
Basics
If you want to use numpy type hints you will want to import the module:
|
|
If using mypy for checking typing you’ll want to add the
numpy plugin
by adding the following to your pyproject.toml
config:
|
|
Key components
As I see it utilisation of typing with numpy in your code comes down to the following two components:
NDArray
This is a numpy array, and can be used wherever a variable will be a numpy array type.
It is possible to specify a dtype
if you want some additional control/restriction over
your types.
This is done with
numpy’s dtype class
For example, a function that we know will return an array of floats:
|
|
We can also use union typing with dtypes when we need some flexibility:
|
|
ArrayLike
This is a useful ‘catch-all’ that allows various input types (scalar, list, array) to be passed to functions. Under the hood it is a union of the many types that numpy ‘agressively’ casts to arrays. Using this as the input type to functions provides a lot of flexibility as to what can be accepted - e.g. users can pass scalars, lists, and arrays.
Note that there may be a small price to pay in that you might need to explicitly cast
ArrayLike
to an NDArray
for certain numpy internal functions that expect NDArray
inputs.
This is much easier than trying to manually achieve this behaviour using unions and
overloads (as I had been doing previously), however!
e.g. here there is no meaning for float ** list
so we must enforce creation of an
np.array
before using **
:
|
|
Admittedly using ArrayLike
reduces control over the dtype
of inputs.
If this is important perhaps use NDArray[np.dtype]
internally instead, and build in
dtype checks with Raises
in user-facing code e.g. for passing an array of strings to
something that needs numbers.
Bear in mind that typing in Python is optional and static, so should not be treated
(IMO) quite the same way as for a checked compiled language. Indeed, the flexibility
of Python types can be regarded as a ‘feature’.
Lessons
We can achieve easy overloading as a result of ArrayLike
(see above).
However, there is unfortunately not ‘one obvious way to use this’ so exercise sensible
judgment to balance simplicity/flexibility/safety/control as appropriate for your code.
Relinquish some control and assume that numpy typing knows what it is doing with its
NDArray
under the hood.
This falls slightly under a more general mantras I have come to adopt of
“typing in Python is good, but you can’t treat it the same as statically typed
languages”, and “don’t try to be overly-specific as this will cause tears”.
If a function that returns an NDArray
returns what is realistcally a scalar,
it is what numpy calls a
0D array
.
Your end users are unlikely to ever notice this thanks to
duck typing
,
but internally if you know a type is guaranteed to be a scalar (0D array) and need
to specify this for subsequent type checking you can use typing
’s cast
:
|
|
As an example of how I improved my code using these concepts in practice see the pull request that prompted me to write this short post .
Acknowledgments
I would thank Liam Pattinson for pointing me towards doing this in the first place when things were starting to get messy with mypy.