Python Type Definition Proposal
A proposal for Python type definitions inspired by Haskell's algebraic data types. By using the expressive power of algebra to define types in Python, one gets a low-clutter view of the composed types of variables, arguments and return values.
It does not (yet) claim to be able to represent every composed type, but that is not its main goal.
Usage
Inline, to aid the coming-back-next-month scenario:
books = map(get_address_book, people) # :: [{name :: str: number :: str}]
alternative_books = map(get_address_book, people) # :: [{str: str}]
In the documentation, to give a deeper understanding of the data types of the arguments and the returned value:
def foo(books):
    """Documentation about the function/method as usual.

    Arguments
    ---------
    books :: [{name :: str: number :: str}]
        Documentation about the variable as usual.

    Returns
    -------
    n_contacts :: [int]
        Documentation about the variable as usual.
    """
    return map(len, books)
Compare this with, if there is any type definition at all, for example the numpy standard way of doing it:
def foo(books):
    """Documentation about the function/method as usual.

    Arguments
    ---------
    books : list
        Documentation about the variable as usual.

    Returns
    -------
    n_contacts : list
        Documentation about the variable as usual.
    """
    return map(len, books)
Instead of working it out for yourself, or writing it down in plain text in the documentation ("n_contacts is a list of integers"), you have a compact and stringent way to write composed types that minimizes mental clutter.
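As a minimal sketch of how the notation reads next to working code, the example below applies it to a small function. The function name `count_contacts` and the sample data are made up for illustration, and a list comprehension is used so that the result is a list under Python 3 as well.

```python
def count_contacts(books):
    """Count the entries in each address book.

    Arguments
    ---------
    books :: [{name :: str: number :: str}]
        One address book per person.

    Returns
    -------
    n_contacts :: [int]
        Number of entries in each address book.
    """
    return [len(book) for book in books]


# Hypothetical sample data matching [{name :: str: number :: str}]
books = [
    {"Alice": "555-0100", "Bob": "555-0101"},
    {"Carol": "555-0102"},
]
n_contacts = count_contacts(books)  # :: [int]
```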
Operators
Is of type
N :: T
: The name N is of type T
Or
T1 | T2
: It is either of type T1 or of type T2, which tells the reader that the code must be guarded for both types.
The operation is associative, which makes T1 | T2 | T3 | ... unambiguous.
A useful (and unfortunately common) use is T|None, i.e. nullable.
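A small sketch of what "guarded for both types" can look like in practice. The function `format_number` and the sample address book are hypothetical; only `dict.get`, which returns None for a missing key, is standard Python.

```python
def format_number(number):
    # format_number :: str|None -> str
    # The `|None` in the argument type signals that a guard is needed.
    if number is None:
        return "unknown"
    return "tel: " + number


book = {"Alice": "555-0100"}  # :: {name :: str: number :: str}

alice = book.get("Alice")  # :: str|None
bob = book.get("Bob")      # :: str|None

known = format_number(alice)
missing = format_number(bob)
```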
Function
T -> R
: A function which takes one argument of type T and returns a value of type R.
For other numbers of arguments, 0 to n: -> R (or *() -> R), T -> R (or *(T1, ) -> R), *(T1, T2) -> R, *(T1, T2, ..., Tn) -> R.
The syntax for this is still under consideration, see notes on “Work needed to be done”.
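To make the arities concrete, here is a sketch with one function per form. All three function names are invented for this example, and the annotations follow the syntax as currently proposed.

```python
def default_port():
    # default_port :: -> int
    return 8080


def double(n):
    # double :: int -> int
    return 2 * n


def apply_twice(func, value):
    # apply_twice :: *(int -> int, int) -> int
    return func(func(value))


port = default_port()             # :: int
quadrupled = apply_twice(double, 3)  # :: int
```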
Tuple
(T1, T2)
: A tuple with the inner types T1 and T2. Extends to (), (T1, ), (T1, T2), (T1, T2, T3), (T1, ..., Tn).
Sequence
[E]
: A sequence with elements of type E.
See the note about sequences under “Work needed to be done”.
Set
{E}
: A set with elements of type E.
Dictionary
{K: V}
: A dictionary with keys of type K and values of type V.
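The container notations for tuple, sequence, set and dictionary can be combined freely. The following sketch annotates a few literal values inline; the variable names and values are purely illustrative.

```python
point = (3, 4.5)                    # :: (int, float)
primes = [2, 3, 5, 7]               # :: [int]
tags = {"home", "work"}             # :: {str}
ages = {"Alice": 31, "Bob": 27}     # :: {name :: str: age :: int}
groups = {"home": [("Alice", 31)]}  # :: {str: [(str, int)]}
```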
Grouping
(T)
: Grouping works as in mathematics: the expression inside the grouping is evaluated first. It is used for non-associative operations like ->.
Example: decorator :: (T -> int) -> (T -> float)
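A sketch of a decorator with exactly that type: it takes a function returning an int and gives back a function returning a float. The names `as_float` and `name_length` are hypothetical and chosen only for this illustration.

```python
import functools


def as_float(func):
    # as_float :: (T -> int) -> (T -> float)
    @functools.wraps(func)
    def wrapper(arg):
        # wrapper :: T -> float
        return float(func(arg))
    return wrapper


@as_float
def name_length(name):
    # Before decoration: name_length :: str -> int
    # After decoration:  name_length :: str -> float
    return len(name)


result = name_length("Alice")  # :: float
```

Without the grouping, (T -> int) -> (T -> float) could not be told apart from a chain of single-argument functions.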
Atomic
bool
: bool, types.BooleanType.
int
: int, types.IntType.
float
: float, types.FloatType.
str
: basestring.
None
: types.NoneType.
Polymorphic
enumerate :: [T] -> [(count :: int, T)]
where both occurrences of T stand for any type, but the same one. This can also be used for non-functions: zip(a, a) :: [(T, T)]
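The sketch below shows one polymorphic function used with two different element types. The helper `pair_with_self` is invented for this example, and `list(...)` is added so the result is a list under Python 3 too.

```python
def pair_with_self(items):
    # pair_with_self :: [T] -> [(T, T)]
    # T is whatever the element type of `items` happens to be.
    return list(zip(items, items))


int_pairs = pair_with_self([1, 2])      # :: [(int, int)]
str_pairs = pair_with_self(["a", "b"])  # :: [(str, str)]
```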
Pragmatic usage
Stop at names instead of writing types all the way down, for example service_hooks :: {service: [hook]}. Remember, readability is key.
Extensions to numpy datatypes
Add the dimensions of the ndarray; in Haskell the ndarray would be a parameterized type.
The parameters are either the sizes as integers (each represented by a letter that in turn stands for the dimension, as is often done in mathematics) or the names of the dimensions directly.
timeseries :: ndarray(M, T)
timeseries :: ndarray(feature, time)
runs :: ndarray(run, feature, time)
This makes it much easier to verify by eye that, for example,
np.sum(runs, axis=0)
actually sums over all the runs, or that indexing with
first_feature = runs[:, 0, :]
actually gets the first feature for all runs and times.
One could also add the dtype when it is not float64.
timeseries_count :: ndarray(M, T, dtype=int)
def features(data):
    """description

    Arguments
    ---------
    data :: ndarray(time)
        description

    Returns
    -------
    features :: ndarray(feature, time)
        description
    """
    return ...

def nan_feature_count(data):
    """description

    Arguments
    ---------
    data :: ndarray(feature, time)
        description

    Returns
    -------
    features :: ndarray(time, dtype=int)
        description
    """
    return ...
Work needed to be done
Be able to define properties
bisect :: *(Sorted [T], T) -> (index :: int)
Handle the consumption difference between iterator and list (or if it doesn’t matter)
Be able to break out common or long parts of a definition
dict_utils.reduce :: *([D], reducer :: [V] -> V) -> D
where D = {K: V}
ordereddict, frozenset and other standard data types with properties: Ordered {K: V}, Frozen {E}.
Default arguments, e.g. randint :: (seed :: int<0>) -> int, and default arguments meaning “not set”, i.e. None.
*args and **kwargs.
Variable-length tuples, for example in isinstance and type, where you are supposed to pass a tuple of variable length: (T, ...)
A mark to differentiate pure functions (no side effects) from impure ones (side effects).
Deal with types.EllipsisType, types.ClassType and types.TypeType.
Express that NaN is a possible value in a numpy array: ndarray(feature|NaN, time).
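To illustrate the idea of breaking out a sub-definition with `where`, here is a sketch of a function matching the dict_utils.reduce signature listed above. Only the signature comes from this proposal; the function name `reduce_dicts` and the behaviour (collecting values per key, then collapsing them with the reducer) are assumptions made for this example.

```python
def reduce_dicts(dicts, reducer):
    # reduce_dicts :: *([D], reducer :: [V] -> V) -> D
    #     where D = {K: V}
    # Assumed behaviour: gather the values for each key across all
    # dictionaries, then collapse each list of values with `reducer`.
    collected = {}
    for d in dicts:
        for key, value in d.items():
            collected.setdefault(key, []).append(value)
    return {key: reducer(values) for key, values in collected.items()}


# Here D = {str: int}, and the built-in sum acts as [int] -> int.
totals = reduce_dicts([{"a": 1, "b": 2}, {"a": 3}], sum)
```

Writing D once and reusing it keeps the signature short even though the dictionary type appears twice.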
Type definition examples from the standard library
enumerate :: [T] -> [(int, T)]
int :: int -> int
sorted :: [T] -> [T]
sort :: [T] -> None
zip :: *([T1], ..., [Tn]) -> [(T1, ..., Tn)]
partial :: (T1 ... -> R) -> T1 -> (... -> R)
all :: [T] -> bool
chr :: int -> str
cmp :: *(T, T) -> int
def groupby(iterable, keyfunc=None):
    """groupby description is put here.

    Arguments
    ---------
    iterable :: [T]
        iterable description is put here.
    keyfunc :: T -> K
        keyfunc description is put here.

    Returns
    -------
    grouping :: [(K, [T])]
        grouping description is put here.
    """
itertools.takewhile :: *((T -> bool), [T]) -> [T]
itertools.combinations :: *([T], r :: int) -> [(T, ...)]
reduce :: *(*(T, T) -> T, [T]) -> T
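A few of these signatures can be checked against real calls. The sketch below is written so that it also runs under Python 3, where enumerate, zip and takewhile return iterators (hence the `list(...)` wrappers) and reduce lives in functools; the sample values are arbitrary.

```python
import itertools
from functools import reduce  # also a builtin under Python 2

# enumerate :: [T] -> [(int, T)]
numbered = list(enumerate(["a", "b"]))

# itertools.takewhile :: *((T -> bool), [T]) -> [T]
small = list(itertools.takewhile(lambda n: n < 3, [1, 2, 3, 1]))

# reduce :: *(*(T, T) -> T, [T]) -> T
total = reduce(lambda a, b: a + b, [1, 2, 3, 4])

# zip :: *([T1], ..., [Tn]) -> [(T1, ..., Tn)]
pairs = list(zip([1, 2], ["x", "y"]))
```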