Iterator as argument in numpy weirdness

I stumbled on something rather unexpected in numpy the other day when I, lazy as I am, wrote an iterator using itertools and the code somewhere down the line tried to np.sum it.

Dependencies

I’m using numpy-1.9.1 and python2.7, I haven’t looked this issue with other versions.

Small Example

With identity = lambda x: x, N_10 = range(10) we get

	`sum(.)`	`np.sum(.)`
`map(identity, N_10)`	`45`	`45`
`imap(identity, N_10)`	`45`	`<itertools.imap at 0x2c28610>`

Down the rabbit hole

It seems that it’s not only the np.sum that posses this unintuitive behaviour with iterators, but also np.min, np.max among others.

I wrote some code to try all numpy functions with an iterator and see what happens.

from __future__ import print_function, division
import numpy as np
from itertools import *
from functools import *
import inspect

it = lambda: imap(lambda x: x, range(10))
def all_functions(module):
    return filter(
        inspect.isfunction,
        map(
            partial(getattr, module),
            sorted(dir(module))
        )
    )

fs = all_functions(np)

def tryprint(a):
    try:
        print(a)
    except:
        pass

def tryapply(f, g=it):
    try:
        return f, f(g())
    except:
        return f, "FAILED"

not_failed = lambda (f, f_return): f_return != "FAILED"

map(
    tryprint,  # NOTE something returned didn't have a proper __repr__ (otherwise just print)
    filter(
        not_failed,
        map(
            tryapply,
            fs
        )
    )
)

The relevant output: gist link

Grouping by “type” and filtering out the irrelevant functions

Shape

(<function alen at 0x7fda5dd459b0>, 1)
(<function ndim at 0x7fda5dd45b18>, 0)
(<function shape at 0x7fda5dd45320>, ())
(<function size at 0x7fda5dd45c08>, 1)

The problem probably originates from the shape of the iterator being interpreted as np.shape(imap(.)) = () i.e. the degenerate case of np.ndarray. This similarly to np.array(0). Would have been more expected if it was interpreted as a python list.

Pass-through

(<function all at 0x7fda5dd456e0>, <itertools.imap object at 0x7fda5b002210>)
(<function alltrue at 0x7fda5dd455f0>, <itertools.imap object at 0x7fda5b002250>)
(<function amax at 0x7fda5dd458c0>, <itertools.imap object at 0x7fda5b002310>)
(<function amin at 0x7fda5dd45938>, <itertools.imap object at 0x7fda5b002390>)
(<function any at 0x7fda5dd45668>, <itertools.imap object at 0x7fda5b0023d0>)
(<function amax at 0x7fda5dd458c0>, <itertools.imap object at 0x2f68c50>)
(<function maximum_sctype at 0x7fda60d4a230>, <itertools.imap object at 0x2f68950>)
(<function nansum at 0x7fda5d9bacf8>, <itertools.imap object at 0x2f68f10>)
(<function prod at 0x7fda5dd45a28>, <itertools.imap object at 0x2f7b110>)
(<function product at 0x7fda5dd45500>, <itertools.imap object at 0x2f7b410>)
(<function sometrue at 0x7fda5dd45578>, <itertools.imap object at 0x2d2f410>)
(<function sum at 0x7fda5dd45488>, <itertools.imap object at 0x3026d90>)

By the looks of it the degenerate case isn’t handled properly.

The all and any-family is working better than in numpy-1.8 and handles integers properly (but not iterators, even if we would interpret it as an object) stackoverflow question

reduce(op, .)-family (which is all pass-through’s except sctype) probably has a special case for the degenerate case (being identity). They should act the same in the degenerate case going from ((id(op) op a_0) op a_1)... to id(op) op a_0. Where id(op) is the identity element of the group <R, op>. If this would have been the case the application of an iterator would fail (which is better).

maximum_sctype does not return a type.

Number

(<function argmax at 0x7fda60d76e60>, 0)
(<function argmin at 0x7fda60d76ed8>, 0)
(<function argsort at 0x7fda60d76de8>, 0)
(<function nanargmax at 0x7fda5d9bac80>, 0)
(<function nanargmin at 0x7fda5d9bac08>, 0)
(<function rank at 0x7fda5dd45b90>, 0)

arg*-family probably has a special case for the degenerate case. arg* doesn’t make much sense in the degenerate case, compare np.argmax(4, axis=0) with np.argmax([4], axis=0). The degenerate’s index-space is zero-dimensional so it should return () additionally it doesn’t even have an axis (not even axis=0).

No comment on rank.

Boolean

(<function iscomplex at 0x7fda5d9919b0>, False)
(<function iscomplexobj at 0x7fda5d991aa0>, False)
(<function isreal at 0x7fda5d991a28>, True)
(<function isrealobj at 0x7fda5d991b18>, True)
(<function isscalar at 0x7fda5dd47ed8>, False)
(<function issctype at 0x7fda60d4a320>, False)
(<function iterable at 0x7fda5d9a52a8>, 1)

What constitutes something to be real or realobj? If imap is interpreted as an object it’s not real (otherwise it is (in this case)).

Off-topic: iterable should be isiterable and have a boolean codomain.

Cumulation

(<function cumprod at 0x7fda5dd45aa0>, array([<itertools.imap object at 0x7fda5b002c50>], dtype=object))
(<function cumproduct at 0x7fda5dd457d0>, array([<itertools.imap object at 0x7fda5afae050>], dtype=object))
(<function cumsum at 0x7fda5dd45758>, array([<itertools.imap object at 0x7fda5afae110>], dtype=object))
(<function gradient at 0x7fda5d9b2398>, [])

gradient is interpreting the imap as an object. The cumop-family has the same issues as reduce(op, .).

Complex

(<function imag at 0x7fda5d991938>, array(0, dtype=object))
(<function real at 0x7fda5d9918c0>, array(<itertools.imap object at 0x2d2f0d0>, dtype=object))
(<function real_if_close at 0x7fda5d991c80>, array(<itertools.imap object at 0x2d2f150>, dtype=object))

This behaviour is really unintuitive and probably originates from isreal’s unintuitive behaviour.

Array with `dtype=imap`

(<function asanyarray at 0x7fda60d6e5f0>, array(<itertools.imap object at 0x7fda5b002450>, dtype=object))
(<function asarray at 0x7fda60d6e578>, array(<itertools.imap object at 0x7fda5b002590>, dtype=object))
(<function asarray_chkfinite at 0x7fda5d9b21b8>, array(<itertools.imap object at 0x7fda5b002610>, dtype=object))
(<function ascontiguousarray at 0x7fda60d6e668>, array([<itertools.imap object at 0x7fda5b002690>], dtype=object))
(<function asfortranarray at 0x7fda60d6e6e0>, array([<itertools.imap object at 0x7fda5b0026d0>], dtype=object))
(<function asmatrix at 0x7fda5d9b38c0>, matrix([[<itertools.imap object at 0x7fda5b002790>]], dtype=object))
(<function atleast_1d at 0x7fda5dd645f0>, array([<itertools.imap object at 0x7fda5b002890>], dtype=object))
(<function atleast_2d at 0x7fda5dd64aa0>, array([[<itertools.imap object at 0x7fda5b002950>]], dtype=object))
(<function atleast_3d at 0x7fda5dd64b18>, array([[[<itertools.imap object at 0x7fda5b0029d0>]]], dtype=object))
(<function broadcast_arrays at 0x7fda5d9b7aa0>, [array(<itertools.imap object at 0x7fda5b002a90>, dtype=object)])
(<function copy at 0x7fda5d9b2320>, array(<itertools.imap object at 0x7fda5b002b50>, dtype=object))
(<function diagflat at 0x7fda5d997aa0>, array([[<itertools.imap object at 0x7fda5afae290>]], dtype=object))
(<function asmatrix at 0x7fda5d9b38c0>, matrix([[<itertools.imap object at 0x7fda5afae410>]], dtype=object))
(<function msort at 0x7fda5d9b3140>, array(<itertools.imap object at 0x2f68e90>, dtype=object))
(<function ravel at 0x7fda5dd45230>, array([<itertools.imap object at 0x2f7b610>], dtype=object))
(<function require at 0x7fda60d6e758>, array(<itertools.imap object at 0x2d2f190>, dtype=object))
(<function sort at 0x7fda60d76d70>, array(<itertools.imap object at 0x33a7890>, dtype=object))
(<function squeeze at 0x7fda5dd450c8>, array(<itertools.imap object at 0x30cbd90>, dtype=object))
(<function transpose at 0x7fda60d76c08>, array(<itertools.imap object at 0x3026d10>, dtype=object))
(<function unique at 0x7fda5d753c08>, array([<itertools.imap object at 0x3026bd0>], dtype=object))

All of these functions is handling imap as an object and therefor operating as the degenerate case. For some of which the degenerate case is unintuitive in itself.

Array with `dtype=number`

(<function argwhere at 0x7fda60d6e848>, array([[0]]))
(<function column_stack at 0x7fda5d9bd5f0>, array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]))
(<function dstack at 0x7fda5d9bd668>, array([[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]]))
(<function hstack at 0x7fda5dd64c08>, array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
(<function indices at 0x7fda5dd47de8>, array([], shape=(10, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9), dtype=int64))
(<function nonzero at 0x7fda5dd452a8>, (array([0]),))
(<function ones_like at 0x7fda60d6e398>, array(1, dtype=object))
(<function vstack at 0x7fda5dd64b90>, array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]]))
(<function vstack at 0x7fda5dd64b90>, array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]]))
(<function zeros_like at 0x7fda60d6e2a8>, array(0, dtype=object))

argwhere and nonzero is handling it as the degenerate case.

*_like-family is handling it as the degenerate case.

The stack-family works as one should expect (if imap is interpreted as an list).

indices is iterating through the iterator interpreting it as a list, however is this case compared to (all?) other the argument isn’t data but instead a list of index dimensions.

Summary

np.array(it()) returns array(<itertools.imap object at 0x2f45490>, dtype=object) and are thereby handling the iterator as a degenerate case instead of consuming it.

Contract for the return type in documentation is not guaranteed in many cases for iterators as input.

Note

Realized when I was done that some functions weren’t registering as functions by inspection.isfunction, one example being np.minimum. I should (if I have the time to) redo this analysis.