I have a 3D numerical array X, of shape=(N,N,6); and a 2D logical array mask, of shape=(N,N). I would like to find all unique 1D subarrays (length=6, along axis=2) of X, searching over the first two dimensions, that are also masked by mask (i.e. have a True value at the same position in the first two dimensions).
This seems (relatively) simple, but I have the additional complication that repeated elements of X that should be non-unique may differ due to floating-point/numerical solution reasons, so I must round the values in X before comparing. However, I require the full-precision unique values for my final results, so I cannot simply use the raw output of a function like np.unique applied to the rounded values without additional processing.
To this end, I decided to find all the indices of the unique arrays within X, i.e., I want the 2D pairs of indices, corresponding to the first two dimensions of X, where the full-precision unique+masked 1D arrays are located.
These specifications together seem to not play nice with the options I can currently find in numpy. It took me a number of attempts to find something that worked.
Attempt #1
My original idea was something like this:
X_unique, unique_idx = np.unique(
X[mask].round(decimals=8),
axis=0,
return_index=True
)
This gives me correct results, but X_unique is, of course, rounded. Furthermore, using a 2D logical array to index the first two dimensions of X collapses them, resulting in an X_unique output of shape=(<# of uniques>,6) and unique_idx of shape=(<# of uniques>,), so I have lost the structural information of the first two dimensions. There is no pattern to the unique array occurrences within X, so this will not do.
Attempt #2
I then tried to find a way to mask X while retaining the original shapes, but realized something like the following could not work anyway:
X_unique, unique_idx = np.unique(
hypothetical_X_masked_but_still_3D.round(decimals=8),
axis=??>,
return_index=True
)
because because the axis kwarg of np.unique cannot accept a tuple of indices (and I wish to search over axes 0 and 1, retaining 2).
Attempt #3
Next, I tried to flatten the first two dimensions, but instead of omitting the elements not in mask, I replaced them with a sentinel np.nan so that I could later restore the flattened array back to its original shape:
X_masked = np.where(mask[...,None], X, np.nan).reshape(N*N,6)
X_unique, unique_idx = np.unique(
X_masked.round(decimals=8),
axis=0,
equal_nan=True,
return_index=True
)
This apparently fails because the equal_nan kwarg of np.unique does not seem to work for arrays of np.nan.
MRE:
nans = np.full((6), np.nan)
print(nans == nans) # > [False False False False False False]
print(np.all(nans == nans)) # > False
This seems like a bug/oversight to me. I understand that np.nan is not comparable with itself, so np.nan == np.nan > False makes sense to me. But surely the whole point of the equal_nan kwarg is to specially deal with NaNs, and the kwarg works correctly with non-array NaNs (ostenisbly; I did not test this).
Anyway, bug or not, I had to find some other replacement sentinel value. I couldn't use a non-numeric placeholder like a string or ... because my actual application requires rounding the values of X before testing for uniques (due to numerical precision issues). I also couldn't use the imaginary part of complex numbers as a flag since my actual array X is also complex.
I eventually found a solution of sorts, but it's more complicated than I would like. Maybe someone else has a better way of doing this.