I have a pandas Series whose row MultiIndex has more than two levels (three in this example):
import numpy as np
import pandas as pd

arrays = [
    ["bar", "baz", "foo", "qux"],
    ["one", "two"],
    ["a", "b", "c"],
]
index = pd.MultiIndex.from_product(arrays, names=["first", "second", "third"])
s = pd.Series(np.random.randn(24), index=index)
I was able to slice this Series like so:
s.loc[(slice(None), slice(None), "a")]
with output
first  second
bar    one       1.574135
       two       1.557756
baz    one      -0.023177
       two      -0.537326
foo    one      -0.183099
       two      -1.776498
qux    one       0.425121
       two       0.451778
dtype: float64
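For comparison, the same cross-section of the original Series can be written in a few equivalent ways. This is a minimal sketch that assumes the setup above, except that it uses a seeded generator (`np.random.default_rng(0)`) in place of `np.random.randn`, so the numbers will differ from the ones shown:

```python
import numpy as np
import pandas as pd

arrays = [["bar", "baz", "foo", "qux"], ["one", "two"], ["a", "b", "c"]]
index = pd.MultiIndex.from_product(arrays, names=["first", "second", "third"])
s = pd.Series(np.random.default_rng(0).standard_normal(24), index=index)

# A Series has a single axis, so the whole tuple is matched against the
# row MultiIndex, one element per level.
by_tuple = s.loc[(slice(None), slice(None), "a")]

# pd.IndexSlice just builds the same tuple of slices using ':' syntax.
idx = pd.IndexSlice
by_indexslice = s.loc[idx[:, :, "a"]]

# xs selects a label on a named level and drops that level by default.
by_xs = s.xs("a", level="third")

print(by_xs)
```

Because `pd.IndexSlice` only constructs the tuple, the first two selections are literally the same call; `xs` is an alternative that selects by level name rather than by position.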
To make certain kinds of plots easier, I moved the third index level into a column with reset_index. Note that this turns the Series into a DataFrame (I kept the name s):
s = s.reset_index(level=[2])
             third         0
first second
bar   one        a  1.574135
      one        b -2.450800
      one        c  0.966331
      two        a  1.557756
      two        b -0.800615
      two        c  0.405982
baz   one        a -0.023177
      one        b  1.743043
      one        c  0.847148
      two        a -0.537326
      two        b -1.157622
      two        c -0.229345
foo   one        a -0.183099
      one        b  1.047875
      one        c -1.980324
      two        a -1.776498
      two        b -1.169729
      two        c -1.509195
qux   one        a  0.425121
      one        b  0.903984
      one        c -0.455681
      two        a  0.451778
      two        b  0.215866
      two        c -0.090196
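A quick way to confirm what `reset_index` produced. This minimal sketch assumes the same setup with a seeded generator, and names the result `df` instead of reusing `s`:

```python
import numpy as np
import pandas as pd

arrays = [["bar", "baz", "foo", "qux"], ["one", "two"], ["a", "b", "c"]]
index = pd.MultiIndex.from_product(arrays, names=["first", "second", "third"])
s = pd.Series(np.random.default_rng(0).standard_normal(24), index=index)

# Moving one level out of the index returns a DataFrame: the level becomes
# a column named "third", and the unnamed Series values become column 0.
df = s.reset_index(level=[2])

print(type(df).__name__)      # DataFrame
print(list(df.columns))       # ['third', 0]
print(list(df.index.names))   # ['first', 'second']
```

Since the object is now two-dimensional, `.loc` has a column axis to consider in addition to the row MultiIndex.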
Now, slicing with the same syntax raises a KeyError:
s.loc[(slice(None),"one")]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File c:filepath\pandas\core\indexes\base.py:3812, in Index.get_loc(self, key)
3811 try:
-> 3812 return self._engine.get_loc(casted_key)
3813 except KeyError as err:
File pandas/_libs/index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7096, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'one'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[65], line 1
----> 1 s.loc[(slice(None),"one")]
File c:filepath\pandas\core\indexing.py:1184, in _LocationIndexer.__getitem__(self, key)
1182 if self._is_scalar_access(key):
...
3822 # InvalidIndexError. Otherwise we fall through and re-raise
3823 # the TypeError.
3824 self._check_indexing_error(key)
KeyError: 'one'
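One way to check what `.loc` actually receives is to subscript a tiny object that simply returns its key. The sketch below uses two hypothetical helpers, `ShowKey` and `loc_raises_keyerror` (neither is part of pandas), and rebuilds the reset DataFrame as `df` with a seeded generator:

```python
import numpy as np
import pandas as pd


class ShowKey:
    """Hypothetical helper: returns whatever key Python passes to __getitem__."""

    def __getitem__(self, key):
        return key


probe = ShowKey()

# The parentheses make no difference: both subscripts pass the same 2-tuple.
with_parens = probe[(slice(None), "one")]
without_parens = probe[slice(None), "one"]
print(with_parens == without_parens)  # True


def loc_raises_keyerror(frame, key):
    """Hypothetical helper: True if frame.loc[key] raises KeyError."""
    try:
        frame.loc[key]
    except KeyError:
        return True
    return False


arrays = [["bar", "baz", "foo", "qux"], ["one", "two"], ["a", "b", "c"]]
index = pd.MultiIndex.from_product(arrays, names=["first", "second", "third"])
df = pd.Series(np.random.default_rng(0).standard_normal(24), index=index).reset_index(level=[2])

# 2-tuple on a DataFrame: split into (rows, columns), so "one" is a column lookup.
print(loc_raises_keyerror(df, (slice(None), "one")))
# Row tuple plus an explicit column slice: "one" is matched on the row index.
print(loc_raises_keyerror(df, ((slice(None), "one"), slice(None))))
```

This mirrors the traceback above: on a DataFrame, a two-element key is read as a row indexer followed by a column indexer, whether or not it is wrapped in parentheses.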
After scratching my head for a while, I suspected that because the tuple had only two elements, pandas was treating the second one ("one") as a column label, even though it was inside the parentheses. Sure enough, adding a separate slice(None) for the columns, outside the tuple, so that pandas knew the second element was not meant for the columns, worked:
s.loc[(slice(None),"one"),slice(None)]
             third         0
first second
bar   one        a  1.574135
      one        b -2.450800
      one        c  0.966331
baz   one        a -0.023177
      one        b  1.743043
      one        c  0.847148
foo   one        a -0.183099
      one        b  1.047875
      one        c -1.980324
qux   one        a  0.425121
      one        b  0.903984
      one        c -0.455681
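The pandas advanced-indexing guide advises giving `.loc` both a row and a column indexer when slicing a MultiIndex. Below is a minimal sketch of three equivalent ways to write the working selection, assuming the same seeded setup; `rows_and_cols`, `by_xs` and `by_mask` are illustrative names:

```python
import numpy as np
import pandas as pd

arrays = [["bar", "baz", "foo", "qux"], ["one", "two"], ["a", "b", "c"]]
index = pd.MultiIndex.from_product(arrays, names=["first", "second", "third"])
df = pd.Series(np.random.default_rng(0).standard_normal(24), index=index).reset_index(level=[2])

idx = pd.IndexSlice

# 1. Always pass both axes: a row tuple for the MultiIndex, then ':' for columns.
rows_and_cols = df.loc[idx[:, "one"], :]

# 2. Select by level name; drop_level=False keeps "second" in the result index.
by_xs = df.xs("one", level="second", drop_level=False)

# 3. Boolean mask built from the values of a single index level.
mask = df.index.get_level_values("second") == "one"
by_mask = df.loc[mask]

print(rows_and_cols)
```

All three should select the same 12 rows. Selecting by level name (options 2 and 3) also avoids depending on the position of the level within the tuple.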
Why is pandas behaving this way? Is this just a quirk? What should I do to avoid this problem in the future?