Faster way to apply a function across rows?
If we want to apply a function across rows where there is currently no built-in method, such as a rank_horizontal, what is the fastest way?
data = {0: [0, 1, 0, 1, 1, 0, 1, 0, 1, 1],
        1: [0, 0, 0, 0, 0, 0, 1, 0, 1, 1],
        2: [0, 0, 1, 0, 1, 0, 1, 0, 0, 1],
        3: [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
        4: [0, 1, 1, 0, 1, 0, 0, 0, 1, 1]}
Input df (pandas):
   0  1  2  3  4
0  0  0  0  0  0
1  1  0  0  1  1
2  0  0  1  0  1
3  1  0  0  0  0
4  1  0  1  1  1
5  0  0  0  0  0
6  1  1  1  0  0
7  0  0  0  0  0
8  1  1  0  1  1
9  1  1  1  0  1
In pandas we can do this:
df.rank(axis=1)
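To make the target concrete, here is a plain-Python sketch of what I understand df.rank(axis=1) to compute for a single row, assuming the default "average" tie method. The helper name average_rank is made up for illustration and is not a pandas or polars function.

```python
def average_rank(row):
    """Rank the values of one row, giving tied values the mean of their positions."""
    # Indices of the row, ordered by the value they point at.
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with row[order[i]].
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        # The tied values occupy 1-based positions i+1 .. j+1; use their mean.
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


# Row 1 of the input above is [1, 0, 0, 1, 1].
print(average_rank([1, 0, 0, 1, 1]))  # [4.0, 1.5, 1.5, 4.0, 4.0]
```

The two zeros share positions 1 and 2 (mean 1.5) and the three ones share positions 3 to 5 (mean 4.0).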
In polars, this is one way to do it (df2 holds the same data as a polars DataFrame). Is there a faster way?
(
    df2.select(
        pl.concat_arr(pl.all()).arr.eval(pl.element().rank())
    )
    .with_row_index()
    .explode('0')
    .with_columns(
        pl.col('index').cum_count().sub(1).over('index').alias('cc')
    )
    .pivot(index='index', on='cc')
)
Output:
     0    1    2    3    4
0  3.0  3.0  3.0  3.0  3.0
1  4.0  1.5  1.5  4.0  4.0
2  2.0  2.0  4.5  2.0  4.5
3  5.0  2.5  2.5  2.5  2.5
4  3.5  1.0  3.5  3.5  3.5
5  3.0  3.0  3.0  3.0  3.0
6  4.0  4.0  4.0  1.5  1.5
7  3.0  3.0  3.0  3.0  3.0
8  3.5  3.5  1.0  3.5  3.5
9  3.5  3.5  3.5  1.0  3.5
Also, why does cum_count() start at 1 and not 0?