How to perform a max_by window aggregation in Polars?
14:05 26 Apr 2022

I am trying to use Polars to do a window aggregation over one column, but return the corresponding value from another column.

For example, I want to get the name associated with the max value in each group, instead of (or in addition to) just the max value.

Assuming an input like this:

import polars as pl

df = pl.from_repr("""
┌───────┬──────┬───────┐
│ label ┆ name ┆ value │
│ ---   ┆ ---  ┆ ---   │
│ str   ┆ str  ┆ f64   │
╞═══════╪══════╪═══════╡
│ a.    ┆ foo  ┆ 1.0   │
│ a.    ┆ bar  ┆ 2.0   │
│ b.    ┆ baz  ┆ 1.5   │
│ b.    ┆ boo  ┆ -1.0  │
└───────┴──────┴───────┘
""")
# 'max_by' is not a real method; I'm just using it to express what I'm trying to achieve.
df.select(pl.col('label'), pl.col('name').max_by('value').over('label'))

I want an output like this:

shape: (2, 2)
┌───────┬──────┐
│ label ┆ name │
│ ---   ┆ ---  │
│ str   ┆ str  │
╞═══════╪══════╡
│ a.    ┆ bar  │
│ b.    ┆ baz  │
└───────┴──────┘

Ideally the output would include the value too, but I know I can easily add that via pl.col('value').max().over('label'):

shape: (2, 3)
┌───────┬──────┬───────┐
│ label ┆ name ┆ value │
│ ---   ┆ ---  ┆ ---   │
│ str   ┆ str  ┆ f64   │
╞═══════╪══════╪═══════╡
│ a.    ┆ bar  ┆ 2.0   │
│ b.    ┆ baz  ┆ 1.5   │
└───────┴──────┴───────┘
python dataframe window-functions python-polars