Pandas DataFrames have an assign method that assigns values to one or more columns. Unlike loc or iloc, it returns a new DataFrame with the newly assigned column(s) and does not modify the original, so shallow copies of, or other references to, the same data are left untouched.
The assign method takes column names as keyword arguments. That works fine when the column names are strings, but pandas allows arbitrary hashable Python objects as column names.
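For reference, here is a minimal sketch of the behaviour I mean when the column names are strings. The names df_named, other_ref and result, and the columns "a" and "b", are made up for illustration:

```python
import pandas as pd

# Illustrative frame with string column names (not the frame from the question).
df_named = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
other_ref = df_named  # second name bound to the same DataFrame object

# assign returns a new DataFrame; the keyword name is the column label.
result = df_named.assign(a=df_named["a"] + 1)

print(result["a"].tolist())     # [2, 3, 4]
print(other_ref["a"].tolist())  # [1, 2, 3], the original is unchanged
```

This is the behaviour I would like to get for non-string column labels as well.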
Suppose I have a DataFrame with integers as column "names":
import pandas as pd
df = pd.DataFrame({
    0: [1, 2, 3],
    1: [4, 5, 6]})
How can I assign to, say, column 0?
This doesn't work:
df.assign(0=df[[0]] + 1)
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?
Nor does this:
df.assign(**{0: df[[0]] + 1})
TypeError: keywords must be strings
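As far as I can tell, both failures come from Python's call syntax rather than from pandas itself: a keyword name has to be an identifier when written literally, and a string when passed through ** unpacking. A small sketch without pandas, where echo_kwargs is a hypothetical helper written only for this illustration:

```python
def echo_kwargs(**kwargs):
    """Hypothetical helper: return the keyword arguments it received."""
    return kwargs

# A string key is accepted, even one that is not a valid identifier.
accepted = echo_kwargs(**{"0": 1})

# A non-string key is rejected by the call machinery with a TypeError,
# before the function body ever runs.
try:
    echo_kwargs(**{0: 1})
    rejected = False
except TypeError:
    rejected = True

print(accepted, rejected)
```

So the string "0" would get past the call syntax, but "0" and the integer 0 are different column labels, so that would not target the existing column 0.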
Now, I could use direct assignment or loc, but that would modify the underlying data in place. For example:
df_shallow_copy = df  # a second name for the same DataFrame object
df[[0]] = df[[0]] + 1  # in-place assignment, visible through both names
Now df_shallow_copy would have values [2, 3, 4] for column 0 instead of [1, 2, 3].
I could also do a full deep copy of all the columns, but that involves duplicating the data in memory and performing redundant operations:
df_shallow_copy = df
df = df.copy()
df[[0]] = df[[0]] + 1
How can I assign to the column without generating a redundant deep copy and without potentially modifying other objects?