Let's say I have the following two dataframes, which contain some common information:
import pandas as pd
df_1 = pd.DataFrame({"first_name": ["Alice", "Bob", "Charlie", "Alice"], "age": [15, 16, 15, 17], "last_name": ["Smith", "Smith", "Jones", "Doe"]})
df_2 = pd.DataFrame({"first_name": ["Alice", "Alice", "Bob"], "last_name": ["Smith", "Doe", "Smith"], "dog": ["Bingo", "Fido", "Rover"]})
Now, I'd like to add a column "has_dog" to df_1, with boolean values.
Conceptually, for each row of df_1, I want to check whether df_2 contains a row with the same "first_name" and "last_name". If it does, "has_dog" should be True for that row of df_1; otherwise it should be False.
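To make the desired result concrete, here is a minimal sketch of the slow, row-by-row version of that logic on the sample data. This is the kind of approach I'd like to avoid; the name `dog_owners` is just for illustration.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "first_name": ["Alice", "Bob", "Charlie", "Alice"],
    "age": [15, 16, 15, 17],
    "last_name": ["Smith", "Smith", "Jones", "Doe"],
})
df_2 = pd.DataFrame({
    "first_name": ["Alice", "Alice", "Bob"],
    "last_name": ["Smith", "Doe", "Smith"],
    "dog": ["Bingo", "Fido", "Rover"],
})

# Set of (first_name, last_name) pairs that appear in df_2
# (illustrative helper, not part of any library)
dog_owners = set(zip(df_2["first_name"], df_2["last_name"]))

# Row-by-row membership check: gives the right answer,
# but loops in Python over every row of df_1
df_1["has_dog"] = [
    (first, last) in dog_owners
    for first, last in zip(df_1["first_name"], df_1["last_name"])
]

# has_dog is [True, True, False, True] for the four rows of df_1
```

Only Charlie Jones has no match in df_2, so his row is the only one that gets False.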
Note that there are two people in the dataset who have the same first name, and a different pair of people who have the same last name, so identifying matching entries between the dataframes has to use multiple columns.
Is there an efficient way to do this that doesn't require iterating over dataframe rows? (The actual dataframes that I'm trying to do this kind of operation on are orders of magnitude bigger, so performance optimization is important.)
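For reference, one candidate that avoids an explicit loop is a left merge on both key columns using pandas' `indicator` option. The following is only a sketch on the sample data, not benchmarked on large inputs; the names `keys`, `owners`, `merged`, and `result` are illustrative, and it assumes the merge keys contain no missing values.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "first_name": ["Alice", "Bob", "Charlie", "Alice"],
    "age": [15, 16, 15, 17],
    "last_name": ["Smith", "Smith", "Jones", "Doe"],
})
df_2 = pd.DataFrame({
    "first_name": ["Alice", "Alice", "Bob"],
    "last_name": ["Smith", "Doe", "Smith"],
    "dog": ["Bingo", "Fido", "Rover"],
})

keys = ["first_name", "last_name"]

# Keep only the key columns of df_2 and drop repeated pairs, so the
# left merge cannot add extra rows when someone owns several dogs
owners = df_2[keys].drop_duplicates()

# indicator=True adds a "_merge" column; for a left merge its values
# are "both" (match found in owners) or "left_only" (no match)
merged = df_1.merge(owners, on=keys, how="left", indicator=True)

# The merged frame has a fresh RangeIndex, so convert to a NumPy array
# before assigning to avoid index alignment against df_1
result = df_1.assign(has_dog=(merged["_merge"] == "both").to_numpy())
```

Because the right-hand keys are deduplicated and the merge is a left join, `merged` has one row per row of df_1 in the same order, which is what makes the positional assignment safe.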