I have tried several methods to insert data from a DataFrame into a Redshift database. None of them was fast: the job ran for 2-3 days and then failed with a timeout error. Here are the methods I tried.

First method: I converted the DataFrame to a list of dictionaries and uploaded it with `executemany`. Below is a sample of what the data looks like (I'm sharing only a small sample of rows here, but the actual dataset is much larger):
import psycopg2

conn = psycopg2.connect(
    host='redshift-####-dev.)00000.us-east-1.redshift.amazonaws.com',
    database='*****',
    user='****',
    password='*****',
    port='5439'
)
print("Successful connection to Redshift dev")
cur = conn.cursor()
df_dic = [
    {'case_id': 69370, 'column_name': 'subject', 'split_text': 'working', 'split_text_cnt': 1, 'load_ts': '2023-12-15'},
    {'case_id': 69370, 'column_name': 'subject', 'split_text': 'scenes', 'split_text_cnt': 1, 'load_ts': '2023-12-15'},
    {'case_id': 69370, 'column_name': 'subject', 'split_text': 'intended', 'split_text_cnt': 1, 'load_ts': '2023-12-15'},
    {'case_id': 69371, 'column_name': 'subject', 'split_text': 'redirected', 'split_text_cnt': 1, 'load_ts': '2023-12-15'},
    {'case_id': 69371, 'column_name': 'subject', 'split_text': 'ge', 'split_text_cnt': 2, 'load_ts': '2023-12-15'},
    {'case_id': 69371, 'column_name': 'subject', 'split_text': 'sensor', 'split_text_cnt': 1, 'load_ts': '2023-12-15'},
    {'case_id': 69371, 'column_name': 'subject', 'split_text': 'flush', 'split_text_cnt': 1, 'load_ts': '2023-12-15'},
    {'case_id': 69371, 'column_name': 'subject', 'split_text': 'motion', 'split_text_cnt': 1, 'load_ts': '2023-12-15'},
    {'case_id': 69371, 'column_name': 'subject', 'split_text': 'led', 'split_text_cnt': 1, 'load_ts': '2023-12-15'},
    {'case_id': 69371, 'column_name': 'subject', 'split_text': 'fixture', 'split_text_cnt': 1, 'load_ts': '2023-12-15'},
    {'case_id': 69371, 'column_name': 'subject', 'split_text': 'contact', 'split_text_cnt': 1, 'load_ts': '2023-12-15'},
]
sql = ("INSERT INTO odey.sfc_ca_sit_di "
       "(case_id, column_name, split_text, split_text_cnt, load_ts) "
       "VALUES (%(case_id)s, %(column_name)s, %(split_text)s, %(split_text_cnt)s, %(load_ts)s)")
cur.executemany(sql, df_dic)
conn.commit()
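For what it's worth, psycopg2's `executemany` still issues one `INSERT` statement per row over the network, which is usually the bottleneck. A minimal sketch of batching many rows into a single multi-row `INSERT` looks like this (the `batched_insert` and `rows_from_dicts` helpers are my own names, not library functions; table and columns as above — `psycopg2.extras.execute_values` does essentially the same thing):

```python
def rows_from_dicts(dicts, cols=('case_id', 'column_name', 'split_text', 'split_text_cnt', 'load_ts')):
    """Convert the list-of-dicts form above into tuples in a fixed column order."""
    return [tuple(d[c] for c in cols) for d in dicts]

def batched_insert(cur, rows, batch_size=1000):
    """Issue one multi-row INSERT per batch instead of one INSERT per row."""
    prefix = ("INSERT INTO odey.sfc_ca_sit_di "
              "(case_id, column_name, split_text, split_text_cnt, load_ts) VALUES ")
    for i in range(0, len(rows), batch_size):
        chunk = rows[i:i + batch_size]
        # One "(%s, %s, %s, %s, %s)" group per row in this chunk
        sql = prefix + ", ".join(["(%s, %s, %s, %s, %s)"] * len(chunk))
        # Flatten the row tuples into a single flat parameter list
        params = [value for row in chunk for value in row]
        cur.execute(sql, params)

# usage (untested against my cluster):
# batched_insert(cur, rows_from_dicts(df_dic))
# conn.commit()
```

With 600,000 rows and a batch size of 1,000, this sends 600 statements instead of 600,000.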
Second method: I converted the DataFrame rows to tuples and executed the query row by row (`final_out` is the DataFrame):
sql = ("INSERT INTO sey.sfdse_sp_di "
       "(case_id, column_name, split_text, split_text_cnt, load_ts) "
       "VALUES (%s, %s, %s, %s, %s)")
for row in final_out.values.tolist():
    cur.execute(sql, tuple(row))
    # print(tuple(row))
conn.commit()
Both methods take days to run.
Can anyone suggest a faster way to insert this data?
It's a large dataset, roughly 600,000 records.
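For reference, the bulk-load path I've seen recommended for Redshift is to write the data to a file, upload it to S3, and run a single `COPY` command instead of `INSERT` statements. A rough sketch (the `write_rows_csv` helper, the bucket/key, and the IAM role below are placeholders, not my real values):

```python
import csv

def write_rows_csv(path, dicts, cols=('case_id', 'column_name', 'split_text', 'split_text_cnt', 'load_ts')):
    """Dump the rows to a CSV file that can be uploaded to S3 for COPY."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        for d in dicts:
            writer.writerow([d[c] for c in cols])

# After uploading the file to S3 (e.g. with boto3), one COPY loads it in bulk:
# cur.execute("""
#     COPY odey.sfc_ca_sit_di (case_id, column_name, split_text, split_text_cnt, load_ts)
#     FROM 's3://my-bucket/split_text.csv'                          -- placeholder bucket/key
#     IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'    -- placeholder role
#     CSV;
# """)
# conn.commit()
```

Would this be the right direction for 600,000 rows, or is there a simpler fix to the INSERT approaches above?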