I'm trying to save a pandas.DataFrame object to disk as JSON. The slight complication is that I need to store some additional data alongside it, so the JSON I'm aiming for looks like this:
{
  "additionalData": ...,
  "table": {
    "schema": ...,
    "data": ...
  }
}
I can get the desired layout for "table" by calling df.to_json(orient='table'); however, this returns a str rather than a dict. So if I do something like
dict_to_write = {
    'additionalData': ...,
    'table': df.to_json(orient='table'),
}
when dict_to_write gets written to disk (which already involves a json.dumps step), the JSON looks like
{
  "additionalData": ...,
  "table": "{\"schema\": ...,\"data\": ...}"
}
where the table has been double-stringified.
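A minimal reproduction of the double-stringification, assuming a small two-column frame (the column names and the "note" value under additionalData are made up for illustration):

```python
import json

import pandas as pd

# Toy frame; any DataFrame behaves the same way.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

dict_to_write = {
    "additionalData": {"note": "example"},  # hypothetical extra data
    # to_json returns a str, so it is embedded as a JSON string value
    "table": df.to_json(orient="table"),
}

text = json.dumps(dict_to_write)
reloaded = json.loads(text)

# The table survives only as an escaped string, not as a nested object.
print(type(reloaded["table"]))  # <class 'str'>
```

Inspecting text shows the inner quotes escaped as \"schema\", which is exactly the layout above.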
I can fix this by doing
import json

dict_to_write = {
    'additionalData': ...,
    'table': json.loads(df.to_json(orient='table')),
}
instead. This serializes df to a string and then parses that string back into a dict with the desired structure, so it is no longer double-stringified.
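For completeness, here is a sketch of the full round trip with this workaround, again assuming a toy frame and a made-up additionalData payload. Reading back is the mirror image: pd.read_json expects JSON text or a file-like object, so the nested "table" object is re-serialized and wrapped in io.StringIO before being parsed with orient='table'.

```python
import io
import json

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Serialize to a str, then parse it back into a plain dict
# so json.dumps nests it as an object instead of a string.
dict_to_write = {
    "additionalData": {"note": "example"},  # hypothetical extra data
    "table": json.loads(df.to_json(orient="table")),
}

text = json.dumps(dict_to_write, indent=2)

# Reading back: re-serialize the nested "table" object and hand it
# to read_json as a file-like object.
loaded = json.loads(text)
restored = pd.read_json(io.StringIO(json.dumps(loaded["table"])), orient="table")

print(sorted(loaded["table"]))  # ['data', 'schema']
```

So the workaround costs one extra string round trip in each direction.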
It seems to me that df.to_dict should be able to do this in one call. Is there a particular reason df.to_dict(orient='table') is not supported?
(I'm not sure this is the best place for this, since it's more of a discussion than a question. If it isn't, please suggest a better place.)