How can I apply code to every file in a zip file while also changing the file type in the output zip?

13:51 16 Feb 2024

I have Python code that I am trying to run on a zip file, and I want the output to be another zip file with the files converted to .txt or .csv.
The code runs without error, but it fails to change the file type, and I cannot tell whether the contents are being changed.
I can convert single .psa files extracted from the zip one at a time, and the result is a .csv file with the desired modifications.

Breakdown:
I have a zip folder that consists of .psa files.
I have code that reads an individual .psa file and outputs a .csv with the modifications.
I want to apply this code to all the .psa files in the zip folder instead of processing each file manually.
Currently, the single-file code requires me to hard-code the output file name. It would be helpful if each output file kept the same name as its input, with only the extension changed to .csv.
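
The renaming described in the last point could be sketched with `pathlib`, which can swap a file extension while leaving the rest of the name alone. The helper name `csv_name_for` is hypothetical and only for illustration.

```python
from pathlib import PurePosixPath


def csv_name_for(member_name: str) -> str:
    """Return the same name with its extension replaced by .csv."""
    # Names inside a zip archive use forward slashes, so PurePosixPath
    # handles members stored in subfolders as well.
    return str(PurePosixPath(member_name).with_suffix(".csv"))


print(csv_name_for("1 Area 2 - store 15 group.psa"))
```

Because only the suffix changes, the `store 15` part of the name is still available for the regular expression used in the single-file code.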

Bonus: It would be even nicer if the output were a single .csv file containing the contents of all the converted files, instead of many files inside an output zip folder. I think this would also solve the naming issue.
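
One possible way to get the single-file output is to read each archive member straight into pandas and concatenate the results. This is a minimal sketch, assuming every .psa member has the same three-column layout as the single file; the function name `combine_zip_to_frame` is hypothetical, and members whose name has no `store <number>` part get a missing store value rather than raising an error.

```python
import re
import zipfile

import pandas as pd


def combine_zip_to_frame(zip_path):
    """Read every .psa member of zip_path and return one combined DataFrame."""
    frames = []
    with zipfile.ZipFile(zip_path) as zin:
        for info in zin.infolist():
            # Skip folders and anything that is not a .psa file.
            if info.is_dir() or not info.filename.lower().endswith(".psa"):
                continue
            match = re.search(r"store\s+(\d+)", info.filename)
            store = match.group(1) if match else None
            # pandas can read directly from the file object zipfile returns.
            with zin.open(info) as f:
                df = pd.read_csv(f, usecols=[0, 1, 2], header=None,
                                 names=["type", "upc", "num"])
            df = df[df["type"] == "prod"].drop(columns=["type", "upc"])
            frames.append(df.assign(store=store))
    if not frames:
        return pd.DataFrame(columns=["num", "store"])
    return pd.concat(frames, ignore_index=True)
```

It could then be used as `combine_zip_to_frame("test.zip").to_csv("combined.csv", index=False)`. The `store` column records which file each row came from, so no per-file output names are needed.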

Below is the code that works on individual files. It takes a .psa file and converts it to a .csv while also changing the contents.

import pandas as pd
import re

fname = '1 Area 2 - store 15 group.psa'
df = pd.read_csv(fname, usecols=[0,1,2], header=None, names=['type','upc', 'num'])
store = re.search(r'store\s+(\d+)', fname).group(1)
df = df[df['type'] == 'prod'].drop(columns=['type','upc']).assign(store=store)
df.to_csv("output.csv", index=False)

Below is the code I'm applying to the zip file. A new zip file is created, but the files inside it still have the .psa file type.

import pandas as pd
import re
import zipfile
import os
 
input_zip_path = r'test.zip'
output_zip_path = 'results.zip'
 
def process_file(file_path):
    df = pd.read_csv(file_path, usecols=[0,1,2], header=None, names=['type','upc','num'])
    store = re.search(r'store\s+(\d+)',file_path).group(1)
    df = df[df['type']=='prod'].drop(columns=['type','upc']).assign(store=store)
    df.to_csv("name.csv", index=False)
    return df
 
with zipfile.ZipFile(input_zip_path, 'r') as zin, zipfile.ZipFile(output_zip_path, 'w') as zout:
    for file_info in zin.infolist():
        with zin.open(file_info) as f, open(file_info.filename, 'w') as fout:
            fout.write(str(f.read()))
        modified_df = process_file(file_info.filename)
        modified_df.to_csv(file_info.filename, index=False)
        zout.write(file_info.filename)
        os.remove(file_info.filename)
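
Two details in the loop above probably explain the result: the output is written under `file_info.filename`, which still ends in .psa, and `str(f.read())` writes the text representation of a bytes object (starting with `b'`) instead of the decoded file contents. A minimal sketch of one alternative follows, assuming the same three-column layout as the single-file code. It reads each member directly from the archive and writes the CSV text into the new archive under a renamed member, with no temporary files. The function name `convert_zip` is hypothetical.

```python
import re
import zipfile
from pathlib import PurePosixPath

import pandas as pd


def convert_zip(input_zip_path, output_zip_path):
    """Convert each .psa member of the input zip to a .csv member of the output zip."""
    written = []
    with zipfile.ZipFile(input_zip_path) as zin, \
            zipfile.ZipFile(output_zip_path, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            # Skip folders and anything that is not a .psa file.
            if info.is_dir() or not info.filename.lower().endswith(".psa"):
                continue
            with zin.open(info) as f:
                df = pd.read_csv(f, usecols=[0, 1, 2], header=None,
                                 names=["type", "upc", "num"])
            match = re.search(r"store\s+(\d+)", info.filename)
            store = match.group(1) if match else None
            df = df[df["type"] == "prod"].drop(columns=["type", "upc"]).assign(store=store)
            # Same member name as the input, with only the extension changed.
            out_name = str(PurePosixPath(info.filename).with_suffix(".csv"))
            # to_csv() without a path returns the CSV text as a string.
            zout.writestr(out_name, df.to_csv(index=False))
            written.append(out_name)
    return written
```

Calling `convert_zip("test.zip", "results.zip")` would return the list of member names written to the output archive.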

python pandas jupyter-notebook zip
