Download all xlsx files and metainformation from a website

10:17 01 Sep 2023

I am working in a Kaggle notebook in the browser, and I would like to know whether everything below can be done in that notebook.

Website screenshot:

The files available for download on this website are updated hourly and daily. I don't think anything on the page is going to change except the contents of the xlsx files.

I want to download two things from this URL: the meta information and the xlsx files shown in the screenshot.

First, I want to collect the meta information into a dataframe like the one below. At the moment I select and copy the values by hand, but I want to build the dataframe directly from the URL.

url_meta_df = 

ID   Type   Name        URL
CAL  Region California  https://www.eia.gov/electricity/gridmonitor/knownissues/xls/Region_CAL.xlsx
CAR  Region Carolinas   https://www.eia.gov/electricity/gridmonitor/knownissues/xls/Region_CAR.xlsx
CENT Region Central     https://www.eia.gov/electricity/gridmonitor/knownissues/xls/Region_CENT.xlsx
FLA  Region Florida     https://www.eia.gov/electricity/gridmonitor/knownissues/xls/Region_FLA.xlsx
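
As far as I can tell, the ID and Type columns could be recovered from the file name alone. Here is a minimal sketch of that idea, assuming every file follows the Type_ID.xlsx pattern seen in the rows above. The helper name parse_xlsx_url is my own (hypothetical), and the Name column is left out because it would have to come from the page itself.

```python
import posixpath
from urllib.parse import urlparse


def parse_xlsx_url(file_url):
    """Split a URL such as .../xls/Region_CAL.xlsx into Type and ID.

    Assumes the file name has the form <Type>_<ID>.xlsx.
    """
    filename = posixpath.basename(urlparse(file_url).path)  # e.g. "Region_CAL.xlsx"
    stem, _ext = posixpath.splitext(filename)                # e.g. "Region_CAL"
    file_type, _, file_id = stem.partition("_")              # split at the first "_"
    return {"ID": file_id, "Type": file_type, "URL": file_url}


urls = [
    "https://www.eia.gov/electricity/gridmonitor/knownissues/xls/Region_CAL.xlsx",
    "https://www.eia.gov/electricity/gridmonitor/knownissues/xls/Region_CENT.xlsx",
]
rows = [parse_xlsx_url(u) for u in urls]
print(rows)
```

Passing rows to pd.DataFrame(rows) should then give the ID, Type and URL columns of url_meta_df.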

Second, I want to download each xlsx file and save it.
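
To make the second step concrete, this is roughly what I have in mind for a single file. It is only a sketch using the standard library: the function name download_file and the output folder xlsx_files are my own (hypothetical) choices, and I have not checked whether the site accepts requests made this way.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_file(file_url, out_dir="xlsx_files"):
    """Save one file into out_dir under its original name and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Reuse the last path segment of the URL as the local file name.
    target = out_dir / Path(urlparse(file_url).path).name
    # Stream the response to disk instead of holding it all in memory.
    with urllib.request.urlopen(file_url, timeout=60) as response, open(target, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return target
```

The idea would be to call it once per row, for example for u in url_meta_df["URL"]: download_file(u).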

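My code: I have tried the following, based on an answer here on SO.

import requests
from bs4 import BeautifulSoup

# url must hold the address of the page shown in the screenshot
r = requests.get(url)
data = r.text
# name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(data, "html.parser")

# print the href of every anchor tag on the page
for link in soup.find_all('a'):
    print(link.get('href'))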

Present output:

None
https://twitter.com/EIAgov
None
https://www.facebook.com/eiagov
None
#page-sub-nav
/
#
/petroleum/
/petroleum/weekly/
/petroleum/supply/weekly/
/naturalgas/
http://ir.eia.gov/ngs/ngs.html
/naturalgas/weekly/
/electricity/
/electricity/monthly/
....
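
Most of this output is None, page anchors, or navigation links, so I assume the next step is to filter it. Below is a minimal sketch of such a filter, assuming the .xlsx links are actually present in the HTML that requests receives (I have not confirmed that). The function name xlsx_links is hypothetical, and the relative .xlsx href in the sample list is made up for illustration.

```python
from urllib.parse import urljoin


def xlsx_links(hrefs, base_url):
    """Keep only hrefs ending in .xlsx and make them absolute against base_url."""
    links = []
    for href in hrefs:
        # Skips None values, "#" anchors and every non-xlsx link.
        if href and href.lower().endswith(".xlsx"):
            links.append(urljoin(base_url, href))
    return links


# Sample hrefs in the shape of the output above; the last entry is hypothetical.
sample = [
    None,
    "https://twitter.com/EIAgov",
    "#page-sub-nav",
    "/petroleum/",
    "/electricity/gridmonitor/knownissues/xls/Region_CAL.xlsx",
]
found = xlsx_links(sample, "https://www.eia.gov/")
print(found)
```

With the loop above, hrefs would be [link.get('href') for link in soup.find_all('a')].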

python python-3.x web-scraping beautifulsoup
