I am working in a Kaggle notebook in the browser, and I would like to know whether everything below can be done in that notebook.
Website url: click here
Website screenshot:
The downloadable files on this website are updated hourly and daily. I don't think anything on the page is going to change except the contents of the xlsx files shown there.
I want to download two things from this URL: the meta information and the xlsx files you see in the screenshot.
First, I want to download the meta information and turn it into a dataframe like the one below. At the moment I select the values manually and copy them here, but I want to get them from the URL.
url_meta_df =
ID    Type    Name        URL
CAL   Region  California  https://www.eia.gov/electricity/gridmonitor/knownissues/xls/Region_CAL.xlsx
CAR   Region  Carolinas   https://www.eia.gov/electricity/gridmonitor/knownissues/xls/Region_CAR.xlsx
CENT  Region  Central     https://www.eia.gov/electricity/gridmonitor/knownissues/xls/Region_CENT.xlsx
FLA   Region  Florida     https://www.eia.gov/electricity/gridmonitor/knownissues/xls/Region_FLA.xlsx
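To make the target table concrete, here is a minimal sketch of how rows like these could be derived from anchor tags. It assumes the .xlsx links are present in the HTML that the request returns (which may not hold if the page builds its table with JavaScript) and that file names follow the `Type_ID.xlsx` pattern seen above. The sample markup, the class `XlsxLinkParser`, and the function `extract_xlsx_rows` are hypothetical names used only for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical sample markup; the real page's structure is an assumption.
SAMPLE_HTML = """
<table>
<tr><td><a href="/electricity/gridmonitor/knownissues/xls/Region_CAL.xlsx">California</a></td></tr>
<tr><td><a href="/electricity/gridmonitor/knownissues/xls/Region_CAR.xlsx">Carolinas</a></td></tr>
<tr><td><a href="/electricity/">Electricity</a></td></tr>
</table>
"""


class XlsxLinkParser(HTMLParser):
    """Collect one row (ID, Type, Name, URL) per <a> tag that points to an .xlsx file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rows = []
        self._href = None   # absolute URL of the .xlsx link currently open, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.lower().endswith(".xlsx"):
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            filename = self._href.rsplit("/", 1)[-1]
            # Assumed naming pattern: <Type>_<ID>.xlsx, e.g. Region_CAL.xlsx
            match = re.fullmatch(r"([A-Za-z]+)_(\w+)\.xlsx", filename, re.IGNORECASE)
            if match:
                self.rows.append({
                    "ID": match.group(2),
                    "Type": match.group(1),
                    "Name": "".join(self._text).strip(),
                    "URL": self._href,
                })
            self._href = None


def extract_xlsx_rows(html, base_url="https://www.eia.gov"):
    """Return a list of dicts, one per .xlsx link found in the HTML text."""
    parser = XlsxLinkParser(base_url)
    parser.feed(html)
    return parser.rows


rows = extract_xlsx_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

With pandas available, `pd.DataFrame(rows, columns=["ID", "Type", "Name", "URL"])` would turn the list into a dataframe shaped like url_meta_df.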
Second: download each xlsx file, save them.
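The download step could look roughly like the sketch below. It uses only the standard library and assumes the notebook has internet access enabled; the helper names `local_path_for` and `download_all` and the output folder `xlsx_files` are hypothetical choices, not anything prescribed by the website.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def local_path_for(url, out_dir="xlsx_files"):
    """Map a file URL to a local path that keeps the original file name."""
    return Path(out_dir) / Path(urlparse(url).path).name


def download_all(urls, out_dir="xlsx_files"):
    """Download every URL into out_dir and return the list of saved paths."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = local_path_for(url, out_dir)
        urlretrieve(url, str(target))  # performs the HTTP request and writes the file
        saved.append(target)
    return saved
```

Here `urls` could be the URL column of url_meta_df, for example `download_all(url_meta_df["URL"])`. Because the files are refreshed hourly and daily, re-running the call would simply overwrite the earlier copies.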
My code: I have tried the following, based on an answer here on SO:
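from bs4 import BeautifulSoup
import requests

# `url` is the website URL given above
r = requests.get(url)
data = r.text
# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(data, "html.parser")
# Print the href of every <a> tag on the page
for link in soup.find_all('a'):
    print(link.get('href'))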
Present output:
None
https://twitter.com/EIAgov
None
https://www.facebook.com/eiagov
None
#page-sub-nav
/
#
/petroleum/
/petroleum/weekly/
/petroleum/supply/weekly/
/naturalgas/
http://ir.eia.gov/ngs/ngs.html
/naturalgas/weekly/
/electricity/
/electricity/monthly/
....
