Avoid wget appending index.html to links
21:35 10 Jul 2015

I am trying to make a static HTML copy of a WordPress site that I can upload somewhere else, such as GitHub Pages.

I use this command:

Option 1:

wget -k -r -l 1000 -p -N -F -nH -P ./website http://example.com/website

This downloads the entire site, but my main issue is that it appends "index.html" to every single link. I understand why that is needed to view the site locally, but it is not required on a static website host.

So is there a way to tell wget not to modify all the links and add index.html to them?

For example it creates:

Hello world!

on the default WordPress "Hello world!" post.

Option 2:

Use the mirroring option (-m) without -k (convert links):

wget -E -m -p -F -nH -P ./website http://example.com/website

Then wget does not append index.html, and the links retain the domain name.

But then it also crawls up to http://example.com and downloads everything there. I do not want that. I want /website to be the root (because this is a WordPress multisite install). How do I fix this?

I also want it to rewrite the hostname instead of stripping it or keeping it, so that links go from http://example.com/website/ (the WordPress multisite path) to http://example.org/. Is this possible with wget, or do I need to run sed/awk on all the files after the download?
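
If post-processing turns out to be the only way, this is roughly the sed pass I have in mind. It is only a sketch: it assumes the downloaded files still contain absolute links (as with Option 2), that only *.html files need rewriting, and that the sed in use accepts -i with a backup suffix (GNU and BSD sed both do, as far as I know). The function name rewrite_links is made up for illustration.

```shell
#!/bin/sh
# Sketch: replace one base URL with another in every downloaded HTML file.
# rewrite_links is a hypothetical helper, not part of wget.

rewrite_links() {
    dir=$1       # directory wget wrote to, e.g. ./website
    old_base=$2  # base URL to replace, e.g. http://example.com/website
    new_base=$3  # base URL to use instead, e.g. http://example.org

    # Escape characters that are special in a sed pattern ...
    old_esc=$(printf '%s' "$old_base" | sed 's/[][\.*^$|]/\\&/g')
    # ... and in a sed replacement ('|' is the delimiter used below).
    new_esc=$(printf '%s' "$new_base" | sed 's/[\&|]/\\&/g')

    # Edit each HTML file in place, leaving a .bak copy next to it.
    find "$dir" -type f -name '*.html' -exec \
        sed -i.bak -e "s|$old_esc|$new_esc|g" {} +

    # Remove the backup copies once the rewrite has run.
    find "$dir" -type f -name '*.html.bak' -exec rm -f {} +
}
```

It would be called as: rewrite_links ./website http://example.com/website http://example.org

This does not touch CSS or JavaScript files, and it would also match a sibling path such as http://example.com/website2, so I would much rather have wget do the rewrite itself if it can.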
