February 16th 2016

Download all files of (a) certain type(s) from a website


If you get a "Forbidden" error for certain folders, it might help to start at the root of the site.

wget --recursive --no-directories --level=2 --accept pls,PLS,m3u,M3U \
--reject "*32.pls","*24.pls" \
http://somafm.com/ ; rm robots.txt

This will download all playlists for all soma.fm radio stations into the current folder, omitting the low-quality 32 and 24 kbit/s streams. It should work with other sites that host playlists.

Breaking it down:

  • --recursive Recursive retrieving. The default maximum depth is 5.
  • --no-directories Do not create a hierarchy of directories when retrieving recursively. All files will get saved to the current directory, without clobbering (if a name shows up more than once, the filenames will get extensions .n).
  • --level=2 Specify the maximum recursion depth. Use it if you know at which level the files are, otherwise you might transfer huge amounts of data for nothing (in this example about 500 kB with level 2 versus 47 MB with the default depth).
  • --accept --reject Comma-separated lists of file name suffixes or patterns (simple wildcards, not regex) to accept or reject.
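
To get a feel for how the accept and reject lists combine, here is a rough shell sketch that mimics the filtering with ordinary glob patterns. It is only an approximation of what wget does, not its actual implementation, and the function name would_keep and the sample file names are made up for illustration.

```shell
# Hypothetical helper: returns 0 if a file name would survive the
# --accept / --reject lists used above, 1 otherwise.
would_keep() {
    case "$1" in
        # --reject "*32.pls","*24.pls": patterns with wildcards
        *32.pls|*24.pls) return 1 ;;
        # --accept pls,PLS,m3u,M3U: plain suffixes, matched at the end of the name
        *pls|*PLS|*m3u|*M3U) return 0 ;;
        # with an accept list present, everything else is dropped
        *) return 1 ;;
    esac
}

# Try it on a few made-up names
for name in groovesalad.pls groovesalad32.pls index.html; do
    if would_keep "$name"; then
        echo "keep  $name"
    else
        echo "skip  $name"
    fi
done
```

A file has to match the accept list and must not match the reject list, so the order of the two options on the command line should not matter.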

If the server kicks you out or blocks you for an unreasonable amount of time, try appending these options:

  • --wait 2 Wait the specified number of seconds between retrievals.
  • --random-wait Causes the time between requests to vary between 0.5 and 1.5 * wait seconds.
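
Put together, the playlist command with both options appended might look like the sketch below. Wrapping it in a function is just a convenience so the start URL can be swapped. The name polite_fetch is made up, and the 2 second wait is only the example value from above, not a recommendation for any particular site.

```shell
# Hypothetical wrapper: same playlist download as above, but pausing
# between requests. $1 is the start URL.
polite_fetch() {
    wget --recursive --no-directories --level=2 \
         --accept pls,PLS,m3u,M3U \
         --reject "*32.pls","*24.pls" \
         --wait 2 --random-wait \
         "$1"
}
```

Call it as polite_fetch http://somafm.com/ and expect the run to take noticeably longer, since every request is now followed by a pause of roughly one to three seconds.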

And this will download a large collection of transparent tiles from Transparenttextures, using the short forms of the same options:

wget -nd -r -l 2 -A png http://www.transparenttextures.com/