This will download all playlists for all soma.fm radio stations into the current folder, omitting the low-quality 32 and 24 kbit/s stations. It should also work with other sites that contain playlists.
If you get a “Forbidden” error for certain folders, it might help to start at the root of the site.
wget --recursive --no-directories --level=2 --accept pls,PLS,m3u,M3U \
--reject "*32.pls","*24.pls" \
https://somafm.com/ ; rm robots.txt
Breaking it down: --recursive follows the links on each page, --level=2 limits the recursion to two levels deep, and --no-directories saves every file into the current folder instead of recreating the site's directory tree. --accept pls,PLS,m3u,M3U keeps only playlist files, while --reject "*32.pls","*24.pls" skips the low-bitrate variants. Finally, wget fetches robots.txt on recursive downloads, so it is removed afterwards.
If the server kicks you out or blocks you for an unreasonable amount of time, try appending these options:
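A hedged sketch of options that usually help in this situation; all of them are standard GNU wget flags, but the concrete values below are only illustrative, not recommendations:

```shell
# Illustrative politeness/throttling options (values are examples):
#   --wait=2           pause two seconds between retrievals
#   --random-wait      randomize the pause to look less robotic
#   --limit-rate=100k  cap the download bandwidth
#   --tries=3          give up on a file after three attempts
wget --wait=2 --random-wait --limit-rate=100k --tries=3 \
     --recursive --no-directories --level=2 \
     --accept pls,PLS,m3u,M3U --reject "*32.pls","*24.pls" \
     https://somafm.com/
```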
This will download a large collection of transparent tiles from Transparenttextures.
wget -nd -r -l 2 -A png http://www.transparenttextures.com/
wget --recursive --no-parent --reject="index.html*" "$URL"
wget recurses five levels deep by default; use --level=n if you need to change this. To also dispense with the useless empty directory tree (something like www.example.com/interesting/project/source/dir_containing_desired_data), you can use --no-host-directories and --cut-dirs.
In this example, adding --no-host-directories --cut-dirs=3 will leave you with a folder named dir_containing_desired_data that contains the desired data (including all subfolders it might contain).
However, since such folders usually have non-unique names like src or util, it might be better to remove --no-host-directories and increase the --cut-dirs value by 1; the data then ends up in a folder named after the host.
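--cut-dirs simply strips leading directory components from the path a file is saved under. A quick sketch of that arithmetic with cut(1), using the example path from above (the file name is made up):

```shell
# Path of a file below the example directory (file name is hypothetical):
path="interesting/project/source/dir_containing_desired_data/file.dat"
# --cut-dirs=3 drops the first three directory components before saving,
# which is what this cut invocation simulates:
echo "$path" | cut -d/ -f4-
# → dir_containing_desired_data/file.dat
```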
Here’s a little script that automates the last example:
#!/bin/sh
type awk >/dev/null || {
    echo "awk is required." >&2
    exit 1
}
test "$#" -eq 1 || {
    echo "Please provide exactly one URL to download the last component recursively." >&2
    exit 1
}
URL="$1"
# For counting directory components:
# remove the trailing slash, if any,
str="${URL%/}"
# and the protocol prefix (http:// as well as https://).
str="${str#*://}"
wget --recursive --no-parent --reject="index.html*" \
    --cut-dirs="$(echo "$str" | awk -F/ '{print NF-1}')" "$URL"
# wget fetches robots.txt on recursive downloads; clean it up.
robots="${str%%/*}/robots.txt"
if test -r "$robots"; then rm "$robots"; fi
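For illustration, assuming the script is saved as wget-last.sh (the name is made up), a call like ./wget-last.sh https://www.example.com/interesting/project/source/dir_containing_desired_data/ would derive the --cut-dirs value like this:

```shell
# The same computation the script performs, after stripping the
# protocol and trailing slash from the example URL:
str="www.example.com/interesting/project/source/dir_containing_desired_data"
echo "$str" | awk -F/ '{print NF-1}'
# → 4
```

With --cut-dirs=4 and the host directory kept, the downloaded files land directly under www.example.com/.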