Hemanth's Scribes


Wget - tricks and tips

Hemanth HM

Wget and curl make a wonderful pair on Linux. Here are some useful wget tips:

Basic Downloads:

# Download a single file/page
wget http://required_site/file

# Download a site recursively using the -r option (links are followed up to 5 levels deep by default)
wget -r http://required_site/

# Download certain file types using the -A option
wget -r -A pdf,mp3 http://required_site/

# Follow links to other hosts (span hosts) using the -H option
wget -r -H -A pdf,mp3 http://required_site/

# Restrict which domains -H may follow using the -D option (comma-separated list)
wget -r -H -A pdf,mp3 -D files.site.com http://required_site/

# Set how many levels deep -r recurses using the -l option
wget -r -l 2 http://required_site/

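# Download the GIF and JPEG images linked from the start page (-l1),
# ignoring robots.txt and never ascending to the parent directory
wget -e robots=off -r -l1 --no-parent -A .gif,.jpg http://required_site/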

Advanced Tricks:

# Download content protected by referer and cookies
# Step 1: fetch the first page and save its cookies to cookie.txt
wget --cookies=on --keep-session-cookies --save-cookies=cookie.txt http://first_page

# Step 2: get protected content using stored cookies
wget --referer=http://first_page --cookies=on --load-cookies=cookie.txt \
     --keep-session-cookies --save-cookies=cookie.txt http://second_page
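
The two steps above can be wrapped in a single shell function. This is a minimal sketch, not the post's own script: `fetch_protected` is a hypothetical helper name, the cookie jar is a temporary file from `mktemp` instead of `cookie.txt`, and `--cookies=on` is left out because wget enables cookies by default.

```shell
#!/usr/bin/env bash
# Sketch: run the two cookie steps in one call.
# fetch_protected is a hypothetical name; the cookie jar is a temp file
# that is deleted afterwards.
fetch_protected() {
  local first=$1 second=$2 jar rc
  jar=$(mktemp) || return 1

  # Step 1: visit the first page so the server can set its session cookies.
  wget --keep-session-cookies --save-cookies="$jar" "$first"
  rc=$?

  # Step 2: only if step 1 worked, request the protected page,
  # sending the stored cookies and the first page as the referer.
  if [ "$rc" -eq 0 ]; then
    wget --referer="$first" --load-cookies="$jar" \
         --keep-session-cookies --save-cookies="$jar" "$second"
    rc=$?
  fi

  rm -f "$jar"
  return "$rc"
}
```

Usage, with the placeholders from above: `fetch_protected http://first_page http://second_page`.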

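# Mirror a website to a static copy for local browsing; -P sets the target directory (local_copy is a placeholder)
# (--adjust-extension is the newer name of --html-extension)
wget --mirror -w 2 -p --adjust-extension --convert-links -P local_copy http://required_site/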

# Run wget in the background: -t 45 retries up to 45 times, -o log writes messages to the file log
wget -t 45 -o log http://required_site &
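
The same `&` trick extends to several files: start one wget per URL, then use the shell's `wait` to collect each job's exit status. A sketch under assumptions: `fetch_all_in_background` and the `log.1`, `log.2`, ... file names are hypothetical, and the function returns non-zero if any download failed.

```shell
#!/usr/bin/env bash
# Sketch: one background wget per URL, then wait for every job.
# fetch_all_in_background and the log.N names are hypothetical.
fetch_all_in_background() {
  local url pid i=0 status=0
  local -a pids=()
  for url in "$@"; do
    i=$((i + 1))
    # Same flags as the one-liner above, with a separate log per job.
    wget -t 45 -o "log.$i" "$url" &
    pids+=("$!")
  done
  # "wait PID" returns that job's exit status; remember any failure.
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

For example: `fetch_all_in_background http://required_site/a http://required_site/b`.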

# Download over FTP (wget logs in anonymously unless you supply credentials)
wget ftp://required_site
# With an account: wget --ftp-user=USER --ftp-password=PASS ftp://required_site

# Download every URL listed in a file (one URL per line)
wget -i file
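
The file for `-i` is plain text with one URL per line, so it is easy to generate. A minimal sketch, assuming the site numbers its files as `page1.html`, `page2.html`, ...; both that pattern and the `write_url_list` helper are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: generate a URL list for "wget -i".
# Assumes a numbered naming pattern (page1.html, page2.html, ...);
# write_url_list and the pattern are hypothetical.
write_url_list() {
  local base=$1 count=$2 out=$3 n
  : > "$out"                       # create or truncate the list file
  for ((n = 1; n <= count; n++)); do
    printf '%s/page%d.html\n' "$base" "$n" >> "$out"
  done
}

# Usage:
#   write_url_list http://required_site 3 urls.txt
#   wget -i urls.txt
# With "-i -", wget reads the URLs from standard input instead:
#   printf '%s\n' http://required_site/a http://required_site/b | wget -i -
```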

About Hemanth HM

Hemanth HM is a Sr. Machine Learning Manager at PayPal, Google Developer Expert, TC39 delegate, FOSS advocate, and community leader with a passion for programming, AI, and open-source contributions.