How To Download an Entire Website Using Wget?

Linux is a powerful open-source operating system, packed with features to keep its users happy. If you would like to tinker with it further through online tutorials, here is how you can pull an entire website down to your local drive using the Linux command "wget".

The man page of “wget” says it can follow links in HTML, XHTML, and CSS pages, to create local versions of remote web sites, fully recreating the directory structure of the original site.  This is sometimes referred to as “recursive downloading.”  While doing that, Wget respects the Robot Exclusion Standard (/robots.txt).  Wget can be instructed to convert the links in downloaded files to point at the local files, for offline viewing.
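Before running the full command, it can help to try a minimal recursive grab first. This is only a sketch, with https://example.com/ standing in for whatever site you want to mirror:

wget --recursive --convert-links https://example.com/

Note that wget's default recursion depth is 5 levels of links; add --level=inf if you need to follow links deeper than that.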

Okay, let's get down to the syntax…

wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains website.org --no-parent <yourwebsite.com>

Example:

wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains imsudo.com --no-parent imsudo.com/category/linux/

When you run the above command in a terminal, it downloads the entire website into the working directory. The parameters are explained below, with a shorter equivalent command shown after the list.

--recursive: download the entire website by recursively following links.

--domains website.org: restrict the crawl to the listed domain(s), so links pointing outside the site are not followed.

--no-parent: do not ascend to directories above the starting one. Remove this option if you want the whole site, or keep it to limit the download to a specific directory; in the example above it restricts the download to the posts under /category/linux/.

--page-requisites: get all the elements needed to display each page (images, CSS, and so on).

--html-extension: save files with the .html extension.

--convert-links: convert links so the pages work locally and offline.

--restrict-file-names=windows: modify filenames so that they also work on Windows.

--no-clobber: do not overwrite existing files (useful when an interrupted download is resumed).
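For reference, most of the long options above also have single-letter shorthands, so the same example can be written more compactly. This is simply the shorter spelling of the command above (on newer wget releases --html-extension has been renamed --adjust-extension, which is what -E expands to, though the old name still works):

wget -r -nc -p -E -k -np --restrict-file-names=windows -D imsudo.com imsudo.com/category/linux/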

For further information, check the wget man page (man wget).
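If you only need a quick reminder about a particular flag, wget's built-in help can also be searched instead of opening the full man page (a small convenience, not something the command above requires):

wget --help | grep -i clobber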

Hope you liked this article; feel free to leave your comments and share it.
