Utilities for backing up client-side website data

I was looking into various methods for backing up websites to my localhost and came up with 3 options. I’m sure there are more, but these require only tools you already have.

Internet Explorer 6 (I don’t have IE7 installed yet, so I did it with IE6)

Internet Explorer 6 offers the ability to “work offline,” which downloads all data to your localhost and lets you act as though you are viewing the live site. To enable this, bookmark the page and select “Make available offline.” If you want more than just that one page, go to Customize and you are prompted about getting the pages it links to as well. What is nice is that you can specify how deep you want IE6 to go. If you select 2, for example, all links on the page will be followed and downloaded, all links on those pages will be treated the same way, and then the process stops. You are then prompted for how you want to synchronize these local copies. You have 2 options: manual synchronization and scheduling. Scheduling will do it at a specified time every n days. Since IE6 is “working offline,” paths don’t need to be modified.

Firefox 2

Firefox 2 offers a similar solution; however, it does not appear to have synchronization by default. To do this, you “Save Page As” and then have the option of saving as a text file, as the single page, or as “complete,” which gathers all the files needed for that page. Unlike IE6, you cannot get the pages it links to. There is a Firefox add-on that will provide added functionality, but this article’s scope is default functionality. Also, Firefox does not change links to localhost paths.

WGET

The previous options are great for GUI systems; however, if you are a sysadmin or a web developer who needs to back up a current live site before replacing it with a new version, neither of those options is very good for you. So here is a command-line, no-GUI option. Running wget with the -r (recursive) option provides much the same functionality as IE6. Simply create a directory, change into it, and run:

wget -r site.com

and you have all the client-side data. Much easier, and without a GUI.
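For a mirror that is meant to be browsed from disk, wget’s own manual documents a few extra flags worth combining with recursion. A minimal sketch, assuming GNU wget; site.com is just a placeholder, and the script only builds and prints the command so you can review it before running it yourself:

```shell
#!/bin/sh
# Sketch of a fuller mirror command (all flags are documented GNU wget options):
# -m  mirror: recursive download with timestamping, handy for re-syncing later
# -k  convert links in the saved pages to point at the local copies
# -p  also fetch page requisites (CSS, images, scripts)
# -E  append .html where needed so pages open cleanly from disk
SITE="site.com"                     # placeholder target; substitute your site
CMD="wget -m -k -p -E $SITE"
echo "$CMD"                         # printed for review rather than executed
```

The -k flag is the notable one here: it rewrites link targets so the saved pages reference the local copies instead of the live site.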

These options bring up 2 more topics I’d like to cover. The first is from the perspective of the site owner. Suppose you don’t want people downloading your content. For a dedicated person this is not preventable, but you can make it more difficult and annoying. IE6 and wget both follow the robots.txt rules; this is not an issue for Firefox 2, since it doesn’t have this functionality by default anyway. In short, other than making it less convenient, any data you send to a client (HTML, CSS, JavaScript) will be available for backup, which is obvious since it is client-side data, and the web would be useless if it were inaccessible.
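As a concrete illustration of how thin that protection is, wget documents a setting that turns off its robots.txt handling entirely. A sketch under the same assumptions as before (site.com is a placeholder, and the command is only printed, not executed):

```shell
#!/bin/sh
# wget obeys robots.txt during recursive fetches by default; the documented
# -e robots=off setting disables that courtesy, which is why robots.txt
# deters polite tools but not a dedicated person.
SITE="site.com"                     # placeholder target
CMD="wget -r -e robots=off $SITE"
echo "$CMD"                         # printed for review rather than executed
```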

The other topic is client-side security. Browsers disallow cross-site AJAX requests. This is a security feature that stops a malicious individual from putting AJAX calls to other sites on their page and stealing your personal information. Browsers do, however, allow this behavior from localhost. So if you download malicious code and view it locally, it will execute. Also, by running JavaScript from your local file system, you may allow malicious individuals to access those files.

Interestingly, it seems that Internet Explorer 6 actually beats the default install of Firefox 2 in this test. It seems Microsoft did a good job on this feature. wget doesn’t really enter into that comparison, since the first two are browsers and wget is a utility; however, it also tops Firefox 2 by being recursive. As backup utilities, wget and Internet Explorer 6 are tied, since they both preserve pages, links included. Personally, I prefer wget, since I’m not a fan of GUIs or tools that won’t run under Linux.

About samurai

I like computers... A lot. So I tend to spend a lot of time doing varied things with them. Often you'll find me playing with Python or PHP, fighting with operating systems, ranting about some off-the-wall concept, or preparing for zombies.
This entry was posted in SamuraiNet Archive. Bookmark the permalink.

2 Responses to Utilities for backing up client-side website data

  1. You should also mention that Firefox actually restructures the links to point to a FILENAME_files/ directory that stores all the linked content (only the internal linked content) like CSS, JavaScript, Image, and Flash files.

    I don’t know what IE 6 does, but while wget has recursiveness, it doesn’t restructure the links, so you can’t get the page to act normally without editing link targets like Firefox does.

  2. admin says:

    IE6 doesn’t have this problem since it is simply “working offline.” Therefore it still uses all the same links and paths, but just pulls from its cache rather than from the web.
