
Anyone know how to download an entire website?

Posted: 2007-04-04 05:17pm
by Adrian Laguna
I was recently pointed in the direction of the official Crimson Skies website, which I'm interested in reading because the game is that awesome. However, as it says on the front page, the site will be closed soon. This sucks because I want to be able to read it at my leisure.

So, anyone know how I can transfer the whole thing to my hard-drive?

Posted: 2007-04-04 05:25pm
by General Zod
Flashget might do it.

Posted: 2007-04-04 05:29pm
by phongn
wget to the rescue!

Posted: 2007-04-04 06:54pm
by rhoenix
phongn wrote:wget to the rescue!
I concur - grabbing a copy of wget (hopefully one with a graphical interface - nice, but certainly not required) should work for you just fine.
A command like the one sketched below will recursively vacuum up an entire website, three link-levels deep; if you don't specify a depth, wget defaults to five. (corrections welcome)
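Something along these lines, say - the URL here is just a placeholder, -r turns on recursion and -l caps the link depth:

wget -r -l 3 http://www.example.com/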

Posted: 2007-04-04 07:11pm
by Netko
Yeah, at the end of the day, wget is the best option. I tried various Windows-based utilities hoping one of them would work, but in the end it was wget that got the site you're talking about with a minimum of fuss. The only issue was that it first downloaded the site into the void, because I had wget sitting in Program Files and didn't specify an output directory, so Windows ignored wget's output.

Get wget for Windows from this link, then unzip it to a directory.

Then run cmd.exe, navigate to the directory (folder) where you unzipped it (type "cd C:\nameofdirectory\nameofsubdirectory", without the quotes and with the proper names, obviously), then run wget with something like this (you can get help by typing "wget -h"):

wget -r -np -p --directory-prefix=C:\dirwhereyouwanttodownload www.crimsonskiesuniverse.com/story/

(this will get you the story pages - unfortunately, the links seem to be screwed up because they were made as static links in a weird way, and apparently the -k option isn't smart enough to parse them and compensate)
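(for reference, -k is wget's --convert-links option, so the attempt with link conversion switched on would just be the same command with it tacked on:)

wget -r -np -p -k --directory-prefix=C:\dirwhereyouwanttodownload www.crimsonskiesuniverse.com/story/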

Posted: 2007-04-04 08:54pm
by aerius
I've used HTTrack to download quite a few porn sites as well as archiving several tech reference sites. Comes with a GUI and more options than I can count, including link-level depth, bandwidth usage, passwords for pay sites, and many others.

Posted: 2007-04-04 09:07pm
by Solauren
Ultrasucker is also very good, but can be annoying to configure "just right"

Webreaper is also pretty good

Posted: 2007-04-05 04:16am
by Bounty
I've just used the Linux version of wget to mirror the CS website and it worked like a charm.

Posted: 2007-04-05 04:21am
by Shroom Man 777
aerius wrote:I've used HTTrack to download quite a few porn sites as well as archiving several tech reference sites. Comes with a GUI and more options than I can count, including link-level depth, bandwidth usage, passwords for pay sites, and many others.
You mean to say I can download any pornsite, without paying?!

Posted: 2007-04-05 04:41am
by Einhander Sn0m4n
Shroom Man 777 wrote:
aerius wrote:I've used HTTrack to download quite a few porn sites as well as archiving several tech reference sites. Comes with a GUI and more options than I can count, including link-level depth, bandwidth usage, passwords for pay sites, and many others.
You mean to say I can download any pornsite, without paying?!
How about'sa test? NSF FUCKING W for the slower people in the audience!

And I believe the act being discussed in this thread is called a 'siterip'. Ignore the pornsite links at the top :P

Posted: 2007-04-05 04:44am
by Bounty
Shroom Man 777 wrote:
aerius wrote:I've used HTTrack to download quite a few porn sites as well as archiving several tech reference sites. Comes with a GUI and more options than I can count, including link-level depth, bandwidth usage, passwords for pay sites, and many others.
You mean to say I can download any pornsite, without paying?!
If you have the password, yes - but any competent admin will make sure leaked passwords don't work for long.

And damnit man, there are better ways of getting free porn than ripping websites.

Posted: 2007-04-05 04:48am
by Ace Pace
Bittorrent? :wink:

Posted: 2007-04-09 04:00pm
by Adrian Laguna
Well, I got around to trying a couple of these programs out. HTTrack first, since it seemed the simplest, but I couldn't figure it out. Then I used wget with Netko's instructions, and it almost worked: I seem to have downloaded a lot of files, but not a coherent webpage.

This whole business of grabbing a website off the internet is more complicated than I thought it would be.

Posted: 2007-04-09 04:06pm
by Bounty
Adrian Laguna wrote:Well, I got around to trying a couple of these programs out. HTTrack first, since it seemed the simplest, but I couldn't figure it out. Then I used wget with Netko's instructions, and it almost worked: I seem to have downloaded a lot of files, but not a coherent webpage.
No default.htm? Wget worked fine for me.

Posted: 2007-04-09 04:22pm
by Adrian Laguna
There is, actually, a "default.htm", though its complete name is "default.htm@MSID[bunch of numbers]". Windows can't open it.

Posted: 2007-04-09 04:24pm
by Bounty
Adrian Laguna wrote:There is, actually, a "default.htm", though its complete name is "default.htm@MSID[bunch of numbers]". Windows can't open it.
Remove everything after the .htm, or force Windows to open it with Firefox. If you remove the letters after the extension, the link back to the main page will break, but that's a minor issue.
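If renaming things by hand gets old, wget also has an -E (--html-extension) switch which, as far as I know, tacks .html onto any saved page whose name doesn't already end in .htm or .html, so Windows at least knows to hand it to a browser. Something like:

wget -r -np -p -k -E --directory-prefix=C:\dirwhereyouwanttodownload www.crimsonskiesuniverse.com/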

Posted: 2007-04-09 04:35pm
by Adrian Laguna
Okay, I tried again with a slightly different approach. Everything works except the pop-up pages. Basically, each of the subsections in the "universe" page has subjects (corporations, aircraft, pilots, etc.) that, when clicked, show a pop-up page about that subject. But all of the pop-ups are blank.

I put the following in the command line:
wget -r -np -p -k --directory-prefix=C:\ http://crimsonskiesuniverse.com/

Posted: 2007-04-09 04:39pm
by Bounty
Crapsicles, wget didn't get those. You can get to them by adding the corp name to the /universe/corporations URL (or pilots, or tech or whatever). Start downloading, man!

So, can wget be made to follow Javascript links?

Posted: 2007-04-10 06:57pm
by Rogue 9
I have the Universe and Story pages from the Crimson Skies site saved already if that's all you want.

Posted: 2007-04-10 08:11pm
by Adrian Laguna
Do they work completely with all the links and stuff? Because that would be really awesome.

Posted: 2007-04-10 09:55pm
by Pu-239
You could write a script to grep all links that are in javascript and run a 2nd pass using wget...

Posted: 2007-04-11 06:14am
by Rogue 9
Adrian Laguna wrote:Do they work completely with all the links and stuff? Because that would be really awesome.
I didn't bother altering the HTML to make them link to each other, mainly because I only barely know what I'm doing and would fuck it up. But I have them organized into a folder system similar to the tiers of pages on the site.

Posted: 2007-04-11 09:01pm
by Pu-239

Code: Select all

#! /bin/bash
# Second-pass scraper: pull the URLs hidden inside javascript calls
# (they show up as ('something.htm') in the mirrored pages) and hand them to wget.
#ROOT=$1
ROOT='crimsonskiesuniverse.com'
#wget -m -k -p $1
# Every ('path/file.ext') occurrence in the mirror, printed as filename:match
LIST=`egrep -roH "\('[a-zA-Z_./]*\.[a-zA-Z]{3,4}'\)" $ROOT`
for F in $LIST; do
        # K = the quoted path with the (' and ') stripped off
        K=`echo $F|cut -f 2 -d':'|sed "s/('//"|sed "s/')//"`
        if [ `echo $K|sed 's/\(^.\{1\}\).*/\1/'` = '/' ]; then
                # Absolute path: just glue it onto the site root
                TOGET=$ROOT$K
        else
                # Relative path: use the directory of the file the link was found in
                J=`echo $F|cut -f 1 -d':'|sed "s#^(.*##"|sed "s/\/[^/]*\$//"`
                if [ ! -z "$J" ]; then
                        I=$J
                fi
                TOGET=$I/$K
        fi
        wget -r -l 1 -N -p $TOGET
done
will suck down the javascript and css links as well as downloading most of the site - I still need to fix the absolute/relative URL problems in the scripts. It would probably work if you threw the site mirror up in the root directory of a server; absolute paths don't work well when browsed straight as files, though, and I'm too lazy to get apache back up to test. It should also be fairly straightforward to write a script to fix all the links, but again, I'm lazy. I think most of the content is there, unless the CSS/javascript is inconsistent and uses double quotes as well as single quotes (which would be trivial to rectify). Only 1 level of recursion on the 2nd pass for the javascript links, since it looked like it went into infinite recursion otherwise.
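In case anyone wants to try it, the rough workflow would be: do the normal mirror first, then run the script from the directory that ends up containing the crimsonskiesuniverse.com folder (jsgrab.sh is just a made-up name for whatever file you save the script as):

Code: Select all

wget -m -k -p crimsonskiesuniverse.com   # first pass: ordinary mirror (the commented-out line in the script)
chmod +x jsgrab.sh                       # jsgrab.sh = whatever you saved the script as
./jsgrab.sh                              # second pass: chase down the javascript/css links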