Anyone know how to download an entire website?
Moderator: Thanas
-
- Sith Marauder
- Posts: 4736
- Joined: 2005-05-18 01:31am
Anyone know how to download an entire website?
I was recently pointed toward the official Crimson Skies website, which I'm interested in reading because the game is that awesome. However, as it says on the front page, the site will be closed soon. That sucks, because I want to be able to read it at my leisure.
So, anyone know how I can transfer the whole thing to my hard-drive?
- General Zod
- Never Shuts Up
- Posts: 29211
- Joined: 2003-11-18 03:08pm
- Location: The Clearance Rack
Flashget might do it.
"It's you Americans. There's something about nipples you hate. If this were Germany, we'd be romping around naked on the stage here."
phongn wrote: wget to the rescue!
wget -r -l3 www.example.com/index.html
That snippet will recursively vacuum an entire website, three link-levels deep. If you leave out -l, the default depth is 5; -l inf removes the limit. (corrections welcome)
I concur - finding an open-source version of wget (hopefully with a graphical interface - nice, but certainly not required) should work for you just fine.
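A minimal sketch of the depth point, assuming a bash shell (Linux, Cygwin or similar) with GNU wget on the PATH. `siterip` is a hypothetical helper name, not part of wget, and the `WGET` variable is a hypothetical override that exists only so the command can be checked without touching the network. Left unset, the helper just runs `wget -r -l DEPTH URL`, falling back to wget's own default of 5.

```shell
#!/bin/bash
# Sketch: recursive fetch with an explicit depth limit.
# wget stops at depth 5 when -l is not given; "-l inf" removes the limit.
# WGET is a hypothetical override (e.g. a dry-run stub); defaults to wget.
siterip() {
    local url=$1
    local depth=${2:-5}    # same as wget's built-in default
    # Accept a non-negative whole number or the word "inf".
    if [ "$depth" != inf ] && ! [[ $depth =~ ^[0-9]+$ ]]; then
        echo "siterip: depth must be a number or 'inf'" >&2
        return 1
    fi
    "${WGET:-wget}" -r -l "$depth" "$url"
}
```

For example, `siterip www.example.com/index.html 3` runs the same command as the snippet above, while `siterip www.example.com/ inf` drops the limit entirely, which on a big site can pull in far more than you bargained for.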
Yeah, at the end of the day, wget is the best option. I tried various Windows-based utilities in the hope of finding something that works, but in the end wget was the only one that got the site you were talking about with a minimum of fuss. The only issue was that it first downloaded the site into the void: I had it in Program Files and didn't specify an output directory, which caused Windows to ignore wget's output.
Get wget for Windows from this link, and then unzip it to a directory.
Then run cmd.exe, navigate to the directory (folder) where you unzipped it (type "cd C:\nameofdirectory\nameofsubdirectory", without quotes and with the proper names, obviously), and then execute wget with something like this (you can get help by typing "wget -h"):
wget -r -np -p --directory-prefix=C:\dirwhereyouwanttodownload www.crimsonskiesuniverse.com/story/
(This will get you the story pages. Unfortunately, the links seem to be screwed up: they were written as static links in a weird way, and apparently the -k option, which converts links for local viewing, is not smart enough to parse them correctly and compensate.)
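If broken links in the saved copy are the problem, a couple of extra options may help. This is a sketch, assuming a bash shell and a wget build that supports `-E` (long name `--adjust-extension` in recent versions, `--html-extension` in older ones); the helper name `offline_copy` and the `WGET` override are hypothetical, for illustration only. In cmd.exe you would simply type the same options after `wget`. `-k` should fix ordinary links, but it won't help with links that only exist inside JavaScript.

```shell
#!/bin/bash
# Sketch: save one section of a site for offline reading.
#   -r   follow links recursively
#   -np  never climb above the starting directory
#   -p   also fetch the images/stylesheets each page needs
#   -k   after downloading, rewrite links to point at the local copies
#   -E   append .html to HTML pages saved under other names,
#        e.g. default.htm?MSID=... (older builds: --html-extension)
#   --restrict-file-names=windows  escape characters Windows forbids in
#        file names ("?" becomes "@"); Windows builds do this by default
# WGET is a hypothetical override (e.g. a dry-run stub); defaults to wget.
offline_copy() {
    local url=$1
    local dest=${2:-.}     # download folder, current directory by default
    "${WGET:-wget}" -r -np -p -k -E --restrict-file-names=windows \
        --directory-prefix="$dest" "$url"
}
```

For example, `offline_copy www.crimsonskiesuniverse.com/story/ /tmp/cs` would save the story pages under `/tmp/cs`.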
I've used HTTrack to download quite a few porn sites as well as archiving several tech reference sites. Comes with a GUI and more options than I can count, including link-level depth, bandwidth usage, passwords for pay sites, and many others.
aerius: I'll vote for you if you sleep with me.
Lusankya: Deal!
Say, do you want it to be a threesome with your wife? Or a foursome with your wife and sister-in-law? I'm up for either.
- Shroom Man 777
- FUCKING DICK-STABBER!
- Posts: 21222
- Joined: 2003-05-11 08:39am
- Location: Bleeding breasts and stabbing dicks since 2003
aerius wrote: I've used HTTrack to download quite a few porn sites as well as archiving several tech reference sites. Comes with a GUI and more options than I can count, including link-level depth, bandwidth usage, passwords for pay sites, and many others.
You mean to say I can download any pornsite, without paying?!
"DO YOU WORSHIP HOMOSEXUALS?" - Curtis Saxton (source)
shroom is a lovely boy and i wont hear a bad word against him - LUSY-CHAN!
Shit! Man, I didn't think of that! It took Shroom to properly interpret the screams of dying people - PeZook
Shroom, I read out the stuff you write about us. You are an endless supply of morale down here. :p - an OWS street medic
Pink Sugar Heart Attack!
- Einhander Sn0m4n
- Insane Railgunner
- Posts: 18630
- Joined: 2002-10-01 05:51am
- Location: Louisiana... or Dagobah. You know, where Yoda lives.
Shroom Man 777 wrote: aerius wrote: I've used HTTrack to download quite a few porn sites as well as archiving several tech reference sites. Comes with a GUI and more options than I can count, including link-level depth, bandwidth usage, passwords for pay sites, and many others.
You mean to say I can download any pornsite, without paying?!
How about'sa test? NSF FUCKING W for the slower people in the audience!
And I believe the act being discussed in this thread is called a 'siterip'. Ignore the pornsite links at the top.
Shroom Man 777 wrote: aerius wrote: I've used HTTrack to download quite a few porn sites as well as archiving several tech reference sites. Comes with a GUI and more options than I can count, including link-level depth, bandwidth usage, passwords for pay sites, and many others.
You mean to say I can download any pornsite, without paying?!
If you have the password, yes, but any competent admin will make sure leaked passwords don't work for long.
And damnit man, there are better ways of getting free porn than ripping websites.
-
- Sith Marauder
- Posts: 4736
- Joined: 2005-05-18 01:31am
Well, I got around to trying a couple of these programs out. I tried HTTrack first, since it seemed the simplest, but couldn't figure it out. Then I used wget with Netko's instructions, and it almost worked: I seem to have downloaded a lot of files, but not a coherent webpage.
This whole business of downloading a website seems more complicated than I thought it would be.
Last edited by Adrian Laguna on 2007-04-09 04:20pm, edited 2 times in total.
Adrian Laguna wrote: Well I got around to trying a couple of these programs out. HTTrack first, since it seemed the most simple couldn't figure it out. Then I used Wget with Netko's instruction and it seems that it almost worked. I seem to have downloaded a lot of files, but not a coherent webpage.
No default.htm? Wget worked fine for me.
-
- Sith Marauder
- Posts: 4736
- Joined: 2005-05-18 01:31am
There is, actually, a "default.htm", though its complete name is "default.htm@MSID[bunch of numbers]". Windows can't open it.
Last edited by Adrian Laguna on 2007-04-09 04:24pm, edited 1 time in total.
Adrian Laguna wrote: There is, actually, a "default.htm" thought it's complete name is "default.htm@MSID[bunch of numbers]". Windows can't open it.
Remove everything after the .htm, or force Windows to open it with Firefox. If you remove the letters after the extension, the link back to the main page will break, but that's a minor issue.
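Renaming by hand gets tedious once there are dozens of these files. A minimal sketch, assuming a bash shell with `find` (e.g. Cygwin on Windows) and assuming the `@...` tail is wget's stand-in for the `?` of a query string. Instead of renaming, it copies each `name.htm@...` file to a plain `name.htm`, so pages that still point at the long name keep working. `fix_query_names` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: give pages saved as "name.htm@<query>" a plain "name.htm" twin
# that Windows will open. Assumes the "@" is wget's stand-in for the "?"
# of a query string. Files are copied, not renamed, so links that point at
# the long names keep working; an existing name.htm is never overwritten.
fix_query_names() {
    local dir=${1:-.} name target
    find "$dir" -type f \( -name '*.htm@*' -o -name '*.html@*' \) -print0 |
    while IFS= read -r -d '' f; do
        name=${f##*/}                    # file name without its folders
        target=${f%/*}/${name%%@*}       # same folder, "@..." tail dropped
        [ -e "$target" ] || cp -- "$f" "$target"
    done
}
```

Run it on the download folder, e.g. `fix_query_names crimsonskiesuniverse.com`. If two files differ only after the `@`, the first one found wins and the others are left alone.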
-
- Sith Marauder
- Posts: 4736
- Joined: 2005-05-18 01:31am
Okay, I tried again with a slightly different approach. Everything works except the pop-up pages. Basically, each subsection of the "universe" page lists subjects (corporations, aircraft, pilots, etc.) that, when clicked, open a pop-up page describing that subject. But all of the pop-ups are blank.
I put the following in the command line:
wget -r -np -p -k --directory-prefix=C:\ http://crimsonskiesuniverse.com/
- Rogue 9
- Scrapping TIEs since 1997
- Posts: 18681
- Joined: 2003-11-12 01:10pm
- Location: Classified
I have the Universe and Story pages from the Crimson Skies site saved already if that's all you want.
It's Rogue, not Rouge!
HAB | KotL | VRWC/ELC/CDA | TRotR | The Anti-Confederate | Sluggite | Gamer | Blogger | Staff Reporter | Student | Musician
-
- Sith Marauder
- Posts: 4736
- Joined: 2005-05-18 01:31am
You could write a script to grep out all the links that are embedded in JavaScript and run a second pass with wget...
ah.....the path to happiness is revision of dreams and not fulfillment... -SWPIGWANG
Sufficient Googling is indistinguishable from knowledge -somebody
Anything worth the cost of a missile, which can be located on the battlefield, will be shot at with missiles. If the US military is involved, then things, which are not worth the cost of a missile will also be shot at with missiles. -Sea Skimmer
George Bush makes freedom sound like a giant robot that breaks down a lot. -Darth Raptor
- Rogue 9
- Scrapping TIEs since 1997
- Posts: 18681
- Joined: 2003-11-12 01:10pm
- Location: Classified
Adrian Laguna wrote: Do they work completely with all the links and stuff? Because that would be really awesome.
I didn't bother altering the HTML to make them link to each other, mainly because I only barely know what I'm doing and would fuck it up. But I have them organized into a folder system similar to the tiers of pages on the site.
Code:
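#!/bin/bash
# Second pass: fetch pages that are only referenced from JavaScript,
# e.g. ('/some/page.htm') inside a call, which wget's recursion doesn't follow.
# Run from the folder that holds the first-pass download; the optional
# argument is the site folder wget created (named after the host).
ROOT=${1:-crimsonskiesuniverse.com}
# First pass, done beforehand:
#wget -m -k -p "$ROOT"
# grep -roH prints "file:('path.ext')" for every quoted path in parentheses.
grep -E -roH "\('[a-zA-Z_./]*\.[a-zA-Z]{3,4}'\)" "$ROOT" |
while IFS= read -r F; do
    FILE=${F%%:*}                  # page the reference was found in
    K=${F#*:}                      # the match itself: ('path.ext')
    K=${K#"('"}                    # strip the leading ('
    K=${K%"')"}                    # strip the trailing ')
    if [ "${K:0:1}" = '/' ]; then
        TOGET=$ROOT$K                  # path from the site root
    else
        TOGET=$(dirname "$FILE")/$K    # path relative to the referring page
    fi
    wget -r -l 1 -N -p "$TOGET"
done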