.. the "tree" of links in a URL family? Can run in DOS to Win7, output best in ASCII?
- posted
8 years ago
Want the program to "automatically" generate the tree from the web program set; NOT like TreeForm from Sourceforge, which looks something like CorelDraw.
What's a "URL family"?
-- umop apisdn
Maybe you can modify the old PCMag program 'Site snagger' to give you results.
Cheers
wget -r --spider
plus a bit of cleanup from a script or a good text editor.
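As a hedged sketch of that "bit of cleanup": after running something like `wget -r --spider -o spider.log URL`, a few lines of script can pull the visited URLs out of the log. The sample lines below only approximate wget's real log format; treat the regex as a starting point, not gospel.

```python
import re

# Sample of what the wget log roughly looks like; each fetch is logged
# as '--<timestamp>--  <URL>'. These lines are an approximation.
sample_log = """\
--2016-05-01 10:00:00--  http://example.com/index.html
--2016-05-01 10:00:01--  http://example.com/page1.html
--2016-05-01 10:00:02--  http://example.com/page1.html
"""

def visited_urls(log_text):
    # Grab the URL from each fetch line, keeping first occurrences in order.
    urls = re.findall(r"^--[^-].*?--\s+(\S+)$", log_text, flags=re.M)
    seen, out = set(), []
    for u in urls:
        if u not in seen:
            seen.add(u)
            out.append(u)
    return out

print(visited_urls(sample_log))
```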
Cheers
Phil Hobbs
-- Dr Philip C D Hobbs Principal Consultant ElectroOptical Innovations LLC Optics, Electro-optics, Photonics, Analog Electronics 160 North State Road #203 Briarcliff Manor NY 10510 hobbs at electrooptical dot net http://electrooptical.net
Do not know what proper term is, but take a simplified tree sample:
index.html
|
+-page1.html
| |
| +-explan1.html
|
+-page2.html
|
+-page3.html
Above "family" consists of the 5 HTML files; the tree is derived by starting at the root URL (index.html above).
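Once the page-to-links mapping has been collected somehow, printing a tree like the sample above is a few lines of script. A minimal sketch, assuming a hypothetical `links` dict built from the five files in the example:

```python
# Hypothetical page -> child-links mapping, as a spider might collect it;
# the file names are just the ones from the sample tree.
links = {
    "index.html": ["page1.html", "page2.html", "page3.html"],
    "page1.html": ["explan1.html"],
}

def tree_lines(page, depth=0, seen=None):
    """Return the ASCII tree rooted at `page`, one line per page."""
    if seen is None:
        seen = set()
    lines = [("| " * (depth - 1) + "+-" if depth else "") + page]
    if page not in seen:          # guard against circular links
        seen.add(page)
        for child in links.get(page, []):
            lines.extend(tree_lines(child, depth + 1, seen))
    return lines

print("\n".join(tree_lines("index.html")))
```

Real sites link in circles, so the `seen` set matters more than it looks here.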
That may be a very good idea, except..
1) I do NOT want a damn watch
2) I absolutely HATE WITH A PASSION those GD Fing overlays like that
3) My hard drive is not large enough to hold the site I have in mind
4) It would take a number of DAYS even on a T1 line to download the whole thing. I do not exaggerate.
5) Also, I am cheap - $7.95 is a bit much.
The SourceForge Snagger clearly is NOT what I want.
Found a free source for Site Snagger 1.2 which goes "Have you ever wanted to download the content of a website and store them offline? SiteSnagger is a little freeware program that works great for doing this. SiteSnagger lets you download the contents of a Web site to your PC's hard disk. It's been around for a good few years, but is still incredibly useful. You can download as much or as little of the site as you wish, and you can browse the information anytime and anywhere."
Not too bad, if one really can control what gets pulled before the fact. Then I maybe can get away with a cheap 500G HD for JUST download; near 40 hours via my high speed download with no interruptions. Two daze..
That is just enough gibberish that I can say "gotza bee lyne-UX". Am guessing one must be logged in to the site in question. Do not think that even Linux has an editor that can work with a multi-hundred gigabyte file.
Well, I generally use it from Windows under Cygwin, but native Windows executables are around.
No, it just uses HTTP/HTTPS/FTP.
Spider mode just touches all the links to see that they're good. It doesn't download anything. I use it for website maintenance.
If you aren't comfortable scripting, then this probably isn't your best solution.
Cheers
Phil Hobbs
-- Dr Philip C D Hobbs Principal Consultant ElectroOptical Innovations LLC Optics, Electro-optics, Photonics, Analog Electronics 160 North State Road #203 Briarcliff Manor NY 10510 hobbs at electrooptical dot net http://electrooptical.net
Hmmm..."scripting" sounds like writing a program in a language i do not know. To touch all of the links, one must read all of the code to find them. With (very little) download, that would speed things up by maybe a factor of two. So, what language and what Win executables should i look for?
It means writing a small interpreted program, usually one that uses other utilities to get the job done as well as the script itself. It originally referred to a Unix shell script, but then a lot more interpreted languages came along--e.g. perl, python, php, tcl, and (my fave) rexx.
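To make "uses other utilities" concrete, here is a toy sketch in one of those interpreted languages (Python): a few lines of script that lean on the Unix `sort` utility to do the actual work. The file names are made up for illustration, and this assumes a Unix-like system where `sort` is on the path.

```python
import subprocess

# A scrap of data the script itself holds...
urls = "page2.html\nindex.html\npage1.html\n"

# ...handed off to an external utility (sort) to do the heavy lifting.
result = subprocess.run(["sort"], input=urls, capture_output=True, text=True)
print(result.stdout)
```

That division of labor - glue logic in the script, real work in existing tools - is the whole idea of scripting.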
Just the html, which usually isn't the bulk of the content.
How many times a second do your links change?
Well, start with wget, and see if its spider mode does what you need. I use Cygwin, but if you search, you'll find standalone Windows builds.
Even if it won't work in your application, it's great for batch downloads and making mirrors. I snagged the NS website that way before it vanished.
Cheers
Phil Hobbs
These silly programs claim they will draw a tree (which I want), but so far, none of them do that. Site Snagger does give a list of HTMLs found, which was a bit useful. Took a few hours for a few levels, about 40 MB for just the list given. If the whole of each HTML had been downloaded, I estimate the time would have been around 20 hours and a goodly number of gigglechomps. Oh well..