This document aims to provide basic usage information for Hwrt, and is provided without warranty of any kind. Good luck!
Hwrt is just a shell script. Put it wherever you keep your other scripts and make sure it's executable:
mkdir ~/bin # maybe you don't have a "~/bin" at all
echo 'PATH="$PATH:$HOME/bin"' >> ~/.profile # if needed
mv hwrt.sh ~/bin/hwrt # if you don't like seeing ".sh"
chmod +x ~/bin/hwrt # change file mode to "executable"
type -p hwrt # make sure hwrt is in your path
Don't just copy-paste that without thinking — it was meant as an example for inexperienced users.
After putting Hwrt somewhere in your path and rendering it executable, cross your fingers and type[1] something like
hwrt source destination_directory
If your destination directory
(which, I might add, must exist before you run Hwrt)
is "public_html",
you can omit the second argument;
moreover,
if your source node is a regular file
whose name has a stem of "index",
you can omit the first argument, too.
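Since the destination directory has to exist before Hwrt runs, you may want to create it as part of the same command. The following is only a sketch: `hwrt_prepare` is a hypothetical helper of my own invention, not something that ships with Hwrt, and the usage line assumes that `hwrt` is already in your path.

```shell
# hwrt_prepare: hypothetical helper, NOT part of Hwrt.
# Creates the destination directory if it is missing and prints its name.
# With no argument it falls back to "public_html", the default destination.
hwrt_prepare() {
    dest="${1:-public_html}"
    mkdir -p -- "$dest" || return 1
    printf '%s\n' "$dest"
}

# Usage (assumes hwrt is in your path):
#   hwrt source "$(hwrt_prepare destination_directory)"
```

Using `mkdir -p` means the helper does nothing harmful when the directory is already there.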
In the sections that follow,
a file called “.hwrt_profile”
is mentioned several times;
it is a configuration file from which
Hwrt can read default values for many variables.
Ordinarily, Hwrt looks for this file in the directory from which it is invoked, but you can use something like
hwrt -p my_dir/my_hwrt_profile
to specify a different location for the configuration file.
By default, Hwrt prints a period for every node that it visits; if you want to suppress this output during a given invocation, you can type
hwrt -q
If you want to suppress default output for every invocation, put
HWRT_VERBOSITY=0
in your “.hwrt_profile”.
If you want to watch Hwrt crawl[2] (maybe because you're bored), type
hwrt -v
instead, which causes Hwrt to output a tree-like representation of your hypertext web “as it happens”. This may come in handy if your pageset is broken in some obscure way and you are having trouble making sense of the trace output in the logs.
There are two ways in which Hwrt can be made to stop:
In either case,
Hwrt's files may end up in an inconsistent state;
for example,
some cache entries may have a “current” timestamp
(and thus be deemed valid by Hwrt) but incomplete contents.
Hwrt attempts to recover automatically from such accidents and,
in general, manages quite well;
however,
should Hwrt not manage to recover from an interruption,
you can force a “recovery”
by using the -r switch:
hwrt -r
By default, Hwrt tries to avoid uploading a broken pageset; however, if you want Hwrt to ignore all errors and soldier on, you can use
hwrt -i
This may be useful if you are in a great hurry and just want to upload the part that's working before looking at the error messages.
To make this the default behavior, put
HWRT_IGNORE_ERRORS=1
in your “.hwrt_profile”.
You can achieve a similar effect by specifying
hwrt -l 0
which sets the log level to zero and, thereby, instructs Hwrt not to keep track of errors at all. This means that things will fail silently, and that there will be no trace information in the logs; therefore, be careful when you use this feature.
As you may have guessed from the foregoing section, the log level switch allows you to tell Hwrt how thorough a record of its actions it should keep. Increasing the log level from the default 1 to 2 enables logging of warnings, while typing
hwrt -l 3
will cause operations like clobbering and copying to be logged to the messages file.
You don't have to remember this at all.
No, really.
Whenever a log file is non-empty,
Hwrt tells you where it is and suggests that you look at it
— unless
there are no errors and you've told Hwrt to be quiet,
in which case Hwrt tells you nothing at all.
OK, so occasionally you need to know where the log files are;
in this unlikely event,
you would look for files with stems like
“errors”,
“warnings”,
and “messages”
in a directory called “.hwrt”
found wherever Hwrt was invoked from.
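If you ever do go looking, something like the following can list the log files that actually contain anything. Treat it as a sketch under assumptions: `hwrt_logs` is a hypothetical helper, not part of Hwrt, and it assumes that the log files sit directly inside “.hwrt” with names beginning with the stems mentioned above.

```shell
# hwrt_logs: hypothetical helper, NOT part of Hwrt.
# Prints the path of every non-empty log file in ./.hwrt (or in the directory
# given as the first argument), one per line.
# Assumption: log file names start with "errors", "warnings" or "messages".
hwrt_logs() {
    dir="${1:-.hwrt}"
    [ -d "$dir" ] || return 0
    for f in "$dir"/errors* "$dir"/warnings* "$dir"/messages*; do
        # -f: regular file; -s: size greater than zero
        if [ -f "$f" ] && [ -s "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Run `hwrt_logs` from the directory in which you invoked Hwrt; it prints nothing when every log is empty or when there is no “.hwrt” directory.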
To enable automatic pageset uploads, put something like
HWRT_SITE_URL="http://example.com/~user/" # [3]
HWRT_REMOTE_TARGET_ROOT_PATH="user@example.com:www/~user/"
HWRT_AUTO_UPLOAD="1"
in your “.hwrt_profile”.
On my system,
I actually leave the HWRT_AUTO_UPLOAD out and
simply type
hwrt -u
whenever I do want the automatic upload to happen.
Note that, when using such a configuration,
you don't have to invoke Hwrt again
if you forget the -u:
if your “.hwrt_profile”
is set up as above,
Hwrt will suggest an rsync command
that you can simply copy-paste and execute.
If you use Google Sitemaps, you can ensure the presence and prevent the removal of your verification file by including something like
HWRT_VERIFICATION_FILE_NAME=googlea3d2c4f6a0a5e197.html
in your “.hwrt_profile”.
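Putting the pieces together, a complete “.hwrt_profile” might look something like the sketch below. It uses only the variables mentioned in this document and assumes that the profile is read as plain shell variable assignments; the host, user, and verification file names are the same placeholders as above, so substitute your own.

```shell
# Example .hwrt_profile (a sketch; all values are placeholders)

# Suppress the default period-per-node output
HWRT_VERBOSITY=0

# Where the pageset lives once uploaded
HWRT_SITE_URL="http://example.com/~user/"
HWRT_REMOTE_TARGET_ROOT_PATH="user@example.com:www/~user/"

# Left commented out so that uploads happen only with "hwrt -u"
#HWRT_AUTO_UPLOAD="1"

# Left commented out so that a broken pageset is not uploaded by default
#HWRT_IGNORE_ERRORS=1

# Keep the verification file from being removed
HWRT_VERIFICATION_FILE_NAME=googlea3d2c4f6a0a5e197.html
```

Leaving the upload and ignore-errors settings commented out keeps the cautious defaults, and you can still switch either behavior on for a single run with -u or -i.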