ht://Dig Copyright © 1995-2002 The ht://Dig Group
Please see the file COPYING for
license information.
Since ht://Dig is gaining popularity, it's good that the project gets mirrored. Mirroring is of vital importance for improved availability and response time and, of course, to save bandwidth for the main server. This document is about how to mirror all or part of the ht://Dig web and FTP sites. Make sure you read it all.
There are four sites you can mirror:
Developing source code with possibly a large number of contributors spread around the world is a tedious task and requires good coordination. This coordination is provided by a piece of software known as Concurrent Versions System (CVS). This is why we use CVS for software development. But the web site is frequently updated by developers too; text is added, changed or deleted, new pages created etc. For the same reasons as developing software, we placed the web site in a CVS repository. More information on CVS can be found in the CVS online documentation. Note: There are a lot of options for use with CVS, which are not explained here. It's merely a short howto on how to set up a local mirror of the ht://Dig project. You should use version 1.10 or higher.
Wget is a software package that enables you to create a mirror (an exact copy) of an FTP or web site. It is published under the GNU license and should run on virtually every platform. If you don't like Wget, you can try Mirror, which is a very good alternative for mirroring FTP sites. To learn more about Wget, see the Wget web site. You should use version 1.7 or higher.
Alas, SourceForge does not provide an Rsync service and, as far as we know, they do not intend to.
Commands are written like cvs(1), where (1) means that information can be found in section one of the man pages.
In the examples, the mirror site is htdigmirror.org. The WWW document root is /home/htdigmirror/www/
You need to have cvs(1) installed on your system. It will not work without. Check out:
man cvs
If you don't have it on your box, ask the administrator to install it for you.
You'll need to gain anonymous access to the CVS repository.
cvs -d:pserver:anonymous@cvs.htdig.sourceforge.net:/cvsroot/htdig login
When asked for a password, leave it blank (i.e. press the enter key). You only need to do this once since cvs(1) will create a CVS password file .cvspass in your home directory that will be used in future invocations.
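Before going further, it can help to check that both prerequisites are in place. The following is only a sketch: the function names have_tool and cvspass_mentions are made up for illustration, and the check relies on nothing more than the statement above that cvs(1) records logins in a .cvspass file in your home directory.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command can be found by the shell.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: succeed if the password file mentions the given host.
# Usage: cvspass_mentions HOST [PASSFILE]   (PASSFILE defaults to ~/.cvspass)
cvspass_mentions() {
    host=$1
    passfile=${2:-$HOME/.cvspass}
    # -F: match HOST as a fixed string, -q: no output, only an exit status
    [ -f "$passfile" ] && grep -Fq -- "$host" "$passfile"
}
```

A possible use is `have_tool cvs && cvspass_mentions cvs.htdig.sourceforge.net || echo "log in first"`.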
Create a directory where you will place your local copy. You can pick any legal name you want, but for the sake of simplicity we'll name the directory htdig. Note that the cvs(1) command later on will create another directory under the just-created htdig directory.
cd /home/htdigmirror/www/
mkdir htdig
Change to that directory.
cd htdig
Check out the maindocs module. In newbie-speak: create a local copy of the ht://Dig web site.
cvs -z6 -d:pserver:anonymous@cvs.htdig.sourceforge.net:/cvsroot/htdig \
co -d maindocs maindocs
(Note the backslash at the end of the first line of the command. In Un*x this is the line-continuation character, which means that the two code lines should be read as one.) So what does the line of code mean? We are accessing the ht://Dig repository at cvs.htdig.sourceforge.net in directory /cvsroot/htdig via the password server (pserver), checking out (co) module maindocs using gzip(1) compression (-z6) and placing it in a local directory called maindocs. We could have left out the -d maindocs as this is the default. You will see some output on your terminal. The cvs(1) command has created a subdirectory named maindocs.
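To make the parts of the -d argument explicit, the sketch below assembles the repository string from its components and prints the checkout command instead of running it. The helper name pserver_root and the variable CVSROOT_HTDIG are made up for illustration, and the user name anonymous is an assumption based on the anonymous access described above.

```shell
#!/bin/sh
# Hypothetical helper: assemble a pserver repository string from its parts.
# Usage: pserver_root USER HOST REPOSITORY_PATH
pserver_root() {
    printf ':pserver:%s@%s:%s\n' "$1" "$2" "$3"
}

# Host and path are taken from the explanation above; "anonymous" is an assumed user name.
CVSROOT_HTDIG=$(pserver_root anonymous cvs.htdig.sourceforge.net /cvsroot/htdig)

# Print the checkout command rather than executing it, so it can be reviewed first.
echo cvs -z6 -d"$CVSROOT_HTDIG" co -d maindocs maindocs
```

Keeping the repository string in one variable means the same value can be reused for other modules or branches without retyping it.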
Now you need to adapt your web server configuration file. Maybe you need superuser privileges to do that. Don't forget to turn on server side includes (SSI). At the end there's an example configuration file for use with Apache virtual hosts. Start your favorite browser and surf to the web page. It should be something like http://www.htdigmirror.org/htdig/maindocs/. You did restart your web server, didn't you?
Ready. You've set up a mirror! Please inform the developers at <[email protected]> about your mirror.
You already have cvs(1); otherwise you couldn't have created the initial copy in the first place.
You already have anonymous CVS access.
Change to the directory that holds the copy of the ht://Dig web site.
cd /home/htdigmirror/www/htdig/maindocs
Note that you have to cd(1) to the directory created by cvs(1)!
Start the update by executing
cvs -z6 -q update -Pd
You will see rows with updated files. If there's nothing to update, you will see a new command prompt only; there is no output.
So you've updated your local copy of the ht://Dig web site, but you don't want to do that by hand every day. Solution: set up a crontab(1) entry to update your mirror every day. Example of an entry (a crontab entry must be written on a single line):
40 2 * * * cd /home/htdigmirror/www/htdig/maindocs && /usr/bin/cvs -z6 -q update -Pd
This will run the command every day at 2:40 AM. Depending on your version of cron(8), you will get an e-mail message containing the output of the command that was run. If you do not want any output, you could use
40 2 * * * (cd /home/htdigmirror/www/htdig/maindocs && /usr/bin/cvs -z6 -q update -Pd) >/dev/null 2>&1
It should work on sh(1) oriented shells.
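If the crontab line becomes hard to read, the update step can be moved into a small script that cron calls. This is a sketch under assumptions, not part of the original procedure: the function name update_mirror and the variables MIRROR_DIR and CVS_CMD are hypothetical, and the paths are the example paths used in this document.

```shell
#!/bin/sh
# Hypothetical wrapper around the update step, meant to be called from cron.
# MIRROR_DIR and CVS_CMD are assumed names; both can be overridden from the environment.
MIRROR_DIR=${MIRROR_DIR:-/home/htdigmirror/www/htdig/maindocs}
CVS_CMD=${CVS_CMD:-/usr/bin/cvs}

update_mirror() {
    # Refuse to run outside a CVS working copy; cvs(1) keeps its bookkeeping
    # in a subdirectory named CVS inside every checked-out directory.
    if [ ! -d "$MIRROR_DIR/CVS" ]; then
        echo "update_mirror: $MIRROR_DIR is not a CVS working copy" >&2
        return 1
    fi
    # Same update command as in the crontab entry, run inside the working copy.
    (cd "$MIRROR_DIR" && "$CVS_CMD" -z6 -q update -Pd)
}
```

Saved as a script with a final line that calls update_mirror, it can replace the command part of the crontab entry; output redirection works the same way as before.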
Ready. Please inform the developers at <[email protected]> about your mirror.
In the early days (hmm, not so early for that matter), there was an ht://Dig FTP site that housed release tarballs, binaries, snapshots and contributed work. The FTP service has been abandoned, but fortunately you can still access the files via the web, in the /files directory. Since it's web access, wget(1) is used for retrieval. Note that you cannot copy the files directory via cvs(1) and that there is no real files directory on the ht://Dig web site.
Make sure you have installed wget(1) and read the documentation.
We're going to place the copy of the files into the anonymous FTP area. That way, you can
have people access the files by FTP and by web.
Change to the public directory of the anonymous FTP area and create a subdirectory for holding the files.
cd /home/ftp/pub
mkdir ftp.htdig.org
cd ftp.htdig.org
The files will be placed in the files directory (as you will see shortly).
Copy the files.
wget -nv -m -np -nH -p http://www.htdig.org/files
The -nv option will turn off verbose output, but it will not be very quiet. The -m option tells wget(1) to turn on options suitable for mirroring (as in -r -N -l inf -nr). The -np (no-parent) option will prevent ascending to the parent directory. The option -nH disables generation of host-prefixed directories, so you will not get a directory called www.htdig.org. And last, -p causes wget(1) to download all the files that are necessary to properly display a given HTML page.
This combination of options will result in a directory called files that holds a copy of http://www.htdig.org/files/
There is one drawback: since the files directory at the ht://Dig web site holds no index.html file, the web server over there will create one on the fly. Even more, this generated file contains links to itself that display the files in all kinds of sort orders, like name, last modified, size and description. (These links look like ?N=D, ?M=A etc.) We will have to remove them as they contain links calculated for the ht://Dig web site and will probably not match your copy. Also, the site's robots.txt file is copied. We don't need it either. So we do
rm -f robots.txt
cd files
find . -name index.html -print -exec rm -f {} \;
find . -name "*=*" -print -exec rm -f {} \;
(Note the double quotes round *=*!) This will traverse the files directory and delete any index.html or ?N=D-like files. As a bonus, it will print out the files deleted.
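Since this cleanup has to be repeated after every wget(1) run, it may be convenient to wrap it in a function. The sketch below combines the two find(1) commands into one pass; the function name clean_generated_indexes is made up for illustration, and the added -type f test is a precaution that is not in the original commands.

```shell
#!/bin/sh
# Hypothetical helper: remove generated index pages and sort-order copies
# (names containing "=") below the given directory, printing what is deleted.
clean_generated_indexes() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "clean_generated_indexes: no such directory: $dir" >&2
        return 1
    fi
    # -type f restricts the match to regular files, so directories are never removed.
    find "$dir" -type f \( -name index.html -o -name '*=*' \) -print -exec rm -f {} \;
}
```

Called as `clean_generated_indexes files` from the directory that holds the copy, it has the same effect as the two find(1) lines above without the need to cd(1) first.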
Create an Alias in your web server configuration file so that this mirror can be accessed by web. We'll give an Alias line for Apache:
Alias /htdig/maindocs/files "/home/ftp/pub/ftp.htdig.org/files"
Files should now be accessible via http://www.htdigmirror.org/htdig/maindocs/files/. Don't forget to turn on directory indexing!
Ready. Please inform the developers at <[email protected]> about your mirror.
This is very simple. Repeat steps 2, 3 and 4 of "Setting up a Mirror of the ht://Dig Files Web Site". But there is one drawback (again). As wget(1) uses the links from the generated index.html files as pointers to other files to be fetched, it will leave local files that are no longer listed in that index.html untouched. As a result, within a few weeks you will have a lot of old snapshot files on your mirror and you'll need to remove them by hand.
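Finding the leftovers by hand gets tedious. One possible aid, sketched below, is to compare the local files with a list of the names that should be there and print the difference. Everything here is an assumption: the function name list_unlisted is hypothetical, the list file (one file name per line, relative to the mirror directory) has to be produced by you, and mktemp(1) must be available. File modification times are not used because wget(1) mirroring with -N keeps the remote timestamps.

```shell
#!/bin/sh
# Hypothetical helper: print files present in DIR but absent from LISTFILE.
# LISTFILE is assumed to hold one file name per line, relative to DIR.
list_unlisted() {
    dir=$1
    listfile=$2
    local_names=$(mktemp) || return 1
    wanted_names=$(mktemp) || { rm -f "$local_names"; return 1; }
    # Local files, relative to DIR, sorted in the C locale for comm(1).
    (cd "$dir" && find . -type f | sed 's|^\./||' | LC_ALL=C sort) > "$local_names"
    LC_ALL=C sort "$listfile" > "$wanted_names"
    # comm -23 prints the lines that occur only in the first input.
    LC_ALL=C comm -23 "$local_names" "$wanted_names"
    rm -f "$local_names" "$wanted_names"
}
```

The function only prints candidates; deleting them is left as a separate, deliberate step.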
Making a mirror of the patch site involves the use of wget(1) and is similar to creating a copy of the files web site.
Create a suitable directory in the anonymous FTP area.
cd /home/ftp/pub
mkdir ftp.ccsf.org
cd ftp.ccsf.org
Here we will place the files.
Copy the files.
wget -nv -m -np -nH -p ftp://ftp.ccsf.org/htdig-patches
This combination of options will result in a directory called htdig-patches that holds a copy of ftp://ftp.ccsf.org/htdig-patches
For some reason, wget(1) leaves .listing files behind in your copy. Although they don't do any harm, it's nice to get rid of them.
find . -name .listing -print -exec rm -f {} \;
This will traverse the htdig-patches directory and delete any .listing files. It will also delete the one in the top directory.
Create an Alias in your web server configuration file so that this mirror can be accessed by web. We'll give an Alias line for Apache:
Alias /htdig/maindocs/htdig-patches "/home/ftp/pub/ftp.ccsf.org/htdig-patches"
Files should now be accessible via http://www.htdigmirror.org/htdig/maindocs/htdig-patches/. Don't forget to turn on directory indexing!
Ready. Please inform the developers at <[email protected]> about your mirror.
This is very simple. Repeat steps 1, 2 and 3 of "Setting up a Mirror of the ht://Dig Patch Site".
There are two trees you can check out: the 3.1 branch and the 3.2beta branch.
Because you know the procedure now, we'll just give the commands.
cd /home/htdigmirror/www/htdig
cvs -z6 -d:pserver:anonymous@cvs.htdig.sourceforge.net:/cvsroot/htdig \
co -d htdig-3-1-x -r htdig-3-1-x htdig
cvs -z6 -d:pserver:anonymous@cvs.htdig.sourceforge.net:/cvsroot/htdig \
co -d htdig-3-2-x -r htdig-3-2-x htdig
(The -r option checks out a specific revision or tag, here the branch tag.) You'll get two subdirectories named htdig-3-1-x and htdig-3-2-x containing the CVS trees.
The trees can be accessed through the web via http://www.htdigmirror.org/htdig/htdig-3-1-x/ and http://www.htdigmirror.org/htdig/htdig-3-2-x/ respectively.
You'll need to adapt your web server configuration file so that it will show directory indexes.
See examples at the end.
Note: If you leave out the -r option, you will check out the main branch of the htdig source tree, but this branch has been largely untouched since February 2000. You must use the -r option.
Note: There is currently no link from the ht://Dig web site to the CVS trees, so people will not find them by browsing your mirror. You will have to point them to the trees by other means.
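Because both checkouts differ only in the branch name, they can be written as a loop. This is a sketch, not the project's own script: the function name checkout_branches and the variables CVS_CMD and CVSROOT_HTDIG are hypothetical, and the default repository string assumes the user name anonymous.

```shell
#!/bin/sh
# Hypothetical helper: check out each named branch of the htdig module into a
# directory of the same name. CVS_CMD and CVSROOT_HTDIG are assumed variable names.
CVS_CMD=${CVS_CMD:-cvs}
CVSROOT_HTDIG=${CVSROOT_HTDIG:-:pserver:anonymous@cvs.htdig.sourceforge.net:/cvsroot/htdig}

checkout_branches() {
    for branch in "$@"; do
        # -r selects the branch tag, -d names the local directory; stop on the first failure.
        "$CVS_CMD" -z6 -d"$CVSROOT_HTDIG" co -d "$branch" -r "$branch" htdig || return 1
    done
}
```

Run from /home/htdigmirror/www/htdig as `checkout_branches htdig-3-1-x htdig-3-2-x`, it issues the same two commands as above. Setting CVS_CMD=echo prints the commands instead of running them.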
Well, this is just like updating the ht://Dig web site. You'll need to go to the right directory and issue the cvs(1) update command.
One can use wget(1) to copy the ht://Dig main web site, although cvs(1) is surely more efficient and faster. If you decide to copy the ht://Dig main web site with wget(1), note that you will automatically make a copy of the /files directory as well. The /files directory is currently about 70 MB.
Apache 1.3.x example configuration file (part) for ht://Dig mirror sites. For use with virtual hosts:
# Host www.htdigmirror.org
<VirtualHost 1.2.3.4>
    ServerAdmin [email protected]
    ServerName www.htdigmirror.org
    DocumentRoot /home/htdigmirror/www
    ErrorLog /home/htdigmirror/etc/error_log
    TransferLog /home/htdigmirror/etc/access_log

    # Aliasing files directory to web site and activate fancy directory indexing
    Alias /htdig/maindocs/files "/home/ftp/pub/ftp.htdig.org/files"
    <Directory /home/ftp/pub/ftp.htdig.org/files>
        Options Indexes
    </Directory>

    # Aliasing patch directory to web site and activate fancy directory indexing
    Alias /htdig/maindocs/htdig-patches "/home/ftp/pub/ftp.ccsf.org/htdig-patches"
    <Directory /home/ftp/pub/ftp.ccsf.org/htdig-patches>
        Options Indexes
    </Directory>

    # Activate Server Side Includes w/o Execute
    <Directory /home/htdigmirror/www/htdig/maindocs>
        Options IncludesNOEXEC
    </Directory>

    # Activate fancy directory indexing for browsing CVS tree
    <Directory /home/htdigmirror/www/htdig/htdig-3-1-x>
        Options Indexes
    </Directory>
    <Directory /home/htdigmirror/www/htdig/htdig-3-2-x>
        Options Indexes
    </Directory>
</VirtualHost>