DICTFMT(1) DICTFMT(1)
NAME
dictfmt - formats a DICT protocol dictionary database
SYNOPSIS
dictfmt -c5|-e|-f|-h|-j|-p [options] basename
DESCRIPTION
dictfmt reads a file, FILE, on stdin and creates a dictionary database
named basename.dict that conforms to the DICT protocol. It also creates
an index file named basename.index. By default, the index is sorted
according to the C locale, and only alphanumeric characters and spaces
are used in sorting; this may be changed with the --locale and
--allchars options. (basename is commonly chosen to correspond to the
basename of FILE, but this is not mandatory.)
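The default sort rule above can be sketched in Python. This is an illustration, not dictfmt's C implementation: the helper names index_sort_key and sort_index are hypothetical, --locale is not modeled, and case folding (which this page does not describe) is deliberately left out.

```python
from typing import List

def index_sort_key(headword: str, allchars: bool = False) -> str:
    """Hypothetical sort key following the default described above."""
    if allchars:
        return headword  # --allchars: every character takes part in sorting
    # Default: keep only ASCII alphanumerics and spaces (C-locale notion).
    return "".join(
        ch for ch in headword if (ch.isascii() and ch.isalnum()) or ch == " "
    )

def sort_index(headwords: List[str], allchars: bool = False) -> List[str]:
    # Python compares str by code point, which matches C-locale byte order
    # for ASCII text; sorted() is stable, so ties keep their input order.
    return sorted(headwords, key=lambda hw: index_sort_key(hw, allchars))

print(sort_index(["C++", "C", "a.out", "aout", "Abagtha"]))
```

With the default key, "C++" and "C" compare equal, as do "a.out" and "aout", so their relative order comes from the input rather than from the punctuation.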
Unless the database is extremely small, it is highly recommended that
basename.dict be compressed with /usr/bin/dictzip to create
basename.dict.dz. (dictzip is included in the dictd source package.)
FILE may be in any of several formats, selected by the format options
-c5, -e, -f, -h, -j, or -p. Exactly one of these options must be given.
Headers giving the URL of the site from which the original database was
obtained, the name of the dictionary, and the date of conversion
(formatting) are prepended to the .dict file. If the -u or -s option is
omitted, the value "unknown" is used for the URL or the name,
respectively, which is undesirable for a publicly distributed database.
All text in the input file prior to the first headword is appended to
the headers in the .dict file. All text in the input file following a
headword, up to the next headword, is copied unchanged to the .dict
file.
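A minimal Python sketch of how the headers fall back to "unknown" and how the preamble text is appended. The function name build_headers and the "URL:", "Name:", and "Converted:" labels are hypothetical; the real header layout written by dictfmt is not described on this page.

```python
from datetime import date
from typing import Optional

def build_headers(url: Optional[str], name: Optional[str],
                  preamble: str, when: date) -> str:
    """Hypothetical header block: -u/-s values, conversion date, preamble."""
    # An omitted -u or -s value falls back to "unknown", as described above.
    url = url if url else "unknown"
    name = name if name else "unknown"
    lines = [
        f"URL: {url}",                    # hypothetical label
        f"Name: {name}",                  # hypothetical label
        f"Converted: {when.isoformat()}", # hypothetical label
    ]
    # Text that precedes the first headword is appended to the headers.
    return "\n".join(lines) + "\n" + preamble

print(build_headers(None, "Jargon File", "Intro text\n", date(2000, 12, 25)))
```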
FORMATTING OPTIONS
-c5 FILE is formatted with headwords preceded by 5 or more underscore
characters (_) and a blank line. All text until the next headword
is considered the definition. Any leading ‘@’ characters are
stripped, but the file is otherwise unchanged. This option was
written to format the CIA WORLD FACTBOOK 1995.
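The -c5 layout can be illustrated with a small Python parser. It assumes the underscores form a line of their own, that the headword is the line after the blank line, and that "leading ‘@’ characters" means those at the start of each line; parse_c5 is a hypothetical name, and this is a sketch of the format rather than dictfmt's own code.

```python
import re
from typing import List, Tuple

RULE = re.compile(r"^_{5,}\s*$")  # a line of 5 or more underscores

def parse_c5(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split -c5 input into (headers, [(headword, definition), ...])."""
    # Assumption: '@' characters are stripped from the start of every line.
    lines = [line.lstrip("@") for line in text.splitlines()]
    header: List[str] = []
    entries: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(lines):
        # Underscore rule + blank line: the following line is a headword.
        if (RULE.match(lines[i]) and i + 2 < len(lines)
                and lines[i + 1].strip() == ""):
            entries.append((lines[i + 2].strip(), []))
            i += 3
            continue
        if entries:
            entries[-1][1].append(lines[i])  # definition text
        else:
            header.append(lines[i])  # text before the first headword
        i += 1
    return "\n".join(header), [(hw, "\n".join(b)) for hw, b in entries]
```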
-e FILE is in HTML format, with the headword tagged as bold
(<B>headword - </B>).
This option was written to format EASTON’S 1897 BIBLE DICTIONARY.
A typical entry from Easton is:
<A NAME="T0000005">
<B>Abagtha - </B>
one of the seven eunuchs in Ahasuerus’s court (Esther 1:10;
2:21).
This is converted to:
Abagtha
one of the seven eunuchs in Ahasuerus’s court (Esther 1:10;
2:21).
The anchor line <A NAME="T0000005"> is omitted, and the headword
‘Abagtha’ is indexed.
NOTE: Use this option with caution. It removes several HTML tags
(enough to format Easton properly), but not all. The Makefile
originally written to format dict-easton uses sed scripts to
modify certain cross-reference tags. To format other HTML
databases properly, it may be necessary to pipe the input file
through a sed script or to modify the dictfmt source.
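The Easton conversion shown above can be sketched in Python with two regular expressions. This is an illustration only: parse_easton is a hypothetical name, it drops text before the first headword instead of moving it into the headers, and, like the option itself, it does not attempt to remove every HTML tag.

```python
import re
from typing import List, Tuple

HEADWORD = re.compile(r"<B>(.*?) - </B>", re.IGNORECASE)
ANCHOR = re.compile(r'^\s*<A NAME="[^"]*">\s*$', re.IGNORECASE)

def parse_easton(html: str) -> List[Tuple[str, str]]:
    """Return [(headword, definition), ...] from Easton-style HTML."""
    entries: List[Tuple[str, List[str]]] = []
    for line in html.splitlines():
        if ANCHOR.match(line):
            continue  # the <A NAME="..."> heading is omitted
        m = HEADWORD.search(line)
        if m:
            entries.append((m.group(1).strip(), []))  # bold text is indexed
        elif entries:
            entries[-1][1].append(line)  # definition copied unchanged
    return [(hw, "\n".join(body)) for hw, body in entries]
```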
-f FILE is formatted with the headwords starting in column 0, with
the definition indented by at least one space (or tab character)
on subsequent lines. The third line starting in column 0 is taken
as the first headword, and the first two lines starting in column
0 are treated as part of the headers. This option was written to
format the F.O.L.D.O.C. (Free On-line Dictionary of Computing).
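The column-0 rule for -f can be sketched in Python as follows. parse_foldoc is a hypothetical name; the sketch treats any line whose first character is not a space or tab as "starting in column 0" and attaches indented and blank lines to the current entry.

```python
from typing import List, Tuple

def parse_foldoc(text: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split -f input into (header lines, [(headword, definition), ...])."""
    header: List[str] = []
    entries: List[Tuple[str, List[str]]] = []
    col0_seen = 0
    for line in text.splitlines():
        starts_col0 = bool(line) and not line[0].isspace()
        if starts_col0:
            col0_seen += 1
            if col0_seen <= 2:
                header.append(line)  # first two column-0 lines: headers
            else:
                entries.append((line.rstrip(), []))  # headword
        elif entries:
            entries[-1][1].append(line)  # indented definition text
        else:
            header.append(line)
    return header, [(hw, "\n".join(body)) for hw, body in entries]
```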
-h FILE is formatted with the headwords starting in column 0,
followed by a comma, with the definition continuing on the same
line. All text before the first single-character line is
considered part of the headers, and lines with only one character
are omitted from the .dict file. The first headword is on the
line following the first single-character line.
This option was written to format HITCHCOCK’S BIBLE NAMES
DICTIONARY. The headword is indexed; the text of the file is not
changed.
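A Python sketch of the -h rules, using the hypothetical name parse_hitchcock. It assumes the headword is everything before the first comma and that non-blank lines starting with whitespace continue the previous entry; the entry text is kept unchanged, as the option does.

```python
from typing import List, Tuple

def parse_hitchcock(text: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split -h input into (header lines, [(headword, entry text), ...])."""
    header: List[str] = []
    entries: List[Tuple[str, List[str]]] = []
    in_body = False
    for line in text.splitlines():
        if len(line.rstrip()) == 1:
            in_body = True  # single-character lines (e.g. "A") are omitted
            continue
        if not in_body:
            header.append(line)  # text before the first single-char line
        elif line[:1] not in ("", " ", "\t") and "," in line:
            # Headword is the text before the comma; the line is kept whole.
            entries.append((line.split(",", 1)[0], [line]))
        elif entries:
            entries[-1][1].append(line)  # assumption: continuation line
    return header, [(hw, "\n".join(body)) for hw, body in entries]
```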
-j FILE is formatted with headwords starting in column 0, enclosed in
colons, followed by the definition.
This option was written to format the JARGON FILE. The colons
surrounding the headword are removed, and the headword is
indexed. Lines beginning with ‘*’, ‘=’, or ‘-’ are also removed.
All text before the first headword is included in the headers.
NOTE: Some recent versions of the JARGON FILE had three blanks
inserted before the first colon of each headword. These must be
removed before processing with dictfmt. (sed scripts have been
used for this purpose; ed, awk, or perl scripts are also
possible.)
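The -j rules, together with the three-blank cleanup from the NOTE, can be sketched in Python. strip_jargon_indent and parse_jargon are hypothetical names; the cleanup plays the role of a sed substitution such as s/^   :/:/, and the headword pattern assumes the headword itself contains no colon.

```python
import re
from typing import List, Tuple

HEADWORD_LINE = re.compile(r"^:([^:]+):(.*)$")

def strip_jargon_indent(text: str) -> str:
    """Remove three blanks before a headword's first colon."""
    return re.sub(r"^   :", ":", text, flags=re.MULTILINE)

def parse_jargon(text: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    header: List[str] = []
    entries: List[Tuple[str, List[str]]] = []
    for line in text.splitlines():
        if line[:1] in ("*", "=", "-"):
            continue  # lines beginning with '*', '=' or '-' are removed
        m = HEADWORD_LINE.match(line)
        if m:
            # Surrounding colons dropped; rest of the line starts the definition.
            entries.append((m.group(1), [m.group(2).strip()]))
        elif entries:
            entries[-1][1].append(line)
        else:
            header.append(line)  # text before the first headword
    return header, [(hw, "\n".join(body)) for hw, body in entries]
```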
-p FILE is formatted with ‘%h’ in column 0, followed by a blank,
followed by the headword, optionally followed by a line
containing ‘%d’ in column 0. The definition starts on the
following line. The first line beginning ‘%h’ and any lines
beginning ‘%d’ are stripped from the .dict file, and ‘%h ’ is
stripped from in front of the headword. All text before the
first headword is included in the headers. This option was
written to format Jay Kominek’s elements database.
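A minimal Python sketch of the -p markers, using the hypothetical name parse_percent_h. It models only the headword and ‘%d’ handling described above and treats everything before the first ‘%h ’ line as header text.

```python
from typing import List, Tuple

def parse_percent_h(text: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split -p input into (header lines, [(headword, definition), ...])."""
    header: List[str] = []
    entries: List[Tuple[str, List[str]]] = []
    for line in text.splitlines():
        if line.startswith("%h "):
            entries.append((line[3:].strip(), []))  # '%h ' stripped off
        elif line.startswith("%d"):
            continue  # optional '%d' marker lines are dropped
        elif entries:
            entries[-1][1].append(line)  # definition text
        else:
            header.append(line)  # text before the first headword
    return header, [(hw, "\n".join(body)) for hw, body in entries]
```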
OPTIONS
-u url Specifies the URL of the site from which the raw database was
obtained.
-s name
Specifies the name and, optionally, the version and date, of the
database. (If this contains spaces, it must be quoted.)
-L Display license and copyright information.
-V Display version information.
-D Output debugging information.
--help Display a help message.
--locale locale
Specifies the locale used for sorting. If no locale is
specified, the "C" locale is used.
--allchars
Use all characters (not only alphanumeric characters and space)
in sorting the index.
--headword-separator sep
Sets the headword separator, which allows several headwords to
share the same definition. For example, if ‘--headword-separator
%%%’ is given and the input file contains ‘autumn%%%fall’, both
‘autumn’ and ‘fall’ will be indexed as headwords with the same
definition.
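The separator behaviour can be sketched in Python. split_headwords and index_entries are hypothetical helpers, and the offset and length values are illustrative placeholders for the position of the shared definition in the .dict file.

```python
from typing import List, Tuple

def split_headwords(headword: str, separator: str) -> List[str]:
    """Split a combined headword such as 'autumn%%%fall' on the separator."""
    return [part.strip() for part in headword.split(separator) if part.strip()]

def index_entries(headword: str, separator: str,
                  offset: int, length: int) -> List[Tuple[str, int, int]]:
    # Every piece gets its own index entry pointing at the same definition.
    return [(hw, offset, length) for hw in split_headwords(headword, separator)]

print(index_entries("autumn%%%fall", "%%%", 120, 42))
```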
--without-headword
Headwords are not included in the .dict file.
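To show what --without-headword changes, here is a Python sketch that writes entries into an in-memory .dict body and records each entry's byte offset and length for the index. write_dict is a hypothetical name and the entry layout is an assumption; the dictd index format stores these two numbers base64-encoded, whereas this sketch keeps plain integers.

```python
from typing import List, Tuple

def write_dict(entries: List[Tuple[str, str]],
               without_headword: bool = False
               ) -> Tuple[bytes, List[Tuple[str, int, int]]]:
    """Return (.dict bytes, [(headword, offset, length), ...])."""
    out = bytearray()
    index: List[Tuple[str, int, int]] = []
    for headword, definition in entries:
        # With --without-headword only the definition text is written.
        body = definition if without_headword else f"{headword}\n{definition}"
        data = (body.rstrip("\n") + "\n").encode("utf-8")
        index.append((headword, len(out), len(data)))  # offset, length
        out += data
    return bytes(out), index
```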
CREDITS
dictfmt was written by Rik Faith (faith@cs.unc.edu) as part of the
dict-misc package. dictfmt is distributed under the terms of the GNU
General Public License. If you need to distribute under other terms,
write to the author.
AUTHOR
This manual page was written by Robert D. Hilliard
<hilliard@debian.org>.
SEE ALSO
dict(1), dictd(8), dictzip(1), http://www.dict.org, RFC 2229
25 December 2000 DICTFMT(1)