dictfmt



DICTFMT(1)                                                          DICTFMT(1)




NAME

       dictfmt - formats a DICT protocol dictionary database


SYNOPSIS

       dictfmt  -c5|-e|-f|-h|-j|-p [options]  basename


DESCRIPTION

       dictfmt takes a file, FILE, on stdin, and creates a dictionary database
       named basename.dict, that conforms to the DICT protocol.  It also  cre-
       ates  an  index  file  named  basename.index.  By default, the index is
       sorted according to the C locale, and only alphanumeric characters  and
       spaces  are  used  in  sorting;  however,  this may be changed with the
       --locale and --allchars options.  (basename is commonly chosen to  cor-
       respond to the basename of FILE, but this is not mandatory.)
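
       The default ordering described above can be  approximated  in  a  few
       lines of Python.  This is only a sketch of the idea, not the compari-
       son function dictfmt itself uses: the function name index_key is made
       up for this example, and details such as case handling or non-ASCII
       text are not modelled.

```python
def index_key(headword: str, allchars: bool = False) -> str:
    """Approximate sort key for an index entry (illustrative only)."""
    if allchars:
        # Corresponds to --allchars: every character takes part in sorting.
        return headword
    # Default behaviour: only ASCII letters, digits, and spaces are kept.
    return "".join(
        c for c in headword if c == " " or (c.isascii() and c.isalnum())
    )


words = ["e-mail", "ebook", "e mail"]

# Python compares str values by code point, which for plain ASCII text
# gives the same order as byte-wise comparison in the C locale.
default_order = sorted(words, key=index_key)
allchars_order = sorted(words, key=lambda w: index_key(w, allchars=True))
```

       With the default key the hyphen in "e-mail" is ignored, so the  word
       sorts as if it were "email"; with allchars=True the hyphen is com-
       pared like any other character.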

       Unless  the  database is extremely small, it is highly recommended that
       basename.dict be  compressed  with  /usr/bin/dictzip  to  create  base-
       name.dict.dz.  (dictzip is included in the dictd source package.)

       FILE  may  be  in  any  of  the several formats described by the format
       options -c5, -e, -f, -h, -j, or -p.  Exactly one of these options  must
       be given.

       Headers are prepended to the .dict file giving the URL of the site from
       which the original database was obtained, the name of  the  dictionary,
       and  the  date of conversion (formatting).  If the -u and/or -s options
       are omitted, the value "unknown" will be  included  for  these  values,
       which is undesirable for a publicly distributed database.

       All  text  in the input file prior to the first headword is appended to
       the headers in the .dict file.  All text in the input file following  a
       headword,  up  to  the  next headword, is copied unchanged to the .dict
       file.
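
       As an illustration of the calling convention in the SYNOPSIS, the fol-
       lowing Python sketch assembles a dictfmt command line and runs it with
       the input file connected to stdin.  The helper names and any argument
       values passed to them are hypothetical; only the format options,  -u,
       -s, and the trailing basename come from this page.

```python
import subprocess

# The six format options listed in this page; exactly one must be given.
FORMAT_FLAGS = {"-c5", "-e", "-f", "-h", "-j", "-p"}


def build_dictfmt_command(fmt, basename, url=None, name=None):
    """Return the argument list for one dictfmt run (hypothetical helper)."""
    if fmt not in FORMAT_FLAGS:
        raise ValueError(f"unknown format option: {fmt}")
    cmd = ["dictfmt", fmt]
    if url is not None:
        cmd += ["-u", url]    # site the raw database was obtained from
    if name is not None:
        cmd += ["-s", name]   # passed as one argument, so spaces are safe
    cmd.append(basename)      # output goes to basename.dict / basename.index
    return cmd


def run_dictfmt(input_path, cmd):
    """Run dictfmt, feeding the input file to it on stdin."""
    with open(input_path, "rb") as source:
        subprocess.run(cmd, stdin=source, check=True)
```

       Passing the argument list directly (rather than a shell string) means
       a name containing spaces needs no extra quoting.  After  a  success-
       ful run, basename.dict can be compressed by running dictzip on it.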



FORMATTING OPTIONS

       -c5    FILE is formatted with headwords preceded by 5  or  more  under-
              score  characters (_) and a blank line.  All text until the next
              headword is considered the definition.  Any leading ‘@’  charac-
              ters are stripped out, but the file is otherwise unchanged. This
              option was written to format the CIA WORLD FACTBOOK 1995.

       -e     FILE is in html  format,  with  the  headword  tagged  as  bold.
              (<B>headword - </B>)
              This  option  was  written to format EASTON’S 1897 BIBLE DICTIO-
              NARY.  A typical entry from Easton is:

              <A NAME="T0000005">
              <B>Abagtha - </B>
              one of the seven eunuchs  in  Ahasuerus’s  court  (Esther  1:10;
              2:21).

              This is converted to:
              Abagtha
                 one  of  the seven eunuchs in Ahasuerus’s court (Esther 1:10;
              2:21).

              The heading <A NAME="T0000005"> is omitted,  and  the  headword
              ‘Abagtha’ is indexed.

              NOTE:  This option should be used with caution.  It removes sev-
              eral html tags (enough to format Easton properly), but not  all.
              The  Makefile  that was originally written to format dict-easton
              uses sed scripts to modify certain cross reference tags.  It may
              be  necessary  to  pipe  the input file through a sed script, or
              hack the source of dictfmt in order  to  properly  format  other
              html databases.

       -f     FILE  is formatted with the headwords starting in column 0, with
              the definition indented at least one space (or tab character) on
              subsequent  lines.  The third line starting in column 0 is taken
              as the first headword, and the first two lines starting in  col-
              umn 0 are treated as part of the headers.  This option was writ-
              ten to format the F.O.L.D.O.C.

       -h     FILE is formatted with the headwords starting in column 0,  fol-
              lowed  by  a  comma,  with the definition continuing on the same
              line.  All text before the first single character line  is  con-
              sidered  part  of the headers, and lines with only one character
              are omitted from the .dict file.  The first headword is  on  the
              line following the first single character line.
              This  option was written to format HITCHCOCK’S  BIBLE NAMES DIC-
              TIONARY.  The headword is indexed; the text of the file  is  not
              changed.

       -j     FILE is formatted with headwords starting in column 0,  enclosed
              in colons, followed by the definition.
              This option was written to format the JARGON FILE.   The  colons
              surrounding  the  headword  are  removed,  and  the  headword is
              indexed.  Lines  beginning  with  ‘*’,  ‘=’,  or  ‘-’  are  also
              removed.   All text before the first headword is included in the
              headers.

              NOTE: Some recent versions of the JARGON FILE had  three  blanks
              inserted before the first colon at each headword.  These must be
              removed before processing with dictfmt.  (sed scripts have  been
              used  for  this  purpose;  ed,  awk, or perl scripts would also
              work.)

       -p     FILE is formatted with ‘%h’ in column 0, followed  by  a  blank,
              followed by the headword, optionally followed by a line contain-
              ing ‘%d’ in column 0.  The definition starts  on  the  following
              line.   The  first  line  beginning ‘%h’ and any lines beginning
              ‘%d’ are stripped from the .dict file, and  ‘%h  ’  is  stripped
              from  in front of the headword.  All text before the first head-
              word is included in the headers.  This  option  was  written  to
              format Jay Kominek’s elements database.
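
       The -f and -p layouts described above are simple enough  to  generate
       from other data.  The Python sketch below renders a small mapping  of
       headwords to definitions in each layout.  The  function  names,  the
       three-space indent, and the sample entries are choices made for this
       example; it has not been checked against dictfmt itself.

```python
def to_f_format(header_lines, entries):
    """Render entries in the layout described for the -f option."""
    if len(header_lines) != 2:
        # -f treats the first two column-0 lines as headers, so this
        # sketch insists on exactly two of them.
        raise ValueError("expected exactly two header lines")
    lines = list(header_lines)
    for headword, definition in entries.items():
        lines.append(headword)  # headword starts in column 0
        # Definition lines are indented by at least one space.
        lines.extend("   " + text for text in definition.splitlines())
    return "\n".join(lines) + "\n"


def to_p_format(entries):
    """Render entries in the layout described for the -p option."""
    lines = []
    for headword, definition in entries.items():
        lines.append("%h " + headword)  # '%h', a blank, then the headword
        lines.append("%d")              # optional marker line
        lines.extend(definition.splitlines())
    return "\n".join(lines) + "\n"


entries = {
    "autumn": "the season after summer",
    "zinc": "a metallic element\natomic number 30",
}
f_text = to_f_format(["Sample dictionary", "Second header line"], entries)
p_text = to_p_format(entries)
```

       Either string could then be written to a file and supplied to dictfmt
       on stdin with the matching format option.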


OPTIONS

       -u url Specifies  the  URL  of the site from which the raw database was
              obtained.

       -s name
              Specifies the name and, optionally, the version and date, of the
              database.  (If this contains spaces, it must be quoted.)

       -L     display license and copyright information

       -V     display version information

       -D     output debugging information

       --help display a help message

       --locale locale
              specifies  the  locale used for sorting.  If no locale is speci-
              fied, the "C" locale is used.

       --allchars
              use all characters (not only alphanumeric and space) in  sorting
              the index

       --headword-separator sep
              sets the headword separator, which allows several words to  have
              the same definition.  For example, if ‘--headword-separator %%%’
              is  given,  and  the  input  file contains ‘autumn%%%fall’, both
              ‘autumn’ and ‘fall’ will be indexed as headwords, with the  same
              definition.

       --without-headword
              headwords will not be included in the .dict file
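
       The effect of --headword-separator can be pictured with  the  Python
       sketch below, which splits one headword field into several index keys
       that share a definition.  It mirrors the ‘autumn%%%fall’ example and
       is not taken from the dictfmt source; the function name and the sam-
       ple definition text are made up.

```python
def index_entries(headword_field, definition, separator):
    """Map every headword in a separator-joined field to one definition."""
    return {
        word: definition
        for word in headword_field.split(separator)
        if word  # skip empty pieces, e.g. from a trailing separator
    }


shared = index_entries(
    "autumn%%%fall", "the season between summer and winter", "%%%"
)
```

       Both ‘autumn’ and ‘fall’ end up as keys of shared, each pointing  at
       the same definition text.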


CREDITS

       dictfmt  was  written  by  Rik  Faith (faith@cs.unc.edu) as part of the
       dict-misc package.  dictfmt is distributed under the terms of  the  GNU
       General  Public  License.  If you need to distribute under other terms,
       write to the author.


AUTHOR

       This   manual   page   was    written    by    Robert    D.    Hilliard
       <hilliard@debian.org>.



SEE ALSO

       dict(1), dictd(8), dictzip(1), http://www.dict.org, RFC 2229



                               25 December 2000                     DICTFMT(1)
