LOCATEDB(5)                                         LOCATEDB(5)





NAME
       locatedb - front-compressed file name database

DESCRIPTION
       This manual page documents the format of file name data-
       bases for the GNU version  of  locate.   The  file  name
       databases contain lists of files that were in particular
       directory trees when the databases were last updated.

       There can be multiple databases.  Users can select which
       databases  locate searches using an environment variable
       or command  line  option;  see  locate(1).   The  system
       administrator  can  choose  the file name of the default
       database, the frequency with  which  the  databases  are
       updated,  and  the  directories  for  which they contain
       entries.  Normally, file name databases are  updated  by
       running  the  updatedb  program  periodically, typically
       nightly; see updatedb(1).

       updatedb runs a program called frcode  to  compress  the
       list   of  file  names  using  front-compression,  which
       reduces the database size by a factor of 4 to 5.  Front-
       compression  (also  known as incremental encoding) works
       as follows.

       The database entries are a list sorted case-insensitively,
       for users' convenience.  Since the list is sorted, each
       entry is likely to share a prefix (initial string) with
       the previous entry.  Each database entry begins with an
       offset-differential count byte: the number of leading
       characters this entry shares with the preceding entry,
       minus the number that the preceding entry shares with its
       own predecessor.  (The counts can be negative.)  Following
       the count is a null-terminated ASCII remainder, the part
       of the name that follows the shared prefix.
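
       The calculation described above can be sketched in Python.
       This is an illustration only, not the frcode source; the
       names common_prefix_len and front_compress are hypothetical.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Return the number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_compress(names):
    """Yield (offset-differential count, remainder) for each name.

    names is assumed to be already sorted.
    """
    prev = b""
    prev_shared = 0
    for name in names:
        shared = common_prefix_len(prev, name)
        # The stored count is the change in shared-prefix length,
        # not the shared-prefix length itself.
        yield shared - prev_shared, name[shared:]
        prev, prev_shared = name, shared
```

       Applied to the four names in the EXAMPLE section, this
       sketch yields the counts 0, 8, 6 and -9.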

       If the offset-differential count is larger than  can  be
       stored  in  a byte (+/-127), the byte has the value 0x80
       and the count follows in a 2-byte word,  with  the  high
       byte first (network byte order).
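
       A minimal sketch of this count encoding follows.  It
       assumes the 2-byte word is a signed two's-complement value,
       which the text above implies but does not state; the names
       encode_count and decode_count are hypothetical.

```python
import struct

LONG_COUNT_MARKER = b"\x80"  # escape byte described above


def encode_count(diff: int) -> bytes:
    """Encode an offset-differential count as 1 or 3 bytes."""
    if -127 <= diff <= 127:
        return struct.pack("b", diff)        # one signed byte
    # Assumption: signed 16-bit value, high byte first.
    return LONG_COUNT_MARKER + struct.pack(">h", diff)


def decode_count(buf: bytes, pos: int):
    """Return (count, position of the first byte after the count)."""
    (diff,) = struct.unpack_from("b", buf, pos)
    if diff != -128:                         # 0x80 read as a signed byte
        return diff, pos + 1
    (diff,) = struct.unpack_from(">h", buf, pos + 1)
    return diff, pos + 3
```

       Counts outside the signed 16-bit range make struct.pack
       raise struct.error in this sketch.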

       Every database begins with a dummy entry for a file called
       `LOCATE02'.  locate checks for this entry to ensure that
       the database file has the correct format, and ignores it
       when searching.
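
       Putting the pieces so far together, a reader for this
       format might look like the sketch below.  It is not the
       locate source: the name read_names is hypothetical, the
       dummy entry is assumed to carry a count byte of 0 (as in
       the EXAMPLE section), and long counts are assumed to be
       signed.

```python
import struct

# Count byte 0, the dummy name, and its terminating null.
MAGIC = b"\x00LOCATE02\x00"


def read_names(data: bytes):
    """Yield each file name stored in a front-compressed database."""
    if not data.startswith(MAGIC):
        raise ValueError("dummy LOCATE02 entry not found")
    pos = len(MAGIC)
    prev = b"LOCATE02"
    shared = 0                       # running shared-prefix length
    while pos < len(data):
        (diff,) = struct.unpack_from("b", data, pos)
        pos += 1
        if diff == -128:             # 0x80: long count follows
            (diff,) = struct.unpack_from(">h", data, pos)
            pos += 2
        shared += diff
        end = data.index(b"\x00", pos)
        prev = prev[:shared] + data[pos:end]
        pos = end + 1
        yield prev
```

       The dummy entry is consumed by the format check and is
       never yielded, matching the behaviour described above.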

       Databases cannot be concatenated together, even if the
       first  (dummy)  entry  is trimmed from all but the first
       database.  This is because the offset-differential count
       in the first entry of the second and following databases
       will be wrong.
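
       The reason can be seen by tracking the running shared-prefix
       length, which is the sum of all counts read so far.  The
       sketch below uses the counts from the EXAMPLE section for
       the first database and made-up counts (0, 8) for a second
       one; shared_lengths is a hypothetical helper.

```python
from itertools import accumulate


def shared_lengths(counts):
    """Return the running shared-prefix length after each entry."""
    return list(accumulate(counts))


first_db = [0, 8, 6, -9]    # counts of the real entries, first database
second_db = [0, 8]          # illustrative counts, second database

# Read on its own, the second database starts from a length of 0.
alone = shared_lengths(second_db)

# Appended to the first database, it inherits a leftover length of 5.
joined = shared_lengths(first_db + second_db)[len(first_db):]
```

       Here alone is [0, 8] but joined is [5, 13], so every name
       in the second database would be rebuilt from the wrong
       prefix.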

       There is also an  old  database  format,  used  by  Unix
       locate and find programs and earlier releases of the GNU
       ones.  updatedb runs programs called bigram and code  to
       produce  old-format  databases.   The old format differs
       from  the  above  description  in  the  following  ways.
       Instead  of each entry starting with an offset-differen-
       tial count byte and ending with a null, byte values from
       0  through  28  indicate offset-differential counts from
       -14 through 14.  The byte value indicating that  a  long
       offset-differential  count  follows  is  0x1e  (30), not
       0x80.  The long counts are stored in  host  byte  order,
       which  is  not  necessarily network byte order, and host
       integer word size, which is usually 4 bytes.  They  also
       represent  a  count 14 less than their value.  The data-
       base lines have no termination byte; the  start  of  the
       next  line is indicated by its first byte having a value
       <= 30.
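
       The old-format count rules can be sketched as follows.
       The name decode_old_count is hypothetical, and treating
       byte value 29 as an error is an assumption, since the text
       above does not assign it a meaning.

```python
import struct

OFFSET = 14        # short counts are stored biased by 14
LONG_MARKER = 30   # 0x1e: a long count follows


def decode_old_count(buf: bytes, pos: int):
    """Return (count, position of the first byte after the count)."""
    b = buf[pos]
    if b == LONG_MARKER:
        # Host byte order and host int size, as described above.
        (value,) = struct.unpack_from("i", buf, pos + 1)
        return value - OFFSET, pos + 1 + struct.calcsize("i")
    if b <= 28:
        return b - OFFSET, pos + 1
    raise ValueError("byte %d is not a count" % b)
```

       Because the long count uses the host's byte order and
       integer size, a database decoded this way is only readable
       on a machine matching the one that wrote it.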

       In addition, instead of starting with a dummy entry, the
       old  database  format  starts with a 256 byte table con-
       taining the 128 most common bigrams in the file list.  A
       bigram  is a pair of adjacent bytes.  Bytes in the data-
       base that have the high bit set are  indexes  (with  the
       high bit cleared) into the bigram table.  The bigram and
       offset-differential count coding makes  these  databases
       20-25%  smaller  than the new format, but makes them not
       8-bit clean.  Any byte in a file name  that  is  in  the
       ranges  used  for  the  special codes is replaced in the
       database by a question mark, which not coincidentally is
       the shell wildcard to match a single character.
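
       Bigram expansion might be sketched as below.  The layout
       of the table is an assumption: 128 bigrams stored back to
       back, two bytes each, in index order.  The name
       expand_bigrams is hypothetical.

```python
def expand_bigrams(table: bytes, encoded: bytes) -> bytes:
    """Replace each high-bit byte with the bigram it indexes."""
    if len(table) != 256:
        raise ValueError("bigram table must be 256 bytes")
    out = bytearray()
    for b in encoded:
        if b & 0x80:
            i = (b & 0x7F) * 2       # clear the high bit, scale to bytes
            out += table[i:i + 2]
        else:
            out.append(b)            # ordinary byte, copied unchanged
    return bytes(out)
```

       With a table whose first two bigrams are "er" and "in", the
       bytes "us", 0x80, "/b", 0x81 expand to "user/bin".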

EXAMPLE
       Input to frcode:
       /usr/src
       /usr/src/cmd/aardvark.c
       /usr/src/cmd/armadillo.c
       /usr/tmp/zoo

       Length of the longest prefix of the preceding entry to share:
       0 /usr/src
       8 /cmd/aardvark.c
       14 rmadillo.c
       5 tmp/zoo

       Output  from frcode, with trailing nulls changed to new-
       lines and count bytes made printable:
       0 LOCATE02
       0 /usr/src
       8 /cmd/aardvark.c
       6 rmadillo.c
       -9 tmp/zoo

       (6 = 14 - 8, and -9 = 5 - 14)

SEE ALSO
       find(1), locate(1), updatedb(1), xargs(1), Finding Files
       (on-line in Info, or printed)

BUGS
       The  best  way  to  report  a  bug is to use the form at
       http://savannah.gnu.org/bugs/?group=findutils.  The rea-
       son  for  this  is  that  you will then be able to track
       progress in fixing the problem.   Other  comments  about
       locate(1) and about the findutils package in general can
       be sent to the bug-findutils mailing list.  To join  the
       list, send email to bug-findutils-request@gnu.org.



                                                    LOCATEDB(5)
