GAWK(1)                 Utility Commands                GAWK(1)

NAME
       gawk - pattern scanning and processing language

SYNOPSIS
       gawk [ POSIX or GNU style options ] -f program-file [ --
       ] file ...
       gawk [ POSIX or GNU style options ] [ -- ]  program-text
       file ...

       pgawk  [  POSIX or GNU style options ] -f program-file [
       -- ] file ...
       pgawk [ POSIX or GNU style options ] [ -- ] program-text
       file ...

DESCRIPTION
       Gawk is the GNU Project's implementation of the AWK pro-
       gramming language.  It conforms to the definition of the
       language in the POSIX 1003.2 Command Language And Utili-
       ties Standard.  This version in turn  is  based  on  the
       description  in  The  AWK  Programming Language, by Aho,
       Kernighan, and Weinberger, with the additional  features
       found  in  the  System  V Release 4 version of UNIX awk.
       Gawk also provides more  recent  Bell  Laboratories  awk
       extensions, and a number of GNU-specific extensions.

       Pgawk is the profiling version of gawk.  It is identical
       in every way to gawk,  except  that  programs  run  more
       slowly,  and it automatically produces an execution pro-
       file in the file awkprof.out when done.  See the  --pro-
       file option, below.

       The command line consists of options to gawk itself, the
       AWK program text (if not supplied via the -f  or  --file
       options),  and  values  to be made available in the ARGC
       and ARGV pre-defined AWK variables.

OPTION FORMAT
       Gawk options may be either traditional POSIX one  letter
       options, or GNU style long options.  POSIX options start
       with a single "-", while long options start  with  "--".
       Long options are provided for both GNU-specific features
       and for POSIX-mandated features.

       Following the POSIX standard, gawk-specific options are
       supplied via arguments to the -W option.  Multiple -W
       options may be supplied.  Each -W option has a
       corresponding long option, as detailed below.  Arguments
       to long options are either joined with the option by an
       = sign, with no intervening spaces, or provided in the
       next command line argument.  Long options may be
       abbreviated, as long as the abbreviation remains unique.

OPTIONS
       Gawk accepts the following  options,  listed  alphabeti-
       cally.

       -F fs
       --field-separator fs
              Use  fs  for the input field separator (the value
              of the FS predefined variable).

       -v var=val
       --assign var=val
              Assign the value val to the variable var,  before
              execution  of  the program begins.  Such variable
              values are available to the BEGIN block of an AWK
              program.

       -f program-file
       --file program-file
              Read  the  AWK  program source from the file pro-
              gram-file, instead of from the first command line
              argument.  Multiple -f (or --file) options may be
              used.

       -mf NNN
       -mr NNN
              Set various memory limits to the value NNN.   The
              f flag sets the maximum number of fields, and the
              r flag sets the maximum record size.   These  two
              flags and the -m option are from the Bell Labora-
              tories research version of UNIX  awk.   They  are
              ignored  by  gawk,  since gawk has no pre-defined
              limits.

       -W compat
       -W traditional
       --compat
       --traditional
              Run  in  compatibility  mode.   In  compatibility
              mode,  gawk behaves identically to UNIX awk; none
              of the GNU-specific  extensions  are  recognized.
              The  use  of  --traditional is preferred over the
              other forms of this option.  See GNU  EXTENSIONS,
              below, for more information.

       -W copyleft
       -W copyright
       --copyleft
       --copyright
              Print  the  short  version  of  the GNU copyright
              information message on the  standard  output  and
              exit successfully.

       -W dump-variables[=file]
       --dump-variables[=file]
              Print  a  sorted  list of global variables, their
              types and final values to file.  If  no  file  is
              provided,  gawk  uses a file named awkvars.out in
              the current directory.
              Having a list of all the global  variables  is  a
              good way to look for typographical errors in your
              programs.  You would also use this option if  you
              have a large program with a lot of functions, and
              you want to be sure  that  your  functions  don't
              inadvertently use global variables that you meant
              to be local.  (This is a particularly  easy  mis-
              take  to  make with simple variable names like i,
              j, and so on.)

       -W help
       -W usage
       --help
       --usage
              Print a relatively short summary of the available
              options  on  the  standard  output.  (Per the GNU
              Coding Standards, these options cause an  immedi-
              ate, successful exit.)

       -W lint[=value]
       --lint[=value]
              Provide  warnings about constructs that are dubi-
              ous or non-portable to other AWK implementations.
              With an optional argument of fatal, lint warnings
              become fatal errors.  This may  be  drastic,  but
              its  use will certainly encourage the development
              of cleaner AWK programs.  With an optional  argu-
              ment  of invalid, only warnings about things that
              are actually invalid are  issued.  (This  is  not
              fully implemented yet.)

       -W lint-old
       --lint-old
              Provide  warnings  about  constructs that are not
              portable to the original version of UNIX awk.

       -W gen-po
       --gen-po
              Scan and parse the AWK program,  and  generate  a
              GNU  .po  format  file  on  standard  output with
              entries for all localizable strings in  the  pro-
              gram.   The  program itself is not executed.  See
              the GNU gettext distribution for more information
              on .po files.

       -W non-decimal-data
       --non-decimal-data
              Recognize  octal  and hexadecimal values in input
              data.  Use this option with great caution!

       -W posix
       --posix
              This turns on compatibility mode, with  the  fol-
              lowing additional restrictions:

              o \x escape sequences are not recognized.

              o Only space and tab act as field separators when
                FS is set to a single space; newline does not.

              o You cannot continue lines after ?  and :.

              o The  synonym  func  for the keyword function is
                not recognized.

              o The operators ** and  **=  cannot  be  used  in
                place of ^ and ^=.

              o The fflush() function is not available.

       -W profile[=prof_file]
       --profile[=prof_file]
              Send profiling data to prof_file.  The default is
              awkprof.out.  When run with gawk, the profile  is
              just  a  "pretty printed" version of the program.
              When run with pgawk, the profile contains  execu-
              tion  counts  of each statement in the program in
              the left margin and function call counts for each
              user-defined function.

       -W re-interval
       --re-interval
              Enable the use of interval expressions in regular
              expression  matching  (see  Regular  Expressions,
              below).  Interval expressions were not tradition-
              ally available in the AWK  language.   The  POSIX
              standard  added  them, to make awk and egrep con-
              sistent with each other.  However, their  use  is
              likely  to  break  old AWK programs, so gawk only
              provides them if they  are  requested  with  this
              option, or when --posix is specified.

       -W source program-text
       --source program-text
              Use  program-text  as  AWK  program  source code.
              This  option  allows  the  easy  intermixing   of
              library  functions  (used  via  the -f and --file
              options) with source code entered on the  command
              line.   It  is  intended  primarily for medium to
              large AWK programs used in shell scripts.

       -W version
       --version
              Print version  information  for  this  particular
              copy  of  gawk  on  the standard output.  This is
              useful mainly for knowing if the current copy  of
              gawk on your system is up to date with respect to
              whatever the Free  Software  Foundation  is  dis-
              tributing.   This  is  also useful when reporting
              bugs.   (Per  the  GNU  Coding  Standards,  these
              options cause an immediate, successful exit.)

       --     Signal  the  end  of  options.  This is useful to
              allow further arguments to the AWK program itself
              to  start with a "-".  This is mainly for consis-
              tency with the argument parsing  convention  used
              by most other POSIX programs.

       In compatibility mode, any other options are flagged as
       invalid, but are otherwise ignored.   In  normal  opera-
       tion, as long as program text has been supplied, unknown
       options are passed on to the AWK  program  in  the  ARGV
       array  for  processing.  This is particularly useful for
       running AWK programs via the "#!" executable interpreter
       mechanism.
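
       As a rough illustration of the -F and -v options
       described above, the sketch below splits colon-separated
       input and passes a value into the program before BEGIN
       runs.  The sample data and the variable name label are
       hypothetical, and it assumes the awk on PATH is gawk or
       another POSIX-conforming awk.

```shell
# Hypothetical sample data: colon-separated, passwd-style lines.
labeled=$(printf 'root:x:0\ndaemon:x:1\n' | awk -F: -v label=user '
    BEGIN { print "label in BEGIN: " label }   # -v values are set already
          { print label ": " $1 }              # $1 is split on ":"
')
printf '%s\n' "$labeled"
```

       This should print "label in BEGIN: user" followed by
       "user: root" and "user: daemon".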

AWK PROGRAM EXECUTION
       An  AWK program consists of a sequence of pattern-action
       statements and optional function definitions.
              pattern   { action statements }
              function name(parameter list) { statements }
       Gawk first reads the program source  from  the  program-
       file(s)  if  specified,  from  arguments to --source, or
       from the first non-option argument on the command  line.
       The  -f  and --source options may be used multiple times
       on the command line.  Gawk reads the program text as  if
       all  the program-files and command line source texts had
       been concatenated together.  This is useful for building
       libraries  of  AWK  functions, without having to include
       them in each new AWK program that uses  them.   It  also
       provides  the ability to mix library functions with com-
       mand line programs.
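
       The concatenation of several program files can be
       sketched as follows.  The file names lib.awk and
       main.awk and the function twice() are hypothetical; the
       sketch assumes the awk on PATH is gawk or another
       POSIX-conforming awk.

```shell
# Hypothetical file names, created in a scratch directory.
dir=$(mktemp -d)

# A tiny "library" holding only a function definition.
cat > "$dir/lib.awk" <<'EOF'
function twice(x) { return 2 * x }
EOF

# The main program calls the library function.
cat > "$dir/main.awk" <<'EOF'
{ print twice($1) }
EOF

# Both -f files are read as if concatenated into one program.
doubled=$(printf '1\n21\n' | awk -f "$dir/lib.awk" -f "$dir/main.awk")
printf '%s\n' "$doubled"

rm -rf "$dir"
```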
       The environment variable AWKPATH specifies a search path
       to  use  when  finding  source  files  named with the -f
       option.  If this variable does not  exist,  the  default
       path is ".:/usr/local/share/awk".  (The actual directory
       may  vary,  depending  upon  how  gawk  was  built   and
       installed.)   If a file name given to the -f option con-
       tains a "/" character, no path search is performed.
       Gawk executes  AWK  programs  in  the  following  order.
       First,  all  variable  assignments  specified via the -v
       option are performed.  Next, gawk compiles  the  program
       into  an internal form.  Then, gawk executes the code in
       the BEGIN block(s) (if any), and then proceeds  to  read
       each  file  named  in  the  ARGV array.  If there are no
       files named on the command line, gawk reads the standard
       input.
       If  a  filename on the command line has the form var=val
       it is treated as a variable  assignment.   The  variable
       var will be assigned the value val.  (This happens after
       any BEGIN block(s) have been run.)  Command  line  vari-
       able assignment is most useful for dynamically assigning
       values to the variables AWK uses to control how input is
       broken  into  fields and records.  It is also useful for
       controlling state if multiple passes are needed  over  a
       single data file.
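
       A minimal sketch of command line variable assignment
       used to control two passes over one file.  The variable
       name pass and the sample data are hypothetical; the
       sketch assumes the awk on PATH is gawk or another
       POSIX-conforming awk.

```shell
# Pass 1 totals the values; pass 2 prints each value as a
# percentage of that total.  The data file is a scratch file.
data=$(mktemp)
printf '1\n3\n' > "$data"

percentages=$(awk '
    BEGIN     { print "pass in BEGIN: [" pass "]" }  # not yet assigned
    pass == 1 { total += $1 }
    pass == 2 { pct = $1 / total * 100; print pct "%" }
' pass=1 "$data" pass=2 "$data")
printf '%s\n' "$percentages"

rm -f "$data"
```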
       If  the  value  of a particular element of ARGV is empty
       (""), gawk skips over it.
       For each record in the input, gawk tests to  see  if  it
       matches  any  pattern in the AWK program.  For each pat-
       tern that the record matches, the associated  action  is
       executed.   The  patterns  are  tested in the order they
       occur in the program.
       Finally, after all the input is exhausted, gawk executes
       the code in the END block(s) (if any).
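
       The order of execution described above can be observed
       with a short trace.  This is only a sketch; the input
       lines are made up, and it assumes the awk on PATH is
       gawk or another POSIX-conforming awk.

```shell
# BEGIN runs before any input is read, the main rule runs once
# per record, and END runs after the input is exhausted.
trace=$(printf 'a\nb\n' | awk '
    BEGIN { print "begin" }
          { print "record " NR ": " $0 }
    END   { print "end after " NR " records" }
')
printf '%s\n' "$trace"
```

       This should print "begin", then one line per record, and
       finally "end after 2 records".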

VARIABLES, RECORDS AND FIELDS
       AWK variables are dynamic; they come into existence when
       they are first used.  Their values are either  floating-
       point  numbers  or  strings, or both, depending upon how
       they are used.  AWK also  has  one  dimensional  arrays;
       arrays  with multiple dimensions may be simulated.  Sev-
       eral pre-defined variables are set as  a  program  runs;
       these  will be described as needed and summarized below.

   Records
       Normally, records are separated by  newline  characters.
       You  can  control how records are separated by assigning
       values to the built-in variable RS.  If RS is any single
       character, that character separates records.  Otherwise,
       RS is a regular expression.   Text  in  the  input  that
       matches  this  regular  expression separates the record.
       However, in compatibility mode, only the first character
       of  its string value is used for separating records.  If
       RS is set to the null string, then records are separated
       by  blank lines.  When RS is set to the null string, the
       newline character always acts as a field  separator,  in
       addition to whatever value FS may have.
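
       A small sketch of the null-string case for RS, using
       made-up address data.  It assumes the awk on PATH is
       gawk or another POSIX-conforming awk.

```shell
# With RS set to the null string, blank lines separate records
# and newline also acts as a field separator.
summary=$(printf 'Ann Lee\n12 Elm St\n\nBob Ray\n9 Oak Ave\n' | awk '
    BEGIN { RS = "" }
    { print NR ": " NF " fields, first=" $1 ", last=" $NF }
')
printf '%s\n' "$summary"
```

       Each two-line block is read as one record of five
       fields, so this should print "1: 5 fields, first=Ann,
       last=St" and "2: 5 fields, first=Bob, last=Ave".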

   Fields
       As  each  input  record  is read, gawk splits the record
       into fields, using the value of the FS variable  as  the
       field  separator.   If  FS is a single character, fields
       are separated by that character.   If  FS  is  the  null
       string,  then  each individual character becomes a sepa-
       rate field.  Otherwise, FS is expected to be a full reg-
       ular  expression.  In the special case that FS is a sin-
       gle space, fields are separated by runs of spaces and/or
       tabs and/or newlines.  (But see the discussion of
       --posix, above.)  NOTE: The value of IGNORECASE (see
       below)  also  affects  how fields are split when FS is a
       regular expression, and how records are  separated  when
       RS is a regular expression.
       If  the FIELDWIDTHS variable is set to a space separated
       list of numbers, each field is expected  to  have  fixed
       width, and gawk splits up the record using the specified
       widths.  The value of FS is ignored.   Assigning  a  new
       value  to  FS  overrides  the  use  of  FIELDWIDTHS, and
       restores the default behavior.
       Each field in the input record may be referenced by  its
       position,  $1,  $2,  and so on.  $0 is the whole record.
       Fields need not be referenced by constants:
              n = 5
              print $n
       prints the fifth field in the input record.
       The variable NF is set to the total number of fields  in
       the input record.
       References  to  non-existent  fields  (i.e. fields after
       $NF) produce the null-string.  However, assigning  to  a
       non-existent  field  (e.g.,  $(NF+2)  = 5) increases the
       value of NF, creates any  intervening  fields  with  the
       null  string  as their value, and causes the value of $0
       to be recomputed, with the fields being separated by the
       value  of  OFS.   References to negative numbered fields
       cause a fatal error.  Decrementing NF causes the  values
       of  fields  past the new value to be lost, and the value
       of $0 to be recomputed, with the fields being  separated
       by the value of OFS.
       Assigning  a value to an existing field causes the whole
       record to be rebuilt when $0 is referenced.   Similarly,
       assigning a value to $0 causes the record to be resplit,
       creating new values for the fields.
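
       The effect of assigning to a non-existent field can be
       sketched as follows.  The input line is made up, and the
       sketch assumes the awk on PATH is gawk or another
       POSIX-conforming awk.

```shell
# Assigning past $NF grows the record; $0 is rebuilt with OFS
# between fields, and the skipped field ($4 here) is the null
# string.
rebuilt=$(echo 'a b c' | awk -v OFS=- '{ $(NF+2) = "e"; print; print NF }')
printf '%s\n' "$rebuilt"
```

       This should print "a-b-c--e" followed by "5".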

   Built-in Variables
       Gawk's built-in variables are:
       ARGC        The number of command line  arguments  (does
                   not  include options to gawk, or the program
                   source).
       ARGIND      The index in ARGV of the current file  being
                   processed.
       ARGV        Array  of command line arguments.  The array
                   is indexed from 0 to ARGC - 1.   Dynamically
                   changing  the  contents  of ARGV can control
                   the files used for data.
       BINMODE     On  non-POSIX  systems,  specifies  use   of
                   "binary"  mode  for  all  file I/O.  Numeric
                   values of 1, 2, or  3,  specify  that  input
                   files,  output  files, or all files, respec-
                   tively, should use binary I/O.  String  val-
                   ues of "r", or "w" specify that input files,
                   or output files,  respectively,  should  use
                   binary  I/O.   String values of "rw" or "wr"
                   specify that all  files  should  use  binary
                   I/O.   Any  other string value is treated as
                   "rw", but generates a warning message.
       CONVFMT     The conversion format for  numbers,  "%.6g",
                   by default.
       ENVIRON     An  array  containing the values of the cur-
                   rent environment.  The array is  indexed  by
                   the   environment  variables,  each  element
                   being the value of that variable (e.g., ENV-
                   IRON["HOME"] might be /home/arnold).  Chang-
                   ing this array does not affect the  environ-
                   ment  seen by programs which gawk spawns via
                   redirection or the system() function.
       ERRNO       If a system error occurs either doing a  re-
                   direction  for  getline,  during  a read for
                   getline, or during  a  close(),  then  ERRNO
                   will  contain a string describing the error.
                   The value is subject to translation in  non-
                   English locales.
       FIELDWIDTHS A white-space separated list of field widths.
                   When set, gawk parses the input into  fields
                   of  fixed  width, instead of using the value
                   of the FS variable as the field separator.
       FILENAME    The name of the current input file.   If  no
                   files are specified on the command line, the
                   value of FILENAME is "-".  However, FILENAME
                   is  undefined inside the BEGIN block (unless
                   set by getline).
       FNR         The input record number in the current input
                   file.
       FS          The   input  field  separator,  a  space  by
                   default.  See Fields, above.
       IGNORECASE  Controls the case-sensitivity of all regular
                   expression   and   string   operations.   If
                   IGNORECASE has a non-zero value, then string
                   comparisons  and  pattern matching in rules,
                   field splitting with FS,  record  separating
                   with  RS, regular expression matching with ~
                   and !~, and the gensub(),  gsub(),  index(),
                   match(),  split(),  and sub() built-in func-
                   tions all ignore  case  when  doing  regular
                   expression  operations.   NOTE:  Array  sub-
                   scripting is  not  affected.   However,  the
                   asort() and asorti() functions are affected.
                   Thus, if IGNORECASE is not  equal  to  zero,
                   /aB/  matches all of the strings "ab", "aB",
                   "Ab", and "AB".  As with all AWK  variables,
                   the  initial value of IGNORECASE is zero, so
                   all regular expression and string operations
                   are  normally  case-sensitive.   Under Unix,
                   the full ISO 8859-1 Latin-1 character set is
                   used when ignoring case.
       LINT        Provides   dynamic  control  of  the  --lint
                   option from within  an  AWK  program.   When
                   true, gawk prints lint warnings. When false,
                   it does not.  When assigned the string value
                   "fatal",  lint warnings become fatal errors,
                   exactly like --lint=fatal.  Any  other  true
                   value just prints warnings.
       NF          The  number  of  fields in the current input
                   record.
       NR          The total number of input  records  seen  so
                   far.
       OFMT        The  output  format  for numbers, "%.6g", by
                   default.
       OFS         The  output  field  separator,  a  space  by
                   default.
       ORS         The  output  record  separator, by default a
                   newline.
       PROCINFO    The elements of this array provide access to
                   information  about  the running AWK program.
                   On some systems, there may  be  elements  in
                   the  array,  "group1"  through  "groupn" for
                   some n, which is the number of supplementary
                   groups  that  the  process  has.  Use the in
                   operator to test for  these  elements.   The
                   following  elements  are  guaranteed  to  be
                   available:
                   PROCINFO["egid"]   the value  of  the  gete-
                                      gid(2) system call.
                   PROCINFO["euid"]   the    value    of    the
                                      geteuid(2) system call.
                   PROCINFO["FS"]     "FS" if  field  splitting
                                      with  FS is in effect, or
                                      "FIELDWIDTHS"  if   field
                                      splitting   with   FIELD-
                                      WIDTHS is in effect.
                   PROCINFO["gid"]    the  value  of  the  get-
                                      gid(2) system call.
                   PROCINFO["pgrpid"] the  process  group ID of
                                      the current process.
                   PROCINFO["pid"]    the  process  ID  of  the
                                      current process.
                   PROCINFO["ppid"]   the  parent process ID of
                                      the current process.
                   PROCINFO["uid"]    the    value    of    the
                                      getuid(2) system call.
       RS          The  input  record  separator,  by default a
                   newline.
       RT          The record terminator.  Gawk sets RT to  the
                   input  text  that  matched  the character or
                   regular expression specified by RS.
       RSTART      The index of the first character matched  by
                   match();  0 if no match.  (This implies that
                   character indices start at one.)
       RLENGTH     The length of the string matched by match();
                   -1 if no match.
       SUBSEP      The character used to separate multiple sub-
                   scripts  in  array  elements,   by   default
                   "\034".
       TEXTDOMAIN  The  text domain of the AWK program; used to
                   find the localized translations for the pro-
                   gram's strings.
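
       As an illustration of how FILENAME, FNR, NR, and NF
       relate, the sketch below reads two small files.  The
       file names one.txt and two.txt are hypothetical, and the
       sketch assumes the awk on PATH is gawk or another
       POSIX-conforming awk.

```shell
# Two hypothetical input files in a scratch directory.
dir=$(mktemp -d)
printf 'a b\nc\n' > "$dir/one.txt"
printf 'd e f\n' > "$dir/two.txt"

# FNR restarts for each file, NR keeps counting across files,
# and NF is recomputed for every record.
report=$(cd "$dir" && awk '{ print FILENAME, FNR, NR, NF }' one.txt two.txt)
printf '%s\n' "$report"

rm -rf "$dir"
```

       This should print "one.txt 1 1 2", "one.txt 2 2 1", and
       "two.txt 1 3 3".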

   Arrays
       Arrays are subscripted with an expression between square
       brackets ([ and ]).  If the expression is an  expression
       list  (expr,  expr  ...)   then the array subscript is a
       string consisting of the concatenation of  the  (string)
       value  of each expression, separated by the value of the
       SUBSEP variable.  This facility is used to simulate mul-
       tiply dimensioned arrays.  For example:
              i = "A"; j = "B"; k = "C"
              x[i, j, k] = "hello, world\n"
       assigns  the  string  "hello, world\n" to the element of
       the  array  x   which   is   indexed   by   the   string
       "A\034B\034C".   All arrays in AWK are associative, i.e.
       indexed by string values.
       The special operator in may be used in an  if  or  while
       statement  to see if an array has an index consisting of
       a particular value.
              if (val in array)
                   print array[val]
       If the array has multiple  subscripts,  use  (i,  j)  in
       array.
       The in construct may also be used in a for loop to iter-
       ate over all the elements of an array.
       An element may be deleted from an array using the delete
       statement.   The  delete  statement  may also be used to
       delete the entire contents of an array, just by specify-
       ing the array name without a subscript.
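
       The in operator, the for loop form, and delete can be
       sketched together.  The array names seen and grid and
       the input words are hypothetical; the sketch assumes the
       awk on PATH is gawk or another POSIX-conforming awk.

```shell
# Count word occurrences, test membership, delete one element,
# and count what is left.  The iteration order of "for (w in
# seen)" is not specified, so only the element count is printed.
result=$(printf 'x y x\nz x\n' | awk '
    { for (i = 1; i <= NF; i++) seen[$i]++ }
    END {
        if ("x" in seen) print "x seen " seen["x"] " times"
        delete seen["x"]
        if (!("x" in seen)) print "x deleted"
        n = 0
        for (w in seen) n++
        print n " elements left"
        grid[1, 2] = "v"                  # subscript is 1 SUBSEP 2
        if ((1, 2) in grid) print "grid has (1, 2)"
    }
')
printf '%s\n' "$result"
```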

   Variable Typing And Conversion
       Variables and fields may be (floating point) numbers, or
       strings, or both.  How the value of a variable is inter-
       preted  depends  upon its context.  If used in a numeric
       expression, it will be treated as a number; if used as a
       string, it will be treated as a string.
       To  force a variable to be treated as a number, add 0 to
       it; to force it to be treated as a  string,  concatenate
       it with the null string.
       When a string must be converted to a number, the conver-
       sion is accomplished using strtod(3).  A number is  con-
       verted  to  a  string by using the value of CONVFMT as a
       format string for sprintf(3), with the numeric value  of
       the  variable as the argument.  However, even though all
       numbers in AWK are floating-point, integral  values  are
       always converted as integers.  Thus, given
              CONVFMT = "%2.2f"
              a = 12
              b = a ""
       the  variable  b  has  a  string  value  of "12" and not
       "12.00".
       Gawk performs comparisons as follows: If  two  variables
       are  numeric,  they  are  compared  numerically.  If one
       value is numeric and the other has a string  value  that
       is  a  "numeric  string," then comparisons are also done
       numerically.  Otherwise, the numeric value is  converted
       to  a  string and a string comparison is performed.  Two
       strings are compared, of course, as strings.  Note  that
       the  POSIX  standard  applies  the  concept  of "numeric
       string" everywhere, even to string constants.   However,
       this  is  clearly  incorrect, and gawk does not do this.
       (Fortunately, this is fixed in the next version  of  the
       standard.)
       Note  that  string  constants,  such  as  "57",  are not
       numeric strings, they are string constants.  The idea of
       "numeric  string" only applies to fields, getline input,
       FILENAME, ARGV elements, ENVIRON elements and  the  ele-
       ments  of  an  array created by split() that are numeric
       strings.  The basic idea is that user  input,  and  only
       user  input,  that looks numeric, should be treated that
       way.
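
       The difference between numeric strings and string
       constants can be sketched with one comparison of each
       kind.  The variable names are hypothetical, and the
       sketch assumes the awk on PATH is gawk or another
       POSIX-conforming awk.

```shell
# Fields that look numeric compare as numbers (10 > 9 is true);
# string constants compare as strings, so "10" sorts before "9".
cmp=$(echo '10 9' | awk '{
    fields_gt    = ($1 > $2)
    constants_gt = ("10" > "9")
    print fields_gt, constants_gt
}')
printf '%s\n' "$cmp"
```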
       Uninitialized variables have the numeric value 0 and the
       string value "" (the null, or empty, string).

   Octal and Hexadecimal Constants
       Starting with version 3.1 of gawk, you may use C-style
       octal and hexadecimal  constants  in  your  AWK  program
       source  code.  For example, the octal value 011 is equal
       to decimal 9, and the hexadecimal value 0x11 is equal to
       decimal 17.

   String Constants
       String  constants  in  AWK  are  sequences of characters
       enclosed between double  quotes  (").   Within  strings,
       certain escape sequences are recognized, as in C.  These
       are:
       \\   A literal backslash.
       \a   The "alert" character; usually the ASCII BEL  char-
            acter.
       \b   backspace.
       \f   form-feed.
       \n   newline.
       \r   carriage return.
       \t   horizontal tab.
       \v   vertical tab.
       \xhex digits
            The character represented by the string of hexadec-
            imal digits following the \x.  As in  ANSI  C,  all
            following hexadecimal digits are considered part of
            the escape sequence.  (This feature should tell  us
            something  about  language  design  by  committee.)
            E.g., "\x1B" is the ASCII ESC (escape) character.
       \ddd The character represented by the 1-, 2-, or 3-digit
            sequence  of  octal  digits.   E.g.,  "\033" is the
            ASCII ESC (escape) character.
       \c   The literal character c.
       The escape sequences may also be  used  inside  constant
       regular   expressions   (e.g.,  /[ \t\f\n\r\v]/  matches
       whitespace characters).
       In compatibility mode,  the  characters  represented  by
       octal  and hexadecimal escape sequences are treated lit-
       erally when used in regular expression constants.  Thus,
       /a\52b/ is equivalent to /a\*b/.

PATTERNS AND ACTIONS
       AWK  is  a  line-oriented  language.   The pattern comes
       first, and  then  the  action.   Action  statements  are
       enclosed in { and }.  Either the pattern may be missing,
       or the action may be missing, but, of course, not  both.
       If  the  pattern  is missing, the action is executed for
       every single record  of  input.   A  missing  action  is
       equivalent to
              { print }
       which prints the entire record.
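
       Both short forms can be sketched on the same made-up
       input.  This assumes the awk on PATH is gawk or another
       POSIX-conforming awk.

```shell
# Pattern without an action: matching records are printed whole.
big=$(printf 'a 1\nb 5\nc 9\n' | awk '$2 > 4')
# Action without a pattern: the action runs for every record.
names=$(printf 'a 1\nb 5\nc 9\n' | awk '{ print $1 }')
printf '%s\n%s\n' "$big" "$names"
```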
       Comments  begin  with  the  "#"  character, and continue
       until the end of the line.  Blank lines may be  used  to
       separate  statements.  Normally, a statement ends with a
       newline; however, this is not the case for lines ending
       in  a  ",",  {,  ?, :, &&, or ||.  Lines ending in do or
       else also have their statements automatically  continued
       on  the  following  line.  In other cases, a line can be
       continued by ending it with a "\",  in  which  case  the
       newline will be ignored.
       Multiple statements may be put on one line by separating
       them with a ";".  This applies to  both  the  statements
       within  the  action  part  of a pattern-action pair (the
       usual case), and to the pattern-action statements  them-
       selves.

   Patterns
       AWK patterns may be one of the following:
              BEGIN
              END
              /regular expression/
              relational expression
              pattern && pattern
              pattern || pattern
              pattern ? pattern : pattern
              (pattern)
              ! pattern
              pattern1, pattern2
       BEGIN  and  END  are two special kinds of patterns which
       are not tested against the input.  The action  parts  of
       all  BEGIN  patterns are merged as if all the statements
       had been written in a single BEGIN block.  They are exe-
       cuted  before  any of the input is read.  Similarly, all
       the END blocks are merged, and  executed  when  all  the
       input  is  exhausted  (or when an exit statement is exe-
       cuted).  BEGIN and END patterns cannot be combined  with
       other  patterns  in  pattern expressions.  BEGIN and END
       patterns cannot have missing action parts.
       For /regular expression/ patterns, the associated state-
       ment  is executed for each input record that matches the
       regular expression.  Regular expressions are the same as
       those in egrep(1), and are summarized below.
       A  relational  expression  may  use any of the operators
       defined below in the section on actions.   These  gener-
       ally  test  whether certain fields match certain regular
       expressions.
       The &&, ||, and !  operators are  logical  AND,  logical
       OR,  and  logical  NOT,  respectively, as in C.  They do
       short-circuit evaluation, also as in C, and are used for
       combining  more  primitive  pattern  expressions.  As in
       most languages, parentheses may be used  to  change  the
       order of evaluation.
       The  ?: operator is like the same operator in C.  If the
       first pattern is true then the pattern used for  testing
       is  the second pattern, otherwise it is the third.  Only
       one of the second and third patterns is evaluated.
       The pattern1, pattern2 form of an expression is called a
       range  pattern.   It  matches all input records starting
       with a record  that  matches  pattern1,  and  continuing
       until  a  record  that  matches pattern2, inclusive.  It
       does not combine with any other sort of pattern  expres-
       sion.
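
       As a small sketch of BEGIN, END, and a range pattern
       working together (the input lines and the START and STOP
       markers are invented for illustration):

```shell
# Illustrative input; START and STOP are arbitrary marker lines.
result=$(printf 'a\nSTART\nb\nSTOP\nc\n' | awk '
BEGIN           { print "begin" }                # before any input is read
/START/, /STOP/ { print NR ": " $0 }             # range pattern, inclusive
END             { print "end, " NR " records" }  # after input is exhausted
')
printf '%s\n' "$result"
```

       The range rule prints records 2 through 4, including both
       the record that opens the range and the one that closes it.
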
   Regular Expressions
       Regular  expressions  are  the  extended  kind  found in
       egrep.  They are composed of characters as follows:
       c          matches the non-metacharacter c.
       \c         matches the literal character c.
       .          matches any character including newline.
       ^          matches the beginning of a string.
       $          matches the end of a string.
       [abc...]   character list, matches any of the characters
                  abc....
       [^abc...]  negated character list, matches any character
                  except abc....
       r1|r2      alternation: matches either r1 or r2.
       r1r2       concatenation: matches r1, and then r2.
       r+         matches one or more r's.
       r*         matches zero or more r's.
       r?         matches zero or one r's.
       (r)        grouping: matches r.
       r{n}
       r{n,}
       r{n,m}     One or two numbers inside  braces  denote  an
                  interval  expression.  If there is one number
                  in the braces, the preceding regular  expres-
                  sion r is repeated n times.  If there are two
                  numbers separated by a comma, r is repeated n
                  to  m times.  If there is one number followed
                  by a comma, then r is  repeated  at  least  n
                  times.
                  Interval  expressions  are  only available if
                  either --posix or --re-interval is  specified
                  on the command line.

       \y         matches the empty string at either the begin-
                  ning or the end of a word.

       \B         matches the empty string within a word.

       \<         matches the empty string at the beginning  of
                  a word.

       \>         matches  the  empty  string  at  the end of a
                  word.

       \w         matches any word-constituent character  (let-
                  ter, digit, or underscore).

       \W         matches  any  character that is not word-con-
                  stituent.

       \`         matches the empty string at the beginning  of
                  a buffer (string).

       \'         matches  the  empty  string  at  the end of a
                  buffer.

       The escape sequences that are valid in string  constants
       (see below) are also valid in regular expressions.
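
       A brief sketch of the portable operators (anchors,
       grouping, alternation, and ?) follows.  The word list is
       illustrative, and the gawk-specific operators such as \y
       are deliberately left out so that the example does not
       depend on a particular awk.

```shell
# Keep only lines that are exactly "cat" or "dog", optionally followed by "s".
result=$(printf 'cat\ncats\nconcat\ndog\ndo\n' | awk '/^(cat|dog)s?$/')
printf '%s\n' "$result"
```

       The lines "concat" and "do" are rejected because ^ and $
       anchor the match to the whole record.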

       Character  classes  are  a new feature introduced in the
       POSIX standard.  A character class is a special notation
       for  describing lists of characters that have a specific
       attribute, but where the  actual  characters  themselves
       can  vary  from country to country and/or from character
       set to character set.  For example, the notion  of  what
       is  an  alphabetic  character  differs in the USA and in
       France.

       A character class is only valid in a regular  expression
       inside  the  brackets  of  a  character list.  Character
       classes consist of [:, a keyword denoting the class, and
       :].  The character classes defined by the POSIX standard
       are:

       [:alnum:]  Alphanumeric characters.

       [:alpha:]  Alphabetic characters.

       [:blank:]  Space or tab characters.

       [:cntrl:]  Control characters.

       [:digit:]  Numeric characters.

       [:graph:]  Characters that are both printable and  visi-
                  ble.  (A space is printable, but not visible,
                  while an a is both.)

       [:lower:]  Lower-case alphabetic characters.

       [:print:]  Printable characters (characters that are not
                  control characters.)

       [:punct:]  Punctuation  characters  (characters that are
                  not letters, digits, control  characters,  or
                  space characters).

       [:space:]  Space  characters  (such  as  space, tab, and
                  formfeed, to name a few).

       [:upper:]  Upper-case alphabetic characters.

       [:xdigit:] Characters that are hexadecimal digits.

       For  example,  before  the  POSIX  standard,  to   match
       alphanumeric  characters,  you  would  have had to write
       /[A-Za-z0-9]/.  If your character set had  other  alpha-
       betic  characters  in it, this would not match them, and
       if your character set collated differently  from  ASCII,
       this might not even match the ASCII alphanumeric charac-
       ters.  With the POSIX character classes, you  can  write
       /[[:alnum:]]/,  and  this  matches  the  alphabetic  and
       numeric characters in your character set.
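
       For instance, a sketch that uses character classes inside
       bracket expressions to accept identifier-like lines.  The
       input is illustrative, and very old awk implementations
       without POSIX character class support would not run it as
       intended.

```shell
# A letter or underscore, followed by any number of alphanumerics or underscores.
result=$(printf 'foo\nx1\n9lives\na_b\nbad name\n' |
    awk '/^[[:alpha:]_][[:alnum:]_]*$/')
printf '%s\n' "$result"
```

       Note the doubled brackets: the class name [:alpha:] is only
       valid inside the brackets of a character list.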

       Two additional special sequences can appear in character
       lists.   These  apply to non-ASCII character sets, which
       can have single symbols (called collating elements) that
       are represented with more than one character, as well as
       several characters that are equivalent for collating, or
       sorting, purposes.  (E.g., in French, a plain "e" and  a
       grave-accented "è" are equivalent.)

       Collating Symbols
              A collating symbol is a multi-character collating
              element  enclosed in [.  and .].  For example, if
              ch is a collating element, then  [[.ch.]]   is  a
              regular  expression  that  matches this collating
              element, while [ch] is a regular expression  that
              matches either c or h.

       Equivalence Classes
              An  equivalence  class  is a locale-specific name
              for a list of  characters  that  are  equivalent.
              The  name is enclosed in [= and =].  For example,
              the name e might be used to represent all of "e,"
              "é," and "è."  In this case, [[=e=]] is a regular
              expression that matches any of e, é, or è.

       These features are very valuable in non-English speaking
       locales.  The library functions that gawk uses for regu-
       lar expression matching currently only  recognize  POSIX
       character  classes; they do not recognize collating sym-
       bols or equivalence classes.

       The \y, \B, \<, \>, \w, \W, \`,  and  \'  operators  are
       specific  to  gawk; they are extensions based on facili-
       ties in the GNU regular expression libraries.

       The various command line options control how gawk inter-
       prets characters in regular expressions.

       No options
              In the default case, gawk provides all the
              facilities of POSIX regular expressions and the GNU
              regular  expression  operators  described  above.
              However, interval expressions are not  supported.

       --posix
              Only POSIX regular expressions are supported; the
              GNU operators are not special (e.g., \w matches a
              literal w).  Interval expressions are allowed.

       --traditional
              Traditional  Unix  awk  regular  expressions  are
              matched.   The  GNU  operators  are  not special,
              interval expressions are not available, and  nei-
              ther are the POSIX character classes ([[:alnum:]]
              and so on).  Characters described  by  octal  and
              hexadecimal  escape  sequences are treated liter-
              ally, even if they represent  regular  expression
              metacharacters.

       --re-interval
              Allow  interval  expressions  in  regular expres-
              sions, even if --traditional has been provided.

   Actions
       Action statements are  enclosed  in  braces,  {  and  }.
       Action  statements consist of the usual assignment, con-
       ditional, and looping  statements  found  in  most  lan-
       guages.    The   operators,   control   statements,  and
       input/output statements available  are  patterned  after
       those in C.

   Operators
       The operators in AWK, in order of decreasing precedence,
       are


       (...)       Grouping

       $           Field reference.

       ++ --       Increment and  decrement,  both  prefix  and
                   postfix.

       ^           Exponentiation (** may also be used, and **=
                   for the assignment operator).

       + - !       Unary plus, unary minus, and  logical  nega-
                   tion.

       * / %       Multiplication, division, and modulus.

       + -         Addition and subtraction.

       space       String concatenation.

       < >
       <= >=
       != ==       The regular relational operators.

       ~ !~        Regular  expression  match,  negated  match.
                   NOTE: Do not use a constant regular  expres-
                   sion (/foo/) on the left-hand side of a ~ or
                   !~.  Only use one on  the  right-hand  side.
                   The  expression  /foo/  ~  exp  has the same
                   meaning as (($0 ~ /foo/) ~  exp).   This  is
                   usually not what was intended.

       in          Array membership.

       &&          Logical AND.

       ||          Logical OR.

       ?:          The  C conditional expression.  This has the
                   form expr1 ? expr2 :  expr3.   If  expr1  is
                   true,  the value of the expression is expr2,
                   otherwise it is expr3.  Only  one  of  expr2
                   and expr3 is evaluated.

       = += -=
       *= /= %= ^= Assignment.  Both absolute assignment (var =
                   value) and  operator-assignment  (the  other
                   forms) are supported.
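
       A few consequences of this precedence order, as a minimal
       sketch (the variable names x and n are arbitrary):

```shell
result=$(awk 'BEGIN {
    print 1 + 2 " " 3 * 4              # + and * bind tighter than concatenation
    x = -2 ^ 2; print x                # ^ binds tighter than unary minus
    n = 5
    print (n % 2 == 0 ? "even" : "odd")
    print ("foobar" ~ /^foo/)          # a match yields 1, a non-match 0
}')
printf '%s\n' "$result"
```

       The first statement prints "3 12", not "1 2 3 4" with some
       arithmetic applied to the joined string, because both sums
       are evaluated before the strings are concatenated.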

   Control Statements
       The control statements are as follows:

              if (condition) statement [ else statement ]
              while (condition) statement
              do statement while (condition)
              for (expr1; expr2; expr3) statement
              for (var in array) statement
              break
              continue
              delete array[index]
              delete array
              exit [ expression ]
              { statements }
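
       A short sketch combining several of these statements to
       count words.  The input and the array name count are
       illustrative; the order in which for (var in array) visits
       elements is unspecified, so the example avoids depending
       on it.

```shell
result=$(printf 'a b a\nb c b\n' | awk '
{
    for (i = 1; i <= NF; i++)      # C-style loop over the fields
        count[$i]++
}
END {
    n = 0
    for (w in count)               # visits every index, in no particular order
        n++
    print "distinct:", n
    if ("b" in count)
        print "b seen", count["b"], "times"
    delete count["b"]              # remove a single element
    print (("b" in count) ? "still there" : "deleted")
}')
printf '%s\n' "$result"
```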

   I/O Statements
       The input/output statements are as follows:


       close(file [, how])   Close  file,  pipe  or co-process.
                             The optional how  should  only  be
                             used  when  closing  one  end of a
                             two-way pipe to a co-process.   It
                             must  be  a  string  value, either
                             "to" or "from".

       getline               Set $0 from next input record; set
                             NF, NR, FNR.

       getline <file         Set  $0  from next record of file;
                             set NF.

       getline var           Set var from  next  input  record;
                             set NR, FNR.

       getline var <file     Set  var from next record of file.

       command | getline [var]
                             Run  command  piping  the   output
                             either into $0 or var, as above.

       command |& getline [var]
                             Run command as a co-process piping
                             the output either into $0 or  var,
                             as above.  Co-processes are a gawk
                             extension.

       next                  Stop processing the current  input
                             record.   The next input record is
                             read and  processing  starts  over
                             with  the first pattern in the AWK
                             program.  If the end of the  input
                             data is reached, the END block(s),
                             if any, are executed.

       nextfile              Stop processing the current  input
                             file.   The next input record read
                             comes from the  next  input  file.
                             FILENAME  and  ARGIND are updated,
                             FNR is reset to 1, and  processing
                             starts over with the first pattern
                             in the AWK program. If the end  of
                             the input data is reached, the END
                             block(s), if any, are executed.

       print                 Prints the  current  record.   The
                             output  record  is terminated with
                             the value of the ORS variable.

       print expr-list       Prints expressions.  Each  expres-
                             sion  is separated by the value of
                             the  OFS  variable.   The   output
                             record   is  terminated  with  the
                             value of the ORS variable.

       print expr-list >file Prints expressions on file.   Each
                             expression  is  separated  by  the
                             value of the  OFS  variable.   The
                             output  record  is terminated with
                             the value of the ORS variable.

       printf fmt, expr-list Format and print.

       printf fmt, expr-list >file
                             Format and print on file.

       system(cmd-line)      Execute the command cmd-line,  and
                             return the exit status.  (This may
                             not be available on non-POSIX sys-
                             tems.)

       fflush([file])        Flush  any buffers associated with
                             the open output file or pipe file.
                             If  file is missing, then standard
                             output is flushed.  If file is the
                             null  string, then all open output
                             files and pipes have their buffers
                             flushed.

       Additional output redirections are allowed for print and
       printf.

       print ... >> file
              appends output to the file.

       print ... | command
              writes on a pipe.

       print ... |& command
              sends data to a co-process.

       The getline command returns 1 on success, 0 on end of
       file, and -1 on an error.  Upon an error, ERRNO contains
       a string describing the problem.

       NOTE: When a pipe or co-process is used with getline, or
       with print or printf, inside a loop, you must call
       close() to create new instances of the command.  AWK does
       not automatically close pipes or co-processes when they
       return EOF.
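
       The interaction between getline and close() can be
       sketched as follows.  The command string is an arbitrary
       example, and the names cmd, line, first, and n are
       illustrative.

```shell
result=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    # Read until getline stops returning a positive value (0 is EOF, -1 is an error).
    while ((cmd | getline line) > 0)
        n++
    close(cmd)             # without this, the next read from cmd would just see EOF
    cmd | getline first    # after close(), the command is run again from the start
    close(cmd)
    print n, first
}')
printf '%s\n' "$result"
```

       Comparing the return value with > 0, rather than using it
       directly as the loop condition, keeps the loop from
       spinning forever when getline returns -1.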

   The printf Statement
       The AWK versions of the printf statement  and  sprintf()
       function  (see  below)  accept  the following conversion
       specification formats:

       %c      An ASCII character.  If the argument used for %c
               is  numeric,  it  is  treated as a character and
               printed.  Otherwise, the argument is assumed  to
               be  a  string,  and  only the first character of
               that string is printed.

       %d, %i  A decimal number (the integer part).

       %e ,  %E
               A   floating   point   number   of   the    form
               [-]d.dddddde[+-]dd.    The   %E  format  uses  E
               instead of e.

       %f      A   floating   point   number   of   the    form
               [-]ddd.dddddd.

       %g ,  %G
               Use  %e  or %f conversion, whichever is shorter,
               with nonsignificant zeros  suppressed.   The  %G
               format uses %E instead of %e.

       %o      An unsigned octal number (also an integer).

       %u      An  unsigned decimal number (again, an integer).

       %s      A character string.

       %x ,  %X
               An unsigned  hexadecimal  number  (an  integer).
               The %X format uses ABCDEF instead of abcdef.

       %%      A  single % character; no argument is converted.

       NOTE: When using the integer format-control letters  for
       values  that  are outside the range of a C long integer,
       gawk  switches to the %g format specifier.  If --lint is
       provided on the command line, gawk  warns  about  this.
       Other versions of awk may print  invalid  values  or  do
       something else entirely.

       Optional,  additional  parameters  may lie between the %
       and the control letter:

       count$ Use the count'th argument at this  point  in  the
              formatting.   This  is called a positional speci-
              fier and is intended primarily for use in  trans-
              lated  versions  of  format  strings,  not in the
              original text of an AWK program.  It  is  a  gawk
              extension.

       -      The  expression  should  be left-justified within
              its field.

       space  For numeric conversions, prefix  positive  values
              with  a  space,  and negative values with a minus
              sign.

       +      The plus sign, used  before  the  width  modifier
              (see  below),  says  to  always supply a sign for
              numeric conversions, even if the data to be  for-
              matted  is  positive.   The + overrides the space
              modifier.

       #      Use an "alternate form" for certain control letters.
              For %o, supply a leading zero.  For %x and %X,
              supply a leading 0x or 0X for a nonzero result.
              For %e, %E, and %f, the result always contains a
              decimal point.  For %g and %G, trailing zeros are
              not removed from the result.

       0      A leading 0 (zero) acts as a  flag  that  indicates
              output should be padded with  zeroes  instead  of
              spaces.   This applies even to non-numeric output
              formats.  This flag only has an effect  when  the
              field  width  is  wider  than  the  value  to  be
              printed.

       width  The field should be padded to  this  width.   The
              field  is  normally padded with spaces.  If the 0
              flag has been used, it is padded with zeroes.

       .prec  A number that specifies the precision to use when
              printing.   For  the %e, %E, and %f formats, this
              specifies the number of digits you  want  printed
              to  the  right of the decimal point.  For the  %g
              and %G formats, it specifies the  maximum  number
              of  significant  digits.  For the %d, %o, %i, %u,
              %x, and %X formats, it specifies the minimum num-
              ber of digits to print.  For %s, it specifies the
              maximum number of characters from the string that
              should be printed.

       The  dynamic  width  and prec capabilities of the ANSI C
       printf() routines are supported.  A * in place of either
       the  width or prec specifications causes their values to
       be taken from the argument list to printf or  sprintf().
       To  use  a  positional specifier with a dynamic width or
       precision, supply the count$ after the * in  the  format
       string.  For example, "%3$*2$.*1$s".
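
       A minimal sketch of several of the modifiers described
       above.  The values are arbitrary, and the brackets are
       only there to make the padding visible.

```shell
result=$(awk 'BEGIN {
    printf "[%5d]\n", 42           # width 5, padded with spaces
    printf "[%-5d]\n", 42          # left-justified within the field
    printf "[%05d]\n", 42          # zero-padded
    printf "[%+d]\n", 42           # explicit sign
    printf "[%.2f]\n", 3.14159     # two digits after the decimal point
    printf "[%.3s]\n", "abcdef"    # at most three characters of the string
    printf "[%x %o]\n", 255, 8     # hexadecimal and octal
}')
printf '%s\n' "$result"
```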

   Special File Names
       When  doing  I/O redirection from either print or printf
       into a file, or via getline from a file, gawk recognizes
       certain  special  filenames internally.  These filenames
       allow access to open  file  descriptors  inherited  from
       gawk's  parent  process (usually the shell).  These file
       names may also be used on the command line to name  data
       files.  The filenames are:

       /dev/stdin  The standard input.

       /dev/stdout The standard output.

       /dev/stderr The standard error output.

       /dev/fd/n   The  file  associated  with  the  open  file
                   descriptor n.

       These are particularly useful for error  messages.   For
       example:

              print "You blew it!" > "/dev/stderr"

       whereas you would otherwise have to use

              print "You blew it!" | "cat 1>&2"

       The  following special filenames may be used with the |&
       co-process operator for creating TCP/IP network  connec-
       tions.

       /inet/tcp/lport/rhost/rport  File  for TCP/IP connection
                                    on  local  port  lport   to
                                    remote host rhost on remote
                                    port rport.  Use a port  of
                                    0 to have the system pick a
                                    port.

       /inet/udp/lport/rhost/rport  Similar,  but  use   UDP/IP
                                    instead of TCP/IP.

       /inet/raw/lport/rhost/rport  Reserved for future use.

       Other  special  filenames  provide access to information
       about the running gawk process.  These filenames are now
       obsolete.  Use the PROCINFO array to obtain the informa-
       tion they provide.  The filenames are:

       /dev/pid    Reading this file returns the process ID  of
                   the  current process, in decimal, terminated
                   with a newline.

       /dev/ppid   Reading this file returns the parent process
                   ID  of the current process, in decimal, ter-
                   minated with a newline.

       /dev/pgrpid Reading this file returns the process  group
                   ID  of the current process, in decimal, ter-
                   minated with a newline.

       /dev/user   Reading this file returns  a  single  record
                   terminated  with  a newline.  The fields are
                   separated with spaces.  $1 is the  value  of
                   the  getuid(2)  system call, $2 is the value
                   of the geteuid(2) system  call,  $3  is  the
                   value  of  the getgid(2) system call, and $4
                   is the value of the getegid(2) system  call.
                   If there are any additional fields, they are
                   the  group  IDs  returned  by  getgroups(2).
                   Multiple  groups may not be supported on all
                   systems.

   Numeric Functions
       AWK has the following built-in arithmetic functions:


       atan2(y, x)   Returns the arctangent of y/x in  radians.

       cos(expr)     Returns  the  cosine  of expr, which is in
                     radians.

       exp(expr)     The exponential function.

       int(expr)     Truncates to integer.

       log(expr)     The natural logarithm function.

       rand()        Returns a random number N, between  0  and
                     1, such that 0 <= N < 1.

       sin(expr)     Returns  the  sine  of  expr,  which is in
                     radians.

       sqrt(expr)    The square root function.

       srand([expr]) Uses expr as a new  seed  for  the  random
                     number generator.  If no expr is provided,
                     the time of day is used.  The return value
                     is the previous seed for the random number
                     generator.
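
       A brief sketch of these functions in use.  The seed 42 and
       the variable names pi and r are arbitrary; the value of
       rand() itself varies between awk implementations, so only
       its range is checked.

```shell
result=$(awk 'BEGIN {
    pi = atan2(0, -1)                    # arctangent of 0/-1 is pi
    printf "%.4f\n", pi
    print int(3.9), int(-3.9)            # truncation is toward zero
    printf "%.1f\n", exp(log(8) / 3)     # cube root via exp and log
    print sqrt(16)
    srand(42); r = rand()
    print (r >= 0 && r < 1)              # rand() lies in the range 0 <= N < 1
}')
printf '%s\n' "$result"
```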

   String Functions
       Gawk has the following built-in string functions:


       asort(s [, d])          Returns the number  of  elements
                               in the source array s.  The con-
                               tents  of  s  are  sorted  using
                               gawk's  normal rules for compar-
                               ing values, and the  indexes  of
                               the   sorted  values  of  s  are
                               replaced with  sequential  inte-
                               gers  starting  with  1.  If the
                               optional destination array d  is
                               specified,   then   s  is  first
                               duplicated into d, and then d is
                               sorted,  leaving  the indexes of
                               the source array s unchanged.

       asorti(s [, d])         Returns the number  of  elements
                               in  the  source  array  s.   The
                               behavior is the same as that  of
                               asort(),  except  that the array
                               indices are  used  for  sorting,
                               not   the  array  values.   When
                               done,  the  array   is   indexed
                               numerically,  and the values are
                               those of the  original  indices.
                               The  original  values  are lost;
                               thus provide a second  array  if
                               you  wish to preserve the origi-
                               nal.

       gensub(r, s, h [, t])   Search the target string  t  for
                               matches     of    the    regular
                               expression r.  If h is a  string
                               beginning  with  g  or  G,  then
                               replace all matches of r with s.
                               Otherwise,  h  is a number indi-
                               cating  which  match  of  r   to
                               replace.   If t is not supplied,
                               $0 is used instead.  Within  the
                               replacement text s, the sequence
                               \n, where n is a digit from 1 to
                               9,  may be used to indicate just
                               the text that matched  the  n'th
                               parenthesized     subexpression.
                               The sequence \0  represents  the
                               entire matched text, as does the
                               character &.  Unlike  sub()  and
                               gsub(),  the  modified string is
                               returned as the  result  of  the
                               function,  and the original tar-
                               get string is not changed.

       gsub(r, s [, t])        For each substring matching  the
                               regular   expression  r  in  the
                               string t, substitute the  string
                               s, and return the number of sub-
                               stitutions.  If t  is  not  sup-
                               plied,  use  $0.   An  &  in the
                               replacement  text  is   replaced
                               with  the text that was actually
                               matched.  Use \& to get  a  lit-
                               eral  &.  (This must be typed as
                               "\\&"; see GAWK:  Effective  AWK
                               Programming for a fuller discus-
                               sion of the rules  for  &'s  and
                               backslashes  in  the replacement
                               text of sub(), gsub(), and  gen-
                               sub().)

       index(s, t)             Returns  the index of the string
                               t in the string s, or 0 if t  is
                               not present.  (This implies that
                               character indices start at one.)

       length([s])             Returns the length of the string
                               s, or the length of $0 if  s  is
                               not supplied.

       match(s, r [, a])       Returns  the position in s where
                               the regular expression r occurs,
                               or  0  if  r is not present, and
                               sets the values  of  RSTART  and
                               RLENGTH.  Note that the argument
                               order is the same as for  the  ~
                               operator:  str ~ re.  If array a
                               is provided, a  is  cleared  and
                               then  elements  1  through n are
                               filled with the  portions  of  s
                               that   match  the  corresponding
                               parenthesized  subexpression  in
                               r.   The  0'th element of a con-
                               tains the portion of  s  matched
                               by the entire regular expression
                               r.   Subscripts  a[n,  "start"],
                               and  a[n,  "length"] provide the
                               starting index in the string and
                               length   respectively,  of  each
                               matching substring.

       split(s, a [, r])       Splits the  string  s  into  the
                               array  a  on the regular expres-
                               sion r, and returns  the  number
                               of  fields.  If r is omitted, FS
                               is used instead.  The array a is
                               cleared     first.     Splitting
                               behaves  identically  to   field
                               splitting, described above.

       sprintf(fmt, expr-list) Prints  expr-list  according  to
                               fmt, and returns  the  resulting
                               string.

       strtonum(str)           Examines  str,  and  returns its
                               numeric value.   If  str  begins
                               with  a  leading  0,  strtonum()
                               assumes that  str  is  an  octal
                               number.   If  str  begins with a
                               leading  0x  or  0X,  strtonum()
                               assumes  that str is a hexadeci-
                               mal number.

       sub(r, s [, t])         Just like gsub(), but  only  the
                               first   matching   substring  is
                               replaced.

       substr(s, i [, n])      Returns the substring of s that
                               starts at index i and is at most
                               n characters long.  If n is
                               omitted, the rest of s is used.

       tolower(str)            Returns  a  copy  of  the string
                               str,  with  all  the  upper-case
                               characters  in str translated to
                               their  corresponding  lower-case
                               counterparts.     Non-alphabetic
                               characters are left unchanged.

       toupper(str)            Returns a  copy  of  the  string
                               str,  with  all  the  lower-case
                               characters in str translated  to
                               their  corresponding  upper-case
                               counterparts.     Non-alphabetic
                               characters are left unchanged.

   Time Functions
       Since  one  of  the primary uses of AWK programs is pro-
       cessing log files that contain time  stamp  information,
       gawk provides the following functions for obtaining time
       stamps and formatting them.


       mktime(datespec)
                 Turns datespec into a time stamp of  the  same
                 form  as  returned by systime().  The datespec
                 is a string of the form YYYY MM DD HH  MM  SS[
                 DST].   The  contents of the string are six or
                 seven numbers  representing  respectively  the
                 full  year including century, the month from 1
                 to 12, the day of the month from 1 to 31,  the
                 hour  of the day from 0 to 23, the minute from
                 0 to 59, and the second from 0 to 60,  and  an
                 optional  daylight saving flag.  The values of
                 these numbers need not be  within  the  ranges
                 specified;  for example, an hour of -1 means 1
                 hour before midnight.  The origin-zero  Grego-
                 rian  calendar is assumed, with year 0 preced-
                 ing year 1 and year -1 preceding year 0.   The
                 time  is  assumed to be in the local timezone.
                 If the daylight saving flag is  positive,  the
                 time is assumed to be daylight saving time; if
                 zero, the time is assumed to be standard time;
                 and   if   negative  (the  default),  mktime()
                 attempts to determine whether daylight  saving
                 time  is in effect for the specified time.  If
                 datespec does not contain enough  elements  or
                 if   the  resulting  time  is  out  of  range,
                 mktime() returns -1.

       strftime([format [, timestamp]])
                 Formats timestamp according to the  specifica-
                 tion  in  format.   The timestamp should be of
                 the same form as returned  by  systime().   If
                 timestamp  is missing, the current time of day
                 is used.  If format is missing, a default for-
                 mat  equivalent  to  the  output of date(1) is
                 used.  See the  specification  for  the  strf-
                 time()  function in ANSI C for the format con-
                 versions that are guaranteed to be  available.
                 A  public-domain  version of strftime(3) and a
                 man page for it come with gawk; if  that  ver-
                 sion  was  used to build gawk, then all of the
                 conversions described in  that  man  page  are
                 available to gawk.

       systime() Returns  the current time of day as the number
                 of  seconds  since   the   Epoch   (1970-01-01
                 00:00:00 UTC on POSIX systems).

   Bit Manipulation Functions
       Starting  with  version  3.1  of gawk, the following bit
       manipulation functions are available.  They work by con-
       verting   double-precision   floating  point  values  to
       unsigned long integers, doing the  operation,  and  then
       converting the result back to floating point.  The func-
       tions are:

       and(v1, v2)         Return the bitwise AND of the values
                           provided by v1 and v2.

       compl(val)          Return  the  bitwise  complement  of
                           val.

       lshift(val, count)  Return the  value  of  val,  shifted
                           left by count bits.

       or(v1, v2)          Return  the bitwise OR of the values
                           provided by v1 and v2.

       rshift(val, count)  Return the  value  of  val,  shifted
                           right by count bits.

       xor(v1, v2)         Return the bitwise XOR of the values
                           provided by v1 and v2.


   Internationalization Functions
       Starting with version 3.1 of gawk, the  following  func-
       tions  may  be  used  from  within  your AWK program for
       translating strings at run-time.  For full details,  see
       GAWK: Effective AWK Programming.

       bindtextdomain(directory [, domain])
              Specifies  the directory where gawk looks for the
              .mo files, in case they will  not  or  cannot  be
              placed  in the ``standard'' locations (e.g., dur-
              ing testing).  It  returns  the  directory  where
              domain is ``bound.''
              The  default  domain  is the value of TEXTDOMAIN.
              If directory is the null string (""), then  bind-
              textdomain()  returns the current binding for the
              given domain.

       dcgettext(string [, domain [, category]])
              Returns the translation of string in text  domain
              domain for locale category category.  The default
              value for domain is the current value of  TEXTDO-
              MAIN.  The default value for category is "LC_MES-
              SAGES".
              If you supply a value for category, it must be  a
              string  equal  to  one  of the known locale cate-
              gories described in GAWK: Effective AWK  Program-
              ming.   You  must also supply a text domain.  Use
              TEXTDOMAIN if you want to use the current domain.

       dcngettext(string1, string2, number [, domain [, category]])
              Returns  the  plural  form used for number of the
              translation of string1 and string2 in text domain
              domain for locale category category.  The default
              value for domain is the current value of  TEXTDO-
              MAIN.  The default value for category is "LC_MES-
              SAGES".
              If you supply a value for category, it must be  a
              string  equal  to  one  of the known locale cate-
              gories described in GAWK: Effective AWK  Program-
              ming.   You  must also supply a text domain.  Use
              TEXTDOMAIN if you want to use the current domain.

USER-DEFINED FUNCTIONS
       Functions in AWK are defined as follows:

              function name(parameter list) { statements }

       Functions  are executed when they are called from within
       expressions  in  either  patterns  or  actions.   Actual
       parameters  supplied  in  the  function call are used to
       instantiate the formal parameters declared in the  func-
       tion.   Arrays  are passed by reference, other variables
       are passed by value.

       Since functions were not originally part of the AWK lan-
       guage,  the  provision  for  local  variables  is rather
       clumsy: They are declared as  extra  parameters  in  the
       parameter  list.   The  convention  is to separate local
       variables from real parameters by extra  spaces  in  the
       parameter list.  For example:

              function  f(p, q,     a, b)   # a and b are local
              {
                   ...
              }

              /abc/     { ... ; f(1, 2) ; ... }

       The  left  parenthesis in a function call is required to
       immediately follow the function name, without any inter-
       vening  white space.  This is to avoid a syntactic ambi-
       guity with the concatenation operator.  This restriction
       does not apply to the built-in functions listed above.

       Functions  may  call  each  other  and may be recursive.
       Function parameters used as local variables are initial-
       ized  to  the null string and the number zero upon func-
       tion invocation.

       Use return expr to return a value from a function.   The
       return value is undefined if no value is provided, or if
       the function returns by "falling off" the end.

       If --lint has been provided, gawk warns about  calls  to
       undefined  functions  at  parse  time, instead of at run
       time.  Calling an undefined function at run  time  is  a
       fatal error.

       The word func may be used in place of function.

DYNAMICALLY LOADING NEW FUNCTIONS
       Beginning  with version 3.1 of gawk, you can dynamically
       add new built-in functions to the  running  gawk  inter-
       preter.   The  full details are beyond the scope of this
       manual page; see GAWK: Effective AWK Programming for the
       details.


       extension(object, function)
               Dynamically link the shared object file named by
               object, and invoke function in that  object,  to
               perform  initialization.   These  should both be
               provided as strings.  Returns the value returned
               by function.

       This function is provided and documented in GAWK: Effec-
       tive AWK Programming, but everything about this  feature
       is  likely  to  change in the next release.  We STRONGLY
       recommend that you do not use this feature for  anything
       that you aren't willing to redo.

SIGNALS
       Pgawk  accepts two signals.  SIGUSR1 causes it to dump a
       profile and function call stack  to  the  profile  file,
       which  is either awkprof.out, or whatever file was named
       with the --profile option.  It then  continues  to  run.
       SIGHUP  causes  it to dump the profile and function call
       stack and then exit.

EXAMPLES
       Print and sort the login names of all users:

            BEGIN     { FS = ":" }
                 { print $1 | "sort" }

       Count lines in a file:

                 { nlines++ }
            END  { print nlines }

       Precede each line by its number in the file:

            { print FNR, $0 }

       Concatenate files and number  the  lines  (a  variation
       on a theme):

            { print NR, $0 }

       Run an external command for particular lines of data:

            tail -f access_log |
            awk '/myhome.html/ { system("nmap " $1 ">> logdir/myhome.html") }'

INTERNATIONALIZATION
       String constants are sequences of characters enclosed in
       double quotes.  In non-English speaking environments, it
       is possible to  mark  strings  in  the  AWK  program  as
       requiring  translation  to  the native natural language.
       Such strings are marked in the AWK program with a  lead-
       ing underscore ("_").  For example,

              gawk 'BEGIN { print "hello, world" }'

       always prints hello, world.  But,

              gawk 'BEGIN { print _"hello, world" }'

       might print bonjour, monde in France.

       There  are  several steps involved in producing and run-
       ning a localizable AWK program.

       1.  Add a BEGIN action to assign a value to the  TEXTDO-
           MAIN variable to set the text domain to a name asso-
           ciated with your program.

                BEGIN { TEXTDOMAIN = "myprog" }

           This allows gawk to find  the  .mo  file  associated
           with your program.  Without this step, gawk uses the
           messages text domain, which likely does not  contain
           translations for your program.

       2.  Mark  all  strings  that  should  be translated with
           leading underscores.

       3.  If necessary, use the dcgettext() and/or bindtextdo-
           main() functions in your program, as appropriate.

       4.  Run  gawk --gen-po -f myprog.awk > myprog.po to gen-
           erate a .po file for your program.

       5.  Provide  appropriate  translations,  and  build  and
           install a corresponding .mo file.

       The  internationalization features are described in full
       detail in GAWK: Effective AWK Programming.

POSIX COMPATIBILITY
       A primary goal for gawk is compatibility with the  POSIX
       standard,  as  well  as  with the latest version of UNIX
       awk.   To  this  end,  gawk  incorporates the following
       user-visible features which are not described in the AWK
       book, but are part of the Bell Laboratories  version  of
       awk, and are in the POSIX standard.

       The book indicates that command line variable assignment
       happens when awk would otherwise open the argument as  a
       file,  which is after the BEGIN block is executed.  How-
       ever, in earlier implementations, when such  an  assign-
       ment  appeared  before  any  file  names, the assignment
       would happen before the BEGIN block was  run.   Applica-
       tions  came  to  depend on this "feature."  When awk was
       changed to match its documentation, the  -v  option  for
       assigning  variables  before program execution was added
       to accommodate applications that depended upon  the  old
       behavior.   (This  feature  was  agreed upon by both the
       Bell Laboratories and the GNU developers.)

       The -W option for implementation-specific  features  is
       from the POSIX standard.

       When  processing arguments, gawk uses the special option
       "--" to signal the end of arguments.   In  compatibility
       mode,  it  warns  about  but otherwise ignores undefined
       options.  In normal operation, such arguments are passed
       on to the AWK program for it to process.

       The  AWK  book  does  not  define  the  return  value of
       srand().  The POSIX standard has it return the  seed  it
       was  using,  to  allow  keeping  track  of random number
       sequences.  Therefore srand() in gawk also  returns  its
       current seed.

       Other  new  features are: The use of multiple -f options
       (from MKS awk); the ENVIRON array; the \a and \v  escape
       sequences (done originally in gawk and fed back into the
       Bell Laboratories version); the tolower() and  toupper()
       built-in functions (from the Bell Laboratories version);
       and the ANSI C conversion specifications in printf (done
       first in the Bell Laboratories version).

HISTORICAL FEATURES
       There are two features of historical AWK implementations
       that gawk supports.  First, it is possible to  call  the
       length()  built-in  function  not only with no argument,
       but even without parentheses!  Thus,

              a = length     # Holy Algol 60, Batman!

       is the same as either of

              a = length()
              a = length($0)

       This feature is marked  as  "deprecated"  in  the  POSIX
       standard,  and  gawk  issues  a warning about its use if
       --lint is specified on the command line.

       The other feature is the use of either the  continue  or
       the  break  statements outside the body of a while, for,
       or  do  loop.   Traditional  AWK  implementations   have
       treated  such usage as equivalent to the next statement.
       Gawk supports this usage if --traditional has been spec-
       ified.

GNU EXTENSIONS
       Gawk  has a number of extensions to POSIX awk.  They are
       described in this section.  All the extensions described
       here  can be disabled by invoking gawk with the --tradi-
       tional option.

       The following features of  gawk  are  not  available  in
       POSIX awk.

       o The path search for files named via  the  -f  option,
         using the AWKPATH environment variable.

       o The \x escape sequence.  (Disabled with --posix.)

       o The fflush() function.  (Disabled with --posix.)

       o The  ability  to continue lines after ?  and :.  (Dis-
         abled with --posix.)

       o Octal and hexadecimal constants in AWK programs.

       o The ARGIND, BINMODE, ERRNO, LINT,  RT  and  TEXTDOMAIN
         variables.

       o The IGNORECASE variable and its side-effects.

       o The FIELDWIDTHS variable and fixed-width field  split-
         ting.

       o The PROCINFO array.

       o The use of RS as a regular expression.

       o The special file names available for I/O redirection.

       o The |& operator for creating co-processes.

       o The ability to split out individual  characters  using
         the  null  string as the value of FS, and as the third
         argument to split().

       o The optional second argument to the close()  function.

       o The optional third argument to the match() function.

       o The  ability  to use positional specifiers with printf
         and sprintf().

       o The use of delete array to delete the entire  contents
         of an array.

       o The  use of nextfile to abandon processing of the cur-
         rent input file.

       o The  and(),   asort(),   asorti(),   bindtextdomain(),
         compl(),    dcgettext(),    dcngettext(),    gensub(),
         lshift(), mktime(), or(), rshift(),  strftime(),  str-
         tonum(), systime() and xor() functions.

       o Localizable strings.

       o Adding  new  built-in  functions  dynamically with the
         extension() function.

       The AWK book does not define the  return  value  of  the
       close() function.  Gawk's close() returns the value from
       fclose(3), or pclose(3), when closing an output file  or
       pipe,  respectively.  It returns the process's exit sta-
       tus when closing an input pipe.  The return value is  -1
       if  the  named  file,  pipe or co-process was not opened
       with a redirection.

       When gawk is invoked with the --traditional  option,  if
       the  fs argument to the -F option is "t", then FS is set
       to the tab character.  Note that typing  gawk  -F\t  ...
       simply  causes  the shell to quote the "t", and does not
       pass "\t" to the -F option.  Since this is a rather ugly
       special  case,  it  is  not  the default behavior.  This
       behavior also does not occur if --posix has been  speci-
       fied.   To really get a tab character as the field sepa-
       rator, it is best to use single quotes: gawk -F'\t' ....

ENVIRONMENT VARIABLES
       The  AWKPATH environment variable can be used to provide
       a list of directories that gawk  searches  when  looking
       for files named via the -f and --file options.

       If  POSIXLY_CORRECT exists in the environment, then gawk
       behaves exactly as if --posix had been specified on  the
       command line.  If --lint has been specified, gawk issues
       a warning message to this effect.

SEE ALSO
       egrep(1), getpid(2), getppid(2), getpgrp(2),  getuid(2),
       geteuid(2), getgid(2), getegid(2), getgroups(2)

       The  AWK  Programming  Language, Alfred V. Aho, Brian W.
       Kernighan, Peter J.  Weinberger,  Addison-Wesley,  1988.
       ISBN 0-201-07981-X.

       GAWK:  Effective AWK Programming, Edition 3.0, published
       by the Free Software Foundation, 2001.

BUGS
       The -F option is not necessary given  the  command  line
       variable  assignment  feature; it remains only for back-
       wards compatibility.

       Syntactically invalid single-character programs tend  to
       overflow  the parse stack, generating a rather unhelpful
       message.  Such programs are  surprisingly  difficult  to
       diagnose  in the completely general case, and the effort
       to do so really is not worth it.

AUTHORS
       The original version of UNIX awk was designed and imple-
       mented  by  Alfred  Aho,  Peter  Weinberger,  and  Brian
       Kernighan of Bell Laboratories.  Brian Kernighan contin-
       ues to maintain and enhance it.

       Paul  Rubin and Jay Fenlason, of the Free Software Foun-
       dation, wrote gawk, to be compatible with  the  original
       version  of  awk  distributed  in  Seventh Edition UNIX.
       John Woods contributed a number  of  bug  fixes.   David
       Trueman,  with  contributions  from Arnold Robbins, made
       gawk compatible  with  the  new  version  of  UNIX  awk.
       Arnold Robbins is the current maintainer.

       The  initial  DOS port was done by Conrad Kwok and Scott
       Garfinkle.  Scott Deifik is the current DOS  maintainer.
       Pat  Rankin  did  the port to VMS, and Michal Jaegermann
       did the port to the Atari ST.  The port to OS/2 was done
       by Kai Uwe Rommel, with contributions and help from Dar-
       rel Hankerson.   Fred  Fish  supplied  support  for  the
       Amiga, Stephen Davies provided the Tandem port, and Mar-
       tin Brown provided the BeOS port.

VERSION INFORMATION
       This man page documents gawk, version 3.1.3.

BUG REPORTS
       If you find a bug in gawk, please send  electronic  mail
       to bug-gawk@gnu.org.  Please include your operating sys-
       tem and its revision, the version  of  gawk  (from  gawk
       --version),  what C compiler you used to compile it, and
       a test program and data that are as  small  as  possible
       for reproducing the problem.

       Before  sending  a  bug  report,  please  do two things.
       First, verify that you have the latest version of  gawk.
       Many  bugs  (usually  subtle  ones)  are  fixed  at each
       release, and if yours is out of date,  the  problem  may
       already  have been solved.  Second, please read this man
       page and the reference manual carefully to be sure  that
       what  you  think  is  a bug really is, instead of just a
       quirk in the language.

       Whatever  you  do,  do  NOT  post  a   bug   report   in
       comp.lang.awk.   While  the gawk developers occasionally
       read this newsgroup, posting bug  reports  there  is  an
       unreliable  way to report bugs.  Instead, please use the
       electronic mail address given above.

ACKNOWLEDGEMENTS
       Brian Kernighan of Bell Laboratories  provided  valuable
       assistance  during testing and debugging.  We thank him.

COPYING PERMISSIONS
       Copyright (C) 1989, 1991, 1992, 1993, 1994, 1995,  1996,
       1997, 1998, 1999, 2001, 2002, 2003 Free Software Founda-
       tion, Inc.

       Permission is granted to make  and  distribute  verbatim
       copies of this manual page provided the copyright notice
       and this permission notice are preserved on all  copies.

       Permission  is  granted  to copy and distribute modified
       versions of this manual page under  the  conditions  for
       verbatim  copying,  provided  that  the entire resulting
       derived work is distributed under the terms of a permis-
       sion notice identical to this one.

       Permission  is  granted  to copy and distribute transla-
       tions of this manual page into another  language,  under
       the  above conditions for modified versions, except that
       this permission notice may be stated  in  a  translation
       approved by the Foundation.



Free Software Foundation  June 25 2003                  GAWK(1)
