The 1992 POSIX standard introduced
the concept of a numeric string, which is simply a string that looks
like a number—for example, " +2"
. This concept is used
for determining the type of a variable.
The type of the variable is important because the types of two variables
determine how they are compared.
The various versions of the POSIX standard did not get the rules
quite right for several editions. Fortunately, as of at least the
2008 standard (and possibly earlier), the standard has been fixed,
and variable typing follows these rules:1
getline
input, FILENAME
, ARGV
elements,
ENVIRON
elements, and the elements of an array created by
patsplit()
, split()
and match()
that are numeric
strings have the strnum attribute. Otherwise, they have
the string attribute. Uninitialized variables also have the
strnum attribute.
The last rule is particularly important. In the following program,
a
has numeric type, even though it is later used in a string
operation:
BEGIN { a = 12.345 b = a " is a cute number" print b }
When two operands are compared, either string comparison or numeric comparison may be used. This depends upon the attributes of the operands, according to the following symmetric matrix:
+——————————————————————– | STRING NUMERIC STRNUM ———–+——————————————————————– | STRING | string string string | NUMERIC | string numeric numeric | STRNUM | string numeric numeric ———–+——————————————————————–
The basic idea is that user input that looks numeric—and only
user input—should be treated as numeric, even though it is actually
made of characters and is therefore also a string.
Thus, for example, the string constant " +3.14"
,
when it appears in program source code,
is a string—even though it looks numeric—and
is never treated as number for comparison
purposes.
In short, when one operand is a “pure” string, such as a string constant, then a string comparison is performed. Otherwise, a numeric comparison is performed.
This point bears additional emphasis: All user input is made of characters,
and so is first and foremost of string type; input strings
that look numeric are additionally given the strnum attribute.
Thus, the six-character input string ‘ +3.14’ receives the
strnum attribute. In contrast, the eight-character literal
" +3.14"
appearing in program text is a string constant.
The following examples print ‘1’ when the comparison between
the two different constants is true, ‘0’ otherwise:
$ echo ' +3.14' | gawk '{ print $0 == " +3.14" }' True -| 1 $ echo ' +3.14' | gawk '{ print $0 == "+3.14" }' False -| 0 $ echo ' +3.14' | gawk '{ print $0 == "3.14" }' False -| 0 $ echo ' +3.14' | gawk '{ print $0 == 3.14 }' True -| 1 $ echo ' +3.14' | gawk '{ print $1 == " +3.14" }' False -| 0 $ echo ' +3.14' | gawk '{ print $1 == "+3.14" }' True -| 1 $ echo ' +3.14' | gawk '{ print $1 == "3.14" }' False -| 0 $ echo ' +3.14' | gawk '{ print $1 == 3.14 }' True -| 1
[1] gawk has followed these rules for many years, and it is gratifying that the POSIX standard is also now correct.