The truth is that gawk was not designed for simple extensibility. The facilities for adding functions using shared libraries work, but are something of a “bag on the side.” Thus, this tour is brief and simplistic; would-be gawk hackers are encouraged to spend some time reading the source code before trying to write extensions based on the material presented here. Of particular note are the files awk.h, builtin.c, and eval.c. Reading awkgram.y in order to see how the parse tree is built would also be of use.
With the disclaimers out of the way, the following types, structure members, functions, and macros are declared in awk.h and are of use when writing extensions. The next section shows how they are used:
AWKNUM
An AWKNUM is the internal type of awk floating-point numbers. Typically, it is a C double.
NODE
Just about everything is done using objects of type NODE. These contain both strings and numbers, as well as variables and arrays.
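AWKNUM force_number(NODE *n)
This macro forces a value to be numeric and returns the numeric value contained in the node. It may end up calling an internal gawk function.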
void force_string(NODE *n)
This macro guarantees that a NODE's string value is current. It may end up calling an internal gawk function. It also guarantees that the string is zero-terminated.
void force_wstring(NODE *n)
Similarly, this macro guarantees that a NODE's wide-string value is current. It may end up calling an internal gawk function. It also guarantees that the wide string is zero-terminated.
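size_t get_curfunc_arg_count(void)
This function returns the number of arguments actually passed when calling the current function. Inside an extension, it can be used to determine the maximum index that is safe to use with get_actual_argument. If this value is greater than nargs, the function was called incorrectly from the awk program.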
nargs
Inside an extension function, this is the maximum number of expected parameters, as set by the make_builtin() function.
n->stptr
n->stlen
The data and length of a NODE's string value, respectively. The string is not guaranteed to be zero-terminated. If you need to pass the string value to a C library function, save the value in n->stptr[n->stlen], assign '\0' to it, call the routine, and then restore the value.
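The save, terminate, call, restore sequence described for n->stptr and n->stlen can be sketched as follows. This is only an illustration: struct fake_node and node_to_long() are hypothetical names invented here, the struct models just the two members discussed (the real NODE in awk.h has many more), and the sketch assumes that the byte at stptr[stlen] is addressable, as the description implies.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two NODE members discussed above;
 * the real NODE in awk.h has many more fields. */
struct fake_node {
    char   *stptr;   /* string data, not necessarily zero-terminated */
    size_t  stlen;   /* length of the string data */
};

/* node_to_long --- pass the string value to strtol(), which needs
 * a zero-terminated string */

long
node_to_long(struct fake_node *n)
{
    char save;
    long result;

    save = n->stptr[n->stlen];      /* save the byte just past the string */
    n->stptr[n->stlen] = '\0';      /* temporarily terminate the string */
    result = strtol(n->stptr, NULL, 10);
    n->stptr[n->stlen] = save;      /* restore the original byte */

    return result;
}
```

Restoring the saved byte matters because the buffer may hold other data immediately after the string value.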
n->wstptr
n->wstlen
The data and length of a NODE's wide-string value, respectively. Use force_wstring() to make sure these values are current.
n->type
The type of the NODE. This is a C enum. Values should be one of Node_var, Node_var_new, or Node_var_array for function parameters.
n->vname
The “variable name” of a node. This is not of much use inside externally written extensions.
void assoc_clear(NODE *n)
Clears the associative array pointed to by n. Make sure that ‘n->type == Node_var_array’ first.
NODE **assoc_lookup(NODE *symbol, NODE *subs, int reference)
Finds, and installs if necessary, array elements. symbol is the array, subs is the subscript. This is usually a value created with make_string() (see below). reference should be TRUE if it is an error to use the value before it is created. Typically, FALSE is the correct value to use from extension functions.
NODE *make_string(char *s, size_t len)
Take a C string and turn it into a pointer to a NODE that can be stored appropriately. This is permanent storage; understanding of gawk memory management is helpful.
NODE *make_number(AWKNUM val)
Take an AWKNUM and turn it into a pointer to a NODE that can be stored appropriately. This is permanent storage; understanding of gawk memory management is helpful.
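NODE *dupnode(NODE *n)
Duplicate a node. In most cases, this increments an internal reference count instead of actually duplicating the entire NODE; understanding of gawk memory management is helpful.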
void unref(NODE *n)
This macro releases the memory associated with a NODE allocated with make_string() or make_number(). Understanding of gawk memory management is helpful.
void make_builtin(const char *name, NODE *(*func)(NODE *), int count)
Register a C function pointed to by func as a new built-in function name. name is a regular C string. count is the maximum number of arguments that the function takes. The function should be written in the following manner:
    /* do_xxx --- do xxx function for gawk */

    NODE *
    do_xxx(int nargs)
    {
        ...
    }
NODE *get_argument(int i)
This function is called from within a C extension function to get the i-th argument from the function call. The first argument is argument zero.
NODE *get_actual_argument(int i, int optional, int wantarray)
This function retrieves a particular argument i. wantarray is TRUE if the argument should be an array, FALSE otherwise. If optional is TRUE, the argument need not have been supplied. If it wasn't, the return value is NULL. It is a fatal error if optional is FALSE but the argument was not provided.
get_scalar_argument(i, opt)
This is a convenience macro that calls get_actual_argument().
get_array_argument(i, opt)
This is a convenience macro that calls get_actual_argument().
void update_ERRNO(void)
This function is called from within a C extension function to set the value of gawk's ERRNO variable, based on the current value of the C errno global variable. It is provided as a convenience.
void update_ERRNO_saved(int errno_saved)
This function is called from within a C extension function to set the value of gawk's ERRNO variable, based on the error value provided as the argument. It is provided as a convenience.
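One plausible reason to pass a saved value rather than rely on the current errno is that later library calls can overwrite errno before ERRNO is updated. The following is a minimal sketch under stated assumptions: the update_ERRNO_saved() defined here is only a stub that records a message in the hypothetical buffer fake_ERRNO so the example can run outside gawk, and try_open() is an invented helper, not part of gawk.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for gawk's ERRNO variable. */
char fake_ERRNO[256];

/* Stub with the same signature as gawk's update_ERRNO_saved();
 * the real function sets gawk's ERRNO variable instead. */
void
update_ERRNO_saved(int errno_saved)
{
    snprintf(fake_ERRNO, sizeof fake_ERRNO, "%s", strerror(errno_saved));
}

/* try_open --- report an open failure using a saved errno value */

int
try_open(const char *path)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL) {
        int saved = errno;      /* capture the error immediately */

        errno = 0;              /* simulate cleanup code that clobbers errno */
        update_ERRNO_saved(saved);
        return -1;
    }
    fclose(fp);
    return 0;
}
```

Had the sketch relied on the current errno after the simulated cleanup, the original error would have been lost.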
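void register_deferred_variable(const char *name, NODE *(*load_func)(void))
This function is called to register a function to be called when a reference to an undefined variable with the given name is encountered. The argument function must return a pointer to a NODE containing the newly created variable. This function is used to implement the builtin ENVIRON and PROCINFO arrays, so you can refer to them for examples.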
void register_open_hook(void *(*open_func)(IOBUF *))
This function is called to register a function to be called whenever a new data file is opened, leading to the creation of an IOBUF structure in iop_alloc(). After creating the new IOBUF, iop_alloc() will call (in reverse order of registration, so the last function registered is called first) each open hook until one returns non-NULL. If any hook returns a non-NULL value, that value is assigned to the IOBUF's opaque field (which will presumably point to a structure containing additional state associated with the input processing), and no further open hooks are called.
The function called will most likely want to set the IOBUF's get_record method to indicate that future input records should be retrieved by calling that method instead of using the standard gawk input processing.
The function will probably also want to set the IOBUF's close_func method, which is called when the file is closed, to clean up any state associated with the input.
Finally, hook functions should be prepared to receive an IOBUF structure where the fd field is set to INVALID_HANDLE, meaning that gawk was not able to open the file itself. In this case, the hook function must be able to successfully open the file and place a valid file descriptor there.
Currently, for example, the hook function facility is used to implement the XML parser shared library extension. For more info, please look in awk.h and in io.c.
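The calling order described for open hooks (last registered is called first, stopping at the first non-NULL result) can be sketched in isolation. This is not gawk's implementation: struct fake_iobuf, register_hook(), run_open_hooks(), and the three example hooks are all hypothetical names, and a fixed-size array is assumed purely to keep the sketch short.

```c
#include <stddef.h>

/* Hypothetical, much simplified stand-in for gawk's IOBUF. */
struct fake_iobuf {
    int   fd;
    void *opaque;       /* state returned by the hook that accepted the file */
};

typedef void *(*open_hook_fn)(struct fake_iobuf *);

#define MAX_HOOKS 8
static open_hook_fn hooks[MAX_HOOKS];
static int nhooks = 0;

/* register_hook --- remember a hook; returns 0 on success, -1 if full */

int
register_hook(open_hook_fn fn)
{
    if (nhooks >= MAX_HOOKS)
        return -1;
    hooks[nhooks++] = fn;
    return 0;
}

/* run_open_hooks --- call hooks in reverse order of registration and
 * stop at the first one that returns non-NULL */

void
run_open_hooks(struct fake_iobuf *iop)
{
    int i;

    for (i = nhooks - 1; i >= 0; i--) {
        void *state = hooks[i](iop);

        if (state != NULL) {
            iop->opaque = state;    /* no further hooks are called */
            return;
        }
    }
}

/* Example hooks: two that accept the file, one that declines. */
int first_state = 1, second_state = 2;
int calls_to_first = 0;

void *hook_first(struct fake_iobuf *iop)
{
    (void) iop;
    calls_to_first++;
    return &first_state;
}

void *hook_second(struct fake_iobuf *iop)
{
    (void) iop;
    return &second_state;
}

void *hook_declines(struct fake_iobuf *iop)
{
    (void) iop;
    return NULL;
}
```

If hook_first, hook_second, and hook_declines are registered in that order, run_open_hooks() calls hook_declines first, then hook_second, whose result lands in the opaque field; hook_first is never called.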
An argument that is supposed to be an array needs to be handled with some extra code, in case the array being passed in is actually from a function parameter.
The following boilerplate code shows how to do this:
    NODE *the_arg;

    /* assume need 3rd arg, 0-based */
    the_arg = get_array_argument(2, FALSE);
Again, you should spend time studying the gawk internals; don't just blindly copy this code.