The SymbEL Language Reference Manual

1   Preface

SymbEL is an interpretive language based on C that was created to address
the need for simplified access to data residing in the SunOS kernel.
Although kernel interface API libraries have made this data more
available to C application programmers, it is still out of reach by
any means other than writing and compiling a program.

The goal was to create a language that did not need compilation to
run and would allow the development of scripts that could be
distributed to a large audience via e-mail without sending very large
files or requiring the end-user to possess a compiler.

The control and data structures of SymbEL are extensive enough to write a
complex program, but the language is not burdened with superfluous
syntax and the resulting grammar is far more compact than C.

The original interpreter, "se", was developed under SunOS 5.2 FCS-C and
tested on an MP690, SC1000 and an LC. Since then, it has been modified
and tested on every subsequent release of SunOS.
The SPARC and Intel platforms are supported and every effort has been
made to support new devices and configurations for each of these hardware
platforms. The "se" interpreter does not now, nor will it ever, run
under any release of SunOS 4.x.

This manual is written with the assumption that the reader is conversant
in the C programming language. Expertise is not necessary, but the reader
should be comfortable with C to best understand this document.

2   Tuning and Performance

Operating system performance is one of the most painful areas of system
administration. This is because it seems to involve so much magic. In
fact, gaining proficiency in performance tuning is much like any other
endeavor. Time and experience are the best teachers.

Yet for all that is known about tuning, the need to tune systems grows
more desperate because adequate tuning tools are lacking.
This lack of tools stems from the rapid advances in machine
architecture and operating system design. As the underlying technology
continues to expand, the existing tools for performance tuning become
insufficient to solve the problem.

SunOS provides an excellent starting place for the development of tools
for tuning. The existing tools shipped with SunOS provide a wealth
of data regarding the workings of the kernel in its management of
the process spaces and of the underlying hardware. It is from the
examination of this data that the decisions on how to tune the system
are made.

Traditionally, examination of this data is a manual process: typing in
commands, examining the output, and making decisions based on some
preset rules. This is a time-consuming and tedious process, especially
when trying to talk another person through it over the telephone or
explaining the procedure via e-mail. It also leaves the information
about how to make decisions regarding system performance where it does
the least good: in someone's memory.

In an attempt to translate this expertise from memory to written form,
C, Bourne, or Korn shell scripts could be written to run these commands,
extract the output, and compare it to known threshold values.
This makes the expertise more portable and thus more accessible to the
masses. However, it also impacts the performance of the machine under
analysis enough to render the script useless as a "round the clock"
diagnostician. Access to the data is needed without the overhead of the
multiple applications and the shell language.

The first step towards the solution to this problem is to determine
how the applications themselves retrieve the data from the kernel.
An application programming interface (API) can then be built that
performs the same actions as these applications. These APIs could
extract their data from any of the following sources on SunOS 5.x.

kvm: Kernel Virtual Memory
kstat: Kernel STATistics
mib: Management Information Base
ndd: Network Device Driver
procfs: The Process File System

2.1   Kvm

The kvm library is an API for reading the values of kernel variables
from the kernel memory. Before the kvm library, this was all done
manually using open(), lseek(), and read(). The code for reading
values from the kernel is repetitive, so the kvm library was developed.
It solved this problem and provided additional functionality for reading
core files and checking the validity of the addresses being read.
The kvm library functions that perform the work are kvm_open(), kvm_nlist(),
kvm_read(), kvm_write(), and kvm_close().
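
The repetitive open(), lseek(), and read() sequence that predates the kvm
library can be sketched in C roughly as follows. This is an illustration
only. The helper name read_at() is hypothetical, and it reads from an
ordinary file named by the caller rather than from kernel memory.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read len bytes found at byte offset off in the
 * file named by path. Returns 0 on success and -1 on any failure,
 * including a short read. */
int read_at(const char *path, off_t off, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    /* Position the file offset at the requested location. */
    if (lseek(fd, off, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    /* Read the value and release the descriptor. */
    ssize_t got = read(fd, buf, len);
    close(fd);
    return (got == (ssize_t)len) ? 0 : -1;
}
```

Every value read this way repeats the same three steps, which is the
repetition the kvm library was written to remove.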

2.2   Kstat

A limitation of the kvm library is that the program reading values from
kernel memory must have super-user permission, since /dev/kmem is not readable
by the world. A solution developed by Sun is the addition of a kernel
framework that is accessible for read-only access by the world and write
access by the privileged user. This framework is called the kstat framework.
Kstat also has the benefit of having low overhead and being extensible
to accommodate any new kernel statistics that may be useful on future
architectures.

The kstat interface, /dev/kstat, is an entry point into the kstat
driver, which resides in the kernel and collects data from predefined
functional areas within various subsystems and drivers.
In addition to collecting data from drivers, it collects data
from the I/O and VM subsystems and the HAT layer, along with various
information about network interfaces and CPU activity.

The software interface to /dev/kstat is the kstat library, libkstat.
It contains functions for opening and reading the kstat data from
the kstat driver and, for the super-user, writing data back to the kernel.
When the kstat data is copied out of the kernel, it is put into a
linked list of structures. Each node in the list is a structure which
contains data about a particular functional area and what format the data
is in. The possible data formats are



KSTAT_TYPE_RAW
    The data is a pointer to a structure, commonly defined
    in a /usr/include/sys include file.

KSTAT_TYPE_NAMED
    The data is an array of structures which represent a
    polymorphic type. Each structure contains a textual
    name, a type designation and a union of members
    of the supported types. The supported types are

    KSTAT_DATA_CHAR:    8 bits signed or unsigned
    KSTAT_DATA_INT32:   32 bits signed
    KSTAT_DATA_UINT32:  32 bits unsigned
    KSTAT_DATA_INT64:   64 bits signed
    KSTAT_DATA_UINT64:  64 bits unsigned

KSTAT_TYPE_INTR
    The data is a pointer to a structure containing an array
    of unsigned long values whose indices represent the type of
    interrupt. The types of interrupts and their index values
    are

    KSTAT_INTR_HARD=0:      From the device itself
    KSTAT_INTR_SOFT=1:      From the system software
    KSTAT_INTR_WATCHDOG=2:  Timer expiration
    KSTAT_INTR_SPURIOUS=3:  Entry point called with no
                            interrupt pending
    KSTAT_INTR_MULTSVC=4:   Interrupt serviced prior to
                            return from other interrupt
                            service

KSTAT_TYPE_IO
    The data is a pointer to a structure containing
    information relevant to I/O.

KSTAT_TYPE_TIMER
    The data is a pointer to a structure containing timing
    information for any type of event.
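
The "polymorphic type" used by the named format, a name plus a type
designation plus a union, can be sketched in C as a tagged union. The
declarations below are hypothetical and deliberately simplified. They are
not the definitions found in /usr/include/sys/kstat.h, and the names
stat_type, named_stat, and stat_as_double() exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical type tags, modeled loosely on the KSTAT_DATA_* names. */
enum stat_type { STAT_INT32, STAT_UINT32, STAT_INT64, STAT_UINT64 };

/* Hypothetical named statistic: a textual name, a type designation,
 * and a union with one member per supported type. */
struct named_stat {
    const char *name;
    enum stat_type type;
    union {
        int32_t  i32;
        uint32_t ui32;
        int64_t  i64;
        uint64_t ui64;
    } value;
};

/* Pick the union member that the tag says is valid and widen it. */
double stat_as_double(const struct named_stat *s)
{
    switch (s->type) {
    case STAT_INT32:  return (double)s->value.i32;
    case STAT_UINT32: return (double)s->value.ui32;
    case STAT_INT64:  return (double)s->value.i64;
    case STAT_UINT64: return (double)s->value.ui64;
    }
    return 0.0;
}
```

A reader of such data must always consult the type designation before
touching the union, since only one member is meaningful at a time.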

The structures used by the kstat library are defined in /usr/include/kstat.h
and /usr/include/sys/kstat.h. Note that the kstat framework is constantly
evolving, and developing code around it is precarious as the mechanism may
grow, shrink, or change at any time.

2.3   Mib

The management information base is a set of structures containing data
relating to the state of the networking subsystem of the kernel. The nature,
structure, and representation of this data are based on RFC 1213, "Management
Information Base for Network Management of TCP/IP-based internets: MIB-II".

Specifically, there is data stored regarding the performance of the kernel
drivers for IP, TCP, UDP, ICMP, and IGMP. This data is contained in one
structure for each driver. The data can be retrieved in three steps: build a
stream containing the drivers whose MIB data is to be read, send a control
message downstream, and read the results sent back by all modules in
the stream that interpreted the message and responded.

The structures are also declared as variables in the kernel address space
and therefore can be read using conventional kvm methods.

2.4   Ndd

The NDD interface is another source of data regarding the state of the
networking subsystem. Whereas the MIB is a source of data that is
statistical in nature, the NDD interface allows for tuning of driver
variables that control the function of the driver. The NDD interface
can access variables kept by the IP, TCP, UDP, ICMP, and ARP drivers.

Whereas the interface to the MIB structures is through the streams control
message interface, retrieval of NDD variables is through a much simpler
ioctl() mechanism. The structures kept by the drivers are also accessible
through the kvm interface but are rather convoluted and therefore are most
easily modified by use of the NDD interface.

2.5   Summary

From these sources comes a great deal of data to manage in an application
when the mechanisms used to retrieve it are the conventional APIs. It is
possible to distill this information into library calls so that the
information is more generalized. The next step beyond that would be to
develop a syntactic notation for accessing the data, a language whose most
significant feature is the retrieval of the necessary data from the
appropriate places without the notational overhead of actually performing
the read.

What follows is the description of a language that was designed
to remove the complication of accessing these values and provide a framework
for building applications that can make use of these values. This language
is called SymbEL, the Symbol Engine Language.

3   Basic Structure

In response to the need for simplified access to data contained inside the
SunOS kernel, the SymbEL (pronounced "symbol") language was created
and the SymbEL interpreter, "se", was developed. SymbEL resembles C visually
and contains many similar syntactic structures. This was done for ease of
use and to leverage existing programming knowledge.

3.1   First Program

Since the best way to learn a new language is through example, the first
example SymbEL program is presented.

main()
{
  printf("hello, world\n");
}

This program shows that the language structure is very similar to C. This
does not imply that you can pick up a book on C and start writing SymbEL
programs. There are a lot of differences and the interpreter will
let you know if the syntax is incorrect. This short program demonstrates
one similarity and one difference. The similarity is that a
SymbEL program MUST have a function called "main". The difference is that
the printf() function in SymbEL is a builtin, not a library call.

To test this small program, put it in a file called hello.se and enter

se hello.se

and the resulting output is what you expect. The format string
used in printf() works the same as any equivalent statement that
could be used in a C program. Syntax of the printf() format string
will not be addressed in this document. Additional information can
be found in the printf() man page or any good book on the C language.

3.2   Simple Types

The simple types available in SymbEL are scalar, floating point, and
string. They are



Scalar
  char                               8 bit signed integer
  uchar, uchar_t                     8 bit unsigned integer
  short                              16 bit signed integer
  ushort, ushort_t                   16 bit unsigned integer
  int32_t, int                       32 bit signed integer
  uint32_t, uint, uint_t             32 bit unsigned integer
  long                               32/64 bit signed integer
  ulong, ulong_t, pointer_t          32/64 bit unsigned integer
  int64_t, longlong, longlong_t      64 bit signed integer
  uint64_t, ulonglong, u_longlong_t  64 bit unsigned integer
Floating Point
  double                             double precision floating point
String
  string                             pointer to null terminated ASCII text

The "long" and "ulong" types can be either 32 or 64 bits in size depending
on whether you are using the 32 or 64 bit version of the interpreter
respectively. On Solaris 7 and later, if the kernel is running in 64 bit mode,
the 64 bit interpreter will be used by default.

There are also structured types that will be covered in Chapter 5. There are
no pointer types in SymbEL.

3.3   Simple Program

The following program demonstrates some of the features of the language.
It has nothing to do with extracting data from the kernel but serves
the purpose of illustration.

/* print the times table for 5 */
main()
{
  // C++ style comments are also allowed
  int lower;
  int upper;

  lower = 1;   // lower limit
  upper = 10;  // upper limit

  while(lower <= upper) {
    printf("%4d\n", lower * 5);
    lower = lower + 1;
  }
}

3.3.1   About the Program

lower = 1;   // lower limit

Since "se" uses the C preprocessor to read the programs, the use of
C and C++ style comments and "pound sign directives" such as #define
and #include are valid.

while(lower <= upper) {

The while structure is one of the control structures supported by SymbEL.
In C, this example would need curly braces only because there are two
statements inside the block of the while loop. SymbEL, however, requires all
sequences of statements inside control structures to be bracketed by curly
braces, even sequences of just one statement. The purpose of this is
cleanliness of the grammar.

printf("%4d\n", lower * 5);

The function printf() is a builtin function. Builtin functions are used
for library functions that cannot be implemented as attached functions
or represent a functionality that is specific to the interpreter. Attached
functions will be discussed in Chapter 4.

lower = lower + 1;

This is simply an incrementing of lower by one.

3.3.2   Embellishing the Program

The program will now be rewritten using different constructs and techniques.

#define LOWER    1
#define UPPER   10
#define DIGIT    5

main()
{
  int lower;
  int upper = UPPER;

  for(lower = LOWER; lower <= upper; lower = lower + 1) {
    printf("%4d\n", lower * DIGIT);
  }
}

3.3.3   About the Program

#define LOWER    1

As pointed out before, the use of "pound sign directives" is valid since
the interpreter reads the program through the C preprocessor.

int upper = UPPER;

Variables can be initialized to an expression. The expression can be
another variable, a constant, or an arithmetic expression. Assignment
of the result of a logical expression is not allowed.

for(lower = LOWER; lower <= upper; lower = lower + 1) {
  printf("%4d\n", lower * DIGIT);
}

The syntax of the for loop in SymbEL is virtually identical to the "C"
equivalent. The specifics of the "for" syntax are covered in Section 5.5.2.

3.4   Array Types

SymbEL supports single dimension arrays. Subscripts for array types start
at zero as is the case in "C". The next example is a new version of the
previous example using arrays.

3.4.1   A Program Using Arrays

#define LOWER    1
#define UPPER   10
#define DIGIT    5

main()
{
  int value[UPPER + 1];
  int i;

  for(i = LOWER; i <= UPPER; i++) {
    value[i] = i * DIGIT;
  }
  for(i = LOWER; i <= UPPER; i++) {
    printf("%4d\n", value[i]);
  }
}

3.4.2   About the Program

int value[UPPER + 1];

for(i = LOWER; i <= UPPER; i++) {

The size of the array is declared as UPPER + 1. You can use expressions
as the size of an array in its declaration provided that it is an integral
constant expression. That is, it can't contain other variables or function
calls. The size of the array must be UPPER + 1 since the for loop uses
the (i <= UPPER) condition for terminating the loop. If the array were
declared with a size of UPPER, "se" would abort with a "subscript out of
range" error during the execution.

Also, instead of using the i = i + 1 notation, the increment notation
is used. Prefix and postfix increment and decrement are supported by
SymbEL.

3.5   Functions and Parameters

So far, only one function has been shown in the examples, "main" with no
return type. If there is no return type declared on a function, returning
values will result in an error from the parser. A function can be declared
as returning a type and the function can then return a value of that type.
As an example, here is a function to raise a value to a power.

3.5.1   A Program Using Functions and Parameters

#define BASE   2

int64_t
power(int b, int p)        // raise base b to power p
{
  int64_t i;
  int64_t n;

  for(n = 1, i=1; i<=p; n *= b, i++) {
    ;
  }
  return n;
}

main()
{
  int i;

  for(i=0; i<=32; i++) {
    printf("%d raised to power %d = %lld\n", BASE, i, power(BASE, i));
  }
}

3.5.2   About the Program

int64_t
power(int b, int p)        // raise base b to power p

This is the declaration of a function returning a signed 64 bit integer
and taking as parameters two integers. All parameters to SymbEL functions
are value parameters and have the syntax of ANSI C parameters. As in local
declarations, each parameter must be declared separately with its own type.
The syntax does not support comma separated lists of names under a single
type. Parameters are treated as local variables and have all of the semantic
constraints of local variables.

In this example, the power() function is declared before main.
This allows the parser to obtain type information about the function's
return value so that type checking can be done on the parameters to printf().


int64_t i;
int64_t n;

for(n = 1, i=1; i<=p; n *= b, i++) {

It is important to point out a semantic feature of local variables that
this code demonstrates. In C, the assignment of n could be made part
of the declaration. Although SymbEL supports the initialization of
local variables, the semantics of local variables are equivalent to a
"static" declaration in a C program. That is, the initialization of the
local variable is done only once upon entry to the function for the first
time. It is not performed on every call. The rationale for this is that
the overhead of maintaining automatic variables in an interpretive
environment would be too high for the language to perform reasonably.
One of the goals of "se" is to put as little load on the system as possible
and still provide usable runtime performance.
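
Because the manual compares SymbEL locals to C "static" declarations, the
consequence can be shown with a small C sketch. This is C, not SymbEL, and
both function names are hypothetical. The first function relies on the
initializer and so carries its result over into the next call. The second
assigns n in the loop header, as power() does, and so starts fresh each time.

```c
/* n is initialized once, before the first call, so a second call
 * starts from whatever value the first call left behind. */
long long power_init_once(int b, int p)
{
    static long long n = 1;
    static long long i;

    for (i = 1; i <= p; i++) {
        n *= b;
    }
    return n;
}

/* Assigning n inside the function body resets it on every call. */
long long power_assign_each_call(int b, int p)
{
    static long long n;
    static long long i;

    for (n = 1, i = 1; i <= p; n *= b, i++) {
        ;
    }
    return n;
}
```

Calling power_init_once(2, 3) twice yields 8 and then 64, while
power_assign_each_call(2, 3) yields 8 both times.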

for(n = 1, i=1; i<=p; n *= b, i++) {
  ;
}

The for loop is like the one shown in the previous example; however,
the statement in the do-after part of the loop shows that SymbEL supports
the compressed arithmetic assignment operators. These work just as the
"C" counterparts and are supported for the operations add, subtract,
multiply, divide, and modulus.

printf("%d raised to power %d = %lld\n", BASE, i, power(BASE, i));

Here, the call to power() is part of the parameter list for printf().
The printf() format strings conform to the types being sent, including
the %lld for the 64 bit integer.

The order of evaluation of actual parameters is from left to right.

3.6   Global Variables

The language also supports global variables. These act in the same
way as globals in "C". They are declared outside of the block of a function
and can be accessed by any function in the program. Global variables can
have initial values just as local variables and will be initialized before
the function "main" is called.

4   Operators and Expressions

In order to better understand what operators are available in SymbEL and
how expressions are constructed, an overview of the constituent parts
will now be presented.

4.1   Variable Names

Variable names are limited to 1024 characters and may begin with any alphabetic
character or an underscore. Characters beyond the first character may be any
alphabetic character, digit, underscore, or dollar sign.

4.2   Constants

There are four types of constants: integer, floating point, character, and
string. These constants may be used in expressions (except for strings),
assigned to variables, and passed as parameters. For convenience, these
constants may be placed into #define statements for expansion by the
preprocessor.

4.2.1   Integer

An integer constant can be any integer with or without sign whose value
is no more than 2 to the 64th power. Note that a large enough value that has
been negated will result in sign extension.

4.2.2   Floating Point

A floating point constant can be

{0-9}+ . {0-9}+ [ {eE} {+-} {0-9}+ ]

Some examples of valid floating point constants are

0.01
1.0e-2
0.001E+1

These constants all represent the same value, 0.01.

4.2.3   Character

A character constant is a character enclosed in single forward quotes.
As in C, the character may also be a backslash followed by another
backslash (for the backslash character itself) or a character which
represents a special ASCII value. The current special characters are



\b Backspace
\f Form Feed
\n New Line
\t Tab
\r Carriage Return
\0 Null

The value following the backslash may also be an octal or hex value. For
an octal value, use a backslash followed by valid octal digits beginning
with 0. For hex, use a backslash followed by the character 'x' followed
by a valid hex digit. Examples are



\012 New Line in Octal
\xA New Line in Hex
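
Since these escapes follow C, the equivalence of the three spellings of
New Line can be checked with a small C function. This is C rather than
SymbEL, it assumes an ASCII character set, and the function name is
illustrative only.

```c
/* Returns 1 when the symbolic, octal, and hex escapes all name the
 * same character, which is decimal 10 in ASCII. */
int newline_forms_agree(void)
{
    return '\n' == '\012' && '\n' == '\xA' && '\n' == 10;
}
```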

4.2.4   String

A string constant is enclosed in double quotes and may contain regular
ASCII characters or any of the special characters shown above. Examples are


"hello"
"hello\nworld\n"
"\thello world\xA"

There is also a special string value "nil" which defines a pointer value
of zero. String variables can be assigned nil and compared to it. Parameters
of type string can also be sent nil as an actual parameter.

4.3   Declarations

The syntax for a variable declaration is the same for local and global
variables. There may be an initialization part which is an expression.
An oddity of the interpretive environment is that the initialization part
of a global declaration may contain a function call. Great care should
be taken not to break your program however. Some example declarations are

char c;
int n = 5;
ulong ul = compute_it(n);
string hello = "world";

Note that the syntax only allows one variable per line.

4.4   Arithmetic Operators

The current operators available for performing arithmetic on numeric values
are



+ Addition of scalar or floating types.
- Subtraction of scalar or floating types.
* Multiplication of scalar or floating types.
/ Division of scalar or floating types.
% Modulus of scalar types.

These work the same as they do in "C" and arithmetic operations between
numeric variables of different types are allowed. Explicit modulus of
floating point types will be caught and disallowed by the parser. However,
if the result of an expression yields a floating point value and is used
as an operand to the modulus operator, the value is first converted to type
longlong and the modulus is then performed. This has the potential for
yielding unexpected results so care should be taken.
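
The effect of converting to longlong before taking the modulus can be
imitated in C with an explicit cast. This sketch is C, not SymbEL, and
the function name is hypothetical. It only mirrors the conversion order
described above.

```c
/* Truncate the floating point operand to a 64 bit integer first,
 * then take the integer modulus. The fractional part is discarded. */
long long mod_after_truncation(double v, long long m)
{
    return (long long)v % m;
}
```

With this ordering, 7.9 modulo 3 yields 1 because 7.9 is truncated to 7
before the operation, whereas a floating point remainder would be 1.9.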

In expressions that contain mixed types of scalar or scalar and floating
point, the resulting expression will be of the type of the highest precedence.
The order of precedence from lowest to highest is

  • char
  • uchar
  • short
  • ushort
  • int / long
  • uint / ulong
  • longlong
  • ulonglong
  • double

so, in this statement

int fahrenheit;
double celsius;

...
celsius = (5.0 / 9.0) * (fahrenheit - 32);

the resulting value will be double. The (5.0 / 9.0) expression yields a
double while the (fahrenheit - 32) yields an int. The multiply operator
then changes the right side of the expression into a double for
multiplication. The programmer needs to take care not to lose accuracy.
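
The warning about accuracy can be made concrete in C, whose promotion
rules the arithmetic operators are said to follow. This sketch is C
rather than SymbEL and the function names are hypothetical. The second
function writes the ratio with integer constants, so the division is
carried out in int and truncates to zero before any double is involved.

```c
/* The double constants force the whole expression to double. */
double to_celsius(int fahrenheit)
{
    return (5.0 / 9.0) * (fahrenheit - 32);
}

/* Integer constants: 5 / 9 is evaluated as int and yields 0. */
double to_celsius_truncated(int fahrenheit)
{
    return (5 / 9) * (fahrenheit - 32);
}
```

For an input of 212, the first function returns a value very close to
100 while the second returns 0.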

The operators + and - are at the same precedence level, which is
lower than that of the *, /, and % operators, which are also at the
same level as each other. Parentheses can be used to create explicit
precedence as in the expression above. Without parentheses, the implicit
value of the expression would be

celsius = (5.0 / (9.0 * fahrenheit)) - 32;

4.5   Logical Operators

The logical operators of SymbEL are



< Less than
> Greater than
<= Less than or equal to
>= Greater than or equal to
!= Not equal to
== Equal to
=~ Regular expression comparison

They are all of equal precedence since the syntax does not allow for
expressions of the type a < b != c. The junctural operators for
short circuit logical AND and short circuit logical OR are && and ||
respectively. They are implicitly of lower precedence than the
comparators. There is no logical NOT defined by the language. Statements
can be rewritten without the logical NOT operation by comparing the value
to zero or using inverse comparators and/or juncturals.
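
The rewriting technique can be checked in C, where logical NOT does
exist and the two forms can be compared directly. This is C, not SymbEL,
and the function names are hypothetical. The rewritten forms use only
comparators and juncturals, which is the subset the text describes.

```c
/* Form using logical NOT, which SymbEL does not provide. */
int with_not(int a, int b, int c, int d)
{
    return !(a < b && c == d);
}

/* Same condition with inverse comparators and the opposite junctural. */
int without_not(int a, int b, int c, int d)
{
    return (a >= b || c != d);
}

/* A negated flag test becomes a comparison with zero. */
int flag_is_clear(int running)
{
    return running == 0;
}
```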

The regular expression comparison operator =~ expects the regular expression
on the right side of the operator. Comparisons with regular expressions on
the left side of the operator will not yield the expected results.

4.6   Bitwise Operators

The bitwise operators of SymbEL are



& Bitwise AND
| Bitwise OR
^ Bitwise XOR
<< Shift left
>> Shift right

Just as there is no logical NOT, there is also no bitwise NOT. The
addition of a bitwise NOT would add certain unwanted complexities
to the parser. Working around this issue is more complex than the logical
NOT but the problem is not insurmountable.
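
One possible workaround, sketched here in C, is to XOR the value with a
mask of all one bits, which inverts every bit just as a bitwise NOT
would. This is an assumption about how the limitation might be handled,
not a technique stated by this manual. The function names are
hypothetical and the sketch is fixed at 32 bits.

```c
#include <stdint.h>

/* Invert all 32 bits without using the ~ operator. */
uint32_t not32(uint32_t x)
{
    return x ^ 0xFFFFFFFFu;
}

/* Clear the bits named in mask, the usual job of value & ~mask. */
uint32_t clear_bits(uint32_t value, uint32_t mask)
{
    return value & (mask ^ 0xFFFFFFFFu);
}
```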

4.7   Increment and Decrement Operators

SymbEL supports the prefix and postfix increment and decrement operators.
These operators can only be used on scalar types. Use on non-scalar
variables will result in an error from the parser.

Variables using these operators can appear in a stand-alone statement
or as part of an expression.

4.8   Compressed Assignment Operators

Support for the operators +=, -=, *=, /=, and %= is present and the semantics
of these operators are the same as those of the corresponding arithmetic
operator followed by assignment of the result. Statements with these operators
may be used in the "do part" of the "for" loop as well as in a statement by
itself. For instance

for(i=0; i<10; i+=2) {
  ...
}

The bitwise versions of these compressed operators are also supported. These
are the &=, |=, ^=, <<=, and >>= operators.

4.9   Address Operator

To obtain the address of a variable, the address operator is used as a prefix
to the variable name. This is the same as C with the exception that the
address of a structure cannot be taken. The address operator only works
with simple types and arrays of simple types. In the case of arrays,
using the address operator is equivalent to taking the address of the
first element of the array. The address of an array will not result in
a pointer to a pointer.

The return value of the address operator is ulong. Since there is no pointer
type in SymbEL, ulong satisfies the requirements for an address in that it
is as wide as an address (32 or 64 bits, depending on the interpreter) and
has no sign associated with it. An example of the use of the address
operator is

int int_var = 5;
ulong pointer = &int_var;

The address operator is useful for functions such as "scanf".

The address operator can be viewed as a type cast when dealing with strings.
When the address operator is used on a string variable, the result is not
a pointer to a pointer to char. It simply has the effect of casting the
type of the variable from string to ulong since string variables are
already addresses by definition.
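
The idea of holding an address in an unsigned integer has a close C
analogue in uintptr_t. The sketch below is C, not SymbEL, and the
function names are hypothetical. It only illustrates that an address can
make a round trip through an unsigned integer of sufficient width.

```c
#include <stdint.h>

/* Store the address of an int in an unsigned integer. */
uintptr_t address_of(const int *p)
{
    return (uintptr_t)p;
}

/* Recover the int by converting the integer back to a pointer. */
int value_at(uintptr_t addr)
{
    return *(const int *)addr;
}
```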

4.10   Type Casting

SymbEL supports type casting between string and 4 byte numerics. This is
useful for orthogonality when taking the address of a string. If the
ulong result needs to be viewed as a string again a type cast can be used.

The syntax of the type cast is slightly different from that of C. The entire
expression must be enclosed in parentheses. This is yet another effort
to keep the complexity of the grammar to a minimum. This is an example
of type casting.


#include <stdio.se>
#include <unistd.se>
#include <uio.se>

main()
{
  char buf[128];
  iovec_t iov[1];

  iov[0].iov_base = &buf;
  iov[0].iov_len = sizeof(buf);
  if (readv(0, iov, 1) > 0) {
    fputs(((string) iov[0].iov_base), stdout);
  }
}

Type casting between string and double and casts of structured types are not
allowed and will result in an error message from the parser.

5   Control Flow

Some of the control structures have already been covered. This chapter
will discuss all of the control structures in detail.

5.1   Blocks

All control statements in SymbEL must have a block associated with them.
The block will begin with a left curly brace, {, and end with a right
curly brace, }. This was done for the purpose of keeping the grammar
clean. It also provides additional clarity of the code.

5.2   If-Then-Else Statement

The "if" statement may be an "if" by itself or an "if" statement and an
"else" part. The structure of an "if" construct is

if ( logical expression ) {
  ...
}

There are two points that need to be made about this construct. The first is
that the condition that the "if" checks on is not an expression. It is a
logical expression. Therefore, statements such as

if (running) {
  ...
}

are not correct. The condition must be a comparison for it to
be a logical expression. This "if" is correctly written as


if (running != 0) {
  ...
}

where there is a logical expression instead of an expression as the condition.
The second point is that there are always curly braces surrounding the
statement blocks of the "if" and the "else" parts. This is true even if there
is only one statement.

Here are some examples.

if (x > y) {
  printf("x is greater than y\n");
}

if (x < y) {
  printf("x is less than y\n");
} else {
  printf("x is greater than or equal to y\n");
}

Care should be taken when writing if-then-else-if-then statements. It is
easy to mistakenly write

if (x == 2) {
  printf("x is equal to 2\n");
} else if (x == 3) {
  printf("x is equal to 3\n");
}

which is incorrect. The "else" part must begin with a curly brace. This
is correctly written as

if (x == 2) {
  printf("x is equal to 2\n");
} else {
  if (x == 3) {
    printf("x is equal to 3\n");
  }
}

5.3   Conditional Expression

The ternary operator "?:", known as a conditional expression, is supported
as a value in an expression. There is a syntactic requirement on this
expression type, however. The entire conditional expression must be
enclosed in parentheses. The logical expression part can be any supported
logical expression. The two expression parts can be any supported expression
provided their types are compatible. Here's an example that prints out the
numbers from 1 to 10 and whether they're even or odd.

main()
{
  int i;

  for(i=1; i<=10; i++) {
    printf("%2d %s\n", i, ((i % 2) == 0 ? "even" : "odd"));
  }
}

5.4   Switch Statement

The "switch" statement is used for selecting a value from many possible
values. This statement type works exactly like the C equivalent with
an addition. The "switch" statement in SymbEL also works on strings.
Some examples are

// character switch
main()
{
  char t = 'w';

  switch(t) {
  case 'a':
    printf("a - wrong\n");
    break;
  case 'b':
    printf("b - wrong\n");
    break;
  case 'c':
    printf("c - wrong\n");
    break;
  case 'w':
    printf("w - correct\n");
    break;
  default:
    printf("say what?\n");
    break;
  }
}


// string switch
main()
{
  string s = "hello";

  switch(s) {
  case "yo":
  case "hey":
  case "hi":
    printf("yo/hey/hi - wrong\n");
    break;
  case "hello":
    printf("%s, world\n", s);
    break;
  default:
    printf("say what?\n");
    break;
  }
}
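
C itself cannot switch on strings, so the string example has no direct C
counterpart. A minimal C sketch of the same selection, written as a
chain of strcmp() comparisons, shows what the SymbEL extension saves.
The function name and its return codes are hypothetical.

```c
#include <string.h>

/* Returns 0 for "yo", "hey", or "hi", 1 for "hello", and -1 for
 * anything else, mirroring the three arms of the string switch. */
int classify_greeting(const char *s)
{
    if (strcmp(s, "yo") == 0 || strcmp(s, "hey") == 0 ||
        strcmp(s, "hi") == 0) {
        return 0;
    }
    if (strcmp(s, "hello") == 0) {
        return 1;
    }
    return -1;
}
```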

5.5   Loops

5.5.1   While Loop

The three looping constructs in SymbEL are the "while", "for", and "do"
loops. The structure of the "while" loop is

while( logical expression ) {
  ...
}

There are two points that need to be made about this construct. First is
that the condition that the "while" checks on is not an expression. It
is a logical expression. Therefore, loops such as

while(running) {
  ...
}

are not correct. The condition must be a comparison in order for it to
be a logical expression. This loop is correctly written as

while(running != 0) {
  ...
}

The second point is that there are always curly braces surrounding the
statement blocks of the "while", "for" and "do" statements. This is true even
if there is only one statement in the block. This is consistent with the
other control constructs of SymbEL.

5.5.2   For Loop

The structure of the "for" loop is

for( assign part; while part; do part ) {
  ...
}

The "assign part" is an optional, comma-separated list of statements.
The "while part" is an optional logical expression. The "do part" is an
optional, comma-separated list of statements. Simple "for" loops such as

for(i=0; i<10; i++) {
   ...
}

are valid as well as the more complex

for(i=0, j=0; i<queue.depth && queue.members[i] != 0; i++, j=depth(queue)) {
  ...
}

for(i *= 2; i < 10 || x != 3; i = recompute(i)) {
  ...
}

for(; i < 10; ) {
  ...
}

for(;;) {
  ...
}

5.5.3   Do Loop

The structure of the "do" loop is

do {
  ...
} while( logical expression );

The rule regarding the logical expression is the same as for the "while" loop.
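
The structure above can be sketched as a complete script. This is a
minimal example and the counter name "i" is only illustrative.

```
main()
{
  int i = 0;

  do {
    printf("%d\n", i);
    i++;
  } while(i < 3);   // a comparison, as required for a logical expression
}
```

The block runs for i equal to 0, 1 and 2, so the script should print those
three values on separate lines.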

5.6   Break

The "break" statement can be used for exiting from a "switch" case, or
exiting a "while", "for" or "do" loop. As in C, only the inner-most loop will
be exited. This program

for(i=0; i<10; i++) {
  if (i == 2) {
    printf("i is 2\n");
    break;
  }
}
printf("now, i is %d\n", i);

yields the output

i is 2
now, i is 2

when run. The same holds true for "while" and "do" loops.

5.7   Continue

The "continue" statement is supported for "while", "for" and "do" loops.
It works identically to the C construct and will continue the inner-most loop
at the top of the loop. If in a "for" loop, the "do part" of the "for" loop
will be executed before the block is entered again.

while(i != 18) {
  i = do_something();
  if (i < 10) {
    continue;   // skip the rest of the block and test the condition again
  }
  printf("i is %d\n", i);
}
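
The behavior inside a "for" loop can be sketched in the same way. In this
illustrative script, "continue" skips the printf() for odd values, and the
"do part" (i++) still runs before the condition is tested again.

```
main()
{
  int i;

  for(i=1; i<=6; i++) {
    if ((i % 2) != 0) {
      continue;   // i++ is executed next, then i<=6 is tested
    }
    printf("%d is even\n", i);
  }
}
```

This should print a line for 2, 4 and 6 only.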

5.8   Goto

There is no "goto" in SymbEL.

6   Functions, Procedures and Notes on Programming

SymbEL supports encapsulated, scoped blocks that have the ability to return
a value. These can be referred to as functions and procedures for notational
brevity. Some points need to be made about these constructs in order for the
picture to be complete.

6.1   Function Return Types

So far, the only values that functions have returned in the examples have
been of a scalar type. It is possible for functions to return "double" or
"string" as well. There are also more complex types that will be covered
in later chapters that may be returned as well.

It is not possible for functions to return arrays as there is no syntactic
accommodation for it. There is a way to get around this limitation for
arrays of non-structured types. See the section
Returning an Array of Non-Structured Type from a Function in the
"Tricks" chapter.

6.2   Scope

Although variables may be declared local to a function, the default semantics
for local variables are for them to be the C equivalent of "static". Therefore,
even though a local variable has an initialization part in a local scope, this
initialization IS NOT performed on each entry to the function. It is done
once before the first call and never done again.
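
The effect is easiest to see with a counter. In this sketch (the function
name "bump" is hypothetical), the initialization of "count" is performed
once, so the value persists between calls.

```
bump()
{
  int count = 0;   // performed once, not on every entry

  count++;
  printf("count = %d\n", count);
}

main()
{
  bump();
  bump();
  bump();
}
```

Given the rule above, this should print count = 1, count = 2 and count = 3
rather than count = 1 three times. A variable that must start fresh on each
call has to be assigned explicitly in a statement at the top of the function.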

6.3   Initializing Variables

Variables can be initialized to values that are compatible with their
declared type. This is the case for both simple and structured types.

The only exceptional condition in initializing variables is the ability to
initialize a global variable with a function call. This is supported, but
great care must be taken in the use of this capability. In general, it
should be avoided as bad practice.

Arrays can be given initial values through an aggregate initialization.
The syntax is identical to C. For example

int array[ARRAY_SIZE] = {
  1, 2, 3, 4, 5, -1
};

The size of the array must be large enough to accommodate the aggregate
initialization or the parser will flag it as an error.

6.4   Notes About Arrays and Strings

6.4.1   Array Assignment

Although pointer types are not allowed in SymbEL, assignment of arrays
is allowed provided that the size of the target variable is equal
to or greater than the size of the source variable. This is not a pointer
assignment though. It can be viewed as a value assignment where the values
of the source array are being copied to the target array.
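
A short sketch of the copy semantics, using illustrative names "src" and
"dst". The target is larger than the source, which the rule above permits.

```
main()
{
  int src[3] = { 10, 20, 30 };
  int dst[5];
  int i;

  dst = src;      // the values are copied, no pointer is shared
  src[0] = 99;    // changing the source afterwards does not affect dst

  for(i=0; i<3; i++) {
    printf("dst[%d] = %d\n", i, dst[i]);
  }
}
```

Because the assignment copies values, dst[0] should still be 10 after
src[0] is changed.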

6.4.2   String Type and Character Arrays

Variables declared as type "string" and arrays of type "char" have
interchangeable values. The purpose of this is to make the individual
characters contained in a string accessible. For example

char tmp[8];
string s = "hello";

tmp = s;
if (tmp[0] == 'h') {
  ...
}

This could not be done with the "s" variable by itself. If a subscript were
used, it would mean that the variable "s" was an array of strings, not an
array of characters. After any modification to the variable "tmp" in this
example is done, the value can be assigned back to "s".
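
The round trip described above can be sketched as follows. This is a
minimal example that reuses the names "tmp" and "s" from the fragment above.

```
main()
{
  char tmp[8];
  string s = "hello";

  tmp = s;         // string to character array
  tmp[0] = 'j';    // modify a single character
  s = tmp;         // character array back to string
  printf("%s\n", s);
}
```

This should print "jello".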

6.4.3   Assignment to String Variables

When a variable of type string is assigned a new value, the existing value
of the variable is freed and a new copy of the source string is allocated
and assigned to the variable. This is also the case when string variables
are assigned the return value of a function that returns type string. See
the section Using Attached Functions Return Values in the "Tricks" chapter.

6.4.4   Empty Arrays

When a function accepts an array as a parameter, it is not convenient to
always send an array of the same size as the parameter. For this reason,
the empty array declaration was added for use in parameter declarations.
This is a notation where no subscript size is included in the declaration,
just the "[]" suffix to the variable name. Here is an example.

print_it(int array[])
{
  int i;

  for(i=0; array[i] != -1; i++) {
    printf("%d\n", array[i]);
  }
}

main()
{
  int array[6] = { 1, 2, 3, 4, 5, -1 };

  print_it(array);
}

Upon entry to the function containing the empty array parameter, the
parameter variable obtains a size. In this example, the "array" parameter
is given a size of 24 bytes (6 elements * 4 bytes) upon entry to the
function "print_it". This size will change for every array passed as an
actual parameter.

6.5   Recursion

Recursion is not supported. Direct recursion will be flagged as an error by
the parser. Indirect recursion will be silently ignored. The disallowance
of recursion is due to a problem in the runtime that could not be overcome
in the short term and may be fixed in a future release. Examine this program.

one()
{
  // remember, initialization is only done once
  int i = 0;

  i++;
  switch(i) {
  case 1:
    printf("Here I am\n");
    break;
  case 2:
    printf("Here I am again\n");
    break;
  }
  two();
}

two()
{
  one();
}

main()
{
  one();
}

It seems that the output of this program would be

Here I am
Here I am again

but in fact only the first line will be printed out. The second call to
"one" is detected by the runtime and a return from the function is
performed before anything is done. SymbEL is not the place to do recursion.
Again, if you feel like being tricky, don't. It probably won't work.

6.6   Builtin Functions

SymbEL currently supports a limited set of builtin functions. As the need
arises, more builtins will be added. Many of these builtins work the same
or similarly to the C library version. For a complete description of those
functions, see the manual page for the C function. The current builtin
functions are

void atexit(string exit_func)

Call function "exit_func" before exiting. This is the same as the libc
version with the exception that the function is specified by name. The
interpreter will find the function and call it. The function called should
be declared as "exit_func()", i.e. void type with no parameters.
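
A minimal sketch of its use follows. The function name "cleanup" is
hypothetical and is passed to atexit() as a string, as described above.

```
cleanup()
{
  printf("cleaning up\n");
}

main()
{
  atexit("cleanup");   // the function is named, not referenced
  printf("working\n");
}
```

This should print "working" followed by "cleaning up", since "cleanup" is
called as the script exits.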

void bsearch(pointer_t key, pointer_t base,
             int nel, int size, string compare_func)

This works like the libc bsearch with the exception that the compare function
is specified by name. The interpreter will find the function and call it.
It should be declared as an "int func(pointer_t a, pointer_t b)". It will
work the same way as the libc version. The arguments a and b will be pointers
to the individual elements of the array being searched. The return value
is also a pointer to this type and should be cast accordingly.

void debug_off(void)

Turn debugging off until the next call to debug_on().

void debug_on(void)

Turn debugging on following this statement. Debugging information will
be printed out until the next call to debug_off().

int fileno(int)

This works like the stdio macro, only it's a builtin. The return value
of an fopen() or popen() can be sent to fileno() to retrieve the underlying
file descriptor.

struct prpsinfo_t first_proc(void)

This function can be used in conjunction with the next_proc() function
to traverse all of the processes in the system. Depending on permissions,
not all of the fields of the prpsinfo_t structure may be filled in. When
run as root, the structure will be filled in completely.

int fprintf(int, string, ...)

Print a formatted string onto the file defined by the first parameter.
The man page for the fprintf C library function defines the usage of
this function in more detail.

struct prpsinfo_t get_proc(int)

This function retrieves a process by its process id instead of traversing
all of the processes. The same rules regarding permissions apply to this
function as to first_proc().

string itoa(int)

This function will convert an integer into a string.

ulong kvm_address(VAR)

Return the kernel seek value for this variable. This will only work on
variables designated as special kvm variables in the declaration.

void kvm_cvt(VAR, ulong)

Change the kernel seek value for this variable to the value specified by
the second parameter.

ulong kvm_declare(string)

Declare a new kvm variable while the program is running. The return value
is the kvm address of the variable, used as the second parameter to kvm_cvt.

struct prpsinfo_t next_proc(void)

When used in conjunction with first_proc(), this function can be used
to traverse through all of the processes on the system. When the pr_pid
member of the prpsinfo_t structure is -1 after a return from this function,
then all of the processes have been visited.
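
The traversal described above can be sketched as a loop that counts
processes. The include file name "proc.se" is an assumption about where
prpsinfo_t is declared and may differ on a given installation.

```
#include <proc.se>   // assumed location of the prpsinfo_t declaration

main()
{
  prpsinfo_t p;
  int count = 0;

  // stop when pr_pid is -1, which marks the end of the process list
  for(p = first_proc(); p.pr_pid != -1; p = next_proc()) {
    count++;
  }
  printf("%d processes\n", count);
}
```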

void printf(string, ...)

Print a formatted string. Internally, a call to fflush() is made after every
call to printf(). This causes a write() system call and the effects of this
should be taken into account when writing SymbEL programs.

void qsort(pointer_t base, int nel, int width, string compare_func)

This works like the libc qsort with the exception that the compare function
is specified by name. The interpreter will find the function and call it.
It should be declared as an "int func(pointer_t a, pointer_t b)". It will
work the same way as the libc version. The arguments a and b will be pointers
to the individual elements of the array being sorted.

void signal(int, string)

Specify a signal catcher. The first parameter specifies the signal name
according to the signal.se include file. The second parameter is the
name of the SymbEL function to call upon receipt of the signal.

int sizeof(...)

Return the size of the parameter. This can be a variable, a numeric value
or an expression.

string sprintf(string, ...)

Return a string containing the format and data specified by the parameters.
This function works like the C library function except that it does not
take a buffer as its first argument. The formatted string is returned
instead.
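
A minimal sketch of the difference from C follows. The variable name
"label" and the values are only illustrative.

```
main()
{
  string label;

  // no buffer argument: the formatted string is the return value
  label = sprintf("%d of %d", 3, 10);
  printf("%s\n", label);
}
```

This should print "3 of 10".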

void struct_empty(STRUCT, ulong)

Dump the contents of the variable passed as the first parameter into
the memory location specified by the second parameter. The binary value
dumped will be the same as it would appear if it were in a C program.

void struct_fill(STRUCT, ulong)

Copy the data at the memory location specified by the second parameter
into the structure variable passed as the first parameter. This will allow
C-structure-format data to be translated into the internal representation
of structures used by SymbEL.

void syslog(int, string, ...)

Log a message through the syslog() facility. Note that the %m string must
be sent as %%m since the interpreter passes the format string through
vsprintf() before passing to syslog() internally.

6.7   Dynamic Constants

The SymbEL interpreter deals with a few physical resources whose
quantities vary from one computer to the next. These are the disks,
network interfaces, CPUs, and devices that have an interrupt-counters
structure associated with them. It is often necessary to declare
arrays that are bounded by the quantity of such a resource. When this
is the case, a value is required that is sufficiently large to prevent
subscripting errors when the script is running. This is dealt with by
using dynamic constants. These constants can be used as integer values,
and the interpreter views them as such. The dynamic constants are

MAX_DISK:  Maximum number of disk or disk-like resources
MAX_IF:    Maximum number of network interfaces
MAX_CPU:   Maximum number of CPUs
MAX_INTS:  Maximum number of devices with interrupt counters

These values are typically set to the number of discovered resources plus one.
A single CPU computer, for instance, will have a MAX_CPU value of 2. The
MAX_DISK and MAX_IF constants will often have a larger value than the number
of actual resources on the system. For instance, my workstation is reported
to have 23 disks and 16 network interfaces even though it only has 3 disks
and 1 network interface. The reason for this is rather convoluted. Suffice
it to say that it's better to have more than not enough since the latter
condition will cause the interpreter to exit with a subscript error. Run
this script on your system and see what it says.

main()
{
  printf("MAX_DISK = %d\n", MAX_DISK);
  printf("MAX_IF   = %d\n", MAX_IF);
  printf("MAX_CPU  = %d\n", MAX_CPU);
  printf("MAX_INTS = %d\n", MAX_INTS);
}
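
The typical use of a dynamic constant, as an array bound, can be sketched
as follows. The array name "seen" is only illustrative.

```
main()
{
  int seen[MAX_CPU];   // sized by the interpreter for this machine
  int i;

  // every subscript below MAX_CPU is valid, so this cannot overrun
  for(i=0; i<MAX_CPU; i++) {
    seen[i] = 0;
  }
  printf("cleared %d slots\n", MAX_CPU);
}
```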

6.8   Attachable Functions

To ensure against the rampant effects of "creeping featurism" overtaking the
size and complexity of the interpreter, a mechanism needed to be devised so
many procedures and functions could be "built in" without being a builtin.

The solution was to provide a syntactic remedy that defined a shared
object that could be attached to the interpreter at run time. This
declaration would include the names of functions contained in that
shared object. Here is an example.


attach "libc.so" {
  int puts(string s);
};

main()
{
  puts("hello");
}

The attach statements are contained in "se" include files named after their
C counterparts in /usr/include. The man page for "fopen", for instance,
specifies that the file "stdio.h" should be included to obtain its
declaration. In SymbEL, the include file "stdio.se" is included to
obtain the declaration inside of an "attach" block.

There are some rules governing the use of attached functions.

  • Only values that are four bytes long or less can be passed as
    parameters. No longlong types or doubles.

  • Structures can be passed but they will be sent as pointers to structures.
    The equivalent "C" representation of the SymbEL structure can be declared
    and the parameters should then be declared as pointers to that type. The
    structure pointer parameter can then be used as it normally would.

  • An attached function declaring a structure type as its return value
    will be treated as if the function returns a pointer to that type.
    This should match the actual return type of the attached function.

    There is no way to declare an attached function that returns a
    structure, i.e. not a pointer to a structure, but a structure.

    The value returned will be converted from the C representation into
    the internal SymbEL representation. No additional code is needed to
    convert the return value.

    Note that attached functions returning pointers to structures that
    may also return zero (a null pointer) to indicate error or end-of-file
    conditions should be declared as returning "ulong" and compared to
    zero. If a non-zero value is returned, the "struct_fill" builtin
    can be used to fill a structure. If an attached function is declared
    to return a structure and it returns zero, a null pointer exception
    will occur in the program and the interpreter will exit.

  • No more than twelve parameters can be passed.

  • Arrays passed to attached functions are passed by reference. The call

    
    fgets(buf, sizeof(buf), stdin);
    

    will do exactly what it is supposed to do. The "buf" parameter will be
    filled in by "fgets" directly since the internal representation of an
    array of characters is, not surprisingly, an array of characters.

    These semantics include passing arrays of structures. The SymbEL
    structure will be emptied before passing and filled upon return when
    sent to attached functions.

  • The rules for finding the shared library in an "attach" statement
    are the same as those defined in the man page for "ld".
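
The rule above about attached functions that may return a null pointer can
be sketched with "getpwnam" from libc. This is an illustration under
stated assumptions, not a definitive declaration: the "passwd" structure is
the one shown in the "Structures" chapter, and declaring "getpwnam" by hand
as returning "ulong" is a choice made for this example.

```
struct passwd {
  string pw_name;
  string pw_passwd;
  long   pw_uid;
  long   pw_gid;
  string pw_age;
  string pw_comment;
  string pw_gecos;
  string pw_dir;
  string pw_shell;
};

attach "libc.so" {
  ulong getpwnam(string name);   // ulong so that a null return can be tested
};

main()
{
  passwd pw;
  ulong  p;

  p = getpwnam("root");
  if (p == 0) {
    printf("no such user\n");
  }
  if (p != 0) {
    struct_fill(pw, p);          // convert the C structure at address p
    printf("home directory: %s\n", pw.pw_dir);
  }
}
```

Testing the return value before calling struct_fill() avoids the null
pointer exception described above.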

6.8.1   Ellipsis Parameter

For attached functions only, the ellipsis parameter (...) can be used to
specify that there is an indeterminate number and type of parameters to
follow. Values passed up until the ellipsis argument will be type checked
but everything after that will not be type checked.

The ellipsis parameter allows functions like "sscanf" to work and therefore
makes the language more flexible. For instance, the program

attach "libc.so" {
  int sscanf(string buf, string format, ...);
};

main()
{
  string buf_to_parse = "hello 1:15.16 f";
  char str[32];
  int n;
  int i;
  double d;
  char c;

  n = sscanf(buf_to_parse, "%s %d:%lf %c", str, &i, &d, &c);
  printf("Found %d values: %s %d:%5.2lf %c\n", n, str, i, d, c);
}

yields the output "Found 4 values: hello 1:15.16 f".

6.8.2   Attached Variables

Global variables contained in shared objects can be declared within an
"attach" block using the keyword "extern" before the declaration. This
will cause the value of the internal SymbEL variable to be read from and
written to the variable in the shared object as it is used in the
execution of the program.

An example of the declaration of "getopt" with its global variables "optind",
"opterr" and "optarg" from the include file "stdlib.se" is as follows.


attach "libc.so" {
  int    getopt(int argc, string argv[], string optstring);
  extern int optind;
  extern int opterr;
  extern string optarg;
};

This works for all types, including structures.

6.9   Builtin Variables

Although extern variables can be attached with the "extern" notation, there
are three very special cases of variables that cannot be attached this way.
These variables are "stdin", "stdout", and "stderr". These "variables"
in C are actually #define directives in the stdio.h include file. They
actually reference the addresses of structure members. Since the address
of a structure cannot be taken in SymbEL, there is no way to represent
these. They are, therefore, provided by the interpreter as builtin variables.
They may be used without any declaration or include file usage.
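
As a minimal sketch, the builtin variables can be passed directly to the
fprintf() builtin with no include file. The message text is only
illustrative.

```
main()
{
  fprintf(stdout, "normal output\n");
  fprintf(stderr, "warning: something looks wrong\n");
}
```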

6.10   Parameters to "main" and its Return Value

In C programs, the programmer can declare "main" as accepting three parameters.
They are

  • An argument count ( usually argc )
  • An argument vector ( usually argv )
  • An environment vector ( usually envp )

Similarly, the SymbEL "main" function can be declared as accepting two of
these parameters, argc and argv. Here is an example using these variables.

main(int argc, string argv[])
{
  int i;

  for(i=0; i<argc; i++) {
    printf("argv[%d] = %s\n", i, argv[i]);
  }
}

This example also demonstrates the use of an empty array declaration. When
this program is run with the command

se test.se one two three four five six

the resulting output is

argv[0] = test.se
argv[1] = one
argv[2] = two
argv[3] = three
argv[4] = four
argv[5] = five
argv[6] = six

It is not necessary to declare these parameters to "main". If they are not
declared then the interpreter will not send any values for them.

It is also possible to declare "main" as being an integer function. Although
the "exit" function can be used to exit the application with a specific code,
the value can also be returned from "main". In this case, the previous
example would be

int main(int argc, string argv[])
{
  int i;

  for(i=0; i<argc; i++) {
    printf("argv[%d] = %s\n", i, argv[i]);
  }
  return 0;
}

The value returned by the "return" statement will be the code that the
interpreter exits with.

7   Structures

Support for the aggregate type "struct" exists in SymbEL and is similar to
the C variety with some exceptions. An aggregate is a collection of
potentially dissimilar objects collected into a single group. As it turns out,
most of the SymbEL code developed will contain structures.

As an example, here is what a SymbEL password file entry might look like.

struct passwd {
  string pw_name;
  string pw_passwd;
  long   pw_uid;
  long   pw_gid;
  string pw_age;
  string pw_comment;
  string pw_gecos;
  string pw_dir;
  string pw_shell;
};

The declaration of structure variables differs from C in that the word "struct"
is left out of the variable declaration. So, to declare a variable of type
"struct passwd" only "passwd" would be used.

7.1   Accessing Structure Members

Accessing a structure member is done with "dot notation". This means the
first part of the variable is the variable name itself, followed by a dot
and then the structure member in question. To access the "pw_name" member
of the "passwd" structure above, the code could look like this.

main()
{
  passwd pwd;

  pwd.pw_name = "richp";
  ...
}

Structure members can be any type including other structures. A structure
may NOT contain a member of its own type. Attempting to do so will
result in an error from the parser.

7.2   Arrays of Structures

Declaring an array of structures is the same as declaring an array of any
other type, with the restriction stated in the previous section: a structure
may not contain a member of its own type. The notation for accessing
members of an array of structures is "name[expression].member".
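
A short sketch of the notation follows. The structure "point" and the
array "pts" are hypothetical names used only for this example.

```
struct point {
  int x;
  int y;
};

main()
{
  point pts[3];   // "struct" is left out of the variable declaration
  int i;

  for(i=0; i<3; i++) {
    pts[i].x = i;
    pts[i].y = i * i;
  }
  for(i=0; i<3; i++) {
    printf("pts[%d] = (%d, %d)\n", i, pts[i].x, pts[i].y);
  }
}
```

This should print the pairs (0, 0), (1, 1) and (2, 4).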

7.3   Structure Assignment

The assignment operation is available to variables of the same structure
type.

7.4   Structure Comparison

Comparison of variables of structure type is not supported.

7.5   Structures as Parameters

Variables of structure type can be passed as parameters. As with other
parameters, they are passed by value, so the target function will be
able to access its structure parameter as a local variable.

Arrays of structures passed to other SymbEL functions are also passed
by value. This is not the case when passing arrays of structures to
attached functions. This is described in Attachable Functions.
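
Pass-by-value for a single structure can be sketched as follows. The
structure "point" and the function "move" are hypothetical names used only
for this example.

```
struct point {
  int x;
  int y;
};

move(point p)
{
  p.x = 100;      // changes the local copy only
  printf("inside: %d\n", p.x);
}

main()
{
  point a;

  a.x = 1;
  a.y = 2;
  move(a);
  printf("after:  %d\n", a.x);
}
```

Because the structure is copied on the call, this should print 100 inside
the function and 1 after it returns.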

7.6   Structures as Return Values of Functions

Functions may return structure values. Assigning a variable the value of
the result of a function call that returns a structure is the same as a
structure assignment between two variables. The exception is when calling
attached functions. This is described in detail in Attachable Functions.

8   Language Classes

The preceding chapters have discussed the basic structure of SymbEL. The
remainder of this document will discuss the features of SymbEL that make
it powerful as a language for extracting, analyzing, and manipulating
data from the kernel.

When generalizing a capability, the next step after creation of a library
is the development of a syntactic notation which represents the capability
that the library provided. The capability in question here is the retrieval
of data from the sources within the kernel that provide performance tuning
data. SymbEL provides a solution to this problem through the use of predefined
language classes that can be used to declare the type of a variable and/or to
designate it as being a special variable. When a variable with this special
designation is accessed, the data from the source that the variable represents
will be extracted and placed into the variable before it is evaluated.

There are four predefined language classes in SymbEL. They are

kvm:    Access to any global kernel symbol
kstat:  Access to any information provided by the kstat framework
mib:    Read-only access to the MIB2 variables in the IP, ICMP, TCP,
        and UDP modules in the kernel
ndd:    Access to variables provided by the IP, ICMP, TCP, UDP, and ARP
        modules in the kernel

Variables of these language classes have the same structure as any other
variable. They can be a simple type or a structured type. What needs
clarification in the declaration of the variable is

  • whether the variable type is simple or structured
  • whether the variable has a predefined language class attribute

The syntax selected for this capability is to define the variable with
a name that begins with the language class name followed by a dollar
sign ($). The following prefixes denote a variable's special status.

  • kvm$     kvm language class
  • kstat$   kstat language class
  • mib$     mib language class
  • ndd$     ndd language class

Examples of variables declared with a special attribute are

ks_system_misc kstat$misc;    // structured type, kstat language class
int            kvm$maxusers;  // simple type,     kvm   language class
mib2_ip_t      mib$ip;        // structured type, mib   language class
ndd_tcp_t      ndd$tcp;       // structured type, ndd   language class

When any of these variables appears in a statement, the values that the
variables represent are retrieved from the respective source before the
variable is evaluated. Variables declared of the same type but not
possessing the special prefix are not evaluated in the same manner.
For instance, the variable

ks_system_misc tmp_misc;  // structured type, no language class specified

can be accessed without any data being read from the kstat framework.

Variables that use a language class prefix in their name are called "active"
variables. Those that do not are called "inactive".

8.1   The kvm Language Class

Let's look at an example of the use of a kvm variable.

main()
{
  int kvm$maxusers;

  printf("maxusers is set to %d\n", kvm$maxusers);
}

In this example there is a local variable of type "int". The fact that it
is an "int" is not exceptional. The fact that the name of the variable
begins with "kvm$" is. It is the "kvm$" prefix that flags the interpreter
to look up this value in the kernel via the kvm library. The actual name
of the kernel variable is whatever follows the "kvm$" prefix. No special
action needs to be taken by the program in order for the value to be read
from the kernel. Simply accessing it by using it as a parameter to the
printf() statement (in this example) causes the interpreter to read the value
from the kernel and place it in the variable before sending the value to
printf(). Use of kvm variables is somewhat limiting since the effective uid
of "se" must be super-user or the effective gid must be "sys" in order to
successfully use the kvm library.

In this example, the variable "maxusers" is a valid variable in the kernel
and when accessed is read from the kernel address space. It is possible
and legal to declare a "kvm$" active variable with the name of a variable
that is not in the kernel address space. The value will contain the original
initialized value and refreshing of this type of variable is futile since
there is no actual value in the kernel. This is useful when dealing with
pointers though and an example is included in the "Tricks" chapter.

8.2   The kstat Language Class

The use of kstat variables is different from kvm variables in that all of
the kstat types are defined in the header file "kstat.se". All kstat
variables must be structures since that is how they are defined in the
header file. Declaring an active kstat variable that is not a structure
results in a semantic error. Declaring an active kstat variable that is
not of a type declared in the kstat.se header file results in the variable
always containing zeros unless the program places something else in the
variable manually. Here is an example using kstat variables.

#include <kstat.se>

main()
{
  ks_system_misc kstat$misc;

  printf("This machine has %u CPU(s) in it.\n", kstat$misc.ncpus);
}

Just as in the kvm example, no explicit access must be done to retrieve
the data from the kstat framework. The access to the member of the active
ks_system_misc variable in the parameter list of printf() causes the
member to be updated by the runtime.

8.2.1   Multiple Instances

The kstat.se header file contains many structures that have information
that is unique in nature. The ks_system_misc structure is an example.
The number of CPUs on the system is unique and does not change depending
on something else. However, the activity of each of the individual CPUs
DOES change depending on which CPU is in question. This is also the
case for network interfaces and disks. This situation is handled by
the addition of two members to structures that contain data for devices
that have multiple instances. These members are "number$" and "name$".
The "name$" member will contain the name of the device as supplied by
kstat. The "number$" member is a linear number representing the "nth"
device of this type encountered. It is NOT the device instance number.
The reasoning for this is to allow a "for" loop to be written such that
all of the devices of a particular type can be traversed without needing
to skip over instances that are not in the system. It is not unusual,
for instance, for a multi-processor machine to contain CPUs that do not
have linear instance numbers. When traversing through all the devices,
the end of the list will be encountered when the "number$" member contains
a -1. Here is an example of searching through multiple disk instances.

#include <kstat.se>

main()
{
  ks_disks kstat$disk;

  printf("Disks currently seen by the system:\n");
  for(kstat$disk.number$=0; kstat$disk.number$ != -1; kstat$disk.number$++) {
    printf("\t%s\n", kstat$disk.name$);
  }
}

In this program, kstat$disk.number$ is set initially to zero. The "while
part" of the loop is then run, checking the value of kstat$disk.number$ to
see if it's -1. That comparison causes the runtime to verify that there
is an "nth" disk. If there is, then the number$ member is left with its
value and the body of the loop runs. If there is not, the number$ member
is set to -1 and the loop ends. When the runtime evaluates the
kstat$disk.name$ value in the printf() statement, it reads the name of the
"nth" disk and places it in the name$ member, which is then sent to printf().

8.2.2   Other Points About kstat

There are some points that need to be made about how to best use kstat
variables in a program.

Some of the values contained in the kstat structures are not immediately
useful by themselves. For instance, the "cpu" member of the ks_cpu_sysinfo
structure is an array of four unsigned longs representing the number of
clock ticks that have occurred since system boot in each of the four
CPU states: idle, user, kernel, and wait. This data needs further
processing to be useful.

If a program needs to access many members of a kstat variable, then it is
in the best interest of the performance of the program and the system to copy
the values into an inactive kstat variable using a structure assignment. The
single structure assignment will cause all of the members of the structure to
be read from the kstat framework with one read and copied to the inactive
variable. When these values are accessed using the inactive variable,
no more reads from the kstat framework will be initiated. The net result
is a reduction in the number of system calls performed by the runtime, so
"se" will not have a significant impact on the performance of the system.
Here is an example.

8.2.3   Example kstat Program

#include <unistd.se>
#include <sysdepend.se>
#include <kstat.se>

main()
{
  ks_cpu_sysinfo kstat$cpusys;   // active kstat variable
  ks_cpu_sysinfo tmp_cpusys;     // inactive kstat variable
  ks_system_misc kstat$misc;     // active kstat variable
  int ncpus = kstat$misc.ncpus;  // grab it and save it
  int old_ints[MAX_CPU];
  int old_cs[MAX_CPU];
  int ints;
  int cs;
  int i;

  // initialize the old values
  for(i=0; i<ncpus; i++) {
    kstat$cpusys.number$ = i;       // does not cause an update
    tmp_cpusys = kstat$cpusys;      // struct assignment, update performed
    old_ints[i] = tmp_cpusys.intr;  // no update, inactive variable
    old_cs[i] = tmp_cpusys.pswitch; // no update, inactive variable
  }
  for(;;) {
    sleep(1);
    for(i=0; i<ncpus; i++) {
      kstat$cpusys.number$ = i;    // does not cause an update
      tmp_cpusys = kstat$cpusys;   // struct assignment, update performed
      ints = tmp_cpusys.intr - old_ints[i];
      cs = tmp_cpusys.pswitch - old_cs[i];

      printf("CPU: %d   cs/sec = %d  int/sec = %d\n", i, cs, ints);

      old_ints[i] = tmp_cpusys.intr;
      old_cs[i] = tmp_cpusys.pswitch;  // save old values
    }
  }
}

8.2.4   About the Program

ks_cpu_sysinfo kstat$cpusys;   // active kstat variable
ks_cpu_sysinfo tmp_cpusys;     // inactive kstat variable

Here are the declarations of the active and inactive variables. Use of the
active variable will cause the runtime to read the values from the kstat
framework for the ks_cpu_sysinfo structure. Later accesses to the inactive
variable will not cause the reads to occur.

ks_system_misc kstat$misc;     // active kstat variable
int ncpus = kstat$misc.ncpus;  // grab it and save it

Since the ncpus value will be used extensively, it is best to copy it into
an ordinary variable that will not cause continual updates.

int old_ints[MAX_CPU];
int old_cs[MAX_CPU];

Since the program computes the rate at which interrupts and context switches
are occurring, the values from the previous iteration need to be saved so
they can be subtracted from the values of the current iteration. They are
arrays bounded by the maximum number of CPUs available on a system.

// initialize the old values
for(i=0; i<ncpus; i++) {
  kstat$cpusys.number$ = i;       // does not cause an update
  tmp_cpusys = kstat$cpusys;      // struct assignment, update performed
  old_ints[i] = tmp_cpusys.intr;  // no update, inactive variable
  old_cs[i] = tmp_cpusys.pswitch; // no update, inactive variable
}

This grabs the initial values that will be subtracted from the current values
after the first sleep() is completed. For the sake of simplicity, there are
no timers kept and it is assumed that only one second has elapsed between
updates. In practice, the elapsed time would be computed.
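
The elapsed-time computation mentioned above might look like the following C
sketch. It assumes timestamps taken with gettimeofday() immediately before
each sample; the helper name rate_per_sec is invented for illustration, and
nothing here is taken from the "se" sources.

```c
#include <sys/time.h>

/*
 * Hypothetical helper: convert a counter delta into a per-second rate
 * using the real elapsed time between two gettimeofday() samples.
 * Returns 0.0 if no measurable time has elapsed.
 */
double
rate_per_sec(unsigned long prev, unsigned long curr,
             const struct timeval *t0, const struct timeval *t1)
{
  double elapsed = (double)(t1->tv_sec - t0->tv_sec) +
                   (double)(t1->tv_usec - t0->tv_usec) / 1e6;

  if (elapsed <= 0.0) {
    return 0.0;                     /* avoid dividing by zero */
  }
  return (double)(curr - prev) / elapsed;
}
```

Dividing by the measured interval instead of assuming exactly one second keeps
the reported rates honest when sleep() returns late on a busy system.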

for(i=0; i<ncpus; i++) {
  kstat$cpusys.number$ = i;    // does not cause an update
  tmp_cpusys = kstat$cpusys;   // struct assignment, update performed

Here, the number$ member is set to the CPU in question and then the contents
of the entire active structure variable are copied into the inactive structure
variable. This causes only one system call to update the kstat variable.

ints = tmp_cpusys.intr - old_ints[i];
cs = tmp_cpusys.pswitch - old_cs[i];

printf("CPU: %d   cs/sec = %d  int/sec = %d\n", i, cs, ints);

old_ints[i] = tmp_cpusys.intr;
old_cs[i] = tmp_cpusys.pswitch;  // save old values

This code computes the number of interrupts and context switches for the
previous second and prints them out. The current values are then saved as
the old values and the loop continues.

8.2.5   Runtime Declaration of kstat Structures

The kstat framework is dynamic and contains information regarding devices
attached to the system. These devices are built by Sun and by third party
manufacturers. The interpreter contains static definitions of many devices
and these definitions are mirrored by the kstat.se include file. However,
it is not reasonable to assume that the interpreter will always contain
all of the possible definitions for devices. To accommodate this situation,
a new syntactic element was added. This is the kstat structure.
A kstat structure can define only KSTAT_TYPE_NAMED structures, which are
the structures that describe devices such as network interfaces.

As an example, the following script prints out the values of a kstat structure
that is not declared in the kstat.se file but has been part of the kstat
framework since the very beginning.

kstat struct "kstat_types" ks_types {
  ulong raw;
  ulong "name=value";
  ulong interrupt;
  ulong "i/o";
  ulong event_timer;
};

main()
{
  ks_types kstat$t;
  ks_types tmp = kstat$t;

  printf("raw         = %d\n", tmp.raw);
  printf("name=value  = %d\n", tmp.name_value);
  printf("interrupt   = %d\n", tmp.interrupt);
  printf("i/o         = %d\n", tmp.i_o);
  printf("event_timer = %d\n", tmp.event_timer);
}

The kstat structure introduces a few new concepts.

  • The structure starts with the word "kstat" to denote its significance.

  • The structure also contains members that are quoted. This will not work
    in an ordinary structure declaration, only for kstat structures. The
    purpose of this is to provide the ability to declare variables that
    accurately reflect the name of the member within the kstat framework.
    For instance, the member "name=value" could not be declared without quotes
    since the parser would generate errors. When accessed in the printf()
    statement, special characters are translated to underscores. This is the
    case for any character that is recognized as a token and also for spaces.
    The list of characters that will be translated to underscores is

    
    []{}()@|!&#;:.,+*/=-><~%? \t\n\\^
    
  • Members of KSTAT_TYPE_NAMED structures sometimes have no name. This
    situation will also be correctly handled by the interpreter. Any
    member of a structure with the name "" will be changed to "missingN"
    where N starts at 1 and increments for each occurrence of a missing
    member name. A declaration of

    kstat struct "asleep" ks_zzzz {
      ulong "";  // translates into missing1
    };
    
    

    will translate into

    kstat struct "asleep" ks_zzzz {
      ulong missing1;
    };
    

    for the purposes of the programmer. It would be a good idea to document
    such declarations as above.

  • Members with reserved words as names will also be munged into another
    form: the prefix "SYM_" is added to the name. For instance, this
    declaration

    kstat struct "unnecessary" ks_complexity {
      short "short";
    };
    

    will be munged into

    kstat struct "unnecessary" ks_complexity {
      short SYM_short;
    };
    

    so the programmer can continue.

  • The quoted string following the keyword "struct" in the declaration
    represents the name of the KSTAT_TYPE_NAMED structure in the kstat
    framework and is an algebra unto itself. First, an introduction.
    Each "link" in the kstat "chain" which comprises the framework has
    3 name elements: a module, an instance number, and a name. The
    "kstat_types" link, for instance, has the complete name
    "unix".0."kstat_types". "unix" is the module, 0 is the instance number,
    and "kstat_types" is the name. Here are the possible ways to specify
    the kstat name within this quoted string.

    • "kstat_types" - This is the "name" of the kstat.

    • "cpu_info:" - This is the "module" of the kstat. A link with the full
      name of "cpu_info".0."cpu_info0" would map onto this structure.
      However, so too would "cpu_info".1."cpu_info1", which brings up an
      issue. When a kstat structure is declared with a kstat module name,
      the first two members of the structure must be

      long number$;
      string name$;
      

      This is in keeping with other kstat declarations with multiple
      instances. In the case of structures with multiple module names
      that have the same structure members, the list of names continues
      with colon separators, e.g.

      kstat struct "ieef:el:elx:pcelx" ks_elx_network { ...
      
    • "*kmem_magazine" - This is the prefix of the name portion of the
      kstat. In the case of the kmem_magazines, the module name will
      always be "unix" which is the module name of many other links that
      do not share the same structure members as the kmem_magazines.
      As is the case with specifying a module name, the number$ and name$
      members must be present.

    • ":module:instance:name:statistic" - This notation is inspired by
      that used by the kstat command line utility. Each of the module,
      instance, name, or statistic specifiers may be a shell glob pattern
      or a regular expression enclosed by '/' characters. It is
      possible to use both specifier types within a single operand.
      Leaving a specifier empty is equivalent to using the '*' glob
      pattern for that specifier. This format differs from that in the
      kstat command line utility in that the leading colon is required to
      distinguish this form from those above, and the regular expressions
      are Extended Regular Expressions, regex(5), not Perl. As is the
      case with specifying a module name, the number$ and name$ members
      must be present.
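
The member-name translations described in the list above (special characters
to underscores, "" to "missingN", and the "SYM_" prefix for reserved words)
can be sketched in C as follows. This is an illustration of the rules as
stated, not the interpreter's code: the function name munge_member_name is
invented, and the RESERVED table is only a small assumed subset, since the
full reserved-word list is not given in this section.

```c
#include <stdio.h>
#include <string.h>

/* Characters the manual lists as being translated to underscores. */
static const char SPECIALS[] = "[]{}()@|!&#;:.,+*/=-><~%? \t\n\\^";

/* Illustrative subset only; the interpreter's real list is not shown here. */
static const char *RESERVED[] = { "short", "int", "long", "double", "string" };

/*
 * Hypothetical helper: derive the member name a script would use from the
 * quoted kstat member name.  *missing counts unnamed members seen so far.
 */
void
munge_member_name(const char *kname, char *out, size_t outlen, int *missing)
{
  size_t i;

  if (outlen == 0) {
    return;
  }
  if (kname[0] == '\0') {                     /* "" -> missingN, N from 1 */
    snprintf(out, outlen, "missing%d", ++*missing);
    return;
  }
  for (i = 0; i < sizeof(RESERVED) / sizeof(RESERVED[0]); i++) {
    if (strcmp(kname, RESERVED[i]) == 0) {    /* reserved word -> SYM_ prefix */
      snprintf(out, outlen, "SYM_%s", kname);
      return;
    }
  }
  snprintf(out, outlen, "%s", kname);
  for (i = 0; out[i] != '\0'; i++) {
    if (strchr(SPECIALS, out[i]) != NULL) {   /* token characters and spaces */
      out[i] = '_';
    }
  }
}
```

With these rules, "name=value" becomes name_value and "i/o" becomes i_o,
which matches the member names used in the printf() calls of the kstat_types
example earlier in this section.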

Note that when a dynamic kstat structure declaration replaces a static
declaration inside of the interpreter, the old declaration is discarded
and replaced with the new one. Therefore, if a kmem_magazine declaration
were used to replace the "ks_cache" declaration from kstat.se, the only
kstat links seen would be the kmem_magazine members, and all of the other
cache links (and there are a lot of them) would no longer be seen.
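
The four forms of the quoted kstat name described in this section can be told
apart by inspecting the string alone. The C sketch below shows one way to do
that, under the assumption that the leading character and the presence of a
colon are sufficient to distinguish them. The enum and function names are
invented for illustration and say nothing about how the interpreter actually
parses the string.

```c
#include <string.h>

enum spec_kind {
  SPEC_NAME,         /* "kstat_types"                      */
  SPEC_MODULES,      /* "cpu_info:" or "ieef:el:elx:pcelx" */
  SPEC_NAME_PREFIX,  /* "*kmem_magazine"                   */
  SPEC_FULL          /* ":module:instance:name:statistic"  */
};

/* Hypothetical classifier for the quoted string after "kstat struct". */
enum spec_kind
classify_spec(const char *spec)
{
  if (spec[0] == ':') {
    return SPEC_FULL;          /* leading colon marks the four-part form */
  }
  if (spec[0] == '*') {
    return SPEC_NAME_PREFIX;   /* prefix of the name portion */
  }
  if (strchr(spec, ':') != NULL) {
    return SPEC_MODULES;       /* one or more colon-separated modules */
  }
  return SPEC_NAME;            /* plain kstat name */
}
```

Every form except SPEC_NAME can match more than one link, which is why the
text requires the number$ and name$ members in those declarations.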

8.2.6   Adding New Disk Names

The internal function "se_add_disk_name(string name)" can be used to
add new disk names to the existing list internally. Therefore, if the
tape drives and nfs mounts that are recorded in the KSTAT_TYPE_IO section
of the kstat framework were to be added to the list of disks for display
by any script that shows disk statistics, these lines could be added at
the beginning of the script.

se_add_disk_name("st");
se_add_disk_name("nfs");

This function is declared in the se.se include file.

8.3   The mib Language Class

There is a lot of data residing in the mib variables of the kernel
regarding the network. Unfortunately, these mib variables are not
part of the kstat framework. Therefore, a new language class was
created to ease the access to this information.

Variables of the mib class have a unique feature in that they can be
read, but assigning values will generate a warning from the interpreter.
This is to remind the user that assigning values to the members of the
mib2_* structures will NOT result in the information being placed back
into the kernel. The mib variables are read-only.

Mib variables do not have the permissions limitation of kvm variables. Any
user can view mib variable values without special access permissions.

To view the mib information available from within SymbEL, run the
command "netstat -s" from the command line. All but the IGMP information
is available.

Since all mib variables are structures, the rules regarding the use of
structure assignment to cut down on the overhead of the interpreter are
the same as for the kstat and kvm classes.

Here is an example of using mib class variables.

#include <mib.se>

main()
{
  mib2_tcp_t mib$tcp;

  printf("Retransmitted TCP segments = %u\n", mib$tcp.tcpRetransSegs);
}

8.4   The ndd Language Class

SunOS 5.x makes access to variables that define the operation of the network
stack available through a command called "ndd" (see ndd(1M)). The ndd language
class within SymbEL provides access to the variables within the IP, ICMP, TCP,
UDP, and ARP modules. The definitions of the available variables are in the
ndd.se include file. For each module, there is a structure that contains all
of the variables available for that module.

Some of these variables are read-write and others are read-only. If an attempt
is made to modify a variable that is read-only, a warning message will be
produced by the interpreter. Some of the read-only variables are tables
which can be quite large.

Like kstat and mib variables, all ndd variables are structures.

The following program displays the tcp_status variable of the TCP module.
This variable is type string and when printed looks like a large table.

#include <stdio.se>
#include <ndd.se>

main()
{
  ndd_tcp_t ndd$tcp;

  puts(ndd$tcp.tcp_status);
}

9   User Defined Classes

The four language classes provide a significant amount of data to a program
for analysis. But the analysis of this data can become convoluted and make
the program difficult to deal with. This is one of the problems that SymbEL
hoped to clear up.

Adding more language classes is a potential solution to this problem.
An example of an additional language class that would be useful is a
"vmstat" class. This would be a structure that provided all of the
information that the "vmstat" program provides. The problem is that
such an addition would make "se" larger and provide functionality that
didn't really require the internals of the interpreter to accomplish.
All of what "vmstat" does can be done by writing a SymbEL program.

In addition to the "vmstat" class, it would be useful to have classes for
"iostat", "mpstat", "nfsstat", "netstat" and any other "stat" program that
provided this type of information. What was needed to accomplish this
task correctly is a language feature that allowed the programmer to create
their own language classes written in SymbEL. This "user defined class"
would be a structure and an associated block of code that was called
whenever one of the members of the structure was accessed. This led to
the development of the aggregate type "class".

A "class" type is a structure and a block of code inside the structure
that is first called when the block that contains the declaration of the
class variable is entered. Thereafter, whenever a member of the class
variable is accessed, the block is called. To illustrate the class
construct, here is a program that continually displays how long the
system has been up. The first example is without the use of a class.

#include <stdio.se>
#include <unistd.se>
#include <kstat.se>

#define MINUTES (60 * hz)
#define HOURS   (60 * MINUTES)
#define DAYS    (24 * HOURS)

main()
{
  ulong ticks;
  ulong days;
  ulong hours;
  ulong minutes;
  ulong seconds;
  ks_system_misc kstat$misc;
  long hz = sysconf(_SC_CLK_TCK);

  for(;;) {
    ticks = kstat$misc.clk_intr;
    days = ticks / DAYS;
    ticks -= (days * DAYS);
    hours = ticks / HOURS;
    ticks -= (hours * HOURS);
    minutes = ticks / MINUTES;
    ticks -= (minutes * MINUTES);
    seconds = ticks / hz;
    printf("System up for: %4u days %2u hours %2u minutes %2u seconds\r",
      days, hours, minutes, seconds);
    fflush(stdout);
    sleep(1);
  }
}

This program continues in an infinite "for" loop computing the uptime based
on the number of clock ticks the system has received since boot. The
computation is contained completely within the main program. This code
can be distilled into a user defined class as the following code shows.

#include <unistd.se>
#include <kstat.se>

#define MINUTES (60 * hz)
#define HOURS   (60 * MINUTES)
#define DAYS    (24 * HOURS)

class uptime {

  ulong ticks;
  ulong days;
  ulong hours;
  ulong minutes;
  ulong seconds;

  uptime$()
  {
    ks_system_misc kstat$misc;
    long hz = sysconf(_SC_CLK_TCK);

    ticks = kstat$misc.clk_intr;   /* assign these values to the */
    days = ticks / DAYS;           /* class members              */
    ticks -= (days * DAYS);
    hours = ticks / HOURS;
    ticks -= (hours * HOURS);
    minutes = ticks / MINUTES;
    ticks -= (minutes * MINUTES);
    seconds = ticks / hz;
  }
};

The start of the class looks like a structure but the final "member" of
the structure is a block of code called the "class block". The name used
after the "class" keyword is the type name that will be used in the
declaration of the variable. The name of the class block is the prefix
used in variable names to denote that the variable is active. Variables
declared of a user defined class type that do not use the prefix in the
variable name are inactive.

The main() function of the uptime program would now be written to use
the uptime class as shown in this example.

#include <stdio.se>
#include <unistd.se>
#include "uptime_class.se"

main()
{
  uptime uptime$value;
  uptime tmp_uptime;

  for(;;) {
    tmp_uptime = uptime$value;
    printf("System up for: %4u days %2u hours %2u minutes %2u seconds\r",
      tmp_uptime.days, tmp_uptime.hours,
      tmp_uptime.minutes, tmp_uptime.seconds);
    fflush(stdout);
    sleep(1);
  }
}

The previous chapter discussed how the assignment of entire structures
cuts down on the overhead of the system since only one copy is required.
Not only is this true here as well, but the structure copy also assures
that the data printed out represents the calculations of one snapshot
in time, instead of printing different values for each time that the
class block was called to update each member of the class that was
used as a parameter to printf().

9.1   The refresh$ Builtin

As a space saving measure, there is also a builtin function not discussed
previously. This is the "refresh$" builtin. This function can be used
with user defined classes to refresh the contents of the class. It works
the same way as assigning the active variable to the temporary one. In
the above example the code could be changed to

#include <stdio.se>
#include <unistd.se>
#include "uptime_class.se"

main()
{
  uptime up;

  for(;;) {
    refresh$(up);
    printf("System up for: %4u days %2u hours %2u minutes %2u seconds\r",
      up.days, up.hours, up.minutes, up.seconds);
    fflush(stdout);
    sleep(1);
  }
}

Care must be taken when using the "refresh$" builtin. When a program is
written using active variables, the interpreter will call the class method
as an initialization step when it first encounters it. This is not the case
for non-active variables. If this initialization step is required in a
program, it should be performed early in the function. Then, when it is
refreshed later, it will have the initial values already.

10   Pitfalls

These are some of the idiosyncrasies of the language that will catch the
programmer by surprise if they're accustomed to using a particular feature
in C and assume that it will be supported in SymbEL.

  • Only one variable can be declared per line. The variable names may
    not be a comma separated list.

  • There is no type "float". All floating point variables are type "double".

  • Curly braces must surround all sequences of statements in control
    structures. This includes sequences of length one.

  • The comparators work with scalars, floats and strings. Therefore, the
    logical comparison ("hello" == "world") is valid and in this case will
    return false.

  • If the result of an expression yields a floating value as an operand to
    the modulus operator, it is converted to long before the operation takes
    place. This happens while the program is running.

  • Assignment of the result of a logical expression is not allowed.

  • The "for" loop has some limitations.

    • There may only be one assignment in the assignment part.
    • There may only be logical expressions in the while part.
    • There may only be one assignment in the do part.

  • All local variables have "static" semantics.

  • All parameters are passed by value.

  • Global variables can be assigned the value of a function call.

  • while(running) is not syntactically correct. while(running != 0) is.

  • There is no recursion in SymbEL.

  • Structure comparison is not supported.

  • Syntax of conditional expression is rigid: ( condition ? do_exp : else_exp )

  • Calling attached functions with incorrect values can result in a core dump
    and is not avoidable by the interpreter. This simple but effective
    script will cause a segmentation fault core dump.

    #include <stdio.se>
    
    main()
    {
      puts(nil);
    }
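
For C programmers, the "static" semantics pitfall in the list above is
probably the easiest one to overlook. Read in C terms, it means a local
variable keeps its value between calls, as if it had been declared with the
"static" keyword. The C sketch below contrasts the two behaviors. The function
names are invented, and whether SymbEL re-runs an initializer on each call is
not stated in this section, so the sketch only shows the C side of the
comparison.

```c
/* What a C programmer normally expects: the counter starts over each call. */
int
count_automatic(void)
{
  int calls = 0;          /* automatic: re-created on every call */

  calls++;
  return calls;
}

/* A local with "static" semantics: the value persists across calls. */
int
count_static(void)
{
  static int calls = 0;   /* static: initialized once, kept between calls */

  calls++;
  return calls;
}
```

Calling count_automatic() repeatedly always returns 1, while count_static()
returns 1, 2, 3 and so on. Code ported from C that relies on fresh locals
should therefore assign its starting values explicitly.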
    

11   Tricks

Knowing the internals of the interpreter gives me insight into how to
accomplish some things that may not seem immediately obvious. Here are
some novel ways to shoot yourself in the foot.

11.1   Returning an Array of Non-Structured Type from a Function

Although it is not allowed to declare a function as

int []
not_legal()
{
  int array[ARRAY_SIZE] = { 1, 2, 3, 4, 5, -1 };

  return array;
}

it is still possible to return an array. Granted this is unattractive, but
most of the tricks in this chapter will involve something that is not very
appealing from the programming standpoint. But SymbEL is, after all, just
a scripting language. And if it can be done at all, it's worth doing. So
here's how it can be done.

#define ARRAY_SIZE 128

ulong
it_is_legal()
{
  int array[ARRAY_SIZE] = { 1, 2, 3, 4, 5, -1 };

  return &array;
}

struct array_struct {
  int array[ARRAY_SIZE];
};

main()
{
  array_struct digits;
  ulong address;
  int i;

  address = it_is_legal();
  struct_fill(digits, address);
  for(i=0; digits.array[i] != -1; i++) {
    printf("%d\n", digits.array[i]);
  }
}
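
For comparison, here is a rough C rendering of the same trick. It assumes
that struct_fill() copies sizeof(structure) bytes from the given address into
the structure, which is how the script above uses it; that behavior is an
assumption, not something documented in this section. The names array_address
and fill_from_address are invented for the sketch.

```c
#include <stdint.h>
#include <string.h>

#define ARRAY_SIZE 128

struct array_struct {
  int array[ARRAY_SIZE];
};

/* Return the address of the array as an integer, as the script does. */
uintptr_t
array_address(void)
{
  /* "static" is required in C; SymbEL locals already behave this way. */
  static int array[ARRAY_SIZE] = { 1, 2, 3, 4, 5, -1 };

  return (uintptr_t)array;
}

/* Assumed stand-in for struct_fill(): copy raw memory into the structure. */
void
fill_from_address(struct array_struct *dst, uintptr_t address)
{
  memcpy(dst, (const void *)address, sizeof(*dst));
}
```

The trick works only because the array outlives the function call. In C that
takes an explicit "static"; in SymbEL it follows from the static semantics of
all local variables noted in the Pitfalls chapter.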

11.2   Using Attached Functions Return Values

It is common to read input lines using "fgets" and then locate the new-line
character with "strchr" and change it to a null character. This has
unexpected results in SymbEL. For instance, the code segment

while(fgets(buf, sizeof(buf), stdin) != nil) {
  p = strchr(buf, '\n');
  p[0] = '\0';
  puts(buf);
}

would be expected to null the new-line character and print the line (yes I
know this code segment will cause "se" to exit with a null pointer exception
if a line is read with no new-line character). But this is not the case
since the "strchr" function will return a string that is assigned to the
variable "p". When this happens, a new copy of the string returned by
"strchr" is allocated and assigned to "p". When the "p[0] = '\0';" line
is executed the new-line character in the copy is made null. The original
"buf" from the "fgets" call remains intact. The way around this (and this
should only be done when it is certain that the input lines contain the
new-line character) is

while(fgets(buf, sizeof(buf), stdin) != nil) {
  strcpy(strchr(buf, '\n'), "");
  puts(buf);
}

In this case, the result of the "strchr" call is never assigned to a variable
and its return value remains uncopied before being sent to the "strcpy"
function. "strcpy" then copies the string "" onto the new-line and turns
it to the null character in doing so.
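
The difference between the two code segments can be reproduced in plain C by
making the hidden copy explicit. In the sketch below, chop_via_copy() imitates
the SymbEL assignment by duplicating the string returned by strchr() before
writing to it, while chop_in_place() writes to the original buffer and also
tolerates a line with no new-line. Both function names are invented, and
strcspn() is standard C; whether it is available as an attached function in
SymbEL is not stated here.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Mimics the SymbEL behavior described above: the tail returned by
 * strchr() is duplicated, so the write lands in the copy, not in buf.
 * Returns 1 if buf still contains its new-line afterwards, 0 if buf
 * had no new-line to begin with, and -1 if the copy could not be made.
 */
int
chop_via_copy(char *buf)
{
  char *nl = strchr(buf, '\n');
  char *p;

  if (nl == NULL) {
    return 0;
  }
  p = malloc(strlen(nl) + 1);      /* stands in for the implicit copy */
  if (p == NULL) {
    return -1;
  }
  strcpy(p, nl);
  p[0] = '\0';                     /* modifies only the copy */
  free(p);
  return strchr(buf, '\n') != NULL;
}

/* Chops in place and is safe when the line has no new-line at all. */
void
chop_in_place(char *buf)
{
  buf[strcspn(buf, "\n")] = '\0';
}
```

The first function leaves "hello\n" untouched, which is the surprise the text
warns about; the second turns it into "hello".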

11.3   Using kvm Variables and Functions

Using the kvm functions and dealing with kvm variables in general is quite
confusing since there are so many levels of indirection of pointers. This
simple script performs the equivalent of "/bin/uname -m".

#include <stdio.se>

#include <devinfo.se>

main()
{
  ulong kvm$top_devinfo;     // top_devinfo is an actual kernel variable
  dev_info_t kvm$root_node;  // root_node is not, but it needs to be active

  // The next line effects a pointer indirection.  The value of top_devinfo
  // is a pointer to the root of the devinfo tree in the kernel.  This value
  // is extracted and the root_node variable has its kernel address changed
  // to this value.  Accessing the root_node variable after this assignment
  // will cause the reading of the dev_info_t structure from the kernel
  // since root_node is an active variable.  Note that root_node is not
  // a variable in the kernel though, but it's declared active so that
  // the value will be read out *after* it's given a valid kernel address.

  // And there's no need to explicitly read the string, it's done already.

  kvm_cvt(kvm$root_node, kvm$top_devinfo);
  puts(kvm$root_node.devi_name);
}

Another example of extracting kvm values is with the "kvm_declare" function.
This allows kernel variables to be declared while the program is running.
Instead of declaring a kvm variable for "maxusers", for instance, it could
be done this way

main()
{
  ulong address;
  int kvm$integer_value;

  address = kvm_declare("maxusers");
  kvm_cvt(kvm$integer_value, address);
  printf("maxusers is %d\n", kvm$integer_value);
}

A more general way to peruse integer variables entered at the user's leisure
is shown in this example.

#include <stdio.se>
#include <string.se>

int main()
{
  char var_name[BUFSIZ];
  ulong address;
  int kvm$variable;

  for(;;) {
    fputs("Enter the name of an integer variable: ", stdout);
    if (fgets(var_name, sizeof(var_name), stdin) == nil) {
      return 0;
    }
    strcpy(strchr(var_name, '\n'), ""); // chop
    address = kvm_declare(var_name);    // look it up with nlist
    if (address == 0) {
      printf("variable %s is not found in the kernel space\n", var_name);
      continue;
    }
    kvm_cvt(kvm$variable, address);     // convert the address of the kvm var
    printf("%s = %u\n", var_name, kvm$variable);
  }
}

11.4   Using an "attach" Block to call Interpreter Functions

The "attach" feature of SymbEL is built on the dynamic linking feature
of Solaris. The "dl" functions allow an external library
to be attached to a running process, thus making the symbols within that
binary available to the program.

One of the features of using dynamic linking is the ability to access
symbols within the binary that is running. That is to say, a process
can look into itself for symbols. This can also be accomplished in
SymbEL by using an attach block with no name. With this trick a script
can call functions contained within the interpreter. This does require
that the author of the script know what functions are available to begin
with though. Currently, the only list of functions available to the
user is in the "se.se" include file.

The most useful of these functions is the "se_function_call" function.
This allows the script to call a SymbEL function indirectly. This can
be used for a callback mechanism. It's the equivalent of a pointer to
a function. For example, this script calls the function "callback"
indirectly.

#include <se.se>

main()
{
  se_function_call("callback", 3, 2, 1);
}

callback(int a, int b, int c)
{
  printf("a = %d b = %d c = %d\n", a, b, c);
}

The "se_function_call" function is declared with an ellipsis argument so
any number of parameters can be passed