PART 1 == http://sourceforge.net/mailarchive/message.php?msg_id=10340294
So, I hope everyone's here :-)
I'll try to write a series of articles starting now. Please comment, ask
if something's completely unclear, and add things you think should be
mentioned.
The first step is only a small and not very useful one, but this way
we give overseas readers a chance to join the mailing list before we
get into the interesting stuff (good excuse, isn't it?)
Intro
-----
(please have a look at
http://bigsister.graeff.com/bsdevel/img01_intro.gif )
Each test MUST be defined somewhere in the etc/tests.cfg file. Since
the tests.cfg file includes every file in the etc/testdef directory,
it is good style to put single tests or a series of similar tests into
one file within the etc/testdef directory.
The test definition in tests.cfg tells uxmon what syntax the test
requires in uxmon-net, what arguments it takes, what requesters uxmon
must query, and how the query results are to be interpreted. Test
definitions are written in a special language.
Therefore many tests or test groups have an associated requester module.
Its responsibility is to extract system information from the
underlying system (or, of course, via the network) and pass this information
on to uxmon in a standardized but flexible way. Each piece of
information is passed as one system information "variable" with a
name like
modulename.variablepath.variablename.index
For those who know some SNMP this will look familiar, as e.g. an
SNMP ifDescr variable out of the Internet-MIB will be represented
by the Requester::snmp as
snmp.ifDescr.0
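As a small illustration (in Python; the helper below is invented for this article, not part of Big Sister), such a variable name can be split into its components:

```python
# Hypothetical helper: split a requester variable name of the form
# modulename.variablepath.variablename.index into its parts.
# The variablepath component may be empty, as in "snmp.ifDescr.0".
def split_variable(name):
    module, *path, varname, index = name.split(".")
    return {"module": module, "path": path, "name": varname, "index": index}

info = split_variable("snmp.ifDescr.0")
# info["module"] == "snmp", info["name"] == "ifDescr", info["index"] == "0"
```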
Requester modules are implemented as Perl objects.
After uxmon retrieves information from requesters as directed by the
test definition in tests.cfg it sends this information to the Big
Sister server. While status information (the known status colors and
status texts) is directly handled by the server, bsmon does need some
more information on the semantics of performance data in order to create
graphs. This information is taken out of the etc/graphtemplates file.
Since graphtemplates includes every file in the graphdef directory, again it is
good style to put graph definitions for a single test or a series of
tests into one file in the graphdef directory.
Graph definitions are written in their own specific language.
... so far for today ...
PART 2 == http://sourceforge.net/mailarchive/message.php?msg_id=10346242
Are you listening? Here comes the 2nd part of the introduction to writing custom
tests ...
Implementing our first test
---------------------------
Time for implementing our first test. As I am a little bit lazy
I'll go for re-implementing the "users" test in this series. But
in order to be able to test things without interfering with
existing tests, we call our test "ourusers".
We learned in the introduction that the one thing that is mandatory
is a test description in etc/tests.cfg or rather etc/testdef.
So we start having a look at this.
Test descriptions use their own language, which will look slightly
familiar to anyone who knows a programming language. First of
all, every description has got a name. This name must be unique
and is not necessarily related to the name Big Sister admins
will use in uxmon-net. As we want the name to be unique and we want
to implement the "ourusers" test, we go for the name "demo_ourusers". The
empty shell of the test definition looks like this:
test demo_ourusers {
}
That's it - this is a valid test definition, but it does not yet
describe a test one could use in uxmon-net.
Each test definition is composed of a "global" section at its start
and a number of sub sections - readers used to OO programming can
replace "global section" by "static code" and "sub sections" by
"methods". In the global section we usually declare and initialize
a number of variables we will later use in the sub sections. A few
of these global (or static) variables will be used by uxmon in order
to match references in uxmon-net with our test definition. First of
all, if we want to make our test available to uxmon-net, we must
define a name. We want it to be "ourusers" and uxmon will look for a
variable called "name", so we go for
test demo_ourusers {
static set name "ourusers";
}
We set variables via a statement called set with the syntax
set name value;
and statements end with a semicolon. So far so good. Each statement
may be introduced by one of "static", "pernode" or "instance".
This defines the scope of the variable. Let's imagine we have
got something like
localhost arg1=5 ourusers
localhost arg1=6 ourusers
somehost arg1=5 ourusers
this will make uxmon create 3 instances of our demo_ourusers
test. Two of them will have set arg1=5, one arg1=6; one is
targeted at somehost, the other two at localhost.
Each of these three tests will share all the variables declared
static. Variables declared pernode will be shared by all the instances
targeted at the same host, so pernode variables will be shared
between the two ourusers tests above, while the 3rd will get its
own. Variables declared instance will only be visible within each
single instance, so if demo_ourusers has got an instance variable,
each of the 3 above tests will get its own copy with not necessarily
the same value.
Only static variables will be chosen for matching uxmon-net entries
with test definitions, since at the point where uxmon searches
test definitions they are - of course - not yet instantiated nor
related to a target host.
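As a rough analogy (Python, with invented names; uxmon's internals differ), the three scopes behave like shared, per-host, and per-instance storage:

```python
# Model the three uxmon-net lines from the example above.
uxmon_net = [
    ("localhost", {"arg1": "5"}),
    ("localhost", {"arg1": "6"}),
    ("somehost", {"arg1": "5"}),
]

static_vars = {"name": "ourusers"}   # one copy shared by all instances
pernode_vars = {}                    # one dict per target host
instance_vars = []                   # one dict per uxmon-net line

for host, args in uxmon_net:
    pernode_vars.setdefault(host, {})
    instance_vars.append({"host": host, **args})

# All three instances share static_vars, the two localhost instances
# share one pernode dict, and each instance keeps its own arg1 copy.
```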
Are you still with us? I know, too much theory.
So, we've got a first test which uxmon will find - stop, uxmon will
not yet be happy with our test. You remember that with new style tests
you have to set a DESCRIPTION in uxmon-net, e.g. something like
DESCR features=unix,linux localhost
...
localhost ourusers
In our test we have to declare which features it will match. Let's
say our users test will work on unix systems, so we say something like
test demo_ourusers {
static set name "ourusers";
static set features local unix;
}
We learn that set may take multiple values. This is because variables
are thought of as lists (call them arrays, if you prefer). You will
also notice that I did not use quotes around local and unix. This is
a liberty I took; one could just as well write
static set features "local" "unix";
or even
static set "features" "local" "unix";
No problem, is it?
You certainly also noticed that I used the feature "local". While
I do not need to declare hosts as local or remote in the uxmon-net
DESCR, I do have to when it comes to test definitions. For the DESCR
in uxmon-net, uxmon will silently add the feature "local" or "remote",
depending on whether it thinks a host looks like the local or a remote host.
So it"s a good idea to add "local" to the features list if a test will
only work against a local machine.
Ok, time to put our test in a place uxmon can find it. Please
create a file etc/testdef/ourtest.cfg and put the above text into it.
Try running
bin/testers
and see if "ourusers" actually appears in the list of known tests. Great!
Try to get a syntax description out of bin/testers:
bin/testers -f local -f unix -t ourusers
Oops, where is all the text we usually get as an answer? You certainly
guessed it, the "testers" command does not magically write it on its own,
we have to supply the text by adding
static set description "Demo variation of the users test";
static set arguments
"perf:int:report perf data every perf minutes"
"item:string:Item as which to report.";
We do not know at this point what arguments our test will take, but
"perf" and "item" is already a good start since this is what uxmon
will handle automatically even if we do not add any other handling
code.
Now, you are certainly ready to see our test appear on the Big Sister
server - but not tonight, I leave this for tomorrow's chapter.
PART 3 == http://sourceforge.net/mailarchive/message.php?msg_id=10351729
Get reports
-----------
In part 2 we got uxmon to know about our test, but we did not
yet get any reports. It is time we moved ahead.
First of all we have to tell uxmon which column our test
results should go to, so we add
pernode set report_item "cpu";
to our existing test definition.
Before we can make uxmon send status reports to the Big Sister
server we have to remember that there is more than the global
section (aka. static code) in a test definition, we may define
other sections (aka. methods), each of them having their own
meaning.
The method which is responsible for getting system information
out of the monitored system is called "monitor". This method is
called every "frequency" minutes, thus every monitoring cycle.
In our case we have to find out how many users are logged in.
Let's assume for now that the number of users is available via the
requester called "who" and that the variable the who requester
sets is hrSystemDistinctUsers. So we add a monitoring section
retrieving the number of users:
test demo_ourusers {
static set name "ourusers";
static set features local unix;
static set description "Demo variation of the users test";
static set arguments
"perf:int:report perf data every perf minutes"
"item:string:Item as which to report.";
pernode set report_item "cpu";
pernode monitor {
get who.hrSystemDistinctUsers[0];
pernode set distinctusers ${who.hrSystemDistinctUsers[0]};
}
}
We just request the number of distinct users from the who requester
and store it in the variable distinctusers. Remember that "pernode"
means that our distinctusers variable as well as the monitor method
is nailed to one target host; even if we put something like
localhost arg1=5 ourusers
localhost arg1=6 ourusers
in uxmon-net, the monitor method is only called once, since both tests
apply to the same host. This is a little bit tricky, but may save
quite some CPU time for tests more complex than this.
So, now uxmon internally knows the number of users logged in to the
target system. The next thing to do is to add a "report" method. This
method is called every "frequency" minutes like the monitor method,
but it is guaranteed that report is only called after all monitor
methods have successfully been processed. Should "report" rely on
the results of multiple monitor methods, we can be sure that they are
there.
Most of our tests will report two things:
1) A status color together with a short text
2) A longer descriptive text
Let's start with the simple one: the descriptive text. There is a
statement called "comment" which will just send an arbitrary text
to our server(s). Our report method could look like this:
report {
comment "html" "
"
"Number of distinct users logged in:"
"" ${distinctusers};
}
The first argument of comment tells us whether the comment is formatted
as HTML or just text. We choose HTML. Any further arguments are
concatenated and sent as-is to the server.
Of course, we also want to get a status color - tests not sending any
status are invalid, anyway. The statement being responsible for sending
status is called - you guessed it: status. Our report method sending
out status "green" in any case might look like this:
report {
status green 1 "&green" " " ${distinctusers} " users";
comment "html" "
"
"Number of distinct users logged in:"
"" ${distinctusers};
}
The first argument is the status color, the second argument must be a
boolean value (as in Perl or C, anything that is not "0" or empty is
considered "true"). This boolean value tells uxmon whether the status
statement is to be considered or not. Note that our test definition
language does not know an "if" statement, so we have to have the
condition in the "status" statement itself.
The arguments starting with the 3rd one are the usual short descriptive
text associated with every test. By convention we include the status
color there, the leading ampersand "&" will be interpreted by the Big
Sister server and be replaced by an icon.
Please save our new test definition in etc/testdef/ourtest.cfg, and
create a simple uxmon-net like
DESCR features=unix localhost
localhost ourusers
localhost bsdisplay
When running
uxmon -D 5
you will see that our just-declared test is run, and that status and
text are sent to the server.
Of course, always reporting green is not much fun. We have to decide
when to send which status color. Let's say we want our test to send
red if the number of users logged in gets above 5. We just need
one more line for this:
report {
status red ( ${distinctusers} > 5 ) "&red" " " ${distinctusers} " users
(too many)";
status green 1 "&green" " " ${distinctusers} " users";
...
As we have already learned the second argument to the status statement
is a conditional. In this case we say "red" should apply if the value
of distinctusers is higher than 5.
Note that the syntax of the test definition language is very limited.
For instance, do not think you can use something like
$distinctusers>5
in the example above, since variables must be referenced as ${...}, and
the parser will think
something>5
is one single word, not treating this as an operation.
Probably you will wonder why uxmon will only report "red" if distinctusers
goes above 5, since our "green" condition still applies. This is a little
hard-coded shortcut: "status" lines emitting a status which is better than
the one already set are silently ignored. Thus, if the first line tells
uxmon that the test status is red, any line trying to say the status is
rather yellow, purple, green or whatever else that is better than red is
just ignored. You can see this if you place the "status green" line before
the "status red" one. The resulting effect is that in case distinctusers
grows above 5 the test will report both green *and* red.
Note that it is a good idea to sort status statements by severity, most
severe first. If you intentionally want a test to report multiple statuses
you will have to either use multiple report methods (oh, yes, that's
possible - we will see later on in this series) or use a loop (one other
idea we will meet again later).
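The shortcut above can be modeled in a few lines of Python (a sketch of the described behaviour, not uxmon's actual code; the severity ranking is invented for illustration):

```python
# Model of the shortcut: a status line only takes effect if its condition
# holds and its color is not better than the status already set.
SEVERITY = {"red": 0, "purple": 1, "yellow": 2, "green": 3}  # lower = worse

def emitted_statuses(status_lines):
    """status_lines: (color, condition) pairs in source order.
    Returns the statuses that actually get reported."""
    current = None
    reported = []
    for color, condition in status_lines:
        if not condition:
            continue
        if current is None or SEVERITY[color] <= SEVERITY[current]:
            reported.append(color)
            current = color
    return reported

distinctusers = 7
# red first: the later, better green line is silently ignored
assert emitted_statuses([("red", distinctusers > 5), ("green", True)]) == ["red"]
# green first: green fires, then the worse red fires as well
assert emitted_statuses([("green", True), ("red", distinctusers > 5)]) == ["green", "red"]
```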
That"s enough for today.
PART 4 == http://sourceforge.net/mailarchive/message.php?msg_id=10360571
Accepting arguments
-------------------
We have now implemented our very first test. Now, of
course, most of the time Big Sister users will expect to get
some influence on what your test treats as a problem. We don't
want to just have something like
localhost ourusers
in uxmon-net, but would rather be more specific, like
localhost maxusers=10 warnusers=8 ourusers
In order to accept the arguments maxusers and warnusers we have
to add some code to our test. First of all we have to retrieve
the values of the arguments and store them into variables we then
can use when determining status codes. We already know the "monitor"
and "report" methods. You will guess that neither of these is the
right place for handling arguments, since they are called regularly.
Exactly for this purpose uxmon knows a method called "init", which
is called exactly once for every occurrence of our test in uxmon-net.
The init method must look something like
init {
instance import maxusers;
instance import warnusers;
}
This will make our test import the arguments "maxusers" and "warnusers"
into instance variables called the same. You can import the values to
a variable with a different name than the argument with the two-argument
form of import, e.g. like
instance import maxusers umax;
Read the above statement as "import maxusers argument as variable umax".
Most of the time it is useful to have some default value in case the
argument is not set in uxmon-net. Import will only (re-)set the variable
if the argument is present, thus
instance set maxusers 10;
instance import maxusers;
will set maxusers to 10, then look at uxmon-net and override it with
whatever value is set there, if any.
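In Python terms (purely an analogy, not uxmon internals), this set-then-import pattern behaves like a lookup with a default:

```python
# Analogy for: instance set maxusers 10; instance import maxusers;
# import only (re-)sets the variable if the argument appears in uxmon-net.
def import_with_default(uxmon_net_args, name, default):
    return uxmon_net_args[name] if name in uxmon_net_args else default

# "localhost maxusers=12 ourusers" -> argument present, default overridden
given = import_with_default({"maxusers": "12"}, "maxusers", 10)
# "localhost ourusers" -> argument absent, default kept
kept = import_with_default({}, "maxusers", 10)
```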
Please note that it is not only good style to use "import" in the init
method - using it outside will lead to undefined behaviour (well, it
will work, but don't do it, anyway).
We can now use the imported arguments in our report section:
report {
status red ( ${distinctusers} > ${maxusers} )
"&red" " " ${distinctusers} " users (way too many)";
status yellow ( ${distinctusers} > ${warnusers} )
"&yellow" " " ${distinctusers} " users (too many)";
status green 1 "&green" " " ${distinctusers} " users";
comment "html" "
"
"Number of distinct users logged in:"
"" ${distinctusers};
}
There is no magic here. Note that we just added "yellow" and obeyed the
rule that status statements should be ordered by severity.
Nevertheless, we have just skipped two important things. First of all,
anyone using our test will not know what arguments it takes without
taking a close look at our test definition. We must add a description
of the two arguments to the "arguments" variable:
"maxusers:int:Test goes red if number of users goes above this (default: 10)"
"warnusers:int:Test goes yellow if number of users goes above this (default: 8)"
The 2nd thing is that there are a number of arguments every test should
accept. So just add
instance import item report_item;
instance import perf perf_frequency;
to the init section. Our test definition now looks like this:
test demo_ourusers {
static set name "ourusers";
static set features local unix;
static set description "Demo variation of the users test";
static set arguments
"perf:int:report perf data every perf minutes"
"item:string:Item as which to report."
"maxusers:int:Test goes red if number of users goes above this (default: 10)"
"warnusers:int:Test goes yellow if number of users goes above this (default: 8)";
pernode set report_item "cpu";
init {
instance set maxusers 10;
instance set warnusers 8;
instance import maxusers;
instance import warnusers;
instance import item report_item;
instance import perf perf_frequency;
}
pernode monitor {
get who.hrSystemDistinctUsers[0];
pernode set distinctusers ${who.hrSystemDistinctUsers[0]};
}
report {
status red ( ${distinctusers} > ${maxusers} )
"&red" " " ${distinctusers} " users (way too many)";
status yellow ( ${distinctusers} > ${warnusers} )
"&yellow" " " ${distinctusers} " users (too many)";
status green 1 "&green" " " ${distinctusers} " users";
comment "html" "
"
"Number of distinct users logged in:"
"" ${distinctusers};
}
}
Debugging
---------
At some point in your trials you will notice that the test definition
language is very - hmh - flexible. This means that uxmon will interpret
nearly whatever you write somehow, where "somehow" does not
necessarily imply that your interpretation is the same as uxmon's. In
order to make uxmon a little more noisy about how it is
evaluating your test, it is a good idea to use the debug statement.
For instance, change your "monitor" method into something like
pernode monitor {
get who.hrSystemDistinctUsers[0];
pernode set distinctusers ${who.hrSystemDistinctUsers[0]};
debug 2 "number of users: " ${distinctusers};
}
This will just make uxmon output its arguments if the debug level you
are using (the -D command line argument ...) is >=2. The debug statement
is just another instance of the good old print statements pretty much
all of us were using at the beginning of our careers as programmers.
PART 5 == http://sourceforge.net/mailarchive/message.php?msg_id=10370229
Performance graphing
--------------------
Status reporting is great, however most of us do like some
more or less nice graphs. Getting Big Sister to emit simple
graphs for our "ourusers" test will not take much effort.
You have certainly already expected what I am going to write
you: You have to add another method in order to tell uxmon
what values it should send to the server. On the server side
you have to give bsmon a hint on what to do with the incoming
values by creating a graphtemplate. Let"s start with uxmon.
Our test definition just gets extended by
perf {
export always single.OurDistinctUsers, ${distinctusers};
}
The perf method is invoked by uxmon once every perf cycle. By
having
instance import perf perf_frequency;
in our init method we give the admins a chance to set the duration
of each perf cycle via the "perf" argument, so that
localhost perf=5 maxusers=10 warnusers=8 ourusers
in uxmon-net will tell the ourusers test to send performance data
every 5 minutes.
The export statement's first argument specifies how often the
variable should be sent to the server: "always", "often" or "sometimes".
If you choose always, the export statement applies in each perf cycle,
while exports declared often and sometimes will be executed less often
(currently "often" means every second cycle, "sometimes" every 4th
cycle, but this might change).
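A sketch of that cadence in Python (the modulo choices mirror the current behaviour described above and may change):

```python
# Hypothetical model: decide whether an export fires in a given perf cycle.
def exports_this_cycle(frequency, cycle):
    if frequency == "always":
        return True
    if frequency == "often":       # currently: every 2nd cycle
        return cycle % 2 == 0
    if frequency == "sometimes":   # currently: every 4th cycle
        return cycle % 4 == 0
    raise ValueError("unknown export frequency: " + frequency)

fired = [f for f in ("always", "often", "sometimes") if exports_this_cycle(f, 2)]
# in cycle 2 the "always" and "often" exports fire, "sometimes" does not
```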
For our simple test we just send one single value, the number of
distinct users and we send it in each cycle. You will want to do this
for every performance value you ever collect. On the other hand more
sophisticated tests will often not only submit the monitored values,
but also some meta information (like e.g. legend information or the like).
This meta information usually stays the same for a long time period, so
there is no need to waste bandwidth sending it each time. In this case
you will make use of "often" or "sometimes".
The second argument of export is the name the variable should get when
it is sent to the server. The last argument is a value or a list of values
to be sent. Note that the first value in the list will be sent as
single.OurDistinctUsers.0
and so on.
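A quick Python illustration of this naming (the helper is invented; it just mirrors the indexing described above):

```python
# Each value in the exported list is sent under the export name plus an index.
def exported_variables(name, values):
    return {f"{name}.{i}": value for i, value in enumerate(values)}

sent = exported_variables("single.OurDistinctUsers", [3])
# sent == {"single.OurDistinctUsers.0": 3}
```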
The comma between the variable name and the list of values is mandatory,
better do not ask why, this is a requirement imposed by the internal
design.
If you run uxmon now - I suggest you add "perf=1" to uxmon-net in order
to get performance data in short intervals - you will see performance
data logged in the status.log file on the server side. That is all that
happens; we will not see a graph coming up. This is because the server
just gets some values but does not yet know the meaning of these values.
We have to create a graph template for bsmon. Like tests.cfg, the graph
templates live in their own file called etc/graphtemplates. And, like
tests.cfg, graphtemplates includes every file in etc/graphdef. So we
just add a file called etc/graphdef/ourusers.cfg.
By browsing through graphtemplates you will probably find out yourself
what templates look like, our ourusers graph might be defined like this:
graph ourdistinctusers
input single.OurDistinctUsers.0
value-type gauge
input-interval $shortInterval
input-deadtime $shortDeadTime
graph-id $HOST.ourdistinctusers
title Number of Users
legend # distinct users
unit-label users
size 400x120
Each graph must have a unique name; in our case it is specified with
the "graph ourdistinctusers" statement. The performance variable we want
to graph is single.OurDistinctUsers.0, so this is our "input". The value
is to be graphed as is; it's of type "gauge" (other accepted types are
counter, derive and absolute - you will probably never use them, since
you have to preprocess the performance data on the uxmon side for
determining status information anyway).
One thing to know is that our graph engine uses RRDTool as a backend. RRD
does not store all the data it gets in a growing table. It will just allocate
a fixed size ring (or rather multiple rings) at the time the graph is created.
The data is then pushed into this ring. Once the ring is full, each time
a new value is pushed in the oldest value is thrown out. This way each graph
has its own fixed size and will not grow indefinitely.
Since we are not only interested in the last few hours RRD does allow multiple
rings - a first ring might take the last 120 values in 5 minute intervals,
a 2nd one might take the last 120 values, but this time in 30 minute intervals.
So the first ring will show us data for the last 10 hours in a granularity
of 5 minutes. The 2nd ring will allow us to watch a graph going back 60 hours,
but we only get a granularity of 30 minutes. Is everything unclear up to
this point?
To put it in a nutshell: When bsmon creates a graph it will create it with
4 rings, one containing 2 days of data in the specified interval, one
containing one week of data, one containing one month of data, and finally
one containing a year.
In order to know what size to create the graph with, bsmon has to
know the time between two incoming values. This is the "input-interval".
If we use "perf=1" in uxmon-net, the "input-interval" should be 60 (the
input interval is in units of seconds). If it is larger than 60, then RRD
will not store each value individually; it will build average values over
all performance values coming in during one input-interval.
The default value for $shortInterval is 10 minutes, so in our case we
will get a 10-minute average. Didn't you ever wonder why Big Sister was
graphing something like 3.4 distinct users at a given time? Now you know
why ...
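As a tiny worked example (Python, with made-up sample numbers): ten 1-minute samples collapsing into one 10-minute average produce exactly such fractional values:

```python
# RRD-style consolidation: average all samples arriving during one
# input-interval into a single stored data point.
def bucket_average(samples):
    return sum(samples) / len(samples)

minute_samples = [3, 3, 4, 4, 4, 3, 3, 3, 4, 3]  # users seen each minute
stored = bucket_average(minute_samples)
# stored == 3.4 -> the "3.4 distinct users" seen in graphs
```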
The input-deadtime value is related to that. If RRD does not get any values
for more than input-deadtime seconds, then the graph gets a gap.
I do not have to explain settings like title, legend, unit-label or size,
do I?
One important concept, however, is the graph-id setting. Unlike
the graph name, this need not necessarily be unique. If multiple graphs
with the same graph-id exist, then bsmon, or rather the web frontend,
will just draw them in the same chart.
Note during testing that bsmon uses some caches in order to speed up
performance value handling. Most importantly, if graph creation fails for
any reason (usually because RRDTool is unavailable, because there is no
graph template, or because there is an error in the graph template), then
the performance variables involved with the failing graph get
blacklisted for a while and will just be ignored. Therefore, during testing
you can save yourself some headaches by restarting bsmon after performance
monitoring changes.
PART 6 == http://sourceforge.net/mailarchive/message.php?msg_id=10379122
Modules
-------
This article will lead us on a little excursion. We have seen
that typical tests will usually require at least two config files,
with more to come. While writing your own site-specific tests
you will not necessarily take the time to choose file, test
and graph names carefully - until, of course, you either lose track,
would like to move your precious configuration over to a new server,
or would like to publish it.
In order to make it easier to keep track of which files and configuration
belong together, Big Sister supports a simple module concept. Using
the "bsmodule" command you can list installed modules, install
additional modules and remove those you do not need any more. It is a
good idea to organize your own tests in modules.
By issuing the "bsmodule" command without any additional arguments you
will get a list of installed modules. Note that the core Big Sister
distribution is partly implemented as modules - this explains why
you do not get an empty list even if you have never cared about
installing modules:
$ /usr/sbin/bsmodule
CPUperf (01.00) CPU performance monitoring
etherport (01.00) Check port operating status
[...]
We want to see our own "ourusers" test in this list. This time we are
lucky - crafting a module is rather easy once you have written the
code/config files you want to include.
First of all we have to decide which name our module will have. This
must be some unique name - we take "ourusers". The next thing to do is
to create an empty directory named like the module itself, so we do a
mkdir ourusers
Next we will just put a number of well-defined files into this
directory.
In order to know a few things about the module Big Sister will look for
a file called module.info. This file will look something like
module = ourusers
descr = Monitors number of users logged in
version = 01.00
author = Thomas Aeby
This is it. There is one other setting which is regularly used: You can
tell Big Sister that this module requires other modules to be installed
first by adding something like
depends = module1 module2
The next file we create is one of those I personally obviously hate
(:-)): a README file. Put any short or longish text describing your
module into this file.
So far so good. Now we get to the more important things. Copy your
testdef/ourusers.cfg file into the directory, but call it "tests.cfg". Big
Sister will know that this file contains your test definitions. In order
to avoid naming conflicts, tests defined in a module's tests.cfg start
with the module's name followed by an underscore and an arbitrary name.
In our case we are using "ourusers", which is already ok. If we were
strict we would rename our test to something like
test ourusers_users {
...
The next thing to do is to copy in our graph templates. Just copy the
etc/graphdef/ourusers.cfg file and call it "graphtemplates". Again, our
graph names should follow the above rule, so our template should be
renamed to something like
graph ourusers_distinctusers
In one of the next articles we will learn where to store requester
implementations within a module, but as we do not have one yet, we
leave this out for now.
So, our new module is already ready for installation - stop, Big Sister will
expect the module to be a tar archive rather than a directory, so the next
step is
tar cf ourusers.mod ourusers
Now we can actually install the module running bsmodule like
bsmodule install ourusers.mod
Check back using "bsmodule list" to see if the module appears in the
list. You will have to restart Big Sister, since bsmodule just copies
the files in. When copying the files they will automatically be renamed
and copied to the right target directory; the mapping is
module.info => etc/moduleinfo/.info
README => etc/moduleinfo/.readme
tests.cfg => etc/testdef/.cfg
graphtemplates => etc/graphdef/.cfg
Requester/* => uxmon/Requester/*
An interesting feature is that bsmon will automatically look for modules
on the Big Sister download server on
http://software.graeff.com/bigsis-modules/
If you do not want bsmodule to do this, you can use the -n option.
As you see the bigsis-modules page does not contain too many modules, please
feel free to send me modules that you think might be of common interest
so that I can publish them on this server.
PART 7 == http://sourceforge.net/mailarchive/message.php?msg_id=10386628
This will be the last article this week because even I (:-))
am going to celebrate Christmas. Expect the next article on
Monday.
Advanced Test Definition Language 1
-----------------------------------
We know enough of the tests.cfg language now to write simple tests
ourselves. When looking at tests.cfg you see that there is
much more than we have used until now. In the hope that the basic
idea is clear, I will just heap a few advanced syntax/semantics
on you, expecting you to digest them without a major brain-breakdown.
I will get back to illustrating these with an example near the
end of the article series.
First of all, uxmon knows many more methods than the ones we have
used up to now. After this paragraph you will know all of them,
too, though you will possibly not understand all of their significance
immediately.
The "precheck" method is used whenever uxmon is searching for a
test matching a uxmon-net configuration. If a precheck method
exists, it is expected to set a variable called "test_available"
with a boolean value. If the variable is false (0 or empty) after
precheck, then uxmon will assume that this test definition will
not work on this system or with these parameters and will just
ignore this definition. In precheck you will usually test if a
certain requester is available and willing to perform. In most
cases it will be unwilling to perform if it requires special
system commands or APIs to exist and they are not there. For
instance, the "who" requester we use in our example test definition
will not work if the who command does not exist on the monitored
system. Our precheck rule could then look like
precheck {
set test_available domain_available( who );
}
The domain_available function will just look through the table
of requesters and see if a requester for "who" is registered and
if it is willing to accept requests. In our special case the above
precheck method will be useless since the actual implementation
of who was written before precheck was introduced and does not
provide information about availability - which means that uxmon
will treat it as if it were always available.
The "discover" method is invoked after a test is "init"-ed and
before the first test cycle (monitor and perf sequences) is run.
After the first run it is regularly re-invoked. You have to set
the time between two discover runs by setting static variables
for instance like this:
set discover_refresh 60*2;
set discover_retry 60;
which means: run discover every 2 hours (120 minutes). If one
discover run fails then run it again after 1 hour (discover_retry).
The discover method is expected to go discover the underlying system
before actual monitoring starts. A typical discover implementation
may e.g. look for available disks, network interfaces or the like
and store data that does not change often (like e.g. interface names/
speeds, disk partition names/devices, etc.) internally, so that the
monitor method can be limited to just get the real, fast changing
performance data out of the system. This reduces the size of the
monitor method and - more importantly - will save lots of CPU time or
network bandwidth for some tests. A discover method might look like
pernode discover {
get @snmp.ifType @snmp.ifSpeed @snmp.ifDescr;
pernode set interfacetype ${snmp.ifType};
pernode set interfacespeed ${snmp.ifSpeed};
pernode set interfacename ${snmp.ifDescr};
}
which will get network interface information (interface type, speed
and name) out of an SNMP accessible device. This information does not
change very often and is valid for all tests aiming at the same host,
so we can do this in a "pernode discover". Within the monitor method
we then can confine ourselves to just get the quickly changing information
like network load:
pernode monitor {
get @snmp.ifInOctets @snmp.ifOutOctets;
...
After each successful discover run uxmon will invoke the "postdiscover"
method. When postdiscover is run you can expect every other scheduled
discover method to be completed. That is important if you use pernode
or static variables in discover that might be shared between multiple
tests. postdiscover can rely on them to be set and can safely do
any necessary post-processing. Often one uses "pernode discover" methods
to "get" information and then refines it in instance "postdiscover"
methods, selecting from the heap only the information that is relevant
for this single instance given the arguments supplied in uxmon-net.
You also know the "monitor" method. Note that there is also a "postmonitor"
method. Similar to postdiscover, postmonitor will be run after all the
monitor methods scheduled during a cycle have terminated and it is expected
to do some post-processing on shared variables.
You already know the "perf" and "report" methods - nothing to add, do not
expect postperf and postreport to exist. There has never been a need for
such a thing.
One last method is called "unconfigure". You know that uxmon will
automatically re-read uxmon-net after this file is changed. During this
re-configuring our tests will be affected, too. All the test instances
will be dropped and uxmon will set up new instances for every configured
test. Static variables and pernode variables may survive this re-
configuration since they are bound to each host or each test definition
rather than to a test instance. Therefore uxmon will call "unconfigure"
when dropping the old configuration. The job of "unconfigure" is to
bring pernode and static variables back to a defined value, so that
the init/discover/monitor methods of newly set up tests do not fail
because of leftover information.
Note that only the discover and the monitor methods may request data
from requesters via "get" - though disobeying this rule will in most
cases not produce an error message. Note, too, that
you must not use the "status" or "comment" statements outside of
report methods, and that the "export" statement must be within a
perf method. The "import" statement should only appear in the init
method though it will work in every other method, too.
PART 8 == http://sourceforge.net/mailarchive/message.php?msg_id=10408929
Back from Christmas holiday I'll write two more articles
on tests.cfg before we attempt to write our first Requester
module ourselves. I'll have some 14-hour workdays this
week so please forgive me if I skip a day without
posting an article ...
Inheritance
-----------
The tests.cfg language borrows a few concepts from object
oriented programming. One of them is inheritance. One can
define a new test based on an existing test and just rewrite
the parts that are unique to the new test.
Do you remember our "ourusers" test? It actually uses the
who command to get the number of users logged in on the
system. Of course, this will not work via network, so we
cannot monitor remote systems using the ourusers test. One
possible solution: We could extend the who requester so
that it uses the rwho command on request. We are not going
to implement this, however. Instead, I assume the remote
system does support SNMP. In order to get the number of
users out of a box supporting the Host MIB we need to retrieve
the SNMP variable
hrSystemDistinctUsers
which is by chance (well, intentionally but not mandatory)
the same variable our who requester uses.
Do not mind if you are not familiar with SNMP. We are just
going to use another requester, the "snmp" requester, and
read some information this requester provides for us in order
to demonstrate inheritance.
At this point we could just copy&modify our ourusers test, use
the right features list like
set features remote;
and replace the monitor section by this:
pernode monitor {
get snmp.hrSystemDistinctUsers[0];
pernode set distinctusers ${snmp.hrSystemDistinctUsers[0]};
}
Do not forget that the test definition's name must be unique, so we e.g.
use
test demo_ourusers_snmp {
...
There we are, now our test gets its information via SNMP.
However, copying the whole thing lacks some elegance and if
we ever should have to modify the ourusers test, for instance
because we ran into a bug, we will have to manually modify
every copied instance, too.
The way to go is to *extend* the ourusers test:
test demo_ourusers_snmp extends demo_ourusers {
...
When uxmon is reading test definitions the demo_ourusers_snmp
test will on the fly be copied by uxmon itself. Of course, our
SNMP test is not identical to the old one. Therefore, our
definition is not yet complete. First of all we have to adjust
the features list:
test demo_ourusers_snmp extends demo_ourusers {
static set features remote;
}
When uxmon reads uxmon-net it will always put "remote" in the
features list of any test targeting a non-local machine, so
by using "remote" we just claim that our test will work against
any remote machine.
The name and arguments list stays the same, so we do not need
to set them again - Big Sister has already slurped them in.
We also have to replace the monitor section. We can do this by
extending our test definition to
test demo_ourusers_snmp extends demo_ourusers {
static set features remote;
overrides pernode monitor {
get snmp.hrSystemDistinctUsers[0];
pernode set distinctusers ${snmp.hrSystemDistinctUsers[0]};
}
}
Note that unlike the static variables we explicitly have to use
"overrides" together with our new "monitor" section. This is because
it is completely legal to have multiple monitor sections - they are
evaluated one after the other. So if we just use
pernode monitor {
...
in our test definition Big Sister will think that we want to *add*
another monitoring section. But as we want the original implementation
to be replaced by the new one, we tell Big Sister by declaring it
as an "override"-ing one.
So, that's actually it. We now have two tests, both called
"ourusers" when used in uxmon-net, one implementing local monitoring,
one remote monitoring.
If you now run
testers -f remote -t ourusers
you will actually get the description of our newly created test.
But compare this to
testers -f local -t ourusers
How do you actually know if Big Sister is using the right test
definition? It is good style to use different descriptions for
different tests, so that you and other Big Sister users will be
able to see a difference. So, let's add a
static set description "Demo variation of the users test via SNMP";
If you know SNMP you will notice that we need some way to set
the SNMP get community. We just implement another
argument users may use in uxmon-net, so that they can use the SNMP
version of the ourusers test like
somehost maxusers=10 community=public ourusers
Therefore we have to accept an argument called community. We do this
by just adding an init section:
init {
instance set community "public";
instance import community;
}
This time init does not override the existing init section, it is
an addition to it, so we do not use the "overrides" keyword.
Luckily, the SNMP requester will directly access the community variable
defined within our test, so there is no additional work involved in
passing this argument on to the requester.
But, of course, we have one other thing to do: our test's self-documentation
is now out of date. We have to mention the "community" argument in the
arguments list. We do so by putting
static set arguments ${arguments}
"community:string:SNMP get community (default: public)";
Note that you can actually refer to variables defined in the parent test
and re-define them.
This time, work is complete and we can store the new test definition in
our existing ourtest.cfg file:
test demo_ourusers_snmp extends demo_ourusers {
static set features remote;
static set description "Demo variation of the users test via SNMP";
static set arguments ${arguments}
"community:string:SNMP get community (default: public)";
overrides pernode monitor {
get snmp.hrSystemDistinctUsers[0];
pernode set distinctusers ${snmp.hrSystemDistinctUsers[0]};
}
}
No big deal, is it?
PART 9 == http://sourceforge.net/mailarchive/message.php?msg_id=10417030
Please get a cup of coffee (or your favorite "drug"), lean back,
take a deep breath, calm down - today I'm going to tax your
patience.
Loops
-----
Up to this article most of you will have completely missed
traditional control structures like "if ... else" or loops
in tests.cfg language. Today we are going to learn that this
is because those are more or less absent. There is an if
*function*, but no real if structure, there are no real
loops.
There is one exception to this: the language actually offers
a limited "for" loop.
Provided you set up a variable index containing a list of
integer values like
set index 1 3 4;
and a variable containing some arbitrary values at those indexes
like
set some_values "value0" "value1" "value2" "value3" "value4";
then you can loop through this index with
for result index
${some_values[${i}]};
This for statement will evaluate the expression ${some_values[${i}]}
for each value in the variable index (1,3,4). In each pass the
variable i will contain the current index. The list of results of
each run is then stored in the variable result. That's it. Note that
both the result and index variables must actually be variable names,
not expressions; do not try something like
for result ${some.variable} ...
as it just will not work.
At this point you probably want to know if the above statements
really work and what their results are. In order to test them
you can add this to the "ourtest.cfg" config file:
test syntax_test {
set index 1 3 4;
set some_values "value0" "value1" "value2" "value3" "value4";
for result index
${some_values[${i}]};
debug 0 ${result};
}
Now run
testers -D 1 -t test
In order to find matching tests the testers command must execute
the static sections of each test - and of course this means it
must also execute our "syntax_test" pseudo-test. Is the output
the one you expected?
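In case you want to double-check your expectation: the semantics of this "for" statement can be modeled with a Python list comprehension. This is only an illustration of the semantics; the helper name for_statement is mine, not part of Big Sister:

```python
# Model of the tests.cfg "for" statement: evaluate an expression
# (here a lookup into some_values) once per index in the index list
# and collect the results.
def for_statement(index, expr):
    # expr plays the role of the loop body; it receives the loop
    # variable "i" and returns one result value
    return [expr(i) for i in index]

index = [1, 3, 4]
some_values = ["value0", "value1", "value2", "value3", "value4"]

# Equivalent of:  for result index ${some_values[${i}]};
result = for_statement(index, lambda i: some_values[i])
print(result)  # ['value1', 'value3', 'value4']
```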
Why this limited loop statement? In most cases you will face a list
of information a requester provides, e.g. a list of disk partitions
your test could/should monitor. Your test then somehow has to decide
what partitions it (or rather the Big Sister admin) is interested in.
The list of indexes of the partitions one intends to monitor is then
stored in an index variable. You use the "for" loop to actually go
through the information list and compute test results.
"Stop," you are probably yelling at this point. Apart from the fact
that all this sounds very complicated: first, you need more than this
puny "for", and second, even for the case described above this "for"
is insufficient. The answer to the first point is: Please try to
keep things simple and try to get through with that - in a few cases
that "for" thing will not be enough. Even in this case there is a
solution we will learn later on in this article series: If
everything else fails we can just call Perl code.
The second point is entirely true. In most cases you will use the for
statement together with the "select" statement. Select allows you
to go through a list of values and remember the indexes of those
values you are interested in.
Imagine - or better try it out - code like this:
test syntax_test {
set partition_types ext2 msdos msdos xfs;
set partition_names /dev/hda1 /dev/hda2 /dev/hda3 /dev/hda4;
set partition_sizes 100 200 300 400;
set partition_used 60 180 50 40;
select selected_for_monitoring partition_types contains( "msdos",
${partition_types[${i}]} );
debug 0 "selected indexes: " ${selected_for_monitoring};
debug 0 "selected partitions: " ${partition_names[${selected_for_monitoring}]};
for used_ratio selected_for_monitoring
( ${partition_used[${i}]} / ${partition_sizes[${i}]} );
debug 0 "space used: " ${used_ratio[${selected_for_monitoring}]};
debug 0 "space used: " ${used_ratio};
}
Now, there is a bunch of new things, here. I hope you understand the
first 4 lines - we just set a few variables. Of course, normally you
would get them from a requester. The idea behind the partition_*
variables is that each of them contains information for each partition,
so that the first partition is of type partition_types[0], its name
is partition_names[0], its total size is partition_sizes[0] and the
space in use is partition_used[0].
We now select every partition of type "msdos". The syntax of "select"
is similar to that of the for statement:
select result variable expression;
Select goes through all the indexes in variable, stores the current
index in the variable "i", evaluates the expression for each index.
Unlike "for" the result will contain every index for which expression
evaluated true. So the result in the above test definition is
1 2
since the partition_types at index 1 and 2 are contained in the list
"msdos" (do not mind the contains() function, you will soon get a
list of known functions).
In the second debug statement I used a new way of accessing list elements
in order to show off a bit. The index used in [] may itself also be a
list of indexes and the result of the evaluation is then not only
one single value but a list of the values at the respective index
positions - so "/dev/hda2" and "/dev/hda3" since those are at position
1 and 2 we selected.
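To make this tangible outside of uxmon, here is a small Python model of "select" and of list-valued indexing (select_statement is my own name; Big Sister's real implementation is Perl):

```python
# Model of the tests.cfg "select" statement: evaluate a condition once
# per index and keep the indexes at which it was true.
def select_statement(values, cond):
    return [i for i in range(len(values)) if cond(i)]

partition_types = ["ext2", "msdos", "msdos", "xfs"]
partition_names = ["/dev/hda1", "/dev/hda2", "/dev/hda3", "/dev/hda4"]

# Equivalent of:
#   select selected_for_monitoring partition_types
#          contains( "msdos", ${partition_types[${i}]} );
selected = select_statement(partition_types,
                            lambda i: partition_types[i] in ["msdos"])
print(selected)  # [1, 2]

# Equivalent of ${partition_names[${selected_for_monitoring}]}: indexing
# with a list of indexes yields the values at all of those positions.
print([partition_names[i] for i in selected])  # ['/dev/hda2', '/dev/hda3']
```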
Then we run the selected partitions through the for statement calculating
the used_size/total_size ratio only for the selected partitions.
Last but not least we use two different ways of printing the result of
the for statement. In the first one we use the just introduced way of
evaluating multiple members of one list, in the second one we just
print the whole list. If you think through this carefully you will see
that the used_ratio list will contain values at index 1 and 2, therefore
at the same indexes as selected_for_monitoring contains. The index 0
and index 3 positions simply do not exist. Therefore both ways of
printing the resulting list give the same results.
Methods and loops
-----------------
Unfortunately the loop statement does only loop through expressions,
never through other statements. This is very unfamiliar, isn't it?
But, luckily there is still something left: you can use "for" at
method level. With a simple example I will answer two questions.
Let's assume we have got a discover method that gets out the list
of partitions, their size and names via a requester, then selects
the interesting partitions via "select". This is just like the above
example - we do not use a true requester.
test alleged_disk_test {
pernode discover {
set partition_types ext2 msdos xfs;
set partition_names /dev/hda1 /dev/hda2 /dev/hda3 /dev/hda4;
set partition_sizes 100 200 300 400;
select selected_for_monitoring partition_types contains( "msdos", ${partition_types[${i}]}
);
}
In the monitor section we get the partition_used values out of the
requester and compute the used_ratio values.
pernode monitor {
set partition_used 60 180 50 40;
for used_ratio selected_for_monitoring
( ${partition_used[${i}]} / ${partition_sizes[${i}]} );
}
That's nothing new, we just split the "select" example above into a
discover and a monitor section.
Now we would like to report a status value for each of the partitions
we selected. Unfortunately the "report" section will only allow us
to set one single status. Of course we could put multiple "report"
sections into our test, but we do not want to hard-code the number
of partitions we might be interested in. There is a solution to this,
you can use "for" around a method like this:
pernode report for selected_for_monitoring {
set thispartition ${partition_names[${p}]};
set thisratio ${used_ratio[${p}]};
...
Did you get the point? This is as if there were one "report" section
for each value of selected_for_monitoring. The only difference between
the individual sections is that the "p" variable will contain one
single index out of selected_for_monitoring, as does the "i" variable
in the normal "for" statement.
Big Sister uses "p" rather than "i" in this case because of course you
can use a for statement within a for method.
PART 10 == http://sourceforge.net/mailarchive/message.php?msg_id=10452994
Welcome back. I have had to take a longer break than planned
- but aren't we used to this? :-)
This time you'll merely get a boring but hopefully complete
list of syntactical and semantical "stuff" allowed in tests.cfg
Syntax, Statements, Operators, Functions
----------------------------------------
Statements
----------
set variable expr;
Evaluate "expr" and store the result in a variable called "variable".
import argument [variable];
Read an argument named "argument" from uxmon-net and store it in
a variable called "variable". If the variable name is omitted then
argument name and variable name are considered to be the same.
debug level expr;
Print the result of "expr" if the requested debug level (-D option
when running uxmon) is above "level".
get requester.variable [...];
Get one or multiple system parameters out of a requester. The
retrieved values are stored in variables called the same as the
get string (thus e.g. "get snmp.ifDescr" results in a variable
called "snmp.ifDescr" to be set). If a requested system parameter
is preceded by a "@" such as in "get @snmp.ifDescr" get will assume
that there are multiple system parameters below snmp.ifDescr and
that all of them are to be requested. The variable name of the
result is still the system parameter's name, with "@" stripped off.
select var1 var2 condexpr;
Evaluates condexpr once for every value in variable var2 and stores
in var1 a list of the indexes where condexpr evaluated to true. Within
condexpr a variable named "i" contains the current index.
comment type expr;
Evaluates expr and sends the result as a comment of type "type"
(where type must be one of "text" or "html") to the Big Sister
server.
export when parameter, expr;
Evaluates expr, and sends the resulting list under the name
"parameter" to the server. "when" must be one of always, often
or sometimes.
event level expr;
Evaluates expr and sends the result as a log message to the Big
Sister server at log level "level" (use the usual syslog log
levels such as "info", "notice", "warning", "err", etc.).
status color condexpr expr;
Evaluates condexpr, if it evaluates to true and there is no other
status statement before this statement with a more serious status
color with condexpr => true, then a status of color "color" and a
descriptive text of the result of "expr" is sent to the server.
Operators
---------
Currently known operators are (sorted by evaluation precedence):
* /
+ -
> < ==
|| &&
Note that operators do have some vector capabilities, e.g.
set a 1 2 3;
set b 10 20 30;
set c ${a} + ${b};
will result in c set to the list 11 22 33.
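Modeled in Python (an illustration only, not Big Sister code; vector_add is my own name), the element-wise behavior looks like this:

```python
# Element-wise ("vector") behavior of tests.cfg operators, modeled as a
# pairwise operation over two lists.
def vector_add(a, b):
    return [x + y for x, y in zip(a, b)]

# Equivalent of:
#   set a 1 2 3;
#   set b 10 20 30;
#   set c ${a} + ${b};
a = [1, 2, 3]
b = [10, 20, 30]
c = vector_add(a, b)
print(c)  # [11, 22, 33]
```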
Functions
---------
contains( matches, list )
String compares each value in matches with each value in list and
returns true if any of these pairs match. This is actually a special
slightly more efficient variation of contains_pattern().
contains_pattern( patterns, list )
Goes through the list of patterns and returns true if any of the values
in list match the pattern. If a pattern list entry starts with "~" then
the pattern is interpreted as a perl regular expression, otherwise
the match is a case sensitive string comparison.
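As a rough Python model of this matching rule (my own sketch; the actual Perl implementation may differ in details):

```python
import re

# Sketch of contains_pattern() semantics: a pattern starting with "~"
# is treated as a regular expression, anything else as a case
# sensitive exact string comparison; true if any pattern matches any
# value in the list.
def contains_pattern(patterns, values):
    for pat in patterns:
        for val in values:
            if pat.startswith("~"):
                if re.search(pat[1:], val):
                    return True
            elif pat == val:
                return True
    return False

print(contains_pattern(["msdos"], ["ext2", "msdos"]))   # True
print(contains_pattern(["~^/dev/hda"], ["/dev/hda1"]))  # True
print(contains_pattern(["MSDOS"], ["msdos"]))           # False (case sensitive)
```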
null( arg )
Returns true if the argument is not defined (aka. "null"), false
otherwise.
not( arg )
Returns true if the arg is false, false otherwise (logical NOT).
remove( from, what )
Returns a list of values out of from without those explicitly listed
in what.
percentage( value, base )
Returns a percentage computed from value and base as (value/base*100).
format_kbsize( kbytes )
The argument must be a number of kBytes. Format_kbsize tries to format
the size in a human readable format (rounded) and adds a unit text
(e.g. "GB" for Gigabytes, etc.)
format_percentage( percentage )
Interprets the argument as a percentage and returns it in a human
readable (rounded) form.
pad( text, length, justified )
Adds space characters to the text until its size is "length". If
justified is "l" then text is left-justified, if justified is "r" then
text is right-justified, "c" stands for "centered".
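A Python sketch of the described behavior (my own reading; the real implementation may behave differently for text already longer than length):

```python
# Sketch of pad(): space-pad text to the given length; "l" left-justifies,
# "r" right-justifies, "c" centers.
def pad(text, length, justified):
    if justified == "l":
        return text.ljust(length)
    if justified == "r":
        return text.rjust(length)
    return text.center(length)

print(repr(pad("ok", 6, "l")))  # 'ok    '
print(repr(pad("ok", 6, "r")))  # '    ok'
print(repr(pad("ok", 6, "c")))  # '  ok  '
```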
input_kbsize( value )
Interprets value as a size in either Bytes or bits and returns the number
of kBytes/kbits. Tries to be smart in interpreting units, so that e.g.
1MB (1 MByte) will return 1024 and 1Gb (1 Gigabit) returns 1000000.
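The two examples above translate into the following Python sketch of the unit handling (the regular expression and the treatment of a missing unit prefix are my assumptions, not taken from Big Sister's code):

```python
import re

# Sketch of input_kbsize(): byte units ("B") use powers of 1024, bit
# units ("b") use powers of 1000; the result is in kBytes or kbits.
def input_kbsize(value):
    m = re.match(r"^([0-9.]+)\s*([KMGkmg]?)([Bb])$", str(value))
    if not m:
        return float(value)  # assume the value is already in kBytes/kbits
    num, prefix, unit = float(m.group(1)), m.group(2).upper(), m.group(3)
    base = 1024 if unit == "B" else 1000  # Bytes vs bits
    factor = {"": 1.0 / base, "K": 1, "M": base, "G": base * base}[prefix]
    return num * factor

print(input_kbsize("1MB"))  # 1024.0 (kBytes)
print(input_kbsize("1Gb"))  # 1000000.0 (kbits)
```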
in_range( lower, upper, absolute, percents )
Accepts lower and upper being either numbers or percentages (recognized
by a trailing "%"). Assumes absolute and percents are the absolute and
percentage value of the same parameter (e.g. the disk free size in
kBytes and as a percentage) and tests if this parameter is between the
given lower and upper values.
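As I read this description, a Python sketch of in_range() could look like the following (the boundary handling is my interpretation; Big Sister's Perl implementation may differ):

```python
# Sketch of in_range(): each bound may be an absolute number or a
# percentage (trailing "%") and is compared against the matching
# representation (absolute or percents) of the same quantity.
def in_range(lower, upper, absolute, percents):
    def resolve(bound):
        s = str(bound)
        if s.endswith("%"):
            return float(s[:-1]), float(percents)
        return float(s), float(absolute)
    lo, lo_val = resolve(lower)
    hi, hi_val = resolve(upper)
    return lo <= lo_val and hi_val <= hi

# e.g. disk free space: 500 kBytes absolute, which is 25% of the disk
print(in_range("100", "90%", 500, 25))  # True
print(in_range("10%", "90%", 500, 5))   # False: 5% is below the 10% floor
```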
unmonitored( formerly, now, description )
Formerly is meant to be a list of entities monitored before and now
a list of entities monitored now with description being a list of
matching descriptions. unmonitored() then returns a text containing
the descriptions of all the entities that are not monitored any more.
This is meant to be used like in "event notice unmonitored( ... )"
in order to let the Big Sister administrator know that some configuration
change caused entities to be not monitored any more.
nowmonitored( formerly, now, description )
The same as unmonitored, but does report the entities that are newly
monitored.
when( value )
Returns the time (in seconds from January 1st, 1970) at which the value
was computed or retrieved. This is useful with values retrieved via the
get statement and stored for later processing, e.g. for computing bandwidths
and the like.
if( condexpr, val1, val2 )
Returns val1 if condexpr evaluates to true, val2 otherwise. Note that in
every case both val1 and val2 are evaluated!
reference_url( text, pattern, host, url, port, service )
Evaluates pattern as a perl expression with text being $_, replaces
in the resulting text the string "" by the url specified via the
arguments. If the "url" argument is an absolute URL then the host,
port and service arguments are ignored. See the tcp module's test.cfg
for an example.
call( procedure, arg1, ... )
Call a perl function defined in a Requester module. The procedure argument
must be something like "ourrequester::ourfunction" in order to call a
function Requester::ourrequester::ourfunction(). The argN arguments are
passed through to the called function. Note that these arguments are hashes
containing uxmon value lists and need some special treatment to extract
values from them. See Requester::procs::countprocs() and the matching
call() in etc/testdef/procs.cfg for an example.
size( list )
Returns the number of entries in list.
domain_available( domain )
Returns true if the requester called domain is willing to perform
according to its available() method.
Variable expansion
------------------
Variables are *always* referenced as ${varname}. The usual shortcut $varname
is *not* valid and will lead to unexpected results.
Individual list elements are referenced as ${varname[index]}. Together with
the above rule this leads to something rather unreadable like
${varname[${index}]} if using an index variable.
${varname[${index}]} will reference multiple list elements, one for each
element of the index variable.