PART 1 == http://sourceforge.net/mailarchive/message.php?msg_id=10340294

So, I hope everyone's here :-) I'll try to write a series of articles starting now. Please comment, ask if something's completely unclear, and add things you think should be mentioned. The first step is only a small and not very useful one, but this way we give overseas readers a chance to join the mailing list before we get into the interesting stuff (good excuse, isn't it?)

Intro
-----

(please have a look at http://bigsister.graeff.com/bsdevel/img01_intro.gif )

Each test MUST be defined somewhere in the etc/tests.cfg file. Since tests.cfg includes every file in the etc/testdef directory, it is good style to put single tests or a series of similar tests into one file within the etc/testdef directory. The test definition in tests.cfg tells uxmon what syntax the test requires in uxmon-net, what arguments it takes, which requesters uxmon must query, and how the query results are to be interpreted. Test definitions are written in a special language.

Besides the test definition, many tests or test groups have an associated requester module. Its responsibility is to extract system information from the underlying system (or, of course, via the network) and pass this information on to uxmon in a standardized but flexible way. Each piece of information is passed as one system information "variable" with a name like

    modulename.variablepath.variablename.index

For those who know some SNMP this will look familiar: e.g. an SNMP ifDescr variable out of the Internet MIB will be represented by Requester::snmp as snmp.ifDescr.0. Requester modules are implemented as Perl objects.

After uxmon retrieves information from requesters as directed by the test definition in tests.cfg, it sends this information to the Big Sister server. While status information (the known status colors and status texts) is handled directly by the server, bsmon needs some more information on the semantics of performance data in order to create graphs. This information is taken from the etc/graphtemplates file. Since graphtemplates includes the whole graphdef directory, it is again good style to put graph definitions for a single test or a series of tests into one file in the graphdef directory. Graph definitions are written in their own specific language.

... so far for today ...

PART 2 == http://sourceforge.net/mailarchive/message.php?msg_id=10346242

Are you listening? Here comes the 2nd part of the introduction to writing custom tests ...

Implementing our first test
---------------------------

Time for implementing our first test. As I am a little bit lazy I'll go for re-implementing the "users" test in this series. But in order to be able to test things without interfering with existing tests we call our test "ourusers". We learned in the introduction that the one thing that is mandatory is a test description in etc/tests.cfg, or rather etc/testdef. So we start by having a look at this.

Test descriptions use their own language, which will look slightly familiar to everyone knowing some programming languages. First of all, every description has got a name. This name must be unique and need not necessarily be related to the name Big Sister admins will use in uxmon-net. As we want the name to be unique and we want to implement the "ourusers" test, we go for the name "demo_ourusers".
The empty shell of the test definition looks like this:

    test demo_ourusers {
    }

That's it, this is a valid test definition - but it does not yet describe a test one could use in uxmon-net.

Each test definition is composed of a "global" section at its start and a number of sub-sections - readers used to OO programming can replace "global section" by "static code" and "sub-sections" by "methods". In the global section we usually declare and initialize a number of variables we will later use in the sub-sections. A few of these global (or static) variables will be used by uxmon in order to match references in uxmon-net with our test definition.

First of all, if we want to make our test available to uxmon-net, we must define a name. We want it to be "ourusers" and uxmon will look for a variable called "name", so we go for

    test demo_ourusers {
        static set name "ourusers";
    }

We set variables via a statement called set with the syntax

    set name value;

and statements end with a semicolon. So far so good.

Each statement may be introduced by one of "static", "pernode" or "instance". This defines the scope of the variable. Let's imagine we have got something like

    localhost arg1=5 ourusers
    localhost arg1=6 ourusers
    somehost arg1=5 ourusers

This will make uxmon create 3 instances of our demo_ourusers test. Two of them will have arg1=5, one arg1=6; one is targeted at somehost, the other two at localhost. Each of these three tests will share all the variables declared static. Variables declared pernode will be shared by all the instances targeted at the same host, so pernode variables will be shared between the two ourusers tests above, while the 3rd will get its own. Variables declared instance will only be visible within each single instance, so if demo_ourusers has got an instance variable, each of the 3 tests above will get its own copy, with not necessarily the same value. Only static variables will be considered for matching uxmon-net entries with test definitions, since at the point where uxmon searches test definitions they are - of course - not yet instantiated nor related to a target host.

Are you still with us? I know, too much theory. So, we've got a first test which uxmon will find - stop, uxmon will not yet be happy with our test. You remember that with new style tests you have to set a DESCRIPTION in uxmon-net, e.g. something like

    DESCR features=unix,linux localhost
    ...
    localhost ourusers

In our test we have to declare which features it will match. Let's say our users test will work on unix systems, so we say something like

    test demo_ourusers {
        static set name "ourusers";
        static set features local unix;
    }

We learn that set may take multiple values. This is because variables are thought of as lists (call them arrays, if you prefer). You will also notice that I did not use quotes around local and unix. This is a little bit of liberty I chose; one could as well write

    static set features "local" "unix";

or even

    static set "features" "local" "unix";

No problem, is it? You certainly also noticed that I used the feature "local". While I do not need to DESCR hosts in uxmon-net as local or remote, I have to when it comes to test definitions. For the DESCR in uxmon-net, uxmon will silently add the feature "local" or "remote" if it thinks a host looks like the local or a remote host. So it's a good idea to add "local" to the features list if a test will only work against the local machine.

Ok, time for putting our test in a place uxmon can find it.
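To recap before we move on, the complete definition at this point consists of just the two static settings from above:

    test demo_ourusers {
        static set name "ourusers";
        static set features local unix;
    }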
Please create a file etc/testdef/ourtest.cfg and put the above text into it. Try running bin/testers and see if "ourusers" actually appears in the list of known tests. Great! Try to get a syntax description out of bin/testers:

    bin/testers -f local -f unix -t ourusers

Oops, where is all the text we usually get as an answer? You certainly guessed it, the "testers" command does not magically write it on its own, we have to supply the text by adding

    static set description "Demo variation of the users test";
    static set arguments "perf:int:report perf data every perf minutes"
        "item:string:Item as which to report.";

We do not know at this point what arguments our test will take, but "perf" and "item" are already a good start since these are what uxmon will handle automatically even if we do not add any other handling code. Now you are certainly ready to see our test appear on the Big Sister server - but not tonight, I leave this for tomorrow's chapter.

PART 3 == http://sourceforge.net/mailarchive/message.php?msg_id=10351729

Get reports
-----------

In part 2 we got uxmon to know about our test, but we did not yet get any reports. It is time we went ahead. First of all we have to tell uxmon to which column our test results should go, so we add

    pernode set report_item "cpu";

to our existing test definition.

Before we can make uxmon send status reports to the Big Sister server we have to remember that there is more than the global section (aka static code) in a test definition; we may define other sections (aka methods), each of them having their own meaning. The method which is responsible for getting system information out of the monitored system is called "monitor". This method is called every "frequency" minutes, thus every monitoring cycle. In our case we have to find out how many users are logged in. Let's assume for now that the number of users is available via the requester called "who" and that the variable the who requester sets is hrSystemDistinctUsers. So we add a monitoring section retrieving the number of users:

    test demo_ourusers {
        static set name "ourusers";
        static set features local unix;
        static set description "Demo variation of the users test";
        static set arguments "perf:int:report perf data every perf minutes"
            "item:string:Item as which to report.";
        pernode set report_item "cpu";

        pernode monitor {
            get who.hrSystemDistinctUsers[0];
            pernode set distinctusers ${who.hrSystemDistinctUsers[0]};
        }
    }

We just request the number of distinct users from the who requester and store it in the variable distinctusers. Remember that "pernode" means that our distinctusers variable as well as the monitor method is nailed to one target host; even if we put something like

    localhost arg1=5 ourusers
    localhost arg1=6 ourusers

in uxmon-net, the monitor method is only called once, since both tests apply to the same host. This is a little bit tricky, but may save quite some CPU time for tests more complex than this one.

So, now we know the number of users logged in to the target system internally. The next thing to do is to add a "report" method. This method is called every "frequency" minutes like the monitor method, but it is guaranteed that report is only called after all monitor methods have successfully been processed. Should "report" rely on the results of multiple monitor methods, we can be sure that they are there. Most of our tests will report two things:

1) A status color together with a short text
2) A longer descriptive text

Let's start with the simple one: the descriptive text.
There is a statement called "comment" which will just send an arbitrary text to our server(s). Our report method could look like this:

    report {
        comment "html" "Number of distinct users logged in: " ${distinctusers};
    }

The first argument of comment tells us whether the comment is formatted HTML or just text. We choose HTML. Any further arguments are concatenated and sent as-is to the server.

Of course, we also want to get a status color - tests not sending any status are invalid, anyway. The statement responsible for sending status is called - you guessed it - status. Our report method sending out status "green" in any case might look like this:

    report {
        status green 1 "&green" " " ${distinctusers} " users";
        comment "html" "Number of distinct users logged in: " ${distinctusers};
    }
" "Number of distinct users logged in:" "" ${disticntusers}; } The first argument is the status color, the second argument must be a boolean value (like in Perl or C language anything not being "0" or empty is considered "true"). This boolean value tells uxmon if the status statement is to be considered or not. Note that our test definitions language does not now an "if" statement, so we have to have the condition in our "status" statement itself. The arguments starting with the 3rd one are the usual short descriptive text associated with every test. By convention we include the status color there, the leading ampersand "&" will be interpreted by the Big Sister server and be replaced by an icon. Please save our new test definition in etc/testdef/ourtest.cfg, and create a simple uxmon-net like DESCR features=unix localhost localhost ourusers localhost bsdisplay When running uxmon -D 5 you will see that our just declared test will be run and see as well that status and text is sent to the server. Of course, always reporting green is not much fun. We have to decide when to send which status color. Let"s say we want our test to sent red if the number of users logged in is getting above 5. We just need one more line for this: report { status red ( ${distinctusers} > 5 ) "&red" " " ${distinctusers} " users (too many)"; status green 1 "&green" " " ${distinctusers} " users"; ... As we have already learned the second argument to the status statement is a conditional. In this case we say "red" should apply if the value of distinctusers is higher than 5. Note that the syntax of the test definition language is very limitted. For instance, do not think you can use something like $distinctusers>5 in the example above, since variables must be referenced as ${...}, and the parser will think something>5 is one single word, not treating this as an operation. Probably you will wonder why uxmon will only report "red" if distinctusers goes above 5, since our "green" condition still applies. This is a little hard-coded shortcut. "status" lines emitting a status which is better than already set are silently ignored. Thus, if the first line tells uxmon that the test status is red, any line trying to say the status is rather yellow, purple, green or whatever else that is better than red are just ignored. You can see this if you place the "status green" line before the "status red" one. The resulting effect is that in case distinctusers grows above 5 the test will report both green *and* red. Note that it is a good idea to sort status statements by severity, most severe first. If you intentionally want a test to report multiple stati you will have to either use multiple report methods (oh, yes, that"s possible - we will see later on in this series) or use a loop (one other idea we will meet again, later). That"s enough for today. PART 4 == http://sourceforge.net/mailarchive/message.php?msg_id=10360571 Accepting arguments ------------------- We went ahead to having implemented our very first test. Now, of course, most of the time the Big Sister users will expect to get some influence on what your test will treat as problem. We don"t want to just have something like localhost ourusers in uxmon-net, but rather like to be more specific like localhost maxusers=10 warnusers=8 ourusers In order to accept the arguments maxusers and warnusers we have to add some code to our test. First of all we have to retrieve the values of the arguments and store them into variables we then can use when determining status codes. 
We already know the "monitor" and "report" methods. You will guess that neither of these is the right place for handling arguments, since they are called regularly. Exactly for this purpose uxmon knows a method called "init", which is called exactly once for every occurrence of our test in uxmon-net. The init method must look something like

    init {
        instance import maxusers;
        instance import warnusers;
    }

This will make our test import the arguments "maxusers" and "warnusers" into instance variables of the same name. You can import the values into a variable with a different name than the argument with the two-argument form of import, e.g.

    instance import maxusers umax;

Read the above statement as "import the maxusers argument as variable umax".

Most of the time it is useful to have some default value in case the argument is not set in uxmon-net. Import will only (re-)set the variable if the argument is present, thus

    instance set maxusers 10;
    instance import maxusers;

will set maxusers to 10, then look at uxmon-net and set whatever value is set there, unless there is no value. Please note that it is not only good style to use "import" in the init method - using it outside will lead to undefined behaviour (well, it will work, but don't do it, anyway).

We can now use the imported arguments in our report section:

    report {
        status red ( ${distinctusers} > ${maxusers} ) "&red" " " ${distinctusers} " users (way too many)";
        status yellow ( ${distinctusers} > ${warnusers} ) "&yellow" " " ${distinctusers} " users (too many)";
        status green 1 "&green" " " ${distinctusers} " users";
        comment "html" "Number of distinct users logged in: " ${distinctusers};
    }
" "Number of distinct users logged in:" "" ${distinctusers}; } There is no magic, here. Note, we just added "yellow" and we obey to the rule that status statements should be ordered by severity. Nevertheless we have just skipped two important things. First of all anyone using our test will not know what arguments it takes unless having a close look at our test definition. We must add a description of the two arguments to the "arguments" variable: "maxusers:int:Test goes red if number of users goes above this (default: 10)" "warnusers:int:Test goes yellow if number of users goes above this (default: 8)" The 2nd thing is that there are a number of arguments every test should accept. So just add instance import item report_item; instance import perf perf_frequency; to the init section. Our test definition now looks like that: test demo_ourusers { static set name "ourusers"; static set features local unix; static set description "Demo variation of the users test"; static set arguments "perf:int:report perf data every perf minutes" "item:string:Item as which to report."; pernode set report_item "cpu"; init { instance set maxusers 10; instance set warnusers 8; instance import maxusers; instance import warnusers; instance import item report_item; instance import perf perf_frequency; } pernode monitor { get who.hrSystemDistinctUsers[0]; pernode set distinctusers ${who.hrSystemDistinctUsers[0]}; } report { status red ( ${distinctusers} > ${maxusers} ) "&red" " " ${distinctusers} " users (way too many)"; status yellow ( ${distinctusers} > ${warnusers} ) "&yellow" " " ${distinctusers} " users (too many)"; status green 1 "&green" " " ${distinctusers} " users"; comment "html" "
" "Number of distinct users logged in:" "" ${distinctusers}; } } Debugging --------- At some point of your trials you will notice that the test definition language is very - hmh - flexible. This means that nearly whatever you write uxmon will interprete it somehow, where "somehow" does not necessarily imply that your interpretation is the same as uxmon"s. In order to make uxmon be a little bit more noisy about how it is evaluating your test it is a good thing to use the debug statement. For instance, change your "monitor" method into something like pernode monitor { get who.hrSystemDistinctUsers[0]; pernode set distinctusers ${who.hrSystemDistinctUsers[0]}; debug 2 "number of users: " ${distinctusers}; } This will just make uxmon output its arguments if the debug level you are using (the -D command line argument ...) is >=2. The debug statement is just another instance of the good old print statements pretty all of us where using at the beginning of their career as a programmer. PART 5 == http://sourceforge.net/mailarchive/message.php?msg_id=10370229 Performance graphing -------------------- Status reporting is great, however most of us do like some more or less nice graphs. Getting Big Sister to emit simple graphs for our "ourusers" test will not take much effort. You have certainly already expected what I am going to write you: You have to add another method in order to tell uxmon what values it should send to the server. On the server side you have to give bsmon a hint on what to do with the incoming values by creating a graphtemplate. Let"s start with uxmon. Our test definition just gets extended by perf { export always single.OurDistinctUsers, ${distinctusers}; } The perf method is invoked by uxmon once every perf cycle. By having instance import perf perf_frequency; in our init method we give the admins a chance to set the duration of each perf cycle via the "perf" argument, so that localhost perf=5 maxusers=10 warnusers=8 ourusers in uxmon-net will tell the ourusers test to send performance data every 5 minutes. The export statement"s first argument specifies how often the variable should be sent to the server: "always", "often" or "sometimes". If you chose always the export statement applies in each perf cycle, while often and sometimes declared exports will be executed less often (currently "often" means every second cycle, "sometimes" every 4th cycle, but this might change). For our simple test we just send one single value, the number of distinct users and we send it in each cycle. You will want to do this for every performance value you ever collect. On the other hand more sophisticated tests will often not only submit the monitored values, but also some meta information (like e.g. legend information or the like). This meta information usually stays the same for a long time period, so there is no need to waste bandwidth sending it each time. In this case you will make use of "often" or "sometimes". The second argument of export is the name the variable should get when it is sent to the server. The last argument is a value or a list of values to be sent. Note that the first value in the list will be sent as single.OurDistinctUsers.0 and so on. The comma between the variable name and the list of values is mandatory, better do not ask why, this is a requirement imposed by the internal design. 
If you run uxmon now - I suggest you add "perf=1" to uxmon-net in order to get performance data in short intervals - you will see performance data logged in the status.log file on the server side. That is all that happens; we will not see a graph coming up. This is because the server just gets some values but does not yet know the meaning of these values. We have to create a graph template for bsmon.

Like the test definitions, the graph templates live in their own file, called etc/graphtemplates. And, like tests.cfg, graphtemplates includes every file in etc/graphdef. So we just add a file called etc/graphdef/ourusers.cfg. By browsing through graphtemplates you will probably find out for yourself what templates look like; our ourusers graph might be defined like this:

    graph ourdistinctusers
        input single.OurDistinctUsers.0
        value-type gauge
        input-interval $shortInterval
        input-deadtime $shortDeadTime
        graph-id $HOST.ourdistinctusers
        title Number of Users
        legend # distinct users
        unit-label users
        size 400x120

Each graph must have a unique name; in our case it is specified with the "graph ourdistinctusers" statement. The performance variable we want to graph is single.OurDistinctUsers.0, so this is our "input". The value is to be graphed as-is; it's of type "gauge" (other accepted types are counter, derive, absolute - you will probably never use them since you have to preprocess the performance data on the uxmon side for determining status information, anyway).

One thing to know is that our graph engine uses RRDTool as a backend. RRD does not store all the data it gets in a growing table. It just allocates a fixed-size ring (or rather multiple rings) at the time the graph is created. The data is then pushed into this ring. As soon as the ring is full, each time a new value is pushed in, the oldest value is thrown out. This way each graph has its own fixed size and will not grow indefinitely. Since we are not only interested in the last few hours, RRD allows multiple rings - a first ring might take the last 120 values in 5-minute intervals, a 2nd one might take the last 120 values, but this time in 30-minute intervals. So the first ring will show us data for the last 10 hours at a granularity of 5 minutes. The 2nd ring will allow us to watch a graph going back 60 hours, but we only get a granularity of 30 minutes. Is everything unclear up to this point? To put it in a nutshell: when bsmon creates a graph it will create it with 4 rings, one containing 2 days of data in the specified interval, one containing one week of data, one containing one month of data, and finally one containing a year.

In order to know what size the graph has to be created with, bsmon has to know the time between two incoming values. This is the "input-interval". If we use "perf=1" in uxmon-net, the "input-interval" should be 60 (the input interval is in units of seconds). If it is larger than 60, then RRD will not store each value individually; it will build average values over all performance values coming in during one input-interval. The default value for $shortInterval is 10 minutes, so in our case we will get a 10-minute average. Didn't you ever wonder why Big Sister was graphing something like 3.4 distinct users at a given time? Now you know why ...

The input-deadtime value is related to that. If RRD does not get any values for more than input-deadtime seconds, then the graph gets a gap. I do not have to explain settings like title, legend, unit-label or size, do I? One important concept, however, is the graph-id setting.
Unlike the graph name, this need not necessarily be unique. If multiple graphs with the same graph-id exist, then bsmon, or rather the web frontend, will just draw them in the same chart.

Note during testing that bsmon uses some caches in order to speed up performance value handling. Most importantly, if graph creation fails for any reason (usually because RRDTool is unavailable, because there is no graph template, or because there is an error in the graph template), then the performance variables involved with the failing graph get blacklisted for a while and will just be ignored. Therefore, during testing you can avoid some headache by re-starting bsmon after performance monitoring changes.

PART 6 == http://sourceforge.net/mailarchive/message.php?msg_id=10379122

Modules
-------

This article will lead us on a little excursion. We have seen that typical tests usually require at least two config files, with more to come. While you are writing your own site-specific tests you will not necessarily take the time to choose file, test and graph names carefully - until, of course, you either lose track, would like to move your precious configuration over to a new server, or would like to publish it. In order to make it easier to keep track of which files and configuration belong together, Big Sister supports a simple module concept. Using the "bsmodule" command you can list installed modules, install additional modules and remove those you do not need any more. It is a good idea to organize your own tests in modules.

By issuing the "bsmodule" command without any additional arguments you will get a list of installed modules. Note that the core Big Sister distribution is partly implemented as modules - this explains why you do not get an empty list even if you have never cared about installing modules:

    $ /usr/sbin/bsmodule
    CPUperf (01.00) CPU performance monitoring
    etherport (01.00) Check port operating status
    [...]

We want to see our own "ourusers" test in this list. This time we are lucky - crafting a module is rather easy once you have written the code/config files you want to include. First of all we have to decide which name our module will have. This must be some unique name - we take "ourusers". The next thing to do is to create an empty directory named like the module itself, so we do a

    mkdir ourusers

Next we will put a number of well-defined files into this directory. In order to know a few things about the module, Big Sister will look for a file called module.info. This file will look something like

    module = ourusers
    descr = Monitors number of users logged in
    version = 01.00
    author = Thomas Aeby

That is it. There is one other setting which is regularly used: you can tell Big Sister that this module requires other modules to be installed first by adding something like

    depends = module1 module2

The next file we create is one of those I personally obviously hate (:-)), it's a README file. Put any short or longish text describing your module into this file.

So far so good. Now we get to the more important things. Copy your testdef/ourtest.cfg file into the directory, but call it "tests.cfg". Big Sister will know that this file contains your test definitions. In order to avoid naming conflicts, tests defined in a module's tests.cfg should start with the module's name followed by an underscore and an arbitrary name. In our case we are using "ourusers", which is already ok. If we were strict we would rename our test to something like

    test ourusers_users {
    ...
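At this point the module directory should contain these three files (the graph templates will be added in the next step):

    ourusers/
        module.info      - module name, description, version, author
        README           - free-form description of the module
        tests.cfg        - our test definitions (copied from etc/testdef/ourtest.cfg)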
The next thing to do is to copy in our graph templates. Just copy the etc/graphdef/ourusers.cfg file and call it "graphtemplates". Again, our graph names should follow the above rule, so our template should be renamed to something like

    graph ourusers_distinctusers

In one of the next articles we will learn where to store requester implementations within a module, but as we do not have one yet, we leave this out for now. So, our new module is ready for installation - stop, Big Sister will expect the module to be a tar archive rather than a directory, so the next step is

    tar cf ourusers.mod ourusers

Now we can actually install the module by running bsmodule like

    bsmodule install ourusers.mod

Check back using "bsmodule list" whether the module appears in the list. You will have to restart Big Sister since bsmodule just copies the files in. When copying the files they are automatically renamed and copied to the right target directory; the mapping is

    module.info    => etc/moduleinfo/<module>.info
    README         => etc/moduleinfo/<module>.readme
    tests.cfg      => etc/testdef/<module>.cfg
    graphtemplates => etc/graphdef/<module>.cfg
    Requester/*    => uxmon/Requester/*

An interesting feature is that bsmodule will automatically look for modules on the Big Sister download server at http://software.graeff.com/bigsis-modules/ - if you do not want bsmodule to do this, use the -n option. As you see, the bigsis-modules page does not contain too many modules; please feel free to send me modules that you think might be of common interest so that I can publish them on this server.

PART 7 == http://sourceforge.net/mailarchive/message.php?msg_id=10386628

This will be the last article this week because even I (:-)) am going to celebrate Christmas. Expect the next article on Monday.

Advanced Test Definition Language 1
-----------------------------------

We know enough of the tests.cfg language now to write simple tests ourselves. When looking at tests.cfg you see that there is much more than we have used until now. In the hope that the basic idea is clear, I will just heap a few advanced syntax/semantics on you, expecting you to digest them without a major brain-breakdown. I will get back to illustrating these with an example near the end of the article series.

First of all, uxmon knows many more methods than we have used up to now. After this paragraph you will know all of them too, though you will possibly not understand all of their significance immediately.

The "precheck" method is used whenever uxmon is searching for a test matching a uxmon-net configuration. If a precheck method exists it is expected to set a variable called "test_available" to a boolean value. If the variable is false (0 or empty) after precheck, then uxmon will assume that this test definition will not work on this system or with these parameters and will just ignore this definition. In precheck you will usually test whether a certain requester is available and willing to perform. In most cases it will be unwilling to perform if it requires special system commands or APIs to exist and they are not there. For instance, the "who" test we use in our example test definition will not work if the who command does not exist on the monitored system. Our precheck rule could then look like

    precheck {
        set test_available domain_available( who );
    }

The domain_available function will just look through the table of requesters and see if a requester for "who" is registered and if it is willing to accept requests.
In our special case the above precheck method will be useless, since the actual implementation of who was written before precheck was introduced and does not provide information about availability - which means that uxmon will treat it as if it were always available.

The "discover" method is invoked after a test is "init"-ed and before the first test cycle (monitor and perf sequences) is run. After the first run it is regularly re-invoked. You have to set the time between two discover runs by setting static variables, for instance like this:

    set discover_refresh 60*2;
    set discover_retry 60;

which means: run discover every 2 hours (120 minutes). If one discover run fails, then run it again after 1 hour (discover_retry).

The discover method is expected to go discover the underlying system before actual monitoring starts. A typical discover implementation may e.g. look for available disks, network interfaces or the like, and store data that does not change often (like e.g. interface names/speeds, disk partition names/devices, etc.) internally, so that the monitor method can be limited to just getting the real, fast-changing performance data out of the system. This reduces the size of the monitor method and - more importantly - will save lots of CPU time or network bandwidth for some tests. A discover method might look like

    pernode discover {
        get @snmp.ifType @snmp.ifSpeed @snmp.ifDescr;
        pernode set interfacetype ${snmp.ifType};
        pernode set interfacespeed ${snmp.ifSpeed};
        pernode set interfacename ${snmp.ifDescr};
    }

which will get network interface information (interface type, speed and name) out of an SNMP-accessible device. This information does not change very often and is valid for all tests aiming at the same host, so we can do this in a "pernode discover". Within the monitor method we can then confine ourselves to just getting the quickly changing information like network load:

    pernode monitor {
        get @snmp.ifInOctets @snmp.ifOutOctets;
    ...

After each successful discover run uxmon will invoke the "postdiscover" method. When postdiscover is run you can expect every other scheduled discover method to be completed. That is important if you use pernode or static variables in discover that might be shared between multiple tests; postdiscover can rely on them being set and can safely do any necessary post-processing. Often one uses "pernode discover" methods in order to "get" information, and then refines specific information in instance "postdiscover" methods, selecting out of the heap only the information that is relevant for this single instance with the arguments supplied in uxmon-net.

You also know the "monitor" method. Note that there is also a "postmonitor" method. Similar to postdiscover, postmonitor is run after all the monitor methods scheduled during a cycle have terminated, and it is expected to do some post-processing on shared variables.

You already know the "perf" and "report" methods - nothing to add; do not expect postperf and postreport to exist. There has never been a need for such a thing.

One last method is called "unconfigure". You know that uxmon will automatically re-read uxmon-net after this file is changed. During this re-configuration our tests are affected, too. All the test instances are dropped and uxmon sets up new instances for every configured test. Static variables and pernode variables may survive this re-configuration since they are bound to each host or each test definition rather than to a test instance.
Therefore uxmon will call "unconfigure" when dropping the old configuration. The job of "unconfigure" is to bring pernode and static variables back to a defined value, so that they do not make init/discover/monitor of newly set up tests fail because those are irritated by leftover information.

Note that only the discover and monitor methods may request data from requesters via "get" - though not obeying this rule will in most cases not be answered with an error message. Note, too, that you must not use the "status" or "comment" statements outside of report methods, and that the "export" statement must be within a perf method. The "import" statement should only appear in the init method, though it will work in every other method, too.

PART 8 == http://sourceforge.net/mailarchive/message.php?msg_id=10408929

Back from Christmas holiday, I'll write two more articles on tests.cfg before we attempt to write our first Requester module ourselves. I'll have some 14-hour workdays this week, so please forgive me if I skip a day without posting an article ...

Inheritance
-----------

The tests.cfg language borrows a few concepts from object-oriented programming. One of them is inheritance. One can define a new test based on an existing test and just re-write the parts that are unique to the new test.

Do you remember our "ourusers" test? It actually uses the who command to get the number of users logged in on the system. Of course, this will not work via the network, so we cannot monitor remote systems using the ourusers test. One possible solution: we could extend the who requester so that it uses the rwho command on request. We are not going to implement this, however. Instead, I assume the remote system supports SNMP. In order to get the number of users out of a box supporting the Host MIB we need to retrieve the SNMP variable hrSystemDistinctUsers, which is by chance (well, intentionally, but it is not mandatory) the same variable our who requester uses. Do not mind if you are not familiar with SNMP. We are just going to use another requester, the "snmp" requester, and read some information this requester provides for us, in order to demonstrate inheritance.

At this point we could just copy & modify our ourusers test, use the right features list like

    set features remote;

and replace the monitor section by this:

    pernode monitor {
        get snmp.hrSystemDistinctUsers[0];
        pernode set distinctusers ${snmp.hrSystemDistinctUsers[0]};
    }

Do not forget that the test definition's name must be unique, so we e.g. use

    test demo_ourusers_snmp {
    ...

There we are, now our test gets its information via SNMP. However, copying the whole thing lacks some elegance, and if we ever have to modify the ourusers test, for instance because we ran into a bug, we will have to manually modify every copied instance, too. The way to go is to *extend* the ourusers test:

    test demo_ourusers_snmp extends demo_ourusers {
    ...

When uxmon reads the test definitions, the demo_ourusers test will be copied into demo_ourusers_snmp on the fly by uxmon itself. Of course, our SNMP test is not identical to the old one. Therefore, our definition is not yet complete. First of all we have to adjust the features list:

    test demo_ourusers_snmp extends demo_ourusers {
        static set features remote;
    }

When uxmon reads uxmon-net it will always put "remote" in the features list of any test targeting a non-local machine, so by using "remote" we just claim that our test will work against any remote machine.
The name and arguments list stay the same, so we do not need to set them again - Big Sister has already slurped them in. We also have to replace the monitor section. We can do this by extending our test definition to

    test demo_ourusers_snmp extends demo_ourusers {
        static set features remote;

        overrides pernode monitor {
            get snmp.hrSystemDistinctUsers[0];
            pernode set distinctusers ${snmp.hrSystemDistinctUsers[0]};
        }
    }

Note that unlike with the static variables, we explicitly have to use "overrides" together with our new "monitor" section. This is because it is completely legal to have multiple monitor sections - they are evaluated one after the other. So if we just used

    pernode monitor {
    ...

in our test definition, Big Sister would think that we want to *add* another monitoring section. But as we want the original implementation to be replaced by the new one, we tell Big Sister by declaring it as an "overrides" section.

So, that's actually it. We now have two tests, both called "ourusers" when used in uxmon-net, one implementing local monitoring, one remote monitoring. If you now run

    testers -f remote -t ourusers

you will actually get the description of our newly created test. But compare this to

    testers -f local -t ourusers

How do you actually know whether Big Sister is using the right test definition? It is good style to use different descriptions for different tests, so that you and other Big Sister users will be able to see a difference. So, let's add a

    static set description "Demo variation of the users test via SNMP";

If you know SNMP you will notice that we need some way to set the SNMP get community. We just implement another argument users may use in uxmon-net, so that they can use the SNMP version of the ourusers test like

    somehost maxusers=10 community=public ourusers

Therefore we have to accept an argument called community. We do this by just adding an init section:

    init {
        instance set community "public";
        instance import community;
    }

This time init does not override the existing init section, it is an addition to it. We therefore do not use the "overrides" keyword. Luckily, the SNMP requester will directly access the community variable defined within our test, so there is no additional work involved in passing this argument on to the requester.

But, of course, we have one other thing to do: our test's self-documentation is now out of date. We have to mention the "community" argument in the arguments list. We do so by putting

    static set arguments ${arguments} "community:string:SNMP get community (default: public)";

Note that you can actually refer to variables defined in the parent test and re-define them. This time the work is complete and we can store the new test definition in our existing ourtest.cfg file:

    test demo_ourusers_snmp extends demo_ourusers {
        static set features remote;
        static set description "Demo variation of the users test via SNMP";
        static set arguments ${arguments} "community:string:SNMP get community (default: public)";

        init {
            instance set community "public";
            instance import community;
        }

        overrides pernode monitor {
            get snmp.hrSystemDistinctUsers[0];
            pernode set distinctusers ${snmp.hrSystemDistinctUsers[0]};
        }
    }

No big deal, is it?

PART 9 == http://sourceforge.net/mailarchive/message.php?msg_id=10417030

Please get a cup of coffee (or your favorite "drug"), lean back, take a deep breath, calm down - today I'm going to tax your patience.

Loops
-----

Up to this article most of you will have completely missed traditional control structures like "if ... else" or loops in the tests.cfg language.
Today we are going to learn that this is because those are more or less absent. There is an if *function*, but no real if structure, and there are no real loops. There is one exception to this: the language actually offers a limited "for" loop. Provided you set up a variable index containing a list of integer values like

    set index 1 3 4;

and a variable containing some arbitrary values at those indexes like

    set some_values "value0" "value1" "value2" "value3" "value4";

then you can loop through this index with

    for result index ${some_values[${i}]};

This for statement will evaluate the expression ${some_values[${i}]} for each value in the variable index (1, 3, 4). In each pass the variable i will contain the current index. The list of the results of each run is then stored in the variable result. That's it. Note that both the result and index arguments must actually be variable names, not expressions; do not try something like

    for result ${some.variable} ...

as it just will not work.

At this point you probably want to know whether the above statements really work and what their results are. In order to test them you can add this to the "ourtest.cfg" config file:

    test syntax_test {
        set index 1 3 4;
        set some_values "value0" "value1" "value2" "value3" "value4";
        for result index ${some_values[${i}]};
        debug 0 ${result};
    }

Now run

    testers -D 1 -t test

In order to find matching tests, the testers command must execute the static sections of each test - and of course this means it must also execute our "syntax_test" pseudo-test. Is the output the one you expected?

Why this limited loop statement? In most cases you will face a list of information a requester provides, e.g. a list of disk partitions your test could/should monitor. Your test then somehow has to decide which partitions it (or rather the Big Sister admin) is interested in. The list of indexes of the partitions one intends to monitor is then stored in an index variable. You use the "for" loop to actually go through the information list and compute test results.

Stop, you are probably yelling at this point. Apart from the fact that all this sounds very complicated: first of all, you need more than this puny "for", and second, even for the case described above this "for" is insufficient. The answer to the first point is: please try to keep things simple and try to get through with that - only in a few cases will that "for" thing not be enough. Even in that case there is a solution we will learn later on in this article series: if everything else fails we can just call Perl code. The second point is entirely true. In most cases you will use the for statement together with the "select" statement. Select allows you to go through a list of values and remember the indexes of those values you are interested in. Imagine - or better, try out - code like this:

    test syntax_test {
        set partition_types ext2 msdos msdos xfs;
        set partition_names /dev/hda1 /dev/hda2 /dev/hda3 /dev/hda4;
        set partition_sizes 100 200 300 400;
        set partition_used 60 180 50 40;
        select selected_for_monitoring partition_types contains( "msdos", ${partition_types[${i}]} );
        debug 0 "selected indexes: " ${selected_for_monitoring};
        debug 0 "selected partitions: " ${partition_names[${selected_for_monitoring}]};
        for used_ratio selected_for_monitoring ( ${partition_used[${i}]} / ${partition_sizes[${i}]} );
        debug 0 "space used: " ${used_ratio[${selected_for_monitoring}]};
        debug 0 "space used: " ${used_ratio};
    }

Now, there is a bunch of new things here.
I hope you understand the first 4 lines - we just set a few variables. Of course, normally you would get them from a requester. The idea behind the partition_* variables is that each of them contains information for each partition, so that the first partition is of type partition_types[0], its name is partition_names[0], its total size is partition_sizes[0] and the space in use is partition_used[0].

We now select every partition of type "msdos". The syntax of "select" is similar to that of the for statement:

    select result variable expression;

Select goes through all the indexes in variable, stores the current index in the variable "i", and evaluates the expression for each index. Unlike "for", the result will contain every index for which the expression evaluated to true. So the result in the above test definition is

    1 2

since the partition_types at indexes 1 and 2 are contained in the list "msdos" (do not mind the contains() function, you will soon get a list of known functions).

In the second debug statement I used a new way of accessing list elements in order to show off a bit. The index used in [] may itself also be a list of indexes, and the result of the evaluation is then not only one single value but a list of the values at the respective index positions - so "/dev/hda2" and "/dev/hda3", since those are at positions 1 and 2, which we selected.

Then we run the selected partitions through the for statement, calculating the used_size/total_size ratio only for the selected partitions. Last but not least, we use two different ways of printing the result of the for statement. In the first one we use the just-introduced way of evaluating multiple members of one list; in the second one we just print the whole list. If you think this through carefully you will see that the used_ratio list will contain values at indexes 1 and 2, therefore at the same indexes as selected_for_monitoring contains. The index 0 and index 3 positions simply do not exist. Therefore both ways of printing the resulting list give the same results.

Methods and loops
-----------------

Unfortunately the loop statement only loops through expressions, never through other statements. This is very unfamiliar, isn't it? But luckily there is still something left: you can use "for" at method level. With a simple example I will answer two questions. Let's assume we have got a discover method that gets the list of partitions, their sizes and names via a requester, then selects the interesting partitions via "select". This is just like the above example - we do not use a true requester.

    test alleged_disk_test {
        pernode discover {
            set partition_types ext2 msdos msdos xfs;
            set partition_names /dev/hda1 /dev/hda2 /dev/hda3 /dev/hda4;
            set partition_sizes 100 200 300 400;
            select selected_for_monitoring partition_types contains( "msdos", ${partition_types[${i}]} );
        }

In the monitor section we get the partition_used values out of the requester and compute the used_ratio values:

    pernode monitor {
        set partition_used 60 180 50 40;
        for used_ratio selected_for_monitoring ( ${partition_used[${i}]} / ${partition_sizes[${i}]} );
    }

That's nothing new, we just split the "select" example above into a discover and a monitor section. Now we would like to report a status value for each of the partitions we selected. Unfortunately the "report" section will only allow us to set one single status. Of course we could put multiple "report" sections into our test, but we do not want to hard-code the number of partitions we might be interested in.
There is a solution to this: you can use "for" around a method like this:

    pernode report for selected_for_monitoring {
        set thispartition ${partition_names[${p}]};
        set thisratio ${used_ratio[${p}]};
        ...

Did you get the point? This is as if there were one "report" section for each value of selected_for_monitoring. The only difference between the individual sections is that the "p" variable will contain one single index out of selected_for_monitoring, just as the "i" variable does in the normal "for" statement. Big Sister uses "p" rather than "i" in this case because, of course, you can use a for statement within a for method.

PART 10 == http://sourceforge.net/mailarchive/message.php?msg_id=10452994

Welcome back. I have had to take a longer break than planned - but aren't we used to this? :-) This time you'll merely get a boring but hopefully complete list of syntactical and semantical "stuff" allowed in tests.cfg.

Syntax, Statements, Operators, Functions
----------------------------------------

Statements
----------

set variable expr;
    Evaluates "expr" and stores the result in a variable called "variable".

import argument [variable];
    Reads an argument named "argument" from uxmon-net and stores it in a variable called "variable". If the variable name is omitted, the argument name and variable name are considered to be the same.

debug level expr;
    Prints the result of "expr" if the requested debug level (-D option when running uxmon) is at least "level".

get requester.variable [...];
    Gets one or multiple system parameters out of a requester. The retrieved values are stored in variables named the same as the get string (thus e.g. "get snmp.ifDescr" results in a variable called "snmp.ifDescr" being set). If a requested system parameter is preceded by a "@", as in "get @snmp.ifDescr", get will assume that there are multiple system parameters below snmp.ifDescr and that all of them are to be requested. The variable name of the result is still the system parameter's name, with "@" stripped off.

select var1 var2 condexpr;
    Evaluates condexpr once for every value in variable var2 and stores in var1 the list of indexes where the expression evaluated to true. Within condexpr a variable named "i" contains the current index.

comment type expr;
    Evaluates expr and sends the result as a comment of type "type" (where type must be one of "text" or "html") to the Big Sister server.

export when parameter, expr;
    Evaluates expr and sends the resulting list under the name "parameter" to the server. "when" must be one of always, often or sometimes.

event level expr;
    Evaluates expr and sends the result as a log message to the Big Sister server at log level "level" (use the usual syslog log levels such as "info", "notice", "warning", "err", etc.).

status color condexpr expr;
    Evaluates condexpr. If it evaluates to true and no status statement before this one with a more serious status color had condexpr => true, then a status of color "color" with the result of "expr" as descriptive text is sent to the server.

Operators
---------

Currently known operators are (sorted by evaluation precedence):

    * / + - > < == || &&

Note that operators have some vector capabilities, e.g.

    set a 1 2 3;
    set b 10 20 30;
    set c ${a} + ${b};

will result in c being set to the list 11 22 33.

Functions
---------

contains( matches, list )
    String-compares each value in matches with each value in list and returns true if any of these pairs match. This is actually a special, slightly more efficient variation of contains_pattern().
contains_pattern( patterns, list )
    Goes through the list of patterns and returns true if any of the values in list match a pattern. If a pattern list entry starts with "~" then the pattern is interpreted as a Perl regular expression, otherwise the match is a case-sensitive string comparison.

null( arg )
    Returns true if the argument is not defined (aka "null"), false otherwise.

not( arg )
    Returns true if arg is false, false otherwise (logical NOT).

remove( from, what )
    Returns the list of values out of from, without those explicitly listed in what.

percentage( value, base )
    Returns a percentage computed from value and base as (value/base*100).

format_kbsize( kbytes )
    The argument must be a number of kBytes. format_kbsize tries to format the size in a human-readable (rounded) form and adds a unit text (e.g. "GB" for Gigabytes, etc.)

format_percentage( percentage )
    Interprets the argument as a percentage and returns it in a human-readable (rounded) form.

pad( text, length, justified )
    Adds space characters to the text until its size is "length". If justified is "l" then the text is left-justified, if justified is "r" then the text is right-justified, "c" stands for "centered".

input_kbsize( value )
    Interprets value as a size in either Bytes or kbits and returns the number of kBytes/kbits. Tries to be smart in interpreting units, so that e.g. 1MB (1 MByte) will return 1024 and 1Gb (1 Gigabit) returns 1000000.

in_range( lower, upper, absolute, percents )
    Accepts lower and upper being either numbers or percentages (recognized by a trailing "%"). Assumes absolute and percents are the absolute and percentage values of the same parameter (e.g. the disk free size in kBytes and as a percentage) and tests whether this parameter is between the given lower and upper values.

unmonitored( formerly, now, description )
    formerly is meant to be a list of entities monitored before, and now a list of entities monitored now, with description being a list of matching descriptions. unmonitored() then returns a text containing the descriptions of all the entities that are not monitored any more. This is meant to be used like in "event notice unmonitored( ... )" in order to let the Big Sister administrator know that some configuration change caused entities to no longer be monitored.

nowmonitored( formerly, now, description )
    The same as unmonitored, but reports the entities that are newly monitored.

when( value )
    Returns the time (in seconds since January 1st, 1970) at which the value was computed or retrieved. This is useful with values retrieved via the get statement and stored for later processing, e.g. for computing bandwidths and the like.

if( condexpr, val1, val2 )
    Returns val1 if condexpr evaluates to true, val2 otherwise. Note that in every case both val1 and val2 are evaluated!

reference_url( text, pattern, host, url, port, service )
    Evaluates pattern as a Perl expression with text being $_, and replaces the URL placeholder in the resulting text with the URL specified via the arguments. If the "url" argument is an absolute URL then the host, port and service arguments are ignored. See the tcp module's tests.cfg for an example.

call( procedure, arg1, ... )
    Calls a Perl function defined in a Requester module. The procedure argument must be something like "ourrequester::ourfunction" in order to call the function Requester::ourrequester::ourfunction(). The argN arguments are passed through to the called function. Note that these arguments are hashes containing uxmon value lists and need some special treatment to extract values from them.
    See Requester::procs::countprocs() and the matching call() in etc/testdef/procs.cfg for an example.

size( list )
    Returns the number of entries in list.

domain_available( domain )
    Returns true if the requester called domain is willing to perform according to its available() method.

Variable expansion
------------------

Variables are *always* referenced as ${varname}. The usual shortcut $varname is *not* valid and will lead to unexpected results. Individual list elements are referenced as ${varname[index]}. Together with the above rule this leads to something rather unreadable like ${varname[${index}]} if using an index variable. If the index variable itself contains multiple values, ${varname[${index}]} will reference multiple list elements, one for each element of the index variable.
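As a closing illustration, the vector operators and multi-element expansion can be tried out with the same pseudo-test trick from part 9. This is only a sketch; the test name is arbitrary, and the debug output appears when running testers -D 1:

    test syntax_test2 {
        set a 1 2 3;
        set b 10 20 30;
        set c ${a} + ${b};
        debug 0 "sums: " ${c};
        set index 0 2;
        debug 0 "picked: " ${c[${index}]};
    }

If the rules above hold, the first debug line prints the element-wise sums 11 22 33, and the second one prints the elements at indexes 0 and 2, i.e. 11 33.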