Context
OpenSTA is a Windows-based tool, and so has the ability to monitor the performance of Windows-based machines built in. Creating an NT Performance collector and using the ‘Browse Queries’ button makes it simple to define the information required. Where performance information is required for other systems, however, defining the required collectors is slightly more complex.
In our case, the system’s web server, application server and database
were deployed on Solaris machines. The information required was the CPU
usage of the system during the test. The objective was to be able to quantify
the CPU usage of the system for high-level performance modelling purposes.
OpenSTA provides for the definition of SNMP collectors, and Solaris provides
an SNMP service.
An initial examination of SNMP Collectors
Once the Solaris SNMP Service has been started, it is possible to define
an SNMP Collector that can read a value from it.
This can be performed as follows:
1. In the Commander, right-click on the ‘Collectors’ folder and select
‘New Collector -> SNMP’.
2. Name the Collector appropriately, and then open it.
3. In the Collector’s ‘Edit Query…’ dialog, put the machine’s name into the ‘Address’
section, and then select ‘Browse Queries’.
This provides a drop-down list of the collectable data that is available,
along with the current value of each item. It is worth noting that OpenSTA can only
monitor values that are returned via SNMP as an integer. This is not too
limiting, but it is worth being aware of when designing the tests to be
performed.
As an example, using the ‘interfaces’ section of the system it is possible
to obtain details of the amount of network data coming into the system
with a request of the following form:
public interfaces.ifTable.ifEntry.ifInOctets.2
Unfortunately, the standard interfaces provided do not include CPU usage
figures. It is, therefore, necessary to examine the issue in more detail.
Examining the Solaris SNMP Service
What was required?
OpenSTA can send ad-hoc SNMP requests to a system. The correct request
to send to a system is defined using a ‘mib’ file, which is published by
the equipment’s manufacturer. The MIB definitions are available in various
public libraries as well as from the manufacturer, and using the SNMP resource
links at the OpenSTA portal it was relatively easy to obtain the Solaris
mib files.
To help in the examination of the system an SNMP browsing tool was also
obtained. In this case I used the Getif tool,
but there are a number of tools available. The tool allows the SNMP tree
from the target Solaris system to be examined.
The contents of the Solaris MIB
Examining the Solaris mib file, the following entry was located:
rsUserProcessTime OBJECT-TYPE
    SYNTAX Counter
    ACCESS read-only
    DESCRIPTION
        "total number of timeticks used by user processes
        since the system was last booted."
    ::= { sunHostPerf 1 }
Similar entries were available for the ‘Idle’, ‘Nice’, and ‘System’
time. Searching upwards in the MIB file, the following section was located:
sun OBJECT IDENTIFIER ::= { enterprises 42 }
products OBJECT IDENTIFIER ::= { sun 2 }
sunMib OBJECT IDENTIFIER ::= { sun 3 }
…
sunHostPerf OBJECT IDENTIFIER ::= { sunMib 13 }
SNMP requests are defined as a series of ‘.’-separated numbers,
with non-table entries terminated by ‘.0’. Thus, the correct SNMP request
will end with ‘.42.3.13.1.0’: 42 for sun, 3 for sunMib, 13 for sunHostPerf and 1 for rsUserProcessTime. The ‘Getif’ tool allows the SNMP tree
to be examined, and gives the full SNMP request as “.1.3.6.1.4.1.42.3.13.1.0”,
or symbolically “.iso.org.dod.internet.private.enterprises.sun.sunmib.sunhostperf.userticks”.
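The walk up the MIB tree described above can be sketched in a few lines of Python. This is only an illustration: the `MIB_TREE` table and the `resolve_oid` function are hypothetical names, and the table contains just the parent links quoted in the excerpts above rather than a parsed MIB file.

```python
# Parent links copied from the MIB excerpts above: name -> (parent, sub-identifier).
MIB_TREE = {
    "sun": ("enterprises", 42),
    "sunMib": ("sun", 3),
    "sunHostPerf": ("sunMib", 13),
    "rsUserProcessTime": ("sunHostPerf", 1),
}

# Numeric form of iso.org.dod.internet.private.enterprises.
ENTERPRISES_OID = (1, 3, 6, 1, 4, 1)


def resolve_oid(name, scalar=True):
    """Follow parent links up to 'enterprises' and return the dotted numeric OID."""
    parts = []
    while name != "enterprises":
        parent, number = MIB_TREE[name]
        parts.append(number)
        name = parent
    numbers = list(ENTERPRISES_OID) + parts[::-1]
    if scalar:
        numbers.append(0)  # non-table entries are terminated by .0
    return "." + ".".join(str(n) for n in numbers)


print(resolve_oid("rsUserProcessTime"))  # .1.3.6.1.4.1.42.3.13.1.0
```

The result matches the full request reported by Getif, which is a useful cross-check when reading a MIB file by hand.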
Building the OpenSTA Request
Comparing the interfaces request shown earlier with the requests above,
it is clear that they are in different formats. Using the full SNMP request
in OpenSTA does not provide correct results, and so the question arises
of how to use the located SNMP request with OpenSTA. The key to this
comes from examining the Microsoft SNMP requests built in to OpenSTA.
Here the requests start “public enterprises.”, with the rest being Microsoft
specific. In the Sun MIB, ‘42’ (for sun) follows the ‘enterprises’ section.
Thus the request within OpenSTA is:
public enterprises.42.3.13.1.0
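The rewrite from a full numeric request to this form is mechanical, and a small helper can make it repeatable for the other counters. The sketch below is an assumption based only on the format observed above: `opensta_query` is a hypothetical name, and ‘public’ is assumed to be the SNMP community string, which may differ on other systems.

```python
# Numeric prefix for iso.org.dod.internet.private.enterprises.
ENTERPRISES_PREFIX = "1.3.6.1.4.1."


def opensta_query(full_oid, community="public"):
    """Rewrite a full numeric OID as '<community> enterprises.<rest>'."""
    oid = full_oid.lstrip(".")
    if not oid.startswith(ENTERPRISES_PREFIX):
        raise ValueError("OID is not under the enterprises subtree: " + full_oid)
    return community + " enterprises." + oid[len(ENTERPRISES_PREFIX):]


print(opensta_query(".1.3.6.1.4.1.42.3.13.1.0"))  # public enterprises.42.3.13.1.0
```

Requests outside the enterprises subtree, such as the interfaces example earlier, use a different starting name and are deliberately rejected here.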
Using this request in the OpenSTA Query field and then monitoring
the results provides correct values. Note that the numbers are counters,
with values accumulated since the machine was last rebooted, and so if the
results are to be displayed graphically it is worth selecting the ‘Delta Value’
option in the Query dialog. Similar Queries can then be constructed to
obtain the other CPU allocation categories.
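To illustrate what a delta view of a cumulative counter means, the following sketch turns successive readings into per-interval increases. It is not OpenSTA's implementation of ‘Delta Value’, which is not documented here; `counter_deltas` is a hypothetical helper and the sample readings are invented. The modulus reflects the fact that an SNMP Counter is a 32-bit value that wraps back to zero.

```python
COUNTER_MODULUS = 2 ** 32  # an SMIv1 Counter is 32 bits wide and wraps to zero


def counter_deltas(samples):
    """Return the increase between successive cumulative counter readings."""
    deltas = []
    for previous, current in zip(samples, samples[1:]):
        # The modulo keeps the difference correct across a single wrap.
        deltas.append((current - previous) % COUNTER_MODULUS)
    return deltas


# Invented cumulative user-tick readings, one per sampling interval.
print(counter_deltas([1000, 1004, 1010, 1010]))  # [4, 6, 0]
```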
Using the results
To examine the data collected, the requests for the different
CPU categories were placed in a single collector definition. This collector
can then be placed within a Monitoring script, which may then be manually
started and stopped. When using the collector in a test, however, it will
start and stop with the test by default.
The first stage of investigating the data being collected was to compare
the results measured over time across all the CPU fields with the output of ‘top’
running on the target machine. The ‘top’ program displays, among other
data, the percentage CPU usage over a variable time period. Whilst the
monitoring was running, the values of the collectors were graphed over time.
When using a delta value the first figure tends to be a large peak and
then the values settle down. It is, therefore, possible to provide a reasonable
view of the data during the test by using a ‘Rolling Graph’ in the monitor
window, and waiting for the initial peak to scroll out of the window. Empirically,
this shows that the counters are incremented by a total
of 100 ticks per second on a single-CPU system, with those ticks spread
across the CPU categories appropriately.
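Because the ticks in an interval are shared between the categories, the delta values can be turned into percentages comparable with the output of ‘top’. The sketch below assumes all four categories were sampled over the same interval; `cpu_percentages` is a hypothetical helper and the figures are illustrative values for a mostly idle single-CPU system sampled every 5 seconds.

```python
def cpu_percentages(tick_deltas):
    """Convert per-category tick deltas for one interval into percentages."""
    total = sum(tick_deltas.values())
    if total == 0:
        # No ticks recorded in the interval, so nothing can be apportioned.
        return {name: 0.0 for name in tick_deltas}
    return {name: 100.0 * ticks / total for name, ticks in tick_deltas.items()}


# Illustrative deltas: roughly 500 ticks in a 5 second interval.
sample = {"user": 0, "nice": 0, "system": 2, "idle": 498}
for name, percent in cpu_percentages(sample).items():
    print(name, round(percent, 1))
```

Dividing by the observed total rather than by a fixed 100 ticks per second keeps the percentages sensible when a sampling interval is slightly longer or shorter than intended.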
Once the data has been collected it may be exported to Excel for detailed
analysis. In attempting this analysis, one issue that needs to be dealt
with is that the values for the different queries arrive back at the collector at slightly different
times, and so appear on separate rows of the export.
The following data was collected using a 5s sampling rate on a system
that is mostly idle:
| Time | User Ticks (Task Group 0) | Nice Ticks (Task Group 0) | System Ticks (Task Group 0) | Idle Ticks (Task Group 0) |
| 00:06 | 0 | 152940 | 1141326 | 3.72E+08 |
| 00:11 | 0 | 0 | | 498 |
| 00:12 | | | 2 | |
| 00:16 | 1 | 0 | | 498 |
| 00:17 | | | 1 | |
| 00:21 | 0 | 0 | | 500 |
| 00:22 | | | 1 | |
When the system is busy the data can also get further out of step. In
interpreting the information it is important to remember that the underlying
data is a running total for each category, and not a snapshot value. Where CPU
usage figures are required for an operation, rather than a
visual indication or general validation, it may be worthwhile not using
delta values and instead subtracting the totals at the start and end of the operation. Alternatively, the delta values may be added back together
between two points in time.
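Both steps, lining up the rows that arrive out of step and adding the deltas back together, can be done in Excel, but a short script shows the idea. This is a sketch under stated assumptions: the rows are the table above with times converted to seconds and the initial peak at 00:06 omitted, `merge_rows` and `total_ticks` are hypothetical helpers, and the 2 second tolerance is an arbitrary choice that suits a 5 second sampling rate.

```python
def merge_rows(rows, tolerance=2):
    """Merge export rows that fall within `tolerance` seconds of the first row of a group."""
    merged = []
    for seconds, values in rows:
        filled = {k: v for k, v in values.items() if v is not None}
        if merged and seconds - merged[-1][0] <= tolerance:
            merged[-1][1].update(filled)  # late arrival for the same sample
        else:
            merged.append((seconds, dict(filled)))
    return merged


def total_ticks(merged, start, end):
    """Add the delta values back together between two points in time (inclusive)."""
    totals = {}
    for seconds, values in merged:
        if start <= seconds <= end:
            for name, ticks in values.items():
                totals[name] = totals.get(name, 0) + ticks
    return totals


# Rows from the table above; None marks an empty cell.
rows = [
    (11, {"user": 0, "nice": 0, "system": None, "idle": 498}),
    (12, {"user": None, "nice": None, "system": 2, "idle": None}),
    (16, {"user": 1, "nice": 0, "system": None, "idle": 498}),
    (17, {"user": None, "nice": None, "system": 1, "idle": None}),
    (21, {"user": 0, "nice": 0, "system": None, "idle": 500}),
    (22, {"user": None, "nice": None, "system": 1, "idle": None}),
]
merged = merge_rows(rows)
print(total_ticks(merged, 11, 22))
```

For these three samples the totals come to 1501 ticks over 15 seconds, which agrees with the figure of about 100 ticks per second on a single-CPU system.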