Solaris Kernel Statistics - Accessing libkstat with C

_침묵_ 2007. 6. 28. 23:42

Source: http://developers.sun.com/solaris/articles/kstatc.html

By Peter Boothby, July 2001 

Contents:
Introduction
Data Structure Overview
Getting Started
Data Types
kstat Names
Functions
Dealing with Chain Updates
Putting It All Together

Introduction

The Solaris kernel provides a set of functions and data structures for device drivers and other kernel modules to export module-specific statistics to the outside world. This infrastructure, referred to as kstat, provides the Solaris software developer with:

  • C-language functions for device drivers and other kernel modules to present statistics
  • C-language functions for applications to retrieve statistics data from Solaris without needing to directly read kernel memory
  • A Perl-based command-line program, /usr/bin/kstat, to access statistics data interactively or in shell scripts (introduced in Solaris 8)

The Solaris libkstat library contains the C-language functions for accessing kstats from an application. These functions use the pseudo-device /dev/kstat to provide a secure interface to kernel data, obviating the need for programs that are setuid to root.

Since many developers are interested in accessing kernel statistics through C programs, this article focuses on using libkstat. It explains the data structures and functions, and provides example code to get you started using the library.

Data Structure Overview

Solaris kernel statistics are maintained in a linked list of structures referred to as the kstat chain. Each kstat has a common header section and a type-specific data section.

Figure 1: kstat Chain

The chain is initialized at system boot time, but since Solaris is a dynamic operating system, this chain may change over time. kstat entries can be added and removed from the system as needed by the kernel. For example, when adding an I/O board and all of its attached components to a running system using Dynamic Reconfiguration, the device drivers and other kernel modules that interact with the new hardware will insert kstat entries into the chain.

The structure member ks_data is a pointer to the kstat's data section. Multiple data types are supported: raw, named, timer, interrupt, and I/O. These are explained in Data Types.

Example 1 shows the full kstat header structure.

Example 1 - kstat Header Structure from /usr/include/kstat.h

typedef struct kstat { 
       /*     
        * Fields relevant to both kernel and user    
        */   
       hrtime_t       ks_crtime;               /* creation time */
       struct kstat  *ks_next;                 /* kstat chain linkage */
       kid_t          ks_kid;                  /* unique kstat ID */
       char           ks_module[KSTAT_STRLEN]; /* module name */
       uchar_t        ks_resv;                 /* reserved */
       int            ks_instance;             /* module's instance */
       char           ks_name[KSTAT_STRLEN];   /* kstat name */
       uchar_t        ks_type;                 /* kstat data type */
       char           ks_class[KSTAT_STRLEN];  /* kstat class */
       uchar_t        ks_flags;                /* kstat flags */
       void          *ks_data;                 /* kstat type-specific data */
       uint_t         ks_ndata;                /* # of data records */
       size_t         ks_data_size;            /* size of kstat data section */
       hrtime_t       ks_snaptime;             /* time of last data snapshot */
       /*    
        * Fields relevant to kernel only 
        */ 
       int (*ks_update)(struct kstat *, int); 
       void           *ks_private; 
       int (*ks_snapshot)(struct kstat *, void *, int); 
       void           *ks_lock; 
     } kstat_t;

The significant members include:

  • ks_crtime

    This reflects the time the kstat was created, and allows you to compute the rates of various counters since the kstat was created ("rate since boot" is replaced by the more general concept of "rate since kstat creation").

    All times associated with kstats, such as creation time, last snapshot time, kstat_timer_t and kstat_io_t timestamps, and the like, are 64-bit nanosecond values.

    The accuracy of kstat timestamps is machine dependent, but the precision (units) is the same across all platforms. Refer to the gethrtime(3C) man page for general information about high-resolution timestamps.

  • ks_next

    kstats are stored as a NULL-terminated linked list, or chain. ks_next points to the next kstat in the chain.

  • ks_kid

    This is a unique identifier for the kstat.

  • ks_module and ks_instance

    These contain the name and instance of the module that created the kstat. In cases where there can only be one instance, ks_instance is 0. Refer to kstat Names for more information.

  • ks_name

    This gives a meaningful name to a kstat. For additional kstat namespace information, see kstat Names.

  • ks_type

    This is the type of data in this kstat. kstat data types are covered in Data Types.

  • ks_class

    Each kstat can be characterized as belonging to some broad class of statistics, such as bus, disk, net, vm, or misc. This field can be used as a filter to extract related kstats.

    The following values are currently in use by Solaris:

    bus
    controller
    device_error
    disk
    hat
    kmem_cache
    kstat
    misc
    net
    nfs
    pages
    partition
    rpc
    ufs
    vm
    vmem

  • ks_data, ks_ndata, and ks_data_size

    ks_data is a pointer to the kstat's data section. The type of data stored there depends on ks_type.

    ks_ndata indicates the number of data records. Only some kstat types support multiple data records.

    The following kstats support multiple data records:

    • KSTAT_TYPE_RAW
    • KSTAT_TYPE_NAMED
    • KSTAT_TYPE_TIMER

    The following kstats support only one data record:

    • KSTAT_TYPE_INTR
    • KSTAT_TYPE_IO

    ks_data_size is the total size of the data section, in bytes.

  • ks_snaptime

    This is the timestamp for the last data snapshot. It allows you to compute activity rates based on the following computational method:

    rate = (new_count - old_count) / (new_snaptime - old_snaptime);
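
The rate formula above can be sketched in C as follows. Because snapshot times are nanosecond values, the sketch scales the result to events per second. The type sample_t, the constant NANOSEC_PER_SEC, and the function rate_per_second() are hypothetical names used only for illustration; they are not part of libkstat. In a real program the counter would be copied from the kstat's data section and the timestamp from ks_snaptime after each read.

```c
#include <assert.h>
#include <stdint.h>

#define NANOSEC_PER_SEC 1000000000.0

/* Hypothetical holder for one observation; not a libkstat type. */
typedef struct sample {
    uint64_t count;     /* counter value copied from the kstat data */
    int64_t  snaptime;  /* ks_snaptime of that read, in nanoseconds */
} sample_t;

/* Events per second between two observations of the same counter. */
double
rate_per_second(sample_t older, sample_t newer)
{
    /* the newer snapshot must be strictly later than the older one */
    assert(newer.snaptime > older.snaptime);

    double delta_count = (double)(newer.count - older.count);
    double delta_ns = (double)(newer.snaptime - older.snaptime);

    return (delta_count * NANOSEC_PER_SEC / delta_ns);
}
```

For example, a counter that grows from 1000 to 4000 while the snapshot time advances by two seconds yields a rate of 1500 per second.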

Getting Started

To use kstats, a program must first call kstat_open(), which returns a pointer to a kstat control structure. Example 2 shows the structure members.

Example 2 - kstat Chain Control Structure

typedef struct kstat_ctl {   
         kid_t     kc_chain_id;    /* current kstat chain ID */   
         kstat_t   *kc_chain;      /* pointer to kstat chain */   
         int       kc_kd;          /* /dev/kstat descriptor */  
     } kstat_ctl_t;

kc_chain points to the head of your copy of the kstat chain. You typically walk the chain or use kstat_lookup() to find and process a particular kind of kstat. kc_chain_id is the kstat chain identifier, or KCID, of your copy of the kstat chain. Its use is explained in Dealing with Chain Updates.

To avoid unnecessary overhead accessing kstat data, a program first searches the kstat chain for the type of information of interest, then uses the kstat_read() and kstat_data_lookup() functions to get the statistics data from the kernel.

Example 3 is a code fragment that shows how you might print out all kstat entries with information about disk I/O. It traverses the entire chain looking for kstats of ks_type KSTAT_TYPE_IO, calls kstat_read() to retrieve the data, and then processes the data with my_io_display(). How to implement this sample function is shown in Example 9.

Example 3 - Print kstat Entries with Disk I/O Information

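     kstat_ctl_t    *kc;
     kstat_t        *ksp;
     kstat_io_t      kio;

     /* kstat_open() returns NULL on failure */
     if ((kc = kstat_open()) == NULL) {
        perror("kstat_open");
        exit(1);
     }
     for (ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
        /* skip kstats that are not I/O kstats or that cannot be read */
        if (ksp->ks_type == KSTAT_TYPE_IO &&
            kstat_read(kc, ksp, &kio) != -1) {
           my_io_display(ksp->ks_name, ksp->ks_class, kio);
        }
     }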

Data Types

The data section of a kstat can hold one of five types, identified in the ks_type field. The following kstat types can hold multiple records. The number of records is held in ks_ndata.

  • KSTAT_TYPE_RAW
  • KSTAT_TYPE_NAMED
  • KSTAT_TYPE_TIMER

The field ks_data_size holds the size, in bytes, of the entire data section.

KSTAT_TYPE_RAW

The "raw" kstat type is treated as an array of bytes, and is generally used to export well-known structures, such as vminfo (defined in /usr/include/sys/sysinfo.h). Example 4 shows one method of printing this information.

Example 4 - Dumping Out a Raw kstat

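static void print_vminfo(kstat_t *kp)
{
     vminfo_t *vminfop;
     vminfop = (vminfo_t *)(kp->ks_data);

     /* cast so the format matches whatever integer type the field has */
     printf("Free memory: %llu\n", (unsigned long long)vminfop->freemem);
     printf("Swap reserved: %llu\n", (unsigned long long)vminfop->swap_resv);
     printf("Swap allocated: %llu\n", (unsigned long long)vminfop->swap_alloc);
     printf("Swap available: %llu\n", (unsigned long long)vminfop->swap_avail);
     printf("Swap free: %llu\n", (unsigned long long)vminfop->swap_free);
}

KSTAT_TYPE_NAMED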

This type of kstat contains a list of arbitrary name=value statistics. Example 5 shows the data structure used to hold named kstats.

Example 5 - Named kstat Definitions from /usr/include/kstat.h

typedef struct kstat_named { 
    char name[KSTAT_STRLEN];         /* name of counter */ 
    uchar_t data_type;               /* data type */ 
    union { 
        char c[16];             /* enough for 128-bit ints */ 
        int32_t i32; 
        uint32_t ui32; 
        int64_t i64; 
        uint64_t ui64; 
        /* These structure members are obsolete */ 
        int32_t l; 
        uint32_t ul; 
        int64_t ll; 
        uint64_t ull; 
    } value;                  /* value of counter */ 
} kstat_named_t; 
  #define KSTAT_DATA_CHAR       0 
  #define KSTAT_DATA_INT32      1 
  #define KSTAT_DATA_UINT32     2 
  #define KSTAT_DATA_INT64      3 
  #define KSTAT_DATA_UINT64     4 
  /* These types are obsolete */ 
  #define KSTAT_DATA_LONG       1 
  #define KSTAT_DATA_ULONG      2 
  #define KSTAT_DATA_LONGLONG   3 
  #define KSTAT_DATA_ULONGLONG  4 
  #define KSTAT_DATA_FLOAT      5 
  #define KSTAT_DATA_DOUBLE     6 

The program in Example 9 uses a function my_named_display() to show how one might display named kstats.

Note that if the type is KSTAT_DATA_CHAR, the 16-byte value field is not guaranteed to be null-terminated. This is important to remember when printing the value with functions like printf().
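
One way to handle the unterminated case is to copy the 16 bytes into a slightly larger buffer and add the terminator yourself. The following is a minimal sketch under that assumption; copy_kstat_chars() and VALUE_CHARS are hypothetical names, not libkstat interfaces, and the source pointer stands in for the value.c member of a kstat_named_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VALUE_CHARS 16   /* size of the value.c array in kstat_named_t */

/*
 * Copy a possibly unterminated 16-byte field into dst and terminate it.
 * dst must have room for at least VALUE_CHARS + 1 bytes.
 * Returns the length of the resulting string.
 */
size_t
copy_kstat_chars(char *dst, size_t dstlen, const char *src)
{
    assert(dstlen >= VALUE_CHARS + 1);

    memcpy(dst, src, VALUE_CHARS);
    dst[VALUE_CHARS] = '\0';   /* covers the case where all 16 bytes are used */
    return (strlen(dst));      /* stops early if src had its own terminator */
}
```

When the value only needs to be printed, a precision in the format string, as in printf("%.16s", ...) in Example 9, avoids the copy.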

KSTAT_TYPE_TIMER

This kstat holds event timer statistics. These provide basic counting and timing information for any type of event.

Example 6 - Timer kstat Definitions from /usr/include/kstat.h

typedef struct kstat_timer {  
    char name[KSTAT_STRLEN];         /* event name */  
    uchar_t resv;                    /* reserved */  
    u_longlong_t num_events;         /* number of events */      
    hrtime_t elapsed_time;           /* cumulative elapsed time */  
    hrtime_t min_time;               /* shortest event duration */  
    hrtime_t max_time;               /* longest event duration */  
    hrtime_t start_time;             /* previous event start time */  
    hrtime_t stop_time;              /* previous event stop time */  
} kstat_timer_t;

KSTAT_TYPE_INTR

This type of kstat holds interrupt statistics. Interrupts are categorized as:

Interrupt Type      Definition
Hard                Sourced from the hardware device itself
Soft                Induced by the system by means of some system interrupt source
Watchdog            Induced by a periodic timer call
Spurious            An interrupt entry point was entered but there was no interrupt to service
Multiple Service    An interrupt was detected and serviced just prior to returning from any of the other types

Example 7 - Interrupt kstat Definitions from /usr/include/kstat.h

#define KSTAT_INTR_HARD      0   
#define KSTAT_INTR_SOFT      1   
#define KSTAT_INTR_WATCHDOG  2   
#define KSTAT_INTR_SPURIOUS  3   
#define KSTAT_INTR_MULTSVC   4   
#define KSTAT_NUM_INTRS      5   
typedef struct kstat_intr {           
    uint_t intrs[KSTAT_NUM_INTRS]; /* interrupt counters */   
} kstat_intr_t;

KSTAT_TYPE_IO

Example 8 - I/O kstat Definitions from /usr/include/kstat.h

typedef struct kstat_io {  
 /*  
  * Basic counters.  
  */  
  u_longlong_t nread;    /* number of bytes read */  
  u_longlong_t nwritten; /* number of bytes written */  
  uint_t       reads;    /* number of read operations */  
  uint_t       writes;   /* number of write operations */  
  hrtime_t wtime;           /* cumulative wait (pre-service) time */  
  hrtime_t wlentime;        /* cumulative wait length*time product*/  
  hrtime_t wlastupdate;     /* last time wait queue changed */  
  hrtime_t rtime;           /* cumulative run (service) time */  
  hrtime_t rlentime;        /* cumulative run length*time product */  
  hrtime_t rlastupdate;     /* last time run queue changed */  
  uint_t wcnt;              /* count of elements in wait state */  
  uint_t rcnt;              /* count of elements in run state */  
} kstat_io_t;

Accumulated Time and Queue Length Statistics

Time statistics are kept as a running sum of "active" time. Queue length statistics are kept as a running sum of the product of queue length and elapsed time at that length; that is, a Riemann sum for queue length integrated against time. Figure 2 provides a sample graphical representation of queue/time.

Figure 2: Queue Length Sampling

At each change of state (either an entry or exit from the queue), the elapsed time since the previous state change is added to the active time (the wtime or rtime field) if the queue length was non-zero during that interval.

The product of the elapsed time and the queue length is added to the running sum of the length multiplied by the time (the wlentime or rlentime field).

Stated programmatically:

if (queue length != 0) {  
    time += elapsed time since last state change;  
    lentime +=  (elapsed time since last state change * queue length);  
}
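
The pseudocode above can be turned into a small, self-contained C routine. This is only a sketch of the accounting rule, not the kernel's implementation: queue_stats_t and queue_update() are hypothetical names, and the fields are modeled loosely on the wait-queue members of kstat_io_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator modeled on the wait-queue fields of kstat_io_t. */
typedef struct queue_stats {
    int64_t time;        /* cumulative time the queue was non-empty (ns) */
    int64_t lentime;     /* cumulative queue length * time product */
    int64_t lastupdate;  /* time of the previous state change (ns) */
    int     count;       /* current queue length */
} queue_stats_t;

/*
 * Record a state change at time now: delta is +1 when a request enters
 * the queue and -1 when one leaves it.
 */
void
queue_update(queue_stats_t *q, int64_t now, int delta)
{
    int64_t elapsed = now - q->lastupdate;

    assert(elapsed >= 0);
    assert(delta == 1 || (delta == -1 && q->count > 0));

    /* only intervals with a non-empty queue contribute */
    if (q->count != 0) {
        q->time += elapsed;
        q->lentime += elapsed * q->count;
    }
    q->lastupdate = now;
    q->count += delta;
}
```

Note that the interval is charged at the queue length that was in effect before the change, which is what makes the running sum a Riemann sum of length over time.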

This method can be generalized to measuring residency in any defined system. Instead of queue lengths, think of "outstanding RPC calls to server X."

A large number of I/O subsystems have at least two basic "lists" of transactions they manage:

  1. A list for transactions that have been accepted for processing, but for which processing has yet to begin
  2. A list for transactions which are actively being processed, but are not complete

For these reasons, two cumulative time statistics are defined:

  1. Pre-service (wait) time
  2. Service (run) time

The units of cumulative busy time are accumulated nanoseconds.
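
Given two snapshots of the same I/O kstat, the cumulative fields can be differenced to obtain per-interval figures. The sketch below assumes the interval is measured with ks_snaptime and derives the busy fraction and the time-weighted average run queue length. The type io_sample_t and both function names are hypothetical; only the field names rtime and rlentime come from kstat_io_t. The same arithmetic applies to wtime and wlentime.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of the fields needed from one kstat_io_t read. */
typedef struct io_sample {
    int64_t rtime;     /* cumulative run (service) time, in ns */
    int64_t rlentime;  /* cumulative run length * time product */
    int64_t snaptime;  /* ks_snaptime for this read, in ns */
} io_sample_t;

/* Fraction of the interval during which the run queue was non-empty. */
double
busy_fraction(const io_sample_t *older, const io_sample_t *newer)
{
    assert(newer->snaptime > older->snaptime);
    return ((double)(newer->rtime - older->rtime) /
        (double)(newer->snaptime - older->snaptime));
}

/* Time-weighted average run queue length over the same interval. */
double
avg_run_queue_length(const io_sample_t *older, const io_sample_t *newer)
{
    assert(newer->snaptime > older->snaptime);
    return ((double)(newer->rlentime - older->rlentime) /
        (double)(newer->snaptime - older->snaptime));
}
```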

kstat Names

The kstat namespace is defined by three fields from the kstat structure:

  • ks_module
  • ks_instance
  • ks_name

The combination of these three fields is guaranteed to be unique.

For example, imagine a system with four FastEthernet interfaces. The device driver module for Sun's FastEthernet controller is called "hme". The first Ethernet interface would be instance 0, the second instance 1, and so on. The "hme" driver provides two types of kstat for each interface. The first contains named kstats with performance statistics. The second contains interrupt statistics.

The kstat data for the first interface's network statistics is found under ks_module == "hme", ks_instance == 0, and ks_name == "hme0". The interrupt statistics are contained in a kstat identified by ks_module == "hme", ks_instance == 0, and ks_name == "hmec0".

In that example, the combination of module name and instance number to make the ks_name field ("hme0" and "hmec0") is simply a convention for this driver. Other drivers may use similar naming conventions to publish multiple kstat data types but are not required to; the module is required to make sure that the combination is unique.

How do you determine what kstats the kernel provides? One of the easiest ways, with Solaris 8, is to run /usr/bin/kstat with no arguments. This will print nearly all the current kstat data. The Solaris kstat command can dump most of the known kstats of type KSTAT_TYPE_RAW.

Functions

The following functions are available to C programs for accessing kstat data from user programs:

  • kstat_ctl_t * kstat_open(void);

    Initializes a kstat control structure to provide access to the kernel statistics library. It returns a pointer to this structure, which must be supplied as the kc argument in subsequent libkstat function calls.

  • kstat_t * kstat_lookup(kstat_ctl_t *kc, char *ks_module, int ks_instance, char *ks_name);

    Traverses the kstat chain searching for a kstat with the given ks_module, ks_instance, and ks_name fields. If ks_module is NULL, ks_instance is -1, or ks_name is NULL, then the corresponding field is ignored in the search. For example, kstat_lookup(kc, NULL, -1, "foo") will simply find the first kstat with the name "foo".

  • void * kstat_data_lookup(kstat_t *ksp, char *name);

    Searches the kstat's data section for the record with the specified name. This operation is valid only for kstat types that have named data records. Currently, only the KSTAT_TYPE_NAMED and KSTAT_TYPE_TIMER kstats have named data records. You must first call kstat_read() to get the data from the kernel. This routine is then used to find a particular record in the data section.

  • kid_t kstat_read(kstat_ctl_t *kc, kstat_t *ksp, void *buf);

    Gets data from the kernel for a particular kstat.

  • kid_t kstat_write(kstat_ctl_t *kc, kstat_t *ksp, void *buf);

    Writes data to a particular kstat in the kernel. Only the superuser can use kstat_write().

  • kid_t kstat_chain_update(kstat_ctl_t *kc);

    Brings the user's kstat header chain in sync with that of the kernel.

  • int kstat_close(kstat_ctl_t *kc);

    Frees all resources that were associated with the kstat control structure. This is done automatically on exit(2) and execve(). (For more information on exit(2) and execve(), see the exec(2) man page.)
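
To make the wildcard rule described for kstat_lookup() concrete, the following sketch applies the same matching logic to a simplified chain. It is not the library's implementation: demo_kstat_t and demo_lookup() are hypothetical stand-ins that carry only the three naming fields, so the example can be compiled without kstat.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for kstat_t with only the naming fields. */
typedef struct demo_kstat {
    struct demo_kstat *next;      /* chain linkage, NULL at the end */
    const char        *module;
    int                instance;
    const char        *name;
} demo_kstat_t;

/*
 * Return the first entry that matches every field that was specified.
 * A NULL module, an instance of -1, or a NULL name matches anything.
 */
demo_kstat_t *
demo_lookup(demo_kstat_t *chain, const char *module, int instance,
    const char *name)
{
    demo_kstat_t *ksp;

    assert(instance >= -1);   /* -1 is the only valid wildcard value */

    for (ksp = chain; ksp != NULL; ksp = ksp->next) {
        if ((module == NULL || strcmp(ksp->module, module) == 0) &&
            (instance == -1 || ksp->instance == instance) &&
            (name == NULL || strcmp(ksp->name, name) == 0))
            return (ksp);
    }
    return (NULL);            /* nothing in the chain matched */
}
```

With a chain holding the "hme" entries from kstat Names, a call such as demo_lookup(chain, NULL, -1, "hmec0") returns the interrupt kstat for instance 0, while passing wildcards for all three fields returns the head of the chain.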

Dealing with Chain Updates

As mentioned in Data Structure Overview, the kstat chain is dynamic in nature. The libkstat library function kstat_open() returns a copy of the kernel's kstat chain. Since the content of the kernel's chain may change, your program should call the kstat_chain_update() function at the appropriate times to see if its private copy of the chain is the same as the kernel's. This is the purpose of the KCID (stored in kc_chain_id in the kstat control structure).

Each time a kernel module adds or removes a kstat from the system's chain, the KCID is incremented. When your program calls kstat_chain_update(), the function checks to see if the kc_chain_id in your program's control structure matches the kernel's. If not, kstat_chain_update() rebuilds your program's local kstat chain. It returns:

  • The new KCID, if the chain has been updated
  • 0 if no change has been made
  • -1 if some error was detected

If your program has cached some local data from previous calls to the kstat library, then a new KCID acts as a flag indicating that the cached data may be out of date. You can search the chain again to see if data that your program is interested in has been added or removed.

A practical example is the system command iostat. It caches some internal data about the disks in the system and needs to recognize if a disk has been brought on-line or off-line. If iostat is called with an interval argument, it prints I/O statistics every interval seconds. Each time through the loop, it makes a call to kstat_chain_update() to see if something has changed. If a change took place, it figures out if a device it's interested in has been added or removed.
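
The three possible return values can be handled along the following lines. This is a sketch of one possible caching policy, not how iostat is actually written: demo_kid_t, demo_cache_t, and demo_handle_update() are hypothetical names, and the rv argument stands for the value a program would get back from kstat_chain_update().

```c
#include <assert.h>
#include <stddef.h>

typedef long demo_kid_t;      /* stand-in for kid_t */

/* Hypothetical record of what the program has cached from the chain. */
typedef struct demo_cache {
    demo_kid_t kcid;    /* KCID the cached data was built against */
    int        valid;   /* nonzero while cached lookups can be trusted */
} demo_cache_t;

/*
 * React to the value returned by kstat_chain_update().
 * Returns 0 on success and -1 if the update itself failed.
 */
int
demo_handle_update(demo_cache_t *cache, demo_kid_t rv)
{
    assert(cache != NULL);

    if (rv == -1)
        return (-1);      /* error: leave the cache untouched */
    if (rv == 0)
        return (0);       /* chain unchanged: cache is still valid */

    /* new KCID: cached kstat pointers may be stale, so search again */
    cache->kcid = rv;
    cache->valid = 0;
    return (0);
}
```

A caller would check the valid flag after each update and repeat its lookups whenever the flag has been cleared.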

Putting It All Together

Your C source file must contain:

#include <kstat.h>

When your program is linked, the compiler command line must include the argument -lkstat.

cc -o print_some_kstats -lkstat print_some_kstats.c

The following is a short example program. First, it uses kstat_lookup() and kstat_read() to find the system's CPU speed. Then it goes into an infinite loop to print a small amount of information about all kstats of type KSTAT_TYPE_IO. Note that at the top of the loop, it calls kstat_chain_update() to check that you have current data. If the kstat chain has changed, it gives a short message on stderr.

Example 9 - Sample Program to Print kstats of Different Types

/*  print_some_kstats.c:
 *  print out a couple of interesting things
 */
#include <kstat.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <inttypes.h>
#define SLEEPTIME 10

void my_named_display(char *, char *, kstat_named_t *);
void my_io_display(char *, char *, kstat_io_t);

int
main(int argc, char **argv)
{
     kstat_ctl_t   *kc;
     kstat_t       *ksp;
     kstat_io_t     kio;
     kstat_named_t *knp;
     kid_t          kcid;

     if ((kc = kstat_open()) == NULL) {
          perror("kstat_open");
          exit(1);
     }

     /*
      * Print out the CPU speed. We make two assumptions here:
      * 1) All CPUs are the same speed, so we'll just search for the
      *    first one;
      * 2) At least one CPU is online, so our search should always
      *    find something. We still check, in case it does not.
      */
     ksp = kstat_lookup(kc, "cpu_info", -1, NULL);
     if (ksp == NULL || kstat_read(kc, ksp, NULL) == -1) {
          perror("cpu_info");
          exit(1);
     }
     /* lookup the CPU speed data record */
     knp = kstat_data_lookup(ksp, "clock_MHz");
     if (knp == NULL) {
          perror("clock_MHz");
          exit(1);
     }
     printf("CPU speed of system is ");
     my_named_display(ksp->ks_name, ksp->ks_class, knp);
     printf("\n");

     /* dump some info about all I/O kstats every
        SLEEPTIME seconds  */
     while(1) {
         /* make sure we have current data */
         kcid = kstat_chain_update(kc);
         if (kcid == -1) {
              perror("kstat_chain_update");
              exit(1);
         }
         if (kcid != 0)
              fprintf(stderr, "<<State Changed>>\n");
         for (ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
           /* skip kstats that are not I/O kstats or cannot be read */
           if (ksp->ks_type == KSTAT_TYPE_IO &&
               kstat_read(kc, ksp, &kio) != -1) {
              my_io_display(ksp->ks_name, ksp->ks_class, kio);
           }
         }
         sleep(SLEEPTIME);
     } /* while(1) */

}

void my_io_display(char *devname, char *class, kstat_io_t k)
{
     printf("Name: %s Class: %s\n",devname,class);
     printf("\tnumber of bytes read %llu\n", k.nread);
     printf("\tnumber of bytes written %llu\n", k.nwritten);
     printf("\tnumber of read operations %u\n", k.reads);
     printf("\tnumber of write operations %u\n\n", k.writes);
}

void
my_named_display(char *devname, char *class, kstat_named_t *knp)
{
     switch(knp->data_type) {
     case KSTAT_DATA_CHAR:
          /* value.c may not be null-terminated; limit to 16 bytes */
          printf("%.16s",knp->value.c);
          break;
     case KSTAT_DATA_INT32:
          printf("%" PRId32,knp->value.i32);
          break;
     case KSTAT_DATA_UINT32:
          printf("%" PRIu32,knp->value.ui32);
          break;
     case KSTAT_DATA_INT64:
          printf("%" PRId64,knp->value.i64);
          break;
     case KSTAT_DATA_UINT64:
          printf("%" PRIu64,knp->value.ui64);
          break;
     default:
          /* obsolete float/double types are not printed */
          break;
     }
}

Additional Information

Much of the information in this paper derives from various SunSolve InfoDocs, Solaris white papers, and Solaris man pages (section 3KSTAT). For detailed information on the APIs, refer to the Solaris 8 Reference Manual Collection and Writing Device Drivers. Both publications are available at: docs.sun.com.
