Sunday, March 11, 2012

Performance Monitoring on Operating System Level

It is that time of year again in my office: we have to fill in the forms for this year’s goals. I cannot say that I enjoy filling in these forms, but it is one of those things you have to do whether you like it or not, because it is part of your job.

Some of our goals must be aligned with the direction of the business. One of them is to make our application faster, so performance and performance-related tasks are my main project at work this year. We aim to boost our performance to single-digit milliseconds, if not nanoseconds.

In software development, performance enhancement is usually not the focus of the first release or the early iterations, unless performance is a vital feature of the application. Of course, as a best practice, performance is generally considered during the design stage. But initially the main goal is to ship the product with reasonable performance. Once the application is released, its performance is improved alongside the work on new features, provided that the application is a success and some people are still using it :) At the moment, at work, we are in the performance tuning stage for our application.

The first step in performance enhancement is to gather performance-related measurements, so that you have quantitative data to determine which component of the application needs to be addressed. This initial data also serves as the baseline against which later improvements are measured. The measurements are collected by profiling the application, looking at OS-level data and benchmarking.

OS-level data gives you first-hand knowledge of how your resources are utilized. You may be able to resolve some issues simply by increasing your resources (CPU, memory, disk, network) within a given budget. In my current company, new hardware is not bought unless the existing hardware is broken or its utilization exceeds a given threshold. For example, to get a new server with a better CPU specification, CPU utilization on the current one must generally be above 40% on a normal trading day. Fair enough! There is no need to waste money on new toys if applications do not already utilise the existing resources. So, in performance analysis, the first step is to measure how an application utilises resources such as CPU, memory, network and disk.

In this article, I will list commonly used tools for monitoring performance-related data at the OS level (Windows and Linux). I am not going to dive into how to use these tools, as there are many related online and offline resources (man pages, docs …). If you want more detail, I can also recommend a new book, Java Performance by C. Hunt and B. John, on which this and upcoming performance-related posts will be based.

CPU Utilization
Monitoring at the operating system level allows us to see how an application utilizes CPU cycles. For example, if a multithreaded application saturates the CPU, that issue needs to be resolved before considering an increase in the number of CPUs.

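When monitoring CPU utilization, two measurements have to be collected: user CPU utilization and kernel, or system (sys), CPU utilization. The time spent running the application’s own code is reported as user utilization by the tools, while the time the operating system spends working on its behalf (for example in system calls) is reported as kernel utilization.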
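
As a rough illustration of the user/system split, here is a minimal Python sketch (Python is used only for illustration) that reads both values for the current process with the standard os.times() call. The names cpu_split and busy_loop are hypothetical helpers, and the workload is an arbitrary busy loop.

```python
import os


def cpu_split(work):
    """Run work() and return (user_seconds, system_seconds) it consumed."""
    before = os.times()
    work()
    after = os.times()
    return after.user - before.user, after.system - before.system


def busy_loop():
    # Pure computation: expected to show up mostly as user time.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


if __name__ == "__main__":
    user, system = cpu_split(busy_loop)
    print(f"user={user:.3f}s sys={system:.3f}s")
```

A workload dominated by file or network calls would be expected to shift more of the time into the system figure.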

Windows
There are three tools for monitoring CPU performance in Windows: Task Manager, perfmon and typeperf.

Task Manager is widely known and only a click away on the desktop. The Performance tab of Task Manager shows CPU graphs along with memory measurements. To display kernel utilization, you need to enable it via View → Show Kernel Times. Kernel utilization is then drawn as a red line in the charts, and user utilization is the difference between the total line and the red line.

Task Manager does not provide low-level measurements. For those, the perfmon tool is needed. This is a very advanced tool, and it can graph many measurements, which you select from the Add Counters … window. For example, to display the user and kernel time of a processor, select % User Time and % Privileged Time under the Processor item.

Perfmon is a graphical interface. To automate monitoring in a batch script, there is typeperf, a command-line interface to the same counters. For example, the following command shows user, kernel and total CPU utilization at a 5-second interval:
typeperf -si 5 "\Processor(_Total)\% User Time" "\Processor(_Total)\% Privileged Time" "\Processor(_Total)\% Processor Time"

Linux
On Linux-type operating systems, there are two main tools for displaying CPU utilization (and other measurements too): vmstat and top.
In vmstat, CPU-related measurements are shown in the us, sy, id and wa columns under the cpu header:
  • us: Time spent (%) running non-kernel code. (user time, including nice time)
  • sy: Time spent (%) running kernel code. (system time)
  • id: Time spent (%) idle. Prior to Linux 2.5.41, this includes IO-wait time.
  • wa: Time spent (%) waiting for IO.
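
To show how these columns could be picked out in a script, here is a minimal Python sketch that maps the vmstat column names to the values of one sample row. The function names are hypothetical, and the HEADER and ROW strings are illustrative values typed in by hand, not a real measurement.

```python
def parse_vmstat(header_line, value_line):
    """Map vmstat column names to integer values for one sample row."""
    names = header_line.split()
    values = [int(v) for v in value_line.split()]
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    return dict(zip(names, values))


def cpu_breakdown(sample):
    """Return only the CPU percentages: us, sy, id and wa."""
    return {key: sample[key] for key in ("us", "sy", "id", "wa")}


# Illustrative values only, not a real measurement.
HEADER = " r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa"
ROW = " 2  0      0 812340  10240 402112    0    0     3     8  150  320 12  4 83  1"

if __name__ == "__main__":
    print(cpu_breakdown(parse_vmstat(HEADER, ROW)))
```

Looking columns up by name rather than by position keeps the sketch working if a vmstat version adds a column.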

The top command shows similar CPU measurements at the top of its screen.

CPU Schedule Run Queue
CPU scheduler run queues hold lightweight processes that are ready to run but are waiting for a CPU. The queue grows when more lightweight processes are ready to run than the system can handle, so the queue size is an indicator of performance problems in the system. Generally, if the queue length is 3 or 4 times the number of processors (in Java, the value returned by Runtime.getRuntime().availableProcessors()), it can be assumed that the system cannot keep up with the lightweight processes. To resolve this, either increase the number of CPUs or optimize the source code of the applications to use fewer CPU cycles.

In Windows, the following typeperf command shows the run queue length:
typeperf -si 5 "\System\Processor Queue Length"
In vmstat, the first column, r, shows the number of runnable lightweight processes (running or waiting for run time).
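
The rule of thumb about queue length versus processor count can be written down as a small check. This is only a sketch: the function name run_queue_saturated and the default factor of 3 are assumptions for illustration, and os.cpu_count() stands in for the processor count.

```python
import os


def run_queue_saturated(run_queue_length, processors=None, factor=3):
    """Return True when the run queue is at least `factor` times the processor count."""
    if processors is None:
        # os.cpu_count() may return None on unusual platforms; fall back to 1.
        processors = os.cpu_count() or 1
    return run_queue_length >= factor * processors


if __name__ == "__main__":
    # Example: a queue length of 12 observed on a 4-processor machine.
    print(run_queue_saturated(12, processors=4))
```

A single high reading proves little, so in practice the check would be applied to several consecutive samples.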

Memory Utilization
For memory utilization, paging or swapping, locking, and voluntary and involuntary context switching should be monitored.

If a system’s main memory is not enough to process a request, disk space (swap space) is used to dump part of the content of memory. An application that causes a lot of swapping or paging will therefore slow down. For example, for a Java application, if part of the heap has been paged out, it has to be brought back into memory during the garbage collection phase. That increases garbage collection time (and if no concurrent garbage collector is used, it means the application is stopped for longer).

In Windows, the following command shows available memory and paging activity (per second):
typeperf -si 5 "\Memory\Available MBytes" "\Memory\Pages/sec"

In Linux, top and vmstat are again our best friends. In vmstat, the columns under the memory header, plus the si and so columns, show memory-related measurements:
  • swpd: the amount of virtual memory used.
  • free: the amount of idle memory.
  • buff: the amount of memory used as buffers.
  • cache: the amount of memory used as cache.
  • inact: the amount of inactive memory. (-a option)
  • active: the amount of active memory. (-a option)
  • si: Amount of memory swapped in from disk (/s).
  • so: Amount of memory swapped to disk (/s).
When memory is short and paging is rising, adding RAM should be considered as an appropriate step. Please note that paging can happen frequently while an application is launching, so a burst at start-up is not by itself a sign of a memory shortage.
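
One way to tell a start-up burst from a real shortage is to look for swap activity that persists over several intervals. The sketch below assumes you have already collected the si and so values of successive vmstat samples. The function name and the threshold of three consecutive intervals are arbitrary choices for illustration.

```python
def sustained_swapping(samples, min_consecutive=3):
    """Return True if swap activity is non-zero for min_consecutive samples in a row.

    samples: iterable of (si, so) pairs, one per vmstat interval.
    """
    streak = 0
    for si, so in samples:
        if si > 0 or so > 0:
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            streak = 0  # an idle interval breaks the run
    return False


if __name__ == "__main__":
    startup_burst = [(120, 0), (40, 10), (0, 0), (0, 0)]
    print(sustained_swapping(startup_burst))
```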

To monitor locking and voluntary and involuntary context switching on Linux, the pidstat tool has to be installed (it ships in the sysstat package).
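
For a quick look at context switching from inside a process, without any extra tool, the counters are also available through the standard getrusage call. The following Python sketch works on Unix-like systems only, because the resource module is not available on Windows. It covers context switches but not locking, and the helper names are hypothetical.

```python
import resource
import time


def context_switches():
    """Return (voluntary, involuntary) context switch counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def measure(work):
    """Run work() and return how many switches of each kind happened meanwhile."""
    v0, i0 = context_switches()
    work()
    v1, i1 = context_switches()
    return v1 - v0, i1 - i0


if __name__ == "__main__":
    # Sleeping gives up the CPU, which normally counts as a voluntary switch.
    voluntary, involuntary = measure(lambda: time.sleep(0.05))
    print(f"voluntary={voluntary} involuntary={involuntary}")
```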

Network Utilization
A system with heavy network communication must make good use of its network resources (network bandwidth or network IO); otherwise the performance of its applications will degrade.

In Linux, we can use netstat to display network activity. But this tool does not report the overall utilization of the network resource. It has to be estimated manually from the capacity of the network interface and the current network activity reported by netstat. The following formula can be used for utilization:

Network Utilization (%) = (Bytes Total/sec) / (Current Bandwidth / 8) * 100

Note: Current Bandwidth is given in bits per second, so we convert it to bytes per second by dividing by 8.
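
The formula is easy to turn into a small helper. Below is a minimal Python sketch. The function name is hypothetical, and the figures in the example (a 1 Gbit/s link carrying 25,000,000 bytes per second) are made up purely to walk through the arithmetic.

```python
def network_utilization(bytes_per_sec, bandwidth_bits_per_sec):
    """Return network utilization as a percentage of link capacity."""
    if bandwidth_bits_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    # Bandwidth is quoted in bits per second; divide by 8 to get bytes per second.
    capacity_bytes_per_sec = bandwidth_bits_per_sec / 8
    return bytes_per_sec / capacity_bytes_per_sec * 100


if __name__ == "__main__":
    # 1 Gbit/s is 125,000,000 bytes/s of capacity, so 25,000,000 bytes/s is about 20%.
    print(round(network_utilization(25_000_000, 1_000_000_000), 1))
```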

A similar estimate can be made in Windows using the following command:

typeperf -si 5 "\Network Interface(*)\Bytes Total/sec"

Disk Utilization
Apart from the network, disk IO is another factor in performance. An application with significant IO interaction, such as a database, must take disk IO utilization into account.

In Linux, the iostat command is used for monitoring disk IO activity. For example, when iostat is run with the extended statistics option (-x), it reports utilization for each device.
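
If you capture the extended iostat report in a script, the per-device utilization can be pulled out by looking for the %util column in the header. The Python sketch below rests on assumptions: the exact set of columns differs between iostat versions, so the column is found by name. The SAMPLE text is a hand-written, heavily shortened stand-in for a real report, and the function name is hypothetical.

```python
def device_utilization(iostat_text):
    """Return {device: %util} parsed from the text of an extended iostat device report."""
    result = {}
    util_index = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            util_index = None  # a blank line ends the current device table
            continue
        if fields[0].rstrip(":") == "Device":
            # Header row: remember where the %util column sits, if present.
            util_index = fields.index("%util") if "%util" in fields else None
            continue
        if util_index is not None and len(fields) > util_index:
            result[fields[0]] = float(fields[util_index])
    return result


# Shortened, made-up sample; a real report has many more columns.
SAMPLE = """\
Device            r/s     w/s   %util
sda              4.00   12.50   37.20
sdb              0.10    0.30    0.85
"""

if __name__ == "__main__":
    print(device_utilization(SAMPLE))
```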

SAR
The tools described above provide information about the current state of the system. If we are interested in historical performance data, the sar command can be useful. On Linux, this command provides measurement data over an extended period of time (e.g. the last 10 days).

CPU/Cache Utilization
To gather CPU and cache (L1, L2, L3) statistics (such as loads, stores, misses, number of cycles and instructions), the perf and likwid tools can be used. For example, to collect detailed L1 cache statistics for a Java process, the following command can be used:

$ perf stat -d -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores,L1-dcache-store-misses,cache-references,cache-misses,cycles,instructions java myProcess
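
The raw counts are easier to compare between runs once they are turned into ratios. As a sketch, the two helpers below derive an L1 data cache load miss rate and instructions per cycle from counts you would read off the perf output for the L1-dcache-loads, L1-dcache-load-misses, instructions and cycles events. The function names and the example numbers are hypothetical.

```python
def l1_load_miss_rate(loads, load_misses):
    """Return L1 data cache load misses as a percentage of loads."""
    if loads == 0:
        return 0.0
    return load_misses / loads * 100


def instructions_per_cycle(instructions, cycles):
    """Return the average number of instructions retired per CPU cycle."""
    if cycles == 0:
        return 0.0
    return instructions / cycles


if __name__ == "__main__":
    # Made-up counts, only to show how the helpers are called.
    print(l1_load_miss_rate(loads=1_000_000, load_misses=50_000))
    print(instructions_per_cycle(instructions=3_000_000, cycles=2_000_000))
```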