Some Linux commands for monitoring

The Basics of Monitoring a UNIX System:

A slow system could be the result of a bottleneck in processing (CPU), memory, disk, or bandwidth. System monitoring tools help you to clearly identify the bottlenecks causing poor performance. Let’s briefly examine what’s involved in the monitoring of each of these resources on your system.

Monitoring CPU Usage:

User versus system usage:

You can identify the percentage of CPU time spent running user applications, compared with the time spent on operating system overhead.
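
As a rough illustration of the user versus system split, the sketch below reads the cumulative counters on the first line of /proc/stat. The helper name cpu_split is made up for this example. The figures are averages since boot rather than current usage, and steal and guest time are ignored to keep the sketch short.

```shell
# cpu_split (hypothetical helper): reads /proc/stat-formatted text on stdin
# and prints the share of CPU time spent in user, system, and idle states.
cpu_split() {
    awk '$1 == "cpu" {
        user = $2 + $3            # user + nice
        sys  = $4 + $7 + $8       # system + irq + softirq
        idle = $5 + $6            # idle + iowait
        total = user + sys + idle
        if (total > 0)
            printf "user=%.1f%% system=%.1f%% idle=%.1f%%\n",
                   100 * user / total, 100 * sys / total, 100 * idle / total
    }'
}

# Only meaningful on systems that expose /proc/stat (Linux).
if [ -r /proc/stat ]; then
    cpu_split < /proc/stat
fi
```

For current rather than since-boot figures, take two samples a few seconds apart and apply the same arithmetic to the differences.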

Runnable processes:

At any given time, a process is either running or waiting. A process that is ready to run and is waiting only for a CPU to become available is called a runnable process. A persistently large number of runnable processes indicates that your system may be short of processing power, that is, CPU-bound.
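
One common way to gauge the run queue is to compare the load average with the number of CPUs. The sketch below assumes a Linux /proc/loadavg; load_per_cpu is a hypothetical helper name. Note that the Linux load average also counts processes in uninterruptible I/O wait, so it is only an approximation of the number of runnable processes.

```shell
# load_per_cpu LOAD NCPU (hypothetical helper): divide a load average
# by the number of CPUs.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" \
        'BEGIN { if (ncpu > 0) printf "%.2f\n", load / ncpu }'
}

if [ -r /proc/loadavg ]; then
    # The first field of /proc/loadavg is the 1-minute load average.
    read -r load1 rest < /proc/loadavg
    ncpu=$(getconf _NPROCESSORS_ONLN)
    echo "1-minute load per CPU: $(load_per_cpu "$load1" "$ncpu")"
fi
```

Values that stay well above 1.00 suggest there is more runnable work than the CPUs can absorb.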

Context switches and interrupts:

When the operating system switches between processes, it incurs some overhead due to the so-called context switches. If you have too many context switches, less CPU time is left for useful work. You’ll incur similar overhead when you have too many interrupts, which hardware devices and software raise to signal the operating system that an event, such as the completion of an I/O request, needs attention.
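
On Linux, the kernel keeps running totals of context switches and interrupts in /proc/stat (the ctxt and intr lines). The following sketch samples them one second apart to estimate a rate; stat_counter is a hypothetical helper name.

```shell
# stat_counter NAME (hypothetical helper): print the cumulative value of the
# named counter from /proc/stat-formatted text on stdin.
stat_counter() {
    awk -v key="$1" '$1 == key { print $2; exit }'
}

if [ -r /proc/stat ]; then
    c1=$(stat_counter ctxt < /proc/stat)
    i1=$(stat_counter intr < /proc/stat)
    sleep 1
    c2=$(stat_counter ctxt < /proc/stat)
    i2=$(stat_counter intr < /proc/stat)
    echo "context switches/s: $((c2 - c1))  interrupts/s: $((i2 - i1))"
fi
```

What counts as "too many" depends on the hardware and workload, so these numbers are most useful when compared against a baseline from the same machine.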

Managing Memory:

Page ins and page outs:

If you have a high number of page ins and page outs in your memory statistics, your system is doing an excessive amount of paging, that is, moving pages between memory and disk because too little memory is available. Excessive paging can lead to a condition called thrashing, in which the system spends more of its resources moving pages back and forth between memory and disk than doing useful work.

Swap ins and swap outs:

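The swapping statistics also indicate whether your current memory allocation is adequate for your system: sustained swap-in and swap-out activity suggests that the workload needs more memory than is available.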
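
The paging and swapping counters described above can be read directly on Linux from /proc/vmstat. This is a minimal sketch, assuming the usual counter names pgpgin, pgpgout, pswpin, and pswpout; vm_counters is a hypothetical helper name.

```shell
# vm_counters (hypothetical helper): pick the paging and swapping counters
# out of /proc/vmstat-formatted text on stdin.
vm_counters() {
    awk '$1 ~ /^(pgpgin|pgpgout|pswpin|pswpout)$/ { printf "%s=%s\n", $1, $2 }'
}

if [ -r /proc/vmstat ]; then
    vm_counters < /proc/vmstat
fi
```

The values are cumulative since boot, so run the command twice and compare the results to see how much paging or swapping happened in between.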

Active and inactive pages:

If you have too few inactive memory pages, it may mean that your physical memory is inadequate.
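
On Linux, the active and inactive page totals are reported in /proc/meminfo. The sketch below, with the hypothetical helper name page_summary, prints both and shows the inactive total as a share of physical memory.

```shell
# page_summary (hypothetical helper): summarize Active and Inactive memory
# from /proc/meminfo-formatted text on stdin (values are in kB).
page_summary() {
    awk '/^MemTotal:/ { total = $2 }
         /^Active:/   { active = $2 }
         /^Inactive:/ { inactive = $2 }
         END {
             if (total > 0)
                 printf "active=%d kB inactive=%d kB (%.1f%% of RAM inactive)\n",
                        active, inactive, 100 * inactive / total
         }'
}

if [ -r /proc/meminfo ]; then
    page_summary < /proc/meminfo
fi
```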

Monitoring Disk Storage:

Check for free space:

Using simple commands, a system administrator or a DBA can check the amount of free space left on the system. It’s good, of course, to do this on a regular basis so you can head off a resource crunch before it’s too late.
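
A regular free-space check can be as simple as filtering the output of df. The sketch below uses df -P (the POSIX output format) and lists file systems at or above a usage threshold; over_threshold is a hypothetical helper name and 90 is an arbitrary example limit. Mount points containing spaces are not handled.

```shell
# over_threshold LIMIT (hypothetical helper): read `df -P` output on stdin and
# print the mount point and usage of every file system at or above LIMIT percent.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)              # strip the trailing percent sign
        if (pct + 0 >= limit) print $6, $5
    }'
}

df -P | over_threshold 90
```

Run from cron, a check like this can warn you before a file system actually fills up.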

Reads and writes:

The read/write figures give you a good picture of how hot your disks are running. You can tell whether your system is handling its workload well, or if it’s experiencing an extraordinary I/O load at any given time.
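
For a quick look at read and write volume without extra tools, Linux exposes per-device counters in /proc/diskstats. This sketch assumes the standard field layout (device name in field 3, sectors read in field 6, sectors written in field 10, with 512-byte sectors); disk_totals is a hypothetical helper name.

```shell
# disk_totals (hypothetical helper): convert the cumulative sector counters in
# /proc/diskstats-formatted text on stdin to MiB read and written per device.
disk_totals() {
    awk '{
        printf "%s read_MiB=%.1f written_MiB=%.1f\n",
               $3, $6 * 512 / 1048576, $10 * 512 / 1048576
    }'
}

if [ -r /proc/diskstats ]; then
    disk_totals < /proc/diskstats
fi
```

The totals are cumulative since boot; sampling twice and subtracting gives the load over an interval.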

Monitoring Tools for UNIX Systems:

Top – Linux Process Monitoring:

The top command displays the running processes in real time as an ordered list and updates it regularly. It displays CPU usage, memory usage, swap usage, cache size, buffer size, process ID (PID), user, command, and much more. It also makes it easy to spot running processes with high memory or CPU utilization.
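
top is normally interactive, but it can also be run in batch mode to capture a snapshot for a log or a script. The sketch below assumes the procps version of top, where -b selects batch mode and -n sets the number of iterations; top_snapshot is a hypothetical helper name.

```shell
# top_snapshot N (hypothetical helper): print the first N lines of a single
# batch-mode top iteration (summary header plus the busiest processes).
top_snapshot() {
    top -b -n 1 | head -n "$1"
}

if command -v top >/dev/null 2>&1; then
    top_snapshot 15
fi
```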

VmStat – Virtual Memory Statistics:

The vmstat utility helps you monitor memory usage, page faults, processes and CPU activity. The vmstat utility’s output is divided into two parts: virtual memory (VM) and CPU. The VM section is divided into three parts: memory, page, and faults. In the memory section, avm stands for “active virtual memory” and free is short for “free memory.” The page and faults items provide detailed information on page reclaims, pages paged in and out, and device interrupt rates.
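
Column names differ between UNIX flavors. On Linux (procps), vmstat groups its columns as procs, memory, swap, io, system, and cpu, and the si and so columns show swap-in and swap-out rates. The sketch below assumes that Linux layout, with si and so in columns 7 and 8; vmstat_swap is a hypothetical helper name.

```shell
# vmstat_swap (hypothetical helper): print the si/so (swap in/out) columns
# from Linux vmstat output on stdin, skipping the two header lines.
vmstat_swap() {
    awk 'NR > 2 && $1 ~ /^[0-9]+$/ { printf "si=%s so=%s\n", $7, $8 }'
}

if command -v vmstat >/dev/null 2>&1; then
    # Three samples, one second apart; the first line is an average since boot.
    vmstat 1 3 | vmstat_swap
fi
```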

iostat: I/O Statistics:

A key part of the performance assessment is disk performance. The iostat command gives the performance metrics of the storage interfaces.

Device: The name of the device
tps: The number of transfers per second, i.e. the number of I/O operations per second. Note: this is just the number of I/O operations; each operation could be huge or small.
Blk_read/s: The number of blocks read from this device per second. Blocks are usually 512 bytes in size. This is a better indicator of how busy the disk is than tps.
Blk_wrtn/s: The number of blocks written to this device per second
Blk_read: The total number of blocks read from this device so far. Be careful; this is not what is happening right now. This many blocks have already been read from the device, and it’s possible that nothing is being read now. Watch this value for some time to see if it changes.
Blk_wrtn: The total number of blocks written to the device so far
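
To turn the per-second block figures above into throughput, multiply by the block size. The sketch below assumes 512-byte blocks and the six-column layout listed above; blk_to_kib is a hypothetical helper name and the sample line is made up for illustration. Newer versions of iostat may report kB_read/s and kB_wrtn/s instead, in which case no conversion is needed.

```shell
# blk_to_kib (hypothetical helper): convert Blk_read/s and Blk_wrtn/s from
# iostat-style lines on stdin to KiB/s, assuming 512-byte blocks.
blk_to_kib() {
    awk 'NF == 6 && $2 ~ /^[0-9.]+$/ {
        printf "%s %.1f KiB/s read, %.1f KiB/s written\n",
               $1, $3 * 512 / 1024, $4 * 512 / 1024
    }'
}

# Made-up sample in the column order described above.
printf '%s\n' \
    'Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn' \
    'sda 12.50 200.00 100.00 4000 2000' | blk_to_kib
# prints: sda 100.0 KiB/s read, 50.0 KiB/s written
```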

sar (System Activity Reporter):

sar stands for System Activity Reporter. It records metrics for the key components of a Linux system (CPU, memory, disks, network, and so on) in a dedicated directory, /var/log/sa on Red Hat-style systems (the location can differ on other distributions). The data for each day is recorded in a file named sa<nn>, where <nn> is the two-digit day of the month.

CPU: The CPU identifier; “all” means all the CPUs
%user: The percentage of CPU time spent running user processes. Oracle processes come under this category.
%nice: The percentage of CPU time spent running user processes at nice priority
%system: The percentage of CPU time spent running system (kernel) code
%iowait: The percentage of time the CPU was idle while waiting for outstanding I/O
%idle: The percentage of time the CPU was idle with no outstanding I/O
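
The CPU columns above come from the sar -u report. As a sketch, the following builds the path of today's sa<nn> file and asks sar to read it with -f. The helper name sa_file_today and the SA_DIR variable are made up for this example, and the default directory is the /var/log/sa location mentioned above, which may differ on your system.

```shell
# SA_DIR and sa_file_today are hypothetical names used only in this sketch.
SA_DIR=${SA_DIR:-/var/log/sa}

# Build the path of today's data file: sa<nn>, where <nn> is the day of month.
sa_file_today() {
    printf '%s/sa%s\n' "$SA_DIR" "$(date +%d)"
}

file=$(sa_file_today)
if command -v sar >/dev/null 2>&1 && [ -r "$file" ]; then
    sar -u -f "$file"        # CPU report from today's recorded data
else
    echo "no readable sar data at $file"
fi
```

For a live view instead of recorded data, sar -u 1 3 prints three CPU samples one second apart.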

Display the number of CPUs
   grep -c '^processor' /proc/cpuinfo

Show processes sorted by CPU% (highest last)
    ps aux | sort -n -k 3

Display top-10 CPU consumers
   ps aux | sort -rn -k 3 | head -10

Reboot the server (as root)
     /sbin/shutdown -r now

Kill all xxx processes
     pkill [-9] "xxx"

Show swap paging space
     /sbin/swapon -s

Show Linux syslog errors
    tail /var/log/messages

Show swap disk details
   swapon -s

See held memory segments
    ipcs -m

Show Linux kernel parameters
    sysctl -a

Show shell command history
   history | more

Huge pages
----------
grep -i huge /proc/meminfo

CPU INFO
------------
cat /proc/cpuinfo | grep processor | awk '{a++} END {print a}'

Wednesday, 8 May 2013