Unix process tracing patterns

Author: Eric Laroche
Copyright © 1993-2004 Eric Laroche

This paper serves as an introduction to useful patterns to analyze Unix process behavior.

The powerful tracing tool used here is the truss tool (truss means 'trace [user] system calls and signals'). truss-based analyses are done fast and can even be accomplished on-the-fly by attaching to a running process (-p option; without killing or restarting the process). Programs do not need to be compiled in a debug mode, neither do they need to be instrumented (unlike with the purify tool). truss hence implies a minimal overhead.

Note that truss usually generates a lot of output. It is a good strategy to run the analyses with the -o option (also not to mix-in application-generated output) and have a convenient helper tool to view the output, such as emacs [XEmacs] with a convenient mode [that mode colorizes, to emphasize the essential, to distinguish from meta information, to emphasize errors, etc; maneuvering (e.g. jumping to file descriptor references, memory references, categories of system calls, etc), outlining, etc is not implemented (i.e.: left to the user)]. Splitting output [by pid/tid; into different files] is another strategy to make truss output more convenient to read.

The Unix truss samples here are Solaris [in 32 bit mode] samples (e.g. Solaris 8 a.k.a Solaris 2.8 a.k.a. SunOS 5.8). Other system's truss tools may be called ktrace, par, strace, trace, truss, tusc, etc.

Maximal output

Sometimes it is useful to truss as much output as possible.

Trace details (-v), system calls (-t), signals (-s), cpu 'faults' [synchronous interrupts] (-m) as well as read and write buffers (-r, -w).

    truss -f -l -a -e -vall -tall -sall -mall -rall -wall ...
Notes: unfortunately getmsg and putmsg buffers can't typically be wholly displayed.

System call errors

System call errors show up as Err#.

However, system call errors need not be exceptional conditions. Samples are the ioctl(..., TCGETA) Err#25 ENOTTY, which arise from output channel properties checks

Notes: Other errors of interest are EINTR (alarm signal went off and interrupted a system call), ENOENT (file not found), EPERM (permission denied).

Process execution chain

It may be of interest to trace the process execution chain. This is done by tracing the exec calls, which start ('load') new programs. The -a option shows the program arguments, which typically are of interest too. [Of course the follow flag (-f) is needed too.]
    truss -f -a -texec ...
Notes: -texec usually traces the whole exec system call family (exec, execv, execp, etc). Set-user-id flagged applications can only be trussed-across if initial permissions are sufficient (i.e. if user already has the set-user-id permissions, or is superuser).

Opened files

It may be of interest to know what files an application is opening or just looking for. Of interest too is the questions 'where does an application expect files?'.
    truss -f -topen,creat,stat,access ...
Notes: creat is an alternative to be considered for opening-for-writing; access is an alternative to stat.

Attaching to a running process

Attaching to a running process enables to trace a problematic process at the time a problem arises. Advantages are less output and possible post-problem analyses.
    truss ... -p pid
Notes: This truss is typically detached and ended by the intr signal (typically ^C). truss can also trace by means of the proc file system.

Truss samples

loading dynamic libraries

private mapping of the dynamic C runtime library (libc), with part of it writeable

    2725/1:     open("/usr/lib/libc.so.1", O_RDONLY) = 3
    2725/1:     fstat(3, 0xFFBEF15C) = 0
    2725/1:     mmap(0xFF390000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF390000
    2725/1:     mmap(0x00000000, 794624, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xFF280000
    2725/1:     mmap(0xFF33A000, 24668, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 696320) = 0xFF33A000
    2725/1:     munmap(0xFF32A000, 65536) = 0
    2725/1:     memcntl(0xFF280000, 113332, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
    2725/1:     close(3) = 0

vi sample: terminfo and raw tty mode

raw-user-interface library (e.g. libcurses) reads the terminal info database (a file indexed under its initial d; for TERM=dtterm), input modes have the ^M to ^J translation reset, output modes have the analogous reverse translation reset, line discipline modes have ICANON and ECHO reset and VMIN / VTIME adjusted

    9910/1:     open("/usr/share/lib/terminfo//d/dtterm", O_RDONLY) = 4
    9910/1:     read(4, 0xFFBED940, 4096) = 1652
    9910/1:       1A011D\0 %\0 !\08A01D002 d t t e r m | C D E   t e r m i n a l
    ...
    9910/1:     ioctl(2, TCGETS, 0x0009FAC8) = 0
    9910/1:         iflag=0012400 oflag=0000005 cflag=037744173375 lflag=0105073
    9910/1:             cc:  003 034 010 025 004 000 000 000
    9910/1:              021 023 032 031 022 017 027 026 000 000 000
    9910/1:     ioctl(2, TCSETSW, 0x0009FAC8) = 0
    9910/1:         iflag=0012000 oflag=0000001 cflag=037744173375 lflag=0105061
    9910/1:             cc:  003 000 010 025 001 001 000 000
    9910/1:              021 023 000 000 000 000 000 000 000 000 000

IP server

internet-domain streaming (TCP) socket is bound to all available local network interfaces, process is blocking/sleeping on accept, later returns a new file descriptor, read is blocking too

    1285/1:     so_socket(2, 2, 6, "", 1) = 3
    1285/1:     bind(3, 0xFFBEF0C8, 16, 3) = 0
    1285/1:         AF_INET  name = 0.0.0.0  port = 6572
    1285/1:     listen(3, 1, 1) = 0
    1285/1:     accept(3, 0xFFBEF0B8, 0xFFBEF0B4, 1) (sleeping...)
    1285/1:     accept(3, 0xFFBEF0B8, 0xFFBEF0B4, 1) = 4
    1285/1:         AF_INET  name = 10.10.10.10  port = 53264
    1285/1:     read(4, 0xFFBEF04F, 1) (sleeping...)
    1285/1:     read(4, " G", 1) = 1
    1285/1:     read(4, " E", 1) = 1
    1285/1:     read(4, " T", 1) = 1
    ...

netstat sample: obtaining information from a streams device

/dev/ip must be opened in read-write mode for the requested operations (device is usually protected by file permissions), protocol layers are pushed

    4655/1:     open("/dev/ip", O_RDWR) = 4
    4655/1:     ioctl(4, I_PUSH, "arp") = 0
    4655/1:     ioctl(4, I_PUSH, "tcp") = 0
    4655/1:     ioctl(4, I_PUSH, "udp") = 0
    4655/1:     ioctl(4, I_PUSH, "icmp") = 0
    4655/1:     putmsg(4, 0xFFBEFA64, 0x00000000, 0) = 0
    4655/1:     getmsg(4, 0xFFBEFA64, 0x00000000, 0xFFBEFA60) = 2
    4655/1:     getmsg(4, 0x00000000, 0xFFBEFA54, 0xFFBEFA60) = 0
    ...

a shell waiting on a child process

    1010/1:     psargs: bash
    1010/1:     waitid(P_ALL, 0, 0xFFBEEE18, WEXITED|WTRAPPED|WSTOPPED) (sleeping...)

an application waiting on user input

    2339/1:     psargs: dc
    2339/1:     read(0, 0xFF33F634, 1024) (sleeping...)


Eric Laroche, laroche@lrdev.com, Tue Mar 2 2004