Saturday, October 28, 2006

shell: parallel execution with timeouts

From time to time I have looked for a way to run a command with a timeout (in a shell script). I never put much effort into finding a solution, because I always found an easier (and probably better) workaround. But a few days ago, in a discussion in the Austin Group (POSIX standardization), I found an elegant solution, which also permits running commands in parallel.

The solution (it works on bash and dash, and it seems to be POSIX compatible, but you should test it, because the discussion was about the different behaviours of $! in bash and the Korn shell):
# run command1 in the background, together with the sleeping killer
command1 &
pid=$!
( sleep 60; kill "$pid" ) &
# other stuff
command2
# wait for command1
wait "$pid"

Note: kill (the shell builtin) prints some error messages, which you should filter.
Note: Security: the method is not very clean or secure. If a user can force the system to recycle the PID, the script could do unintended things.
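
A minimal, self-contained sketch of the pattern with the kill messages filtered. The commands here are stand-ins (assumptions for illustration): sleep 5 plays the role of command1, the echo plays the role of command2, and the timeout is shortened to 1 second.

```shell
#!/bin/sh
# Stand-in for command1: a job that would normally run for 5 seconds.
sleep 5 &
pid=$!

# The sleeping killer, with a 1-second timeout. The stderr of kill is
# discarded, so nothing is printed if the job has already finished.
( sleep 1; kill "$pid" 2>/dev/null ) >/dev/null 2>&1 &

# Stand-in for command2: runs in the foreground while the job is running.
echo "doing other work"

# Wait for the background job and keep its exit status.
status=0
wait "$pid" || status=$?

if [ "$status" -gt 128 ]; then
    echo "job was killed by signal $((status - 128))"
else
    echo "job exited with status $status"
fi
```

In bash and dash, wait reports a status above 128 when the job was terminated by a signal, which is how the script can tell a timeout from a normal exit.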

I think this trick can be used to parallelize the init scripts (parallelize within each script, not "run scripts in parallel", as is the current trend).
So another short tip: Debian's sleep supports fractional ("floating") sleeps, so use sleep .1 instead of sleep 1 to speed up the init scripts (module loading, waiting for devices, network, ...). Unfortunately, most of the Debian scripts use integer sleeps.
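
As a rough sketch of the fractional-sleep tip, the helper below polls for a path in steps of a tenth of a second instead of a full second. The name wait_for_path and its arguments are hypothetical, and fractional arguments to sleep are an extension (GNU coreutils accepts them), not something POSIX guarantees.

```shell
#!/bin/sh
# Hypothetical helper: wait until path $1 exists, polling at most $2 times
# with a 0.1 second pause between polls (0.1 is equivalent to .1).
wait_for_path() {
    path=$1
    tries=$2
    while [ "$tries" -gt 0 ]; do
        if [ -e "$path" ]; then
            return 0
        fi
        sleep 0.1   # fractional sleep: an extension, not required by POSIX
        tries=$((tries - 1))
    done
    # Final check after the last pause.
    [ -e "$path" ]
}

# Usage: give a device node (here just /dev/null) up to about 2 seconds.
if wait_for_path /dev/null 20; then
    echo "path is there"
else
    echo "gave up waiting"
fi
```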

8 comments:

gatoatigrado said...

thanks a lot! Do you know of a way to make it quieter when creating the new process? i.e. get rid of e.g. [1] 5436

gatoatigrado said...

I found the answers to my questions. For quieter forking (only valid if calling within environment anyway), surround the command with parentheses, e.g. ( command & ). For quieter killing, use kill $pid 2>/dev/null to redirect stderr ("no process to kill") to /dev/null.

linux4all said...

This approach might be dangerous - pid may be already used by some other process when kill is issued.

Dave said...

Thanks! I used this tip as the base for a general "run command with timeout" function. Not sure if it'll work outside bash, but it seems to do the trick for me. There's still a PID-recycling race if the command exits right when the sleep finishes, but this should narrow the window quite a bit. And it's still a little noisy, but it doesn't really matter to me.

(copy and paste this to see it without the harsh wrapping)

function runWithTimeout ()
{
    if [ $# -lt 2 ]; then
        echo "usage: $0 <timeout seconds> <commandline>"
        return 1
    fi
    timeout=$1
    shift

    # Start the command.
    "$@" &
    cmdPid=$!

    # Start the timeout process.
    ( sleep $timeout; echo "$1 timed out (pid $cmdPid)"; kill $cmdPid ) &
    sleepPid=$!

    # Don't leak the children if we're interrupted by ^C etc.
    trap "kill $cmdPid $sleepPid; exit 5" INT TERM EXIT

    # Block until the command exits or times out.
    wait $cmdPid
    cmdStatus=$?

    # Clear out the ^C trap.
    trap - INT TERM EXIT

    if [ $cmdStatus -le 128 ] ; then
        # The command exited on its own; kill the timeout process.
        # This will kill the background shell, not the sleep
        # process started by it, so the sleep itself may hang around
        # for a while. But, since the subshell will be gone when it
        # exits, nothing will try to kill $cmdPid.
        kill $sleepPid
    fi
    # The caller can check the return value against 128 to see
    # if the command timed out or simply failed.
    return $cmdStatus
}
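
For reference, here is a compact variant of the same idea together with a usage example that checks the return value against 128. It is only a sketch: the name run_with_timeout and the variable names are made up for illustration, it drops the trap handling and the "timed out" message, and it has the same PID-recycling caveat.

```shell
#!/bin/sh
# Hypothetical compact variant: run_with_timeout <seconds> <command...>
run_with_timeout() {
    timeout_s=$1
    shift

    # Start the command in the background.
    "$@" &
    cmd_pid=$!

    # Start the sleeping killer. Its output is sent to /dev/null so that a
    # leftover sleep does not keep the caller's output open.
    ( sleep "$timeout_s"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    killer_pid=$!

    # Block until the command exits or is killed.
    cmd_status=0
    wait "$cmd_pid" || cmd_status=$?

    # Stop the killer if it is still sleeping. Errors are ignored because
    # it may already have exited.
    { kill "$killer_pid"; wait "$killer_pid"; } 2>/dev/null || true

    return "$cmd_status"
}

# Usage: a status above 128 means the command was killed by a signal,
# which here means it timed out.
status=0
run_with_timeout 1 sleep 5 || status=$?
if [ "$status" -gt 128 ]; then
    echo "timed out (status $status)"
else
    echo "finished with status $status"
fi
```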

Anonymous said...

Thanks, nice site. I've used the example given by Dave in a small demo program I wrote to figure out what was going on with my networking. Check it out.

http://www.angrylibertarian.com/node/7

Ole Tange said...

If you have GNU Parallel http://www.gnu.org/software/parallel/ and timeout http://packages.debian.org/sid/timeout installed, you can do this:

cat commands | parallel timeout 60
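
If GNU Parallel is not installed, a similar effect can be sketched with plain background jobs. This assumes the timeout command from GNU coreutils, which exits with status 124 when the time limit is reached; the two commands are stand-ins.

```shell
#!/bin/sh
# Run two commands in parallel, each under its own time limit.
timeout 1 sleep 5 &    # stand-in for a command that is too slow
pid1=$!
timeout 5 true &       # stand-in for a command that finishes in time
pid2=$!

# Collect the exit status of each job.
s1=0
wait "$pid1" || s1=$?
s2=0
wait "$pid2" || s2=$?

# With GNU coreutils timeout, 124 means the time limit was hit.
for s in "$s1" "$s2"; do
    if [ "$s" -eq 124 ]; then
        echo "timed out"
    else
        echo "finished with status $s"
    fi
done
```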

I would find that more readable. Watch the intro video for GNU Parallel to learn more:
http://www.youtube.com/watch?v=OpaiGYxkSuQ

AndresVia said...

Check the parent of the process to kill.

#!/bin/bash

echo im doing some lengthy job
sleep 50 &
SPID=$!
PARENT=$$

echo current $PARENT

{ sleep 3 ; KILLPARENT=$(ps -o ppid -p $SPID | awk '{A=$1}END{print A}') ; [[ "$KILLPARENT" = "$PARENT" ]] && kill $SPID ; } &

echo im doing other stuff

wait "$SPID"

Anonymous said...

Actually, the PID race is not an issue.
In Unix, when a child process ends it becomes a zombie (i.e. its PID stays reserved and its exit status is recorded) until its parent waits for it.
So let's keep it simple s.