Prevent duplicate script from running at the same time

I am using scrapy to fetch some resources, and I want to make it a cron job which can start every 30 minutes.

The cron job:

0,30 * * * * /home/us/jobs/run_scrapy.sh

run_scrapy.sh:

#!/bin/sh
cd ~/spiders/goods
PATH=$PATH:/usr/local/bin
export PATH
pkill -f $(pgrep run_scrapy.sh | grep -v $$)
sleep 2s
scrapy crawl good

As the script shows, I tried to kill both the script process and its child process (scrapy).

However, when I run two instances of the script, the newer instance does not kill the older one.

How can I fix that?


Update:

I have more than one scrapy .sh script, each configured in cron to run at a different frequency.


Update 2 - Test for Serg's answer:

All the cron jobs were stopped before I ran the test.

Then I opened three terminal windows, named w1, w2 and w3, and ran the commands in the following order:

Run `pgrep scrapy` in w3, which prints nothing (meaning no scrapy is running at the moment).

Run `./scrapy_wrapper.sh` in w1.

Run `pgrep scrapy` in w3, which prints one process ID, say `1234` (meaning scrapy has been started by the script).

Run `./scrapy_wrapper.sh` in w2, then check w1 and find that the script there has been terminated.

Run `pgrep scrapy` in w3, which prints two process IDs, `1234` and `5678`.

Press <kbd>Ctrl</kbd>+<kbd>C</kbd> in w2 (twice).

Run `pgrep scrapy` in w3, which prints one process ID, `1234` (meaning the scrapy process `5678` has been stopped).

At this point, I have to use `pkill scrapy` to stop the scrapy process with ID `1234`.


A better approach would be to use a wrapper script that calls the main script. It would look like this:

#!/bin/bash
# This is /home/user/bin/wrapper.sh file
pkill -f 'main_script.sh'
exec bash ./main_script.sh

Of course, the wrapper has to be named differently from the main script. Because `pkill -f` matches against the full command line, this lets pkill match only your main script and not the wrapper itself. Your main script then reduces to this:

#!/bin/sh
cd /home/user/spiders/goods
PATH=$PATH:/usr/local/bin
export PATH
scrapy crawl good

Note that in my example I am using `./` because the script was in my current working directory. Use the full path to your script for best results.

I have tested this approach with a simple main script that just runs an infinite while loop, plus the wrapper script. Launching a second instance of the wrapper kills the previous one.

Your script

This is just an example. Remember that I have no access to scrapy to actually test this, so adjust it as needed for your situation.

Your cron entry should look like this:

0,30 * * * * /home/us/jobs/scrapy_wrapper.sh

Contents of scrapy_wrapper.sh

#!/bin/bash
pkill -f 'run_scrapy.sh'
exec sh /home/us/jobs/run_scrapy.sh

Contents of run_scrapy.sh

#!/bin/bash
cd /home/user/spiders/goods
PATH=$PATH:/usr/local/bin
export PATH
# sleep delay now is not necessary
# but uncomment if you think it is
# sleep 2
scrapy crawl good
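
One caveat, visible in the asker's test in Update 2: killing the shell that runs run_scrapy.sh does not necessarily stop the scrapy child it started. A minimal sketch of one way around that, assuming procps `pkill` with its `-P` (parent PID) option is available. The helper name `kill_with_children` is hypothetical, and this has not been tested against scrapy itself:

```shell
#!/bin/sh
# kill_with_children PID  (hypothetical helper, not part of the original scripts)
# Sends SIGTERM to the direct children of PID, then to PID itself.
# Grandchildren are not covered.
kill_with_children() {
    # -P restricts the match to processes whose parent is PID
    pkill -TERM -P "$1"
    # The parent may already have exited once its child is gone
    kill -TERM "$1" 2>/dev/null
    return 0
}
```

In scrapy_wrapper.sh, the `pkill -f 'run_scrapy.sh'` line could then become a loop such as `for pid in $(pgrep -f 'run_scrapy.sh'); do kill_with_children "$pid"; done`. Scrapy may need some time to shut down after the signal, so a short `sleep` before the `exec` line may still be useful.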

If I understand what you are doing correctly, you want to call a process every 30 minutes (via cron). However, when a new process is started via cron, you want to kill any existing instances that are still running?

You could use the `timeout` command to ensure that scrapy is forced to terminate if it is still running after 30 minutes.

This would make your script look like this:

#!/bin/sh
cd ~/spiders/goods
PATH=$PATH:/usr/local/bin
export PATH
timeout 30m scrapy crawl good

Note the `timeout` added in the last line.

I have set the duration to "30m" (30 minutes). You might want to choose a slightly shorter time (say 29m) to ensure that the process has terminated before the next job starts.

Note that if you change the spawn interval in crontab, you will have to edit the script as well.
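
To avoid editing the script whenever the cron interval changes, the limit could be passed in as an argument instead. A minimal sketch, assuming GNU coreutils `timeout`; the helper name `run_limited` and the 10-second grace period are illustrative choices, not part of the original script:

```shell
#!/bin/sh
# run_limited LIMIT COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND under timeout. LIMIT is any duration timeout accepts, e.g. 29m.
run_limited() {
    limit=$1
    shift
    # -k 10s: if COMMAND is still alive 10 seconds after SIGTERM, send SIGKILL
    timeout -k 10s "$limit" "$@"
}
```

With this, the last line of the script could become `run_limited "${1:-29m}" scrapy crawl good`, and the cron entry would supply the limit, for example `/home/us/jobs/run_scrapy.sh 29m`. GNU `timeout` exits with status 124 when the limit is hit, which the script can check if it needs to log that case.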


Maybe you should track whether the script is running by having the parent shell script write its PID to a file, and have each new run kill the previous parent shell script named in that file. Something like this:

#!/bin/sh
PATH=$PATH:/usr/local/bin
export PATH
# Writing to /var/run usually requires root; pick a directory the cron user can write to
PIDFILE=/var/run/scrappy.pid
TIMEOUT="10s"

# If a PID file exists, try to stop the process it names
if [ -f "$PIDFILE" ]
then
  PID=$(cat "$PIDFILE")
  # Check that the recorded process is still alive
  if ps -p "$PID" >/dev/null 2>&1
  then
    # Ask the previous instance to terminate
    kill "$PID"
    # Give it time to exit
    sleep "$TIMEOUT"
    # Give up if it is still running after the timeout
    if ps -p "$PID" >/dev/null 2>&1
    then
      echo "ERROR: Process is still running" >&2
      exit 1
    fi
  fi
fi

# Record this script's PID
if ! echo $$ > "$PIDFILE"
then
  echo "ERROR: Could not create PID file" >&2
  exit 1
fi

# Stop here if the spider directory is missing
cd ~/spiders/goods || exit 1
scrapy crawl good
# Delete PID file
rm -f "$PIDFILE"
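
Killing the PID stored in the file stops the shell script, but the scrapy child it started may keep running. A minimal sketch of passing the signal on to the child, using the shell's `trap` and `wait` builtins; the helper name `run_forwarding` is hypothetical, and this has not been tested against scrapy:

```shell
#!/bin/sh
# run_forwarding COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND as a background child and relays SIGTERM/SIGINT to it as
# SIGTERM, so that killing this script also asks the child to stop.
run_forwarding() {
    "$@" &
    child=$!
    trap 'kill -TERM "$child" 2>/dev/null' TERM INT
    status=0
    wait "$child" || status=$?
    # A trapped signal interrupts wait; keep waiting until the child is gone
    while kill -0 "$child" 2>/dev/null
    do
        status=0
        wait "$child" || status=$?
    done
    trap - TERM INT
    return "$status"
}
```

In the script above, `scrapy crawl good` could then be written as `run_forwarding scrapy crawl good`. When the next run sends `kill "$PID"`, the old script relays the signal to scrapy, waits for it to exit, and only then removes the PID file.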