CRON job erratic behavior with Bourne shell script

I have the following script, that runs normally when I type the script name at the prompt (logscript):

#!/bin/sh
dvar=`date +"%m\/%d\/%y"`
filedate=`date +%b%d%Y`
echo DSS1 > serverlog_${filedate}.txt
grep "^$dvar" oasErrLog >> serverlog_${filedate}.txt
echo CMX1 >> serverlog_${filedate}.txt
ssh GVECMX1 grep "^$dvar" /home/gve/log/oasErrLog >> serverlog_${filedate}.txt
echo CMX2 >> serverlog_${filedate}.txt
ssh GVECMX2 grep "^$dvar" /home/gve/log/oasErrLog >> serverlog_${filedate}.txt
echo XIS1 >> serverlog_${filedate}.txt
ssh GVEXIS1 grep "^$dvar" /home/gve/log/oasErrLog >> serverlog_${filedate}.txt
echo XIS2 >> serverlog_${filedate}.txt
ssh GVEXIS2 grep "^$dvar" /home/gve/log/oasErrLog >> serverlog_${filedate}.txt
scp serverlog_${filedate}.txt "GVEXOSR2:C:/Documents\ and\ Settings/gve/Desktop/logs"
#rm serverlog_${filedate}.txt

Sample normal output is:

DSS1
01/11/10 03:00:08.139 XIS run_backupserver - Startup of XISBACKUP backupserver complete.
CMX1
01/11/10 04:30:05.710 OMNICOMM 30274 - boatimesync: error 3 from boaRx rtu 35 pretry 1 {protocols/boa/boa_command.c:601}
01/11/10 04:30:12.117 OMNICOMM 30274 - CRC (0FFC) does not match calculated CRC (03B0) for remote JRS. headerLength=5 dataLength=0 crcByteOffset=2  functionCode=1  {protocols/boa/boa_io.c:1177}    
CMX2
XIS1
XIS2
01/11/10 03:00:10.563 XIS run_backupserver - Startup of XISBACKUP backupserver complete.

BUT when I set up a CRON job, it runs and scps the file, but the contents are wrong, AND the file isn't on the server (when the rm line is commented out like I showed above). Here is the output I'm getting, but NOTE: the output changes, it varys in what it outputs:

DSS1
CMX1
01/11/10 001/11/10 04:30:05.710 OMNICOMM 30274 - boatimesync: error 3 from boaRx rtu 35 pretry 1 {protocols/boa/boa_command.c:601}
01/11/10 04:30:12.117 OMNICOMM 30274 - CRC (0FFC) does not match calculated CRC (03B0) for remote JRS. headerLength=5 dataLength=0 crcByteOffset=2  functionCode=1  {protocols/boa/boa_io.c:1177}
CMX2
CMX2
CMX2
CMX2
XIS1
XIS1
XIS1
XIS1
XIS2
01/1101/11/10 001/11/10 03:00:10.563 XIS run_backupserver - Startup of XISBACKUP backupserver complete.

Any ideas on why the CRON job is not running this exactly as the system runs it when the command is typed in manually?

EDIT: I have modified the script to loop and with all absolute addressing and modified the CRON file with SHELL, PATH and HOME variables, but the output is still erratic, here's the script now:

#!/bin/sh

### internal variable definitions
dvar=`date +"%m\/%d\/%y"`
filedate=`date +%b%d%Y`

# add the prefix of new hosts into the string below
# which will be expanded later into GVE(whatever) while looping
HOSTLIST="DSS1 CMX1 CMX2 XIS1 XIS2"

# main Loop
for SUFFIX in $HOSTLIST
do
  echo $SUFFIX >> /home/gve/log/serverlog_${filedate}.txt
  ssh GVE$SUFFIX grep "^$dvar" /home/gve/log/oasErrLog \
    >> /home/gve/log/serverlog_${filedate}.txt
  echo "\n" >> /home/gve/log/serverlog_${filedate}.txt
done

# transfer and delete file
scp /home/gve/log/serverlog_${filedate}.txt \
  "GVEXOSR2:C:/Documents\ and\ Settings/gve/Desktop/logs"
#rm serverlog_${filedate}.txt

and here is the output:

DSS1
01/1201/12/10 03:00:08.323 XIS run_backupserver - Startup of XISBACKUP backupserver complete.
1
01/12/101/12/10 00:00:37.003 agc - dbioLower: DNP prev cmd may still in prog, name NPC_GT3_GOV raiseTimeout 1250 lowerTimeout 2500 curtime(1263286837:3) cmd_time(1263286834:807)
01/12/10 02:14:57.545 OMNICOMM 1562 - CRC (F110) does not match calculated CRC (1110) for remote ARS. headerLength=5 dataLength=10 crcByteOffset=7  functionCode=2  {protocols/boa/boa_io.c:1177}


CMX2


XIS1


XIS1


XIS2
01/12/101/12/10 03:00:10.408 XIS run_backupserver - Startup of XISBACKUP backupserver complete.

Notice the messed up dates on some lines, the '1' instead of 'CMX1' and the duped 'XIS1'.

FINAL EDIT:

It looks like that somehow CRON has spawned multiple processes that were tripping over each other. After kill all applicable processes it straightened out. CRON has a history (if you do some web-searching on it) of buggy multiple-process spawns, so beware.


Solution 1:

First rule of cron is to set up several things you expect. One, what directory you are in (explicitly 'cd' to it). Two, the path you expect (in the crontab, PATH=...) and three, where the mail goes (if you want to change it.)

For example:

SHELL=/bin/sh
PATH=/bin:/sbin:/usr/bin:/usr/sbin
HOME=/var/log

I would then also have each script either set up additional paths if needed, and always

cd $HOME

or to another explicit path.

Solution 2:

CRON does have its own environment.

Did you install the job with crontab -e as the user running the job? How was the job added?

Also, a little re-do of the script, with looping; this should work OK on your setup.

#!/bin/sh

### internal variable definitions
dvar=`date +"%m\/%d\/%y"`
filedate=`date +%b%d%Y`

# add the prefix of new hosts into the string below,
# which will be expanded later into GVE{whatever} while looping
HOSTLIST="DSS1 CMX1 CMX2 XIS1 XIS2"

# main processing loop
for SUFFIX in $HOSTLIST
do
  echo $SUFFIX >> serverlog_${filedate}.txt     
  ssh GVE$SUFFIX grep "^$dvar" /home/gve/log/oasErrLog \
    >> serverlog_${filedate}.txt
done

scp serverlog_${filedate}.txt \
  "GVEXOSR2:C:/Documents\ and\ Settings/gve/Desktop/logs"

Follow-up to 2nd attempt:

Ok, so something is definitely horked. The fact that you got 2x XIS1 is a good indication that either buffers are not being written correctly, or the shell itself is the culprit. The loop should be isolating each host as it runs, so unless you've got unflushed pipes/buffers/what-have-you lying around, it should not show XIS1 two times in a row. Try explicitly using #!/bin/bash as the shell instead of sh, sometimes vendors re-hook sh to something other than bash (and the loop is a bash-ism, so it could cause problems). Also, put a sync just before the done in the script, to see if it is a buffering issue.