Odd date behaviour

A

Adrian 7 years ago

Raspbrian 4.9.59-v7

I'm seeing some odd behaviour with date in some stuff that a Pi is running.

I've got a cron job that runs every minute, calling a bash script. The script writes out to a log file every time it runs giving the state of a sensor, typically four or five lines of output, and each stage starts with the time, generated by :

date +%H%M

If there is a change in the state of the sensor, then this will be reported, using just the date command :

echo `date`

The strange thing is that the entries published every minute are accurate, but in the change of state entries can be out by anything up to 9 minutes late, the discrepancy varies.

From yesterday's log :

0628

---------- thread started

0629

---------- thread started Thu 6 Dec 06:38:01 GMT 2018 sensor has become inactive

0630

---------- thread started

The sensor did become inactive at 0629 (I triggered it and checked the time). NTP is running, so the Pi time is correct.

running date +%H%M and date from the command line give the same results.

Any suggestions on what is going wrong welcomed.

TIA

Adrian

To Reply : replace "bulleid" with "adrian" - all mail to bulleid is rejected Sorry for the rigmarole, If I want spam, I'll go to the shops Every time someone says "I don't believe in trolls", another one dies.

Vote

N

NY 7 years ago

Weird one!

When you say running

date +%H%M and date

from the command line, do you mean that once the sensor has changed state and your script has logged this, the clock is permanently a few minutes seconds fast?

What's the significance of the "0630 / ---------- / thread started" at the beginning of each block - how are you generating that timestamp?

I could understand it if the clock was slow, as something could be "stealing" timer ticks, but running fast is more difficult to explain.

Once the sensor has changed state and the date command returns the wrong time, what happens if the sensor triggers again? Does the date command show an even bigger (double?) discrepancy from real time, or is it the always wrong by the same time. In other words, is the difference between sensor-changed times then the same as the interval as timed by reference to an external clock.

As a matter of interest, does anyone know why UNIX "date" command lists the fields in a stupid order? "Thu 6 Dec 06:38:01 GMT 2018" is bizarre. Something I've always wondered. The year is at the end instead of next to the month and day.

For some weird reason, the British and American locales are defined with the year field in the wrong place. "Thu 6 Dec 2018 06:38:01 GMT" makes a lot more sense.

I manually modified my en_GB locale definition with that change so the date command would get it right.

Vote

D

druck 7 years ago

ing.

The

How are you writing to the file? Simple redirection of stdout?

Better to use:-

echo $(date)

It's more reliable then back ticks which can have strange issues in scripts.

I'm not quite sure from your description if you mean the date in the log

messages is wrong, or if the log entry is being put in the log file after entries which should precede it.

One thing you might want to consider is instead of doing your own echoing of dates and messages to a file, is to use syslog, which does all this for you.

You can either output to syslog explicitly from your bash script

echo "state change message" | logger -t TAG

Or redirect the entire output of the script by doing it in he crontab

/full/path/to/your/script | /usr/bin/logger -t TAG

TAG is what will appear in each line in syslog so you can use something appropriate to your program such as SensorState.

To get your messgaes put in thier own logfile rather than /var/log/syslog create a in /etc/rsyslog.d/ with a name such as SensorState.conf containing:-

# Filter SensorState tags :syslogtag, isequal, "SensorState:" { /var/log/SensorState.log stop }

Finally to stop your log file growing indefinitely you can register it with log rotate. Create /etc/logrotate.d/SensorState containing:-

/var/log/backup2NASimage.log { weekly rotate 4 missingok notifempty compress nocreate }

This will start a new log file every week, keeping compressed copies of the last 4.

---druck

Vote

A

A. Dumas 7 years ago

Op 07-12-18 om 15:43 schreef Adrian:

Is it possible that two different date programs are called? Maybe one is the program and the other a script or an alias? (although aliases don't work from a script). Are both run from a sudo context? One might be in a different $PATH. Try being very specific:

/bin/date '+%T' echo "$(/bin/date '+%F %T %z') message"

Vote

C

Chris Elvidge 7 years ago

try: $ date +"%c state change message"

or $ M="state change message" $ date +"%c $M"

Chris Elvidge, England

Vote

A

Adrian 7 years ago

In message , NY writes

Running date +%H%M from within the script always produces the correct time. I've run tail -f on the log file, and that is always correct.

Running date (with no arguments) from the script is giving the odd results, with the time part of the output being anything up to 9 minutes late (I've spotted an entry from today where it is right).

However whenever I run those commands from the command line (rather than from the script) they always return the right answer.

The script is generating the timestamp (date +%H%M), the sensor status info is taken by parsing the output of the python code that checks the sensor status, and from that output I can tell whether or not the sensor state (which is Boolean) has changed.

The thing is, that those two executions of date are at best a second or so apart, and by the time cron fires again a minute later, the clock is correct again.

No. The time (date +%H%M) remains correct, but the full timestamp (date no arguments) can be anything from 0 to 9 minutes out, and one state change can give a different discrepancy to the other. So the period of time between sensor state changes is likely to be +/- 9 minutes of the actual period, this is not consistent either.

Adrian

To Reply : replace "bulleid" with "adrian" - all mail to bulleid is rejected Sorry for the rigmarole, If I want spam, I'll go to the shops Every time someone says "I don't believe in trolls", another one dies.

Vote

A

Adrian 7 years ago

Yes.

I've now changed to that, but it will likely be Monday before the next state change. unless I force one.

As the log is being done by simple redirection, entries should be in chronological order. Only one script is redirecting to the log. Therefore I can't see how things can get out of order. From the full log info (which I snipped for brevity) it is obvious when the state change is taking place. So the time at the start of each section (e.g.

0630) is correct, but the full timestamp (including the date) is often wrong, anything up to 9 minutes out (e.g. if the state change is at 0630, the full timestamp will be anything between 0630 and 0639), but the next set of log entries will have the correct time (0631 in this case).

Adrian

To Reply : replace "bulleid" with "adrian" - all mail to bulleid is rejected Sorry for the rigmarole, If I want spam, I'll go to the shops Every time someone says "I don't believe in trolls", another one dies.

Vote

A

Adrian 7 years ago

In message , A. Dumas writes

It is possible I suppose, but one script picking up different instances of a date executable depending on how it is being used I would have thought is unlikely.

The script is being run by root as I need to access stuff that is only accessible to root.

Adrian

To Reply : replace "bulleid" with "adrian" - all mail to bulleid is rejected Sorry for the rigmarole, If I want spam, I'll go to the shops Every time someone says "I don't believe in trolls", another one dies.

Vote

M

Martin Gregorie 7 years ago

Two things:

1) As the anomalous timestamp seems to be associated with a sensor state change, are you quite certain that there is nothing in the sensor hardware or the Python code that might get stuck for several minutes when its state changes?

2) Do you you know that Python code can read the system clock and so can timestamp? See the description of asctime in the time module for details. By using this you can add a timestamp to your status message without needing to use the date executable.

Explanations and a possible reason for the out of order timestamps follow.

I suspect that (1) is possible: if so, it explains what is happening. Don't forget that crond starts each cronjob as a separate task and does not wait for one cronjob to finish before starting another. It has to work this way because it has no idea how long a job will take to run and could easily do the unexpected otherwise. As an example, suppose there are two cron jobs:

- one is a backup run once a day at 01:30 that runs for two hours.

- The other is run every 10 minutes and polls a server for waiting messages: it only runs for a second or two at a time.

If crond only ran one job at a time the second job would not run at all during the backup, which could cause problems, especially if the messages must be processed immediately and originate worldwide - something like weather conditions and position reports being monitored by a large shipping company.

This also means that, if every so often your cron job hangs on a sensor state change for 9 minutes, then the next 7 or 8 jobs that don't hang can overtake it and so their messages will appear in the log ahead of the one that hung. The way your script is currently written, date would read the time as soon as the script starts and that old time is what will be prepended to the state message after the hung Python code unsticks and finally produces its message, with an old timestamp in it, some minutes later.

The easy way to see whether this is in fact happening is to change your Python code so it uses the time module to read the timestamp twice, once as it starts to execute and once just before outputting the message with both timestamps in it. If the program hangs, the before and after times will differ by a few minutes. If it doesn't hang, the time will be within a second of each other (assuming you're only displaying time as hh:mm:ss rather than as hh:mm:ss.sss). Once that's working you can find out exactly where the hang is occurring in your code by moving the points where the timestamps are read until both are the same: the problem must be in the statement(s) you just excluded from the region between the time stamps.

In any case, a process that 'ticks' as rapidly as once a minute it would be better if rewritten round an internal loop with a 60 sleep delay in it rather than being a series of cron jobs launched 60 seconds apart. It would look like this:

run = true while (run == true) sleep 60 seconds read the sensor read the clock write yyyy-mm-dd hh:mm:ss if (a stop condition is seen) run = false

because:

(a) this will make hangups in the program quite obvious

(b) this uses many fewer CPU cycles than having cron run your script: to do that it has to load the date executable and your program, run them and clean up after them when they exit.

The 'stop condition' can be equally simple - something like checking whether an empty file is in your home directory and stopping if you delete it. I hope the above is useful and didn't trigger any TL:DR reflexes.

Martin | martin at Gregorie | gregorie dot org

Vote

E

Eli the Bearded 7 years ago

If the state change message ever contains a % that will have unexpected output, whereas the original will not.

To summarize the original problem, cron every minute runs a script something like:

#!something date +%H:%M # known command check_state # undisclosed command if state_change # undisclosed test echo `date` # known command fi who_knows_what # undisclosed further actions

Which is generating logs like:

06:27 06:28 Dat Output 06:38 STATE CHANGED MESSAGE 06:29

I'm wondering what the check_state command is, and if it could ever take N > 1 (eg 10) minutes. And if it could, how is this log assembled to get a backwards jump in a timestamp.

Or if check_state has some sort of side effect that changes the system clock, and then it gets changed back in who_knows_what. Or if check_state has a side effect that changes the PATH.

An "strace -f" of the whole cron script would be interesting. It would show if different versions of "date" are being used and would also show if something was changing the system clock (from the script).

But it wouldn't help if the check_state command is just very slow and the date descrepancy is hidden by the way the output is assembled.

Elijah

------ trying to think of other ways data could be subverted

Vote

M

Martin Gregorie 7 years ago

No backward jumps needed. Cron "fires and forgets" every job it runs.

The OP's problem script usually gives the right result because his cron job takes only a second or two to run, so log entries mostly appear in the expected sequence, but if the state changes he appears to run a one- line script that executes date in back-ticks and his Python code run normally, so if the python hangs for a minute or more the timestamped message will be written to the log AFTER the next one or more cron jobs have run and logged their results, so its output hits the log after that.

Now, how come the timestamp matches the time when the job started rather than when its output was written? That's because the OP is running

echo $(date)

(actually he's back-ticking 'date', but the effect is the same: anything inside backticks or $(...) runs before the rest of the line is executed. If it didn't, then constructs which I use all the time, like:

less $(grep -l 'TARGET' *.java)

would not work. Since this does work, its execution sequence can only be:

(1) Run grep to build a list of Java sources containing TARGET.

(2) Insert the list into the line as command line parameters to less

(3) Start less showing the contents of the first file in the list.

Martin | martin at Gregorie | gregorie dot org

Vote

R

R.Wieser 7 years ago

Adrian,

You need to re-think that: If I may take the "minute tick" entries as correct than those "change of state entries" come a few minutes *too early* ( "06:38:01" is 8 minutes *later* than the "0630" directly following it)

And as that is not possible (it would involve time-travel) it makes me think that maybe the "change of state entries" are correct, and the "minute tick" entries too late.

... Which might be caused by buffered (text)file writes.

Also, what have you done in regard to finding more info about what happens ? Have eou already tried to enclose the "change of state entries" lines by a pair of "date +%H%M" (maybe even "date +Start%H%M" and "date +End%H%M" lines ?

If all is well the difference between those two should be neglishable (the time it takes to output your "change of state entry" line, maybe including the time for the conversion too. Who knows, as you have not shown the involved script ...)

Also, the "change of state" entry time *should* be in between the two it is enclosed in. If it is (even when those three times are not in sync with the "minute tick" time!) its not the time writing thats the problem.

In other words: Don't stare yourself blind on what *should* happen. Try to introduce check points which will give you more info about what happens, and maybe even will help you to "zoom into" the actual problem area.

I could even imagine adding an incrementing number (taken from a single source/program!) to both the "minute tick" and "change of state entry" lines. If those do not appear in sequence ...

Regards, Rudy Wieser

Vote

R

Richard Kettlewell 7 years ago

My guess is the same as Martin Gregorie?s. But, if that?s not what is going on, post the script somewhere; debugging a program one cannot see is just absurd.

https://www.greenend.org.uk/rjk/

Vote

A

Adrian 7 years ago

In message , R.Wieser writes

Thanks for the detailed reply.

I am able to tail -f the log file, so I can see that the minute ticks are correct.

I know when the change of state takes place. I know that in the case of Thursday morning example (see original post), it was between 0629 and

0630 (I witnessed it, and my watch is only a few seconds out). So the log file ought to have a change of state recorded at 0630.

Unfortunately, what I'm not able to do is observe the log file (in real time) at the time the state changes, I have to rely on what is written to it as being accurate.

I suppose buffering could be possible, but I can't see how it can happen in this case. Each set of data is generated by a script triggered by cron, so if one instance took 8 minutes to complete, then I would expect that the following instances (taking seconds to complete) would add to the log file (stdout redirecting, rather than an explicit file write operation) as they are written, with the long running entry adding its entry at the appropriate point in the log file.

I've now altered the script to that it adds additional timestamps to every run to show when the script starts, when the sensor check completes, and when the script completes. It will be interesting to see what the logs show next time there is a state change.

That could be a next step.

Adrian

To Reply : replace "bulleid" with "adrian" - all mail to bulleid is rejected Sorry for the rigmarole, If I want spam, I'll go to the shops Every time someone says "I don't believe in trolls", another one dies.

Vote

A

Adrian 7 years ago

Thanks for the detailed follow up.

No I'm not sure, but it is possible.

The Python code (downloaded from the net) gets the status information from the sensor (written to a file), my script looks at that file, and therefore works out if there is a state change. If I live long enough, I'll extend the Python to do that as well.

I've modded the script so that it now reports in seconds (as well), and it will always report the time the script started, when the Python finishes (the script sets up a couple of variables before running the Python, so I can't see that there will be a delay there), and when it finishes.

I take your point about how cron runs each job as required, rather than putting them in a queue. However, if I'm doing file redirects rather than explicitly open the log file at the start of the script, and then closing it at the end (and thus locking it), I would have thought that entries should be appearing in chronological order as each one is a distinct write operation.

No, and again thanks for your time.

Adrian

To Reply : replace "bulleid" with "adrian" - all mail to bulleid is rejected Sorry for the rigmarole, If I want spam, I'll go to the shops Every time someone says "I don't believe in trolls", another one dies.

Vote

A

Andy Burns 7 years ago

Bugs that only trigger once every few days are correspondingly more difficult to find. Make a test copy of your script with its own cron entry, cut out any reference to querying the sensor, maybe insert a short delay, make it always output the message to a different log file, then see what happens...

Vote

M

Martin Gregorie 7 years ago

Take a look at my reply to Eli the Bearded - I think that's a better explanation of how the use of a back-ticked invocation or using the $(date) construct inside an otherwise simple script can decouple the timestamp value from when other stuff in the script actually happened.

Take a look at this script (between the 'datetest start' and 'datetest end' lines). It shows what I think your cron script does when your program sees a state change.

================= datetest start ================= #!/bin/bash echo -n "starting: "; date dt=$(date)

if [ -n "$1" ] then sleep $1 fi

echo "dt is: $dt" echo -n "ending: "; date ================= datetest end =================

Save it as as a file called 'datetest, make it executable and then try running it:

$ datetest

should instantaneously display three lines, all with the same timestamp, but running it as

$ datetest 10

will force it to take 10 seconds to run (the parameter is the number of seconds to sleep before continuing). The first two lines should now show the same timestamp but the third line should show a timestamp that's 10 seconds later than the other two.

What this demonstrates is that by capturing the 'dt' timestamp and then waiting 'n' seconds before displaying the timestamp, I've forced the 'dt' message to lie about what the time was when the display was output.

By using the 'backtick' notation round 'date' (just as I used $(date) ) structure I recorded the timestamp when the script had just started. Now, if anything hangs up (for minutes or even a few seconds) between then and when the message containing the timestamp is displayed, the script is reporting the time as earlier than what was when the message was output (or in your case, appended to the log).

Since it appears that something in your Python program can hang up for up to for 9 minutes the effect on the log is like this

50th cron run at 09: 00 (no hangup ) -----> writes 09:00 to the log 51st cron run at 09: 01 (hangs 2 mins) --+ 52nd cron run at 09: 02 (no hangup ) --|--> writes 09:02 to the log 53rd cron run at 09: 03 (no hangup ) --|--> writes 09:03 to the log +--> writes 09:01 to the log 54th cron run at 09: 04 (no hangup ) -----> writes 09:04 to the log

...which is what you're seeing.

Right, now over to you to either prove me wrong or find and fix the problem!

One suggestion: since you say another process is reading the sensor and writing its status to a file that your Python program reads, its possible that there would be an occasional file locking conflicts between the two processes.

This would occur whenever your Python program tries to read data from the the end of the file at the same moment as the sensor monitoring program is adding another reading to the end of the file and so will have the file open for writing. A read at this point should fail or block because the OS must prevent 'at end' reads while a sequential file is open for writing. If it doesn't do that there would be file integrity problems because the reader can't know where the file end is until the writer has finished writing data, flushed its write cache and either released the lock or closed the file.

Martin | martin at Gregorie | gregorie dot org

Vote

A

Adrian 7 years ago

In message , Martin Gregorie writes

I agree that what you are describing will give the results that you've show. However, it isn't what I'm doing (sorry if that wasn't clear).

What I think I'm seeing is :

50th cron run at 09: 00 (no change ) -----> writes 09:00 to the log 51st cron run at 09: 01 (state change) ----> writes 09:07 to the log 52nd cron run at 09: 02 (no change ) ----> writes 09:02 to the log 53rd cron run at 09: 03 (no change ) ----> writes 09:03 to the log 54th cron run at 09: 04 (no change ) -----> writes 09:04 to the log

When there has been a hang/delay/whatever, the script timestamps the state change at a future time, and then continues to log stuff in real time, which is what is confusing me. So far as I can see, nothing can edit the log to add retrospective entries, it just appends to the end (stdout >> log).

If something is hanging (for want of a better idea of what is happening) for a few minutes, so be it (that is a different matter), but how does it mess with the time line ?

To give a fuller example

0619

---------- ble thread started d2:5f:2d:3c:6f:a6 666f496c5f626561636f6e314df5243c

666f496c5f626561636f6e314df5243c active ffoil_beacon1

0620

---------- ble thread started Sat 8 Dec 06:27:15 GMT 2018 ffoil_beacon1 has become inactive

666f496c5f626561636f6e314df5243c inactive ffoil_beacon1

0621

---------- ble thread started

666f496c5f626561636f6e314df5243c inactive ffoil_beacon1

There is now an id number applied to each run, and extra timing info, so it should be possible to see which run is reporting what. What I'm doing is :

#! /bin/bash

day=$(date +%y%m%d) time=$(date +%H:%M:%S) counter=$(cat /home/pi/iBeacon-Scanner/counter)

/home/pi/iBeacon-Scanner/incrementer.sh

logfile=/home/pi/iBeacon-Scanner/output_log_$day.txt

echo "$(date +%H:%M:%S) - $counter" >> $logfile

python /home/pi/iBeacon-Scanner/batch_blescan.py > /tmp/scan.txt

echo -- $(date +%H:%M:%S) $counter Python completed >> $logfile awk -F, ' {print $1 " " $2 }' /tmp/scan.txt | sort -u >> $logfile

beacon_list=`cat /home/pi/iBeacon-Scanner/beaconlist.txt`

for beacon in $beacon_list do grep $beacon /tmp/scan.txt > /dev/null

if [ $? -eq 0 ] then state=`grep $beacon /home/pi/iBeacon-Scanner/state.txt | awk '{ print $2}'` if [ "$state" != "active" ] then sed "s/$beacon inactive/$beacon active/" /home/pi/iBeacon-Scanner/state.txt > /home/pi/iBeacon-Scanner/tmp_state.txt mv /home/pi/iBeacon-Scanner/tmp_state.txt /home/pi/iBeacon-Scanner/state.txt beacon_name=`grep $beacon /home/pi/iBeacon-Scanner/state.txt | awk '{ print $3 }'`

echo $(date) $beacon_name has become active $counter >> $logfile fi else state=`grep $beacon /home/pi/iBeacon-Scanner/state.txt | awk '{ print $2}'` if [ "$state" != "inactive" ] then sed "s/$beacon active/$beacon inactive/" /home/pi/iBeacon-Scanner/state.txt > /home/pi/iBeacon-Scanner/tmp_state.txt mv /home/pi/iBeacon-Scanner/tmp_state.txt /home/pi/iBeacon-Scanner/state.txt beacon_name=`grep $beacon /home/pi/iBeacon-Scanner/state.txt | awk '{ print $3 }'` echo $(date) $beacon_name has become inactive $counter >> $logfile fi fi done cat /home/pi/iBeacon-Scanner/state.txt >> $logfile echo -- $(date +%H:%M:%S) $counter Completed >> $logfile echo "" >> $logfile

Originally, the cron job redirected the stdout to a separate logfile (new one each week), which only recorded the state changes, In an attempt to get to the bottom of this, I've modded the script to put everything into one log file (new one each day).

As you can hopefully see, every time I print out a date/time, it is freshly minted. Each run should get its own id number, which is included in the output, which should (hopefully) help trace what is happening next time the sensor state changes.

The Python called by this script gets the status and dumps that to a file, the script then looks at that file. So there shouldn't be any conflicts there.

The sensor in question (currently only one) is a Bluetooth low energy beacon. The Python (using the Pi's onboard Bluetooth) is looking to see what (if any) beacons are active. It doesn't look for specific beacons, so I would expect it to behave the same whether there are zero or a dozen visible. It prints (to stdout) the details of any beacon it finds (which seems to be a combination of something looking like a MAC address, a HEX string and some numbers which I think refer to the signal strength), and it can see the same beacon several times during the scan, hence the sort -u above. As you can see above, I've got a redirect from stdout to a file. Obviously, the more beacons present, the longer it will take to complete, and so far as I'm aware nothing else is using the onboard Bluetooth, so there shouldn't be contention.

Let us see what Monday brings to the log !

Adrian

To Reply : replace "bulleid" with "adrian" - all mail to bulleid is rejected Sorry for the rigmarole, If I want spam, I'll go to the shops Every time someone says "I don't believe in trolls", another one dies.

Vote

D

Dennis Lee Bieber 7 years ago

On Sun, 9 Dec 2018 23:43:58 +0000, Adrian declaimed the following:

I would strongly recommend you study the documentation for the Python logging module, and incorporate it into the Python script, along with the parsing of the state information... The only thing the cron job should then be responsible for is kicking off the Python script -- nothing else.

Right now you have half a dozen places where data is appended to your log, with a separate process invoked midway to scan, and another series of operations doing a scan of scan results. Possible chase conditions -- IF the Python script runs long, it is possible that, between the end of the Python script and the start of your parsing loop, a second run could start and overwrite the Python output file. At the least, replace /tmp/scan.txt with a file name unique to the run (maybe obtain the process ID and add that to the file name).

Having the Python script collect the beacons into an internal structure would allow one to avoid duplicates in the data, so no need for the sort operation after-the-fact. How does the Python script determine it has completed a scan for beacons? You state that it is possible for one beacon to be seen multiple times, so is the search based on time?

You might also have a chase conditions on your incrementer script along with other temporary (or long term) files. All scratch files should have names unique to the execution/run to avoid conflicts should the job be started before a previous run has completed.

Heck: I don't recall if this has been tried but... What happens if you change the cron job to only run every 10 minutes? Do you ever see this "mistime" happen then? Five minutes? Three?

Maybe create a lock-file to indicate if a prior version of the script is still running?

Wulfraed Dennis Lee Bieber AF6VN wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/

Vote

Odd date behaviour

Join the Discussion

Didn't find your answer?