timestamp in ms and 64-bit counter

pozz · 2020-02-06T12:43:30+00:00

I need a timestamp in milliseconds since the Linux epoch. That number doesn't fit in 32 bits. I'm using a 32-bit MCU (STM32L4R9...), so I don't have a 64-bit hardware counter and need to build a mixed software/hardware 64-bit counter. The idea is simple: I configure a 32-bit hardware timer to run at 1 kHz and increment a uint32_t variable in the timer overflow ISR. Now I need to implement a GetTick() function that returns a uint64_t. I know this can be tricky because of race conditions. One solution is to disable interrupts, but I remember another one:

extern volatile uint32_t ticks_high;

uint64_t GetTick(void) {
    uint32_t h1 = ticks_high;
    uint32_t l1 = hwcnt_get();
    uint32_t h2 = ticks_high;
    if (h1 == h2) return ((uint64_t)h1
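A minimal sketch of how the double-read pattern in the question is commonly completed, assuming the cut-off return statement combines the two halves with a shift and an OR, and that a mismatch simply triggers a retry. The `sim_*` helpers and `timer_overflow_isr()` are hypothetical stand-ins so the logic can be exercised on a host; on the MCU, `hwcnt_get()` would read the timer's count register.

```c
#include <stdint.h>

/* Software high word, incremented by the (simulated) overflow ISR. */
static volatile uint32_t ticks_high;

/* Hypothetical simulation state standing in for the 32-bit hardware timer. */
static uint32_t sim_counter;
static uint32_t sim_step_per_read;  /* ticks that elapse on every counter read */

static void timer_overflow_isr(void) { ticks_high++; }

/* Advance the simulated timer and run the "ISR" when it wraps. */
static void sim_advance(uint32_t n) {
    uint32_t before = sim_counter;
    sim_counter += n;
    if (sim_counter < before) timer_overflow_isr();
}

static uint32_t hwcnt_get(void) {
    sim_advance(sim_step_per_read);
    return sim_counter;
}

uint64_t GetTick(void) {
    for (;;) {
        uint32_t h1 = ticks_high;
        uint32_t l  = hwcnt_get();
        uint32_t h2 = ticks_high;
        /* If the high word did not change, no overflow ISR ran between the
           two reads, so h1 and l are taken to belong together. */
        if (h1 == h2) return ((uint64_t)h1 << 32) | l;
        /* Otherwise an overflow was serviced in between: retry. */
    }
}
```

This only detects an overflow whose ISR has already run. If the counter wraps while the ISR is still pending (interrupts masked, or interrupt controller latency), `h1 == h2` holds and the result is one full period too small, which is the issue raised later in the thread.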


George Neuner 6 years ago

Yes, but the hardware tick was at 18Hz (~55ms) up until XP and the introduction of "multimedia" timers.

At first those "multimedia" timers were implemented by a realtime priority thread using the CPU's cycle counter. In a quiet system you could get down to ~50us.

However, 10+MHz HPET hardware timers were introduced in 2005 and quickly became standard on retail systems. Support for HPET based multimedia timers came in XPsp3 (2008).

Since Vista, if HPET is available, one channel of the timer is used to support the system clock at 1KHz.

George


David Brown 6 years ago

Yes. And that meant bugs in the graphics drivers could kill the whole system, unlike in NT 3.x. And bugs in the graphics drivers were certainly not unknown. However, with a little care it could run reliably for long times. I don't remember ever having a software or OS related crash or halt on our little NT 4 server.

(My NT 4 workstation eventually decided to wipe my start menu and replace it with a single entry "eject computer", complete with icon. And it kept asking me to insert a disk in drive C: and close the door. But that was after many years of use and abuse.)

I certainly liked W2K, and found it quite reliable. But I still remember NT 4.0 as good too.


Richard Damon 6 years ago

I did forget the delay in the interrupt controller. With that delay, you do have a fundamental issue between reading hardware registers and the software counter.

A couple of solutions, some that have been mentioned:

Have a 1ms interrupt and in software keep the 64 bit counter.

I believe you can also program the counters to generate multiple interrupts per count cycle. If you generate one at 0 and one at the halfway point, then knowing which interrupt was last seen tells you whether one is 'pending', based on the lower counter read.

Another option on that processor is to chain a couple of timers together, so that when the lower counter rolls over the upper counter counts automatically; I believe the hardware handles this so there is no skew between the counters. Reading the upper half directly from the hardware then avoids the issue.
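The two-interrupt idea can be sketched roughly as follows. This is only one way to read it, assuming the timer can raise an interrupt both at the wrap to 0 and at mid-count 0x80000000 (for example through a compare channel). The names `half_periods`, `timer_half_isr()` and the simulated `hwcnt_get()` are hypothetical.

```c
#include <stdint.h>

/* Counts half-periods: incremented by the ISR at count 0 and at 0x80000000.
   When no interrupt is pending, its lowest bit equals the counter's top bit. */
static volatile uint32_t half_periods;

/* Hypothetical stand-in for the hardware count register. */
static uint32_t sim_counter;
static uint32_t hwcnt_get(void) { return sim_counter; }

/* Would be hooked to both the overflow and the half-way compare interrupt. */
void timer_half_isr(void) { half_periods++; }

uint64_t GetTick(void) {
    uint32_t e = half_periods;  /* software state first */
    uint32_t l = hwcnt_get();   /* then the hardware counter */

    /* A mismatch means the counter has entered the next half-period but the
       ISR was still pending (or ran between the two reads above). */
    if ((e & 1u) != (l >> 31)) e++;

    return ((uint64_t)(e >> 1) << 32) | l;
}
```

The correction is valid only if the ISR is never delayed by a full half-period (about 24.8 days at 1 kHz), which is a much weaker requirement than servicing it within a few cycles.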


pozz 6 years ago

On 06/02/2020 19:02, Rick C wrote: > [...]

Of course this limitation isn't usually a real problem.

However, there could be situations where GetTick() is called again only after more than 49 days. For example, you can have an IoT device that starts sending data (with timestamps) only after the user makes a request, and timestamps/GetTick() are used only in the routine that sends data.

Maybe the user, excited about the gadget after purchasing it, makes the request several times a day. After some weeks he could forget he has the gadget, and remember it only many days later...


pozz 6 years ago

I think FreeRTOS is already configured to have a fast periodic interrupt, something like 1 ms. I suspect it is used to check whether any tasks blocked waiting for a timer to expire must be activated.

My first idea was to implement the 64-bit, ms-resolution timestamp counter as something completely separate from the OS ticks, but I think I could add some code to the OS tick interrupt.

Yes, I have a preemptive multi-tasking system (FreeRTOS).


pozz 6 years ago

I think Rick suggested his solution, where the high 32 bits are incremented in the application layer, because it is simpler. As you have read, incrementing the high 32 bits in an ISR forces us to implement a trickier GetTick() with multiple reads of the high counter.

Unfortunately, his solution doesn't work as-is with a preemptive scheduler when multiple tasks call GetTick().

At every bootup and at a regular interval I use NTP to synchronize the internal calendar time.


pozz 6 years ago

On 08/02/2020 18:03, Kent Dickey wrote:

Good point, Kent. Thank you for your post that helps to fix some critical bugs.

You're right, ISRs aren't executed immediately after the corresponding event occurs. We should assume the ISR code runs many cycles after the interrupt event.

I think you mentioned disabling interrupts to avoid any preemption by the RTOS scheduler, effectively blocking the scheduler entirely. However, I know it's a bad idea to enable/disable interrupts "manually" with an RTOS. Maybe a mutex for GetTick() is a better idea, something similar to this:

uint64_t GetTick(void) {
    mutex_take();

    static uint32_t ticks_high;
    uint32_t ticks_hw = hwcnt_get();
    static uint32_t ticks_last;

    if (ticks_last > ticks_hw) ticks_high++;
    ticks_last = ticks_hw;
    mutex_give();

    return ((uint64_t)ticks_high

> 2) Use a higher interrupt rate. For instance, if we can take the interrupt

What about?

static volatile uint64_t ticks64;

void timer_isr(void) {
    ticks64++;
}

uint64_t GetTick(void) {
    uint64_t t1 = ticks64;
    uint64_t t2;
    while ((t2 = ticks64) - t1 > 100) {
        t1 = t2;
    }
    return t2;
}

If something dangerous happens (the ISR executes during GetTick), t2-t1 is a very big number. 100 ms represents the worst-case maximum duration of the ISRs/tasks that could preempt or interrupt GetTick. We could increase 100 even more.

Do you mean...?

volatile uint32_t ticks_low;
volatile uint32_t ticks_high;

void interrupt_at_every_second(void) {
    uint32_t tl = get_low_ticks();  // from free-running 1ms counter
    if (ticks_low > tl) {
        ticks_high++;
    }
    ticks_low = tl;
}

uint64_t GetTick(void) {
    uint32_t h2;

    uint32_t local_ticks_high = ticks_high;
    uint32_t local_ticks_low = ticks_low;
    while ((h2 = ticks_high) != local_ticks_high) {
        local_ticks_high = h2;
        local_ticks_low = ticks_low;
    }

    uint64_t ticks64 = ((uint64_t)local_ticks_high

> Basically, we return the ticks64 from the last regular interrupt, which could


pozz 6 years ago

On 09/02/2020 21:55, Richard Damon wrote: > [...]

I *believe* too, but are we completely sure?


Richard Damon 6 years ago

It has been a while since I went through that processor's documentation, but I remember a choice of synchronization modes, one of which made the update simultaneous at the cost of a bit more delay from the trigger pulse (you would typically run the counter on the system clock with a once-per-millisecond trigger pulse). If I am right about what the documentation says, it is a clear guarantee, but the person designing the system should look that up to make sure they do it right.


Richard Damon 6 years ago

The use of SHORT critical sections based on disabling interrupts is almost never an issue, and most RTOSes that I know have that ability. They are often given a name based on entering/exiting a critical section as opposed to enable/disable the interrupts, in part to remind you that they need to be well paired and the region short.


Rick C 6 years ago

In the systems I design that would not be a problem.

Yeah, and the reliability requirements of such consumer goods are typically not so stiff, so if they are not being used for months on end a malfunction is not unexpected. I recall having routers that needed a power cycle every week or two. I have a wifi extender that seems to work pretty well, but has needed to be power cycled a few times a year. Once in 3 years is one thing. Once every 49 days is another.

Still, such a requirement in this case doesn't really solve any problems it seems. So no need to worry with it.

Rick C. --- Get 1,000 miles of free Supercharging --- Tesla referral code - https://ts.la/richard11209


upsidedown 6 years ago

Nice !

With a 64 bit counter with 1 ms resolution you can easily record any event since the days of dinosaurs.

Using the 100 ns resolution of VAX/VMS or Windows NT, a 64-bit counter can represent a period of roughly 58,000 years. If one sets the zero time at Julian date 1 (in 4714 BCE), any historical event can be represented with 100 ns resolution.

Many processors have clock cycle counters. On x86 architectures, there is a 64-bit Time Stamp Counter register, which is updated every clock cycle. Even on a 4 GHz CPU, the counter rolls over only after about 146 years (2^64 cycles / 4e9 cycles per second). Thus the counter is barely sufficient to handle the counts during the processor lifetime.
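The rollover figures in this thread can be checked with a few lines of arithmetic. A small sketch, assuming a year of 365.25 days; the helper name `rollover_years()` is made up for illustration.

```c
#include <stdint.h>

#define SECONDS_PER_YEAR (365.25 * 86400.0)

/* Years until an unsigned counter of `bits` width wraps when it is
   incremented `hz` times per second. */
double rollover_years(unsigned bits, double hz) {
    double counts = 1.0;
    for (unsigned i = 0; i < bits; i++) counts *= 2.0;  /* 2^bits */
    return counts / hz / SECONDS_PER_YEAR;
}
```

With this, `rollover_years(32, 1e3)` comes out at roughly 0.136 years (the 49.7 days discussed earlier), `rollover_years(64, 1e3)` at about 585 million years, `rollover_years(64, 1e7)` at about 58,000 years, and `rollover_years(64, 4e9)` at about 146 years.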


Robert Wessel 6 years ago

As I mentioned elsewhere in the thread, if you have an atomic 32-bit read, and a 32-bit CAS, you can do this without locks pretty simply.


Dimiter_Popoff 6 years ago

And I replied without having understood what you meant :-). Sorry about that.

Dimiter


Rick C 6 years ago

I did a search but it didn't turn up. What's a CAS???

Rick C. --+ Get 1,000 miles of free Supercharging --+ Tesla referral code - https://ts.la/richard11209


David Brown 6 years ago

Compare-and-swap. It is a common instruction for use in multi-threading systems as a building block for atomic accesses and lock-free algorithms (and for implementing locks).

It corresponds roughly to this C code, executed atomically:

bool cas(uint32_t * p, uint32_t old, uint32_t new) {
    if (*p == old) {
        *p = new;
        return true;
    } else {
        return false;
    }
}

It is useful, but has its limits (the wikipedia page describes some, if you are interested). In cases like this, it could be useful.

However, the OP is using an ARM - and like most (but not all) RISC cpus, ARM does not have a CAS instruction. Instead, it has a load-link and store-conditional pair, which is more powerful and flexible than CAS but a little harder to use.


Rick C 6 years ago


Someone was on my case about a self-designed CPU not having some instruction that is essential for multitasking. Would this be the instruction? I'm not sure I understand. When you say *p == old, where is old kept? Is there really a stored value of old, or is this a way of saying *p /= new??? In that case the code could be...

bool cas(uint32_t * p, uint32_t old, uint32_t new) {
    if (*p == old) {
        *p = new;
        return true;
    } else {
        *p = new;
        return false;
    }
}

I write this because in my basic architecture memory is read/written on opposite phases of the CPU clock and all instructions are one clock cycle. The write is predetermined in the first phase of the clock, so the CPU can't have a RMW cycle. It can have a W/R cycle where the read data is the old data before the write. As long as the write is always done, it can do the above in a single, non-interruptible cycle... not that I'm contemplating performing multitasking. The code is more complex than warranted for a 600 LUT CPU. Just add another CPU. lol

Giving what you wrote more thought, it seems pretty clear it has to be implemented the way you have it written.

I should look it up and learn something, lol.

Rick C. -+- Get 1,000 miles of free Supercharging -+- Tesla referral code - https://ts.la/richard11209


Robert Wessel 6 years ago

No, the idea is to not update the word in memory unless it hasn't been changed. The classic example is using CAS to add an item to a linked list. You read the head pointer (that has to happen atomically, but on most CPUs that just requires that it be aligned), construct the new first element (most crucially the next pointer), and then if the head pointer is unchanged, you can replace it with a pointer to the new first item.

If the values are not equal, you don't want to update the head pointer or you'll trash the linked list. In that case you retry the insertion operation using the new head pointer.

CAS is intended to be safe to use to make that update, as it's atomic: the read of the value in memory, the compare to the old value, and the conditional update form an atomic block, and can't be interrupted or messed with by other CPUs in the system.
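The linked-list insertion described above can be sketched with C11 atomics, where `atomic_compare_exchange_weak()` plays the role of CAS. The `struct node` layout and the `push()` name are illustrative assumptions, not something taken from the thread.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Head pointer of the list, shared between threads. */
static _Atomic(struct node *) head = NULL;

/* Insert n at the front of the list without taking a lock. */
void push(struct node *n) {
    struct node *old = atomic_load(&head);  /* atomic read of the head */
    do {
        n->next = old;                      /* build the new first element */
        /* Replace head with n only if head still equals old. On failure the
           call stores the current head into old, and the loop retries. */
    } while (!atomic_compare_exchange_weak(&head, &old, n));
}
```

The weak variant may fail spuriously on LL/SC machines, which is harmless inside a retry loop like this one.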

CAS is pretty easy to simulate with LL/SC. In some cases you'd be better off adjusting the algorithm to better use LL/SC. In this case it depends on how you're accessing the low word of the timer. If you have only a single thread of execution, you can fake CAS by disabling interrupts.

What ISA is this for?


Rick C 6 years ago

On Wednesday, February 12, 2020 at 4:44:13 PM UTC-5, snipped-for-privacy@yahoo.com wrote: > [...]

Ok, this is more clear now. Wikipedia explains LL/SC pretty well. This is actually for multiple CPUs as much as multitasking. While you can just disable interrupts (assuming you can live with the interrupt latency issues) to make this work with a single CPU, if you are sharing the data structure with other CPUs the bus requires locking while these multiple transactions are happening. I assume the CPU has a signal to indicate a locked operation is happening to prevent other accesses from getting in and mucking up the works.

Is there a way to emulate this locking using semaphores? Someone I know is a big fan of Propeller CPUs which share memory and I don't know if they have such an instruction. They share memory by interleaving access.

Custom stack processor, related to the Forth VM. When designing FPGAs I want a CPU with deterministic timing, so 1 instruction = 1 clock cycle works well. Interrupt latency is zero or one depending on how you count it. The next cycle after an unmasked interrupt is asserted fetches the first instruction of the IRQ routine.

The CPU is not pipelined but the registers are aligned through the architecture to make it decode-execute/fetch rather than fetch-decode-execute. The fetch only depends on flags and instruction decode so it happens in parallel with the execute as far as timing is concerned. Someone insisted this was a pipelined design because of these parallel parts.

It's nothing special, YAMC (Yet Another MISC CPU). I've never spent the time to optimize the design for speed. Instead I did some work trying to hybridize the stack design with register-like access to the stack to minimize stack juggling. Once that happened, the number of instructions for the test case I was using (an IRQ for DDS calculations) dropped by either a third or half, I forget which. The big stumbling block for me is coming up with software to help write code for it. lol

Rick C. -++ Get 1,000 miles of free Supercharging -++ Tesla referral code - https://ts.la/richard11209


Robert Wessel 6 years ago

In the algorithm I suggested, you could just put a mutex around the sequence that emulates the CAS. That's safe, since the extension word is never updated from inside an interrupt handler (unless you actually intend for that to be possible, such as if you were reading the extended time value from inside an ISR). Even if that's slow, it's on a leg of the code that will happen only rarely.

You still need the atomic read of the extension word (although that's typically a non-issue, especially on a single hardware thread system).
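The algorithm itself is not shown in this part of the thread, so the following is only a guess at one lock-free scheme that fits the description: an extension word that is read atomically, updated by the readers themselves with a CAS, and never touched by an ISR. It assumes the extension word counts half-periods of the hardware counter and that GetTick() is called at least once per half-period (about 24.8 days at 1 kHz). The names `ext` and the simulated `hwcnt_get()` are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Extension word: number of half-periods of the 32-bit hardware counter seen
   so far. Its lowest bit mirrors the counter's top bit once it is up to date. */
static _Atomic uint32_t ext;

/* Hypothetical stand-in for the hardware count register. */
static uint32_t sim_counter;
static uint32_t hwcnt_get(void) { return sim_counter; }

uint64_t GetTick(void) {
    uint32_t e = atomic_load(&ext);  /* atomic 32-bit read */
    uint32_t l = hwcnt_get();

    if ((e & 1u) != (l >> 31)) {
        /* The counter has moved into the next half-period: try to publish
           e + 1. If the CAS fails, another caller already advanced ext, and
           e + 1 is still the right value for the l read above. */
        uint32_t expected = e;
        atomic_compare_exchange_strong(&ext, &expected, e + 1u);
        e += 1u;
    }
    return ((uint64_t)(e >> 1) << 32) | l;
}
```

Where no CAS or LL/SC is available, the body of the `if` is the short, rarely executed sequence that could instead be wrapped in a mutex, as suggested above.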
