Measuring CPU cycles, but the values vary widely?

Hello, I'm doing time measurements for an application running in user mode. (The system environment is Timesys Linux/RK on a Pentium IV, and I run the application at runlevel 1, which is single-user mode.) (Sorry, my question is not exactly about embedded Linux.)

Because I need to measure the execution time of even very small functions, I used the assembly instruction "rdtsc", which returns the number of CPU cycles. As the measured functions are executed hundreds of thousands of times, I record the minimum and maximum number of cycles elapsed during each function. But the gap between min and max is too big. The following is sample data for 7 different functions.

        fn1    fn2     fn3     fn4    fn5    fn6    fn7
MIN     400     92     124     112     84     88    548
MAX    1336    296  412960   15368   1256    392   9960
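(For context, the measurement core looks roughly like this minimal sketch. It assumes GCC inline assembly on x86; the names are illustrative rather than my real code.)

#include <stdint.h>

/* Read the time-stamp counter; after RDTSC executes, EDX:EAX holds
 * the 64-bit cycle count since reset. */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Time a single call of the function under test. The result includes
 * anything that happens in between: interrupts, preemption, page
 * faults. */
static uint64_t time_one_call(void (*fn)(void))
{
    uint64_t start = rdtsc();
    fn();
    return rdtsc() - start;
}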

I'm not sure how to explain the variance. First, ISRs may happen to run while these functions are executing. Second, the VM may have an effect through page faults or page swapping. For the first case, I considered disabling interrupts at the starting point and enabling them again at the ending point. Because my app runs in user mode, I would have to implement a system call for this. But I don't think it's possible to implement a kernel function (system call) for this purpose, since the system call wrapper itself uses a software interrupt. (Am I right?)

Does anybody have a suggestion to solve my problem? My questions are:
1) Why do the measured min and max have such a big gap?
2) How can I minimize the effect of kernel or other processes' activity?
3) Is there a way to disable interrupts while executing the function being measured?
4) To avoid page swapping, is there a way to lock the program in memory?

Thank you so much.

Reply to
pinetaj

I suppose you run the process with realtime priority (e.g. SCHED_FIFO)? Otherwise your program is interrupted by the time-slicing algorithm at random points for a random amount of time.

Linux is not a deterministic hard-realtime system, so this is normal. The new 2.6 kernels help with _soft_ realtime (they reduce the count of very long delays), but even there, there is no guarantee of any maximum.

Kernel 2.6 and/or low latency patches.

You can't do that in userland. In a device driver you do have additional possibilities.

AFAIK: Yes.

To do deterministic realtime work you need to add (e.g.) RTAI to your installation. There you can handle things like you describe.

-Michael

Reply to
Michael Schnell

Why not? We run a large embedded real-time simulation at runlevel 3.

The ~400,000-cycle hit is pretty large, but the other values look pretty typical. It looks to me like other processes are running between your two time points. Do a ps -fe or something similar to see what other processes are running and whether you need them. Though on a system I checked while typing this up (a 2.6.1 kernel), I get primarily system processes and a handful of user applications (e.g., sh, minilogd, init).

Not quite sure why you want to do this. If you want the time these functions take, then take the minimum values and be done with it. The maximums give you information on overhead / latency that is a problem of the overall application, not something specific to the functions being timed.

You don't mention running this as a real-time application. pthread_setschedparam is a portable way to do this; the underlying Linux system call is sched_setparam. You need privileges to do this, but in single-user mode you should have them. Otherwise the kernel's scheduling policy can and will stop your job and run others.
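A minimal sketch of setting this up (the priority value 50 is an arbitrary example; query sched_get_priority_min/max for the real range):

#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct sched_param sp;
    memset(&sp, 0, sizeof sp);
    sp.sched_priority = 50;   /* example value; valid range is implementation-defined */

    /* Put the calling thread in the SCHED_FIFO class so ordinary
     * time-sliced processes can no longer preempt it. */
    int rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
    if (rc != 0)
        fprintf(stderr, "pthread_setschedparam: %s\n", strerror(rc));
    return rc;
}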

[see above]

Not sure why you want to do this. If it is really a critical operation - say, timed to some safety-of-operation hardware - you should probably create a kernel module / driver to do the operation. That way you have direct use of the functions to disable interrupts. If not, I would try real-time mode first, and if that does not help, then consider other options.
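As a rough sketch of what that looks like inside a module (header locations for local_irq_save vary between kernel versions, and the function names below are illustrative):

#include <linux/module.h>
#include <linux/init.h>
#include <linux/irqflags.h>   /* local_irq_save/restore; header varies by kernel version */

static int __init timed_section_init(void)
{
    unsigned long flags;

    local_irq_save(flags);     /* mask interrupts on this CPU */
    /* ... short section to be timed, free of interrupt noise ... */
    local_irq_restore(flags);  /* re-enable interrupts */

    return 0;
}

static void __exit timed_section_exit(void)
{
}

module_init(timed_section_init);
module_exit(timed_section_exit);
MODULE_LICENSE("GPL");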

See mlockall to disable paging for the calling process.
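Minimal usage sketch (this is the standard POSIX call; it needs appropriate privileges, which root in single-user mode has):

#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
    /* Lock every page the process currently maps, and every page it
     * maps in the future, into RAM so the timed code cannot take a
     * page fault. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return 1;
    }

    /* ... run the timing loop here ... */
    return 0;
}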

--Mark

Reply to
Mark H Johnson


My system uses Timesys Linux/Real-time. This version extends Linux 2.4 with support for high-resolution timers, kernel preemption, priority-based interrupt handling, etc. But do you think Linux 2.6 would show different results for my application?

And when I switched to runlevel 1, it showed about 30 processes. I'm not sure which of them I could disable, so I didn't touch those processes.

Yes, I do. Timesys Linux/Real-time supports 511 priority levels, so I assigned 255 with policy SCHED_FIFO.

Actually, the functions measured are parts of a whole application, so I can't extract them to run in a kernel module. :(


Real-time mode? Do you mean creating the thread with SCHED_FIFO? Or is there another way to run my application in real-time mode?

Thank you so much.

Reply to
pinetaj

Have you taken into account the effects of cache memory?

The number of CPU cycles required to execute the same function can show a large variance (such as the one you measured) due to the effect of the cache. Take fn3, for example: it may be called inside a loop. The first time, all the function's code must be read from RAM, while in the 2nd and later iterations of the loop all of its code is already in cache. The first execution takes much longer because RAM access is much slower; the later executions are faster.
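One common way to keep that first cold-cache execution out of the statistics is an untimed warm-up call before the measuring loop. A sketch, reusing an RDTSC reader like the one shown earlier in the thread (fn_under_test is a placeholder for one of fn1..fn7):

#include <stdint.h>

uint64_t rdtsc(void);        /* an RDTSC reader like the one shown earlier */
void fn_under_test(void);    /* placeholder for the function being measured */

void measure(uint64_t *min, uint64_t *max)
{
    *min = UINT64_MAX;
    *max = 0;

    fn_under_test();         /* warm-up: pulls code and data into cache, not timed */

    for (int i = 0; i < 100000; i++) {
        uint64_t t0 = rdtsc();
        fn_under_test();
        uint64_t d = rdtsc() - t0;
        if (d < *min) *min = d;
        if (d > *max) *max = d;
    }
}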

Regards, Alessandro

Reply to
Chicco L

Timesys Linux is a commercial product and should be supported by the vendor.

What do they say about your problem (which makes Timesys Linux look like a not-very-useful upgrade over the standard kernel)?

-Michael

Reply to
Michael Schnell

Hello, thank you so much for your reply. I also suspect the cache. There might be a way to disable the cache in the BIOS setup, but I hesitate to do that as I don't have much knowledge about x86. Is there a way to have Linux not use the cache? I mean, is there a Linux command that tells the kernel not to use the cache?

Thank you so much.

Reply to
pinetaj

Sorry, I don't know; I usually disable it in the BIOS. Your question is interesting for me too, because on some CPUs (like the Athlon XP) I can't do it from the BIOS.

Reply to
Chicco L
