Linux Intel Pstate driver. Pstate Driver Math and missed timer calls issues. WWW.Smythies.com

This web page concentrates on the math and timer issue. See also.

Note: The math issues are now understood (2014.05.29), and only code that does not suffer from them is used. So this page is now really about the missed timer calls.

The issue is that there is often an excessively long delay between timer calls. During some tests the delays can be as long as 4 seconds, leading to the math issues. On other test runs the delays are up to 1 second, which still causes grief for the driver. It is not yet understood why there seem to be two types of delay issue. There might be some race condition or similar, but the evidence is sketchy as yet.

Important note: This issue should not be confused with the expected long timer call delays that occur when the CPU is out of the C0 state for long periods; that is not the case with these tests.

There seem to be more optimal and less optimal conditions for the manifestation of this issue. To assist in acquiring more meaningful "perf record" data, the more optimal conditions were used.

Correction: The conditions were only "more optimal" for pointing us at the issue, due to the math overflow into negative numbers. Actually, the issue becomes more and more probable as the load decreases. At this point, and without any proof, it is suspected that the issue is due to aliasing between some higher level function and the periods when the CPU is in and out of the C0 state. Anyway, a work-around method, with more robust math, is now being tested. See also.

Example excerpt from a "perf script" output:

consume 13631 [007] 109622.850520: power:pstate_sample: core_busy=87 scaled=95 before=31 state=31 mperf=46616 aperf=40876 freq=2981375
consume 13632 [007] 109623.046399: power:pstate_sample: core_busy=87 scaled=9 before=31 state=16 mperf=477181 aperf=418793 freq=2984031
swapper     0 [007] 109623.366190: power:pstate_sample: core_busy=47 scaled=3 before=16 state=16 mperf=779899 aperf=367011 freq=1599992
consume 13632 [007] 109623.546075: power:pstate_sample: core_busy=47 scaled=5 before=16 state=16 mperf=439970 aperf=207045 freq=1599992
swapper     0 [007] 109627.363605: power:pstate_sample: core_busy=4294967237 scaled=0 before=16 state=16 mperf=9310312 aperf=4381469 freq=4292972319
swapper     0 [007] 109630.661473: power:pstate_sample: core_busy=47 scaled=0 before=16 state=16 mperf=8042166 aperf=3784572 freq=1599992
swapper     0 [007] 109630.673472: power:pstate_sample: core_busy=47 scaled=89 before=16 state=18 mperf=37012 aperf=17464 freq=1604242
swapper     0 [007] 109631.361022: power:pstate_sample: core_busy=52 scaled=1 before=18 state=16 mperf=1652970 aperf=872162 freq=1793898
consume 13633 [007] 109632.852058: power:pstate_sample: core_busy=47 scaled=0 before=16 state=16 mperf=3652199 aperf=1718782 freq=1600125
swapper     0 [007] 109632.864080: power:pstate_sample: core_busy=47 scaled=88 before=16 state=18 mperf=33188 aperf=15642 freq=1602515

Notice that there are only two entries for process 13632 in the above excerpt, yet process 13632 executed, albeit with (very, very short) sleeps, for 10 seconds. So far, this observation has been consistent with this issue. Notice also the almost 4 second and then 3.3 second delta times between entries during the occurrence of the issue.
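The delta times quoted above can be computed mechanically from the "perf script" output. A rough sketch in Python, assuming the field layout of the excerpt above (the timestamp is the fourth whitespace separated field):

```python
# Compute time deltas between successive pstate_sample trace entries.
# Assumes lines formatted like the "perf script" excerpt above, where
# the fourth whitespace-separated field is "seconds.microseconds:".

def sample_deltas(lines):
    deltas = []
    prev = None
    for line in lines:
        fields = line.split()
        stamp = float(fields[3].rstrip(":"))   # e.g. "109622.850520:"
        if prev is not None:
            deltas.append(stamp - prev)
        prev = stamp
    return deltas
```

Feeding it the excerpt above would flag the roughly 3.8 and 3.3 second gaps immediately.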
See also (although I think the math issues in that case were my own problem).

Histogram 1 - Overview:
time between driver runs - Overview

Histogram 1 - Detail A:
time between driver runs - Detail A

The data for histogram 1.

Given the way (default, as opposed to low latency, kernel) jiffy based timers work, a "sample_rate_ms" of 10 is really too low, and will rarely be achieved. Really it should be at least 20, and even longer (say 30) is suggested. However, this is a tangent discussion.
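To illustrate the rounding (a simplified sketch, assuming the HZ=250 kernel used for most of these tests; the real msecs_to_jiffies is more involved):

```python
# Illustrate why sample_rate_ms = 10 is rarely achieved on an HZ=250
# kernel: the request is effectively rounded UP to whole jiffies, and
# the timer then only fires on a tick boundary. Simplified sketch.

HZ = 250                       # ticks per second: 4 ms per jiffy
MS_PER_JIFFY = 1000 // HZ

def msecs_to_jiffies(ms):
    # round up to a whole number of jiffies (simplified)
    return (ms + MS_PER_JIFFY - 1) // MS_PER_JIFFY

requested_ms = 10
jiffies = msecs_to_jiffies(requested_ms)   # 3 jiffies
actual_ms = jiffies * MS_PER_JIFFY         # 12 ms, not 10
```

So a 10 millisecond request becomes 3 jiffies, i.e. 12 milliseconds at best, before any further delay from when the tick actually fires.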

For the below histograms a "sample_rate_ms" of 40 was used. The thinking behind this test was to back off in case of some overrun issue or whatever. (There was no degradation of the Phoronix ffmpeg test when using 40.) While there were about half the number of samples, as expected, there was no change with respect to these issues.

Histogram 2 - Overview:
time between driver runs - Overview

Histogram 2 - Detail A:
time between driver runs - Detail A

The data for histogram 2.

O.K., so what is going on? One thought was that perhaps the timer call was overrunning itself. A flag was added to check for overrun, and "[PATCH 8/8] add sample time validation" (not a public patch, from Dirk) was removed (well, the removal was done as a hack job). The 4 second delay could not be re-created, but 1 second delays were still common (1 second maximum delays also sometimes happen with the previous kernel above).
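The overrun check amounted to a busy flag around the timer callback. A minimal sketch of the idea (in Python for illustration; this is not the actual kernel change):

```python
# Minimal sketch of the overrun check: a busy flag around the timer
# callback detects the callback being entered again before the
# previous invocation has finished. Not the actual kernel patch.

busy = False
overruns = 0

def timer_callback(reenter=False):
    global busy, overruns
    if busy:
        overruns += 1          # timer fired again while still sampling
        return
    busy = True
    if reenter:
        timer_callback()       # simulate the timer overrunning itself
    busy = False

timer_callback()               # normal call: no overrun
timer_callback(reenter=True)   # forced re-entry: overrun counted
```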

Histogram 3 - Overview:
time between driver runs - Overview

The data for histogram 3.

For the below histograms a "sample_rate_ms" of 10 was used (due to operator error; it was supposed to be 30). The test was also run a little longer (for no particular reason): 40 loops instead of the 30 in the script listing below.
The driver had a small math change to prevent overflow into negative numbers when mperf is shifted by FRAC_BITS, something that would not occur if the times were not out of whack. Long delays are O.K. and expected, but not when mperf and aperf are actually running.
And "[PATCH 8/8] add sample time validation" was removed (well, the removal was done as a hack job).
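For reference, the bogus core_busy=4294967237 sample in the excerpt above is consistent with a small negative number being printed as an unsigned 32 bit integer (2**32 - 59 = 4294967237). A rough illustration of the suspected failure mode, assuming FRAC_BITS = 8 and a signed 32 bit intermediate (the actual driver arithmetic differs in detail):

```python
# Show how shifting a large mperf left by FRAC_BITS can wrap a signed
# 32 bit intermediate negative, which then prints as a huge unsigned
# value such as core_busy=4294967237. Rough illustration only; the
# real driver arithmetic differs in detail.

FRAC_BITS = 8

def to_s32(x):
    # interpret the low 32 bits of x as a signed 32 bit integer
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

mperf = 9310312                       # long-duration sample, as in the trace
shifted = to_s32(mperf << FRAC_BITS)  # 2383439872 wraps past 2**31 - 1

as_u32 = -59 & 0xFFFFFFFF             # 4294967237, matching the trace
```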

Note: histograms 4 and 5 are not included in these web pages

Histogram 6 - Overview:
time between driver runs - Overview

Histogram 6 - Detail A:
time between driver runs - Detail A

The data for histogram 6.

Same data plotted as sample number and delta time:
Sample 06 math plot

For the below histograms Dirk asked me to compile a kernel where 1 jiffy = 0.001 seconds (1000 Hz) instead of the 0.004 seconds (250 Hz) I had been using.
The driver had a small math change to prevent overflow into negative numbers when mperf is shifted by FRAC_BITS, something that would not occur if the times were not out of whack. Again, long delays are O.K. and expected, but not when mperf and aperf are actually running, as was the case with this test.
And "[PATCH 8/8] add sample time validation" was removed (well, the removal was done as a hack job).

Note: histograms 7, 8, and 9 are not included in these web pages

Histogram 10 - Overview:
time between driver runs - Overview

Histogram 10 - Detail A:
time between driver runs - Detail A

The data for histogram 10.

Same data plotted as sample number and delta time (missing - might never get done):
Sample 10 math plot - missing - coming

Anticipated Questions:

Question: Could the missed timer calls issue be due to the scheduler, or some other upstream thing, somehow not maintaining the CPU affinity of the process?
Answer: Firstly, the time between the previous process that worked properly and the next process that worked properly is always right. Secondly, the number of aperf and mperf cycles between the two calls is always correct, indicating that the CPU was indeed doing something at the expected rate in the meantime.

Question: Aren't the two proposed methods (the duration_us method and the very low C0 method) the same, in the end?
Answer: No. The proposed C0 method is immune to missed timer calls. The duration_us method is prone to falsely driving the pstate low, as it incorrectly interprets a missed timer call as being due to a profoundly idle CPU.
Note: As it stands now (2014.05.29), Dirk and I have agreed to disagree on this point.
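For reference, a fixed point sketch of the duration_us method being discussed, paraphrased from the 3.15 era driver (div_fp, mul_fp, and the threshold are simplified here):

```python
# Sketch of why the duration_us method misreads a missed timer call.
# Paraphrased from the 3.15-era intel_pstate logic: if the sample took
# much longer than sample_time, core_busy is scaled down by
# sample_time / duration -- correct for a CPU that slept through the
# interval, but wrong when the timer call was simply missed while the
# CPU kept running.

FRAC_BITS = 8

def div_fp(x, y):
    return (x << FRAC_BITS) // y

def mul_fp(x, y):
    return (x * y) >> FRAC_BITS

def adjust_core_busy(core_busy_fp, sample_time_us, duration_us, threshold=3):
    if duration_us > sample_time_us * threshold:
        sample_ratio = div_fp(sample_time_us, duration_us)
        core_busy_fp = mul_fp(core_busy_fp, sample_ratio)
    return core_busy_fp

busy = 87 << FRAC_BITS                       # core_busy of 87, fixed point
normal = adjust_core_busy(busy, 10000, 10000)     # on-time sample: unchanged
missed = adjust_core_busy(busy, 10000, 1000000)   # 1 second gap, CPU was busy
```

With a missed timer call stretching the sample to 1 second, a genuinely 87% busy CPU comes out as less than 1% busy, and the pstate is driven to minimum.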

References:

Dirk's (3.15RC5) intel_pstate.c 3. (The eight-patch one, with the added * 10 delay threshold instead of * 3.)
The diff output between "Dirk3" and "Dirk4" intel pstate drivers.
The diff output between "Dirk3" and "Dirk5" intel pstate drivers.
Script used.
Script called by above script.
Consume Program.
Output from "perf script" parsing program.

Command used to take the output from the parsing program and create histogram data:

cat special.05 | sed 's/ /\n/g' | grep "\." | sort | uniq -c | sort -g > special.05.his

Linux Intel Pstate driver. Pstate Driver Math and missed timer calls issues. WWW.Smythies.com emaildoesnotwork@smythies.com 2014.05.26 Updated 2014.06.19