To work properly, or at all, the intel_pstate driver uses a default setpoint and proportional gain that attempt to drive the CPU target pstate up as the load increases. The response curve is quite aggressive, which is both desirable and an advantage over other frequency governor methods.  However, due to its basic nature and integer math limitations, the driver also tends not to push the target pstate down to the lowest possible levels under very minimal load conditions, and it needs some help to do so.

This patch re-introduces some C0 time effect, but only under lower load conditions and in a piecewise continuous manner.  The influence of C0 is minimal at the threshold of inclusion and grows to dominate at 0 load. This sliding scale of C0 weight should alleviate concerns over different best thresholds for different processors, which was a reason for rejecting this method in the past.  Furthermore, when included, C0 modulates core_pct in such a way as not to completely change its definition and overall range.
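The sliding scale described above can be illustrated with a minimal sketch. This is not the code from the patch: the function name, the threshold value and the exact scaling formula are assumptions chosen only to show a C0 weight that has no effect at the threshold and dominates at 0 load.

```c
#include <stdint.h>

/* Hypothetical inclusion threshold, in percent of time spent in C0. */
#define C0_THRESHOLD_PCT 10

/*
 * Illustrative only: scale core_pct by C0 residency once C0 drops
 * below the threshold.  At the threshold the factor is 1 (no change),
 * at 0 load the factor is 0, so the result is continuous at the
 * threshold and never exceeds the original core_pct.
 */
static int32_t blend_core_pct(int32_t core_pct, int32_t c0_pct)
{
	if (c0_pct >= C0_THRESHOLD_PCT)
		return core_pct;

	return core_pct * c0_pct / C0_THRESHOLD_PCT;
}
```

With these assumed values, blend_core_pct(100, 5) gives 50, while any C0 at or above 10 percent leaves core_pct untouched.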

We are dealing with small numbers herein. The early multiply by 100 is merely to have some extra significant digits. There is a subsequent divide by 100 later on. At a later date this will be re-evaluated to determine if the multiply and divide can be eliminated, and the clock cycles saved.
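A small worked example of why the extra factor of 100 helps with integer math. The function and parameter names here are hypothetical and are not taken from the driver; the counts are made-up values for a nearly idle sample.

```c
#include <stdint.h>

/* C0 residency in whole percent: small ratios truncate to zero. */
static int64_t c0_pct(int64_t busy_cycles, int64_t total_cycles)
{
	if (total_cycles <= 0)
		return 0;
	return 100 * busy_cycles / total_cycles;
}

/*
 * Same ratio with an extra multiply by 100, giving hundredths of a
 * percent.  The caller is expected to divide by 100 later on.
 */
static int64_t c0_pct_x100(int64_t busy_cycles, int64_t total_cycles)
{
	if (total_cycles <= 0)
		return 0;
	return 100 * 100 * busy_cycles / total_cycles;
}
```

For 1500 busy cycles out of 1000000, c0_pct() returns 0 and all information is lost, whereas c0_pct_x100() returns 15, i.e. 0.15 percent.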

With such a short sampling time window, and with respect to periodic workloads, it is not really possible to make an inclusion decision based on just one sample having a low C0; it needs to be based on a few. Ultimately, if the periodic workload is of low enough load/sleep frequency, the CPU clock will still oscillate.
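One way to base the inclusion decision on a few samples rather than one is sketched below. The history depth, the threshold and all names are assumptions for illustration, not the patch's actual values, and the sketch assumes the caller has set up the history.

```c
#include <stdbool.h>
#include <stdint.h>

#define C0_THRESHOLD_PCT 10	/* hypothetical inclusion threshold */
#define C0_HISTORY 2		/* hypothetical number of older samples kept */

struct c0_history {
	int32_t old[C0_HISTORY];	/* old[0] is the most recent */
};

/*
 * Include C0 only if the current sample and every remembered older
 * sample are below the threshold, then record the current sample.
 */
static bool c0_should_include(struct c0_history *h, int32_t c0_pct)
{
	bool include = c0_pct < C0_THRESHOLD_PCT;
	int i;

	for (i = 0; i < C0_HISTORY; i++)
		if (h->old[i] >= C0_THRESHOLD_PCT)
			include = false;

	/* Shift the history and store the newest sample. */
	for (i = C0_HISTORY - 1; i > 0; i--)
		h->old[i] = h->old[i - 1];
	h->old[0] = c0_pct;

	return include;
}
```

A single busy sample therefore suppresses C0 inclusion for the next C0_HISTORY samples, which is the trade-off between needing older samples and ramp down response time.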

Notice that the old C0 values are used without being initialized. This is not inconsistent with other things in this driver.

The sample rate really should be some number that works for all HZ configurations, and covers the rare case where the actual sample is less than asked for (further investigation pending). 20 milliseconds seems more reasonable. It is also a balance between needing older and older C0 samples and target pstate ramp up/down response times. There might be a concern as to increased response time when the scheduler switches CPU, particularly given that a lot of testing is done via forcing a particular CPU. However, there is a drag-along tendency where an unloaded core's target pstate will follow a busy core up, thus a scheduler switch should be a moot point.