An idea that I (ADDW) had recently was that we could save power by switching off parts of the CPU.
I have written up an outline below:
I have not been able to get an idea of how long it would take to power up/down part of a CPU. Comments please.
I think powering off portions of the CPU could definitely be controlled by the OS. Powering them back on, however, should be controlled by the CPU hardware, when it encounters an instruction needing those portions.
I'd expect the power-on latency to be on the order of tens to hundreds of nanoseconds, at which scale calling out to the OS would greatly increase latency.
I think it's faster than voltage/frequency switching because V/F switching usually involves adjusting the power-supply voltage, which takes on the order of microseconds, dwarfing the interrupt-to-OS latency.
To help the OS decide when to power off parts of the CPU, I think we need 32-bit saturating counters of the number of clock cycles since each part was last used. 16 bits is not enough and 64 bits is overkill; saturating avoids the wrap-around that would otherwise happen roughly once a second at 4GHz. The counter is set to 0 when the CPU part is powered back on, even if it didn't end up being used (e.g. mis-speculation). The counters *must* be privileged-only, since they would otherwise form an excellent side-channel for speculative execution: a mis-speculated use is still a use of that hardware.
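The width argument above can be checked with a little arithmetic. This is only a sketch; it assumes the counter increments once per clock cycle at the 4GHz figure used above, and `seconds_to_fill` is a name made up for the illustration.

```python
# Assumption: one counter increment per clock cycle at 4 GHz.
CLOCK_HZ = 4_000_000_000


def seconds_to_fill(bits: int, clock_hz: int = CLOCK_HZ) -> float:
    """Seconds of continuous idling before a `bits`-wide counter reaches its maximum."""
    return (2**bits - 1) / clock_hz


for bits in (16, 32, 64):
    print(f"{bits}-bit counter fills after {seconds_to_fill(bits):.3g} s")
```

A 16-bit counter fills after roughly 16 microseconds, a 32-bit one after just over a second, and a 64-bit one only after more than a century.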
We could have a simple OS-controlled compare register for each part, where the part is powered down if the compare register is < the last-use counter, allowing simple HW power management. If the OS wants finer control, it can set the compare register to 0xFFFFFFFF to force the part on, or to something less than the current counter to power the part down.
I picked < instead of <= so that both of the following hold:
1. 0xFFFFFFFF will never power-down since the counter stops at 0xFFFFFFFF and 0xFFFFFFFF is not < 0xFFFFFFFF.
2. 0 will still power-on the part if it's in use, since the counter is continuously cleared to 0 while the part is in use, and it remains powered on since 0 is not < 0.
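The two cases above can be checked against a small software model. This is a sketch, not a hardware description: the class `PowerZone` and its method names are hypothetical, the power state is modelled as a pure function of the comparison, and power-on latency is ignored.

```python
COUNTER_MAX = 0xFFFFFFFF  # 32-bit saturating counter limit


class PowerZone:
    """Toy model of one power zone: a last-use counter plus an OS compare register."""

    def __init__(self, compare: int = 10000):
        self.compare = compare  # OS-controlled threshold, in clock cycles
        self.counter = 0        # cycles since the part was last used

    def use(self) -> None:
        """The part is used (possibly speculatively): the counter is cleared."""
        self.counter = 0

    def idle(self, cycles: int) -> None:
        """The part sits unused for `cycles` clock cycles; the counter saturates."""
        self.counter = min(self.counter + cycles, COUNTER_MAX)

    @property
    def powered(self) -> bool:
        # Powered down exactly when compare < counter (strictly less than).
        return not (self.compare < self.counter)


always_on = PowerZone(compare=COUNTER_MAX)
always_on.idle(2**33)             # far longer than the counter can represent
print(always_on.powered)          # case 1: saturated counter, still powered

eager = PowerZone(compare=0)
eager.use()
print(eager.powered)              # case 2: in use, counter is 0, still powered
eager.idle(1)
print(eager.powered)              # one idle cycle later the part powers down
```

With <= in place of <, the first zone would power down once the counter saturated, and the second could never stay on even while in use.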
It might be handy to have a separate register that the previous count is copied to when a part is powered on. That would let the OS detect edge cases such as the part being used shortly after power-off, and adjust the power-off interval to better match the program's usage patterns.
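One way the OS might use such a previous-count register is sketched below. Everything here is an assumption made for illustration: the function name `adjust_compare`, the `wake_cost` parameter (cycles lost to a power-down/power-up round trip), and the doubling/halving policy are not part of the proposal.

```python
COUNTER_MAX = 0xFFFFFFFF  # reserved: means "never power down"


def adjust_compare(compare: int, prev_count: int, wake_cost: int = 1000) -> int:
    """Return a new compare value from the idle count latched at the last power-on.

    `prev_count - compare` approximates how long the part actually stayed off.
    If the latched count had saturated, this is only a lower bound.
    """
    off_time = prev_count - compare
    if off_time < wake_cost:
        # Used shortly after power-off: the power-down was premature, wait longer.
        return min(max(compare, 1) * 2, COUNTER_MAX - 1)
    if off_time > 16 * wake_cost:
        # Stayed off for a long time: powering down sooner would save more.
        return max(compare // 2, 1)
    return compare


print(adjust_compare(10000, 10200))      # woke 200 cycles after power-off
print(adjust_compare(10000, 1_000_000))  # stayed off for a long time
```

The factor of 16 and the default `wake_cost` are arbitrary placeholders; real values would have to come from measurement.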
There would be one set of those 32-bit registers (maybe combined into 64-bit registers) for each independent power-zone on the cpu core.
The compare field should be set to some reasonable default on core reset; I'd use 10000 as a first guess.
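As an illustration of combining the 32-bit fields into one 64-bit register, here is a sketch with a made-up layout: compare field in the upper half, last-use counter in the lower half. The layout and the names `pack`, `unpack` and `RESET_VALUE` are assumptions, not a proposed encoding.

```python
MASK32 = 0xFFFFFFFF
DEFAULT_COMPARE = 10000  # first-guess reset default from the text above


def pack(compare: int, counter: int) -> int:
    """Combine the two 32-bit fields into one 64-bit register value."""
    return ((compare & MASK32) << 32) | (counter & MASK32)


def unpack(reg: int) -> tuple[int, int]:
    """Split a 64-bit register value back into (compare, counter)."""
    return (reg >> 32) & MASK32, reg & MASK32


# Value the register would hold after core reset under this hypothetical layout.
RESET_VALUE = pack(DEFAULT_COMPARE, 0)
print(hex(RESET_VALUE))
print(unpack(RESET_VALUE))
```

For scale, 10000 cycles at 4GHz is 2.5 microseconds of idleness before the part powers down.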
If the HW switches back on:
* What state is the restarted hardware in? If registers are switched back on, what are their initial values? This is something the OS might want some say in; zero is not always the answer.
Maybe a status bit per hardware component that causes an OS interrupt.
I have added comments to the wiki page about:
* what do we power down?
* Jacob's comments
We could track state for powered-off parts using something like RISC-V's XS/FS bit-fields:
Status | FS Meaning | XS Meaning
0      | Off        | All off
1      | Initial    | None dirty or clean, some on
2      | Clean      | None dirty, some clean
3      | Dirty      | Some dirty
Off is where using those instructions causes an interrupt to the OS. The Initial state would be where the registers are cleared to 0 or whatever values make sense. If the state is Initial or Off, then the registers can be powered off and reset to their initial values when powered back on. If the state is Clean or Dirty, then the registers must stay powered on, though they could be in a low-power mode. The computational logic can be powered off in any case.
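The rules in the paragraph above can be written down as a small lookup. This is a sketch only; the enum mirrors the FS encoding in the table, and the two function names are hypothetical.

```python
from enum import IntEnum


class State(IntEnum):
    """FS-style status values, numbered as in the table above."""
    OFF = 0
    INITIAL = 1
    CLEAN = 2
    DIRTY = 3


def registers_may_power_off(state: State) -> bool:
    """Registers hold nothing worth keeping only in Off or Initial."""
    return state in (State.OFF, State.INITIAL)


def logic_may_power_off(state: State) -> bool:
    """The computational logic holds no architectural state, so it can always go."""
    return True


for s in State:
    print(s.name, registers_may_power_off(s), logic_may_power_off(s))
```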
When the OS next handles an interrupt, it can check whether the counter has gone past some additional, larger limit. If so, it can save the register state and then switch the state from Clean or Dirty to Off, which allows the HW to turn off the registers too. The extra limit exists because just turning off the logic while leaving the registers powered is a much faster power state to enter/exit and does not need to involve the OS.
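The interrupt-time check described above might look roughly like this. It is a sketch under assumptions: `on_interrupt`, `deep_limit` and the `save_registers` callback are hypothetical names, and it saves only when the state is Dirty, on the assumption (as with RISC-V's FS field) that Clean means the registers already match the last saved copy.

```python
from enum import IntEnum
from typing import Callable


class State(IntEnum):
    OFF = 0
    INITIAL = 1
    CLEAN = 2
    DIRTY = 3


def on_interrupt(state: State, idle_count: int, deep_limit: int,
                 save_registers: Callable[[], None]) -> State:
    """Decide, on an interrupt, whether to let the HW power off the registers too.

    `idle_count` is the last-use counter; `deep_limit` is the additional,
    larger limit beyond the HW compare value.
    """
    if state in (State.CLEAN, State.DIRTY) and idle_count > deep_limit:
        if state == State.DIRTY:
            save_registers()  # memory copy is stale, write the registers out first
        return State.OFF      # HW may now power off the registers as well
    return state              # leave the fast logic-only power state alone


saved = []
print(on_interrupt(State.DIRTY, 200_000, 100_000, lambda: saved.append("regs")).name)
print(saved)
```

After this, the next use of the part traps to the OS, which can restore the saved registers before resuming.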
We'd want the XS/FS state anyway to speed up context switches, even if we don't use it for power-saving.