Tickless mode is incompatible with preemptible kernel

Postby **Spym** » Sat Oct 29, 2016 7:29 pm

Greetings, my fellow humans.

So I'm developing an embedded application based on the STM32F446 with ChibiOS v16.1.6. The RTOS is configured in tickless mode with non-simplified IRQ handling (using BASEPRI for critical sections instead of PRIMASK); the two highest IRQ priority levels are reserved for the application's fast IRQ handlers.

The application runs some communication interfaces and some basic logic in regular OS threads with a couple of regular IRQs; alongside that, it runs a hard real time control process driven by the two IRQs that preempt the kernel (due to strict hard real time requirements). The hard real time part performs intensive computation involving matrices and the FPU. The typical mode of operation requires the firmware to spend about 90-95% of its time performing computations in the hard real time IRQ handlers; the rest of the time it operates in the normal RTOS mode.

Obviously, the hard real time part is completely isolated from all RTOS services, and all standard precautions relevant to preemptible kernels are taken into account.

While debugging the application, I noticed that once in a while it would fail in either of 2 ways:

- Trip on the assertion check at chVTDoTickI.

- Make all threads that are in the SLEEPING state sleep forever. Threads that are blocked in other states, e.g. waiting for synchronization objects or channels, continue to function normally.

Two days of investigation led me to the system tick timer. According to my understanding, when the kernel needs to schedule a new alarm in tickless mode, it does the standard read-modify-write on the system timer registers:

1. read the current system time from the timer;
2. add the desired duration to the obtained value;
3. store the value into the compare register.

Pretty standard. OK, obviously the timer keeps running at all times, so the obvious necessary precautions were taken:

- the update is performed in a critical section;
- the minimum duration is limited by a configuration parameter named CH_CFG_ST_TIMEDELTA which cannot be less than 2.

Makes perfect sense so far.

The problem creeps in when the kernel is used in preemptive mode, i.e. when the entire RTOS can be interrupted at any point, including inside a critical section, to serve a hard real time IRQ, which is exactly what happens in my application. If a hard real time interrupt occurs anywhere between steps 1 and 3 and takes longer than CH_CFG_ST_TIMEDELTA ticks to execute, the newly computed deadline ends up in the past, freezing the SLEEPING threads until the counter wraps around (which takes about forever). Sometimes the assertion check in chVTDoTickI catches this and crashes the system, sometimes it doesn't.
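To make the failure mode concrete, here is a minimal host-side simulation of steps 1-3. It does not use any ChibiOS API: `SimTimer`, `schedule_alarm` and `deadline_missed` are hypothetical names, and the timer is modelled as a plain 32-bit counter that the "fast IRQ" advances by a given number of ticks between the read and the write.

```cpp
#include <cstdint>

// Hypothetical model of a free-running 32-bit timer with one compare register.
struct SimTimer {
    std::uint32_t counter = 0;  // advances continuously on real hardware
    std::uint32_t alarm = 0;    // compare register
};

// Wrap-safe check: true if 'alarm' is already behind 'counter', i.e. the
// compare match will not happen until the counter wraps around.
inline bool deadline_missed(const SimTimer& t) {
    return std::uint32_t(t.alarm - t.counter) > 0x7FFFFFFFu;
}

// Steps 1-3 from the post. 'preemption_ticks' models a fast IRQ that fires
// between the read and the write and keeps the CPU for that many ticks.
inline void schedule_alarm(SimTimer& t, std::uint32_t delta,
                           std::uint32_t preemption_ticks) {
    const std::uint32_t now = t.counter;         // 1. read the current time
    const std::uint32_t deadline = now + delta;  // 2. add the duration
    t.counter += preemption_ticks;               // fast IRQ runs here
    t.alarm = deadline;                          // 3. write the compare register
}
```

In this model, calling `schedule_alarm(t, 2, 0)` leaves the alarm ahead of the counter, while `schedule_alarm(t, 2, 5)` leaves it three ticks behind, which is the situation described above.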

In order to verify this theory, I sketched the shim shown below and put it into the context switch and tick hooks:

Code:

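// Counter type of the system timer, as returned by the ST low-level driver.
using TimeType = decltype(st_lld_get_counter());
static constexpr TimeType HalfRange = std::numeric_limits<TimeType>::max() / 2;
// Only report alarms that are more than 2 seconds behind the counter.
static constexpr TimeType DetectionThreshold = S2ST(2);

const TimeType counter = st_lld_get_counter();
const TimeType real_alarm = st_lld_get_alarm();
const TimeType alarm_with_offset = real_alarm + DetectionThreshold;

// Modular comparison: a difference in the upper half of the range means the
// alarm (plus the threshold) is already in the past.
if (TimeType(alarm_with_offset - counter) >= HalfRange)
{
   chibios_rt::System::halt(os::heapless::concatenate(
      "OS TIMER DEADLINE MISSED: CNT=", counter,
      " ALARM=", real_alarm).c_str());
}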

It proved to be able to reliably detect the problem, which, expectedly, happened to correlate well with the computational load in the hard IRQ handlers.

Having confirmed that, I thought about solutions. I see three of them:

- Disable tickless mode. I wouldn't like this, because I'm already having a hard time squeezing the application into the tight performance limits of this MCU, and the ticked mode will increase the RTOS overhead even further (although probably not significantly, I haven't checked this yet).

- Increase CH_CFG_ST_TIMEDELTA. This solution is highly unreliable, because any fixed delta can still be exceeded by a long enough preemption.

- Fix chVTDoTickI. This is the only proper solution to the problem, so I'll focus on it below.

We want to fix chVTDoTickI in a way that makes preemptible operation in tickless mode reliable while minimally affecting non-preemptible systems. A possible solution is to wrap the whole function (or, more precisely, the part of it that handles the case CH_CFG_ST_TIMEDELTA > 1) into a loop, and add a check at the end of the loop for whether the freshly installed deadline is in the past. If it is, the function has to try again; otherwise it exits. The overhead for non-preemptible systems would be unnoticeable (just one more check, which always succeeds), and it can be avoided completely if the looping is included via conditional compilation only when CORTEX_SIMPLIFIED_PRIORITY is false.

One might argue that the try-check-retry approach is fundamentally non-deterministic, but that should not pose a problem, since in preemptible kernels the OS itself typically does not have to adhere to strict real time requirements, delegating these to the preempting logic.

Another possible (very unlikely) issue is spurious synchronization of the try-check-retry loop with the hard real time IRQ: the problem occurs if each pass of the loop happens to perform the steps 1-2-3 (see above) exactly at the same time when the fast IRQ fires, which will cause it to fail continuously at every iteration forever, until it slides out of phase with the hard real time IRQ. A possible solution is to double CH_CFG_ST_TIMEDELTA after every iteration, until the time delta value exceeds the duration of the preemption.
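The retry idea, including the delta doubling, can be sketched on the host as follows. This is not a patch to chVTDoTickI: `FakeTimer`, `arm_with_retry` and the `preempt` callback are hypothetical, and the fast IRQ is modelled as a function that says how many ticks it steals on each attempt.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the system timer; not a ChibiOS type.
struct FakeTimer {
    std::uint32_t counter = 0;
    std::uint32_t alarm = 0;
};

// Returns the number of attempts needed to install an alarm that is still in
// the future. 'preempt' models the fast IRQ: it is called between the read and
// the write with the attempt index and returns how many ticks were stolen.
inline unsigned arm_with_retry(
        FakeTimer& t, std::uint32_t delta,
        const std::function<std::uint32_t(unsigned)>& preempt) {
    unsigned attempt = 0;
    for (;;) {
        const std::uint32_t now = t.counter;         // 1. read
        const std::uint32_t deadline = now + delta;  // 2. modify
        t.counter += preempt(attempt);               // fast IRQ steals time
        t.alarm = deadline;                          // 3. write
        ++attempt;
        // Check: the deadline must be strictly ahead of the counter (wrap-safe).
        const std::uint32_t ahead = t.alarm - t.counter;
        if (ahead != 0 && ahead <= 0x7FFFFFFFu) {
            return attempt;
        }
        // Missed: widen the margin so a phase-locked IRQ cannot starve the loop.
        if (delta <= 0x3FFFFFFFu) {
            delta *= 2;
        }
    }
}
```

With a starting delta of 2 and an IRQ that steals 5 ticks on every attempt, this model misses twice (delta 2, then 4) and succeeds on the third attempt with delta 8. A real implementation would also need an upper bound on the delta or on the number of attempts.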

This solution can be explored further. At any rate, even if the proposed solution is not implemented, the documentation certainly must be updated with a warning explaining why tickless mode + preemptible kernel is a bad idea.

Feedback is welcome.

Pavel.

P.S. The title is clickbait, I know.

P.P.S. Any plans to move the project away from Sourceforge, e.g. to Github?

Postby **Giovanni** » Sat Oct 29, 2016 8:04 pm

Hi,

Good analysis.

The obvious fix is to increase the delta parameter; the amount of increase depends on the maximum depth of the timers list, the frequency of fast interrupts and the duration of fast ISRs. Unless the interrupts are not deterministic (aperiodic and with variable "density"), it should be possible to work only with the delta parameter.

I don't like the idea of creating workarounds into the kernel code, it is designed to rely on critical zones and maximize performance under that assumption.

BTW, the ticked mode is very efficient, it also reduces the kernel size significantly. You could also try to reduce the system tick frequency in tickless mode to levels comparable to ticked mode (1000Hz), that should also solve your problem without having to play with delta.
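To put rough numbers on this advice, the sketch below converts a worst-case fast-ISR duration into system ticks at a given tick frequency. The rule it applies (the delta must exceed the preemption, measured in ticks, by one tick, and is never below 2) is a simplifying assumption for illustration, not a formula from the ChibiOS documentation, and it ignores the timer list depth mentioned above. `preemption_ticks` and `min_safe_delta` are hypothetical helpers.

```cpp
#include <cstdint>

// Worst-case preemption expressed in whole ticks, rounded up.
constexpr std::uint64_t preemption_ticks(std::uint64_t tick_hz,
                                         std::uint64_t preemption_us) {
    return (tick_hz * preemption_us + 999999u) / 1000000u;
}

// Assumed rule of thumb: one tick of margin on top of the preemption,
// never below the minimum of 2 mentioned in the thread.
constexpr std::uint64_t min_safe_delta(std::uint64_t tick_hz,
                                       std::uint64_t preemption_us) {
    return preemption_ticks(tick_hz, preemption_us) + 1 < 2
               ? 2
               : preemption_ticks(tick_hz, preemption_us) + 1;
}
```

For a hypothetical 200 µs worst-case preemption, this gives a delta of 201 ticks at a 1 MHz tick frequency, 3 ticks at 10 kHz, and the minimum of 2 at 1000 Hz, which is why lowering the tick frequency can hide the problem without touching the delta.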

Giovanni

Postby **Spym** » Sat Oct 29, 2016 8:19 pm

Thanks for the quick response, Giovanni.

Giovanni wrote: "I don't like the idea of creating workarounds into the kernel code, it is designed to rely on critical zones and maximize performance under that assumption."

What I'm proposing is not a workaround, it's a bug fix. As I understand it, ChibiOS is marketed as supporting tickless mode and preemptive kernel, is that true? If the fix is not implemented, then at least the documentation should explicitly warn the user about the incompatibility of these features. Also, there shouldn't be any performance impact unless both features are enabled, so there should be no trade-off involved for most applications.

Regarding your suggestion: could you please clarify how the delta depends on the timer list depth?

Thanks!

Postby **Giovanni** » Sat Oct 29, 2016 8:52 pm

Hi,

If by "preemptive kernel" you want something that is able to be preempted anywhere then RT is not what you think.

In the context of an RTOS without separation between application and kernel, the meaning of "preemptive" is that threads can preempt each other (unlike cooperative ones) and the kernel code itself can be preempted outside critical zones. Critical zones, by definition, do not allow preemption, only voluntary context switches.

A true preemptive kernel should allow preemption AND context switch anywhere in the code, basically it should have no critical zones, obviously this is not the case with RT.

Sorry but this is not a bug, it is working as intended, fast interrupts are outside the kernel perimeter.

Giovanni

Postby **Spym** » Sat Oct 29, 2016 9:06 pm

Sorry about the confusion, what I meant to say was "preemptible kernel", not preemptive. The documentation says that ChibiOS/RT does support preemption by fast IRQ even from its critical zones, i.e. that its kernel is preemptible. This makes perfect sense to me. Is that correct?

If the above is correct, then we have an obvious problem with the current situation with the tickless mode - it is not compatible with kernel preemption, because that ruins the assumption made in chVTDoTickI() that its critical zone cannot be interrupted.

What am I getting wrong?

Thanks!

Postby **Giovanni** » Sat Oct 29, 2016 9:19 pm

It is correct but that does not mean that it is immune to anything fast interrupts can do, also, it is not preemptive in the sense that an ISR can trigger a context switch even during critical zones.

You need to trim delta and system tick frequency to accommodate fast interrupts. Polling within a critical zone is not going to happen.

Giovanni

Postby **skute** » Sat Nov 05, 2016 12:21 am

The way to solve this would be a fundamental change to ChibiOS. That said, you could decouple absolute time and interval time. Meaning, you always have a free-running absolute time source (perhaps from a 32kHz clock) and then you have an interval timer for performing the actual delay based on the delta between the desired absolute time in the future and the current absolute time. This would look something like:

Code:

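// Pseudocode: absolute time comes from a free-running counter that is never
// reprogrammed; the interval timer is armed with a relative delay.
delta        = CH_CFG_ST_TIMEDELTA;        // minimum interval
timeNow      = getAbsoluteTime();
vt->deadline = timeNow + timeToDelay;      // kept for bookkeeping only
if (timeToDelay > CH_CFG_ST_TIMEDELTA) {   // same test as comparing deadlines, but wrap-safe
   delta = timeToDelay;
}
intervalTimer->setInterval(delta);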

There can be major jitter as to when the interval timer may fire depending on when a 'fast' interrupt occurs, but this pattern should guarantee that the timer fires as expected/desired.
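A small host-side model shows why a relative interval cannot end up in the past. Everything here is hypothetical (`SimClock`, `start_delay`, the tick-by-tick `advance`); it only mirrors the pattern above, with the fast IRQ modelled as ticks that elapse between reading the absolute time and arming the interval timer.

```cpp
#include <cstdint>

// Hypothetical model: a free-running absolute clock plus a one-shot interval
// timer that counts down from the moment it is armed.
struct SimClock {
    std::uint32_t now = 0;        // free-running, never reprogrammed
    bool armed = false;
    std::uint32_t remaining = 0;  // ticks until the interval timer fires
    std::uint32_t fired_at = 0;   // absolute time of the last expiry

    void advance(std::uint32_t ticks) {
        for (std::uint32_t i = 0; i < ticks; ++i) {
            ++now;
            if (armed && --remaining == 0) {
                armed = false;
                fired_at = now;
            }
        }
    }
    void set_interval(std::uint32_t ticks) {
        remaining = ticks ? ticks : 1;  // a zero interval is treated as one tick
        armed = true;
    }
};

// Arms a delay; 'preemption_ticks' models a fast IRQ between reading the
// absolute time and arming the interval timer. Returns the nominal deadline.
inline std::uint32_t start_delay(SimClock& c, std::uint32_t time_to_delay,
                                 std::uint32_t min_delta,
                                 std::uint32_t preemption_ticks) {
    const std::uint32_t time_now = c.now;
    const std::uint32_t deadline = time_now + time_to_delay;
    const std::uint32_t delta =
        time_to_delay > min_delta ? time_to_delay : min_delta;
    c.advance(preemption_ticks);  // fast IRQ runs here
    c.set_interval(delta);        // relative, so it cannot land in the past
    return deadline;
}
```

In this model a 10-tick delay started at time 1000 with a 50-tick preemption has a nominal deadline of 1010 but fires at 1060: late by the length of the preemption, which is the jitter mentioned above, but never lost.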
