Re: (Big) problems achieving fair scheduling of a semaphore ISR producer - thread consumer case


Erwan Gouriou
 

Hi Paul, Daniel,



On 10 April 2017 at 10:19, Daniel Thompson <daniel.thompson@...> wrote:
On 08/04/17 10:58, Paul Sokolovsky wrote:
Hello Daniel,

On Fri, 7 Apr 2017 12:38:34 +0100
Daniel Thompson <daniel.thompson@...> wrote:

[]

...
Console buffer overflow - char dropped
Console buffer overflow - char dropped
...

Did you run this on carbon? I've not been able to reproduce on this
board.

Ok, now I did. With the patch above, I can't reproduce the overflow
message. But now let's try to echo the input char, and for a
change, set CONFIG_CONSOLE_GETCHAR_BUFSIZE=64 so we don't
suspect too short a buffer.

I still can't reproduce overflow message,

... funny you should say that.

I've been taking a quick glance at the vendor HAL here. It implements
some kind of weird private locking scheme that looks like it doesn't
play nicely with a pre-emptive RTOS (or even with interrupt handlers).

Ack, thanks, I see it.

[]

So, we can probably come to a conclusion: for reliable operation, both
UART rx *and* tx should be buffered and interrupt-driven.

I don't understand how the evidence leads to this conclusion.

Well, it's reasoning from the opposite direction, let's recount:

1. As was quoted, MicroPython has multiple implementations of that
handling for multiple environments, all "just working". Output
buffering is used at all times though.

Sorry, I was getting carried away with the specific analysis for character drop: if input characters are responded to in >70000 cycles and the microcontroller is otherwise idle, then input buffering and output buffering are equivalent.

I agree *entirely* that MicroPython requires output buffering, however, because neither of the above constraints is true here (assuming expression parsing consumes a lot of cycles).
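For a sense of scale, the per-character cycle budget follows directly from the CPU clock and the baud rate. The sketch below is only an illustration: the function name, the 84 MHz clock, the 115200 baud rate and the 10-bit frame (start bit, 8 data bits, stop bit) are all assumptions, not values taken from the board or configuration discussed in this thread.

```python
def cycles_per_char(cpu_hz, baud, bits_per_char=10):
    """CPU cycles that elapse while one UART character is on the wire.

    bits_per_char=10 assumes 8N1 framing: 1 start + 8 data + 1 stop bit.
    """
    return cpu_hz * bits_per_char // baud


# Hypothetical numbers, chosen only for illustration.
budget = cycles_per_char(84_000_000, 115_200)
print("cycles available per received character:", budget)
```

Any per-character work that exceeds this budget (expression parsing, for instance) has to be absorbed by buffering somewhere.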


2. Pure mathematical reasoning that if a char at a given baud rate is
received each X cycles, then busy-polling to send a char will take
these X cycles. But while hardware receives a char each X cycles on its
own, before we can spend X cycles to send it, we first of all also need to
spend some other cycles to handle an interrupt, extract the received char,
etc. So, the ratio is never 1:1, but instead 1:1.xx, so we will steadily
fall behind, and eventually lose chars or overflow.

This would be true if there were 0 bytes of output buffering, but that should not be the case: there should be 1 byte of buffering (e.g. we poll *before* the subsequent TX, not after the current TX).
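The two positions in point 2 can be compared with a toy simulation. This is a minimal sketch under stated assumptions, not a model of any real driver: characters arrive every `char_cycles`, the RX buffer is filled at no CPU cost, the CPU spends `overhead_cycles` per character, and `poll_before_tx` selects between polling for transmitter-ready *before* writing (1 byte of output buffering) and busy-waiting *after* writing until the character has gone out (0 bytes). All names and numbers are hypothetical.

```python
from collections import deque


def simulate_echo(n_chars, char_cycles, overhead_cycles, rx_capacity, poll_before_tx):
    """Return how many received chars are dropped while echoing n_chars.

    Toy model (all assumptions): one char arrives every char_cycles, the
    RX buffer holds rx_capacity chars, and the CPU spends overhead_cycles
    per char before it can hand that char to the transmitter.
    """
    rx = deque()        # arrival times of chars waiting in the RX buffer
    cpu_free_at = 0     # cycle at which the CPU can pick up the next char
    tx_free_at = 0      # cycle at which the transmitter becomes idle
    dropped = 0

    for k in range(1, n_chars + 1):
        now = k * char_cycles
        # Let the CPU consume every char it can pick up before this arrival.
        while rx and max(cpu_free_at, rx[0]) <= now:
            start = max(cpu_free_at, rx.popleft())
            ready = start + overhead_cycles
            if poll_before_tx:
                # Wait for the previous char to finish, then write and move on.
                tx_start = max(ready, tx_free_at)
                cpu_free_at = tx_start
            else:
                # Write immediately, then busy-wait until the char is sent.
                tx_start = ready
                cpu_free_at = tx_start + char_cycles
            tx_free_at = tx_start + char_cycles
        if len(rx) >= rx_capacity:
            dropped += 1
        else:
            rx.append(now)
    return dropped


# Hypothetical parameters: 1000 cycles per char, 50 cycles of overhead.
print("busy-wait after TX :", simulate_echo(10_000, 1000, 50, 64, False))
print("poll before TX     :", simulate_echo(10_000, 1000, 50, 64, True))
```

In this model the busy-wait variant costs `char_cycles + overhead_cycles` per character and so eventually overflows the RX buffer, while the poll-before-TX variant overlaps the overhead with the previous transmission and keeps up as long as `overhead_cycles` stays below `char_cycles`. Once the overhead exceeds that budget, both variants drop characters.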


3. Simply due to lack of better leads to a problem ("broken UART
drivers and handling" started to sound much more convincing than the
original "scheduling is broken" guess, which couldn't be proven).

Personally I'm still rather concerned that the driver may not be
robust (although, just as with you blaming the scheduler, I haven't
collected much evidence to support this).

"The driver"? It's "the drivers, and for a while".

However mature Zephyr gets, it will *always* be the driver you should examine for bugs first. It is inevitable that drivers are less well tested and less heavily reviewed than the core or arch code.

For example, on Zephyr currently only a handful of people care about STM32 drivers, dozens care about arch/arm and everyone in the project cares about the core code...

If you think that
the problem is peculiar to stm32, then there was the example of
arduino_101 having had a big problem,

... which *was* an APIC driver problem.


and still having some, frdm_k64f
also had (== has) similar problems, etc.

I've done a quick review of the driver and so far I haven't seen
anything that explains the character loss (although it would be good to
neutralize the private locking so we can see the output from the ISR).

I'm afraid, for me, time is up thinking about this for a while.
However, if I was looking at this further, I'd consider reinstating
the old UART driver (HAL is sufficiently complexified that it becomes
hard to reason about) and see if it behaves the same way...

Well, thanks for helping the investigation, I appreciate that, as
usual! Reinstate the old driver - for one-off testing? If so, I'd find
hack-hacking the existing one to be more practical. And anything beyond
that borders on organizational and political issues. Suffice that
others provided feedback on that (p.1 at
https://youtu.be/XUJK2htXxKw?t=1885), I don't want to go there ;-).

No worries. I was thinking a "git revert" is fast and cheap (providing one is not afraid of solving merge conflicts) for a quick test... also it's always useful to distinguish regressions from this-never-worked.

About the stm32 driver and Cube HAL, I'd like to mention that a Low Level (LL) Cube API is now available
on almost all stm32 families. This API is intended to be lightweight and modular, and hence to be
a better fit for the Zephyr architecture.

I'd like to start implementing drivers with this LL API soon.
It would still be a HAL (hence I understand some people would still complain), but it would allow
a driver implementation closer to Zephyr's needs, removing some redundant functionality while still
allowing (as far as possible) generic stm32 drivers.

This won't solve the current issue (as I understand it), but I hope it could help to simplify things on the stm32 side.

Erwan

Anyhow, excited to see your pull requests this morning. I'll take a look at that in a minute.

Daniel.

