(Big) problems achieving fair scheduling of a semaphore ISR producer - thread consumer case
This a continuation of a discussion started in
https://gerrit.zephyrproject.org/r/12668 and on IRC yesterday.
Recently contributed console_getchar()'s internal buffer
easily overflows when large enough chunk of data is received via UART,
which happens e.g.: a) when sending console input programmatically
from host; b) clipboard is pasted into a terminal app from host.
A simple producer - consumer circular buffer synchronized using
semaphore is used, UART is slow (KHz), consumer thread is fast (MHz),
hence no circular buffer is expected.
Way to reproduce:
Go to samples/subsys/console/getchar. Build the example, deploy, run.
Can be tested with QEMU too. When started, select this paragraph from
the email (starting with "Go to ..."), copy to clipboard, paste. Behold
a big number of "Console buffer overflow - char dropped" messages
appear, all characters haven't been pasted.
Even from the reproduction steps above, if you look at what *have* been
pasted, you'll that's initial chars of content, so it looks (and later
confirmed) following happens: instead of producer and consumer working
in tandem, initially mostly producer gets run in loop, overflowing
buffer, and only then consumer kicks in. The producer is a function run
from an ISR:
consumer is a function run in the Zephyr main thread below it.
Let's do the math: UART speed is 115200 baud / (1 + 8 + 1 bits) = 11520
bytes/s, ISR runs with an average frequency 11.5 KHz. My test hardware
is e.g. BOARD=96b_carbon, running at 84MHz. For each ISR call, there're
7304 instructions of scheduler and main thread to run on average.
Should be plenty to run the simple consumer above.
We note effects of the possible UART hardware buffer. The sample above
uses rather small circular buffer, but in a real app, it's 256 bytes.
Growing it further doesn't help. Even with HW buffer, sustained char
production rate is still 11.5 KHz, and the SW buffer can't keep up with
that due to strong bias on production, with heavily throttled
History of investigation:
1. I initially thought that maybe need to prod Zephyr to do thread
rescheduling explicitly, looked for yield, dropped it, and rejoiced by
seeing the issue to be fixed by BOARD=qemu_x86 ! Later I noticed it
still doesn't fix it for BOARD=qemu_cortex_m3. To start a discussion, I
submitted a patch as is, even though k_yield is documenting as working
on threads (not ISRs). Patch was merged, later reverted, the discussion
started hopefully either way ;-).
2. One of the initial ideas was that maybe buffer DoSing was an
artifact of QEMU. The behavior is confirmed for 96b_carbon.
3. One of the early suggestion I got on IRC was to do flow control with
disabling/enabling UART IRQs. I wasn't happy about such suggestion, as
the code above is generic console buffering, and with several thousands
of cycles to process each char, everything should "just work".
4. I did a lot of investigation yesterday (with ARM, as that's hard
case, which isn't "fixed" with the incorrect fix). There was a
suspicion that context switching, implemented with PendSV interrupt
doesn't always kick in. In the end, I had to discard my results, as I
used printk output for diagnosing it, which is slow, and makes it
normal a case that if I report recv char with printk, then afterwards
there's another char in buf, causing ISR to run immediately again, so
PendSV handler can't run. (To elaborate, behavior in the absence of
printk still resembles of PendSV not running often enough, it just
can't proven in the presence of printk's).
5. I briefly tried "kernel event logger" facility. Well, "briefly"
doesn't say it right, I spent couple of hours trying to make it useful
for my case, and hunt down issues (patches submitted and merged), so at
one point I decided to not look for more bugs and switched back to
printk. Perhaps following this path further makes sense though to
disambiguate issues in p.4.
6. In the end I tried suggestion with UART IRQ disabling. If you read
this far, you won't be surprised that it didn't work. Even after I
disable IRQ, my ISR routine still gets called one more time. This
leaves production vs consumption rate at 2:1, still leading to buffer
overflow, and it elaborates my concern with adding such code to console
routine: it leaves it at mercy of a particular hardware implementation
(IRQ disabling/enabling might be not implemented for example).
7. With some creativity, I was able to find a USB-UART adapter to try
the sample on arduino_101. So, while incorrect fix fixes the issue
completely for qemu_x86, on the real hardware, it doesn't - while
there're no buffer overflow messages, majority of chars are dropped.
So, it's a typical "nothing works" type of bug each of us faced and
enjoyed. I really need 2nd (3rd, 4th) opinion on this. My kind request
though is: please don't try to shift it all on UART and talk about the
need of DTR/RTS, etc. UART is damn slow device by the current
standards, which should "just work" with dozens-of-MHz CPUs without
any flow control on hardware side.
Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog