(Big) problems achieving fair scheduling of a semaphore ISR producer - thread consumer case
Paul Sokolovsky
Hello,
This a continuation of a discussion started in https://gerrit.zephyrproject.org/r/12668 and on IRC yesterday. The problem: Recently contributed console_getchar()'s internal buffer easily overflows when large enough chunk of data is received via UART, which happens e.g.: a) when sending console input programmatically from host; b) clipboard is pasted into a terminal app from host. Expected behavior: A simple producer - consumer circular buffer synchronized using semaphore is used, UART is slow (KHz), consumer thread is fast (MHz), hence no circular buffer is expected. Way to reproduce: Go to samples/subsys/console/getchar. Build the example, deploy, run. Can be tested with QEMU too. When started, select this paragraph from the email (starting with "Go to ..."), copy to clipboard, paste. Behold a big number of "Console buffer overflow - char dropped" messages appear, all characters haven't been pasted. Discussion: Even from the reproduction steps above, if you look at what *have* been pasted, you'll that's initial chars of content, so it looks (and later confirmed) following happens: instead of producer and consumer working in tandem, initially mostly producer gets run in loop, overflowing buffer, and only then consumer kicks in. The producer is a function run from an ISR: https://gerrit.zephyrproject.org/r/gitweb?p=zephyr.git;a=blob;f=subsys/console/getchar.c;hb=refs/heads/master#l21 , consumer is a function run in the Zephyr main thread below it. Let's do the math: UART speed is 115200 baud / (1 + 8 + 1 bits) = 11520 bytes/s, ISR runs with an average frequency 11.5 KHz. My test hardware is e.g. BOARD=96b_carbon, running at 84MHz. For each ISR call, there're 7304 instructions of scheduler and main thread to run on average. Should be plenty to run the simple consumer above. We note effects of the possible UART hardware buffer. The sample above uses rather small circular buffer, but in a real app, it's 256 bytes. Growing it further doesn't help. Even with HW buffer, sustained char production rate is still 11.5 KHz, and the SW buffer can't keep up with that due to strong bias on production, with heavily throttled consumption. History of investigation: 1. I initially thought that maybe need to prod Zephyr to do thread rescheduling explicitly, looked for yield, dropped it, and rejoiced by seeing the issue to be fixed by BOARD=qemu_x86 ! Later I noticed it still doesn't fix it for BOARD=qemu_cortex_m3. To start a discussion, I submitted a patch as is, even though k_yield is documenting as working on threads (not ISRs). Patch was merged, later reverted, the discussion started hopefully either way ;-). 2. One of the initial ideas was that maybe buffer DoSing was an artifact of QEMU. The behavior is confirmed for 96b_carbon. 3. One of the early suggestion I got on IRC was to do flow control with disabling/enabling UART IRQs. I wasn't happy about such suggestion, as the code above is generic console buffering, and with several thousands of cycles to process each char, everything should "just work". 4. I did a lot of investigation yesterday (with ARM, as that's hard case, which isn't "fixed" with the incorrect fix). There was a suspicion that context switching, implemented with PendSV interrupt doesn't always kick in. In the end, I had to discard my results, as I used printk output for diagnosing it, which is slow, and makes it normal a case that if I report recv char with printk, then afterwards there's another char in buf, causing ISR to run immediately again, so PendSV handler can't run. (To elaborate, behavior in the absence of printk still resembles of PendSV not running often enough, it just can't proven in the presence of printk's). 5. I briefly tried "kernel event logger" facility. Well, "briefly" doesn't say it right, I spent couple of hours trying to make it useful for my case, and hunt down issues (patches submitted and merged), so at one point I decided to not look for more bugs and switched back to printk. Perhaps following this path further makes sense though to disambiguate issues in p.4. 6. In the end I tried suggestion with UART IRQ disabling. If you read this far, you won't be surprised that it didn't work. Even after I disable IRQ, my ISR routine still gets called one more time. This leaves production vs consumption rate at 2:1, still leading to buffer overflow, and it elaborates my concern with adding such code to console routine: it leaves it at mercy of a particular hardware implementation (IRQ disabling/enabling might be not implemented for example). 7. With some creativity, I was able to find a USB-UART adapter to try the sample on arduino_101. So, while incorrect fix fixes the issue completely for qemu_x86, on the real hardware, it doesn't - while there're no buffer overflow messages, majority of chars are dropped. So, it's a typical "nothing works" type of bug each of us faced and enjoyed. I really need 2nd (3rd, 4th) opinion on this. My kind request though is: please don't try to shift it all on UART and talk about the need of DTR/RTS, etc. UART is damn slow device by the current standards, which should "just work" with dozens-of-MHz CPUs without any flow control on hardware side. -- Best Regards, Paul Linaro.org | Open source software for ARM SoCs Follow Linaro: http://www.facebook.com/pages/Linaro http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog
|
|