Re: ARMv7 Cortex-A port for Xilinx Zynq7000

Immo Birnbaum

Hey everyone,

here's another update regarding the ARMv7 port: other than the 'interrupt' test, which I blacklisted, I've now got green lights on all kernel tests running on the QEMU Cortex-A9 (Zynq) target which are not automatically de-selected for whatever reason (61 passed, 23 skipped, 10 discarded). As a solution for the 'interrupt' test assertion failure is already being discussed and as I moved over completely to Stephanos' GIC interrupt handling code (although my code base isn't at the most recent regarding the modification of the interrupt handling, I'll take care of that soon), any solution of this issue should reflect directly in the ARMv7-A branch, eventually making the blacklisting obsolete.

There's been one major change along the way: I've exchanged the timer being used to drive the system. As some of the tests that failed were related to the tickless mode, and as the readily available Xilinx TTC driver isn't suitable for this mode (it's a self-reloading 16-bit counter, where dynamically calculated comparator values would require on-the-fly re-calculation of a prescaler, which would lead to varying imprecision in timing, and where, for example, you might miss out on an entire interval if interrupts are globally locked for more than one period), I've switched over to the ARM global timer, for which Carlo has already provided a high-level interface plus an access implementation in aarch64. The ARM global timer is similar to the timers that drive the system clock on the other architectures, continuously counting up while comparing against an absolute value. While on ARMv8 access is possible via special instructions, on ARMv7 the timer's identical register layout is memory-mapped. So I've added a register-based implementation in aarch32, for which the timer's device tree entry must be supplied with an additional base address. Using this timer, the tickless mode tests passed as well. The fact that the tests failed with the TTC driver might eventually be a test suite issue - the TTC driver doesn't set the tickless capability config flag when it is chosen in the build configuration, yet the test cases didn't get filtered out?

There's just one catch: I had to extend the ARM global timer's interface a little in the aarch32 implementation. While Carlo's implementation provides the basic functions for reading the current value and writing to the comparator which I re-implemented for aarch32, I had to add functions that read and write the state of the interrupt status ("event") bit. Initially I observed that the timer's ISR always ran twice in a row (the IAR register returns the timer's vector twice in a row before returning 0x3FF) when not using the auto-reload feature in tick-based mode (when configured this way, the interrupt works just as expected), erroneously incrementing the comparator value and leading to some really wonky timing behaviour.  I spent the best part of a day experimenting with various things such as extra barriers, puzzled why these double interrupts kept occurring. Eventually, I came across both ARM's and Xilinx's errata documentation - lo and behold, this behaviour is an actual bug in the MPcore silicon (which QEMU even emulates!). The suggested workaround is to clear the event bit *after* updating the comparator (to UINT64_MAX in tickless mode, some value must always be written from within the ISR) and checking its value whenever the ISR is entered. During the second, erroneous execution of the ISR, polling the event bit's value returns 0 as the comparator hasn't matched yet and the ISR exits upon detecting that no event is pending. This is also how the Linux folks implemented the workaround in their driver. It doesn't prevent the extra ISR execution, but at least it doesn't screw up the system's timing. Carlo, would you mind if I added a menuconfig option below the ARM global timer in order to enable the (ARMv7-specific) workaround and included the special handling #ifdef'd in arm_arch_timer.c -> arm_arch_timer_compare_isr()? Also, do you see any use in adding an option enabling the auto-update of the comparator in tick-based mode? It doesn't improve the interrupt latency in any way, it would just save two register writes in my case. I guess it's not that relevant, what do you think?

The one thing that works but isn't yet as clean as I would like it to be (and which should also have a matching test case for aarch32) is the FPU register saving thing. So far, I've completely skipped the CONFIG_FP_SHARING setting whereever I've added a branch for ARMv7 - if it's an ARMv7-A and if it has a FPU, pull the register saving off. Yet, while there is no marker which single thread gets to use the non-shared FPU registers as there is on Cortex-M, and the register saving basically isn't optional once the FPU is in play, I should probably auto-incorporate the FP_SHARING flag just to keep the semantics clean. I also still have to add the setting of the initial FPSCR and FPEXC values to the arch-specific thread creation code. More on that in the next update, I guess...

I'll have a look into how pull requests work and start off small with the minor bugfixes in the existing UART/TTC drivers, while with regards to the rest of my work I still have to work out with the big wigs at work if I personally get to take the credit in the copyright notices of any new sources, or if they'll mention the company - after all, they provided the resources.

Best regards,

Join to automatically receive all group messages.