Re: CONFIG_SEMAPHORE_GROUPS


Benjamin Walsh <benjamin.walsh@...>
 

On Thu, Feb 09, 2017 at 09:43:23AM +0000, Shtivelman, Bella wrote:
Hi Benjamin,
Thank you for helping with this issue.

I verified that in case of the crash thread->base.thread.state is 0xF1.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the latest code base, that's not a valid state: only bits 0-5 are
used, giving a max value of 0x3f.

./kernel/include/kernel_structs.h:#define _THREAD_DUMMY (1 << 0)
./kernel/include/kernel_structs.h:#define _THREAD_PENDING (1 << 1)
./kernel/include/kernel_structs.h:#define _THREAD_PRESTART (1 << 2)
./kernel/include/kernel_structs.h:#define _THREAD_DEAD (1 << 3)
./kernel/include/kernel_structs.h:#define _THREAD_SUSPENDED (1 << 4)
./kernel/include/kernel_structs.h:#define _THREAD_POLLING (1 << 5)

This looks like corruption. Are you sure the stack of that thread has
not overflowed ?

It means, _THREAD_DUMMY bit is set indeed.

Thanks again,
Bella

-----Original Message-----
From: Benjamin Walsh [mailto:benjamin.walsh@windriver.com]
Sent: Tuesday, February 07, 2017 17:40
To: Shtivelman, Bella <bella.shtivelman@intel.com>
Cc: zephyr-devel@lists.zephyrproject.org
Subject: Re: [Zephyr-devel] CONFIG_SEMAPHORE_GROUPS

On Tue, Feb 07, 2017 at 11:42:22AM +0000, Shtivelman, Bella wrote:
Hi,
I have a question regarding the impact of CONFIG_SEMAPHORE_GROUPS flag.

In my sample application I experienced weird failures in k_sem_give
being called from udp receive callback. After debugging this issue I
found out that the failure occurred in sem_give_common function, and
more precisely, when entering this if condition:

If(handle_sem_group(sem, thread))

where handle_sem_group is defined as follows:

#ifdef CONFIG_SEMAPHORE_GROUPS
static int handle_sem_group(struct k_sem *sem, struct k_thread *thread)
{

}
#else
#define handle_sem_group(sem, thread) 0
#endif

Disassembly shows the following (faulting instruction address is 0x0040f38c:)

0040f36c: ands.w r5, r3, #1
0040f370: beq.n 0x40f42a <sem_give_common+226>
177 list = (sys_dlist_t *)dummy->desc.thread->base.swap_data;
0040f372: ldr r3, [r4, #48] ; 0x30
0040f374: ldr.w r8, [r3, #12]
132 return list->head == list;
0040f378: ldr.w r4, [r8]
145 return sys_dlist_is_empty(list) ? NULL : list->head;
0040f37c: cmp r8, r4
0040f37e: it eq
0040f380: moveq r4, #0
176 return (!node || node == list->tail) ? NULL : node->next;
0040f382: cbz r4, 0x40f390 <sem_give_common+72>
0040f384: ldr.w r3, [r8, #4]
0040f388: cmp r3, r4
0040f38a: beq.n 0x40f394 <sem_give_common+76>
0040f38c: ldr r5, [r4, #0]
0040f38e: b.n 0x40f396 <sem_give_common+78>
0040f390: mov r5, r4
0040f392: b.n 0x40f396 <sem_give_common+78>
0040f394: movs r5, #0
0040f396: ldr r3, [r4, #12]
0040f398: cmp r6, r3
0040f39a: beq.n 0x40f3c8 <sem_give_common+128>
0040f39c: ldr.w r3, [r4, #-8]
0040f3a0: adds r3, #2
0040f3a2: beq.n 0x40f382 <sem_give_common+58>
0040f3a4: sub.w r9, r4, #40 ; 0x28
0040f3a8: sub.w r0, r4, #24
0040f3ac: bl 0x40f2a0 <_abort_timeout>
0040f3b0: mov r0, r9
0040f3b2: bl 0x40f290 <sys_dlist_remove>

The default value of CONFIG_SEMAPHORE_GROUPS is “y”, and after setting
it to “n” in my prj.conf I stopped facing this issue.
OK, so it seems that, since you can disable the CONFIG_SEMAPHORE_GROUPS
feature, you are _not_ using said feature. In theory, this if statement
should always be hit in handle_sem_group():

if (!(thread->base.thread_state & _THREAD_DUMMY)) {
/*
* The awakened thread is a real thread and thus was not
* involved in a semaphore group operation.
*/
return 0;
}

and handle_sem_group() should never do anything else, unless the
thread_state of some thread got corrupted.

Can you verify that the thread->base.thread.state _THREAD_DUMMY bit is
actually set when you get that crash ?

Looking at it, I think I found a bug in handle_sem_group():

if (desc->sem != sem) {
sem_thread = CONTAINER_OF(desc, struct _sem_thread,
desc);
struct k_thread *dummy_thread =
(struct k_thread *)&sem_thread->dummy;

if (_is_thread_timeout_expired(dummy_thread)) {
+ sys_dlist_remove(node);
+ node = next;
continue;
}
_abort_thread_timeout(dummy_thread);
_unpend_thread(dummy_thread);

sys_dlist_remove(node);
}

If you _are_ using sem groups, I would suggest using the new k_poll()
API instead: semaphore groups are part of the legacy API, and the
implemenation is pretty bad w.r.t. interrupt locking duration.

Regards,
Ben


But I’m not sure it is a correct handling of the issue.

Are you familiar with this issue?
Am I missing anything here?
---------------------------------------------------------------------
A member of the Intel Corporation group of companies

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
--
Benjamin Walsh, SMTS
WR VxWorks Virtualization Profile
www.windriver.com
Zephyr kernel maintainer
www.zephyrproject.org

Join devel@lists.zephyrproject.org to automatically receive all group messages.