RFC: async I/O facility: "zpoll" API


Andy Ross
 

So I've been looking at the requirement for an async I/O framework in
ZEP-540, and this is the API I have so far. An implementation is
prototyped as far as working with a minimal test of a faked device;
I'll get that cleaned up and into gerrit for review within a day. But
here's the API I'm looking at. Please tear it apart as appropriate;
it's an API we'll be stuck with for a while. Stuff to note:

It works a little like Unix select()/poll(), in that the caller gets
handles to a bunch of "zpoll_event" structures which are 1:1 with
async requests made (e.g. they're a return value from a putative
uart_write_async() call), and can wait on the full set of them, waking
up when any one of them has a completion event to signal.

Drivers are responsible for managing one of these events per
simultaneous outstanding request they want to handle (in most cases
that number of requests will be one), and calling a straightforward
signal() function when the request is complete (or in an error state).
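Concretely, the driver-side contract might look something like the
sketch below. Everything except the zpoll_* calls (the my_uart_* names,
fields, and ISR wiring) is hypothetical, just to show the shape:

```c
/* Sketch: one zpoll_event per simultaneous outstanding request. */
struct my_uart_data {
        struct zpoll_event tx_evt;      /* the one outstanding TX request */
        /* ... other per-device state ... */
};

struct zpoll_event *my_uart_write_async(struct device *dev,
                                        const uint8_t *buf, int len)
{
        struct my_uart_data *data = dev->driver_data;

        /* -EAGAIN means the previous request is still outstanding;
         * don't block or spin here, just report it upward. */
        if (zpoll_event_init(&data->tx_evt) != 0) {
                return NULL;
        }
        /* ... kick off the hardware transfer of buf/len ... */
        return &data->tx_evt;
}

/* Called from the TX-complete interrupt: */
static void my_uart_tx_isr(struct my_uart_data *data, int err)
{
        zpoll_signal(&data->tx_evt, err);       /* 0 on success */
}
```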

Because mixing blocking styles is madness (cf. the historical
disaster of pthreads and file descriptors not being waitable in the
same call and the need for hacks like signalfd), it also accepts basic
semaphores as a waitable object. We might consider some of the other
IPC primitives for the same treatment.
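A sketch of that mixed-wait usage, assuming uart_write_async() is the
putative driver call from above and my_sem is an initialized struct
k_sem:

```c
/* Wait on one driver completion event and one semaphore at once. */
struct zpoll_event sem_evt = ZPOLL_EVENT_SEMAPHORE(&my_sem);
struct zpoll_event *io_evt = uart_write_async(dev, buf, len);
struct zpoll_event *events[] = { io_evt, &sem_evt };

struct zpoll_event *done = zpoll(events, 2, TICKS_UNLIMITED);
if (done == io_evt) {
        int result = zpoll_get_result(done);    /* also frees the event */
        /* ... handle write completion ... */
} else if (done == &sem_evt) {
        /* ... the semaphore was given from another context ... */
}
```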

The original request was for a callback mechanism, which I resisted.
But given this mechanism and the existence of the system work queue
(which, to be fair, is a heavyweight feature), that's not hard to
implement. So this supports both styles, but the callback one is
optional for apps with their own main loops that don't want to have an
extra 1kb thread stack sitting around just for event processing.
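From the application's side, the callback style might look like the
following; uart_write_async_cb() is a hypothetical variant of the
driver entry point that routes through zpoll_event_init_cb():

```c
/* Runs on the system work queue thread, not in interrupt context. */
static void on_tx_done(int result, void *user_data)
{
        struct my_app *app = user_data;
        /* ... result < 0 is an error code, otherwise success ... */
}

/* Hypothetical driver entry point accepting the callback pair: */
uart_write_async_cb(dev, buf, len, on_tx_done, &app);
```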

Semantically, this overlaps in a kind of ugly way with
sem_group_take(), which I see a patch in the queue for. Essentially
zpoll() is a proper superset of that facility. We should probably try
to unify/deprecate such that we can do things in only one way.

Please review. An implementation prototype will arrive promptly.

Andy


========================================================================

typedef void (*zpoll_callback_t)(int result, void *user_data);

enum zpoll_req_type { ZPOLL_REQ_POLL, ZPOLL_REQ_SEM, ZPOLL_REQ_CALLBACK };

struct zpoll_event {
        enum zpoll_req_type type : 8;
        int signaled : 1;       /* Driver/source operation complete */
        int active : 1;         /* result not yet retrieved/delivered */
        int result;
        union {
                struct {
                        zpoll_callback_t callback;
                        void *user_data;
                } cb;
                struct {
                        struct zpoll_call *poller;
                        struct k_sem *semaphore;
                } polled;
        };
};

/**
* @brief Re/initialize zpoll event
*
* For use by drivers who are initiating a new request. Will return
* an error (-EAGAIN) if the event is still in use, which should
* generally be either propagated back to the user code making the
* request or used to initiate some kind of queuing/allocation of a
* new event. Drivers should *not* pend or busy-wait at request time
* to "wait" for this to become available. These are asynchronous
* requests!
*
* @param evt A potentially-in-use zpoll_event structure
*/
int zpoll_event_init(struct zpoll_event *evt);

#ifdef ZPOLL_CALLBACKS
/**
* @brief Re/initialize zpoll event for a user-specified callback
*
* As for zpoll_event_init(), but assigns the callback fields where provided.
*
* @param evt A potentially-in-use zpoll_event structure
* @param cb A user-provided callback function
* @param user_data A user-provided opaque pointer to be passed as the
* user_data argument to the callback
*/
int zpoll_event_init_cb(struct zpoll_event *evt, zpoll_callback_t cb, void *user_data);
#endif

/**
* @brief Static initializer for a semaphore zpoll_event
*
* Constructs a zpoll_event structure suitable for passing to zpoll().
*
* @param sem A pointer to a valid struct k_sem.
*/
#define ZPOLL_EVENT_SEMAPHORE(sem) \
        { \
                .type = ZPOLL_REQ_SEM, \
                .signaled = 0, \
                .polled.semaphore = (sem), \
        }

/**
* @brief Executes the user-provided handler for a complete event
*
* Called by driver code when the request abstracted by this zpoll
* event is complete, either by waking up a pending zpoll() call or
* scheduling a work thread to issue the user-provided callback.
*
* @param evt A zpoll_event structure whose action is complete
* @param result The integer result code of the asynchronous call
*/
void zpoll_signal(struct zpoll_event *evt, int result);

void _zpoll_sem_signal(struct k_sem *sem);

/**
* @brief Wait for the completion of any of a number of events
*
* Will pend until any of the zpoll_event structures receives a
* zpoll_signal() call from another context (or, for semaphores, a
* "give" event), or until the specified timeout occurs.
*
* For convenience, will return one known-complete event pointer. But
* note that completion notifications are "level triggered" (they stay
* complete and will synchronously wake up future zpoll calls until
* their results are retrieved), and that there may be more whose
* state can be tested with zpoll_get_result(). In general most
* applications will simply loop on zpoll() getting one event at a
* time.
*
* @param events An array of zpoll_event structures
* @param nevents The size of the events array
* @param timeout_ticks How long to wait before returning, or
* TICKS_UNLIMITED to wait forever.
*
* @return One complete event, or NULL if the timeout expired.
*/
struct zpoll_event* zpoll(struct zpoll_event **events, int nevents,
int timeout_ticks);

/**
* @brief Retrieve the result code of a complete zpoll_event and mark
* it available
*
* Returns the integer result code of the operation encapsulated by
* the zpoll_event. For driver-managed events, this also marks the
* zpoll_event available for future asynchronous I/O calls and so
* *must* be called to free the resource for future I/O.
*
* Will return -EAGAIN if the event is not yet complete.
*
* @return The result code of the asynchronous operation
*/
int zpoll_get_result(struct zpoll_event *e);
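To make the lifecycle above concrete (init, then signal, then
get_result), here is a minimal single-threaded model of the
signaled/active/result state machine. It only illustrates the
documented semantics; the model_* names are hypothetical, and all
locking, pending, and wakeup logic is omitted:

```c
#include <assert.h>
#include <errno.h>

/* Toy model of the proposed event lifecycle; not the prototype. */
struct model_event {
        int signaled;   /* driver/source operation complete */
        int active;     /* result not yet retrieved/delivered */
        int result;
};

/* Re/initialize: refuses with -EAGAIN while a result is pending. */
static int model_event_init(struct model_event *evt)
{
        if (evt->active) {
                return -EAGAIN; /* still in use: queue or propagate */
        }
        evt->signaled = 0;
        evt->active = 1;
        evt->result = 0;
        return 0;
}

/* Driver side: record the result and mark the event complete. The
 * completion is "level triggered": it stays set until retrieved. */
static void model_signal(struct model_event *evt, int result)
{
        evt->result = result;
        evt->signaled = 1;
}

/* Consumer side: -EAGAIN until complete; retrieval frees the event
 * for the next asynchronous request. */
static int model_get_result(struct model_event *evt)
{
        if (!evt->signaled) {
                return -EAGAIN;
        }
        evt->active = 0;
        return evt->result;
}
```

Note how a second init before the result is retrieved fails, which is
exactly the "don't pend or busy-wait at request time" contract in the
zpoll_event_init() comment.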


Joseph, Jithu
 

A Unix-like generic async API framework can potentially simplify app/driver development, so it is great to have.

From a quick read, I have a question regarding the callback mechanism (perhaps it will become clear with the whole patch, but anyway):

[Andy, Ross]
The original request was for a callback mechanism, which I resisted.
But given this mechanism and the existance of the system work queue (which,
to be fair, is a heavyweight feature), that's not hard to implement. So
this supports both styles, but the callback one is optional for apps with
their own main loops that don't want to have a extra 1kb thread stack
sitting around just for event processing.
[Joseph, Jithu] I think this implies that the callback is made from a (new) zpoll thread (which will get built into the system), and not from the context which invokes zpoll_signal().

A question on API usage in this scenario:

If the user app is interested in only one async callback event, e.g.
after making a uart_write_async_cb() call (which internally calls
zpoll_event_init_cb()), should it still call zpoll()? If so, will
zpoll() return immediately, even when the event has not occurred, so
that the main thread can continue? Or can the app simply continue with
its remaining instructions without calling zpoll()?


Thanks
Jithu

-----Original Message-----
From: Andy Ross [mailto:andrew.j.ross(a)intel.com]
Sent: Monday, September 19, 2016 1:22 PM
To: devel(a)lists.zephyrproject.org
Subject: [devel] RFC: async I/O facility: "zpoll" API



Andy Ross
 

Joseph, Jithu wrote:
I think this implies that the callback is made from a (new) zpoll
thread (which will get built into the system), and not from the
context which invokes zpoll_signal().
Right. It's made from the system work queue, the same thread that
handles {nano,k}_work_submit entries. The signal call is typically
going to be made from interrupt context, which is a really poor place
to be issuing arbitrary user callbacks.

If the user app is interested in only one async callback event, e.g.
after making a uart_write_async_cb() call (which internally calls
zpoll_event_init_cb()), should it still call zpoll()? If so, will it
return immediately, even when the event has not occurred, so that the
main thread can continue? Or can the app continue with its remaining
instructions without calling zpoll()?
If you really are interested in only one event, you probably want to
be using the existing synchronous API. :)

But yes: zpoll() will block. You can specify a timeout to ensure you
come back (or TICKS_NONE -- I just now realized I need to have support
for that, even though in this case it's just a noop), or include a
semaphore as your own "wake up" signal that you can control from
another thread.

If you want to poll for completion for some reason, you can check
zpoll_get_result() from any context.
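So a main loop that wants to stay non-blocking could, once TICKS_NONE
support lands, look roughly like this sketch (handle() and the event
set are application code, names illustrative):

```c
/* Poll for completions without blocking, then do other work. */
for (;;) {
        struct zpoll_event *done = zpoll(events, nevents, TICKS_NONE);

        if (done != NULL) {
                handle(zpoll_get_result(done));
        }
        /* ... the rest of the application's main loop ... */
}
```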

Andy


Benjamin Walsh <benjamin.walsh@...>
 

On Mon, Sep 19, 2016 at 01:22:14PM -0700, Andy Ross wrote:
Please review. An implementation prototype will arrive promptly.
OK, this is _very_ interesting. We were thinking of actually
implementing a 'wait on multiple objects' facility in the kernel rather
than just the 'semaphore group' we currently have (which is kept for
backwards compatibility, for now anyway).

I'm really eager to see the prototype.

A couple of nitpicks:

- I don't think the documentation should focus on drivers. This is
usable by applications as well.

- This should be a native kernel feature. It should probably be called
k_poll rather than zpoll.

More comments below.


Andy


========================================================================

typedef void (*zpoll_callback_t)(int result, void *user_data);

enum zpoll_req_type { ZPOLL_REQ_POLL, ZPOLL_REQ_SEM, ZPOLL_REQ_CALLBACK };

struct zpoll_event {
enum zpoll_req_type type : 8;
int signaled : 1; /* Driver/source operation complete */
int active : 1; /* result not yet retrieved/delivered */
int result;
union {
struct {
zpoll_callback_t callback;
void *user_data;
Shouldn't we have a pointer to a workqueue, so that the work could be
scheduled on a workqueue other than the system one if desired?

} cb;
struct {
Shouldn't this be a union since one zpoll event instance should be for
one object ?

struct zpoll_call *poller;
Where is zpoll_call defined ? Isn't that simply a k_tid_t ?

struct k_sem *semaphore;
} polled;
};
};

void _zpoll_sem_signal(struct k_sem *sem);
Do we need this? Shouldn't it be the job of k_sem_give() to handle
waking up a thread from its polling? I suppose you do not have to give
a semaphore knowing that it is being waited on by a thread using zpoll,
right?

Or is this a hook to be called from k_sem_give()? It might be, since it
is named like an internal kernel function.

Hard to tell without documentation.

Regards,
Ben




Andy Ross
 

Benjamin Walsh wrote:
- This should be a native kernel feature. It should probably be called
k_poll rather than zpoll.
Will do. I'm calling it zpoll only because the semantics aren't
exactly equivalent to unix poll and I didn't want to address the
confusion. But that naming is probably better.

Shouldn't we have a pointer to a workqueue, so that the work could
be scheduled on another workqueue than the system one if desired ?
That seems reasonable. Though currently I'm envisioning the work
queue as purely a convenience feature above the lower level API. It's
there to issue the callbacks and that's it. Someone who wants finer
control over what happens when should be making the poll call directly
IMHO.

Shouldn't this be a union since one zpoll event instance should be
for one object ?

[...]

Where is zpoll_call defined ? Isn't that simply a k_tid_t ?
One zpoll_event struct goes with one request, and a bundle of them
with (zero or) one waiting thread blocked under zpoll(). That's what
the zpoll_call struct is, it's a private record stored on the stack of
the pended zpoll call. Basically it's the arguments to the function
and a list link, so the other threads can enumerate the pended threads
and events to figure out which ones are complete and who to wake up.
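In other words, something roughly like the following; the field names
here are guesses based on that description, not the prototype's actual
definition:

```c
/* Hypothetical sketch of the private per-call record. */
struct zpoll_call {
        struct zpoll_event **events;    /* the zpoll() arguments */
        int nevents;
        int timeout_ticks;
        k_tid_t thread;                 /* the pended caller to wake */
        sys_dnode_t node;               /* link so signalers can find us */
};
```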


void _zpoll_sem_signal(struct k_sem *sem);
Do we need this ? Shouldn't this be the job of k_sem_give() to handle
waking up a thread from its polling ?
It is. This is the utility code that does that. It's hooked out of
k_sem_give(), checks one bit (which I stole from the count/limit) to
see if there's a blocked zpoll thread, then hands control to this
thing to wake it up. It's not strictly part of the API and got left
here because it's in the same header block I pasted in. :)
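The flow is roughly the following sketch; the "polled" bit name and
the field layout are illustrative, not what the actual patch does:

```c
/* Sketch: k_sem_give() checks one flag bit and defers to the zpoll
 * wakeup path when a poller is pending on this semaphore. */
void k_sem_give(struct k_sem *sem)
{
        /* ... normal give bookkeeping ... */
        if (sem->polled) {              /* bit borrowed from count/limit */
                _zpoll_sem_signal(sem); /* wake the pended zpoll() caller */
        }
}
```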

Andy


Andy Ross
 

As promised, the prototype is up for review here:

https://gerrit.zephyrproject.org/r/4871 zconfig.h: Add DEFINED() macro for expresion-legal ifdef-checking
https://gerrit.zephyrproject.org/r/4872 DO NOT MERGE: zpoll: asynchronous I/O framework

Note that it's a few patches behind. The semaphore group patch merged
while this was sitting on my box and they collide a bit. I'll get
that sorted out. Most of Ben's comments are still unaddressed, in
particular naming.

Other notes, just to preempt criticism:

+ There still aren't any real drivers ported to the new scheme, so
usage is a little academic right now. Priority 1 for me is getting
uart_qmsi working with this so I can push a test that makes sense.
Most drivers should port cleanly I think; the existing
driver_sync_call_t turns out to be a mostly perfect fit to be
reimplemented in terms of zpoll.

+ There is no true queueing for complete requests, so the actual
implementation of zpoll() itself is based on iterating over the set
of events. That was done originally to save having to bake a list
pointer into the event structure, but now that callbacks are in, this
is unavoidable anyway. I'll get that fixed.

+ The pend/unpend mechanism was copied directly out of the semaphore
implementation so I didn't have to figure out exactly how it
works. :) So there's a wait_q in the call record which is needless,
as there can only ever be one thread pended.

Andy