RFC: extend sanitycheck testcase filtering expressiveness


Boie, Andrew P
 

Problem statement:

Test case filtering in sanitycheck's testcase.ini files allows
sanitycheck to exclude certain test cases that are incompatible with
particular architectures, boards, or the presence or absence of Kconfig
options.

The goal of this mechanism is to allow the testcases to scale up to many
different boards without having to do constant gardening of the testcase
configuration files. I imagine a future where Zephyr supports dozens, if
not hundreds of microcontrollers and we want to ensure that we have good
test coverage on all these boards.

This is currently done in testcase.ini with the following directives:

arch_whitelist = <list of arches, such as x86, arm, arc>
Set of architectures that this test case should only be run for.

arch_exclude = <list of arches, such as x86, arm, arc>
Set of architectures that this test case should not run on.

platform_whitelist = <list of platforms>
Set of platforms that this test case should only be run for.

platform_exclude = <list of platforms>
Set of platforms that this test case should not run on.

config_whitelist = <list of config options>
Config options can either be config names like CONFIG_FOO which
match if the configuration is defined to any value, or key/value
pairs like CONFIG_FOO=bar which match if it is set to a specific
value. May prepend a '!' to invert the match.

This current scheme is less than ideal for several reasons, enumerated
below:

1. config_whitelist is very limited in expressiveness.

1a. You cannot do numerical comparisons, for example I can't currently
filter out a large footprint test by saying that CONFIG_RAM_SIZE
must be equal to or greater than some value. You can only test for
equality or if the config option is defined. This was somewhat
vexing when trying to add the new nucleo_f103rb board to sanity
checks as it has very little RAM.

1b. All the items in config_whitelist are ANDed together. You cannot
specify boolean OR relationship, do any kind of grouping with
parenthesis, or use NOT except on individual items. You can only
do very simple stuff.

1c. There is no way to tie config_whitelist in with the current arch
or platform. For example, test_tickless is currently improperly
specified. The true semantics of the test is that it works on
all x86, is unimplemented on ARC, and it supports two different
SOC types on ARM: CONFIG_SOC_ATMEL_SAM3, CONFIG_SOC_FSL_FRDM_K64F

Currently we have in its testcase.ini:

[test]
tags = core
config_whitelist = !CONFIG_SOC_TI_LM3S6965_QEMU

This currently only works for a few reasons that do not scale.

* The test is automatically excluded on ARC because of no
microkernel support (which may change in the future)
* This test runs on all x86, because currently we exclude
targets that have a particular SOC, which is a bad way to
do it as you will need to keep adding more ARM SOCs to the
list as they are introduced to the tree, such as
SOC_STM32F103RB. If I instead expressed all the CONFIG_SOC_*
on the ARM side that the test *is* compatible with, there's
no way to tell it that it's also good to run on all X86.

What we should be saying is something more like, "run on all
x86 boards, or those ARM boards which have a particular SOC
supported in the code". In the language developed for this
proposal, we would have:

filter = ARCH == "x86" or (ARCH == "arm" and
(CONFIG_SOC_FSL_FRDM_K64F or CONFIG_SOC_ATMEL_SAM3))

2. platform_whitelist or platform_exclude is the *worst* way to filter
testcases, as it simply doesn't scale. You have to keep adding new
boards to these lists as they are introduced to the Zephyr tree, which
is not fun to manage in a hypothetical optimistic future where Zephyr
takes off and we have dozens or even hundreds of supported boards in the
tree.

If possible, it's much better to filter on board *features* through
config_whitelist, as these typically do not require more changes to the
testcase.ini as more boards are added.

For example, we have a lot of bluetooth tests which have a
platform_whitelist with "arduino_101". This means the test will only run
on that board. It may be better to indicate that the test can run on any
Quark SE board with compatible Bluetooth hardware; on cursory inspection
these tests seem to be specific to the bluetooth chip in use and not
even the SOC.



Proposed solution:

We need a more expressive language for filtering test cases. I propose a
simple boolean expression language with the following grammar:

expression ::= expression "and" expression
| expression "or" expression
| "not" expression
| "(" expression ")"
| symbol "==" constant
| symbol "!=" constant
| symbol "<" number
| symbol ">" number
| symbol ">=" number
| symbol "<=" number
| symbol "in" list
| symbol

list ::= "[" list_contents "]"

list_contents ::= constant
| list_contents "," constant

constant ::= number
| string

When symbols are encountered, they are looked up in an environment
dictionary. In sanitycheck the environment will initially consist of:

{
ARCH : <architecture>,
PLATFORM : <platform>,
<all CONFIG_* key/value pairs in the test's generated defconfig>
}

We can later augment this with additional interesting metadata if we
want. For example I was thinking of ways we could integrate some
footprint information for large tests.

For the case where

expression ::= symbol

it evaluates to true if the symbol is defined to a non-empty string.

For all comparison operators, if the config symbol is undefined, it will
be treated as a 0 (for > < >= <=) or an empty string "" (for == != in).
For numerical comparisons it doesn't matter if the environment stores
the value as an integer or string, it will be cast appropriately.

Operator precedence, starting from lowest to highest:

or (left associative)
and (left associative)
not (right associative)
all comparison operators (non-associative)

In testcase.ini the 'config_whitelist' directive will be removed and
replaced with 'filter' directives containing one of these expressions.
arch_whitelist, arch_exclude, platform_whitelist, platform_exclude
are all syntactic sugar for these expressions. For instance

arch_exclude = x86 arc

Is the same as:

filter = not ARCH in ["x86", "arc"]


Implementation details:

Writing a parser by hand is a pain and prone to bugs. The best way to do
a language like this is to use a LR parser generator like lex/yacc.
There exists an open source library called PLY which does lex/yacc for
Python, it has been around for a long time and works very well:

http://www.dabeaz.com/ply/index.html

The sample implementation I have drops a copy of PLY directly in the
Zephyr source tree since its license is compatible and this will result
in the least pain for users as they won't have to do anything. If this
is unacceptable we can either find a way to stick it in the SDK, or add
it to the list of workstation dependencies (either have the user install
with pip or their distribution's package manager. Fedora appears to have
PLY installed by default).

Except for the case mentioned above with test_tickless, I haven't yet
gone through all the testcase.ini to ensure that the filtering done is
optimal. However I do have an implementation of the language which can
be looked at below:

Add PLY sources to tree
https://gerrit.zephyrproject.org/r/1075

Implement expression parser
https://gerrit.zephyrproject.org/r/1076

Integrate expression parser into sanitycheck
https://gerrit.zephyrproject.org/r/1077

Fix test_tickless filtering expression
https://gerrit.zephyrproject.org/r/1078

In addition to sanitycheck it is hoped that the expression language
could be generally useful for other scripts which need to do yes/no
filtering based on values in an environment.

--
Andrew Boie
Staff Engineer - EOS Zephyr
Intel Open Source Technology Center


Dmitriy Korovkin
 

On Fri, 25 Mar 2016, Boie, Andrew P wrote:

Proposed solution:

We need a more expressive language for filtering test cases. I propose a
simple boolean expression language with the following grammar:

expression ::= expression "and" expression
| expression "or" expression
| "not" expression
| "(" expression ")"
| symbol "==" constant
| symbol "!=" constant
| symbol "<" number
| symbol ">" number
| symbol ">=" number
| symbol "<=" number
| symbol "in" list
| symbol

list ::= "[" list_contents "]"

list_contents ::= constant
| list_contents "," constant

constant ::= number
| string

When symbols are encountered, they are looked up in an environment
dictionary. In sanitycheck the environment will initially consist of:

{
ARCH : <architecture>,
PLATFORM : <platform>,
<all CONFIG_* key/value pairs in the test's generated defconfig>
}

We can later augment this with additional interesting metadata if we
want. For example I was thinking of ways we could integrate some
footprint information for large tests.

For the case where

expression ::= symbol

it evaluates to true if the symbol is defined to a non-empty string.

For all comparison operators, if the config symbol is undefined, it will
be treated as a 0 (for > < >= <=) or an empty string "" (for == != in).
For numerical comparisons it doesn't matter if the environment stores
the value as an integer or string, it will be cast appropriately.

Operator precedence, starting from lowest to highest:

or (left associative)
and (left associative)
not (right associative)
all comparison operators (non-associative)

In testcase.ini the 'config_whitelist' directive will be removed and
replaced with 'filter' directives containing one of these expressions.
arch_whitelist, arch_exclude, platform_whitelist, platform_exclude
are all syntactic sugar for these expressions. For instance

arch_exclude = x86 arc

Is the same as:

filter = not ARCH in ["x86", "arc"]
I see nothing bad in the proposal.

Implementation details:

Writing a parser by hand is a pain and prone to bugs. The best way to do
a language like this is to use a LR parser generator like lex/yacc.
There exists an open source library called PLY which does lex/yacc for
Python, it has been around for a long time and works very well:

http://www.dabeaz.com/ply/index.html

The sample implementation I have drops a copy of PLY directly in the
Zephyr source tree since its license is compatible and this will result
in the least pain for users as they won't have to do anything. If this
is unacceptable we can either find a way to stick it in the SDK, or add
it to the list of workstation dependencies (either have the user install
with pip or their distribution's package manager. Fedora appears to have
PLY installed by default).
I would refrain from copying PLY into the source tree for the following
reasons:
- Support. Updating the library, which is not the part of the
project may be a headache.
- Licensing. There may be issues with distributing with Zephyr the code
that we have not developed.
- Importing. In case of side projects that, for instance, run all
sanity_chk projects on real hardware, using sanity_chk combined with
additional library may create problems.
- Distributions. Ubuntu has PLY 3.4 as a part of the distribution. As you
have mentioned, Fedora has it as well.
Regards,
/Dmitriy


Boie, Andrew P
 

On Mon, 2016-03-28 at 16:23 -0400, Dmitriy Korovkin wrote:
I would refrain from copying PLY into the source tree for the
following 
reasons:
- Support. Updating the library, which is not the part of the
   project may be a headache.
- Licensing. There may be issues with distributing with Zephyr the
code
   that we have not developed.
- Importing. In case of side projects that, for instance, run all
   sanity_chk projects on real hardware, using sanity_chk combined
with
   additional library may create problems.
- Distributions. Ubuntu has PLY 3.4 as a part of the distribution. As
you
   have mentioned, Fedora has it as well.
Yeah I'll take it out of the patch series.
I've asked Juro to look into including it into the Zephyr SDK, which is
where we typically put our dependent packages. If that doesn't work out
we would just have to add 'sudo dnf install python3-ply' (or
equivalent) to the workstation setup.

Andrew


Nashif, Anas
 

On 28 Mar 2016, at 17:04, Boie, Andrew P <andrew.p.boie(a)intel.com> wrote:

On Mon, 2016-03-28 at 16:23 -0400, Dmitriy Korovkin wrote:
I would refrain from copying PLY into the source tree for the
following
reasons:
- Support. Updating the library, which is not the part of the
project may be a headache.
- Licensing. There may be issues with distributing with Zephyr the
code
that we have not developed.
- Importing. In case of side projects that, for instance, run all
sanity_chk projects on real hardware, using sanity_chk combined
with
additional library may create problems.
- Distributions. Ubuntu has PLY 3.4 as a part of the distribution. As
you
have mentioned, Fedora has it as well.
Yeah I'll take it out of the patch series.
I've asked Juro to look into including it into the Zephyr SDK, which is
where we typically put our dependent packages. If that doesn't work out
we would just have to add 'sudo dnf install python3-ply' (or
equivalent) to the workstation setup.
Beside the issue above, the proposal looks good to me and will definitely improve and expand the coverage.
Lets find a way to make this work without adding the parser into the tree please.

Thanks,

Anas


Andrew


Wang, Jing J
 

QA test suite actually has similar requirement for test case filtering by expression. We took a lightweight approach, directly use "python expression" instead of creating a new one. The benefit of "python expression" is, the expression can be parsed by python eval() function directly. Thus, below filter expression is naturally supported with a small code snippets.
@tag_exp('soc not in ["quark_se", "quark_d2000"]')
def test_xxxx(self):

Rgds
jwang

-----Original Message-----
From: Dmitriy Korovkin [mailto:dmitriy.korovkin(a)windriver.com]
Sent: Tuesday, March 29, 2016 4:23 AM
To: Boie, Andrew P <andrew.p.boie(a)intel.com>
Cc: devel(a)lists.zephyrproject.org
Subject: [devel] Re: RFC: extend sanitycheck testcase filtering expressiveness

On Fri, 25 Mar 2016, Boie, Andrew P wrote:

Proposed solution:

We need a more expressive language for filtering test cases. I propose
a simple boolean expression language with the following grammar:

expression ::= expression "and" expression
| expression "or" expression
| "not" expression
| "(" expression ")"
| symbol "==" constant
| symbol "!=" constant
| symbol "<" number
| symbol ">" number
| symbol ">=" number
| symbol "<=" number
| symbol "in" list
| symbol

list ::= "[" list_contents "]"

list_contents ::= constant
| list_contents "," constant

constant ::= number
| string

When symbols are encountered, they are looked up in an environment
dictionary. In sanitycheck the environment will initially consist of:

{
ARCH : <architecture>,
PLATFORM : <platform>,
<all CONFIG_* key/value pairs in the test's generated defconfig>
}

We can later augment this with additional interesting metadata if we
want. For example I was thinking of ways we could integrate some
footprint information for large tests.

For the case where

expression ::= symbol

it evaluates to true if the symbol is defined to a non-empty string.

For all comparison operators, if the config symbol is undefined, it
will be treated as a 0 (for > < >= <=) or an empty string "" (for == != in).
For numerical comparisons it doesn't matter if the environment stores
the value as an integer or string, it will be cast appropriately.

Operator precedence, starting from lowest to highest:

or (left associative)
and (left associative)
not (right associative)
all comparison operators (non-associative)

In testcase.ini the 'config_whitelist' directive will be removed and
replaced with 'filter' directives containing one of these expressions.
arch_whitelist, arch_exclude, platform_whitelist, platform_exclude are
all syntactic sugar for these expressions. For instance

arch_exclude = x86 arc

Is the same as:

filter = not ARCH in ["x86", "arc"]
I see nothing bad in the proposal.

Implementation details:

Writing a parser by hand is a pain and prone to bugs. The best way to
do a language like this is to use a LR parser generator like lex/yacc.
There exists an open source library called PLY which does lex/yacc for
Python, it has been around for a long time and works very well:

http://www.dabeaz.com/ply/index.html

The sample implementation I have drops a copy of PLY directly in the
Zephyr source tree since its license is compatible and this will
result in the least pain for users as they won't have to do anything.
If this is unacceptable we can either find a way to stick it in the
SDK, or add it to the list of workstation dependencies (either have
the user install with pip or their distribution's package manager.
Fedora appears to have PLY installed by default).
I would refrain from copying PLY into the source tree for the following
reasons:
- Support. Updating the library, which is not the part of the
project may be a headache.
- Licensing. There may be issues with distributing with Zephyr the code
that we have not developed.
- Importing. In case of side projects that, for instance, run all
sanity_chk projects on real hardware, using sanity_chk combined with
additional library may create problems.
- Distributions. Ubuntu has PLY 3.4 as a part of the distribution. As you
have mentioned, Fedora has it as well.
Regards,
/Dmitriy


Boie, Andrew P
 

QA test suite actually has similar requirement for test case filtering by
expression. We took a lightweight approach, directly use "python
expression" instead of creating a new one. The benefit of "python
expression" is, the expression can be parsed by python eval() function
directly. Thus, below filter expression is naturally supported with a small code
snippets.
@tag_exp('soc not in ["quark_se", "quark_d2000"]')
def test_xxxx(self):
I briefly considered this approach but after doing some reading concluded it was unsafe.

http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html

Andrew


Wang, Jing J
 

Andrew,
Eval is widely used in many projects, e.g. yocto. Its safety depends on what area you apply it. To build up a public server which allow any user input, maybe it is unsafe. For testcase filtering, all input is controlled by TC owner. In this usage, I think eval is appropriate, and save quite a lot efforts. On the other hand, are you sure introducing another parser or develop a new parser is safer than eval?

Rgds
jwang

-----Original Message-----
From: Boie, Andrew P
Sent: Wednesday, March 30, 2016 11:37 PM
To: Wang, Jing J <jing.j.wang(a)intel.com>; Korovkin, Dmitry (Wind River) <dmitriy.korovkin(a)windriver.com>
Cc: devel(a)lists.zephyrproject.org
Subject: RE: [devel] Re: RFC: extend sanitycheck testcase filtering expressiveness

QA test suite actually has similar requirement for test case filtering
by expression. We took a lightweight approach, directly use "python
expression" instead of creating a new one. The benefit of "python
expression" is, the expression can be parsed by python eval() function
directly. Thus, below filter expression is naturally supported with a
small code snippets.
@tag_exp('soc not in ["quark_se", "quark_d2000"]')
def test_xxxx(self):
I briefly considered this approach but after doing some reading concluded it was unsafe.

http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html

Andrew