John's Observation Loft: Get to know the Random Query Generator

Thursday, February 14, 2013

Get to know the Random Query Generator

In the MySQL QA teams in Oracle we have been using a tool called the Random Query Generator (or "RQG" for short) for some time now. The main RQG testing effort has been on new server development, including regression testing. The recent MySQL 5.6 GA is a result of a couple of years of hard work, including a lot of RQG testing and bugfixing as a result of that.

I can easily say that the RQG has helped making MySQL 5.6 a better release than it would otherwise be. It is of course not our only testing tool, but there are plenty of bugs this tool has uncovered that would likely not have been seen in our other testing. Such issues include:

crashes and asserts
memory management bugs (valgrind errors)
wrong results

The main clue to the power of the RQG is its ability to automatically generate SQL queries based on some grammar file. This makes it possible to produce SQL statements that you would not have been able to cook up manually even if you tried for years.

I plan to write a series of blog posts which go into some more details, including specific techniques and use of the tool. In the mean time, if you are interested I recommend that you spend some time reading more at the RQG's home at Launchpad, or reading some getting started documentation on Github.

If you just want to get started right away, this is what you need:

Perl (version 5.10 or newer is recommended)
A MySQL Server installation (other databases are also supported to some degree, including Java DB, but MySQL has had the main focus)
Some Perl modules:

DBI
DBD::mysql
Digest::MD5

Note: For more than running just simple test runs, you need a few other Perl modules as well, see docs for details. Note that it pays off having a relatively recent version of Perl, as more modules are part of core Perl then, meaning you won't have to install that many yourself.
The Bazaar version control system (to get the most recent RQG code).

Once you think you have everything set up, branch the RQG code from launchpad, like this:

bzr branch lp:randgen

Then take the RQG for a test drive like this (unix style):

cd randgen

perl runall-new.pl \
  --grammar=conf/examples/example.yy \
  --queries=1000 \
  --threads=1 \
  --basedir=/path/to/your/mysql/installation \
  --vardir=$PWD/vardir

The test run should take less than a minute in total. You should see in the output that RQG starts a server, creates a test database, starts some validator (more on that later), starts running queries (the actual queries are not printed in this case), and at the end shuts down the server and reports the end result (should be STATUS_OK).

...
# 2013-02-14T08:18:57 Test completed successfully.
# 2013-02-14T08:18:57 GenTest exited with exit status STATUS_OK (0)
# 2013-02-14T08:18:57 Stopping server on port 10630
# 2013-02-14T08:18:57 [14332] runall-new.pl will exit with exit status STATUS_OK (0)

If you want to see what kind of queries the RQG actually produced, you can check out the file vardir/mysql.log (since the example grammar is relatively simple the SQL statements will be simple too).

6 Query     UPDATE `A` SET `pk` = 0 WHERE `col_varchar_key` < 9 LIMIT 6
6 Query     DELETE FROM `AA` WHERE `col_varchar_key` = 0 LIMIT 4
6 Query     UPDATE `C` SET `col_varchar_key` = 5 WHERE `col_varchar_key` < 8 LIMIT 0

You can also add the option --sqltrace=MarkErrors to the command line in order to make the RQG print all generated queries to standard output. This is what --sqltrace does, while the MarkErrors setting ensures that any failed statements (i.e. illegal statements that are produced from the grammar but refused by MySQL) are marked (prefixed) with "# [sqltrace] ERROR <errno>:". Example:

INSERT INTO `BB` ( `col_datetime_nokey` ) VALUES ( 1 );
# [sqltrace] ERROR 1062: UPDATE `DD` SET `pk` = 7 WHERE `col_int_key` < 7 LIMIT 4;

Error 1062 means "Duplicate entry for key", and is a perfectly valid error in this case.

Feel free to experiment, study the code, and try different settings. I hope to present some more specific details in this blog later.

18 comments:

AnonymousMay 6, 2013 at 7:50 PM
Any follow-ups coming? I'm new to RQG aqnd looking for anything to help me get started.
ReplyDelete
Replies
UnknownJune 23, 2013 at 3:50 PM
Very good for starter. Can I use RQG for my own data schema?
ReplyDelete
Replies
ShivMarch 25, 2014 at 11:47 AM
Hi,
I used the RQG for some extent after installing it on my PC.
It generated 1000 queries, But not all queries are proper.

I tried to get a inner join query as below,
Say I have 2 tables t1 with 3 columns x,y,z and T2 with 3 columns a, b, c
But the queries generated don't map the table column with it own columns.
For Ex: It says T1.a=T2.x in where clause which is ambiguous to SQL engine!
So this is just a random combination with existing table and all columns.

I tried invariant too but no use. Any other way out to get proper SQL query from RQG?
Or this work like this only?

Since I am looking for creating complex queries say multilevel join, from TPCH tables, it may not help.
Since TPCH has different data type columns and if it map randomly all column with all table it may not help me to get multi level join proper queries.

The used grammar file looks like below,
select:
SELECT _table[invariant]._field
FROM _table[invariant] AS X
INNER JOIN _table AS Y
ON _table[invariant]._field = Y._field ;

ReplyDelete
Replies
JohnMarch 25, 2014 at 3:46 PM
Hi Shiv,

Good to see that you are trying out the RQG. Grammar development is a big and potentially complex topic, for which I may not always have the answers. I'm glad to give beginners some pointers, though.

You are on to something when you are using aliases for your tables. The trick that is often used for JOINs is to use aliases to hide the real table names, and thus avoid the problem that the table name may be different each time you use the _table rule. This is mentioned in the docs, but it could perhaps be explained a bit better.

Your issue seems to be that you use columns in the SELECT list (and elsewhere) from tables that are not in the FROM list. Once you assign an alias to a table, you must use the alias name when you refer to the same table in the SELECT list.

Example:

mysql> CREATE TABLE t1 (a INT, b INT);
Query OK, 0 rows affected (0.66 sec)

mysql> SELECT t1.a FROM t1 AS mytable;
ERROR 1054 (42S22): Unknown column 't1.a' in 'field list'
mysql> SELECT mytable.a FROM t1 AS mytable;
Empty set (0.00 sec)

The _table[invariant] trick would probably work if you avoid assigning the X alias to the first table. But I think using aliases for both tables is better, like this:

select:
SELECT X._field
FROM _table AS X
INNER JOIN _table AS Y
ON X._field = Y._field ;

ReplyDelete
Replies
ShivMarch 27, 2014 at 11:53 AM
Hi John,
Thanks for the reply.
I tried many grammar options to get table to column mapping to get correct, but no success in that!

The 2 table is as below,
mysql> explain t1;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| x | int(11) | YES | | NULL | |
| y | int(11) | YES | | NULL | |
| z | int(11) | YES | | NULL | |
+-------+---------+------+-----+---------+-------+
3 rows in set (0.11 sec)

mysql> explain t2;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| a | int(11) | YES | | NULL | |
| b | int(11) | YES | | NULL | |
| c | int(11) | YES | | NULL | |
+-------+---------+------+-----+---------+-------+
3 rows in set (0.08 sec)

mysql>

The grammar as you said is below,
select:
SELECT X._field
FROM _table AS X
INNER JOIN _table AS Y
ON X._field = Y._field ;

The output queries as below of no use!
SELECT X.`y` FROM `t2` AS X INNER JOIN `t2` AS Y ON X.`b` = Y.`b`;
SELECT X.`y` FROM `t1` AS X INNER JOIN `t2` AS Y ON X.`a` = Y.`b`;
SELECT X.`y` FROM `t2` AS X INNER JOIN `t2` AS Y ON X.`b` = Y.`b`;
SELECT X.`y` FROM `t1` AS X INNER JOIN `t2` AS Y ON X.`a` = Y.`b`;
SELECT X.`y` FROM `t2` AS X INNER JOIN `t1` AS Y ON X.`x` = Y.`y`;
SELECT X.`z` FROM `t1` AS X INNER JOIN `t2` AS Y ON X.`b` = Y.`c`;
SELECT X.`x` FROM `t2` AS X INNER JOIN `t1` AS Y ON X.`z` = Y.`z`;
SELECT X.`z` FROM `t1` AS X INNER JOIN `t1` AS Y ON X.`x` = Y.`z`;
SELECT X.`y` FROM `t1` AS X INNER JOIN `t1` AS Y ON X.`z` = Y.`z`;
SELECT X.`x` FROM `t1` AS X INNER JOIN `t1` AS Y ON X.`y` = Y.`x`;
SELECT X.`y` FROM `t2` AS X INNER JOIN `t2` AS Y ON X.`a` = Y.`c`;
SELECT X.`y` FROM `t2` AS X INNER JOIN `t1` AS Y ON X.`x` = Y.`x`;
SELECT X.`x` FROM `t2` AS X INNER JOIN `t2` AS Y ON X.`c` = Y.`c`;
SELECT X.`x` FROM `t1` AS X INNER JOIN `t2` AS Y ON X.`b` = Y.`b`;
SELECT X.`x` FROM `t2` AS X INNER JOIN `t1` AS Y ON X.`y` = Y.`x`;
SELECT X.`y` FROM `t1` AS X INNER JOIN `t1` AS Y ON X.`z` = Y.`y`;
SELECT X.`x` FROM `t1` AS X INNER JOIN `t2` AS Y ON X.`a` = Y.`c`;
SELECT X.`y` FROM `t1` AS X INNER JOIN `t1` AS Y ON X.`y` = Y.`x`;
SELECT X.`y` FROM `t1` AS X INNER JOIN `t1` AS Y ON X.`z` = Y.`x`;
SELECT X.`z` FROM `t1` AS X INNER JOIN `t2` AS Y ON X.`a` = Y.`b`;
SELECT X.`y` FROM `t2` AS X INNER JOIN `t2` AS Y ON X.`c` = Y.`a`;
SELECT X.`z` FROM `t1` AS X INNER JOIN `t1` AS Y ON X.`x` = Y.`x`;
SELECT X.`x` FROM `t2` AS X INNER JOIN `t1` AS Y ON X.`z` = Y.`y`;
SELECT X.`z` FROM `t2` AS X INNER JOIN `t2` AS Y ON X.`b` = Y.`c`;
SELECT X.`x` FROM `t2` AS X INNER JOIN `t2` AS Y ON X.`b` = Y.`a`;
SELECT X.`z` FROM `t1` AS X INNER JOIN `t1` AS Y ON X.`y` = Y.`y`;
SELECT X.`z` FROM `t2` AS X INNER JOIN `t2` AS Y ON X.`c` = Y.`c`;
SELECT X.`x` FROM `t1` AS X INNER JOIN `t1` AS Y ON X.`z` = Y.`y`;
SELECT X.`z` FROM `t1` AS X INNER JOIN `t1` AS Y ON X.`y` = Y.`x`;
SELECT X.`x` FROM `t1` AS X INNER JOIN `t1` AS Y ON X.`z` = Y.`z`;
SELECT X.`z` FROM `t1` AS X INNER JOIN `t2` AS Y ON X.`c` = Y.`c`;
SELECT X.`z` FROM `t2` AS X INNER JOIN `t1` AS Y ON X.`z` = Y.`x`;
SELECT X.`y` FROM `t1` AS X INNER JOIN `t2` AS Y ON X.`a` = Y.`c`;
SELECT X.`x` FROM `t2` AS X INNER JOIN `t1` AS Y ON X.`x` = Y.`x`;
SELECT X.`z` FROM `t1` AS X INNER JOIN `t1` AS Y ON X.`x` = Y.`y`;
SELECT X.`y` FROM `t2` AS X INNER JOIN `t2` AS Y ON X.`c` = Y.`b`;
SELECT X.`y` FROM `t2` AS X INNER JOIN `t1` AS Y ON X.`z` = Y.`z`;
SELECT X.`y` FROM `t2` AS X INNER JOIN `t1` AS Y ON X.`z` = Y.`z`;

So not sure now how to go further in this?

Do you have any pointer who can give proper ans for this problem?

Regards,
Shiv
ReplyDelete
Replies
ShivMarch 28, 2014 at 6:02 AM
Hi John,
Yes, you are right. I meant same by no use, as logically generated queries are wrong.

As you said I tried renaming the column to same name. But it only work when we have same data type for all column. And also multiple tables should have same number of columns!

Basically I started this project to get complex multi level join queries from TPCH tables.
Let me spend some more time on this to crack the grammar issues.
Yes, will join the launchpad/randgen group as you said..to put my queries there.

Thanks....

Regards,
Shiv
ReplyDelete
Replies
UnknownJuly 16, 2014 at 10:43 AM
Hi!

It's a really good thing I found your blog, since there are so few blogs out there about beginner tutorials on RQG.
I tried the same commands on your post, but all I am getting is, "Segmentation fault (core dumped)" error.

Here is the exact command I used:
$ perl runall-new.pl --grammar=conf/examples/example.yy --queries=1000 --threads=1 --basedir=C:/Program Files/MySQL/MySQL Server 5.6 --vardir=$PWD/vardir
Segmentation fault (core dumped)

It would be great if you could let me know what I'm doing wrong. Is my MySQL path wrong?

Thank you so much!
Cherry
ReplyDelete
Replies
JohnAugust 1, 2014 at 1:23 PM
Hi Charissa,

First, I am sorry it took a while to moderate and reply to your post - I have been offline for a while for various reasons.

Thank you for reading my post. I'm sorry it does not work as well for you as you had hoped.

The example command I used was for unix/linux. The RQG works best on this platform. If you are using the RQG on Windows, you are to a larger degree "on your own", and will likely need to deal with various quirks every now and then.

First off, using "--vardir=$PWD/vardir" will not work as intended on Windows. The $VARIABLE syntax is valid in the unix shell, but on Windows the corresponding syntax is %VARIABLE%. Besides, the $PWD environment variable on unix points to the current working directory. On Windows, the equivalent seems to be %CD%. Try making these adjustments, and see if it helps.

Secondly, it has been long since I last tried to run the RQG on Windows, so I am not sure how well spaces in paths in options are handled. If it still fails for you, you may want to try to put a copy of MySQL 5.6 in a folder with no paths, or surround the --basedir value with quotes.

Last, but not least, the main reason I do not run the RQG directly on Windows anymore is that it has proven over time to be quite error prone. In my team we have seen mysterious issues, kernel panic in Perl, problems with concurrency and thread/process handling, etc on Windows. There are at least two well known Perl implementations on Windows, and neither of them work well with such advanced/heavy usage as what the RQG does. Perl is unfortunately not a platform independent scripting language.

Instead, when we want to test MySQL on Windows using the RQG, we run the MySQL server on windows, and the RQG as a client on Unix/Linux. We use another framework called JET (https://kenai.com/projects/jet/pages/Home), written in JAVA, to facilitate that. However, I generally recommend RQG beginners to stick to Linux, to avoid potential pitfalls and to be able to get started quickly.
ReplyDelete
Replies
WeiMarch 21, 2015 at 12:47 AM
Hi John, this tool is definitely a must-have for testing database development. I'm wondering if this can be used as a general purpose tool to test against other database engines other than MySQL. If so, could you point out which components need to be modified? I'm sure the syntax part is one of them. Anywhere else? Thanks!
ReplyDelete
Replies
Sona KurazyanDecember 21, 2015 at 4:03 PM
Hi!

Thank you for your article, it was very helpful!
I was wondering if there is any way to reproduce failures?
ReplyDelete
Replies
सौरभMay 8, 2017 at 8:17 AM
Hi, great article ! Is there a way to just generate the queries and not run it ?
ReplyDelete
Replies
सौरभMay 11, 2017 at 10:02 AM
Thanks you for the great article ! Is there a way to just produce queries and not run it ? I can see that gensql.pl can be used, but can it be done with runall.pl ?
ReplyDelete
Replies

Add comment