2022-12-29

I want to wish you a happy new year with MariaDB 11.0!

MariaDB 11.0-Alpha preview  is now available for download!

MariaDB 11.0 is in essence a "new optimizer cost model" release.
There are also some other features , but I will concentrate this blog post on the 11.0 optimizer changes.

Why/when is MariaDB 11.0 of interest to you?

MariaDB 11.0 should be of interest to you if any of the following scenarios match your typical queries:

Your queries are

  • using more than two tables
  • using indexes that have lot of identical values.
  • using ranges that cover more than 10% of a table.
  •    WHERE key between 1 and 1000  -- Table has values 1-2000
  • using complex queries where not all used columns are or can be indexed.
    • In which case you may need to depend on selectivity to get the right plan.
  • mixing different storage engines.
    • Like using tables from both InnoDB and Memory in the same query.
  • using FORCE INDEX to force an index
  • getting worse (or not good enough) when using ANALYZE TABLE.
  • depending on --optimizer-use-condition-selectivity > 1 (using selectivity to find out how many accepted rows there are in a table).
  • using lots of derived tables (subselects).
  • using ORDER BY / GROUP BY that could be resolved via indexes
  • using table scans when it should use keys and vice versa.

First a bit of history

When I created the original MySQL optimizer in 1995, it was quite simple. A big part of the plans were based on costs, but there where also a lot of rule based choices. The original 'base of cost' was "one disk seek". The cost of key read was also a disk seek.

Over time a lot of developers have worked on the optimizer and added many new features, but the original cost model has stayed the same (until now). Most new features was done with based on a cost model, but still some new things (like deciding which index to use for sorting) were still partly rule based.

If there was only one or a few choices for a plan, the old optimizer was reasonably good in finding a good plan. But the more complex the queries got (complex = a lot of different choices could be made for each table) the more chance that it would not find the optimal plan. This goes all back to the original cost model not being 'good enough'.

The main developers of the 11.0 optimizer changes in MariaDB 11.0 are Sergei Petrunia and me. Vicențiu Ciorbaru has also provided some code and help.

The optimizer changes consist of more than 100 commits, starting from October 2021!

It has been a long project!

What is new

In 11.0 the costs are based on the expected microseconds spent in the storage engine, in filesort and in filtering code. Almost all cost factors have been calculated base on result from performance tests.
The costs can be viewed and changed by the end user, either globally or for a specific storage engine. A few top level costs can be changed per query.

As part of the above work, a lot of other improvements have gone into the optimizer:

New cost for:

  • All storage engine low level index/row read calls (read key, read row, fetch next etc)
  • Filesort
  • Unique
  • Join_cache
  • Materialization
  • Disk access costs are by default based on modern SSD (can be changed by the end user).

A lot of improvements (including bug fixes):

  • Selectivity, when using --optimizer-use-condition-selectivity > 1.
  • Rowid_filter: cost updates and better performance
  • Adding Rowid_filter to Aria storage engine
  • Cost for “Using index for group-by"
  • Counting costs of disk accesses.

Cost based change to be rule based:

  • Choosing right index for scanning the table as part of GROUP BY/ORDER BY optimization.

Performance improvements:

  • Derived tables and unions now creates a distinct key in the derived table (when possible). This adds automatic duplicate elimination to the derived table (which makes it smaller) and improves performance when using the derived table.
  • A lot of  small optimization in all parts of the optimizer

Other things:

  • A lot more information in optimizer trace (which makes it easier to help customers and users to find out what the optimizer is doing).

What are the notable consequences of moving to MariaDB 11.0?

First note that MariaDB 11.0 is based on the stable MariaDB 10.11 release with just the optimizer changes + a few small safe features.
My expectations is that 11.0 should be stable (no major showstopper bug) from the start (beta).

The MariaDB optimizer team will do its best to quickly address any issues with the new code. We plan to do spot-releases with bug fixes if anything serious is found to ensure that there is a short feedback loop between users and developers!  If there is a major uptake of MariaDB 11.0, it will get updates regularly.

Most applications, which are properly using keys, should be unaffected:

  • Simple queries will work as before
  • Most complex queries (with many tables) should perform equal or better than before.

Some things that you may see based on the above cost changes:

  • The optimizer is more likely to use an index to resolve ORDER BY.
  • Very small tables are more likely to use table scan (as it's very fast for small tables).
  • The optimizer will use rowid_filters a bit more than before.
  • The optimizer will use “index for group by” optimization more optimal now
  • The optimizer will use materialization a bit less as it's cost is now a bit higher.
  • The optimizer should be able to do a better choice when deciding when to use table scan, index scan, index_merge, hash and other join methods needed when key lookup cannot be used.
  • Complex queries may get a different (better) plan.

If you have problems with some queries, MariaDB allows you to change costs to match your hardware/environment or data access patterns much better than any previous MariaDB release!

Example of examining engine costs:

select * from information_schema.optimizer_costs where engine="innodb"\G
*************************** 1. row ***************************
                         ENGINE: InnoDB
       OPTIMIZER_DISK_READ_COST: 10.240000
OPTIMIZER_INDEX_BLOCK_COPY_COST: 0.035600
     OPTIMIZER_KEY_COMPARE_COST: 0.011361
        OPTIMIZER_KEY_COPY_COST: 0.015685
      OPTIMIZER_KEY_LOOKUP_COST: 0.791120
   OPTIMIZER_KEY_NEXT_FIND_COST: 0.099000
      OPTIMIZER_DISK_READ_RATIO: 0.020000
        OPTIMIZER_ROW_COPY_COST: 0.060870
      OPTIMIZER_ROW_LOOKUP_COST: 0.765970
   OPTIMIZER_ROW_NEXT_FIND_COST: 0.070130
   OPTIMIZER_ROWID_COMPARE_COST: 0.002653
      OPTIMIZER_ROWID_COPY_COST: 0.002653

Note that the costs above are in microseconds (to make them more readable)!
The intention with having costs visible is not that normal users should have to understand or change the costs!
It is more a tool for MariaDB Corporations Support and Engineering teams to fix issues that affects customers without having to build a new binary!

Feel free to try out MariaDB 11.0 !

If you find anything that is wrong or can be improved, please report a bug in MariaDB bug tracker and link it to the optimizer entry, MDEV-26974.

If possible, put MariaDB 11.0 as an extra slave and add a comment to this blog on how things work (bug reports should still go to the MariaDB bug tracker).

You can help ensuring that MariaDB 11.0 gets/is proven stable ASAP!

You can read more about the 11.0 release at:

No comments:

Post a Comment