2008-11-29

Oops, we did it again (MySQL 5.1 released as GA with crashing bugs)

MySQL 5.1 is now released as "GA".

In this blog I will try to describe my opinions about this release and also try to set the expectations right for anyone trying out MySQL 5.1 GA.

What should you then expect from MySQL 5.1?
  • If you are using MySQL 5.1 just as a 'better' version of MySQL 5.0 and you don't plan to use any of the new features in MySQL 5.1 then you are probably fine to try out MySQL 5.1. You should however not put it into production without testing it fully, preferably by running it on a couple of slaves for some weeks. It may even be best to wait for a couple of minor/patch releases before putting the MySQL 5.1 server into production.
  • Don't expect all critical bugs that you may have encountered in 5.0 to be fixed in 5.1. Even though we have fixed the large majority of the bugs from 5.0, some really critical ones still haven't been addressed.
  • If you plan to use any of the new features of MySQL 5.1, regard these as being of beta quality. Test any usage of these features extensively in close-to-live scenarios before putting them onto a production server.
  • If you are a new user trying out MySQL for the first time, you should use MySQL 5.1; at least it's better than the MySQL 5.0 community version, which has not been updated for some time.
The reason I am asking you to be very cautious about MySQL 5.1 is that there are still many fatal bugs, known and unknown, in the new features that have not been addressed.

To prove my points, here are some metrics and critical bugs for 5.1:
  • We still have 20 known and tagged crashing and wrong result bugs in 5.1, and 35 more if we add the known crashing bugs from 5.0 that are likely to also be present in 5.1.
  • We still have more than 180 serious bugs (P2) in 5.1. Some of these can be found here.
  • We have more than 300 known and verified less critical bugs that are not going to be addressed soon. (The total number of bugs reported against the MySQL server is of course much larger.)
Some examples of older bugs that *should* have been fixed in 5.1 before GA:
  • Bug #989 "If DROP TABLE while there's an active transaction, wrong binlog order". This is a bug that has been known since August 2003, and has been discussed and referred to in several public places, including Wikipedia and my last talk at the MySQL users conference. It in effect allows anyone with rights to any database that is replicated to take down all slaves (either by accident or intentionally). This is also a bug that has been hit by several of our users in the past.
  • Bug #33082 Stored Procedure: crash if table replaced with a view in a loop
  • Bug #33094 Error in upgrading from 5.0 to 5.1 when table contains triggers
  • Bug #34110 Crash in InnoDB when used "embedded"
  • Bug #34502 mysqladmin debug causes a crash when server is creating/dropping many tmp tables
  • Bug #34660 crash when federated table loses connection during insert ... select
  • Bug #37756 enabling fulltext indexes with MyISAM_repair_threads > 1 causes crash
  • Bug #37936 "Crash when executing a query containing date expressions"
  • Bug #38816 kill + flush tables with read lock + stored procedures causes crashes!
  • Bug #39178 Server crash in YaSSL with non-RSA-requesting client if server uses RSA key
  • Bug #40386 Not flushing query cache after truncate
  • Bug #40675 MySQL 5.1 crash with index merge algorithm and Merge tables
  • Bug #32868 Stored routines do not detect changes in meta-data. Note that this will not be fixed until 6.1 !
  • Bug #39526 sql_mode not retained in binary log for CREATE PROCEDURE
  • The federated engine is not enabled by default. It was disabled during a previous MySQL-5.1 "RC" release because of bugs filed against the Federated engine that MySQL developers didn't have time to fix. This solution was deemed to be easier than upgrading the Federated engine to a newer version of the engine. This means that people who have problems with the federated engine are better off using a FederatedX plugin, compiling MySQL themselves together with FederatedX, or using the OurDelta MySQL distribution, which contains FederatedX.
  • MySQL-Cluster bugs are not fixed in MySQL 5.1; instead the Cluster engine has been moved from the MySQL 5.1 release to a separate MySQL-Cluster release.
When it comes to the "quality" of the new features in MySQL:

Partitioning:
  • 20 open bugs of which at least 7 are targeted to be fixed in later MySQL 5.1 releases.
  • Partitioning in MySQL 5.1 should be regarded as a step towards a full partitioning feature with parallel query. Parallel query is however not scheduled even for MySQL 6.0.
  • For now partitioning is mainly useful in the case where you need to frequently drop a well-defined part of a table (like one month of data) and when MERGE tables are too cumbersome to use.
  • If a partitioned table crashes, it's very hard (sometimes impossible) to repair it.
  • If you get a server crash during ALTER TABLE of a partitioned table, you may lose all your data for that table.
  • Partitioning is very slow and can become unusable if you have a large number of partitions. This happens even if you only use a few of the underlying tables in your query.
  • Bug #40954 "Crash in MyISAM index code with concurrency test using partitioned tables"
  • Bug #40827 Killing insert-select from InnoDB to partitioned MyISAM can cause table corruption
  • Bug #30102 rename table does corrupt tables with partition files on failure
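To make the one use case I mentioned above concrete (regularly dropping one month of data), here is a minimal sketch that only builds the SQL text for monthly RANGE partitions and for dropping one of them. The table name access_log and the column name created are made up for illustration, and the helper functions are hypothetical, not part of any MySQL tool. Given the ALTER TABLE problems listed above, try the generated statements on a test copy first.

```python
from datetime import date


def next_month(d):
    """Return the first day of the month following d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def partition_clause(column, first_month, months):
    """Build a PARTITION BY RANGE clause with one partition per month.

    Each partition pYYYYMM holds rows whose date is before the first
    day of the following month.
    """
    parts = []
    start = first_month
    for _ in range(months):
        end = next_month(start)
        parts.append(
            f"  PARTITION p{start:%Y%m} "
            f"VALUES LESS THAN (TO_DAYS('{end:%Y-%m-%d}'))"
        )
        start = end
    body = ",\n".join(parts)
    return f"PARTITION BY RANGE (TO_DAYS({column})) (\n{body}\n)"


def drop_month(table, month):
    """Build the statement that drops one month of data in a single step."""
    return f"ALTER TABLE {table} DROP PARTITION p{month:%Y%m}"


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(partition_clause("created", date(2008, 10, 1), 3))
    print(drop_month("access_log", date(2008, 10, 1)))
```

The clause is meant to be appended to a CREATE TABLE statement. Dropping a partition removes all rows in it at once, which is the whole point compared to a large DELETE.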
Row-based and mixed replication:

Row based replication has been regarded as one of the most wanted features in 5.1. However, because of several problems with the implementation of row based and mixed mode replication, it's not enabled by default. These problems are:
  • At least 28 open bugs of which 26 are verified and at least 11 are targeted to be fixed in later MySQL 5.1 releases.
  • Row based replication errors on the slave can be hard to debug as you can't see exactly what statement caused the problem. A new feature in MySQL 5.1.28 allows you to see which rows were changed, but this is usually not enough to find out the exact query that failed to replicate.
  • For bulk operations on non-transactional tables, the data may appear inconsistent during selects on the slave (source: MySQL manual).
  • Bug #40221 Replication failure on RBR + UPDATE the primary key. This bug is such a serious issue that it should have stopped a GA release!
  • Bug #38205 Row-based Replication (RBR) causes inconsistencies: HA_ERR_FOUND_DUPP_KEY. This causes wrong data on slave if you do slave start/stop at the wrong time.
  • Bug #40116 Uncommitted changes are replicated and stay on slave after rollback on master
  • Bug #40276 Assertion trx_data->empty() in binlog_close_connection
  • Bug #31240 load data infile replication between (4.0 or 4.1) and 5.1 fails
Built in job scheduler (Events):
It's hard to find the number of bugs for events as there is no easy way to search for them in the bugs system. In general events are regarded as one of the more stable features in MySQL, but they are not totally free from problems:
  • Bug #40915 "Events takes mutex in wrong order which can easily lead to deadlocks"
New SQL diagnostic aids and performance utilities:

This was part of the announcement but I don't know what we mean by this. I couldn't find anything about this on the "What's New in MySQL 5.1" page. I assume this refers to the SHOW PROFILE patch from MySQL 5.0 community, which is now in MySQL 5.1.

Logging to tables:

This major feature is unfortunately so slow (30%+ slowdown) that it's unusable for busy sites. Ref: Bug #30414: "Slowdown (related to logging) in 5.1.21 vs. 5.1.20". I assume this was why it was left out of the MySQL announcement of MySQL 5.1.

Some general crashing/wrong data bugs (not all, just enough to prove a point):
  • Bug #40770 Server Crash when running with triggers including variable settings (rpl_sys)
  • Bug #37016 TRUNCATE TABLE removes some rows but not all
Most if not all of the above are things that could and should have been fixed before 5.1 was declared "GA". Note that this was just a short list of known bugs to prove a point. The real list of serious bugs is much longer. To know if a feature is stable enough for your usage, please check the features you plan to use in the MySQL bugs system!

So what went wrong with MySQL 5.1 ?

This is, perhaps surprisingly, not because our developers don't do a good job. On the contrary, we have an excellent, dedicated team of developers who are very good at what they do. However, even an excellent team can't work if the conditions are not right.

Here follow some of the main reasons why the MySQL development department again got a quality problem with a GA release:
  • MySQL 5.1 was declared beta and RC way too early. The reason MySQL 5.1 was declared RC was not because we thought it was close to being GA, but because the MySQL manager in charge *wanted to get more people testing MySQL 5.1*. This however didn't help much, as proved by the fact that it has taken us 14 months and 7 RCs before we could do the current "GA". This caused problems for developers, as MySQL developers have not been able to make any larger changes in the source code since February 2006!
  • We have changed the release model so that instead of focusing on quality and features our release is now defined by timeliness and features. Quality is not regarded to be that important. To quote Mårten Mickos: "MySQL 5.1 will be released as GA in or before December because I say so". Mårten's reason for this is that he needs something he can sell, and a release marked "GA" is much easier to sell than a release marked "RC".
  • The MySQL core developers have been split into too many teams and only a small part of the core developers have been working on MySQL 5.1 to get the bugs fixed. Some of the core developers have also recently left the MySQL organization, which is a serious issue as there are not many of them.
  • Too many new developers without a thorough knowledge of the server have been put on the product trying to fix bugs. This, combined with a failing review process, has introduced a lot of new bugs while trying to fix old bugs.
  • Bug fixing and development processes are not systematic and not persistent.
  • We have not been giving the MySQL community enough opportunities to test MySQL 5.1 (too few releases). The reason so few releases were made was that if we had done a release every month, as we have done in the past, we would have got 14 RC releases, which would have looked silly and proved that the first RC was made too early. In addition, the current MySQL development model doesn't in practice allow the MySQL community to participate in the development of the MySQL server.
  • The MySQL organization doesn't have release criteria for the MySQL server that are followed; neither the external nor the internal criteria were followed when it came to declaring MySQL 5.1 as GA. You can read more about our release policy in Kaj's blog.
  • Internal QA on the MySQL server was started very late in the process. Now that the process has started to show results, the bugs found have largely been ignored, as fixing them would have delayed the MySQL 5.1 GA date.
  • The MySQL server team has a bug fixing policy where a bug that has existed a long time has a lower priority 'because people know about them'. This is supposedly one of the reasons why Bug #989 mentioned above has not been fixed.
One would have thought that MySQL AB (now the MySQL department at Sun) would have learned something from our too early release of MySQL 5.0, but unfortunately this is not the case. The main argument I have heard for why MySQL 5.1 was declared as GA now is that it's better than MySQL 5.0 was when it was declared as GA. In my opinion, this is not a good reason to declare something GA, especially as 5.0 GA was in terrible shape when it was released. What is worse is that the new features in MySQL 5.1 are of no better quality than the new features in MySQL 5.0 were at the time MySQL 5.0 was declared GA.

What should then have been done before declaring MySQL GA ?

It's of course impossible to get all issues fixed, but we should at least have tried to ensure that all issues important to a lot of MySQL developers and MySQL users were discussed, fixed and/or addressed in a public manner! We should also never have a single serious crashing/wrong data bug in a GA release.

There should also be a release criteria committee, independent of MySQL management, that decides when the MySQL server is ready to be declared beta, RC and GA. This is something that Sun usually has for their other products.

As I said in my talk at the MySQL users conference, I think it's time to seriously review how the MySQL server is being developed and change the development model to be more like Drizzle and PostgreSQL where the community has a driving role in what gets done!

I would like to point out that the current release is not something that can be said to be the fault of Sun. The decision to do a GA release was made solely by the MySQL management in Sun. The only thing Sun can be blamed for is not starting to fix the MySQL development organization soon enough to ensure that things like this can't happen.

I still have some hopes that Sun will come in and fix the MySQL development organization, but with MySQL server releases like this one my hopes have started to fade a bit.

There is however some good news in getting MySQL 5.1 released as GA:
  • MySQL community users who have not got an update for MySQL 5.0 for 4 months should be able to switch to MySQL 5.1 and now finally get some of their bugs fixed! What still worries me is that the MySQL organization has not yet clearly defined how future MySQL 5.1 versions will be released to the community. This is however a large topic of its own...
  • The MySQL embedded library is back in a supported release. (Not a big thing, but still important for some part of our community).
So what to do next?

Install and test MySQL 5.1. If it works, feel lucky. If not, report a bug at http://bugs.mysql.com/. Don't forget to blog about your experiences with MySQL 5.1!

There are two ways things can go:
  • If MySQL 5.1 works for a lot of people and not too many get serious crashes and lose data, then I was concerned without a good reason and everything is fine.
  • If MySQL 5.1 does have some serious problems and people report them, the bugs will be fixed and the MySQL & Sun management will have more information to not repeat the same thing with MySQL 6.0.
Good luck with your MySQL 5.1 usage and keep us posted about it!

PS: For those interested, the picture in the blog is dedicated to those in charge of releasing MySQL 5.1 as GA. The statue itself was bought in Riga this year during the internal MySQL developers conference and was presented to the MySQL 6.0 managers as a symbol for the MySQL 6.0 server release planning.