2008-11-29

Oops, we did it again (MySQL 5.1 released as GA with crashing bugs)

MySQL 5.1 is now released as "GA".

In this blog I will try to describe my opinions about this release and also try to set the expectations right for anyone trying out MySQL 5.1 GA.

What should you then expect from MySQL 5.1?
  • If you are using MySQL 5.1 just as a 'better' version of MySQL 5.0 and you don't plan to use any of the new features in MySQL 5.1, then you are probably fine to try out MySQL 5.1. You should however not put it into production without testing it fully, preferably by running it on a couple of slaves for some weeks. It may even be best to wait for a couple of minor/patch releases before putting the MySQL 5.1 server into production.
  • Don't expect all critical bugs that you may have encountered in 5.0 to be fixed in 5.1. Even though we have fixed a large majority of the bugs from 5.0, some really critical ones still haven't been addressed.
  • If you plan to use any of the new features of MySQL 5.1, regard them as being of beta quality. Test any usage of these features extensively in close-to-live scenarios before putting them onto a production server.
  • If you are a new user trying out MySQL for the first time, you should use MySQL 5.1; at least it's better than the MySQL 5.0 community version, which has not been updated for some time.
The reason I am asking you to be very cautious about MySQL 5.1 is that there are still many known and unknown fatal bugs in the new features that are still not addressed.

To prove my points, here are some metrics and critical bugs for 5.1:
  • We still have 20 known and tagged crashing and wrong result bugs in 5.1, and 35 more if we add the known crashing bugs from 5.0 that are likely to also be present in 5.1.
  • We still have more than 180 serious bugs (P2) in 5.1. Some of these can be found here.
  • We have more than 300 known and verified less critical bugs that are not going to be addressed soon. (The total number of bugs reported against the MySQL server is of course much larger.)
Some examples of older bugs that *should* have been fixed in 5.1 before GA:
  • Bug #989 "If DROP TABLE while there's an active transaction, wrong binlog order". This is a bug that has been known since August 2003, and has been discussed and referred to in several public places, including Wikipedia and my last talk at the MySQL users conference. It in effect allows anyone with rights to any database that is replicated to take down all slaves (either by accident or intentionally). This is also a bug that has been hit by several of our users in the past.
  • Bug #33082 Stored Procedure: crash if table replaced with a view in a loop
  • Bug #33094 Error in upgrading from 5.0 to 5.1 when table contains triggers
  • Bug #34110 Crash in InnoDB when used "embedded"
  • Bug #34502 mysqladmin debug causes a crash when server is creating/dropping many tmp tables
  • Bug #34660 crash when federated table loses connection during insert ... select
  • Bug #37756 enabling fulltext indexes with MyISAM_repair_threads > 1 causes crash
  • Bug #37936 "Crash when executing a query containing date expressions"
  • Bug #38816 kill + flush tables with read lock + stored procedures causes crashes!
  • Bug #39178 Server crash in YaSSL with non-RSA-requesting client if server uses RSA key
  • Bug #40386 Not flushing query cache after truncate
  • Bug #40675 MySQL 5.1 crash with index merge algorithm and Merge tables
  • Bug #32868 Stored routines do not detect changes in meta-data. Note that this will not be fixed until 6.1 !
  • Bug #39526 sql_mode not retained in binary log for CREATE PROCEDURE
  • The federated engine is not enabled by default. It was disabled during a previous MySQL-5.1 "RC" release because of bugs filed against the Federated engine that MySQL developers didn't have time to fix. This solution was deemed to be easier than upgrading the Federated engine to a newer version of the engine. This means that people who have problems with the federated engine are better off using a FederatedX plugin, compiling MySQL themselves together with FederatedX, or using the OurDelta MySQL distribution, which contains FederatedX.
  • MySQL-Cluster bugs are not fixed in MySQL 5.1; instead the Cluster engine has been moved from the MySQL 5.1 release to a separate MySQL-Cluster release.
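
To make Bug #989 from the list above more concrete, here is a toy Python model of the ordering problem. It is only a sketch under a stated assumption: statements inside a transaction are buffered and written to the binary log at COMMIT, while DROP TABLE is written as soon as it executes. The names `ToyMaster` and `replay_on_slave` are invented for illustration and are not MySQL APIs.

```python
# Toy model of the binlog ordering problem behind Bug #989.
# Assumption: transactional statements reach the binlog at COMMIT,
# while DROP TABLE is logged as soon as it executes.

class ToyMaster:
    def __init__(self):
        self.binlog = []    # statements in the order a slave replays them
        self.pending = {}   # connection id -> buffered transaction statements

    def begin(self, conn):
        self.pending[conn] = []

    def insert(self, conn, table):
        # Buffered until COMMIT, so not yet in the binlog.
        self.pending[conn].append(("INSERT", table))

    def drop_table(self, table):
        # Logged immediately, even if another connection has an open
        # transaction that touched the table.
        self.binlog.append(("DROP", table))

    def commit(self, conn):
        self.binlog.extend(self.pending.pop(conn))


def replay_on_slave(binlog, tables):
    """Replay the binlog; return the first statement that fails, or None."""
    existing = set(tables)
    for op, table in binlog:
        if table not in existing:
            return (op, table)
        if op == "DROP":
            existing.remove(table)
    return None


master = ToyMaster()
master.begin("A")
master.insert("A", "t1")   # connection A writes to t1 inside a transaction
master.drop_table("t1")    # connection B drops t1 before A commits
master.commit("A")         # A's INSERT lands after the DROP in the binlog

failure = replay_on_slave(master.binlog, ["t1"])
print(master.binlog)
print("slave stops at:", failure)
```

In this model the slave sees the DROP before the INSERT and stops on the INSERT, which is the kind of failure that lets one user with rights on a replicated database take down the slaves.
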
When it comes to "quality" of the new features in MySQL:

Partitioning:
  • 20 open bugs of which at least 7 are targeted to be fixed in later MySQL 5.1 releases.
  • Partitioning in MySQL 5.1 should be regarded as a step to a full partitioning feature with parallel query. Parallel query is however not scheduled even for MySQL 6.0.
  • For now partitioning is mainly useful in the case where you need to frequently drop a well defined part of a table (like one month of data) and when MERGE tables are too cumbersome to use.
  • If a partitioned table crashes, it's very hard (sometimes impossible) to repair it.
  • If you get a server crash during ALTER TABLE of a partitioned table, you may lose all your data for that table.
  • Partitioning is very slow and can become unusable if you have a large number of partitions. This happens even if you only use a few of the underlying tables in your query.
  • Bug #40954 "Crash in MyISAM index code with concurrency test using partitioned tables"
  • Bug #40827 Killing insert-select from InnoDB to partitioned MyISAM can cause table corruption
  • Bug #30102 rename table does corrupt tables with partition files on failure
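
The "drop one month of data" use case mentioned above can be sketched as follows. This Python helper only builds SQL strings; the table name `access_log`, the column `created` and the partition naming scheme are hypothetical. The generated statements use the `PARTITION BY RANGE` and `ALTER TABLE ... DROP PARTITION` syntax of MySQL 5.1, and given the bugs listed above they should be tested against your own schema before you rely on them.

```python
from datetime import date


def next_month(d):
    """Return the first day of the month after d."""
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)


def partition_clause(column, first_month, months):
    """Build a PARTITION BY RANGE clause with one partition per month."""
    parts = []
    start = first_month
    for _ in range(months):
        end = next_month(start)
        # Each partition holds rows dated before the first day of the next month.
        parts.append(
            "PARTITION p%04d%02d VALUES LESS THAN (TO_DAYS('%s'))"
            % (start.year, start.month, end.isoformat())
        )
        start = end
    return "PARTITION BY RANGE (TO_DAYS(%s)) (\n  %s\n)" % (column, ",\n  ".join(parts))


def drop_month(table, month):
    """Build the statement that discards one month of data in one step."""
    return "ALTER TABLE %s DROP PARTITION p%04d%02d" % (table, month.year, month.month)


print(partition_clause("created", date(2008, 11, 1), 3))
print(drop_month("access_log", date(2008, 11, 1)))
```

The clause would be appended to a CREATE TABLE statement for the hypothetical table; dropping a partition then removes that month's rows without a large DELETE.
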
Row-based and mixed replication:

Row-based replication has been regarded as one of the most wanted features in 5.1. However, because of several problems with the implementation of row-based and mixed-mode replication, it's not enabled by default. These problems are:
  • At least 28 open bugs of which 26 are verified and at least 11 are targeted to be fixed in later MySQL 5.1 releases.
  • Row-based replication errors on the slave can be hard to debug, as you can't see exactly what statement caused the problem. A new feature in MySQL 5.1.28 allows you to see which rows were changed, but this is usually not enough to find out the exact query that failed to replicate.
  • For bulk operations on non-transactional tables, the data may appear inconsistent during selects on the slave (source: MySQL manual).
  • Bug #40221 Replication failure on RBR + UPDATE the primary key. This bug is such a serious issue that it should have stopped a GA release!
  • Bug #38205 Row-based Replication (RBR) causes inconsistencies: HA_ERR_FOUND_DUPP_KEY. This causes wrong data on slave if you do slave start/stop at the wrong time.
  • Bug #40116 Uncommitted changes are replicated and stay on slave after rollback on master
  • Bug #40276 Assertion trx_data->empty() in binlog_close_connection
  • Bug #31240 load data infile replication between (4.0 or 4.1) and 5.1 fails
Built in job scheduler (Events):
It's hard to find the number of bugs on events, as there is no easy way to search for them in the bugs system. In general, events are regarded as one of the more stable features in MySQL, but they are not totally free from problems:
  • Bug #40915 "Events takes mutex in wrong order which can easily lead to deadlocks"
New SQL diagnostic aids and performance utilities:

This was part of the announcement but I don't know what we mean by this. I couldn't find anything about this on the "What's New in MySQL 5.1" page. I assume this refers to the SHOW PROFILE patch from MySQL 5.0 community, which is now in MySQL 5.1.

Logging to tables:

This major feature is unfortunately so slow (a slowdown of 30% or more) that it's unusable for busy sites. Ref: Bug #30414: "Slowdown (related to logging) in 5.1.21 vs. 5.1.20". I assume this is why it was left out of the MySQL announcement of MySQL 5.1.

Some general crashing/wrong data bugs (not all, just enough to prove a point):
  • Bug #40770 Server Crash when running with triggers including variable settings (rpl_sys)
  • Bug #37016 TRUNCATE TABLE removes some rows but not all
Most, if not all, of the above are things that could and should have been fixed before 5.1 was declared as "GA". Note that this was just a short list of known bugs to prove a point. The real list of serious bugs is much longer. To know if a feature is stable enough for your usage, please check the features you plan to use in the MySQL bugs system!

So what went wrong with MySQL 5.1 ?

This is, perhaps surprisingly, not because our developers don't do a good job. On the contrary, we have an excellent, dedicated team of developers who are very good at what they do. However, even an excellent team can't work if the conditions are not right.

Here follow some of the main reasons why the MySQL development department again got a quality problem with a GA release:
  • MySQL 5.1 was declared beta and RC way too early. The reason MySQL 5.1 was declared RC was not because we thought it was close to being GA, but because the MySQL manager in charge *wanted to get more people testing MySQL 5.1*. This didn't however help much, which is proved by the fact that it has taken us 14 months and 7 RCs before we could do the current "GA". This caused problems for developers, as MySQL developers have not been able to make any larger changes in the source code since February 2006!
  • We have changed the release model so that instead of focusing on quality and features, our release is now defined by timeliness and features. Quality is not regarded to be that important. To quote Mårten Mickos: "MySQL 5.1 will be released as GA in or before December because I say so". Mårten's reason for this is that he needs something he can sell, and a release marked "GA" is much easier to sell than a release marked "RC".
  • The MySQL core developers have been split into too many teams, and only a small part of the core developers have been working on MySQL 5.1 to get the bugs fixed. Some of the core developers have also recently left the MySQL organization, which is a serious issue as there are not many of them.
  • Too many new developers without a thorough knowledge of the server have been put on the product trying to fix bugs. This, combined with a failing review process, has introduced a lot of new bugs while trying to fix old bugs.
  • Bug fixing and development processes are not systematic and not persistent.
  • We have not been giving the MySQL community enough opportunities to test MySQL 5.1 (too few releases). The reason so few releases were made is that if we had done a release every month, as we have done in the past, we would have had 14 RC releases, which would have looked silly and proved that the first RC was made too early. In addition, the current MySQL development model doesn't in practice allow the MySQL community to participate in the development of the MySQL server.
  • The MySQL organization doesn't have release criteria for the MySQL server that are followed; neither the external nor the internal criteria were followed when it came to declaring MySQL 5.1 as GA. You can read more about our release policy in Kaj's blog.
  • Internal QA on the MySQL server was started very late in the process. Now that the process has started to show results, the bugs found have largely been ignored, as fixing them would have delayed the MySQL 5.1 GA date.
  • The MySQL server team has a bug fixing policy where a bug that has existed a long time has a lower priority 'because people know about them'. This is supposedly one of the reasons why Bug #989, mentioned above, has not been fixed.
One would have thought that MySQL AB (now the MySQL department at Sun) would have learned something from our too early release of MySQL 5.0, but unfortunately this is not the case. The main argument I have heard for why MySQL 5.1 was declared as GA now is that it's better than MySQL 5.0 was when it was declared as GA. In my opinion, this is not a good reason to declare something GA, especially as 5.0 GA was in terrible shape when it was released. What is worse is that the new features in MySQL 5.1 are of no better quality than the new features in MySQL 5.0 were at the time MySQL 5.0 was declared GA.

What should then have been done before declaring MySQL 5.1 GA ?

It's of course impossible to get all issues fixed, but we should at least have tried to ensure that all issues important to a lot of MySQL developers and MySQL users had been discussed, fixed and/or addressed in a public manner! We should also never have a single serious crashing/wrong data bug in a GA release.

There should also be, from MySQL management, an independent release criteria committee that would be the one deciding when the MySQL server is ready to be declared beta, RC and GA. This is something that Sun usually has for their other products.

As I said in my talk at the MySQL users conference, I think it's time to seriously review how the MySQL server is being developed and change the development model to be more like Drizzle and PostgreSQL where the community has a driving role in what gets done!

I would like to point out that the current release is not something that can be said to be the fault of Sun. The decision to do a GA release was made solely by the MySQL management in Sun. The only thing Sun can be blamed for is not starting to fix the MySQL development organization soon enough to ensure that things like this can't happen.

I still have some hopes that Sun will come in and fix the MySQL development organization, but with MySQL server releases like this one my hopes have started to fade a bit.

There is however some good news in getting MySQL 5.1 released as GA:
  • MySQL community users who have not got an update for MySQL 5.0 for 4 months should be able to switch to MySQL 5.1 and now finally get some of their bugs fixed! What still worries me is that the MySQL organization has not yet clearly defined how future MySQL 5.1 versions will be released to the community. This is however a large topic of its own...
  • The MySQL embedded library is back in a supported release. (Not a big thing, but still important for some part of our community).
So what to do next?

Install and test MySQL 5.1; if it works, feel lucky. If not, report a bug at http://bugs.mysql.com/. Don't forget to blog about your experiences with MySQL 5.1!

There are two ways things can go:
  • If MySQL 5.1 works for a lot of people and not too many get serious crashes and lose data, then I was concerned without a good reason and everything is fine.
  • If MySQL 5.1 does have some serious problems and people report them, the bugs will be fixed and the MySQL & Sun management will have more information to not repeat the same thing with MySQL 6.0.
Good luck with your MySQL 5.1 usage and keep us posted about it!

PS: For those interested, the picture in the blog is dedicated to those in charge of releasing MySQL 5.1 as GA. The statue itself was bought in Riga this year during the internal MySQL developers conference and was presented to the MySQL 6.0 managers as a symbol for the MySQL 6.0 server release planning.

34 comments:

Unknown said...

Great post, Monty. Thanks for taking the time to write this, and thanks for still fighting for MySQL quality. Keep up the spirit!

Harold Fowler said...

Wow, now why am I not surprised!

bestis said...

'Bug #37936 "Crash when executing a query containing date expressions"'

'You do not have access to bug #37936.'

:( That sounds like something I would like to know. I allow users to set date/time formats. If they could crash mysql with those, it's kind of a horrible situation. And I would like to know what I need to do to stop that happening; tracking that down if someone uses it is something that I think doesn't first come to mind..

Lars Johansson said...

We use 5.1 for a year or so, it works for us. We had a serious problem with 5.1.23 and reported the bug, which was fixed in 5.1.24, since then it works great. We been using the partition feature of 5.1 witout any problem. But we are a data warehouse doing all update in bach one at a time. For us it is a great release (5.0 was horrible).
I been drinking some Belgian trapisters so excuse bad spellings etc.

Phil Hildebrand said...

We've been using 5.1 in production since 5.1.24-rc. We have had a few issues (replication crashing), which we were able to get workarounds for quickly.

That said, we keep a tight eye on as many of the critical bugs entered in order to see what might affect us, as well as a somewhat decent test cycle (though - probably not any better than yours)

My experience so far does not lead me to believe, however, that bugs in MySQL 5.1 are any different or more numerous than any other major database system I've worked with (Oracle, SQLServer, Sybase, etc)

Thanks for listing the bugs though - there were a few I wasn't aware of ;)

Anonymous said...

(Most of your bug report links are broken. To fix them, delete all occurrences of the string “http//” without a colon.)

David Fetter said...

Monty, you've made your wealth and fame. Now it's time to move to the light side. PostgreSQL will welcome your participation in the project :)

Monty said...

Thanks for the information about the broken links. They are now fixed.

Unknown said...

We've been testing our software with 5.1 GA since Thursday, and found out that it's about 25-40% slower for our software compared to 5.0.67. Since we won't use new features, we'll stick to 5.0 for now.

Monty said...

About MySQL 5.1 being slower: As long as you don't use logging to tables, MySQL 5.1 should be of about the same average speed as MySQL 5.0 (there are even some optimizations that make some queries notably faster). I don't know of anything else that would cause a notable slowdown in MySQL 5.1. Please try to find out what could be the problem and post a bug report, and the MySQL developers will try to find and fix the problem! In spite of everything, we treat any bug that causes a crash or slowdown very seriously. The main problem just now is that the MySQL organization doesn't have strong release criteria that define when we can call something GA!

Anonymous said...

Well written article for a sensitive topic Monty.

We know you guys fight daily at the source code front.

Keep up the good spirit !!

Anonymous said...

I do not understand why Marten Mickos hasn't been replaced yet by Sun management. He has little to no experience in running an open source software team, and continues to surround himself with marketing types, not execution specialists.

I have always known Sun to deliver carrier grade platforms, and it sure seems like Marten's in the way of having MySQL gain that status. Has anyone spoken to Sun management about this?

Mark Callaghan said...

MySQL Cluster _is_ carrier grade.

Unknown said...

Yes, Community first! Thanks for the post. It pushed me to write an article I had wanted to write for some time: http://www.perspektive89.com/2008/community_first

I am supporting your view and would like to extend what you are writing to free software/content/infrastructure/wireless communities in general. I discuss this in regards to Freifunk free wireless (http://freifunk.net) and LXDE (http://lxde.org), the fastest full featured netbook desktop.

I believe communities must take the lead in order to make and keep a project on the bleeding edge and offer quality code and stability. We should work together with companies and exchange resources. Both can profit. In the end open and free community projects are all about cooperation. However, companies and communities have different agendas. You will find brilliant and "shiny" people in free software communities, who just would not work in some companies and some just would not fit in. Communities can integrate many different people. The motivation of these people goes beyond monetary interests. That is why the free software community development model is more powerful.

So, if we succeed to cooperate in a community the development is much more sustainable. And generally said a community can never go bankrupt :-) a company could (even though I do not think MySQL/Sun would in the foreseeable future).

Of course there are many things to discuss about how communities can be more open, more inviting and reach out more. Here it is sometimes easier if you have the more traditional structure of a company. I am curiously looking forward to seeing how things continue here :-)

Dathan Pattishall said...

Well, 5.1 offers some great things to play with at least.

Unknown said...

Thanks for a very thorough follow-up Monty, and I'm tipping my hat to your constant fights for the quality of MySQL. That's the spirit that's made me believe in the product for the past 10+ years.

Jono Bacon said...

Interesting post, Monty. Thanks for sharing your thoughts.

I am curious though - you said that the problems are with the MySQL management in Sun - have these issues been addressed to Marten and others, and has there been some indication that they will be resolved?

It sounds like the intersection between Sun and Community with QA is facing some troubles, but I would imagine these can be fairly easy to solve, under the premise MySQL are open to change.

buddyglass said...

I disagree that "quality and features" should be the sole focus of a release. Marketing concerns such as timeliness should play a part. However, if forced to choose between quality and features in order to meet a release deadline, features should be sacrificed - not quality.

Hold the release schedule and quality level constant, then add as many features as you can. If "as many as you can" is "not that many" then so be it.

ceasless said...

Note to Sun:

Subject: Re: Sun's Management (and the Marketing they choose)

Get in or get off the diving board, you look silly dipping your toe in from all the way up there.

Anonymous said...

I started building data warehouse solutions on 5.1 about a year ago because I needed table partitioning for very large fact tables. I have not experienced any crashing or corruption problems, and performance has been very good.

As a former PeopleSoft (later Oracle) employee, I can tell you that it is unrealistic to expect that highly complex software will be released without known and unknown anomalies. GA != Utopian, bug-free software.

K B said...

Isn't it fun to be an armchair quarterback? :-)

While I agree a bit with what Chris (above) said about a utopian view of bug-free complex software, when a serious or critical bug exists in software prior to a release, an evaluation must be made on how common the bug is seen and whether or not data loss is caused as a result.

There are a number of bugs in the MySQL server that continue to cause data loss and prevent effective recovery. With bugs like #989 being around for five years, I think it's time to "stop the line" as they say at Toyota and get that binlog issue fixed. Dropping a table while a transaction is in progress should cause either the table drop to fail or the transaction commit to fail. There are no two ways about this.

Monty - this is a great post on process improvements MySQL / Sun can make so users don't get surprised by nasty code quality failures.

There is a point that a for-profit company must decide - release with bugs and get it out the door with support headaches and the hits to reputation versus the other way - the hits to reputation due to continuing to slip what is already perceived as a late release.

As many here know (I hope), development teams have three things to "play with" on the way to a release - time available to release, features included (scope) in the release, and quality of code released (bugs fixed and defects released). Chaos results when all three are negotiable during the project lifecycle. Nothing gets done on time if none are negotiable. When one (or preferably two) of those items is fixed, that makes it much easier to negotiate on what the "Done" requirements are for a release.

I personally prefer to negotiate scope and time, but never quality. I think Monty would agree. If I had to lose one of the two of those, I'd rather give up negotiation on time. Scope is relatively easy to adjust downward and lets the customer get something useful earlier. The problem is, there are times when adjusting scope downward will force me to adjust quality downward as well.

Dave Edwards said...

On the heels of the long-awaited GA release of MySQL 5.1, Monty Widenius summed up his feeling on the release thus: Oops, we did it again.

Monty said...

To clarify things a little:
I think that MySQL 5.1 is a good *recommended* release, especially now when MySQL/Sun is providing full support for it.

What I disagree with is giving MySQL 5.1 a GA status, which, at least for me, implies it has no crashing or other serious bugs that affect normal operation.

That said, work on MySQL 5.1 continues, and if things go well we will reach this goal sooner rather than later.

Paul Urban said...

"...not too many get serious crashes and losses data..."

Gosh, who would use a database engine where that is a prospect you can realistically look forward to?

Dr. Roy Schestowitz said...

Better to know the truth than to do a marketing polish.

Thanks, Monty.

pcleddy said...

Ya, that statue gives me a GREAT idea for a variation on the theme ;)

powlean said...

I think that in this moment of transaction the community must promote a quality campaign:
send a mail to Sun with this subject: MySql Ask Quality

Unknown said...

I develop Complex Web Server (http://ponkrac.net/complex-web-server/en) as a reliable, user friendly web server with many features. MySQL server is part of its content.

Reliability is a very high priority for me. When I downloaded the mysql 5.1.30 package for Windows 32 bit I got a very bad feeling from MySQL. Some things are not complete. For example, libmysql.dll in the 5.1.30 package caused crashes of PHP & the Apache server. I needed to substitute libmysql.dll, and I use the version from 5.0.67, which works well.

The official MySQL 5.1 documentation in Windows help form (.chm) is damaged - some chapters are entirely missing and cause a message box with an error.

Miloslav Ponkrác

Anonymous said...

cautious about MySQL 5.1 is that there are still many known and unknown fatal bugs in the new features that are still not addressed. Great post thanks

Anonymous said...

I've been a big fan of MySQL since v4. That said I think the worst thing that could have happened to MySQL was selling it to Sun.

You guys were on the right track, give the software away, charge for the support.

But once you sold to Sun it became create a marketable product on an obscenely bad timetable.

Best of luck with the new company.

Alex Tomic said...

I think most people in the closed-source database world recognise the release inflation that has taken place over the years.

GA == beta
RC == alpha

If, as Big Corporate(tm)'s CTO, you don't know that, you're likely going to be paying through the nose for support contracts and collecting unemployment not long after. Welcome to Oracle and DB2's business model.

For those of us that are just trying to get stuff done, if 5.1 works for us, great, if not, we roll back. Life will go on, no one expects complex software to be completely bug-free

Unknown said...

We have been using MySQL 5.1.22 on a cluster for 14 months now, with very few problems. We had some issues with the memory management if you have very big databases, but it was mainly due to the lack of explanation of the settings. Until I have a major crash on my hands or a need to reinstall, I'm planning to continue using the community version.

Thanks for the post though. Some of my collaborators were asking for the update after the announcement; that's not going to happen now that I've read this.

Paul Robinson said...

MySQL is a class of extremely critical software that people absolutely depend upon. That means that they should be doing daily builds and automated bug checking. I've read Joel Spolsky, and according to his metrics, this is one of the ten most important things a company can do to improve the quality of the software it releases. With SUN having bought this company I would presume they have the resources to do this. Hell, I am a one-man software company and when I was working on a project I did regular daily builds and (while not as rigorous as a company that can afford automated testing software) I still went through and ran tests on everything I changed, on a daily basis, to make sure I didn't break anything.

Paul Robinson - My Blog

"The lessons of history teach us - if they teach us anything - that no one learns the lessons that history teaches us."

Shebnik said...

Monty, thank you for the post. I just started working as a sysadmin at a site with a MySQL database, and had a nightmare lasting several days - segfault.

When we traced it down to the last stored procedure call we were ready to give up. A simple insert into an ndm table crashed mysqld (both 5.1.25rc and 5.1.32). Recreating the table did not help. But then my developer recreated the index and... it's working now.