Lately there has been a lot of discussion about “hard” or “soft” forks related to MySQL. As someone who has done a successful fork of MySQL, I think this terminology is confusing and trivialises the concept of forking.
In my previous blog post, I touched briefly on this topic, but it seems some further clarification is needed.
When we initially forked MariaDB from MySQL, we did our best to keep things 100% user compatible while still adding new features and fixing issues in MySQL. From MariaDB 5.1 to MariaDB 5.5, we merged all relevant changes from MySQL into MariaDB.
This did not mean that MariaDB was 100% compatible with MySQL, as any change in a fork makes things incompatible in some way. For example, the enhanced optimiser in MariaDB 5.5 worked slightly differently (better) than the one in MySQL, and if one used any of the new features in MariaDB, one could no longer trivially go back to MySQL. However, for most users these changes were not noticeable, which allowed most Linux distributions to automatically move MySQL users to MariaDB without any disruption.
Over time, merging MySQL code became harder and gave us less benefit compared to the effort required. The new MySQL developers had started to move source code around (which made merges harder), and we, the MariaDB developers, were not happy with the quality of the code in some of the bug fixes and new features. It was easier to write such features from scratch than to use the MySQL code. However, for each feature we did our best to ensure that the syntax and behaviour were identical to MySQL.
Another big problem was that MySQL started to copy features (not code) from MariaDB, but with a different SQL syntax from the one MariaDB used. One example is MySQL's use of CHANNEL in multi-source replication. It did not make any sense for MariaDB to copy the multi-source code from MySQL, as we already had a working, stable implementation we were happy with.
With MariaDB 10.0, we decided to stop merging from MySQL and instead to monitor new MySQL features and implement those that we thought made sense for MariaDB.
Moving to MariaDB 10.0 gave us more flexibility to add features to MariaDB without being constrained by the MySQL code, such as Galera, Oracle compatibility, and a lot of other things listed here.
Nowadays, most MariaDB development work is about adding features that customers and MariaDB users are missing (link to MariaDB 13.0 roadmap will shortly be added here). Much of this work is Oracle compatibility required by new customers, such as FULL OUTER JOIN. There are still a few notable features in MySQL that we have not yet had time to re-implement, like multi-valued indexes (for indexing JSON), JSON operators, and LATERAL derived tables. All of these are on the MariaDB 13.0 roadmap.
We, the MariaDB developers, are still working on keeping MariaDB compatible with MySQL (and Percona Server). In MariaDB 10.11, we added support for the popular extensions from Percona Server. In the latest MariaDB versions, we have ensured that one can replicate from MySQL to MariaDB and back. We have also added support for the caching_sha2_password plugin, so that MySQL users can switch to MariaDB without changing their passwords, support for MySQL's default utf8mb4_0900_* collations, and multiple JSON functions.
We also listen to MySQL users moving to MariaDB and do our best to implement the features they need to be able to move to MariaDB. The MariaDB Foundation is there for those who want to be part of this effort!
The above hopefully gives the needed background to discuss different kinds of forks (just kidding) in more detail.
Internal fork
- A fork in which the company or original development team forks the product for political, redesign, or development reasons. The fork may be fully, partly, or not at all compatible with its predecessor.
Examples:
- MySQL 8.0 (some might call this a “hard” fork, as it was hard to move to and very hard to go back from)
- OpenOffice → Apache OpenOffice (after Oracle acquisition; internal governance shift)
- Sun Solaris → Oracle Solaris (post-acquisition direction change)
- KDE 3 → KDE 4 (often cited as an internal “hard” break due to massive architectural changes)
- Python 2 → Python 3 (not a fork in licence terms, but functionally an internal compatibility break)
- Drizzle (https://en.wikipedia.org/wiki/Drizzle_(database_server))
External fork
- When an external group or company forks a project for various reasons. The most common reasons are creative differences about how to take the project forward or distrust of the original project owners.
External forks have many subcategories:
Downstream "no-changes" fork
- The fork is based on the original project with a small, limited subset of changes to get the project to work within an ecosystem or with an external/internal project that requires some minor changes.
- The code is basically a rebase plus patches on top of the original code.
- No user-visible changes from the original project.
Examples:
- Packages in Linux and other OS distributions
- Ubuntu kernel (downstream of Linux with minimal, policy-driven patches)
- Homebrew / MacPorts packages
- Debian-patched GNU tools
- Android Linux kernel (arguably borderline, but many devices are close to upstream + patches)
Downstream fork
- The fork is based on a rebase of the original code, but with user-visible changes that bring a different user experience while keeping the base 100% compatible with the original project. It is reasonably easy to move to the fork, but harder for users of this fork to move back to the original.
- These forks usually have the problem that, when the original project adds an option or feature the fork already has, newer major versions of the fork have to drop their own version of it, which makes upgrading to the next version a bit harder.
Examples:
- Red Hat Enterprise Linux (downstream of Fedora)
- Ubuntu (downstream of Debian)
- Amazon Linux (downstream of RHEL/CentOS lineage)
- PostgreSQL distributions (EDB Postgres, Amazon Aurora PostgreSQL-compatible)
- Percona Server
- MariaDB 5.1 -> 5.4 (these MariaDB versions never had to drop a feature)
Compatibility fork
- The fork started as a 'Downstream fork', but instead of rebasing, it moved to merging only selected patches from the original project and rewriting the things its developers disliked. The goal is still high compatibility with the original project.
Examples:
- LibreOffice (from OpenOffice.org)
- Jenkins (from Hudson, especially post-Oracle divergence)
- Percona XtraDB Cluster
- MariaDB 5.5
Independent fork (or "branch")
- The fork is no longer dependent on the original project. It may still take selected patches or ideas from the original project.
- It usually tries to keep things compatible to make it easy for original project users to move to the new project, but the main focus is solving new problems for its growing user base.
Examples:
- GhostBSD
- OpenBSD (from NetBSD)
- Illumos (from OpenSolaris)
- systemd (initially replacing sysvinit, now fully independent ecosystem)
- Neo4j Community vs Enterprise split (conceptual fit)
- Firefox (historically from Mozilla Suite)
- MariaDB 10+
Some people have recently expressed fears that MySQL development is stopping or slowing down, and others have started to talk about the need for a “soft” fork of MySQL.
The point I am trying to make is that if these worries are justified, then any fork will sooner or later have to become an independent fork/branch or die together with MySQL (as there will be no new MySQL features to bring into the fork).
One of the mantras in open source is that it is better to join an existing project than to create a new one! Instead of talking about creating yet another fork of MySQL, it would be better if everyone gathered around MariaDB! MariaDB development is not dependent on Oracle for its future. This is assured by the MariaDB Foundation, which was created to make it easy for anyone to participate in the development of the MariaDB server. MariaDB plc is working together with the MariaDB Foundation to make this possible.
MariaDB is, after all, created by the same people who created MySQL and is developed in the way it would have been if Oracle had not bought MySQL. The rapid adoption of MariaDB (350+ million database installations and rapidly increasing) shows that MariaDB is truly the future of MySQL.
PS:
Please leave a comment if you have a better name for any of the fork categories, another fork category that should be added, or more examples for the categories.