| Using MySQL as a Data Warehouse |
|
|
Any relational database system can be used to build a data warehouse. At Plus Three we chose MySQL v4.1.x for our warehouse system. This choice was natural for us as we're largely a MySQL shop. We chose v4.1.x because it was the stable version when we started development. Our operational database uses MySQL's InnoDB backend, providing referential integrity and transactions. However, we chose MySQL's MyISAM backend for our warehouse. Warehouse applications are read-only, and as such, have little need for run-time referential integrity checks and transactions. MyISAM can also be faster than InnoDB in some circumstances. An upside to using MySQL for the warehouse is high capacity and decent performance with little tuning. Our larger installations handle tables with more than one hundred million rows without major difficulties. Although we've spent some time optimizing for particular warehouse queries, the majority were fast enough out of the gate. All that said, there are features we'd like to have that MySQL is lacking. Perhaps chief among them is support for bitmap indexes. Bitmap indexes are ideal for the kind of low-cardinality data that is commonly used in data warehouses. PostgreSQL supports bitmap indexes as of version v8.1, as do a number of commercial database systems. We also found MySQL's subquery optimization to be of poor quality. Subqueries are relatively new to MySQL, so there's a good chance this will improve when we upgrade to MySQL v5.1.x soon. A nontraditional alternative is a column-oriented database, like Vertica and MonetDB (which has a DBI driver). Column-oriented database systems store data by column rather than by row, and as such, offer performance advantages for typical data-warehouse usage patterns. |
| < Prev | Next > |
|---|






