Tuesday 14 May 2013

MySQL Storage Engine


MySQL Storage Engine:

Storage Engine is a software component of DBMS that take care of how  data is stored and retrieved. MYSQL supports various types of storage engines each having its own features, strengths and trade-offs.

The default storage engine used by MYSQL prior to version 5.5 was MyISAM. Since MYSQL 5.5 the default storage engine used is INNODB. Selecting appropriate storage engine can make a significant impact on an application in terms of performance. 

To check the list of storage engines supported by your MYSQL database, execute below statement:

show engines;

To check storage engine used by specific application database, execute following statement:

SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'espritkm'

To change the default storage engine used by MYSQL set the default-storage-engine variable in my.cnf conf file of MySQL.

To change the default storage engine used by current mysql session, execute

SET storage_engine = MYISAM

We can also specify the default storage engine when creating table:

Create table test (  
  `id` int NOT NULL AUTO_INCREMENT,
  `data` varchar(100),
  primary key (`id`)
) ENGINE = INNODB;

To alter storage engine used by existing table:

Alter table test ENGINE = MYISAM;

But we should be very careful while switching to other storage engine type because switching to storage engine that does not support the same index, field types or sizes may lead to data loss. 

Before selecting storage engine for our database table, we should know the core functionality provided by each engine. We can bifurcate the core functionality into four parameters:
  • 1. Data type support:
  • 2. Transaction support (ACID compliant)
  • 3.Types of indexes support
  • 4 Type of locking support
MySQL supports several storage engines. The standard storage engine supported by MySQL are:

  • InnoDB
  • MEMORY ( formerly known as HEAP)
  • MyISAM
  • BDB (BerkeleyDB)
  • MERGE
  • EXAMPLE
  • ARCHIVE
  • CSV
  • BLACKHOLE

InnoDB:
  • Has transaction support. It is ACID compliant storage engine.
  • Provides row level locking thus supports parallel insert/update operations on the same table.
  • Only storage engine that provides foreign key constraint support. This ensures data integrity.
  • InnoDB supports crash recovery. If we plug out the box, and then reboot, when MySQL starts up, it will rollback all standing and open transactions that were interrupted by the power failure
  • Provides caching for data in memory as well as on disk, which provides large increase in performance gain.
  • Support for FULLTEXT indexes is available in MySQL 5.6.4 and higher version.
  • InnoDB tables arrange table data on disk to optimize common queries based on primary keys. Each InnoDB table has a primary key index called the clustered index that organizes the data to minimize I/O for primary key lookups.

Use-case:

  •  InnoDB should be used where data integrity comes a priority because it inherently takes care of them by the help of relationship constraints and transactions.
  •  Faster in write-intensive (inserts, updates) tables because it utilizes row-level locking and only hold up changes to the same row that’s being inserted or updated.
MyISAM:
  • MyISAM is based on ISAM (Indexed Sequential Access Method), an indexing algorithm developed by IBM that allows retrieving information from large sets of data in a fast way. 
  • MyISAM is best choice for tables that have very less insert/update rate but very high select rate. It is very fast storage engine that provides best reading speed of all storage engines available in MySQL.
  • Uses table level locking thus limits multi-user concurrency
  • No transaction support
  • Does not have support  for data integrity i.e. no foreign key constraint supported
  • The max number of rows supported for MyISAM is ~4.29E+09 with up to 64 indexes per table. Fields of TEXT/BLOB type can also be fully indexed for cases such as searching
Use-case:

  • Read-only tables.  Can be used for tables that are never or rarely modified.
  • Replication configuration. In replication environment we can make an InnoDB based Master database which is used for writing and processing data and MyISAM-based Slave database which is used for reading.
MEMORY:
  • It stores table content in memory.  Data gets lost when server is restarted. But the table definition persist as it stores table definition on disk. To populate a memory table when the MySQL server starts, we can use --init-file option. For example, you can put statements such as INSERT INTO ... SELECT or LOAD DATA INFILE into this file to load the table from a persistent data source.
  • Fast data retrieval as it stores data in memory.
  • Provides table-level locking.
  • No Support for transaction.
  • Supports  both HASH and BTREE indexes. By default it uses HASH index.
  • Memory table does not support BLOB and TEXT column types
  • Memory is not reclaimed if individual rows are deleted from a Memory table. Memory  is reclaimed only when the entire table is deleted. Memory that was previously used by rows that have been deleted will be re-used for new rows only within the same table. To free up the memory used by rows that have been deleted, use ALTER TABLE ENGINE=MEMORY to force a table rebuild.
  • The data is lost when the database is restarted.
Use-case:
  • Can be used when temporary tables are required and fast data access is needed. It is ideal for creating temporary tables or for quick lookups.
  • It can also be used to access information on the fly basis for better processing. Can also be used to store session or caching data.
MERGE:

Provides functionality to combine many identical MyISAM tables into one logical table. Here identical means that the tables in collection should have identical column and index information.

Use-case:

Good for data warehousing environments.

CSV:
  • CSV storage engine does not store data in binary format, its stores data in CSV format (comma separated values). This creates limitation in storing data. It is not an efficient storage type to store large volume of data, although supports BLOB data type.
  • Provides table level locking
  • No indexing support
  • No transaction support
Use-case:

As it stores data in CSV format, it is effective and easiest method for data exchange. CSV files generated can be easily imported to many different software packages. Best use-case is when we want data to be transfer to spreadsheet for later use.

FEDERATED :

FEDERATED storage engine  provides the functionality to access remote MySQL database without using replication or cluster technology. It offers the ability to separate MySQL servers to create one logical database from many physical servers. Queries on the local server are automatically.

Use-case:

Good for distributed environments.

Limitation:
  • Remote server to be accessed must be MySQL server.
  • The remote table must exist before we try to access remote table through Federated table.
  • No support for transaction.
  • FEDERATED storage engine supports SELECT, INSERT, UPDATE, DELETE and indexes. It does not have support for DROP, ALTER statement.
  •  FEDERATED storage engine does not work on query cache.
ARCHIVE:

The ARCHIVE storage engine is used to store large volume of data as a small footprint. It stores the data in compressed form. It support only select and insert operation. It has no support for indexes.

Use-case:

As it stores data in compressed form and does not allow to modify data, it can be used to store log data or sales transaction data or old invoice data or any information that is not in active use.  

BLACKHOLE:

BLACKHOLE engine does not actually store any data. Although it allows us to create tables and indexes, all SQL statements that would add or update information to the database are executed without writing any data. It seems like of no use, but it does allow us to test database structures and play with table definitions without actually creating any data. Most importantly, if statement based binary logging is enabledSQL statements executed on BLACKHOLE databases are written to the binary log and therefore they can be replicated to slave databases. This can be useful as a repeater or filter mechanism.

Use-case:
Consider your application requires slave-side filtering rules, but transferring all binary log data to the slave first results in too much network traffic. In such a scenario, it is possible to set up on the master host a “dummy” slave process whose default storage engine is BLACKHOLE.


The master writes to its binary log. The “dummy” mysqld process acts as a slave, applying the desired combination of replicate-do-* and replicate-ignore-* rules, and writes a new, filtered binary log of its own. This filtered log is provided to the slave.
The dummy process does not actually store any data, so there is little processing overhead incurred by running the additional mysqld process on the replication master host. This type of set up can be repeated with additional replication slaves.

Conclusion:

Apart from the storage engines mentioned above, MySQL also provide functionality to develop custom storage engine and integrate it in MySQL server with the help of  Pluggable Storage Engine Architecture. No storage engine is ideal for all use-cases. Each have its own pros and cons. If we want data to be secured, then InnoDB is the way to go. If we want fast data retrieval  then we must go for MyISAM. Thus selection of storage engine depends on application requirement.

Reference URL:
http://dev.mysql.com/doc/refman/5.5/en/storage-engines.html