7 keys to better MySQL performance

As size and load grow, performance tends to slow. Keep these keys in mind to keep MySQL running smoothly


One of the ways we measure applications is performance. A key metric for application performance is user experience, which generally translates to: did the user need to wait longer than a reasonable amount of time to get what they wanted?

This metric can mean different things in different scenarios. For a mobile shopping app, response times can’t be more than a couple of seconds. For an employee’s HR page, responses might be allowed to take a few seconds longer.

There is a great deal of research on how performance impacts user behavior.

Whatever the standard, it is essential to maintain good performance for applications. Otherwise, users will complain (or worse, go to a different application). One of the factors that affects application performance is database performance. The interaction between applications, websites, and databases is critical in establishing the level of application performance.

A central component of this interaction is how applications query the database and how the database responds to requests. By any measure, MySQL is one of the most popular database management systems. More enterprises are shifting to MySQL (and other open source databases) as a database solution in their production environments.

There are many methods of configuring MySQL that can help ensure that your database responds to queries quickly, and with a minimum amount of application performance degradation.

The following are some essential tips to help you optimize the performance of your MySQL database.

MySQL optimization key #1: Learn how to use EXPLAIN

The two most important decisions you make with any database are designing how relationships between application entities are mapped to tables (the database schema) and designing how applications get the data they need in the format they need it (queries).

Complicated applications can have complicated schemas and queries. If you’re going to get the performance and scale your applications require, you can’t just count on intuition to understand how queries will be executed.

Instead of guessing and hoping, you should learn how to use the EXPLAIN command. This command shows you how a query will be executed, and gives you insight into both what performance you can expect and how the query will scale with changing data size.

There are a number of tools—such as MySQL Workbench—that can visualize the EXPLAIN output for you, but you still need to understand the basics to make sense of it.

The EXPLAIN command provides output in two different formats: the old-fashioned table format, and a more modern, structured JSON document that provides significantly more detail (shown below):

mysql> explain format=json select avg(k) from sbtest1 where id between 1000 and 2000 \G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "762.40"
    },
    "table": {
      "table_name": "sbtest1",
      "access_type": "range",
      "possible_keys": [
        "PRIMARY"
      ],
      "key": "PRIMARY",
      "used_key_parts": [
        "id"
      ],
      "key_length": "4",
      "rows_examined_per_scan": 1874,
      "rows_produced_per_join": 1874,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "387.60",
        "eval_cost": "374.80",
        "prefix_cost": "762.40",
        "data_read_per_join": "351K"
      },
      "used_columns": [
        "id",
        "k"
      ],
      "attached_condition": "(`sbtest`.`sbtest1`.`id` between 1000 and 2000)"
    }
  }
}

One component you should look at is “query cost.” Query cost refers to how expensive MySQL considers this particular query in terms of the overall cost of query execution, and is based on many different factors.

Simple queries generally have query cost of less than 1,000. Queries with a cost between 1,000 and 100,000 are considered medium-cost queries, and generally are fast if you’re only running hundreds of such queries per second (not tens of thousands).  

Queries with a cost of more than 100,000 are expensive queries. Often these queries will still run fast when you’re a single user on the system, but you should think carefully about how frequently you’re using such queries in your interactive applications (especially as the number of users grows).

Of course these are ballpark performance numbers, but they demonstrate the general principle. Your system might handle query workloads better or worse, depending on its architecture and configuration.

Chief among the factors that determine query cost is whether the query is using indexes correctly. The EXPLAIN command can tell you if a query isn’t using indexes (usually because of how the indexes are created in the database, or how the query itself is engineered). This is why it is so important to learn to use EXPLAIN.
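As a quick illustration of what to look for, here is how EXPLAIN flags a query that cannot use an index. The orders table and customer_email column are hypothetical, used only for this sketch:

```sql
-- Hypothetical table, for illustration only
EXPLAIN SELECT * FROM orders WHERE customer_email = 'user@example.com'\G

-- If the output shows type: ALL and key: NULL, MySQL is performing
-- a full table scan: no index is available for this predicate.
```

In the table output format, "type: ALL" with "key: NULL" is the classic signature of a missing index.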

MySQL optimization key #2: Create the right indexes

An index improves query performance by reducing the amount of data in the database that queries must scan. Indexes in MySQL are used to speed up access in the database and help enforce database constraints (such as UNIQUE and FOREIGN KEY).

Database indexes are much like book indexes. They are kept in their own location, and they contain information already in the main database. They are a reference method or a map to where data is located. Indexes do not change any of the data in a database. They simply point to the location of the data.

There are no indexes that are always right for any workload. You should always look at indexes in the context of the queries that the system is running.

Well-indexed databases not only run faster, but even a single missing index can slow a database to a crawl. Use EXPLAIN (as recommended previously) to find missing indexes and add them. But be careful: Don’t add indexes you don’t need! Unnecessary indexes slow databases down (check out my presentation on MySQL indexing best practices).
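As a sketch, adding a missing index and verifying the fix with EXPLAIN is usually a one-line change. The orders table and customer_email column here are hypothetical:

```sql
-- Hypothetical table and column, for illustration only
ALTER TABLE orders ADD INDEX idx_customer_email (customer_email);

-- Re-check the query plan; EXPLAIN should now report
-- type: ref and key: idx_customer_email instead of a full scan
EXPLAIN SELECT * FROM orders WHERE customer_email = 'user@example.com'\G
```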

MySQL optimization key #3: No defaults!

Like any software, MySQL has many configurable settings that can be used to modify behavior (and ultimately, performance). And like any software, many of these configurable settings are ignored by administrators and end up being used in their default mode.

To get the best performance from MySQL, it is important to understand the configurable MySQL settings and—more important—set them to work best for your database environment.

By default, MySQL comes tuned for a small-scale development installation, not for production scale. You typically want to configure MySQL to use all of the memory resources available, as well as to allow the number of connections that your application requires. 

Here are three MySQL performance tuning settings that you should always examine closely:

innodb_buffer_pool_size: The buffer pool is where data and indexes are cached. This is the main reason for using a system with a large amount of RAM as your database server. If you’re only running the InnoDB storage engine, you typically allocate around 80 percent of your memory for the buffer pool. If you are running very complicated queries, or you have very large numbers of concurrent database connections, or you have very large numbers of tables, you might need to take this value down a notch to allocate more memory for other purposes.
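As a sketch, on a dedicated InnoDB-only database server with 32GB of RAM (an arbitrary figure for illustration), the configuration might look like this:

```ini
# my.cnf -- example for a hypothetical dedicated 32GB InnoDB-only server
[mysqld]
innodb_buffer_pool_size = 25G   # roughly 80 percent of RAM
```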

As you set the InnoDB buffer pool size, make sure you don’t set it too large, or it will cause swapping, which absolutely kills your database performance. An easy way to check is to look at Swapping Activity in the System Overview graph in Percona Monitoring and Management.

MySQL swap activity (Percona Monitoring and Management)

As this graph shows, some swapping is fine every so often. If, however, you see sustained swapping activity of 1MB per second or more, you will need to reduce your buffer pool size (or other memory uses).

If you don’t get the value for innodb_buffer_pool_size right on the first try, don’t worry. Starting with MySQL 5.7, you can change the size of the InnoDB buffer pool dynamically, without restarting the database server.
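For example, on MySQL 5.7 or later the buffer pool can be resized online; the 8GB target below is an arbitrary illustration:

```sql
-- Resize the InnoDB buffer pool without restarting (MySQL 5.7+)
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;  -- 8GB

-- Monitor the progress of the resize operation
SHOW STATUS LIKE 'Innodb_buffer_pool_resize_status';
```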

innodb_log_file_size: This is the size of a single InnoDB log file. By default, InnoDB uses two log files, so you can double this number to get the size of the circular redo log space InnoDB uses to make sure your transactions are durable. This also optimizes applying changes to the database. Setting innodb_log_file_size is a question of trade-offs: the larger the redo space you allocate, the better the performance you will achieve for a write-intensive workload, but the longer the crash recovery time if your system suffers power loss or other problems.
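A sketch of the relevant settings; the 2G value is an illustrative starting point, not a recommendation:

```ini
[mysqld]
innodb_log_file_size      = 2G   # size of one redo log file
innodb_log_files_in_group = 2    # default; total redo space = 2 x 2G = 4G
```

Note that unlike the buffer pool size, changing innodb_log_file_size requires a server restart to take effect.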

How do you know if your MySQL performance is limited by your current InnoDB log file size?  You can tell by looking at how much of the usable redo log space is actually used. The easiest way is to look at the Percona Monitoring and Management InnoDB Metrics dashboard. In the graph below, the InnoDB log file size is not large enough, as the space used pushes very close to how much usable redo log space is available (indicated by the red line). Your log file size should be at least 20 percent greater than the amount of space used to keep your system performing optimally. 

MySQL InnoDB checkpoint age (Percona Monitoring and Management)

max_connections: Large-scale applications often require much more than the default number of connections. Unlike other variables, if you don’t set this correctly you won’t have performance issues (per se). Instead, if the number of connections is not sufficient for your application needs, your application simply won’t be able to connect to the database (which looks like downtime to your users). Getting this variable right is important.

It can be difficult to know how many connections you need for complex applications with many components running on multiple servers. Fortunately, MySQL makes it very easy to see how many connections are used at peak operation. Typically you want to ensure there is at least a 30 percent gap between the maximum number of connections your application uses and the maximum number of connections available. An easy way to view these numbers is to use the MySQL Connections Graph in the MySQL Overview dashboard in Percona Monitoring and Management. The graph below shows a healthy system, with a comfortable margin of additional connections available.
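You can also check these numbers directly from the MySQL client:

```sql
-- Configured limit vs. the high-water mark actually observed
SHOW GLOBAL VARIABLES LIKE 'max_connections';
SHOW GLOBAL STATUS LIKE 'Max_used_connections';

-- Keep max_connections at least ~30 percent above Max_used_connections
```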

MySQL connections (Percona Monitoring and Management)

One thing to keep in mind is that if your database runs slowly, applications often create an excessive number of connections. In such cases, you should work on the database performance problem rather than simply allowing more connections. More connections can make the underlying performance problem worse.

(Note: When you set the max_connections variable significantly higher than the default value, you often need to consider increasing other parameters such as the size of the table cache and the number of open files MySQL allows. This goes beyond the scope of this article, however.) 

MySQL optimization key #4: Keep the database in memory

We have seen a transition to solid state drives (SSDs) in recent years. Even though SSDs are much faster than spinning hard drives, they are still no match for having data available in RAM. This difference comes not just from the storage performance itself, but also from the additional work the database must do when it retrieves the data from disk or SSD storage. 

With recent hardware improvements, it is increasingly possible to get your database in memory—whether you’re running in the cloud or managing your own hardware. 

The even better news is that you do not need to fit all of your database into memory to get a majority of the in-memory performance benefits. You just need to fit the working set of data into memory—the data that is accessed most frequently.

You may have seen some articles providing some specific numbers about what portion of the database you should keep in memory, ranging from 10 percent to 33 percent. In fact, there is no “one size fits all” number. The amount of data to fit into memory for the best performance advantage is workload related. Rather than looking for a specific “magic” number, you should check how much I/O the database is running at its steady state (typically a few hours after it is started). Look at reads, because reads can be completely eliminated if your database is in memory. Writes will always need to happen, whatever amount of memory you have available. 
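If you are not running a monitoring dashboard, the InnoDB status counters give a rough view of the same thing:

```sql
-- Logical reads served from the buffer pool vs. reads that hit storage
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';

-- If Innodb_buffer_pool_reads keeps climbing at steady state,
-- the working set does not fully fit in the buffer pool
```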

Below you can see I/O happening in the InnoDB I/O Graph in the InnoDB Metrics dashboard of Percona Monitoring and Management.

MySQL InnoDB I/O (Percona Monitoring and Management)

In the graph above, you see spikes as high as 2,000 I/O operations per second, which shows that (at least for some portions of the workload) the database working set does not fit well into memory.  

MySQL optimization key #5: Use SSD storage

If your database doesn’t fit in memory (and even if it does) you still need fast storage to handle writes and to avoid performance issues as the database warms up (right after restart). These days fast storage means SSDs.

Some “experts” still advocate the use of spinning disks for cost or reliability reasons. Frankly, when it comes to operational databases, these arguments tend to be outdated or simply wrong. Today, SSDs offer impressive performance and reliability at a great price point.

Not all SSDs are created equal, however. For database servers, you should use an SSD that is designed for server workloads and that takes great care to protect your data (from corruption during power loss, for example). Avoid using commercial SSDs designed for consumer desktop computers and laptops.

An SSD directly attached through NVMe or Intel Optane technology offers the best performance. Even if attached remotely as a SAN, NAS, or cloud block device, SSDs still offer far superior performance compared to spinning disks.

MySQL optimization key #6: Scale out

Even the highest performance server has its limits. There are two ways to scale: up and out. Scaling up means buying bigger, more powerful hardware. This can be expensive, and hardware becomes obsolete quickly. Scaling out to handle more load has several benefits:

  1. You can take advantage of smaller and less costly systems.
  2. It’s much faster and easier to scale linearly by scaling out.
  3. Because the database is spread across more than one physical machine, the database is protected from a single point of hardware failure.

Although scaling out comes with advantages, it also has certain limitations. Scaling out requires replication, such as basic MySQL Replication or Percona XtraDB Cluster, for data synchronization. But in return you gain both additional performance and high availability. If you need even higher scale, use MySQL sharding.

You also need to make sure that applications connecting to the clustered architecture can find the needed data—usually through some proxy servers and load balancers such as ProxySQL or HAProxy.

While planning to scale out, avoid scaling out too early. Working with distributed databases tends to be more complicated. Modern hardware and MySQL servers can go quite far using just a single server. The recently available MySQL 8 release candidate has shown itself capable of handling more than 2 million simple queries per second on a single system.

MySQL optimization key #7: Embrace observability

The best-designed systems are designed with observability in mind—MySQL is no exception. 

Once you get your MySQL environment up, running, and properly tuned, you can’t just set it and forget it. Database environments can be fairly organic in that they are affected by changes to the system or workload. Be prepared for surprises such as traffic spikes, application bugs, and MySQL glitches. These things can and will happen.

When they do, you need to resolve them quickly and effectively. The only way to do this is to have some sort of monitoring solution set up and properly instrumented. This allows you to see what is happening in your database environment as it runs in production and to analyze server data when something goes wrong. Ideally the system allows you to prevent issues before they happen, or before they grow to the point that their impact becomes visible to the user.

Examples of monitoring tools are MySQL Enterprise Monitor, Monyog, and our own Percona Monitoring and Management (PMM), which has the additional advantage of being free and open source. These tools provide excellent operational visibility for monitoring and troubleshooting. 

As more companies turn to open source databases (especially MySQL) to manage and serve their business data in large-scale production environments, they’ll need to focus on keeping these databases tuned and running at optimum efficiency. As with all things crucial to your business goals, your database performance can make or break how your year turns out. MySQL is an excellent database solution for powering your applications and websites, but it needs to be tuned to match your needs and monitored to find and prevent bottlenecks and performance issues.

Peter Zaitsev is the co-founder and CEO of Percona, a provider of enterprise-class MySQL and MongoDB solutions and services. Co-author of “High Performance MySQL” published by O’Reilly, one of the most popular books on MySQL performance, Zaitsev blogs regularly at PerconaDatabasePerformanceBlog.com and speaks frequently at conferences worldwide.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Copyright © 2017 IDG Communications, Inc.