[MySQL Advanced] Query Optimization Principles and Solutions (6)

0 Preface

Queries are the core workload in MySQL, and most optimization effort goes toward improving query efficiency. In earlier articles we covered index optimization, which indirectly touched on query optimization as well. This article focuses on optimization strategies for queries themselves.

1. Factors affecting query performance

First, we need to understand why a query is slow before we can tune it. The main factors that affect query performance are:

  • Network: Every query against MySQL starts with a network connection. A poor network directly slows the query, and transferring large result sets is costly and inefficient.
  • CPU: When CPU usage is too high, MySQL executes more slowly, which naturally hurts query efficiency.
  • Disk I/O: MySQL stores data on disk. If disk read/write performance degrades, data access naturally slows down; heavily fragmented data pages also degrade read performance.
  • Context switching: The CPU gives each task a slice of service time. When the time slice ends, the current task's state is saved and the next task is loaded; this process is called a context switch, and switching between tasks also costs time.
  • System call overhead
  • Generating statistics
  • Lock wait time

Of course, this list is not exhaustive, but most low-performance queries can be optimized by reducing the amount of data involved.

2. Query optimization

2.1 Avoid unnecessary data queries

    1. Check whether unnecessary data is being queried.
       Based on the business requirements, return only the data the business needs; do not query redundant columns.
    2. In multi-table joins, many students forget to check whether every table's columns are restricted.
       They limit the returned columns in the outermost query but leave the subquery of the join unrestricted, so the subquery still fetches all columns.
    3. Avoid using select *.
       Even if you really need all the data, list every column name explicitly instead of using select *, so that newly added, unneeded fields are not silently queried as well.
       With the adoption of persistence-layer frameworks, many students like the convenient CRUD methods provided by frameworks such as mybatis-plus, but often forget to declare the columns to query via .select(column...), which is no different from using select * directly.
    4. Avoid repeatedly querying the same data.
       Consider caching data with a high query frequency; don't assume MySQL can do everything.
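The cost of select * can be sketched with a small in-memory demo. This is an illustration only, using SQLite via Python's sqlite3 module and a hypothetical `user` table; the same principle applies to MySQL, where wide columns make the difference far more expensive:

```python
import sqlite3

# Hypothetical user table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, bio TEXT, avatar BLOB)")
conn.execute("INSERT INTO user (name, bio) VALUES ('alice', 'a very long bio ...')")

# Anti-pattern: SELECT * drags every column (bio, avatar, ...) over the wire.
all_cols = conn.execute("SELECT * FROM user WHERE name = 'alice'").fetchone()

# Better: name exactly the columns the business needs.
needed = conn.execute("SELECT id, name FROM user WHERE name = 'alice'").fetchone()

print(len(all_cols), len(needed))  # 4 2
```

Any column added to the table later silently inflates the first query but never the second.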

2.2 Use limit to limit queries

We often use limit for paging, but many students may not know that limit can also be used to optimize queries.

We use actual cases to demonstrate its role:

In a new-user feature, each save must check whether the account name already exists; if it does, the insert is rejected.

There are many similar requirements. Think about it: how would you implement this yourself?

Many students' first choice is to query directly by the account name:

select * from user where name = 'xxx'

Students who read the previous section will immediately react: avoid select *, so:

select name from user where name='xxx'

But is this perfect? Look at what is returned: the name itself. Is that what we want? We only want to know whether 'xxx' exists, not to fetch it. In other words, we want to know whether the number of rows with name = 'xxx' is greater than 0, so count(*) is a lighter query result than the actual values:

select count(*) from user where name='xxx'

Is this the limit of our optimization? Not yet! Think about the business: all it needs is a boolean, i.e. whether the result is > 0. Once one matching row is found, there is no need to keep scanning. Some students may say that a uniqueness check is in place, so there can be only one matching row. But MySQL doesn't know that; the instruction it receives is: count how many rows have name = 'xxx'.

Since we don't need the full count, only whether a match exists, we can add a limit clause so the scan stops as soon as the first match is found. Note that count(*) ... limit 1 would not achieve this: count(*) always returns exactly one row, so the limit changes nothing and every matching row is still counted. Instead, select a constant:

select 1 from user where name = 'xxx' limit 1

If this returns a row, the name exists; if it returns nothing, it doesn't.
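The existence check above can be wrapped in a small helper. A minimal sketch using SQLite via sqlite3 for illustration (table and data are hypothetical); MySQL behaves the same way for this pattern:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO user (name) VALUES (?)", [("alice",), ("bob",)])

def name_exists(conn, name):
    # LIMIT 1 lets the engine stop at the first match instead of
    # scanning (or counting) every matching row.
    row = conn.execute(
        "SELECT 1 FROM user WHERE name = ? LIMIT 1", (name,)
    ).fetchone()
    return row is not None

print(name_exists(conn, "alice"))  # True
print(name_exists(conn, "carol"))  # False
```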

This method is suitable for:
1. When we know in advance exactly how many rows we want
2. In subqueries of joined queries, limiting the amount of data the subquery returns with limit, provided this still meets the business requirements

Also note: some students remember to use limit when hand-writing SQL but forget it when using persistence-layer frameworks such as mybatis-plus. This deserves attention too!

When the start (offset) value of limit is too large, the query can still degenerate into scanning most of the table.
We covered several causes of index failure in the previous chapter: when the amount of data scanned is large, the index stops helping. Similarly, a large offset forces MySQL to scan and discard all the rows before the start position, effectively falling back to a full scan. So avoid deep paging where possible; when it cannot be avoided, other techniques (explained later) can help.

2.3 Using covering indexes

In the previous chapter, we introduced the concept of covering indexes. For columns with a high query frequency, we can build an index so that the covering-index mechanism improves query efficiency.

For example, if the user list is fetched frequently and only id and name are needed, we can build an index on name and query with select id, name from user. The data can then be read entirely from the name index tree (a secondary index also stores the primary key), greatly improving query efficiency.
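We can watch a covering index kick in with an execution plan. A sketch using SQLite's EXPLAIN QUERY PLAN via sqlite3 (MySQL's EXPLAIN reports the same idea as "Using index"); the table and index names here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE INDEX idx_user_name ON user (name)")

# The secondary index stores (name, rowid/id), so this query can be
# answered from the index alone, never touching the table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, name FROM user ORDER BY name"
).fetchall()
detail = plan[0][-1]
print(detail)  # e.g. 'SCAN user USING COVERING INDEX idx_user_name'
```

If the query also selected email, the plan would lose the COVERING INDEX note and require a lookup back into the table.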

2.4 Optimizing queries that scan much but return little

Some queries must scan a large amount of data yet return only a small result. For example: generating monthly sales figures from orders. Many order rows are produced every day, but the statistical result is just the monthly total. This is a typical "scan much, return little" business scenario.

For such scenarios, our optimization method is:

  • Use covering indexes: add indexes covering all required fields
  • Change the table structure: add appropriate redundant fields, or create an intermediate/summary table and use scheduled tasks to recompute it periodically
  • Rewrite complex query statements, which we discuss separately below
  • Use a search middleware such as ES (Elasticsearch): when massive data must be aggregated in real time and an intermediate/summary table no longer suffices, introduce dedicated middleware
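The summary-table idea can be sketched in a few lines. For illustration only, using SQLite via sqlite3 with hypothetical `orders` and `monthly_sales` tables; in production the rebuild statement would run from a scheduled task (cron, event scheduler, etc.):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, day TEXT, amount REAL)")
conn.executemany("INSERT INTO orders (day, amount) VALUES (?, ?)",
                 [("2023-01-05", 10.0), ("2023-01-20", 15.0), ("2023-02-02", 7.5)])
conn.execute("CREATE TABLE monthly_sales (month TEXT PRIMARY KEY, total REAL)")

# What the periodic job would run: rebuild the summary from the detail rows,
# so reads hit the tiny summary table instead of scanning all orders.
conn.execute("""
    INSERT OR REPLACE INTO monthly_sales (month, total)
    SELECT substr(day, 1, 7), SUM(amount) FROM orders GROUP BY substr(day, 1, 7)
""")

print(conn.execute("SELECT * FROM monthly_sales ORDER BY month").fetchall())
# [('2023-01', 25.0), ('2023-02', 7.5)]
```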

2.5 Splitting complex queries

For some complex queries, we can split them into a set of simple queries. The benefits of doing so are:

  • Improve the cache hit rate
  • Reduce lock contention
  • Reduce queries for redundant records

For example, the following query:

select p.id, p.name from product p left join `order` o on p.oid = o.id
where o.amount > 100

can be decomposed into

select id from `order` where amount > 100;
# here xxx,yyy,zzz stand for the ids returned by the query above
select id, name from product where oid in (xxx,yyy,zzz);

The results of the individual simple queries can then be combined in the business layer.

However, such splitting should weigh overall execution efficiency; don't split blindly.
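The split above can be sketched end to end. An illustration using SQLite via sqlite3; the table is named `orders` here (instead of the reserved word `order`) and the data is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, oid INTEGER);
    INSERT INTO orders VALUES (1, 150.0), (2, 50.0);
    INSERT INTO product VALUES (10, 'widget', 1), (11, 'gadget', 2);
""")

# Step 1: fetch just the qualifying order ids (a simple single-table query).
oids = [r[0] for r in conn.execute("SELECT id FROM orders WHERE amount > 100")]

# Step 2: feed those ids into a second simple single-table query;
# the "join" happens here in the application layer.
placeholders = ",".join("?" * len(oids))
rows = conn.execute(
    f"SELECT id, name FROM product WHERE oid IN ({placeholders})", oids
).fetchall()
print(rows)  # [(10, 'widget')]
```

Each simple query is individually cacheable and holds its locks for a shorter time, which is where the benefits listed above come from.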

2.6 Replacing union with union all

If the combined result does not need deduplication, use union all: the union keyword implicitly applies distinct to the result, which costs more than union all.
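The implicit distinct is easy to observe. A minimal sketch using SQLite via sqlite3 with two throwaway tables (the behavior is the same in MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (v INTEGER)")
conn.execute("CREATE TABLE t2 (v INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(2,), (3,)])

# UNION deduplicates (an implicit DISTINCT); UNION ALL just concatenates.
union = conn.execute("SELECT v FROM t1 UNION SELECT v FROM t2").fetchall()
union_all = conn.execute("SELECT v FROM t1 UNION ALL SELECT v FROM t2").fetchall()
print(len(union), len(union_all))  # 3 4
```

The deduplication is the part that costs extra work, so skip it whenever the business does not require unique rows.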

2.7 union optimization

When combining two tables with union, if filtering, sorting, limit, etc. are required, push those operations into the subqueries so they execute first. This shrinks the result set before the union runs: since the union operation materializes a temporary table, reducing the data in that temporary table improves efficiency.
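Pushing the limit into each branch looks like this. An illustrative sketch using SQLite via sqlite3 with hypothetical tables t1 and t2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (v INTEGER);
    CREATE TABLE t2 (v INTEGER);
""")
conn.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(100)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(i,) for i in range(100)])

# Pushing ORDER BY/LIMIT into each branch shrinks what the union has to
# materialize: at most 5 + 5 rows instead of 200.
rows = conn.execute("""
    SELECT v FROM (SELECT v FROM t1 ORDER BY v DESC LIMIT 5)
    UNION ALL
    SELECT v FROM (SELECT v FROM t2 ORDER BY v DESC LIMIT 5)
""").fetchall()
print(len(rows))  # 10
```

Applying the limit only on the outside would force both full branches into the temporary table first.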

2.8 Prefer join queries over subqueries (before MySQL 5.6)

Because a subquery can create an extra temporary table, much advice online recommends replacing subqueries with join queries. But this advice is not rigorous. Strictly speaking, it applies before MySQL 5.6; from 5.6 on, the optimizer handles subqueries much better, and this rule can largely be ignored.

2.9 Deep Paging Optimization

When the start (offset) value of limit paging is large, i.e. deep paging, the query degenerates toward a full table scan. In this case, we can rewrite the statement to optimize it.

  • When the amount of data is large, the original form is:

select * from rental limit 1000000,5;

Better writing: use a subquery to optimize

SELECT *
FROM rental a
JOIN ( SELECT rental_id FROM rental LIMIT 1000000, 5 ) b
  ON a.rental_id = b.rental_id;
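The two forms return the same page. A sketch using SQLite via sqlite3 with a hypothetical `rental` table; an ORDER BY is added here (good practice in MySQL too, since LIMIT without ORDER BY has no guaranteed row order):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany("INSERT INTO rental (note) VALUES (?)",
                 [(f"row{i}",) for i in range(1000)])

naive = conn.execute(
    "SELECT * FROM rental ORDER BY rental_id LIMIT 5 OFFSET 900").fetchall()

# The subquery pages over the narrow primary-key index first, then joins
# back to fetch only the 5 full rows that are actually needed.
rewritten = conn.execute("""
    SELECT a.* FROM rental a
    JOIN (SELECT rental_id FROM rental ORDER BY rental_id LIMIT 5 OFFSET 900) b
      ON a.rental_id = b.rental_id
""").fetchall()

print(sorted(naive) == sorted(rewritten))  # True
```

The win in MySQL is that the inner query skips the offset rows using only the slim index, instead of reading and discarding a million wide rows.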

Summary

Optimizing MySQL queries depends on a continuous pursuit of detail. It also requires a rigorous attitude during development: a "good enough" mentality will never temper your skills to a higher level. But I believe that, having read this far, you already have that pursuit of excellence. Cheers and perseverance!
