8 common SQL errors

8 common SQL errors

1. LIMIT statement
Paging query is one of the most commonly used scenarios, but it is also usually the most problematic place. For example, for the following simple statement, the general DBA thinks of is to add a composite index on the type, name, and create_time fields. In this way, conditional sorting can effectively use the index, and the performance can be improved rapidly.

SELECT * FROM operation WHERE type ='SQLStats' AND name ='SlowLog' ORDER BY create_time LIMIT 1000, 10; Copy code

Well, maybe more than 90% of DBAs solve the problem and stop here. But when the LIMIT clause becomes "LIMIT 1000000,10", the programmer will still complain: Why is it slow that I only fetch 10 records?

To know that the database does not know where the 1,000,000th record starts, even if there is an index, it needs to be calculated from the beginning. When this kind of performance problem occurs, programmers are lazy in most cases.

In scenarios such as front-end data browsing and page turning, or big data export in batches, the maximum value of the previous page can be used as a parameter as a query condition. SQL is redesigned as follows:

SELECT * FROM operation WHERE type ='SQLStats' AND name ='SlowLog' AND create_time> '2017-03-16 14:00:00' ORDER BY create_time limit 10; Copy code

Under the new design, the query time is basically fixed and will not change as the amount of data grows.

2. The
mismatch between the query variable and the field definition type in the implicit conversion SQL statement is another common mistake. For example, the following statement:

mysql> explain extended SELECT * > FROM my_balance b > WHERE b.bpn = 14000000123 > AND b.isverified IS NULL; mysql> show warnings; | Warning | 1739 | Cannot use ref access on index'bpn' due to type or collation conversion on field'bpn' Copy code

The field bpn is defined as varchar(20), and MySQL s strategy is to compare strings after converting them into numbers. The function acts on the table field, the index is invalid.

The above situation may be parameters automatically filled in by the application framework, rather than the programmer's original intention. There are many application frameworks nowadays that are very complicated, and they are easy to use, and at the same time be careful that they may dig holes for themselves.

3. Associated updates and deletes
Although MySQL 5.6 introduces materialization features, it is necessary to pay special attention to the optimization of query statements only. For update or delete, it needs to be manually rewritten into JOIN.

For example, in the following UPDATE statement, MySQL actually executes a loop/nested subquery (DEPENDENT SUBQUERY), and the execution time can be imagined.

UPDATE operation o SET status ='applying' WHERE o.id IN (SELECT id FROM (SELECT o.id, o.status FROM operation o WHERE o.group = 123 AND o.status NOT IN ('done') ORDER BY o.parent, o.id LIMIT 1) t); Copy code

Implementation plan:

+----+--------------------+-------+-------+------- --------+---------+---------+-------+------+------ -----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-------+-------+------- --------+---------+---------+-------+------+------ -----------------------------------------------+ | 1 | PRIMARY | o | index | | PRIMARY | 8 | | 24 | Using where; Using temporary | | 2 | DEPENDENT SUBQUERY | | | | | | | | Impossible WHERE noticed after reading const tables | | 3 | DERIVED | o | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Using where; Using filesort | +----+--------------------+-------+-------+------- --------+---------+---------+-------+------+------ -----------------------------------------------+ Copy code

After rewriting to JOIN, the selection mode of the subquery changes from DEPENDENT SUBQUERY to DERIVED, and the execution speed is greatly accelerated, from 7 seconds to 2 milliseconds.

UPDATE operation o JOIN (SELECT o.id, o.status FROM operation o WHERE o.group = 123 AND o.status NOT IN ('done') ORDER BY o.parent, o.id LIMIT 1) t ON o.id = t.id SET status ='applying' Copy code

The execution plan is simplified to:

+----+-------------+-------+------+--------------- +-------+---------+-------+------+---------------- -------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+--------------- +-------+---------+-------+------+---------------- -------------------------------------+ | 1 | PRIMARY | | | | | | | | | Impossible WHERE noticed after reading const tables | | 2 | DERIVED | o | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Using where; Using filesort | +----+-------------+-------+------+--------------- +-------+---------+-------+------+---------------- -------------------------------------+ Copy code

4. Mixed sort
MySQL cannot use indexes for mixed sort. But in some scenarios, there are still opportunities to use special methods to improve performance.

SELECT * FROM my_order o INNER JOIN my_appraise a ON a.orderid = o.id ORDER BY a.is_reply ASC, a.appraise_time DESC LIMIT 0, 20 Copy code

The execution plan is shown as a full table scan:

+----+-------------+-------+--------+------------- +---------+---------+---------------+---------+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra +----+-------------+-------+--------+------------- +---------+---------+---------------+---------+-+ | 1 | SIMPLE | a | ALL | idx_orderid | NULL | NULL | NULL | 1967647 | Using filesort | | 1 | SIMPLE | o | eq_ref | PRIMARY | PRIMARY | 122 | a.orderid | 1 | NULL | +----+-------------+-------+--------+---------+--- ------+---------+-----------------+---------+-+ Copy code

Since is_reply has only two states, 0 and 1, we rewrite it in the following way, and the execution time is reduced from 1.58 seconds to 2 milliseconds.

SELECT * FROM ((SELECT * FROM my_order o INNER JOIN my_appraise a ON a.orderid = o.id AND is_reply = 0 ORDER BY appraise_time DESC LIMIT 0, 20) UNION ALL (SELECT * FROM my_order o INNER JOIN my_appraise a ON a.orderid = o.id AND is_reply = 1 ORDER BY appraise_time DESC LIMIT 0, 20)) t ORDER BY is_reply ASC, appraisetime DESC LIMIT 20; Copy code

5. EXISTS statement When
MySQL treats the EXISTS clause, it still uses the nested subquery execution mode. Such as the following SQL statement:

SELECT * FROM my_neighbor n LEFT JOIN my_neighbor_apply sra ON n.id = sra.neighbor_id AND sra.user_id ='xxx' WHERE n.topic_status <4 AND EXISTS(SELECT 1 FROM message_info m WHERE n.id = m.neighbor_id AND m.inuser ='xxx') AND n.topic_type <> 5 Copy code

The execution plan is:

+----+--------------------+-------+------+-----+-- ----------------------------------------+--------- +-------+---------+ -----+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-------+------+ -----+-- ----------------------------------------+--------- +-------+---------+ -----+ | 1 | PRIMARY | n | ALL | | NULL | NULL | NULL | 1086041 | Using where | | 1 | PRIMARY | sra | ref | | idx_user_id | 123 | const | 1 | Using where | | 2 | DEPENDENT SUBQUERY | m | ref | | idx_message_info | 122 | const | 1 | Using index condition; Using where | +----+--------------------+-------+------+ -----+-- ----------------------------------------+--------- +-------+---------+ -----+ Copy code

Remove exists and change to join to avoid nested subqueries and reduce the execution time from 1.93 seconds to 1 millisecond.

SELECT * FROM my_neighbor n INNER JOIN message_info m ON n.id = m.neighbor_id AND m.inuser ='xxx' LEFT JOIN my_neighbor_apply sra ON n.id = sra.neighbor_id AND sra.user_id ='xxx' WHERE n.topic_status <4 AND n.topic_type <> 5 Copy code

New execution plan:

+----+-------------+-------+--------+ -----+------- -----------------------------------+---------+ ---- -+------+ -----+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+ -----+------- -----------------------------------+---------+ ---- -+------+ -----+ | 1 | SIMPLE | m | ref | | idx_message_info | 122 | const | 1 | Using index condition | | 1 | SIMPLE | n | eq_ref | | PRIMARY | 122 | ighbor_id | 1 | Using where | | 1 | SIMPLE | sra | ref | | idx_user_id | 123 | const | 1 | Using where | +----+-------------+-------+--------+ -----+------- -----------------------------------+---------+ ---- -+------+ -----+ Copy code

6. Conditional pushdown The conditions in which
external query conditions cannot be pushed down to complex views or sub-queries are as follows:

  • Aggregate subqueries;
  • Subquery with LIMIT;
  • UNION or UNION ALL subqueries;
  • Subqueries in the output field;

As in the following statement, it can be seen from the execution plan that its condition acts after the aggregate subquery:

SELECT * FROM (SELECT target, Count(*) FROM operation GROUP BY target) t WHERE target ='rm-xxxx' +----+-------------+------------+-------+--------- ------+-------------+---------+-------+------+---- ---------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------+--------- ------+-------------+---------+-------+------+---- ---------+ | 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 514 | const | 2 | Using where | | 2 | DERIVED | operation | index | idx_4 | idx_4 | 519 | NULL | 20 | Using index | +----+-------------+------------+-------+--------- ------+-------------+---------+-------+------+---- ---------+ Copy code

After confirming that the query conditions can be pushed down directly from the semantics, rewrite as follows:

SELECT target, Count(*) FROM operation WHERE target ='rm-xxxx' GROUP BY target Copy code

The execution plan becomes:

+----+-------------+-----------+------+----------- ----+-------+---------+-------+------+------------ --------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+------+----------- ----+-------+---------+-------+------+------------ --------+ | 1 | SIMPLE | operation | ref | idx_4 | idx_4 | 514 | const | 1 | Using where; Using index | +----+-------------+-----------+------+----------- ----+-------+---------+-------+------+------------ --------+ Copy code

7. Narrow the scope in advance
1. upload the initial SQL statement:

SELECT * FROM my_order o LEFT JOIN my_userinfo u ON o.uid = u.uid LEFT JOIN my_productinfo p ON o.pid = p.pid WHERE (o.display = 0) AND (o.ostaus = 1) ORDER BY o.selltime DESC LIMIT 0, 15 Copy code

The original intention of the SQL statement is: first make a series of left joins, and then sort the first 15 records. It can also be seen from the execution plan that the estimated number of sorted records in the last step is 900,000, and the time consumption is 12 seconds.

+----+-------------+-------+--------+------------- --+---------+---------+-----------------+--------+ -------------------------------------------------- --+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+------------- --+---------+---------+-----------------+--------+ -------------------------------------------------- --+ | 1 | SIMPLE | o | ALL | NULL | NULL | NULL | NULL | 909119 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL | | 1 | SIMPLE | p | ALL | PRIMARY | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) | +----+-------------+-------+--------+------------- --+---------+---------+-----------------+--------+ -------------------------------------------------- --+ Copy code

Since the final WHERE condition and sorting are for the leftmost main table, you can sort my_order to reduce the amount of data in advance before doing the left join. After SQL is rewritten as follows, the execution time is reduced to about 1 millisecond.

SELECT * FROM ( SELECT * FROM my_order o WHERE (o.display = 0) AND (o.ostaus = 1) ORDER BY o.selltime DESC LIMIT 0, 15 ) o LEFT JOIN my_userinfo u ON o.uid = u.uid LEFT JOIN my_productinfo p ON o.pid = p.pid ORDER BY o.selltime DESC limit 0, 15 Copy code

Check the execution plan again: After the subquery is materialized (select_type=DERIVED), it participates in the JOIN. Although the estimated row scan is still 900,000, after using the index and the LIMIT clause, the actual execution time becomes very small.

+----+-------------+------------+--------+-------- -------+---------+---------+-------+--------+----- -----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+--------+-------- -------+---------+---------+-------+--------+----- -----------------------------------------------+ | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 15 | Using temporary; Using filesort | | 1 | PRIMARY | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL | | 1 | PRIMARY | p | ALL | PRIMARY | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) | | 2 | DERIVED | o | index | NULL | idx_1 | 5 | NULL | 909112 | Using where | +----+-------------+------------+--------+-------- -------+---------+---------+-------+--------+----- -----------------------------------------------+ Copy code

8. Push down the intermediate result set
Let's look at the following example that has been initially optimized (the main table in the left join has priority on the query conditions):

SELECT a.*, c.allocated FROM ( SELECT resourceid FROM my_distribute d WHERE isdelete = 0 AND cusmanagercode = '1234567' ORDER BY salecode limit 20) a LEFT JOIN ( SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated FROM my_resources GROUP BY resourcesid) c ON a.resourceid = c.resourcesid Copy code

Are there any other problems with this statement? It is not difficult to see that the subquery c is a full-table aggregation query, which will cause the performance of the entire statement to decrease when the number of tables is particularly large.

In fact, for the subquery c, the final result set of the left join only cares about the data that can match the resourceid of the main table. So we can rewrite the statement as follows, and the execution time drops from the original 2 seconds to 2 milliseconds.

SELECT a.*, c.allocated FROM ( SELECT resourceid FROM my_distribute d WHERE isdelete = 0 AND cusmanagercode = '1234567' ORDER BY salecode limit 20) a LEFT JOIN ( SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated FROM my_resources r, ( SELECT resourceid FROM my_distribute d WHERE isdelete = 0 AND cusmanagercode = '1234567' ORDER BY salecode limit 20) a WHERE r.resourcesid = a.resourcesid GROUP BY resourcesid) c ON a.resourceid = c.resourcesid Copy code

But the subquery a appears many times in our SQL statement. This way of writing not only has extra overhead, but also makes the entire sentence apparently complicated. Use the WITH statement to rewrite again:

WITH a AS ( SELECT resourceid FROM my_distribute d WHERE isdelete = 0 AND cusmanagercode = '1234567' ORDER BY salecode limit 20) SELECT a.*, c.allocated FROM a LEFT JOIN ( SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated FROM my_resources r, a WHERE r.resourcesid = a.resourcesid GROUP BY resourcesid) c ON a.resourceid = c.resourcesid Copy code

Summary The
database compiler generates an execution plan, which determines the actual execution mode of SQL. But the compiler is only doing its best, and all database compilers are not perfect.

Most of the scenarios mentioned above also have performance problems in other databases. Understanding the characteristics of the database compiler can avoid its shortcomings and write high-performance SQL statements.

When designing data models and writing SQL statements, programmers must bring in the idea or consciousness of algorithms.

To write complex SQL statements, develop the habit of using the WITH statement. Concise and clear SQL statements can also reduce the burden on the database.