How to improve the query speed of MySQL database with millions of records
How can MySQL query speed be improved when processing millions of rows?

Recently, because of work requirements, I began looking into ways to optimize SELECT query statements in a MySQL database.

In a real project, I found that once a MySQL table reaches millions of rows, the efficiency of ordinary SQL queries drops off sharply, and if the WHERE clause contains many conditions, the query speed becomes simply unbearable. I once tested a conditional query on a table containing more than 4 million records (with an index), and the query took as long as 40 seconds; any user would go crazy waiting through that kind of delay. So knowing how to improve the efficiency of SQL statements is very important. Below are 30 optimization tips for SQL query statements that are widely circulated on the Internet:

1. Try to avoid using the != or <> operators in the WHERE clause, otherwise the engine will give up using the index and perform a full table scan.

2. To optimize queries, avoid full table scans as much as possible; first consider building indexes on the columns used in the WHERE clause and in ORDER BY.
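
For example, on a table t with columns num and createdate (the columns used in the examples below), the indexes could be created roughly like this; the index names are just placeholders:

create index idx_t_num on t (num);
create index idx_t_createdate on t (createdate);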

3. Try to avoid checking fields for NULL in the WHERE clause, otherwise the engine will give up using the index and perform a full table scan, for example:

select id from t where num is null

You can set the default value of num to 0, make sure the num column of the table contains no NULL values, and then query like this:

select id from t where num = 0

4. Try to avoid using OR to join conditions in the WHERE clause, otherwise the engine will give up using the index and perform a full table scan, for example:

select id from t where num = 10 or num = 20

You can query like this:

select id from t where num = 10
union all
select id from t where num = 20

5. The following query will also result in a full table scan (the index cannot be used when the like pattern starts with a percent sign):

select id from t where name like '%c%'

To improve efficiency, consider using full-text search instead.
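
As a rough sketch of what that can look like in MySQL (FULLTEXT indexes require MyISAM, or InnoDB in MySQL 5.6 and later; the index name and search term here are placeholders):

-- add a full-text index on name, then search with match ... against instead of like '%...%'
alter table t add fulltext index ft_t_name (name);
select id from t where match(name) against ('keyword');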

6. IN and NOT IN should also be used with caution, otherwise they may lead to a full table scan, for example:

select id from t where num in (1, 2, 3)

For continuous values, you can use between instead of in:

select id from t where num between 1 and 3

7. Using a parameter in the WHERE clause will also lead to a full table scan. SQL resolves local variables only at run time, but the optimizer cannot postpone the choice of access plan until run time; it must choose at compile time. However, if the access plan is built at compile time, the value of the variable is still unknown and therefore cannot be used as an input for index selection. The following statement will scan the entire table:

select id from t where num=@num

You can force the query to use an index:

select id from t with (index(index_name)) where num = @num
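
Note that with(index) is SQL Server syntax; in MySQL the rough equivalent is a FORCE INDEX hint (idx_t_num being the placeholder index name from the earlier example):

select id from t force index (idx_t_num) where num = @num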

8. Try to avoid performing expression operations on fields in the WHERE clause; this will cause the engine to give up using the index and perform a full table scan. For example:

select id from t where num / 2 = 100

It should read:

select id from t where num = 100 * 2

9. Try to avoid performing function operations on fields in the WHERE clause; this will cause the engine to give up using the index and perform a full table scan. For example:

select id from t where substring(name, 1, 3) = 'abc' -- ids whose name starts with 'abc'

select id from t where datediff(createdate, '2005-11-30') = 0 -- ids generated on '2005-11-30'

It should read:

Select the id from t, where the name is "abc%".

select id from t where createdate >= '2005-11-30' and createdate < '2005-12-1'

10. Do not perform functions, arithmetic operations, or other expressions on the left side of the "=" in the WHERE clause, otherwise the system may not be able to use the index correctly.

11. When using an indexed field as a condition, if the index is a composite index, the first field of the index must appear in the condition to guarantee that the system uses the index; otherwise the index will not be used. The order of the fields should also match the index order as much as possible.
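
A small sketch of this leftmost-prefix rule, using a made-up composite index on (num, name):

create index idx_t_num_name on t (num, name);

-- can use the composite index: the leading column num appears in the condition
select id from t where num = 10;
select id from t where num = 10 and name = 'abc';

-- cannot use the composite index: the leading column num is missing
select id from t where name = 'abc';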

12. Do not write meaningless queries. For example, if you need to generate an empty table structure:

select col1, col2 into #t from t where 1 = 0

This kind of code does not return any result set, but it consumes system resources. It should be changed to this:

create table #t(…)

13. In many cases, using EXISTS instead of IN is a good choice:

select num from a where num in (select num from b)

Replace with the following statement:

select num from a where exists (select 1 from b where num = a.num)

14. Not every index is effective for every query. SQL optimizes the query based on the data in the table; when an index column contains a lot of duplicate data, the query may not use the index at all. For example, if a table has a sex field whose values are roughly half male and half female, an index built on sex will contribute nothing to query efficiency.
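
In MySQL, a rough way to check this is to look at the column's selectivity (distinct values divided by total rows) or at the cardinality reported for the index; sex here is the hypothetical low-selectivity column from the example:

select count(distinct sex) / count(*) as selectivity from t;
show index from t;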

15. More indexes are not necessarily better. Although an index can improve the efficiency of the corresponding SELECT, it also reduces the efficiency of INSERT and UPDATE, because the indexes may have to be rebuilt when rows are inserted or updated. So how to build indexes needs careful consideration on a case-by-case basis. It is best for a table to have no more than 6 indexes; if there are more, consider whether indexes on rarely used columns are really necessary.

16. Avoid updating clustered index data columns as much as possible, because the order of the clustered index columns is the physical storage order of the table records; once a column value changes, the order of the whole table's records has to be adjusted, which consumes considerable resources. If the application needs to update clustered index columns frequently, consider whether the index should be built as a clustered index at all.

17. Use numeric fields whenever possible. If a field contains only numeric information, try not to design it as a character type; that reduces query and join performance and increases storage overhead. The engine compares each character of a string one by one when processing queries and joins, whereas a single comparison is enough for a numeric type.
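
A minimal illustration, assuming a purely numeric level attribute (the table and column names are made up):

-- prefer a numeric type for purely numeric information
create table t_good (id int primary key, level tinyint not null);
-- a character type makes comparisons slower and takes more space
create table t_bad (id int primary key, level varchar(10) not null);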

18. Use varchar/nvarchar instead of char/nchar wherever possible. First, variable-length fields take less storage space; second, for queries, searching within a smaller field is obviously more efficient.

19. Do not use select * from t anywhere; replace "*" with a specific list of fields, and do not return any fields you do not actually need.

20. Use table variables instead of temporary tables where possible. If a table variable contains a large amount of data, note that the indexes on it are very limited (only a primary key index).
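
A brief sketch in SQL Server syntax (which several of the tips above already assume, e.g. #t and @num); the @t variable and its columns are made up:

declare @t table (id int primary key, num int);
insert into @t (id, num)
select id, num from t where num < 100;
select id from @t where num = 10;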

21. Avoid creating and deleting temporary tables frequently, to reduce the consumption of system table resources.

22. Temporary tables are not forbidden; using them appropriately can make certain routines more efficient, for example when you need to repeatedly reference a data set from a large table or a frequently used table. For one-off operations, however, an export table is a better choice.

23. When creating a temporary table, if a large amount of data is inserted at once, you can use select into instead of create table to avoid generating a large amount of log and to improve speed; if the amount of data is small, to spare system table resources, create the table first and then insert into it.
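
The two variants might look roughly like this (SQL Server temporary-table syntax; the #t names and columns are placeholders):

-- large data volume: select into creates and fills the temporary table in one step
select id, num into #t_big from t where num > 100;

-- small data volume: create the table first, then insert
create table #t_small (id int, num int);
insert into #t_small (id, num)
select id, num from t where num > 100;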

24. If temporary tables are used, all of them must be explicitly dropped at the end of the stored procedure: truncate the table first and then drop it, to avoid long-term locking of system tables.
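
At the end of the stored procedure, that would look like this for a temporary table #t:

truncate table #t;
drop table #t;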

25. Try to avoid using cursors, because cursors are inefficient. If a cursor operates on more than 10,000 rows, you should consider rewriting it.

26. Before resorting to a cursor-based or temporary-table-based method, first look for a set-based solution to the problem; set-based methods are usually more efficient.

27. Like temporary tables, cursors are not forbidden. Using a FAST_FORWARD cursor on a small data set is often better than other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. Routines that include "totals" in the result set are usually faster than doing the same thing with a cursor. If development time permits, try both the cursor-based and the set-based approach and see which works better.
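
A rough skeleton of a FAST_FORWARD cursor in SQL Server syntax (the variables and the query it iterates over are made up):

declare @id int, @num int;
declare c cursor fast_forward for
    select id, num from t where num < 100;
open c;
fetch next from c into @id, @num;
while @@fetch_status = 0
begin
    -- row-by-row processing goes here
    fetch next from c into @id, @num;
end
close c;
deallocate c;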

28. Set SET NOCOUNT ON at the beginning of all stored procedures and triggers, and SET NOCOUNT OFF at the end. There is no need to send a DONE_IN_PROC message to the client after each statement of a stored procedure or trigger.
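
In SQL Server syntax the skeleton would look roughly like this (the procedure name and body are made up):

create procedure usp_example
as
begin
    set nocount on;
    -- procedure body ...
    select id from t where num = 10;
    set nocount off;
end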

29. Try to avoid returning large amounts of data to the client. If the volume of data is too large, consider whether the requirement itself is reasonable.

30. Try to avoid large transaction operations, in order to improve system concurrency.