Whether to use int type (self-increment or manual increment), GUID or combo.
Considering these problems is nothing more than considering efficiency and ease of use. The following are the advantages and disadvantages of four primary key generation methods:
Automatic growth field
Advantages: 1 Simple and easy to use.
Disadvantages: 1 Different databases get the current value in different ways;
2. It is difficult to apply data migration between multiple databases.
3. Unable to gather
Artificial growth field
Advantages: 1 You can get the latest key values.
2. It can ensure that there will be no key-value conflict in the process of data merging.
Disadvantage 1. Usually, a separate table is needed to store the current primary key value;
2. Add database access to obtain the current primary key value;
3. Considering concurrent conflicts, increasing system complexity.
4. unable to gather
Use GUID
Advantages: 1 Directly generate GUID, get the latest key value to fill the primary key, which is convenient to use;
2. It can ensure that there will be no key-value conflict in the process of data merging;
3. It avoids the overhead of obtaining the current key value by the first two methods.
Disadvantages 1. Occupy more storage space;
2. Indexing is time-consuming;
3. The efficiency of multi-table link query is not as good as that of int type.
Use the "comb" type
Advantages 1. Retain the existing advantages of GUID;
2. Combine time information with GUID to increase the order to improve the index efficiency.
Disadvantages 1. Need to design the generation algorithm of comb;
2. It takes up more storage space like GUID;
3. The efficiency of multi-table link query is not as good as int, but better than GUID.
From the comparison of the above table, it can be seen that the focus of the problem is to use plastic surgery with high efficiency but poor controllability and portability, or to use a character-type primary key with high controllability and portability but low efficiency and large storage capacity. It is really impossible to have both. COMB needs to design a generation algorithm to increase the complexity of the program. If the algorithm is improper, it will produce unexpected results. GUID can also improve performance by optimizing the index, so COMB is not used for the time being. )
From the point of view of database, although the efficiency of plastic query is the highest, there are great problems in data merging and transplantation. At the same time, under the condition of high concurrency, all kinds of plastic generation methods face this problem, which is not conducive to clustering processing. The character type generated by GUID can solve the problems of integration and concurrency, but it takes up a lot of space and has low query efficiency, which may become a problem after the system runs.
From the point of view of program development, it is very convenient to generate the primary key through plastic generation, but the primary key can only be obtained from the database after the whole transaction is completed. At the same time, when saving multiple related tables, it is necessary to save the main table first and pass the generated primary key to the word table, which will also lead to the loss of performance and the inability to obtain the primary key directly, which will increase the complexity of program development and processing. Character-based primary keys need programmers to define the rules for generating primary keys, and need to intervene in the generation of primary keys, but the primary keys can be obtained before being inserted into the database, which is convenient for program processing.
From the point of view of system data, there may be a lot of concurrency in business data, so it is very convenient to use GUID, and it is easy to cluster in the case of large data order. Archive data has a small amount of concurrency, but a large number of citations, and there are many cases of data consolidation. It is not appropriate to use shaping completely, and using GUID completely will lead to the loss of performance. A more compromise scheme is needed, which not only ensures the use of strings with strong controllability and unique identification, but also minimizes the number of bytes occupied by strings. As for the auxiliary data of the system, it should be used flexibly according to the actual situation, not rigidly unified, and should be shaped as much as possible in the case of a small amount of data.