2. High selectivity (dispersion)
3. Short length
4. Cover the commonly queried columns as far as possible.
High selectivity (dispersion): in a table of about 1,000,000 users, gender splits roughly into 500,000 male and 500,000 female, so the selectivity of that column is very low.
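One way to put a number on this is the ratio of distinct values to total rows: a result near 1 means the column separates rows well, a result near 0 means it does not. A minimal sketch, assuming a hypothetical `users` table with `gender` and `email` columns (neither name comes from the text above):

```sql
-- Hypothetical table and column names, for illustration only.
-- Selectivity = distinct values / total rows (between 0 and 1).
SELECT
    COUNT(DISTINCT gender) / COUNT(*) AS gender_selectivity,
    COUNT(DISTINCT email)  / COUNT(*) AS email_selectivity
FROM users;
```

With only two gender values spread over hundreds of thousands of rows each, the first ratio is very close to 0, which is why such a column makes a poor index on its own.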
Index length directly affects the size of the index file and the speed of inserts, deletes and updates, and indirectly affects query speed (a longer index occupies more memory). In a composite index, the column with higher selectivity goes first.
For long columns that are queried frequently, a leading prefix of the value can be indexed instead of the whole column.
For example, the idiom column in an idiom table is about 4-20 characters long.
The index is built on the first N characters of each value, counted from the left.
1. The shorter the prefix, the more duplicates, the lower the selectivity and the worse the index performs.
2. The longer the prefix, the fewer duplicates, the higher the selectivity and the better the index performs; but the index is also longer, which slows inserts, deletes and updates and indirectly affects queries.
Therefore, a balance must be struck between selectivity and length.
Method: take prefixes of different lengths, measure the selectivity of each, and pick a suitable length.
select (select count(distinct left(`word`, 1)) from dict) / (select count(*) from dict);
select (select count(distinct left(`word`, 2)) from dict) / (select count(*) from dict);
select (select count(distinct left(`word`, 3)) from dict) / (select count(*) from dict);
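The checks above can also be run in a single scan, and once a length is chosen the prefix index is created with MySQL's `column(N)` key-part syntax. A sketch, assuming the `dict` table and `word` column from the queries above; the length 4 and the index name `idx_word_prefix` are placeholders, not a recommendation:

```sql
-- Compare prefix selectivity for several lengths in one pass.
SELECT
    COUNT(DISTINCT LEFT(word, 1)) / COUNT(*) AS sel_1,
    COUNT(DISTINCT LEFT(word, 2)) / COUNT(*) AS sel_2,
    COUNT(DISTINCT LEFT(word, 3)) / COUNT(*) AS sel_3,
    COUNT(DISTINCT LEFT(word, 4)) / COUNT(*) AS sel_4,
    COUNT(DISTINCT word)          / COUNT(*) AS sel_full  -- upper bound
FROM dict;

-- Index only the first 4 characters of word (placeholder length).
ALTER TABLE dict ADD INDEX idx_word_prefix (word(4));
```

A reasonable stopping point is the shortest length whose ratio is already close to `sel_full`; beyond that, extra characters add index size without much gain.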
Because InnoDB indexes are matched from the leftmost prefix, a pattern such as 'xxx%' can use the index, while '%xxx' cannot.
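The difference can be observed by comparing `EXPLAIN` output for the two patterns. A sketch, assuming `dict.word` carries an index; the search strings are arbitrary:

```sql
-- Anchored at the left: the optimizer can do a range scan on the index.
EXPLAIN SELECT * FROM dict WHERE word LIKE 'ab%';

-- Leading wildcard: there is no fixed left prefix to seek on,
-- so the B-tree cannot be used to narrow the search.
EXPLAIN SELECT * FROM dict WHERE word LIKE '%ab';
```

Typically the first plan reports a range access on the index and the second falls back to scanning every row, though the optimizer's exact choice depends on the data and the MySQL version.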
How should columns be indexed when the left prefix has low selectivity?
For example, URLs, which nearly all begin with the same prefix.
Tip 1:
Store the data reversed, such as moc.udib.www//:ptth
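A minimal sketch of the reversed-storage idea, assuming a hypothetical `pages` table; the table name, column name, example URL and prefix length 16 are all placeholders:

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE pages (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    url_rev VARCHAR(255) NOT NULL,       -- URL stored back to front
    KEY idx_url_rev (url_rev(16))        -- placeholder prefix length
);

-- Reverse on the way in, so the varying tail of the URL comes first.
INSERT INTO pages (url_rev) VALUES (REVERSE('http://www.example.com/a'));

-- Exact lookup: reverse the search value the same way.
SELECT REVERSE(url_rev) AS url
FROM pages
WHERE url_rev = REVERSE('http://www.example.com/a');

-- A suffix search on the URL becomes a left-anchored LIKE on url_rev.
SELECT REVERSE(url_rev) AS url
FROM pages
WHERE url_rev LIKE CONCAT(REVERSE('.com/a'), '%');
```

The trade-off is that every reader and writer has to remember the column is reversed; prefix searches on the original URL are no longer index-friendly.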
Tip 2:
Use a pseudo-hash.
Add a url_crc32 column.
Use the crc32 algorithm to convert the URL into an integer for storage, and look up by the crc32 value of the URL when querying.
crc32 is a hashing algorithm that maps a string to a 32-bit integer.
crc32 results can collide, but the probability is low, so query by the hash first and then filter the results.
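The pseudo-hash approach could look roughly like this. The `url_crc32` column follows the text above; the `links` table, the other column names and the example URL are assumptions made for illustration:

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE links (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    url       VARCHAR(255) NOT NULL,
    url_crc32 INT UNSIGNED NOT NULL,     -- CRC32() yields a 32-bit unsigned value
    KEY idx_url_crc32 (url_crc32)        -- short, fixed-width index
);

-- Compute the hash at write time.
INSERT INTO links (url, url_crc32)
VALUES ('http://www.example.com/a', CRC32('http://www.example.com/a'));

-- Narrow by the hash through the index, then compare the full URL
-- to discard rows that merely collide on CRC32.
SELECT id, url
FROM links
WHERE url_crc32 = CRC32('http://www.example.com/a')
  AND url = 'http://www.example.com/a';
```

The second condition in the `WHERE` clause is the "filter" step: it costs little because the hash lookup has already reduced the candidates to a handful of rows.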
Multicolumn index
Factors to consider: the query efficiency and selectivity of each column, combined with the specific business requirements.
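As a sketch of how these factors interact, assume a hypothetical `orders` table whose queries usually filter by `user_id` and sometimes also by `status`. Here `user_id` is both the more commonly used and the more selective column, so it goes first; the table, columns and values are invented for illustration:

```sql
-- Hypothetical table; column order follows query usage and selectivity.
ALTER TABLE orders ADD INDEX idx_user_status (user_id, status);

-- Both queries constrain the leftmost column, so they can use the index.
SELECT * FROM orders WHERE user_id = 42;
SELECT * FROM orders WHERE user_id = 42 AND status = 'paid';

-- user_id is not constrained here, so the index generally cannot be
-- used to seek directly to the matching rows.
SELECT * FROM orders WHERE status = 'paid';
```

If the business mostly ran the last query instead, the same two columns would call for a different order or a separate index, which is why selectivity alone does not settle the design.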