As mentioned above, suppose the name field contains many terms, such as Kara, Sarah, Erin, Ada, Patty, Kate, and Selena. Stored in this order, looking up a specific term is slow: the terms are unsorted, so every one of them has to be checked to find the target. After sorting, the list becomes: Ada, Erin, Kara, Kate, Patty, Sarah, Selena.
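To make the difference concrete, here is a minimal Python sketch that compares a full scan over the unsorted names with a binary search over the sorted ones. The function names `linear_find` and `binary_find` are made up for this illustration and are not part of any search engine API.

```python
from bisect import bisect_left


def linear_find(terms, target):
    """Check every term in order; return (index, comparisons made)."""
    comparisons = 0
    for i, term in enumerate(terms):
        comparisons += 1
        if term == target:
            return i, comparisons
    return -1, comparisons


def binary_find(sorted_terms, target):
    """Binary search on a sorted list; return the index or -1."""
    i = bisect_left(sorted_terms, target)
    if i < len(sorted_terms) and sorted_terms[i] == target:
        return i
    return -1


unsorted_terms = ["Kara", "Sarah", "Erin", "Ada", "Patty", "Kate", "Selena"]
sorted_terms = sorted(unsorted_terms)

# The full scan has to look at all 7 terms to reach "Selena".
print(linear_find(unsorted_terms, "Selena"))
# Binary search needs only about log2(n) comparisons on the sorted list.
print(binary_find(sorted_terms, "Kate"))
```

With only seven terms the gain is small, but the full scan grows linearly with the number of terms while binary search grows logarithmically.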
With the terms sorted, binary search finds the target term faster than a full traversal. This sorted organization of terms is the term dictionary; with it, the target can be found with fewer comparisons and fewer disk reads. However, random disk reads are still very expensive, so it is worth reading the disk as little as possible, which means caching some data in memory. The whole term dictionary is too big to fit in memory, so there is also a term index, which is a bit like the index pages at the front of a printed dictionary. For example:
Terms starting with A ........................ page xxx
Terms starting with C ........................ page xxx
Terms starting with E ........................ page xxx
If every term consisted of English letters, the term index might really be just 26 entries, one per letter. In fact, terms are not limited to English letters; a term can be any byte array. In addition, the 26 letters do not necessarily have the same number of terms each: for example, there may be no terms starting with x, while there are many starting with s. The actual term index is a trie (prefix tree):
The example above is a trie containing "A", "to", "tea", "ted", "ten", "i", "in", and "inn". The tree does not contain every term, only some prefixes of terms. With the term index, you can quickly locate an offset in the term dictionary and then search sequentially from that position. With some compression techniques, the term index may be only about one tenth the size of all the terms, so the whole term index can be cached in memory. The overall effect is:
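The lookup path can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the term dictionary is a sorted in-memory list standing in for the on-disk structure, the index stores just the first two characters of each term, and the names `TrieNode`, `build_term_index`, and `lookup` are invented here rather than taken from Lucene or Elasticsearch.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        # Offset of the first term in the dictionary that has this prefix.
        self.offset = None


def build_term_index(term_dictionary, prefix_len=2):
    """Build a small trie over term prefixes only (assumes a sorted dictionary)."""
    root = TrieNode()
    for offset, term in enumerate(term_dictionary):
        node = root
        for ch in term[:prefix_len]:
            node = node.children.setdefault(ch, TrieNode())
            if node.offset is None:
                node.offset = offset
    return root


def lookup(root, term_dictionary, target, prefix_len=2):
    """Use the prefix trie to find a start offset, then scan sequentially."""
    node = root
    for ch in target[:prefix_len]:
        node = node.children.get(ch)
        if node is None:
            return -1  # no term shares this prefix
    start = node.offset if node.offset is not None else 0
    for i in range(start, len(term_dictionary)):
        term = term_dictionary[i]
        if term == target:
            return i
        if term > target:
            break  # sorted order: the target cannot appear later
    return -1


# Sorted list standing in for the on-disk term dictionary.
term_dictionary = ["Ada", "Erin", "Kara", "Kate", "Patty", "Sarah", "Selena"]
term_index = build_term_index(term_dictionary)

print(lookup(term_index, term_dictionary, "Kate"))
print(lookup(term_index, term_dictionary, "Sam"))
```

The trie holds only short prefixes, so it stays much smaller than the dictionary itself, and each lookup touches the dictionary only from the offset the trie points to.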
Going from the term index to the term dictionary and then to the inverted list, the process of querying results by keyword in a given field is now clear. Computing the intersection and union of the inverted lists of multiple keywords is also simple (the inverted index section describes how intersection and union are computed).
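As a rough illustration of why intersection and union are simple once each keyword has a sorted inverted list of document IDs, the sketch below uses a plain two-pointer merge. The document IDs are made up, and a real engine may well use more elaborate structures than this; the functions `intersect` and `union` are only for illustration.

```python
def intersect(a, b):
    """Document IDs present in both sorted lists (AND query)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a, b):
    """Document IDs present in either sorted list, without duplicates (OR query)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out


# Hypothetical inverted lists: term -> sorted document IDs.
inverted_lists = {
    "Kate": [1, 3, 5, 8],
    "Sarah": [2, 3, 8, 9],
}

print(intersect(inverted_lists["Kate"], inverted_lists["Sarah"]))
print(union(inverted_lists["Kate"], inverted_lists["Sarah"]))
```

Both operations run in a single pass over the two lists because the document IDs are kept sorted.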
Comparing this with how MySQL's B+ tree index works, we can see that: