Starting from Sentinel's main flow, this article analyzes how Sentinel collects traffic metrics and performs flow control.
First, let's look at a simple Sentinel demo. Just call SphU.entry to get an Entry, and call entry.exit after the business method completes.
SphU.entry calls Env.sph.entry, which wraps the resource name and the traffic direction (EntryType) into a StringResourceWrapper and then continues with the entry processing.
This leads into the entry method of CtSph and finally to entryWithPriority, which calls InternalContextUtil.internalEnter to initialize the ThreadLocal context, then calls lookProcessChain to initialize the responsibility chain, and finally calls chain.entry to hand the request to the chain.
InternalContextUtil.internalEnter calls the trueEnter method, which mainly creates the DefaultNode and puts it into contextNameNodeMap, and then creates the Context and sets it into contextHolder.
lookProcessChain has been optimized to support loading a custom responsibility chain builder through SPI. If none is defined, the default DefaultSlotChainBuilder is used. The slots loaded by default and their order can be seen in the architecture diagram and will not be repeated here.
Finally comes the main event: chain.entry enters the responsibility chain, and each processor is analyzed below in order.
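To make the chain idea concrete, here is a minimal sketch of a responsibility chain in which each slot does its own work and then fires the next one. The class names SketchSlot, NamedSlot and SketchChain are hypothetical stand-ins invented for illustration, not Sentinel's real classes, and the trace list exists only to show the call order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified slot; not Sentinel's real interface.
abstract class SketchSlot {
    SketchSlot next;

    // Each slot does its own work and decides when to hand over to the next slot.
    abstract void entry(String resource, List<String> trace);

    void fireEntry(String resource, List<String> trace) {
        if (next != null) {
            next.entry(resource, trace);
        }
    }
}

// A slot that only records its name before passing the request on.
class NamedSlot extends SketchSlot {
    private final String name;

    NamedSlot(String name) {
        this.name = name;
    }

    @Override
    void entry(String resource, List<String> trace) {
        trace.add(name);              // work done before the rest of the chain
        fireEntry(resource, trace);   // notify the next slot
    }
}

// A singly linked chain of slots, built with addLast.
class SketchChain {
    private SketchSlot first;
    private SketchSlot last;

    SketchChain addLast(SketchSlot slot) {
        if (first == null) {
            first = slot;
        } else {
            last.next = slot;
        }
        last = slot;
        return this;
    }

    // Runs the chain and returns the order in which slots were visited.
    List<String> entry(String resource) {
        List<String> trace = new ArrayList<>();
        if (first != null) {
            first.entry(resource, trace);
        }
        return trace;
    }
}
```

Because every slot decides where it calls fireEntry, a slot can act before the rest of the chain, after it, or both.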
NodeSelectorSlot comes first. It mainly gets the DefaultNode corresponding to the name and caches it, sets it as the current node of the context, and then notifies the next node.
The next node is ClusterBuilderSlot, which sets the ClusterNode and OriginNode for the DefaultNode and then notifies the next node.
The next node is LogSlot, which simply prints logs, so I won't go into details.
The next node is StatisticSlot, which is a post-processing node: it first notifies the next node, and only after the rest of the chain has run does it record the result:
1. If there is no error, it increases the thread count and the passed-request count of node, clusterNode, originNode and ENTRY_NODE.
2. If the error is PriorityWaitException, it only increases the thread count.
3. If the error is BlockException, it sets the error on the node and then increases the blocked-request count.
4. If any other error is reported, it just sets the error on the node.
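The four cases above can be sketched as follows. This is only an illustration under assumptions: SketchBlockException and SketchPriorityWaitException are hypothetical unchecked stand-ins for Sentinel's exceptions, the rest of the chain is modeled as a Runnable, and a single set of counters replaces the several nodes that are updated in reality.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins; unchecked here only to keep the sketch short.
class SketchBlockException extends RuntimeException {}
class SketchPriorityWaitException extends RuntimeException {}

class StatisticSketch {
    final AtomicInteger threads = new AtomicInteger();
    final AtomicLong passed = new AtomicLong();
    final AtomicLong blocked = new AtomicLong();
    volatile Throwable lastError;

    void entry(Runnable restOfChain) {
        try {
            restOfChain.run();            // let the rule-checking slots run first
            threads.incrementAndGet();    // case 1: no error, count thread and pass
            passed.incrementAndGet();
        } catch (SketchPriorityWaitException e) {
            threads.incrementAndGet();    // case 2: only the thread count grows
        } catch (SketchBlockException e) {
            lastError = e;                // case 3: record the error, count the block
            blocked.incrementAndGet();
            throw e;
        } catch (RuntimeException e) {
            lastError = e;                // case 4: any other error is only recorded
            throw e;
        }
    }
}
```

Running the statistics after the downstream slots means the counters reflect the final verdict of the rule checks rather than the raw number of attempts.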
Next we look at FlowSlot, the key node for flow control. On entering this node, checker.checkFlow is called to apply the flow rules.
In the checkFlow method of FlowRuleChecker, ruleProvider.apply is called to get the list of FlowRules for the resource, and then each FlowRule is traversed and canPassCheck is called to check it.
canPassCheck chooses cluster flow control or local flow control according to the rule's cluster mode. The two are analyzed separately here.
passLocalCheck is the entry point of local flow control. It first calls selectNodeByRequesterAndStrategy to select the node whose statistics will be checked, and then calls canPass for verification.
selectNodeByRequesterAndStrategy selects the node according to the following rules.
1. The strategy is STRATEGY_DIRECT.
1.1 When limitApp is neither other nor default, and is equal to origin, originNode is selected.
1.2 When limitApp is other, originNode is selected.
1.3 When limitApp is default, clusterNode is selected.
2. The strategy is STRATEGY_RELATE: the clusterNode of the related resource is selected.
3. The strategy is STRATEGY_CHAIN: the current node is selected.
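The selection rules above can be sketched as a small decision function. The constants, the Selected enum and the method signature are hypothetical and only mirror the list. As a simplification, the sketch treats "other" as always matching and returns NONE when a rule targets a different caller, which is an assumption about details the list does not spell out.

```java
class NodeSelectionSketch {
    // Hypothetical constants; the numeric values are for illustration only.
    static final int STRATEGY_DIRECT = 0;
    static final int STRATEGY_RELATE = 1;
    static final int STRATEGY_CHAIN = 2;
    static final String LIMIT_APP_DEFAULT = "default";
    static final String LIMIT_APP_OTHER = "other";

    enum Selected { ORIGIN_NODE, CLUSTER_NODE, RELATED_CLUSTER_NODE, CURRENT_NODE, NONE }

    static Selected select(String limitApp, int strategy, String origin) {
        boolean isDefault = LIMIT_APP_DEFAULT.equals(limitApp);
        boolean isOther = LIMIT_APP_OTHER.equals(limitApp);
        // A rule bound to a specific caller only applies to requests from that caller.
        if (!isDefault && !isOther && !limitApp.equals(origin)) {
            return Selected.NONE;
        }
        switch (strategy) {
            case STRATEGY_DIRECT:
                // default -> resource-wide statistics; otherwise per-caller statistics
                return isDefault ? Selected.CLUSTER_NODE : Selected.ORIGIN_NODE;
            case STRATEGY_RELATE:
                return Selected.RELATED_CLUSTER_NODE;
            case STRATEGY_CHAIN:
                return Selected.CURRENT_NODE;
            default:
                return Selected.NONE;
        }
    }
}
```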
After the node is selected, canPass is called to check the flow rule. At present Sentinel has three local flow control behaviors: default (direct rejection), uniform rate (queueing) and warm-up (cold start).
The default implementation is DefaultController. It checks whether the current thread count or QPS plus the number of tokens requested exceeds the limit. If the sum is less than or equal to the limit the request passes directly, otherwise it is blocked.
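The default check boils down to one comparison. A minimal sketch, where the class name and parameters are hypothetical and the caller is assumed to supply the currently observed thread count or QPS:

```java
class DefaultControllerSketch {
    private final double limit;

    DefaultControllerSketch(double limit) {
        this.limit = limit;
    }

    // current: thread count or pass QPS already observed; acquire: tokens requested now.
    boolean canPass(long current, int acquire) {
        return current + acquire <= limit;
    }
}
```

With a limit of 10, a request arriving when 9 are already counted passes, while one arriving at 10 is rejected.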
The uniform-rate implementation is RateLimiterController. It uses an AtomicLong to update latestPassedTime atomically, computes the wait time as latestPassedTime - currentTime, and sleeps for that long, so that requests pass at evenly spaced intervals.
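The pacing idea can be sketched as follows. This is a simplified illustration rather than the library's code: the class name, the maxQueueingMs parameter and the convention of returning -1 for rejection are assumptions, and the current time is passed in so that the sketch stays deterministic and does not actually sleep.

```java
import java.util.concurrent.atomic.AtomicLong;

class PacingSketch {
    private final double qps;
    private final long maxQueueingMs;
    private final AtomicLong latestPassedTime = new AtomicLong(-1);

    PacingSketch(double qps, long maxQueueingMs) {
        this.qps = qps;
        this.maxQueueingMs = maxQueueingMs;
    }

    // Returns how long the caller should sleep in ms, or -1 if the request is rejected.
    long waitMillis(long nowMs, int acquireCount) {
        // Time slice that this request occupies at the configured rate.
        long cost = Math.round(1.0 * acquireCount / qps * 1000);
        long expected = latestPassedTime.get() + cost;
        if (expected <= nowMs) {
            latestPassedTime.set(nowMs);   // the slot is already free, pass at once
            return 0;
        }
        // Atomically reserve the next slot, then see how far in the future it is.
        long scheduled = latestPassedTime.addAndGet(cost);
        long wait = scheduled - nowMs;
        if (wait > maxQueueingMs) {
            latestPassedTime.addAndGet(-cost);   // give the slot back
            return -1;
        }
        return wait;
    }
}
```

At 10 QPS each request occupies 100 ms, so three requests arriving at the same instant get waits of 0 ms, 100 ms and 200 ms, and the last one is rejected if the queueing limit is below 200 ms.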
The warm-up (cold start) implementation is WarmUpController, which is the hardest flow control behavior in Sentinel to understand. In fact, you can grasp the idea of cold start without paying too much attention to the complicated formulas:
1. When the QPS has reached the warm state, tokens are added and consumed normally.
2. When the QPS is in the cold state, tokens keep being added so that the algorithm stays cold.
3. When the QPS gradually rises above the cold-state boundary QPS, no more tokens are added. As the remaining tokens are slowly consumed, the number of requests that can pass per unit time gradually increases, so the algorithm keeps warming up.
To sum up, the more tokens remain in the token bucket, the fewer requests can pass per unit time, which achieves the cold-start effect.
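The relation between remaining tokens and allowed QPS can be sketched numerically. The formulas below assume a linear warm-up curve in which the interval between requests grows with the number of tokens above a warning line. The field names and exact formulas are assumptions made for illustration and should be checked against the real WarmUpController.

```java
class WarmUpSketch {
    final double count;       // target QPS once fully warmed up
    final int warningToken;   // below this line the system counts as warm
    final int maxToken;       // bucket capacity, the fully cold state
    final double slope;       // extra interval per token above the warning line

    WarmUpSketch(double count, int warmUpPeriodSec, int coldFactor) {
        this.count = count;
        this.warningToken = (int) (warmUpPeriodSec * count) / (coldFactor - 1);
        this.maxToken = warningToken
            + (int) (2 * warmUpPeriodSec * count / (1.0 + coldFactor));
        this.slope = (coldFactor - 1.0) / count / (maxToken - warningToken);
    }

    // More remaining tokens -> longer interval between requests -> lower allowed QPS.
    double allowedQps(long restTokens) {
        if (restTokens >= warningToken) {
            long above = restTokens - warningToken;
            return 1.0 / (above * slope + 1.0 / count);
        }
        return count;
    }
}
```

Under these assumptions, with count = 100, a 10 second warm-up and a cold factor of 3, a full bucket allows about count / coldFactor requests per second, and the allowed rate climbs to 100 as the bucket drains to the warning line.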
Next is cluster flow control. passClusterCheck is the entry point of cluster flow control. It calls clusterService to request the specified number of tokens for the flowId, and then, according to the result, decides whether the request passes, sleeps before passing, falls back to local flow control, or is blocked.
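The four outcomes can be sketched as a mapping from the token result to a decision. The Status and Decision enums are hypothetical and much reduced. Treating every status other than OK, SHOULD_WAIT and BLOCKED as a fallback to local flow control is an assumption of this sketch.

```java
class ClusterResultSketch {
    // Hypothetical, reduced set of result statuses from the token service.
    enum Status { OK, SHOULD_WAIT, BLOCKED, FAIL }

    enum Decision { PASS, PASS_AFTER_SLEEP, FALLBACK_TO_LOCAL, BLOCK }

    static Decision decide(Status status) {
        switch (status) {
            case OK:
                return Decision.PASS;
            case SHOULD_WAIT:
                return Decision.PASS_AFTER_SLEEP;   // sleep for the returned wait time first
            case BLOCKED:
                return Decision.BLOCK;
            default:
                // The token service could not give a usable answer.
                return Decision.FALLBACK_TO_LOCAL;
        }
    }
}
```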
Next, take a look at the processing on the clusterService side. It gets the corresponding FlowRule according to the ruleId, then calls ClusterFlowChecker.acquireClusterToken to get the result and returns it. ClusterFlowChecker.acquireClusterToken works in much the same way as default flow control, except that all requests of the cluster are handled by one service to achieve cluster-wide flow control, so I won't go into details.
The next node after FlowSlot is DegradeSlot, which is the circuit breaking processor. On entry it calls performChecking to get the list of circuit breakers, and then calls tryPass on each of them to check whether the breaker is open.
The tryPass method of AbstractCircuitBreaker mainly checks the state of the breaker. If it is closed, the request is let through directly. If it is open, it checks whether it is time to switch to half-open and tries to make the switch. If the switch succeeds the request is let through, otherwise it is blocked.
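The state check in tryPass can be sketched with an atomic state reference. The class name, the nextRetryTimeMs field and the open helper are hypothetical. The sketch assumes that a half-open breaker rejects further requests while its single probe request is in flight.

```java
import java.util.concurrent.atomic.AtomicReference;

class BreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    final AtomicReference<State> state = new AtomicReference<>(State.CLOSED);
    volatile long nextRetryTimeMs;   // earliest time an OPEN breaker may probe again

    boolean tryPass(long nowMs) {
        State s = state.get();
        if (s == State.CLOSED) {
            return true;
        }
        if (s == State.OPEN) {
            // Only the caller that wins the compare-and-set becomes the probe request.
            return nowMs >= nextRetryTimeMs
                && state.compareAndSet(State.OPEN, State.HALF_OPEN);
        }
        return false;   // HALF_OPEN: a probe is already in flight
    }

    void open(long nowMs, long recoveryTimeoutMs) {
        state.set(State.OPEN);
        nextRetryTimeMs = nowMs + recoveryTimeoutMs;
    }
}
```

Using compare-and-set for the open to half-open switch means that concurrent callers cannot all slip through as probes at the moment the retry time arrives.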
So how does the breaker change from closed to open? And how does half-open turn into closed or open? Sentinel has two circuit breakers: the error-based ExceptionCircuitBreaker and the response-time-based ResponseTimeCircuitBreaker. Both are analyzed below.
When a business method reports an error, Tracer.traceEntry is called to set the error on the Entry.
When entry.exit is called, the exit follows the responsibility chain to the exit method of DegradeSlot, which traverses the list of circuit breakers and calls onRequestComplete on each of them.
ExceptionCircuitBreaker's onRequestComplete records the number of errors and the total number of requests, and then calls handleStateChangeWhenThresholdExceeded to continue processing.
1. When the current state is open, the breaker should not switch state here, so it returns directly.
2. When the current state is half-open, it switches to closed if there is no error, otherwise it switches back to open.
3. When the current state is closed, it first checks whether the total number of requests has reached the minimum request amount. If so, it compares the error count or error ratio with the threshold, and switches directly to open if the threshold is exceeded.
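The three cases above can be sketched for the error-ratio variant. This is a deliberately simplified model: the class and field names are hypothetical, the counters are plain fields rather than a sliding time window, and the code is not thread-safe.

```java
class ErrorRatioBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    State state = State.CLOSED;
    long total;
    long errors;
    final int minRequestAmount;
    final double maxErrorRatio;

    ErrorRatioBreakerSketch(int minRequestAmount, double maxErrorRatio) {
        this.minRequestAmount = minRequestAmount;
        this.maxErrorRatio = maxErrorRatio;
    }

    void onRequestComplete(boolean failed) {
        total++;
        if (failed) {
            errors++;
        }
        if (state == State.OPEN) {
            return;                                         // case 1: nothing to switch
        }
        if (state == State.HALF_OPEN) {
            state = failed ? State.OPEN : State.CLOSED;     // case 2: the probe decides
            return;
        }
        if (total < minRequestAmount) {
            return;                                         // case 3: too few samples yet
        }
        if ((double) errors / total > maxErrorRatio) {
            state = State.OPEN;
        }
    }
}
```

The minimum request amount keeps a single early failure, which would be a 100% error ratio, from tripping the breaker.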
ResponseTimeCircuitBreaker's onRequestComplete records the number of slow responses and the total number of requests, and then calls handleStateChangeWhenThresholdExceeded to continue processing.
1. When the current state is open, the breaker should not switch state here, so it returns directly.
2. When the current state is half-open, it switches to closed if the current response time is below the threshold, otherwise it switches back to open.
3. When the current state is closed, it first checks whether the total number of requests has reached the minimum request amount. If so, it compares the slow-request count or slow-request ratio with the threshold, and switches directly to open if the threshold is exceeded.
Next we look at AuthoritySlot, which is the authority controller. It only checks whether the current origin is allowed to make the request. If not, it reports an error, so I won't elaborate.
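An origin check of this kind can be sketched as a membership test. The whitelist/blacklist switch, the treatment of an empty origin as allowed, and all names below are assumptions made for illustration.

```java
import java.util.Set;

class AuthoritySketch {
    // whiteList == true: only listed origins pass; false: listed origins are rejected.
    static boolean passes(String origin, Set<String> limitApps, boolean whiteList) {
        if (origin == null || origin.isEmpty()) {
            return true;   // assumption: requests without an origin are not checked
        }
        boolean listed = limitApps.contains(origin);
        return whiteList ? listed : !listed;
    }
}
```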
Finally, we look at SystemSlot, which is the adaptive system protection processor. It judges whether a request can pass based on the load of the system itself (QPS, maximum thread count, maximum response time, CPU usage, and system load with BBR), so as to keep the system in a safe state in which it can handle requests stably.
The BBR algorithm is especially worth mentioning. Borrowing from the design of TCP BBR, the author dynamically calculates the number of threads allowed to enter from the maximum QPS and the minimum response time, instead of using a crude fixed thread limit. Why can the number of threads be calculated from these two values? You can search online for analyses of the TCP BBR algorithm. It is very clever, and I won't go into details.
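The intuition can be sketched with Little's law: the number of requests in flight is roughly throughput multiplied by latency, so maximum QPS times minimum response time estimates how many threads the system can keep busy without queueing. The following is a minimal sketch of such a check. The method and parameter names are hypothetical, and always letting a single thread through is an assumption of the sketch.

```java
class BbrSketch {
    // maxSuccessQps: best recent throughput; minRtMs: best recent response time in ms.
    static boolean canEnter(int currentThreads, double maxSuccessQps, double minRtMs) {
        // Estimated concurrency the system can sustain: throughput * latency.
        double capacity = maxSuccessQps * minRtMs / 1000.0;
        return !(currentThreads > 1 && currentThreads > capacity);
    }
}
```

For example, at a best throughput of 100 QPS and a best response time of 20 ms, the estimated capacity is 2 concurrent threads, so a request arriving while 5 threads are active would be rejected.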