1, which slows down the crawling speed and reduces the pressure on the target website, but it will reduce the amount of crawling per unit time.
2. Forged cookies. If you can access the page normally from the browser, you can copy cookies from the browser and use them.
3. Forge the user agent, that is, set the user agent in the request header as the user agent in the browser, and forge the browser access.
4. use proxy IP. After using proxy IP, web crawler can disguise its real IP.
For python web crawler, sometimes the business volume is heavy, and distributed crawler is the best way to improve the efficiency, but distributed crawler urgently needs a lot of IP resources, which can't be satisfied by free IP, and free proxy generally does not provide high anonymous proxy IP, so it is not recommended to use free proxy IP. In order to save the upfront cost, the use of free ip proxy will only lead to bitter fruit because of the poor quality of free ip, but it will not pay off. The use of proxy IP can effectively ensure the security of the network, and when the IP is blocked, there can be enough IP replacement to ensure normal work.