How to calculate md5 for very large files?
First of all, you at least don't need to read the whole file into memory. In PHP, for example, calling md5(file_get_contents(big_file_name)) on a huge file is really inappropriate. MD5 processes its input in blocks of 512 bits (64 bytes), so you can read part of the content at a time (at least 512 bits; a multiple of st_blksize is more appropriate), feed that chunk into the calculation, then read the next part and continue.
Simply put, MD5 is standardized and the specification provides a ready-made algorithm (RFC 1321, "The MD5 Message-Digest Algorithm"); we only need to translate it into C, Java, Python, JS and so on.
On the front end it is faster to compute the MD5 over one chunk of the content together with the file's name and update time. That is only meant to uniquely identify the file for resumable upload (breakpoint continuation), and it should be sufficient for the business logic. The open-source JS library spark-md5 is recommended; it supports appending pieces one after another and then computing the MD5. I used it to compute the MD5 on the front end in the resumable-upload feature I built.
The MD5 of TB-scale files on the major network disks probably works like this. Several people above said that a file's MD5 is computed block by block over the file stream, so a network disk would have to read the stream of the whole file to get the MD5 of a TB-scale file, which is inefficient and takes a long time. However, we have overlooked something: files are uploaded in blocks, and these uploaded segments are themselves the file stream. The time spent computing the MD5 can therefore be spread across the segments. Each time a segment is uploaded, the calculation advances a little, and when the upload is complete the MD5 of the file is complete too. So MD5 for TB-scale files is no longer a problem: once the upload finishes, the MD5 comes out naturally. I wonder whether anyone has other opinions on this guess.
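The chunk-by-chunk idea above can be sketched in Python with the standard hashlib module, whose hash objects accept data incrementally through update(). This is only a minimal sketch: the function names and the 1 MiB chunk size are my own choices for illustration, not something from the answers above.

```python
import hashlib
from typing import BinaryIO, Iterable

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def md5_of_stream(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a binary stream piece by piece, never holding the whole file in memory."""
    hasher = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        hasher.update(chunk)
    return hasher.hexdigest()


def md5_of_file(path: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Convenience wrapper: open the file in binary mode and hash it in chunks."""
    with open(path, "rb") as f:
        return md5_of_stream(f, chunk_size)


def md5_of_segments(segments: Iterable[bytes]) -> str:
    """Hash upload segments as they arrive; the digest is ready after the last one."""
    hasher = hashlib.md5()
    for segment in segments:
        hasher.update(segment)
    return hasher.hexdigest()
```

The result is the same as hashing the whole content in one call, as long as the pieces are fed in their original order. A server that receives segments out of order would have to buffer or reorder them before updating the hasher.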
Instant upload ("second transfer", where an upload finishes in seconds because the server already holds the same content) was mentioned just now. The basis of instant upload is that the front end computes the MD5 and sends it to the back end (more hash values may be needed). I have studied this for a long time, and the front end cannot finish the MD5 of a very large file in a few seconds. The MD5 of a file of any size can now be computed with the HTML5 API, but it takes a long time. I have no solution, and I haven't figured out how those network disks get the MD5 so quickly on the front end.
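For reference, the back-end half of instant upload can be sketched roughly as below. Everything here is hypothetical: the class name, the in-memory dictionary, and the use of the file size as the extra value sent next to the MD5 are assumptions for illustration, not how any particular network disk does it.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key: (MD5 hex digest, file size in bytes).
FileKey = Tuple[str, int]


class InstantUploadIndex:
    """Toy in-memory index of files the server already stores."""

    def __init__(self) -> None:
        self._stored: Dict[FileKey, str] = {}

    def register(self, md5_hex: str, size: int, object_id: str) -> None:
        """Record a fully uploaded file under its digest and size."""
        self._stored[(md5_hex.lower(), size)] = object_id

    def lookup(self, md5_hex: str, size: int) -> Optional[str]:
        """Return the stored object id if the content is already known, else None."""
        return self._stored.get((md5_hex.lower(), size))
```

If lookup() returns an object id, the server can link the existing object to the user's account and skip the transfer; if it returns None, a normal chunked upload follows. This does nothing about the cost of computing the MD5 on the front end, which is the open question above.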