2. Floating-point numbers are a finite subset of the rational numbers, used in a computer to approximate arbitrary real numbers. Specifically, each value is obtained by multiplying an integer or fixed-point number (the mantissa, also called the significand) by an integer power of a radix (usually 2 in a computer), much like scientific notation with radix 10.
3. Floating-point calculation refers to any operation whose operands are floating-point numbers. Because the exact result usually cannot be represented in the format, it is normally approximated by rounding to a nearby representable value.
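
The two points above can be checked with a short Python sketch. It assumes that Python's float is an IEEE 754 double (radix 2, 53 significant bits), which holds on common platforms but is not stated in the text; the variable names are illustrative only.

```python
import math
from fractions import Fraction

# Assumption: float is an IEEE 754 double (radix 2, 53-bit mantissa).
x = 0.1
num, den = x.as_integer_ratio()   # exact rational value actually stored for 0.1
print(num, den)

# The denominator is a power of the radix 2, so the stored value is rational.
print(den & (den - 1) == 0)       # True

# The stored value is close to, but not equal to, the real number 1/10.
print(Fraction(num, den) == Fraction(1, 10))   # False

# Arithmetic rounds its exact result to a nearby representable value.
total = 0.1 + 0.2
print(total == 0.3)               # False
print(repr(total))                # 0.30000000000000004

# Comparing with a tolerance is the usual way to absorb the rounding error.
print(math.isclose(total, 0.3))   # True
```

The decimal fraction 1/10 has no finite expansion in radix 2, so both the stored constant and the sum are roundings rather than exact values.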
A floating-point number A is represented by two numbers M and E: A = M × B^E. Any such system is defined by a radix B (the base of the numeration system) and a precision P (the number of digits stored for the mantissa). M (the mantissa) is a P-digit number of the form d.ddd...ddd, where each digit d is an integer between 0 and B - 1 inclusive. If the leading digit of M is non-zero, M is said to be normalized. Some descriptions use a separate sign bit (S, standing for + or -) to represent the sign, in which case M must be non-negative. E is the exponent.
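
As a rough illustration of this representation, the sketch below splits a Python float into a sign, a normalized M, and E. It assumes B = 2 and P = 53 (an IEEE 754 double), which the text does not specify; the helper names decompose and mantissa_digits are hypothetical and exist only for this example.

```python
import math

def decompose(x):
    """Return (sign, M, E) with x == sign * M * 2**E and 1 <= M < 2.

    Assumes radix B = 2. Zero has no normalized form, so it is
    returned as M = 0.0, E = 0.
    """
    if math.isinf(x) or math.isnan(x):
        raise ValueError("only finite values can be decomposed")
    sign = -1 if math.copysign(1.0, x) < 0 else 1
    if x == 0:
        return sign, 0.0, 0
    m, e = math.frexp(abs(x))      # abs(x) == m * 2**e with 0.5 <= m < 1
    return sign, m * 2, e - 1      # rescale so the leading digit of M is 1

def mantissa_digits(m, p=53):
    """Render a normalized M as d.ddd...ddd using p binary digits."""
    bits = format(int(m * 2 ** (p - 1)), "b").zfill(p)
    return bits[0] + "." + bits[1:]

sign, m, e = decompose(-0.15625)
print(sign, m, e)                  # -1 1.25 -3
print(mantissa_digits(m)[:6])      # 1.0100
```

Here -0.15625 = -1.25 × 2^-3, and 1.25 is 1.01 in binary, so the leading digit of M is non-zero as the normalization rule requires.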