Floating-point calculation refers to arithmetic on floating-point numbers. It usually involves approximation or rounding, because many values and results cannot be represented exactly.
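The rounding described above is easy to observe. The following is a minimal sketch, assuming Python floats are binary double-precision values (true on common platforms, but an assumption here, since the text does not name a format). The tolerance-based comparison with `math.isclose` is one common workaround, not the only one.

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary representation, so each is
# rounded when stored, and the sum is rounded once more.
total = 0.1 + 0.2

print(total == 0.3)              # False: the rounded sum differs from the rounded 0.3
print(math.isclose(total, 0.3))  # True: equal within a small relative tolerance
print(f"{0.1:.20f}")             # shows the approximation actually stored for 0.1
```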
A floating-point number A is represented by two numbers, M and E: A = M × B^E. Any such system fixes a radix B (the base of the number system) and a precision P (the number of digits stored; these are bits when B = 2). The mantissa M is a P-digit number of the form ±d.ddd...ddd, where each digit is an integer between 0 and B - 1 inclusive. If the leading digit of M is nonzero, M is said to be normalized. Some descriptions use a separate sign bit s (+ or -) to represent the sign, in which case M must be nonnegative. E is the exponent.
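The definition above can be illustrated with a small sketch that writes a value in the normalized form d.ddd × B^E for a chosen radix and precision. The function name `normalize` and its parameters are hypothetical, introduced only for this example. It truncates the extra digits instead of rounding them, which is a simplification, and it uses exact rational arithmetic so the example itself adds no rounding error.

```python
from fractions import Fraction

def normalize(value, radix=10, precision=4):
    """Return (sign, digits, exponent) so that
    value ~= sign * d.ddd... * radix**exponent, with a nonzero leading digit.
    Extra digits are truncated, not rounded (a simplification)."""
    x = Fraction(value)
    if x == 0:
        return 1, [0] * precision, 0  # zero has no normalized form
    sign = -1 if x < 0 else 1
    x = abs(x)
    exponent = 0
    # Scale until 1 <= x < radix, so the leading digit is nonzero.
    while x >= radix:
        x /= radix
        exponent += 1
    while x < 1:
        x *= radix
        exponent -= 1
    digits = []
    for _ in range(precision):
        d = int(x)            # next digit, between 0 and radix - 1
        digits.append(d)
        x = (x - d) * radix   # shift the remaining fraction left by one digit
    return sign, digits, exponent

print(normalize("123.456", radix=10, precision=4))  # (1, [1, 2, 3, 4], 2)
print(normalize("0.15625", radix=2, precision=4))   # (1, [1, 0, 1, 0], -3)
```

In the first call, 123.456 becomes 1.234 × 10^2. In the second, 0.15625 becomes 1.010 (binary) × 2^-3, which happens to be exact.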
Inside a computer, a floating-point number is stored with the following structure:
Mantissa part (fixed-point fraction): number sign, mantissa m
Exponent part (fixed-point integer): exponent sign, exponent e
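The layout above (a sign and a mantissa, plus a sign and an exponent) can be mimicked in a short sketch. The helper name `split_fields` is hypothetical. It relies on the standard `math.frexp`, which returns a mantissa in the range 0.5 <= m < 1, that is, a pure fraction, together with a power-of-two exponent. Real hardware formats usually store the exponent differently (for example with a bias), so this is an illustration of the four fields only, not of any particular machine format.

```python
import math

def split_fields(x):
    """Split x into (number_sign, mantissa, exponent_sign, exponent)
    such that abs(x) == mantissa * 2**(+/- exponent)."""
    number_sign = "-" if math.copysign(1.0, x) < 0 else "+"
    mantissa, exponent = math.frexp(abs(x))  # 0.5 <= mantissa < 1 for nonzero x
    exponent_sign = "-" if exponent < 0 else "+"
    return number_sign, mantissa, exponent_sign, abs(exponent)

print(split_fields(-6.0))     # ('-', 0.75, '+', 3)   since 6 = 0.75 * 2**3
print(split_fields(0.15625))  # ('+', 0.625, '-', 2)  since 0.15625 = 0.625 * 2**-2
```

The original value can be rebuilt with `math.ldexp(mantissa, exponent)` after reapplying the two signs.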
An integer is a whole number, that is, a number written without a decimal point. Integer values can be specified in decimal, hexadecimal, or octal notation, optionally preceded by a sign (- or +).
In octal notation, the number must be preceded by 0 (zero); in hexadecimal notation, it must be preceded by 0x.
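The prefix rules above can be sketched as a small parser. The function name `parse_integer_literal` is hypothetical, and the leading-0 octal convention is the one described in the text; Python's own source syntax writes octal literals with a 0o prefix instead, which is why the prefix is handled by hand here.

```python
def parse_integer_literal(text):
    """Parse a signed decimal, octal (leading 0), or hexadecimal
    (leading 0x) integer literal, following the rules in the text."""
    s = text.strip()
    sign = 1
    if s[:1] in ("+", "-"):          # optional sign
        sign = -1 if s[0] == "-" else 1
        s = s[1:]
    if s[:2].lower() == "0x":        # hexadecimal: digits follow the 0x prefix
        return sign * int(s[2:], 16)
    if len(s) > 1 and s[0] == "0":   # octal: digits follow the leading 0
        return sign * int(s[1:], 8)
    return sign * int(s, 10)         # plain decimal

print(parse_integer_literal("123"))    # 123
print(parse_integer_literal("-123"))   # -123
print(parse_integer_literal("0123"))   # 83  (octal 123)
print(parse_integer_literal("0x1A"))   # 26  (hexadecimal 1A)
```

Malformed input, such as an 8 or 9 in an octal literal, raises `ValueError` from `int`, which seems a reasonable default for a sketch like this.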