MIPS16 is an optional instruction set extension announced in 1997 that can reduce the size of binary programs by 30-40%. Its implementers hope it will make CPUs more attractive where code size is a major concern, which usually means low-cost systems. Although it will be used only in certain implementations, it is a multi-vendor standard: LSI, NEC and Philips all produce CPUs that support MIPS16.
What makes MIPS binaries larger than those of other architectures is not that MIPS instructions do less work, but that they are bigger: each instruction is 4 bytes long, while instructions in some CISC architectures average about 3 bytes.
MIPS16 adds a mode in which the CPU decodes fixed-size 16-bit instructions. Most MIPS16 instructions expand to a single ordinary MIPS III instruction, so this is clearly going to be a rather limited subset of the instruction set. The trick is to make the subset encode enough of a program efficiently that the size of the whole program is greatly reduced.
Of course, 16-bit instructions do not make this a 16-bit instruction set. A MIPS16 CPU is a real MIPS CPU with 32-bit or 64-bit registers, and MIPS16 operations work on the whole of those registers.
MIPS16 is far from a complete instruction set; for example, it has neither CPU control instructions nor floating-point instructions. That does not matter, because every MIPS16 CPU must also run the complete MIPS ISA. You can mix MIPS16 and ordinary MIPS code, and any function call or jump-register instruction can change the running mode.
1. MIPS did not invent the idea of providing an alternative encoding in which some instructions are only half the size. Advanced RISC Machines (ARM) got there first with the Thumb version of its ARM CPU.
In MIPS16 it is convenient and efficient to encode the mode as the least significant bit (LSB) of an instruction address. MIPS16 instructions must be aligned on an even byte address, so bit 0 is no longer needed as part of the instruction pointer (program counter, PC). Instead, every jump to an odd address starts MIPS16 execution, and every jump to an even address returns to normal MIPS. The target address of the MIPS subroutine call instruction jal is always word-aligned, so a new instruction, jalx, carries the mode change within the instruction itself.
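The rule that bit 0 of a jump target selects the mode can be sketched in a few lines. This is a minimal illustration only: the function name `decode_jump_target` and the mode labels are hypothetical, and nothing here models a real CPU pipeline.

```python
def decode_jump_target(address):
    """Split a jump target into (mode, pc), treating bit 0 as the mode flag.

    Hypothetical helper for illustration; the labels "MIPS16" and "MIPS"
    are just names chosen for this sketch.
    """
    mode = "MIPS16" if address & 1 else "MIPS"
    pc = address & ~1  # bit 0 is not part of the program counter
    return mode, pc


# An odd target enters MIPS16 mode; an even target returns to normal MIPS.
print(decode_jump_target(0x1001))
print(decode_jump_target(0x2000))
```

Because the mode travels with the address, a function pointer or return address needs no separate flag to record which encoding its target uses.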
To squeeze the instructions to half size, most of them get only a 3-bit field to select a register, which allows free access to just 8 general registers. The 16-bit constant field found in many MIPS instructions is also compressed, usually to 5 bits. Many MIPS16 instructions specify only two registers instead of three. In addition, there are some special encodings, which are introduced in the next section.
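To see how these field widths add up to 16 bits, here is a small sketch that packs a 5-bit opcode, two 3-bit register selectors and a 5-bit immediate into one halfword. The field order and the name `pack_halfword` are assumptions made for illustration, not the official MIPS16 encoding tables.

```python
def pack_halfword(op, rx, ry, imm):
    """Pack op(5) | rx(3) | ry(3) | imm(5) into a 16-bit value.

    The field order is an assumption for this sketch; consult the MIPS16
    specification for the real instruction formats.
    """
    fields = (("op", op, 5), ("rx", rx, 3), ("ry", ry, 3), ("imm", imm, 5))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (op << 11) | (rx << 8) | (ry << 5) | imm


# 5 + 3 + 3 + 5 = 16 bits, so the largest field values fill the halfword.
print(hex(pack_halfword(0b11111, 0b111, 0b111, 0b11111)))
```

A register number of 8 or more, or an immediate of 32 or more, is rejected, which is exactly the restriction the text describes.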
D.1.1 Special encodings and instructions in MIPS16
The squeezed general instructions are all very well, but they have two specific weaknesses that add size back to the program: the 5-bit immediate field is too small for building constants, and load/store operations have too little address range. Three new kinds of instruction and one special convention help to solve these problems.
extend is a special MIPS16 instruction consisting of a 5-bit code and an 11-bit field. The 11-bit field is concatenated with the immediate field of the following instruction, so that the instruction pair can encode a 16-bit immediate. In assembly language it looks like an instruction prefix.
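The arithmetic behind the prefix is simply 11 + 5 = 16 bits. The sketch below joins the two fields by plain concatenation, with the extend field as the high bits; that ordering and the name `extended_immediate` are assumptions for illustration, and the real encoding may arrange the bits differently.

```python
def extended_immediate(extend_field, imm5):
    """Join an 11-bit extend field and a 5-bit immediate into 16 bits.

    Assumes simple concatenation (extend field in the high bits); this is
    an illustrative model, not the documented bit layout.
    """
    if not 0 <= extend_field < (1 << 11):
        raise ValueError("extend field must fit in 11 bits")
    if not 0 <= imm5 < (1 << 5):
        raise ValueError("immediate must fit in 5 bits")
    return (extend_field << 5) | imm5


# The largest values of both fields give the largest 16-bit immediate.
print(hex(extended_immediate(0x7FF, 0x1F)))
```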
Loading constants takes extra instructions in normal MIPS mode, and even more in MIPS16 mode. It is faster to put constants in memory and load them from there. MIPS16 adds support for loads relative to the instruction's own position (PC-relative loads), allowing constants to be embedded in the code segment (usually just before the start of a function). These are the only MIPS16 instructions that do not correspond exactly to a normal MIPS instruction: MIPS has no PC-relative data operations.
Many MIPS load/store operations are directed at the stack frame, and $29/sp is probably the most common base register. MIPS16 defines a set of instructions that use sp implicitly, allowing a function's stack frame references to be encoded without a separate register field.
MIPS load instructions always produce a full byte address. Because a load-word instruction is legal only when the address is a multiple of 4, the two least significant bits of its offset are wasted. MIPS16 load instructions are scaled: the address offset is shifted left according to the size of the object being loaded or stored, which increases the address range available from the instruction.
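A short calculation shows what scaling buys. The sketch assumes a 5-bit unsigned offset field, as mentioned earlier for typical immediates; the function names are hypothetical.

```python
def scaled_offset(field, size_bytes):
    """Byte offset produced by an offset field scaled by the object size."""
    if size_bytes not in (1, 2, 4, 8):
        raise ValueError("object size must be 1, 2, 4 or 8 bytes")
    shift = size_bytes.bit_length() - 1  # 1 -> 0, 2 -> 1, 4 -> 2, 8 -> 3
    return field << shift


def max_offset(field_bits, size_bytes):
    """Largest byte offset reachable with an unsigned field of field_bits."""
    return scaled_offset((1 << field_bits) - 1, size_bytes)


# With 5 bits, byte loads reach offset 31 but word loads reach offset 124.
for size in (1, 2, 4):
    print(size, max_offset(5, size))
```

The same 5 bits therefore cover four times as much memory for word accesses as they would if the offset were a plain byte count.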
As an additional escape mechanism, MIPS16 defines instructions that move data freely between any of the eight registers accessible to MIPS16 and any of the 32 MIPS general registers.
D.1.2 Evaluation of MIPS16
MIPS16 is not suitable for assembly language programming, and we are not going to describe it in detail; this is work for a compiler. Most programs compiled in MIPS16 mode shrink to 60-70% of their size when compiled in MIPS mode. That makes MIPS16 more compact than 32-bit CISC architectures, similar to ARM's Thumb code, and quite competitive with pure 16-bit CPUs.
But there is no free lunch: a MIPS16 program may contain 40-50% more instructions than its MIPS equivalent. This means that running the program through the CPU core will take 40-50% more clock cycles. But low-end CPUs are often limited mainly by memory, not by the CPU core. Smaller MIPS16 programs need less bandwidth to fetch instructions and produce lower cache miss rates. Where the cache is small and the program memory is narrow, MIPS16 will make up the gap and may well overtake normal MIPS code.
Because of the performance loss, MIPS16 code is not attractive in computers with large memory resources and wide buses. This is why it is only an optional extension.
At the other end of the application range, MIPS16 will compete with software compression techniques. A normal MIPS program compressed with a general-purpose file compression algorithm and stored in ROM will be smaller than the uncompressed MIPS16 equivalent, and only slightly larger than the compressed MIPS16 equivalent (note 1). If your system has enough memory to use the ROM as a file system and decompress code into RAM for execution, then full-ISA software decompression is likely to give better overall performance.
There is also a trend toward building systems in which a large part of the software that is not time-critical is written in a byte-coded interpreted language (Java or its successors). This intermediate code is very small, much more efficient in size terms than any binary machine code. If only the interpreter and a few performance-critical routines remain in the machine's ISA, then a dense instruction set encoding will affect only a small part of the program. Of course, the interpreter itself (especially for Java) can be very large, but the increasing complexity of applications will soon make that less important.
I predict that from 1998 to 2003 MIPS16 will be widely used in systems with low power consumption, small size and limited cost. It is still worth inventing, because some of these systems, such as "smart" mobile phones, may be produced in very large volumes.
1. A denser encoding has less redundancy for the compression algorithm to exploit.
D.2 MIPS V/MDMX
MIPS V and MDMX were announced together early in 1997. They were originally intended for a new MIPS/SGI CPU that was due for release in 1998. But that CPU was later cancelled, and their future is now in doubt.
Both are designed to overcome known shortcomings of traditional instruction sets when they are used for multimedia applications. Tasks such as voice encoding/decoding for soft modems, streaming media, or image/video compression/decompression use mathematical algorithms that in the past were run only on digital signal processors (DSPs). At the computational level, multimedia tasks usually consist of repeating the same operations over large vectors or arrays of data.
In register-based machines, the usual scheme is to pack several multimedia data items into one machine register and then execute register-to-register instructions that do the same work on each field of each register. This is a very obvious form of parallel processing, called single instruction, multiple data (SIMD).
The idea first appeared in a microprocessor in Intel's now-vanished i860 architecture (around 1988). SIMD became much more visible when it reappeared as MMX, an extension to the Intel x86 instruction set that came to market in 1996.
MDMX provides a set of operations for manipulating groups of 8 × 8-bit integers in 64-bit registers, doing the same thing to all 8 slices. The instructions include ordinary arithmetic (addition, subtraction, multiplication) and multiply-accumulate instructions that put their results into a very wide accumulator, which has enough precision to prevent overflow.
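The packed-register idea can be modelled in software. The sketch below treats a 64-bit integer as eight 8-bit slices, adds two such words slice by slice, and performs a multiply-accumulate into a list of wide per-slice accumulators. The function names are hypothetical, the plain add wraps modulo 256 for simplicity, and nothing here reflects actual MDMX mnemonics or accumulator widths.

```python
def unpack8(word):
    """Split a 64-bit word into eight 8-bit slices, lowest slice first."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def pack8(slices):
    """Pack eight values into a 64-bit word, keeping the low 8 bits of each."""
    word = 0
    for i, value in enumerate(slices):
        word |= (value & 0xFF) << (8 * i)
    return word


def add8(a, b):
    """Slice-wise add; each slice wraps modulo 256 in this simple model."""
    return pack8([x + y for x, y in zip(unpack8(a), unpack8(b))])


def mac8(acc, a, b):
    """Multiply-accumulate each slice into a wide per-slice accumulator."""
    return [s + x * y for s, x, y in zip(acc, unpack8(a), unpack8(b))]


print(hex(add8(0x0102030405060708, 0x0101010101010101)))
print(mac8([0] * 8, 0xFF, 0xFF)[0])  # 255 * 255 needs more than 8 bits
```

The product of two 8-bit values already needs 16 bits, which is why the accumulator has to be much wider than the slices it sums.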
Because these instructions are used where the specific data types are clearly separate from ordinary program variables, it makes sense for the MDMX instruction set to use the floating-point registers. Reusing existing registers in this way means that existing operating systems need no changes (the operating system already saves and restores the floating-point registers on a task switch).
Like MDMX, Intel's MMX provides "octibyte" instructions that operate on eight 8-bit numbers packed into a 64-bit word. MIPS MDMX also defines 4 × 16-bit (four short integers) and 2 × 32-bit (two words) formats, but early indications are that some MDMX implementations may decide that the octibyte formats and instructions are enough.
When arithmetic is performed on 8-bit numbers, the results often overflow or underflow. If we had to write handlers for many overflow test conditions, multimedia applications would gain no performance. It is more helpful for the machine simply to clamp overflowed and underflowed results to the largest and smallest representable values (255 and 0 for unsigned 8-bit numbers). This is called "saturating" arithmetic, and MDMX provides it.
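Saturation for unsigned 8-bit values amounts to clamping the result into the range 0 to 255. A minimal sketch, with hypothetical function names:

```python
def sat_add_u8(a, b):
    """Unsigned 8-bit add that clamps at 255 instead of wrapping."""
    return min(a + b, 255)


def sat_sub_u8(a, b):
    """Unsigned 8-bit subtract that clamps at 0 instead of wrapping."""
    return max(a - b, 0)


# A wrapping add would give 44 for 200 + 100; saturation gives 255.
print(sat_add_u8(200, 100), (200 + 100) & 0xFF)
print(sat_sub_u8(10, 20))
```

For pixel or audio data the clamped value is usually the sensible answer: an over-bright pixel stays white rather than wrapping round to nearly black.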
This brings us to MIPS V. Although the name suggests an upgraded instruction set in the line of MIPS I to IV, MIPS V is really the floating-point counterpart of MDMX, providing paired-single operations. A paired-single FP instruction performs the same operation twice, on a pair of single-precision numbers packed into one 64-bit floating-point register.
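The paired-single idea can be illustrated by packing two single-precision values into one 64-bit integer and operating on both halves. This is a software model only: the function names and the choice of which half is "upper" are assumptions, and the additions are done in double precision and then rounded to single when repacked.

```python
import struct


def pack_ps(upper, lower):
    """Pack two single-precision floats into one 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<ff", lower, upper))[0]


def unpack_ps(word):
    """Return the (upper, lower) single-precision halves of a 64-bit word."""
    lower, upper = struct.unpack("<ff", struct.pack("<Q", word))
    return upper, lower


def add_ps(a, b):
    """Add both halves of two paired-single words in one call."""
    a_upper, a_lower = unpack_ps(a)
    b_upper, b_lower = unpack_ps(b)
    return pack_ps(a_upper + b_upper, a_lower + b_lower)


result = add_ps(pack_ps(1.5, 2.0), pack_ps(0.5, 4.0))
print(unpack_ps(result))
```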
MIPS V is not as strange as MDMX. MIPS IV contains a fairly extensive set of floating-point operations, and MIPS V directly provides paired-single versions of most of them. Even paired comparisons are possible, because MIPS IV CPUs already have multiple floating-point condition bits to receive the results. However, MIPS V does not provide paired versions of complex multi-cycle instructions that would require a lot of new resources (there is no paired square root or division, for example).
D.2.1 Can the compiler use multimedia instructions?
The reason for introducing SIMD multimedia instructions is similar to the reason for providing vector processing units in supercomputers from the late 1970s onward. It is easy to build hand-coded matrix arithmetic packages for vector processors. It is much more difficult to compile a program written in a high-level language so that it uses vector operations, although supercomputer vendors have had some success in this area. Usually these successes are concentrated on Fortran: its weak semantics make Fortran a poor language for conventional programming, but they also make it an easy language to optimize.