Verilog programming skills
FPGA/CPLD

Verilog design ideas and techniques are a very large topic. Owing to space limitations, this article introduces only some commonly used design ideas and techniques, including the ping-pong operation, serial-to-parallel conversion, pipelined operation, and methods for synchronizing data interfaces.

I hope this article catches the attention of engineers. If we consciously use these principles to guide future design work, we can achieve twice the result with half the effort!

Ping-pong operation

The "ping-pong operation" is a processing technique often applied to data flow control. The typical ping-pong operation is shown in Figure 1.

The processing flow of the ping-pong operation is as follows. The input data stream is routed by the "input data selection unit" to two data buffers in alternating periods. The data buffer modules can be any storage element; commonly used units are dual-port RAM (DPRAM), single-port RAM (SPRAM), FIFOs, and so on. In the first buffering period, the input data stream is buffered into "data buffer module 1". In the second buffering period, the "input data selection unit" switches over, so the input data stream is buffered into "data buffer module 2", while the first period's data held in "data buffer module 1" is selected by the "output data selection unit" and sent to the "data stream processing module" for processing. In the third buffering period, the "input data selection unit" switches again, buffering the input data stream back into "data buffer module 1", while the second period's data held in "data buffer module 2" is switched through and sent to the "data stream processing module" for processing. This cycle repeats indefinitely.

The most notable feature of the ping-pong operation is that, by switching the "input data selection unit" and the "output data selection unit" in step and in cooperation with each other, the buffered data stream is sent to the "data stream processing module" for processing without any pause. Taking the ping-pong module as a whole and looking at the data from either end, the input data stream and the output data stream are both continuous, with no interruption, which makes the ping-pong operation very well suited to pipelined processing of data streams. It is therefore often applied in pipelined algorithms to achieve seamless buffering and processing of data.
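As a minimal Verilog sketch of this switching scheme (the buffer length, data width, and all signal names are illustrative assumptions, not taken from the figures):

```verilog
// Illustrative ping-pong buffer: a select flag toggles at the end of each
// buffer period, steering writes into one buffer while the other is read.
// All names and parameters here are examples, not from the original text.
module pingpong #(
    parameter DW      = 8,    // data width
    parameter BUF_LEN = 16    // samples per buffering period
) (
    input               clk,
    input               rst_n,
    input  [DW-1:0]     din,  // continuous input data stream
    output [DW-1:0]     dout  // continuous output to the processing module
);
    reg [DW-1:0] buf0 [0:BUF_LEN-1];
    reg [DW-1:0] buf1 [0:BUF_LEN-1];
    reg [$clog2(BUF_LEN)-1:0] addr;
    reg sel;                  // 0: write buf0 / read buf1; 1: the reverse

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            addr <= 0;
            sel  <= 1'b0;
        end else if (addr == BUF_LEN-1) begin
            addr <= 0;
            sel  <= ~sel;     // swap the two buffers every BUF_LEN cycles
        end else begin
            addr <= addr + 1'b1;
        end
    end

    // "input data selection unit": write the incoming stream into one buffer
    always @(posedge clk) begin
        if (sel) buf1[addr] <= din;
        else     buf0[addr] <= din;
    end

    // "output data selection unit": read the other buffer for processing
    assign dout = sel ? buf0[addr] : buf1[addr];
endmodule
```

Note that the output is undefined during the very first buffering period, before the read-side buffer has been filled; a real design would gate the output with a valid flag.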

The second advantage of the ping-pong operation is that it can save buffer space. For example, in a WCDMA baseband application, one frame consists of 15 time slots, and sometimes the data of an entire frame must be delayed by one time slot before post-processing. A straightforward method is to buffer the whole frame and then process it one slot later, which requires a buffer as long as a full frame of data: assuming a data rate of 3.84 Mbps and a frame length of 10 ms, the required buffer length is 38,400 bits. If the ping-pong operation is used instead, it is only necessary to define two RAMs, each able to buffer one time slot of data. While data is being written into one RAM, data is read from the other RAM and sent to the processing unit. In that case the capacity of each RAM is only 2,560 bits, and the total capacity of the two RAMs is only 5,120 bits.
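The buffer savings follow directly from the figures above:

```latex
% One frame at 3.84 Mbps over 10 ms:
3.84\,\text{Mbps} \times 10\,\text{ms} = 38{,}400\ \text{bits (full-frame buffer)}

% One of the 15 slots, and the two ping-pong RAMs together:
38{,}400 / 15 = 2{,}560\ \text{bits}, \qquad 2 \times 2{,}560 = 5{,}120\ \text{bits}
```

The ping-pong scheme thus needs roughly one seventh of the full-frame buffer.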

In addition, clever use of the ping-pong operation can achieve the effect of processing a high-speed data stream with low-speed modules. As shown in Figure 2, the data buffer modules are implemented with dual-port RAM (DPRAM), and a first-level data preprocessing module is added after each DPRAM. These preprocessing modules can perform whatever data operations are needed, for example despreading, descrambling, and derotation of the input data stream in a WCDMA design. Suppose the input data stream at port A has a rate of 100 Mbps and the buffering period of the ping-pong operation is 10 ms. The data rates at the various node ports are analyzed below.

The input data stream at port A runs at 100 Mbps. During the first buffering period it reaches DPRAM1 through the "input data selection unit" via B1, so the data rate at B1 is also 100 Mbps, and DPRAM1 must write 1 Mb of data within 10 ms. Similarly, during the second 10 ms the data stream is switched to DPRAM2 via port B2, and DPRAM2 writes 1 Mb of data. During the third 10 ms the data stream switches back to DPRAM1, which again writes 1 Mb of data.

Careful analysis shows that DPRAM1 has 20 ms in which to read out its data and send it to "data preprocessing module 1", ending in the third buffering period. Some engineers are puzzled about why DPRAM1's read time is 20 ms; it is obtained as follows. First, DPRAM1 can be read during the entire 10 ms of the second buffering period, while data is being written to DPRAM2. In addition, starting from the 5th ms of the first buffering period (absolute time 5 ms), DPRAM1 can read data from address 0 while it is still writing to the addresses above 500 Kb; by the 10 ms mark, DPRAM1 has just finished writing its 1 Mb of data and has read out 500 Kb. Likewise, in the third buffering period, reading can continue for another 5 ms (up to the 5th ms of that period, absolute time 25 ms), during which the remaining 500 Kb is read out while new data is written starting from address 0. Therefore, before the data stored in DPRAM1 during the first period is completely overwritten, DPRAM1 can be read for at most 20 ms, and since 1 Mb of data must be read out, the data rate at port C1 is 1 Mb / 20 ms = 50 Mbps. The minimum data throughput required of "data preprocessing module 1" is therefore only 50 Mbps, and likewise the minimum throughput of "data preprocessing module 2" is only 50 Mbps. In other words, through the ping-pong operation, the timing pressure on the "data preprocessing modules" is reduced: the required data processing rate is only 1/2 of the input data rate.
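The 20 ms read window and the resulting port rate can be checked as follows:

```latex
% Read window for DPRAM1's first-period data:
5\,\text{ms (tail of period 1)} + 10\,\text{ms (period 2)} + 5\,\text{ms (period 3)}
  = 20\,\text{ms}

% Rate needed at port C1 to drain 1 Mb within that window:
1\,\text{Mb} \,/\, 20\,\text{ms} = 50\,\text{Mbps} = \tfrac{1}{2}\times 100\,\text{Mbps}
```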

The essence of processing high-speed data with low-speed modules through the ping-pong operation is that the DPRAMs realize a serial-to-parallel conversion of the data stream, and the split data is processed in parallel by "data preprocessing module 1" and "data preprocessing module 2". This is an embodiment of the principle of trading area for speed!

Serial-to-parallel conversion

Serial-to-parallel conversion is an important technique in FPGA design. It is a common means of data stream processing and a direct embodiment of the idea of trading area for speed. Serial-to-parallel conversion can be implemented in many ways; depending on the required data ordering and quantity, registers, RAM, and so on can be chosen. In the ping-pong example above, the serial-to-parallel conversion of the data stream was realized with DPRAM, which allows a very large data buffer to be opened. For designs involving smaller amounts of data, registers can complete the serial-to-parallel conversion. Unless there are special requirements, synchronous timing design should be used to complete the conversion between serial and parallel. For serial-to-parallel conversion with the data arriving most significant bit first, the following code can be used:

prl_temp <= {prl_temp, srl_in};

Here prl_temp is the parallel output buffer register and srl_in is the serial data input. For serial-to-parallel conversion with a specified ordering, a case statement can be used to select the bit positions; for complex serial-to-parallel conversions, a state machine can be used. The methods of serial-to-parallel conversion are relatively simple and need not be described in detail here.
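A complete version of this idea might look like the sketch below, where a counter marks when a full word has been shifted in (the module name, widths, and valid strobe are assumptions added for illustration):

```verilog
// Illustrative MSB-first serial-to-parallel converter built around the
// shift-register assignment from the text. All names are examples.
module s2p #(
    parameter WIDTH = 8
) (
    input                  clk,
    input                  rst_n,
    input                  srl_in,   // serial input, MSB first
    output reg [WIDTH-1:0] prl_out,  // assembled parallel word
    output reg             prl_vld   // one-cycle strobe: prl_out is valid
);
    reg [WIDTH-1:0]         prl_temp;
    reg [$clog2(WIDTH)-1:0] cnt;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            prl_temp <= 0;
            cnt      <= 0;
            prl_vld  <= 1'b0;
        end else begin
            prl_temp <= {prl_temp[WIDTH-2:0], srl_in}; // shift in, MSB first
            if (cnt == WIDTH-1) begin
                cnt     <= 0;
                prl_out <= {prl_temp[WIDTH-2:0], srl_in}; // latch full word
                prl_vld <= 1'b1;
            end else begin
                cnt     <= cnt + 1'b1;
                prl_vld <= 1'b0;
            end
        end
    end
endmodule
```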

Design concept of pipeline operation

First, it should be stated that the pipelining discussed here refers to a design idea for processing flows and sequential operations, not the "Pipelining" option used to optimize timing in FPGA and ASIC design tools.

Pipeline processing is a common design method in high-speed design. If the processing flow of a design is divided into several steps, and the whole data processing is "one-way", that is, there is no feedback or iterative operation, and the output of the previous step is the input of the next step, then the pipeline design method can be considered to improve the working frequency of the system.

The structural schematic of a pipeline design is shown in Figure 3. Its basic structure is as follows: n suitably divided operation steps are connected in series in one direction. The biggest feature and requirement of pipelined operation is that the data flows through every step continuously. If each operation step is simplified to a D flip-flop (that is, the data is simply registered once per step), then pipelined operation resembles a shift register chain, with the data stream flowing through the D flip-flops in turn to complete each processing step. The timing of a pipeline design is shown in Figure 4.

A key to pipeline design lies in arranging the timing of the whole design sensibly, which requires a reasonable division of the operation steps. If the processing time of the preceding stage exactly equals that of the following stage, the design is simplest: the output of the preceding stage can be fed directly into the input of the following stage. If the processing time of the preceding stage is longer than that of the following stage, the output data of the preceding stage must be suitably buffered before being fed to the following stage's input. If the processing time of the preceding stage is shorter than that of the following stage, the data stream must be fanned out with duplicated logic, or the preceding stage's data must be stored and processed afterwards; otherwise the following stage's data will overflow.
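For illustration, here is a trivial three-stage pipeline in which each step is one registered operation, so all three stages work on different data in the same clock cycle (the operations themselves are invented for the sketch):

```verilog
// Illustrative 3-stage pipeline: each stage performs one step and registers
// its result. The multiply/add operations are made-up example steps.
module pipe3 #(
    parameter DW = 8
) (
    input                 clk,
    input      [DW-1:0]   a,
    input      [DW-1:0]   b,
    output reg [2*DW-1:0] y
);
    reg [2*DW-1:0] s1_prod;  // stage 1 result
    reg [2*DW-1:0] s2_sum;   // stage 2 result

    always @(posedge clk) begin
        s1_prod <= a * b;          // stage 1: multiply
        s2_sum  <= s1_prod + 1'b1; // stage 2: example add
        y       <= s2_sum;         // stage 3: output register
    end
endmodule
```

Because each stage's combinational logic sits between two registers, the maximum clock frequency is set by the slowest single stage rather than by the whole computation.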

Pipeline processing is often used in WCDMA designs, for example in the RAKE receiver, the searcher, and preamble acquisition. The reason the pipelined approach achieves a high operating frequency is that processing modules are duplicated; this is another concrete embodiment of the idea of trading area for speed.

Synchronization method of data interface

Synchronization of data interface is a common problem in FPGA/CPLD design, and it is also a key and difficult point. Many unstable designs are caused by synchronization problems of data interfaces.

In the schematic design era, some engineers manually inserted buffers or inverters (such as BUFT) to adjust the data delay, so as to satisfy the setup and hold times that the current module's clock requires of the upstream module's data. Others generated many clock signals offset from one another by 90 degrees, clocking the data sometimes on rising edges and sometimes on falling edges in order to adjust the sampling position and obtain stable sampling. Both methods are highly undesirable, because as soon as the chip is upgraded or the design is ported to another device, the delay adjustment and sampling scheme must be redesigned. Moreover, both methods leave the circuit with insufficient margin: once external conditions change (for example, a rise in temperature), the sampling timing may be thrown off completely, causing the circuit to fail.

The following briefly introduces the synchronization methods of data interfaces in several different situations:

1. How is data synchronization completed when the input/output delay (chip-to-chip, PCB routing, delays of driver or interface components, etc.) is unpredictable or may change?

If the data delay is unpredictable or may change, a synchronization mechanism must be established; a synchronization enable or synchronization indication signal can be used. In addition, data synchronization can be achieved by buffering the data through RAM or a FIFO.

The method of buffering the data in RAM or a FIFO is as follows: write the data into the RAM or FIFO using the data channel clock provided by the upstream chip as the write clock, and then read the data out using the current stage's sampling clock (usually the main clock for data processing). The key to this method is writing the data into the RAM or FIFO reliably. If synchronous RAM or a synchronous FIFO is used, there must be an indication signal with a fixed delay relative to the data; this signal can be a data-valid indication, or the clock with which the upstream module launches the data. For slow data, asynchronous RAM or an asynchronous FIFO can also be used, but this is not recommended.

Data arranged in a fixed format, with much of the important information at the start of the data, is very common in communication systems, where large amounts of data are organized into "frames". Because the whole system has strict clock requirements, a dedicated clock board is often designed to generate and drive a high-precision clock. The data has a defined starting position; how are the data synchronized and the "header" of the data found?

The data synchronization itself can simply use the methods above: a synchronization indication signal, or RAM/FIFO buffering. There are two ways to find the data header. The first is very simple: transmit a signal indicating the starting position of the data alongside the data. In some systems, especially asynchronous systems, a synchronization code (such as a training sequence) is often inserted into the data; once the receiver detects the synchronization code with a state machine, it has found the "header" of the data. This practice is called "blind detection".
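A minimal sketch of such blind detection might shift incoming bits into a window register and compare against the known pattern (SYNC_WORD, widths, and names are made-up values for illustration):

```verilog
// Illustrative "blind detection" of a sync code: serial bits are shifted
// into a window register and compared against a known pattern each cycle.
module sync_detect #(
    parameter SW = 8,
    parameter [7:0] SYNC_WORD = 8'hA5  // example pattern, not from the text
) (
    input      clk,
    input      rst_n,
    input      din,     // serial data, one bit per clock
    output reg found    // pulses high when the sync code is seen
);
    reg [SW-1:0] window;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            window <= 0;
            found  <= 1'b0;
        end else begin
            window <= {window[SW-2:0], din};
            // compare the window including the newest bit
            found  <= ({window[SW-2:0], din} == SYNC_WORD);
        end
    end
endmodule
```

A real receiver would typically follow this with a state machine that confirms the code over several frames before declaring lock.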

The upstream data and the current stage's clock may be asynchronous; that is, the clock of the upstream chip or module and the clock of the current chip or module belong to asynchronous clock domains.

A principle for synchronizing input data was briefly introduced above: if the input data's timing is synchronous with the current chip's processing clock, the main clock of the current chip can directly register the input data to complete synchronization. If the input data is not synchronous with the processing clock, especially when the frequencies do not match, the input data can only be synchronized by sampling it twice with the processing clock. Note that the purpose of sampling data from an asynchronous clock domain twice with registers is to effectively prevent the propagation of metastability (an indeterminate data state), so that subsequent circuits process data at valid levels. However, this method cannot guarantee that the data sampled by the two registers is at the correct level, and it will generally produce a certain amount of erroneous level data. It is therefore suitable only for functional units that are insensitive to a small number of errors.
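The two-register sampling described above is the standard double-flop synchronizer; a minimal Verilog sketch (signal names are illustrative):

```verilog
// Two-stage register synchronizer: sampling an asynchronous input twice
// confines any metastability to the first flip-flop, so downstream logic
// sees a stable (though possibly delayed) level.
module sync2 (
    input      clk,       // destination-domain clock
    input      rst_n,
    input      async_in,  // signal from the asynchronous clock domain
    output reg sync_out
);
    reg meta;             // first stage: may go metastable

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            meta     <= 1'b0;
            sync_out <= 1'b0;
        end else begin
            meta     <= async_in;  // first sample
            sync_out <= meta;      // second sample
        end
    end
endmodule
```

This only works for single-bit signals; multi-bit buses crossing domains need the RAM/FIFO approach described next, since individual bits could otherwise settle on different cycles.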

To avoid sampling wrong levels across asynchronous clock domains, RAM or FIFO buffering is generally used to complete the data transfer between asynchronous clock domains. The most commonly used buffer is DPRAM: the upstream clock writes data at the input port, and the current clock reads data at the output port, conveniently completing the data exchange between asynchronous clock domains.
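A bare-bones sketch of such a dual-clock DPRAM buffer follows; the depth, widths, and the assumption that address generation and flow control are handled elsewhere are all simplifications:

```verilog
// Illustrative dual-clock buffer in the style the text describes: the
// upstream clock writes one port of a dual-port RAM, the local clock reads
// the other port. Names and parameters are examples.
module cdc_dpram #(
    parameter DW = 8,
    parameter AW = 4
) (
    input               wr_clk,   // upstream (writing) clock domain
    input               wr_en,
    input  [AW-1:0]     wr_addr,
    input  [DW-1:0]     wr_data,
    input               rd_clk,   // local (reading) clock domain
    input  [AW-1:0]     rd_addr,
    output reg [DW-1:0] rd_data
);
    reg [DW-1:0] mem [0:(1<<AW)-1];

    always @(posedge wr_clk)
        if (wr_en) mem[wr_addr] <= wr_data;

    always @(posedge rd_clk)
        rd_data <= mem[rd_addr];
endmodule
```

As the text notes, reliable operation still depends on an indication signal (or synchronized address pointers, as in an asynchronous FIFO) so that the read side never consumes locations the write side has not yet filled.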

2. Do you need to add constraints when designing data interface synchronization?

It is recommended to add appropriate constraints; for high-speed designs in particular, period, setup-time, and hold-time constraints must be added.

The additional constraints here have two functions:

A. To raise the design's operating frequency so that it meets the interface data synchronization requirements. Through constraints on the period, setup time, hold time, and so on, logic synthesis, mapping, and place-and-route can be controlled to reduce logic and routing delays, thereby raising the operating frequency and meeting the interface data synchronization requirements.

B. To obtain a correct timing analysis report. Almost all FPGA design platforms include a static timing analysis tool, which can produce a timing analysis report after mapping or place-and-route and thereby evaluate the design's performance. Static timing analysis tools judge whether the timing meets the design requirements against the constraints, so designers must enter the constraints correctly for the tool to output a correct timing analysis report.

In Xilinx designs, the common constraints related to data interfaces are PERIOD, OFFSET_IN_BEFORE, OFFSET_IN_AFTER, OFFSET_OUT_BEFORE, OFFSET_OUT_AFTER, and so on; other commonly used data-interface constraints include the period, Tsu (setup time), Th (hold time), and Tco (clock-to-output time).
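As a sketch, the period and offset constraints might be written as follows in Xilinx's legacy UCF syntax; the clock name and all timing values here are placeholders, not figures from this article:

```
# Name a timing group for the clock net and constrain its period
NET "clk" TNM_NET = "clk_grp";
TIMESPEC "TS_clk" = PERIOD "clk_grp" 10 ns HIGH 50%;

# Input data must be valid 4 ns before the clock edge at the pins
OFFSET = IN 4 ns BEFORE "clk";

# Output data must be valid no later than 6 ns after the clock edge
OFFSET = OUT 6 ns AFTER "clk";
```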