[The Art of Hardware Architecture] Study Notes (2) Synchronization and Reset

Foreword

This blog series consists of my reading notes on the book “The Art of Hardware Architecture”. Most of the content is excerpted from the book. A small part is my own notes, understanding, and expansion of some points in the book, and those parts are marked.

2 Synchronization and reset

2.1 Synchronous Design

In a synchronous design, a single master clock and a single master set/reset signal drive all sequential devices in the design.

Experience has shown that synchronous design is the safest approach to controlling timing in an ASIC.

Synchronous design makes it easier to meet setup and hold time requirements. When a single clock and a single reset are used, it is best to use the system clock and system reset. They have the strongest drive strength and reach every flip-flop with almost equal delay, so the design is more stable.

2.1.1 Avoid using ripple counters

Using a flip-flop to drive the clock input of another flip-flop is generally problematic. The clock of the second flip-flop is skewed by the clock-to-Q delay of the first, and it is not activated on every clock edge. Connecting more than two flip-flops this way produces a ripple counter, as shown in the figure below. This method is not recommended, because each added flip-flop contributes another clock-to-Q delay and the delay accumulates along the chain.

The same issue applies to ripple-carry adders. If the bit width is large, the carry delay becomes too long, and a carry-lookahead adder or pipelining is recommended instead.
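For contrast, here is a minimal sketch of a synchronous counter in Verilog. Every flip-flop shares the same clock. The module name `sync_counter`, the `WIDTH` parameter, and the active-low asynchronous reset are my own illustrative choices and do not come from the book.

```verilog
// Hypothetical example: synchronous counter. All bits are clocked by the
// same clk, so no clock-to-Q delay accumulates between stages the way it
// does in a ripple counter.
module sync_counter #(
    parameter WIDTH = 4
) (
    input                  clk,
    input                  rst_n,   // active-low asynchronous reset (assumed)
    output reg [WIDTH-1:0] count
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= {WIDTH{1'b0}};
        else
            count <= count + 1'b1;  // combinational next-state, synchronous update
    end
endmodule
```

All bits update on the same edge. The only delay that matters is therefore the adder path between one register update and the next, and static timing analysis can check that path directly.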

2.1.2 Gating the clock

Gating cells on the clock line introduce clock skew and can deliver glitches to the flip-flops. The problem is especially severe when there are multiplexers on the clock line, as shown in the figure below.

A design with gated clocks may work fine in simulation but run into problems after synthesis.

In other words, using an enable signal to gate the clock introduces clock skew, which affects the stability of the system.
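A common way to keep logic off the clock line is to leave the clock untouched and use a synchronous clock enable. The sketch below assumes a hypothetical register `en_reg` with an enable input `en`. It illustrates the general pattern and is not a circuit from the book.

```verilog
// Hypothetical example: register with a synchronous clock enable.
// clk goes straight to the flip-flops; en only selects between new data
// and the held value, so nothing is inserted into the clock path.
module en_reg #(
    parameter WIDTH = 8
) (
    input                  clk,
    input                  rst_n,  // active-low asynchronous reset (assumed)
    input                  en,     // enable, sampled on the clock edge
    input      [WIDTH-1:0] d,
    output reg [WIDTH-1:0] q
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= {WIDTH{1'b0}};
        else if (en)
            q <= d;   // capture only when enabled
        // otherwise q holds its value (a feedback mux or a CE pin)
    end
endmodule
```

Synthesis typically implements the hold condition as a feedback multiplexer, or as the flip-flop's dedicated clock-enable pin where the target provides one.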

2.1.3 Dual-Edge or Mixed-Edge Clock

As shown in the figure below, the two flip-flops are clocked by two clock signals of opposite phase. This causes problems for test methodologies such as synchronous reset and scan-chain insertion. It also makes it harder to determine the critical signal paths.

2.1.4 Using a flip-flop to drive the asynchronous reset terminal of another flip-flop

In the figure below, the output of the second flip-flop is affected by more than the clock edge, which violates the principle of synchronous design. In addition, the circuit contains a potential race condition between the clock and the reset of the second flip-flop.

2.2 Recommended Design Techniques

When writing HDL code, it is important to understand how synthesis tools interpret different coding styles and what results they produce. You also need to think from a hardware perspective, because a particular design style (or, equivalently, coding style) affects the gate count and timing performance of the design.

2.2.1 Avoid combinatorial loops in your design

Combinational loops are among the most common causes of instability and unreliability in digital designs. In a synchronous design, every feedback loop should contain a register. A combinational loop is a direct feedback path with no register in it, which violates synchronous design principles.

In HDL, a combinational loop forms in two cases. The first is when a signal passes through several combinational always blocks and ends up generating itself. The second is when the left-hand side of an arithmetic expression also appears on its right-hand side. Combinational loops are a design risk, and synthesis tools typically flag them as errors because they are not synthesizable.

The bubble chart in the figure below shows how a combinational loop forms. Each bubble represents a combinational always block. Arrows going into a bubble are the signals that block reads, and arrows leaving it are the block's outputs. The signal “a” is generated from “d”, which itself depends on “a”, so a combinational loop is formed.

It is like the equation x = 3x + 1, which is easy for us to solve on paper but a problem for a synthesis tool. Combinational logic is not driven by a clock: when the “cause” changes, the “effect” changes immediately. A value computed from itself therefore has no stable, well-defined circuit implementation and leads to a contradiction.

To remove the combinatorial loop, one has to change how one of the signals is generated to remove the dependencies of the signals on each other. A simple solution to this problem is to introduce a flip-flop or register in the combinatorial loop to break the direct path.
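Below is a minimal sketch of breaking a loop with a register, using an accumulator as the example. The module and signal names are hypothetical. Writing the update as `always @(*) acc = acc + din;` would be a combinational loop. In the clocked form, the feedback closes through a flip-flop.

```verilog
// Hypothetical example: accumulator whose feedback passes through a register.
// The combinational part (acc_next) depends only on the register output and
// the input, so no signal depends on itself without a flip-flop in between.
module accumulator #(
    parameter WIDTH = 8
) (
    input                  clk,
    input                  rst_n,   // active-low asynchronous reset (assumed)
    input      [WIDTH-1:0] din,
    output reg [WIDTH-1:0] acc
);
    wire [WIDTH-1:0] acc_next = acc + din;   // pure combinational logic

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            acc <= {WIDTH{1'b0}};
        else
            acc <= acc_next;                 // loop is broken by this register
    end
endmodule
```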

The figure below shows another example, where the output of a register directly controls the asynchronous input of the same register through combinational logic.

Combinational loops are inherently high-risk structures. The behavior of a combinational loop depends on the propagation delays of all the logic in the loop. Those delays vary with operating conditions, so the loop's behavior varies as well. In many design tools, combinational loops lead to endless evaluation loops. The tools used across the design flow may also break the loop in different ways, in a manner inconsistent with the original design intent.

2.2.2 Avoiding Delay Chains in Digital Designs

A delay chain is formed when two or more consecutive nodes, each with a single fan-in and a single fan-out, are used to create delay. Inverters are often chained together for this purpose. Delay chains are common in asynchronous designs and are sometimes used to resolve race conditions caused by other combinational logic. In both FPGAs and ASICs, these delays depend on placement and routing. Delay chains cause a variety of design problems. They make the design more sensitive to operating conditions, reduce its reliability, and make it harder to port to different device architectures. Avoiding delay chains means replacing asynchronous techniques with synchronous ones.

2.2.3 Avoid using asynchronous pulse generators

Designs often need to generate pulses in response to certain events. Designers sometimes use delay chains to generate a single pulse (pulse generator) or a series of pulses (multivibrator). There are two common methods for generating such pulses. Both are purely asynchronous and should be avoided whenever possible.

  • Connect the same trigger signal to both inputs of a two-input AND or OR gate, but invert the signal at one input or pass it through a delay chain. The pulse width depends on the relative delay between the direct path and the delayed path. This is the same mechanism by which input changes cause glitches in combinational logic. The technique simply widens the glitch artificially with a delay chain.
  • Feed the register output through a delay chain back to the asynchronous reset of the same register. The register then resets itself asynchronously after a fixed delay. Asynchronously generated pulse widths often create challenges for synthesis and place-and-route tools. The actual pulse width is known only after placement and routing, once routing and propagation delays are determined, so it is hard to choose reliable delay values when writing the HDL. The pulse width may not be valid across all PVT conditions and may change when moving to a different technology node. Verification is also difficult, because static timing analysis cannot be used to check pulse widths.

The recommended synchronous pulse generator is shown in the figure below.

In the synchronous pulse generator above, the pulse width always equals one clock period.

This pulse generator is predictable and can be verified with static timing analysis. It is also process-independent and easy to port to other architectures.

The same structure is commonly used for edge detection. Delay the signal by one clock cycle in a register and combine the delayed copy with the original. XORing the two gives a one-cycle pulse at every transition, rising or falling, and is low otherwise. To detect only rising edges, AND the signal with the inverse of its delayed copy.
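Here is a sketch of a synchronous edge detector along the lines described above. It reflects my understanding of the idea and is not copied from the book's figure. It assumes `sig` is already synchronous to `clk`; an asynchronous input would first need a synchronizer. The module and port names are hypothetical. Both gate inputs come from registers, so each pulse lasts exactly one clock period.

```verilog
// Hypothetical example: synchronous edge / pulse detector.
module edge_detect (
    input  clk,
    input  rst_n,   // active-low asynchronous reset (assumed)
    input  sig,     // assumed already synchronous to clk
    output rise,    // one-cycle pulse after a 0 -> 1 transition
    output fall,    // one-cycle pulse after a 1 -> 0 transition
    output toggle   // one-cycle pulse after either transition (XOR form)
);
    reg sig_q;      // sig sampled by clk
    reg sig_d;      // sig_q delayed by one more clock cycle

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            sig_q <= 1'b0;
            sig_d <= 1'b0;
        end else begin
            sig_q <= sig;
            sig_d <= sig_q;
        end
    end

    // Both operands change only on clock edges, so the outputs are
    // clean pulses exactly one clock period wide.
    assign rise   =  sig_q & ~sig_d;
    assign fall   = ~sig_q &  sig_d;
    assign toggle =  sig_q ^  sig_d;
endmodule
```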

2.2.4 Avoid using latches

In digital design, latches hold the value of a signal until a new value arrives. Avoid latches wherever possible and use flip-flops instead. As shown in the figure below, if the X and Y signals both go high, the level-sensitive latches are transparent at the same time and the circuit oscillates. Latches can introduce all sorts of problems into a design. Although latches are storage elements like registers, they differ fundamentally. A latch is transparent: while it is enabled, there is a direct path from data input to output, so a glitch at the input can pass through to the output.

Static timing analyzers often make false assumptions about latches and either find false paths through data input ports or miss true critical paths. The timing of the latches themselves is also ambiguous. Latches often make circuits untestable.

Latches present different challenges in FPGA design. FPGAs are register-rich, so designs that use latches consume more logic and have lower performance than designs that use registers.

Common causes of latches:

  • Incomplete if-else statements: in combinational logic, an if statement with no else branch infers a latch.
  • Incomplete case statements: a case statement that does not cover all cases and has no default also infers a latch.

As shown in the figure below, if the value of a in the else case is not given, a latch will be generated for the output of a.

In general, when writing combinational RTL, make sure every statement is complete so that no unintended latches are inferred.
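Here is a sketch of latch-free combinational coding. The module `mux_decode` and its ports are hypothetical. Two things prevent latches: the default assignments at the top of the block, and the `default` branch of the `case`. Together they ensure every output is assigned on every path.

```verilog
// Hypothetical example: combinational logic written so that no latch is inferred.
module mux_decode (
    input      [1:0] sel,
    input            en,
    input      [7:0] a, b, c,
    output reg [7:0] y,
    output reg       valid
);
    always @(*) begin
        // Default values: without them, the paths that skip an assignment
        // would require storage, and the tool would infer latches.
        y     = 8'd0;
        valid = 1'b0;
        if (en) begin
            valid = 1'b1;
            case (sel)
                2'd0:    y = a;
                2'd1:    y = b;
                2'd2:    y = c;
                default: y = 8'd0;   // covers sel == 2'd3
            endcase
        end
    end
endmodule
```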

2.2.5 Avoid using double edge clocks

Using a dual-edge clock means that data is transferred on both the rising and falling edges of the clock. This doubles the data throughput, but it can cause some problems.

The figure below shows a circuit triggered by a dual-edge clock.

Some problems encountered when using dual-edge clocks are as follows:

  • An asymmetric clock duty cycle can cause setup and hold time violations.
  • It is difficult to determine the critical signal paths.
  • It is difficult to use design methodologies such as scan-chain insertion, which require all registers to use the same clock edge.

This does not mean that dual-edge clocking is never usable, because it also has advantages. For example, DDR (double data rate) interfaces sample on both clock edges, whereas SDR (single data rate) interfaces use only one. This doubles the data rate without raising the clock frequency, which also helps reduce power consumption for a given bandwidth.

2.3 Clock scheme

2.3.1 Internally generated clock

Clock generators built from combinational logic can introduce glitches that cause functional problems. The delay through the combinational logic can also cause timing problems. In a synchronous design, glitches on data inputs cause no problems. Data is captured on the clock edge, so such glitches are filtered out automatically.

A glitch on a clock input, however, can violate the register's minimum pulse width requirement. If the register's data input changes while the glitch arrives at its clock input, setup and hold times are violated. Even if the design does not violate timing requirements, the registers may output random values, putting the functionality of the entire design at risk.

The following figure shows the effect of using combinational logic to generate the clock of a synchronous counter. As the timing diagram shows, the glitch at the clock edge makes the counter increment twice in the clock cycle shown. This extra count can cause functional problems.

A simple solution is to add a register at the output of the combinational logic and use the register output as the downstream clock. The combinational output then drives only the register's data input, so its glitches are not propagated to the generated clock.

When generating an internal clock, register the output so that the clock is clean. In real projects, you can also instantiate the vendor's PLL IP directly to generate clocks with the required frequency, phase, and so on.
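Here is a minimal sketch of a clock taken from a register output: a divide-by-2 clock in Verilog. The module name `clk_div2` is my own. In an FPGA, a PLL or a clock enable is usually preferable. For static timing analysis, the divided clock would still have to be declared as a generated clock, for example with `create_generated_clock` in SDC.

```verilog
// Hypothetical example: divide-by-2 clock driven directly by a flip-flop,
// so no combinational gate (and no glitch) sits between it and the clock pins.
module clk_div2 (
    input      clk,
    input      rst_n,    // active-low asynchronous reset (assumed)
    output reg clk_div   // toggles on every rising edge of clk
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            clk_div <= 1'b0;
        else
            clk_div <= ~clk_div;
    end
endmodule
```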

2.3.2 Multiple Clocks

Clock multiplexers are used when the same logic function needs to run on different clocks. Some form of multiplexing logic selects the clock source, as shown in the diagram.

For example, communications applications that need to handle multiple frequency standards often use multiple clocks.

Although the introduction of multiplexing logic on the clock signal can cause the problems mentioned earlier, the requirements for multiplexing clocks vary widely in different applications.

Clock multiplexing is acceptable if the following criteria are met:

  • After initial configuration, the clock multiplexing logic does not change.
  • At test time, the design bypasses the functional clock multiplexing logic and selects a normal clock.
  • The registers are always held in reset while the clock is being switched.
  • A brief erroneous response during clock switching has no negative effect.

2.4 Gated Clock Methodology

In the traditional synchronous design style, the system clock is connected to the clock terminal of each register. Power consumption mainly consists of three parts.

  1. Power consumed by combinatorial logic that changes on every clock edge (due to flip-flops driving these combinatorial logic).
  2. The power consumption generated by the flip-flop (even if the input and internal state of the flip-flop have not changed, the power consumption still exists).
  3. The power consumed by the clock tree in the design.

Gating the clock path can greatly reduce the power consumption of flip-flops. A gated clock can exist at the root of the clock tree, at the end, or anywhere in between.

The clock tree consumes almost 50% of the power of the entire chip. It is therefore best to generate or shut off the clock at the root whenever possible, so that the whole tree is gated.

The figure below is a three-bit counter with a gated clock.

The circuit is identical to the conventional implementation, except that a clock-gating cell is inserted into the clock network. The flip-flops therefore receive the clock only when the INC input is high. When INC is low, the flip-flops receive no clock and keep their current values. This removes the 3 multiplexers that would otherwise sit in front of the flip-flops to hold their values.

This is done for two purposes:

  • Power consumption can be reduced, because the clock consumes a high proportion of the entire chip;
  • To save resources, the above example saves 3 multiplexers.

2.4.1 Gated clock circuit without latch

A gated clock without a latch is implemented using an AND gate or an OR gate (depending on which edge the flip-flop uses), as shown in Figure 1 below.

Correct operation requires the enable signal to stay constant from the active (rising) edge of the clock until its inactive (falling) edge. Otherwise a clock pulse can be truncated prematurely, or extra clock pulses (glitches on the clock) can be generated.

Figure 2 below illustrates a situation where the above requirements are not met and the resulting clock is prematurely truncated.

This limitation makes the use of gated clocks without latches inappropriate in single-clock flip-flop-based designs.

2.4.2 Latch-based gated clock circuit

The latch-based clock-gating style adds a level-sensitive latch that holds the enable signal between the active and inactive edges of the clock, as shown in the figure below. The design therefore no longer depends on the timing of the enable signal itself to meet this requirement.

Since the latch can capture the enable signal and hold it until a complete clock pulse is generated, the enable signal only needs to be stable near the rising edge of the clock.

Using this technique, only one input of the gate needs to be changed at a time to turn the clock on or off, and the output of the circuit is guaranteed to be free of any glitches or spikes.
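Here is a behavioral sketch of the latch-based clock gate for positive-edge flip-flops. The `test_en` input is my own assumption, modeling a typical scan override, and does not appear in the text above. In a real ASIC flow you would instantiate the integrated clock-gating (ICG) cell from the standard-cell library; this RTL only illustrates the behavior.

```verilog
// Hypothetical example: behavioral model of a latch-based clock gate.
// The latch is transparent while clk is low, so the enable is captured
// before the rising edge and held for the entire high phase. The AND gate
// therefore cannot truncate a pulse or create a glitch.
module clk_gate_latch (
    input  clk,
    input  en,        // functional enable, may change anywhere in the cycle
    input  test_en,   // hypothetical scan/test override
    output gclk
);
    reg en_latched;

    // Level-sensitive latch, transparent when clk is low
    always @(clk or en or test_en) begin
        if (!clk)
            en_latched <= en | test_en;
    end

    assign gclk = clk & en_latched;
endmodule
```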

2.5 Reset Signal Design

There are many design issues that must be considered before choosing a reset strategy, such as whether to use a synchronous reset or an asynchronous reset, whether each flip-flop must receive a reset signal, and so on.

The purpose of reset is to bring the SoC into a deterministic state for stable operation. This prevents the SoC from entering a random state and hanging after power-up. Designs can use asynchronous reset, synchronous reset, or both. Each method has clear advantages and disadvantages, and either can be used effectively in an actual design.

In some cases, when pipelined registers (shift register flip-flops) are used in high-speed applications, the reset signal of some registers can be removed to allow the design to achieve higher performance.

2.5.1 Synchronous reset

With a synchronous reset, the reset signal affects the state of the flip-flop only when the active clock edge arrives. In some simulators, depending on the circuit's logic, the reset signal may be blocked from reaching the flip-flop. This happens only in simulation and does not occur in real hardware.

Because the reset tree has a high fan-out, the reset may arrive late relative to the clock period. Even if the reset is buffered by a reset buffer tree, minimize the amount of logic it passes through before reaching the local logic.

The RTL code and corresponding hardware implementation of a loadable flip-flop with synchronous reset are shown below.

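module load_syn (
    input      clk,
    input      in,
    input      load,
    input      rst_n,   // active-low synchronous reset
    output reg out
);
    // The reset is sampled only on the rising edge of clk
    always @(posedge clk) begin
        if (!rst_n)
            out <= 1'b0;
        else if (load)
            out <= in;   // load new data only when load is asserted
    end
endmodule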

Advantages of synchronous reset

  1. A synchronous reset keeps the circuit 100% synchronous.
  2. It makes the design easier to analyze with static timing analysis tools.
  3. Glitches on the reset can be filtered out, making the system more stable.

Disadvantages of synchronous reset

  1. The reset pulse must be wider than a clock period so that a clock edge captures it.
  2. It consumes more logic than an asynchronous reset, because the reset is built into the data path.
  3. It depends on the clock: if the clock fails, the circuit cannot be reset.

2.5.2 Asynchronous reset

Flip-flops with asynchronous reset have a dedicated reset pin. With an active-low reset, the flip-flop enters the reset state as soon as its reset pin goes low, without waiting for a clock edge.

The RTL code of the loadable flip-flop with asynchronous reset and the corresponding hardware implementation diagram are shown below.

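module load_asyn (
    input      clk,
    input      in,
    input      load,
    input      rst_n,   // active-low asynchronous reset
    output reg out
);
    // The reset takes effect on the falling edge of rst_n, independent of clk
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            out <= 1'b0;
        else if (load)
            out <= in;   // load new data only when load is asserted
    end
endmodule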

Advantages of Asynchronous Reset

  1. The library provided by the manufacturer has a flip-flop with asynchronous reset, and the reset port of the flip-flop does not require additional combinational logic, which can save resources;
  2. The circuit can be reset whether or not the clock is running, and synthesis tools infer asynchronous resets automatically without any extra synthesis directives.

Disadvantages of asynchronous reset

  1. Both the assertion and the release of the reset are asynchronous, and the release is where problems can occur. If the asynchronous reset is released near the active clock edge of a flip-flop, the flip-flop output can become metastable and the reset state can be lost.
  2. Another problem relates to the reset source. Noise or glitches on the board-level or system reset can cause spurious resets, so a glitch filter is needed to eliminate their effect on the reset circuit.
  3. For DFT, if the asynchronous reset signal cannot be driven directly from an I/O pin, the reset line must be disconnected from its functional reset driver during test to ensure correct scan and testing.

In RTL design, the generally recommended approach is asynchronous reset with synchronous release (asynchronous assertion, synchronous deassertion).
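A common way to implement asynchronous reset with synchronous release is a two-flip-flop reset synchronizer. The sketch below is my own illustration, and the module and port names are hypothetical. The reset asserts immediately without needing a clock but is released only on a clock edge. The release therefore reaches downstream flip-flops shortly after an edge, leaving almost a full cycle to meet recovery and removal timing.

```verilog
// Hypothetical example: reset synchronizer (async assert, sync deassert).
module reset_sync (
    input  clk,
    input  rst_n_in,    // asynchronous, active-low reset from the board
    output rst_n_out    // drive the async reset pins of downstream flops
);
    reg [1:0] sync_ff;

    always @(posedge clk or negedge rst_n_in) begin
        if (!rst_n_in)
            sync_ff <= 2'b00;                // assert immediately, no clock needed
        else
            sync_ff <= {sync_ff[0], 1'b1};   // shift in 1s: release after two edges
    end

    // The second stage lowers the chance that metastability in the
    // first stage propagates into the reset tree.
    assign rst_n_out = sync_ff[1];
endmodule
```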

2.5.3 Filter reset glitches

Asynchronous resets are sensitive to glitches: any pulse that meets the flip-flop's minimum reset pulse width can reset it. The reset therefore needs glitch filtering. One way is to use a digital delay to filter glitches. The reset input pin should also be a Schmitt-trigger pin to help with filtering. The figure shows the circuit and timing diagram of the reset glitch filter.

This glitch-filtering approach is relatively simple to implement. Delay the reset signal slightly, then AND the delayed reset with the original reset to obtain a new reset signal with the glitches removed. This assumes an active-high reset; for an active-low reset, the equivalent gate is OR. The difficulty lies in choosing the delay. If the delay is too small, glitches are not filtered. If it is too large, it lengthens the reset latency of the whole path and may even filter out a genuine reset. Choosing the right delay is therefore the key to this method.
