[YOLOv5-6.x] Data Augmentation Code Analysis



The YOLOv5 version used in this article is v6.1. Readers unfamiliar with the YOLOv5-6.x network structure can refer to: [YOLOv5-6.x] Network Model & Source Code Analysis

Readers who want to try improving YOLOv5-6.1 can refer to the following posts:

[Modifying YOLOv5-6.x (Part 1)]: Integrating the lightweight networks ShuffleNetV2, MobileNetV3 and GhostNet

[Modifying YOLOv5-6.x (Part 2)]: Adding the ACON activation function, CBAM and CA attention mechanisms, and the weighted bidirectional feature pyramid BiFPN

[Modifying YOLOv5-6.x (Part 3)]: YOLOv5s + GhostConv + BiFPN + CA

In general, for the parameters of a neural network to be learned properly, deep learning requires a large amount of training data, but in practice we rarely have as much data as we would like. There are two options: (1) find more data; (2) make full use of the existing data through data augmentation.

Data augmentation can be understood as using prior knowledge to construct values in the neighborhood of the training samples, so that the model achieves not only a small training error on the training set but also a small generalization error on the validation set, thereby improving the model's ability to generalize.

The role of data augmentation generally includes:

  • Enrich the training dataset and enhance the generalization ability of the model
  • Increase data variation and improve model robustness
  • Alleviate the uneven distribution of small targets and reduce the number of GPUs needed (Mosaic effectively computes batch-norm statistics over four images at once, reducing the need for large mini-batches)

The following explains the data augmentation part of the YOLOv5-6.1 source code. Here are the augmentation-related hyperparameter definitions in hyp.scratch-high.yaml (the cutout parameter was added by me and is not in the original file):

# 1. HSV augmentation gains: hue, saturation, value
hsv_h: 0.015  # image HSV-Hue augmentation (fraction)
hsv_s: 0.7  # image HSV-Saturation augmentation (fraction)
hsv_v: 0.4  # image HSV-Value augmentation (fraction)

# 2. random_perspective augmentation: rotation, translation, scale, shear, perspective
degrees: 0.0  # image rotation (+/- deg)
translate: 0.1  # image translation (+/- fraction)
scale: 0.9  # image scale (+/- gain)
shear: 0.0  # image shear (+/- deg)
perspective: 0.0  # image perspective (+/- fraction)

# 3. Image flips: up-down, left-right
flipud: 0.0  # image flip up-down (probability)
fliplr: 0.5  # image flip left-right (probability)

# 4. Image-level data augmentation
mosaic: 1.0  # image mosaic (probability)
mixup: 0.1  # image mixup (probability)
cutout: 0.0  # image cutout (probability)
copy_paste: 0.1  # segment copy-paste (probability)
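Note that some of these values act as gain fractions (hsv_*, degrees, translate, scale, shear, perspective) while others act as application probabilities (flipud, fliplr, mosaic, mixup, cutout, copy_paste). A minimal sketch of how probability gating might work, using a hypothetical hyp dictionary (the YAML loading itself is not shown):

```python
import random

# Hypothetical subset of the hyp dictionary parsed from hyp.scratch-high.yaml
hyp = {"flipud": 0.0, "fliplr": 0.5, "mosaic": 1.0, "mixup": 0.1}

def should_apply(name: str, hyp: dict) -> bool:
    """Treat the hyperparameter value as the probability that the augmentation fires."""
    return random.random() < hyp[name]
```

With flipud at 0.0 the flip never fires, and with mosaic at 1.0 it always does; fliplr fires on roughly half of the images.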

Overall, the data augmentation methods in YOLOv5-6.1 fall into the following categories:

1. Augmentation of the original image

  • Pixel level: HSV augmentation, rotation, scaling, translation, shear, perspective, flipping, and more
  • Image level: MixUp, Cutout, CutMix, Mosaic, Copy-Paste (segment), etc.

2. The same transformations applied to the labels

  • Offsetting the coordinates by the applied transformation
  • Preventing label coordinates from going out of bounds

The four images used in the test are as follows:

Pixel-level data augmentation

HSV color-space transformation

# HSV color-space augmentation
elif method == 'hsv':
    """HSV augmentation: processes the image only, labels are untouched
    :param img: image to be processed, BGR [736, 736]
    :param hgain: gain used to generate the new H channel
    :param sgain: gain used to generate the new S channel
    :param vgain: gain used to generate the new V channel
    :return: the HSV-augmented image img
    """
    hgain, sgain, vgain = 0.015, 0.7, 0.4
    if hgain or sgain or vgain:
        # Draw three random numbers in [-1, 1] and multiply by the hsv gains from hyp to get the channel gains
        r = np.random.uniform(-1, 1, 3) * [hgain, sgain, vgain] + 1  # random gains
        hue, sat, val = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))  # split the HSV channels
        dtype = img.dtype  # uint8

        # Build lookup tables
        x = np.arange(0, 256, dtype=r.dtype)
        lut_hue = ((x * r[0]) % 180).astype(dtype)  # new H channel (H range is [0, 180) for 8-bit images)
        lut_sat = np.clip(x * r[1], 0, 255).astype(dtype)  # new S channel
        lut_val = np.clip(x * r[2], 0, 255).astype(dtype)  # new V channel

        # Merge the randomly adjusted channels back: img_hsv = h + s + v
        # cv2.LUT(hue, lut_hue): map the input channel hue through the table lut_hue
        img_hsv = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val)))
        # No return value needed; dst=img writes the result back in place
        cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR, dst=img)
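cv2.LUT simply maps every uint8 pixel value through a 256-entry table, which is the same as NumPy fancy indexing. That makes the clipping behaviour of the saturation table easy to verify (a small sketch with a fixed gain of 1.5 instead of a random one):

```python
import numpy as np

r = 1.5                                  # example saturation gain
x = np.arange(0, 256, dtype=np.float64)  # all possible uint8 values
lut_sat = np.clip(x * r, 0, 255).astype(np.uint8)  # same table as in the snippet

sat = np.array([[10, 100], [200, 255]], dtype=np.uint8)
new_sat = lut_sat[sat]                   # NumPy equivalent of cv2.LUT(sat, lut_sat)
```

Values whose scaled result exceeds 255 saturate instead of wrapping around, which is exactly why np.clip is applied before the uint8 cast.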


Rotation

# Rotation
elif method == 'rotation':
    a = random.uniform(-45, 45)  # random angle in degrees
    R = cv2.getRotationMatrix2D(angle=a, center=(width / 2, height / 2), scale=1)
    img = cv2.warpAffine(img, R, dsize=(width, height), borderValue=(114, 114, 114))
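cv2.getRotationMatrix2D returns a 2×3 affine matrix built from the angle (counter-clockwise, in degrees), the rotation center and the scale. A NumPy reconstruction of that matrix, handy for checking where a point lands without needing OpenCV:

```python
import math
import numpy as np

def rotation_matrix_2d(angle_deg, center, scale=1.0):
    """Build the same 2x3 matrix as cv2.getRotationMatrix2D(center, angle_deg, scale)."""
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

# Rotate the point one pixel to the right of the center by 90 degrees
R = rotation_matrix_2d(90, center=(320, 320))
p = R @ np.array([321.0, 320.0, 1.0])
```

In image coordinates (y pointing down), a positive angle rotates counter-clockwise, so the point ends up one pixel above the center.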

Scale

# Scale
elif method == 'scale':
    img = cv2.resize(img, dsize=(640, 640))  # resize to the model input size

Flip

# Flip up-down (vertical)
if method == 'flipud':
    img = np.flipud(img)

# Flip left-right (horizontal)
elif method == 'fliplr':
    img = np.fliplr(img)

Translate

# Translation
elif method == 'translation':
    T = np.eye(3)
    tr = 0.1
    T[0, 2] = random.uniform(0.5 - tr, 0.5 + tr) * width   # x translation (pixels)
    T[1, 2] = random.uniform(0.5 - tr, 0.5 + tr) * height  # y translation (pixels)
    img = cv2.warpAffine(img, T[:2], dsize=(width, height), borderValue=(114, 114, 114))

Shear

A shear transformation maps a rectangular image to (roughly) a parallelogram: one coordinate of every point stays unchanged while the other is shifted linearly as a function of the unchanged coordinate. Picture the image's bounding parallelogram with one side fixed and a push applied to the opposite side, acting parallel to the x- or y-axis; the deformation produced by that push is a shear.

# /LaoYuanPython/article/details/113856503 
elif method == 'shear':
    S = np.eye(3)
    sh = 20.0
    S[0, 1] = math.tan(random.uniform(-sh, sh) * math.pi / 180)  # x shear (deg)
    S[1, 0] = math.tan(random.uniform(-sh, sh) * math.pi / 180)  # y shear (deg)
    img = cv2.warpAffine(img, S[:2], dsize=(width, height), borderValue=(114, 114, 114))
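The "rectangle becomes a parallelogram" effect can be checked numerically: an x-shear matrix shifts each point horizontally in proportion to its y coordinate. A sketch with a fixed 20° shear instead of a random one:

```python
import math
import numpy as np

S = np.eye(3)
S[0, 1] = math.tan(math.radians(20))  # x shear only

# Corners of a unit square (homogeneous coordinates, one column per point)
square = np.array([[0, 1, 1, 0],
                   [0, 0, 1, 1],
                   [1, 1, 1, 1]], dtype=float)
sheared = S @ square
```

The bottom edge (y = 0) does not move, while the top edge (y = 1) slides right by tan(20°), so vertical edges become slanted but horizontal edges stay horizontal.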


Perspective

A perspective transformation projects one plane onto another through a projection matrix, using the condition that the projection center, image point and target point are collinear. The transformed image is usually not a parallelogram (unless the view plane and the original plane happen to be parallel) but resembles a trapezoid.

# Perspective transformation 
# Example code of perspective transformation principle: /article/details/104281693 
elif method == 'perspective':
    P = np.eye(3)
    pe = 0.001
    P[2, 0] = random.uniform(-pe, pe)  # x perspective (about y)
    P[2, 1] = random.uniform(-pe, pe)  # y perspective (about x)
    img = cv2.warpPerspective(img, P, dsize=(width, height), borderValue=(114, 114, 114))
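Unlike the affine cases above, warpPerspective divides by the homogeneous coordinate, which is why parallel lines can converge after the transform. A small sketch of that division, using a fixed perspective coefficient rather than a random one:

```python
import numpy as np

P = np.eye(3)
P[2, 0] = 0.001  # x perspective term, as set in the snippet above

def apply_perspective(P, x, y):
    """Map a point through a 3x3 homography, including the homogeneous divide."""
    u, v, w = P @ np.array([x, y, 1.0])
    return u / w, v / w

# Points further to the right get a larger w, so they are scaled down more strongly
x0, y0 = apply_perspective(P, 0, 100)
x1, y1 = apply_perspective(P, 500, 100)
```

At x = 0 the divisor w is 1 and the point is unchanged; at x = 500 the divisor is 1 + 0.001 · 500 = 1.5, pulling the point toward the origin.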

Three commonly used image-level data augmentations


MixUp

MixUp overlays image B on image A; after the weighted sum of the two images, the new image visibly contains content from both A and B.

if method == 'mixup':
    # Pad both images to the same size, 640 x 640
    imgs[:2] = fix_shape(imgs[:2])
    img1 = imgs[0]
    img2 = imgs[1]
    # Display the original images
    htitch = np.hstack((img1, img2))
    cv2.imshow("origin images", htitch)
    cv2.imwrite('outputs/mixup_origin.jpg', htitch)
    # MixUp ratio, alpha = beta = 32.0
    r = np.random.beta(32.0, 32.0)
    imgs = (img1 * r + img2 * (1 - r)).astype(np.uint8)
    return imgs
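With alpha = beta = 32.0, np.random.beta draws mixing ratios tightly clustered around 0.5, so both images contribute almost equally; a small alpha (e.g. 1.0, as used by CutMix below) would spread the ratio across the whole [0, 1] range. A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.beta(32.0, 32.0, size=10_000)  # mixup ratios as drawn in the snippet
```

The sample mean sits near 0.5 with a small spread, so a MixUp result is almost always a near-equal blend.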


Cutout

Cutout fills one or several rectangular regions of the image with a solid color block to simulate occlusion and similar effects.

elif method == 'cutout':
    img = imgs[0]
    cv2.imshow("origin images", img)
    height, width = img.shape[:2]
    # mask size fractions: 1 large block, 2 medium, down to 16 small ones
    scales = [0.5] * 1 + \
             [0.25] * 2 + \
             [0.125] * 4 + \
             [0.0625] * 8 + \
             [0.03125] * 16
    # create random masks
    for s in scales:
        # mask box shape
        mask_h = random.randint(1, int(height * s))
        mask_w = random.randint(1, int(width * s))

        # mask box coordinates
        xmin = max(0, random.randint(0, width) - mask_w // 2)   # top-left x
        ymin = max(0, random.randint(0, height) - mask_h // 2)  # top-left y
        xmax = min(width, xmin + mask_w)   # bottom-right x
        ymax = min(height, ymin + mask_h)  # bottom-right y

        # apply a random color mask
        color = [random.randint(64, 191) for _ in range(3)]
        # color = [0, 0, 0]
        img[ymin:ymax, xmin:xmax] = color
    return img


CutMix

CutMix crops a region of one image and fills it with the corresponding region of another image.

elif method == 'cutmix':
    # fix_shape is not applied here; the two images may have different sizes
    img1, img2 = imgs[0], imgs[1]
    h1, h2 = img1.shape[0], img2.shape[0]
    w1, w2 = img1.shape[1], img2.shape[1]
    # Draw lambda from a beta distribution
    alpha = 1.0
    lam = np.random.beta(alpha, alpha)
    cut_rat = np.sqrt(1. - lam)
    # Size of the patch to crop from the second image
    cut_w = int(w2 * cut_rat)  # patch width
    cut_h = int(h2 * cut_rat)  # patch height
    # uniform random patch center
    cx = np.random.randint(w2)
    cy = np.random.randint(h2)

    # Clip the patch coordinates so they stay inside the smaller of the two images
    xmin = np.clip(cx - cut_w // 2, 0, min(w1, w2))  # top-left x
    ymin = np.clip(cy - cut_h // 2, 0, min(h1, h2))  # top-left y
    xmax = np.clip(cx + cut_w // 2, 0, min(w1, w2))  # bottom-right x
    ymax = np.clip(cy + cut_h // 2, 0, min(h1, h2))  # bottom-right y

    # Paste the patch from img2 into img1
    img1[ymin:ymax, xmin:xmax] = img2[ymin:ymax, xmin:xmax]
    return img1
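For classification training, CutMix also mixes the labels: after clipping, lambda is usually recomputed as one minus the pasted-area fraction, so the label weights match the actual pixel contribution. This step is not in the snippet above; a hedged sketch of the standard adjustment:

```python
def adjusted_lambda(xmin, ymin, xmax, ymax, w, h):
    """Recompute lam from the actually pasted box area (CutMix label mixing)."""
    return 1.0 - (xmax - xmin) * (ymax - ymin) / (w * h)

# A 200x200 patch pasted into a 640x640 image
lam = adjusted_lambda(100, 100, 300, 300, w=640, h=640)
# the mixed label would then be: lam * label1 + (1 - lam) * label2
```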

Mosaic data augmentation

Mosaic augmentation was already used in YOLOv4 and has some similarity to CutMix. Mosaic randomly stitches four images, each with its own GT boxes, into one new image, and merges their GT boxes accordingly; this new image is then fed to the network for training. This greatly enriches the backgrounds of the detected objects, and batch-norm statistics are effectively computed over four images at once.

The main flow of the code is as follows:

  • Step1: Assuming the model input size is s, first initialize a large gray image of size 2s × 2s

img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)

  • Step2: Randomly select a point inside the rectangle bounded by A(s/2, s/2) and B(3s/2, 3s/2) as the splicing point (mosaic center) in the large image

yc, xc = [int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border]  # mosaic center x, y

  • Step3: Randomly select four images, place part of each into the large image, and discard whatever falls outside

for i in range(len(imgs)):
    img = imgs[i]
    h, w = img.shape[:2]
    # place img in img4 
    if i == 0:   # top left 
        # Create mosaic image [1280, 1280, 3]=[h, w, c] base image with 4 tiles
        img4 = np.full((s * 2, s * 2, imgs[0].shape[2]), 114, dtype=np.uint8)
        # xmin, ymin, xmax, ymax (large image)
        # Coordinates of the region in the mosaic image that this tile will fill
        # Mosaic image (large image): (x1a, y1a) top-left, (x2a, y2a) bottom-right
        x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
        # xmin, ymin, xmax, ymax (small image)
        # Region of the source image to copy ((xc, yc) becomes the bottom-right corner
        # of the first tile; the part outside the mosaic is discarded)
        # Image being stitched (small image): (x1b, y1b) top-left, (x2b, y2b) bottom-right
        x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
    elif i == 1:  # top right
        x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
        x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
    elif i == 2:  # bottom left
        x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
    elif i == 3:  # bottom right
        x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
    # img4[ymin:ymax, xmin:xmax]
    # Copy the cropped region of the source image into the corresponding
    # position of the mosaic image img4[h, w, c]:
    # img[(x1b, y1b)..(x2b, y2b)] -> img4[(x1a, y1a)..(x2a, y2a)]
    img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]

  • Step4: Recalculate the GT box coordinates according to each tile's offset, and use np.clip to keep the updated label coordinates from going out of bounds

    # Offsets produced when pasting the small image into the large one;
    # used to recompute the label boxes after mosaic augmentation
    padw = x1a - x1b
    padh = y1a - y1b

    # Process this image's label information
    label = labels[i].copy()
    if label.size:
        # normalized xywh to pixel xyxy format
        label[:, 1:] = xywhn2xyxy(label[:, 1:], w, h, padw, padh)
    labels4.append(label)

# Concat/clip labels
# Merge the label information of the 4 small images into labels4
labels4 = np.concatenate(labels4, 0)
np.clip(labels4[:, 1:], 0, 2 * s, out=labels4[:, 1:])  # clip when using random_perspective()
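The xywhn2xyxy helper converts normalized center-format boxes to pixel corner format and adds the mosaic offsets. A minimal NumPy version, consistent with how it is called above (a sketch, not the verbatim YOLOv5 source):

```python
import numpy as np

def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0):
    """Normalized [xc, yc, bw, bh] -> pixel [x1, y1, x2, y2] with padding offsets."""
    y = np.copy(x)
    y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # top-left x
    y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top-left y
    y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # bottom-right x
    y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom-right y
    return y

boxes = np.array([[0.5, 0.5, 0.5, 0.5]])  # one centered box, half the image size
xyxy = xywhn2xyxy(boxes, w=100, h=100, padw=10, padh=20)
```

For a 100 × 100 tile pasted with offsets (10, 20), the centered half-size box becomes [35, 45, 85, 95] in mosaic coordinates.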

The test results are as follows:

Complete code and data

YOLOv5 data augmentation test


References:

  • [trick 7] Mosaic data augmentation
  • [YOLO v4] [trick 8] Data augmentation: MixUp, Random Erasing, CutOut, CutMix, Mosaic
  • How should the "shear" in image affine transformations be translated? Shear, miscut, or skew?
  • Example code for the perspective transformation principle
  • Detailed explanation of the OpenCV perspective transformation principle, with examples
  • [Image processing] Perspective Transformation
