# [OpenCV Study Notes] Chapter 17 Image Segmentation and Extraction

### Chapter 17: Image Segmentation and Extraction

In image processing, we often need to segment or extract foreground objects from an image as targets, such as vehicles and pedestrians in surveillance video.
Image segmentation can be achieved through morphological transformations, thresholding, image pyramids, image contours, edge detection, and other methods. This chapter introduces two further approaches: the watershed algorithm and the GrabCut algorithm.

1. Watershed algorithm

Recommended reference: "The classic algorithm of image segmentation: the watershed algorithm" (Develop Paper)

• Algorithm principle
The watershed algorithm is inspired by topography: a grayscale image is treated as a terrain surface in which the gray value of each pixel represents its height. Areas with large gray values are hills, and areas with small gray values are basins. When it starts to rain, the basins fill with water first. Suppose the water rises to a certain level, say gray value 100: everything below 100 is flooded and turns black, and the pattern formed by the pixels above 100 gives the watershed lines of the image. In effect, a threshold is being used to find the outlines in the image. Now suppose the rain continues. Dams must be built between neighboring basins so that their water does not merge. As the water keeps rising, each basin continues to fill, the flooded area turns black, and each basin ends up enclosed by its own boundary. Finding the boundary of each basin in this way is what segments the image.

Some textbooks put it this way: the lines connecting the pixels with larger gray values can be regarded as ridges, that is, watersheds. The water level corresponds to the gray threshold used for binarization, so the binarization threshold can be understood as a horizontal plane: areas below the plane are submerged. At the start, each isolated valley (local minimum) fills with water. When the water level rises high enough, the water would overflow the current valley; building a dam on the watershed prevents the water of two valleys from merging. The image is thus divided into two sets of pixels: the flooded valley pixels and the watershed line pixels. Finally, the lines formed by these dams partition the entire image, which achieves the segmentation.

However, when the watershed algorithm described above is applied directly, noise and other interference can produce many dense small regions; that is, the image is divided too finely (over-segmentation). This happens because the image contains a large number of local minima, and each one forms its own small region.
There are two solutions:
1. Apply Gaussian smoothing to the image. This erases many small minima, so the corresponding small regions are merged.
2. Do not start growing from the minima. Instead, use pixels with relatively high gray values as starting points (these must be marked manually by the user) and start flooding from the marks. Many small regions are then merged into one. This is called the marker-based watershed algorithm. The following three images are the original image, the over-segmented watershed result, and the result of the marker-based watershed algorithm:
Each marked point acts as a water injection point: flooding starts from these points and the water level rises from there. However, as the figure above shows, the image contains too many regions to mark by hand. In OpenCV, we therefore generate the markers using the distance transform.

• Distance transform function: cv2.distanceTransform()
This function calculates, for every pixel in a binary image, the distance to the nearest zero-valued pixel (that is, the nearest background pixel). If a pixel is itself zero-valued, its distance is 0.

The result therefore reflects how far each pixel is from the image background (the area where the pixel value is 0). If a pixel lies at the centroid or center of a foreground object, it is far from the nearest zero-valued pixel and the result is large; if a pixel lies on the edge of a foreground object, it is close to a zero-valued pixel and the result is small.

Therefore, thresholding the distance result yields information such as the centers and skeletons of the foreground objects and thins their outlines, so that the definite foreground can be obtained accurately.

API: dst = cv2.distanceTransform(img, distanceType, maskSize)
img: an 8-bit single-channel binary image.
distanceType: the distance calculation method.
cv2.DIST_USER: user-defined distance
cv2.DIST_L1: city-block distance, distance = |x1-x2| + |y1-y2|
cv2.DIST_L2: Euclidean distance
cv2.DIST_C: chessboard distance, distance = max(|x1-x2|, |y1-y2|)
Together with cv2.DIST_L12, cv2.DIST_FAIR, cv2.DIST_WELSCH, and cv2.DIST_HUBER, there are seven predefined distance types.
maskSize: the size of the distance transform mask: 3, 5, or cv2.DIST_MASK_PRECISE.
dst: the return value, a floating-point image of type CV_32F with the same size as img.
Description: this function is used to determine the foreground objects. Previously we extracted foreground objects with edge detection, with contours built on edge detection, or with morphological operations. However, when foreground objects touch, overlap, or occlude one another and we still want to extract the outline of each object individually, those methods work much less well, and the distance transform is needed instead. If foreground objects are stuck together, contour extraction alone cannot separate them. Opening and closing operations can separate them, but we also want the drawn outlines to stay adjacent to each other, as the objects really are, which opening and closing cannot achieve. This is the case the distance transform handles well.

• Labeling function: cv2.connectedComponents()
After the distance transform has been used to determine which areas of the image are foreground, background, or unknown, we label these areas.
API: retval, labels = cv2.connectedComponents(img)
img: the 8-bit single-channel image to be labeled
retval: the number of labels
labels: the labeled result image
When cv2.connectedComponents() labels an image, it labels the background as 0 and the other objects with consecutive integers starting from 1, each indicating a different foreground area.
We can regard the labeled regions as the "seed" regions for watershed segmentation.

• Watershed algorithm function: markers = cv2.watershed(image, labels)
image: the input image, which must be an 8-bit three-channel image.
labels: the 32-bit single-channel labeling result, the same size as image; this is the return value of the labeling function above.
Note, however, that in the labels parameter of the watershed function, 0 represents the unknown area. The labels returned by cv2.connectedComponents() therefore need to be adjusted before being passed to cv2.watershed(): first set labels = labels + 1, then set labels[unknown area] = 0.
markers: the return value. Each pixel is either set to the value of its "seed" region or set to -1, which indicates a boundary.

```python
# Example 17.1 Image segmentation example of watershed algorithm
import cv2
import numpy as np
import matplotlib.pyplot as plt

# The original listing does not show how img is loaded; 'coins.png' is a placeholder path
img = cv2.imread('coins.png')
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# ------------ Otsu thresholding to get a binary image ------------
t, otsu = cv2.threshold(img_gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)  # t returns 162.0, otsu.shape returns (312, 252)

# ------------ Morphological opening (erode, then dilate): removes noise and separates
# overlapping parts of the foreground objects, which helps later when counting objects
# or drawing the outline of each object ------------
img_opening = cv2.morphologyEx(otsu, cv2.MORPH_OPEN, kernel=np.ones((3, 3), np.uint8), iterations=2)  # A: this is still a binary image

# ------- Compute the distance transform and determine the sure foreground -------
dist = cv2.distanceTransform(img_opening, cv2.DIST_L2, 5)  # float32 array, dist.shape returns (312, 252); dist is a grayscale image
th, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, cv2.THRESH_BINARY)  # thresholding dist gives a binary image of 0 and 255: the sure foreground
sure_fg = np.uint8(sure_fg)

# ----- Determine the sure background and compute the unknown area -----
sure_bg = cv2.dilate(img_opening, kernel=np.ones((3, 3), np.uint8), iterations=3)  # sure background, obtained by dilating the foreground
unknown = cv2.subtract(sure_bg, sure_fg)  # sure background minus sure foreground gives the unknown area

# ------ Label the sure foreground and adjust the labeling rules ------
ret, labels = cv2.connectedComponents(sure_fg)  # there are 24 coins, ret returns 25; labels is an int32 array with shape (312, 252)
labels = labels + 1         # background becomes 1, foreground objects become 2, 3, ..., 25
labels[unknown == 255] = 0  # 0 represents the unknown area

# ------------ Use the watershed algorithm to segment the image ------------
img1 = img.copy()
markers = cv2.watershed(img1, labels)
img1[markers == -1] = [0, 255, 0]  # draw the boundaries in green

# Visualization:
plt.figure(figsize=(12, 6))
plt.subplot(251), plt.imshow(img[:, :, ::-1])             # original image
plt.subplot(252), plt.imshow(img_gray, cmap='gray')       # grayscale image
plt.subplot(253), plt.imshow(otsu, cmap='gray')           # Otsu-thresholded binary image
plt.subplot(254), plt.imshow(img_opening, cmap='gray')    # denoised image after opening
plt.subplot(255), plt.imshow(dist, cmap='gray')           # distance image
plt.subplot(256), plt.imshow(sure_fg, cmap='gray')        # sure foreground
plt.subplot(257), plt.imshow(sure_bg, cmap='gray')        # sure background
plt.subplot(258), plt.imshow(unknown, cmap='gray')        # unknown area
plt.subplot(259), plt.imshow(labels, cmap='gray')         # label image
plt.subplot(2, 5, 10), plt.imshow(img1[:, :, ::-1])       # segmentation result
plt.show()
```

A: The opening operation here is mainly for denoising. The foreground objects (coins) in this picture are stuck together, and if the opening operation alone were used to separate them, the outlines drawn afterwards would be separated as well. We want the drawn outlines to stay adjacent, because the coins really do touch each other, so the coin outlines have to be found another way. cv2.distanceTransform() calculates the distance from each coin pixel to its nearest zero-valued pixel, and from this distance result we can judge the true outline of each coin, that is, find the definite foreground and definite background of the image.

2. Interactive foreground extraction: GrabCut algorithm

Interactive foreground extraction works as follows. First, the user draws a rectangle around the foreground object to be extracted, indicating the approximate extent of the foreground area. The algorithm then iterates on the segmentation until it converges on the best result. Sometimes the result is still not ideal and the user needs to intervene. To do so, the user makes an extraction mask (also called a template): an image of the same size as the original, in which areas annotated in white mark the foreground to be extracted and areas annotated in black mark the background. The annotated image is then used as a mask, and the algorithm continues iterating to extract the foreground.

• GrabCut algorithm principle:

1. Mark the approximate position of the foreground with a rectangle. The rectangle contains the foreground and some background, so its interior is an undetermined area, while everything outside it is definite background.
2. Use the definite background data outside the rectangle to distinguish foreground from background inside the rectangle.
3. Model the foreground and background with Gaussian Mixture Models (GMMs). The GMM learns the pixel distribution of the definite background, and the pixels inside the rectangle are then classified according to their relationship to the already classified pixels (foreground and background).
4. Build a graph from the pixel distribution. Each pixel is a node, and there are two additional nodes: a foreground node and a background node. All foreground pixels are connected to the foreground node and all background pixels to the background node. The weight of the edge connecting a pixel to the foreground or background node is determined by the probability that the pixel is foreground or background.
5. Besides being connected to the foreground or background node, the pixels in the graph are also connected to one another. The weight of the edge between two pixels is determined by their similarity: the closer their colors, the higher the weight.
6. Once the nodes are connected, the problem becomes one of cutting a connected graph. The graph is cut according to the edge weights, which assigns each pixel to the foreground or the background.
7. Repeat the above process until the classification converges.
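
Step 5 can be illustrated with a rough sketch of how a pixel-to-pixel edge weight might be computed. This is not OpenCV's internal code: the helper `neighbor_weights`, the form gamma * exp(-beta * ||color difference||^2), and the value gamma = 50 are assumptions made for this illustration, in the spirit of the GrabCut formulation.

```python
import numpy as np

def neighbor_weights(img, gamma=50.0):
    """Weights of the edges between horizontally adjacent pixels.

    Similar colors give a weight near gamma; very different colors give a small weight.
    """
    img = img.astype(np.float64)
    diff = img[:, 1:] - img[:, :-1]        # color difference to the right-hand neighbor
    sq_dist = np.sum(diff ** 2, axis=2)    # squared color distance
    mean_sq = sq_dist.mean()
    beta = 1.0 / (2.0 * mean_sq) if mean_sq > 0 else 0.0  # scale by the average contrast
    return gamma * np.exp(-beta * sq_dist)

# One row of three pixels: two nearly identical dark pixels, then a bright one
row = np.array([[[10, 10, 10], [12, 12, 12], [200, 200, 200]]], np.uint8)
weights = neighbor_weights(row)
print(weights)
```

A cut that separates the two dark pixels would be expensive (high weight), while a cut between the dark and bright pixels is cheap, so the graph cut tends to follow color edges.
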
• API: mask, bgdModel, fgdModel = cv2.grabCut(img, mask, rect, bgdModel, fgdModel, iterCount[, mode])
img: the input image, which must be an 8-bit 3-channel image.
mask: the mask image, which must be an 8-bit single-channel image. It specifies the definite foreground, definite background, and uncertain areas. Each pixel takes one of four values:
cv2.GC_BGD: definite background, can also be written as 0
cv2.GC_FGD: definite foreground, can also be written as 1
cv2.GC_PR_BGD: possible background, can also be written as 2
cv2.GC_PR_FGD: possible foreground, can also be written as 3
rect: the area containing the foreground object, in the format (x, y, w, h). Everything outside this area is treated as definite background.
bgdModel: an array used internally by the algorithm; just create a numpy.float64 array of shape (1, 65).
fgdModel: an array used internally by the algorithm; just create a numpy.float64 array of shape (1, 65).
iterCount: the number of iterations
mode: the operation mode. There are four modes:
cv2.GC_INIT_WITH_RECT: initialize with the rectangle
cv2.GC_INIT_WITH_MASK: initialize with a custom mask. It can be combined with cv2.GC_INIT_WITH_RECT, in which case all pixels outside the ROI are automatically treated as background
cv2.GC_EVAL: resume the algorithm
cv2.GC_EVAL_FREEZE_MODEL: run the algorithm with the model held fixed

• The specific operation steps of interactive foreground extraction:

1. Load the image img from which the foreground is to be extracted; it should be an 8-bit 3-channel color image.
2. Make a mask image (also called a template) mask. It must have the same height and width as img, but it must be single-channel (a two-dimensional array). You can set all values of mask to 0 first, that is, assume everything is definite background, and modify some values later as needed.
3. Specify the approximate position of the foreground object, rect. Its format is (x, y, w, h): x and y are the coordinates of the upper-left corner of the box, w is its width, and h is its height.
4. Create the bgdModel and fgdModel arrays.
5. Iterate with cv2.grabCut(img, mask, rect, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT). The call returns a new mask, bgdModel, and fgdModel. bgdModel and fgdModel are not needed for now; what we need is the newly generated mask. It has the same size as img and is still of type uint8, but its pixels take only four values: 0, 1, 2, and 3. Here 0 means definite background, 1 means definite foreground, 2 means possible background, and 3 means possible foreground. We therefore set the pixels equal to 0 or 2 to 0 and the pixels equal to 1 or 3 to 1. The result is the mask used to extract the foreground object.
6. Multiply the original image img by this 0/1 mask (the equivalent of a masked bitwise AND) to extract the foreground.
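
Steps 5 and 6 can be sketched with NumPy alone on a tiny hand-made mask (the 2x2 arrays are invented for this illustration):

```python
import numpy as np

img = np.full((2, 2, 3), 200, np.uint8)       # stand-in for the color image
mask = np.array([[0, 1],
                 [2, 3]], np.uint8)           # one pixel of each GrabCut value

# 0 and 2 (definite/possible background) -> 0; 1 and 3 (definite/possible foreground) -> 1
binary = np.where((mask == 0) | (mask == 2), 0, 1).astype(np.uint8)

# Broadcasting the 0/1 mask over the three channels keeps only the foreground pixels
foreground = img * binary[:, :, np.newaxis]
print(binary)
print(foreground[:, :, 0])
```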

```python
# Example 17.2 Using only a rectangular template, let the GrabCut algorithm iterate to extract the foreground
import cv2
import numpy as np
import matplotlib.pyplot as plt

# The original listing does not show how img is loaded; the image from Example 17.4 is assumed here
img = cv2.imread(r'C:\Users\25584\Desktop\lenacolor.png')

mask = np.zeros(img.shape[:2], np.uint8)  # mask values are all set to 0
bgdModel = np.zeros((1, 65), np.float64)
fgdModel = np.zeros((1, 65), np.float64)
rect = (50, 50, 420, 500)  # rectangular box, (x, y, w, h)

img0 = img.copy()  # draw the rectangle to see it
x, y, w, h = rect
cv2.rectangle(img0, (x, y), (x + w, y + h), (0, 255, 0), 3)  # far corner is (x + w, y + h)

# This call is missing from the original listing; it follows step 5 above and Example 17.3
mask1, bgd1, fgd1 = cv2.grabCut(img, mask, rect, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT)

mask11 = np.where((mask1 == 0) | (mask1 == 2), 0, 1).astype('uint8')  # merge possible regions
img1 = img.copy()
ogc1 = img1 * mask11[:, :, np.newaxis]  # keep only the foreground pixels

# Visualization:
plt.figure(figsize=(16, 6))
plt.subplot(151), plt.imshow(img[:, :, ::-1])    # original image
plt.subplot(152), plt.imshow(img0[:, :, ::-1])   # image with the rectangle drawn
plt.subplot(153), plt.imshow(mask1, cmap='gray')
plt.subplot(154), plt.imshow(mask11, cmap='gray')
plt.subplot(155), plt.imshow(ogc1[:, :, ::-1])
plt.show()
```

```python
# Example 17.3 Use the labeled image as a template to iteratively extract the foreground
import cv2
import numpy as np
import matplotlib.pyplot as plt

# The original listing does not show how img is loaded; the image from Example 17.4 is assumed here
img = cv2.imread(r'C:\Users\25584\Desktop\lenacolor.png')

mask = np.zeros(img.shape[:2], np.uint8)  # mask values are all set to 0 before iterating
bgdModel = np.zeros((1, 65), np.float64)
fgdModel = np.zeros((1, 65), np.float64)
rect = (50, 50, 420, 500)  # rectangular box, (x, y, w, h)

img0 = img.copy()  # draw the rectangle to see it
x, y, w, h = rect
cv2.rectangle(img0, (x, y), (x + w, y + h), (0, 255, 0), 3)  # far corner is (x + w, y + h)

mask1, bgd1, fgd1 = cv2.grabCut(img, mask, rect, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT)

# ------------- Make the annotated image -------------
mask2 = cv2.imread(r'C:\Users\25584\Desktop\mask.png')  # mask2.shape returns (512, 512, 3)
mask2_1 = cv2.cvtColor(mask2, cv2.COLOR_BGR2GRAY)
mask3 = mask1.copy()
mask3[mask2_1 == 0] = 0    # black annotations: definite background
mask3[mask2_1 == 255] = 1  # white annotations: definite foreground

# -------------- Continue iterating with the annotated mask --------------
mask4, bgd4, fgd4 = cv2.grabCut(img, mask3, None, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_MASK)
mask5 = np.where((mask4 == 0) | (mask4 == 2), 0, 1).astype('uint8')  # merge possible regions
img1 = img.copy()
ogc = img1 * mask5[:, :, np.newaxis]  # multiply the image by the mask to extract the foreground

# Visualization:
plt.figure(figsize=(22, 8))
plt.subplot(191), plt.imshow(img[:, :, ::-1])      # original image
plt.subplot(192), plt.imshow(img0[:, :, ::-1])     # image with the rectangle drawn
plt.subplot(193), plt.imshow(mask1, cmap='gray')   # mask1 after the first round of iterations
plt.subplot(194), plt.imshow(mask2[:, :, ::-1])    # annotation image as loaded (color)
plt.subplot(195), plt.imshow(mask2_1, cmap='gray') # annotation image in grayscale
plt.subplot(196), plt.imshow(mask3, cmap='gray')   # mask1 updated with the manual annotations
plt.subplot(197), plt.imshow(mask4, cmap='gray')   # result of iterating with mask3 as the mask
plt.subplot(198), plt.imshow(mask5, cmap='gray')   # after merging possible regions
plt.subplot(199), plt.imshow(ogc[:, :, ::-1])      # extracted foreground
plt.show()
```

```python
# Example 17.4 Extract image foreground directly using template
import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread(r'C:\Users\25584\Desktop\lenacolor.png')
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

mask = np.zeros(img.shape[:2], np.uint8)
bgdModel = np.zeros((1, 65), np.float64)
fgdModel = np.zeros((1, 65), np.float64)
mask[30:512, 50:400] = 3   # possible area of the lena portrait
mask[70:300, 150:200] = 1  # definite area of the lena portrait; without it, the extracted portrait is incomplete
mask1 = mask.copy()
mask2, bgd, fgd = cv2.grabCut(img, mask1, None, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_MASK)
mask3 = np.where((mask2 == 2) | (mask2 == 0), 0, 1).astype('uint8')
ogc = img * mask3[:, :, np.newaxis]

# Visualization:
plt.figure(figsize=(12, 4))
plt.subplot(151), plt.imshow(img[:, :, ::-1])    # original image
plt.subplot(152), plt.imshow(mask, cmap='gray')  # possible and definite areas
plt.subplot(153), plt.imshow(mask2, cmap='gray') # mask after the algorithm iterates
plt.subplot(154), plt.imshow(mask3, cmap='gray') # mask after merging possible areas
plt.subplot(155), plt.imshow(ogc[:, :, ::-1])
plt.show()
```