TensorFlow 1.X vs. 2.X: pitfalls with the sparse categorical crossentropy loss function

Convolutional Neural Network

The 2.X version of tensorflow has an Input layer that can be placed at the top of a Sequential model:

# Create the student model (TensorFlow 2.X)
from tensorflow import keras
from tensorflow.keras import layers

student = keras.Sequential(
    [
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(16, (3, 3), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        layers.Conv2D(32, (3, 3), strides=(2, 2), padding="same"),
        layers.Flatten(),
        layers.Dense(10),
    ],
    name="student",  # add this so the model name shows when printing the model structure
)
student.summary()  # print the structure of the current model

The 1.X version will report an error:

TypeError: The added layer must be an instance of class Layer. Found: Tensor("input_3:0", shape=(?, 28, 28, 1), dtype=float32)

Solution: specify the input shape in the first Conv2D layer instead. keras.Input turns the input into a tensor rather than a Layer instance, which the 1.X Sequential API rejects, affecting the subsequent layers.

import keras
from keras import layers

student = keras.Sequential(
    [
        # keras.Input(shape=(28, 28, 1)),  # raises the error above in 1.X, so the input shape is moved into the first Conv2D
        layers.Conv2D(16, (3, 3), input_shape=(28, 28, 1), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        layers.Conv2D(32, (3, 3), strides=(2, 2), padding="same"),
        layers.Flatten(),
        layers.Dense(10),
    ],
    name="student",
)
student.summary()  # print the structure of the current model
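
As an alternative sketch (not from the original post, and assuming a Keras build that ships layers.InputLayer), an explicit InputLayer is a real Layer instance, so Sequential accepts it even where a bare keras.Input tensor is rejected:

import keras
from keras import layers

# Assumption: layers.InputLayer is available in this Keras version;
# unlike keras.Input, it is a Layer subclass rather than a tensor.
student_alt = keras.Sequential(
    [
        layers.InputLayer(input_shape=(28, 28, 1)),
        layers.Conv2D(16, (3, 3), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        layers.Conv2D(32, (3, 3), strides=(2, 2), padding="same"),
        layers.Flatten(),
        layers.Dense(10),
    ],
    name="student_alt",
)
student_alt.summary()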

Compile phase

In the 2.X version of tensorflow:

teacher.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# train and evaluate
teacher.fit(x_train, y_train, epochs=1)  # in practice you would train many more epochs, e.g. 100 or more
teacher.evaluate(x_test, y_test)
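
The x_train / y_train / x_test / y_test arrays are not shown in the original post; here is a minimal sketch of how they could be prepared, assuming the standard MNIST dataset, which matches the (28, 28, 1) input shape used above:

from tensorflow import keras
import numpy as np

# Assumption: the data is MNIST, reshaped to (28, 28, 1) and scaled to [0, 1]
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = np.reshape(x_train, (-1, 28, 28, 1)).astype("float32") / 255.0
x_test = np.reshape(x_test, (-1, 28, 28, 1)).astype("float32") / 255.0
# y_train / y_test stay as integer class labels, which is what the sparse losses expect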

An error will be reported in the 1.X version

Solution:

student.compile(
    optimizer=keras.optimizers.Adam(),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)

# train and evaluate
student.fit(x_train, y_train, epochs=3)
student.evaluate(x_test, y_test)

But watch out!!!
If you only make this simple change, training will not improve the accuracy at all: no matter how many epochs you train, the accuracy does not increase and the loss does not decrease.

Yet with loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True), the earlier training process was normal: the accuracy easily reached 0.85.

The reason: the problem lies in the parameter from_logits=True. With from_logits=True, the loss applies a softmax to the raw Dense outputs internally, mapping them into the [0, 1] range before computing the cross entropy. If you simply replace loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True) with loss='sparse_categorical_crossentropy', the string form expects probabilities, so you must add a softmax activation after the model's final Dense layer:

import keras
from keras import layers

student = keras.Sequential(
    [
        # keras.Input(shape=(28, 28, 1)),  # see the 1.X workaround above: the input shape is moved into the first Conv2D
        layers.Conv2D(16, (3, 3), input_shape=(28, 28, 1), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        layers.Conv2D(32, (3, 3), strides=(2, 2), padding="same"),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),  # softmax maps the logits to probabilities in [0, 1]
    ],
    name="student",
)
student.summary()  # print the structure of the current model
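
To make the relationship concrete, here is a small sketch (not from the original post, run under TF 2.X eager execution) showing that SparseCategoricalCrossentropy(from_logits=True) on raw logits matches the plain sparse loss applied to softmax probabilities; the numeric values below are purely hypothetical:

import tensorflow as tf
from tensorflow import keras

# Hypothetical raw Dense outputs (logits) for two samples and their true labels
logits = tf.constant([[2.0, -1.0, 0.5], [0.1, 3.0, -0.7]])
labels = tf.constant([0, 1])

loss_from_logits = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_from_probs = keras.losses.SparseCategoricalCrossentropy(from_logits=False)  # what 'sparse_categorical_crossentropy' expects

probs = tf.nn.softmax(logits)  # what a Dense(..., activation="softmax") layer would output

print(loss_from_logits(labels, logits).numpy())  # cross entropy computed from logits
print(loss_from_probs(labels, probs).numpy())    # same value, computed from probabilities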

In this way, you can train normally.
You can print the model's predictions to see what the output looks like without activation="softmax":

import numpy as np

y_pre = my_model.predict(np.reshape(x_train[:10], (-1, 28, 28, 1)))  # without softmax on the last Dense layer, the values span a very wide range and are not mapped to [0, 1]
print(y_pre)

The output after adding softmax

y_pre = my_model.predict(np.reshape(x_train[:10], (-1, 28, 28, 1)))  # with softmax, each row is a probability distribution over the 10 classes
print(y_pre)
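
A quick check (not in the original post) that the softmax outputs behave like probabilities, assuming y_pre holds the predictions printed above:

import numpy as np

print(y_pre.sum(axis=1))         # each row sums to approximately 1 after softmax
print(np.argmax(y_pre, axis=1))  # the predicted class for each of the 10 samples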
