In part 1, ANN and linear regression models were built with TensorFlow and Keras to predict the efficiency of crystalline Si solar cells as the thickness of the front silicon nitride anti-reflection layer changes.
As also discussed in part 1, the dev error is larger than the training error, which suggests a variance problem. This may be addressed with a larger data set, regularization, and/or better models.
Here, I experimented with adding L2 regularization to the ANN model from part 1 to reduce the variance. However, this comes at the expense of accuracy on the training data: the infamous bias-variance tradeoff. The variance can be further reduced with a larger training data set, which will be shown in later posts.
Initialization
For this test, weight values were initialized with a fixed random seed as follows.
From the original model:
model.add(layers.Dense(30, input_dim=2, activation='relu'))
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(1, activation='linear'))
Change to:
model.add(layers.Dense(30, input_dim=2, kernel_initializer=test_initializer, activation='relu'))
model.add(layers.Dense(10, kernel_initializer=test_initializer, activation='relu'))
model.add(layers.Dense(1, kernel_initializer=test_initializer, activation='linear'))
where
test_initializer = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.05, seed=101)
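For reference, below is a minimal end-to-end sketch of the seeded model definition. The imports and the compile settings (Adam optimizer, mean squared error loss) are my assumptions for illustration; part 1 contains the actual training setup.
import tensorflow as tf
from tensorflow.keras import layers, models
#Fixed-seed initializer so that weight initialization is reproducible between runs
test_initializer = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.05, seed=101)
model = models.Sequential()
model.add(layers.Dense(30, input_dim=2, kernel_initializer=test_initializer, activation='relu'))
model.add(layers.Dense(10, kernel_initializer=test_initializer, activation='relu'))
model.add(layers.Dense(1, kernel_initializer=test_initializer, activation='linear'))
#Compile settings below are assumed for illustration, not taken from part 1
model.compile(optimizer='adam', loss='mse')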
The training and dev errors become
| | Error |
| --- | --- |
| Train | 0.0399 |
| Dev | 0.0900 |
L2 Regularization
L2 regularization is added to the hidden layers, but not to the output layer. The output layer has a linear activation function with only one node, so the effect of L2 regularization there would be much less significant than on the densely connected hidden layers.
As shown in part 1, the neural network has two hidden layers: the first has 30 nodes, while the second has 10 nodes.
The L2 regularization strength is tuned to reduce the variance. For this test, variance (%) is defined as:
variance (%) = (dev error - train error) / dev error × 100
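Expressed as a small Python helper (the function name variance_pct is mine, for illustration only):
def variance_pct(train_err, dev_err):
    #Illustrative helper, not part of the original code:
    #relative gap between dev and train error, in percent
    return (dev_err - train_err) / dev_err * 100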
To add L2 regularization to the original code, we first define the L2 strengths that we would like to test:
#L2 regularization parameter for hidden layer 1
reg_para_s_1 = [0, 0.01, 0.05, 0.1, 0.5, 1, 5]
#L2 regularization parameter for hidden layer 2
reg_para_s_2 = [0, 0.01, 0.05, 0.1, 0.5, 1, 5]
Then, for the 1st hidden layer, the code is changed from
model.add(layers.Dense(30, input_dim=2, kernel_initializer=test_initializer, activation='relu'))
to include an L2 regularizer:
model.add(layers.Dense(30, input_dim=2, kernel_regularizer=regularizers.l2(reg_para_1), kernel_initializer=test_initializer, activation='relu'))
Similarly, for hidden layer 2, from
model.add(layers.Dense(10, kernel_initializer=test_initializer, activation='relu'))
to
model.add(layers.Dense(10, kernel_regularizer=regularizers.l2(reg_para_2), kernel_initializer=test_initializer, activation='relu'))
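Putting the pieces together, the grid of regularization strengths can be swept with a helper that rebuilds the model for each pair of values. The sketch below assumes the X_train, y_train, X_dev, y_dev arrays and the training settings (epochs, optimizer, loss) from part 1; those details are assumptions here, not the original code.
import itertools
from tensorflow.keras import layers, models, regularizers
def build_model(reg_para_1, reg_para_2):
    #Rebuild the 2-hidden-layer network with the given L2 strengths
    model = models.Sequential()
    model.add(layers.Dense(30, input_dim=2, kernel_regularizer=regularizers.l2(reg_para_1),
                           kernel_initializer=test_initializer, activation='relu'))
    model.add(layers.Dense(10, kernel_regularizer=regularizers.l2(reg_para_2),
                           kernel_initializer=test_initializer, activation='relu'))
    model.add(layers.Dense(1, kernel_initializer=test_initializer, activation='linear'))
    model.compile(optimizer='adam', loss='mse')  #assumed settings, see part 1
    return model
#X_train, y_train, X_dev, y_dev are assumed to come from part 1's data preparation
results = []
for reg_para_1, reg_para_2 in itertools.product(reg_para_s_1, reg_para_s_2):
    model = build_model(reg_para_1, reg_para_2)
    model.fit(X_train, y_train, epochs=500, verbose=0)  #epochs chosen arbitrarily here
    results.append({'reg_1': reg_para_1,
                    'reg_2': reg_para_2,
                    'train': model.evaluate(X_train, y_train, verbose=0),
                    'dev': model.evaluate(X_dev, y_dev, verbose=0)})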
Test result
As shown in Figure 1, variance decreases with increasing L2 regularization strength, as expected. In addition, a more pronounced improvement is observed for the 1st hidden layer than for the 2nd hidden layer.
This is because the 1st hidden layer has a larger number of nodes (30), and hence more weights, than the 2nd hidden layer (10). As a result, the weight decay from L2 regularization is more significant for layers with a larger number of nodes.
As shown in Figures 2 and 3, both train and dev errors increase with stronger L2 regularization. The criteria for acceptable performance after L2 regularization are set as follows (a small filtering sketch is given after the list):
- Smaller variance than the original model
- Train error < 0.1
- Dev error < 0.1
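A short sketch of how the sweep results could be screened against these criteria, reusing the results list and the variance_pct helper from the sketches above (both are my own constructions, not the original code):
#Baseline: no regularization in either hidden layer
baseline = next(r for r in results if r['reg_1'] == 0 and r['reg_2'] == 0)
baseline_var = variance_pct(baseline['train'], baseline['dev'])
#Keep configurations that beat the baseline variance with both errors below 0.1
candidates = [r for r in results
              if variance_pct(r['train'], r['dev']) < baseline_var
              and r['train'] < 0.1 and r['dev'] < 0.1]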
These conditions are roughly satisfied with the L2 regularization strengths presented in the table below. This improves the variance by about 20 percentage points, at the cost of higher train and dev errors.
| | No regularization | With L2 regularization |
| --- | --- | --- |
| Layer 1 L2 strength | 0 | 0.05 |
| Layer 2 L2 strength | 0 | 0.05 |
| Train error | 0.0339 | 0.0593 |
| Dev error | 0.0900 | 0.0996 |
| Variance | ~62.3% | ~40.5% |
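With the build_model helper sketched earlier, the selected configuration would be reproduced as follows (again, the helper and the training settings are my assumptions, not the original code):
#Reproduce the chosen configuration: L2 strength 0.05 on both hidden layers
final_model = build_model(0.05, 0.05)
final_model.fit(X_train, y_train, epochs=500, verbose=0)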
Source code and data file
The source code and data file are available on GitHub:
https://github.com/KengSiewChan/PVML
- TF_ANN.py – ANN with no regularization
- TF_ANN_Reg.py – ANN with regularization