Predicting Solar Cell Efficiency with Machine Learning

Machine learning (ML) has become a buzzword in many applications, from speech recognition and search engine optimization to self-driving vehicles. After completing several online ML courses, particularly Andrew Ng’s Machine Learning and Deep Learning specializations, I wondered whether I could also apply ML concepts to my professional work in solar cell research and manufacturing.

As a start, I experimented with several simple machine learning models to predict the performance of a crystalline Si solar cell. Here, the input X is the thickness of the anti-reflection layer on the front surface of the cell, and the output Y is the solar cell efficiency.

The anti-reflection layer is assumed to be made of silicon nitride (SiNx), which is traditionally used in many Si solar cell architectures. For simplicity, only the SiNx thickness is varied, while its refractive index is fixed at 2.03.

Data source

The input X (SiNx thickness) and output Y (solar cell efficiency) were generated with the simulation tools Quokka 2.0, OPAL 2 and Wafer Ray Tracer, which are available for free from PV Lighthouse.

A total of 50 samples (40 for training and 10 for validation/dev) were generated for this project. The simulated device parameters are loosely based on a previous scientific publication that I co-authored [1].
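As a rough illustration of the 40/10 split, the sketch below shuffles 50 (X, Y) pairs reproducibly and holds out 10 of them. The function name `split_train_dev`, the random seed and the placeholder data are assumptions for illustration only; the real values come from the simulation output and the actual split may have been done differently.

```python
import random

def split_train_dev(samples, n_dev=10, seed=0):
    """Shuffle (x, y) pairs reproducibly and hold out n_dev of them."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_dev:], shuffled[:n_dev]

# Placeholder data only: 50 made-up thicknesses (nm) with dummy efficiencies (%).
# These numbers do not come from the simulations described above.
samples = [(50.0 + i, 20.0 - 1e-4 * (i - 25) ** 2) for i in range(50)]

train, dev = split_train_dev(samples)
print(len(train), len(dev))  # 40 10
```

Fixing the seed keeps the dev set identical between runs, which makes the errors of different models directly comparable.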

The two models tested here are:
1. Linear regression with TensorFlow
2. Artificial Neural Networks with TensorFlow.Keras

Link to the data file and source code: https://github.com/KengSiewChan/PVML

Figure 1. Data that is used to train the machine learning models in this experiment.

Figure 2. The solar cell device structure used for this experiment. For simplicity, only the thickness of the front SiNx layer is varied to predict the solar cell efficiency. For more information about this so-called PERC solar cell structure, please see my Medium post: https://medium.com/@kengsiewchan/perc-solar-cells-3eb275804ded

Linear Regression with TensorFlow

The popular Python-based ML framework TensorFlow is used here to train a linear regression model. Because the relationship between X and Y is non-linear (Figure 1), the model uses a fourth-order polynomial hypothesis, in which the powers of x are treated as separate input features:

$h(x) = w_1 x + w_2 x^2 + w_3 x^3 + w_4 x^4 + b$

where $w_1, \dots, w_4$ are the learned weights and $b$ is the bias.
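The hypothesis above is a linear model on the features x, x², x³ and x⁴, each with its own learned weight. The sketch below builds those features and evaluates h(x) in plain Python. The helper names, the thickness range and the weight values are all made up for illustration, and the rescaling of x to [-1, 1] is an assumption (a common way to keep x⁴ in a reasonable numeric range), not something confirmed by the original code.

```python
def scale(x, lo, hi):
    """Map a thickness in [lo, hi] to [-1, 1] so that x**4 stays small."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def poly_features(x, degree=4):
    """Return [x, x^2, ..., x^degree] for a single scalar input."""
    return [x ** k for k in range(1, degree + 1)]

def hypothesis(x, weights, bias):
    """h(x) = w1*x + w2*x^2 + ... + bias, one weight per polynomial feature."""
    feats = poly_features(x, len(weights))
    return sum(w * f for w, f in zip(weights, feats)) + bias

# Illustrative numbers only; the thickness range and weights are hypothetical.
x = scale(75.0, lo=50.0, hi=100.0)   # midpoint of the range maps to 0.0
print(poly_features(2.0))            # [2.0, 4.0, 8.0, 16.0]
print(hypothesis(x, [0.5, -1.0, 0.0, 0.2], bias=20.0))  # 20.0
```

Because the model is linear in the weights, it can still be trained like any other linear regression even though the fitted curve is non-linear in x.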

Figure 3. Cost vs iteration for gradient descent with Adam optimizer for training data

Figure 3 shows the training loss curve at a learning rate of 0.0005. The model uses the Adam optimizer, which typically converges faster than plain gradient descent. As expected, Adam reaches the minimum cost in fewer than 2000 iterations, whereas the plain gradient descent optimizer needs more than 4000 iterations, as shown in Figure 4.

Figure 4. Learning curves for gradient descent vs Adam optimizer
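To make the difference between the two optimizers concrete, here is a minimal plain-Python sketch (not the TensorFlow code used for the figures) that fits the same kind of polynomial model with either plain gradient descent or the standard Adam update rule. The synthetic target curve, the learning rate of 0.05 and the 300 steps are assumptions chosen so that the example runs quickly; they are not the settings used in this experiment.

```python
import math

def features(x):
    return [x, x**2, x**3, x**4, 1.0]  # the last entry multiplies the bias

def mse_and_grad(params, xs, ys):
    """Mean squared error and its gradient for a model linear in params."""
    n = len(xs)
    loss, grad = 0.0, [0.0] * len(params)
    for x, y in zip(xs, ys):
        f = features(x)
        err = sum(p * fi for p, fi in zip(params, f)) - y
        loss += err * err / n
        for j, fj in enumerate(f):
            grad[j] += 2.0 * err * fj / n
    return loss, grad

def train(xs, ys, steps, lr, use_adam, b1=0.9, b2=0.999, eps=1e-8):
    """Return the final parameters and the loss recorded before each step."""
    params = [0.0] * 5
    m, v = [0.0] * 5, [0.0] * 5  # Adam first and second moment estimates
    history = []
    for t in range(1, steps + 1):
        loss, grad = mse_and_grad(params, xs, ys)
        history.append(loss)
        for j, g in enumerate(grad):
            if use_adam:
                m[j] = b1 * m[j] + (1 - b1) * g
                v[j] = b2 * v[j] + (1 - b2) * g * g
                m_hat = m[j] / (1 - b1 ** t)  # bias-corrected moments
                v_hat = v[j] / (1 - b2 ** t)
                params[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
            else:
                params[j] -= lr * g  # plain gradient descent step
    return params, history

# Synthetic stand-in data: inputs in [-1, 1] and a made-up smooth target.
xs = [-1.0 + 2.0 * i / 39 for i in range(40)]
ys = [1.0 - 0.8 * x**2 + 0.1 * x for x in xs]

_, gd_hist = train(xs, ys, steps=300, lr=0.05, use_adam=False)
_, adam_hist = train(xs, ys, steps=300, lr=0.05, use_adam=True)
print(f"GD:   {gd_hist[0]:.4f} -> {gd_hist[-1]:.6f}")
print(f"Adam: {adam_hist[0]:.4f} -> {adam_hist[-1]:.6f}")
```

Plotting `gd_hist` and `adam_hist` against the step index gives learning curves of the same kind as Figure 4, although the exact shapes depend on the data and learning rate.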

Data vs prediction

Figure 5. Training data (scattered) vs prediction (line) for linear regression model

Figure 6. Cross validation data (scattered) vs prediction (line) for linear regression model

As shown in Figures 5 and 6, a relatively good fit between data and prediction is accomplished with just a simple linear regression model trained with Tensorflow.

The table below shows the mean absolute percent error (MAPE) for the training and cross-validation sets. The validation/dev error is larger than the training error, which indicates some variance (overfitting) in the model. This may be reduced with a larger data set, regularization and/or a better model architecture.

	Mean Absolute Percent Error
Training set	~0.14
Validation/dev set	~0.40
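For reference, the mean absolute percent error can be computed as in the sketch below. The function name `mape` and the example efficiencies are hypothetical, and whether the reported values are expressed in percent or as fractions is an assumption here (percent).

```python
def mape(y_true, y_pred):
    """Mean absolute percent error, expressed in percent."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)

# Made-up efficiencies (%) purely to exercise the function.
y_true = [20.0, 25.0]
y_pred = [19.0, 26.0]
print(round(mape(y_true, y_pred), 2))  # 4.5

# Errors reported in the table above: a dev error larger than the
# training error is the usual sign of variance.
train_err, dev_err = 0.14, 0.40
print(dev_err > train_err)  # True
```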

Artificial Neural Network with TensorFlow.Keras

Apart from linear regression, I also trained an artificial neural network (ANN) on the same data using TensorFlow.Keras. After several tests, I settled on a model with two hidden layers of 30 and 10 nodes, respectively. ReLU activations are used in the hidden layers, while the output layer uses a linear activation to produce the Y variable. Once again, the model is trained with the Adam optimizer.
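The architecture described above (one input, hidden layers of 30 and 10 nodes with ReLU, one linear output) can be sketched without any framework as follows. This is not the Keras code used for the results; the helper names and the random initialisation range are assumptions, and the network here is untrained.

```python
import random

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(inputs, weights, biases, relu):
    """One fully connected layer: activation(W @ inputs + b)."""
    out = [sum(w * x for w, x in zip(row, inputs)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, z) for z in out] if relu else out

def forward(x, layers):
    """ReLU on every hidden layer, linear activation on the last layer."""
    activations = [x]
    for i, (weights, biases) in enumerate(layers):
        is_hidden = i < len(layers) - 1
        activations = dense(activations, weights, biases, relu=is_hidden)
    return activations[0]

rng = random.Random(0)
sizes = [1, 30, 10, 1]  # input, two hidden layers, output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
print(n_params)              # 381
print(forward(0.3, layers))  # untrained output, so the value is meaningless
```

In Keras, the same stack would typically be written as three `Dense` layers (30 and 10 units with ReLU activation, then 1 unit with linear activation) compiled with the Adam optimizer. With only 381 trainable parameters the network is small, but that is still a lot relative to 40 training samples, which is consistent with the variance discussed in this post.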

Figure 7. Cost vs iteration for gradient descent with Adam optimizer

Figure 8. Mean absolute percent error (MAPE) curves

Figure 7 shows the loss curves for both the training and validation sets. As with the linear regression model presented earlier, the Adam optimizer reaches the minimum cost in fewer than 2000 iterations.

Data vs prediction

Figure 9. Training data (scattered) vs model prediction (line) for ANN model

Figure 10. Cross validation data (scattered) vs model prediction (line) for ANN model

The ANN gives a slightly better fit between data and prediction, as shown in Figures 9 and 10, and its mean absolute percent errors are smaller than those of the linear regression model. As with the linear regression model, the validation error is larger than the training error, which again suggests the presence of variance in the model. This may be addressed with a larger data set, regularisation and/or further model optimisation.

	Mean Absolute Percent Error
Training set	~0.014
Validation/dev set	~0.080

Linear Regression with sklearn

Another way to build and train a linear regression model is with sklearn. This is a quick and easy approach, especially for simple two-dimensional data such as the data used here. The resulting errors were similar to those of both the ANN and the TensorFlow linear regression models shown earlier. The source code is available in the same GitHub repository: https://github.com/KengSiewChan/PVML
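Unlike the iterative optimizers used earlier, sklearn's linear regression fits by ordinary least squares, so no learning rate or iteration count is needed. The dependency-free sketch below shows the same idea for a fourth-order polynomial by solving the normal equations directly. It is not the sklearn code from the repository; the function names and the synthetic data are assumptions.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_polynomial(xs, ys, degree=4):
    """Ordinary least squares on [1, x, ..., x^degree] via the normal equations."""
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    size = degree + 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    return solve(ata, aty)  # coefficients ordered from bias up to x^degree

# Synthetic stand-in data: inputs in [-1, 1] and a made-up quadratic target.
xs = [-1.0 + 2.0 * i / 39 for i in range(40)]
ys = [1.0 + 0.1 * x - 0.8 * x**2 for x in xs]

coeffs = fit_polynomial(xs, ys)
print([round(c, 6) for c in coeffs])
```

In sklearn, the equivalent would typically combine `PolynomialFeatures` with `LinearRegression`. Solving the normal equations by hand is fine for a small, well-scaled problem like this one, but library solvers are numerically more robust.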

Conclusion

The above quick tests show that it is possible to apply machine learning for quick prediction and screening of new designs for Si solar cells.

In particular, in a manufacturing environment where large sets of real solar cell data are available, machine learning can be applied to identify the most suitable process conditions to optimise yield, performance and cost.

References

[1] P. Zheng, J. Xu, H. Sun, F. Zhang, Y. Guo, H. Pan, K. S. Chan, J. Jin, H. Wang, W. Chen, X. Zhang and H. Jin, “21.63% industrial screen-printed multi-crystalline Si solar cell”, Physica Status Solidi – Rapid Research Letters, 11, 1600453 (2017).