ABSTRACT
Accurate and timely damage assessment is important after any natural disaster. Accurate damage assessments support the efficient distribution of resources. Building damage levels are an important outcome of damage assessment, especially in urban areas. Although most building damage assessments are currently collected manually from post-disaster satellite images or aerial photographs, efforts are underway to automate the process. Some of these efforts use deep learning algorithms to first identify buildings and then classify them into damage levels. One such effort, initiated in 2019 through the Defense Innovation Unit (DIU) together with Humanitarian Assistance and Disaster Recovery (HADR) organizations, created a multi-hazard training dataset (xBD) from high-resolution pre- and post-event satellite imagery. Across 19 natural disaster events, including tornadoes, wildfires, earthquakes, hurricanes, volcanoes, floods, and tsunamis, buildings were identified and classified into four classes: no damage, minor damage, major damage, and destroyed. Participants in the challenge were expected to use deep learning algorithms to perform the classification. They were also provided with a base classification algorithm, which they were encouraged to improve. The base algorithm consisted of a ResNet50 trained on the ImageNet database plus three additional convolution and max-pooling layers. This project analyzes the quality of the training dataset, discusses the pros and cons of combining training data across multiple natural disaster events, and provides recommendations on using the provided training dataset to optimize classification accuracy. Specifically, we provide recommendations on creating class balance in the training dataset and identify which damage labels are the most recognizable.
We also assess which natural disasters produce damage that is most identifiable in satellite imagery and which lead to less accurate damage assessments, and we examine pooling training data across natural disasters to achieve more accurate classifications.

Keywords – Damage assessment, Deep learning, Imbalance, Overfitting, xBD
CLASS IMBALANCE IN A DEEP LEARNING DATASET
Collecting labels for a deep learning classification of satellite images is an expensive effort. It is expensive partly because it may require site visits to places that are not easily accessible. These site visits are sometimes needed to confirm labels that cannot be identified on satellite images. When site visits are not needed, the effort may require crowdsourcing, which uses experts to identify and label features of interest. Both site visits and crowdsourcing carry high costs. For these reasons, it is challenging to collect a balanced dataset for training a deep neural network. Even when cost is not a problem, images may be unavailable due to lack of coverage or heavy cloud cover. The prohibitive nature of this exercise leads organizations and researchers to settle for approaches that use imbalanced datasets to train a deep learning model. Sometimes, disparate data are combined to increase the size of the training dataset. For example, the xBD dataset combines data across 6 disaster types and 19 disaster events (Gupta et al., 2019).
TECHNIQUES FOR SOLVING PROBLEMS WITH IMBALANCED DATASETS
In an imbalanced dataset, the distribution of classes is uneven. Johnson and Khoshgoftaar identified three categories of methods for handling class imbalance in machine learning: “data-level techniques, algorithm-level methods, and hybrid approaches” (Johnson & Khoshgoftaar, 2019). Data-level techniques use different sampling approaches to balance the input data. Algorithm-level methods work on the algorithms themselves by adjusting weights, costs, and/or the algorithm structure. Hybrid approaches combine both data-level and algorithm-level methods. Boyle described another approach, which involves creating synthetic samples to augment the input dataset (Boyle, 2019). The Synthetic Minority Over-sampling Technique (SMOTE) is an example of a synthetic over-sampling technique that strengthens class boundaries, reduces overfitting, and increases class discrimination. Yet another approach to dealing with an imbalanced dataset is sampling data from an auxiliary domain. Transfer learning is an example of such a technique: it reuses information gained from training a deep learning algorithm on a similar problem.

IMBALANCE IN XBD DATASET
xBD labels were classified into no damage (0), minor damage (1), major damage (2), and destroyed (3). Figure 1 shows that the xBD dataset is imbalanced, since class (0), the “no damage” label, is overrepresented. Knowing how damage levels are distributed within each event is a good starting point for determining whether disaster events will perform better separately or pooled. It can also help determine the influence of each event on the model’s performance. Figure 2 shows the distribution of damage levels across the different disasters.
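To make the data-level category concrete, the sketch below shows random over-sampling, the simplest data-level technique: minority-class samples are duplicated until every class matches the majority count. (SMOTE would instead interpolate new synthetic samples between minority neighbors; the function name and toy data here are illustrative, not from xBD.)

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Balance a dataset by randomly duplicating minority-class samples
    until every class matches the majority-class count (a data-level
    technique for class imbalance)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = rng.choices(xs, k=target - len(xs))  # duplicates drawn at random
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Toy label distribution mimicking xBD's skew toward "no damage" (0)
labels = [0] * 8 + [1] * 2 + [3] * 1
samples = list(range(len(labels)))
bx, by = random_oversample(samples, labels)
print(Counter(by))  # every class now has 8 samples
```

Plain duplication can encourage memorization of the repeated minority samples, which is exactly the weakness SMOTE's interpolation is designed to mitigate.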


DATA PROCESSING AND MODEL FINE-TUNING
The neural network was trained on images of buildings and their labels. To extract the buildings from the images, building coordinates were read from the label files that accompany the xBD dataset and converted into rectangular bounding boxes. These boxes were used to clip the images containing the buildings. An additional 80% offset was added to each bounding box to capture the area surrounding the building. Along with the building images, a CSV file was created to store each building's uid and damage level. Because the xBD base classification algorithm overfits, a new model was created by modifying only the top layers of a ResNet50. With the ImageNet weights retained, five new convolution, batch normalization, and max-pooling layers were added on top of the ResNet50. One dropout layer, with a rate of 0.5, was also added between the fully connected layers to reduce overfitting and improve performance. Dropout randomly changes the effective architecture of the network during training so that it does not easily overfit to the training dataset. More could be done to improve this model's performance, but it is sufficient for this paper as a baseline for classifying all the disaster types separately and pooled.
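The exact clipping code is not shown here; a minimal sketch of the 80% offset step might look like the following (the function name and the clamping to image bounds are our assumptions):

```python
def expand_bbox(xmin, ymin, xmax, ymax, img_w, img_h, offset=0.8):
    """Expand a building's bounding box by a fractional offset (80% by
    default, as described above) so the clipped chip includes the area
    surrounding the building, clamped to the image bounds."""
    w, h = xmax - xmin, ymax - ymin
    dx, dy = w * offset / 2, h * offset / 2  # grow each side by half the offset
    return (max(0, int(xmin - dx)), max(0, int(ymin - dy)),
            min(img_w, int(xmax + dx)), min(img_h, int(ymax + dy)))

# A 100x50 box centered in a 1024x1024 image grows to 180x90
print(expand_bbox(400, 400, 500, 450, 1024, 1024))  # (360, 380, 540, 470)
```

The expanded box would then be passed to an image library's crop routine to produce the per-building chip fed to the network.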
ANALYSIS OF RESULTS AND RECOMMENDATIONS
The neural network was trained on balanced datasets because the model overfitted and performed poorly on unbalanced datasets. The data performed better when balanced by a combination of oversampling and undersampling. This was achieved by adjusting the weights of the inputs to the model to be inversely proportional to the class frequencies in the input data. Figure 3 shows the number of buildings in each class after the weights were applied. These approaches reduced the influence of overrepresented classes and increased the contributions of underrepresented classes. Overfitting was also controlled by combining longer epochs with early stopping, and by adding batch normalization and dropout layers.
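The inverse-frequency weighting described above can be sketched as follows. The "balanced" formula n_samples / (n_classes × count_c) is a common convention (e.g., scikit-learn's) and is our assumption, not necessarily the exact formula used here:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class inversely proportional to its frequency using
    the common 'balanced' convention: n_samples / (n_classes * count_c).
    Rare classes get weights > 1, common classes < 1."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Skewed toy labels: 6 "no damage" (0), 3 "minor" (1), 1 "destroyed" (3)
weights = inverse_frequency_weights([0] * 6 + [1] * 3 + [3] * 1)
print(weights)  # {0: 0.555..., 1: 1.111..., 3: 3.333...}
```

A dictionary like this can be passed as the per-class weight argument of most deep learning frameworks' training loops, so underrepresented damage levels contribute more to the loss.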

The wind-related dataset had the best F1 scores. It performed well by itself and when pooled with other datasets (Figures 4 and 5). This is probably because it had the highest count of buildings and the best distribution of damage levels (Figures 1 and 2). The earthquake dataset had the worst performance; it is highly imbalanced, with only the 'no damage' level represented. Its performance improved when pooled with other datasets, but only for the 'no damage' class. The flooding and tsunami datasets also gained from the merger; their performance improved for some of their classes. The wind-related, fire, and volcano datasets performed better when used separately. Fires and volcanoes had high F1 scores for 'destroyed' buildings both separately and pooled, probably because buildings destroyed by volcanoes and fires are highly identifiable in satellite images. In contrast, disasters such as flooding and earthquakes, whose damage levels are harder to identify in satellite images, benefited from the pooling.


CONCLUSIONS
Training a neural network to classify buildings' damage levels after a disaster is a challenging task. The challenge is compounded by an imbalanced dataset, which causes overfitting. It is therefore helpful to balance the dataset to reduce overfitting and achieve good performance. Transfer learning can also help reduce overfitting, especially when fine-tuned with batch normalization and dropout layers. In addition, model performance improves when long epochs are combined with early stopping. Based on these results, the xBD datasets should only be combined when there is performance to be gained: disasters like flooding and earthquakes benefited from pooling, but wind-related disasters performed better separately.
ACKNOWLEDGEMENT
This work was supported by Award HM04762010006 through the National Geospatial Intelligence Agency NURI Program.
REFERENCES
Boyle, T. (2019, February 3). Methods for dealing with imbalanced data. Towards Data Science. https://towardsdatascience.com/methods-for-dealing-with-imbalanced-data-5b761be45a18
Gupta, R., et al. (2019). xBD: A dataset for assessing building damage from satellite imagery. Computer Vision and Pattern Recognition (CVPR) Workshops.
Johnson, J. M., & Khoshgoftaar, T. M. (2019). Survey on deep learning with class imbalance. Journal of Big Data, 6, 27.