The International Information Center for Geotechnical Engineers

# Coseismic landslide hazard modeling methodologies - 2.a Methods and Analysis: Logistic Regression

#### 2. Landslide Hazard Modeling Methods

2.a Logistic Regression

Logistic regression is the most commonly used statistical technique for modeling the spatial and temporal distribution of landslide susceptibility (Nowicki et al., 2014). A logistic regression susceptibility model combines predictor variables linearly, weighted by fitted coefficients, and converts the result into the probability that a landslide is present (rather than absent) in a given area. Brenning (2005) found that, when modeling landslide hazard in both the spatial and temporal domains, logistic regression with stepwise variable selection produced a lower conditional error rate than other modeling techniques. However, because the model is linear in its predictors, it has limitations in variable selection and in representing interactions between terms, which restricts the number of variables that should be included.
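To make the idea concrete, the sketch below fits a logistic regression to landslide presence/absence labels. It is a minimal illustration under stated assumptions, not the procedure of Brenning (2005) or Nowicki et al. (2014): it estimates the intercept and coefficients by plain batch gradient ascent on the log-likelihood (statistical packages commonly use Newton-type methods such as iteratively reweighted least squares), it omits stepwise variable selection, and the single-predictor toy data are synthetic and hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function: maps log-odds z to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # rearranged form avoids overflow for large negative z
    return ez / (1.0 + ez)


def linear_predictor(weights, row):
    """z = intercept + sum of (coefficient * predictor)."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit intercept and coefficients by batch gradient ascent on the
    Bernoulli log-likelihood. X: list of predictor rows; y: 0/1 labels
    (1 = landslide present, 0 = absent)."""
    weights = [0.0] * (len(X[0]) + 1)  # weights[0] is the intercept
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for row, label in zip(X, y):
            err = label - sigmoid(linear_predictor(weights, row))
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        weights = [w + lr * g / n for w, g in zip(weights, grad)]
    return weights


# Hypothetical toy data: one predictor (slope rescaled to 0-1), with
# landslides generated to be more likely on steeper cells.
random.seed(0)
X = [[random.random()] for _ in range(200)]
y = [1 if random.random() < sigmoid(-3.0 + 6.0 * row[0]) else 0 for row in X]

w = fit_logistic(X, y)
print("intercept, slope coefficient:", w)
print("P(landslide | steep cell):", sigmoid(linear_predictor(w, [0.9])))
```

A positive fitted coefficient means the predictor raises the log-odds of a landslide. Stepwise selection would repeat such fits while adding or dropping predictors and comparing a fit criterion.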

2.a.I Application: Nowicki et al. 2014

The model developed by Nowicki et al. (2014) for coseismic landslide susceptibility focuses on a single earthquake and estimates the resulting spatial distribution of slope failures in near-real time. Using logistic regression, Nowicki et al. (2014) selected the predictor variables that produced the best landslide prediction model with the lowest error rate, with the goal of providing near-real-time hazard assessment on a global scale.

2.a.II Methodology: The following procedures follow the protocol of Nowicki et al. (2014).

Nowicki et al. (2014) use the following predictor variables: Peak Ground Acceleration (PGA), topographic slope, geologic strength, and water saturation. These variables were entered into the logistic regression, and the model was tested using the landslide inventory from the 2008 Wenchuan, China, earthquake.

The topographic slope for each location was derived from Shuttle Radar Topography Mission (SRTM) elevation data and summarized as median, minimum, and maximum values. Slope classification aids in mechanical landslide modeling and can be used as a statistical constraint. Material strength is quantified through cohesion and friction angle values, which represent the spatial variation in rock strength. Soil wetness is estimated from topography alone using the Compound Topographic Index (CTI): larger upslope drainage areas and gentler slopes produce higher CTI values (wetter conditions), whereas steeper slopes and smaller drainage areas produce lower values; climate is not taken into account. Landslide inventory data were compiled using both field-based methods and remote sensing techniques. Ground motion was then incorporated into the model using the USGS ShakeMap Atlas 2.0 to estimate PGA and Peak Ground Velocity (PGV) for the seismic event. Because ShakeMap estimates of PGA and PGV depend on the earthquake magnitude and the location of the fault rupture, accurate rupture information is important for estimating ground motion at a particular location.
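The CTI and a per-cell slope summary can be sketched as follows, using the definition CTI = ln(A / tan β), with A the upslope contributing area per unit contour width and β the local slope angle. Two points are assumptions rather than details from Nowicki et al. (2014): that "median, minimum, and maximum" refers to statistics of fine-resolution SRTM slope values within each coarse cell, and the small slope floor (`min_slope_deg`) used to avoid division by zero on flat ground. The function names are hypothetical.

```python
import math
import statistics


def compound_topographic_index(specific_area_m, slope_deg, min_slope_deg=0.1):
    """CTI = ln(A / tan(beta)).

    specific_area_m: upslope contributing area per unit contour width (m).
    slope_deg: local slope angle in degrees.
    min_slope_deg: assumed floor so flat cells do not divide by zero.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_area_m / math.tan(beta))


def slope_summary(fine_slopes_deg):
    """Minimum, median, and maximum of the fine-resolution slopes in one cell."""
    return {
        "min": min(fine_slopes_deg),
        "median": statistics.median(fine_slopes_deg),
        "max": max(fine_slopes_deg),
    }


# Large drainage area on a gentle slope -> higher CTI (wetter)
wet = compound_topographic_index(5000.0, 5.0)
# Small drainage area on a steep slope -> lower CTI (drier)
dry = compound_topographic_index(50.0, 40.0)
print(round(wet, 2), round(dry, 2))
print(slope_summary([12.0, 25.5, 31.0, 8.2, 40.1]))
# -> {'min': 8.2, 'median': 25.5, 'max': 40.1}
```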

Following the assignment and calculation of predictor variables, Nowicki et al. (2014) processed all data and converted it to a map projection divided into 1 km² cells. To use these data in a logistic regression, each cell was classified on a binary basis as landslide or no landslide. By testing various combinations of predictor variables, Nowicki et al. (2014) determined the best-fit regression model, shown below:

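$$z = a + b\,(\text{PGA}) + c\,(\text{maximum slope}) + d\,(\text{friction}) + e\,(\text{CTI}) + f\,(\text{PGA} \times \text{slope}) \tag{1}$$

where a, b, c, d, e, and f are regression coefficients.

The predicted probability of a landslide, P(z), is obtained by passing the linear combination z through the logistic function:

$$P(z) = \frac{1}{1 + e^{-z}}, \qquad z = a + b x_1 + c x_2 + d x_3 + \cdots$$

CTI, the Compound Topographic Index, serves as a proxy for soil wetness:

$$\text{CTI} = \ln\left(\frac{A}{\tan \beta}\right)$$

where A is the upslope contributing area per unit contour width and β is the local slope angle.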
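Evaluating Eq. (1) for a single cell shows how the predictors combine into a probability. The coefficient values below are hypothetical placeholders, not the published coefficients of Nowicki et al. (2014), and the units (PGA in %g, angles in degrees) are assumptions. Because the equation writes only "slope" in the interaction term, this sketch assumes it is the maximum slope.

```python
import math

# Hypothetical placeholder coefficients for Eq. (1); these are NOT the
# published values of Nowicki et al. (2014). Substitute fitted values.
COEFFS = {"a": -3.0, "b": 0.02, "c": 0.03, "d": -0.05, "e": 0.1, "f": 0.0005}


def logistic(z):
    """P(z) = 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def landslide_probability(pga, max_slope, friction, cti, coeffs=COEFFS):
    """Evaluate Eq. (1) for one cell and return the landslide probability.

    Assumption: the slope in the PGA x slope interaction term is taken to be
    the maximum slope, since the text does not specify which slope is used.
    """
    z = (coeffs["a"]
         + coeffs["b"] * pga
         + coeffs["c"] * max_slope
         + coeffs["d"] * friction
         + coeffs["e"] * cti
         + coeffs["f"] * pga * max_slope)
    return logistic(z)


# Two hypothetical cells: strong shaking on a steep slope vs. weak shaking
# on a gentle slope (PGA in %g, angles in degrees; assumed units).
steep = landslide_probability(pga=60.0, max_slope=40.0, friction=30.0, cti=6.0)
gentle = landslide_probability(pga=10.0, max_slope=10.0, friction=35.0, cti=8.0)
print(round(steep, 3), round(gentle, 3))
# -> 0.426 0.032
```

With these placeholder values, the interaction term adds most to z where strong shaking coincides with steep terrain, which is the behavior the PGA x slope term is meant to capture.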

Nowicki et al. (2014) then calibrated the model against multiple inventories to support landslide hazard assessment on a global scale. The inventories used were complete (all landslides in a specific study area were mapped) and comprehensive (all landslides larger than a specified size were mapped), and comprised data from the Chi-Chi, Taiwan (1999), Northridge, California (1994), Niigata-Chuetsu, Japan (2004), and Guatemala (1976) earthquakes.

Figure 1. Taken directly from Nowicki et al. (2014). Results of the logistic regression model for landslides triggered by the 2008 Wenchuan earthquake. Panels show Peak Ground Acceleration (PGA), slope, Compound Topographic Index (CTI), and friction, along with the observed landslide areas compared with the predicted landslide probability.

2.a.III Analysis

Model accuracy is difficult to evaluate directly, so sensitivity (the proportion of landslide cells correctly predicted), specificity (the proportion of non-landslide cells correctly predicted), and overall accuracy were computed over a range of landslide probability thresholds. Nowicki et al. (2014) found that thresholds between 10% and 20% yielded the largest number of correctly predicted landslide and no-landslide cells.

They also found that the regional-scale landslide hazard analysis predicted landslide occurrence more accurately than the global model. The discrepancies most likely stem from differences between earthquake events and from errors in the landslide inventories. For example, amalgamation, a common inventory error in which two individual landslides are mapped as one larger failure, skews the statistics (Li et al., 2014). Prediction errors may also arise from factors neglected in this regression, such as previous slope failures or environmental contributions like pre-event precipitation. Finally, Nowicki et al. (2014) found that the model was more strongly correlated with PGA and slope than with friction and CTI; however, separate regression models that account for specific geomorphic or tectonic settings may be needed for more accurate landslide prediction.
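The threshold sweep used to compute sensitivity, specificity, and overall accuracy can be sketched as follows. Each cell is predicted as a landslide when its probability meets or exceeds the threshold, and the three scores come from the resulting counts of true/false positives and negatives. The eight-cell dataset and the function name are hypothetical, chosen only to show the trade-off: lowering the threshold raises sensitivity at the cost of specificity.

```python
def confusion_metrics(probs, observed, threshold):
    """Score cell predictions at one probability threshold.

    probs: predicted landslide probability per cell.
    observed: 1 if a landslide was mapped in the cell, else 0.
    Returns (sensitivity, specificity, overall accuracy).
    """
    tp = fn = tn = fp = 0
    for p, obs in zip(probs, observed):
        predicted = p >= threshold
        if obs == 1:
            if predicted:
                tp += 1
            else:
                fn += 1
        else:
            if predicted:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / len(observed)
    return sensitivity, specificity, accuracy


# Hypothetical predictions and observations for eight cells
probs = [0.05, 0.12, 0.30, 0.08, 0.55, 0.02, 0.15, 0.01]
observed = [0, 1, 1, 0, 1, 0, 0, 0]
for t in (0.05, 0.10, 0.15, 0.20, 0.30):
    sens, spec, acc = confusion_metrics(probs, observed, t)
    print(f"threshold={t:.2f} sensitivity={sens:.2f} "
          f"specificity={spec:.2f} accuracy={acc:.2f}")
```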