Research article

Estimation of soil organic matter in the Ogan-Kuqa River Oasis, Northwest China, based on visible and near-infrared spectroscopy and machine learning

ZHOU Qian1,2,3, DING Jianli1,2,3,*, GE Xiangyu1,2,3, LI Ke1,2,3, ZHANG Zipeng1,2,3, GU Yongsheng1,2,3

1College of Geography and Remote Sensing Science, Xinjiang University, Urumqi 830046, China
2Xinjiang Key Laboratory of Oasis Ecology, Xinjiang University, Urumqi 830046, China
3Key Laboratory of Smart City and Environment Modelling of Higher Education Institute, Xinjiang University, Urumqi 830046, China

Abstract Visible and near-infrared (vis-NIR) spectroscopy allows fast and efficient determination of soil organic matter (SOM). However, a prerequisite for predicting SOM with vis-NIR spectroscopy is the effective removal of redundant spectral information. This study therefore compared three wavelength selection strategies for capturing the spectral response characteristics of SOM. The SOM content and spectral information of 110 soil samples from the Ogan-Kuqa River Oasis were measured under laboratory conditions in July 2017. Pearson correlation analysis was used to preselect, from the preprocessed spectra, the wavelengths whose correlation with SOM passed the significance test at the 0.01 level. The successive projections algorithm (SPA), competitive adaptive reweighted sampling (CARS), and the Boruta algorithm were then used to identify the optimal variables among the preselected wavelengths. Finally, partial least squares regression (PLSR) and random forest (RF) models were built on the optimal wavelengths to estimate SOM content quantitatively. The optimal variables were located mainly near the spectral absorption features at 1400.0, 1900.0, and 2200.0 nm, and CARS and the Boruta algorithm also selected a few visible wavelengths in the 480.0-510.0 nm range. Both models predicted SOM content satisfactorily, and the RF model was more accurate than the PLSR model. The best-performing model combined the Boruta algorithm with RF, used 23 variables, and achieved a coefficient of determination (R²) of 0.78 and a residual prediction deviation (RPD) of 2.38. The Boruta algorithm effectively removed redundant information and identified the optimal wavelengths, thereby improving the accuracy of the estimated SOM content.
Therefore, combining vis-NIR spectroscopy with machine learning is an important approach for improving the accuracy of SOM prediction in arid lands.
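
The Pearson preselection step described in the abstract can be sketched as follows: each wavelength's reflectance is correlated with SOM across all samples, and only wavelengths whose correlation is significant at the 0.01 level are kept. This is a minimal illustration, not the authors' implementation. It assumes the spectra are already preprocessed and stored as one list of reflectance values per sample, assumes a two-sided test, and approximates the significance test with the Fisher z-transformation rather than the exact t-test. The function names (`pearson_r`, `approx_p_value`, `preselect_wavelengths`) are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant band carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, via the Fisher z approximation.

    Assumption: z = atanh(r) * sqrt(n - 3) is approximately standard normal.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite when |r| == 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def preselect_wavelengths(spectra, som, wavelengths, alpha=0.01):
    """Return (wavelength, r) pairs whose correlation with SOM has p < alpha.

    spectra: one list of reflectance values per sample, ordered like `wavelengths`
    som: measured SOM content, one value per sample
    """
    n = len(som)
    selected = []
    for j, wavelength in enumerate(wavelengths):
        band = [sample[j] for sample in spectra]  # reflectance of every sample at this wavelength
        r = pearson_r(band, som)
        if approx_p_value(r, n) < alpha:
            selected.append((wavelength, r))
    return selected
```

With 110 samples, as in this study, the Fisher approximation is close to the exact test; if SciPy is available, `scipy.stats.pearsonr` returns the exact p-value directly. The surviving wavelengths are the candidate pool that SPA, CARS, or Boruta would then narrow down.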
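
The two accuracy statistics reported in the abstract can be computed as in the sketch below. It assumes the common definitions R² = 1 - SS_res/SS_tot and RPD = SD/RMSE, where SD is the sample standard deviation of the measured SOM values in the validation set; the abstract does not state which variants were used (some studies report R² as the squared Pearson correlation instead). The function names and the toy values are hypothetical.

```python
import math
import statistics


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = statistics.mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error of the predictions."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rpd(observed, predicted):
    """Residual prediction deviation: sample SD of the observations divided by RMSE."""
    return statistics.stdev(observed) / rmse(observed, predicted)


# Toy validation values (hypothetical, for illustration only).
measured = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 37.0]
print(round(r_squared(measured, estimated), 3))  # 0.948
print(round(rpd(measured, estimated), 2))  # 5.06
```

Under this definition, the reported RPD of 2.38 for the Boruta-RF model means its RMSE was about 42% of the standard deviation of the measured SOM values (1/2.38 ≈ 0.42).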
Received: 12 November 2022
Published: 28 February 2023

Corresponding author:
*DING Jianli (E-mail: watarid@xju.edu.cn)