5 Soil chemical and physical properties

You are reading the work-in-progress An Open Compendium of Soil Sample and Soil Profile Datasets. This chapter is currently draft version, a peer-review publication is pending. You can find the polished first edition at https://opengeohub.github.io/SoilSamples/.

Last update: 2023-05-10

5.1 Overview

This section describes import steps used to produce a global compilation of soil laboratory data with chemical (and physical) soil properties that can be then used for predictive soil mapping / modeling at global and regional scales.

Read more about soil chemical properties, global soil profile and sample data sets and functionality:

5.2 alt text Specifications

5.2.0.2 Target variables:

site.names = c("site_key", "usiteid", "site_obsdate", "longitude_decimal_degrees", 
               "latitude_decimal_degrees")
hor.names = c("labsampnum","site_key","layer_sequence","hzn_top","hzn_bot", 
              "hzn_desgn", "tex_psda", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", 
              "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "ph_cacl2", 
              "cec_sum", "cec_nh4", "ecec", "wpg2", "db_od", "ca_ext", "mg_ext", 
              "na_ext", "k_ext", "ec_satp", "ec_12pre")
## target structure:
col.names = c("site_key", "usiteid", "site_obsdate", "longitude_decimal_degrees", 
              "latitude_decimal_degrees", "labsampnum", "layer_sequence", "hzn_top", 
              "hzn_bot","hzn_desgn", "tex_psda", "clay_tot_psa", "silt_tot_psa", 
              "sand_tot_psa", "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", 
              "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "wpg2", "db_od", "ca_ext", 
              "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre", "source_db", 
              "confidence_degree", "project_url", "citation_url")

Targe variables listed:

  • clay_tot_psa: Clay, Total in % wt for <2 mm soil fraction,
  • silt_tot_psa: Silt, Total in % wt for <2 mm soil fraction,
  • sand_tot_psa: Sand, Total in % wt for <2 mm soil fraction,
  • oc: Carbon, Organic in g/kg for <2 mm soil fraction,
  • oc_d: Soil organic carbon density in kg/m3,
  • c_tot: Carbon, Total in g/kg for <2 mm soil fraction,
  • n_tot: Nitrogen, Total NCS in g/kg for <2 mm soil fraction,
  • ph_kcl: pH, KCl Suspension for <2 mm soil fraction,
  • ph_h2o: pH, 1:1 Soil-Water Suspension for <2 mm soil fraction,
  • ph_cacl2: pH, CaCl2 Suspension for <2 mm soil fraction,
  • cec_sum: Cation Exchange Capacity, Summary, in cmol(+)/kg for <2 mm soil fraction,
  • cec_nh4: Cation Exchange Capacity, NH4 prep, in cmol(+)/kg for <2 mm soil fraction,
  • ecec: Cation Exchange Capacity, Effective, CMS derived value default, standa prep in cmol(+)/kg for <2 mm soil fraction,
  • wpg2: Coarse fragments in % wt for >2 mm soil fraction,
  • db_od: Bulk density (Oven Dry) in g/cm3 (4A1h),
  • ca_ext: Calcium, Extractable in mg/kg for <2 mm soil fraction (usually Mehlich3),
  • mg_ext: Magnesium, Extractable in mg/kg for <2 mm soil fraction (usually Mehlich3),
  • na_ext: Sodium, Extractable in mg/kg for <2 mm soil fraction (usually Mehlich3),
  • k_ext: Potassium, Extractable in mg/kg for <2 mm soil fraction (usually Mehlich3),
  • ec_satp: Electrical Conductivity, Saturation Extract in dS/m for <2 mm soil fraction,
  • ec_12pre: Electrical Conductivity, Predict, 1:2 (w/w) in dS/m for <2 mm soil fraction,

5.3 alt text Data import

5.3.0.1 National Cooperative Soil Survey Characterization Database

This data set is continuously updated.

if(!exists("chemsprops.NCSS")){
  ncss.site <- read.csv("/mnt/diskstation/data/Soil_points/INT/USDA_NCSS/NCSS_Site_Location.csv", stringsAsFactors = FALSE)
  ncss.layer <- read.csv("/mnt/diskstation/data/Soil_points/INT/USDA_NCSS/NCSS_Layer.csv", stringsAsFactors = FALSE)
  ncss.bdm <- read.csv("/mnt/diskstation/data/Soil_points/INT/USDA_NCSS/NCSS_Bulk_Density_and_Moisture.csv", stringsAsFactors = FALSE)
  ## multiple measurements
  summary(as.factor(ncss.bdm$prep_code))
  ncss.bdm.0 <- ncss.bdm[ncss.bdm$prep_code=="S",]
  summary(ncss.bdm.0$db_od)
  ## 0 BD values --- error!
  ncss.carb <- read.csv("/mnt/diskstation/data/Soil_points/INT/USDA_NCSS/NCSS_Carbon_and_Extractions.csv", stringsAsFactors = FALSE)
  ncss.organic <- read.csv("/mnt/diskstation/data/Soil_points/INT/USDA_NCSS/NCSS_Organic.csv", stringsAsFactors = FALSE)
  ncss.pH <- read.csv("/mnt/diskstation/data/Soil_points/INT/USDA_NCSS/NCSS_pH_and_Carbonates.csv", stringsAsFactors = FALSE)
  #str(ncss.pH)
  #summary(ncss.pH$ph_h2o)
  #summary(!is.na(ncss.pH$ph_h2o))
  ncss.PSDA <- read.csv("/mnt/diskstation/data/Soil_points/INT/USDA_NCSS/NCSS_PSDA_and_Rock_Fragments.csv", stringsAsFactors = FALSE)
  ncss.CEC <- read.csv("/mnt/diskstation/data/Soil_points/INT/USDA_NCSS/NCSS_CEC_and_Bases.csv")
  ncss.salt <- read.csv("/mnt/diskstation/data/Soil_points/INT/USDA_NCSS/NCSS_Salt.csv")
  ncss.horizons <- plyr::join_all(list(ncss.bdm.0, ncss.layer, ncss.carb, ncss.organic[,c("labsampnum", "result_source_key", "c_tot", "n_tot", "db_od", "oc")], ncss.pH, ncss.PSDA, ncss.CEC, ncss.salt), type = "full", by="labsampnum")
  #head(ncss.horizons)
  nrow(ncss.horizons)
  ncss.horizons$oc_d = signif(ncss.horizons$oc / 100 * ncss.horizons$db_od * 1000 * (100 - ifelse(is.na(ncss.horizons$wpg2), 0, ncss.horizons$wpg2))/100, 3)
  ncss.horizons$ca_ext = signif(ncss.horizons$ca_nh4 * 200, 4)
  ncss.horizons$mg_ext = signif(ncss.horizons$mg_nh4 * 121, 3)
  ncss.horizons$na_ext = signif(ncss.horizons$na_nh4 * 230, 3)
  ncss.horizons$k_ext = signif(ncss.horizons$k_nh4 * 391, 3)
  #summary(ncss.horizons$oc_d)
  ## Values <0!!
  chemsprops.NCSS = plyr::join(ncss.site[,site.names], ncss.horizons[,hor.names], by="site_key") 
  chemsprops.NCSS$site_obsdate = format(as.Date(chemsprops.NCSS$site_obsdate, format="%m/%d/%Y"), "%Y-%m-%d")
  chemsprops.NCSS$source_db = "USDA_NCSS"
  #dim(chemsprops.NCSS)
  chemsprops.NCSS$oc = chemsprops.NCSS$oc * 10
  chemsprops.NCSS$n_tot = chemsprops.NCSS$n_tot * 10
  #hist(log1p(chemsprops.NCSS$oc), breaks=45, col="gray")
  chemsprops.NCSS$confidence_degree = 1
  chemsprops.NCSS$project_url = "http://ncsslabdatamart.sc.egov.usda.gov/"
  chemsprops.NCSS$citation_url = "https://doi.org/10.2136/sssaj2016.11.0386n"
  chemsprops.NCSS = complete.vars(chemsprops.NCSS, sel=c("tex_psda","oc","clay_tot_psa","ecec","ph_h2o","ec_12pre","k_ext"))
  #rm(ncss.horizons)
}
dim(chemsprops.NCSS)
#> [1] 136011     36
#summary(!is.na(chemsprops.NCSS$oc))
## texture classes need to be cleaned-up
summary(as.factor(chemsprops.NCSS$tex_psda))
#>                               c               C              cl              CL 
#>            2391           10908              19           10158               5 
#>             cos             CoS             COS            cosl            CoSL 
#>            2424               6               4            4543               2 
#> Fine Sandy Loam              fs             fsl             FSL               l 
#>               1            2357           10701              16           17038 
#>               L            lcos            LCoS             lfs              ls 
#>               2            2933               3            1805            3166 
#>              LS            lvfs               s               S              sc 
#>               1             152            2958               1             601 
#>             scl             SCL              si             sic             SiC 
#>            5456               5             854            7123              11 
#>            sicl            SiCL             sil             SiL             SIL 
#>           13718              14           22230               4              28 
#>              sl              SL             vfs            vfsl            VFSL 
#>            5547              13              64            2678               3 
#>            NA's 
#>            6068

5.3.0.2 Rapid Carbon Assessment (RaCA)

if(!exists("chemsprops.RaCa")){
  raca.df <- read.csv("/mnt/diskstation/data/Soil_points/USA/RaCA/RaCa_general_location.csv", stringsAsFactors = FALSE)
  names(raca.df)[1] = "rcasiteid"
  raca.layer <- read.csv("/mnt/diskstation/data/Soil_points/USA/RaCA/RaCA_samples_JULY2016.csv", stringsAsFactors = FALSE)
  raca.layer$longitude_decimal_degrees = plyr::join(raca.layer["rcasiteid"], raca.df, match ="first")$Gen_long
  raca.layer$latitude_decimal_degrees = plyr::join(raca.layer["rcasiteid"], raca.df, match ="first")$Gen_lat
  raca.layer$site_obsdate = "2013"
  summary(raca.layer$Calc_SOC)
  #plot(raca.layer[!duplicated(raca.layer$rcasiteid),c("longitude_decimal_degrees", "latitude_decimal_degrees")])
  #summary(raca.layer$SOC_pred1)
  ## some strange groupings around small values
  raca.layer$oc_d = signif(raca.layer$Calc_SOC / 100 * raca.layer$Bulkdensity * 1000 * (100 - ifelse(is.na(raca.layer$fragvolc), 0, raca.layer$fragvolc))/100, 3)
  raca.layer$oc = raca.layer$Calc_SOC * 10
  #summary(raca.layer$oc_d)
  raca.h.lst <- c("rcasiteid", "lay_field_label1", "site_obsdate", "longitude_decimal_degrees", "latitude_decimal_degrees", "Lab.Sample.No", "layer_Number", "TOP", "BOT", "hzname", "texture", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", "oc", "oc_d", "c_tot_ncs", "n_tot_ncs", "ph_kcl", "ph_h2o", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "fragvolc", "Bulkdensity", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = raca.h.lst[which(!raca.h.lst %in% names(raca.layer))]
  if(length(x.na)>0){ for(i in x.na){ raca.layer[,i] = NA } }
  chemsprops.RaCA = raca.layer[,raca.h.lst]
  chemsprops.RaCA$source_db = "RaCA2016"
  chemsprops.RaCA$confidence_degree = 4
  chemsprops.RaCA$project_url = "https://www.nrcs.usda.gov/survey/raca/"
  chemsprops.RaCA$citation_url = "https://www.nrcs.usda.gov/Internet/FSE_DOCUMENTS/nrcs142p2_052841.pdf"
  chemsprops.RaCA = complete.vars(chemsprops.RaCA, sel = c("oc", "fragvolc"))
}
#> Joining by: rcasiteid
#> Joining by: rcasiteid
dim(chemsprops.RaCA)
#> [1] 53664    36

5.3.0.3 National Geochemical Database Soil

if(!exists("chemsprops.USGS.NGS")){
  ngs.points <- read.csv("/mnt/diskstation/data/Soil_points/USA/geochemical/ds-801-csv/site.txt", sep=",")
  ## 4857 pnts
  ngs.layers <- lapply(c("top5cm.txt", "ahorizon.txt", "chorizon.txt"), function(i){read.csv(paste0("/mnt/diskstation/data/Soil_points/USA/geochemical/ds-801-csv/", i), sep=",")})
  ngs.layers = plyr::rbind.fill(ngs.layers)
  #dim(ngs.layers)
  # 14571  126
  #summary(ngs.layers$tot_carb_pct)
  #lattice::xyplot(c_org_pct ~ c_tot_pct, ngs.layers, scales=list(x = list(log = 2), y = list(log = 2)))
  #lattice::xyplot(c_org_pct ~ tot_clay_pct, ngs.layers, scales=list(y = list(log = 2)))
  ngs.layers$c_tot = ngs.layers$c_tot_pct * 10
  ngs.layers$oc = ngs.layers$c_org_pct * 10
  ngs.layers$hzn_top = sapply(ngs.layers$depth_cm, function(i){strsplit(i, "-")[[1]][1]})
  ngs.layers$hzn_bot = sapply(ngs.layers$depth_cm, function(i){strsplit(i, "-")[[1]][2]})
  #summary(ngs.layers$tot_clay_pct)
  #summary(ngs.layers$k_pct) ## very high numbers?
  ## question is if the geochemical element results are compatible with e.g. k_ext?
  t.ngs = c("lab_id", "site_id", "horizon", "hzn_top", "hzn_bot", "tot_clay_pct", "c_tot", "oc")
  ngs.m = plyr::join(ngs.points, ngs.layers[!is.na(ngs.layers$c_org_pct),t.ngs])
  ngs.m$site_obsdate = as.Date(ngs.m$colldate, format="%Y-%m-%d") 
  ngs.h.lst <- c("site_id", "quad", "site_obsdate", "longitude", "latitude", "lab_id", "layer_sequence", "hzn_top", "hzn_bot", "horizon", "tex_psda", "tot_clay_pct", "silt_tot_psa", "sand_tot_psa", "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "wpg2", "db_od", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = ngs.h.lst[which(!ngs.h.lst %in% names(ngs.m))]
  if(length(x.na)>0){ for(i in x.na){ ngs.m[,i] = NA } }
  chemsprops.USGS.NGS = ngs.m[,ngs.h.lst]
  chemsprops.USGS.NGS$source_db = "USGS.NGS"
  chemsprops.USGS.NGS$confidence_degree = 1
  chemsprops.USGS.NGS$project_url = "https://mrdata.usgs.gov/ds-801/"
  chemsprops.USGS.NGS$citation_url = "https://pubs.usgs.gov/ds/801/"
  chemsprops.USGS.NGS = complete.vars(chemsprops.USGS.NGS, sel = c("tot_clay_pct", "oc"), coords = c("longitude", "latitude"))
}
dim(chemsprops.USGS.NGS)
#> [1] 9446   36

5.3.0.4 Forest Inventory and Analysis Database (FIADB)

if(!exists("chemsprops.FIADB")){
  fia.loc <- vroom::vroom("/mnt/diskstation/data/Soil_points/USA/FIADB/ENTIRE/PLOT.csv")
  fia.loc$site_id = paste(fia.loc$STATECD, fia.loc$COUNTYCD, fia.loc$PLOT, sep="_")
  fia.lab <- read.csv("/mnt/diskstation/data/Soil_points/USA/FIADB/ENTIRE/SOILS_LAB.csv")
  fia.lab$site_id = paste(fia.lab$STATECD, fia.lab$COUNTYCD, fia.lab$PLOT, sep="_")
  ## 23,765 rows
  fia.des <- read.csv("/mnt/diskstation/data/Soil_points/USA/FIADB/ENTIRE/SOILS_SAMPLE_LOC.csv")
  fia.des$site_id = paste(fia.des$STATECD, fia.des$COUNTYCD, fia.des$PLOT, sep="_")
  #fia.lab$TXTRLYR1 = plyr::join(fia.lab[c("site_id","INVYR")], fia.des[c("site_id","TXTRLYR1","INVYR")], match ="first")$TXTRLYR1
  fia.lab$TXTRLYR2 = plyr::join(fia.lab[c("site_id","INVYR")], fia.des[c("site_id","TXTRLYR2","INVYR")], match ="first")$TXTRLYR2
  #summary(as.factor(fia.lab$TXTRLYR1))
  fia.lab$tex_psda = factor(fia.lab$TXTRLYR2, labels = c("Organic", "Loamy", "Clayey", "Sandy", "Coarse sand", "Not measured")) 
  #Code Description
  # 0 Organic.
  # 1 Loamy.
  # 2 Clayey.
  # 3 Sandy.
  # 4 Coarse sand.
  # 9 Not measured - make plot notes
  fia.lab$FORFLTHK = plyr::join(fia.lab[c("site_id","INVYR")], fia.des[c("site_id","FORFLTHK","INVYR")], match ="first")$FORFLTHK
  #summary(fia.lab$FORFLTHK)
  fia.lab$LTRLRTHK = plyr::join(fia.lab[c("site_id","INVYR")], fia.des[c("site_id","LTRLRTHK","INVYR")], match ="first")$LTRLRTHK
  fia.lab$tot_thk = rowSums(fia.lab[,c("FORFLTHK", "LTRLRTHK")], na.rm=TRUE)
  fia.lab$DPTHSBSL = plyr::join(fia.lab[c("site_id","INVYR")], fia.des[c("site_id","DPTHSBSL","INVYR")], match ="first")$DPTHSBSL
  #summary(fia.lab$DPTHSBSL)
  sel.fia = fia.loc$site_id %in% fia.lab$site_id
  #summary(sel.fia)
  # 15,109
  fia.loc = fia.loc[sel.fia, c("site_id", "LON", "LAT")]
  #summary(fia.lab$BULK_DENSITY)  ## some strange values for BD!
  #quantile(fia.lab$BULK_DENSITY, c(0.02, 0.98), na.rm=TRUE)
  #summary(fia.lab$C_ORG_PCT)
  #summary(as.factor(fia.lab$LAYER_TYPE))
  #lattice::xyplot(BULK_DENSITY ~ C_ORG_PCT, fia.lab, scales=list(x = list(log = 2)))
  #dim(fia.lab)
  # 14571  126
  fia.lab$c_tot = fia.lab$C_ORG_PCT * 10
  fia.lab$oc = fia.lab$C_TOTAL_PCT * 10
  fia.lab$n_tot = fia.lab$N_TOTAL_PCT * 10
  fia.lab$db_od = ifelse(fia.lab$BULK_DENSITY < 0.001 | fia.lab$BULK_DENSITY > 1.8, NA, fia.lab$BULK_DENSITY)
  #lattice::xyplot(db_od ~ C_ORG_PCT, fia.lab, par.settings = list(plot.symbol = list(col=scales::alpha("black", 0.6), fill=scales::alpha("red", 0.6), pch=21, cex=0.6)), scales = list(x=list(log=TRUE, equispaced.log=FALSE)), ylab="Bulk density", xlab="SOC wpct")
  #hist(fia.lab$db_od, breaks=45)
  ## A lot of very small BD measurements
  fia.lab$oc_d = signif(fia.lab$oc / 100 * fia.lab$db_od * 1000 * (100 - ifelse(is.na(fia.lab$COARSE_FRACTION_PCT), 0, fia.lab$COARSE_FRACTION_PCT))/100, 3)
  #hist(fia.lab$oc_d, breaks=45, col="grey")
  fia.lab$hzn_top = ifelse(fia.lab$LAYER_TYPE=="FF_TOTAL" | fia.lab$LAYER_TYPE=="L_ORG", 0, NA)
  fia.lab$hzn_bot = ifelse(fia.lab$LAYER_TYPE=="FF_TOTAL" | fia.lab$LAYER_TYPE=="L_ORG", fia.lab$tot_thk, NA)
  fia.lab$hzn_top = ifelse(is.na(fia.lab$hzn_top) & (fia.lab$LAYER_TYPE=="MIN_2" | fia.lab$LAYER_TYPE=="ORG_2"), 10.2 + fia.lab$tot_thk, fia.lab$hzn_top)
  fia.lab$hzn_bot = ifelse(is.na(fia.lab$hzn_bot) & (fia.lab$LAYER_TYPE=="MIN_2" | fia.lab$LAYER_TYPE=="ORG_2"), 20.3 + fia.lab$tot_thk, fia.lab$hzn_bot)
  fia.lab$hzn_top = ifelse(is.na(fia.lab$hzn_top) & (fia.lab$LAYER_TYPE=="MIN_1" | fia.lab$LAYER_TYPE=="ORG_1"), 0 + fia.lab$tot_thk, fia.lab$hzn_top)
  fia.lab$hzn_bot = ifelse(is.na(fia.lab$hzn_bot) & (fia.lab$LAYER_TYPE=="MIN_1" | fia.lab$LAYER_TYPE=="ORG_1"), 10.2 + fia.lab$tot_thk, fia.lab$hzn_bot) 
  #summary(fia.lab$EXCHNG_K) ## Negative values!
  fia.m = plyr::join(fia.lab, fia.loc)
  #fia.m = fia.m[!duplicated(as.factor(paste(fia.m$site_id, fia.m$INVYR, fia.m$LAYER_TYPE, sep="_"))),]
  fia.m$site_obsdate = as.Date(fia.m$SAMPLE_DATE, format="%Y-%m-%d")
  sel.d.fia = fia.m$site_obsdate < as.Date("1980-01-01", format="%Y-%m-%d")
  fia.m$site_obsdate[which(sel.d.fia)] = NA
  #hist(fia.m$site_obsdate, breaks=25)
  fia.h.lst <- c("site_id", "usiteid", "site_obsdate", "LON", "LAT", "SAMPLE_ID", "layer_sequence", "hzn_top", "hzn_bot", "LAYER_TYPE", "tex_psda", "tot_clay_pct", "silt_tot_psa", "sand_tot_psa", "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "PH_H2O", "PH_CACL2", "cec_sum", "cec_nh4", "ECEC", "COARSE_FRACTION_PCT", "db_od", "EXCHNG_CA", "EXCHNG_MG", "EXCHNG_NA", "EXCHNG_K", "ec_satp", "ec_12pre")
  x.na = fia.h.lst[which(!fia.h.lst %in% names(fia.m))]
  if(length(x.na)>0){ for(i in x.na){ fia.m[,i] = NA } }
  chemsprops.FIADB = fia.m[,fia.h.lst]
  chemsprops.FIADB$source_db = "FIADB"
  chemsprops.FIADB$confidence_degree = 2
  chemsprops.FIADB$project_url = "http://www.fia.fs.fed.us/"
  chemsprops.FIADB$citation_url = "https://www.fia.fs.fed.us/library/database-documentation/"
  chemsprops.FIADB = complete.vars(chemsprops.FIADB, sel = c("PH_H2O", "oc", "EXCHNG_K"), coords = c("LON", "LAT"))
  #str(unique(paste(chemsprops.FIADB$LON, chemsprops.FIADB$LAT, sep="_")))
}
dim(chemsprops.FIADB)
#> [1] 23208    36
#write.csv(chemsprops.FIADB, "/mnt/diskstation/data/Soil_points/USA/FIADB/fiadb_soil.pnts.csv")

5.3.0.5 Africa soil profiles database

if(!exists("chemsprops.AfSPDB")){
  library(foreign)
  afspdb.profiles <- read.dbf("/mnt/diskstation/data/Soil_points/AF/AfSIS_SPDB/AfSP012Qry_Profiles.dbf", as.is=TRUE)
  afspdb.layers <- read.dbf("/mnt/diskstation/data/Soil_points/AF/AfSIS_SPDB/AfSP012Qry_Layers.dbf", as.is=TRUE)
  afspdb.s.lst <- c("ProfileID", "FldMnl_ID", "T_Year", "X_LonDD", "Y_LatDD")
  #summary(afspdb.layers$BlkDens)
  ## add missing columns
  for(j in 1:ncol(afspdb.layers)){ 
    if(is.numeric(afspdb.layers[,j])) { 
      afspdb.layers[,j] <- ifelse(afspdb.layers[,j] < 0, NA, afspdb.layers[,j]) 
    }
  }
  afspdb.layers$ca_ext = afspdb.layers$ExCa * 200
  afspdb.layers$mg_ext = afspdb.layers$ExMg * 121
  afspdb.layers$na_ext = afspdb.layers$ExNa * 230
  afspdb.layers$k_ext = afspdb.layers$ExK * 391
  #summary(afspdb.layers$k_ext)
  afspdb.m = plyr::join(afspdb.profiles[,afspdb.s.lst], afspdb.layers)
  afspdb.m$oc_d = signif(afspdb.m$OrgC * afspdb.m$BlkDens * (100 - ifelse(is.na(afspdb.m$CfPc), 0, afspdb.m$CfPc))/100, 3)
  #summary(afspdb.m$T_Year)
  afspdb.m$T_Year = ifelse(afspdb.m$T_Year < 0, NA, afspdb.m$T_Year)
  afspdb.h.lst <- c("ProfileID", "FldMnl_ID", "T_Year", "X_LonDD", "Y_LatDD", "LayerID", "LayerNr", "UpDpth", "LowDpth", "HorDes", "LabTxtr", "Clay", "Silt", "Sand", "OrgC", "oc_d", "TotC", "TotalN", "PHKCl", "PHH2O", "PHCaCl2", "CecSoil", "cec_nh4", "Ecec", "CfPc" , "BlkDens", "ca_ext", "mg_ext", "na_ext", "k_ext", "EC", "ec_12pre")
  x.na = afspdb.h.lst[which(!afspdb.h.lst %in% names(afspdb.m))]
  if(length(x.na)>0){ for(i in x.na){ afspdb.m[,i] = NA } }
  chemsprops.AfSPDB = afspdb.m[,afspdb.h.lst]
  chemsprops.AfSPDB$source_db = "AfSPDB"
  chemsprops.AfSPDB$confidence_degree = 5
  chemsprops.AfSPDB$project_url = "https://www.isric.org/projects/africa-soil-profiles-database-afsp"
  chemsprops.AfSPDB$citation_url = "https://www.isric.org/sites/default/files/isric_report_2014_01.pdf"
  chemsprops.AfSPDB = complete.vars(chemsprops.AfSPDB, sel = c("LabTxtr","OrgC","Clay","Ecec","PHH2O","EC","k_ext"), coords = c("X_LonDD", "Y_LatDD"))
}
dim(chemsprops.AfSPDB)
#> [1] 60306    36

5.3.0.6 Africa Soil Information Service (AfSIS) Soil Chemistry

  • Towett, E. K., Shepherd, K. D., Tondoh, J. E., Winowiecki, L. A., Lulseged, T., Nyambura, M., … & Cadisch, G. (2015). Total elemental composition of soils in Sub-Saharan Africa and relationship with soil forming factors. Geoderma Regional, 5, 157-168. https://doi.org/10.1016/j.geodrs.2015.06.002
  • AfSIS Soil Chemistry produced by World Agroforestry Centre (ICRAF), Quantitative Engineering Design (QED), Center for International Earth Science Information Network (CIESIN), The International Center for Tropical Agriculture (CIAT), Crop Nutrition Laboratory Services (CROPNUTS) and Rothamsted Research (RRES). Data download URL: https://registry.opendata.aws/afsis/
if(!exists("chemsprops.AfSIS1")){
  afsis1.xy = read.csv("/mnt/diskstation/data/Soil_points/AF/AfSIS_SSL/2009-2013/Georeferences/georeferences.csv")
  afsis1.xy$Sampling.date = 2011
  afsis1.lst = list.files("/mnt/diskstation/data/Soil_points/AF/AfSIS_SSL/2009-2013/Wet_Chemistry", pattern=glob2rx("*.csv$"), full.names = TRUE, recursive = TRUE)
  afsis1.hor = plyr::rbind.fill(lapply(afsis1.lst, read.csv))
  tansis.xy = read.csv("/mnt/diskstation/data/Soil_points/AF/AfSIS_SSL/tansis/Georeferences/georeferences.csv")
  #summary(tansis.xy$Sampling.date)
  tansis.xy$Sampling.date = 2018
  tansis.lst = list.files("/mnt/diskstation/data/Soil_points/AF/AfSIS_SSL/tansis/Wet_Chemistry", pattern=glob2rx("*.csv$"), full.names = TRUE, recursive = TRUE)
  tansis.hor = plyr::rbind.fill(lapply(tansis.lst, read.csv))
  afsis1t.df = plyr::rbind.fill(list(plyr::join(afsis1.hor, afsis1.xy, by="SSN"), plyr::join(tansis.hor, tansis.xy, by="SSN")))
  afsis1t.df$UpDpth = ifelse(afsis1t.df$Depth=="sub", 20, 0)
  afsis1t.df$LowDpth = ifelse(afsis1t.df$Depth=="sub", 50, 20)
  afsis1t.df$LayerNr = ifelse(afsis1t.df$Depth=="sub", 2, 1)
  #summary(afsis1t.df$C...Org)
  afsis1t.df$oc = rowMeans(afsis1t.df[,c("C...Org", "X.C")], na.rm=TRUE) * 10
  afsis1t.df$c_tot = afsis1t.df$Total.carbon
  afsis1t.df$n_tot = rowMeans(afsis1t.df[,c("Total.nitrogen", "X.N")], na.rm=TRUE) * 10
  afsis1t.df$ph_h2o = rowMeans(afsis1t.df[,c("PH", "pH")], na.rm=TRUE)
  ## multiple texture fractons - which one is the total clay, sand, silt?
  ## Clay content for water dispersed particles-recorded after 4 minutes of ultrasonication
  #summary(afsis1t.df$Psa.w4clay)
  #plot(afsis1t.df[,c("Longitude", "Latitude")])
  afsis1.h.lst <- c("SSN", "Site", "Sampling.date", "Longitude", "Latitude", "Soil.material", "LayerNr", "UpDpth", "LowDpth", "HorDes", "LabTxtr", "Psa.w4clay", "Psa.w4silt", "Psa.w4sand", "oc", "oc_d", "c_tot", "n_tot", "PHKCl", "ph_h2o", "PHCaCl2", "CecSoil", "cec_nh4", "Ecec", "CfPc" , "BlkDens", "ca_ext", "M3.Mg", "M3.Na", "M3.K", "EC", "ec_12pre")
  x.na = afspdb.h.lst[which(!afsis1.h.lst %in% names(afsis1t.df))]
  if(length(x.na)>0){ for(i in x.na){ afsis1t.df[,i] = NA } }
  chemsprops.AfSIS1 = afsis1t.df[,afsis1.h.lst]
  chemsprops.AfSIS1$source_db = "AfSIS1"
  chemsprops.AfSIS1$confidence_degree = 2
  chemsprops.AfSIS1$project_url = "https://registry.opendata.aws/afsis/"
  chemsprops.AfSIS1$citation_url = "https://doi.org/10.1016/j.geodrs.2015.06.002"
  chemsprops.AfSIS1 = complete.vars(chemsprops.AfSIS1, sel = c("Psa.w4clay","oc","ph_h2o","M3.K"), coords = c("Longitude", "Latitude"))
}
dim(chemsprops.AfSIS1)
#> [1] 4162   36

5.3.0.7 Fine Root Ecology Database (FRED)

if(!exists("chemsprops.FRED")){
  Sys.setenv("VROOM_CONNECTION_SIZE" = 131072 * 2)
  fred = vroom::vroom("/mnt/diskstation/data/Soil_points/INT/FRED/FRED3_Entire_Database_2021.csv", skip = 10, col_names=FALSE)
  ## 57,190 x 1,164
  #nm.fred = read.csv("/mnt/diskstation/data/Soil_points/INT/FRED/FRED3_Column_Definitions_20210423-091040.csv", header=TRUE)
  nm.fred0 = read.csv("/mnt/diskstation/data/Soil_points/INT/FRED/FRED3_Entire_Database_2021.csv", nrows=2)
  names(fred) = make.names(t(nm.fred0)[,1])
  ## 1164 columns!
  fred.h.lst = c("Notes_Row.ID", "Data.source_DOI", "site_obsdate", "longitude_decimal_degrees", "latitude_decimal_degrees", "labsampnum", "layer_sequence", "hzn_top", "hzn_bot", "Soil.horizon", "Soil.texture", "Soil.texture_Fraction.clay", "Soil.texture_Fraction.silt", "Soil.texture_Fraction.sand", "Soil.organic.C.content", "oc_d", "c_tot", "Soil.N.content", "ph_kcl", "Soil.pH_Water", "Soil.pH_Salt", "Soil.cation.exchange.capacity..CEC.", "cec_nh4", "Soil.effective.cation.exchange.capacity..ECEC.", "wpg2", "Soil.bulk.density", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre", "source_db", "confidence_degree")
  fred$site_obsdate = as.integer(rowMeans(fred[,c("Sample.collection_Year.ending.collection", "Sample.collection_Year.beginning.collection")], na.rm=TRUE))
  #summary(fred$site_obsdate)
  fred$longitude_decimal_degrees = ifelse(is.na(fred$Longitude), fred$Longitude_Estimated, fred$Longitude)
  fred$latitude_decimal_degrees = ifelse(is.na(fred$Latitude), fred$Latitude_Estimated, fred$Latitude)
  #names(fred)[grep("Notes_Row", names(fred))]
  #summary(fred[,grep("clay", names(fred))])
  #summary(fred[,grep("cation.exchange", names(fred))])
  #summary(fred[,grep("organic.C", names(fred))])
  #summary(fred$Soil.organic.C.content)
  #summary(fred$Soil.bulk.density)
  #summary(as.factor(fred$Soil.horizon))
  fred$hzn_bot = ifelse(is.na(fred$Soil.depth_Lower.sampling.depth), fred$Soil.depth - 5, fred$Soil.depth_Lower.sampling.depth)
  fred$hzn_top = ifelse(is.na(fred$Soil.depth_Upper.sampling.depth), fred$Soil.depth + 5, fred$Soil.depth_Upper.sampling.depth)
  fred$oc_d = signif(fred$Soil.organic.C.content / 1000 * fred$Soil.bulk.density * 1000, 3)
  #summary(fred$oc_d)  
  x.na = fred.h.lst[which(!fred.h.lst %in% names(fred))]
  if(length(x.na)>0){ for(i in x.na){ fred[,i] = NA } }
  chemsprops.FRED = fred[,fred.h.lst]
  #plot(chemsprops.FRED[,4:5])
  chemsprops.FRED$source_db = "FRED"
  chemsprops.FRED$confidence_degree = 5
  chemsprops.FRED$project_url = "https://roots.ornl.gov/"
  chemsprops.FRED$citation_url = "https://doi.org/10.25581/ornlsfa.014/1459186"
  chemsprops.FRED = complete.vars(chemsprops.FRED, sel = c("Soil.organic.C.content", "Soil.texture_Fraction.clay", "Soil.pH_Water"))
  ## many duplicates
}
dim(chemsprops.FRED)
#> [1] 858  36

5.3.0.8 Global root traits (GRooT) database (compilation)

if(!exists("chemsprops.GROOT")){
  #Sys.setenv("VROOM_CONNECTION_SIZE" = 131072 * 2)
  GROOT = vroom::vroom("/mnt/diskstation/data/Soil_points/INT/GRooT/GRooTFullVersion.csv")
  ## 114,222 x 73
  c("locationID", "GRooTID", "originalID", "source", "year", "decimalLatitude", "decimalLongitud", "soilpH", "soilTexture", "soilCarbon", "soilNitrogen", "soilPhosphorus", "soilCarbonToNitrogen", "soilBaseCationSaturation", "soilCationExchangeCapacity", "soilOrganicMatter", "soilWaterGravimetric", "soilWaterVolumetric")
  #summary(GROOT$soilCarbon)
  #summary(!is.na(GROOT$soilCarbon))
  #summary(GROOT$soilOrganicMatter)
  #summary(GROOT$soilNitrogen)
  #summary(GROOT$soilpH)
  #summary(as.factor(GROOT$soilTexture))
  #lattice::xyplot(soilCarbon ~ soilpH, GROOT, par.settings = list(plot.symbol = list(col=scales::alpha("black", 0.6), fill=scales::alpha("red", 0.6), pch=21, cex=0.6)), scales = list(y=list(log=TRUE, equispaced.log=FALSE)), ylab="SOC", xlab="pH")
  GROOT$site_obsdate = as.Date(paste0(GROOT$year, "-01-01"), format="%Y-%m-%d")
  GROOT$hzn_top = 0
  GROOT$hzn_bot = 30
  GROOT.h.lst = c("locationID", "originalID", "site_obsdate", "decimalLongitud", "decimalLatitude", "GRooTID", "layer_sequence", "hzn_top", "hzn_bot", "hzn_desgn", "soilTexture", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", "soilCarbon", "oc_d", "c_tot", "soilNitrogen", "ph_kcl", "soilpH", "ph_cacl2", "soilCationExchangeCapacity", "cec_nh4", "ecec", "wpg2", "db_od", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = GROOT.h.lst[which(!GROOT.h.lst %in% names(GROOT))]
  if(length(x.na)>0){ for(i in x.na){ GROOT[,i] = NA } }
  chemsprops.GROOT = GROOT[,GROOT.h.lst]
  chemsprops.GROOT$source_db = "GROOT"
  chemsprops.GROOT$confidence_degree = 8
  chemsprops.GROOT$project_url = "https://groot-database.github.io/GRooT/"
  chemsprops.GROOT$citation_url = "https://dx.doi.org/10.1111/geb.13179"
  chemsprops.GROOT = complete.vars(chemsprops.GROOT, sel = c("soilCarbon", "soilpH"), coords = c("decimalLongitud", "decimalLatitude"))
}
dim(chemsprops.GROOT)
#> [1] 718  36

5.3.0.9 Global Soil Respiration DB

if(!exists("chemsprops.SRDB")){
  srdb = read.csv("/mnt/diskstation/data/Soil_points/INT/SRDB/srdb-data.csv")
  ## 10366 x 85
  srdb.h.lst = c("Site_ID", "Notes", "Study_midyear", "Longitude", "Latitude", "labsampnum", "layer_sequence", "hzn_top", "hzn_bot", "hzn_desgn", "tex_psd", "Soil_clay", "Soil_silt", "Soil_sand", "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "wpg2", "Soil_BD", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre", "source_db", "confidence_degree")
  #summary(srdb$Study_midyear)
  srdb$hzn_bot = ifelse(is.na(srdb$C_soildepth), 100, srdb$C_soildepth)
  srdb$hzn_top = 0
  #summary(srdb$Soil_clay)
  #summary(srdb$C_soilmineral)
  srdb$oc_d = signif(srdb$C_soilmineral / 1000 / (srdb$hzn_bot/100), 3)
  #summary(srdb$oc_d) 
  #summary(srdb$Soil_BD)
  srdb$oc = srdb$oc_d / srdb$Soil_BD
  #summary(srdb$oc)
  x.na = srdb.h.lst[which(!srdb.h.lst %in% names(srdb))]
  if(length(x.na)>0){ for(i in x.na){ srdb[,i] = NA } }
  chemsprops.SRDB = srdb[,srdb.h.lst]
  #plot(chemsprops.SRDB[,4:5])
  chemsprops.SRDB$source_db = "SRDB"
  chemsprops.SRDB$confidence_degree = 5
  chemsprops.SRDB$project_url = "https://github.com/bpbond/srdb/"
  chemsprops.SRDB$citation_url = "https://doi.org/10.5194/bg-7-1915-2010"
  chemsprops.SRDB = complete.vars(chemsprops.SRDB, sel = c("oc", "Soil_clay", "Soil_BD"), coords = c("Longitude", "Latitude"))
}
dim(chemsprops.SRDB)
#> [1] 1596   36

5.3.0.10 SOils DAta Harmonization database (SoDaH)

if(!exists("chemsprops.SoDaH")){
  sodah.hor = read.csv("/mnt/diskstation/data/Soil_points/INT/SoDaH/521_soils_data_harmonization_6e8416fa0c9a2c2872f21ba208e6a919.csv")
  #head(sodah.hor)
  #summary(sodah.hor$coarse_frac)
  #summary(sodah.hor$lyr_soc)
  #summary(sodah.hor$lyr_som_WalkleyBlack/1.724)
  #summary(as.factor(sodah.hor$observation_date))
  sodah.hor$site_obsdate = as.integer(substr(sodah.hor$observation_date, 1, 4))
  sodah.hor$oc = ifelse(is.na(sodah.hor$lyr_soc), sodah.hor$lyr_som_WalkleyBlack/1.724, sodah.hor$lyr_soc) * 10
  sodah.hor$n_tot = sodah.hor$lyr_n_tot * 10
  sodah.hor$oc_d = signif(sodah.hor$oc / 1000 * sodah.hor$bd_samp * 1000 * (100 - ifelse(is.na(sodah.hor$coarse_frac), 0, sodah.hor$coarse_frac))/100, 3)
  sodah.hor$site_key = paste(sodah.hor$network, sodah.hor$location_name, sep="_")
  sodah.hor$labsampnum = make.unique(paste(sodah.hor$network, sodah.hor$location_name, sodah.hor$L1, sep="_"))
  #summary(sodah.hor$oc_d)
  sodah.h.lst = c("site_key", "data_file", "observation_date", "long", "lat", "labsampnum", "layer_sequence", "layer_top", "layer_bot", "hzn", "profile_texture_class", "clay", "silt", "sand", "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "ph_cacl", "cec_sum", "cec_nh4", "ecec", "coarse_frac", "bd_samp", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre", "source_db", "confidence_degree")
  x.na = sodah.h.lst[which(!sodah.h.lst %in% names(sodah.hor))]
  if(length(x.na)>0){ for(i in x.na){ sodah.hor[,i] = NA } }
  chemsprops.SoDaH = sodah.hor[,sodah.h.lst]
  #plot(chemsprops.SoDaH[,4:5])
  chemsprops.SoDaH$source_db = "SoDaH"
  chemsprops.SoDaH$confidence_degree = 3
  chemsprops.SoDaH$project_url = "https://lter.github.io/som-website"
  chemsprops.SoDaH$citation_url = "https://doi.org/10.5194/essd-2020-195"
  chemsprops.SoDaH = complete.vars(chemsprops.SoDaH, sel = c("oc", "clay", "ph_h2o"), coords = c("long", "lat"))
}
dim(chemsprops.SoDaH)
#> [1] 20383    36

5.3.0.11 ISRIC WISE harmonized soil profile data

if(!exists("chemsprops.WISE")){
  wise.site <- read.table("/mnt/diskstation/data/Soil_points/INT/ISRIC_WISE/WISE3_SITE.csv", sep=",", header=TRUE, stringsAsFactors = FALSE, fill=TRUE)
  wise.s.lst <- c("WISE3_id", "PITREF", "DATEYR", "LONDD", "LATDD")
  wise.site$LONDD = as.numeric(wise.site$LONDD)
  wise.site$LATDD = as.numeric(wise.site$LATDD)
  wise.layer <- read.table("/mnt/diskstation/data/Soil_points/INT/ISRIC_WISE/WISE3_HORIZON.csv", sep=",", header=TRUE, stringsAsFactors = FALSE, fill=TRUE)
  wise.layer$ca_ext = signif(wise.layer$EXCA * 200, 4)
  wise.layer$mg_ext = signif(wise.layer$EXMG * 121, 3)
  wise.layer$na_ext = signif(wise.layer$EXNA * 230, 3)
  wise.layer$k_ext = signif(wise.layer$EXK * 391, 3)
  wise.layer$oc_d = signif(wise.layer$ORGC / 1000 * wise.layer$BULKDENS * 1000 * (100 - ifelse(is.na(wise.layer$GRAVEL), 0, wise.layer$GRAVEL))/100, 3)
  wise.h.lst <- c("WISE3_ID", "labsampnum", "HONU", "TOPDEP", "BOTDEP", "DESIG", "tex_psda", "CLAY", "SILT", "SAND", "ORGC", "oc_d", "c_tot", "TOTN", "PHKCL", "PHH2O", "PHCACL2", "CECSOIL", "cec_nh4", "ecec", "GRAVEL" , "BULKDENS", "ca_ext", "mg_ext", "na_ext", "k_ext", "ECE", "ec_12pre")
  x.na = wise.h.lst[which(!wise.h.lst %in% names(wise.layer))]
  if(length(x.na)>0){ for(i in x.na){ wise.layer[,i] = NA } }
  chemsprops.WISE = merge(wise.site[,wise.s.lst], wise.layer[,wise.h.lst], by.x="WISE3_id", by.y="WISE3_ID")
  chemsprops.WISE$source_db = "ISRIC_WISE"
  chemsprops.WISE$confidence_degree = 4
  chemsprops.WISE$project_url = "https://isric.org"
  chemsprops.WISE$citation_url = "http://dx.doi.org/10.1111/j.1475-2743.2009.00202.x"
  chemsprops.WISE = complete.vars(chemsprops.WISE, sel = c("ORGC","CLAY","PHH2O","CECSOIL","k_ext"), coords = c("LONDD", "LATDD"))
}
dim(chemsprops.WISE)
#> [1] 23278    36

5.3.0.12 GEMAS

if(!exists("chemsprops.GEMAS")){
  gemas.samples <- read.csv("/mnt/diskstation/data/Soil_points/EU/GEMAS/GEMAS.csv", stringsAsFactors = FALSE)
  ## GEMAS, agricultural soil, 0-20 cm, air dried, <2 mm, aqua regia Data from ACME, total C, TOC, CEC, ph_CaCl2
  gemas.samples$hzn_top = 0
  gemas.samples$hzn_bot = 20
  gemas.samples$oc = gemas.samples$TOC * 10
  #summary(gemas.samples$oc)
  gemas.samples$c_tot = gemas.samples$C_tot * 10
  gemas.samples$site_obsdate = 2009
  gemas.h.lst <- c("ID", "COUNRTY", "site_obsdate", "XCOO", "YCOO", "labsampnum", "layer_sequence", "hzn_top", "hzn_bot", "TYPE", "tex_psda", "clay", "silt", "sand_tot_psa", "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "pH_CaCl2", "CEC", "cec_nh4", "ecec", "wpg2", "db_od", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = gemas.h.lst[which(!gemas.h.lst %in% names(gemas.samples))]
  if(length(x.na)>0){ for(i in x.na){ gemas.samples[,i] = NA } }
  chemsprops.GEMAS <- gemas.samples[,gemas.h.lst]
  chemsprops.GEMAS$source_db = "GEMAS_2009"
  chemsprops.GEMAS$confidence_degree = 2
  chemsprops.GEMAS$project_url = "http://gemas.geolba.ac.at/"
  chemsprops.GEMAS$citation_url = "https://doi.org/10.1016/j.apgeochem.2017.01.021"
  chemsprops.GEMAS = complete.vars(chemsprops.GEMAS, sel = c("oc","clay","pH_CaCl2"), coords = c("XCOO", "YCOO"))
}
dim(chemsprops.GEMAS)
#> [1] 4131   36

5.3.0.13 LUCAS soil

if(!exists("chemsprops.LUCAS")){
  lucas.samples <- openxlsx::read.xlsx("/mnt/diskstation/data/Soil_points/EU/LUCAS/LUCAS_TOPSOIL_v1.xlsx", sheet = 1)
  lucas.samples$site_obsdate <- "2009"
  #summary(lucas.samples$N)
  lucas.ro <- openxlsx::read.xlsx("/mnt/diskstation/data/Soil_points/EU/LUCAS/Romania.xlsx", sheet = 1)
  lucas.ro$site_obsdate <- "2012"
  names(lucas.samples)[which(!names(lucas.samples) %in% names(lucas.ro))]
  lucas.ro = plyr::rename(lucas.ro, replace=c("Soil.ID"="sample_ID", "GPS_X_LONG"="GPS_LONG", "GPS_Y_LAT"="GPS_LAT", "pHinH2O"="pH_in_H2O", "pHinCaCl2"="pH_in_CaCl"))
  lucas.bu <- openxlsx::read.xlsx("/mnt/diskstation/data/Soil_points/EU/LUCAS/Bulgaria.xlsx", sheet = 1)
  lucas.bu$site_obsdate <- "2012"
  names(lucas.samples)[which(!names(lucas.samples) %in% names(lucas.bu))]
  #lucas.ch <- openxlsx::read.xlsx("/mnt/diskstation/data/Soil_points/EU/LUCAS/LUCAS_2015_Topsoil_data_of_Switzerland-with-coordinates.xlsx_.xlsx", sheet = 1, startRow = 2)
  #lucas.ch = plyr::rename(lucas.ch, replace=c("Soil_ID"="sample_ID", "GPS_.LAT"="GPS_LAT", "pH.in.H2O"="pH_in_H2O", "pH.in.CaCl2"="pH_in_CaCl", "Calcium.carbonate/.g.kg–1"="CaCO3", "Silt/.g.kg–1"="silt", "Sand/.g.kg–1"="sand", "Clay/.g.kg–1"="clay", "Organic.carbon/.g.kg–1"="OC"))
  ## Double readings?
  lucas.t = plyr::rbind.fill(list(lucas.samples, lucas.ro, lucas.bu))
  lucas.h.lst <- c("POINT_ID", "usiteid", "site_obsdate", "GPS_LONG", "GPS_LAT", "sample_ID", "layer_sequence", "hzn_top", "hzn_bot", "hzn_desgn", "tex_psda", "clay", "silt", "sand", "OC", "oc_d", "c_tot", "N", "ph_kcl", "pH_in_H2O", "pH_in_CaCl", "CEC", "cec_nh4", "ecec", "coarse", "db_od", "ca_ext", "mg_ext", "na_ext", "K", "ec_satp", "ec_12pre")
  x.na = lucas.h.lst[which(!lucas.h.lst %in% names(lucas.t))]
  if(length(x.na)>0){ for(i in x.na){ lucas.t[,i] = NA } }
  chemsprops.LUCAS <- lucas.t[,lucas.h.lst]
  chemsprops.LUCAS$source_db = "LUCAS_2009"
  chemsprops.LUCAS$hzn_top <- 0
  chemsprops.LUCAS$hzn_bot <- 20
  chemsprops.LUCAS$confidence_degree = 2
  chemsprops.LUCAS$project_url = "https://esdac.jrc.ec.europa.eu/"
  chemsprops.LUCAS$citation_url = "https://doi.org/10.1111/ejss.12499"
  chemsprops.LUCAS = complete.vars(chemsprops.LUCAS, sel = c("OC","clay","pH_in_H2O"), coords = c("GPS_LONG", "GPS_LAT"))
}
dim(chemsprops.LUCAS)
#> [1] 21272    36
if(!exists("chemsprops.LUCAS2")){
  #lucas2015.samples <- openxlsx::read.xlsx("/mnt/diskstation/data/Soil_points/EU/LUCAS/LUCAS_Topsoil_2015_20200323.xlsx", sheet = 1)
  lucas2015.xy = readOGR("/mnt/diskstation/data/Soil_points/EU/LUCAS/LUCAS_Topsoil_2015_20200323.shp")
  #head(as.data.frame(lucas2015.xy))
  lucas2015.xy = as.data.frame(lucas2015.xy)
  ## https://www.aqion.de/site/130
  ## 1 mS/m = 100 mS/cm | 1 dS/m = 1 mS/cm = 1 mS/m / 100
  lucas2015.xy$ec_satp = lucas2015.xy$EC / 100
  lucas2015.h.lst <- c("Point_ID", "LC0_Desc", "site_obsdate", "coords.x1", "coords.x2", "sample_ID", "layer_sequence", "hzn_top", "hzn_bot", "hzn_desgn", "tex_psda", "Clay", "Silt", "Sand", "OC", "oc_d", "c_tot", "N", "ph_kcl", "pH_H20", "pH_CaCl2", "CEC", "cec_nh4", "ecec", "coarse", "db_od", "ca_ext", "mg_ext", "na_ext", "K", "ec_satp", "ec_12pre")
  x.na = lucas2015.h.lst[which(!lucas2015.h.lst %in% names(lucas2015.xy))]
  if(length(x.na)>0){ for(i in x.na){ lucas2015.xy[,i] = NA } }
  chemsprops.LUCAS2 <- lucas2015.xy[,lucas2015.h.lst]
  chemsprops.LUCAS2$source_db = "LUCAS_2015"
  chemsprops.LUCAS2$hzn_top <- 0
  chemsprops.LUCAS2$hzn_bot <- 20
  chemsprops.LUCAS2$site_obsdate <- "2015"
  chemsprops.LUCAS2$confidence_degree = 2
  chemsprops.LUCAS2$project_url = "https://esdac.jrc.ec.europa.eu/"
  chemsprops.LUCAS2$citation_url = "https://doi.org/10.1111/ejss.12499"
  chemsprops.LUCAS2 = complete.vars(chemsprops.LUCAS2, sel = c("OC","Clay","pH_H20"), coords = c("coords.x1", "coords.x2"))
}
dim(chemsprops.LUCAS2)
#> [1] 21859    36

5.3.0.14 Mangrove forest soil DB

if(!exists("chemsprops.Mangroves")){
  mng.profs <- read.csv("/mnt/diskstation/data/Soil_points/INT/TNC_mangroves/mangrove_soc_database_v10_sites.csv", skip=1)
  mng.hors <- read.csv("/mnt/diskstation/data/Soil_points/INT/TNC_mangroves/mangrove_soc_database_v10_horizons.csv", skip=1)
  mngALL = plyr::join(mng.hors, mng.profs, by=c("Site.name"))
  mngALL$oc = mngALL$OC_final * 10
  mngALL$oc_d = mngALL$CD_calc * 1000
  mngALL$hzn_top = mngALL$U_depth * 100
  mngALL$hzn_bot = mngALL$L_depth * 100
  mngALL$wpg2 = 0
  #summary(mngALL$BD_reported) ## some very high values 3.26 t/m3
  mngALL$Year = ifelse(is.na(mngALL$Year_sampled), mngALL$Years_collected, mngALL$Year_sampled)
  mng.col = c("Site.name", "Site..", "Year", "Longitude_Adjusted", "Latitude_Adjusted", "labsampnum", "layer_sequence","hzn_top","hzn_bot","hzn_desgn", "tex_psda", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "wpg2", "BD_reported", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")  
  x.na = mng.col[which(!mng.col %in% names(mngALL))]
  if(length(x.na)>0){ for(i in x.na){ mngALL[,i] = NA } }
  chemsprops.Mangroves = mngALL[,mng.col]
  chemsprops.Mangroves$source_db = "MangrovesDB"
  chemsprops.Mangroves$confidence_degree = 4
  chemsprops.Mangroves$project_url = "http://maps.oceanwealth.org/mangrove-restoration/"
  chemsprops.Mangroves$citation_url = "https://doi.org/10.1088/1748-9326/aabe1c"
  chemsprops.Mangroves = complete.vars(chemsprops.Mangroves, sel = c("oc","BD_reported"), coords = c("Longitude_Adjusted", "Latitude_Adjusted"))
  #head(chemsprops.Mangroves)
  #levels(as.factor(mngALL$OK.to.release.))
  mng.rm = chemsprops.Mangroves$Site.name[chemsprops.Mangroves$Site.name %in% mngALL$Site.name[grep("N", mngALL$OK.to.release., ignore.case = FALSE)]]
}
dim(chemsprops.Mangroves)
#> [1] 7734   36

5.3.0.15 CIFOR peatland points

Peatland soil measurements (points) from the literature described in:

  • Murdiyarso, D., Roman-Cuesta, R. M., Verchot, L. V., Herold, M., Gumbricht, T., Herold, N., & Martius, C. (2017). New map reveals more peat in the tropics (Vol. 189). CIFOR. https://doi.org/10.17528/cifor/006452
if(!exists("chemsprops.Peatlands")){
  cif.hors <- read.csv("/mnt/diskstation/data/Soil_points/INT/CIFOR_peatlands/SOC_literature_CIFOR.csv")
  #summary(cif.hors$BD..g.cm..)
  #summary(cif.hors$SOC)
  cif.hors$oc = cif.hors$SOC * 10
  cif.hors$wpg2 = 0
  cif.hors$c_tot = cif.hors$TOC.content.... * 10
  cif.hors$oc_d = cif.hors$C.density..kg.C.m..
  cif.hors$site_obsdate = as.integer(substr(cif.hors$year, 1, 4))-1
  cif.col = c("SOURCEID", "usiteid", "site_obsdate", "modelling.x", "modelling.y", "labsampnum", "layer_sequence", "Upper", "Lower", "hzn_desgn", "tex_psda", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "wpg2", "BD..g.cm..", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = cif.col[which(!cif.col %in% names(cif.hors))]
  if(length(x.na)>0){ for(i in x.na){ cif.hors[,i] = NA } }
  chemsprops.Peatlands = cif.hors[,cif.col]
  chemsprops.Peatlands$source_db = "CIFOR"
  chemsprops.Peatlands$confidence_degree = 4
  chemsprops.Peatlands$project_url = "https://www.cifor.org/"
  chemsprops.Peatlands$citation_url = "https://doi.org/10.17528/cifor/006452"
  chemsprops.Peatlands = complete.vars(chemsprops.Peatlands, sel = c("oc","BD..g.cm.."), coords = c("modelling.x", "modelling.y"))  
}
dim(chemsprops.Peatlands)
#> [1] 756  36

5.3.0.16 LandPKS observations

if(!exists("chemsprops.LandPKS")){
  pks = read.csv("/mnt/diskstation/data/Soil_points/INT/LandPKS/Export_LandInfo_Data.csv", stringsAsFactors = FALSE)
  #str(pks)
  pks.hor = data.frame(rock_fragments = c(pks$rock_fragments_layer_0_1cm,
                                          pks$rock_fragments_layer_1_10cm,
                                          pks$rock_fragments_layer_10_20cm,
                                          pks$rock_fragments_layer_20_50cm,
                                          pks$rock_fragments_layer_50_70cm,
                                          pks$rock_fragments_layer_70_100cm,
                                          pks$rock_fragments_layer_100_120cm), 
                       tex_field = c(pks$texture_layer_0_1cm, 
                                     pks$texture_layer_1_10cm, 
                                     pks$texture_layer_10_20cm, 
                                     pks$texture_layer_20_50cm, 
                                     pks$texture_layer_50_70cm, 
                                     pks$texture_layer_70_100cm, 
                                     pks$texture_layer_100_120cm))
  pks.hor$hzn_top = c(rep(0, nrow(pks)), 
                      rep(1, nrow(pks)), 
                      rep(10, nrow(pks)), 
                      rep(20, nrow(pks)), 
                      rep(50, nrow(pks)), 
                      rep(70, nrow(pks)), 
                      rep(100, nrow(pks)))
  pks.hor$hzn_bot = c(rep(1, nrow(pks)), 
                      rep(10, nrow(pks)), 
                      rep(20, nrow(pks)), 
                      rep(50, nrow(pks)), 
                      rep(70, nrow(pks)), 
                      rep(100, nrow(pks)), 
                      rep(120, nrow(pks)))
  pks.hor$longitude_decimal_degrees = rep(pks$longitude, 7)
  pks.hor$latitude_decimal_degrees = rep(pks$latitude, 7)
  pks.hor$site_obsdate = rep(pks$modified_date, 7)
  pks.hor$site_key = rep(pks$id, 7)
  #summary(as.factor(pks.hor$tex_field))
  tex.tr = data.frame(tex_field=c("CLAY", "CLAY LOAM", "LOAM", "LOAMY SAND", "SAND", "SANDY CLAY", "SANDY CLAY LOAM", "SANDY LOAM", "SILT LOAM", "SILTY CLAY", "SILTY CLAY LOAM"), 
                      clay_tot_psa=c(62.4, 34.0, 19.0, 5.8, 3.3, 41.7, 27.0, 10.0, 13.1, 46.7, 34.0), 
                      silt_tot_psa=c(17.8, 34.0, 40.0, 12.0, 5.0, 6.7, 13.0, 25.0, 65.7, 46.7, 56.0), 
                      sand_tot_psa=c(19.8, 32.0, 41.0, 82.2, 91.7, 51.6, 60.0, 65.0, 21.2, 6.7, 10.0))
  pks.hor$clay_tot_psa = plyr::join(pks.hor["tex_field"], tex.tr)$clay_tot_psa
  pks.hor$silt_tot_psa = plyr::join(pks.hor["tex_field"], tex.tr)$silt_tot_psa
  pks.hor$sand_tot_psa = plyr::join(pks.hor["tex_field"], tex.tr)$sand_tot_psa
  #summary(as.factor(pks.hor$rock_fragments))
  pks.hor$wpg2 = ifelse(pks.hor$rock_fragments==">60%", 65, ifelse(pks.hor$rock_fragments=="35-60%", 47.5, ifelse(pks.hor$rock_fragments=="15-35%", 25, ifelse(pks.hor$rock_fragments=="1-15%" | pks.hor$rock_fragments=="0-15%", 7.5, ifelse(pks.hor$rock_fragments=="0-1%", 0.5, NA)))))
  #head(pks.hor)
  #plot(pks.hor[,c("longitude_decimal_degrees","latitude_decimal_degrees")])
  pks.col = c("site_key", "usiteid", "site_obsdate", "longitude_decimal_degrees", "latitude_decimal_degrees", "labsampnum", "layer_sequence","hzn_top","hzn_bot","hzn_desgn", "tex_field", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "wpg2", "db_od", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = pks.col[which(!pks.col %in% names(pks.hor))]
  if(length(x.na)>0){ for(i in x.na){ pks.hor[,i] = NA } }
  chemsprops.LandPKS = pks.hor[,pks.col]
  chemsprops.LandPKS$source_db = "LandPKS"
  chemsprops.LandPKS$confidence_degree = 8
  chemsprops.LandPKS$project_url = "http://portal.landpotential.org"
  chemsprops.LandPKS$citation_url = "https://doi.org/10.2489/jswc.68.1.5A"
  chemsprops.LandPKS = complete.vars(chemsprops.LandPKS, sel = c("clay_tot_psa","wpg2"), coords = c("longitude_decimal_degrees", "latitude_decimal_degrees"))
}
dim(chemsprops.LandPKS)
#> [1] 41644    36

5.3.0.17 EGRPR

if(!exists("chemsprops.EGRPR")){
  russ.HOR = read.csv("/mnt/diskstation/data/Soil_points/Russia/EGRPR/Russia_EGRPR_soil_pedons.csv")
  russ.HOR$SOURCEID = paste(russ.HOR$CardID, russ.HOR$SOIL_ID, sep="_")
  russ.HOR$wpg2 = russ.HOR$TEXTSTNS
  russ.HOR$SNDPPT <- russ.HOR$TEXTSAF + russ.HOR$TEXSCM
  russ.HOR$SLTPPT <- russ.HOR$TEXTSIC + russ.HOR$TEXTSIM + 0.8 * ifelse(is.na(russ.HOR$TEXTSIF), 0, russ.HOR$TEXTSIF)
  russ.HOR$CLYPPT <- russ.HOR$TEXTCL + 0.2 * ifelse(is.na(russ.HOR$TEXTSIF), 0, russ.HOR$TEXTSIF)
  ## Correct texture fractions:
  sumTex <- rowSums(russ.HOR[,c("SLTPPT","CLYPPT","SNDPPT")])
  russ.HOR$SNDPPT <- russ.HOR$SNDPPT / ((sumTex - russ.HOR$CLYPPT) /(100 - russ.HOR$CLYPPT))
  russ.HOR$SLTPPT <- russ.HOR$SLTPPT / ((sumTex - russ.HOR$CLYPPT) /(100 - russ.HOR$CLYPPT))
  russ.HOR$oc <- rowMeans(data.frame(x1=russ.HOR$CORG * 10, x2=russ.HOR$ORGMAT/1.724 * 10), na.rm=TRUE)
  russ.HOR$oc_d = signif(russ.HOR$oc / 1000 * russ.HOR$DVOL * 1000 * (100 - ifelse(is.na(russ.HOR$wpg2), 0, russ.HOR$wpg2))/100, 3)
  russ.HOR$n_tot <- russ.HOR$NTOT * 10
  russ.HOR$ca_ext = russ.HOR$EXCA * 200
  russ.HOR$mg_ext = russ.HOR$EXMG * 121
  russ.HOR$na_ext = russ.HOR$EXNA * 230
  russ.HOR$k_ext = russ.HOR$EXK * 391
  ## Sampling year not available but with high confidence <2000
  russ.HOR$site_obsdate = "1982"
  russ.sel.h <- c("SOURCEID", "SOIL_ID", "site_obsdate", "LONG", "LAT", "labsampnum", "HORNMB", "HORTOP", "HORBOT", "HISMMN", "tex_psda", "CLYPPT", "SLTPPT", "SNDPPT", "oc", "oc_d", "c_tot", "NTOT", "PHSLT", "PHH2O", "ph_cacl2", "CECST", "cec_nh4", "ecec", "wpg2", "DVOL", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = russ.sel.h[which(!russ.sel.h %in% names(russ.HOR))]
  if(length(x.na)>0){ for(i in x.na){ russ.HOR[,i] = NA } }
  chemsprops.EGRPR = russ.HOR[,russ.sel.h]
  chemsprops.EGRPR$source_db = "Russia_EGRPR"
  chemsprops.EGRPR$confidence_degree = 2
  chemsprops.EGRPR$project_url = "http://egrpr.esoil.ru/"
  chemsprops.EGRPR$citation_url = "https://doi.org/10.19047/0136-1694-2016-86-115-123"
  chemsprops.EGRPR <- complete.vars(chemsprops.EGRPR, sel=c("oc", "CLYPPT"), coords = c("LONG", "LAT"))
}
dim(chemsprops.EGRPR)
#> [1] 4437   36

5.3.0.18 Canada National Pedon DB

if(!exists("chemsprops.NPDB")){
  NPDB.nm = c("NPDB_V2_sum_source_info.csv","NPDB_V2_sum_chemical.csv", "NPDB_V2_sum_horizons_raw.csv", "NPDB_V2_sum_physical.csv")
  NPDB.HOR = plyr::join_all(lapply(paste0("/mnt/diskstation/data/Soil_points/Canada/NPDB/", NPDB.nm), read.csv), type = "full")
  NPDB.HOR$HISMMN = paste0(NPDB.HOR$HZN_MAS, NPDB.HOR$HZN_SUF, NPDB.HOR$HZN_MOD)
  NPDB.HOR$CARB_ORG[NPDB.HOR$CARB_ORG==9] <- NA
  NPDB.HOR$N_TOTAL[NPDB.HOR$N_TOTAL==9] <- NA
  NPDB.HOR$oc = NPDB.HOR$CARB_ORG * 10
  NPDB.HOR$oc_d = signif(NPDB.HOR$oc / 1000 * NPDB.HOR$BULK_DEN * 1000 * (100 - ifelse(is.na(NPDB.HOR$VC_SAND), 0, NPDB.HOR$VC_SAND))/100, 3)
  NPDB.HOR$ca_ext = NPDB.HOR$EXCH_CA * 200
  NPDB.HOR$mg_ext = NPDB.HOR$EXCH_MG * 121
  NPDB.HOR$na_ext = NPDB.HOR$EXCH_NA * 230
  NPDB.HOR$k_ext = NPDB.HOR$EXCH_K * 391
  npdb.sel.h = c("PEDON_ID", "usiteid", "CAL_YEAR", "DD_LONG", "DD_LAT", "labsampnum", "layer_sequence", "U_DEPTH", "L_DEPTH", "HISMMN", "tex_psda", "T_CLAY", "T_SILT", "T_SAND", "oc", "oc_d", "c_tot", "N_TOTAL", "ph_kcl", "PH_H2O", "PH_CACL2", "CEC", "cec_nh4", "ecec", "VC_SAND", "BULK_DEN", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = npdb.sel.h[which(!npdb.sel.h %in% names(NPDB.HOR))]
  if(length(x.na)>0){ for(i in x.na){ NPDB.HOR[,i] = NA } }
  chemsprops.NPDB = NPDB.HOR[,npdb.sel.h]
  chemsprops.NPDB$source_db = "Canada_NPDB"
  chemsprops.NPDB$confidence_degree = 2
  chemsprops.NPDB$project_url = "https://open.canada.ca/data/en/"
  chemsprops.NPDB$citation_url = "https://open.canada.ca/data/en/dataset/6457fad6-b6f5-47a3-9bd1-ad14aea4b9e0"
  chemsprops.NPDB <- complete.vars(chemsprops.NPDB, sel=c("oc", "PH_H2O", "T_CLAY"), coords = c("DD_LONG", "DD_LAT"))
}
dim(chemsprops.NPDB)
#> [1] 15946    36

5.3.0.19 Canadian upland forest soil profile and carbon stocks database

*Organic horizons have negative values, the first mineral soil horizon has a value of 0 cm, and other mineral soil horizons have positive values. This needs to be corrected before the values can be bind with other international sets.

if(!exists("chemsprops.CUFS")){
  ## Reading of the .dat file was tricky 
  cufs.HOR = read.csv("/mnt/diskstation/data/Soil_points/Canada/CUFSDB/PROFILES.csv", stringsAsFactors = FALSE)
  cufs.HOR$LOWER_HZN_LIMIT =cufs.HOR$UPPER_HZN_LIMIT + cufs.HOR$HZN_THICKNESS
  ## Correct depth (Canadian data can have negative depths for soil horizons):
  z.min.cufs <- ddply(cufs.HOR, .(LOCATION_ID), summarize, aggregated = min(UPPER_HZN_LIMIT, na.rm=TRUE))
  z.shift.cufs <- join(cufs.HOR["LOCATION_ID"], z.min.cufs, type="left")$aggregated
  ## fixed shift
  z.shift.cufs <- ifelse(z.shift.cufs>0, 0, z.shift.cufs)
  cufs.HOR$hzn_top <- cufs.HOR$UPPER_HZN_LIMIT - z.shift.cufs
  cufs.HOR$hzn_bot <- cufs.HOR$LOWER_HZN_LIMIT - z.shift.cufs
  cufs.SITE = read.csv("/mnt/diskstation/data/Soil_points/Canada/CUFSDB/SITES.csv", stringsAsFactors = FALSE)
  cufs.HOR$longitude_decimal_degrees = plyr::join(cufs.HOR["LOCATION_ID"], cufs.SITE)$LONGITUDE
  cufs.HOR$latitude_decimal_degrees = plyr::join(cufs.HOR["LOCATION_ID"], cufs.SITE)$LATITUDE
  cufs.HOR$site_obsdate = plyr::join(cufs.HOR["LOCATION_ID"], cufs.SITE)$YEAR_SAMPLED
  cufs.HOR$usiteid = plyr::join(cufs.HOR["LOCATION_ID"], cufs.SITE)$RELEASE_SOURCE_SITEID
  #summary(cufs.HOR$ORG_CARB_PCT)
  #hist(cufs.HOR$ORG_CARB_PCT, breaks=45)
  cufs.HOR$oc = cufs.HOR$ORG_CARB_PCT*10
  #cufs.HOR$c_tot = cufs.HOR$oc + ifelse(is.na(cufs.HOR$CARBONATE_CARB_PCT), 0, cufs.HOR$CARBONATE_CARB_PCT*10)
  cufs.HOR$n_tot = cufs.HOR$TOT_NITRO_PCT*10
  cufs.HOR$ca_ext = cufs.HOR$EXCH_Ca * 200
  cufs.HOR$mg_ext = cufs.HOR$EXCH_Mg * 121
  cufs.HOR$na_ext = cufs.HOR$EXCH_Na * 230
  cufs.HOR$k_ext = cufs.HOR$EXCH_K * 391
  cufs.HOR$ph_cacl2 = cufs.HOR$pH
  cufs.HOR$ph_cacl2[!cufs.HOR$pH_H2O_CACL2=="CACL2"] = NA
  cufs.HOR$ph_h2o = cufs.HOR$pH
  cufs.HOR$ph_h2o[!cufs.HOR$pH_H2O_CACL2=="H2O"] = NA
  #summary(cufs.HOR$CF_VOL_PCT) ## is NA == 0??
  cufs.HOR$wpg2 = ifelse(cufs.HOR$CF_CORR_FACTOR==1, 0, cufs.HOR$CF_VOL_PCT)
  cufs.HOR$oc_d = signif(cufs.HOR$oc / 1000 * cufs.HOR$BULK_DENSITY * 1000 * (100 - ifelse(is.na(cufs.HOR$wpg2), 0, cufs.HOR$wpg2))/100, 3)
  cufs.sel.h = c("LOCATION_ID", "usiteid", "site_obsdate", "longitude_decimal_degrees", "latitude_decimal_degrees", "labsampnum", "HZN_SEQ_NO", "hzn_top", "hzn_bot", "HORIZON", "TEXT_CLASS", "CLAY_PCT", "SILT_PCT", "SAND_PCT", "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "ph_cacl2", "CEC_CALCULATED", "cec_nh4", "ecec", "wpg2", "BULK_DENSITY", "ca_ext", "mg_ext", "na_ext", "k_ext", "ELEC_COND", "ec_12pre")
  x.na = cufs.sel.h[which(!cufs.sel.h %in% names(cufs.HOR))]
  if(length(x.na)>0){ for(i in x.na){ cufs.HOR[,i] = NA } }
  chemsprops.CUFS = cufs.HOR[,cufs.sel.h]
  chemsprops.CUFS$source_db = "Canada_CUFS"
  chemsprops.CUFS$confidence_degree = 1
  chemsprops.CUFS$project_url = "https://cfs.nrcan.gc.ca/publications/centre/nofc"
  chemsprops.CUFS$citation_url = "https://doi.org/10.1002/ecy.2159"
  chemsprops.CUFS <- complete.vars(chemsprops.CUFS, sel=c("oc", "ph_h2o", "CLAY_PCT"))
}
dim(chemsprops.CUFS)
#> [1] 15162    36

5.3.0.20 Permafrost in subarctic Canada

  • Estop-Aragones, C.; Fisher, J.P.; Cooper, M.A.; Thierry, A.; Treharne, R.; Murton, J.B.; Phoenix, G.K.; Charman, D.J.; Williams, M.; Hartley, I.P. (2016). Bulk density, carbon and nitrogen content in soil profiles from permafrost in subarctic Canada. NERC Environmental Information Data Centre. https://doi.org/10.5285/efa2a84b-3505-4221-a7da-12af3cdc1952. Data download URL:
if(!exists("chemsprops.CAPERM")){
  caperm.HOR = vroom::vroom("/mnt/diskstation/data/Soil_points/Canada/NorthCanada/Bulk_density_CandNcontent_profiles_all_sites.csv")
  #measurements::conv_unit("-99 36 15.7", from = "deg_min_sec", to = "dec_deg")
  #caperm.HOR$longitude_decimal_degrees = as.numeric(measurements::conv_unit(paste0("-", gsub('\"W', '', gsub("'", ' ', iconv(caperm.HOR$Coordinates_West, "UTF-8", "UTF-8", sub=' ')), fixed = TRUE)), from = "deg_min_sec", to = "dec_deg"))
  caperm.HOR$longitude_decimal_degrees = as.numeric(measurements::conv_unit(paste0("-", caperm.HOR$Cordinates_West), from = "deg_min_sec", to = "dec_deg"))
  #caperm.HOR$latitude_decimal_degrees = as.numeric(measurements::conv_unit(gsub('\"N', '', gsub('o', '', gsub("'", ' ', iconv(caperm.HOR$Coordinates_North, "UTF-8", "UTF-8", sub=' '))), fixed = TRUE), from = "deg_min_sec", to = "dec_deg"))
  caperm.HOR$latitude_decimal_degrees = as.numeric(measurements::conv_unit(caperm.HOR$Cordinates_North, from = "deg_min_sec", to = "dec_deg"))
  #plot(caperm.HOR[,c("longitude_decimal_degrees","latitude_decimal_degrees")])
  caperm.HOR$site_obsdate = "2013"
  caperm.HOR$site_key = make.unique(caperm.HOR$Soil.core)
  #summary(as.factor(caperm.HOR$Soil_depth_cm))
  caperm.HOR$hzn_top = caperm.HOR$Soil_depth_cm-1
  caperm.HOR$hzn_bot = caperm.HOR$Soil_depth_cm+1
  caperm.HOR$db_od = caperm.HOR$Bulk_density_gdrysoil_cm3wetsoil
  caperm.HOR$oc = caperm.HOR$Ccontent_percentage_on_drymass * 10
  caperm.HOR$n_tot = caperm.HOR$Ncontent_percentage_on_drymass * 10
  caperm.HOR$oc_d = signif(caperm.HOR$oc / 1000 * caperm.HOR$db_od * 1000, 3)
  x.na = col.names[which(!col.names %in% names(caperm.HOR))]
  if(length(x.na)>0){ for(i in x.na){ caperm.HOR[,i] = NA } }
  chemsprops.CAPERM = caperm.HOR[,col.names]
  chemsprops.CAPERM$source_db = "Canada_subarctic"
  chemsprops.CAPERM$confidence_degree = 2
  chemsprops.CAPERM$project_url = "http://arp.arctic.ac.uk/projects/carbon-cycling-linkages-permafrost-systems-cyclops/"
  chemsprops.CAPERM$citation_url = "https://doi.org/10.5285/efa2a84b-3505-4221-a7da-12af3cdc1952"
  chemsprops.CAPERM <- complete.vars(chemsprops.CAPERM, sel=c("oc", "n_tot"))
}
dim(chemsprops.CAPERM)
#> [1] 1180   36

5.3.0.21 SOTER China soil profiles

if(!exists("chemsprops.CNSOTER")){
  sot.sites = read.csv("/mnt/diskstation/data/Soil_points/China/China_SOTERv1/CHINA_SOTERv1_Profile.csv")
  sot.horizons = read.csv("/mnt/diskstation/data/Soil_points/China/China_SOTERv1/CHINA_SOTERv1_Horizon.csv")
  sot.HOR = plyr::join_all(list(sot.sites, sot.horizons), type = "full")
  sot.HOR$oc = sot.HOR$SOCA * 10
  sot.HOR$ca_ext = sot.HOR$EXCA * 200
  sot.HOR$mg_ext = sot.HOR$EXMG * 121
  sot.HOR$na_ext = sot.HOR$EXNA * 230
  sot.HOR$k_ext = sot.HOR$EXCK * 391
  ## upper depth missing needs to be derived manually
  sot.HOR$hzn_top = NA
  sot.HOR$hzn_top[2:nrow(sot.HOR)] <- sot.HOR$HBDE[1:(nrow(sot.HOR)-1)]
  sot.HOR$hzn_top <- ifelse(sot.HOR$hzn_top > sot.HOR$HBDE, 0, sot.HOR$hzn_top)
  sot.HOR$hzn_top <- ifelse(sot.HOR$HONU==1 & is.na(sot.HOR$hzn_top), 0, sot.HOR$hzn_top)
  sot.HOR$oc_d = signif(sot.HOR$oc / 1000 * sot.HOR$BULK * 1000 * (100 - ifelse(is.na(sot.HOR$SDVC), 0, sot.HOR$SDVC))/100, 3)
  sot.sel.h = c("PRID", "PDID", "SAYR", "LNGI", "LATI", "labsampnum", "HONU", "hzn_top","HBDE","HODE", "PSCL", "CLPC", "STPC", "SDTO", "oc", "oc_d", "TOTC", "TOTN", "PHKC", "PHAQ", "ph_cacl2", "CECS", "cec_nh4", "ecec", "SDVC", "BULK", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = sot.sel.h[which(!sot.sel.h %in% names(sot.HOR))]
  if(length(x.na)>0){ for(i in x.na){ sot.HOR[,i] = NA } }
  chemsprops.CNSOT = sot.HOR[,sot.sel.h]
  chemsprops.CNSOT$source_db = "China_SOTER"
  chemsprops.CNSOT$confidence_degree = 8
  chemsprops.CNSOT$project_url = "https://www.isric.org/explore/soter"
  chemsprops.CNSOT$citation_url = "https://isric.org/sites/default/files/isric_report_2008_06.pdf"
  chemsprops.CNSOT <- complete.vars(chemsprops.CNSOT, sel=c("TOTC", "PHAQ", "CLPC"), coords = c("LNGI", "LATI"))
}
#> Joining by: PRID, INFR
dim(chemsprops.CNSOT)
#> [1] 5105   36

5.3.0.22 SISLAC

if(!exists("chemsprops.SISLAC")){
  sis.hor = read.csv("/mnt/diskstation/data/Soil_points/SA/SISLAC/sislac_profiles_es.csv", stringsAsFactors = FALSE)
  #str(sis.hor)
  ## SOC for Urugvay do not match the original soil profile data (see e.g. http://www.mgap.gub.uy/sites/default/files/multimedia/skmbt_c45111090914030.pdf)
  ## compare with:
  #sis.hor[sis.hor$perfil_id=="23861",]
  ## Subset to SISINTA/WOSIS points:
  cor.sel = c(grep("WoSIS", paste(sis.hor$perfil_numero)), grep("SISINTA", paste(sis.hor$perfil_numero)))
  #length(cor.sel)
  sis.hor = sis.hor[cor.sel,]
  #summary(sis.hor$analitico_carbono_organico_c)
  sis.hor$oc = sis.hor$analitico_carbono_organico_c * 10
  sis.hor$oc_d = signif(sis.hor$oc / 1000 * sis.hor$analitico_densidad_aparente * 1000 * (100 - ifelse(is.na(sis.hor$analitico_gravas), 0, sis.hor$analitico_gravas))/100, 3)
  #summary(sis.hor$analitico_base_k)
  #summary(as.factor(sis.hor$perfil_fecha))
  sis.sel.h = c("perfil_id", "perfil_numero", "perfil_fecha", "perfil_ubicacion_longitud", "perfil_ubicacion_latitud", "id", "layer_sequence", "profundidad_superior", "profundidad_inferior", "hzn_desgn", "tex_psda", "analitico_arcilla", "analitico_limo_2_50", "analitico_arena_total", "oc", "oc_d", "c_tot", "n_tot", "analitico_ph_kcl", "analitico_ph_h2o", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "analitico_gravas", "analitico_densidad_aparente", "ca_ext", "mg_ext", "na_ext", "k_ext", "analitico_conductividad", "ec_12pre")
  x.na = sis.sel.h[which(!sis.sel.h %in% names(sis.hor))]
  if(length(x.na)>0){ for(i in x.na){ sis.hor[,i] = NA } }
  chemsprops.SISLAC = sis.hor[,sis.sel.h]
  chemsprops.SISLAC$source_db = "SISLAC"
  chemsprops.SISLAC$confidence_degree = 4
  chemsprops.SISLAC$project_url = "http://54.229.242.119/sislac/es"
  chemsprops.SISLAC$citation_url = "https://hdl.handle.net/10568/49611"
  chemsprops.SISLAC <- complete.vars(chemsprops.SISLAC, sel=c("oc","analitico_ph_kcl","analitico_arcilla"), coords = c("perfil_ubicacion_longitud", "perfil_ubicacion_latitud"))
}
dim(chemsprops.SISLAC)
#> [1] 49994    36

5.3.0.23 FEBR

  • Samuel-Rosa, A., Dalmolin, R. S. D., Moura-Bueno, J. M., Teixeira, W. G., & Alba, J. M. F. (2020). Open legacy soil survey data in Brazil: geospatial data quality and how to improve it. Scientia Agricola, 77(1). https://doi.org/10.1590/1678-992x-2017-0430
  • Free Brazilian Repository for Open Soil Data – febr. Data download URL: http://www.ufsm.br/febr/
if(!exists("chemsprops.FEBR")){
  #library(febr)
  ## download up-to-date copy of data
  #febr.lab = febr::layer(dataset = "all", variable="all")
  #febr.lab = febr::observation(dataset = "all")
  febr.hor = read.csv("/mnt/diskstation/data/Soil_points/Brasil/FEBR/febr-superconjunto.csv", stringsAsFactors = FALSE, dec = ",", sep = ";")
  #head(febr.hor)
  #summary(febr.hor$carbono)
  #summary(febr.hor$ph)
  #summary(febr.hor$dsi) ## bulk density of total soil
  febr.hor$clay_tot_psa = febr.hor$argila /10
  febr.hor$sand_tot_psa = febr.hor$areia /10
  febr.hor$silt_tot_psa = febr.hor$silte /10
  febr.hor$wpg2 = (1000-febr.hor$terrafina)/10
  febr.hor$oc_d = signif(febr.hor$carbono / 1000 * febr.hor$dsi * 1000 * (100 - ifelse(is.na(febr.hor$wpg2), 0, febr.hor$wpg2))/100, 3)
  febr.sel.h <- c("observacao_id", "usiteid", "observacao_data", "coord_x", "coord_y", "sisb_id", "camada_id", "profund_sup", "profund_inf", "camada_nome", "tex_psda", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", "carbono", "oc_d", "c_tot", "nitrogenio", "ph_kcl", "ph", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "wpg2", "dsi", "ca_ext", "mg_ext", "na_ext", "k_ext", "ce", "ec_12pre")
  x.na = febr.sel.h[which(!febr.sel.h %in% names(febr.hor))]
  if(length(x.na)>0){ for(i in x.na){ febr.hor[,i] = NA } }
  chemsprops.FEBR = febr.hor[,febr.sel.h]
  chemsprops.FEBR$source_db = "FEBR"
  chemsprops.FEBR$confidence_degree = 4
  chemsprops.FEBR$project_url = "http://www.ufsm.br/febr/"
  chemsprops.FEBR$citation_url = "https://doi.org/10.1590/1678-992x-2017-0430"
  chemsprops.FEBR <- complete.vars(chemsprops.FEBR, sel=c("carbono","ph","clay_tot_psa","dsi"), coords = c("coord_x", "coord_y"))
}
dim(chemsprops.FEBR)
#> [1] 7842   36

5.3.0.24 PRONASOLOS

if(!exists("chemsprops.PRONASOLOS")){
  pronas.hor = as.data.frame(sf::read_sf("/mnt/diskstation/data/Soil_points/Brasil/Pronasolos/Perfis_PronaSolos_20201202v2.shp"))
  ## 34,464 rows
  #head(pronas.hor)
  #summary(as.numeric(pronas.hor$carbono_or))
  #summary(as.numeric(pronas.hor$densidade_))
  #summary(as.numeric(pronas.hor$argila))
  #summary(as.numeric(pronas.hor$cascalho))
  #summary(as.numeric(pronas.hor$ph_h2o))
  #summary(as.numeric(pronas.hor$complexo_2))
  ## A lot of errors / typos e.g. very high values and 0 values!!
  #pronas.hor$data_colet[1:50]
  pronas.in.name = c("sigla", "codigo_pon", "data_colet", "gcs_latitu", "gcs_longit", "simbolo_ho", "profundida", 
                     "profundi_1", "cascalho", "areia_tota", "silte", "argila", "densidade_", "ph_h2o", "ph_kcl", 
                     "complexo_s", "complexo_1", "complexo_2", "complexo_3", "valor_s", "carbono_or", "nitrogenio",
                     "condutivid", "classe_tex")
  #pronas.in.name[which(!pronas.in.name %in% names(pronas.hor))]
  pronas.x = as.data.frame(pronas.hor[,pronas.in.name])
  pronas.out.name = c("site_key", "usiteid", "site_obsdate", "latitude_decimal_degrees", "longitude_decimal_degrees",
                      "hzn_desgn", "hzn_bot", "hzn_top", "wpg2", "sand_tot_psa", "silt_tot_psa", 
                      "clay_tot_psa", "db_od", "ph_h2o", "ph_kcl", "ca_ext",
                      "mg_ext", "k_ext", "na_ext", "cec_sum", "oc", "n_tot", "ec_satp", "tex_psda")
  ## translate values
  pronas.fun.lst = as.list(rep("as.numeric(x)*1", length(pronas.in.name)))
  pronas.fun.lst[[which(pronas.in.name=="sigla")]] = "paste(x)"
  pronas.fun.lst[[which(pronas.in.name=="codigo_pon")]] = "paste(x)"
  pronas.fun.lst[[which(pronas.in.name=="data_colet")]] = "paste(x)"
  pronas.fun.lst[[which(pronas.in.name=="simbolo_ho")]] = "paste(x)"
  pronas.fun.lst[[which(pronas.in.name=="classe_tex")]] = "paste(x)"
  pronas.fun.lst[[which(pronas.in.name=="complexo_s")]] = "as.numeric(x)*200"
  pronas.fun.lst[[which(pronas.in.name=="complexo_1")]] = "as.numeric(x)*121"
  pronas.fun.lst[[which(pronas.in.name=="complexo_2")]] = "as.numeric(x)*391"
  pronas.fun.lst[[which(pronas.in.name=="complexo_3")]] = "as.numeric(x)*230"
  pronas.fun.lst[[which(pronas.in.name=="areia_tota")]] = "round(as.numeric(x)/10, 1)"
  pronas.fun.lst[[which(pronas.in.name=="silte")]] = "round(as.numeric(x)/10, 1)"
  pronas.fun.lst[[which(pronas.in.name=="argila")]] = "round(as.numeric(x)/10, 1)"
  ## save translation rules:
  write.csv(data.frame(pronas.in.name, pronas.out.name, unlist(pronas.fun.lst)), "pronas_soilab_transvalues.csv")
  pronas.soil = transvalues(pronas.x, pronas.out.name, pronas.in.name, pronas.fun.lst)
  pronas.soil$oc_d = signif(pronas.soil$oc / 1000 * pronas.soil$db_od * 1000 * (100 - ifelse(is.na(pronas.soil$wpg2), 0, pronas.soil$wpg2))/100, 3)
  x.na = col.names[which(!col.names %in% names(pronas.soil))]
  if(length(x.na)>0){ for(i in x.na){ pronas.soil[,i] = NA } }
  chemsprops.PRONASOLOS = pronas.soil[,col.names]
  chemsprops.PRONASOLOS$source_db = "PRONASOLOS"
  chemsprops.PRONASOLOS$confidence_degree = 2
  chemsprops.PRONASOLOS$project_url = "https://geoportal.cprm.gov.br/pronasolos/"
  chemsprops.PRONASOLOS$citation_url = "https://www.infoteca.cnptia.embrapa.br/infoteca/handle/doc/1135056"
  chemsprops.PRONASOLOS <- complete.vars(chemsprops.PRONASOLOS, sel=c("oc","ph_h2o","clay_tot_psa"))
}
dim(chemsprops.PRONASOLOS)
#> [1] 31747    36

5.3.0.25 Soil Profile DB for Costa Rica

if(!exists("chemsprops.CostaRica")){
  cr.hor = read.csv("/mnt/diskstation/data/Soil_points/Costa_Rica/Base_de_datos_version_1.2.3.csv", stringsAsFactors = FALSE)
  #plot(cr.hor[,c("X","Y")], pch="+", asp=1)
  cr.hor$usiteid = paste(cr.hor$Provincia, cr.hor$Cantón, cr.hor$Id, sep="_")
  #summary(cr.hor$Corg.)
  cr.hor$oc = cr.hor$Corg. * 10
  cr.hor$Densidad.Aparente = as.numeric(paste0(cr.hor$Densidad.Aparente))
  #summary(cr.hor$K)
  cr.hor$ca_ext = cr.hor$Ca * 200
  cr.hor$mg_ext = cr.hor$Mg * 121
  #cr.hor$na_ext = cr.hor$Na * 230
  cr.hor$k_ext = cr.hor$K * 391
  cr.hor$wpg2 = NA
  cr.hor$oc_d = signif(cr.hor$oc / 1000 * cr.hor$Densidad.Aparente * 1000 * (100 - ifelse(is.na(cr.hor$wpg2), 0, cr.hor$wpg2))/100, 3)
  cr.sel.h = c("Id", "usiteid", "Fecha", "X", "Y", "labsampnum", "horizonte", "prof_inicio", "prof_final", "id_hz", "Clase.Textural", "ARCILLA", "LIMO", "ARENA", "oc", "oc_d", "c_tot", "n_tot", "pHKCl", "pH_H2O", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "wpg2", "Densidad.Aparente", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = cr.sel.h[which(!cr.sel.h %in% names(cr.hor))]
  if(length(x.na)>0){ for(i in x.na){ cr.hor[,i] = NA } }
  chemsprops.CostaRica = cr.hor[,cr.sel.h]
  chemsprops.CostaRica$source_db = "CostaRica"
  chemsprops.CostaRica$confidence_degree = 4
  chemsprops.CostaRica$project_url = "http://www.cia.ucr.ac.cr"
  chemsprops.CostaRica$citation_url = "https://doi.org/10.18845/tm.v32i7.42"
  chemsprops.CostaRica <- complete.vars(chemsprops.CostaRica, sel=c("oc","pH_H2O","ARCILLA","Densidad.Aparente"), coords = c("X", "Y"))
}
dim(chemsprops.CostaRica)
#> [1] 2042   36

5.3.0.26 Iran soil profile DB

if(!exists("chemsprops.IRANSPDB")){
  na.s = c("?","","?.","??", -2147483647, -1.00e+308, "<NA>")
  iran.hor = read.csv("/mnt/diskstation/data/Soil_points/Iran/iran_sdbana.txt", stringsAsFactors = FALSE, na.strings = na.s, header = FALSE)[,1:12]
  names(iran.hor) = c("site_key", "hzn_desgn", "hzn_top", "hzn_bot", "ph_h2o", "ec_satp", "oc", "CACO", "PBS", "sand_tot_psa", "silt_tot_psa", "clay_tot_psa")
  iran.hor$hzn_top = ifelse(is.na(iran.hor$hzn_top) & iran.hor$hzn_desgn=="A", 0, iran.hor$hzn_top)
  iran.hor2 = read.csv("/mnt/diskstation/data/Soil_points/Iran/iran_sdbhor.txt", stringsAsFactors = FALSE, na.strings = na.s, header = FALSE)[,1:8]
  names(iran.hor2) = c("site_key", "layer_sequence", "DESI", "hzn_top", "hzn_bot", "M_colour", "tex_psda", "hzn_desgn")
  iran.site = read.csv("/mnt/diskstation/data/Soil_points/Iran/iran_sgdb.txt", stringsAsFactors = FALSE, na.strings = na.s, header = FALSE)
  names(iran.site) = c("usiteid", "latitude_decimal_degrees", "longitude_decimal_degrees", "FAO", "Tax", "site_key")
  iran.db = plyr::join_all(list(iran.site, iran.hor, iran.hor2))
  iran.db$oc = iran.db$oc * 10
  #summary(iran.db$oc)
  x.na = col.names[which(!col.names %in% names(iran.db))]
  if(length(x.na)>0){ for(i in x.na){ iran.db[,i] = NA } }
  chemsprops.IRANSPDB = iran.db[,col.names]
  chemsprops.IRANSPDB$source_db = "Iran_SPDB"
  chemsprops.IRANSPDB$confidence_degree = 4
  chemsprops.IRANSPDB$project_url = ""
  chemsprops.IRANSPDB$citation_url = "https://doi.org/10.1016/j.geoderma.2007.04.022"
  chemsprops.IRANSPDB <- complete.vars(chemsprops.IRANSPDB, sel=c("oc","ph_h2o","clay_tot_psa"))
}
dim(chemsprops.IRANSPDB)
#> [1] 4759   36

5.3.0.27 Northern circumpolar permafrost soil profiles

if(!exists("chemsprops.NCSCD")){
  ncscd.hors <- read.csv("/mnt/diskstation/data/Soil_points/INT/NCSCD/Harden_etal_2012_Hugelius_etal_2013_cleaned_data.csv", stringsAsFactors = FALSE)
  ncscd.hors$oc = as.numeric(ncscd.hors$X.C)*10
  #summary(ncscd.hors$oc)
  #hist(ncscd.hors$Layer.thickness.cm, breaks = 45)
  ncscd.hors$Layer.thickness.cm = ifelse(ncscd.hors$Layer.thickness.cm<0, NA, ncscd.hors$Layer.thickness.cm)
  ncscd.hors$hzn_bot = ncscd.hors$Basal.Depth.cm + ncscd.hors$Layer.thickness.cm
  ncscd.hors$db_od = as.numeric(ncscd.hors$bulk.density.g.cm.3)
  ## Can we assume no coarse fragments?
  ncscd.hors$wpg2 = 0
  ncscd.hors$oc_d = signif(ncscd.hors$oc / 1000 * ncscd.hors$db_od * 1000 * (100 - ifelse(is.na(ncscd.hors$wpg2), 0, ncscd.hors$wpg2))/100, 3)
  ## very high values >40 kg/m3
  ncscd.hors$site_obsdate = format(as.Date(ncscd.hors$Sample.date, format="%d-%m-%Y"), "%Y-%m-%d")
  #summary(ncscd.hors$db_od)
  ncscd.col = c("Profile.ID", "citation", "site_obsdate", "Long", "Lat", "labsampnum", "layer_sequence", "Basal.Depth.cm", "hzn_bot", "Horizon.type", "tex_psda", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "wpg2", "db_od", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = ncscd.col[which(!ncscd.col %in% names(ncscd.hors))]
  if(length(x.na)>0){ for(i in x.na){ ncscd.hors[,i] = NA } }
  chemsprops.NCSCD = ncscd.hors[,ncscd.col]
  chemsprops.NCSCD$source_db = "NCSCD"
  chemsprops.NCSCD$confidence_degree = 10
  chemsprops.NCSCD$project_url = "https://bolin.su.se/data/ncscd/"
  chemsprops.NCSCD$citation_url = "https://doi.org/10.5194/essd-5-393-2013"
  chemsprops.NCSCD = complete.vars(chemsprops.NCSCD, sel = c("oc","db_od"), coords = c("Long", "Lat"))  
}
dim(chemsprops.NCSCD)
#> [1] 7104   36

5.3.0.28 CSIRO National Soil Site Database

if(!exists("chemsprops.NatSoil")){
  library(Hmisc) 
  cmdb <- mdb.get("/mnt/diskstation/data/Soil_points/Australia/CSIRO/NatSoil_v2_20200612.mdb")
  #str(cmdb$SITES)
  au.obs = cmdb$OBSERVATIONS[,c("s.id", "o.location.notes", "o.date.desc", "o.latitude.GDA94", "o.longitude.GDA94")]
  au.obs = au.obs[!is.na(au.obs$o.longitude.GDA94),]
  coordinates(au.obs) <- ~o.longitude.GDA94+o.latitude.GDA94
  proj4string(au.obs) <- CRS("+proj=longlat +ellps=GRS80 +no_defs")
  au.xy <- data.frame(spTransform(au.obs, CRS("+proj=longlat +ellps=WGS84 +datum=WGS84")))
  #plot(au.xy[,c("o.longitude.GDA94", "o.latitude.GDA94")])
  ## all variables in one column and need to be sorted based on the lab method
  #summary(cmdb$LAB_METHODS$LABM.SHORT.NAME)
  #write.csv(cmdb$LAB_METHODS, "/mnt/diskstation/data/Soil_points/Australia/CSIRO/NatSoil_v2_20200612_lab_methods.csv")
  lab.tbl = list(
          c("6_DC", "6A1", "6A1_UC", "6B1", "6B2", "6B2a", "6B2b", "6B3", "6B4", "6B4a", "6B4b", "6Z"), # %
          c("6B3a"), # g/kg
          c("6H4", "6H4_SCaRP"), # %
          c("7_C_B", "7_NR", "7A1", "7A2", "7A2a", "7A2b", "7A3", "7A4", "7A5", "7A6", "7A6a", "7A6b", "7A6b_MCLW"),  # g/kg
          c("4A1", "4_NR", "4A_C_2.5", "4A_C_1", "4G1"),
          c("4C_C_1", "4C1", "4C2", "23A"),
          c("4B_C_2.5", "4B1", "4B2"),
          c("P10_NR_C", "P10_HYD_C", "P10_PB_C", "P10_PB1_C", "P10_CF_C", "P10_I_C"),
          c("P10_NR_Z", "P10_HYD_Z", "P10_PB_Z", "P10_PB1_Z", "P10_CF_Z", "P10_I_Z"),
          c("P10_NR_S", "P10_HYD_S", "P10_PB_S", "P10_PB1_S", "P10_CF_S", "P10_I_S"),
          c("15C1modCEC", "15_HSK_CEC", "15J_CEC"),
          c("15I1", "15I2", "15I3", "15I4", "15D3_CEC"),
          c("15_BASES", "15_NR", "15J_H", "15J1"),
          c("2Z2_Grav", "P10_GRAV"),
          c("503.08a", "P3A_NR", "P3A1", "P3A1_C4", "P3A1_CLOD", "P3A1_e"),
          c("18F1_CA"),
          c("18F1_MG"),
          c("18F1_NA"),
          c("18F1_K", "18F2", "18A1mod", "18_NR", "18A1", "18A1_NR", "18B1", "18B2"),
          c("3_C_B", "3_NR", "3A_TSS"),
          c("3A_C_2.5", "3A1")
          )
  names(lab.tbl) = c("oc", "ocP", "c_tot", "n_tot", "ph_h2o", "ph_kcl", "ph_cacl2", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", "cec_sum", "cec_nh4", "ecec", "wpg2", "db_od", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  val.lst = lapply(1:length(lab.tbl), function(i){x <- cmdb$LAB_RESULTS[cmdb$LAB_RESULTS$labm.code %in% lab.tbl[[i]], c("agency.code", "proj.code", "s.id", "o.id", "h.no", "labr.value")]; names(x)[6] <- names(lab.tbl)[i]; return(x) })
  names(val.lst) = names(lab.tbl)
  val.lst$oc$oc = val.lst$oc$oc * 10
  names(val.lst$ocP)[6] = "oc"
  val.lst$oc <- rbind(val.lst$oc, val.lst$ocP)
  val.lst$ocP = NULL
  #summary(val.lst$oc$oc)
  #str(val.lst, max.level = 1)
  for(i in 1:length(val.lst)){ val.lst[[i]]$h.id <- paste(val.lst[[i]]$agency.code, val.lst[[i]]$proj.code, val.lst[[i]]$s.id, val.lst[[i]]$o.id, val.lst[[i]]$h.no, sep="_") }
  au.hor <- plyr::join_all(lapply(val.lst, function(x){x[,6:7]}), match="first")
  #str(as.factor(au.hor$h.id))
  cmdb$HORIZONS$h.id = paste(cmdb$HORIZONS$agency.code, cmdb$HORIZONS$proj.code, cmdb$HORIZONS$s.id, cmdb$HORIZONS$o.id, cmdb$HORIZONS$h.no, sep="_")
  cmdb$HORIZONS$hzn_desgn = paste(cmdb$HORIZONS$h.desig.master, cmdb$HORIZONS$h.desig.subdiv, cmdb$HORIZONS$h.desig.suffix, sep="")
  au.horT <- plyr::join_all(list(cmdb$HORIZONS[,c("h.id","s.id","h.no","h.texture","hzn_desgn","h.upper.depth","h.lower.depth")], au.hor, au.xy))
  au.horT$site_obsdate = format(as.Date(au.horT$o.date.desc, format="%d%m%Y"), "%Y-%m-%d")
  au.horT$sand_tot_psa = ifelse(is.na(au.horT$sand_tot_psa), 100-(au.horT$clay_tot_psa + au.horT$silt_tot_psa), au.horT$sand_tot_psa)
  au.horT$hzn_top = au.horT$h.upper.depth*100
  au.horT$hzn_bot = au.horT$h.lower.depth*100
  au.horT$oc_d = signif(au.horT$oc / 1000 * au.horT$db_od * 1000 * (100 - ifelse(is.na(au.horT$wpg2), 0, au.horT$wpg2))/100, 3)
  au.cols.n = c("s.id", "o.location.notes", "site_obsdate", "o.longitude.GDA94", "o.latitude.GDA94", "h.id", "h.no", "hzn_top", "hzn_bot", "hzn_desgn", "h.texture", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "wpg2", "db_od", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = au.cols.n[which(!au.cols.n %in% names(au.horT))]
  if(length(x.na)>0){ for(i in x.na){ au.horT[,i] = NA } }
  chemsprops.NatSoil = au.horT[,au.cols.n]
  chemsprops.NatSoil$source_db = "CSIRO_NatSoil"
  chemsprops.NatSoil$confidence_degree = 4
  chemsprops.NatSoil$project_url = "https://www.csiro.au/en/Do-business/Services/Enviro/Soil-archive"
  chemsprops.NatSoil$citation_url = "https://doi.org/10.25919/5eeb2a56eac12"
  chemsprops.NatSoil = complete.vars(chemsprops.NatSoil, sel = c("oc","db_od","clay_tot_psa","ph_h2o"), coords = c("o.longitude.GDA94", "o.latitude.GDA94"))  
}
dim(chemsprops.NatSoil)
#> [1] 70791    36

5.3.0.29 NAMSOTER

  • Coetzee, M. E. (2001). NAMSOTER, a SOTER database for Namibia. Agroecological Zoning, 458.
  • Coetzee, M. E. (2009). Chemical characterisation of the soils of East Central Namibia (Doctoral dissertation, Stellenbosch: University of Stellenbosch).
if(!exists("chemsprops.NAMSOTER")){
  nam.profs <- read.csv("/mnt/diskstation/data/Soil_points/Namibia/NAMSOTER/Namibia_all_profiles.csv", na.strings = c("-9999", "999", "9999", "NA"))
  nam.hors <- read.csv("/mnt/diskstation/data/Soil_points/Namibia/NAMSOTER/Namibia_all_horizons.csv", na.strings = c("-9999", "999", "9999", "NA"))
  #summary(nam.hors$TOTN)
  #summary(nam.hors$TOTC)
  nam.hors$hzn_top <- NA
  nam.hors$hzn_top <- ifelse(nam.hors$HONU==1, 0, nam.hors$hzn_top)
  h.lst <- lapply(1:7, function(x){which(nam.hors$HONU==x)})
  for(i in 2:7){
    sel <- match(nam.hors$PRID[h.lst[[i]]], nam.hors$PRID[h.lst[[i-1]]])
    nam.hors$hzn_top[h.lst[[i]]] <- nam.hors$HBDE[h.lst[[i-1]]][sel]
  }
  nam.hors$HBDE <- ifelse(is.na(nam.hors$HBDE), nam.hors$hzn_top+50, nam.hors$HBDE)
  #summary(nam.hors$HBDE)
  namALL = plyr::join(nam.hors, nam.profs, by=c("PRID"))
  namALL$k_ext = namALL$EXCK * 391
  namALL$ca_ext = namALL$EXCA * 200
  namALL$mg_ext = namALL$EXMG * 121
  namALL$na_ext = namALL$EXNA * 230
  #summary(namALL$MINA)
  namALL$BULK <- ifelse(namALL$BULK>2.4, NA, namALL$BULK)
  namALL$wpg2 = ifelse(namALL$MINA=="D", 80, ifelse(namALL$MINA=="A", 60, ifelse(namALL$MINA=="M", 25, ifelse(namALL$MINA=="C", 10, ifelse(namALL$MINA=="V", 1, ifelse(namALL$MINA=="F", 2.5, ifelse(namALL$MINA=="M/A", 40, ifelse(namALL$MINA=="C/M", 15, 0))))))))
  #hist(namALL$wpg2)
  namALL$oc_d = signif(namALL$TOTC / 1000 * namALL$BULK * 1000 * (100 - ifelse(is.na(namALL$wpg2), 0, namALL$wpg2))/100, 3)
  #summary(namALL$oc_d)
  #summary(namALL$PHAQ) ## very high ph
  namALL$site_obsdate = 2000
  nam.col = c("PRID", "SLID", "site_obsdate", "LONG", "LATI", "labsampnum", "HONU", "hzn_top", "HBDE", "HODE", "PSCL", "CLPC", "STPC", "SDTO", "TOTC", "oc_d", "c_tot", "TOTN", "PHKC", "PHAQ", "ph_cacl2", "CECS", "cec_nh4", "ecec", "wpg2", "BULK", "ca_ext", "mg_ext", "na_ext", "k_ext", "ELCO", "ec_12pre")
  x.na = nam.col[which(!nam.col %in% names(namALL))]
  if(length(x.na)>0){ for(i in x.na){ namALL[,i] = NA } }
  chemsprops.NAMSOTER = namALL[,nam.col]
  chemsprops.NAMSOTER$source_db = "NAMSOTER"
  chemsprops.NAMSOTER$confidence_degree = 2
  chemsprops.NAMSOTER$project_url = ""
  chemsprops.NAMSOTER$citation_url = "https://edepot.wur.nl/485173"
  chemsprops.NAMSOTER = complete.vars(chemsprops.NAMSOTER, sel = c("TOTC","CLPC","PHAQ"), coords = c("LONG", "LATI"))  
}
dim(chemsprops.NAMSOTER)
#> [1] 2953   36

5.3.0.30 Worldwide organic soil carbon and nitrogen data

  • Zinke, P. J., Millemann, R. E., & Boden, T. A. (1986). Worldwide organic soil carbon and nitrogen data. Carbon Dioxide Information Center, Environmental Sciences Division, Oak Ridge National Laboratory. Data download URL: https://dx.doi.org/10.3334/CDIAC/lue.ndp018

  • Note: poor spatial location accuracy i.e. <10 km. Bulk density for many points has been estimated not measured. Sampling year has not been but literature indicates: 1965, 1974, 1976, 1978, 1979, 1984. Most of samples come from natural vegetation (undisturbed) areas.

if(!exists("chemsprops.ISCND")){
  ndp.profs <- read.csv("/mnt/diskstation/data/Soil_points/INT/ISCND/ndp018.csv", na.strings = c("-9999", "?", "NA"), stringsAsFactors = FALSE)
  names(ndp.profs) = c("PROFILE", "CODE", "CARBON", "NITROGEN", "LAT", "LONG", "ELEV", "SOURCE", "HOLDRIGE", "OLSON", "PARENT")
  for(j in c("CARBON","NITROGEN","ELEV")){  ndp.profs[,j] <- as.numeric(ndp.profs[,j])   }
  #summary(ndp.profs$CARBON)
  lat.s  <- grep("S", ndp.profs$LAT) # lat.n <- grep("N", ndp.profs$LAT)
  ndp.profs$latitude_decimal_degrees = as.numeric(gsub("[^0-9.-]", "", ndp.profs$LAT))
  ndp.profs$latitude_decimal_degrees[lat.s] = ndp.profs$latitude_decimal_degrees[lat.s] * -1
  lon.w <- grep("W", ndp.profs$LONG) # lon.e  <- grep("E", ndp.profs$LONG, fixed = TRUE)
  ndp.profs$longitude_decimal_degrees = as.numeric(gsub("[^0-9.-]", "", ndp.profs$LONG))
  ndp.profs$longitude_decimal_degrees[lon.w] = ndp.profs$longitude_decimal_degrees[lon.w] * -1
  #plot(ndp.profs[,c("longitude_decimal_degrees", "latitude_decimal_degrees")])
  ndp.profs$hzn_top = 0; ndp.profs$hzn_bot = 100
  ## Sampling years from the doc: 1965, 1974, 1976, 1978, 1979, 1984 
  ndp.profs$site_obsdate = "1982"
  ndp.col = c("PROFILE", "CODE", "site_obsdate", "longitude_decimal_degrees", "latitude_decimal_degrees", "labsampnum", "layer_sequence","hzn_top","hzn_bot","hzn_desgn", "tex_psda", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", "oc", "CARBON", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "wpg2", "db_od", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = ndp.col[which(!ndp.col %in% names(ndp.profs))]
  if(length(x.na)>0){ for(i in x.na){ ndp.profs[,i] = NA } }
  chemsprops.ISCND = ndp.profs[,ndp.col]
  chemsprops.ISCND$source_db = "ISCND"
  chemsprops.ISCND$confidence_degree = 8
  chemsprops.ISCND$project_url = "https://iscn.fluxdata.org/data/"
  chemsprops.ISCND$citation_url = "https://dx.doi.org/10.3334/CDIAC/lue.ndp018"
  chemsprops.ISCND = complete.vars(chemsprops.ISCND, sel = c("CARBON"), coords = c("longitude_decimal_degrees", "latitude_decimal_degrees"))  
}
dim(chemsprops.ISCND)
#> [1] 3977   36

5.3.0.31 Interior Alaska Carbon and Nitrogen stocks

if(!exists("chemsprops.Alaska")){
  al.gps <- read.csv("/mnt/diskstation/data/Soil_points/USA/Alaska_Interior/Site_GPS_coordinates_v1-1.csv", stringsAsFactors = FALSE)
  ## Different datums!
  #summary(as.factor(al.gps$Datum))
  al.gps1 = al.gps[al.gps$Datum=="NAD83",]
  coordinates(al.gps1) = ~ Longitude + Latitude
  proj4string(al.gps1) = "+proj=longlat +datum=NAD83"
  al.gps0 = spTransform(al.gps1, CRS("+proj=longlat +datum=WGS84"))
  al.gps[which(al.gps$Datum=="NAD83"),"Longitude"] = al.gps0@coords[,1]
  al.gps[which(al.gps$Datum=="NAD83"),"Latitude"] = al.gps0@coords[,2]
  al.gps$site = al.gps$Site
  al.hor <- read.csv("/mnt/diskstation/data/Soil_points/USA/Alaska_Interior/Generalized_models_for_CandN_Alaska_v1-1.csv", stringsAsFactors = FALSE)
  al.hor$hzn_top = al.hor$depth - as.numeric(al.hor$thickness)
  al.hor$site_obsdate = format(as.Date(al.hor$date, format = "%m/%d/%Y"), "%Y-%m-%d")
  al.hor$oc = as.numeric(al.hor$carbon) * 10
  al.hor$n_tot = as.numeric(al.hor$nitrogen) * 10
  al.hor$oc_d = as.numeric(al.hor$Cdensity) * 1000
  #summary(al.hor$oc_d)
  al.horA = plyr::join(al.hor, al.gps, by=c("site"))
  al.col = c("profile", "description", "site_obsdate", "Longitude", "Latitude", "sampleID", "layer_sequence", "hzn_top", "depth", "Hcode", "tex_psda", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "wpg2", "BDfine", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = al.col[which(!al.col %in% names(al.horA))]
  if(length(x.na)>0){ for(i in x.na){ al.horA[,i] = NA } }
  chemsprops.Alaska = al.horA[,al.col]
  chemsprops.Alaska$source_db = "Alaska_interior"
  chemsprops.Alaska$confidence_degree = 1
  chemsprops.Alaska$project_url = "https://www.usgs.gov/centers/gmeg"
  chemsprops.Alaska$citation_url = "https://doi.org/10.5194/essd-12-1745-2020"
  chemsprops.Alaska = complete.vars(chemsprops.Alaska, sel = c("oc","oc_d"), coords = c("Longitude", "Latitude"))  
}
dim(chemsprops.Alaska)
#> [1] 3882   36

5.3.0.32 Croatian Soil Pedon data

  • Martinović J., (2000) “Tla u Hrvatskoj”, Monografija, Državna uprava za zaštitu prirode i okoliša, str. 269, Zagreb. ISBN: 9536793059
  • Bašić F., (2014) “The Soils of Croatia”. World Soils Book Series, Springer Science & Business Media, 179 pp. ISBN: 9400758154
if(!exists("chemsprops.bpht")){
  bpht.site <- read.csv("/mnt/diskstation/data/Soil_points/Croatia/WBSoilHR_sites_1997.csv", stringsAsFactors = FALSE)
  bpht.hors <- read.csv("/mnt/diskstation/data/Soil_points/Croatia/WBSoilHR_1997.csv", stringsAsFactors = FALSE)
  ## filter typos
  for(j in c("GOR", "DON", "MKP", "PH1", "PH2", "MSP", "MP", "MG", "HUM", "EXTN", "EXTP", "EXTK", "CAR")){
    bpht.hors[,j] = as.numeric(bpht.hors[,j])
  }
  ## Convert to the USDA standard
  bpht.hors$sand_tot_psa <- bpht.hors$MSP * 0.8 + bpht.hors$MKP
  bpht.hors$silt_tot_psa <- bpht.hors$MP + bpht.hors$MSP * 0.2
  bpht.hors$oc <- signif(bpht.hors$HUM/1.724 * 10, 3)
  ## summary(bpht.hors$sand_tot_psa)
  bpht.s.lst <- c("site_key", "UZORAK", "Cro16.30_X", "Cro16.30_Y", "FITOC", "STIJENA", "HID_DREN", "DUBINA")
  bpht.hor = plyr::join(bpht.site[,bpht.s.lst], bpht.hors)
  bpht.hor$wpg2 = bpht.hor$STIJENA
  bpht.hor$DON <- ifelse(is.na(bpht.hor$DON), bpht.hor$GOR+50, bpht.hor$DON)
  bpht.hor$depth <- bpht.hor$GOR + (bpht.hor$DON - bpht.hor$GOR)/2
  bpht.hor = bpht.hor[!is.na(bpht.hor$depth),]
  bpht.hor$wpg2[which(bpht.hor$GOR<30)] <- bpht.hor$wpg2[which(bpht.hor$GOR<30)]*.3
  bpht.hor$sample_key = make.unique(paste(bpht.hor$PEDOL_ID, bpht.hor$OZN, sep="_"))
  bpht.hor$sand_tot_psa[bpht.hor$sample_key=="805_Amo"] <- bpht.hor$sand_tot_psa[bpht.hor$sample_key=="805_Amo"]/10
  ## convert N, P, K
  #summary(bpht.hor$EXTK) -- measurements units?
  bpht.hor$p_ext = bpht.hor$EXTP * 4.364
  bpht.hor$k_ext = bpht.hor$EXTK * 8.3013
  bpht.hor = bpht.hor[!is.na(bpht.hor$`Cro16.30_X`),]
  ## coordinates:
  bpht.pnts = SpatialPointsDataFrame(bpht.hor[,c("Cro16.30_X","Cro16.30_Y")], bpht.hor["site_key"], proj4string = CRS("+proj=tmerc +lat_0=0 +lon_0=16.5 +k=0.9999 +x_0=2500000 +y_0=0 +ellps=bessel +towgs84=550.499,164.116,475.142,5.80967,2.07902,-11.62386,0.99999445824 +units=m"))
  bpht.pnts.ll <- spTransform(bpht.pnts, CRS("+proj=longlat +datum=WGS84"))
  bpht.hor$longitude_decimal_degrees = bpht.pnts.ll@coords[,1]
  bpht.hor$latitude_decimal_degrees = bpht.pnts.ll@coords[,2] 
  bpht.h.lst <- c('site_key', 'OZ_LIST_PROF', 'UZORAK', 'longitude_decimal_degrees', 'latitude_decimal_degrees', 'labsampnum', 'layer_sequence', 'GOR', 'DON', 'OZN', 'TT', 'MG', 'silt_tot_psa', 'sand_tot_psa', 'oc', 'oc_d', 'c_tot', 'EXTN', 'PH2', 'PH1', 'ph_cacl2', 'cec_sum', 'cec_nh4', 'ecec', 'wpg2', 'db_od', 'ca_ext', 'mg_ext', 'na_ext', 'k_ext', 'ec_satp', 'ec_12pre')
  x.na = bpht.h.lst[which(!bpht.h.lst %in% names(bpht.hor))]
  if(length(x.na)>0){ for(i in x.na){ bpht.hor[,i] = NA } }
  chemsprops.bpht = bpht.hor[,bpht.h.lst]
  chemsprops.bpht$source_db = "Croatian_Soil_Pedon"
  chemsprops.bpht$confidence_degree = 1
  chemsprops.bpht$project_url = "http://www.haop.hr/"
  chemsprops.bpht$citation_url = "https://books.google.nl/books?id=k_a2MgAACAAJ"
  chemsprops.bpht = complete.vars(chemsprops.bpht, sel = c("oc","MG","PH1","k_ext"))
}
dim(chemsprops.bpht)
#> [1] 5746   36

5.3.0.33 Remnant native SOC database

if(!exists("chemsprops.RemnantSOC")){
  rem.hor <- openxlsx::read.xlsx("/mnt/diskstation/data/Soil_points/INT/WHRC_remnant_SOC/remnant+native+SOC+database+for+release.xlsx", sheet = 3)
  rem.site <- openxlsx::read.xlsx("/mnt/diskstation/data/Soil_points/INT/WHRC_remnant_SOC/remnant+native+SOC+database+for+release.xlsx", sheet = 2)
  rem.ref <- openxlsx::read.xlsx("/mnt/diskstation/data/Soil_points/INT/WHRC_remnant_SOC/remnant+native+SOC+database+for+release.xlsx", sheet = 4)
  rem.site = plyr::join(rem.site, rem.ref[,c("Source.No.","DOI","Sample_year")], by=c("Source.No."))
  rem.site$Site = rem.site$Site.ID
  rem.horA = plyr::join(rem.hor, rem.site, by=c("Site"))
  rem.horA$hzn_top = rem.horA$'U_depth.(m)'*100
  rem.horA$hzn_bot = rem.horA$'L_depth.(m)'*100
  rem.horA$db_od = ifelse(is.na(as.numeric(rem.horA$'measured.BD.(Mg/m3)')), as.numeric(rem.horA$'estimated.BD.(Mg/m3)'), as.numeric(rem.horA$'measured.BD.(Mg/m3)'))
  rem.horA$oc_d = signif(rem.horA$'OC.(g/kg)' * rem.horA$db_od, 3)
  #summary(rem.horA$oc_d)
  rem.col = c("Source.No.", "Site", "Sample_year", "Longitude", "Latitude", "labsampnum", "layer_sequence", "hzn_top", "hzn_bot", "hzn_desgn", "tex_psda", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", "OC.(g/kg)", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "wpg2", "db_od", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = rem.col[which(!rem.col %in% names(rem.horA))]
  if(length(x.na)>0){ for(i in x.na){ rem.horA[,i] = NA } }
  chemsprops.RemnantSOC = rem.horA[,rem.col]
  chemsprops.RemnantSOC$source_db = "WHRC_remnant_SOC"
  chemsprops.RemnantSOC$confidence_degree = 8
  chemsprops.RemnantSOC$project_url = "https://www.woodwellclimate.org/research-area/carbon/"
  chemsprops.RemnantSOC$citation_url = "http://dx.doi.org/10.1073/pnas.1706103114"
  chemsprops.RemnantSOC = complete.vars(chemsprops.RemnantSOC, sel = c("OC.(g/kg)","oc_d"), coords = c("Longitude", "Latitude"))  
}
dim(chemsprops.RemnantSOC)
#> [1] 1604   36

5.3.0.34 Soil Health DB

Note: some information is available about column names (https://www.nature.com/articles/s41597-020-0356-3/tables/3) but detailed explanation is missing.

if(!exists("chemsprops.SoilHealthDB")){
  shdb.hor <- openxlsx::read.xlsx("/mnt/diskstation/data/Soil_points/INT/SoilHealthDB/SoilHealthDB_V2.xlsx", sheet = 1, na.strings = c("NA", "NotAvailable", "Not-available"))
  #summary(as.factor(shdb.hor$SamplingDepth))
  shdb.hor$hzn_top = as.numeric(sapply(shdb.hor$SamplingDepth, function(i){ strsplit(i, "-to-")[[1]][1] }))
  shdb.hor$hzn_bot = as.numeric(sapply(shdb.hor$SamplingDepth, function(i){ strsplit(i, "-to-")[[1]][2] }))
  shdb.hor$hzn_top = ifelse(is.na(shdb.hor$hzn_top), 0, shdb.hor$hzn_top)
  shdb.hor$hzn_bot = ifelse(is.na(shdb.hor$hzn_bot), 15, shdb.hor$hzn_bot)
  shdb.hor$oc = as.numeric(shdb.hor$BackgroundSOC) * 10
  shdb.hor$oc_d = signif(shdb.hor$oc * shdb.hor$SoilBD, 3)
  for(j in c("ClayPerc", "SiltPerc", "SandPerc", "SoilpH")){   shdb.hor[,j]  <- as.numeric(shdb.hor[,j])   }
  #summary(shdb.hor$oc_d)
  shdb.col = c("StudyID", "ExperimentID", "SamplingYear", "Longitude", "Latitude", "labsampnum", "layer_sequence", "hzn_top", "hzn_bot", "hzn_desgn", "Texture", "ClayPerc", "SiltPerc", "SandPerc", "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "SoilpH", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "wpg2", "SoilBD", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = shdb.col[which(!shdb.col %in% names(shdb.hor))]
  if(length(x.na)>0){ for(i in x.na){ shdb.hor[,i] = NA } }
  chemsprops.SoilHealthDB = shdb.hor[,shdb.col]
  chemsprops.SoilHealthDB$source_db = "SoilHealthDB"
  chemsprops.SoilHealthDB$confidence_degree = 8
  chemsprops.SoilHealthDB$project_url = "https://github.com/jinshijian/SoilHealthDB"
  chemsprops.SoilHealthDB$citation_url = "https://doi.org/10.1038/s41597-020-0356-3"
  chemsprops.SoilHealthDB = complete.vars(chemsprops.SoilHealthDB, sel = c("ClayPerc", "SoilpH", "oc"), coords = c("Longitude", "Latitude"))  
}
dim(chemsprops.SoilHealthDB)
#> [1] 120  36

5.3.0.35 Global Harmonized Dataset of SOC change under perennial crops

Note: Many missing years for PREVIOUS SOC AND SOIL CHARACTERISTICS.

if(!exists("chemsprops.SOCPDB")){
  library("readxl")
  socpdb <- readxl::read_excel("/mnt/diskstation/data/Soil_points/INT/SOCPDB/SOC_perennials_DATABASE.xls", skip=1, sheet = 1)
  #names(socpdb)
  #summary(as.numeric(socpdb$year_measure))
  socpdb$year_measure = ifelse(is.na(as.numeric(socpdb$year_measure)), as.numeric(socpdb$yearPpub)-5, as.numeric(socpdb$year_measure))
  socpdb$year_measure = ifelse(socpdb$year_measure<1960, NA, socpdb$year_measure)
  socpdb$depth_current = socpdb$soil_to_cm_current - socpdb$soil_from_cm_current
  socpdb = socpdb[socpdb$depth_current>5,]
  socpdb$SOC_g_kg_current = ifelse(is.na(as.numeric(socpdb$SOC_g_kg_current)), as.numeric(socpdb$SOC_Mg_ha_current) / (socpdb$depth_current/100 * as.numeric(socpdb$bulk_density_Mg_m3_current) * 1000) * 10, as.numeric(socpdb$SOC_g_kg_current))
  socpdb$depth_previous = socpdb$soil_to_cm_previous - socpdb$soil_from_cm_previous
  socpdb$SOC_g_kg_previous = ifelse(is.na(as.numeric(socpdb$SOC_g_kg_previous)), as.numeric(socpdb$SOC_Mg_ha_previous) / (socpdb$depth_previous/100 * as.numeric(socpdb$Bulkdensity_previous) * 1000) * 10, as.numeric(socpdb$SOC_g_kg_previous))
  hor.b = which(names(socpdb) %in% c("ID", "plotID", "Longitud", "Latitud", "year_measure", "years_since_luc", "USDA", "original_source"))
  socpdb1 = socpdb[,c(hor.b, grep("_current", names(socpdb)))]
  #summary(as.numeric(socpdb1$years_since_luc))
  ## 10 yrs median
  socpdb1$site_obsdate = socpdb1$year_measure
  socpdb2 = socpdb[,c(hor.b, grep("_previous", names(socpdb)))]
  socpdb2$site_obsdate = socpdb2$year_measure - ifelse(is.na(as.numeric(socpdb2$years_since_luc)), 10, as.numeric(socpdb2$years_since_luc))
  colnames(socpdb2) <- sub("_previous", "_current", colnames(socpdb2))
  nm.socpdb = c("site_key", "usiteid", "site_obsdate", "longitude_decimal_degrees", "latitude_decimal_degrees", "hzn_top", "hzn_bot", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", "oc", "ph_h2o", "db_od")
  sel.socdpb1 = c("ID", "original_source", "site_obsdate", "Longitud", "Latitud", "soil_from_cm_current", "soil_to_cm_current", "%clay_current", "%silt_current", "%sand_current", "SOC_g_kg_current", "ph_current", "bulk_density_Mg_m3_current")
  sel.socdpb2 = c("ID", "original_source", "site_obsdate", "Longitud", "Latitud", "soil_from_cm_current", "soil_to_cm_current", "%clay_current", "%silt_current", "%sand_current", "SOC_g_kg_current", "ph_current", "Bulkdensity_current")   
  socpdbALL = as.data.frame(dplyr::bind_rows(lapply(list(socpdb1[,sel.socdpb1], socpdb2[,sel.socdpb2]), function(i){ dplyr::mutate_all(setNames(i, nm.socpdb), as.character) })))
  for(j in 1:ncol(socpdbALL)){ socpdbALL[,j] <- as.numeric(socpdbALL[,j]) }
  #summary(socpdbALL$oc) ## mean = 15
  #summary(socpdbALL$db_od)
  #summary(socpdbALL$ph_h2o)
  socpdbALL$oc_d = signif(socpdbALL$oc * socpdbALL$db_od, 3)
  #summary(socpdbALL$oc_d)
  x.na = col.names[which(!col.names %in% names(socpdbALL))]
  if(length(x.na)>0){ for(i in x.na){ socpdbALL[,i] = NA } }
  chemsprops.SOCPDB <- socpdbALL[,col.names]
  chemsprops.SOCPDB$source_db = "SOCPDB"
  chemsprops.SOCPDB$confidence_degree = 5
  chemsprops.SOCPDB$project_url = "https://africap.info/"
  chemsprops.SOCPDB$citation_url = "https://doi.org/10.1038/s41597-019-0062-1"
  chemsprops.SOCPDB = complete.vars(chemsprops.SOCPDB, sel = c("oc","ph_h2o","clay_tot_psa"))
}
dim(chemsprops.SOCPDB)
#> [1] 1526   36

5.3.0.36 Stocks of organic carbon in German agricultural soils (BZE_LW)

Note: For protection of data privacy, the coordinate was randomly generated within a radius of 4-km around the planned sampling point. This data is hence probably not suitable for spatial analysis, predictive soil mapping.

if(!exists("chemsprops.BZE_LW")){
  site.de <- openxlsx::read.xlsx("/mnt/diskstation/data/Soil_points/Germany/SITE.xlsx", sheet = 1)
  site.de$site_obsdate = format(as.Date(paste0("01-", site.de$Sampling_month, "-", site.de$Sampling_year), format="%d-%m-%Y"), "%Y-%m-%d")
  site.de.xy = site.de[,c("PointID","xcoord","ycoord")]
  ## 3104
  coordinates(site.de.xy) <- ~xcoord+ycoord
  proj4string(site.de.xy) <- CRS("+proj=utm +zone=32 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
  site.de.ll <- data.frame(spTransform(site.de.xy, CRS("+proj=longlat +ellps=WGS84 +datum=WGS84")))
  site.de$longitude_decimal_degrees = site.de.ll[,2]
  site.de$latitude_decimal_degrees = site.de.ll[,3]
  hor.de <- openxlsx::read.xlsx("/mnt/diskstation/data/Soil_points/Germany/LABORATORY_DATA.xlsx", sheet = 1)
  #hor.de = plyr::join(openxlsx::read.xlsx("/mnt/diskstation/data/Soil_points/Germany/LABORATORY_DATA.xlsx", sheet = 1), openxlsx::read.xlsx("/mnt/diskstation/data/Soil_points/Germany/HORIZON_DATA.xlsx", sheet = 1), by="PointID")
  ## 17,189 rows
  horALL.de = plyr::join(hor.de, site.de, by="PointID")
  ## Sand content [Mass-%]; grain size 63-2000µm (DIN ISO 11277)
  horALL.de$sand_tot_psa <- horALL.de$gS + horALL.de$mS + horALL.de$fS + 0.2 * horALL.de$gU
  horALL.de$silt_tot_psa <- horALL.de$fU + horALL.de$mU + 0.8 * horALL.de$gU
  ## Convert millisiemens/meter [mS/m] to microsiemens/centimeter [μS/cm, uS/cm]
  horALL.de$ec_satp = horALL.de$EC_H2O / 10
  hor.sel.de <- c("PointID", "Main.soil.type", "site_obsdate", "longitude_decimal_degrees", "latitude_decimal_degrees", "labsampnum", "layer_sequence", "Layer.upper.limit", "Layer.lower.limit", "hzn_desgn", "Soil.texture.class", "Clay", "silt_tot_psa", "sand_tot_psa", "TOC", "oc_d", "TC", "TN", "ph_kcl", "pH_H2O", "pH_CaCl2", "cec_sum", "cec_nh4", "ecec", "Rock.fragment.fraction", "BD_FS", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  #summary(horALL.de$TOC) ## mean = 12.3
  #summary(horALL.de$BD_FS) ## mean = 1.41
  #summary(horALL.de$pH_H2O)
  horALL.de$oc_d = signif(horALL.de$TOC * horALL.de$BD_FS * (1-horALL.de$Rock.fragment.fraction/100), 3)
  #summary(horALL.de$oc_d)
  x.na = hor.sel.de[which(!hor.sel.de %in% names(horALL.de))]
  if(length(x.na)>0){ for(i in x.na){ horALL.de[,i] = NA } }
  chemsprops.BZE_LW <- horALL.de[,hor.sel.de]
  chemsprops.BZE_LW$source_db = "BZE_LW"
  chemsprops.BZE_LW$confidence_degree = 3
  chemsprops.BZE_LW$project_url = "https://www.thuenen.de/de/ak/"
  chemsprops.BZE_LW$citation_url = "https://doi.org/10.1002/jpln.202000113"
  chemsprops.BZE_LW = complete.vars(chemsprops.BZE_LW, sel = c("TOC", "pH_H2O", "Clay"))
}
dim(chemsprops.BZE_LW)
#> [1] 17187    36

5.3.0.37 AARDEWERK-Vlaanderen-2010

if(!exists("chemsprops.Vlaanderen2010")){
  site.vl <- read.csv("/mnt/diskstation/data/Soil_points/Belgium/Vlaanderen/Aardewerk-Vlaanderen-2010_Profiel.csv")
  site.vl$site_obsdate = format(as.Date(sapply(site.vl$Profilering_Datum, function(i){strsplit(i, " ")[[1]][1]}), format="%d-%m-%Y"), "%Y-%m-%d")
  site.vl.xy = site.vl[,c("ID","Coordinaat_Lambert72_X","Coordinaat_Lambert72_Y")]
  ## 7020
  site.vl.xy = site.vl.xy[complete.cases(site.vl.xy),]
  coordinates(site.vl.xy) <- ~Coordinaat_Lambert72_X+Coordinaat_Lambert72_Y
  proj4string(site.vl.xy) <- CRS("+init=epsg:31300")
  site.vl.ll <- data.frame(spTransform(site.vl.xy, CRS("+proj=longlat +ellps=WGS84 +datum=WGS84")))
  site.vl$longitude_decimal_degrees = join(site.vl["ID"], site.vl.ll, by="ID")$Coordinaat_Lambert72_X
  site.vl$latitude_decimal_degrees = join(site.vl["ID"], site.vl.ll, by="ID")$Coordinaat_Lambert72_Y
  site.vl$Profiel_ID = site.vl$ID
  hor.vl <- read.csv("/mnt/diskstation/data/Soil_points/Belgium/Vlaanderen/Aardewerk-Vlaanderen-2010_Horizont.csv")
  ## 42,529 rows
  horALL.vl = plyr::join(hor.vl, site.vl, by="Profiel_ID")
  horALL.vl$oc = horALL.vl$Humus*10 /1.724
  summary(horALL.vl$oc) ## mean = 7.8
  #summary(horALL.vl$pH_H2O)
  horALL.vl$hzn_top <- rowSums(horALL.vl[,c("Diepte_grens_boven1", "Diepte_grens_boven2")], na.rm=TRUE)/2
  horALL.vl$hzn_bot <- rowSums(horALL.vl[,c("Diepte_grens_onder1","Diepte_grens_onder2")], na.rm=TRUE)/2
  horALL.vl$sand_tot_psa <- horALL.vl$T50_100 + horALL.vl$T100_200 + horALL.vl$T200_500 + horALL.vl$T500_1000 + horALL.vl$T1000_2000
  horALL.vl$silt_tot_psa <- horALL.vl$T2_10 + horALL.vl$T10_20 + horALL.vl$T20_50
  horALL.vl$tex_psda = paste0(horALL.vl$HorizontTextuur_code1, horALL.vl$HorizontTextuur_code2)
  ## some corrupt coordinates
  horALL.vl <- horALL.vl[horALL.vl$latitude_decimal_degrees > 50.6,]
  hor.sel.vl <- c("Profiel_ID", "Bodemgroep", "site_obsdate", "longitude_decimal_degrees", "latitude_decimal_degrees", "labsampnum", "Hor_nr", "hzn_top", "hzn_bot", "Naam", "tex_psda", "T0_2", "silt_tot_psa", "sand_tot_psa", "oc", "oc_d", "c_tot", "n_tot", "pH_KCl", "pH_H2O", "ph_cacl2", "Sorptiecapaciteit_Totaal", "cec_nh4", "ecec", "Tgroter_dan_2000", "db_od", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = hor.sel.vl[which(!hor.sel.vl %in% names(horALL.vl))]
  if(length(x.na)>0){ for(i in x.na){ horALL.vl[,i] = NA } }
  chemsprops.Vlaanderen <- horALL.vl[,hor.sel.vl]
  chemsprops.Vlaanderen$source_db = "Vlaanderen"
  chemsprops.Vlaanderen$confidence_degree = 2
  chemsprops.Vlaanderen$project_url = "https://www.dov.vlaanderen.be"
  chemsprops.Vlaanderen$citation_url = "https://doi.org/10.1016/j.geoderma.2015.04.001"
  chemsprops.Vlaanderen = complete.vars(chemsprops.Vlaanderen, sel = c("oc", "pH_H2O", "T0_2"))
}
dim(chemsprops.Vlaanderen)
#> [1] 41310    36

5.3.0.38 Chilean Soil Organic Carbon database

if(!exists("chemsprops.CHLSOC")){
  chl.hor <- read.csv("/mnt/diskstation/data/Soil_points/Chile/CHLSOC/CHLSOC_v1.0.csv", stringsAsFactors = FALSE)
  #summary(chl.hor$oc)
  chl.hor$oc = chl.hor$oc*10
  #summary(chl.hor$bd)
  chl.hor$oc_d = signif(chl.hor$oc  * chl.hor$bd * (100 - ifelse(is.na(chl.hor$crf), 0, chl.hor$crf))/100, 3)
  #summary(chl.hor$oc_d)
  chl.col = c("ProfileID", "usiteid", "year", "long", "lat", "labsampnum", "layer_sequence", "top", "bottom", "hzn_desgn", "tex_psda", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "ph_cacl2", "cec_sum", "cec_nh4", "ecec", "crf", "bd", "ca_ext", "mg_ext", "na_ext", "k_ext", "ec_satp", "ec_12pre")
  x.na = chl.col[which(!chl.col %in% names(chl.hor))]
  if(length(x.na)>0){ for(i in x.na){ chl.hor[,i] = NA } }
  chemsprops.CHLSOC = chl.hor[,chl.col]
  chemsprops.CHLSOC$source_db = "Chilean_SOCDB"
  chemsprops.CHLSOC$confidence_degree = 4
  chemsprops.CHLSOC$project_url = "https://doi.org/10.17605/OSF.IO/NMYS3"
  chemsprops.CHLSOC$citation_url = "https://doi.org/10.5194/essd-12-457-2020"
  chemsprops.CHLSOC = complete.vars(chemsprops.CHLSOC, sel = c("oc", "bd"), coords = c("long", "lat"))
}
dim(chemsprops.CHLSOC)
#> [1] 16371    36

5.3.0.39 Scotland (NSIS_1)

if(!exists("chemsprops.ScotlandNSIS1")){
  sco.xy = read.csv("/mnt/diskstation/data/Soil_points/Scotland/NSIS_10km.csv")
  coordinates(sco.xy) = ~ easting + northing
  proj4string(sco.xy) = "EPSG:27700"
  sco.ll = as.data.frame(spTransform(sco.xy, CRS("EPSG:4326")))
  sco.ll$site_obsdate = as.numeric(sapply(sco.ll$profile_da, function(x){substr(x, nchar(x)-3, nchar(x))}))
  #hist(sco.ll$site_obsdate[sco.ll$site_obsdate>1000])
  ## no points after 1990!!
  #summary(sco.ll$exch_k)
  sco.in.name = c("profile_id", "site_obsdate", "easting", "northing", "horz_top", "horz_botto", 
                  "horz_symb", "sample_id", "texture_ps",
                  "sand_int", "silt_int", "clay", "carbon", "nitrogen", "ph_h2o", "exch_ca", 
                  "exch_mg", "exch_na", "exch_k", "sum_cation")
  #sco.in.name[which(!sco.in.name %in% names(sco.ll))]
  sco.x = as.data.frame(sco.ll[,sco.in.name])
  #sco.x = sco.x[!sco.x$sample_id==0,]
  #summary(sco.x$carbon)
  sco.out.name = c("usiteid", "site_obsdate", "longitude_decimal_degrees", "latitude_decimal_degrees",
                     "hzn_bot", "hzn_top", "hzn_desgn", "labsampnum", "tex_psda", "sand_tot_psa", "silt_tot_psa", 
                      "clay_tot_psa", "oc", "n_tot", "ph_h2o", "ca_ext",
                      "mg_ext", "na_ext", "k_ext", "cec_sum")
  ## translate values
  sco.fun.lst = as.list(rep("as.numeric(x)*1", length(sco.in.name)))
  sco.fun.lst[[which(sco.in.name=="profile_id")]] = "paste(x)"
  sco.fun.lst[[which(sco.in.name=="exch_ca")]] = "as.numeric(x)*200"
  sco.fun.lst[[which(sco.in.name=="exch_mg")]] = "as.numeric(x)*121"
  sco.fun.lst[[which(sco.in.name=="exch_k")]] = "as.numeric(x)*391"
  sco.fun.lst[[which(sco.in.name=="exch_na")]] = "as.numeric(x)*230"
  sco.fun.lst[[which(sco.in.name=="carbon")]] = "as.numeric(x)*10"
  sco.fun.lst[[which(sco.in.name=="nitrogen")]] = "as.numeric(x)*10"
  ## save translation rules:
  write.csv(data.frame(sco.in.name, sco.out.name, unlist(sco.fun.lst)), "scotland_soilab_transvalues.csv")
  sco.soil = transvalues(sco.x, sco.out.name, sco.in.name, sco.fun.lst)
  x.na = col.names[which(!col.names %in% names(sco.soil))]
  if(length(x.na)>0){ for(i in x.na){ sco.soil[,i] = NA } }
  chemsprops.ScotlandNSIS1 = sco.soil[,col.names]
  chemsprops.ScotlandNSIS1$source_db = "ScotlandNSIS1"
  chemsprops.ScotlandNSIS1$confidence_degree = 2
  chemsprops.ScotlandNSIS1$project_url = "http://soils.environment.gov.scot/"
  chemsprops.ScotlandNSIS1$citation_url = "https://doi.org/10.5281/zenodo.4650230"
  chemsprops.ScotlandNSIS1 = complete.vars(chemsprops.ScotlandNSIS1, sel = c("oc", "ph_h2o", "clay_tot_psa"))
}
dim(chemsprops.ScotlandNSIS1)
#> [1] 2977   36

5.3.0.40 Ecoforest map of Quebec, Canada

if(!exists("chemsprops.QuebecTEX")){
  que.xy = read.csv("/mnt/diskstation/data/Soil_points/Canada/Quebec/RawData.csv")
  #summary(as.factor(que.xy$Horizon))
  ## horizon depths were not measured - we assume 15-30 and 30-80
  que.xy$hzn_top = ifelse(que.xy$Horizon=="B", 15, 30)
  que.xy$hzn_bot = ifelse(que.xy$Horizon=="B", 30, 80)
  que.xy$site_key = que.xy$usiteid
  que.xy$latitude_decimal_degrees = que.xy$Latitude
  que.xy$longitude_decimal_degrees = que.xy$Longitude
  que.xy$hzn_desgn = que.xy$Horizon
  que.xy$sand_tot_psa = que.xy$PC_Sand
  que.xy$silt_tot_psa = que.xy$PC_Silt
  que.xy$clay_tot_psa = que.xy$PC_Clay
  x.na = col.names[which(!col.names %in% names(que.xy))]
  if(length(x.na)>0){ for(i in x.na){ que.xy[,i] = NA } }
  chemsprops.QuebecTEX = que.xy[,col.names]
  chemsprops.QuebecTEX$source_db = "QuebecTEX"
  chemsprops.QuebecTEX$confidence_degree = 4
  chemsprops.QuebecTEX$project_url = ""
  chemsprops.QuebecTEX$citation_url = "https://doi.org/10.7717/peerj.11685"
  chemsprops.QuebecTEX = complete.vars(chemsprops.QuebecTEX, sel = c("clay_tot_psa"))
}
dim(chemsprops.QuebecTEX)
#> [1] 26648    36

5.3.0.41 Pseudo-observations

  • Pseudo-observations using simulated points (world deserts)
if(!exists("chemsprops.SIM")){
  ## 0 soil organic carbon + 98% sand content (deserts)
  load("deserts.pnt.rda")
  nut.sim <- as.data.frame(spTransform(deserts.pnt, CRS("+proj=longlat +datum=WGS84")))
  nut.sim[,1] <- NULL
  nut.sim <- plyr::rename(nut.sim, c("x"="longitude_decimal_degrees", "y"="latitude_decimal_degrees"))
  nr = nrow(nut.sim)
  nut.sim$site_key <- paste("Simulated", 1:nr, sep="_")
  ## insert zeros for all nutrients except for the once we are not sure:
  ## http://www.decodedscience.org/chemistry-sahara-sand-elements-dunes/45828
  sim.vars = c("oc", "oc_d", "c_tot", "n_tot", "ecec", "clay_tot_psa", "mg_ext", "k_ext")
  nut.sim[,sim.vars] <- 0
  nut.sim$silt_tot_psa = 2
  nut.sim$sand_tot_psa = 98
  nut.sim$hzn_top = 0
  nut.sim$hzn_bot = 30
  nut.sim$db_od = 1.55
  nut.sim2 = nut.sim
  nut.sim2$silt_tot_psa = 1
  nut.sim2$sand_tot_psa = 99
  nut.sim2$hzn_top = 30
  nut.sim2$hzn_bot = 60
  nut.sim2$db_od = 1.6
  nut.simA = rbind(nut.sim, nut.sim2)
  #str(nut.simA)
  nut.simA$source_db = "Simulated"
  nut.simA$confidence_degree = 10
  x.na = col.names[which(!col.names %in% names(nut.simA))]
  if(length(x.na)>0){ for(i in x.na){ nut.simA[,i] = NA } }
  chemsprops.SIM = nut.simA[,col.names]
  chemsprops.SIM$project_url = "https://gitlab.com/openlandmap/"
  chemsprops.SIM$citation_url = "https://gitlab.com/openlandmap/compiled-ess-point-data-sets/"
}
dim(chemsprops.SIM)
#> [1] 718  36

Other potential large soil profile DBs of interest:

5.4 alt text Bind all datasets

5.4.0.1 Bind and clean-up

ls(pattern=glob2rx("chemsprops.*"))
#>  [1] "chemsprops.AfSIS1"        "chemsprops.AfSPDB"       
#>  [3] "chemsprops.Alaska"        "chemsprops.bpht"         
#>  [5] "chemsprops.BZE_LW"        "chemsprops.CAPERM"       
#>  [7] "chemsprops.CHLSOC"        "chemsprops.CNSOT"        
#>  [9] "chemsprops.CostaRica"     "chemsprops.CUFS"         
#> [11] "chemsprops.EGRPR"         "chemsprops.FEBR"         
#> [13] "chemsprops.FIADB"         "chemsprops.FRED"         
#> [15] "chemsprops.GEMAS"         "chemsprops.GROOT"        
#> [17] "chemsprops.IRANSPDB"      "chemsprops.ISCND"        
#> [19] "chemsprops.LandPKS"       "chemsprops.LUCAS"        
#> [21] "chemsprops.LUCAS2"        "chemsprops.Mangroves"    
#> [23] "chemsprops.NAMSOTER"      "chemsprops.NatSoil"      
#> [25] "chemsprops.NCSCD"         "chemsprops.NCSS"         
#> [27] "chemsprops.NPDB"          "chemsprops.Peatlands"    
#> [29] "chemsprops.PRONASOLOS"    "chemsprops.QuebecTEX"    
#> [31] "chemsprops.RaCA"          "chemsprops.RemnantSOC"   
#> [33] "chemsprops.ScotlandNSIS1" "chemsprops.SIM"          
#> [35] "chemsprops.SISLAC"        "chemsprops.SOCPDB"       
#> [37] "chemsprops.SoDaH"         "chemsprops.SoilHealthDB" 
#> [39] "chemsprops.SRDB"          "chemsprops.USGS.NGS"     
#> [41] "chemsprops.Vlaanderen"    "chemsprops.WISE"
tot_sprops = dplyr::bind_rows(lapply(ls(pattern=glob2rx("chemsprops.*")), function(i){ mutate_all(setNames(get(i), col.names), as.character) }))
## convert to numeric:
for(j in c("longitude_decimal_degrees", "latitude_decimal_degrees", "layer_sequence", 
           "hzn_top", "hzn_bot", "clay_tot_psa", "silt_tot_psa", "sand_tot_psa", 
           "oc", "oc_d", "c_tot", "n_tot", "ph_kcl", "ph_h2o", "ph_cacl2", "cec_sum", 
           "cec_nh4", "ecec", "wpg2", "db_od", "ca_ext", "mg_ext", "na_ext", "k_ext", 
           "ec_satp", "ec_12pre")){
  tot_sprops[,j] = as.numeric(tot_sprops[,j])
}
#> Warning: NAs introduced by coercion

#> Warning: NAs introduced by coercion

#> Warning: NAs introduced by coercion

#> Warning: NAs introduced by coercion

#> Warning: NAs introduced by coercion
#head(tot_sprops)

Clean up typos and physically impossible values:

tex.rm = rowSums(tot_sprops[,c("clay_tot_psa", "sand_tot_psa", "silt_tot_psa")])
summary(tex.rm)
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
#> -2997.00   100.00   100.00    99.39   100.00   500.00   274336
for(j in c("clay_tot_psa", "sand_tot_psa", "silt_tot_psa", "wpg2")){
  tot_sprops[,j] = ifelse(tot_sprops[,j]>100|tot_sprops[,j]<0, NA, tot_sprops[,j])
  tot_sprops[,j] = ifelse(tex.rm<99|is.na(tex.rm)|tex.rm>101, NA, tot_sprops[,j])
}
for(j in c("ph_h2o","ph_kcl","ph_cacl2")){
  tot_sprops[,j] = ifelse(tot_sprops[,j]>12|tot_sprops[,j]<2, NA, tot_sprops[,j])
}
#hist(tot_sprops$db_od)
for(j in c("db_od")){
  tot_sprops[,j] = ifelse(tot_sprops[,j]>2.4|tot_sprops[,j]<0.05, NA, tot_sprops[,j])
}
#hist(tot_sprops$oc)
for(j in c("oc")){
  tot_sprops[,j] = ifelse(tot_sprops[,j]>800|tot_sprops[,j]<0, NA, tot_sprops[,j])
}

Fill-in the missing depths:

## soil layer depth (middle)
tot_sprops$hzn_depth = tot_sprops$hzn_top + (tot_sprops$hzn_bot-tot_sprops$hzn_top)/2
summary(is.na(tot_sprops$hzn_depth))
#>    Mode   FALSE    TRUE 
#> logical  766689    5465
## Note: large number of horizons without a depth
tot_sprops = tot_sprops[!is.na(tot_sprops$hzn_depth),]
#quantile(tot_sprops$hzn_depth, c(0.01,0.99), na.rm=TRUE)
tot_sprops$hzn_depth = ifelse(tot_sprops$hzn_depth<0, 10, ifelse(tot_sprops$hzn_depth>800, 800, tot_sprops$hzn_depth))
#hist(tot_sprops$hzn_depth, breaks=45)

Summary number of points per data source:

summary(as.factor(tot_sprops$source_db))
#>              AfSIS1              AfSPDB     Alaska_interior              BZE_LW 
#>                4162               60277                3880               17187 
#>         Canada_CUFS         Canada_NPDB    Canada_subarctic       Chilean_SOCDB 
#>               15162               14900                1180               16358 
#>         China_SOTER               CIFOR           CostaRica Croatian_Soil_Pedon 
#>                5105                 561                2029                5746 
#>       CSIRO_NatSoil                FEBR               FIADB                FRED 
#>               70688                7804               23208                 625 
#>          GEMAS_2009               GROOT           Iran_SPDB               ISCND 
#>                4131                 718                4677                3977 
#>          ISRIC_WISE             LandPKS          LUCAS_2009          LUCAS_2015 
#>               23278               41644               21272               21859 
#>         MangrovesDB            NAMSOTER               NCSCD          PRONASOLOS 
#>                7733                2941                7082               31655 
#>           QuebecTEX            RaCA2016        Russia_EGRPR       ScotlandNSIS1 
#>               26648               53663                4437                2977 
#>           Simulated              SISLAC              SOCPDB               SoDaH 
#>                 718               49416                1526               17766 
#>        SoilHealthDB                SRDB           USDA_NCSS            USGS.NGS 
#>                 120                1596              135671                9398 
#>          Vlaanderen    WHRC_remnant_SOC 
#>               41310                1604

Add unique row identifier

tot_sprops$uuid = openssl::md5(make.unique(paste("OpenLandMap", tot_sprops$site_key, tot_sprops$layer_sequence, sep="_")))

and unique location based on the Open Location Code:

tot_sprops$olc_id = olctools::encode_olc(tot_sprops$latitude_decimal_degrees, tot_sprops$longitude_decimal_degrees, 11)
length(levels(as.factor(tot_sprops$olc_id)))
#> [1] 205687
## 205,620
tot_sprops.pnts = tot_sprops[!duplicated(tot_sprops$olc_id),c("site_key", "source_db", "longitude_decimal_degrees", "latitude_decimal_degrees", "site_obsdate", "olc_id", "project_url", "citation_url")]
coordinates(tot_sprops.pnts) <- ~ longitude_decimal_degrees + latitude_decimal_degrees
proj4string(tot_sprops.pnts) <- "EPSG:4326"

Remove points falling in the sea or similar:

if(!exists("ov.sprops")){
  #mask = terra::rast("./layers1km/lcv_landmask_esacci.lc.l4_c_1km_s0..0cm_2000..2015_v1.0.tif")
  mask = terra::rast("/mnt/diskstation/data/LandGIS/layers250m/lcv_landmask_esacci.lc.l4_c_250m_s0..0cm_2000..2015_v1.0.tif")
  ov.sprops <- terra::extract(mask, terra::vect(tot_sprops.pnts)) ## TAKES 2 mins
  summary(as.factor(ov.sprops[,2]))
  if(sum(is.na(ov.sprops[,2]))>0 | sum(ov.sprops[,2]==2)>0){
    rem.lst = which(is.na(ov.sprops[,2]) | ov.sprops[,2]==2 | ov.sprops[,2]==4)
    rem.sp = tot_sprops.pnts$site_key[rem.lst]
    tot_sprops.pnts = tot_sprops.pnts[-rem.lst,]
  }
}
## final number of unique spatial locations:
nrow(tot_sprops.pnts)
#> [1] 205687
## 203,107

5.4.0.2 Histogram plots

Have in mind that some datasets only represent top-soil (e.g. LUCAS) while other cover the whole soil depth, hence higher mean values for some regions (Europe) should be considered within the context of diverse soil depths.

library(ggplot2)
ggplot(tot_sprops, aes(x=source_db, y=log1p(oc))) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90, hjust = 1))
#> Warning: Removed 195328 rows containing non-finite values (stat_boxplot).
sel.db = c("ISRIC_WISE", "Canada_CUFS", "USDA_NCSS", "AfSPDB", "Canada_NPDB", "FIADB", "PRONASOLOS", "CSIRO_NatSoil")
openair::scatterPlot(tot_sprops[tot_sprops$source_db %in% sel.db,], x = "hzn_depth", y = "oc", method = "hexbin", 
                     col = "increment", type = "source_db", log.x = TRUE, log.y=TRUE, ylab="SOC wprm", xlab="depth in cm")

Note: FIADB includes litter and the soil samples are taken at fixed depths. Canada dataset also shows SOC content for peatlands.

ggplot(tot_sprops, aes(x=source_db, y=db_od)) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90, hjust = 1))
#> Warning: Removed 537198 rows containing non-finite values (stat_boxplot).
openair::scatterPlot(tot_sprops[tot_sprops$source_db %in% sel.db,], x = "oc", y = "db_od", method = "hexbin", 
                     col = "increment", type = "source_db", log.x=TRUE, ylab="Bulk density", xlab="SOC wprm")
ggplot(tot_sprops, aes(x=source_db, y=log1p(oc_d))) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90, hjust = 1))
#> Warning in log1p(oc_d): NaNs produced

#> Warning in log1p(oc_d): NaNs produced
#> Warning: Removed 561232 rows containing non-finite values (stat_boxplot).
sel.db0 = c("ISRIC_WISE", "Canada_CUFS", "USDA_NCSS", "AfSPDB", "PRONASOLOS", "CSIRO_NatSoil")
openair::scatterPlot(tot_sprops[tot_sprops$source_db %in% sel.db0,], x = "oc", y = "oc_d", method = "hexbin", 
                     col = "increment", type = "source_db", log.x=TRUE, log.y=TRUE, xlab="SOC wprm", ylab="SOC kg/m3")

Note: SOC (%) and SOC density (kg/m3) are almost linear relationship (curving toward fixed value), except for organic soils (especially litter) where relationship is slightly shifted.

openair::scatterPlot(tot_sprops[tot_sprops$source_db %in% sel.db0,], x = "oc", y = "n_tot", method = "hexbin", 
                     col = "increment", type = "source_db", log.x=TRUE, log.y=TRUE, xlab="SOC wprm", ylab="N total wprm")
ggplot(tot_sprops, aes(x=source_db, y=ph_h2o)) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90, hjust = 1))
#> Warning: Removed 270562 rows containing non-finite values (stat_boxplot).
openair::scatterPlot(tot_sprops[tot_sprops$source_db %in% c("ISRIC_WISE", "USDA_NCSS", "AfSPDB"),], x = "ph_kcl", y = "ph_h2o", method = "hexbin", 
                     col = "increment", type = "source_db", xlab="soil pH KCl", ylab="soil pH H2O")
openair::scatterPlot(tot_sprops[tot_sprops$source_db %in% sel.db0,], x = "hzn_depth", y = "ph_h2o", method = "hexbin", 
                     col = "increment", type = "source_db", log.x = TRUE, log.y=TRUE, ylab="soil pH H2O", xlab="depth in cm")

Note: there seems to be no apparent correlation between soil pH and soil depth.

openair::scatterPlot(tot_sprops[tot_sprops$source_db %in% c("ISRIC_WISE", "USDA_NCSS", "AfSPDB", "PRONASOLOS"),], x = "ph_h2o", y = "cec_sum", method = "hexbin", 
                     col = "increment", type = "source_db", log.y=TRUE, ylab="CEC", xlab="soil pH H2O")
openair::scatterPlot(tot_sprops[tot_sprops$source_db %in% sel.db0,], x = "ph_h2o", y = "oc", method = "hexbin",  
                     col = "increment", type = "source_db", log.y=TRUE, ylab="SOC wprm", xlab="soil pH H2O")
ggplot(tot_sprops, aes(x=source_db, y=clay_tot_psa)) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90, hjust = 1))
#> Warning: Removed 279635 rows containing non-finite values (stat_boxplot).
openair::scatterPlot(tot_sprops[tot_sprops$source_db %in% sel.db0,], y = "clay_tot_psa", x = "sand_tot_psa", method = "hexbin",  
                     col = "increment", type = "source_db", ylab="clay %", xlab="sand %")
ggplot(tot_sprops, aes(x=source_db, y=log1p(cec_sum))) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90, hjust = 1))
#> Warning in log1p(cec_sum): NaNs produced

#> Warning in log1p(cec_sum): NaNs produced
#> Warning: Removed 514240 rows containing non-finite values (stat_boxplot).
ggplot(tot_sprops, aes(x=source_db, y=log1p(n_tot))) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90, hjust = 1))
#> Warning in log1p(n_tot): NaNs produced

#> Warning in log1p(n_tot): NaNs produced
#> Warning: Removed 446028 rows containing non-finite values (stat_boxplot).
ggplot(tot_sprops, aes(x=source_db, y=log1p(k_ext))) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90, hjust = 1))
#> Warning in log1p(k_ext): NaNs produced

#> Warning in log1p(k_ext): NaNs produced
#> Warning: Removed 523010 rows containing non-finite values (stat_boxplot).
ggplot(tot_sprops, aes(x=source_db, y=log1p(ec_satp))) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90, hjust = 1))
#> Warning: Removed 608509 rows containing non-finite values (stat_boxplot).
sprops_yrs = tot_sprops[!is.na(tot_sprops$site_obsdate),c("site_obsdate", "source_db", "oc")]
sprops_yrs$year = as.numeric(substr(x=sprops_yrs$site_obsdate, 1, 4))
#> Warning: NAs introduced by coercion
sprops_yrs$year = ifelse(sprops_yrs$year <1960, NA, ifelse(sprops_yrs$year>2024, NA, sprops_yrs$year))
ggplot(sprops_yrs, aes(x=source_db, y=year)) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90, hjust = 1))
#> Warning: Removed 70644 rows containing non-finite values (stat_boxplot).
openair::scatterPlot(sprops_yrs[sprops_yrs$source_db %in% sel.db0,], y = "oc", x = "year", method = "hexbin",  
                     col = "increment", type = "source_db", log.y=TRUE, ylab="SOC wprm", xlab="Sampling year")

5.4.0.3 Convert to wide format

Add layer_sequence where missing since this is needed to be able to convert to wide format:

#summary(tot_sprops$layer_sequence)
tot_sprops$dsiteid = paste(tot_sprops$source_db, tot_sprops$site_key, tot_sprops$site_obsdate, sep="_")
if(!exists("l.s1")){
  library(dplyr)
  ## Note: takes >2 mins
  l.s1 <- tot_sprops[,c("dsiteid","hzn_depth")] %>% group_by(dsiteid) %>% mutate(layer_sequence.f = data.table::frank(hzn_depth, ties.method = "first"))
  tot_sprops$layer_sequence.f = ifelse(is.na(tot_sprops$layer_sequence), l.s1$layer_sequence.f, tot_sprops$layer_sequence)
  tot_sprops$layer_sequence.f = ifelse(tot_sprops$layer_sequence.f>6, 6, tot_sprops$layer_sequence.f)
}

Convert long table to wide table format so that each depth gets unique column (note: the most computational / time-consuming step usually):

if(!exists("tot_sprops.w")){
  library(data.table)
  tot_sprops.w = data.table::dcast( as.data.table(tot_sprops), 
                formula = olc_id ~ layer_sequence.f, 
                value.var = c("uuid", hor.names[-which(hor.names %in% c("site_key", "layer_sequence"))]), 
                ## "labsampnum", "hzn_desgn", "tex_psda" 
                #fun=function(x){ mean(x, na.rm=TRUE) }, 
                ## Note: does not work for characters
                fun=function(x){ x[1] },
                verbose = FALSE)
  ## remove "0" layers added automatically but containing no values
  tot_sprops.w = tot_sprops.w[,grep("*_0$", colnames(tot_sprops.w)):=NULL]
}
tot_sprops_w.pnts = tot_sprops.pnts
tot_sprops_w.pnts@data = plyr::join(tot_sprops.pnts@data, tot_sprops.w)
#> Joining by: olc_id

Write all soil profiles using a wide format:

sel.rm.pnts  <- tot_sprops_w.pnts$source_db=="LUCAS_2009" | tot_sprops_w.pnts$source_db=="LUCAS_2015" | tot_sprops_w.pnts$site_key %in% mng.rm | tot_sprops_w.pnts$site_key %in% rem.sp
out.gpkg = "./out/gpkg/sol_chem.pnts_horizons.gpkg"
#unlink(out.gpkg)
if(!file.exists(out.gpkg)){
  writeOGR(tot_sprops_w.pnts[!sel.rm.pnts,], "./out/gpkg/sol_chem.pnts_horizons.gpkg", "sol_chem.pnts_horizons", drive="GPKG")
}

5.4.0.4 Save RDS files

Remove points that are not allowed to be distributed publicly:

sel.rm  <- tot_sprops$source_db=="LUCAS_2009" | tot_sprops$source_db=="LUCAS_2015" | tot_sprops$site_key %in% mng.rm | tot_sprops$site_key %in% rem.sp
tot_sprops.s = tot_sprops[!sel.rm,] 

Plot in Goode Homolozine projection and save final objects:

if(!file.exists("./img/sol_chem.pnts_sites.png")){
  tot_sprops.pnts_sf <- st_as_sf(tot_sprops.pnts[1], crs=4326)
  plot_gh(tot_sprops.pnts_sf, out.pdf="./img/sol_chem.pnts_sites.pdf")
  ## extremely slow --- takes 15mins
  system("pdftoppm ./img/sol_chem.pnts_sites.pdf ./img/sol_chem.pnts_sites -png -f 1 -singlefile")
  system("convert -crop 1280x575+36+114 ./img/sol_chem.pnts_sites.png ./img/sol_chem.pnts_sites.png")
}
Soil profiles and soil samples with chemical and physical properties global compilation.

(#fig:sol_chem.pnts_sites)Soil profiles and soil samples with chemical and physical properties global compilation.

Fig. 1: Soil profiles and soil samples with chemical and physical properties global compilation.

5.5 Save final analysis-ready objects:

saveRDS.gz(tot_sprops.s, "./out/rds/sol_chem.pnts_horizons.rds")
saveRDS.gz(tot_sprops, "/mnt/diskstation/data/Soil_points/sol_chem.pnts_horizons.rds")
saveRDS.gz(tot_sprops.pnts, "/mnt/diskstation/data/Soil_points/sol_chem.pnts_sites.rds")
#library(farff)
#writeARFF(tot_sprops.s, "./out/arff/sol_chem.pnts_horizons.arff", overwrite = TRUE)
## compressed CSV
write.csv(tot_sprops.s, file=gzfile("./out/csv/sol_chem.pnts_horizons.csv.gz"))
## regression matrix:
#saveRDS.gz(rm.sol, "./out/rds/sol_chem.pnts_horizons_rm.rds")

Save temp object:

save.image.pigz(file="soilchem.RData")
## rmarkdown::render("Index.rmd")