Introdução
R
- Carrega biblioteas
Aquisicao de dados
Extracao de caracteristicas
- Selecionando bandas
- Indices de vegetacao
Classificação

Introdução

R

Carrega biblioteas

library(sf)
library(raster)
# library(spData)

library(ggplot2)
library(rasterVis)
library(mapview)

library(dplyr)
library(readr)

# library(mlr)
library(caret)
library(getSpatialData)
library(RStoolbox)

library(parallel)
library(doParallel)

Aquisicao de dados

Área de estudo

https://proj.org/usage/projections.html

A área do município de Bauru - SP foi escolhida como objeto de estudo, que está representada pelo modelo vetorial (pontos, linhas e polígonos) obtido pelo site do Instituto Brasileiro de Geografia e Estatística (IBGE) no formato shapefile.

Primeiro, o arquivo com a malha municipal do estado de São Paulo foi baixado.

if(!file.exists("./dados/ibge/35MUE250GC_SIR.shp")){
  url <- "ftp://geoftp.ibge.gov.br/organizacao_do_territorio/malhas_territoriais/malhas_municipais/municipio_2018/UFs/SP/sp_municipios.zip"
  destino_zip <- "./dados/ibge/sp_municipios.zip"
  
  # download e descompactacao dos dados
  download.file(url = url, destfile = destino_zip)
  unzip(destino_zip, exdir="./dados/ibge/")
  
  file.remove(destino_zip)
}

Em seguida, foi selecionado apenas o polígono que representa a cidade.

sp_shp <- st_read("./dados/ibge/35MUE250GC_SIR.shp")

## Reading layer `35MUE250GC_SIR' from data source `/home/rstudio/tcc/dados/ibge/35MUE250GC_SIR.shp' using driver `ESRI Shapefile'
## Simple feature collection with 645 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -53.11011 ymin: -25.358 xmax: -44.16137 ymax: -19.77966
## epsg (SRID):    NA
## proj4string:    +proj=longlat +ellps=GRS80 +no_defs

bauru_shp <- sp_shp %>% 
  filter(NM_MUNICIP == "BAURU")

Todo dado espacial é composto por um sistema de referência de coordenadas, que pode ser geográfico ou projetado. As bibliotecas do R e muitos outros SIGs adotam o padrão da biblioteca PROJ.

st_crs(bauru_shp)

## Coordinate Reference System:
##   No EPSG code
##   proj4string: "+proj=longlat +ellps=GRS80 +no_defs"

Dados de treinamento

Mapbiomas

COLECAO 3.1

O projeto MapBiomas > é uma iniciativa multi-institucional para gerar mapas anuais de cobertura e uso do solo a partir de processos de classificação automática aplicada a imagens de satélite.

Os dados gerados pelo projeto - e disponibilizados através de diversas ferramentas - foram coletados e usados neste trabalho para extração dos exemplos de treinamento do modelo preditivo.

Classes

As classes de uso e cobertura do solo, de acordo com a metodologia aplicada, são as seguintes:

classes <- read.csv("./dados/mapbiomas/classes.csv") 

classes %>% 
  select(name.full, value)

##                                        name.full value
## 1                                    1. Floresta     1
## 2                          1.1. Floresta Natural     2
## 3                      1.1.1. Formação Florestal     3
## 4                       1.1.2. Formação Savanica     4
## 5                                  1.1.3. Mangue     5
## 6                         1.2. Floresta Plantada     9
## 7              2. Formação Natural não Florestal    10
## 8          2.1. Área Úmida Natural não Florestal    11
## 9                        2.2. Formação Campestre    12
## 10                                   2.3. Apicum    32
## 11     2.4. Outra Formação Natural não Florestal    13
## 12                               3. Agropecuária    14
## 13                                 3.1. Pastagem    15
## 14                              3.2. Agricultura    18
## 15                 3.2.1. Cultura Anual e Perene    19
## 16                    3.2.2. Cultura Semi-Perene    20
## 17        3.3. Mosaico de Agricultura e Pastagem    21
## 18                          4. Área não vegetada    22
## 19                             4.1. Praia e Duna    23
## 20                    4.2. Infraestrutura Urbana    24
## 21                      4.3. Afloramento Rochoso    29
## 22                                4.4. Mineração    30
## 23                  4.4. Outra Área não Vegetada    25
## 24                              5. Corpos D'água    26
## 25                        5.1 Rio, Lago e Oceano    33
## 26                               5.2 Aquicultura    31
## 27                              6. Não observado    27

Seleção de amostras

Cerrado

if(!file.exists(cerrado_file <- "./dados/mapbiomas/CERRADO.tif")){
  url <- "http://storage.googleapis.com/mapbiomas-public/COLECAO/3_1/CONSOLIDACAO/CERRADO.tif"
  destino_zip <- "./dados/ibge/sp_municipios.zip"
  download.file(url = url, destfile = cerrado_file)
} else{
  cerrado_raster <- raster("./dados/mapbiomas/CERRADO.tif")  
}

Como a extensão do raster é muito grande, foi feito um recorte de toda a região do cerrado que intersecta com o estado de são paulo. Depois, para cada classe presente no raster, foi extraído 200 pontos aleatórios, como mostrado na tabela a seguir

if(!file.exists("./saida/shapefile/pontos_amostra_sp.shp")){
  # reprojeta shapefile para o SRC do raster
  sp_shp_reproj <- 
    st_transform(sp_shp,projection(cerrado_raster))
  
  # recorta depois aplica mascara no formato do poligono
  sp_raster_cerrado <- cerrado_raster %>% 
    crop(sp_shp_reproj) %>% 
    mask(sp_shp_reproj)
  
  # Raster Atribute Table
  rat <- sp_raster_cerrado %>% 
    ratify() %>%                          
    levels() %>%
    `[[`(1) 
  
  # Nomeia 
  levels(sp_raster_cerrado) <- rat
  
  # Extrai 200 pontos aleatorios para cada classe
  set.seed(123)
  samples <- sp_raster_cerrado %>%
    sampleStratified(size = 200, na.rm = TRUE, sp = TRUE) %>%
    st_as_sf() %>% 
    select(-cell)
  
  st_write(samples,"./saida/shapefile/pontos_amostra_sp.shp")
  
  saveRDS("./cache/sp_raster_cerrado.rds")
} else{
  samples <- st_read("./saida/shapefile/pontos_amostra_sp.shp", quiet = TRUE)
}
samples$CERRADO %>%
  table()

## .
##   3   4   9  12  15  19  20  21  24  33 
## 200  91 200 140 200 146 200 200 108 161

Posteriormente, foi utilizada a plataforma Google Earth Engine para obtenção dos dados de treinamento. A metodologia aplicada foi: upload do arquivo gerado com os pontos aleatórios das classes para a plataforma; obtenção dos dados da coleção (‘COPERNICUS/S2_SR’) do sensor MSI do satélite Sentinel 2, mais específicamente, imagens de Level 2A que possuem correções atmosféricas e geométricas aplicadas; em seguida, foram selecionadas apenas as imagens entre as datas de 01/01/2019 e 01/10/2019, com percentagem de núvens menor que 0.1 e que estivessem dentro do contorno da região dos pontos gerados anteriormente; a média de todas as imagens trazidas foi computada e então, para cada ponto, foram extraídos os valores de todas as bandas; por útimo, a tabela gerada foi exportada para o Google Drive.

Segue o script aplicado na plataforma:

/**
 * @referencia
 *    https://www.youtube.com/playlist?list=PLNFvG6bTA4NReWtgC93Mh9Tw1RNG4EBMP
 *    https://developers.google.com/earth-engine/api_docs#ee.image.reduceregion
 *    https://gis.stackexchange.com/questions/265392/extracting-pixel-values-by-points-and-converting-to-table-in-google-earth-engine
 */
 
var pts_amostra = ee.FeatureCollection('users/alexandretomy/tcc/pontos_amostra_sp')
var sent2a_col = ee.ImageCollection('COPERNICUS/S2_SR')

var aoi = ee.FeatureCollection(pts_amostra)

var collection = sent2a_col
  .filterDate('2019-01-01', '2019-10-01')
  .filterBounds(aoi)
  .filterMetadata('CLOUDY_PIXEL_PERCENTAGE', 'less_than', 0.1)
//   .filterMetadata('CLOUD_COVER', 'less_than', 5)
  
print(collection)
var median = collection.median();

var samplePointsValues = 
  median.reduceRegions({
      reducer: ee.Reducer.first(),
      collection: aoi,
      scale: 30,
      tileScale: 2
    })

print(samplePointsValues)

Map.centerObject(aoi)
// Map.addLayer(collection)
Map.addLayer(collection.select('B4','B3','B2'))
Map.addLayer(aoi)

Export.table.toDrive(
  samplePointsValues,
  "valores das amostras aleatorias",
  "EarthEngineExport",
  "samplePointsValues"
)

Satélite Sentinel

Como abordado acima, o sensor multiespectral do Satélite Sentinel 2, possui as seguintes características:

sentinel_bandas <- read.csv("./dados/satelite/sentinel/bandas.csv") 
sentinel_bandas

##    banda         nome lambda resolucao
## 1     B1      Aerosol  0.443        60
## 2     B2         Azul  0.490        10
## 3     B3        Verde  0.560        10
## 4     B4     Vermelho  0.665        10
## 5     B8          NIR  0.842        10
## 6     B5   Red edge 1  0.705        20
## 7     B6   Red edge 2  0.740        20
## 8     B7   Red edge 3  0.783        20
## 9     B9 Vapor d’água  0.940        60
## 10   B10       Cirrus  1.375        60
## 11   B11       SWIR 1  1.610        20
## 12   B12       SWIR 2  2.190        20
## 13   B8A   Red edge 4  0.865        20

Os dados dos satélites Sentinel podem ser obtidos gratuitamente, é necessário apenas cadastro. No R, há diversas bibliotecas que fazer a interface com o hub.

# credenciais e diretorio de dados (no meu caso, estao no arquivo .Rprofile)
login_CopHub(username = cophub.user, password = cophub.pass)
# diretorio dos dados
set_archive("./dados/satelite/sentinel/")

A partir dos filtros por data e cobertura de núvem, foram selecionadas 2 imagens para que recobrissem a área de interesse completamente.

# registros da pesquina no copernicus hub
registros <- getSentinel_query(
    aoi = bauru_shp$geometry,
    time_range = c("2018-01-01", "2019-10-01"),
    platform = "Sentinel-2"
  ) %>% 
  filter(cloudcoverpercentage <= 5)
head(registros)

if(!.exists("./dados/satelite/sentinel/get_data/")){
  # baixa as imagens
  getSentinel_data(records = registros[c(5,6),])
}

As duas imagens foram recortadas e depois juntadas em uma única composição. Posteriormente, foi feito um redimensionamento das bandas de 20m para 10m, para que fossem compiladas em uma única pilha de raster.

if(!file.exists("./dados/satelite/sentinel/sentinel.tif")){
  sentinel_ras <- func_le("./dados/satelite/sentinel/get_data/Sentinel-2/")
  
  bauru_shp_reproj <- 
    st_transform(bauru_shp,projection(sentinel_ras[[1]][[1]]))
  
  sentinel_merge <- 
    purrr::map(sentinel_ras,{function(ras) lapply(ras,FUN = raster::crop, y = extent(bauru_shp_reproj))}) %>% 
    func_by2col(raster::merge)
  
  sentinel_names <- stringr::str_extract(sapply(sentinel_ras[[1]],names),"B.{2}")
  
  names(sentinel_merge) <- sentinel_names
  
  sentinel_resamp <- vector(mode = "list", length = length(sentinel_merge))
  names(sentinel_resamp) <- names(sentinel_merge)
  
  for (b in c("B05", "B06", "B07", "B8A", "B11", "B12")){
    beginCluster(n = round(3/4 * parallel::detectCores()))
    try(
      sentinel_resamp[[b]] <- raster::resample(x = sentinel_merge[[b]],
                                                  y = sentinel_merge$B02)
    )
    endCluster()
  }
  
  b_10m <- c("B02", "B03", "B04", "B08")
  sentinel_resamp[b_10m] <- sentinel_merge[b_10m]
  sentinel <- stack(sentinel_resamp)
  
  names(sentinel) <- names(sentinel_merge)
  
  writeRaster(sentinel, './dados/satelite/sentinel/sentinel.tif', overwrite=FALSE)
} else {
  sentinel <- stack("./dados/satelite/sentinel/sentinel.tif")
  sentinel_names <- c("Azul","Verde","Vermelho","NIR","RE1","RE2","RE3","SWIR1","SWIR2","RE4")
  names(sentinel) <- sentinel_names
}

Segue uma composição com as bandas do Vermelho, Verde e Azul da região do município.

sentinel_rgb_plot <- ggRGB(sentinel,r=3,g=2,b=1)
sentinel_rgb_plot

Extracao de caracteristicas

Selecionando bandas

A partir do arquivo gerado pelo script na plataforma do Google Earth Engine, foram selecionadas as bandas, ou variáveis, para criação do modelo preditivo

# Carrega amostra com valores extraidos
amostras_com_valores <- st_read("./dados/samp_val.geojson", quiet = TRUE)

# Nomeia as bandas
bandas <- data.frame(
  name = c("SWIR1","SWIR2","Azul","Verde","Vermelho","NIR","RE1","RE2","RE3","RE4"),
  band = c("B11","B12","B2","B3","B4","B5","B6","B7","B8","B8A")
) 
bandas

##        name band
## 1     SWIR1  B11
## 2     SWIR2  B12
## 3      Azul   B2
## 4     Verde   B3
## 5  Vermelho   B4
## 6       NIR   B5
## 7       RE1   B6
## 8       RE2   B7
## 9       RE3   B8
## 10      RE4  B8A

Indices de vegetacao

Os índices de vegetação são obtidos através de operações aritméticas entre as bandas. Possuem a característica de realçar as variações de densidade da cobertura vegetal. O NDVI é provavelmente o mais utilizado, ele possui a característica de evidenciar áreas da vegetação fotossinteticamente mais ativas e é calculado a partir das bandas do Vermelho e Infravermelho Próximo.

calculaIndice <- function(x,y){
  (x-y)/(x+y)
}

NDVI <- overlay(sentinel$NIR,sentinel$Vermelho,fun=calculaIndice)
names(NDVI) <- "NDVI"
sentinel <- stack(sentinel,NDVI)

plot(NDVI)

Aqui, foi montada a tabela contendo as variáveis de treinamento (bandas e índice de vegetação) bem como as classes alvo.

if(!file.exists("./cache/amostras.rds")){
  # Acrescenta coluna de NDVI e seleciona as bandas definidas
  amostras <- amostras_com_valores %>%
    mutate(NDVI = calculaIndice(B5,B4)) %>% 
    select(bandas$band,NDVI,CERRADO) %>% 
    st_drop_geometry() %>% 
    na.omit()
  
  # Renomeia as bandas e a coluna das classes
  names(amostras) <- c(bandas$name,"NDVI","Classe")
  
  # Tabela com as informacoes das classes (cor, nome e valor)
  classes_subset <- classes %>% 
    filter(value %in% unique(amostras$Classe)) %>% 
    select(value,name,color)
  
  # Renomeia a coluna das classes pelo nome
  amostras$Classe <- amostras$Classe %>% 
    plyr::mapvalues(from=unique(.), to=classes_subset$name)
  
  saveRDS(amostras, "./cache/amostras.rds")
} else {
  amostras <- readRDS("./cache/amostras.rds")
}
  
head(amostras)

##   SWIR1  SWIR2  Azul Verde Vermelho  NIR    RE1    RE2    RE3  RE4
## 1  2244 1296.0 400.5 733.0    670.0 1219 2230.5 2579.5 2681.5 2856
## 2  2703 1787.0 375.0 624.0    807.0 1215 1612.0 1770.0 1818.0 2040
## 3  1703  738.5 210.5 435.0    227.5  778 2616.5 3235.0 3347.5 3605
## 4  1488  645.0 209.0 341.0    231.0  677 2043.0 2477.0 2628.0 2795
## 5  1671  767.0 216.0 367.0    229.0  659 1735.0 2314.0 2485.0 2587
## 6  1539  671.5 207.0 395.5    221.0  685 2156.0 2642.5 2690.0 2943
##        NDVI             Classe
## 1 0.2906300 formacao_florestal
## 2 0.2017804 formacao_florestal
## 3 0.5474888 formacao_florestal
## 4 0.4911894 formacao_florestal
## 5 0.4842342 formacao_florestal
## 6 0.5121413 formacao_florestal

Podemos vizualisar a correlação entre as bandas em gráficos para cada par, representado a seguir.

amostras_plot_bandas <- amostras %>% 
  select(-Classe,-NDVI) %>% 
  plot()

Classificação

Validação Cruzada

Validação cruzada é um método de reamostragem onde dividi-se o conjunto de dados repetidamente em conjuntos de treinamento, usados para ajustar o modelo, e conjuntos de teste, usados para verificar o desempenho das predições. No caso, foi feito uma divisão de 75% para o conjunto de treinamento, e os 25% restantes para testes. Além disso, foi definido uma repetição de 10 vezes em que a repartição é realizada.

set.seed(123)

idx_treino <- createDataPartition(amostras$Classe,
                                  p = 0.75,
                                  list = FALSE)

amostras_treino <- amostras[idx_treino,]
amostras_teste <- amostras[-idx_treino,]

table(amostras_treino$Classe)

## 
## agricultura_pastagem cultura_anual_perene  cultura_semi_perene 
##                  148                  110                  150 
##    floresta_plantada   formacao_campestre   formacao_florestal 
##                  150                  105                  150 
##    formacao_savanica         infra_urbana             pastagem 
##                   69                   79                  149 
##      rio_lago_oceano 
##                  120

table(amostras_teste$Classe)

## 
## agricultura_pastagem cultura_anual_perene  cultura_semi_perene 
##                   49                   36                   50 
##    floresta_plantada   formacao_campestre   formacao_florestal 
##                   50                   35                   49 
##    formacao_savanica         infra_urbana             pastagem 
##                   22                   26                   49 
##      rio_lago_oceano 
##                   39

controle_treino <- trainControl(
  summaryFunction = multiClassSummary,
  method = "cv",
  number = 10,
  savePredictions = TRUE
)

Treinamento

SVM

if(!file.exists("./cache/modelo_svm_radial.rds")){
  cluster <- makeCluster(3/4 * detectCores())
  registerDoParallel(cl)
  
  modelo_svm_radial <- caret::train(
    Classe ~ ., method = "svmRadial", data = amostras_treino,
    tuneGrid = data.frame(
      .C = c(0.01,0.3,1),
      .sigma = c(0.05,0,1)
    ),
    trControl = controle_treino,
    allowParallel = TRUE,
    preProc = c("center", "scale")
  )
  
  stopCluster(cl); remove(cl)
  registerDoSEQ()
  saveRDS(modelo_svm_radial, "./cache/modelo_svm_radial.rds")
} else {
  modelo_svm_radial <- readRDS("./cache/modelo_svm_radial.rds")
}
modelo_svm_radial

## Support Vector Machines with Radial Basis Function Kernel 
## 
## 1230 samples
##   11 predictor
##   10 classes: 'agricultura_pastagem', 'cultura_anual_perene', 'cultura_semi_perene', 'floresta_plantada', 'formacao_campestre', 'formacao_florestal', 'formacao_savanica', 'infra_urbana', 'pastagem', 'rio_lago_oceano' 
## 
## Pre-processing: centered (11), scaled (11) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1108, 1106, 1107, 1108, 1108, 1107, ... 
## Resampling results across tuning parameters:
## 
##   C     sigma  Accuracy   Kappa      Mean_F1    Mean_Sensitivity
##   0.01  0.05   0.3211950  0.2289964        NaN  0.2811905       
##   0.30  0.00   0.1219577  0.0000000        NaN  0.1000000       
##   1.00  1.00   0.4519443  0.3835632  0.4720939  0.4405249       
##   Mean_Specificity  Mean_Pos_Pred_Value  Mean_Neg_Pred_Value
##   0.9227229              NaN             0.9279680          
##   0.9000000              NaN             0.9109987          
##   0.9381365         0.468257             0.9388539          
##   Mean_Precision  Mean_Recall  Mean_Detection_Rate  Mean_Balanced_Accuracy
##        NaN        0.2811905    0.03211950           0.6019567             
##        NaN        0.1000000    0.01219577           0.5000000             
##   0.468257        0.4405249    0.04519443           0.6893307             
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 1 and C = 1.

matriz_confusao_svm_radial <- confusionMatrix(
  data = predict(modelo_svm_radial, newdata = amostras_teste),
  reference = as.factor(amostras_teste$Classe)
)
matriz_confusao_svm_radial

## Confusion Matrix and Statistics
## 
##                       Reference
## Prediction             agricultura_pastagem cultura_anual_perene
##   agricultura_pastagem                    6                    2
##   cultura_anual_perene                    2                    6
##   cultura_semi_perene                    16                   10
##   floresta_plantada                       1                    3
##   formacao_campestre                      6                    2
##   formacao_florestal                      2                    3
##   formacao_savanica                       0                    0
##   infra_urbana                            2                    2
##   pastagem                               14                    8
##   rio_lago_oceano                         0                    0
##                       Reference
## Prediction             cultura_semi_perene floresta_plantada
##   agricultura_pastagem                   4                 2
##   cultura_anual_perene                   1                 2
##   cultura_semi_perene                   32                 1
##   floresta_plantada                      0                29
##   formacao_campestre                     1                 3
##   formacao_florestal                     0                 6
##   formacao_savanica                      0                 0
##   infra_urbana                           1                 1
##   pastagem                              11                 6
##   rio_lago_oceano                        0                 0
##                       Reference
## Prediction             formacao_campestre formacao_florestal
##   agricultura_pastagem                  5                  2
##   cultura_anual_perene                  0                  0
##   cultura_semi_perene                   5                  0
##   floresta_plantada                     1                  6
##   formacao_campestre                   16                  7
##   formacao_florestal                    5                 27
##   formacao_savanica                     0                  2
##   infra_urbana                          0                  1
##   pastagem                              3                  4
##   rio_lago_oceano                       0                  0
##                       Reference
## Prediction             formacao_savanica infra_urbana pastagem
##   agricultura_pastagem                 2            5        8
##   cultura_anual_perene                 1            0        3
##   cultura_semi_perene                  1            0        6
##   floresta_plantada                    1            0        3
##   formacao_campestre                   5            3        1
##   formacao_florestal                   7            0        5
##   formacao_savanica                    4            0        0
##   infra_urbana                         0           15        0
##   pastagem                             1            3       23
##   rio_lago_oceano                      0            0        0
##                       Reference
## Prediction             rio_lago_oceano
##   agricultura_pastagem               1
##   cultura_anual_perene               0
##   cultura_semi_perene                0
##   floresta_plantada                  1
##   formacao_campestre                 5
##   formacao_florestal                 1
##   formacao_savanica                  0
##   infra_urbana                       0
##   pastagem                           0
##   rio_lago_oceano                   31
## 
## Overall Statistics
##                                           
##                Accuracy : 0.4667          
##                  95% CI : (0.4172, 0.5166)
##     No Information Rate : 0.1235          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.4005          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: agricultura_pastagem
## Sensitivity                              0.12245
## Specificity                              0.91292
## Pos Pred Value                           0.16216
## Neg Pred Value                           0.88315
## Prevalence                               0.12099
## Detection Rate                           0.01481
## Detection Prevalence                     0.09136
## Balanced Accuracy                        0.51769
##                      Class: cultura_anual_perene
## Sensitivity                              0.16667
## Specificity                              0.97561
## Pos Pred Value                           0.40000
## Neg Pred Value                           0.92308
## Prevalence                               0.08889
## Detection Rate                           0.01481
## Detection Prevalence                     0.03704
## Balanced Accuracy                        0.57114
##                      Class: cultura_semi_perene Class: floresta_plantada
## Sensitivity                             0.64000                   0.5800
## Specificity                             0.89014                   0.9549
## Pos Pred Value                          0.45070                   0.6444
## Neg Pred Value                          0.94611                   0.9417
## Prevalence                              0.12346                   0.1235
## Detection Rate                          0.07901                   0.0716
## Detection Prevalence                    0.17531                   0.1111
## Balanced Accuracy                       0.76507                   0.7675
##                      Class: formacao_campestre Class: formacao_florestal
## Sensitivity                            0.45714                   0.55102
## Specificity                            0.91081                   0.91854
## Pos Pred Value                         0.32653                   0.48214
## Neg Pred Value                         0.94663                   0.93696
## Prevalence                             0.08642                   0.12099
## Detection Rate                         0.03951                   0.06667
## Detection Prevalence                   0.12099                   0.13827
## Balanced Accuracy                      0.68398                   0.73478
##                      Class: formacao_savanica Class: infra_urbana
## Sensitivity                          0.181818             0.57692
## Specificity                          0.994778             0.98153
## Pos Pred Value                       0.666667             0.68182
## Neg Pred Value                       0.954887             0.97128
## Prevalence                           0.054321             0.06420
## Detection Rate                       0.009877             0.03704
## Detection Prevalence                 0.014815             0.05432
## Balanced Accuracy                    0.588298             0.77923
##                      Class: pastagem Class: rio_lago_oceano
## Sensitivity                  0.46939                0.79487
## Specificity                  0.85955                1.00000
## Pos Pred Value               0.31507                1.00000
## Neg Pred Value               0.92169                0.97861
## Prevalence                   0.12099                0.09630
## Detection Rate               0.05679                0.07654
## Detection Prevalence         0.18025                0.07654
## Balanced Accuracy            0.66447                0.89744

if(!file.exists("./cache/modelo_svm_linear.rds")){
  cl <- makeCluster(3/4 * detectCores())
  registerDoParallel(cl)
  
  modelo_svm_linear <- caret::train(
    Classe ~ ., method = "svmLinear3", data = amostras_treino,
    tuneGrid = expand.grid(
      cost = c(0.03,0.1,3),
      Loss = c("L1", "L2")
    ),
    trControl = controle_treino,
    allowParallel = TRUE,
    preProc = c("center", "scale")
  )
  
  stopCluster(cl); remove(cl)
  registerDoSEQ()
  saveRDS(modelo_svm_linear, "./cache/modelo_svm_linear.rds")
} else {
  modelo_svm_linear <- readRDS("./cache/modelo_svm_linear.rds")
}
modelo_svm_linear

## L2 Regularized Support Vector Machine (dual) with Linear Kernel 
## 
## 1230 samples
##   11 predictor
##   10 classes: 'agricultura_pastagem', 'cultura_anual_perene', 'cultura_semi_perene', 'floresta_plantada', 'formacao_campestre', 'formacao_florestal', 'formacao_savanica', 'infra_urbana', 'pastagem', 'rio_lago_oceano' 
## 
## Pre-processing: centered (11), scaled (11) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1108, 1106, 1107, 1107, 1108, 1107, ... 
## Resampling results across tuning parameters:
## 
##   cost  Loss  Accuracy   Kappa      Mean_F1    Mean_Sensitivity
##   0.03  L1    0.3472029  0.2731904        NaN  0.3632100       
##   0.03  L2    0.4365703  0.3651394        NaN  0.4216504       
##   0.10  L1    0.3861229  0.3181410  0.4063704  0.4107035       
##   0.10  L2    0.4357440  0.3645845        NaN  0.4205595       
##   3.00  L1    0.3551875  0.2844897        NaN  0.3805855       
##   3.00  L2    0.4357508  0.3648648        NaN  0.4220931       
##   Mean_Specificity  Mean_Pos_Pred_Value  Mean_Neg_Pred_Value
##   0.9272399         0.3271170            0.9290613          
##   0.9362729               NaN            0.9387667          
##   0.9318056         0.3674792            0.9332700          
##   0.9362455               NaN            0.9385520          
##   0.9284281         0.3604449            0.9297899          
##   0.9362747         0.3950811            0.9382511          
##   Mean_Precision  Mean_Recall  Mean_Detection_Rate  Mean_Balanced_Accuracy
##   0.3271170       0.3632100    0.03472029           0.6452249             
##         NaN       0.4216504    0.04365703           0.6789617             
##   0.3674792       0.4107035    0.03861229           0.6712545             
##         NaN       0.4205595    0.04357440           0.6784025             
##   0.3604449       0.3805855    0.03551875           0.6545068             
##   0.3950811       0.4220931    0.04357508           0.6791839             
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were cost = 0.03 and Loss = L2.

matriz_confusao_svm_linear <- confusionMatrix(
  data = predict(modelo_svm_linear, newdata = amostras_teste),
  reference = as.factor(amostras_teste$Classe)
)
matriz_confusao_svm_linear

## Confusion Matrix and Statistics
## 
##                       Reference
## Prediction             agricultura_pastagem cultura_anual_perene
##   agricultura_pastagem                    0                    0
##   cultura_anual_perene                    1                    4
##   cultura_semi_perene                    20                   14
##   floresta_plantada                       1                    6
##   formacao_campestre                      2                    0
##   formacao_florestal                      6                    4
##   formacao_savanica                       0                    0
##   infra_urbana                            6                    2
##   pastagem                               13                    6
##   rio_lago_oceano                         0                    0
##                       Reference
## Prediction             cultura_semi_perene floresta_plantada
##   agricultura_pastagem                   0                 0
##   cultura_anual_perene                   2                 1
##   cultura_semi_perene                   29                 2
##   floresta_plantada                      0                30
##   formacao_campestre                     1                 1
##   formacao_florestal                     1                 7
##   formacao_savanica                      0                 0
##   infra_urbana                           4                 3
##   pastagem                              13                 6
##   rio_lago_oceano                        0                 0
##                       Reference
## Prediction             formacao_campestre formacao_florestal
##   agricultura_pastagem                  0                  0
##   cultura_anual_perene                  0                  1
##   cultura_semi_perene                   6                  2
##   floresta_plantada                     2                  6
##   formacao_campestre                    5                  1
##   formacao_florestal                   14                 35
##   formacao_savanica                     0                  0
##   infra_urbana                          1                  1
##   pastagem                              6                  3
##   rio_lago_oceano                       1                  0
##                       Reference
## Prediction             formacao_savanica infra_urbana pastagem
##   agricultura_pastagem                 1            0        0
##   cultura_anual_perene                 1            0        0
##   cultura_semi_perene                  2            1        8
##   floresta_plantada                    1            0        3
##   formacao_campestre                   0            2        2
##   formacao_florestal                  16            1        8
##   formacao_savanica                    0            0        0
##   infra_urbana                         0           21        4
##   pastagem                             1            1       24
##   rio_lago_oceano                      0            0        0
##                       Reference
## Prediction             rio_lago_oceano
##   agricultura_pastagem               0
##   cultura_anual_perene               0
##   cultura_semi_perene                0
##   floresta_plantada                  1
##   formacao_campestre                 0
##   formacao_florestal                 4
##   formacao_savanica                  0
##   infra_urbana                       0
##   pastagem                           0
##   rio_lago_oceano                   34
## 
## Overall Statistics
##                                           
##                Accuracy : 0.4494          
##                  95% CI : (0.4002, 0.4993)
##     No Information Rate : 0.1235          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.3801          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: agricultura_pastagem
## Sensitivity                             0.000000
## Specificity                             0.997191
## Pos Pred Value                          0.000000
## Neg Pred Value                          0.878713
## Prevalence                              0.120988
## Detection Rate                          0.000000
## Detection Prevalence                    0.002469
## Balanced Accuracy                       0.498596
##                      Class: cultura_anual_perene
## Sensitivity                             0.111111
## Specificity                             0.983740
## Pos Pred Value                          0.400000
## Neg Pred Value                          0.918987
## Prevalence                              0.088889
## Detection Rate                          0.009877
## Detection Prevalence                    0.024691
## Balanced Accuracy                       0.547425
##                      Class: cultura_semi_perene Class: floresta_plantada
## Sensitivity                              0.5800                  0.60000
## Specificity                              0.8451                  0.94366
## Pos Pred Value                           0.3452                  0.60000
## Neg Pred Value                           0.9346                  0.94366
## Prevalence                               0.1235                  0.12346
## Detection Rate                           0.0716                  0.07407
## Detection Prevalence                     0.2074                  0.12346
## Balanced Accuracy                        0.7125                  0.77183
##                      Class: formacao_campestre Class: formacao_florestal
## Sensitivity                            0.14286                   0.71429
## Specificity                            0.97568                   0.82865
## Pos Pred Value                         0.35714                   0.36458
## Neg Pred Value                         0.92327                   0.95469
## Prevalence                             0.08642                   0.12099
## Detection Rate                         0.01235                   0.08642
## Detection Prevalence                   0.03457                   0.23704
## Balanced Accuracy                      0.55927                   0.77147
##                      Class: formacao_savanica Class: infra_urbana
## Sensitivity                           0.00000             0.80769
## Specificity                           1.00000             0.94459
## Pos Pred Value                            NaN             0.50000
## Neg Pred Value                        0.94568             0.98623
## Prevalence                            0.05432             0.06420
## Detection Rate                        0.00000             0.05185
## Detection Prevalence                  0.00000             0.10370
## Balanced Accuracy                     0.50000             0.87614
##                      Class: pastagem Class: rio_lago_oceano
## Sensitivity                  0.48980                0.87179
## Specificity                  0.86236                0.99727
## Pos Pred Value               0.32877                0.97143
## Neg Pred Value               0.92470                0.98649
## Prevalence                   0.12099                0.09630
## Detection Rate               0.05926                0.08395
## Detection Prevalence         0.18025                0.08642
## Balanced Accuracy            0.67608                0.93453

ggplot(modelo_svm_linear)

Random Forest

if(!file.exists("./cache/modelo_rf.rds")){
  cluster <- makeCluster(3/4 * detectCores())
  registerDoParallel(cluster)
  
  modelo_rf <- caret::train(
    Classe ~ ., method = "rf", data = amostras_treino,
    tuneGrid = data.frame(mtry = c(2, 3, 4, 5, 8)),
    trControl = controle_treino,
    allowParallel = TRUE
  )
  
  stopCluster(cluster); remove(cluster)
  registerDoSEQ()
  saveRDS(modelo_rf, "./cache/modelo_rf.rds")
  modelo_rf$times$everything    
} else {
  modelo_rf <- readRDS("./cache/modelo_rf.rds")
}
modelo_rf

## Random Forest 
## 
## 1230 samples
##   11 predictor
##   10 classes: 'agricultura_pastagem', 'cultura_anual_perene', 'cultura_semi_perene', 'floresta_plantada', 'formacao_campestre', 'formacao_florestal', 'formacao_savanica', 'infra_urbana', 'pastagem', 'rio_lago_oceano' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1107, 1107, 1107, 1106, 1107, 1108, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa      Mean_F1    Mean_Sensitivity
##   2     0.4252402  0.3555644  0.4550612  0.4299134       
##   3     0.4292791  0.3599819  0.4508215  0.4344935       
##   4     0.4219884  0.3517853  0.4318631  0.4261353       
##   5     0.4219752  0.3517556  0.4559636  0.4261407       
##   8     0.4252273  0.3553768  0.4630283  0.4284859       
##   Mean_Specificity  Mean_Pos_Pred_Value  Mean_Neg_Pred_Value
##   0.9353485         0.4347588            0.9358354          
##   0.9357761         0.4474454            0.9362117          
##   0.9349695         0.4307836            0.9353925          
##   0.9349624         0.4395405            0.9353650          
##   0.9353270         0.4468300            0.9357386          
##   Mean_Precision  Mean_Recall  Mean_Detection_Rate  Mean_Balanced_Accuracy
##   0.4347588       0.4299134    0.04252402           0.6826309             
##   0.4474454       0.4344935    0.04292791           0.6851348             
##   0.4307836       0.4261353    0.04219884           0.6805524             
##   0.4395405       0.4261407    0.04219752           0.6805516             
##   0.4468300       0.4284859    0.04252273           0.6819065             
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 3.

matriz_confusao_rf <- confusionMatrix(
  data = predict(modelo_rf, newdata = amostras_teste),
  reference = as.factor(amostras_teste$Classe)
)
matriz_confusao_rf

## Confusion Matrix and Statistics
## 
##                       Reference
## Prediction             agricultura_pastagem cultura_anual_perene
##   agricultura_pastagem                    6                    4
##   cultura_anual_perene                    4                    5
##   cultura_semi_perene                    14                   10
##   floresta_plantada                       3                    5
##   formacao_campestre                      4                    5
##   formacao_florestal                      2                    2
##   formacao_savanica                       2                    0
##   infra_urbana                            3                    0
##   pastagem                               11                    5
##   rio_lago_oceano                         0                    0
##                       Reference
## Prediction             cultura_semi_perene floresta_plantada
##   agricultura_pastagem                   4                 3
##   cultura_anual_perene                   3                 5
##   cultura_semi_perene                   29                 1
##   floresta_plantada                      0                24
##   formacao_campestre                     2                 3
##   formacao_florestal                     0                 8
##   formacao_savanica                      0                 0
##   infra_urbana                           1                 1
##   pastagem                              11                 5
##   rio_lago_oceano                        0                 0
##                       Reference
## Prediction             formacao_campestre formacao_florestal
##   agricultura_pastagem                  4                  2
##   cultura_anual_perene                  0                  2
##   cultura_semi_perene                   7                  0
##   floresta_plantada                     1                  7
##   formacao_campestre                   16                  5
##   formacao_florestal                    4                 23
##   formacao_savanica                     1                  5
##   infra_urbana                          0                  1
##   pastagem                              2                  4
##   rio_lago_oceano                       0                  0
##                       Reference
## Prediction             formacao_savanica infra_urbana pastagem
##   agricultura_pastagem                 0            1        5
##   cultura_anual_perene                 1            0        5
##   cultura_semi_perene                  2            0        8
##   floresta_plantada                    1            0        2
##   formacao_campestre                   5            3        1
##   formacao_florestal                   5            0        5
##   formacao_savanica                    7            0        0
##   infra_urbana                         0           22        2
##   pastagem                             1            0       21
##   rio_lago_oceano                      0            0        0
##                       Reference
## Prediction             rio_lago_oceano
##   agricultura_pastagem               0
##   cultura_anual_perene               1
##   cultura_semi_perene                0
##   floresta_plantada                  1
##   formacao_campestre                 4
##   formacao_florestal                 1
##   formacao_savanica                  1
##   infra_urbana                       0
##   pastagem                           0
##   rio_lago_oceano                   31
## 
## Overall Statistics
##                                           
##                Accuracy : 0.4543          
##                  95% CI : (0.4051, 0.5042)
##     No Information Rate : 0.1235          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.3891          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: agricultura_pastagem
## Sensitivity                              0.12245
## Specificity                              0.93539
## Pos Pred Value                           0.20690
## Neg Pred Value                           0.88564
## Prevalence                               0.12099
## Detection Rate                           0.01481
## Detection Prevalence                     0.07160
## Balanced Accuracy                        0.52892
##                      Class: cultura_anual_perene
## Sensitivity                              0.13889
## Specificity                              0.94309
## Pos Pred Value                           0.19231
## Neg Pred Value                           0.91821
## Prevalence                               0.08889
## Detection Rate                           0.01235
## Detection Prevalence                     0.06420
## Balanced Accuracy                        0.54099
##                      Class: cultura_semi_perene Class: floresta_plantada
## Sensitivity                              0.5800                  0.48000
## Specificity                              0.8817                  0.94366
## Pos Pred Value                           0.4085                  0.54545
## Neg Pred Value                           0.9371                  0.92798
## Prevalence                               0.1235                  0.12346
## Detection Rate                           0.0716                  0.05926
## Detection Prevalence                     0.1753                  0.10864
## Balanced Accuracy                        0.7308                  0.71183
##                      Class: formacao_campestre Class: formacao_florestal
## Sensitivity                            0.45714                   0.46939
## Specificity                            0.91351                   0.92416
## Pos Pred Value                         0.33333                   0.46000
## Neg Pred Value                         0.94678                   0.92676
## Prevalence                             0.08642                   0.12099
## Detection Rate                         0.03951                   0.05679
## Detection Prevalence                   0.11852                   0.12346
## Balanced Accuracy                      0.68533                   0.69677
##                      Class: formacao_savanica Class: infra_urbana
## Sensitivity                           0.31818             0.84615
## Specificity                           0.97650             0.97889
## Pos Pred Value                        0.43750             0.73333
## Neg Pred Value                        0.96144             0.98933
## Prevalence                            0.05432             0.06420
## Detection Rate                        0.01728             0.05432
## Detection Prevalence                  0.03951             0.07407
## Balanced Accuracy                     0.64734             0.91252
##                      Class: pastagem Class: rio_lago_oceano
## Sensitivity                  0.42857                0.79487
## Specificity                  0.89045                1.00000
## Pos Pred Value               0.35000                1.00000
## Neg Pred Value               0.91884                0.97861
## Prevalence                   0.12099                0.09630
## Detection Rate               0.05185                0.07654
## Detection Prevalence         0.14815                0.07654
## Balanced Accuracy            0.65951                0.89744

ggplot(modelo_rf)

Resultados

if(!file.exists("./saida/raster/predicao_rf.tif")){
  time_predict <- system.time({
    predicao_svm_radial <- raster::predict(
      object = sentinel,
      model = modelo_svm_radial, na.rm = TRUE,
      progress = "text", type = "raw"
    )
    predicao_rf <- raster::predict(
      object = sentinel,
      model = modelo_rf, na.rm = TRUE,
      progress = "text", type = "raw"
    )
    predicao_svm_linear <- raster::predict(
      object = sentinel, 
      model = modelo_svm_linear, na.rm = TRUE,
      progress = "text", type = "raw"
    )
  })
  time_predict
  writeRaster(predicao_rf, "./saida/raster/predicao_rf.tif")
  writeRaster(predicao_svm_linear, "./saida/raster/predicao_svm_linear.tif")
  writeRaster(predicao_svm_radial, "./saida/raster/predicao_svm_radial.tif")
} else{
  predicao_svm_linear <- raster("./saida/raster/predicao_svm_linear.tif")
  predicao_svm_radial <- raster("./saida/raster/predicao_svm_radial.tif")
  predicao_rf <- raster("./saida/raster/predicao_rf.tif")
}

Mapa bauru

bauru_raster <- raster("./dados/mapbiomas/mapbiomas-bauru-2017.tif")

levels(bauru_raster) <- bauru_raster %>% 
  ratify() %>% 
  levels() %>% 
  "[["(1)

levels(bauru_raster)[[1]]$legend <- classes[match(
    bauru_raster@data@attributes[[1]]$ID,
    classes$value
  ),"name.full"]

colors_mapbiomas <- classes[match(
  bauru_raster@data@attributes[[1]]$ID,
  classes$value
),"color"]

map_bauru <- rasterVis::levelplot(
  bauru_raster, maxpixels = 1e6,
  col.regions = colors_mapbiomas
)

colors_predicao <- classes[match(
    predicao_rf@data@attributes[[1]][[2]],
    classes$name
),"color"]

map_svm_radial <- rasterVis::levelplot(
  predicao_svm_radial, maxpixels = 1e6,
  col.regions = colors_predicao
)
map_svm_linear <- rasterVis::levelplot(
  predicao_svm_linear, maxpixels = 1e6,
  col.regions = colors_predicao
)
map_rf <- rasterVis::levelplot(
  predicao_rf, maxpixels = 1e6,
  col.regions = colors_predicao
)

maps <- leafsync::sync(
  mapView(bauru_raster, col.regions = colors_mapbiomas, maxpixels = 5e5),
  mapView(predicao_svm_radial, col.regions = colors_predicao, maxpixels = 5e5),
  mapview(predicao_svm_linear, col.regions = colors_predicao, maxpixels = 5e5),
  mapview(predicao_rf, col.regions = colors_predicao, maxpixels = 5e5)
)

## Warning in rasterCheckSize(x, maxpixels = maxpixels): maximum number of pixels for Raster* viewing is 5e+05 ; 
## the supplied Raster* has 1754300 
##  ... decreasing Raster* resolution to 5e+05 pixels
##  to view full resolution set 'maxpixels =  1754300 '

## Warning in rasterCheckSize(x, maxpixels = maxpixels): maximum number of pixels for Raster* viewing is 5e+05 ; 
## the supplied Raster* has 14519088 
##  ... decreasing Raster* resolution to 5e+05 pixels
##  to view full resolution set 'maxpixels =  14519088 '

## Warning in rasterCheckSize(x, maxpixels = maxpixels): maximum number of pixels for Raster* viewing is 5e+05 ; 
## the supplied Raster* has 14519088 
##  ... decreasing Raster* resolution to 5e+05 pixels
##  to view full resolution set 'maxpixels =  14519088 '

## Warning in rasterCheckSize(x, maxpixels = maxpixels): maximum number of pixels for Raster* viewing is 5e+05 ; 
## the supplied Raster* has 14519088 
##  ... decreasing Raster* resolution to 5e+05 pixels
##  to view full resolution set 'maxpixels =  14519088 '

maps

Classificação de Imagens de Sensoriamento Remoto

Alexandre de Tomy Silva

27/10/2019