8_Advanced_AnalysingModelOutputs.Rmd

# |  (C) 2006-2019 Potsdam Institute for Climate Impact Research (PIK)
# |  authors, and contributors see CITATION.cff file. This file is part
# |  of REMIND and licensed under AGPL-3.0-or-later. Under Section 7 of
# |  AGPL-3.0, you are granted additional permissions described in the
# |  REMIND License Exception, version 1.0 (see LICENSE file).
# |  Contact: remind@pik-potsdam.de
---
title: 'How to import, process, and plot data using magclass '
author: "David Klein"
date: "`r Sys.Date()`"
output:
  html_document:
    df_print: paged
vignette: |
  %\VignetteIndexEntry{Vignette Title} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8}
---

This is a [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the results appear beneath the code.

# Download demonstration data

Before starting the tutorial we need to download the input data we are going to use from PIK's RSE server.

```{r}
download.file("http://rse.pik-potsdam.de/data/example/fulldata.gdx", "fulldata.gdx", mode="wb")
download.file("http://rse.pik-potsdam.de/data/example/REMIND_generic_BAU.mif", "REMIND_generic_BAU.mif")
download.file("http://rse.pik-potsdam.de/data/example/REMIND_generic_RCP20.mif", "REMIND_generic_RCP20.mif")
download.file("http://rse.pik-potsdam.de/data/example/REMIND_generic_RCP26.mif", "REMIND_generic_RCP26.mif")
download.file("http://rse.pik-potsdam.de/data/example/historical.mif", "historical.mif")
```


# Import data from mif files

Import REMIND results from mif files using `read.report`. If you provide multiple files in a vector `read.report` reads all of them and returns a single magclass object:

```{r}
library(magclass)
rem <- read.report(c("REMIND_generic_BAU.mif",
                     "REMIND_generic_RCP20.mif",
                     "REMIND_generic_RCP26.mif"), 
                   as.list = FALSE)
```

Magclass objects have a standardized structure: regions in the first dimension, years in the second dimension, variable names in the third dimension. If your data has additional dimensions (such as *model* or *scenario*) they are added as a sub-dimension to the third dimension of the magpie object. The sub-dimension are separated by dots:

```{r}
str(rem)
```

Select years before and including 2100

```{r}
rem <- rem[,getYears(rem)<="y2100",]
```

Select only primary energy carriers

```{r}
var_PE <- c("PE|+|Coal",
            "PE|+|Oil",
            "PE|+|Gas",
            "PE|+|Biomass",
            "PE|+|Nuclear",
            "PE|+|Hydro",
            "PE|+|Wind",
            "PE|+|Solar",
            "PE|+|Geothermal")

# add same unit for all variables (works onyl if units are identical for all variables, otherwiese you would have to provide the units idividually)
var_PE <- paste(var_PE,"(EJ/yr)")

rem_PE <- rem[,,var_PE]
```

`fulldim` lists all elements of the object

```{r}
fulldim(rem_PE)
```

# Plot data

## Area plot

`mipArea` from the mip library inspects your data and tries to produce a meaningful plot. Please see the help `?mipArea` for more information. Remove global values before plotting (for nicer y-scales ):

```{r}
library(mip)
p <- mipArea(rem_PE["GLO",,invert=TRUE])
print(p)
```

Save the plot to a png file using `ggsave` from the ggplot2 package. We have to load ggplot2 that was previously loaded internally by `mipArea` but is not yet available outside.

```{r}
library(ggplot2)
ggsave(filename = "PE.png",plot = p, width=15, height=10, scale = 2, units = "cm")
```

`mipArea` retuns a regular ggplot object you can customize. To demonstrate this we override the default theme with theme_grey and a tiny font size, move the legend to the top, and reverse the facetting. This gives a nice ugly plot:

```{r}
p + theme_grey(base_size=8) + theme(legend.position="top") + facet_grid(region~scenario)
```

## Bar plot

So far there is no mipBar function available. However, ggplot makes it quite easy to generate a stacked bar plot from your data. We convert our magpie object into a data frame required by ggplot. Then we remove global values either using dplyr

```{r}
library(luplot)
dat <- as.ggplot(rem_PE)

library(dplyr)
dat <- dat %>% filter(Region != "GLO")
```

or using magclass
```{r}
dat <- as.ggplot(rem_PE[,2050,]["GLO",,invert=TRUE])
head(dat)
```

Generate a raw bar plot without specific adjustments
```{r}
ggplot() + geom_col(data=dat,aes(x=Data1,y=Value,fill=Data3))
```

We use the `plotstyle` function to define default colors for our entities. This ensures that in all our plots a particular entitiy will be displayed with the same color. Now, retrieve these default colors for the primary energy carriers using `plotstyle`. Therefore, shorten variable names, so that they match the names defined in plotstyle. The full names of the elements stored in the third dimension of the magpie object would be
```{r}
head(getNames(rem_PE))
```

We only want to shorten the *variable names* (stored in the third sub-dimension), not the model and scenario names. Therefore, specify the number of the sub-dimension:

```{r}
head(getNames(rem_PE,dim=3))
```

Now use gsub and regular expressions to remove the "PE|+|" and the units:
```{r}
getNames(rem_PE,dim=3) <- gsub("PE\\|\\+\\|(.*) \\(EJ\\/yr\\)$","\\1",getNames(rem_PE,dim=3))
getNames(rem_PE,dim=3)
```

For these names there are colors defined in `plotstyle`. Retrieve these colors
```{R}
mycolors <- plotstyle(getNames(rem_PE,dim=3))
print(mycolors)
```

Check if they are right
```{r}
showcolors(mycolors)
```

Generate plot
```{r}
dat <- as.ggplot(rem_PE[,2050,]["GLO",,invert=TRUE],useDimNames=FALSE)

p <- ggplot() +
  geom_col(data=dat,aes(x=Data1,y=Value,fill=Data3)) +
  scale_fill_manual(values = mycolors, name ="All Colors Are Beautiful") +
  labs(title = "Primary Energy", subtitle = "in 2050", x = "Scenario", y ="EJ/yr") +
  facet_wrap(~Region)
print(p)
```


## Line plot

Another plot example, this time plotting emissions.

```{r}
var_EMI <- c("Emi|CO2|Fossil Fuels and Industry|Cement process",
             "Emi|CO2|Fossil Fuels and Industry|Demand",
             "Emi|CO2|Fossil Fuels and Industry|Energy Supply",
             "Emi|CO2|Land-Use Change")

var_EMI <- paste(var_EMI,"(Mt CO2/yr)")

rem_EMI <- rem[,,var_EMI]
```

First, just because it's so beautiful and easy using `mipArea`, the area plot again (this time with negative values):

```{r}
mipArea(rem_EMI["GLO",,]) 
```

Creating a basic line plot with ggplot is almost as easy

```{r}
ggplot (as.ggplot(rem["GLO",,"Emi|CO2 (Mt CO2/yr)"])) + geom_line(aes(x=Year,y=Value,color=Data1)) +
 labs(title = "CO2 emissions", x = "", y ="Mt CO2/yr")
```

If you have historical values at hand you can use `mipLineHistorical` to validate your model data. Read historical data. Select years from 1950 on. Remove global data from both data sets (for nicer y-scaling). Select total CO2 emissions for model data. The function selects the same variables from the historical data if available.

```{r}
history <- read.report("historical.mif",as.list=FALSE)
history <- history["GLO",,,invert=TRUE][,getYears(history)>="y1950",][,,"IEA"]
mipLineHistorical(x = rem[,,"Emi|CO2 (Mt CO2/yr)"][,,"BAU"]["GLO",,,invert=TRUE], x_hist = history,size=8)
```


# Calculating and adding a new variable

Let's see whether we can calcualte per capita emissions. First, we have to find out, if our dataset contains something like population data. We apply `pmatch` and `getNames` to find out if there is a variable that has 'pop' in its name.
```{r}
getNames(rem[,,"Pop",pmatch=TRUE])
```

Oh, yes. So, let's simply calculate the per capita emissions.

```{r}
emi_cap <- rem_EMI / rem[,,"Population (million)"]
```

magclass automatically matches regions, years, scenarios, and models. It also automatically assigns names to the resulting variable by combining the variable names of the two source objects. The resulting names look like this now:

```{r}
getNames(emi_cap[,,1])
```

Let's provide better names. First, we remove the 4th sub-dimension, since all names are identical:

```{r}
emi_cap <- collapseNames(emi_cap,collapsedim = 4)
```

We could have got the same if we had removed the "Population (million)" above before dividing:

```{r}
emi_cap <- rem_EMI / collapseNames(rem[,,"Population (million)"])
getNames(emi_cap[,,1])
```

Now, we rename the emission variables by replacing "Emi" with "Per capita emission" and correcting the unit

```{r}
getNames(emi_cap,dim=3) <- gsub("Emi|","Per capita emissions|",getNames(emi_cap,dim=3),fixed = TRUE)
getNames(emi_cap,dim=3) <- gsub("(Mt CO2/yr)","(t CO2/yr/cap)",getNames(emi_cap,dim=3),fixed = TRUE)
```

Now we can add the new variable to the full report
```{r}
rem <- mbind(rem,emi_cap)
```

Finally, plot the freshly calculated per-capita emissions. When selecting the variables use `pmatch=TRUE` to retrieve all variables that have this string in their name. Useful in this case, because we do not need to provide the detailled sector names. Distinguish scenarios (column "Data1" in the data frame that `as.ggplot` produces) by line type and variables (Data3) by color.

```{r}
ggplot (as.ggplot(rem[,,"Per capita emissions",pmatch=TRUE])) + 
  geom_line(aes(x=Year,y=Value,color=Data3,linetype=Data1)) + 
  facet_grid(~Region) + 
  theme(legend.position="bottom") + 
  guides(color = guide_legend(nrow = 4),linetype=guide_legend(nrow = 3))
```


# Adding a new dimension

In the following we are only talking about the third dimension of a magpie object, which can contain further sub-dimensions. We want to add a new sub-dimension distinguishing gases. Add a new sub-dimension with the name "gas" and the variable "CO2"

```{r}
rem_EMI <- add_dimension(rem_EMI, dim = 3.3, add = "gas", nm="CO2")
str(rem_EMI)
```

Add a new element to the dimension just created

```{r}
rem_EMI <- add_columns(rem_EMI,addnm = "CH4",dim=3.3)
```

Now, remove the obsolete "Emi|CO2|" part from the old names

```{r}
getNames(rem_EMI,dim=3) <- gsub("Emi\\|CO2\\|","",getNames(rem_EMI,dim=3))
str(rem_EMI)
```

# Reading data from a gdx file

If the variable or parameter you are interested in is not yet part of the REMIND reporting you can directly access the gdx and import the variable using `readGDX`. Here, we read the level values (*l*) of the variable *vm_emiAll* (which is of course in the reporting already).

```{r}
library(gdx)
rem_gdx <- readGDX(gdx = "fulldata.gdx","vm_emiAll",field = "l",format = "first_found")
str(rem_gdx)
```

*vm_emiAll* does not include global values. We calcualte them with `dimSums` summing over the regions (first dimension) and apennd them using `mbind`.

```{r}
rem_gdx <- mbind(rem_gdx,dimSums(rem_gdx,dim = 1))
```

*vm_emiAll* is defined for all entities, but for most of them it is zero. Let's find the entities for which it contains data. We use the year 2050 as an indicator:

```{r}
non_zero_vars <- getNames(rem_gdx)[rem_gdx["GLO",2050,]!=0]
```

Before plotting we have to rename the dimensions of the magclass object to names that mipArea (and the underlying quitte class) requires

```{r}
getSets(rem_gdx) <- c("region","year","variable")

mipArea(rem_gdx["GLO", getYears(rem_gdx)>="y2005" & getYears(rem_gdx) <="y2100", non_zero_vars])

```

Now, let's pick Kyoto gases only and get the units right. Define vector with names of Kyoto gases

```{r}
kyo <- c("co2","n2o","ch4")
```

Create a new magclass object with the same regions and years as *rem_gdx* and kyoto gases as variables 

```{r}
conversion <- new.magpie(cells_and_regions = getRegions(rem_gdx),years = getYears(rem_gdx),names = kyo)

conversion[,,"co2"] <- 1 * 1000 * 44/12 # GtC      -> Mt CO2
conversion[,,"ch4"] <- 34               # Mt CH4   -> Mt CO2_eq
conversion[,,"n2o"] <- 298 * 44/18      # Mt N2O-N -> Mt CO2_eq

str(conversion)
```

Now, that the magpie objects have the same structure, multiplication is easy

```{r}
rem_gdx[,,kyo] <- rem_gdx[,,kyo] * conversion

mipArea(rem_gdx["GLO",getYears(rem_gdx)>="y2005" & getYears(rem_gdx) <="y2100",kyo]) + labs(title = "Emissions", subtitle = "Kyoto gases", y ="Mt CO2eq/yr")
```