Staying up to date: automating tasks from downloading data to reporting

Alex Hurley
University of Birmingham

# Goals

- Discuss why and what to automate
- Introduce task automation concepts
- Highlight tools for each
- Use-case: Study Site Explorer with `R` and `TravisCI`

### Study Site Explorer?

--

.center[<img src="./img/report_concept.png" alt="concept of automated report: choose site, download data, generate report" width="450"/>]

---

# What and why?

--

- recurring tasks: database updates, time series, ancillary data
- time-consuming tasks: QA + QC, updating/creating reports
- automated testing (package development)

<br>
<br>

.center[<img src="./img/schedule_concept.png" alt="concept of automated report: choose site, download data, generate report" width="450"/>]

.center[
Download on schedule
Field station broken? Bring tools on next trip]

---

# Concepts

--

### Storage

.left-column[
<br> <br>
**local**:
<br> <br> <br> <br>
*vs.*
<br> <br> <br> <br>
**hosted**:
]

--

.right-column[
- software / triggers,
- routines (e.g. `R` scripts),
- outputs are **on your computer**

<br> <br> <br> <br> <br>

- routines and outputs in a repository
- software on a virtual machine
- webservice schedules / triggers (webhooks)

**download** or view online]

---

# Concepts

--

### Execution / Trigger

.center[
recurring + scheduled
<br> <br> <br>
*vs.*
<br> <br>
event-based (e.g. on file change)]

---
class: inverse, center, middle

# Tools

### Scheduling (local)

[`taskscheduleR`](
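A minimal sketch of a daily schedule on Windows with `taskscheduleR`. The task name and start time are made up for illustration, and the script is assumed to exist in the working directory:

```r
# Sketch only: task name and start time are hypothetical.
# The script path is assumed to point to an existing R script.
script <- file.path(getwd(), "generate_parameter_reports.R")

# Only attempt to register the task on Windows with the package installed
if (.Platform$OS.type == "windows" &&
    requireNamespace("taskscheduleR", quietly = TRUE) &&
    file.exists(script)) {
  taskscheduleR::taskscheduler_create(
    taskname  = "site_reports_daily",
    rscript   = script,
    schedule  = "DAILY",
    starttime = "06:00"
  )
  # remove the task again with:
  # taskscheduleR::taskscheduler_delete("site_reports_daily")
}
```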

Linux via shell scripts:
- **cron** for recurring tasks
- **at** for one-off tasks

---

### Event-based

`rOpenSci` [`drake`](

- semi-automated workflow manager
- monitors individual units/sections of analyses pipeline
- updates on change,
- but only parts of pipeline that require re-running
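
A minimal sketch of this behaviour, not taken from the Study Site Explorer: the file `sites.csv` and the target names `sites` and `n_sites` are assumptions for illustration. The plan tracks an input file, and repeated calls to `make()` only rebuild targets whose dependencies changed.

```r
library(drake)

# Hypothetical input file, created here so the sketch is self-contained
writeLines(c("sites", "Mt Baldy", "Mt St Helens"), "sites.csv")

# file_in() tells drake to watch the file for changes
plan <- drake_plan(
  sites   = read.csv(file_in("sites.csv"), stringsAsFactors = FALSE),
  n_sites = nrow(sites)
)

make(plan)  # first run: builds both targets
make(plan)  # nothing changed: targets are up to date and skipped

# Adding a site changes the tracked file, so dependent targets are rebuilt
cat("Grand Teton Mountain\n", file = "sites.csv", append = TRUE)
make(plan)

# Retrieve a target from the drake cache
readd(n_sites)
```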

.center[<img src="./img/drake.png" alt="drake r open sci package" width="650"/>]

---
layout: false
class: inverse, center, middle

# Study Site Explorer

---

# Study Site Explorer

.center[<img src="./img/report_concept.png" alt="concept of automated report: choose site, download data, generate report" width="350"/>]

- R Markdown report with:
  + map
  + annual Precipitation
  + Temperature
  + 3D view of region
- Hosted online (collaborators can generate report)
- Triggered on file change (add new site)

<br>

---
layout: true

# Study Site Explorer

---

**Use parameterized report to define metadata!**

.left-column[<img src="./img/report.png" alt="concept of automated report: choose site, download data, generate report" width="150"/>]

--

.right-column[
```r
---
author: "rHydro Demonstrator"
date: '`r paste("generated at:", Sys.Date())`'
output: html_document
*params:
  location: "Mt St Helens"
  year:
    value: 2000
*title: "`r paste('Overview for:', params$location)`"
---
```
]

---

**Define area of interest and make map:**

```r
# specify area of interest
*aoi <- AOI::getAOI(clip = list(params$location, 15, 15), km = TRUE)

# pull and plot map
map <- OpenStreetMap::openmap(upperLeft  = c(aoi@bbox[[4]], aoi@bbox[[1]]),
                              lowerRight = c(aoi@bbox[[2]], aoi@bbox[[3]]),
                              type = "osm", minNumTiles = 12)
```

---

**Download climate data and plot:**

```r
daymet_data <- daymetr::download_daymet(lat = aoi@polygons[[1]]@labpt[2],
                                        lon = aoi@polygons[[1]]@labpt[1],
*                                       start = params$year,
*                                       end = params$year)

# add daily mean temperature and a proper date column
daymet_data$data <- dplyr::mutate(daymet_data$data,
                                  tmean = (tmax..deg.c. + tmin..deg.c.)/2,
                                  date = as.Date(paste(year, yday, sep = "-"), "%Y-%j"))

library(ggplot2)
# prcp..mm.day. is the daily precipitation column in the Daymet table
ggplot(daymet_data$data, aes(x = date, y = prcp..mm.day.)) +
  geom_col(position = "dodge", color = "darkblue") +
  labs(x = "Date", y = "P (mm/day)",
*      title = paste0(params$location, ": ",
*                     params$year, " - Precipitation")) +
  theme_bw()
```

---

**3D-Viz of Site** ([full code available here](

```r
library(magrittr) # provides %>%

*ned_aoi <- aoi %>% HydroData::findNED() # National DEM

# convert to matrix for rayshader
ned <- matrix(raster::extract(ned_aoi$NED, raster::extent(ned_aoi$NED), buffer = 1000),
              nrow = ncol(ned_aoi$NED), ncol = nrow(ned_aoi$NED))

# create_overlay() and prcp_raster are defined in the full code
overlay <- create_overlay(prcp_raster, ned_aoi$NED)

library(rayshader)
ned %>%
  sphere_shade(texture = "imhof1") %>%
  add_water(detect_water(ned), color = "desert") %>%
  add_overlay(overlay, alphacolor = NULL, alphalayer = 0.8) %>%
  add_shadow(ray_shade(ned)) %>%
  add_shadow(ambient_shade(ned)) %>%
  plot_3d(heightmap = ned, zscale = 1,
          # fov = 90,
          lineantialias = TRUE,
          theta = 15, phi = 85, zoom = 0.3)

render_snapshot()
rgl::rgl.close()
```

---
layout: true

# Study Site Explorer

---

**Generate report**

```r
#' Render Location and Climate report
#'
#' @param location Character, location passed to AOI::getAOI()
#' @param year Character, Year in YYYY
#'
#' @return Returns nothing, but writes a file to the reports directory
#' @export
#'
#' @examples
*render_report = function(location, year) {

  # house keeping on names: dots become dashes, spaces become underscores
  location_dir <- stringr::str_replace_all(location, pattern = "[.]", "-")
  location_dir <- stringr::str_replace_all(location_dir, pattern = "[ ]", "_")

* rmarkdown::render(
*   "./templates/report_template.Rmd",
*   params = list(
      location = location,
      year = year
    ),
    output_dir = "./reports",
    output_file = paste0("Report-", location_dir, "-", year, ".html")
  )
}
```

---

<img src="./img/r_map.png" alt="map" height="400"/>

---

<img src="./img/r_clim.png" alt="climate" width="750"/>

---

<img src="./img/r_3d.png" alt="three d visual" width="450"/>

---

### Generate report for several sites:

```r
## Script executes report generation for all locations listed in sites.csv
source("R/render_report.R")

sites <- read.csv("sites.csv", header = TRUE, stringsAsFactors = FALSE)

for (site in sites$sites) {
  render_report(site, year = 2010)
}
```

---

### Generate report for several sites:

.pull-left[
```
+-- reports
|   +-- Report-Mt_Baldy-2010.html
|   +-- Report-Mt_St_Helens-2010.html
|   +-- Report-Grand_Teton_Mountain-2010.html
|   \-- Report-El_Capitan_Yosemite-2010.html
```
]

--

.pull-right[
<img src="./img/r_3d_tetons.png" alt="" height="300"/>
]

---

### Continuous integration with `Travis`

--

<img src="./img/travis/travis_01.png" alt="" height="300"/>

---

### Continuous integration with `Travis`

<img src="./img/travis/travis_02.png" alt="" height="300"/>

---

### Continuous integration with `Travis`

<img src="./img/travis/travis_03.png" alt="" height="300"/>

---

### Continuous integration with `Travis`

<img src="./img/travis/travis_04.png" alt="" height="300"/>

---

### Continuous integration with `Travis`

<img src="./img/travis/travis_05.png" alt="" height="300"/>

---

### Continuous integration with `Travis`

<img src="./img/travis/travis_06.png" alt="" height="300"/>

<!-- --- -->
<!-- ### Continuous integration -->
<!-- <img src="./img/travis/travis_07.png" alt="" height="300"/> -->

---

### Continuous integration with `Travis`

<img src="./img/travis/travis_08.png" alt="" height="300"/>

---

### Continuous integration with `Travis`

<img src="./img/travis/travis_09.png" alt="" height="300"/>

---

### Continuous integration with `Travis`

.pull-left[<img src="./img/travis/travis_10.png" alt="" height="300"/>]

--

.pull-right[
```yaml
sudo: required
*language: r
cran:
cache: packages

before_install:
after_success:

install:
* - Rscript install_packages.R

script:
* - Rscript generate_parameter_reports.R
* - Rscript push_back.R

env:
  global:
*   secure: <GitHub_Access_Token>
```
]

---
layout: false

# Summary:

--

### Notes:

- Requires GitHub account
- Linked to `TravisCI` (`CircleCI` as alternative)
- Build a config `.travis.yml`
- See in action: [](

--

### Task Automation

- Useful locally or hosted
- Frees up time
- Builds up and checks data sets
- When hosted, allows collaborators to produce standardized outputs