Tuesday, November 18, 2014

Data manipulation with dplyr

dplyr is an R package for data manipulation, developed by Hadley Wickham and Romain François.

  • Introduction to dplyr
  • A tutorial from João Neto (dplyr.Rmd) gives examples of the summary functions available for grouped operations:
    • n(): the number of observations in the current group.
    • n_distinct(x): the number of unique values in x.
    • first(x), last(x) and nth(x, n): these work like x[1], x[length(x)] and x[n], but let you supply a default value to return when that position does not exist.
    • min(), max(), mean(), sum(), sd(), median() and IQR(): the usual base R summary functions, applied per group.
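
A minimal sketch of these functions inside group_by() and summarise(). The sales data frame and its column names are made up for illustration and do not come from the tutorial:

```r
library(dplyr)

# Hypothetical toy data, for illustration only
sales <- data.frame(
  region  = c("north", "north", "north", "south", "south"),
  product = c("a", "b", "a", "a", "c"),
  amount  = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

by_region <- sales %>%
  group_by(region) %>%
  summarise(
    n_rows       = n(),                  # observations in the current group
    n_products   = n_distinct(product),  # unique products in the group
    first_amount = first(amount),        # like amount[1]
    last_amount  = last(amount),         # like amount[length(amount)]
    # like amount[3], but returns the default when the group has fewer than 3 rows
    third_amount = nth(amount, 3, default = NA_real_),
    mean_amount  = mean(amount)
  )

print(by_region)
```

The "south" group has only two rows, so nth(amount, 3) falls back to the default there instead of failing.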

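Non-standard evaluation

dplyr uses non-standard evaluation (NSE): column names are passed unquoted and are looked up in the data frame rather than in the calling environment. This is convenient interactively, but it gets in the way when a column name is stored in a variable, for example as a character string inside a function. In that case a workaround is needed to get standard evaluation. See Stackoverflow question.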
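
As a rough illustration, one workaround available in current dplyr releases (assumed here: version 1.0 or later) is to select columns by name with across(all_of()) and the .data pronoun. This may differ from the approach in the linked question, since older releases relied on underscore-suffixed verbs such as summarise_(), which have since been deprecated. The scores data frame and the variable names are hypothetical:

```r
library(dplyr)

# Hypothetical toy data, for illustration only
scores <- data.frame(
  team   = c("x", "x", "y"),
  points = c(1, 2, 5),
  stringsAsFactors = FALSE
)

# Non-standard evaluation: column names are written unquoted
nse_result <- scores %>%
  group_by(team) %>%
  summarise(total = sum(points))

# Standard evaluation: column names are held in character variables
group_col <- "team"
value_col <- "points"

se_result <- scores %>%
  group_by(across(all_of(group_col))) %>%      # group by the column named in group_col
  summarise(total = sum(.data[[value_col]]))   # look up the column named in value_col

print(se_result)
```

Both pipelines compute the same per-team totals. The second one can be wrapped in a function that takes the column names as string arguments.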
