Module 2: Functions

Learning Objectives

After module 2, you should be able to…

Describe and execute functions in R
Modify default behavior of functions using arguments in R
Use R-specific sources of help to get more information about functions and packages
Differentiate between Base R functions and functions that come from other packages

Function - Basic term

Function - Functions are self-contained modules of code that accomplish specific tasks. Functions usually take in some sort of object (e.g., vector, list), process it, and return a result. You can write your own, use functions that come directly from installing R (i.e., Base R functions), or use functions from external packages.

A function might help you add numbers together, create a plot, or organize your data. In fact, we have already used three functions in Module 1: c(), matrix(), and list(). Here is another one, sum():

sum(1, 20234)

[1] 20235

Function

The general usage for a function is the name of the function followed by parentheses (i.e., the function signature). Within the parentheses are arguments.

function_name(argument1, argument2, ...)

Arguments - Basic term

Arguments are what you pass to the function and can include:

the object on which the function carries out a task (e.g., data such as the numbers 1 and 20234)

sum(1, 20234)

[1] 20235

options that alter the way the function operates (e.g., the base argument in the function log())

log(10, base = 10)

[1] 1

log(10, base = 2)

[1] 3.321928

log(10, base=exp(1))

[1] 2.302585

Arguments

Most functions are created with default argument options. The defaults represent standard values that the author of the function specified as being “good enough in standard cases”. This means if you don’t specify an argument when calling the function, it will use a default.

If you want something specific, simply change the argument yourself with a value of your choice.
If an argument is required (i.e., no default was specified when the function was created) and you do not supply it, you will receive an error.
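The difference between default and required arguments can be sketched with a small user-written function. The function name scale_by and its arguments are hypothetical, made up for this illustration only; they are not part of R.

```r
# Hypothetical function: `x` is required, `factor` has a default of 2.
scale_by <- function(x, factor = 2) {
  x * factor
}

# No `factor` supplied, so the default (2) is used.
with_default <- scale_by(5)
with_default

# Supplying `factor` overrides the default.
with_override <- scale_by(5, factor = 3)
with_override

# Omitting the required argument `x` raises an error. tryCatch() captures
# the error message so the rest of the script keeps running.
msg <- tryCatch(scale_by(), error = function(e) conditionMessage(e))
msg
```

The same pattern applies to Base R functions: log(10) runs because base has a default, while a call that leaves out x has nothing to work on.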

Example

What is the default in the base argument of the log() function?

log(10)

[1] 2.302585

Sure, that is easy enough, but how do you know:

the purpose of a function?
what arguments a function includes?
how to specify the arguments?

Seeking help for using functions (*)

The best way to find this information is to type ? followed by the name of the function. Doing this opens the help manual in the RStudio Help pane (bottom right by default). It provides a description of the function, its usage, arguments, details, and examples. Let's look at the help file for the function round().
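As a rough sketch, the help page can be opened in a few equivalent ways, and args() gives a quick reminder of a function's arguments without leaving the console. The help calls are wrapped in interactive() here only so that nothing tries to open a help window when the code is run as a script.

```r
# Equivalent ways to open the help page for round().
if (interactive()) {
  ?round
  help("round")
}

# args() shows the argument names and defaults of a function.
# For log(), it shows that `base` defaults to exp(1).
log_args <- args(log)
log_args

# The argument names can also be pulled out as a character vector.
log_arg_names <- names(formals(log_args))
log_arg_names
```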

How to specify arguments

Arguments are separated with a comma
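You can specify arguments either by including them in the correct order OR by naming them (argument = value) inside the function parentheses. Named arguments are matched first; any unnamed arguments then fill the remaining positions in order.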

log(10, 2)

[1] 3.321928

log(base=2, x=10)

[1] 3.321928

log(x=10, 2)

[1] 3.321928

log(10, base=2)

[1] 3.321928
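A short check, offered as a sketch, confirms that the positional and named calls above give the same result, and shows why order matters once the names are left out. The variable names are chosen for this illustration only.

```r
# Positional and named calls compute the same value.
by_position <- log(10, 2)
by_name     <- log(base = 2, x = 10)
all.equal(by_position, by_name)

# Without names, R relies on position: the first value is x and the
# second is base. Swapping them asks a different question.
swapped <- log(2, 10)  # log of 2 in base 10, about 0.301
swapped
```

Naming arguments makes a call longer but removes any dependence on their order, which helps when a function has many options.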

Package - Basic term

When you download R, it comes with a “base” set of functions that belong to a “base” set of packages, including ‘base’, ‘datasets’, ‘graphics’, ‘grDevices’, ‘methods’, and ‘stats’ (typically just referred to as Base R).

e.g., the log() function comes from the ‘base’ package
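One way to check which package a function comes from, sketched here under the assumption of a default R session (where ‘base’ and ‘stats’ are attached), is find() from the ‘utils’ package. The package::function notation then calls a function from a named package explicitly.

```r
# find() reports where on the search path a name is defined.
log_home    <- utils::find("log")     # "package:base" in a default session
median_home <- utils::find("median")  # "package:stats" in a default session
log_home
median_home

# The :: operator states the package explicitly.
explicit_log    <- base::log(10)
explicit_median <- stats::median(c(1, 5, 9))
explicit_median
```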

Package - a package in R is a bundle or “package” of code (and possibly data) that can be loaded together for easy repeated use or for sharing with others.

Packages are analogous to software applications like Microsoft Word: once a package is installed, you can use it, just as having Word installed allows you to use Word.

Packages

The Packages pane in RStudio can help you identify which packages have been installed (listed) and which ones have been attached (check mark).

Let's look at the Packages pane, find the base package, and find the log() function. Clicking it automatically loads the same help file that ?log opens.

Additional Packages

You can install additional packages for your use from CRAN or GitHub. These additional packages are written by RStudio or R users/developers (like us)

Not all packages available on CRAN or GitHub are trustworthy
RStudio (the company) makes a lot of great packages
Who wrote it? Hadley Wickham is a major authority on R (Employee and Developer at RStudio)
How to trust an R package

Installing and attaching packages

To use the bundle or “package” of code (and possibly data) from a package, you need to install and also attach the package.

To install a package you can

use the RStudio menu bar: Tools > Install Packages

use the following code:

install.packages("package_name")

Installing and attaching packages

To attach a package (i.e., make its functions available in your current session), you can use the following code:

require(package_name) # library(package_name) also works

More on installing and attaching packages later…
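The install-once, attach-every-session pattern can be sketched as follows. The package ‘tools’ is used only as a stand-in because it ships with R, so the install step is skipped; ‘notARealPackage123’ is a deliberately made-up name used to show what happens when a package is missing.

```r
pkg <- "tools"  # stand-in package name; ships with every R installation

# Install only if the package is not already available.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Attach the package. character.only = TRUE is needed because the
# package name is stored in a variable rather than typed directly.
library(pkg, character.only = TRUE)

# library() stops with an error if a package is missing, whereas
# require() returns FALSE (with a warning) and lets the script continue.
found <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)
found
```

Because library() fails immediately when a package is missing, many users prefer it in scripts; require() is handy when you want to test the result yourself.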

Mini exercise

Find and execute a Base R function that will round the number 0.86424 to two digits.

Functions from Module 1

The combine function c() concatenates/collects/combines single R objects into a vector of R objects. It is mostly used for creating vectors of numbers, character strings, and other data types.

?c

Combine Values into a Vector or List

Description:

     This is a generic function which combines its arguments.

     The default method combines its arguments to form a vector.  All
     arguments are coerced to a common type which is the type of the
     returned value, and all attributes except names are removed.

Usage:

     ## S3 Generic function
     c(...)
     
     ## Default S3 method:
     c(..., recursive = FALSE, use.names = TRUE)
     
Arguments:

     ...: objects to be concatenated.  All 'NULL' entries are dropped
          before method dispatch unless at the very beginning of the
          argument list.

recursive: logical.  If 'recursive = TRUE', the function recursively
          descends through lists (and pairlists) combining all their
          elements into a vector.

use.names: logical indicating if 'names' should be preserved.

Details:

     The output type is determined from the highest type of the
     components in the hierarchy NULL < raw < logical < integer <
     double < complex < character < list < expression.  Pairlists are
     treated as lists, whereas non-vector components (such as 'name's /
     'symbol's and 'call's) are treated as one-element 'list's which
     cannot be unlisted even if 'recursive = TRUE'.

     There is a 'c.factor' method which combines factors into a factor.

     'c' is sometimes used for its side effect of removing attributes
     except names, for example to turn an 'array' into a vector.
     'as.vector' is a more intuitive way to do this, but also drops
     names.  Note that methods other than the default are not required
     to do this (and they will almost certainly preserve a class
     attribute).

     This is a primitive function.

Value:

     'NULL' or an expression or a vector of an appropriate mode.  (With
     no arguments the value is 'NULL'.)

S4 methods:

     This function is S4 generic, but with argument list '(x, ...)'.

References:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_.  Wadsworth & Brooks/Cole.

See Also:

     'unlist' and 'as.vector' to produce attribute-free vectors.

Examples:

     c(1,7:9)
     c(1:5, 10.5, "next")
     
     ## uses with a single argument to drop attributes
     x <- 1:4
     names(x) <- letters[1:4]
     x
     c(x)          # has names
     as.vector(x)  # no names
     dim(x) <- c(2,2)
     x
     c(x)
     as.vector(x)
     
     ## append to a list:
     ll <- list(A = 1, c = "C")
     ## do *not* use
     c(ll, d = 1:3) # which is == c(ll, as.list(c(d = 1:3)))
     ## but rather
     c(ll, d = list(1:3))  # c() combining two lists
     
     c(list(A = c(B = 1)), recursive = TRUE)
     
     c(options(), recursive = TRUE)
     c(list(A = c(B = 1, C = 2), B = c(E = 7)), recursive = TRUE)
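The coercion rule described in the help page ("all arguments are coerced to a common type") can be seen in a short sketch that reuses the first two examples above. The variable names are chosen for this illustration only.

```r
# Numbers only: the result is a numeric vector.
nums <- c(1, 7:9)
class(nums)

# A single character string forces every element to character.
mixed <- c(1:5, 10.5, "next")
class(mixed)
mixed

# Named arguments become the names of the elements.
scores <- c(a = 1, b = 2)
names(scores)
```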

Functions from Module 1

The paste0() function concatenates/combines vectors after converting them to character.

vector.object2 <- paste0(c("b", "t", "u"), c(8,4,2))
vector.object2
?paste0

Concatenate Strings

Description:

     Concatenate vectors after converting to character.

Usage:

     paste (..., sep = " ", collapse = NULL, recycle0 = FALSE)
     paste0(...,            collapse = NULL, recycle0 = FALSE)
     
Arguments:

     ...: one or more R objects, to be converted to character vectors.

     sep: a character string to separate the terms.  Not
          'NA_character_'.

collapse: an optional character string to separate the results.  Not
          'NA_character_'.

recycle0: 'logical' indicating if zero-length character arguments
          should lead to the zero-length 'character(0)' after the
          'sep'-phase (which turns into '""' in the 'collapse'-phase,
          i.e., when 'collapse' is not 'NULL').

Details:

     'paste' converts its arguments (_via_ 'as.character') to character
     strings, and concatenates them (separating them by the string
     given by 'sep').  If the arguments are vectors, they are
     concatenated term-by-term to give a character vector result.
     Vector arguments are recycled as needed, with zero-length
     arguments being recycled to '""' only if 'recycle0' is not true
     _or_ 'collapse' is not 'NULL'.

     Note that 'paste()' coerces 'NA_character_', the character missing
     value, to '"NA"' which may seem undesirable, e.g., when pasting
     two character vectors, or very desirable, e.g. in 'paste("the
     value of p is ", p)'.

     'paste0(..., collapse)' is equivalent to 'paste(..., sep = "",
     collapse)', slightly more efficiently.

     If a value is specified for 'collapse', the values in the result
     are then concatenated into a single string, with the elements
     being separated by the value of 'collapse'.

Value:

     A character vector of the concatenated values.  This will be of
     length zero if all the objects are, unless 'collapse' is non-NULL,
     in which case it is '""' (a single empty string).

     If any input into an element of the result is in UTF-8 (and none
     are declared with encoding '"bytes"', see 'Encoding'), that
     element will be in UTF-8, otherwise in the current encoding in
     which case the encoding of the element is declared if the current
     locale is either Latin-1 or UTF-8, at least one of the
     corresponding inputs (including separators) had a declared
     encoding and all inputs were either ASCII or declared.

     If an input into an element is declared with encoding '"bytes"',
     no translation will be done of any of the elements and the
     resulting element will have encoding '"bytes"'.  If 'collapse' is
     non-NULL, this applies also to the second, collapsing, phase, but
     some translation may have been done in pasting object together in
     the first phase.

References:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_.  Wadsworth & Brooks/Cole.

See Also:

     'toString' typically calls 'paste(*, collapse=", ")'.  String
     manipulation with 'as.character', 'substr', 'nchar', 'strsplit';
     further, 'cat' which concatenates and writes to a file, and
     'sprintf' for C like string construction.

     'plotmath' for the use of 'paste' in plot annotation.

Examples:

     ## When passing a single vector, paste0 and paste work like as.character.
     paste0(1:12)
     paste(1:12)        # same
     as.character(1:12) # same
     
     ## If you pass several vectors to paste0, they are concatenated in a
     ## vectorized way.
     (nth <- paste0(1:12, c("st", "nd", "rd", rep("th", 9))))
     
     ## paste works the same, but separates each input with a space.
     ## Notice that the recycling rules make every input as long as the longest input.
     paste(month.abb, "is the", nth, "month of the year.")
     paste(month.abb, letters)
     
     ## You can change the separator by passing a sep argument
     ## which can be multiple characters.
     paste(month.abb, "is the", nth, "month of the year.", sep = "_*_")
     
     ## To collapse the output into a single string, pass a collapse argument.
     paste0(nth, collapse = ", ")
     
     ## For inputs of length 1, use the sep argument rather than collapse
     paste("1st", "2nd", "3rd", collapse = ", ") # probably not what you wanted
     paste("1st", "2nd", "3rd", sep = ", ")
     
     ## You can combine the sep and collapse arguments together.
     paste(month.abb, nth, sep = ": ", collapse = "; ")
     
     ## Using paste() in combination with strwrap() can be useful
     ## for dealing with long strings.
     (title <- paste(strwrap(
         "Stopping distance of cars (ft) vs. speed (mph) from Ezekiel (1930)",
         width = 30), collapse = "\n"))
     plot(dist ~ speed, cars, main = title)
     
     ## 'recycle0 = TRUE' allows more vectorized behaviour, i.e. zero-length recycling :
     valid <- FALSE
     val <- pi
     paste("The value is", val[valid], "-- not so good!")
     paste("The value is", val[valid], "-- good: empty!", recycle0=TRUE) # -> character(0)
     ## When  'collapse = <string>',  the result is a length-1 string :
     paste("foo", {}, "bar", collapse="|")                  # |-->  "foo  bar"
     paste("foo", {}, "bar", collapse="|", recycle0 = TRUE) # |-->  ""
     ## all empty args
     paste(    collapse="|")                  # |-->  ""  as do all these:
     paste(    collapse="|", recycle0 = TRUE)
     paste({}, collapse="|")
     paste({}, collapse="|", recycle0 = TRUE)
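As a compact illustration of the sep and collapse arguments described in the help page, the sketch below reuses the vector.object2 example from this slide. The names with_sep and collapsed are introduced here for illustration.

```r
# Element-by-element concatenation with no separator.
vector.object2 <- paste0(c("b", "t", "u"), c(8, 4, 2))
vector.object2   # "b8" "t4" "u2"

# paste() with sep inserts a separator between the paired elements.
with_sep <- paste(c("b", "t", "u"), c(8, 4, 2), sep = "-")
with_sep         # "b-8" "t-4" "u-2"

# collapse joins the resulting elements into one single string.
collapsed <- paste0(vector.object2, collapse = "+")
collapsed        # "b8+t4+u2"
```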

Functions from Module 1

The matrix() function creates a matrix from the given set of values.

matrix.object <- matrix(data=vector.object1, nrow=2, ncol=2, byrow=TRUE)
matrix.object
?matrix

Matrices

Description:

     'matrix' creates a matrix from the given set of values.

     'as.matrix' attempts to turn its argument into a matrix.

     'is.matrix' tests if its argument is a (strict) matrix.

Usage:

     matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE,
            dimnames = NULL)
     
     as.matrix(x, ...)
     ## S3 method for class 'data.frame'
     as.matrix(x, rownames.force = NA, ...)
     
     is.matrix(x)
     
Arguments:

    data: an optional data vector (including a list or 'expression'
          vector).  Non-atomic classed R objects are coerced by
          'as.vector' and all attributes discarded.

    nrow: the desired number of rows.

    ncol: the desired number of columns.

   byrow: logical. If 'FALSE' (the default) the matrix is filled by
          columns, otherwise the matrix is filled by rows.

dimnames: A 'dimnames' attribute for the matrix: 'NULL' or a 'list' of
          length 2 giving the row and column names respectively.  An
          empty list is treated as 'NULL', and a list of length one as
          row names.  The list can be named, and the list names will be
          used as names for the dimensions.

       x: an R object.

     ...: additional arguments to be passed to or from methods.

rownames.force: logical indicating if the resulting matrix should have
          character (rather than 'NULL') 'rownames'.  The default,
          'NA', uses 'NULL' rownames if the data frame has 'automatic'
          row.names or for a zero-row data frame.

Details:

     If one of 'nrow' or 'ncol' is not given, an attempt is made to
     infer it from the length of 'data' and the other parameter.  If
     neither is given, a one-column matrix is returned.

     If there are too few elements in 'data' to fill the matrix, then
     the elements in 'data' are recycled.  If 'data' has length zero,
     'NA' of an appropriate type is used for atomic vectors ('0' for
     raw vectors) and 'NULL' for lists.

     'is.matrix' returns 'TRUE' if 'x' is a vector and has a '"dim"'
     attribute of length 2 and 'FALSE' otherwise.  Note that a
     'data.frame' is *not* a matrix by this test.  The function is
     generic: you can write methods to handle specific classes of
     objects, see InternalMethods.

     'as.matrix' is a generic function.  The method for data frames
     will return a character matrix if there is only atomic columns and
     any non-(numeric/logical/complex) column, applying 'as.vector' to
     factors and 'format' to other non-character columns.  Otherwise,
     the usual coercion hierarchy (logical < integer < double <
     complex) will be used, e.g., all-logical data frames will be
     coerced to a logical matrix, mixed logical-integer will give a
     integer matrix, etc.

     The default method for 'as.matrix' calls 'as.vector(x)', and hence
     e.g. coerces factors to character vectors.

     When coercing a vector, it produces a one-column matrix, and
     promotes the names (if any) of the vector to the rownames of the
     matrix.

     'is.matrix' is a primitive function.

     The 'print' method for a matrix gives a rectangular layout with
     dimnames or indices.  For a list matrix, the entries of length not
     one are printed in the form 'integer,7' indicating the type and
     length.

Note:

     If you just want to convert a vector to a matrix, something like

       dim(x) <- c(nx, ny)
       dimnames(x) <- list(row_names, col_names)
     
     will avoid duplicating 'x' _and_ preserve 'class(x)' which may be
     useful, e.g., for 'Date' objects.

References:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_.  Wadsworth & Brooks/Cole.

See Also:

     'data.matrix', which attempts to convert to a numeric matrix.

     A matrix is the special case of a two-dimensional 'array'.
     'inherits(m, "array")' is true for a 'matrix' 'm'.

Examples:

     is.matrix(as.matrix(1:10))
     !is.matrix(warpbreaks)  # data.frame, NOT matrix!
     warpbreaks[1:10,]
     as.matrix(warpbreaks[1:10,])  # using as.matrix.data.frame(.) method
     
     ## Example of setting row and column names
     mdat <- matrix(c(1,2,3, 11,12,13), nrow = 2, ncol = 3, byrow = TRUE,
                    dimnames = list(c("row1", "row2"),
                                    c("C.1", "C.2", "C.3")))
     mdat
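The effect of the byrow argument is easiest to see side by side. In this sketch, values is a hypothetical stand-in for vector.object1 from Module 1, whose contents are not shown in this module.

```r
values <- 1:4  # hypothetical stand-in for vector.object1

# byrow = TRUE fills the matrix one row at a time.
by_row <- matrix(data = values, nrow = 2, ncol = 2, byrow = TRUE)
by_row
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4

# The default, byrow = FALSE, fills the matrix one column at a time.
by_col <- matrix(data = values, nrow = 2, ncol = 2)
by_col
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4
```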

Summary

Functions are “self contained” modules of code that accomplish specific tasks.
Arguments are what you pass to functions (e.g., objects on which you carry out the task or options for how to carry out the task)
Arguments may include defaults that the author of the function specified as being “good enough in standard cases”, but that can be changed.
An R package is a bundle or “package” of code (and possibly data) that can be used by installing it once and attaching it (using require() or library()) each time R/RStudio is opened
The Help pane in RStudio is useful for getting more information about functions and packages

Acknowledgements

These are the materials we looked through, modified, or extracted to complete this module’s lecture.

“Introduction to R - ARCHIVED” from Harvard Chan Bioinformatics Core (HBC)