Estimating population abundance from replicated count data is a computationally intensive problem. N-mixture models are used extensively in ecology to estimate population sizes and to ascertain under-detection rates. Here I will discuss my new R package, quickNmix, which implements asymptotic solutions to the N-mixture likelihood function. The asymptotic solutions allow faster computation of the likelihood function, and the package's support for parallel computing can increase computing speed further.
Functional data arise in many different areas of study. Some of the most common examples come from finance (for example, stock prices over time) and from health research (such as fMRI time series). Data of this form have traditionally been analyzed using time series techniques. However, viewing the data as functions, rather than as individual observed points, can lead to more natural interpretation and analysis. Here we will look at a single example data set and learn how to represent discrete data as functional data objects.
Neural networks are an immensely useful class of machine learning models, with countless applications. Today we are going to analyze a data set and see whether unsupervised clustering techniques can reveal patterns and hidden groupings within the data.
Our goal is to produce a dimension reduction on complicated data, so that we can create unsupervised, interpretable clusters like this:
Figure 1: Amazon cell phone data encoded in a three-dimensional space, with K-means clustering defining eight clusters.
Rcpp is an R package that allows for easy integration of C++ code into your R workflow. It lets you write optimized functions for when R just isn’t fast enough. It can also be used as a bridge between R and C++, giving you access to existing C++ libraries.
Why use Rcpp? There are many use cases for Rcpp, and of course many of them assume that you are primarily interested in working in R.
Bootstrapping is a statistical technique for estimating the distributional properties of a statistic computed from sample data (such as its variability and bias). It has many uses, and is generally quite easy to implement. Continue reading to learn how you can perform a bootstrap procedure in R!
What is bootstrapping? The bootstrap essentially uses re-sampling of a set of sample data in order to observe properties of the distribution of the data. For each re-sampling of the data (each “bootstrap sample”), you sample with replacement from the sample data, and compute the statistic of interest on the bootstrap sample (the bootstrap statistic).
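The re-sampling loop described above can be sketched in a few lines. This is a minimal, language-neutral illustration written in Python rather than R; the function name `bootstrap_statistics`, the example data, and the choice of 1000 re-samples are all assumptions made for the sketch, not part of the original post.

```python
import random
import statistics

def bootstrap_statistics(sample, statistic, n_boot=1000, seed=42):
    """Return a list of bootstrap statistics (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    boot_stats = []
    for _ in range(n_boot):
        # One bootstrap sample: n draws WITH replacement from the data
        boot_sample = rng.choices(sample, k=n)
        # The bootstrap statistic for this re-sample
        boot_stats.append(statistic(boot_sample))
    return boot_stats

# Made-up example data, for illustration only
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]

boot_means = bootstrap_statistics(sample, statistics.mean)

# Spread of the bootstrap statistics estimates the standard error of the mean
se_estimate = statistics.stdev(boot_means)
# Difference between the bootstrap average and the sample statistic estimates bias
bias_estimate = statistics.mean(boot_means) - statistics.mean(sample)
```

The spread of `boot_means` approximates the sampling variability of the mean, which is the kind of distributional property the bootstrap is used to study.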
optimizeAPA is an R package which allows for multi-parameter optimization. That means you can use it to find the maximum (or the minimum) value of a function with many input values. What makes optimizeAPA unique? It works with arbitrary precision arithmetic.
Why use optimizeAPA?
1) works with both APA and NAPA optimization
2) works with both single-parameter and multi-parameter functions
3) saves an output file at each iteration
4) allows you to keep every value and input visited
5) lets you plot the convergence path with a single function call
Note: APA stands for “arbitrary precision arithmetic”, while NAPA stands for “non-arbitrary precision arithmetic”.
Welcome to the world of manifold regression! In part 2 we will apply manifold regression to a case study involving fMRI brain imaging data. See part 1 for an introduction to these models.
If you want to skip past the data preparation steps, and go right into the manifold regression, click here
Getting Data
First, we need a set of data to work from. There are many great fMRI imaging datasets available on the OpenNeuro website.
Welcome to the world of manifold regression! In part 1 we will introduce the basic concepts, overview the theory behind regression on manifolds, develop an intuition for these models, and discuss their applications. See part 2 for a step by step statistical analysis applying these models.
What is regression? We will consider regular linear regression (RLR) as an analogy to help understand manifold regression. In RLR, we consider pairs of observations \((x,y)\), with \(x\) the independent variable, and \(y\) the dependent variable.
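For reference, the simple linear regression model for such pairs is commonly written as follows. The symbols \(\beta_0\), \(\beta_1\), \(\varepsilon_i\), and the index \(i = 1, \dots, n\) are standard textbook notation assumed here, not notation taken from the post.

\[
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n,
\]

where \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\varepsilon_i\) is a random error term. The least-squares fit chooses \(\beta_0\) and \(\beta_1\) to minimize \(\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2\).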
While working on a likelihood function that relies on the Poisson distribution with a large mean \(\lambda\), I ran into the problem of underflow! Underflow occurs when a number is too small in magnitude to be represented in floating point, so it is rounded to zero. In my case, the probabilities in the tails of the distribution are so small that they are returned as 0 (although the true probability in those tails is non-zero).
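A common workaround for this kind of underflow is to work with log-probabilities and only combine them with a log-sum-exp step. The sketch below illustrates that general idea in Python; it is not necessarily the approach taken in the post, and the helper names `poisson_log_pmf` and `log_sum_exp`, as well as the choice \(\lambda = 1000\), are assumptions made for illustration.

```python
import math

def poisson_log_pmf(k, lam):
    """log P(X = k) for X ~ Poisson(lam), computed entirely on the log scale."""
    # log(lam^k * exp(-lam) / k!) = k*log(lam) - lam - log(k!)
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def log_sum_exp(log_values):
    """Stable log(sum(exp(v))) that avoids underflow in the individual terms."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values))

lam = 1000.0  # assumed large mean, for illustration

# Far in the lower tail: the true probability is exp(-1000), which is non-zero
log_p_tail = poisson_log_pmf(0, lam)
p_tail = math.exp(log_p_tail)  # underflows to 0.0 in double precision

# Summing tail probabilities on the log scale keeps the information
log_tail_mass = log_sum_exp([poisson_log_pmf(k, lam) for k in range(5)])

# Sanity check: the log of the total probability mass should be close to 0
log_total = log_sum_exp([poisson_log_pmf(k, lam) for k in range(5000)])
```

Here `p_tail` is exactly zero even though `log_p_tail` is a perfectly ordinary finite number, which is why keeping the likelihood on the log scale for as long as possible avoids the problem.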