Functional data arise in many areas of study. Common examples come from finance (for example, stock prices over time) and from health research (such as fMRI time series). Data of this form have traditionally been analyzed with time series techniques. However, viewing the data as functions, rather than as individual observed points, can lead to more natural interpretations and analyses. Here we look at a single example data set and learn how to represent discrete data as functional data objects.
Neural networks are an immensely useful class of machine learning models, with countless applications. Today we are going to analyze a data set and see whether unsupervised clustering techniques can reveal patterns and hidden groupings within the data.
Our goal is to produce a dimension reduction on complicated data, so that we can create unsupervised, interpretable clusters like this:
Figure 1: Amazon cell phone data encoded in a three-dimensional space, with K-means clustering defining eight clusters.
Rcpp is an R package that makes it easy to integrate C++ code into your R workflow. It lets you write optimized functions for when R just isn’t fast enough. It can also serve as a bridge between R and C++, giving you access to existing C++ libraries.
Why use Rcpp? There are many use cases for Rcpp, and of course many of them assume that you are primarily interested in working in R.
Bootstrapping is a statistical technique for analyzing the distributional properties of sample data (such as variability and bias). It has many uses, and is generally quite easy to implement. Continue reading to learn how you can perform a bootstrap procedure in R!
What is bootstrapping? The bootstrap essentially uses re-sampling of a set of sample data in order to observe properties of the distribution of the data. For each re-sampling of the data (each “bootstrap sample”), you sample with replacement from the sample data, and compute the statistic of interest on the bootstrap sample (the bootstrap statistic).
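The procedure described above can be sketched in a few lines. The following is a minimal illustration written in Python with only the standard library (the post itself works in R). The sample values, the `bootstrap` helper, the seed, and the choice of 2000 replicates are all assumptions made for this example, not taken from the original post.

```python
import random
import statistics


def bootstrap(data, statistic, n_boot=2000, seed=0):
    """Return n_boot bootstrap statistics computed from data.

    Each bootstrap sample is drawn with replacement from the observed
    data and has the same size as the original sample.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]


# Hypothetical sample data, made up for illustration.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]

# Statistic of interest: the sample mean.
boot_means = bootstrap(sample, statistics.mean)
observed = statistics.mean(sample)

# Bootstrap estimates of the variability and bias of the sample mean.
se = statistics.stdev(boot_means)
bias = statistics.mean(boot_means) - observed

print(f"observed mean: {observed:.3f}")
print(f"bootstrap standard error: {se:.3f}")
print(f"bootstrap bias estimate: {bias:.3f}")
```

The spread of `boot_means` approximates the variability of the sample mean, and the gap between their average and the observed mean gives a rough bias estimate. Any other statistic (a median, a quantile) can be passed in place of `statistics.mean`.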
Introduction: Population abundance models are used extensively in ecological studies, and they provide methods of estimation when only site- and time-replicated counts are available. These models have many possible applications beyond ecology, and can be applied in the study of disease prevalence and detection rates. Yearly counts of reported cases of depression, stratified by Health Service Delivery Area (HSDA), will be used to estimate both the total cases and the case detection rate (CDR) of depression in the Vancouver Coastal Health Authority (VCHA) region from the year 2000 to 2014.