# Dynamic Statistical Comparisons (I)

I think it is time I started to outline what I have in mind in more detail. Remember the goal: to develop a framework for performing “dynamic statistical comparisons” (DSCs), that is, statistical comparisons that are more reproducible and more easily extensible (by adding new data sets or new methods to the comparison).

I’m going to start simple, which I think is a good way to start most projects. So initially I am going to restrict myself to comparisons that can be done entirely in R. By doing this I hope to postpone many of the trickier issues, such as platform compatibility, that will have to be dealt with later.

### Concrete Example - comparing regression methods

To take a concrete example, I plan to take a simple simulation study from the Elastic Net paper by Zou and Hastie (Section 5 of their paper) and try to turn it into a DSC. This is going to be a medium-term project, not something finished in a single blog post. By outlining my thinking here I hope to give people a chance to suggest improvements, or even to contribute directly to a GitHub repository.

Let’s list the components of a typical (simple) simulation study, like this one:

- It has **parameters**, which indicate how the data are to be simulated.
- It has a way (or ways) of simulating **input** data from those parameters.
- It has **methods** that turn **input** into **output**.
- It has a way of **scoring** methods by comparing the output with something (often with the parameters, but perhaps also with some other **meta-data**) to assess how good the performance is.

I think most of the terms I have highlighted above are somewhat self-explanatory, but maybe
**meta-data** deserves some elaboration. Here I am thinking of pretty much anything that might be needed
to score methods. In most cases the meta-data will be generated at the same time as the input data.
For the regression DSC I think we probably don’t need meta-data, but I believe including it here may allow
additional flexibility in the future.

To make these ideas more concrete, let’s look at how some of them apply in the regression simulation. In this case we are considering regression models of the form \( Y = XB + E, \) where the elements of \( E \) are iid \( N(0,\sigma^2) \).

The **input** \( (Y,X) \) is obtained by
i) simulating \( X \), and
ii) simulating \( Y|X \) from \( Y=XB+E. \)
The **parameters** needed to do this are i) the covariance of \( X \), and ii) \( B \) and \( \sigma \).
The **methods** (in this case, ridge regression, LASSO, EN, and naive EN) are given input \( (Y, X) \) and must **output** an estimate of \( B \), call it \( \hat{B} \).
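To make the two simulation steps concrete, here is a minimal R sketch. The function names `make_params` and `make_input`, and the particular values of \( B \), \( \sigma \) and the covariance of \( X \), are illustrative assumptions chosen for this sketch rather than settings taken from the paper or from a finished DSC. \( X \) is simulated with the required covariance via a Cholesky factor.

```r
# Hypothetical parameter maker: the values below are illustrative assumptions.
make_params <- function() {
  B <- c(3, 1.5, 0, 0, 2, 0, 0, 0)          # true coefficients (assumed)
  p <- length(B)
  list(
    Sigma = 0.5^abs(outer(1:p, 1:p, "-")),  # covariance of X: corr(i, j) = 0.5^|i - j|
    B     = B,
    sigma = 3                               # noise standard deviation (assumed)
  )
}

# Hypothetical input maker: turns parameters (plus a seed) into (Y, X).
make_input <- function(params, n, seed) {
  set.seed(seed)
  p <- length(params$B)
  # i) simulate X: if rows of Z are N(0, I) and t(R) %*% R = Sigma,
  #    then rows of Z %*% R have covariance Sigma
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  X <- Z %*% chol(params$Sigma)
  # ii) simulate Y | X from Y = XB + E, with E iid N(0, sigma^2)
  E <- rnorm(n, mean = 0, sd = params$sigma)
  Y <- drop(X %*% params$B) + E
  list(X = X, Y = Y)
}

params <- make_params()
input  <- make_input(params, n = 20, seed = 7)
```

Because each simulation is indexed by its seed, calling `make_input` twice with the same seed reproduces exactly the same \( (Y, X) \).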

The score computed for each simulation is the squared error \( \sum_j (B_j-\hat{B}_j)^2. \) (Methods are compared by the median squared error, MSE, over simulations in their Table 2.)
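A method therefore only needs to map the input to \( \hat{B} \), and the score only needs \( B \) and \( \hat{B} \). As a stand-in for the methods in the paper, the sketch below uses ridge regression with a fixed penalty \( \lambda \) and no intercept, computed in closed form as \( \hat{B} = (X^\top X + \lambda I)^{-1} X^\top Y \). The names `ridge` and `score_sq_error` and the default penalty are hypothetical; in a real comparison \( \lambda \) would normally be tuned, for example by cross-validation.

```r
# Hypothetical method: ridge regression without intercept, for a fixed penalty.
# Solves (X'X + lambda * I) Bhat = X'Y and returns Bhat as a plain vector.
ridge <- function(input, lambda = 1) {
  X <- input$X
  p <- ncol(X)
  drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, input$Y)))
}

# Score: squared error sum_j (B_j - Bhat_j)^2 between true and estimated coefficients.
score_sq_error <- function(B, Bhat) {
  sum((B - Bhat)^2)
}

# Tiny check with X = I and lambda = 1, where ridge simply halves Y.
toy  <- list(X = diag(3), Y = c(1, 2, 3))
bhat <- ridge(toy, lambda = 1)
score_sq_error(c(1, 2, 3), bhat)
```

Taking the median of `score_sq_error` over many seeds gives the kind of summary the paper reports in its Table 2.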

### Recap

So in outline a typical comparison consists of the following:

- Make parameters
- Make input and meta-data from the parameters
- Run methods to turn input into output
- Score each method by comparing its output with the parameters and meta-data
- Make a graph or table of the results

The first 4 steps may well be repeated many times - for example, for multiple **seeds** of the random number generator,
and for multiple different simulation **scenarios** (ways of producing input data). And indeed, some methods
may be run multiple times with different parameter settings. For example,
we might run LASSO with both 2-fold CV and 10-fold CV. We will refer to this as different **flavors** of the method.
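Because seeds, scenarios and flavors multiply, it can help to enumerate every combination up front. The following base-R sketch builds such a grid; the seeds, the scenario labels and the flavor names (for example `cv2` and `cv10` for LASSO with 2-fold and 10-fold CV) are made-up placeholders.

```r
# Hypothetical design grid: every combination of seed, scenario and method flavor.
seeds     <- 1:3
scenarios <- c("A", "B")
flavors   <- data.frame(
  method = c("lasso", "lasso", "ridge"),
  flavor = c("cv2",   "cv10",  "default"),
  stringsAsFactors = FALSE
)

# merge() with no columns in common returns the Cartesian product of its inputs.
grid <- merge(
  expand.grid(seed = seeds, scenario = scenarios, stringsAsFactors = FALSE),
  flavors
)
nrow(grid)  # 3 seeds x 2 scenarios x 3 flavors = 18
```

Adding a scenario or a flavor then extends the grid automatically, without touching any loop code.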

### Putting it together

Based on this, I propose the following basic structure for a simple DSC repository implementing a simulation study.
First, we will index simulations by the **seed** (an integer) and **scenario** (a string).
We will have directories `param/`, `data/`, `methods/`, `output/` and `results/` to store the various steps in the process.
(There is some question about exactly how much we want to store - for now I will go over the top
and store everything, which will be OK if the simulations are not too large.) We’ll use subdirectories to store
different scenarios.
So for example, the parameters used in the simulation with seed 7 and scenario A will be stored in a file
with a name along the lines of `param/A/param.7.RData`. (In general we may not want to restrict ourselves to the RData format,
but I’m going with this for now.)
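A small helper can keep this naming scheme in one place. The sketch below is hypothetical (the names `dsc_path`, `dsc_save` and `dsc_load` are not taken from any package), and it assumes that the other steps, such as `data/` and `output/`, follow the same `step/scenario/step.seed.RData` pattern as the parameter files.

```r
# Hypothetical helper: build a path like param/A/param.7.RData
# for a given step ("param", "data", "output", ...), scenario and seed.
dsc_path <- function(step, scenario, seed, root = ".") {
  file.path(root, step, scenario, paste0(step, ".", seed, ".RData"))
}

# Save an object at that path, creating the scenario subdirectory if needed.
# The object is stored in the .RData file under the name "object".
dsc_save <- function(object, step, scenario, seed, root = ".") {
  path <- dsc_path(step, scenario, seed, root)
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  save(object, file = path)
  invisible(path)
}

# Load the object back into a fresh environment and return it.
dsc_load <- function(step, scenario, seed, root = ".") {
  env <- new.env()
  load(dsc_path(step, scenario, seed, root), envir = env)
  env$object
}
```

For example, `dsc_path("param", "A", 7)` returns `"./param/A/param.7.RData"`.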

Now the user will have to supply the following:

- A function `parammaker` for making parameters from a given seed and scenario.
- A function `datamaker` for making data = (input, meta) from parameters (this could also depend on seed and scenario).
- Methods that turn input into output.
- A function `score` for turning output, data and parameters into scores.
- A list of seeds to be used for each simulation scenario.

Given this, we’ll provide a function that generates all the parameter and data files, runs the methods to produce output files, and scores all the methods (perhaps with multiple flavors of some methods), returning a data frame of results.
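To illustrate how these pieces might fit together, here is a hypothetical runner, `run_dsc_sketch`, written only from the description above; it is not the interface of any existing package. To stay short it keeps everything in memory rather than writing files, and returns one row per (scenario, seed, method). Flavors can be handled by listing each flavor as a separate named entry in `methods`. The usage example is a made-up one-sample location problem that compares `mean` and `median` on normal and heavy-tailed (t with 2 degrees of freedom) data, with no meta-data.

```r
# Hypothetical runner: loops over scenarios, seeds and methods,
# calling the user-supplied parammaker, datamaker, methods and score.
run_dsc_sketch <- function(parammaker, datamaker, methods, score, seeds) {
  rows <- list()
  for (scenario in names(seeds)) {
    for (seed in seeds[[scenario]]) {
      params <- parammaker(seed, scenario)
      data   <- datamaker(params, seed, scenario)  # list(input = ..., meta = ...)
      for (m in names(methods)) {
        output <- methods[[m]](data$input)
        rows[[length(rows) + 1]] <- data.frame(
          scenario = scenario, seed = seed, method = m,
          score = score(output, data, params),
          stringsAsFactors = FALSE
        )
      }
    }
  }
  do.call(rbind, rows)
}

# Toy user-supplied pieces (all settings are illustrative assumptions).
parammaker <- function(seed, scenario) {
  set.seed(seed)
  list(mu = rnorm(1))  # true location
}
datamaker <- function(params, seed, scenario) {
  # continues the random stream started in parammaker
  x <- if (scenario == "normal") rnorm(50, mean = params$mu) else params$mu + rt(50, df = 2)
  list(input = x, meta = NULL)
}
methods <- list(mean = mean, median = median)
score   <- function(output, data, params) (output - params$mu)^2
seeds   <- list(normal = 1:5, heavy = 1:5)

res <- run_dsc_sketch(parammaker, datamaker, methods, score, seeds)
aggregate(score ~ method + scenario, data = res, FUN = median)
```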

### An Example

Initially I thought I would simply post this and wait for feedback, but during the writing phase I found myself on a long flight
and decided to start putting together the code for this. If you are interested, I hope you will take a look at my
GitHub repository dscr, which contains the start of an `R` package.
You should be able to install the `dscr` package directly from GitHub using `devtools::install_github("stephens999/dscr")`.
You should also be able to clone the repository and take a look at the example in `vignette/one_sample_location.rmd`, and perhaps even run it! As you will see, I decided to start with an example even simpler than the regression one. The functionality is
basic at best, but it does run (for me at least!) and illustrates the ideas. The output I get from running this rmd file is here.

If you have comments I welcome them, either on the blog here, or preferably by opening an Issue. And if you further develop or improve the code, go ahead and put in a pull request!