Last updated: 2023-11-09


Knit directory: misc/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(1) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 6ba05fe. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/flashier_sla_text.Rmd) and HTML (docs/flashier_sla_text.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 6ba05fe Matthew Stephens 2023-11-09 workflowr::wflow_publish("flashier_sla_text.Rmd")
html 0346f50 Matthew Stephens 2023-11-08 Build site.
Rmd a8ef428 Matthew Stephens 2023-11-08 workflowr::wflow_publish("flashier_sla_text.Rmd")
html 29f2f9a Matthew Stephens 2023-11-06 Build site.
Rmd 8eb7462 Matthew Stephens 2023-11-06 workflowr::wflow_publish("flashier_sla_text.Rmd")
html 68ddffa Matthew Stephens 2023-10-20 Build site.
Rmd 597ecff Matthew Stephens 2023-10-20 workflowr::wflow_publish("flashier_sla_text.Rmd")

Introduction

I want to try running flashier (non-negative) on some text data and see what happens. It is also a chance to try out the CRAN release of flashier.

I tried running flashier both on the log1p-transformed counts directly and on the log1p transform of fitted values from a topic model. Both produce somewhat promising results, but it is hard to beat the direct log1p transform for simplicity and speed.

library(Matrix)
library(readr)
library(tm)
Loading required package: NLP
library(fastTopics)
library(flashier)
Loading required package: ebnm
Loading required package: magrittr
Loading required package: ggplot2

Attaching package: 'ggplot2'
The following object is masked from 'package:NLP':

    annotate
library(ebpmf)
library(RcppML)
RcppML v0.5.5 using 'options(RcppML.threads = 0)' (all available threads), 'options(RcppML.verbose = FALSE)'
sla <- read_csv("../../gsmash/data/SLA/SCC2016/Data/paperList.txt")
Rows: 3248 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): DOI, title, abstract
dbl (2): year, citCounts

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sla <- sla[!is.na(sla$abstract),]
sla$docnum = 1:nrow(sla)
datax = readRDS('../../gsmash/data/sla_full.rds')
dim(datax$data)
[1]  3207 10104
sum(datax$data==0)/prod(dim(datax$data))
[1] 0.9948157
datax$data = Matrix(datax$data,sparse = TRUE)
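The sparsity check above compares every entry of a dense matrix with zero. Once the data are stored as a sparse `Matrix`, the same proportion can be read off the number of stored non-zero entries, without densifying. A minimal sketch on a small simulated count matrix (the toy data are illustrative, not the SLA counts):

```r
library(Matrix)

set.seed(1)
# toy count matrix: 50 "documents" x 200 "words", mostly zeros
dense <- matrix(rpois(50 * 200, lambda = 0.02), nrow = 50)
sp <- Matrix(dense, sparse = TRUE)

# proportion of zero entries, computed on the dense and on the sparse version
prop_zero_dense  <- sum(dense == 0) / prod(dim(dense))
prop_zero_sparse <- 1 - nnzero(sp) / prod(dim(sp))
print(c(prop_zero_dense, prop_zero_sparse))
```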

Data filtering

Filter out some documents: following Ke and Wang (2022), keep the 60% longest documents.

doc_to_use = order(rowSums(datax$data),decreasing = TRUE)[1:round(nrow(datax$data)*0.6)]
mat = datax$data[doc_to_use,]
sla = sla[doc_to_use,]
samples = datax$samples
samples = lapply(samples, function(z){z[doc_to_use]})
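The document filter keeps the `round(0.6 * n)` documents with the largest total word counts. With the 3207 abstracts above, that is 1924 documents. A minimal sketch on toy document lengths (hypothetical values), checking that every dropped document is shorter than every retained one:

```r
# toy document lengths (row sums of a count matrix), no ties
doc_len <- c(120, 45, 300, 80, 85, 210, 15, 95, 160, 60)

n_keep <- round(length(doc_len) * 0.6)                 # 6 of 10 documents
keep   <- order(doc_len, decreasing = TRUE)[seq_len(n_keep)]

print(sort(keep))              # indices of the retained documents
print(min(doc_len[keep]))      # shortest retained document
print(round(3207 * 0.6))       # number retained for the SLA data
```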

Filter out words that appear in fewer than 5 documents. Note: if you don’t do this, you can still get real factors that capture very rare words co-occurring, e.g. two authors who are cited together. If you are interested in those factors, there is no need to filter.

word_to_use = which(colSums(mat>0)>4)
mat = mat[,word_to_use]
mat = Matrix(mat,sparse=TRUE)
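The word filter `colSums(mat>0)>4` counts, for each word, the number of documents in which it appears at least once (its document frequency, not its total count), and keeps words present in 5 or more documents. Because it runs after the document filter, the frequency is measured among the retained documents only. A toy sketch with hypothetical counts, in which a word with a large total count is still dropped because it occurs in a single document:

```r
library(Matrix)

# toy counts: 6 documents x 4 words
dense <- matrix(c(3, 0, 1, 0,
                  1, 0, 1, 0,
                  2, 9, 1, 0,
                  1, 0, 1, 1,
                  1, 0, 1, 0,
                  1, 0, 0, 0),
                nrow = 6, byrow = TRUE,
                dimnames = list(NULL, c("model", "simex", "test", "fdr")))
counts <- Matrix(dense, sparse = TRUE)

doc_freq    <- colSums(counts > 0)   # number of documents containing each word
word_to_use <- which(doc_freq > 4)   # keep words present in at least 5 documents
print(doc_freq)
print(names(word_to_use))            # "simex" is dropped despite its count of 9
```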

I tried the log1p transform on its own (no normalization for document size).

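I also tried normalizing for document size, with pseudocounts c = 10, 1, 0.1 and 0.01. I expected the first to be too big and the last too small (but in fact the results with 0.01 look quite reasonable in many ways). Note that to keep things sparse I use log(1 + X/(c s)) rather than log(X/s + c), where c is the pseudocount and s is the document's size factor (its total count divided by the mean total count). The two differ only by the constant log(c), but the first maps zero counts to zero.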
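The identity behind this choice is log(1 + X/c) = log(X + c) - log(c) for any pseudocount c > 0. A minimal check in R on a toy sparse matrix (toy values are illustrative; `pc` stands for c, renamed to avoid masking `c()`), confirming both the identity and that the transform keeps the same non-zero pattern:

```r
library(Matrix)

set.seed(1)
dense <- matrix(rpois(8 * 5, lambda = 0.5), nrow = 8)
X <- Matrix(dense, sparse = TRUE)

pc  <- 0.1                               # pseudocount c
lhs <- log1p(as.matrix(X) / pc)          # log(1 + X/c)
rhs <- log(as.matrix(X) + pc) - log(pc)  # log(X + c) - log(c)

# zeros map to zero, so the transformed matrix has the same sparsity pattern
LX <- Matrix(lhs, sparse = TRUE)
print(c(nnzero(X), nnzero(LX)))
```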

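# log(1 + X): no normalization for document size
lmat = Matrix(log1p(mat), sparse=TRUE)

# size factors s (relative document lengths), then log(1 + X/(c*s))
docsize = rowSums(mat)
s = docsize/mean(docsize)
lmat_s_10 = Matrix(log1p(0.1*mat/s), sparse=TRUE)   # c = 10
lmat_s_1 = Matrix(log1p(mat/s), sparse=TRUE)        # c = 1
lmat_s_01 = Matrix(log1p(10*mat/s), sparse=TRUE)    # c = 0.1
lmat_s_001 = Matrix(log1p(100*mat/s), sparse=TRUE)  # c = 0.01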

In addition to the pseudocount, we also have to choose how to regularize the estimates of tau (the column-wise precisions). It turns out this can have quite a big effect on the results. If tau is not regularized, then typically some tau become very large (very small variance) and, intuitively, the fit will “overfit” some words.

In the following I implement a rule of thumb based on Jason’s work: I compute the standard deviation of the transformed data for a Poisson random variable with rate \(\mu=4/n\), where \(n\) is the number of retained documents, and pass it to flash as the standard error S. The 4 comes from the word filter above: every retained word occurs in at least 5 documents, so its average count per document is at least \(5/n\), and \(4/n\) is a (slightly conservative) lower bound on \(\mu\) for each word. (I ignore variation in document size in this calculation.) I think this rule of thumb can be justified as a realistic lower bound on the variance you would expect under a Poisson distribution for the data. (There are reasons to believe that text data may be underdispersed relative to Poisson, in which case it would not be a lower bound, but I will ignore this for now.)
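For small \(\mu\) these standard deviations have a simple approximate closed form, which also shows how they scale with the pseudocount \(c\). This is a back-of-envelope sketch, not part of the rule of thumb itself. Let \(X \sim \text{Poisson}(\mu)\) and \(g(x) = \log(1 + x/c)\), so \(g(0) = 0\). When \(\mu\) is small, \(X\) is almost always 0 or 1, with \(P(X = 1) \approx \mu\). To first order in \(\mu\):

```latex
\begin{aligned}
\mathbb{E}[g(X)] &\approx \mu\, g(1), \qquad \mathbb{E}[g(X)^2] \approx \mu\, g(1)^2, \\
\operatorname{Var}[g(X)] &\approx \mu\, g(1)^2 - \mu^2 g(1)^2 \approx \mu \log^2(1 + 1/c), \\
\operatorname{sd}[g(X)] &\approx \sqrt{\mu}\,\log(1 + 1/c).
\end{aligned}
```

With \(n = \text{round}(0.6 \times 3207) = 1924\) documents, \(\mu = 4/1924 \approx 0.00208\) and \(\sqrt{\mu} \approx 0.0456\). This gives approximately 0.0043, 0.0316, 0.109 and 0.210 for \(c = 10, 1, 0.1, 0.01\), in line with the simulated values printed below.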

mhat = 4/nrow(lmat) # lower bound on the mean count per document of a retained word
xx = rpois(1e7,mhat) # random poisson
S10 = sd(log(0.1*xx+1)) # sd of log(X/10+1)
S1 = sd(log(xx+1)) # sd of log(X+1)
S01 = sd(log(10*xx+1)) # sd of log(10X+1)
S001 = sd(log(100*xx+1)) # sd of log(100X+1)
print(c(S10,S1,S01,S001))
[1] 0.004339581 0.031536221 0.109033434 0.209811829
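The \(10^7\) Poisson draws are not strictly needed. The same standard deviations can be computed exactly by summing over the Poisson probability mass function. A minimal sketch: `poisson_log1p_sd` is a hypothetical helper (not from any package), `n_docs` is taken as round(0.6 × 3207) = 1924 from the filtering step, and the Monte Carlo values are copied from the output above for comparison. The small-\(\mu\) approximation \(\sqrt{\mu}\log(1 + 1/c)\) is included too.

```r
# Exact sd of log(1 + X/c) for X ~ Poisson(mu), summing over the Poisson pmf.
# xmax truncates the sum; for mu this small the mass beyond a few counts is negligible.
poisson_log1p_sd <- function(mu, pc, xmax = 100) {
  x  <- 0:xmax
  p  <- dpois(x, mu)
  g  <- log1p(x / pc)
  m1 <- sum(p * g)      # E[g(X)]
  m2 <- sum(p * g^2)    # E[g(X)^2]
  sqrt(m2 - m1^2)
}

n_docs <- round(3207 * 0.6)    # documents retained after filtering (1924)
mhat   <- 4 / n_docs
pcs    <- c(10, 1, 0.1, 0.01)  # pseudocounts c used above

S_exact  <- sapply(pcs, function(pc) poisson_log1p_sd(mhat, pc))
S_approx <- sqrt(mhat) * log1p(1 / pcs)   # small-mu approximation
S_mc     <- c(0.004339581, 0.031536221, 0.109033434, 0.209811829)  # printed above

print(rbind(exact = S_exact, approx = S_approx, monte_carlo = S_mc))
```

The exact values agree with the simulated ones to within Monte Carlo error, so either route gives essentially the same S.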

Fit log1p transformed data

I fit the unnormalized log1p data and each of the four size-normalized pseudocounts here. For comparison, I also computed maximum likelihood estimates with RcppML::nmf (Frobenius norm minimization, which assumes the same variance for every column, and indeed for every entry).
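The link between Frobenius norm minimization and maximum likelihood can be written out. Assume, for this sketch, independent Gaussian errors with a single variance \(\sigma^2\) shared by all entries of the \(n \times p\) data matrix, \(X_{ij} \sim N\big((LF^\top)_{ij}, \sigma^2\big)\). The log-likelihood is then:

```latex
\log p(X \mid L, F, \sigma^2)
  = -\frac{np}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\,\lVert X - L F^\top \rVert_F^2 .
```

For any fixed \(\sigma^2\), maximizing over non-negative \(L\) and \(F\) is therefore the same as minimizing \(\lVert X - LF^\top \rVert_F^2\). The flash fits in this section instead use `var_type=2`, which estimates a separate precision \(\tau_j\) for each column (word).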

set.seed(1)
fit.nn = flash(lmat,ebnm_fn = c(ebnm::ebnm_point_exponential,ebnm::ebnm_point_exponential),var_type=2,greedy_Kmax = 200, S=S1)
Adding factor 1 to flash object...
Adding factor 2 to flash object...
Adding factor 3 to flash object...
Adding factor 4 to flash object...
Adding factor 5 to flash object...
Adding factor 6 to flash object...
Adding factor 7 to flash object...
Adding factor 8 to flash object...
Adding factor 9 to flash object...
Adding factor 10 to flash object...
Adding factor 11 to flash object...
Adding factor 12 to flash object...
Adding factor 13 to flash object...
Adding factor 14 to flash object...
Adding factor 15 to flash object...
Adding factor 16 to flash object...
Adding factor 17 to flash object...
Adding factor 18 to flash object...
Adding factor 19 to flash object...
Adding factor 20 to flash object...
Adding factor 21 to flash object...
Adding factor 22 to flash object...
Adding factor 23 to flash object...
Adding factor 24 to flash object...
Adding factor 25 to flash object...
Adding factor 26 to flash object...
Adding factor 27 to flash object...
Adding factor 28 to flash object...
Adding factor 29 to flash object...
Adding factor 30 to flash object...
Adding factor 31 to flash object...
Adding factor 32 to flash object...
Adding factor 33 to flash object...
Adding factor 34 to flash object...
Adding factor 35 to flash object...
Adding factor 36 to flash object...
Adding factor 37 to flash object...
Adding factor 38 to flash object...
Adding factor 39 to flash object...
Adding factor 40 to flash object...
Adding factor 41 to flash object...
Adding factor 42 to flash object...
Adding factor 43 to flash object...
Adding factor 44 to flash object...
Adding factor 45 to flash object...
Adding factor 46 to flash object...
Adding factor 47 to flash object...
Adding factor 48 to flash object...
Adding factor 49 to flash object...
Adding factor 50 to flash object...
Adding factor 51 to flash object...
Adding factor 52 to flash object...
Adding factor 53 to flash object...
Adding factor 54 to flash object...
Adding factor 55 to flash object...
Adding factor 56 to flash object...
Adding factor 57 to flash object...
Adding factor 58 to flash object...
Adding factor 59 to flash object...
Adding factor 60 to flash object...
Adding factor 61 to flash object...
Adding factor 62 to flash object...
Adding factor 63 to flash object...
Adding factor 64 to flash object...
Adding factor 65 to flash object...
Adding factor 66 to flash object...
Adding factor 67 to flash object...
Adding factor 68 to flash object...
Adding factor 69 to flash object...
Adding factor 70 to flash object...
Adding factor 71 to flash object...
Adding factor 72 to flash object...
Adding factor 73 to flash object...
Adding factor 74 to flash object...
Adding factor 75 to flash object...
Adding factor 76 to flash object...
Adding factor 77 to flash object...
Adding factor 78 to flash object...
Factor doesn't significantly increase objective and won't be added.
Wrapping up...
Done.
Nullchecking 77 factors...
Done.
set.seed(1)
fit.nn.s.10 = flash(lmat_s_10,ebnm_fn = c(ebnm::ebnm_point_exponential,ebnm::ebnm_point_exponential),var_type=2,greedy_Kmax = 200, S=S10)
Adding factor 1 to flash object...
Adding factor 2 to flash object...
Adding factor 3 to flash object...
Adding factor 4 to flash object...
Adding factor 5 to flash object...
Adding factor 6 to flash object...
Adding factor 7 to flash object...
Adding factor 8 to flash object...
Adding factor 9 to flash object...
Adding factor 10 to flash object...
Adding factor 11 to flash object...
Adding factor 12 to flash object...
Adding factor 13 to flash object...
Adding factor 14 to flash object...
Adding factor 15 to flash object...
Adding factor 16 to flash object...
Adding factor 17 to flash object...
Adding factor 18 to flash object...
Adding factor 19 to flash object...
Adding factor 20 to flash object...
Adding factor 21 to flash object...
Adding factor 22 to flash object...
Adding factor 23 to flash object...
Adding factor 24 to flash object...
Adding factor 25 to flash object...
Adding factor 26 to flash object...
Adding factor 27 to flash object...
Adding factor 28 to flash object...
Adding factor 29 to flash object...
Adding factor 30 to flash object...
Adding factor 31 to flash object...
Adding factor 32 to flash object...
Adding factor 33 to flash object...
Adding factor 34 to flash object...
Adding factor 35 to flash object...
Adding factor 36 to flash object...
Adding factor 37 to flash object...
Adding factor 38 to flash object...
Adding factor 39 to flash object...
Adding factor 40 to flash object...
Adding factor 41 to flash object...
Adding factor 42 to flash object...
Adding factor 43 to flash object...
Adding factor 44 to flash object...
Adding factor 45 to flash object...
Adding factor 46 to flash object...
Adding factor 47 to flash object...
Adding factor 48 to flash object...
Adding factor 49 to flash object...
Adding factor 50 to flash object...
Adding factor 51 to flash object...
Adding factor 52 to flash object...
Adding factor 53 to flash object...
Adding factor 54 to flash object...
Adding factor 55 to flash object...
Adding factor 56 to flash object...
Adding factor 57 to flash object...
Adding factor 58 to flash object...
Adding factor 59 to flash object...
Factor doesn't significantly increase objective and won't be added.
Wrapping up...
Done.
Nullchecking 58 factors...
Done.
set.seed(1)
fit.nn.s.1 = flash(lmat_s_1,ebnm_fn = c(ebnm::ebnm_point_exponential,ebnm::ebnm_point_exponential),var_type=2,greedy_Kmax = 200, S=S1)
Adding factor 1 to flash object...
Adding factor 2 to flash object...
Adding factor 3 to flash object...
Adding factor 4 to flash object...
Adding factor 5 to flash object...
Adding factor 6 to flash object...
Adding factor 7 to flash object...
Adding factor 8 to flash object...
Adding factor 9 to flash object...
Adding factor 10 to flash object...
Adding factor 11 to flash object...
Adding factor 12 to flash object...
Adding factor 13 to flash object...
Adding factor 14 to flash object...
Adding factor 15 to flash object...
Adding factor 16 to flash object...
Adding factor 17 to flash object...
Adding factor 18 to flash object...
Adding factor 19 to flash object...
Adding factor 20 to flash object...
Adding factor 21 to flash object...
Adding factor 22 to flash object...
Adding factor 23 to flash object...
Adding factor 24 to flash object...
Adding factor 25 to flash object...
Adding factor 26 to flash object...
Adding factor 27 to flash object...
Adding factor 28 to flash object...
Adding factor 29 to flash object...
Adding factor 30 to flash object...
Adding factor 31 to flash object...
Adding factor 32 to flash object...
Adding factor 33 to flash object...
Adding factor 34 to flash object...
Adding factor 35 to flash object...
Adding factor 36 to flash object...
Adding factor 37 to flash object...
Adding factor 38 to flash object...
Adding factor 39 to flash object...
Adding factor 40 to flash object...
Adding factor 41 to flash object...
Adding factor 42 to flash object...
Adding factor 43 to flash object...
Adding factor 44 to flash object...
Adding factor 45 to flash object...
Adding factor 46 to flash object...
Adding factor 47 to flash object...
Adding factor 48 to flash object...
Adding factor 49 to flash object...
Adding factor 50 to flash object...
Adding factor 51 to flash object...
Adding factor 52 to flash object...
Adding factor 53 to flash object...
Adding factor 54 to flash object...
Adding factor 55 to flash object...
Adding factor 56 to flash object...
Adding factor 57 to flash object...
Adding factor 58 to flash object...
Adding factor 59 to flash object...
Factor doesn't significantly increase objective and won't be added.
Wrapping up...
Done.
Nullchecking 58 factors...
Done.
set.seed(1)
fit.nn.s.01 = flash(lmat_s_01,ebnm_fn = c(ebnm::ebnm_point_exponential,ebnm::ebnm_point_exponential),var_type=2,greedy_Kmax = 200, S=S01)
Adding factor 1 to flash object...
Adding factor 2 to flash object...
Adding factor 3 to flash object...
Adding factor 4 to flash object...
Adding factor 5 to flash object...
Adding factor 6 to flash object...
Adding factor 7 to flash object...
Adding factor 8 to flash object...
Adding factor 9 to flash object...
Adding factor 10 to flash object...
Adding factor 11 to flash object...
Adding factor 12 to flash object...
Adding factor 13 to flash object...
Adding factor 14 to flash object...
Adding factor 15 to flash object...
Adding factor 16 to flash object...
Adding factor 17 to flash object...
Adding factor 18 to flash object...
Adding factor 19 to flash object...
Adding factor 20 to flash object...
Adding factor 21 to flash object...
Adding factor 22 to flash object...
Adding factor 23 to flash object...
Adding factor 24 to flash object...
Adding factor 25 to flash object...
Adding factor 26 to flash object...
Adding factor 27 to flash object...
Adding factor 28 to flash object...
Adding factor 29 to flash object...
Adding factor 30 to flash object...
Adding factor 31 to flash object...
Adding factor 32 to flash object...
Adding factor 33 to flash object...
Adding factor 34 to flash object...
Adding factor 35 to flash object...
Adding factor 36 to flash object...
Adding factor 37 to flash object...
Adding factor 38 to flash object...
Adding factor 39 to flash object...
Adding factor 40 to flash object...
Adding factor 41 to flash object...
Adding factor 42 to flash object...
Adding factor 43 to flash object...
Adding factor 44 to flash object...
Adding factor 45 to flash object...
Adding factor 46 to flash object...
Adding factor 47 to flash object...
Adding factor 48 to flash object...
Adding factor 49 to flash object...
Adding factor 50 to flash object...
Adding factor 51 to flash object...
Adding factor 52 to flash object...
Adding factor 53 to flash object...
Adding factor 54 to flash object...
Adding factor 55 to flash object...
Adding factor 56 to flash object...
Adding factor 57 to flash object...
Adding factor 58 to flash object...
Adding factor 59 to flash object...
Adding factor 60 to flash object...
Adding factor 61 to flash object...
Adding factor 62 to flash object...
Adding factor 63 to flash object...
Adding factor 64 to flash object...
Adding factor 65 to flash object...
Adding factor 66 to flash object...
Adding factor 67 to flash object...
Adding factor 68 to flash object...
Adding factor 69 to flash object...
Adding factor 70 to flash object...
Adding factor 71 to flash object...
Adding factor 72 to flash object...
Adding factor 73 to flash object...
Adding factor 74 to flash object...
Adding factor 75 to flash object...
Adding factor 76 to flash object...
Adding factor 77 to flash object...
Adding factor 78 to flash object...
Adding factor 79 to flash object...
Adding factor 80 to flash object...
Adding factor 81 to flash object...
Adding factor 82 to flash object...
Adding factor 83 to flash object...
Adding factor 84 to flash object...
Adding factor 85 to flash object...
Adding factor 86 to flash object...
Adding factor 87 to flash object...
Adding factor 88 to flash object...
Adding factor 89 to flash object...
Adding factor 90 to flash object...
Adding factor 91 to flash object...
Adding factor 92 to flash object...
Adding factor 93 to flash object...
Adding factor 94 to flash object...
Adding factor 95 to flash object...
Adding factor 96 to flash object...
Adding factor 97 to flash object...
Adding factor 98 to flash object...
Adding factor 99 to flash object...
Adding factor 100 to flash object...
Adding factor 101 to flash object...
Adding factor 102 to flash object...
Adding factor 103 to flash object...
Adding factor 104 to flash object...
Adding factor 105 to flash object...
Adding factor 106 to flash object...
Adding factor 107 to flash object...
Adding factor 108 to flash object...
Adding factor 109 to flash object...
Factor doesn't significantly increase objective and won't be added.
Wrapping up...
Done.
Nullchecking 108 factors...
Done.
set.seed(1)
fit.nn.s.001 = flash(lmat_s_001,ebnm_fn = c(ebnm::ebnm_point_exponential,ebnm::ebnm_point_exponential),var_type=2,greedy_Kmax = 200, S=S001)
Adding factor 1 to flash object...
Adding factor 2 to flash object...
Adding factor 3 to flash object...
Adding factor 4 to flash object...
Adding factor 5 to flash object...
Adding factor 6 to flash object...
Adding factor 7 to flash object...
Adding factor 8 to flash object...
Adding factor 9 to flash object...
Adding factor 10 to flash object...
Adding factor 11 to flash object...
Adding factor 12 to flash object...
Adding factor 13 to flash object...
Adding factor 14 to flash object...
Adding factor 15 to flash object...
Adding factor 16 to flash object...
Adding factor 17 to flash object...
Adding factor 18 to flash object...
Adding factor 19 to flash object...
Adding factor 20 to flash object...
Adding factor 21 to flash object...
Adding factor 22 to flash object...
Adding factor 23 to flash object...
Adding factor 24 to flash object...
Adding factor 25 to flash object...
Adding factor 26 to flash object...
Adding factor 27 to flash object...
Adding factor 28 to flash object...
Adding factor 29 to flash object...
Adding factor 30 to flash object...
Adding factor 31 to flash object...
Adding factor 32 to flash object...
Adding factor 33 to flash object...
Adding factor 34 to flash object...
Adding factor 35 to flash object...
Adding factor 36 to flash object...
Adding factor 37 to flash object...
Adding factor 38 to flash object...
Adding factor 39 to flash object...
Adding factor 40 to flash object...
Adding factor 41 to flash object...
Adding factor 42 to flash object...
Adding factor 43 to flash object...
Adding factor 44 to flash object...
Adding factor 45 to flash object...
Adding factor 46 to flash object...
Adding factor 47 to flash object...
Adding factor 48 to flash object...
Adding factor 49 to flash object...
Adding factor 50 to flash object...
Adding factor 51 to flash object...
Adding factor 52 to flash object...
Adding factor 53 to flash object...
Adding factor 54 to flash object...
Adding factor 55 to flash object...
Adding factor 56 to flash object...
Adding factor 57 to flash object...
Adding factor 58 to flash object...
Adding factor 59 to flash object...
Adding factor 60 to flash object...
Adding factor 61 to flash object...
Adding factor 62 to flash object...
Adding factor 63 to flash object...
Adding factor 64 to flash object...
Adding factor 65 to flash object...
Adding factor 66 to flash object...
Adding factor 67 to flash object...
Adding factor 68 to flash object...
Adding factor 69 to flash object...
Adding factor 70 to flash object...
Adding factor 71 to flash object...
Adding factor 72 to flash object...
Adding factor 73 to flash object...
Adding factor 74 to flash object...
Adding factor 75 to flash object...
Adding factor 76 to flash object...
Adding factor 77 to flash object...
Adding factor 78 to flash object...
Adding factor 79 to flash object...
Adding factor 80 to flash object...
Adding factor 81 to flash object...
Adding factor 82 to flash object...
Adding factor 83 to flash object...
Adding factor 84 to flash object...
Adding factor 85 to flash object...
Factor doesn't significantly increase objective and won't be added.
Wrapping up...
Done.
Nullchecking 84 factors...
Done.
set.seed(1)
fit.nn.ml = nmf(lmat,k = 100)

set.seed(1)
fit.nn.ml.s.1 = nmf(lmat_s_1, k=100)


saveRDS(fit.nn,file='../output/fit.nn.rds')
saveRDS(fit.nn.s.10,file='../output/fit.nn.s.10.rds')
saveRDS(fit.nn.s.1,file='../output/fit.nn.s.1.rds')
saveRDS(fit.nn.s.01,file='../output/fit.nn.s.01.rds')
saveRDS(fit.nn.s.001,file='../output/fit.nn.s.001.rds')

Next, look at the keywords for each factor. The flash fits capture more interesting keywords than the ML fits. Generally the flash keywords seem to make some sense for all values of the pseudocount (although I had to lower the keyword threshold for large pseudocounts).

The ML fits capture a lot of “single-word” factors. It turns out that each of these factors is loaded on quite a lot of documents (not shown here). So the ML fit seems to prefer explaining many documents with a single common word, rather than explaining a small set of documents with a small set of words (which is perhaps what we want!).
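The claim that a factor is loaded on many documents can be quantified with the normalization that `get_keywords` uses: scale each loading column to a maximum of 1 and count the documents whose scaled loading exceeds 0.5. A minimal sketch on a toy loadings matrix. `docs_per_factor` is a hypothetical helper; on a real fit it would be applied to `fit$L_pm` (flash) or `fit@w` (RcppML::nmf), as in `get_keywords` below.

```r
# Count, for each factor, the documents whose loading exceeds `cutoff`
# times that factor's largest loading.
docs_per_factor <- function(L, cutoff = 0.5) {
  Lnorm <- sweep(L, 2, apply(L, 2, max), "/")  # each column now has max 1
  colSums(Lnorm > cutoff)
}

# toy loadings: 6 documents x 2 factors
L_toy <- cbind(focused = c(5, 0, 0.2, 0, 0, 0.1),        # one dominant document
               diffuse = c(1, 0.9, 0.8, 1.2, 0.7, 1.1))  # spread over many
print(docs_per_factor(L_toy))
```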

# Returns, for each factor, the words whose rescaled factor value exceeds
# log(thresh), sorted by decreasing value. A factor's keywords are set to NA
# if the number of documents with rescaled loading > 0.5 does not exceed docfilter.
get_keywords = function(fit, thresh = 2, docfilter = 0, words = colnames(mat)){
  if(inherits(fit, "flash")){
    LL = fit$L_pm
    FF = fit$F_pm
  } else if(inherits(fit, "nmf")){ # deals with RcppML::nmf fit
    LL = fit@w
    FF = t(fit@d * fit@h)
  } else {
    stop("fit must be a flash or RcppML::nmf object")
  }

  # rescale so each loading column has max 1; LL %*% t(FF) is unchanged
  Lnorm = t(t(LL)/apply(LL, 2, max))
  Fnorm = t(t(FF)*apply(LL, 2, max))

  keyw.nn = list()
  for(k in 1:ncol(Fnorm)){
    if(sum(Lnorm[,k] > 0.5) > docfilter){
      key = Fnorm[,k] > log(thresh)
      keyw.nn[[k]] = (words[key])[order(Fnorm[key,k], decreasing = TRUE)]
    } else {
      keyw.nn[[k]] = NA
    }
  }
  return(keyw.nn)
}
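`get_keywords` rescales each loading column to have maximum 1 and multiplies the matching factor column by the same constant. This leaves the fitted matrix \(LF^\top\) unchanged, so a rescaled factor value equals that factor's contribution to the fitted transformed value in its most strongly loaded document. For the unnormalized fit on log(1 + X), a value above log(thresh) with the default thresh = 2 can then be read (under this interpretation) as the factor at least doubling 1 + x for that word in that document. A toy check with random non-negative matrices:

```r
set.seed(1)
LL <- matrix(rexp(5 * 3), nrow = 5)   # toy loadings (documents x factors)
FF <- matrix(rexp(4 * 3), nrow = 4)   # toy factors (words x factors)

scale_k <- apply(LL, 2, max)
Lnorm <- t(t(LL) / scale_k)   # each loading column has max 1
Fnorm <- t(t(FF) * scale_k)   # compensate in the factor columns

# the fitted matrix is unchanged by the rescaling
print(max(abs(LL %*% t(FF) - Lnorm %*% t(Fnorm))))
print(apply(Lnorm, 2, max))

# on the log(1 + x) scale, adding log(2) doubles 1 + x: x = 3 becomes x = 7
print(exp(log1p(3) + log(2)) - 1)
```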
print(get_keywords(fit.nn))
[[1]]
 [1] "model"     "estim"     "data"      "method"    "propos"    "studi"    
 [7] "function"  "distribut" "sampl"     "simul"    

[[2]]
 [1] "fals"      "control"   "procedur"  "test"      "reject"    "hypothes" 
 [7] "rate"      "discoveri" "null"      "multipl"   "pvalu"     "fdr"      
[13] "kfwer"     "stepdown"  "number"    "fwer"      "familywis" "hochberg" 
[19] "error"     "depend"    "alpha"     "statist"  

[[3]]
[1] "cancer" "diseas" "studi" 

[[4]]
[1] "rightcensor"  "surviv"       "lengthbias"   "semiparametr" "failur"      
[6] "data"         "time"         "nonparametr"  "effici"      

[[5]]
[1] "simex"  "measur" "error" 

[[6]]
[1] "wilk"       "test"       "ratio"      "phenomenon" "demonstr"  
[6] "backfit"   

[[7]]
[1] "robin"      "miss"       "zhao"       "rotnitzki"  "effici"    
[6] "casecohort"

[[8]]
[1] "semiparametr" "estim"        "model"       

[[9]]
[1] "test"      "null"      "hypothesi"

[[10]]
[1] "select"  "lasso"   "spars"   "penalti" "penal"   "variabl" "oracl"  

[[11]]
[1] "equivari"  "depth"     "scatter"   "project"   "affin"     "multivari"
[7] "introduc"  "breakdown"

[[12]]
[1] "bandwidth" "kernel"    "local"     "select"   

[[13]]
[1] "markov"    "chain"     "mont"      "carlo"     "algorithm" "bayesian" 

[[14]]
[1] "nconsist"

[[15]]
[1] "varyingcoeffici"

[[16]]
[1] "jackknif" "mix"      "varianc"  "squar"    "area"     "uncondit"

[[17]]
[1] "singleindex"

[[18]]
[1] "choleski"   "matrix"     "covari"     "decomposit"

[[19]]
[1] "motion"

[[20]]
[1] "homoscedast"   "heteroscedast"

[[21]]
[1] "onestep"

[[22]]
[1] "spline" "smooth"

[[23]]
[1] "mle"        "likelihood" "maximum"   

[[24]]
[1] "survey" "popul"  "sampl" 

[[25]]
[1] "memori"

[[26]]
 [1] "retail"   "tradit"   "compani"  "deliveri" "frequenc" "onlin"   
 [7] "quantiti" "differ"   "tail"     "custom"   "consum"   "compon"  
[13] "week"     "time"     "market"   "daili"    "total"    "cost"    
[19] "decis"   

[[27]]
 [1] "instrument"    "birth"         "measur"        "biomark"      
 [5] "health"        "error"         "assess"        "epidemiolog"  
 [9] "likelihoodbas" "prevent"       "identifi"      "nutrit"       
[13] "hour"          "cohort"        "serniparametr" "valid"        
[17] "adjust"        "led"           "factor"        "mortal"       
[21] "deliveri"      "pathway"       "longterm"      "exposur"      
[25] "morbid"        "typic"         "odd"           "food"         
[29] "infant"       

[[28]]
[1] "disabl"  "assumpt" "health"  "report"  "debat"  

[[29]]
[1] "studi"      "errorpron"  "heart"      "covari"     "baselin"   
[6] "framingham" "hazard"    

[[30]]
[1] "nonnorm"

[[31]]
[1] "polynomi" "local"   

[[32]]
[1] "gee"     "equat"   "correl"  "binari"  "general" "work"   

[[33]]
[1] "trim"   "robust" "depth" 

[[34]]
[1] "secondord"

[[35]]
[1] "survivor"

[[36]]
[1] "equat" "estim"

[[37]]
[1] "wavelet"    "besov"      "adapt"      "minimax"    "deconvolut"
[6] "ball"       "function"   "rang"       "rate"      

[[38]]
[1] "volatil"   "highfrequ" "asset"     "financi"   "price"     "matrix"   
[7] "lowfrequ" 

[[39]]
 [1] "dirichlet" "process"   "mixtur"    "hierarch"  "prior"     "tie"      
 [7] "number"    "contain"   "experienc" "discret"   "heavili"   "priori"   

[[40]]
[1] "wild"      "bootstrap" "seri"      "depend"    "irregular" "resampl"  

[[41]]
[1] "toxic"    "dose"     "trial"    "dosefind" "phase"    "probabl"  "clinic"  
[8] "target"   "design"  

[[42]]
[1] "drift"   "diffus"  "process"

[[43]]
[1] "slice"   "invers"  "dimens"  "method"  "regress"

[[44]]
[1] "coverag" "confid"  "interv" 

[[45]]
 [1] "wishart"  "famili"   "graph"    "cone"     "paramet"  "conjug"  
 [7] "prior"    "shape"    "graphic"  "matric"   "covari"   "gaussian"
[13] "decompos" "invers"   "homogen"  "dimens"   "ann"      "type"    
[19] "definit"  "posit"   

[[46]]
[1] "mutual" "empir"  "genet"  "pair"  

[[47]]
[1] "chi"       "test"      "distribut"

[[48]]
[1] "garch"   "process" "seri"   

[[49]]
[1] "varianc" "estim"  

[[50]]
[1] "mestim" "robust"

[[51]]
[1] "densiti"   "anisotrop" "unbound"   "novelti"  

[[52]]
[1] "reweight"

[[53]]
[1] "maximum"    "likelihood"

[[54]]
[1] "function"   "eigenfunct" "random"     "analysi"    "compon"    
[6] "data"       "princip"   

[[55]]
[1] "forecast"    "predict"     "wind"        "weather"     "probabilist"
[6] "calibr"      "northwest"   "speed"      

[[56]]
[1] "tabl"      "conting"   "loglinear"

[[57]]
[1] "aic"       "select"    "criterion" "bic"       "akaik"    

[[58]]
[1] "pollut"   "air"      "mortal"   "nation"   "confound" "trend"    "unmeasur"
[8] "sensit"   "coeffici"

[[59]]
 [1] "motif"      "cluster"    "gene"       "transcript" "bind"      
 [6] "factor"     "regul"      "sequenc"    "protein"    "discoveri" 
[11] "conserv"    "dna"        "nucleotid"  "call"       "dirichlet" 
[16] "process"    "short"      "pattern"    "vari"       "databas"   

[[60]]
 [1] "claim"  "insur"  "vehicl" "age"    "damag"  "type"   "year"   "turn"  
 [9] "detail" "experi" "sever"  "record"

[[61]]
[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian" 

[[62]]
 [1] "treatment"  "random"     "depress"    "care"       "trial"     
 [6] "patient"    "outcom"     "adher"      "subject"    "noncompli" 
[11] "effect"     "intervent"  "receiv"     "complianc"  "assumpt"   
[16] "primari"    "improv"     "causal"     "latent"     "elder"     
[21] "health"     "meet"       "longitudin"

[[63]]
 [1] "tau"      "yield"    "factor"   "month"    "appear"   "price"   
 [7] "curv"     "output"   "econom"   "consider" "current"  "fit"     

[[64]]
[1] "vaccin"    "infect"    "individu"  "outcom"    "causal"    "transmiss"

[[65]]
[1] "week"     "count"    "outbreak" "presenc"  "detect"   "fit"      "diseas"  
[8] "isol"     "resist"  

[[66]]
[1] "quantil" "regress"

[[67]]
[1] "spacetim" "site"     "time"     "tempor"   "spatial" 

[[68]]
[1] "nonneg"

[[69]]
[1] "servic"  "care"    "provid"  "patient" "health" 

[[70]]
[1] "sobolev" "densiti" "minimax"

[[71]]
[1] "distort"   "respons"   "predictor"

[[72]]
 [1] "elicit"       "inform"       "question"     "prior"        "psycholog"   
 [6] "respond"      "peopl"        "statistician" "result"       "person"      
[11] "success"      "uncertain"    "issu"         "particip"     "reduc"       
[16] "lack"         "histor"       "repres"       "sens"         "task"        
[21] "answer"      

[[73]]
[1] "loci"         "genet"        "allel"        "popul"        "genom"       
[6] "relationship" "diseas"       "map"          "genotyp"     

[[74]]
[1] "event"  "termin" "recurr" "censor"

[[75]]
[1] "unstabl"   "problemat" "exponenti" "famili"    "discret"   "depend"   

[[76]]
[1] "load"   "factor" "time"  

[[77]]
[1] "cox"     "hazard"  "proport"
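Before thresholding, `get_keywords` rescales each factor so that its largest document loading is 1 and multiplies the word loadings by the same constant. The product of the loadings is unchanged, so this step only fixes the scale on which `log(thresh)` is compared, presumably so that one threshold can be applied to every factor. A quick check on deterministic toy matrices (`L_toy`, `F_toy` are illustrative, not taken from the fits above):

```r
L_toy = matrix((1:20) / 20, nrow = 5)  # 5 documents x 4 factors
F_toy = matrix((1:12) / 4, nrow = 3)   # 3 words x 4 factors

colmax = apply(L_toy, 2, max)
L_norm = t(t(L_toy) / colmax)  # each factor's largest document loading becomes 1
F_norm = t(t(F_toy) * colmax)  # word loadings absorb the same constant

all.equal(L_toy %*% t(F_toy), L_norm %*% t(F_norm))
#> [1] TRUE
apply(L_norm, 2, max)
#> [1] 1 1 1 1
```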
print(get_keywords(fit.nn.s.10, thresh = 1.2)) # no words pass the default threshold (thresh = 2) for this fit
[[1]]
[1] "estim" "model"

[[2]]
[1] "lengthbias" "surviv"     "preval"     "cohort"    

[[3]]
[1] "hazard"  "proport"

[[4]]
[1] "simex"

[[5]]
[1] "memori"  "paramet" "subspac"

[[6]]
[1] "semiparametr" "estim"        "model"       

[[7]]
[1] "meansquar" "predict"   "error"     "small"     "area"     

[[8]]
[1] "lasso"    "select"   "variabl"  "regress"  "coeffici" "adapt"   

[[9]]
[1] "bandwidth" "kernel"   

[[10]]
[1] "jackknif" "squar"    "mix"      "lead"     "area"     "error"   

[[11]]
[1] "matrix"     "choleski"   "covari"     "decomposit" "factor"    

[[12]]
[1] "function"    "singleindex" "compon"      "link"       

[[13]]
[1] "onestep"

[[14]]
[1] "mse"       "predictor" "linear"    "empir"    

[[15]]
[1] "polynomi" "local"    "estim"    "regress" 

[[16]]
[1] "procedur"  "fals"      "control"   "test"      "discoveri" "rate"     
[7] "reject"    "hypothes"  "fdr"      

[[17]]
[1] "polymorph" "genotyp"   "haplotyp"  "snp"      

[[18]]
[1] "gee"     "equat"   "correl"  "binari"  "general"

[[19]]
[1] "equivari"  "depth"     "breakdown" "concept"   "introduc" 

[[20]]
[1] "nonrespons" "imput"      "survey"     "respons"   

[[21]]
[1] "mle"        "likelihood"

[[22]]
[1] "robin"      "miss"       "zhao"       "casecohort" "rotnitzki" 

[[23]]
[1] "vote"   "elect"  "candid"

[[24]]
[1] "sampl"     "survey"    "designbas" "infer"     "weight"    "modelbas" 
[7] "popul"    

[[25]]
[1] "forecast" "wind"     "predict"  "weather" 

[[26]]
[1] "test"      "logrank"   "weight"    "treatment" "formula"   "patient"  
[7] "supremum"  "standard"  "twostag"  

[[27]]
[1] "track"  "replac" "usag"  

[[28]]
[1] "precipit" "spatial" 

[[29]]
[1] "secondord"

[[30]]
character(0)

[[31]]
[1] "twostep"  "submodel"

[[32]]
[1] "design"  "paramet" "effici" 

[[33]]
[1] "timedepend" "covari"     "treatment" 

[[34]]
[1] "survivor"

[[35]]
[1] "miss" "data"

[[36]]
[1] "test"      "null"      "hypothesi"

[[37]]
[1] "densiti" "sobolev"

[[38]]
[1] "trim"   "robust" "depth" 

[[39]]
[1] "substitut" "euclidean"

[[40]]
[1] "empir"      "likelihood" "bartlett"   "adjust"    

[[41]]
[1] "equat" "estim"

[[42]]
[1] "volatil"   "highfrequ" "asset"     "price"    

[[43]]
[1] "statist" "assoc"   "amer"   

[[44]]
[1] "nonneg"

[[45]]
[1] "panel"

[[46]]
[1] "norm"   "matrix"

[[47]]
[1] "popul"      "superpopul"

[[48]]
[1] "homoscedast"

[[49]]
[1] "misspecif"

[[50]]
[1] "file"   "linkag"

[[51]]
[1] "varianc" "estim"  

[[52]]
[1] "adapt"   "besov"   "wavelet" "minimax" "risk"   

[[53]]
[1] "kaplanmei" "quantil"   "surviv"    "censor"   

[[54]]
[1] "axe"    "rotat"  "matric"

[[55]]
[1] "mutual" "empir"  "genet" 

[[56]]
[1] "innov"   "process" "residu" 

[[57]]
character(0)

[[58]]
[1] "monoton"  "function"
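For the pseudocount-10 fit no words passed the default threshold, so `thresh = 1.2` was used. One way to see how the threshold interacts with the scale of the word loadings is to count, for a few candidate thresholds, how many words pass in each factor. A sketch, assuming a normalized word-loading matrix like the internal `Fnorm` of `get_keywords`; `keywords_per_threshold` and `F_demo` are hypothetical.

```r
# Hypothetical helper: rows are candidate thresholds, columns are factors,
# entries are the number of words whose normalized loading exceeds log(threshold).
keywords_per_threshold = function(Fnorm, thresholds = c(1.2, 1.5, 2)) {
  counts = t(sapply(thresholds, function(th) colSums(Fnorm > log(th))))
  rownames(counts) = paste0("thresh=", thresholds)
  counts
}

# Toy normalized word loadings: 3 words x 2 factors (illustrative only).
F_demo = cbind(c(0.05, 0.30, 0.60), c(0.10, 0.15, 0.20))
keywords_per_threshold(F_demo)
```

Here factor 1 has two keywords at `thresh = 1.2` (cutoff about 0.18) but none at the default `thresh = 2` (cutoff about 0.69). Using this on a real fit would require returning `Fnorm` from `get_keywords`, or recomputing it with the same two rescaling lines.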
print(get_keywords(fit.nn.s.1))
[[1]]
[1] "model"  "estim"  "method" "data"  

[[2]]
 [1] "fals"      "procedur"  "control"   "test"      "discoveri" "rate"     
 [7] "reject"    "hypothes"  "fdr"       "multipl"   "pvalu"     "null"     
[13] "number"    "kfwer"    

[[3]]
[1] "test"      "null"      "hypothesi" "distribut"

[[4]]
 [1] "treatment" "trial"     "random"    "assign"    "patient"   "effect"   
 [7] "outcom"    "clinic"    "causal"    "placebo"   "assumpt"  

[[5]]
[1] "surviv" "time"   "hazard" "censor" "failur" "studi" 

[[6]]
[1] "simex"              "measur"             "simulationextrapol"
[4] "error"             

[[7]]
[1] "wilk"

[[8]]
[1] "lasso"    "select"   "variabl"  "regress"  "coeffici"

[[9]]
[1] "rankbas"  "effici"   "asymptot" "rank"    

[[10]]
[1] "nconsist"

[[11]]
[1] "assoc"   "amer"    "statist" "ann"    

[[12]]
[1] "mle"        "likelihood" "maximum"   

[[13]]
[1] "varyingcoeffici"

[[14]]
[1] "semiparametr" "estim"        "model"        "parametr"    

[[15]]
 [1] "adapt"      "wavelet"    "besov"      "minimax"    "ball"      
 [6] "rang"       "threshold"  "risk"       "deconvolut" "nois"      

[[16]]
[1] "memori"

[[17]]
[1] "bandwidth" "kernel"    "local"     "select"   

[[18]]
[1] "forecast"    "predict"     "wind"        "weather"     "spatial"    
[6] "calibr"      "speed"       "meteorolog"  "probabilist"

[[19]]
[1] "choleski"   "matrix"     "covari"     "decomposit" "factor"    
[6] "interpret" 

[[20]]
[1] "mse"       "predictor" "linear"    "error"     "squar"     "empir"    

[[21]]
[1] "depth"   "project"

[[22]]
[1] "singleindex" "function"    "link"        "compon"      "unknown"    

[[23]]
[1] "markov"    "chain"     "mont"      "carlo"     "algorithm"

[[24]]
[1] "penal"      "nonconcav"  "likelihood" "select"     "variabl"   
[6] "oracl"      "penalti"    "regular"   

[[25]]
[1] "jackknif" "mix"      "squar"    "area"     "varianc" 

[[26]]
[1] "homoscedast"   "heteroscedast"

[[27]]
[1] "spline" "smooth"

[[28]]
[1] "survey" "popul"  "sampl" 

[[29]]
[1] "equivari"  "affin"     "matrix"    "introduc"  "breakdown" "concept"  
[7] "scatter"  

[[30]]
[1] "onestep"

[[31]]
[1] "process"    "thin"       "point"      "fit"        "spatial"   
[6] "residu"     "stationari" "intens"    

[[32]]
[1] "nonnorm"

[[33]]
[1] "polynomi" "local"    "regress" 

[[34]]
[1] "gee"     "equat"   "correl"  "general" "binari"  "work"   

[[35]]
[1] "theta"   "paramet"

[[36]]
[1] "robin"     "miss"      "zhao"      "rotnitzki" "effici"   

[[37]]
[1] "mestim" "robust"

[[38]]
[1] "finitesampl"

[[39]]
[1] "sobolev" "densiti" "minimax" "rate"   

[[40]]
[1] "elect" "vote"  "poll" 

[[41]]
[1] "errorpron" "error"    

[[42]]
[1] "panel" "count"

[[43]]
[1] "stock"

[[44]]
[1] "garch"   "process" "volatil"

[[45]]
[1] "secondord"

[[46]]
[1] "equat" "estim"

[[47]]
[1] "slice"   "invers"  "regress" "dimens"  "method" 

[[48]]
[1] "norm"      "matrix"    "rank"      "matric"    "frobenius" "bound"    

[[49]]
[1] "survivor"

[[50]]
[1] "slope"

[[51]]
[1] "chi"  "test"

[[52]]
[1] "varianc"

[[53]]
[1] "function"   "eigenfunct" "analysi"    "random"     "princip"   
[6] "compon"     "data"      

[[54]]
[1] "tabl"    "conting"

[[55]]
[1] "criterion" "akaik"     "select"    "model"    

[[56]]
[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian" 

[[57]]
[1] "neighborhood"

[[58]]
[1] "maximum"    "welldefin"  "posteriori"
print(get_keywords(fit.nn.s.01))
[[1]]
 [1] "model"       "estim"       "data"        "method"      "propos"     
 [6] "studi"       "simul"       "distribut"   "function"    "sampl"      
[11] "paramet"     "approach"    "statist"     "base"        "asymptot"   
[16] "problem"     "general"     "regress"     "analysi"     "test"       
[21] "develop"     "procedur"    "perform"     "illustr"     "condit"     
[26] "set"         "applic"      "observ"      "variabl"     "likelihood" 
[31] "consist"     "time"        "appli"       "covari"      "properti"   
[36] "random"      "comput"      "articl"      "linear"      "case"       
[41] "process"     "infer"       "error"       "select"      "number"     
[46] "effici"      "rate"        "nonparametr" "deriv"       "measur"     
[51] "effect"      "algorithm"   "class"       "paper"       "compar"     
[56] "provid"      "includ"      "depend"     

[[2]]
 [1] "fals"       "control"    "procedur"   "test"       "rate"      
 [6] "discoveri"  "reject"     "hypothes"   "multipl"    "null"      
[11] "pvalu"      "fdr"        "hochberg"   "number"     "stepdown"  
[16] "kfwer"      "familywis"  "error"      "depend"     "proport"   
[21] "benjamini"  "fwer"       "statist"    "fdp"        "soc"       
[26] "divid"      "power"      "roy"        "stepup"     "alpha"     
[31] "deriv"      "abil"       "ser"        "individu"   "detect"    
[36] "gamma"      "total"      "hypothesi"  "conserv"    "toler"     
[41] "attent"     "defin"      "singlestep" "construct"  "fix"       
[46] "simultan"   "probabl"    "independ"   "ann"        "usual"     
[51] "sime"       "improv"     "increas"   

[[3]]
 [1] "treatment"   "random"      "trial"       "patient"     "effect"     
 [6] "assign"      "noncompli"   "assumpt"     "outcom"      "complianc"  
[11] "causal"      "adher"       "depress"     "placebo"     "receiv"     
[16] "care"        "subject"     "clinic"      "intervent"   "drug"       
[21] "arm"         "dose"        "improv"      "primari"     "treat"      
[26] "princip"     "analys"      "latent"      "elder"       "control"    
[31] "sever"       "contrast"    "instrument"  "stratif"     "activ"      
[36] "particip"    "framework"   "prevent"     "potenti"     "physician"  
[41] "benefit"     "infer"       "imperfect"   "children"    "encourag"   
[46] "estimand"    "doserespons"

[[4]]
 [1] "surviv"       "time"         "hazard"       "censor"       "failur"      
 [6] "studi"        "event"        "semiparametr" "proport"      "data"        
[11] "cancer"       "covari"       "estim"        "risk"         "cox"         
[16] "baselin"      "regress"      "cumul"        "illustr"      "rightcensor" 
[21] "consist"      "nonparametr"  "trial"       

[[5]]
[1] "null"      "test"      "hypothesi" "distribut" "altern"    "statist"  
[7] "power"     "asymptot"  "hypothes" 

[[6]]
 [1] "simex"              "simulationextrapol" "measur"            
 [4] "error"              "undersmooth"        "asymptot"          
 [7] "longer"             "accuraci"           "finitesampl"       
[10] "principl"           "bias"               "presenc"           
[13] "selector"           "wang"               "rootn"             

[[7]]
 [1] "wilk"       "ratio"      "phenomenon" "correct"    "relax"     
 [6] "conduct"    "newli"      "unspecifi"  "freedom"    "follow"    
[11] "backfit"    "nuisanc"    "theorem"    "degre"      "chisquar"  
[16] "likelihood" "empir"      "ask"        "hold"      

[[8]]
 [1] "mle"         "maximum"     "likelihood"  "main"        "prove"      
 [6] "asymptot"    "converg"     "limit"       "mles"        "status"     
[11] "rate"        "current"     "brownian"    "behavior"    "motion"     
[16] "estim"       "proof"       "uniqu"       "nonparametr"

[[9]]
 [1] "chain"     "markov"    "mont"      "carlo"     "bayesian"  "algorithm"
 [7] "posterior" "infer"     "prior"     "model"     "mcmc"     

[[10]]
 [1] "lasso"     "select"    "variabl"   "regress"   "coeffici"  "spars"    
 [7] "penalti"   "adapt"     "linear"    "oracl"     "penal"     "problem"  
[13] "sparsiti"  "algorithm" "regular"  

[[11]]
[1] "varyingcoeffici" "nonparametr"     "coeffici"        "linear"         
[5] "longitudin"      "conduct"         "propos"          "vari"           
[9] "regress"        

[[12]]
 [1] "rankbas"      "effici"       "asymptot"     "rank"         "ellipt"      
 [6] "cam"          "class"        "densiti"      "uniform"      "normal"      
[11] "version"      "sign"         "multivari"    "matric"       "symmetri"    
[16] "valid"        "finit"        "scatter"      "ann"          "contour"     
[21] "tradit"       "assumpt"      "sens"         "irrespect"    "rootn"       
[26] "semiparametr" "center"      

[[13]]
 [1] "nconsist" "root"     "reduct"   "dimens"   "exist"    "direct"  
 [7] "central"  "slice"    "exhaust"  "contour"  "ellipt"   "advantag"
[13] "mild"     "strong"   "regress"  "varianc"  "suffici"  "invers"  
[19] "averag"  

[[14]]
 [1] "semiparametr" "estim"        "nonparametr"  "parametr"     "paramet"     
 [6] "model"        "effici"       "asymptot"     "likelihood"   "regress"     
[11] "function"    

[[15]]
 [1] "bandwidth"  "kernel"     "local"      "select"     "smooth"    
 [6] "densiti"    "estim"      "crossvalid" "selector"   "polynomi"  

[[16]]
 [1] "nonconcav"     "penal"         "select"        "oracl"        
 [5] "penalti"       "variabl"       "likelihood"    "regular"      
 [9] "fan"           "challeng"      "nondifferenti" "maxim"        
[13] "sandwich"      "onestep"       "establish"     "concav"       
[17] "broad"         "enjoy"         "employ"        "selector"     
[21] "encourag"      "cost"         

[[17]]
 [1] "penalis"       "newtonraphson" "framingham"    "penalti"      
 [5] "likelihood"    "heart"         "failur"        "carri"        
 [9] "algorithm"     "proper"        "conduct"       "advanc"       
[13] "grow"          "dropout"       "familiar"      "prospect"     

[[18]]
[1] "homoscedast"   "heteroscedast" "varianc"       "transform"    
[5] "famili"        "error"        

[[19]]
[1] "nonnorm"   "normal"    "mix"       "linear"    "exponenti"

[[20]]
[1] "inhomogen"  "intens"     "process"    "spatial"    "point"     
[6] "poisson"    "thin"       "stationari" "function"  

[[21]]
 [1] "seem"           "unrel"          "spline"         "correl"        
 [5] "credit"         "retail"         "neglig"         "nongaussian"   
 [9] "dataadapt"      "vehicl"         "allevi"         "knot"          
[13] "leav"           "reversiblejump" "part"           "genotyp"       
[17] "conveni"        "residu"         "wang"           "withinclust"   

[[22]]
 [1] "memori"        "seri"          "differenc"     "longmemori"   
 [5] "taper"         "frequenc"      "long"          "fraction"     
 [9] "averag"        "depend"        "paramet"       "periodogram"  
[13] "stationari"    "move"          "slowli"        "whittl"       
[17] "eigenvector"   "local"         "nonstationari" "distinct"     
[21] "angl"         

[[23]]
 [1] "distort"         "respons"         "confound"        "predictor"      
 [5] "unobserv"        "under"           "explanatori"     "serum"          
 [9] "adjust"          "magnitud"        "indirect"        "identifi"       
[13] "coeffici"        "factor"          "absent"          "system"         
[17] "alter"           "observ"          "datagener"       "leastsquar"     
[21] "decid"           "straightforward" "generat"         "stepwis"        
[25] "intervent"       "sever"          

[[24]]
[1] "polynomi"    "local"       "regress"     "smooth"      "nonparametr"
[6] "asymptot"   

[[25]]
 [1] "equivari"   "affin"      "introduc"   "depth"      "breakdown" 
 [6] "scatter"    "locat"      "point"      "project"    "robust"    
[11] "concept"    "general"    "multivari"  "function"   "influenc"  
[16] "matrix"     "median"     "definit"    "hyperplan"  "high"      
[21] "heavytail"  "competitor" "fact"       "translat"   "comparison"
[26] "open"      

[[26]]
 [1] "save"      "sir"       "slice"     "averag"    "root"      "invers"   
 [7] "candid"    "reveal"    "theoret"   "reduct"    "comput"    "contrast" 
[13] "recommend"

[[27]]
 [1] "nonrespons" "survey"     "respons"    "imput"      "nonignor"  
 [6] "valu"       "miss"       "respond"    "nation"     "varianc"   
[11] "nonrespond" "weight"     "popul"      "requir"     "bias"      
[16] "probabl"    "unit"       "mechan"     "item"       "adjust"    
[21] "health"     "variabl"    "calibr"     "race"       "domain"    
[26] "handl"      "incom"     

[[28]]
 [1] "taper"    "approxim" "matrix"   "gaussian" "covari"   "spars"   
 [7] "consist"  "oper"     "block"    "norm"     "balanc"   "requir"  
[13] "spatial" 

[[29]]
 [1] "jackknif"  "mix"       "varianc"   "area"      "squar"     "appli"    
 [7] "inconsist" "uncondit"  "replic"    "strata"   

[[30]]
[1] "mestim"  "robust"  "weak"    "yield"   "outlier" "nuisanc"

[[31]]
 [1] "garch"         "process"       "seri"          "volatil"      
 [5] "stationari"    "paper"         "heteroscedast" "condit"       
 [9] "moment"        "autoregress"   "financi"       "local"        
[13] "standard"      "innov"         "sequenc"       "satisfi"      
[17] "move"          "iid"           "time"          "averag"       
[21] "root"          "mont"          "carlo"        

[[32]]
[1] "quantil" "regress"

[[33]]
 [1] "gee"       "equat"     "correl"    "general"   "sandwich"  "binari"   
 [7] "work"      "misspecif" "cluster"   "scientif"  "enhanc"    "effort"   
[13] "equival"   "lead"      "repeat"    "diverg"   

[[34]]
 [1] "popul"      "superpopul" "survey"     "finit"      "boxcox"    
 [6] "modelbas"   "design"     "predict"    "realiz"     "auxiliari" 
[11] "sampl"      "handl"      "twophas"    "revisit"    "mild"      
[16] "benchmark"  "rich"       "life"       "probabl"    "ensur"     

[[35]]
 [1] "claim"     "insur"     "vehicl"    "damag"     "age"       "year"     
 [7] "turn"      "compani"   "detail"    "tail"      "sever"     "coverag"  
[13] "record"    "risk"      "price"     "financi"   "describ"   "major"    
[19] "gender"    "discount"  "logit"     "amount"    "person"    "kind"     
[25] "multinomi" "frequenc"  "justif"    "surpris"   "binomi"    "oil"      
[31] "pointwis"  "split"     "negat"    

[[36]]
[1] "logit"       "finitesampl" "root"        "probit"      "variat"     
[6] "mix"         "fraction"    "multinomi"  

[[37]]
 [1] "expenditur"   "physician"    "servic"       "skew"         "care"        
 [6] "lognorm"      "profil"       "conduct"      "patient"      "person"      
[11] "contribut"    "health"       "randomeffect" "smoke"        "fact"        
[16] "survey"       "manag"        "incur"        "medic"        "debat"       
[21] "custom"       "qualiti"      "topic"        "industri"     "appropri"    
[26] "pulmonari"    "conceptu"     "monitor"      "regard"       "prescrib"    
[31] "subsequ"      "way"          "financi"      "hierarch"     "lung"        
[36] "percentil"    "attribut"     "closedform"  

[[38]]
[1] "confid"    "interv"    "construct" "coverag"   "bootstrap" "region"   

[[39]]
 [1] "singleindex" "unknown"     "link"        "compon"      "equat"      
 [6] "function"    "varianc"     "nonparametr" "beta"        "femal"      
[11] "structur"    "smaller"     "compos"      "vectorvalu"  "eigenfunct" 
[16] "composit"    "econometr"  

[[40]]
[1] "finitesampl" "propos"     

[[41]]
 [1] "wavelet"    "adapt"      "besov"      "minimax"    "ball"      
 [6] "threshold"  "rang"       "nois"       "wide"       "unknown"   
[11] "rate"       "risk"       "bound"      "deconvolut" "smooth"    
[16] "problem"    "function"   "signal"     "white"      "converg"   
[21] "gaussian"   "transform"  "recov"      "densiti"    "shape"     
[26] "view"       "noisi"      "discret"    "nearoptim"  "spars"     
[31] "blur"       "fourier"    "decay"      "upper"      "convolut"  
[36] "invers"    

[[42]]
 [1] "robin"      "miss"       "zhao"       "rotnitzki"  "effici"    
 [6] "weight"     "casecohort" "design"     "invers"     "twophas"   
[11] "cohort"     "random"     "causal"     "outcom"     "biometrika"
[16] "prentic"    "calcul"     "purpos"     "confound"   "lemma"     
[21] "mar"        "exemplifi"  "suit"       "amer"       "assoc"     
[26] "proceed"    "summar"     "cox"        "ser"        "soc"       
[31] "roy"        "iid"        "appear"     "unbias"    

[[43]]
[1] "maximum"    "likelihood" "estim"     

[[44]]
[1] "dimensionreduct" "invers"          "dimens"          "factor"         
[5] "highdimension"   "chisquar"        "reduct"         

[[45]]
[1] "lin"        "addit"      "work"       "carrol"     "bone"      
[6] "transplant" "margin"    

[[46]]
 [1] "withinclust" "cluster"     "correl"      "account"     "hamper"     
 [6] "frequent"    "carri"       "frailti"     "parsimoni"   "abil"       
[11] "birth"       "ill"         "generalis"   "impact"      "intuit"     
[16] "achiev"     

[[47]]
[1] "chi"       "test"      "distribut" "space"     "ratio"     "restrict" 
[7] "statist"  

[[48]]
[1] "coeffici" "regress" 

[[49]]
 [1] "norm"          "matrix"        "frobenius"     "rank"         
 [5] "matric"        "nuclear"       "bound"         "regular"      
 [9] "low"           "optim"         "nonasymptot"   "highdimension"
[13] "convex"        "spars"         "minimax"       "noisi"        
[17] "element"       "minim"         "error"         "singular"     
[21] "setup"         "vector"        "theori"        "precis"       
[25] "autoregress"   "predict"      

[[50]]
 [1] "minimax" "rate"    "densiti" "optim"   "adapt"   "unknown" "estim"  
 [8] "loss"    "converg" "class"   "prove"   "bound"  

[[51]]
[1] "unequ"     "designbas" "survey"    "weight"   

[[52]]
[1] "auxiliari" "survey"    "varianc"   "variabl"   "sampl"     "weight"   
[7] "design"    "calibr"    "popul"    

[[53]]
[1] "variancecovari" "matrix"         "analyz"        

[[54]]
[1] "contamin"    "robust"      "water"       "influenc"    "explanatori"

[[55]]
[1] "bspline" "kernel"  "penal"  

[[56]]
[1] "varianc"  "asymptot"

[[57]]
 [1] "eigenfunct" "function"   "princip"    "compon"     "random"    
 [6] "analysi"    "data"       "smooth"     "eigenvalu"  "deriv"     
[11] "curv"       "spars"      "trajectori" "space"      "score"     

[[58]]
 [1] "forecast"    "predict"     "weather"     "spatial"     "wind"       
 [6] "probabilist" "northwest"   "calibr"      "pacif"       "meteorolog" 
[11] "temperatur"  "speed"       "hour"        "energi"      "atmospher"  
[16] "averag"      "ensembl"     "geostatist"  "futur"       "center"     
[21] "north"       "precipit"    "accur"       "tempor"      "daili"      
[26] "event"       "resourc"     "site"        "american"    "state"      
[31] "sharp"       "spacetim"    "qualiti"     "climat"      "ozon"       
[36] "concentr"    "generat"     "regim"       "transport"   "season"     
[41] "shortterm"   "determinist" "input"      

[[59]]
 [1] "highfrequ" "volatil"   "financi"   "asset"     "price"     "lowfrequ" 
 [7] "exchang"   "nois"      "dynam"     "market"    "matrix"    "stock"    
[13] "period"    "daili"     "realiz"    "pool"      "matric"    "variat"   
[19] "diffus"   

[[60]]
 [1] "earthquak"      "process"        "discrimin"      "seri"          
 [5] "featur"         "explos"         "event"          "time"          
 [9] "form"           "california"     "spectra"        "transform"     
[13] "background"     "extract"        "occurr"         "intens"        
[17] "diverg"         "wavelet"        "step"           "occur"         
[21] "decomposit"     "thin"           "separ"          "basi"          
[25] "multidimension" "spacetim"       "rate"           "poisson"       
[29] "residu"         "spectrum"       "goal"           "rescal"        
[33] "magnitud"       "evolutionari"   "purpos"         "homogen"       

[[61]]
 [1] "climat"      "chang"       "temperatur"  "greenhous"   "global"     
 [6] "earth"       "trend"       "uncertainti" "increas"     "atmospher"  
[11] "northern"    "quantifi"    "reconstruct" "futur"       "separ"      
[16] "tempor"     

[[62]]
 [1] "motif"      "gene"       "sequenc"    "regul"      "transcript"
 [6] "bind"       "dna"        "protein"    "cluster"    "factor"    
[11] "nucleotid"  "discoveri"  "conserv"    "short"      "high"      
[16] "call"       "pattern"    "dirichlet"  "biolog"     "site"      
[21] "process"    "genom"      "mixtur"     "width"      "vari"      
[26] "priori"     "hierarch"   "strategi"   "cell"       "databas"   
[31] "repres"     "organ"      "delet"      "matric"     "similar"   
[36] "gibb"       "switch"     "technolog"  "generat"    "segment"   
[41] "refin"      "aid"        "substant"   "stochast"   "live"      
[46] "group"      "core"       "regulatori"

[[63]]
 [1] "wishart"    "graph"      "cone"       "famili"     "graphic"   
 [6] "matric"     "conjug"     "paramet"    "prior"      "gaussian"  
[11] "covari"     "matrix"     "decompos"   "edg"        "definit"   
[16] "homogen"    "paper"      "shape"      "invers"     "correspond"
[21] "standard"   "ann"        "posit"      "equal"      "space"     
[26] "respect"    "eigenvalu"  "zero"       "sigma"      "dimens"    
[31] "bay"        "chisquar"   "miss"       "form"       "precis"    
[36] "flexibl"    "distinct"   "close"     

[[64]]
 [1] "pca"          "princip"      "compon"       "matrix"       "eigenvector" 
 [6] "analysi"      "eigenvalu"    "reduct"       "dimension"    "set"         
[11] "perturb"      "size"         "transit"      "dimens"       "spike"       
[16] "direct"       "maxim"        "hold"         "popul"        "tool"        
[21] "tree"         "high"         "theorem"      "geometr"      "succeed"     
[26] "sharp"        "logp"         "oil"          "embed"        "evolutionari"

[[65]]
[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian"  "hierarch" 
[7] "posterior" "cluster"  

[[66]]
 [1] "famili"        "subfamili"     "symmetr"       "asymmetr"     
 [5] "skew"          "reparameter"   "discuss"       "transform"    
 [9] "properti"      "explor"        "mise"          "urn"          
[13] "behav"         "generat"       "pursu"         "adequ"        
[17] "distribut"     "adopt"         "emphasi"       "symmetri"     
[21] "map"           "submodel"      "option"        "stateoftheart"
[25] "heavytail"     "superior"      "attract"       "tractabl"     
[29] "place"         "member"        "counterpart"   "spacetim"     

[[67]]
[1] "bar"    "vertic" "cap"    "lambda"

[[68]]
 [1] "integ"      "algebra"    "coher"      "ail"        "ident"     
 [6] "countabl"   "multist"    "system"     "appl"       "finit"     
[11] "classic"    "object"     "ideal"      "grid"       "util"      
[16] "math"       "fewer"      "state"      "call"       "binari"    
[21] "inequ"      "pure"       "geometri"   "comprehens" "alpha"     
[26] "posit"      "socal"      "repres"     "idea"       "complex"   
[31] "probabl"    "yield"      "failur"     "relat"      "type"      

[[69]]
 [1] "car"         "polytop"     "partit"      "height"      "combinatori"
 [6] "mechan"      "rais"        "hierarchi"   "convex"      "need"       
[11] "extrem"      "stein"       "descript"    "meaning"     "discret"    
[16] "object"      "geometr"     "parsimoni"   "oil"         "notion"     
[21] "satisfi"     "character"   "exponenti"   "interpret"   "unusu"      
[26] "maxim"       "neighbor"    "assumpt"     "uniform"     "dramat"     
[31] "class"       "point"       "sure"       

[[70]]
 [1] "paradox"     "prior"       "surrog"      "true"        "bay"        
 [6] "posit"       "criteria"    "frequentist" "jeffrey"     "sign"       
[11] "point"       "avoid"       "causal"      "turn"        "negat"      
[16] "invari"     

[[71]]
 [1] "probab"  "appl"    "proc"    "situat"  "ann"     "shape"   "field"  
 [8] "point"   "gamma"   "univari" "roy"    

[[72]]
 [1] "chart"       "cusum"       "detect"      "shift"       "cumul"      
 [6] "control"     "sum"         "base"        "perform"     "length"     
[11] "refer"       "averag"      "ratio"       "monitor"     "likelihood" 
[16] "convent"     "delta"       "infin"       "articl"      "event"      
[21] "outlier"     "stop"        "alarm"       "changepoint" "small"      

[[73]]
 [1] "twoparamet" "focus"      "famili"     "choos"      "exampl"    
 [6] "basic"      "desir"      "popular"    "express"    "tune"      
[11] "stepup"     "compromis"  "conserv"    "shortcom"   "represent" 
[16] "lifetim"    "priori"     "meaning"    "prefer"     "segment"   
[21] "stepwis"    "convolut"   "feasibl"    "bay"       

[[74]]
 [1] "digit"       "fals"        "alarm"       "imag"        "geometr"    
 [6] "definit"     "expect"      "sequenti"    "minim"       "principl"   
[11] "meaning"     "meet"        "framework"   "kind"        "priori"     
[16] "maxim"       "prove"       "theori"      "contain"     "mathemat"   
[21] "compat"      "align"       "display"     "part"        "occurr"     
[26] "explain"     "basic"       "structur"    "number"      "hidden"     
[31] "stop"        "delay"       "probabilist" "rigor"       "fine"       
[36] "walk"        "chang"       "changepoint" "renew"      

[[75]]
 [1] "manifold"   "space"      "intrins"    "metric"     "shape"     
 [6] "riemannian" "tensor"     "euclidean"  "matric"     "diagnost"  
[11] "geodes"     "develop"    "planar"     "sphere"     "examin"    
[16] "imag"       "perturb"    "human"      "embed"      "gender"    
[21] "medic"      "dimens"     "differenti" "diffus"    

[[76]]
[1] "kendal"  "tau"     "truncat" "copula"  "shape"   "densiti" "symmetr"
[8] "reli"    "angl"   

[[77]]
 [1] "improp"    "proprieti" "posterior" "uniform"   "proper"    "prior"    
 [7] "miss"      "suffici"   "theorem"   "character" "complet"   "carri"    
[13] "examin"    "colon"     "beta"      "dataset"   "cumul"     "tree"     
[19] "glms"     

[[78]]
[1] "ser"     "soc"     "roy"     "stat"    "ann"     "particl" "central"
[8] "util"    "statist"

[[79]]
[1] "iid"   "prove"

[[80]]
 [1] "classifi"        "distancebas"     "centroid"        "classif"        
 [5] "discrimin"       "popul"           "vector"          "distanc"        
 [9] "theoret"         "machin"          "support"         "heavytail"      
[13] "median"          "differ"          "difficulti"      "popular"        
[17] "convent"         "replac"          "componentwis"    "produc"         
[21] "accumul"         "closest"         "varieti"         "truncat"        
[25] "poor"            "entail"          "highdimension"   "insensit"       
[29] "allevi"          "excess"          "problemat"       "today"          
[33] "euclidean"       "encount"         "inconsist"       "caus"           
[37] "suffer"          "nearest"         "counterpart"     "volatil"        
[41] "argument"        "alloc"           "straightforward" "attempt"        
[45] "frequent"        "boundari"        "believ"          "help"           
[49] "case"            "inher"           "neighbour"      

[[81]]
 [1] "administr"      "fda"            "secondari"      "endpoint"      
 [5] "drug"           "efficaci"       "food"           "health"        
 [9] "combin"         "record"         "agent"          "trial"         
[13] "clinic"         "benefit"        "primari"        "adjust"        
[17] "databas"        "prevent"        "path"           "cardiovascular"
[21] "make"           "separ"          "report"         "perspect"      
[25] "decis"          "simplifi"       "safeti"         "maintain"      

[[82]]
 [1] "supremum"    "shift"       "dataset"     "changepoint" "power"      
 [6] "test"        "debat"       "logrank"     "north"       "window"     
[11] "categor"     "record"      "speed"       "wind"        "controversi"
[16] "frequenc"    "elabor"      "opposit"     "pearson"     "discontinu" 
[21] "cumul"       "attribut"    "multinomi"   "bridg"       "mainten"    
[26] "formula"     "conclus"     "rigor"       "appear"      "sum"        
[31] "brownian"    "statist"     "strength"    "chisquar"    "autocovari" 
[36] "sequenc"     "receiv"     

[[83]]
[1] "theta"     "paramet"   "cap"       "distribut" "vector"    "unknown"  
[7] "nuisanc"  

[[84]]
 [1] "genet"       "loci"        "trait"       "diseas"      "quantit"    
 [6] "linkag"      "map"         "allel"       "phenotyp"    "gene"       
[11] "pedigre"     "popul"       "marker"      "associ"      "genotyp"    
[16] "frequenc"    "chromosom"   "locus"       "polymorph"   "genom"      
[21] "complex"     "haplotyp"    "interact"    "casecontrol" "involv"     
[26] "domin"       "individu"   

[[85]]
[1] "goodnessoffit" "test"          "includ"        "residu"       

[[86]]
 [1] "collabor"    "nearest"     "item"        "user"        "consum"     
 [6] "tradit"      "recommend"   "system"      "neighbor"    "filter"     
[11] "frame"       "clear"       "fact"        "contribut"   "forc"       
[16] "grow"        "drive"       "probabilist" "mathemat"    "precis"     
[21] "socal"       "initi"       "deal"        "mild"        "attempt"    
[26] "offer"       "neighbour"   "provid"      "literatur"   "algorithm"  
[31] "sequenti"   

[[87]]
 [1] "selector"    "dantzig"     "lregular"    "extend"      "path"       
 [6] "result"      "bound"       "nonasymptot" "uncertainti" "angl"       
[11] "remark"      "tune"        "entir"       "final"       "question"   
[16] "cost"        "principl"   

[[88]]
 [1] "subtl"    "jin"      "nonzero"  "critic"   "fraction" "boundari"
 [7] "tukey"    "higher"   "signific" "succeed"  "detect"   "normal"  
[13] "region"   "interest" "precis"   "amplitud" "alpha"    "concept" 
[19] "sparsiti" "concern"  "mention"  "high"     "work"     "resolv"  
[25] "nonnul"   "bodi"     "lower"   

[[89]]
 [1] "expert"      "languag"     "uncertainti" "abil"        "learn"      
 [6] "elicit"      "intermitt"   "system"      "natur"       "kind"       
[11] "amount"      "inform"      "peopl"       "mathemat"    "make"       
[16] "histor"      "need"        "content"     "respond"     "grow"       
[21] "happen"     

[[90]]
 [1] "absolut"       "deviat"        "clip"          "smooth"       
 [5] "scad"          "oracl"         "size"          "true"         
 [9] "microarray"    "nonzero"       "dimens"        "fan"          
[13] "highdimension" "identifi"      "sparsiti"      "confirm"      
[17] "slowli"        "larger"       

[[91]]
[1] "size"   "sampl"  "number"

[[92]]
 [1] "seri"        "week"        "time"        "stationari"  "generat"    
 [6] "superposit"  "autoregress" "renew"       "autocovari"  "binomi"     
[11] "day"         "longmemori"  "count"       "predict"     "thin"       
[16] "focus"       "fit"         "contrast"    "consecut"    "integ"      
[21] "simpl"       "poisson"     "short"       "geometr"     "parsimoni"  
[26] "copi"        "bernoulli"   "previous"    "discret"     "electr"     
[31] "daili"       "key"         "differ"      "trial"       "market"     
[36] "margin"      "sequenc"     "forecast"    "load"       

[[93]]
[1] "spectral"   "densiti"    "time"       "seri"       "domain"    
[6] "stationari" "frequenc"  

[[94]]
[1] "tilt"       "exponenti"  "constraint" "employ"    

[[95]]
 [1] "earn"       "person"     "interview"  "employ"     "document"  
 [6] "survey"     "health"     "level"      "census"     "peopl"     
[11] "report"     "incom"      "higher"     "educ"       "feder"     
[16] "sensit"     "preval"     "analys"     "conduct"    "famili"    
[21] "imput"      "year"       "key"        "sourc"      "total"     
[26] "file"       "instrument" "ratio"      "status"     "encourag"  
[31] "nation"     "way"        "subsequ"    "monitor"    "lower"     
[36] "item"       "accept"     "multipli"   "rich"       "violat"    
[41] "previous"  

[[96]]
 [1] "statistician" "polici"       "scienc"       "statist"      "decis"       
 [6] "role"         "today"        "technolog"    "scientif"     "maker"       
[11] "bring"        "challeng"     "scientist"    "inform"       "integr"      
[16] "communic"     "individu"     "increas"      "knowledg"     "polit"       
[21] "live"         "disciplin"    "address"      "social"       "effort"      
[26] "essenti"      "organ"        "solv"         "engin"        "student"     
[31] "opportun"     "impact"       "face"         "grow"         "chang"       
[36] "play"         "govern"       "american"     "countri"      "mathemat"    
[41] "closer"       "centuri"      "modern"       "intern"       "spread"      
[46] "human"        "relev"        "ingredi"      "place"        "public"      
[51] "devic"        "success"      "explor"       "pressur"      "guarante"    
[56] "imposs"       "train"        "view"         "excel"        "presidenti"  
[61] "progress"     "edg"          "way"          "genom"        "support"     
[66] "communiti"    "promot"       "action"       "advanc"       "map"         
[71] "understand"  

[[97]]
 [1] "toxic"      "dose"       "trial"      "dosefind"   "phase"     
 [6] "clinic"     "target"     "design"     "probabl"    "escal"     
[11] "assign"     "patient"    "reassess"   "continu"    "ethic"     
[16] "prespecifi" "common"     "enhanc"     "concern"    "robust"    
[21] "parallel"   "previous"   "overcom"    "coher"      "variant"   
[26] "competit"  

[[98]]
 [1] "elect"      "vote"       "poll"       "evid"       "candid"    
 [6] "presidenti" "count"      "station"    "forecast"   "proport"   
[11] "polit"      "prefer"     "counti"     "record"     "lower"     

[[99]]
 [1] "extrem"      "precipit"    "spatial"     "pareto"      "station"    
 [6] "uncertainti" "climatolog"  "hierarchi"   "exceed"      "threshold"  
[11] "quantif"     "produc"      "return"      "captur"      "region"     
[16] "intens"      "frequenc"    "hierarch"    "plan"        "weather"    
[21] "interpol"    "map"         "purpos"      "binomi"      "coordin"    
[26] "driven"      "geograph"    "daili"       "separ"       "character"  
[31] "fulli"       "latent"      "improv"     

[[100]]
 [1] "enter"     "pursu"     "project"   "preced"    "phase"     "maker"    
 [7] "schedul"   "resourc"   "decis"     "minim"     "concret"   "perfect"  
[13] "divid"     "total"     "strategi"  "alloc"     "expect"    "face"     
[19] "generat"   "manag"     "chosen"    "state"     "formul"    "unknown"  
[25] "point"     "exampl"    "breakdown" "unit"     

[[101]]
 [1] "polya"       "appreci"     "tree"        "cancer"      "surveil"    
 [6] "spatial"     "sophist"     "epidemiolog" "unrealist"   "institut"   
[11] "offer"       "program"     "fulli"       "analyt"      "nation"     
[16] "flexibl"     "lattic"      "compet"      "orient"      "feasibl"    
[21] "impos"       "obtain"      "aspect"      "remain"      "timetoev"   
[26] "breast"      "ignor"       "urn"         "mixtur"      "advantag"   
[31] "framework"   "featur"     

[[102]]
 [1] "delay"         "combin"        "issu"          "activ"        
 [5] "unit"          "year"          "monitor"       "program"      
 [9] "incid"         "concern"       "major"         "servic"       
[13] "surveil"       "develop"       "registri"      "populationbas"
[17] "trend"         "reason"       

[[103]]
[1] "laplac"    "approxim"  "posterior" "integr"    "mode"     

[[104]]
[1] "subjectspecif"    "random"           "longitudin"       "correl"          
[5] "populationaverag" "latent"           "logist"           "followup"        

[[105]]
 [1] "underestim"    "overestim"     "lemma"         "abrupt"       
 [5] "respect"       "admit"         "stein"         "identif"      
 [9] "moder"         "satisfi"       "nontrivi"      "impli"        
[13] "detail"        "deviat"        "loglikelihood" "benchmark"    
[17] "moment"        "nest"          "yield"         "exponenti"    
[21] "decay"         "deal"          "difficulti"    "mild"         
[25] "posit"         "relat"         "version"       "prove"        

[[106]]
 [1] "retail"    "custom"    "compani"   "deliveri"  "consum"    "tradit"   
 [7] "onlin"     "tail"      "quantiti"  "frequenc"  "market"    "total"    
[13] "joint"     "differ"    "firm"      "articl"    "cost"      "week"     
[19] "daili"     "translat"  "tie"       "decis"     "intend"    "household"
[25] "prevent"   "bivari"    "activ"     "aid"       "simpli"    "accur"    
[31] "forecast"  "compon"    "element"   "commerci"  "success"   "bank"     
[37] "incur"     "period"    "center"    "repres"    "arriv"     "frequent" 
[43] "organ"     "concern"   "impact"    "descript" 

[[107]]
[1] "oneparamet" "famili"     "normal"     "general"    "exponenti" 
[6] "detect"     "binomi"    

[[108]]
 [1] "intersect"  "close"      "hypothes"   "familywis"  "bonferroni"
 [6] "logic"      "critic"     "requir"     "elementari" "multipl"   
[11] "monoton"    "holm"       "valu"       "principl"  
print(get_keywords(fit.nn.s.001))
[[1]]
  [1] "model"        "estim"        "data"         "method"       "propos"      
  [6] "studi"        "simul"        "distribut"    "function"     "sampl"       
 [11] "base"         "paramet"      "approach"     "statist"      "asymptot"    
 [16] "problem"      "general"      "regress"      "analysi"      "develop"     
 [21] "illustr"      "perform"      "procedur"     "test"         "applic"      
 [26] "condit"       "set"          "observ"       "variabl"      "appli"       
 [31] "consist"      "properti"     "likelihood"   "articl"       "time"        
 [36] "comput"       "covari"       "random"       "case"         "linear"      
 [41] "process"      "infer"        "number"       "error"        "effici"      
 [46] "select"       "rate"         "nonparametr"  "deriv"        "effect"      
 [51] "compar"       "measur"       "includ"       "provid"       "paper"       
 [56] "algorithm"    "class"        "depend"       "normal"       "demonstr"    
 [61] "bayesian"     "larg"         "assumpt"      "probabl"      "approxim"    
 [66] "addit"        "size"         "structur"     "optim"        "varianc"     
 [71] "exist"        "independ"     "construct"    "introduc"     "smooth"      
 [76] "real"         "theoret"      "compon"       "point"        "methodolog"  
 [81] "investig"     "requir"       "predict"      "standard"     "respons"     
 [86] "establish"    "common"       "empir"        "practic"      "converg"     
 [91] "work"         "maximum"      "term"         "discuss"      "combin"      
 [96] "finit"        "framework"    "design"       "parametr"     "multipl"     
[101] "assum"        "form"         "theori"       "simpl"        "carlo"       
[106] "limit"        "mont"         "lead"         "altern"       "numer"       
[111] "improv"       "local"        "involv"       "high"         "identifi"    
[116] "space"        "techniqu"     "prior"        "level"        "multivari"   
[121] "correl"       "fit"          "semiparametr" "increas"      "unknown"     
[126] "bias"         "small"        "exampl"       "order"        "direct"      
[131] "extend"       "defin"        "matrix"       "coeffici"     "dataset"     
[136] "implement"    "weight"       "control"      "densiti"      "markov"      
[141] "extens"       "adapt"        "evalu"        "relat"        "power"       
[146] "consid"       "analyz"       "robust"       "type"         "result"      
[151] "valu"         "assess"       "vector"       "seri"         "factor"      
[156] "popul"       

[[2]]
 [1] "fals"       "control"    "procedur"   "rate"       "test"      
 [6] "discoveri"  "reject"     "hypothes"   "multipl"    "null"      
[11] "pvalu"      "familywis"  "hochberg"   "fdr"        "stepdown"  
[16] "error"      "kfwer"      "number"     "proport"    "benjamini" 
[21] "fwer"       "depend"     "statist"    "soc"        "divid"     
[26] "fdp"        "roy"        "abil"       "ser"        "alpha"     
[31] "deriv"      "individu"   "total"      "stepup"     "detect"    
[36] "toler"      "attent"     "power"      "gamma"      "defin"     
[41] "singlestep" "conserv"    "probabl"    "construct"  "hypothesi" 
[46] "fix"        "ann"        "simultan"   "restrict"   "usual"     
[51] "increas"    "structur"   "contrast"   "prove"      "goal"      
[56] "implicit"   "replac"     "resampl"    "independ"   "sime"      
[61] "holm"       "improv"     "sens"       "configur"   "stat"      
[66] "stringent"  "intersect"  "bonferroni" "der"        "appl"      
[71] "van"        "deal"       "order"     

[[3]]
 [1] "surviv"       "time"         "hazard"       "censor"       "failur"      
 [6] "studi"        "semiparametr" "proport"      "event"        "cancer"      
[11] "covari"       "data"         "estim"        "risk"         "cox"         
[16] "baselin"      "regress"      "cumul"        "illustr"      "consist"     
[21] "rightcensor"  "trial"        "subject"      "analysi"      "nonparametr" 
[26] "simul"        "equat"        "cohort"       "diseas"       "incid"       
[31] "patient"      "clinic"       "cure"         "recurr"       "compet"      
[36] "associ"       "joint"        "followup"     "frailti"      "timevari"    
[41] "bivari"       "margin"       "lengthbias"   "prostat"      "assumpt"     
[46] "coeffici"     "medic"        "breast"       "extens"       "propos"      

[[4]]
 [1] "simex"              "simulationextrapol" "undersmooth"       
 [4] "error"              "measur"             "asymptot"          
 [7] "accuraci"           "longer"             "bias"              
[10] "principl"           "finitesampl"        "selector"          
[13] "bandwidth"          "wang"               "epidemiolog"       
[16] "cook"               "rootn"              "difficulti"        
[19] "presenc"            "nutrit"             "decreas"           
[22] "compar"             "coverag"            "appropri"          
[25] "simul"              "tractabl"           "need"              
[28] "recommend"          "polynomi"           "engin"             
[31] "chisquar"           "scientist"          "errorpron"         

[[5]]
 [1] "wilk"           "ratio"          "phenomenon"     "correct"       
 [5] "relax"          "power"          "conduct"        "null"          
 [9] "newli"          "freedom"        "unspecifi"      "follow"        
[13] "hypothesi"      "degre"          "ask"            "nuisanc"       
[17] "chisquar"       "test"           "theorem"        "hold"          
[21] "backfit"        "attempt"        "admit"          "constant"      
[25] "demonstr"       "rescal"         "biascorrect"    "answer"        
[29] "zhang"          "scientif"       "fan"            "likelihood"    
[33] "withinsubject"  "pitman"         "asymptot"       "side"          
[37] "share"          "contemporari"   "popular"        "variancecovari"
[41] "singleindex"    "save"           "tau"            "kendal"        
[45] "coverag"       

[[6]]
 [1] "mle"         "maximum"     "likelihood"  "main"        "asymptot"   
 [6] "mles"        "prove"       "converg"     "limit"       "status"     
[11] "estim"       "brownian"    "current"     "motion"      "behavior"   
[16] "rate"        "proof"       "uniqu"       "siev"        "nonparametr"
[21] "ann"         "gap"         "drift"       "naiv"        "global"     
[26] "monoton"     "simpler"     "parametr"    "result"      "discuss"    
[31] "ergod"      

[[7]]
 [1] "varyingcoeffici" "nonparametr"     "linear"          "coeffici"       
 [5] "longitudin"      "conduct"         "vari"            "regress"        
 [9] "partial"         "propos"          "simul"           "backfit"        
[13] "thought"         "illustr"         "enjoy"           "fashion"        
[17] "twostep"         "contamin"        "pose"           

[[8]]
 [1] "rankbas"         "asymptot"        "effici"          "rank"           
 [5] "ellipt"          "cam"             "class"           "uniform"        
 [9] "test"            "densiti"         "version"         "multivari"      
[13] "normal"          "sign"            "valid"           "scatter"        
[17] "symmetri"        "matrix"          "matric"          "assumpt"        
[21] "finit"           "sens"            "ann"             "contour"        
[25] "irrespect"       "tradit"          "rootn"           "moment"         
[29] "actual"          "center"          "strict"          "equivari"       
[33] "gaussian"        "onestep"         "invari"          "finitesampl"    
[37] "concept"         "local"           "serial"          "bernoulli"      
[41] "shape"           "unspecifi"       "classic"         "acceler"        
[45] "respect"         "semiparametr"    "depth"           "null"           
[49] "univari"         "median"          "prespecifi"      "spheric"        
[53] "biometrika"      "distributionfre" "excel"          

[[9]]
 [1] "nconsist"   "root"       "reduct"     "exist"      "central"   
 [6] "direct"     "dimens"     "varianc"    "slice"      "exhaust"   
[11] "contour"    "mild"       "ellipt"     "strong"     "advantag"  
[16] "invers"     "averag"     "asymptot"   "suffici"    "predictor" 
[21] "regress"    "identif"    "subspac"    "guarante"   "space"     
[26] "attack"     "accuraci"   "span"       "plugin"     "synthes"   
[31] "digit"      "squar"      "complement" "normal"     "eas"       
[36] "variat"     "landmark"   "realdata"  

[[10]]
 [1] "null"      "test"      "hypothesi" "distribut" "altern"    "statist"  
 [7] "hypothes"  "power"     "asymptot"  "procedur"  "ratio"     "reject"   
[13] "control"  

[[11]]
 [1] "chain"     "markov"    "mont"      "carlo"     "bayesian"  "posterior"
 [7] "algorithm" "infer"     "prior"     "mcmc"      "model"     "hierarch" 
[13] "sampler"   "mixtur"    "space"    

[[12]]
 [1] "lasso"         "select"        "variabl"       "regress"      
 [5] "coeffici"      "spars"         "penalti"       "adapt"        
 [9] "linear"        "oracl"         "penal"         "sparsiti"     
[13] "problem"       "algorithm"     "regular"       "matrix"       
[17] "nonzero"       "path"          "shrinkag"      "vector"       
[21] "larger"        "absolut"       "high"          "highdimension"
[25] "true"          "method"        "group"         "dimension"    
[29] "nois"          "connect"      

[[13]]
[1] "bar"     "vertic"  "cap"     "lambda"  "beta"    "theta"   "alpha"  
[8] "element"

[[14]]
 [1] "singleindex"  "unknown"      "nonparametr"  "link"         "compon"      
 [6] "equat"        "structur"     "varianc"      "beta"         "smaller"     
[11] "function"     "semiparametr" "econometr"    "achiev"       "femal"       
[16] "compos"       "vectorvalu"   "linear"       "eigenfunct"   "rateoptim"   
[21] "composit"     "isol"         "ball"         "singl"       

[[15]]
 [1] "genet"       "trait"       "loci"        "quantit"     "diseas"     
 [6] "linkag"      "map"         "gene"        "phenotyp"    "pedigre"    
[11] "allel"       "marker"      "popul"       "associ"      "genotyp"    
[16] "locus"       "chromosom"   "frequenc"    "polymorph"   "genom"      
[21] "multipl"     "complex"     "involv"      "domin"       "interact"   
[26] "casecontrol" "haplotyp"    "treat"       "individu"    "nucleotid"  
[31] "unifi"       "singl"       "simultan"    "snp"         "inherit"    
[36] "geneenviron" "distinguish" "suscept"     "dichotom"    "score"      
[41] "mutat"       "aim"         "genomewid"   "member"      "dna"        
[46] "ascertain"   "parent"      "descent"     "crucial"     "arbitrari"  
[51] "retrospect"  "tau"         "softwar"    

[[16]]
 [1] "dichotom"        "outcom"          "exposur"         "genet"          
 [5] "inherit"         "confound"        "interact"        "causal"         
 [9] "trial"           "factor"          "binari"          "presenc"        
[13] "categor"         "assess"          "alcohol"         "continu"        
[17] "disord"          "misspecif"       "ordin"           "clinic"         
[21] "postul"          "trait"           "topic"           "environment"    
[25] "subgroup"        "potenti"         "geneenviron"     "alter"          
[29] "adequ"           "examin"          "adjust"          "intermedi"      
[33] "cancer"          "robin"           "stage"           "logist"         
[37] "arm"             "firststag"       "generic"         "latent"         
[41] "build"           "variabl"         "conduct"         "affect"         
[45] "accommod"        "prone"           "submodel"        "transmiss"      
[49] "mental"          "mediat"          "unspecifi"       "quantit"        
[53] "expos"           "major"           "multipli"        "sever"          
[57] "believ"          "gene"            "zhang"           "distributionfre"
[61] "routin"          "today"          

[[17]]
 [1] "treatment"     "random"        "trial"         "noncompli"    
 [5] "patient"       "assumpt"       "effect"        "adher"        
 [9] "complianc"     "assign"        "depress"       "outcom"       
[13] "causal"        "receiv"        "care"          "placebo"      
[17] "subject"       "intervent"     "clinic"        "improv"       
[21] "primari"       "drug"          "arm"           "treat"        
[25] "dose"          "elder"         "latent"        "princip"      
[29] "analys"        "contrast"      "sever"         "instrument"   
[33] "control"       "particip"      "stratif"       "benefit"      
[37] "physician"     "imperfect"     "encourag"      "prevent"      
[41] "fisher"        "strata"        "prescrib"      "children"     
[45] "activ"         "reason"        "strict"        "rubin"        
[49] "efron"         "behavior"      "educ"          "estimand"     
[53] "plausibl"      "doserespons"   "meet"          "suffer"       
[57] "protocol"      "framework"     "collabor"      "debat"        
[61] "doubleblind"   "potenti"       "blind"         "status"       
[65] "opposit"       "guidelin"      "logic"         "acknowledg"   
[69] "nonrandom"     "import"        "substanti"     "infer"        
[73] "prospect"      "summar"        "heart"         "childhood"    
[77] "subjectspecif" "access"       

[[18]]
 [1] "nonconcav"     "penal"         "select"        "penalti"      
 [5] "oracl"         "variabl"       "regular"       "nondifferenti"
 [9] "fan"           "likelihood"    "challeng"      "sandwich"     
[13] "establish"     "maxim"         "broad"         "find"         
[17] "concav"        "onestep"       "employ"        "encourag"     
[21] "enjoy"         "finit"         "cost"          "distinguish"  
[25] "dramat"        "selector"      "appropri"      "render"       
[29] "conduct"       "heavili"       "possess"       "newli"        
[33] "converg"       "paramet"       "function"      "discontinu"   
[37] "aic"           "algorithm"     "bic"           "encompass"    
[41] "guarante"      "object"        "metropoli"    

[[19]]
 [1] "semiparametr" "estim"        "parametr"     "nonparametr"  "paramet"     
 [6] "asymptot"     "model"        "effici"       "likelihood"   "regress"     
[11] "function"     "normal"       "simul"        "compon"       "achiev"      

[[20]]
 [1] "bandwidth"  "kernel"     "local"      "select"     "smooth"    
 [6] "densiti"    "estim"      "crossvalid" "selector"   "polynomi"  
[11] "choic"      "choos"      "squar"      "bootstrap"  "datadriven"
[16] "version"    "asymptot"   "global"     "chosen"    

[[21]]
 [1] "virus"        "human"        "immunodefici" "hiv"          "infect"      
 [6] "viral"        "transmiss"    "vaccin"       "subject"      "genet"       
[11] "drug"         "develop"      "efficaci"     "mutat"        "outcom"      
[16] "causal"       "cell"         "syndrom"      "medic"        "pathway"     
[21] "resist"       "evolutionari" "therapi"      "pressur"     

[[22]]
 [1] "dropout"       "stratum"       "prevent"       "reduc"        
 [5] "oil"           "trial"         "adjust"        "longitudin"   
 [9] "cancer"        "prostat"       "mechan"        "men"          
[13] "find"          "stratifi"      "arm"           "nuisanc"      
[17] "treatment"     "assign"        "grade"         "doubleblind"  
[21] "avoid"         "colleagu"      "randomeffect"  "sever"        
[25] "verif"         "agent"         "conjectur"     "annual"       
[29] "nonignor"      "placebo"       "volum"         "elect"        
[33] "caus"          "daili"         "visit"         "preval"       
[37] "absolut"       "lie"           "indic"         "sensit"       
[41] "frequent"      "particip"      "year"          "reduct"       
[45] "causal"        "report"        "newtonraphson" "adopt"        
[49] "question"      "women"         "elder"         "surrog"       
[53] "inform"        "elicit"        "prospect"      "collabor"     
[57] "drawn"         "ignor"         "differ"        "link"         
[61] "retain"        "tilt"          "random"        "constraint"   
[65] "status"        "impli"         "doubli"        "expert"       
[69] "nonidentifi"   "intermitt"     "satur"         "sex"          
[73] "characterist"  "invers"       

[[23]]
  [1] "polici"       "statistician" "maker"        "decis"        "scienc"      
  [6] "role"         "technolog"    "today"        "chang"        "live"        
 [11] "bring"        "social"       "communic"     "integr"       "individu"    
 [16] "futur"        "knowledg"     "disciplin"    "nation"       "public"      
 [21] "scientif"     "health"       "activ"        "human"        "impact"      
 [26] "organ"        "inform"       "protect"      "promot"       "qualiti"     
 [31] "understand"   "program"      "way"          "student"      "mathemat"    
 [36] "increas"      "face"         "foundat"      "play"         "essenti"     
 [41] "uncertainti"  "effort"       "engin"        "expect"       "advanc"      
 [46] "confidenti"   "children"     "relev"        "make"         "industri"    
 [51] "govern"       "countri"      "encourag"     "polit"        "place"       
 [56] "modern"       "intern"       "scientist"    "closer"       "benefit"     
 [61] "reflect"      "explor"       "stronger"     "purpos"       "univers"     
 [66] "spread"       "environment"  "network"      "grow"         "forc"        
 [71] "access"       "devic"        "ingredi"      "excel"        "comprehens"  
 [76] "pollut"       "attract"      "broader"      "elementari"   "evolv"       
 [81] "train"        "pressur"      "air"          "option"       "imposs"      
 [86] "secondari"    "map"          "edg"          "success"      "progress"    
 [91] "critic"       "global"       "action"       "year"         "agenc"       
 [96] "communiti"    "american"     "quantit"      "genom"        "system"      
[101] "fundament"    "discoveri"    "evid"         "guarante"     "mortal"      
[106] "address"      "citi"         "requir"       "technic"      "serv"        
[111] "path"         "statist"      "separ"        "climat"       "contribut"   
[116] "opportun"     "adequaci"     "disabl"       "affect"       "driven"      
[121] "grade"        "psycholog"    "diagnost"     "morbid"       "view"        
[126] "delay"        "primari"      "state"       

[[24]]
 [1] "penalis"       "framingham"    "newtonraphson" "heart"        
 [5] "penalti"       "carri"         "conduct"       "failur"       
 [9] "proper"        "advanc"        "costeffect"    "grow"         
[13] "dataset"       "familiar"      "longterm"      "likelihood"   
[17] "prospect"      "assess"        "choleski"      "extens"       
[21] "disabl"        "wang"         

[[25]]
 [1] "nonnorm"         "normal"          "mix"             "linear"         
 [5] "exponenti"       "piecewiselinear" "general"         "abund"          
 [9] "famili"          "examin"         

[[26]]
 [1] "seem"           "unrel"          "spline"         "retail"        
 [5] "credit"         "vehicl"         "dataadapt"      "correl"        
 [9] "knot"           "residu"         "conveni"        "nongaussian"   
[13] "univari"        "allevi"         "leav"           "reversiblejump"
[17] "part"           "neglig"         "difficulti"     "smooth"        
[21] "latent"         "sampler"        "compani"        "abil"          
[25] "wang"           "withinclust"    "smallest"       "consum"        

[[27]]
 [1] "slice"     "invers"    "dimens"    "reduct"    "regress"   "averag"   
 [7] "sir"       "direct"    "central"   "goal"      "respons"   "save"     
[13] "subset"    "method"    "predictor" "subspac"   "varianc"   "preserv"  
[19] "replac"    "suffici"   "systemat" 

[[28]]
 [1] "homoscedast"   "heteroscedast" "varianc"       "transform"    
 [5] "famili"        "multiscal"     "quadrat"       "respect"      
 [9] "poisson"       "regress"       "epidemiolog"   "stabil"       
[13] "wavelet"       "explain"       "contribut"    

[[29]]
 [1] "band"       "confid"     "simultan"   "consid"     "trajectori"
 [6] "extend"     "choos"      "regular"    "asymptot"   "ball"      
[11] "uniform"   

[[30]]
 [1] "administr"      "secondari"      "fda"            "food"          
 [5] "endpoint"       "drug"           "efficaci"       "health"        
 [9] "adjust"         "prevent"        "record"         "separ"         
[13] "agent"          "cardiovascular" "primari"        "instrument"    
[17] "simplifi"       "frequenc"       "dose"           "week"          
[21] "maintain"       "databas"        "deliveri"       "clinic"        
[25] "benefit"        "birth"          "path"           "trial"         
[29] "drastic"        "odd"            "guidanc"        "perspect"      
[33] "intersect"      "guid"           "biomark"        "morbid"        
[37] "emerg"          "fwer"           "serniparametr"  "hour"          
[41] "make"           "stepwis"        "safeti"         "led"           
[45] "nutrit"         "decis"          "describ"        "errorpron"     
[49] "infant"         "serum"          "exemplifi"      "insight"       
[53] "feder"          "advers"         "prospect"       "valid"         
[57] "follow"         "likelihoodbas"  "energi"         "combin"        

[[31]]
 [1] "distort"         "respons"         "unobserv"        "confound"       
 [5] "predictor"       "under"           "adjust"          "serum"          
 [9] "factor"          "magnitud"        "generat"         "alter"          
[13] "intens"          "absent"          "explanatori"     "indirect"       
[17] "likelihoodbas"   "straightforward" "multipl"         "datagener"      
[21] "leastsquar"      "identifi"        "decid"           "stepwis"        
[25] "observ"          "intervent"       "sever"           "relationship"   
[29] "recov"           "system"          "car"             "coeffici"       
[33] "census"          "releas"          "agenc"           "closest"        
[37] "electr"          "shortcom"        "analyst"        

[[32]]
 [1] "motif"       "regul"       "gene"        "dna"         "transcript" 
 [6] "bind"        "sequenc"     "protein"     "factor"      "short"      
[11] "conserv"     "discoveri"   "nucleotid"   "cluster"     "biolog"     
[16] "high"        "site"        "mixtur"      "process"     "call"       
[21] "width"       "genom"       "vari"        "hierarch"    "dirichlet"  
[26] "pattern"     "priori"      "cell"        "strategi"    "organ"      
[31] "databas"     "matric"      "group"       "technolog"   "repres"     
[36] "stochast"    "refin"       "switch"      "substant"    "segment"    
[41] "aid"         "delet"       "similar"     "gibb"        "reduct"     
[46] "regulatori"  "express"     "core"        "find"        "live"       
[51] "yeast"       "composit"    "dictionari"  "accompani"   "appear"     
[56] "missingdata" "genomewid"   "generat"     "principl"    "facilit"    
[61] "recurs"      "background"  "specif"      "chromosom"   "address"    
[66] "wish"        "cycl"        "name"        "understand"  "adjac"      
[71] "variabl"    

[[33]]
[1] "absolut"  "deviat"   "clip"     "oracl"    "progress"

[[34]]
[1] "quantil" "regress"

[[35]]
 [1] "breakdown"  "point"      "robust"     "depth"      "locat"     
 [6] "project"    "equivari"   "finit"      "function"   "possess"   
[11] "contamin"   "competitor" "affin"      "definit"    "introduc"  
[16] "lead"       "induc"      "influenc"   "high"       "outlier"   
[21] "strong"     "trim"       "median"     "region"     "york"      
[26] "scale"      "desir"      "favor"      "turn"       "pursu"     
[31] "enjoy"      "scatter"    "suffic"     "behav"      "uniform"   
[36] "relat"      "comparison" "suggest"    "fact"       "univari"   
[41] "ann"        "radius"    

[[36]]
 [1] "memori"        "seri"          "differenc"     "longmemori"   
 [5] "frequenc"      "long"          "taper"         "fraction"     
 [9] "averag"        "stationari"    "depend"        "periodogram"  
[13] "move"          "whittl"        "slowli"        "nonstationari"
[17] "local"         "process"       "eigenvector"   "angl"         
[21] "paramet"       "period"        "short"         "univari"      
[25] "distinct"      "autoregress"   "volatil"       "fourier"      
[29] "infin"         "longrang"      "delta"         "residu"       
[33] "trim"          "raw"           "log"           "question"     
[37] "break"         "stress"        "know"          "gamma"        
[41] "serniparametr" "subspac"      

[[37]]
 [1] "auxiliari" "survey"    "varianc"   "design"    "popul"     "sampl"    
 [7] "variabl"   "weight"    "calibr"    "designbas" "probabl"   "servic"   
[13] "total"     "finit"     "work"      "feasibl"   "explain"   "miss"     

[[38]]
 [1] "lin"           "addit"         "transplant"    "bone"         
 [5] "work"          "carrol"        "registri"      "intern"       
 [9] "termin"        "multist"       "complic"       "serv"         
[13] "progress"      "transit"       "death"         "domin"        
[17] "backfit"       "implicit"      "largesampl"    "longer"       
[21] "inconsist"     "withinsubject" "withinclust"   "margin"       

[[39]]
 [1] "taper"       "approxim"    "matrix"      "gaussian"    "consist"    
 [6] "spars"       "oper"        "spatial"     "covari"      "block"      
[11] "requir"      "balanc"      "norm"        "precipit"    "station"    
[16] "weather"     "technic"     "manipul"     "matern"      "infeas"     
[21] "multipli"    "wild"        "simpli"      "eigenvector" "sever"      
[26] "onestep"     "resampl"     "oil"         "lose"        "expans"     
[31] "finitesampl" "emphasi"    

[[40]]
[1] "finitesampl" "propos"      "properti"    "simul"      

[[41]]
 [1] "wavelet"     "adapt"       "besov"       "minimax"     "threshold"  
 [6] "rang"        "ball"        "nois"        "wide"        "rate"       
[11] "unknown"     "smooth"      "risk"        "bound"       "function"   
[16] "deconvolut"  "problem"     "white"       "converg"     "signal"     
[21] "recov"       "gaussian"    "transform"   "noisi"       "view"       
[26] "blur"        "discret"     "shape"       "invers"      "spars"      
[31] "densiti"     "nearoptim"   "convolut"    "fourier"     "upper"      
[36] "decay"       "chosen"      "block"       "basi"        "dens"       
[41] "attain"      "waveletbas"  "continu"     "mathemat"    "counterpart"
[46] "physic"      "possess"     "lower"       "global"      "achiev"     
[51] "boundari"    "distinct"    "belong"      "domin"       "estim"      
[56] "place"      

[[42]]
 [1] "forecast"      "predict"       "weather"       "northwest"    
 [5] "spatial"       "probabilist"   "pacif"         "calibr"       
 [9] "wind"          "meteorolog"    "hour"          "temperatur"   
[13] "speed"         "atmospher"     "energi"        "north"        
[17] "center"        "geostatist"    "event"         "futur"        
[21] "averag"        "ensembl"       "american"      "tempor"       
[25] "accur"         "resourc"       "precipit"      "daili"        
[29] "state"         "sharp"         "qualiti"       "site"         
[33] "spacetim"      "generat"       "transport"     "concentr"     
[37] "season"        "climat"        "regim"         "shortterm"    
[41] "numer"         "determinist"   "ozon"          "input"        
[45] "climatolog"    "previous"      "output"        "parsimoni"    
[49] "perturb"       "geograph"      "period"        "trend"        
[53] "correl"        "vari"          "break"         "favor"        
[57] "quantit"       "laplac"        "caus"          "merg"         
[61] "safeti"        "station"       "agricultur"    "accumul"      
[65] "oppos"         "benefit"       "vast"          "global"       
[69] "stateoftheart" "featur"        "system"        "activ"        
[73] "dispers"       "simpler"       "decad"         "organ"        
[77] "crossvalid"    "member"       

[[43]]
 [1] "spacetim"       "spatial"        "fit"            "year"          
 [5] "site"           "separ"          "intens"         "california"    
 [9] "thin"           "process"        "monitor"        "residu"        
[13] "tempor"         "activ"          "multidimension" "occurr"        
[17] "space"          "background"     "appear"         "origin"        
[21] "smoother"       "irregular"      "earthquak"      "indic"         
[25] "asymmetr"       "trend"          "hazard"         "spectral"      
[29] "symmetr"        "environment"    "ozon"           "wind"          
[33] "meteorolog"     "daili"          "allow"          "rescal"        
[37] "season"         "time"           "anisotrop"      "cross"         
[41] "insid"          "bear"           "arbitrari"      "autoregress"   
[45] "interact"       "magnitud"       "sequenc"        "homogen"       
[49] "widespread"     "sphere"         "coordin"        "highlight"     
[53] "elabor"         "extrem"         "ascertain"      "forest"        
[57] "counti"         "rotat"          "month"          "threat"        
[61] "govern"         "secondari"      "aic"            "account"       
[65] "aid"            "emphas"         "routin"         "assess"        
[69] "departur"       "rare"          

[[44]]
 [1] "inhomogen"   "intens"      "spatial"     "process"     "poisson"    
 [6] "point"       "thin"        "stationari"  "function"    "firstord"   
[11] "efficaci"    "secondord"   "caus"        "infecti"     "network"    
[16] "infect"      "transmiss"   "respiratori" "environ"     "epidem"     
[21] "unrealist"   "lend"        "syndrom"     "hospit"      "emphasi"    
[26] "unusu"       "paid"        "peak"       

[[45]]
 [1] "garch"         "process"       "seri"          "volatil"      
 [5] "stationari"    "paper"         "heteroscedast" "moment"       
 [9] "autoregress"   "local"         "financi"       "condit"       
[13] "standard"      "move"          "averag"        "sequenc"      
[17] "mont"          "carlo"         "innov"         "satisfi"      
[21] "iid"           "root"          "time"          "forecast"     
[25] "nonstationari" "fourth"        "capabl"        "residu"       
[29] "return"        "rescal"        "exponenti"     "exchang"      
[33] "reparameter"   "arma"          "ergod"         "homogen"      
[37] "simpli"        "normal"        "explain"       "uniqu"        
[41] "exist"        

[[46]]
 [1] "withinclust"   "cluster"       "correl"        "account"      
 [5] "frequent"      "frailti"       "varianc"       "carri"        
 [9] "arbitrari"     "abil"          "achiev"        "hormon"       
[13] "generalis"     "tackl"         "characteris"   "evalu"        
[17] "simplic"       "fashion"       "closedform"    "noninform"    
[21] "hamper"        "intuit"        "dementia"      "birth"        
[25] "errorpron"     "ill"           "copula"        "withinsubject"

[[47]]
[1] "polynomi"    "local"       "smooth"      "regress"     "nonparametr"
[6] "asymptot"    "spline"     

[[48]]
 [1] "elect"        "vote"         "poll"         "presidenti"   "evid"        
 [6] "candid"       "polit"        "count"        "station"      "proport"     
[11] "forecast"     "nonrespons"   "elimin"       "prefer"       "counti"      
[16] "scientist"    "permit"       "lower"        "incom"        "fisher"      
[21] "york"         "record"       "heterogen"    "purpos"       "respond"     
[26] "percentag"    "particip"     "quick"        "transfer"     "week"        
[31] "spatiotempor" "evolut"       "california"   "histor"       "krige"       
[36] "list"         "appar"        "outcom"       "invalid"      "nonignor"    
[41] "publish"      "nonrespond"  

[[49]]
 [1] "survey"      "nonrespons"  "census"      "nation"      "respond"    
 [6] "imput"       "popul"       "health"      "race"        "bureau"     
[11] "nonignor"    "unit"        "respons"     "item"        "incom"      
[16] "miss"        "person"      "year"        "state"       "bias"       
[21] "employ"      "higher"      "valu"        "sensit"      "interview"  
[26] "labor"       "nonrespond"  "age"         "feder"       "collect"    
[31] "measur"      "handl"       "assess"      "report"      "level"      
[36] "counti"      "domain"      "preval"      "agenc"       "confidenti" 
[41] "benchmark"   "incorpor"    "protect"     "status"      "cell"       
[46] "earn"        "produc"      "sourc"       "relat"       "weight"     
[51] "propens"     "public"      "household"   "area"        "geograph"   
[56] "nutrit"      "document"    "lower"       "plan"        "bodi"       
[61] "gender"      "extrapol"    "preliminari" "birth"       "polit"      
[66] "correct"     "american"    "proxi"       "requir"      "previous"   
[71] "children"    "york"        "unemploy"    "death"      

[[50]]
 [1] "jackknif"  "file"      "replic"    "varianc"   "inconsist" "strata"   
 [7] "analyt"    "unbias"    "met"       "domain"    "schedul"   "freedom"  
[13] "survey"    "attain"    "balanc"    "mix"       "ensur"     "public"   
[19] "repeat"    "upper"     "bootstrap" "uncondit"  "plausibl"  "person"   
[25] "pseudo"    "concern"   "linkag"   

[[51]]
[1] "variancecovari"  "matrix"          "analyz"          "respect"        
[5] "quasilikelihood" "criterion"       "coin"            "efron"          

[[52]]
[1] "root"     "squar"    "approxim"

[[53]]
[1] "maximum"    "likelihood" "estim"      "paramet"   

[[54]]
 [1] "pca"           "princip"       "compon"        "matrix"       
 [5] "eigenvector"   "size"          "dimension"     "reduct"       
 [9] "eigenvalu"     "analysi"       "spike"         "perturb"      
[13] "logp"          "succeed"       "transit"       "dimens"       
[17] "maxim"         "highdimension" "set"           "sampl"        
[21] "threshold"     "nonzero"       "oil"           "direct"       
[25] "critic"        "sophist"       "recov"         "hold"         
[29] "sharp"         "larger"        "theorem"       "relax"        
[33] "high"          "diagon"        "overlap"       "domin"        
[37] "success"       "geometr"       "regim"         "tractabl"     
[41] "popul"         "ill"           "behav"         "extract"      
[45] "exhibit"       "support"       "tool"          "crossov"      
[49] "sudden"        "track"         "lose"          "infinit"      
[53] "evolutionari"  "tree"          "complex"       "largest"      
[57] "phenomenon"    "program"       "describ"       "nonasymptot"  
[61] "branch"        "topolog"       "row"           "embed"        
[65] "euclidean"     "geodes"        "anim"          "nois"         
[69] "machin"        "phase"         "speci"         "twoway"       
[73] "rise"         

[[55]]
 [1] "eigenfunct"  "function"    "princip"     "compon"      "random"     
 [6] "analysi"     "smooth"      "eigenvalu"   "data"        "curv"       
[11] "spars"       "space"       "trajectori"  "score"       "noisi"      
[16] "deriv"       "lead"        "sampl"       "longitudin"  "eigenvector"
[21] "expans"      "impact"      "elucid"      "decomposit"  "firstord"   
[26] "repres"      "differenti"  "measur"      "dynam"       "intrins"    
[31] "similar"     "plan"       

[[56]]
 [1] "pathway"       "biolog"        "pattern"       "presenc"      
 [5] "gene"          "latent"        "viral"         "initi"        
 [9] "biomark"       "understand"    "protein"       "pronounc"     
[13] "infect"        "therapi"       "supplementari" "quantifi"     
[17] "concentr"      "chemic"        "tackl"         "incorrect"    
[21] "healthi"       "identifi"      "molecular"     "human"        
[25] "serum"         "hormon"        "investig"      "experiment"   
[29] "search"        "status"        "sort"          "drug"         
[33] "inflat"        "pertin"        "mediat"        "mutat"        
[37] "resist"        "absent"        "blood"         "exemplifi"    
[41] "valuabl"       "phenotyp"      "led"           "indic"        
[45] "subsequ"       "format"        "framework"    

[[57]]
[1] "establish" "asymptot"  "consist"   "converg"  

[[58]]
 [1] "classifi"        "classif"         "discrimin"       "distancebas"    
 [5] "vector"          "centroid"        "support"         "machin"         
 [9] "theoret"         "popul"           "featur"          "rule"           
[13] "poor"            "popular"         "produc"          "distanc"        
[17] "method"          "highdimension"   "accumul"         "varieti"        
[21] "heavytail"       "differ"          "diverg"          "nearest"        
[25] "train"           "median"          "difficulti"      "spectra"        
[29] "componentwis"    "replac"          "excess"          "convent"        
[33] "frequent"        "truncat"         "boundari"        "counterpart"    
[37] "insensit"        "encount"         "closest"         "entail"         
[41] "case"            "allevi"          "problemat"       "today"          
[45] "argument"        "euclidean"       "inconsist"       "caus"           
[49] "straightforward" "neighbour"       "suffer"          "anneal"         
[53] "attempt"         "perform"         "misclassif"      "alloc"          
[57] "volatil"         "believ"          "explor"          "help"           
[61] "inher"           "explos"          "earthquak"       "base"           
[65] "consequ"         "achiev"          "jin"             "kullbackleibl"  
[69] "contemporari"    "construct"       "drawback"        "tstatist"       

[[59]]
 [1] "robin"      "miss"       "zhao"       "rotnitzki"  "effici"    
 [6] "random"     "casecohort" "weight"     "invers"     "twophas"   
[11] "cohort"     "biometrika" "design"     "prentic"    "causal"    
[16] "purpos"     "lemma"      "exemplifi"  "unbias"     "mar"       
[21] "suit"       "amer"       "assoc"      "proceed"    "summar"    
[26] "ser"        "soc"        "roy"        "calcul"     "iid"       
[31] "appear"     "cox"        "imput"      "visit"      "ann"       
[36] "augment"    "percentag"  "schedul"    "direct"     "unbalanc"  
[41] "mediat"     "day"        "embed"      "mental"     "equat"     
[46] "nice"       "month"     

[[60]]
[1] "bootstrap" "confid"    "distribut" "sampl"     "interv"    "method"   
[7] "correct"   "seri"      "empir"    

[[61]]
 [1] "norm"          "matrix"        "frobenius"     "rank"         
 [5] "matric"        "nuclear"       "bound"         "regular"      
 [9] "optim"         "low"           "highdimension" "nonasymptot"  
[13] "convex"        "minimax"       "noisi"         "spars"        
[17] "vector"        "singular"      "element"       "error"        
[21] "minim"         "setup"         "predict"       "autoregress"  
[25] "recoveri"      "theori"        "trace"         "obtain"       
[29] "decomposit"    "class"         "excel"         "mean"         
[33] "lower"         "instanc"       "yield"         "sharp"        
[37] "agreement"     "precis"        "mestim"        "complementari"
[41] "lowdimension"  "entri"         "analyz"        "oper"         
[45] "meansquar"     "relax"         "hold"          "determinist"  
[49] "observ"        "condit"        "autocovari"    "decompos"     
[53] "notion"        "stay"          "restrict"      "stronger"     
[57] "krige"        

[[62]]
 [1] "minimax"  "rate"     "densiti"  "optim"    "unknown"  "adapt"   
 [7] "loss"     "class"    "prove"    "sens"     "converg"  "problem" 
[13] "bound"    "estim"    "risk"     "vector"   "set"      "gaussian"
[19] "lower"   

[[63]]
 [1] "imag"     "magnet"   "reson"    "field"    "brain"    "fmri"    
 [7] "activ"    "voxel"    "signal"   "detect"   "locat"    "volum"   
[13] "accur"    "follow"   "task"     "motion"   "region"   "visual"  
[19] "identifi" "exploit"  "tissu"    "aim"      "contigu"  "map"     
[25] "rotat"    "neuron"  

[[64]]
 [1] "ozon"            "maxima"          "splinebas"       "nonlinear"      
 [5] "piecewiselinear" "concentr"        "pressur"         "cycl"           
 [9] "transport"       "variat"          "contribut"       "atmospher"      
[13] "peak"            "trend"           "measur"          "basi"           
[17] "evid"            "instrument"      "thought"         "greater"        
[21] "link"            "scientif"        "lag"             "dimensionreduct"
[25] "absenc"          "wave"            "global"          "separ"          
[29] "month"           "coincid"         "influenc"        "lowdimension"   
[33] "clear"           "contrast"        "lower"           "year"           
[37] "site"            "qualiti"         "profil"          "sequenc"        
[41] "sensit"          "origin"          "relat"           "presenc"        
[45] "satellit"        "partial"         "pattern"         "identifi"       

[[65]]
 [1] "experienc" "event"     "deterior"  "trial"     "aberr"     "patient"  
 [7] "import"    "die"       "protocol"  "benefici"  "rank"      "treatment"
[13] "mention"   "wilcoxon"  "receiv"    "children"  "aspect"    "consequ"  
[19] "exact"     "preserv"   "fisher"    "placebo"   "sort"      "magnitud" 
[25] "longer"    "medic"     "exposur"   "adequ"     "discard"   "greatest" 
[31] "fact"      "need"      "invert"    "substanti" "subsequ"   "tabl"     
[37] "remov"     "exhibit"   "way"       "basic"     "singl"     "health"   
[43] "aim"       "care"      "interv"    "complet"   "specif"    "sum"      
[49] "question"  "cubic"     "cancer"    "situat"    "extrem"    "splinebas"
[55] "outcom"    "treat"     "rotat"     "control"   "binari"    "effect"   

[[66]]
 [1] "bspline"   "kernel"    "tackl"     "represent" "spline"    "penal"    
 [7] "tempor"    "proceed"   "splinebas" "truncat"   "solut"     "rigor"    
[13] "account"  

[[67]]
 [1] "integ"      "algebra"    "appl"       "ail"        "finit"     
 [6] "classic"    "countabl"   "coher"      "multist"    "util"      
[11] "math"       "ident"      "system"     "call"       "fewer"     
[16] "state"      "ideal"      "grid"       "object"     "binari"    
[21] "posit"      "pure"       "geometri"   "probabl"    "inequ"     
[26] "comprehens" "alpha"      "socal"      "repres"     "idea"      
[31] "failur"     "yield"      "type"       "relat"      "bernoulli" 
[36] "bound"      "probab"     "compon"     "superposit" "complex"   

[[68]]
 [1] "electr"        "forecast"      "renew"         "bivari"       
 [5] "load"          "market"        "daili"         "power"        
 [9] "serial"        "shortterm"     "wind"          "autoregress"  
[13] "diagon"        "speed"         "time"          "season"       
[17] "focus"         "difficult"     "peak"          "spectrum"     
[21] "temperatur"    "regressor"     "heteroscedast" "firstord"     
[25] "total"         "highlight"     "energi"        "justifi"      
[29] "simpl"         "week"          "vari"          "hour"         
[33] "trend"         "citi"          "recogn"        "stationari"   
[37] "autocovari"    "detail"        "promis"        "realiti"      
[41] "favor"         "reveal"        "year"          "longmemori"   
[45] "gain"          "accuraci"      "exploit"       "predict"      
[49] "option"        "reliabl"       "price"         "evolut"       
[53] "avail"         "superpopul"   

[[69]]
 [1] "highfrequ"       "financi"         "asset"           "volatil"        
 [5] "price"           "lowfrequ"        "exchang"         "dynam"          
 [9] "stock"           "matrix"          "daili"           "period"         
[13] "nois"            "realiz"          "pool"            "market"         
[17] "matric"          "infin"           "diffus"          "return"         
[21] "day"             "trade"           "captur"          "forecast"       
[25] "vast"            "overcom"         "variat"          "hundr"          
[29] "pertin"          "dimensionreduct" "econom"          "iii"            
[33] "alloc"           "noisi"           "industri"        "zhang"          
[37] "guidanc"         "merit"           "adequ"           "size"           
[41] "highdimension"   "fan"             "eigenvector"     "option"         
[45] "wavelet"         "built"           "avail"          

[[70]]
 [1] "day"        "daili"      "record"     "time"       "financi"   
 [6] "activ"      "short"      "peak"       "consecut"   "help"      
[11] "autocovari" "appropri"   "intens"     "physic"     "character" 
[16] "measur"     "children"   "trade"      "strength"   "scalar"    
[21] "superposit" "incomplet"  "copi"      

[[71]]
[1] "secondord"   "firstord"    "accur"       "expans"      "unbias"     
[6] "moment"      "approxim"    "frequentist" "exact"      

[[72]]
 [1] "treatment"  "assign"     "causal"     "score"      "outcom"    
 [6] "propens"    "averag"     "effect"     "grade"      "school"    
[11] "potenti"    "stratif"    "promot"     "confound"   "rubin"     
[16] "student"    "unit"       "regim"      "educ"       "adjust"    
[21] "children"   "plausibl"   "polici"     "program"    "evid"      
[26] "pretreat"   "posttreat"  "summar"     "stage"      "child"     
[31] "intermedi"  "assumpt"    "retain"     "multilevel" "block"     
[36] "econom"     "experiment" "stabl"      "arbitrari"  "nation"    
[41] "articl"     "balanc"     "learn"      "perspect"   "status"    
[46] "unmeasur"   "fewer"      "scalar"     "affect"     "low"       
[51] "mathemat"   "track"      "twostag"    "covari"     "tradeoff"  
[56] "recov"      "nonrandom"  "bind"       "pose"       "estimand"  
[61] "impos"      "feasibl"    "return"    

[[73]]
 [1] "extrapol"      "errorpron"     "posttreat"     "instrument"   
 [5] "classic"       "baselin"       "replic"        "subsampl"     
 [9] "nonlinear"     "daili"         "summari"       "air"          
[13] "encount"       "subset"        "bias"          "efficaci"     
[17] "heteroscedast" "frequenc"      "trajectori"    "spheric"      
[21] "supplementari" "correct"       "multiscal"     "scatter"      
[25] "reconstruct"   "subject"       "error"         "temperatur"   

[[74]]
 [1] "admiss"       "inadmiss"     "loss"         "bay"          "risk"        
 [6] "endpoint"     "action"       "ann"          "accept"       "math"        
[11] "genom"        "screen"       "stringent"    "result"       "complet"     
[16] "stepup"       "character"    "formul"       "treat"        "pearson"     
[21] "amer"         "assoc"        "biometrika"   "prototyp"     "vector"      
[26] "pay"          "reject"       "decad"        "revisit"      "metaanalysi" 
[31] "criteria"     "effort"       "bioassay"     "thought"      "hard"        
[36] "psycholog"    "nonneg"       "predetermin"  "fals"         "energi"      
[41] "earlier"      "educ"         "hoc"          "stein"        "emerg"       
[46] "fair"         "dna"          "appeal"       "sign"         "singlestep"  
[51] "drug"         "microarray"   "statistician" "jeffrey"      "year"        
[56] "fewer"        "fisher"       "paper"        "resembl"      "paradox"     
[61] "share"        "twodimension" "nonzero"      "stepdown"     "seek"        
[66] "expect"      

[[75]]
[1] "coeffici" "regress"  "linear"   "vari"    

[[76]]
 [1] "unbound"   "novelti"   "function"  "yield"     "oracl"     "tail"     
 [7] "decreas"   "satisfi"   "anisotrop" "inequ"     "median"    "slower"   
[13] "literatur" "bivari"    "free"      "vast"      "fast"      "input"    
[19] "setup"     "output"    "aggreg"    "aforement" "behav"     "influenti"
[25] "iii"       "bound"     "univers"   "main"      "nuclear"   "radius"   
[31] "need"      "tilt"      "hyperplan" "higherord" "symmetri"  "equivari" 
[37] "gee"       "scatter"   "bin"       "quadrat"  

[[77]]
 [1] "wishart"     "graph"       "cone"        "graphic"     "famili"     
 [6] "matric"      "conjug"      "gaussian"    "matrix"      "covari"     
[11] "decompos"    "prior"       "paramet"     "edg"         "definit"    
[16] "paper"       "homogen"     "space"       "correspond"  "posit"      
[21] "standard"    "shape"       "form"        "ann"         "miss"       
[26] "zero"        "equal"       "eigenvalu"   "dimens"      "close"      
[31] "respect"     "invers"      "sigma"       "chisquar"    "distinct"   
[36] "flexibl"     "margin"      "precis"      "bay"         "undirect"   
[41] "fix"         "refer"       "direct"      "constant"    "acycl"      
[46] "satisfi"     "expect"      "encod"       "entri"       "enrich"     
[51] "accept"      "phi"         "scalabl"     "omega"       "nonhomogen" 
[56] "probab"      "euclidean"   "dual"        "read"        "restrict"   
[61] "centr"       "characteris" "deep"        "tangent"     "fourth"     
[66] "perfect"    

[[78]]
 [1] "schedul"         "longitudin"      "followup"        "analys"         
 [5] "phase"           "generat"         "incomplet"       "flexibl"        
 [9] "respons"         "avail"           "ill"             "unbalanc"       
[13] "pursu"           "offer"           "enter"           "resourc"        
[17] "impact"          "merg"            "concret"         "intermitt"      
[21] "interim"         "preced"          "perfect"         "divid"          
[25] "maker"           "face"            "preliminari"     "fluctuat"       
[29] "missingatrandom" "versatil"        "alloc"           "timetoev"       
[33] "withinsubject"   "compromis"       "manag"           "metropoli"      
[37] "missingdata"     "walk"            "logrank"        

[[79]]
[1] "real"    "simul"   "data"    "illustr"

[[80]]
 [1] "underestim"    "overestim"     "lemma"         "abrupt"       
 [5] "respect"       "identif"       "stein"         "admit"        
 [9] "moder"         "satisfi"       "moment"        "nontrivi"     
[13] "iid"           "impli"         "detail"        "deviat"       
[17] "decay"         "loglikelihood" "benchmark"     "nest"         
[21] "yield"         "posit"         "deal"          "difficulti"   
[25] "mild"          "prove"         "exponenti"     "relat"        
[29] "version"       "preliminari"   "nation"        "ratio"        
[33] "populationbas" "order"         "specif"       

[[81]]
 [1] "chi"           "test"          "distribut"     "space"        
 [5] "ratio"         "restrict"      "statist"       "conveni"      
 [9] "tail"          "goodnessoffit" "pearson"      

[[82]]
[1] "size"   "sampl"  "number" "small"  "larg"  

[[83]]
[1] "misspecifi" "robust"     "misspecif" 

[[84]]
 [1] "climat"      "temperatur"  "chang"       "greenhous"   "global"     
 [6] "earth"       "uncertainti" "northern"    "atmospher"   "quantifi"   
[11] "trend"       "increas"     "reconstruct" "averag"      "region"     
[16] "separ"       "tempor"      "concentr"    "surfac"      "longterm"   
[21] "pollut"      "period"      "centuri"     "opposit"     "tree"       
[26] "gas"         "creat"       "purpos"      "futur"       "record"     
[31] "remot"       "understand"  "radiat"      "emiss"       "proxi"      
[36] "histor"      "air"         "ecolog"      "forest"      "magnitud"   
[41] "massiv"      "cloud"       "gather"      "forc"        "weather"    
[46] "synthet"     "actual"      "pattern"     "expert"      "extern"     
[51] "current"     "quantif"     "agreement"   "institut"    "act"        
print(get_keywords(fit.nn.ml))
[[1]]
[1] "diseas"   "individu" "level"   

[[2]]
[1] "model"

[[3]]
[1] "estim"

[[4]]
[1] "time" "seri"

[[5]]
[1] "control"   "fals"      "multipl"   "discoveri" "hypothes"  "fdr"      
[7] "reject"    "rate"      "pvalu"    

[[6]]
[1] "risk"      "bound"     "adapt"     "threshold"

[[7]]
[1] "vector"   "classif"  "classifi"

[[8]]
[1] "space" "shape"

[[9]]
[1] "time"    "surviv"  "hazard"  "event"   "censor"  "failur"  "proport"

[[10]]
[1] "popul"  "survey" "weight"

[[11]]
[1] "data"

[[12]]
[1] "treatment" "outcom"    "causal"    "assign"   

[[13]]
[1] "articl"

[[14]]
[1] "bayesian"  "posterior"

[[15]]
[1] "trial"     "patient"   "clinic"    "treatment"

[[16]]
[1] "statist"

[[17]]
[1] "select"

[[18]]
[1] "gene"       "express"    "microarray" "differenti"

[[19]]
[1] "process"

[[20]]
[1] "matrix"

[[21]]
[1] "predict"

[[22]]
[1] "method"

[[23]]
[1] "carlo"  "mont"   "markov" "chain" 

[[24]]
[1] "function"

[[25]]
[1] "algorithm"

[[26]]
[1] "test"

[[27]]
[1] "problem"

[[28]]
[1] "fit"

[[29]]
[1] "point"

[[30]]
[1] "likelihood" "maximum"   

[[31]]
[1] "confid"    "interv"    "construct" "bootstrap"

[[32]]
[1] "comput"

[[33]]
[1] "bias"

[[34]]
[1] "optim"

[[35]]
[1] "prior"

[[36]]
[1] "distribut"

[[37]]
[1] "paper"

[[38]]
[1] "propos"

[[39]]
[1] "observ"

[[40]]
[1] "densiti"

[[41]]
[1] "smooth"

[[42]]
[1] "rate"    "converg"

[[43]]
[1] "structur"

[[44]]
[1] "analysi"

[[45]]
[1] "design"

[[46]]
[1] "requir" "direct"

[[47]]
[1] "paramet"

[[48]]
[1] "random"

[[49]]
[1] "local" "deriv"

[[50]]
[1] "case"

[[51]]
[1] "robust"

[[52]]
[1] "respons"   "predictor"

[[53]]
[1] "develop"

[[54]]
[1] "studi"

[[55]]
[1] "properti"

[[56]]
[1] "measur"

[[57]]
[1] "empir"

[[58]]
[1] "assumpt"

[[59]]
[1] "number"

[[60]]
[1] "error"

[[61]]
[1] "simul"

[[62]]
[1] "general"

[[63]]
[1] "limit"

[[64]]
[1] "compar"

[[65]]
[1] "asymptot"

[[66]]
[1] "larg"

[[67]]
[1] "variabl"

[[68]]
[1] "probabl"

[[69]]
[1] "condit"

[[70]]
[1] "sampl"

[[71]]
[1] "infer"

[[72]]
[1] "compon"  "princip"

[[73]]
[1] "null"      "hypothesi" "altern"   

[[74]]
[1] "linear"

[[75]]
[1] "effect"

[[76]]
[1] "provid" "addit" 

[[77]]
[1] "size"

[[78]]
[1] "depend"

[[79]]
[1] "effici"

[[80]]
[1] "regress"

[[81]]
[1] "perform"

[[82]]
[1] "procedur"

[[83]]
[1] "class"

[[84]]
[1] "correl"

[[85]]
[1] "illustr"

[[86]]
[1] "approach"

[[87]]
[1] "applic"

[[88]]
[1] "consist"

[[89]]
[1] "coeffici"

[[90]]
[1] "cluster"

[[91]]
[1] "independ"

[[92]]
[1] "set"

[[93]]
[1] "nonparametr"

[[94]]
[1] "base"

[[95]]
[1] "semiparametr" "parametr"    

[[96]]
[1] "standard"

[[97]]
[1] "appli"

[[98]]
[1] "normal"

[[99]]
[1] "covari"

[[100]]
[1] "varianc"
print(get_keywords(fit.nn.ml.s.1))
[[1]]
[1] "level"    "popul"    "individu"

[[2]]
[1] "estim"

[[3]]
[1] "time"   "surviv" "hazard" "event"  "censor" "failur"

[[4]]
[1] "model"

[[5]]
[1] "space"  "dimens" "shape" 

[[6]]
[1] "control"   "fals"      "discoveri" "multipl"   "rate"      "fdr"      
[7] "hypothes"  "reject"    "pvalu"    

[[7]]
[1] "data"

[[8]]
[1] "time" "seri"

[[9]]
[1] "treatment" "trial"     "patient"  

[[10]]
[1] "statist"

[[11]]
[1] "articl"

[[12]]
[1] "gene"       "express"    "microarray"

[[13]]
[1] "outcom"

[[14]]
[1] "select"

[[15]]
[1] "risk"  "bound" "loss" 

[[16]]
[1] "matrix" "vector"

[[17]]
[1] "bayesian"

[[18]]
[1] "predict"

[[19]]
[1] "process"

[[20]]
[1] "adapt"

[[21]]
[1] "method"

[[22]]
[1] "test"

[[23]]
[1] "problem"

[[24]]
[1] "algorithm"

[[25]]
[1] "design"

[[26]]
[1] "function"

[[27]]
[1] "point"

[[28]]
[1] "likelihood" "maximum"   

[[29]]
[1] "fit"   "model"

[[30]]
[1] "distribut"

[[31]]
[1] "bias"

[[32]]
[1] "prior"

[[33]]
[1] "perform"

[[34]]
[1] "observ"

[[35]]
[1] "studi"

[[36]]
[1] "propos"

[[37]]
[1] "number"

[[38]]
[1] "properti"

[[39]]
[1] "confid"    "interv"    "construct" "bootstrap"

[[40]]
[1] "sampl"

[[41]]
[1] "size"

[[42]]
[1] "analysi"

[[43]]
[1] "rate"    "converg"

[[44]]
[1] "random"

[[45]]
[1] "probabl"

[[46]]
[1] "optim"

[[47]]
[1] "paramet"

[[48]]
[1] "structur"

[[49]]
[1] "comput"

[[50]]
[1] "respons"   "predictor"

[[51]]
[1] "smooth"

[[52]]
[1] "develop"

[[53]]
[1] "markov" "chain" 

[[54]]
[1] "assumpt"

[[55]]
[1] "densiti"

[[56]]
[1] "paper"

[[57]]
[1] "case"

[[58]]
[1] "empir"

[[59]]
[1] "error"

[[60]]
[1] "requir"

[[61]]
[1] "exist"    "demonstr"

[[62]]
[1] "general"

[[63]]
[1] "effici"

[[64]]
[1] "asymptot"

[[65]]
[1] "compon"  "princip"

[[66]]
[1] "measur"

[[67]]
[1] "simul"

[[68]]
[1] "effect"

[[69]]
[1] "condit"

[[70]]
[1] "local"

[[71]]
[1] "variabl"

[[72]]
[1] "infer"

[[73]]
[1] "procedur"

[[74]]
[1] "limit"

[[75]]
[1] "class"

[[76]]
[1] "linear"

[[77]]
[1] "provid"

[[78]]
[1] "regress"

[[79]]
[1] "null"      "hypothesi" "altern"   

[[80]]
[1] "approach"

[[81]]
[1] "base"

[[82]]
[1] "consist"

[[83]]
[1] "correl"

[[84]]
[1] "independ"

[[85]]
[1] "applic"

[[86]]
[1] "carlo" "mont" 

[[87]]
[1] "depend"

[[88]]
[1] "illustr"

[[89]]
[1] "set"

[[90]]
[1] "normal"

[[91]]
[1] "deriv"

[[92]]
[1] "semiparametr" "parametr"    

[[93]]
[1] "appli"

[[94]]
[1] "approxim"

[[95]]
[1] "coeffici"

[[96]]
[1] "cluster"

[[97]]
[1] "nonparametr"

[[98]]
[1] "covari"

[[99]]
[1] "standard"

[[100]]
[1] "varianc"

It turns out that the flash fit (with pseudocount = 1) is actually a better fit by Frobenius norm than the maximum likelihood fit! Maybe the greedy approach of flash is helping it to find better solutions? In general, these plots do not show a very close correspondence between the data and the fit.
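The Frobenius-norm comparison is computed in the code below as a mean squared difference. For an n × p data matrix X and fitted matrix X̂, the two differ only by the fixed factor np, so they rank fits on the same data identically:

$$
\frac{1}{np}\sum_{i=1}^{n}\sum_{j=1}^{p}\left(X_{ij}-\hat X_{ij}\right)^2 \;=\; \frac{\lVert X-\hat X\rVert_F^2}{np}
$$

With the values printed below (0.0316755 for the MLE fit and 0.01768353 for the flash fit), the MLE fit's mean squared error is roughly 1.8 times that of the flash fit.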

  fv= fitted(fit.nn.s.10)
  sub = sample(1:length(fv),100000)
  plot(lmat_s_10[sub],fv[sub],main="flash fit (pseudocount 10)")

Version Author Date
0346f50 Matthew Stephens 2023-11-08
29f2f9a Matthew Stephens 2023-11-06
68ddffa Matthew Stephens 2023-10-20
  fv= fitted(fit.nn.s.1)
  plot(lmat_s_1[sub],fv[sub],main="flash fit (pseudocount 1)")

  fv= fitted(fit.nn.s.01)
  plot(lmat_s_01[sub],fv[sub],main="flash fit (pseudocount 0.1)")

  fv= fitted(fit.nn.s.001)
  plot(lmat_s_001[sub],fv[sub],main="flash fit (pseudocount 0.01)")

  fv= fit.nn.ml@w %*% (fit.nn.ml.s.1@d*fit.nn.ml.s.1@h)
  plot(lmat_s_1[sub],fv[sub], main = "mle fit")

  mean((lmat_s_1-fit.nn.ml@w %*% (fit.nn.ml.s.1@d*fit.nn.ml.s.1@h))^2)
[1] 0.0316755
  mean((lmat_s_1-fitted(fit.nn.s.1))^2)
[1] 0.01768353
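The MLE fitted values above are formed as `w %*% (d*h)`. Assuming, as in the `@w`, `@d`, `@h` slots of an RcppML-style NMF object, that `d` has one entry per factor and `h` has one row per factor, R's recycling multiplies row i of `h` by `d[i]`, so the product is W diag(d) H. A quick check on small random matrices (the dimensions are arbitrary and chosen only for illustration):

```r
# Check that w %*% (d * h) equals W diag(d) H (dimensions chosen arbitrarily).
set.seed(1)
k <- 3                                 # number of factors
w <- matrix(runif(5 * k), 5, k)        # 5 x k loadings
d <- runif(k)                          # one scaling per factor
h <- matrix(runif(k * 4), k, 4)        # k x 4 factors
a <- w %*% (d * h)                     # recycling scales row i of h by d[i]
b <- w %*% diag(d) %*% h               # explicit W diag(d) H
all.equal(a, b)
```

One caveat of the explicit form: `diag(d)` builds an identity matrix when `d` is a single number, so the recycling form is also the safer choice for a one-factor fit.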

Comparing the fits

It is hard to go through all the different keyword lists, so I tried comparing fits pairwise. The idea is to focus on factors found by one fit but not the other when deciding which fit to prefer.

First I compare pseudocounts 1 and 10, counting how many pairs of factors have correlation above 0.9:

cc = cor(fit.nn.s.1$F_pm,fit.nn.s.10$F_pm)
sum(cc>0.9)
[1] 26

Next, find which factors are fit-specific, i.e. have no factor in the other fit with correlation above 0.9:

spec1 = apply(cc,1,max)<0.9
spec2 = apply(cc,2,max)<0.9
print(get_keywords(fit.nn.s.1)[spec1])
[[1]]
 [1] "treatment" "trial"     "random"    "assign"    "patient"   "effect"   
 [7] "outcom"    "clinic"    "causal"    "placebo"   "assumpt"  

[[2]]
[1] "surviv" "time"   "hazard" "censor" "failur" "studi" 

[[3]]
[1] "wilk"

[[4]]
[1] "rankbas"  "effici"   "asymptot" "rank"    

[[5]]
[1] "varyingcoeffici"

[[6]]
[1] "depth"   "project"

[[7]]
[1] "markov"    "chain"     "mont"      "carlo"     "algorithm"

[[8]]
[1] "penal"      "nonconcav"  "likelihood" "select"     "variabl"   
[6] "oracl"      "penalti"    "regular"   

[[9]]
[1] "spline" "smooth"

[[10]]
[1] "survey" "popul"  "sampl" 

[[11]]
[1] "equivari"  "affin"     "matrix"    "introduc"  "breakdown" "concept"  
[7] "scatter"  

[[12]]
[1] "onestep"

[[13]]
[1] "process"    "thin"       "point"      "fit"        "spatial"   
[6] "residu"     "stationari" "intens"    

[[14]]
[1] "nonnorm"

[[15]]
[1] "theta"   "paramet"

[[16]]
[1] "robin"     "miss"      "zhao"      "rotnitzki" "effici"   

[[17]]
[1] "mestim" "robust"

[[18]]
[1] "finitesampl"

[[19]]
[1] "elect" "vote"  "poll" 

[[20]]
[1] "errorpron" "error"    

[[21]]
[1] "stock"

[[22]]
[1] "garch"   "process" "volatil"

[[23]]
[1] "slice"   "invers"  "regress" "dimens"  "method" 

[[24]]
[1] "norm"      "matrix"    "rank"      "matric"    "frobenius" "bound"    

[[25]]
[1] "slope"

[[26]]
[1] "chi"  "test"

[[27]]
[1] "function"   "eigenfunct" "analysi"    "random"     "princip"   
[6] "compon"     "data"      

[[28]]
[1] "tabl"    "conting"

[[29]]
[1] "criterion" "akaik"     "select"    "model"    

[[30]]
[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian" 

[[31]]
[1] "neighborhood"

[[32]]
[1] "maximum"    "welldefin"  "posteriori"
print(get_keywords(fit.nn.s.10,1.2)[spec2])
[[1]]
[1] "lengthbias" "surviv"     "preval"     "cohort"    

[[2]]
[1] "hazard"  "proport"

[[3]]
[1] "meansquar" "predict"   "error"     "small"     "area"     

[[4]]
[1] "onestep"

[[5]]
[1] "polymorph" "genotyp"   "haplotyp"  "snp"      

[[6]]
[1] "equivari"  "depth"     "breakdown" "concept"   "introduc" 

[[7]]
[1] "nonrespons" "imput"      "survey"     "respons"   

[[8]]
[1] "robin"      "miss"       "zhao"       "casecohort" "rotnitzki" 

[[9]]
[1] "vote"   "elect"  "candid"

[[10]]
[1] "sampl"     "survey"    "designbas" "infer"     "weight"    "modelbas" 
[7] "popul"    

[[11]]
[1] "test"      "logrank"   "weight"    "treatment" "formula"   "patient"  
[7] "supremum"  "standard"  "twostag"  

[[12]]
[1] "track"  "replac" "usag"  

[[13]]
[1] "precipit" "spatial" 

[[14]]
character(0)

[[15]]
[1] "twostep"  "submodel"

[[16]]
[1] "design"  "paramet" "effici" 

[[17]]
[1] "timedepend" "covari"     "treatment" 

[[18]]
[1] "miss" "data"

[[19]]
[1] "trim"   "robust" "depth" 

[[20]]
[1] "substitut" "euclidean"

[[21]]
[1] "empir"      "likelihood" "bartlett"   "adjust"    

[[22]]
[1] "volatil"   "highfrequ" "asset"     "price"    

[[23]]
[1] "nonneg"

[[24]]
[1] "norm"   "matrix"

[[25]]
[1] "popul"      "superpopul"

[[26]]
[1] "misspecif"

[[27]]
[1] "file"   "linkag"

[[28]]
[1] "kaplanmei" "quantil"   "surviv"    "censor"   

[[29]]
[1] "axe"    "rotat"  "matric"

[[30]]
[1] "mutual" "empir"  "genet" 

[[31]]
[1] "innov"   "process" "residu" 

[[32]]
[1] "monoton"  "function"

To compare the fits with lower pseudocounts, I wrap these steps in a function:

compare = function(fit1,fit2){
  cc = cor(fit1$F_pm,fit2$F_pm)
  spec1 = apply(cc,1,max)<0.9
  spec2 = apply(cc,2,max)<0.9
  print(get_keywords(fit1)[spec1])
  print(get_keywords(fit2)[spec2])
}
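To make the matching rule in `compare` concrete, here is a toy sketch on simulated factor matrices. `F1` and `F2` are made-up stand-ins for `fit1$F_pm` and `fit2$F_pm` (columns are factors); two factors are shared, reordered and slightly perturbed in the second "fit", and the remaining columns are unrelated noise:

```r
# Toy sketch of the matching rule: a factor counts as "shared" if some factor
# in the other fit has correlation > 0.9 with it.
# F1 and F2 are simulated stand-ins for fit1$F_pm and fit2$F_pm.
set.seed(1)
n <- 200
shared <- matrix(rexp(n * 2), n, 2)              # two factors found by both fits
F1 <- cbind(shared, rexp(n))                     # fit 1: shared factors + 1 extra
F2 <- cbind(shared[, 2:1] + 0.01 * rnorm(n * 2), # fit 2: same factors, reordered and perturbed,
            rexp(n), rexp(n))                    #        plus 2 extras
cc <- cor(F1, F2)                 # 3 x 4 matrix of factor-factor correlations
n_shared <- sum(cc > 0.9)         # number of matched factor pairs
spec1 <- apply(cc, 1, max) < 0.9  # fit-1 factors with no close match in fit 2
spec2 <- apply(cc, 2, max) < 0.9  # fit-2 factors with no close match in fit 1
n_shared
which(spec1)
which(spec2)
```

Because the rule takes the maximum over all factors in the other fit, it does not depend on factor order. It also does not force a one-to-one matching, so a single factor could be matched by several factors in the other fit.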

Pseudocount 1 vs 0.1:

compare(fit.nn.s.1,fit.nn.s.01)
[[1]]
[1] "assoc"   "amer"    "statist" "ann"    

[[2]]
[1] "choleski"   "matrix"     "covari"     "decomposit" "factor"    
[6] "interpret" 

[[3]]
[1] "mse"       "predictor" "linear"    "error"     "squar"     "empir"    

[[4]]
[1] "depth"   "project"

[[5]]
[1] "jackknif" "mix"      "squar"    "area"     "varianc" 

[[6]]
[1] "spline" "smooth"

[[7]]
[1] "survey" "popul"  "sampl" 

[[8]]
[1] "equivari"  "affin"     "matrix"    "introduc"  "breakdown" "concept"  
[7] "scatter"  

[[9]]
[1] "onestep"

[[10]]
[1] "process"    "thin"       "point"      "fit"        "spatial"   
[6] "residu"     "stationari" "intens"    

[[11]]
[1] "sobolev" "densiti" "minimax" "rate"   

[[12]]
[1] "errorpron" "error"    

[[13]]
[1] "panel" "count"

[[14]]
[1] "stock"

[[15]]
[1] "secondord"

[[16]]
[1] "equat" "estim"

[[17]]
[1] "slice"   "invers"  "regress" "dimens"  "method" 

[[18]]
[1] "survivor"

[[19]]
[1] "slope"

[[20]]
[1] "tabl"    "conting"

[[21]]
[1] "criterion" "akaik"     "select"    "model"    

[[22]]
[1] "neighborhood"

[[23]]
[1] "maximum"    "welldefin"  "posteriori"

[[1]]
 [1] "penalis"       "newtonraphson" "framingham"    "penalti"      
 [5] "likelihood"    "heart"         "failur"        "carri"        
 [9] "algorithm"     "proper"        "conduct"       "advanc"       
[13] "grow"          "dropout"       "familiar"      "prospect"     

[[2]]
[1] "inhomogen"  "intens"     "process"    "spatial"    "point"     
[6] "poisson"    "thin"       "stationari" "function"  

[[3]]
 [1] "seem"           "unrel"          "spline"         "correl"        
 [5] "credit"         "retail"         "neglig"         "nongaussian"   
 [9] "dataadapt"      "vehicl"         "allevi"         "knot"          
[13] "leav"           "reversiblejump" "part"           "genotyp"       
[17] "conveni"        "residu"         "wang"           "withinclust"   

[[4]]
 [1] "distort"         "respons"         "confound"        "predictor"      
 [5] "unobserv"        "under"           "explanatori"     "serum"          
 [9] "adjust"          "magnitud"        "indirect"        "identifi"       
[13] "coeffici"        "factor"          "absent"          "system"         
[17] "alter"           "observ"          "datagener"       "leastsquar"     
[21] "decid"           "straightforward" "generat"         "stepwis"        
[25] "intervent"       "sever"          

[[5]]
 [1] "equivari"   "affin"      "introduc"   "depth"      "breakdown" 
 [6] "scatter"    "locat"      "point"      "project"    "robust"    
[11] "concept"    "general"    "multivari"  "function"   "influenc"  
[16] "matrix"     "median"     "definit"    "hyperplan"  "high"      
[21] "heavytail"  "competitor" "fact"       "translat"   "comparison"
[26] "open"      

[[6]]
 [1] "save"      "sir"       "slice"     "averag"    "root"      "invers"   
 [7] "candid"    "reveal"    "theoret"   "reduct"    "comput"    "contrast" 
[13] "recommend"

[[7]]
 [1] "nonrespons" "survey"     "respons"    "imput"      "nonignor"  
 [6] "valu"       "miss"       "respond"    "nation"     "varianc"   
[11] "nonrespond" "weight"     "popul"      "requir"     "bias"      
[16] "probabl"    "unit"       "mechan"     "item"       "adjust"    
[21] "health"     "variabl"    "calibr"     "race"       "domain"    
[26] "handl"      "incom"     

[[8]]
 [1] "taper"    "approxim" "matrix"   "gaussian" "covari"   "spars"   
 [7] "consist"  "oper"     "block"    "norm"     "balanc"   "requir"  
[13] "spatial" 

[[9]]
 [1] "jackknif"  "mix"       "varianc"   "area"      "squar"     "appli"    
 [7] "inconsist" "uncondit"  "replic"    "strata"   

[[10]]
[1] "quantil" "regress"

[[11]]
 [1] "popul"      "superpopul" "survey"     "finit"      "boxcox"    
 [6] "modelbas"   "design"     "predict"    "realiz"     "auxiliari" 
[11] "sampl"      "handl"      "twophas"    "revisit"    "mild"      
[16] "benchmark"  "rich"       "life"       "probabl"    "ensur"     

[[12]]
 [1] "claim"     "insur"     "vehicl"    "damag"     "age"       "year"     
 [7] "turn"      "compani"   "detail"    "tail"      "sever"     "coverag"  
[13] "record"    "risk"      "price"     "financi"   "describ"   "major"    
[19] "gender"    "discount"  "logit"     "amount"    "person"    "kind"     
[25] "multinomi" "frequenc"  "justif"    "surpris"   "binomi"    "oil"      
[31] "pointwis"  "split"     "negat"    

[[13]]
[1] "logit"       "finitesampl" "root"        "probit"      "variat"     
[6] "mix"         "fraction"    "multinomi"  

[[14]]
 [1] "expenditur"   "physician"    "servic"       "skew"         "care"        
 [6] "lognorm"      "profil"       "conduct"      "patient"      "person"      
[11] "contribut"    "health"       "randomeffect" "smoke"        "fact"        
[16] "survey"       "manag"        "incur"        "medic"        "debat"       
[21] "custom"       "qualiti"      "topic"        "industri"     "appropri"    
[26] "pulmonari"    "conceptu"     "monitor"      "regard"       "prescrib"    
[31] "subsequ"      "way"          "financi"      "hierarch"     "lung"        
[36] "percentil"    "attribut"     "closedform"  

[[15]]
[1] "confid"    "interv"    "construct" "coverag"   "bootstrap" "region"   

[[16]]
[1] "maximum"    "likelihood" "estim"     

[[17]]
[1] "dimensionreduct" "invers"          "dimens"          "factor"         
[5] "highdimension"   "chisquar"        "reduct"         

[[18]]
[1] "lin"        "addit"      "work"       "carrol"     "bone"      
[6] "transplant" "margin"    

[[19]]
 [1] "withinclust" "cluster"     "correl"      "account"     "hamper"     
 [6] "frequent"    "carri"       "frailti"     "parsimoni"   "abil"       
[11] "birth"       "ill"         "generalis"   "impact"      "intuit"     
[16] "achiev"     

[[20]]
[1] "coeffici" "regress" 

[[21]]
 [1] "minimax" "rate"    "densiti" "optim"   "adapt"   "unknown" "estim"  
 [8] "loss"    "converg" "class"   "prove"   "bound"  

[[22]]
[1] "unequ"     "designbas" "survey"    "weight"   

[[23]]
[1] "auxiliari" "survey"    "varianc"   "variabl"   "sampl"     "weight"   
[7] "design"    "calibr"    "popul"    

[[24]]
[1] "variancecovari" "matrix"         "analyz"        

[[25]]
[1] "contamin"    "robust"      "water"       "influenc"    "explanatori"

[[26]]
[1] "bspline" "kernel"  "penal"  

[[27]]
 [1] "highfrequ" "volatil"   "financi"   "asset"     "price"     "lowfrequ" 
 [7] "exchang"   "nois"      "dynam"     "market"    "matrix"    "stock"    
[13] "period"    "daili"     "realiz"    "pool"      "matric"    "variat"   
[19] "diffus"   

[[28]]
 [1] "earthquak"      "process"        "discrimin"      "seri"          
 [5] "featur"         "explos"         "event"          "time"          
 [9] "form"           "california"     "spectra"        "transform"     
[13] "background"     "extract"        "occurr"         "intens"        
[17] "diverg"         "wavelet"        "step"           "occur"         
[21] "decomposit"     "thin"           "separ"          "basi"          
[25] "multidimension" "spacetim"       "rate"           "poisson"       
[29] "residu"         "spectrum"       "goal"           "rescal"        
[33] "magnitud"       "evolutionari"   "purpos"         "homogen"       

[[29]]
 [1] "climat"      "chang"       "temperatur"  "greenhous"   "global"     
 [6] "earth"       "trend"       "uncertainti" "increas"     "atmospher"  
[11] "northern"    "quantifi"    "reconstruct" "futur"       "separ"      
[16] "tempor"     

[[30]]
 [1] "motif"      "gene"       "sequenc"    "regul"      "transcript"
 [6] "bind"       "dna"        "protein"    "cluster"    "factor"    
[11] "nucleotid"  "discoveri"  "conserv"    "short"      "high"      
[16] "call"       "pattern"    "dirichlet"  "biolog"     "site"      
[21] "process"    "genom"      "mixtur"     "width"      "vari"      
[26] "priori"     "hierarch"   "strategi"   "cell"       "databas"   
[31] "repres"     "organ"      "delet"      "matric"     "similar"   
[36] "gibb"       "switch"     "technolog"  "generat"    "segment"   
[41] "refin"      "aid"        "substant"   "stochast"   "live"      
[46] "group"      "core"       "regulatori"

[[31]]
 [1] "wishart"    "graph"      "cone"       "famili"     "graphic"   
 [6] "matric"     "conjug"     "paramet"    "prior"      "gaussian"  
[11] "covari"     "matrix"     "decompos"   "edg"        "definit"   
[16] "homogen"    "paper"      "shape"      "invers"     "correspond"
[21] "standard"   "ann"        "posit"      "equal"      "space"     
[26] "respect"    "eigenvalu"  "zero"       "sigma"      "dimens"    
[31] "bay"        "chisquar"   "miss"       "form"       "precis"    
[36] "flexibl"    "distinct"   "close"     

[[32]]
 [1] "pca"          "princip"      "compon"       "matrix"       "eigenvector" 
 [6] "analysi"      "eigenvalu"    "reduct"       "dimension"    "set"         
[11] "perturb"      "size"         "transit"      "dimens"       "spike"       
[16] "direct"       "maxim"        "hold"         "popul"        "tool"        
[21] "tree"         "high"         "theorem"      "geometr"      "succeed"     
[26] "sharp"        "logp"         "oil"          "embed"        "evolutionari"

[[33]]
 [1] "famili"        "subfamili"     "symmetr"       "asymmetr"     
 [5] "skew"          "reparameter"   "discuss"       "transform"    
 [9] "properti"      "explor"        "mise"          "urn"          
[13] "behav"         "generat"       "pursu"         "adequ"        
[17] "distribut"     "adopt"         "emphasi"       "symmetri"     
[21] "map"           "submodel"      "option"        "stateoftheart"
[25] "heavytail"     "superior"      "attract"       "tractabl"     
[29] "place"         "member"        "counterpart"   "spacetim"     

[[34]]
[1] "bar"    "vertic" "cap"    "lambda"

[[35]]
 [1] "integ"      "algebra"    "coher"      "ail"        "ident"     
 [6] "countabl"   "multist"    "system"     "appl"       "finit"     
[11] "classic"    "object"     "ideal"      "grid"       "util"      
[16] "math"       "fewer"      "state"      "call"       "binari"    
[21] "inequ"      "pure"       "geometri"   "comprehens" "alpha"     
[26] "posit"      "socal"      "repres"     "idea"       "complex"   
[31] "probabl"    "yield"      "failur"     "relat"      "type"      

[[36]]
 [1] "car"         "polytop"     "partit"      "height"      "combinatori"
 [6] "mechan"      "rais"        "hierarchi"   "convex"      "need"       
[11] "extrem"      "stein"       "descript"    "meaning"     "discret"    
[16] "object"      "geometr"     "parsimoni"   "oil"         "notion"     
[21] "satisfi"     "character"   "exponenti"   "interpret"   "unusu"      
[26] "maxim"       "neighbor"    "assumpt"     "uniform"     "dramat"     
[31] "class"       "point"       "sure"       

[[37]]
 [1] "paradox"     "prior"       "surrog"      "true"        "bay"        
 [6] "posit"       "criteria"    "frequentist" "jeffrey"     "sign"       
[11] "point"       "avoid"       "causal"      "turn"        "negat"      
[16] "invari"     

[[38]]
 [1] "probab"  "appl"    "proc"    "situat"  "ann"     "shape"   "field"  
 [8] "point"   "gamma"   "univari" "roy"    

[[39]]
 [1] "chart"       "cusum"       "detect"      "shift"       "cumul"      
 [6] "control"     "sum"         "base"        "perform"     "length"     
[11] "refer"       "averag"      "ratio"       "monitor"     "likelihood" 
[16] "convent"     "delta"       "infin"       "articl"      "event"      
[21] "outlier"     "stop"        "alarm"       "changepoint" "small"      

[[40]]
 [1] "twoparamet" "focus"      "famili"     "choos"      "exampl"    
 [6] "basic"      "desir"      "popular"    "express"    "tune"      
[11] "stepup"     "compromis"  "conserv"    "shortcom"   "represent" 
[16] "lifetim"    "priori"     "meaning"    "prefer"     "segment"   
[21] "stepwis"    "convolut"   "feasibl"    "bay"       

[[41]]
 [1] "digit"       "fals"        "alarm"       "imag"        "geometr"    
 [6] "definit"     "expect"      "sequenti"    "minim"       "principl"   
[11] "meaning"     "meet"        "framework"   "kind"        "priori"     
[16] "maxim"       "prove"       "theori"      "contain"     "mathemat"   
[21] "compat"      "align"       "display"     "part"        "occurr"     
[26] "explain"     "basic"       "structur"    "number"      "hidden"     
[31] "stop"        "delay"       "probabilist" "rigor"       "fine"       
[36] "walk"        "chang"       "changepoint" "renew"      

[[42]]
 [1] "manifold"   "space"      "intrins"    "metric"     "shape"     
 [6] "riemannian" "tensor"     "euclidean"  "matric"     "diagnost"  
[11] "geodes"     "develop"    "planar"     "sphere"     "examin"    
[16] "imag"       "perturb"    "human"      "embed"      "gender"    
[21] "medic"      "dimens"     "differenti" "diffus"    

[[43]]
[1] "kendal"  "tau"     "truncat" "copula"  "shape"   "densiti" "symmetr"
[8] "reli"    "angl"   

[[44]]
 [1] "improp"    "proprieti" "posterior" "uniform"   "proper"    "prior"    
 [7] "miss"      "suffici"   "theorem"   "character" "complet"   "carri"    
[13] "examin"    "colon"     "beta"      "dataset"   "cumul"     "tree"     
[19] "glms"     

[[45]]
[1] "ser"     "soc"     "roy"     "stat"    "ann"     "particl" "central"
[8] "util"    "statist"

[[46]]
[1] "iid"   "prove"

[[47]]
 [1] "classifi"        "distancebas"     "centroid"        "classif"        
 [5] "discrimin"       "popul"           "vector"          "distanc"        
 [9] "theoret"         "machin"          "support"         "heavytail"      
[13] "median"          "differ"          "difficulti"      "popular"        
[17] "convent"         "replac"          "componentwis"    "produc"         
[21] "accumul"         "closest"         "varieti"         "truncat"        
[25] "poor"            "entail"          "highdimension"   "insensit"       
[29] "allevi"          "excess"          "problemat"       "today"          
[33] "euclidean"       "encount"         "inconsist"       "caus"           
[37] "suffer"          "nearest"         "counterpart"     "volatil"        
[41] "argument"        "alloc"           "straightforward" "attempt"        
[45] "frequent"        "boundari"        "believ"          "help"           
[49] "case"            "inher"           "neighbour"      

[[48]]
 [1] "administr"      "fda"            "secondari"      "endpoint"      
 [5] "drug"           "efficaci"       "food"           "health"        
 [9] "combin"         "record"         "agent"          "trial"         
[13] "clinic"         "benefit"        "primari"        "adjust"        
[17] "databas"        "prevent"        "path"           "cardiovascular"
[21] "make"           "separ"          "report"         "perspect"      
[25] "decis"          "simplifi"       "safeti"         "maintain"      

[[49]]
 [1] "supremum"    "shift"       "dataset"     "changepoint" "power"      
 [6] "test"        "debat"       "logrank"     "north"       "window"     
[11] "categor"     "record"      "speed"       "wind"        "controversi"
[16] "frequenc"    "elabor"      "opposit"     "pearson"     "discontinu" 
[21] "cumul"       "attribut"    "multinomi"   "bridg"       "mainten"    
[26] "formula"     "conclus"     "rigor"       "appear"      "sum"        
[31] "brownian"    "statist"     "strength"    "chisquar"    "autocovari" 
[36] "sequenc"     "receiv"     

[[50]]
 [1] "genet"       "loci"        "trait"       "diseas"      "quantit"    
 [6] "linkag"      "map"         "allel"       "phenotyp"    "gene"       
[11] "pedigre"     "popul"       "marker"      "associ"      "genotyp"    
[16] "frequenc"    "chromosom"   "locus"       "polymorph"   "genom"      
[21] "complex"     "haplotyp"    "interact"    "casecontrol" "involv"     
[26] "domin"       "individu"   

[[51]]
[1] "goodnessoffit" "test"          "includ"        "residu"       

[[52]]
 [1] "collabor"    "nearest"     "item"        "user"        "consum"     
 [6] "tradit"      "recommend"   "system"      "neighbor"    "filter"     
[11] "frame"       "clear"       "fact"        "contribut"   "forc"       
[16] "grow"        "drive"       "probabilist" "mathemat"    "precis"     
[21] "socal"       "initi"       "deal"        "mild"        "attempt"    
[26] "offer"       "neighbour"   "provid"      "literatur"   "algorithm"  
[31] "sequenti"   

[[53]]
 [1] "selector"    "dantzig"     "lregular"    "extend"      "path"       
 [6] "result"      "bound"       "nonasymptot" "uncertainti" "angl"       
[11] "remark"      "tune"        "entir"       "final"       "question"   
[16] "cost"        "principl"   

[[54]]
 [1] "subtl"    "jin"      "nonzero"  "critic"   "fraction" "boundari"
 [7] "tukey"    "higher"   "signific" "succeed"  "detect"   "normal"  
[13] "region"   "interest" "precis"   "amplitud" "alpha"    "concept" 
[19] "sparsiti" "concern"  "mention"  "high"     "work"     "resolv"  
[25] "nonnul"   "bodi"     "lower"   

[[55]]
 [1] "expert"      "languag"     "uncertainti" "abil"        "learn"      
 [6] "elicit"      "intermitt"   "system"      "natur"       "kind"       
[11] "amount"      "inform"      "peopl"       "mathemat"    "make"       
[16] "histor"      "need"        "content"     "respond"     "grow"       
[21] "happen"     

[[56]]
 [1] "absolut"       "deviat"        "clip"          "smooth"       
 [5] "scad"          "oracl"         "size"          "true"         
 [9] "microarray"    "nonzero"       "dimens"        "fan"          
[13] "highdimension" "identifi"      "sparsiti"      "confirm"      
[17] "slowli"        "larger"       

[[57]]
[1] "size"   "sampl"  "number"

[[58]]
 [1] "seri"        "week"        "time"        "stationari"  "generat"    
 [6] "superposit"  "autoregress" "renew"       "autocovari"  "binomi"     
[11] "day"         "longmemori"  "count"       "predict"     "thin"       
[16] "focus"       "fit"         "contrast"    "consecut"    "integ"      
[21] "simpl"       "poisson"     "short"       "geometr"     "parsimoni"  
[26] "copi"        "bernoulli"   "previous"    "discret"     "electr"     
[31] "daili"       "key"         "differ"      "trial"       "market"     
[36] "margin"      "sequenc"     "forecast"    "load"       

[[59]]
[1] "spectral"   "densiti"    "time"       "seri"       "domain"    
[6] "stationari" "frequenc"  

[[60]]
[1] "tilt"       "exponenti"  "constraint" "employ"    

[[61]]
 [1] "earn"       "person"     "interview"  "employ"     "document"  
 [6] "survey"     "health"     "level"      "census"     "peopl"     
[11] "report"     "incom"      "higher"     "educ"       "feder"     
[16] "sensit"     "preval"     "analys"     "conduct"    "famili"    
[21] "imput"      "year"       "key"        "sourc"      "total"     
[26] "file"       "instrument" "ratio"      "status"     "encourag"  
[31] "nation"     "way"        "subsequ"    "monitor"    "lower"     
[36] "item"       "accept"     "multipli"   "rich"       "violat"    
[41] "previous"  

[[62]]
 [1] "statistician" "polici"       "scienc"       "statist"      "decis"       
 [6] "role"         "today"        "technolog"    "scientif"     "maker"       
[11] "bring"        "challeng"     "scientist"    "inform"       "integr"      
[16] "communic"     "individu"     "increas"      "knowledg"     "polit"       
[21] "live"         "disciplin"    "address"      "social"       "effort"      
[26] "essenti"      "organ"        "solv"         "engin"        "student"     
[31] "opportun"     "impact"       "face"         "grow"         "chang"       
[36] "play"         "govern"       "american"     "countri"      "mathemat"    
[41] "closer"       "centuri"      "modern"       "intern"       "spread"      
[46] "human"        "relev"        "ingredi"      "place"        "public"      
[51] "devic"        "success"      "explor"       "pressur"      "guarante"    
[56] "imposs"       "train"        "view"         "excel"        "presidenti"  
[61] "progress"     "edg"          "way"          "genom"        "support"     
[66] "communiti"    "promot"       "action"       "advanc"       "map"         
[71] "understand"  

[[63]]
 [1] "toxic"      "dose"       "trial"      "dosefind"   "phase"     
 [6] "clinic"     "target"     "design"     "probabl"    "escal"     
[11] "assign"     "patient"    "reassess"   "continu"    "ethic"     
[16] "prespecifi" "common"     "enhanc"     "concern"    "robust"    
[21] "parallel"   "previous"   "overcom"    "coher"      "variant"   
[26] "competit"  

[[64]]
 [1] "extrem"      "precipit"    "spatial"     "pareto"      "station"    
 [6] "uncertainti" "climatolog"  "hierarchi"   "exceed"      "threshold"  
[11] "quantif"     "produc"      "return"      "captur"      "region"     
[16] "intens"      "frequenc"    "hierarch"    "plan"        "weather"    
[21] "interpol"    "map"         "purpos"      "binomi"      "coordin"    
[26] "driven"      "geograph"    "daili"       "separ"       "character"  
[31] "fulli"       "latent"      "improv"     

[[65]]
 [1] "enter"     "pursu"     "project"   "preced"    "phase"     "maker"    
 [7] "schedul"   "resourc"   "decis"     "minim"     "concret"   "perfect"  
[13] "divid"     "total"     "strategi"  "alloc"     "expect"    "face"     
[19] "generat"   "manag"     "chosen"    "state"     "formul"    "unknown"  
[25] "point"     "exampl"    "breakdown" "unit"     

[[66]]
 [1] "polya"       "appreci"     "tree"        "cancer"      "surveil"    
 [6] "spatial"     "sophist"     "epidemiolog" "unrealist"   "institut"   
[11] "offer"       "program"     "fulli"       "analyt"      "nation"     
[16] "flexibl"     "lattic"      "compet"      "orient"      "feasibl"    
[21] "impos"       "obtain"      "aspect"      "remain"      "timetoev"   
[26] "breast"      "ignor"       "urn"         "mixtur"      "advantag"   
[31] "framework"   "featur"     

[[67]]
 [1] "delay"         "combin"        "issu"          "activ"        
 [5] "unit"          "year"          "monitor"       "program"      
 [9] "incid"         "concern"       "major"         "servic"       
[13] "surveil"       "develop"       "registri"      "populationbas"
[17] "trend"         "reason"       

[[68]]
[1] "laplac"    "approxim"  "posterior" "integr"    "mode"     

[[69]]
[1] "subjectspecif"    "random"           "longitudin"       "correl"          
[5] "populationaverag" "latent"           "logist"           "followup"        

[[70]]
 [1] "underestim"    "overestim"     "lemma"         "abrupt"       
 [5] "respect"       "admit"         "stein"         "identif"      
 [9] "moder"         "satisfi"       "nontrivi"      "impli"        
[13] "detail"        "deviat"        "loglikelihood" "benchmark"    
[17] "moment"        "nest"          "yield"         "exponenti"    
[21] "decay"         "deal"          "difficulti"    "mild"         
[25] "posit"         "relat"         "version"       "prove"        

[[71]]
 [1] "retail"    "custom"    "compani"   "deliveri"  "consum"    "tradit"   
 [7] "onlin"     "tail"      "quantiti"  "frequenc"  "market"    "total"    
[13] "joint"     "differ"    "firm"      "articl"    "cost"      "week"     
[19] "daili"     "translat"  "tie"       "decis"     "intend"    "household"
[25] "prevent"   "bivari"    "activ"     "aid"       "simpli"    "accur"    
[31] "forecast"  "compon"    "element"   "commerci"  "success"   "bank"     
[37] "incur"     "period"    "center"    "repres"    "arriv"     "frequent" 
[43] "organ"     "concern"   "impact"    "descript" 

[[72]]
[1] "oneparamet" "famili"     "normal"     "general"    "exponenti" 
[6] "detect"     "binomi"    

[[73]]
 [1] "intersect"  "close"      "hypothes"   "familywis"  "bonferroni"
 [6] "logic"      "critic"     "requir"     "elementari" "multipl"   
[11] "monoton"    "holm"       "valu"       "principl"  

Pseudocount 0.1 vs 0.01. The 0.01 results are not as bad as I expected.

compare(fit.nn.s.01,fit.nn.s.001)
[[1]]
 [1] "equivari"   "affin"      "introduc"   "depth"      "breakdown" 
 [6] "scatter"    "locat"      "point"      "project"    "robust"    
[11] "concept"    "general"    "multivari"  "function"   "influenc"  
[16] "matrix"     "median"     "definit"    "hyperplan"  "high"      
[21] "heavytail"  "competitor" "fact"       "translat"   "comparison"
[26] "open"      

[[2]]
 [1] "save"      "sir"       "slice"     "averag"    "root"      "invers"   
 [7] "candid"    "reveal"    "theoret"   "reduct"    "comput"    "contrast" 
[13] "recommend"

[[3]]
 [1] "nonrespons" "survey"     "respons"    "imput"      "nonignor"  
 [6] "valu"       "miss"       "respond"    "nation"     "varianc"   
[11] "nonrespond" "weight"     "popul"      "requir"     "bias"      
[16] "probabl"    "unit"       "mechan"     "item"       "adjust"    
[21] "health"     "variabl"    "calibr"     "race"       "domain"    
[26] "handl"      "incom"     

[[4]]
 [1] "jackknif"  "mix"       "varianc"   "area"      "squar"     "appli"    
 [7] "inconsist" "uncondit"  "replic"    "strata"   

[[5]]
[1] "mestim"  "robust"  "weak"    "yield"   "outlier" "nuisanc"

[[6]]
 [1] "gee"       "equat"     "correl"    "general"   "sandwich"  "binari"   
 [7] "work"      "misspecif" "cluster"   "scientif"  "enhanc"    "effort"   
[13] "equival"   "lead"      "repeat"    "diverg"   

[[7]]
 [1] "popul"      "superpopul" "survey"     "finit"      "boxcox"    
 [6] "modelbas"   "design"     "predict"    "realiz"     "auxiliari" 
[11] "sampl"      "handl"      "twophas"    "revisit"    "mild"      
[16] "benchmark"  "rich"       "life"       "probabl"    "ensur"     

[[8]]
 [1] "claim"     "insur"     "vehicl"    "damag"     "age"       "year"     
 [7] "turn"      "compani"   "detail"    "tail"      "sever"     "coverag"  
[13] "record"    "risk"      "price"     "financi"   "describ"   "major"    
[19] "gender"    "discount"  "logit"     "amount"    "person"    "kind"     
[25] "multinomi" "frequenc"  "justif"    "surpris"   "binomi"    "oil"      
[31] "pointwis"  "split"     "negat"    

[[9]]
[1] "logit"       "finitesampl" "root"        "probit"      "variat"     
[6] "mix"         "fraction"    "multinomi"  

[[10]]
 [1] "expenditur"   "physician"    "servic"       "skew"         "care"        
 [6] "lognorm"      "profil"       "conduct"      "patient"      "person"      
[11] "contribut"    "health"       "randomeffect" "smoke"        "fact"        
[16] "survey"       "manag"        "incur"        "medic"        "debat"       
[21] "custom"       "qualiti"      "topic"        "industri"     "appropri"    
[26] "pulmonari"    "conceptu"     "monitor"      "regard"       "prescrib"    
[31] "subsequ"      "way"          "financi"      "hierarch"     "lung"        
[36] "percentil"    "attribut"     "closedform"  

[[11]]
[1] "confid"    "interv"    "construct" "coverag"   "bootstrap" "region"   

[[12]]
[1] "dimensionreduct" "invers"          "dimens"          "factor"         
[5] "highdimension"   "chisquar"        "reduct"         

[[13]]
[1] "unequ"     "designbas" "survey"    "weight"   

[[14]]
[1] "contamin"    "robust"      "water"       "influenc"    "explanatori"

[[15]]
[1] "varianc"  "asymptot"

[[16]]
 [1] "earthquak"      "process"        "discrimin"      "seri"          
 [5] "featur"         "explos"         "event"          "time"          
 [9] "form"           "california"     "spectra"        "transform"     
[13] "background"     "extract"        "occurr"         "intens"        
[17] "diverg"         "wavelet"        "step"           "occur"         
[21] "decomposit"     "thin"           "separ"          "basi"          
[25] "multidimension" "spacetim"       "rate"           "poisson"       
[29] "residu"         "spectrum"       "goal"           "rescal"        
[33] "magnitud"       "evolutionari"   "purpos"         "homogen"       

[[17]]
[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian"  "hierarch" 
[7] "posterior" "cluster"  

[[18]]
 [1] "famili"        "subfamili"     "symmetr"       "asymmetr"     
 [5] "skew"          "reparameter"   "discuss"       "transform"    
 [9] "properti"      "explor"        "mise"          "urn"          
[13] "behav"         "generat"       "pursu"         "adequ"        
[17] "distribut"     "adopt"         "emphasi"       "symmetri"     
[21] "map"           "submodel"      "option"        "stateoftheart"
[25] "heavytail"     "superior"      "attract"       "tractabl"     
[29] "place"         "member"        "counterpart"   "spacetim"     

[[19]]
 [1] "car"         "polytop"     "partit"      "height"      "combinatori"
 [6] "mechan"      "rais"        "hierarchi"   "convex"      "need"       
[11] "extrem"      "stein"       "descript"    "meaning"     "discret"    
[16] "object"      "geometr"     "parsimoni"   "oil"         "notion"     
[21] "satisfi"     "character"   "exponenti"   "interpret"   "unusu"      
[26] "maxim"       "neighbor"    "assumpt"     "uniform"     "dramat"     
[31] "class"       "point"       "sure"       

[[20]]
 [1] "paradox"     "prior"       "surrog"      "true"        "bay"        
 [6] "posit"       "criteria"    "frequentist" "jeffrey"     "sign"       
[11] "point"       "avoid"       "causal"      "turn"        "negat"      
[16] "invari"     

[[21]]
 [1] "probab"  "appl"    "proc"    "situat"  "ann"     "shape"   "field"  
 [8] "point"   "gamma"   "univari" "roy"    

[[22]]
 [1] "chart"       "cusum"       "detect"      "shift"       "cumul"      
 [6] "control"     "sum"         "base"        "perform"     "length"     
[11] "refer"       "averag"      "ratio"       "monitor"     "likelihood" 
[16] "convent"     "delta"       "infin"       "articl"      "event"      
[21] "outlier"     "stop"        "alarm"       "changepoint" "small"      

[[23]]
 [1] "twoparamet" "focus"      "famili"     "choos"      "exampl"    
 [6] "basic"      "desir"      "popular"    "express"    "tune"      
[11] "stepup"     "compromis"  "conserv"    "shortcom"   "represent" 
[16] "lifetim"    "priori"     "meaning"    "prefer"     "segment"   
[21] "stepwis"    "convolut"   "feasibl"    "bay"       

[[24]]
 [1] "digit"       "fals"        "alarm"       "imag"        "geometr"    
 [6] "definit"     "expect"      "sequenti"    "minim"       "principl"   
[11] "meaning"     "meet"        "framework"   "kind"        "priori"     
[16] "maxim"       "prove"       "theori"      "contain"     "mathemat"   
[21] "compat"      "align"       "display"     "part"        "occurr"     
[26] "explain"     "basic"       "structur"    "number"      "hidden"     
[31] "stop"        "delay"       "probabilist" "rigor"       "fine"       
[36] "walk"        "chang"       "changepoint" "renew"      

[[25]]
 [1] "manifold"   "space"      "intrins"    "metric"     "shape"     
 [6] "riemannian" "tensor"     "euclidean"  "matric"     "diagnost"  
[11] "geodes"     "develop"    "planar"     "sphere"     "examin"    
[16] "imag"       "perturb"    "human"      "embed"      "gender"    
[21] "medic"      "dimens"     "differenti" "diffus"    

[[26]]
[1] "kendal"  "tau"     "truncat" "copula"  "shape"   "densiti" "symmetr"
[8] "reli"    "angl"   

[[27]]
 [1] "improp"    "proprieti" "posterior" "uniform"   "proper"    "prior"    
 [7] "miss"      "suffici"   "theorem"   "character" "complet"   "carri"    
[13] "examin"    "colon"     "beta"      "dataset"   "cumul"     "tree"     
[19] "glms"     

[[28]]
[1] "ser"     "soc"     "roy"     "stat"    "ann"     "particl" "central"
[8] "util"    "statist"

[[29]]
[1] "iid"   "prove"

[[30]]
 [1] "supremum"    "shift"       "dataset"     "changepoint" "power"      
 [6] "test"        "debat"       "logrank"     "north"       "window"     
[11] "categor"     "record"      "speed"       "wind"        "controversi"
[16] "frequenc"    "elabor"      "opposit"     "pearson"     "discontinu" 
[21] "cumul"       "attribut"    "multinomi"   "bridg"       "mainten"    
[26] "formula"     "conclus"     "rigor"       "appear"      "sum"        
[31] "brownian"    "statist"     "strength"    "chisquar"    "autocovari" 
[36] "sequenc"     "receiv"     

[[31]]
[1] "theta"     "paramet"   "cap"       "distribut" "vector"    "unknown"  
[7] "nuisanc"  

[[32]]
[1] "goodnessoffit" "test"          "includ"        "residu"       

[[33]]
 [1] "collabor"    "nearest"     "item"        "user"        "consum"     
 [6] "tradit"      "recommend"   "system"      "neighbor"    "filter"     
[11] "frame"       "clear"       "fact"        "contribut"   "forc"       
[16] "grow"        "drive"       "probabilist" "mathemat"    "precis"     
[21] "socal"       "initi"       "deal"        "mild"        "attempt"    
[26] "offer"       "neighbour"   "provid"      "literatur"   "algorithm"  
[31] "sequenti"   

[[34]]
 [1] "selector"    "dantzig"     "lregular"    "extend"      "path"       
 [6] "result"      "bound"       "nonasymptot" "uncertainti" "angl"       
[11] "remark"      "tune"        "entir"       "final"       "question"   
[16] "cost"        "principl"   

[[35]]
 [1] "subtl"    "jin"      "nonzero"  "critic"   "fraction" "boundari"
 [7] "tukey"    "higher"   "signific" "succeed"  "detect"   "normal"  
[13] "region"   "interest" "precis"   "amplitud" "alpha"    "concept" 
[19] "sparsiti" "concern"  "mention"  "high"     "work"     "resolv"  
[25] "nonnul"   "bodi"     "lower"   

[[36]]
 [1] "expert"      "languag"     "uncertainti" "abil"        "learn"      
 [6] "elicit"      "intermitt"   "system"      "natur"       "kind"       
[11] "amount"      "inform"      "peopl"       "mathemat"    "make"       
[16] "histor"      "need"        "content"     "respond"     "grow"       
[21] "happen"     

[[37]]
 [1] "absolut"       "deviat"        "clip"          "smooth"       
 [5] "scad"          "oracl"         "size"          "true"         
 [9] "microarray"    "nonzero"       "dimens"        "fan"          
[13] "highdimension" "identifi"      "sparsiti"      "confirm"      
[17] "slowli"        "larger"       

[[38]]
 [1] "seri"        "week"        "time"        "stationari"  "generat"    
 [6] "superposit"  "autoregress" "renew"       "autocovari"  "binomi"     
[11] "day"         "longmemori"  "count"       "predict"     "thin"       
[16] "focus"       "fit"         "contrast"    "consecut"    "integ"      
[21] "simpl"       "poisson"     "short"       "geometr"     "parsimoni"  
[26] "copi"        "bernoulli"   "previous"    "discret"     "electr"     
[31] "daili"       "key"         "differ"      "trial"       "market"     
[36] "margin"      "sequenc"     "forecast"    "load"       

[[39]]
[1] "spectral"   "densiti"    "time"       "seri"       "domain"    
[6] "stationari" "frequenc"  

[[40]]
[1] "tilt"       "exponenti"  "constraint" "employ"    

[[41]]
 [1] "earn"       "person"     "interview"  "employ"     "document"  
 [6] "survey"     "health"     "level"      "census"     "peopl"     
[11] "report"     "incom"      "higher"     "educ"       "feder"     
[16] "sensit"     "preval"     "analys"     "conduct"    "famili"    
[21] "imput"      "year"       "key"        "sourc"      "total"     
[26] "file"       "instrument" "ratio"      "status"     "encourag"  
[31] "nation"     "way"        "subsequ"    "monitor"    "lower"     
[36] "item"       "accept"     "multipli"   "rich"       "violat"    
[41] "previous"  

[[42]]
 [1] "statistician" "polici"       "scienc"       "statist"      "decis"       
 [6] "role"         "today"        "technolog"    "scientif"     "maker"       
[11] "bring"        "challeng"     "scientist"    "inform"       "integr"      
[16] "communic"     "individu"     "increas"      "knowledg"     "polit"       
[21] "live"         "disciplin"    "address"      "social"       "effort"      
[26] "essenti"      "organ"        "solv"         "engin"        "student"     
[31] "opportun"     "impact"       "face"         "grow"         "chang"       
[36] "play"         "govern"       "american"     "countri"      "mathemat"    
[41] "closer"       "centuri"      "modern"       "intern"       "spread"      
[46] "human"        "relev"        "ingredi"      "place"        "public"      
[51] "devic"        "success"      "explor"       "pressur"      "guarante"    
[56] "imposs"       "train"        "view"         "excel"        "presidenti"  
[61] "progress"     "edg"          "way"          "genom"        "support"     
[66] "communiti"    "promot"       "action"       "advanc"       "map"         
[71] "understand"  

[[43]]
 [1] "toxic"      "dose"       "trial"      "dosefind"   "phase"     
 [6] "clinic"     "target"     "design"     "probabl"    "escal"     
[11] "assign"     "patient"    "reassess"   "continu"    "ethic"     
[16] "prespecifi" "common"     "enhanc"     "concern"    "robust"    
[21] "parallel"   "previous"   "overcom"    "coher"      "variant"   
[26] "competit"  

[[44]]
 [1] "extrem"      "precipit"    "spatial"     "pareto"      "station"    
 [6] "uncertainti" "climatolog"  "hierarchi"   "exceed"      "threshold"  
[11] "quantif"     "produc"      "return"      "captur"      "region"     
[16] "intens"      "frequenc"    "hierarch"    "plan"        "weather"    
[21] "interpol"    "map"         "purpos"      "binomi"      "coordin"    
[26] "driven"      "geograph"    "daili"       "separ"       "character"  
[31] "fulli"       "latent"      "improv"     

[[45]]
 [1] "enter"     "pursu"     "project"   "preced"    "phase"     "maker"    
 [7] "schedul"   "resourc"   "decis"     "minim"     "concret"   "perfect"  
[13] "divid"     "total"     "strategi"  "alloc"     "expect"    "face"     
[19] "generat"   "manag"     "chosen"    "state"     "formul"    "unknown"  
[25] "point"     "exampl"    "breakdown" "unit"     

[[46]]
 [1] "polya"       "appreci"     "tree"        "cancer"      "surveil"    
 [6] "spatial"     "sophist"     "epidemiolog" "unrealist"   "institut"   
[11] "offer"       "program"     "fulli"       "analyt"      "nation"     
[16] "flexibl"     "lattic"      "compet"      "orient"      "feasibl"    
[21] "impos"       "obtain"      "aspect"      "remain"      "timetoev"   
[26] "breast"      "ignor"       "urn"         "mixtur"      "advantag"   
[31] "framework"   "featur"     

[[47]]
 [1] "delay"         "combin"        "issu"          "activ"        
 [5] "unit"          "year"          "monitor"       "program"      
 [9] "incid"         "concern"       "major"         "servic"       
[13] "surveil"       "develop"       "registri"      "populationbas"
[17] "trend"         "reason"       

[[48]]
[1] "laplac"    "approxim"  "posterior" "integr"    "mode"     

[[49]]
[1] "subjectspecif"    "random"           "longitudin"       "correl"          
[5] "populationaverag" "latent"           "logist"           "followup"        

[[50]]
 [1] "retail"    "custom"    "compani"   "deliveri"  "consum"    "tradit"   
 [7] "onlin"     "tail"      "quantiti"  "frequenc"  "market"    "total"    
[13] "joint"     "differ"    "firm"      "articl"    "cost"      "week"     
[19] "daili"     "translat"  "tie"       "decis"     "intend"    "household"
[25] "prevent"   "bivari"    "activ"     "aid"       "simpli"    "accur"    
[31] "forecast"  "compon"    "element"   "commerci"  "success"   "bank"     
[37] "incur"     "period"    "center"    "repres"    "arriv"     "frequent" 
[43] "organ"     "concern"   "impact"    "descript" 

[[51]]
[1] "oneparamet" "famili"     "normal"     "general"    "exponenti" 
[6] "detect"     "binomi"    

[[52]]
 [1] "intersect"  "close"      "hypothes"   "familywis"  "bonferroni"
 [6] "logic"      "critic"     "requir"     "elementari" "multipl"   
[11] "monoton"    "holm"       "valu"       "principl"  

[[1]]
 [1] "dichotom"        "outcom"          "exposur"         "genet"          
 [5] "inherit"         "confound"        "interact"        "causal"         
 [9] "trial"           "factor"          "binari"          "presenc"        
[13] "categor"         "assess"          "alcohol"         "continu"        
[17] "disord"          "misspecif"       "ordin"           "clinic"         
[21] "postul"          "trait"           "topic"           "environment"    
[25] "subgroup"        "potenti"         "geneenviron"     "alter"          
[29] "adequ"           "examin"          "adjust"          "intermedi"      
[33] "cancer"          "robin"           "stage"           "logist"         
[37] "arm"             "firststag"       "generic"         "latent"         
[41] "build"           "variabl"         "conduct"         "affect"         
[45] "accommod"        "prone"           "submodel"        "transmiss"      
[49] "mental"          "mediat"          "unspecifi"       "quantit"        
[53] "expos"           "major"           "multipli"        "sever"          
[57] "believ"          "gene"            "zhang"           "distributionfre"
[61] "routin"          "today"          

[[2]]
 [1] "virus"        "human"        "immunodefici" "hiv"          "infect"      
 [6] "viral"        "transmiss"    "vaccin"       "subject"      "genet"       
[11] "drug"         "develop"      "efficaci"     "mutat"        "outcom"      
[16] "causal"       "cell"         "syndrom"      "medic"        "pathway"     
[21] "resist"       "evolutionari" "therapi"      "pressur"     

[[3]]
 [1] "dropout"       "stratum"       "prevent"       "reduc"        
 [5] "oil"           "trial"         "adjust"        "longitudin"   
 [9] "cancer"        "prostat"       "mechan"        "men"          
[13] "find"          "stratifi"      "arm"           "nuisanc"      
[17] "treatment"     "assign"        "grade"         "doubleblind"  
[21] "avoid"         "colleagu"      "randomeffect"  "sever"        
[25] "verif"         "agent"         "conjectur"     "annual"       
[29] "nonignor"      "placebo"       "volum"         "elect"        
[33] "caus"          "daili"         "visit"         "preval"       
[37] "absolut"       "lie"           "indic"         "sensit"       
[41] "frequent"      "particip"      "year"          "reduct"       
[45] "causal"        "report"        "newtonraphson" "adopt"        
[49] "question"      "women"         "elder"         "surrog"       
[53] "inform"        "elicit"        "prospect"      "collabor"     
[57] "drawn"         "ignor"         "differ"        "link"         
[61] "retain"        "tilt"          "random"        "constraint"   
[65] "status"        "impli"         "doubli"        "expert"       
[69] "nonidentifi"   "intermitt"     "satur"         "sex"          
[73] "characterist"  "invers"       

[[4]]
  [1] "polici"       "statistician" "maker"        "decis"        "scienc"      
  [6] "role"         "technolog"    "today"        "chang"        "live"        
 [11] "bring"        "social"       "communic"     "integr"       "individu"    
 [16] "futur"        "knowledg"     "disciplin"    "nation"       "public"      
 [21] "scientif"     "health"       "activ"        "human"        "impact"      
 [26] "organ"        "inform"       "protect"      "promot"       "qualiti"     
 [31] "understand"   "program"      "way"          "student"      "mathemat"    
 [36] "increas"      "face"         "foundat"      "play"         "essenti"     
 [41] "uncertainti"  "effort"       "engin"        "expect"       "advanc"      
 [46] "confidenti"   "children"     "relev"        "make"         "industri"    
 [51] "govern"       "countri"      "encourag"     "polit"        "place"       
 [56] "modern"       "intern"       "scientist"    "closer"       "benefit"     
 [61] "reflect"      "explor"       "stronger"     "purpos"       "univers"     
 [66] "spread"       "environment"  "network"      "grow"         "forc"        
 [71] "access"       "devic"        "ingredi"      "excel"        "comprehens"  
 [76] "pollut"       "attract"      "broader"      "elementari"   "evolv"       
 [81] "train"        "pressur"      "air"          "option"       "imposs"      
 [86] "secondari"    "map"          "edg"          "success"      "progress"    
 [91] "critic"       "global"       "action"       "year"         "agenc"       
 [96] "communiti"    "american"     "quantit"      "genom"        "system"      
[101] "fundament"    "discoveri"    "evid"         "guarante"     "mortal"      
[106] "address"      "citi"         "requir"       "technic"      "serv"        
[111] "path"         "statist"      "separ"        "climat"       "contribut"   
[116] "opportun"     "adequaci"     "disabl"       "affect"       "driven"      
[121] "grade"        "psycholog"    "diagnost"     "morbid"       "view"        
[126] "delay"        "primari"      "state"       

[[5]]
 [1] "slice"     "invers"    "dimens"    "reduct"    "regress"   "averag"   
 [7] "sir"       "direct"    "central"   "goal"      "respons"   "save"     
[13] "subset"    "method"    "predictor" "subspac"   "varianc"   "preserv"  
[19] "replac"    "suffici"   "systemat" 

[[6]]
 [1] "band"       "confid"     "simultan"   "consid"     "trajectori"
 [6] "extend"     "choos"      "regular"    "asymptot"   "ball"      
[11] "uniform"   

[[7]]
[1] "absolut"  "deviat"   "clip"     "oracl"    "progress"

[[8]]
 [1] "breakdown"  "point"      "robust"     "depth"      "locat"     
 [6] "project"    "equivari"   "finit"      "function"   "possess"   
[11] "contamin"   "competitor" "affin"      "definit"    "introduc"  
[16] "lead"       "induc"      "influenc"   "high"       "outlier"   
[21] "strong"     "trim"       "median"     "region"     "york"      
[26] "scale"      "desir"      "favor"      "turn"       "pursu"     
[31] "enjoy"      "scatter"    "suffic"     "behav"      "uniform"   
[36] "relat"      "comparison" "suggest"    "fact"       "univari"   
[41] "ann"        "radius"    

[[9]]
 [1] "spacetim"       "spatial"        "fit"            "year"          
 [5] "site"           "separ"          "intens"         "california"    
 [9] "thin"           "process"        "monitor"        "residu"        
[13] "tempor"         "activ"          "multidimension" "occurr"        
[17] "space"          "background"     "appear"         "origin"        
[21] "smoother"       "irregular"      "earthquak"      "indic"         
[25] "asymmetr"       "trend"          "hazard"         "spectral"      
[29] "symmetr"        "environment"    "ozon"           "wind"          
[33] "meteorolog"     "daili"          "allow"          "rescal"        
[37] "season"         "time"           "anisotrop"      "cross"         
[41] "insid"          "bear"           "arbitrari"      "autoregress"   
[45] "interact"       "magnitud"       "sequenc"        "homogen"       
[49] "widespread"     "sphere"         "coordin"        "highlight"     
[53] "elabor"         "extrem"         "ascertain"      "forest"        
[57] "counti"         "rotat"          "month"          "threat"        
[61] "govern"         "secondari"      "aic"            "account"       
[65] "aid"            "emphas"         "routin"         "assess"        
[69] "departur"       "rare"          

[[10]]
 [1] "survey"      "nonrespons"  "census"      "nation"      "respond"    
 [6] "imput"       "popul"       "health"      "race"        "bureau"     
[11] "nonignor"    "unit"        "respons"     "item"        "incom"      
[16] "miss"        "person"      "year"        "state"       "bias"       
[21] "employ"      "higher"      "valu"        "sensit"      "interview"  
[26] "labor"       "nonrespond"  "age"         "feder"       "collect"    
[31] "measur"      "handl"       "assess"      "report"      "level"      
[36] "counti"      "domain"      "preval"      "agenc"       "confidenti" 
[41] "benchmark"   "incorpor"    "protect"     "status"      "cell"       
[46] "earn"        "produc"      "sourc"       "relat"       "weight"     
[51] "propens"     "public"      "household"   "area"        "geograph"   
[56] "nutrit"      "document"    "lower"       "plan"        "bodi"       
[61] "gender"      "extrapol"    "preliminari" "birth"       "polit"      
[66] "correct"     "american"    "proxi"       "requir"      "previous"   
[71] "children"    "york"        "unemploy"    "death"      

[[11]]
 [1] "jackknif"  "file"      "replic"    "varianc"   "inconsist" "strata"   
 [7] "analyt"    "unbias"    "met"       "domain"    "schedul"   "freedom"  
[13] "survey"    "attain"    "balanc"    "mix"       "ensur"     "public"   
[19] "repeat"    "upper"     "bootstrap" "uncondit"  "plausibl"  "person"   
[25] "pseudo"    "concern"   "linkag"   

[[12]]
[1] "root"     "squar"    "approxim"

[[13]]
 [1] "pathway"       "biolog"        "pattern"       "presenc"      
 [5] "gene"          "latent"        "viral"         "initi"        
 [9] "biomark"       "understand"    "protein"       "pronounc"     
[13] "infect"        "therapi"       "supplementari" "quantifi"     
[17] "concentr"      "chemic"        "tackl"         "incorrect"    
[21] "healthi"       "identifi"      "molecular"     "human"        
[25] "serum"         "hormon"        "investig"      "experiment"   
[29] "search"        "status"        "sort"          "drug"         
[33] "inflat"        "pertin"        "mediat"        "mutat"        
[37] "resist"        "absent"        "blood"         "exemplifi"    
[41] "valuabl"       "phenotyp"      "led"           "indic"        
[45] "subsequ"       "format"        "framework"    

[[14]]
[1] "establish" "asymptot"  "consist"   "converg"  

[[15]]
[1] "bootstrap" "confid"    "distribut" "sampl"     "interv"    "method"   
[7] "correct"   "seri"      "empir"    

[[16]]
 [1] "imag"     "magnet"   "reson"    "field"    "brain"    "fmri"    
 [7] "activ"    "voxel"    "signal"   "detect"   "locat"    "volum"   
[13] "accur"    "follow"   "task"     "motion"   "region"   "visual"  
[19] "identifi" "exploit"  "tissu"    "aim"      "contigu"  "map"     
[25] "rotat"    "neuron"  

[[17]]
 [1] "ozon"            "maxima"          "splinebas"       "nonlinear"      
 [5] "piecewiselinear" "concentr"        "pressur"         "cycl"           
 [9] "transport"       "variat"          "contribut"       "atmospher"      
[13] "peak"            "trend"           "measur"          "basi"           
[17] "evid"            "instrument"      "thought"         "greater"        
[21] "link"            "scientif"        "lag"             "dimensionreduct"
[25] "absenc"          "wave"            "global"          "separ"          
[29] "month"           "coincid"         "influenc"        "lowdimension"   
[33] "clear"           "contrast"        "lower"           "year"           
[37] "site"            "qualiti"         "profil"          "sequenc"        
[41] "sensit"          "origin"          "relat"           "presenc"        
[45] "satellit"        "partial"         "pattern"         "identifi"       

[[18]]
 [1] "experienc" "event"     "deterior"  "trial"     "aberr"     "patient"  
 [7] "import"    "die"       "protocol"  "benefici"  "rank"      "treatment"
[13] "mention"   "wilcoxon"  "receiv"    "children"  "aspect"    "consequ"  
[19] "exact"     "preserv"   "fisher"    "placebo"   "sort"      "magnitud" 
[25] "longer"    "medic"     "exposur"   "adequ"     "discard"   "greatest" 
[31] "fact"      "need"      "invert"    "substanti" "subsequ"   "tabl"     
[37] "remov"     "exhibit"   "way"       "basic"     "singl"     "health"   
[43] "aim"       "care"      "interv"    "complet"   "specif"    "sum"      
[49] "question"  "cubic"     "cancer"    "situat"    "extrem"    "splinebas"
[55] "outcom"    "treat"     "rotat"     "control"   "binari"    "effect"   

[[19]]
 [1] "electr"        "forecast"      "renew"         "bivari"       
 [5] "load"          "market"        "daili"         "power"        
 [9] "serial"        "shortterm"     "wind"          "autoregress"  
[13] "diagon"        "speed"         "time"          "season"       
[17] "focus"         "difficult"     "peak"          "spectrum"     
[21] "temperatur"    "regressor"     "heteroscedast" "firstord"     
[25] "total"         "highlight"     "energi"        "justifi"      
[29] "simpl"         "week"          "vari"          "hour"         
[33] "trend"         "citi"          "recogn"        "stationari"   
[37] "autocovari"    "detail"        "promis"        "realiti"      
[41] "favor"         "reveal"        "year"          "longmemori"   
[45] "gain"          "accuraci"      "exploit"       "predict"      
[49] "option"        "reliabl"       "price"         "evolut"       
[53] "avail"         "superpopul"   

[[20]]
 [1] "day"        "daili"      "record"     "time"       "financi"   
 [6] "activ"      "short"      "peak"       "consecut"   "help"      
[11] "autocovari" "appropri"   "intens"     "physic"     "character" 
[16] "measur"     "children"   "trade"      "strength"   "scalar"    
[21] "superposit" "incomplet"  "copi"      

[[21]]
[1] "secondord"   "firstord"    "accur"       "expans"      "unbias"     
[6] "moment"      "approxim"    "frequentist" "exact"      

[[22]]
 [1] "treatment"  "assign"     "causal"     "score"      "outcom"    
 [6] "propens"    "averag"     "effect"     "grade"      "school"    
[11] "potenti"    "stratif"    "promot"     "confound"   "rubin"     
[16] "student"    "unit"       "regim"      "educ"       "adjust"    
[21] "children"   "plausibl"   "polici"     "program"    "evid"      
[26] "pretreat"   "posttreat"  "summar"     "stage"      "child"     
[31] "intermedi"  "assumpt"    "retain"     "multilevel" "block"     
[36] "econom"     "experiment" "stabl"      "arbitrari"  "nation"    
[41] "articl"     "balanc"     "learn"      "perspect"   "status"    
[46] "unmeasur"   "fewer"      "scalar"     "affect"     "low"       
[51] "mathemat"   "track"      "twostag"    "covari"     "tradeoff"  
[56] "recov"      "nonrandom"  "bind"       "pose"       "estimand"  
[61] "impos"      "feasibl"    "return"    

[[23]]
 [1] "extrapol"      "errorpron"     "posttreat"     "instrument"   
 [5] "classic"       "baselin"       "replic"        "subsampl"     
 [9] "nonlinear"     "daili"         "summari"       "air"          
[13] "encount"       "subset"        "bias"          "efficaci"     
[17] "heteroscedast" "frequenc"      "trajectori"    "spheric"      
[21] "supplementari" "correct"       "multiscal"     "scatter"      
[25] "reconstruct"   "subject"       "error"         "temperatur"   

[[24]]
 [1] "admiss"       "inadmiss"     "loss"         "bay"          "risk"        
 [6] "endpoint"     "action"       "ann"          "accept"       "math"        
[11] "genom"        "screen"       "stringent"    "result"       "complet"     
[16] "stepup"       "character"    "formul"       "treat"        "pearson"     
[21] "amer"         "assoc"        "biometrika"   "prototyp"     "vector"      
[26] "pay"          "reject"       "decad"        "revisit"      "metaanalysi" 
[31] "criteria"     "effort"       "bioassay"     "thought"      "hard"        
[36] "psycholog"    "nonneg"       "predetermin"  "fals"         "energi"      
[41] "earlier"      "educ"         "hoc"          "stein"        "emerg"       
[46] "fair"         "dna"          "appeal"       "sign"         "singlestep"  
[51] "drug"         "microarray"   "statistician" "jeffrey"      "year"        
[56] "fewer"        "fisher"       "paper"        "resembl"      "paradox"     
[61] "share"        "twodimension" "nonzero"      "stepdown"     "seek"        
[66] "expect"      

[[25]]
 [1] "unbound"   "novelti"   "function"  "yield"     "oracl"     "tail"     
 [7] "decreas"   "satisfi"   "anisotrop" "inequ"     "median"    "slower"   
[13] "literatur" "bivari"    "free"      "vast"      "fast"      "input"    
[19] "setup"     "output"    "aggreg"    "aforement" "behav"     "influenti"
[25] "iii"       "bound"     "univers"   "main"      "nuclear"   "radius"   
[31] "need"      "tilt"      "hyperplan" "higherord" "symmetri"  "equivari" 
[37] "gee"       "scatter"   "bin"       "quadrat"  

[[26]]
 [1] "schedul"         "longitudin"      "followup"        "analys"         
 [5] "phase"           "generat"         "incomplet"       "flexibl"        
 [9] "respons"         "avail"           "ill"             "unbalanc"       
[13] "pursu"           "offer"           "enter"           "resourc"        
[17] "impact"          "merg"            "concret"         "intermitt"      
[21] "interim"         "preced"          "perfect"         "divid"          
[25] "maker"           "face"            "preliminari"     "fluctuat"       
[29] "missingatrandom" "versatil"        "alloc"           "timetoev"       
[33] "withinsubject"   "compromis"       "manag"           "metropoli"      
[37] "missingdata"     "walk"            "logrank"        

[[27]]
[1] "real"    "simul"   "data"    "illustr"

[[28]]
[1] "misspecifi" "robust"     "misspecif" 

Overall, the results for pseudocounts 0.01 to 1 look fairly reasonable.

Looking at memberships

Looking at the non-zero memberships (normalized values above 0.01), all four pseudocounts seem to give similar overall levels of sparsity in \(L\).

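hist_lnorm = function(fit,...){
  LL = fit$L_pm                       # posterior mean loadings (documents x factors)
  Lnorm = t(t(LL)/apply(LL,2,max))    # scale each column (factor) to maximum 1
  hist(Lnorm[Lnorm>0.01],...)         # histogram of non-negligible memberships
}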
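The hist_lnorm function divides each column of \(L\) by its maximum, so every factor's largest membership is 1 and the 0.01 cutoff is relative to that maximum. A minimal sketch on a small simulated matrix (the matrix and the helper name lnorm_prop_nonzero are illustrative, not part of the original analysis) checks that this idiom matches sweep() and summarizes sparsity as a single number per fit:

```r
# Illustrative sketch: column-max normalization of a loadings matrix,
# as done inside hist_lnorm, plus a one-number sparsity summary.
set.seed(1)
LL <- matrix(rexp(20 * 4), nrow = 20, ncol = 4)   # toy 20 x 4 loadings
LL[LL < 0.5] <- 0                                  # make it somewhat sparse

# Divide each column by its maximum (the idiom used in hist_lnorm) ...
Lnorm <- t(t(LL) / apply(LL, 2, max))
# ... which is equivalent to sweep() over columns.
Lnorm_sweep <- sweep(LL, 2, apply(LL, 2, max), "/")

# Hypothetical helper: proportion of normalized loadings above the cutoff
# used by hist_lnorm, a rough numerical counterpart to the histograms.
lnorm_prop_nonzero <- function(L, cutoff = 0.01) {
  Ln <- t(t(L) / apply(L, 2, max))
  mean(Ln > cutoff)
}

print(apply(Lnorm, 2, max))      # each column now has maximum 1
print(lnorm_prop_nonzero(LL))
```

Applied to each fit's L_pm, a helper like this would give one number per pseudocount to set beside the histograms.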
hist_lnorm(fit.nn.s.10,main="pseudocount=10",ylim=c(0,800),nclass=20)

Version Author Date
0346f50 Matthew Stephens 2023-11-08
29f2f9a Matthew Stephens 2023-11-06
68ddffa Matthew Stephens 2023-10-20
hist_lnorm(fit.nn.s.1,main="pseudocount=1",ylim=c(0,800),nclass=20)

hist_lnorm(fit.nn.s.01,main="pseudocount=0.1",ylim=c(0,800),nclass=20)

hist_lnorm(fit.nn.s.001,main="pseudocount=0.01",ylim=c(0,800),nclass=20)


Here I threshold the normalized L values of the pseudocount = 0.1 fit at 0.2 to get an idea of how many factors are present in each document. All documents load on the first factor, so documents that load on only one factor can be thought of as not really being assigned to any topic.

LL = fit.nn.s.01$L_pm
FF = fit.nn.s.01$F_pm
Lnorm = t(t(LL)/apply(LL,2,max))
Fnorm = t(t(FF)*apply(LL,2,max))

nfac = rowSums(Lnorm>0.2)
hist(nfac,breaks = seq(0.5,9.5,length=10))
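The same thresholding rule on a toy matrix of normalized loadings is shown below. Tabulating the counts with table() avoids choosing histogram breaks; hist() with breaks ending at 9.5 would stop with an error if any document had more than 9 factors above the threshold. The matrix is made up for illustration, with column 1 playing the role of the common factor:

```r
# Illustrative sketch: number of factors per document with normalized
# loading above 0.2. Column 1 mimics the common factor loaded by everyone;
# each column has maximum 1, as after column-max normalization.
Lnorm <- rbind(
  c(1.0, 0.05, 0.00, 0.10),   # doc 1: common factor only
  c(0.8, 1.00, 0.00, 0.00),   # doc 2: common factor + factor 2
  c(0.6, 0.30, 1.00, 0.00),   # doc 3: common factor + factors 2 and 3
  c(0.9, 0.00, 0.15, 1.00)    # doc 4: common factor + factor 4
)

nfac <- rowSums(Lnorm > 0.2)
print(nfac)                 # 1 2 3 2
print(table(nfac))

# Documents above threshold only on the common factor are, in effect,
# not assigned to any topic.
unassigned <- which(nfac == 1)
print(unassigned)           # 1
```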


Here I make an initial structure plot of the results.

structure_plot_general = function(Lhat,Fhat,grouping,title=NULL,
                                  loadings_order = 'embed',
                                  print_plot=FALSE,
                                  seed=12345,
                                  n_samples = NULL,
                                  gap=40,
                                  std_L_method = 'sum_to_1',
                                  show_legend=TRUE,
                                  K = NULL
                                  ){
  set.seed(seed)
  #s       <- apply(Lhat,2,max)
  #Lhat    <-   t(t(Lhat) / s)

  # default subsample size when ordering documents by embedding
  if(is.null(n_samples) && all(loadings_order == "embed")){
    n_samples = 2000
  }

  if(std_L_method=='sum_to_1'){        # each row (document) sums to 1
    Lhat = Lhat/rowSums(Lhat)
  }
  if(std_L_method=='row_max_1'){       # each row has maximum 1
    Lhat = Lhat/c(apply(Lhat,1,max))
  }
  if(std_L_method=='col_max_1'){       # each column (factor) has maximum 1
    Lhat = apply(Lhat,2,function(z){z/max(z)})
  }
  if(std_L_method=='col_norm_1'){      # each column has unit Euclidean norm
    Lhat = apply(Lhat,2,function(z){z/norm(z,'2')})
  }
  
  if(!is.null(K)){
    Lhat = Lhat[,1:K]
    Fhat = Fhat[,1:K]
  }
  # placeholder F: the bars are drawn from Lhat, so real factor values are not needed
  Fhat = matrix(1,nrow=3,ncol=ncol(Lhat))
  if(is.null(colnames(Lhat))){
    colnames(Lhat) <- paste0("k",1:ncol(Lhat))
  }
  fit_list     <- list(L = Lhat,F = Fhat)
  class(fit_list) <- c("multinom_topic_model_fit", "list")
  p <- structure_plot(fit_list,grouping = grouping,
                      loadings_order = loadings_order,
                      n = n_samples,gap = gap,verbose=FALSE) +
    labs(y = "loading",color = "dim",fill = "dim") + ggtitle(title)
  if(!show_legend){
    p <- p + theme(legend.position="none")
  }
  if(print_plot){
    print(p)
  }
  return(p)
}
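The four std_L_method options rescale \(L\) in different directions: sum_to_1 divides each row by its sum (so each document's memberships sum to 1) and row_max_1 divides each row by its maximum, while col_max_1 and col_norm_1 rescale each column. A small sketch of the same four operations outside the plotting function, on a made-up matrix:

```r
# Illustrative sketch of the four std_L_method rescalings used in
# structure_plot_general, applied to a toy 3 x 2 loadings matrix.
L <- matrix(c(1, 2, 3,
              3, 2, 1), nrow = 3)

sum_to_1   <- L / rowSums(L)                             # rows sum to 1
row_max_1  <- L / apply(L, 1, max)                       # row maxima equal 1
col_max_1  <- apply(L, 2, function(z) z / max(z))        # column maxima equal 1
col_norm_1 <- apply(L, 2, function(z) z / norm(z, "2"))  # unit column norms

print(rowSums(sum_to_1))
print(apply(col_max_1, 2, max))
print(sqrt(colSums(col_norm_1^2)))
```

So a sum_to_1 plot shows each document's membership proportions across factors, whereas a col_max_1 plot shows each factor's loadings relative to its strongest document.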

This is the structure plot with the first (common) factor set to 0.

Lnorm0=Lnorm
Fnorm0=Fnorm
Lnorm0[,1]=0
Fnorm0[,1]=0
structure_plot_general(Lnorm0,Fnorm0)
Running tsne on 1924 x 108 matrix.

structure_plot_general(Lnorm,Fnorm,std_L_method = "col_max_1")
Running tsne on 1924 x 108 matrix.


Repeat for the smaller pseudocount (0.01).

LL = fit.nn.s.001$L_pm
FF = fit.nn.s.001$F_pm
Lnorm = t(t(LL)/apply(LL,2,max))
Fnorm = t(t(FF)*apply(LL,2,max))

nfac = rowSums(Lnorm>0.2)
hist(nfac,breaks = seq(0.5,9.5,length=10))


This is the structure plot with the first (common) factor set to 0.

Lnorm0=Lnorm
Fnorm0=Fnorm
Lnorm0[,1]=0
Fnorm0[,1]=0
structure_plot_general(Lnorm0,Fnorm0)
Running tsne on 1924 x 84 matrix.


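Here is the same fit without rescaling each row (document) to sum to 1; instead each column is scaled to have maximum 1 (std_L_method = "col_max_1"), and the first factor is included. Interestingly, the memberships here look more “binary” than for the larger pseudocount.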

structure_plot_general(Lnorm,Fnorm,std_L_method = "col_max_1")
Running tsne on 1924 x 84 matrix.


Thresholding factors

One thing I noticed is that some factors have a single document “driving” them (membership 1 in the normalized L) and no other document with appreciable membership (say above 0.5), even though several documents have non-zero membership. For example, take factor 86 in the pseudocount = 0.1 fit (fit.nn.s.01). From the keywords it looks like a “recommender system” factor, but also a “nearest neighbor” factor. It seems to be driven by a single document that has both features.

get_keywords(fit.nn.s.01)[86]
[[1]]
 [1] "collabor"    "nearest"     "item"        "user"        "consum"     
 [6] "tradit"      "recommend"   "system"      "neighbor"    "filter"     
[11] "frame"       "clear"       "fact"        "contribut"   "forc"       
[16] "grow"        "drive"       "probabilist" "mathemat"    "precis"     
[21] "socal"       "initi"       "deal"        "mild"        "attempt"    
[26] "offer"       "neighbour"   "provid"      "literatur"   "algorithm"  
[31] "sequenti"   
LL = fit.nn.s.01$L_pm
FF = fit.nn.s.01$F_pm
Lnorm = t(t(LL)/apply(LL,2,max))
Fnorm = t(t(FF)*apply(LL,2,max))
Lnorm0=Lnorm
Fnorm0=Fnorm
Lnorm0[,1]=0
Fnorm0[,1]=0
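Dividing each column of \(L\) by its maximum and multiplying the matching column of \(F\) by the same amount leaves the fitted matrix \(LF^T\) unchanged, so Lnorm and Fnorm describe the same fit as L_pm and F_pm, only on a different scale. A quick check on simulated matrices (sizes and values are arbitrary, not taken from these data):

```r
# Illustrative check: rescaling L and F column-by-column by the same factors
# (here the column maxima of L) does not change the product L %*% t(F).
set.seed(2)
n <- 30; p <- 15; K <- 5
LL <- matrix(rexp(n * K), n, K)   # toy n x K loadings
FF <- matrix(rexp(p * K), p, K)   # toy p x K factors

s <- apply(LL, 2, max)            # column maxima of L
Lnorm <- t(t(LL) / s)             # columns of L divided by s
Fnorm <- t(t(FF) * s)             # columns of F multiplied by s

print(all.equal(Lnorm %*% t(Fnorm), LL %*% t(FF)))   # TRUE
```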

plot(Lnorm[,86])

order(Lnorm[,86],decreasing = TRUE)[1:4]
[1] 1181 1395 1460 1024
sla[1181,]$abstract
[1] "Collaborative recommendation is an information-filtering technique that attempts to present information items that are likely of interest to an Internet user. Traditionally, collaborative systems deal with situations with two types of variables, users and items. In its most common form, the problem is framed as trying to estimate ratings for items that have not yet been consumed by a user. Despite wide-ranging literature, little is known about the statistical properties of recommendation systems. In fact, no clear probabilistic model even exists which would allow us to precisely describe the mathematical forces driving collaborative filtering. To provide an initial contribution to this, we propose to set out a general sequential stochastic model for collaborative recommendation. We offer an in-depth analysis of the so-called cosine-type nearest neighbor collaborative method, which is one of the most widely used algorithms in collaborative filtering, and analyze its asymptotic performance as the number of users grows. We establish consistency of the procedure under mild assumptions on the model. Rates of convergence and examples are also provided."
sla[1395,]$abstract
[1] "It is shown that bagging, a computationally intensive method, asymptotically improves the performance of nearest neighbour classifiers provided that the resample size is less than 69% of the actual sample size, in the case of with-replacement bagging, or less than 50% of the sample size, for without-replacement bagging. However, for larger sampling fractions there is no asymptotic difference between the risk of the regular nearest neighbour classifier and its bagged version. In particular, neither achieves the large sample performance of the Bayes classifier. In contrast, when the sampling fractions converge to 0, but the resample sizes diverge to infinity, the bagged classifier converges to the optimal Bayes rule and its risk converges to the risk of the latter. These results are most readily seen when the two populations have well-defined densities, but they may also be derived in other cases, where densities exist in only a relative sense. Cross-validation can be used effectively to choose the sampling fraction. Numerical calculation is used to illustrate these theoretical properties."
sla[1460,]$abstract
[1] "Traditionally the neighbourhood size k in the k-nearest-neighbour algorithm is either fixed at the first nearest neighbour or is selected on the basis of a crossvalidation study. In this paper we present an alternative approach that develops the k-nearest-neighbour algorithm using likelihood-based inference. Our method takes the form of a generalised linear regression on a set of k-nearest-neighbour autocovariates. By defining the k-nearest-neighbour algorithm in this way we are able to extend the method to accommodate the original predictor variables as possible linear effects as well as allowing for the inclusion of multiple nearest-neighbour terms. The choice of the final model proceeds via a stepwise regression procedure. It is shown that our method incorporates a conventional generalised linear model and a conventional k-nearest-neighbour algorithm as special cases. Empirical results suggest that the method out-performs the standard k-nearest-neighbour method in terms of misclassification rate on a wide variety of data-sets."
sla[1024,]$abstract
[1] "In this article we study random forests through their connection with a new framework of adaptive nearest-neighbor methods. We introduce a concept of potential nearest neighbors (k-PNNs) and show that random forests can be viewed as adaptively weighted k-PNN methods. Various aspects of random forests can be studied from this perspective. We study the effect of terminal node sizes on the prediction accuracy of random forests. We further show that random forests with adaptive splitting schemes assign weights to k-PNNs in a desirable way: for the estimation at a given target point, these random forests assign voting weights to the k-PNNs of the target point according to the local importance of different input variables. We propose a new simple splitting scheme that achieves desirable adaptivity in a straightforward fashion. This simple scheme can be combined with existing algorithms. The resulting algorithm is computationally faster and gives comparable results. Other possible aspects of random forests, such as using linear combinations in splitting, are also discussed. Simulations and real datasets are used to illustrate the results."

It seems that this factor is being “polluted” by its strongest single document: it is perhaps really a “nearest neighbor” factor rather than a “recommender system” factor.

Here I look at the other factors that have a single outlying document (exactly one document with normalized membership above 0.5, ignoring the first factor) to see what they look like.
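The criterion colSums(Lnorm0 > 0.5) == 1 picks out factors where exactly one document has normalized membership above 0.5; because each column's maximum is 1, this is the same as the second-largest membership being at most 0.5. A sketch on a toy matrix (values made up) that also reports which document drives each flagged factor:

```r
# Illustrative sketch: find factors driven by a single document and
# report that document. Column 1 (the common factor) is zeroed, as in Lnorm0.
Lnorm0 <- cbind(
  0,                              # common factor set to 0
  c(1.0, 0.10, 0.20, 0.05),       # factor 2: one dominant document
  c(0.7, 1.00, 0.60, 0.00),       # factor 3: several supporting documents
  c(0.0, 0.30, 0.40, 1.00)        # factor 4: one dominant document
)

single <- which(colSums(Lnorm0 > 0.5) == 1)
print(single)                     # 2 4

# Index of the driving document for each flagged factor.
driver <- apply(Lnorm0[, single, drop = FALSE], 2, which.max)
print(driver)                     # 1 4

# Second-largest membership per flagged factor (at most 0.5 by construction).
second <- apply(Lnorm0[, single, drop = FALSE], 2,
                function(z) sort(z, decreasing = TRUE)[2])
print(second)
```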

which(colSums(Lnorm0>0.5)==1)
 [1]  17  68  69  74  86  92  99 100 101 105 106
get_keywords(fit.nn.s.01)[colSums(Lnorm0>0.5)==1]
[[1]]
 [1] "penalis"       "newtonraphson" "framingham"    "penalti"      
 [5] "likelihood"    "heart"         "failur"        "carri"        
 [9] "algorithm"     "proper"        "conduct"       "advanc"       
[13] "grow"          "dropout"       "familiar"      "prospect"     

[[2]]
 [1] "integ"      "algebra"    "coher"      "ail"        "ident"     
 [6] "countabl"   "multist"    "system"     "appl"       "finit"     
[11] "classic"    "object"     "ideal"      "grid"       "util"      
[16] "math"       "fewer"      "state"      "call"       "binari"    
[21] "inequ"      "pure"       "geometri"   "comprehens" "alpha"     
[26] "posit"      "socal"      "repres"     "idea"       "complex"   
[31] "probabl"    "yield"      "failur"     "relat"      "type"      

[[3]]
 [1] "car"         "polytop"     "partit"      "height"      "combinatori"
 [6] "mechan"      "rais"        "hierarchi"   "convex"      "need"       
[11] "extrem"      "stein"       "descript"    "meaning"     "discret"    
[16] "object"      "geometr"     "parsimoni"   "oil"         "notion"     
[21] "satisfi"     "character"   "exponenti"   "interpret"   "unusu"      
[26] "maxim"       "neighbor"    "assumpt"     "uniform"     "dramat"     
[31] "class"       "point"       "sure"       

[[4]]
 [1] "digit"       "fals"        "alarm"       "imag"        "geometr"    
 [6] "definit"     "expect"      "sequenti"    "minim"       "principl"   
[11] "meaning"     "meet"        "framework"   "kind"        "priori"     
[16] "maxim"       "prove"       "theori"      "contain"     "mathemat"   
[21] "compat"      "align"       "display"     "part"        "occurr"     
[26] "explain"     "basic"       "structur"    "number"      "hidden"     
[31] "stop"        "delay"       "probabilist" "rigor"       "fine"       
[36] "walk"        "chang"       "changepoint" "renew"      

[[5]]
 [1] "collabor"    "nearest"     "item"        "user"        "consum"     
 [6] "tradit"      "recommend"   "system"      "neighbor"    "filter"     
[11] "frame"       "clear"       "fact"        "contribut"   "forc"       
[16] "grow"        "drive"       "probabilist" "mathemat"    "precis"     
[21] "socal"       "initi"       "deal"        "mild"        "attempt"    
[26] "offer"       "neighbour"   "provid"      "literatur"   "algorithm"  
[31] "sequenti"   

[[6]]
 [1] "seri"        "week"        "time"        "stationari"  "generat"    
 [6] "superposit"  "autoregress" "renew"       "autocovari"  "binomi"     
[11] "day"         "longmemori"  "count"       "predict"     "thin"       
[16] "focus"       "fit"         "contrast"    "consecut"    "integ"      
[21] "simpl"       "poisson"     "short"       "geometr"     "parsimoni"  
[26] "copi"        "bernoulli"   "previous"    "discret"     "electr"     
[31] "daili"       "key"         "differ"      "trial"       "market"     
[36] "margin"      "sequenc"     "forecast"    "load"       

[[7]]
 [1] "extrem"      "precipit"    "spatial"     "pareto"      "station"    
 [6] "uncertainti" "climatolog"  "hierarchi"   "exceed"      "threshold"  
[11] "quantif"     "produc"      "return"      "captur"      "region"     
[16] "intens"      "frequenc"    "hierarch"    "plan"        "weather"    
[21] "interpol"    "map"         "purpos"      "binomi"      "coordin"    
[26] "driven"      "geograph"    "daili"       "separ"       "character"  
[31] "fulli"       "latent"      "improv"     

[[8]]
 [1] "enter"     "pursu"     "project"   "preced"    "phase"     "maker"    
 [7] "schedul"   "resourc"   "decis"     "minim"     "concret"   "perfect"  
[13] "divid"     "total"     "strategi"  "alloc"     "expect"    "face"     
[19] "generat"   "manag"     "chosen"    "state"     "formul"    "unknown"  
[25] "point"     "exampl"    "breakdown" "unit"     

[[9]]
 [1] "polya"       "appreci"     "tree"        "cancer"      "surveil"    
 [6] "spatial"     "sophist"     "epidemiolog" "unrealist"   "institut"   
[11] "offer"       "program"     "fulli"       "analyt"      "nation"     
[16] "flexibl"     "lattic"      "compet"      "orient"      "feasibl"    
[21] "impos"       "obtain"      "aspect"      "remain"      "timetoev"   
[26] "breast"      "ignor"       "urn"         "mixtur"      "advantag"   
[31] "framework"   "featur"     

[[10]]
 [1] "underestim"    "overestim"     "lemma"         "abrupt"       
 [5] "respect"       "admit"         "stein"         "identif"      
 [9] "moder"         "satisfi"       "nontrivi"      "impli"        
[13] "detail"        "deviat"        "loglikelihood" "benchmark"    
[17] "moment"        "nest"          "yield"         "exponenti"    
[21] "decay"         "deal"          "difficulti"    "mild"         
[25] "posit"         "relat"         "version"       "prove"        

[[11]]
 [1] "retail"    "custom"    "compani"   "deliveri"  "consum"    "tradit"   
 [7] "onlin"     "tail"      "quantiti"  "frequenc"  "market"    "total"    
[13] "joint"     "differ"    "firm"      "articl"    "cost"      "week"     
[19] "daili"     "translat"  "tie"       "decis"     "intend"    "household"
[25] "prevent"   "bivari"    "activ"     "aid"       "simpli"    "accur"    
[31] "forecast"  "compon"    "element"   "commerci"  "success"   "bank"     
[37] "incur"     "period"    "center"    "repres"    "arriv"     "frequent" 
[43] "organ"     "concern"   "impact"    "descript" 
plot(Lnorm[,17])

order(Lnorm[,17],decreasing = TRUE)[1:4]
[1] 1789  475  792 1781
sla[1789,]$abstract
[1] "In this paper, we propose a penalised pseudo-partial likelihood method for variable selection with multivariate failure time data with a growing number of regression coefficients. Under certain regularity conditions, we show the consistency and asymptotic normality of the penalised likelihood estimators. We further demonstrate that, for certain penalty functions with proper choices of regularisation parameters, the resulting estimator can correctly identify the true model, as if it were known in advance. Based on a simple approximation of the penalty function, the proposed method can be easily carried out with the Newton-Raphson algorithm. We conduct extensive Monte Carlo simulation studies to assess the finite sample performance of the proposed procedures. We illustrate the proposed method by analysing a dataset from the Framingham Heart Study."
sla[475,]$abstract
[1] "Pattern-mixture models are frequently used for longitudinal data analysis with dropouts because they do not require explicit specification Of the dropout mechanism. These models stratify the data according to time to dropout and formulate a model for each stratum. This usually results in underindentifiability, because we need to estimate many pattern-specific parameters even though the eventual interest is usually or, the marginal parameters. In this article we extend this framework to a random pattern-mixture model, where the pattern-specific parameters are treated as nuisance parameters and modeled as random instead of fixed. The pattern is defined according to a surrogate for the dropout process. A constraint is then put oil the pattern by linking it to the time to dropout using a random-effects survival model. We assume, conditional on the latent pattern effects. that the longitudinal outcome and the dropout process are independent. This model retains the robustness of the traditional pattern-mixture models. while avoiding the overparameterization problem. When we define each subject as a separate stratum. this model reduces to the shared parameter model. Maximum likelihood estimates are obtained using an EM Newton-Raphson algorithm. We apply the method to the depression data from the Prevention of Suicide in Primary Care Elderly Collaborative Trial (PROSPECT). We show when the dropout information is adjusted for under the proposed model, the treatment seems to reduce depression in the elderly."
sla[792,]$abstract
[1] "We propose a nonparametric method for identifying parsimony and for producing a statistically efficient estimator of a large covariance matrix. We reparameterise a covariance matrix through the modified Cholesky decomposition of its inverse or the one-step-ahead predictive representation of the vector of responses and reduce the nonintuitive task of modelling covariance matrices to the familiar task of model selection and estimation for a sequence of regression models. The Cholesky factor containing these regression coefficients is likely to have many off-diagonal elements that are zero or close to zero. Penalised normal likelihoods in this situation with L-1 and L-2 penalities are shown to be closely related to Tibshirani's (1996) LASSO approach and to ridge regression. Adding either penalty to the likelihood helps to produce more stable estimators by introducing shrinkage to the elements in the Cholesky factor, while, because of its singularity, the L-1 penalty will set some elements to zero and produce interpretable models. An algorithm is developed for computing the estimator and selecting the tuning parameter. The proposed maximum penalised likelihood estimator is illustrated using simulation and a real dataset involving estimation of a 102 x 102 covariance matrix."
sla[1781,]$abstract
[1] "This paper extends the induced smoothing procedure of Brown & Wang (2006) for the semiparametric accelerated failure time model to the case of clustered failure time data. The resulting procedure permits fast and accurate computation of regression parameter estimates and standard errors using simple and widely available numerical methods, such as the Newton-Raphson algorithm. The regression parameter estimates are shown to be strongly consistent and asymptotically normal; in addition, we prove that the asymptotic distribution of the smoothed estimator coincides with that obtained without the use of smoothing. This establishes a key claim of Brown & Wang (2006) for the case of independent failure time data and also extends such results to the case of clustered data. Simulation results show that these smoothed estimates perform as well as those obtained using the best available methods at a fraction of the computational cost."
plot(Lnorm[,69])

order(Lnorm[,69],decreasing = TRUE)[1:4]
[1] 1867  288 1751 1258
sla[1867,]$abstract
[1] "We show that the class of conditional distributions satisfying the coarsening at random (CAR) property for discrete data has a simple and robust algorithmic description based oil randomized uniform multicovers: combinatorial objects generalizing the notion of partition of a set. However, the complexity of a given CAR mechanism can be large: the maximal \"height\" of the needed multicovers can be exponential in the number of points, in the sample space. The results stein from a geometric interpretation of the set of CAR distributions as a convex polytope and a characterization of its extreme points. The hierarchy of CAR models defined in this way could be useful in parsimonious statistical modeling of CAR mechanisms, though the results also raise doubts in applied work as to the meaningfulness of the CAR assumption in its full generality."
sla[288,]$abstract
[1] "Attachment loss, the extent of a tooth's root (in millimeters) that is no longer attached to surrounding bone by periodontal ligament, is often used to measure the current state of a patient's periodontal disease and monitor disease progression. Attachment loss data can be analyzed using a conditionally autoregressive (CAR) prior distribution that smooths fitted values toward neighboring values. However, it may be desirable to have more than one class of neighbor relation in the spatial structure, so the different classes of neighbor relations can induce different degrees of smoothing. For example, we may wish to allow smoothing of neighbor pairs bridging the gap between teeth to differ from smoothing of pairs that do not bridge such gaps. Adequately modeling the spatial structure may improve the monitoring of periodontal disease progression. This article develops a two-neighbor-relation CAR model to handle this situation and presents associated theory to help explain the sometimes unusual posterior distributions of the parameters controlling the different types of smoothing. The posterior of these smoothing parameters often has long upper tails, and its shape can change dramatically depending on the spatial structure. Like previous authors, we show that the prior distribution on these parameters has little effect on the posterior of the fixed effects but has a marked influence on the posterior of both the random effects and the smoothing parameters. Our analysis of attachment loss data also suggests that the spatial structure itself varies between individuals."
sla[1751,]$abstract
[1] "An easy-to-implement global procedure for testing the four assumptions of the linear model is proposed. The test can be viewed as a Neyman smooth test and relies only on the standardized residual vector. If the global procedure indicates a violation of at least one of the assumptions, then the components of the global test statistic can be used to gain insight into which assumptions have been violated. The procedure can also be used in conjunction with associated deletion statistics to detect unusual observations. Simulation results are presented indicating the sensitivity of the procedure in detecting model violations under a variety of situations, and its performance is compared with three potential competitors, including a procedure based on the Box-Cox power transformation. The procedure is demonstrated by applying it to a new car mileage dataset and a water salinity dataset that has been used earlier to illustrate model diagnostics."
sla[1258,]$abstract
[1] "This paper provides answers to questions regarding the almost sure limiting behavior of rooted, binary tree-structured rules for regression. Examples show that questions raised by Gordon and Olshen in 1984 have negative answers. For these examples of regression functions and sequences of their associated binary tree-structured approximations, for all regression functions except those in a set of the first category, almost sure consistency fails dramatically on events of full probability. One consequence is that almost sure consistency of binary tree-structured rules such as CART requires conditions beyond requiring that (1) the regression function be in L-1, (2) partitions of a Euclidean feature space be into polytopes with sides parallel to coordinate axes, (3) the mesh of the partitions becomes arbitrarily fine almost surely and (4) the empirical learning sample content of each polytope be \"large enough.\" The material in this paper includes the solution to a problem raised by Dudley in discussions. The main results have a corollary regarding the lack of almost sure consistency of certain Bayes-risk consistent rules for classification."

Generally speaking, these factors do not seem very interpretable and should perhaps be filtered out. That is what motivated me to implement the ‘docfilter’ argument in the ‘get_keywords’ function.
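The code of get_keywords is not shown in this section, so the following is not its implementation. It is a hypothetical sketch of one way a document-count filter could work: keep only factors supported by more than min_docs documents whose normalized membership exceeds a threshold. The function name filter_factors and its arguments are assumptions for illustration:

```r
# Hypothetical sketch (not the get_keywords implementation): keep factors
# with more than `min_docs` documents whose column-normalized membership
# exceeds `threshold`.
filter_factors <- function(L, min_docs = 1, threshold = 0.5) {
  Lnorm <- t(t(L) / apply(L, 2, max))         # each column scaled to max 1
  n_support <- colSums(Lnorm > threshold)     # supporting documents per factor
  which(n_support > min_docs)                 # indices of retained factors
}

# Toy loadings: factor 2 is driven by a single document.
L <- cbind(c(2.0, 1.5, 1.8, 1.2),
           c(3.0, 0.2, 0.1, 0.0),
           c(0.5, 0.9, 1.0, 0.1))
print(filter_factors(L))                      # 1 3
```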

print(get_keywords(fit.nn.s.1,docfilter = 1))
[[1]]
[1] "model"  "estim"  "method" "data"  

[[2]]
 [1] "fals"      "procedur"  "control"   "test"      "discoveri" "rate"     
 [7] "reject"    "hypothes"  "fdr"       "multipl"   "pvalu"     "null"     
[13] "number"    "kfwer"    

[[3]]
[1] "test"      "null"      "hypothesi" "distribut"

[[4]]
 [1] "treatment" "trial"     "random"    "assign"    "patient"   "effect"   
 [7] "outcom"    "clinic"    "causal"    "placebo"   "assumpt"  

[[5]]
[1] "surviv" "time"   "hazard" "censor" "failur" "studi" 

[[6]]
[1] "simex"              "measur"             "simulationextrapol"
[4] "error"             

[[7]]
[1] "wilk"

[[8]]
[1] "lasso"    "select"   "variabl"  "regress"  "coeffici"

[[9]]
[1] "rankbas"  "effici"   "asymptot" "rank"    

[[10]]
[1] "nconsist"

[[11]]
[1] "assoc"   "amer"    "statist" "ann"    

[[12]]
[1] "mle"        "likelihood" "maximum"   

[[13]]
[1] "varyingcoeffici"

[[14]]
[1] "semiparametr" "estim"        "model"        "parametr"    

[[15]]
 [1] "adapt"      "wavelet"    "besov"      "minimax"    "ball"      
 [6] "rang"       "threshold"  "risk"       "deconvolut" "nois"      

[[16]]
[1] "memori"

[[17]]
[1] "bandwidth" "kernel"    "local"     "select"   

[[18]]
[1] "forecast"    "predict"     "wind"        "weather"     "spatial"    
[6] "calibr"      "speed"       "meteorolog"  "probabilist"

[[19]]
[1] "choleski"   "matrix"     "covari"     "decomposit" "factor"    
[6] "interpret" 

[[20]]
[1] "mse"       "predictor" "linear"    "error"     "squar"     "empir"    

[[21]]
[1] "depth"   "project"

[[22]]
[1] "singleindex" "function"    "link"        "compon"      "unknown"    

[[23]]
[1] "markov"    "chain"     "mont"      "carlo"     "algorithm"

[[24]]
[1] "penal"      "nonconcav"  "likelihood" "select"     "variabl"   
[6] "oracl"      "penalti"    "regular"   

[[25]]
[1] "jackknif" "mix"      "squar"    "area"     "varianc" 

[[26]]
[1] "homoscedast"   "heteroscedast"

[[27]]
[1] "spline" "smooth"

[[28]]
[1] "survey" "popul"  "sampl" 

[[29]]
[1] "equivari"  "affin"     "matrix"    "introduc"  "breakdown" "concept"  
[7] "scatter"  

[[30]]
[1] "onestep"

[[31]]
[1] "process"    "thin"       "point"      "fit"        "spatial"   
[6] "residu"     "stationari" "intens"    

[[32]]
[1] "nonnorm"

[[33]]
[1] "polynomi" "local"    "regress" 

[[34]]
[1] "gee"     "equat"   "correl"  "general" "binari"  "work"   

[[35]]
[1] "theta"   "paramet"

[[36]]
[1] "robin"     "miss"      "zhao"      "rotnitzki" "effici"   

[[37]]
[1] "mestim" "robust"

[[38]]
[1] "finitesampl"

[[39]]
[1] "sobolev" "densiti" "minimax" "rate"   

[[40]]
[1] "elect" "vote"  "poll" 

[[41]]
[1] "errorpron" "error"    

[[42]]
[1] "panel" "count"

[[43]]
[1] "stock"

[[44]]
[1] "garch"   "process" "volatil"

[[45]]
[1] "secondord"

[[46]]
[1] "equat" "estim"

[[47]]
[1] "slice"   "invers"  "regress" "dimens"  "method" 

[[48]]
[1] "norm"      "matrix"    "rank"      "matric"    "frobenius" "bound"    

[[49]]
[1] "survivor"

[[50]]
[1] "slope"

[[51]]
[1] "chi"  "test"

[[52]]
[1] "varianc"

[[53]]
[1] "function"   "eigenfunct" "analysi"    "random"     "princip"   
[6] "compon"     "data"      

[[54]]
[1] "tabl"    "conting"

[[55]]
[1] "criterion" "akaik"     "select"    "model"    

[[56]]
[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian" 

[[57]]
[1] "neighborhood"

[[58]]
[1] "maximum"    "welldefin"  "posteriori"
print(get_keywords(fit.nn.s.01, docfilter = 1))
[[1]]
 [1] "model"       "estim"       "data"        "method"      "propos"     
 [6] "studi"       "simul"       "distribut"   "function"    "sampl"      
[11] "paramet"     "approach"    "statist"     "base"        "asymptot"   
[16] "problem"     "general"     "regress"     "analysi"     "test"       
[21] "develop"     "procedur"    "perform"     "illustr"     "condit"     
[26] "set"         "applic"      "observ"      "variabl"     "likelihood" 
[31] "consist"     "time"        "appli"       "covari"      "properti"   
[36] "random"      "comput"      "articl"      "linear"      "case"       
[41] "process"     "infer"       "error"       "select"      "number"     
[46] "effici"      "rate"        "nonparametr" "deriv"       "measur"     
[51] "effect"      "algorithm"   "class"       "paper"       "compar"     
[56] "provid"      "includ"      "depend"     

[[2]]
 [1] "fals"       "control"    "procedur"   "test"       "rate"      
 [6] "discoveri"  "reject"     "hypothes"   "multipl"    "null"      
[11] "pvalu"      "fdr"        "hochberg"   "number"     "stepdown"  
[16] "kfwer"      "familywis"  "error"      "depend"     "proport"   
[21] "benjamini"  "fwer"       "statist"    "fdp"        "soc"       
[26] "divid"      "power"      "roy"        "stepup"     "alpha"     
[31] "deriv"      "abil"       "ser"        "individu"   "detect"    
[36] "gamma"      "total"      "hypothesi"  "conserv"    "toler"     
[41] "attent"     "defin"      "singlestep" "construct"  "fix"       
[46] "simultan"   "probabl"    "independ"   "ann"        "usual"     
[51] "sime"       "improv"     "increas"   

[[3]]
 [1] "treatment"   "random"      "trial"       "patient"     "effect"     
 [6] "assign"      "noncompli"   "assumpt"     "outcom"      "complianc"  
[11] "causal"      "adher"       "depress"     "placebo"     "receiv"     
[16] "care"        "subject"     "clinic"      "intervent"   "drug"       
[21] "arm"         "dose"        "improv"      "primari"     "treat"      
[26] "princip"     "analys"      "latent"      "elder"       "control"    
[31] "sever"       "contrast"    "instrument"  "stratif"     "activ"      
[36] "particip"    "framework"   "prevent"     "potenti"     "physician"  
[41] "benefit"     "infer"       "imperfect"   "children"    "encourag"   
[46] "estimand"    "doserespons"

[[4]]
 [1] "surviv"       "time"         "hazard"       "censor"       "failur"      
 [6] "studi"        "event"        "semiparametr" "proport"      "data"        
[11] "cancer"       "covari"       "estim"        "risk"         "cox"         
[16] "baselin"      "regress"      "cumul"        "illustr"      "rightcensor" 
[21] "consist"      "nonparametr"  "trial"       

[[5]]
[1] "null"      "test"      "hypothesi" "distribut" "altern"    "statist"  
[7] "power"     "asymptot"  "hypothes" 

[[6]]
 [1] "simex"              "simulationextrapol" "measur"            
 [4] "error"              "undersmooth"        "asymptot"          
 [7] "longer"             "accuraci"           "finitesampl"       
[10] "principl"           "bias"               "presenc"           
[13] "selector"           "wang"               "rootn"             

[[7]]
 [1] "wilk"       "ratio"      "phenomenon" "correct"    "relax"     
 [6] "conduct"    "newli"      "unspecifi"  "freedom"    "follow"    
[11] "backfit"    "nuisanc"    "theorem"    "degre"      "chisquar"  
[16] "likelihood" "empir"      "ask"        "hold"      

[[8]]
 [1] "mle"         "maximum"     "likelihood"  "main"        "prove"      
 [6] "asymptot"    "converg"     "limit"       "mles"        "status"     
[11] "rate"        "current"     "brownian"    "behavior"    "motion"     
[16] "estim"       "proof"       "uniqu"       "nonparametr"

[[9]]
 [1] "chain"     "markov"    "mont"      "carlo"     "bayesian"  "algorithm"
 [7] "posterior" "infer"     "prior"     "model"     "mcmc"     

[[10]]
 [1] "lasso"     "select"    "variabl"   "regress"   "coeffici"  "spars"    
 [7] "penalti"   "adapt"     "linear"    "oracl"     "penal"     "problem"  
[13] "sparsiti"  "algorithm" "regular"  

[[11]]
[1] "varyingcoeffici" "nonparametr"     "coeffici"        "linear"         
[5] "longitudin"      "conduct"         "propos"          "vari"           
[9] "regress"        

[[12]]
 [1] "rankbas"      "effici"       "asymptot"     "rank"         "ellipt"      
 [6] "cam"          "class"        "densiti"      "uniform"      "normal"      
[11] "version"      "sign"         "multivari"    "matric"       "symmetri"    
[16] "valid"        "finit"        "scatter"      "ann"          "contour"     
[21] "tradit"       "assumpt"      "sens"         "irrespect"    "rootn"       
[26] "semiparametr" "center"      

[[13]]
 [1] "nconsist" "root"     "reduct"   "dimens"   "exist"    "direct"  
 [7] "central"  "slice"    "exhaust"  "contour"  "ellipt"   "advantag"
[13] "mild"     "strong"   "regress"  "varianc"  "suffici"  "invers"  
[19] "averag"  

[[14]]
 [1] "semiparametr" "estim"        "nonparametr"  "parametr"     "paramet"     
 [6] "model"        "effici"       "asymptot"     "likelihood"   "regress"     
[11] "function"    

[[15]]
 [1] "bandwidth"  "kernel"     "local"      "select"     "smooth"    
 [6] "densiti"    "estim"      "crossvalid" "selector"   "polynomi"  

[[16]]
 [1] "nonconcav"     "penal"         "select"        "oracl"        
 [5] "penalti"       "variabl"       "likelihood"    "regular"      
 [9] "fan"           "challeng"      "nondifferenti" "maxim"        
[13] "sandwich"      "onestep"       "establish"     "concav"       
[17] "broad"         "enjoy"         "employ"        "selector"     
[21] "encourag"      "cost"         

[[17]]
[1] NA

[[18]]
[1] "homoscedast"   "heteroscedast" "varianc"       "transform"    
[5] "famili"        "error"        

[[19]]
[1] "nonnorm"   "normal"    "mix"       "linear"    "exponenti"

[[20]]
[1] "inhomogen"  "intens"     "process"    "spatial"    "point"     
[6] "poisson"    "thin"       "stationari" "function"  

[[21]]
 [1] "seem"           "unrel"          "spline"         "correl"        
 [5] "credit"         "retail"         "neglig"         "nongaussian"   
 [9] "dataadapt"      "vehicl"         "allevi"         "knot"          
[13] "leav"           "reversiblejump" "part"           "genotyp"       
[17] "conveni"        "residu"         "wang"           "withinclust"   

[[22]]
 [1] "memori"        "seri"          "differenc"     "longmemori"   
 [5] "taper"         "frequenc"      "long"          "fraction"     
 [9] "averag"        "depend"        "paramet"       "periodogram"  
[13] "stationari"    "move"          "slowli"        "whittl"       
[17] "eigenvector"   "local"         "nonstationari" "distinct"     
[21] "angl"         

[[23]]
 [1] "distort"         "respons"         "confound"        "predictor"      
 [5] "unobserv"        "under"           "explanatori"     "serum"          
 [9] "adjust"          "magnitud"        "indirect"        "identifi"       
[13] "coeffici"        "factor"          "absent"          "system"         
[17] "alter"           "observ"          "datagener"       "leastsquar"     
[21] "decid"           "straightforward" "generat"         "stepwis"        
[25] "intervent"       "sever"          

[[24]]
[1] "polynomi"    "local"       "regress"     "smooth"      "nonparametr"
[6] "asymptot"   

[[25]]
 [1] "equivari"   "affin"      "introduc"   "depth"      "breakdown" 
 [6] "scatter"    "locat"      "point"      "project"    "robust"    
[11] "concept"    "general"    "multivari"  "function"   "influenc"  
[16] "matrix"     "median"     "definit"    "hyperplan"  "high"      
[21] "heavytail"  "competitor" "fact"       "translat"   "comparison"
[26] "open"      

[[26]]
 [1] "save"      "sir"       "slice"     "averag"    "root"      "invers"   
 [7] "candid"    "reveal"    "theoret"   "reduct"    "comput"    "contrast" 
[13] "recommend"

[[27]]
 [1] "nonrespons" "survey"     "respons"    "imput"      "nonignor"  
 [6] "valu"       "miss"       "respond"    "nation"     "varianc"   
[11] "nonrespond" "weight"     "popul"      "requir"     "bias"      
[16] "probabl"    "unit"       "mechan"     "item"       "adjust"    
[21] "health"     "variabl"    "calibr"     "race"       "domain"    
[26] "handl"      "incom"     

[[28]]
 [1] "taper"    "approxim" "matrix"   "gaussian" "covari"   "spars"   
 [7] "consist"  "oper"     "block"    "norm"     "balanc"   "requir"  
[13] "spatial" 

[[29]]
 [1] "jackknif"  "mix"       "varianc"   "area"      "squar"     "appli"    
 [7] "inconsist" "uncondit"  "replic"    "strata"   

[[30]]
[1] "mestim"  "robust"  "weak"    "yield"   "outlier" "nuisanc"

[[31]]
 [1] "garch"         "process"       "seri"          "volatil"      
 [5] "stationari"    "paper"         "heteroscedast" "condit"       
 [9] "moment"        "autoregress"   "financi"       "local"        
[13] "standard"      "innov"         "sequenc"       "satisfi"      
[17] "move"          "iid"           "time"          "averag"       
[21] "root"          "mont"          "carlo"        

[[32]]
[1] "quantil" "regress"

[[33]]
 [1] "gee"       "equat"     "correl"    "general"   "sandwich"  "binari"   
 [7] "work"      "misspecif" "cluster"   "scientif"  "enhanc"    "effort"   
[13] "equival"   "lead"      "repeat"    "diverg"   

[[34]]
 [1] "popul"      "superpopul" "survey"     "finit"      "boxcox"    
 [6] "modelbas"   "design"     "predict"    "realiz"     "auxiliari" 
[11] "sampl"      "handl"      "twophas"    "revisit"    "mild"      
[16] "benchmark"  "rich"       "life"       "probabl"    "ensur"     

[[35]]
 [1] "claim"     "insur"     "vehicl"    "damag"     "age"       "year"     
 [7] "turn"      "compani"   "detail"    "tail"      "sever"     "coverag"  
[13] "record"    "risk"      "price"     "financi"   "describ"   "major"    
[19] "gender"    "discount"  "logit"     "amount"    "person"    "kind"     
[25] "multinomi" "frequenc"  "justif"    "surpris"   "binomi"    "oil"      
[31] "pointwis"  "split"     "negat"    

[[36]]
[1] "logit"       "finitesampl" "root"        "probit"      "variat"     
[6] "mix"         "fraction"    "multinomi"  

[[37]]
 [1] "expenditur"   "physician"    "servic"       "skew"         "care"        
 [6] "lognorm"      "profil"       "conduct"      "patient"      "person"      
[11] "contribut"    "health"       "randomeffect" "smoke"        "fact"        
[16] "survey"       "manag"        "incur"        "medic"        "debat"       
[21] "custom"       "qualiti"      "topic"        "industri"     "appropri"    
[26] "pulmonari"    "conceptu"     "monitor"      "regard"       "prescrib"    
[31] "subsequ"      "way"          "financi"      "hierarch"     "lung"        
[36] "percentil"    "attribut"     "closedform"  

[[38]]
[1] "confid"    "interv"    "construct" "coverag"   "bootstrap" "region"   

[[39]]
 [1] "singleindex" "unknown"     "link"        "compon"      "equat"      
 [6] "function"    "varianc"     "nonparametr" "beta"        "femal"      
[11] "structur"    "smaller"     "compos"      "vectorvalu"  "eigenfunct" 
[16] "composit"    "econometr"  

[[40]]
[1] "finitesampl" "propos"     

[[41]]
 [1] "wavelet"    "adapt"      "besov"      "minimax"    "ball"      
 [6] "threshold"  "rang"       "nois"       "wide"       "unknown"   
[11] "rate"       "risk"       "bound"      "deconvolut" "smooth"    
[16] "problem"    "function"   "signal"     "white"      "converg"   
[21] "gaussian"   "transform"  "recov"      "densiti"    "shape"     
[26] "view"       "noisi"      "discret"    "nearoptim"  "spars"     
[31] "blur"       "fourier"    "decay"      "upper"      "convolut"  
[36] "invers"    

[[42]]
 [1] "robin"      "miss"       "zhao"       "rotnitzki"  "effici"    
 [6] "weight"     "casecohort" "design"     "invers"     "twophas"   
[11] "cohort"     "random"     "causal"     "outcom"     "biometrika"
[16] "prentic"    "calcul"     "purpos"     "confound"   "lemma"     
[21] "mar"        "exemplifi"  "suit"       "amer"       "assoc"     
[26] "proceed"    "summar"     "cox"        "ser"        "soc"       
[31] "roy"        "iid"        "appear"     "unbias"    

[[43]]
[1] "maximum"    "likelihood" "estim"     

[[44]]
[1] "dimensionreduct" "invers"          "dimens"          "factor"         
[5] "highdimension"   "chisquar"        "reduct"         

[[45]]
[1] "lin"        "addit"      "work"       "carrol"     "bone"      
[6] "transplant" "margin"    

[[46]]
 [1] "withinclust" "cluster"     "correl"      "account"     "hamper"     
 [6] "frequent"    "carri"       "frailti"     "parsimoni"   "abil"       
[11] "birth"       "ill"         "generalis"   "impact"      "intuit"     
[16] "achiev"     

[[47]]
[1] "chi"       "test"      "distribut" "space"     "ratio"     "restrict" 
[7] "statist"  

[[48]]
[1] "coeffici" "regress" 

[[49]]
 [1] "norm"          "matrix"        "frobenius"     "rank"         
 [5] "matric"        "nuclear"       "bound"         "regular"      
 [9] "low"           "optim"         "nonasymptot"   "highdimension"
[13] "convex"        "spars"         "minimax"       "noisi"        
[17] "element"       "minim"         "error"         "singular"     
[21] "setup"         "vector"        "theori"        "precis"       
[25] "autoregress"   "predict"      

[[50]]
 [1] "minimax" "rate"    "densiti" "optim"   "adapt"   "unknown" "estim"  
 [8] "loss"    "converg" "class"   "prove"   "bound"  

[[51]]
[1] "unequ"     "designbas" "survey"    "weight"   

[[52]]
[1] "auxiliari" "survey"    "varianc"   "variabl"   "sampl"     "weight"   
[7] "design"    "calibr"    "popul"    

[[53]]
[1] "variancecovari" "matrix"         "analyz"        

[[54]]
[1] "contamin"    "robust"      "water"       "influenc"    "explanatori"

[[55]]
[1] "bspline" "kernel"  "penal"  

[[56]]
[1] "varianc"  "asymptot"

[[57]]
 [1] "eigenfunct" "function"   "princip"    "compon"     "random"    
 [6] "analysi"    "data"       "smooth"     "eigenvalu"  "deriv"     
[11] "curv"       "spars"      "trajectori" "space"      "score"     

[[58]]
 [1] "forecast"    "predict"     "weather"     "spatial"     "wind"       
 [6] "probabilist" "northwest"   "calibr"      "pacif"       "meteorolog" 
[11] "temperatur"  "speed"       "hour"        "energi"      "atmospher"  
[16] "averag"      "ensembl"     "geostatist"  "futur"       "center"     
[21] "north"       "precipit"    "accur"       "tempor"      "daili"      
[26] "event"       "resourc"     "site"        "american"    "state"      
[31] "sharp"       "spacetim"    "qualiti"     "climat"      "ozon"       
[36] "concentr"    "generat"     "regim"       "transport"   "season"     
[41] "shortterm"   "determinist" "input"      

[[59]]
 [1] "highfrequ" "volatil"   "financi"   "asset"     "price"     "lowfrequ" 
 [7] "exchang"   "nois"      "dynam"     "market"    "matrix"    "stock"    
[13] "period"    "daili"     "realiz"    "pool"      "matric"    "variat"   
[19] "diffus"   

[[60]]
 [1] "earthquak"      "process"        "discrimin"      "seri"          
 [5] "featur"         "explos"         "event"          "time"          
 [9] "form"           "california"     "spectra"        "transform"     
[13] "background"     "extract"        "occurr"         "intens"        
[17] "diverg"         "wavelet"        "step"           "occur"         
[21] "decomposit"     "thin"           "separ"          "basi"          
[25] "multidimension" "spacetim"       "rate"           "poisson"       
[29] "residu"         "spectrum"       "goal"           "rescal"        
[33] "magnitud"       "evolutionari"   "purpos"         "homogen"       

[[61]]
 [1] "climat"      "chang"       "temperatur"  "greenhous"   "global"     
 [6] "earth"       "trend"       "uncertainti" "increas"     "atmospher"  
[11] "northern"    "quantifi"    "reconstruct" "futur"       "separ"      
[16] "tempor"     

[[62]]
 [1] "motif"      "gene"       "sequenc"    "regul"      "transcript"
 [6] "bind"       "dna"        "protein"    "cluster"    "factor"    
[11] "nucleotid"  "discoveri"  "conserv"    "short"      "high"      
[16] "call"       "pattern"    "dirichlet"  "biolog"     "site"      
[21] "process"    "genom"      "mixtur"     "width"      "vari"      
[26] "priori"     "hierarch"   "strategi"   "cell"       "databas"   
[31] "repres"     "organ"      "delet"      "matric"     "similar"   
[36] "gibb"       "switch"     "technolog"  "generat"    "segment"   
[41] "refin"      "aid"        "substant"   "stochast"   "live"      
[46] "group"      "core"       "regulatori"

[[63]]
 [1] "wishart"    "graph"      "cone"       "famili"     "graphic"   
 [6] "matric"     "conjug"     "paramet"    "prior"      "gaussian"  
[11] "covari"     "matrix"     "decompos"   "edg"        "definit"   
[16] "homogen"    "paper"      "shape"      "invers"     "correspond"
[21] "standard"   "ann"        "posit"      "equal"      "space"     
[26] "respect"    "eigenvalu"  "zero"       "sigma"      "dimens"    
[31] "bay"        "chisquar"   "miss"       "form"       "precis"    
[36] "flexibl"    "distinct"   "close"     

[[64]]
 [1] "pca"          "princip"      "compon"       "matrix"       "eigenvector" 
 [6] "analysi"      "eigenvalu"    "reduct"       "dimension"    "set"         
[11] "perturb"      "size"         "transit"      "dimens"       "spike"       
[16] "direct"       "maxim"        "hold"         "popul"        "tool"        
[21] "tree"         "high"         "theorem"      "geometr"      "succeed"     
[26] "sharp"        "logp"         "oil"          "embed"        "evolutionari"

[[65]]
[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian"  "hierarch" 
[7] "posterior" "cluster"  

[[66]]
 [1] "famili"        "subfamili"     "symmetr"       "asymmetr"     
 [5] "skew"          "reparameter"   "discuss"       "transform"    
 [9] "properti"      "explor"        "mise"          "urn"          
[13] "behav"         "generat"       "pursu"         "adequ"        
[17] "distribut"     "adopt"         "emphasi"       "symmetri"     
[21] "map"           "submodel"      "option"        "stateoftheart"
[25] "heavytail"     "superior"      "attract"       "tractabl"     
[29] "place"         "member"        "counterpart"   "spacetim"     

[[67]]
[1] "bar"    "vertic" "cap"    "lambda"

[[68]]
[1] NA

[[69]]
[1] NA

[[70]]
 [1] "paradox"     "prior"       "surrog"      "true"        "bay"        
 [6] "posit"       "criteria"    "frequentist" "jeffrey"     "sign"       
[11] "point"       "avoid"       "causal"      "turn"        "negat"      
[16] "invari"     

[[71]]
 [1] "probab"  "appl"    "proc"    "situat"  "ann"     "shape"   "field"  
 [8] "point"   "gamma"   "univari" "roy"    

[[72]]
 [1] "chart"       "cusum"       "detect"      "shift"       "cumul"      
 [6] "control"     "sum"         "base"        "perform"     "length"     
[11] "refer"       "averag"      "ratio"       "monitor"     "likelihood" 
[16] "convent"     "delta"       "infin"       "articl"      "event"      
[21] "outlier"     "stop"        "alarm"       "changepoint" "small"      

[[73]]
 [1] "twoparamet" "focus"      "famili"     "choos"      "exampl"    
 [6] "basic"      "desir"      "popular"    "express"    "tune"      
[11] "stepup"     "compromis"  "conserv"    "shortcom"   "represent" 
[16] "lifetim"    "priori"     "meaning"    "prefer"     "segment"   
[21] "stepwis"    "convolut"   "feasibl"    "bay"       

[[74]]
[1] NA

[[75]]
 [1] "manifold"   "space"      "intrins"    "metric"     "shape"     
 [6] "riemannian" "tensor"     "euclidean"  "matric"     "diagnost"  
[11] "geodes"     "develop"    "planar"     "sphere"     "examin"    
[16] "imag"       "perturb"    "human"      "embed"      "gender"    
[21] "medic"      "dimens"     "differenti" "diffus"    

[[76]]
[1] "kendal"  "tau"     "truncat" "copula"  "shape"   "densiti" "symmetr"
[8] "reli"    "angl"   

[[77]]
 [1] "improp"    "proprieti" "posterior" "uniform"   "proper"    "prior"    
 [7] "miss"      "suffici"   "theorem"   "character" "complet"   "carri"    
[13] "examin"    "colon"     "beta"      "dataset"   "cumul"     "tree"     
[19] "glms"     

[[78]]
[1] "ser"     "soc"     "roy"     "stat"    "ann"     "particl" "central"
[8] "util"    "statist"

[[79]]
[1] "iid"   "prove"

[[80]]
 [1] "classifi"        "distancebas"     "centroid"        "classif"        
 [5] "discrimin"       "popul"           "vector"          "distanc"        
 [9] "theoret"         "machin"          "support"         "heavytail"      
[13] "median"          "differ"          "difficulti"      "popular"        
[17] "convent"         "replac"          "componentwis"    "produc"         
[21] "accumul"         "closest"         "varieti"         "truncat"        
[25] "poor"            "entail"          "highdimension"   "insensit"       
[29] "allevi"          "excess"          "problemat"       "today"          
[33] "euclidean"       "encount"         "inconsist"       "caus"           
[37] "suffer"          "nearest"         "counterpart"     "volatil"        
[41] "argument"        "alloc"           "straightforward" "attempt"        
[45] "frequent"        "boundari"        "believ"          "help"           
[49] "case"            "inher"           "neighbour"      

[[81]]
 [1] "administr"      "fda"            "secondari"      "endpoint"      
 [5] "drug"           "efficaci"       "food"           "health"        
 [9] "combin"         "record"         "agent"          "trial"         
[13] "clinic"         "benefit"        "primari"        "adjust"        
[17] "databas"        "prevent"        "path"           "cardiovascular"
[21] "make"           "separ"          "report"         "perspect"      
[25] "decis"          "simplifi"       "safeti"         "maintain"      

[[82]]
 [1] "supremum"    "shift"       "dataset"     "changepoint" "power"      
 [6] "test"        "debat"       "logrank"     "north"       "window"     
[11] "categor"     "record"      "speed"       "wind"        "controversi"
[16] "frequenc"    "elabor"      "opposit"     "pearson"     "discontinu" 
[21] "cumul"       "attribut"    "multinomi"   "bridg"       "mainten"    
[26] "formula"     "conclus"     "rigor"       "appear"      "sum"        
[31] "brownian"    "statist"     "strength"    "chisquar"    "autocovari" 
[36] "sequenc"     "receiv"     

[[83]]
[1] "theta"     "paramet"   "cap"       "distribut" "vector"    "unknown"  
[7] "nuisanc"  

[[84]]
 [1] "genet"       "loci"        "trait"       "diseas"      "quantit"    
 [6] "linkag"      "map"         "allel"       "phenotyp"    "gene"       
[11] "pedigre"     "popul"       "marker"      "associ"      "genotyp"    
[16] "frequenc"    "chromosom"   "locus"       "polymorph"   "genom"      
[21] "complex"     "haplotyp"    "interact"    "casecontrol" "involv"     
[26] "domin"       "individu"   

[[85]]
[1] "goodnessoffit" "test"          "includ"        "residu"       

[[86]]
[1] NA

[[87]]
 [1] "selector"    "dantzig"     "lregular"    "extend"      "path"       
 [6] "result"      "bound"       "nonasymptot" "uncertainti" "angl"       
[11] "remark"      "tune"        "entir"       "final"       "question"   
[16] "cost"        "principl"   

[[88]]
 [1] "subtl"    "jin"      "nonzero"  "critic"   "fraction" "boundari"
 [7] "tukey"    "higher"   "signific" "succeed"  "detect"   "normal"  
[13] "region"   "interest" "precis"   "amplitud" "alpha"    "concept" 
[19] "sparsiti" "concern"  "mention"  "high"     "work"     "resolv"  
[25] "nonnul"   "bodi"     "lower"   

[[89]]
 [1] "expert"      "languag"     "uncertainti" "abil"        "learn"      
 [6] "elicit"      "intermitt"   "system"      "natur"       "kind"       
[11] "amount"      "inform"      "peopl"       "mathemat"    "make"       
[16] "histor"      "need"        "content"     "respond"     "grow"       
[21] "happen"     

[[90]]
 [1] "absolut"       "deviat"        "clip"          "smooth"       
 [5] "scad"          "oracl"         "size"          "true"         
 [9] "microarray"    "nonzero"       "dimens"        "fan"          
[13] "highdimension" "identifi"      "sparsiti"      "confirm"      
[17] "slowli"        "larger"       

[[91]]
[1] "size"   "sampl"  "number"

[[92]]
[1] NA

[[93]]
[1] "spectral"   "densiti"    "time"       "seri"       "domain"    
[6] "stationari" "frequenc"  

[[94]]
[1] "tilt"       "exponenti"  "constraint" "employ"    

[[95]]
 [1] "earn"       "person"     "interview"  "employ"     "document"  
 [6] "survey"     "health"     "level"      "census"     "peopl"     
[11] "report"     "incom"      "higher"     "educ"       "feder"     
[16] "sensit"     "preval"     "analys"     "conduct"    "famili"    
[21] "imput"      "year"       "key"        "sourc"      "total"     
[26] "file"       "instrument" "ratio"      "status"     "encourag"  
[31] "nation"     "way"        "subsequ"    "monitor"    "lower"     
[36] "item"       "accept"     "multipli"   "rich"       "violat"    
[41] "previous"  

[[96]]
 [1] "statistician" "polici"       "scienc"       "statist"      "decis"       
 [6] "role"         "today"        "technolog"    "scientif"     "maker"       
[11] "bring"        "challeng"     "scientist"    "inform"       "integr"      
[16] "communic"     "individu"     "increas"      "knowledg"     "polit"       
[21] "live"         "disciplin"    "address"      "social"       "effort"      
[26] "essenti"      "organ"        "solv"         "engin"        "student"     
[31] "opportun"     "impact"       "face"         "grow"         "chang"       
[36] "play"         "govern"       "american"     "countri"      "mathemat"    
[41] "closer"       "centuri"      "modern"       "intern"       "spread"      
[46] "human"        "relev"        "ingredi"      "place"        "public"      
[51] "devic"        "success"      "explor"       "pressur"      "guarante"    
[56] "imposs"       "train"        "view"         "excel"        "presidenti"  
[61] "progress"     "edg"          "way"          "genom"        "support"     
[66] "communiti"    "promot"       "action"       "advanc"       "map"         
[71] "understand"  

[[97]]
 [1] "toxic"      "dose"       "trial"      "dosefind"   "phase"     
 [6] "clinic"     "target"     "design"     "probabl"    "escal"     
[11] "assign"     "patient"    "reassess"   "continu"    "ethic"     
[16] "prespecifi" "common"     "enhanc"     "concern"    "robust"    
[21] "parallel"   "previous"   "overcom"    "coher"      "variant"   
[26] "competit"  

[[98]]
 [1] "elect"      "vote"       "poll"       "evid"       "candid"    
 [6] "presidenti" "count"      "station"    "forecast"   "proport"   
[11] "polit"      "prefer"     "counti"     "record"     "lower"     

[[99]]
[1] NA

[[100]]
[1] NA

[[101]]
[1] NA

[[102]]
 [1] "delay"         "combin"        "issu"          "activ"        
 [5] "unit"          "year"          "monitor"       "program"      
 [9] "incid"         "concern"       "major"         "servic"       
[13] "surveil"       "develop"       "registri"      "populationbas"
[17] "trend"         "reason"       

[[103]]
[1] "laplac"    "approxim"  "posterior" "integr"    "mode"     

[[104]]
[1] "subjectspecif"    "random"           "longitudin"       "correl"          
[5] "populationaverag" "latent"           "logist"           "followup"        

[[105]]
[1] NA

[[106]]
[1] NA

[[107]]
[1] "oneparamet" "famili"     "normal"     "general"    "exponenti" 
[6] "detect"     "binomi"    

[[108]]
 [1] "intersect"  "close"      "hypothes"   "familywis"  "bonferroni"
 [6] "logic"      "critic"     "requir"     "elementari" "multipl"   
[11] "monoton"    "holm"       "valu"       "principl"  
print(get_keywords(fit.nn.s.001, docfilter = 1))
[[1]]
  [1] "model"        "estim"        "data"         "method"       "propos"      
  [6] "studi"        "simul"        "distribut"    "function"     "sampl"       
 [11] "base"         "paramet"      "approach"     "statist"      "asymptot"    
 [16] "problem"      "general"      "regress"      "analysi"      "develop"     
 [21] "illustr"      "perform"      "procedur"     "test"         "applic"      
 [26] "condit"       "set"          "observ"       "variabl"      "appli"       
 [31] "consist"      "properti"     "likelihood"   "articl"       "time"        
 [36] "comput"       "covari"       "random"       "case"         "linear"      
 [41] "process"      "infer"        "number"       "error"        "effici"      
 [46] "select"       "rate"         "nonparametr"  "deriv"        "effect"      
 [51] "compar"       "measur"       "includ"       "provid"       "paper"       
 [56] "algorithm"    "class"        "depend"       "normal"       "demonstr"    
 [61] "bayesian"     "larg"         "assumpt"      "probabl"      "approxim"    
 [66] "addit"        "size"         "structur"     "optim"        "varianc"     
 [71] "exist"        "independ"     "construct"    "introduc"     "smooth"      
 [76] "real"         "theoret"      "compon"       "point"        "methodolog"  
 [81] "investig"     "requir"       "predict"      "standard"     "respons"     
 [86] "establish"    "common"       "empir"        "practic"      "converg"     
 [91] "work"         "maximum"      "term"         "discuss"      "combin"      
 [96] "finit"        "framework"    "design"       "parametr"     "multipl"     
[101] "assum"        "form"         "theori"       "simpl"        "carlo"       
[106] "limit"        "mont"         "lead"         "altern"       "numer"       
[111] "improv"       "local"        "involv"       "high"         "identifi"    
[116] "space"        "techniqu"     "prior"        "level"        "multivari"   
[121] "correl"       "fit"          "semiparametr" "increas"      "unknown"     
[126] "bias"         "small"        "exampl"       "order"        "direct"      
[131] "extend"       "defin"        "matrix"       "coeffici"     "dataset"     
[136] "implement"    "weight"       "control"      "densiti"      "markov"      
[141] "extens"       "adapt"        "evalu"        "relat"        "power"       
[146] "consid"       "analyz"       "robust"       "type"         "result"      
[151] "valu"         "assess"       "vector"       "seri"         "factor"      
[156] "popul"       

[[2]]
 [1] "fals"       "control"    "procedur"   "rate"       "test"      
 [6] "discoveri"  "reject"     "hypothes"   "multipl"    "null"      
[11] "pvalu"      "familywis"  "hochberg"   "fdr"        "stepdown"  
[16] "error"      "kfwer"      "number"     "proport"    "benjamini" 
[21] "fwer"       "depend"     "statist"    "soc"        "divid"     
[26] "fdp"        "roy"        "abil"       "ser"        "alpha"     
[31] "deriv"      "individu"   "total"      "stepup"     "detect"    
[36] "toler"      "attent"     "power"      "gamma"      "defin"     
[41] "singlestep" "conserv"    "probabl"    "construct"  "hypothesi" 
[46] "fix"        "ann"        "simultan"   "restrict"   "usual"     
[51] "increas"    "structur"   "contrast"   "prove"      "goal"      
[56] "implicit"   "replac"     "resampl"    "independ"   "sime"      
[61] "holm"       "improv"     "sens"       "configur"   "stat"      
[66] "stringent"  "intersect"  "bonferroni" "der"        "appl"      
[71] "van"        "deal"       "order"     

[[3]]
 [1] "surviv"       "time"         "hazard"       "censor"       "failur"      
 [6] "studi"        "semiparametr" "proport"      "event"        "cancer"      
[11] "covari"       "data"         "estim"        "risk"         "cox"         
[16] "baselin"      "regress"      "cumul"        "illustr"      "consist"     
[21] "rightcensor"  "trial"        "subject"      "analysi"      "nonparametr" 
[26] "simul"        "equat"        "cohort"       "diseas"       "incid"       
[31] "patient"      "clinic"       "cure"         "recurr"       "compet"      
[36] "associ"       "joint"        "followup"     "frailti"      "timevari"    
[41] "bivari"       "margin"       "lengthbias"   "prostat"      "assumpt"     
[46] "coeffici"     "medic"        "breast"       "extens"       "propos"      

[[4]]
 [1] "simex"              "simulationextrapol" "undersmooth"       
 [4] "error"              "measur"             "asymptot"          
 [7] "accuraci"           "longer"             "bias"              
[10] "principl"           "finitesampl"        "selector"          
[13] "bandwidth"          "wang"               "epidemiolog"       
[16] "cook"               "rootn"              "difficulti"        
[19] "presenc"            "nutrit"             "decreas"           
[22] "compar"             "coverag"            "appropri"          
[25] "simul"              "tractabl"           "need"              
[28] "recommend"          "polynomi"           "engin"             
[31] "chisquar"           "scientist"          "errorpron"         

[[5]]
 [1] "wilk"           "ratio"          "phenomenon"     "correct"       
 [5] "relax"          "power"          "conduct"        "null"          
 [9] "newli"          "freedom"        "unspecifi"      "follow"        
[13] "hypothesi"      "degre"          "ask"            "nuisanc"       
[17] "chisquar"       "test"           "theorem"        "hold"          
[21] "backfit"        "attempt"        "admit"          "constant"      
[25] "demonstr"       "rescal"         "biascorrect"    "answer"        
[29] "zhang"          "scientif"       "fan"            "likelihood"    
[33] "withinsubject"  "pitman"         "asymptot"       "side"          
[37] "share"          "contemporari"   "popular"        "variancecovari"
[41] "singleindex"    "save"           "tau"            "kendal"        
[45] "coverag"       

[[6]]
 [1] "mle"         "maximum"     "likelihood"  "main"        "asymptot"   
 [6] "mles"        "prove"       "converg"     "limit"       "status"     
[11] "estim"       "brownian"    "current"     "motion"      "behavior"   
[16] "rate"        "proof"       "uniqu"       "siev"        "nonparametr"
[21] "ann"         "gap"         "drift"       "naiv"        "global"     
[26] "monoton"     "simpler"     "parametr"    "result"      "discuss"    
[31] "ergod"      

[[7]]
 [1] "varyingcoeffici" "nonparametr"     "linear"          "coeffici"       
 [5] "longitudin"      "conduct"         "vari"            "regress"        
 [9] "partial"         "propos"          "simul"           "backfit"        
[13] "thought"         "illustr"         "enjoy"           "fashion"        
[17] "twostep"         "contamin"        "pose"           

[[8]]
 [1] "rankbas"         "asymptot"        "effici"          "rank"           
 [5] "ellipt"          "cam"             "class"           "uniform"        
 [9] "test"            "densiti"         "version"         "multivari"      
[13] "normal"          "sign"            "valid"           "scatter"        
[17] "symmetri"        "matrix"          "matric"          "assumpt"        
[21] "finit"           "sens"            "ann"             "contour"        
[25] "irrespect"       "tradit"          "rootn"           "moment"         
[29] "actual"          "center"          "strict"          "equivari"       
[33] "gaussian"        "onestep"         "invari"          "finitesampl"    
[37] "concept"         "local"           "serial"          "bernoulli"      
[41] "shape"           "unspecifi"       "classic"         "acceler"        
[45] "respect"         "semiparametr"    "depth"           "null"           
[49] "univari"         "median"          "prespecifi"      "spheric"        
[53] "biometrika"      "distributionfre" "excel"          

[[9]]
 [1] "nconsist"   "root"       "reduct"     "exist"      "central"   
 [6] "direct"     "dimens"     "varianc"    "slice"      "exhaust"   
[11] "contour"    "mild"       "ellipt"     "strong"     "advantag"  
[16] "invers"     "averag"     "asymptot"   "suffici"    "predictor" 
[21] "regress"    "identif"    "subspac"    "guarante"   "space"     
[26] "attack"     "accuraci"   "span"       "plugin"     "synthes"   
[31] "digit"      "squar"      "complement" "normal"     "eas"       
[36] "variat"     "landmark"   "realdata"  

[[10]]
 [1] "null"      "test"      "hypothesi" "distribut" "altern"    "statist"  
 [7] "hypothes"  "power"     "asymptot"  "procedur"  "ratio"     "reject"   
[13] "control"  

[[11]]
 [1] "chain"     "markov"    "mont"      "carlo"     "bayesian"  "posterior"
 [7] "algorithm" "infer"     "prior"     "mcmc"      "model"     "hierarch" 
[13] "sampler"   "mixtur"    "space"    

[[12]]
 [1] "lasso"         "select"        "variabl"       "regress"      
 [5] "coeffici"      "spars"         "penalti"       "adapt"        
 [9] "linear"        "oracl"         "penal"         "sparsiti"     
[13] "problem"       "algorithm"     "regular"       "matrix"       
[17] "nonzero"       "path"          "shrinkag"      "vector"       
[21] "larger"        "absolut"       "high"          "highdimension"
[25] "true"          "method"        "group"         "dimension"    
[29] "nois"          "connect"      

[[13]]
[1] "bar"     "vertic"  "cap"     "lambda"  "beta"    "theta"   "alpha"  
[8] "element"

[[14]]
 [1] "singleindex"  "unknown"      "nonparametr"  "link"         "compon"      
 [6] "equat"        "structur"     "varianc"      "beta"         "smaller"     
[11] "function"     "semiparametr" "econometr"    "achiev"       "femal"       
[16] "compos"       "vectorvalu"   "linear"       "eigenfunct"   "rateoptim"   
[21] "composit"     "isol"         "ball"         "singl"       

[[15]]
 [1] "genet"       "trait"       "loci"        "quantit"     "diseas"     
 [6] "linkag"      "map"         "gene"        "phenotyp"    "pedigre"    
[11] "allel"       "marker"      "popul"       "associ"      "genotyp"    
[16] "locus"       "chromosom"   "frequenc"    "polymorph"   "genom"      
[21] "multipl"     "complex"     "involv"      "domin"       "interact"   
[26] "casecontrol" "haplotyp"    "treat"       "individu"    "nucleotid"  
[31] "unifi"       "singl"       "simultan"    "snp"         "inherit"    
[36] "geneenviron" "distinguish" "suscept"     "dichotom"    "score"      
[41] "mutat"       "aim"         "genomewid"   "member"      "dna"        
[46] "ascertain"   "parent"      "descent"     "crucial"     "arbitrari"  
[51] "retrospect"  "tau"         "softwar"    

[[16]]
 [1] "dichotom"        "outcom"          "exposur"         "genet"          
 [5] "inherit"         "confound"        "interact"        "causal"         
 [9] "trial"           "factor"          "binari"          "presenc"        
[13] "categor"         "assess"          "alcohol"         "continu"        
[17] "disord"          "misspecif"       "ordin"           "clinic"         
[21] "postul"          "trait"           "topic"           "environment"    
[25] "subgroup"        "potenti"         "geneenviron"     "alter"          
[29] "adequ"           "examin"          "adjust"          "intermedi"      
[33] "cancer"          "robin"           "stage"           "logist"         
[37] "arm"             "firststag"       "generic"         "latent"         
[41] "build"           "variabl"         "conduct"         "affect"         
[45] "accommod"        "prone"           "submodel"        "transmiss"      
[49] "mental"          "mediat"          "unspecifi"       "quantit"        
[53] "expos"           "major"           "multipli"        "sever"          
[57] "believ"          "gene"            "zhang"           "distributionfre"
[61] "routin"          "today"          

[[17]]
 [1] "treatment"     "random"        "trial"         "noncompli"    
 [5] "patient"       "assumpt"       "effect"        "adher"        
 [9] "complianc"     "assign"        "depress"       "outcom"       
[13] "causal"        "receiv"        "care"          "placebo"      
[17] "subject"       "intervent"     "clinic"        "improv"       
[21] "primari"       "drug"          "arm"           "treat"        
[25] "dose"          "elder"         "latent"        "princip"      
[29] "analys"        "contrast"      "sever"         "instrument"   
[33] "control"       "particip"      "stratif"       "benefit"      
[37] "physician"     "imperfect"     "encourag"      "prevent"      
[41] "fisher"        "strata"        "prescrib"      "children"     
[45] "activ"         "reason"        "strict"        "rubin"        
[49] "efron"         "behavior"      "educ"          "estimand"     
[53] "plausibl"      "doserespons"   "meet"          "suffer"       
[57] "protocol"      "framework"     "collabor"      "debat"        
[61] "doubleblind"   "potenti"       "blind"         "status"       
[65] "opposit"       "guidelin"      "logic"         "acknowledg"   
[69] "nonrandom"     "import"        "substanti"     "infer"        
[73] "prospect"      "summar"        "heart"         "childhood"    
[77] "subjectspecif" "access"       

[[18]]
 [1] "nonconcav"     "penal"         "select"        "penalti"      
 [5] "oracl"         "variabl"       "regular"       "nondifferenti"
 [9] "fan"           "likelihood"    "challeng"      "sandwich"     
[13] "establish"     "maxim"         "broad"         "find"         
[17] "concav"        "onestep"       "employ"        "encourag"     
[21] "enjoy"         "finit"         "cost"          "distinguish"  
[25] "dramat"        "selector"      "appropri"      "render"       
[29] "conduct"       "heavili"       "possess"       "newli"        
[33] "converg"       "paramet"       "function"      "discontinu"   
[37] "aic"           "algorithm"     "bic"           "encompass"    
[41] "guarante"      "object"        "metropoli"    

[[19]]
 [1] "semiparametr" "estim"        "parametr"     "nonparametr"  "paramet"     
 [6] "asymptot"     "model"        "effici"       "likelihood"   "regress"     
[11] "function"     "normal"       "simul"        "compon"       "achiev"      

[[20]]
 [1] "bandwidth"  "kernel"     "local"      "select"     "smooth"    
 [6] "densiti"    "estim"      "crossvalid" "selector"   "polynomi"  
[11] "choic"      "choos"      "squar"      "bootstrap"  "datadriven"
[16] "version"    "asymptot"   "global"     "chosen"    

[[21]]
 [1] "virus"        "human"        "immunodefici" "hiv"          "infect"      
 [6] "viral"        "transmiss"    "vaccin"       "subject"      "genet"       
[11] "drug"         "develop"      "efficaci"     "mutat"        "outcom"      
[16] "causal"       "cell"         "syndrom"      "medic"        "pathway"     
[21] "resist"       "evolutionari" "therapi"      "pressur"     

[[22]]
 [1] "dropout"       "stratum"       "prevent"       "reduc"        
 [5] "oil"           "trial"         "adjust"        "longitudin"   
 [9] "cancer"        "prostat"       "mechan"        "men"          
[13] "find"          "stratifi"      "arm"           "nuisanc"      
[17] "treatment"     "assign"        "grade"         "doubleblind"  
[21] "avoid"         "colleagu"      "randomeffect"  "sever"        
[25] "verif"         "agent"         "conjectur"     "annual"       
[29] "nonignor"      "placebo"       "volum"         "elect"        
[33] "caus"          "daili"         "visit"         "preval"       
[37] "absolut"       "lie"           "indic"         "sensit"       
[41] "frequent"      "particip"      "year"          "reduct"       
[45] "causal"        "report"        "newtonraphson" "adopt"        
[49] "question"      "women"         "elder"         "surrog"       
[53] "inform"        "elicit"        "prospect"      "collabor"     
[57] "drawn"         "ignor"         "differ"        "link"         
[61] "retain"        "tilt"          "random"        "constraint"   
[65] "status"        "impli"         "doubli"        "expert"       
[69] "nonidentifi"   "intermitt"     "satur"         "sex"          
[73] "characterist"  "invers"       

[[23]]
  [1] "polici"       "statistician" "maker"        "decis"        "scienc"      
  [6] "role"         "technolog"    "today"        "chang"        "live"        
 [11] "bring"        "social"       "communic"     "integr"       "individu"    
 [16] "futur"        "knowledg"     "disciplin"    "nation"       "public"      
 [21] "scientif"     "health"       "activ"        "human"        "impact"      
 [26] "organ"        "inform"       "protect"      "promot"       "qualiti"     
 [31] "understand"   "program"      "way"          "student"      "mathemat"    
 [36] "increas"      "face"         "foundat"      "play"         "essenti"     
 [41] "uncertainti"  "effort"       "engin"        "expect"       "advanc"      
 [46] "confidenti"   "children"     "relev"        "make"         "industri"    
 [51] "govern"       "countri"      "encourag"     "polit"        "place"       
 [56] "modern"       "intern"       "scientist"    "closer"       "benefit"     
 [61] "reflect"      "explor"       "stronger"     "purpos"       "univers"     
 [66] "spread"       "environment"  "network"      "grow"         "forc"        
 [71] "access"       "devic"        "ingredi"      "excel"        "comprehens"  
 [76] "pollut"       "attract"      "broader"      "elementari"   "evolv"       
 [81] "train"        "pressur"      "air"          "option"       "imposs"      
 [86] "secondari"    "map"          "edg"          "success"      "progress"    
 [91] "critic"       "global"       "action"       "year"         "agenc"       
 [96] "communiti"    "american"     "quantit"      "genom"        "system"      
[101] "fundament"    "discoveri"    "evid"         "guarante"     "mortal"      
[106] "address"      "citi"         "requir"       "technic"      "serv"        
[111] "path"         "statist"      "separ"        "climat"       "contribut"   
[116] "opportun"     "adequaci"     "disabl"       "affect"       "driven"      
[121] "grade"        "psycholog"    "diagnost"     "morbid"       "view"        
[126] "delay"        "primari"      "state"       

[[24]]
[1] NA

[[25]]
 [1] "nonnorm"         "normal"          "mix"             "linear"         
 [5] "exponenti"       "piecewiselinear" "general"         "abund"          
 [9] "famili"          "examin"         

[[26]]
 [1] "seem"           "unrel"          "spline"         "retail"        
 [5] "credit"         "vehicl"         "dataadapt"      "correl"        
 [9] "knot"           "residu"         "conveni"        "nongaussian"   
[13] "univari"        "allevi"         "leav"           "reversiblejump"
[17] "part"           "neglig"         "difficulti"     "smooth"        
[21] "latent"         "sampler"        "compani"        "abil"          
[25] "wang"           "withinclust"    "smallest"       "consum"        

[[27]]
 [1] "slice"     "invers"    "dimens"    "reduct"    "regress"   "averag"   
 [7] "sir"       "direct"    "central"   "goal"      "respons"   "save"     
[13] "subset"    "method"    "predictor" "subspac"   "varianc"   "preserv"  
[19] "replac"    "suffici"   "systemat" 

[[28]]
 [1] "homoscedast"   "heteroscedast" "varianc"       "transform"    
 [5] "famili"        "multiscal"     "quadrat"       "respect"      
 [9] "poisson"       "regress"       "epidemiolog"   "stabil"       
[13] "wavelet"       "explain"       "contribut"    

[[29]]
 [1] "band"       "confid"     "simultan"   "consid"     "trajectori"
 [6] "extend"     "choos"      "regular"    "asymptot"   "ball"      
[11] "uniform"   

[[30]]
 [1] "administr"      "secondari"      "fda"            "food"          
 [5] "endpoint"       "drug"           "efficaci"       "health"        
 [9] "adjust"         "prevent"        "record"         "separ"         
[13] "agent"          "cardiovascular" "primari"        "instrument"    
[17] "simplifi"       "frequenc"       "dose"           "week"          
[21] "maintain"       "databas"        "deliveri"       "clinic"        
[25] "benefit"        "birth"          "path"           "trial"         
[29] "drastic"        "odd"            "guidanc"        "perspect"      
[33] "intersect"      "guid"           "biomark"        "morbid"        
[37] "emerg"          "fwer"           "serniparametr"  "hour"          
[41] "make"           "stepwis"        "safeti"         "led"           
[45] "nutrit"         "decis"          "describ"        "errorpron"     
[49] "infant"         "serum"          "exemplifi"      "insight"       
[53] "feder"          "advers"         "prospect"       "valid"         
[57] "follow"         "likelihoodbas"  "energi"         "combin"        

[[31]]
 [1] "distort"         "respons"         "unobserv"        "confound"       
 [5] "predictor"       "under"           "adjust"          "serum"          
 [9] "factor"          "magnitud"        "generat"         "alter"          
[13] "intens"          "absent"          "explanatori"     "indirect"       
[17] "likelihoodbas"   "straightforward" "multipl"         "datagener"      
[21] "leastsquar"      "identifi"        "decid"           "stepwis"        
[25] "observ"          "intervent"       "sever"           "relationship"   
[29] "recov"           "system"          "car"             "coeffici"       
[33] "census"          "releas"          "agenc"           "closest"        
[37] "electr"          "shortcom"        "analyst"        

[[32]]
 [1] "motif"       "regul"       "gene"        "dna"         "transcript" 
 [6] "bind"        "sequenc"     "protein"     "factor"      "short"      
[11] "conserv"     "discoveri"   "nucleotid"   "cluster"     "biolog"     
[16] "high"        "site"        "mixtur"      "process"     "call"       
[21] "width"       "genom"       "vari"        "hierarch"    "dirichlet"  
[26] "pattern"     "priori"      "cell"        "strategi"    "organ"      
[31] "databas"     "matric"      "group"       "technolog"   "repres"     
[36] "stochast"    "refin"       "switch"      "substant"    "segment"    
[41] "aid"         "delet"       "similar"     "gibb"        "reduct"     
[46] "regulatori"  "express"     "core"        "find"        "live"       
[51] "yeast"       "composit"    "dictionari"  "accompani"   "appear"     
[56] "missingdata" "genomewid"   "generat"     "principl"    "facilit"    
[61] "recurs"      "background"  "specif"      "chromosom"   "address"    
[66] "wish"        "cycl"        "name"        "understand"  "adjac"      
[71] "variabl"    

[[33]]
[1] "absolut"  "deviat"   "clip"     "oracl"    "progress"

[[34]]
[1] "quantil" "regress"

[[35]]
 [1] "breakdown"  "point"      "robust"     "depth"      "locat"     
 [6] "project"    "equivari"   "finit"      "function"   "possess"   
[11] "contamin"   "competitor" "affin"      "definit"    "introduc"  
[16] "lead"       "induc"      "influenc"   "high"       "outlier"   
[21] "strong"     "trim"       "median"     "region"     "york"      
[26] "scale"      "desir"      "favor"      "turn"       "pursu"     
[31] "enjoy"      "scatter"    "suffic"     "behav"      "uniform"   
[36] "relat"      "comparison" "suggest"    "fact"       "univari"   
[41] "ann"        "radius"    

[[36]]
 [1] "memori"        "seri"          "differenc"     "longmemori"   
 [5] "frequenc"      "long"          "taper"         "fraction"     
 [9] "averag"        "stationari"    "depend"        "periodogram"  
[13] "move"          "whittl"        "slowli"        "nonstationari"
[17] "local"         "process"       "eigenvector"   "angl"         
[21] "paramet"       "period"        "short"         "univari"      
[25] "distinct"      "autoregress"   "volatil"       "fourier"      
[29] "infin"         "longrang"      "delta"         "residu"       
[33] "trim"          "raw"           "log"           "question"     
[37] "break"         "stress"        "know"          "gamma"        
[41] "serniparametr" "subspac"      

[[37]]
 [1] "auxiliari" "survey"    "varianc"   "design"    "popul"     "sampl"    
 [7] "variabl"   "weight"    "calibr"    "designbas" "probabl"   "servic"   
[13] "total"     "finit"     "work"      "feasibl"   "explain"   "miss"     

[[38]]
 [1] "lin"           "addit"         "transplant"    "bone"         
 [5] "work"          "carrol"        "registri"      "intern"       
 [9] "termin"        "multist"       "complic"       "serv"         
[13] "progress"      "transit"       "death"         "domin"        
[17] "backfit"       "implicit"      "largesampl"    "longer"       
[21] "inconsist"     "withinsubject" "withinclust"   "margin"       

[[39]]
 [1] "taper"       "approxim"    "matrix"      "gaussian"    "consist"    
 [6] "spars"       "oper"        "spatial"     "covari"      "block"      
[11] "requir"      "balanc"      "norm"        "precipit"    "station"    
[16] "weather"     "technic"     "manipul"     "matern"      "infeas"     
[21] "multipli"    "wild"        "simpli"      "eigenvector" "sever"      
[26] "onestep"     "resampl"     "oil"         "lose"        "expans"     
[31] "finitesampl" "emphasi"    

[[40]]
[1] "finitesampl" "propos"      "properti"    "simul"      

[[41]]
 [1] "wavelet"     "adapt"       "besov"       "minimax"     "threshold"  
 [6] "rang"        "ball"        "nois"        "wide"        "rate"       
[11] "unknown"     "smooth"      "risk"        "bound"       "function"   
[16] "deconvolut"  "problem"     "white"       "converg"     "signal"     
[21] "recov"       "gaussian"    "transform"   "noisi"       "view"       
[26] "blur"        "discret"     "shape"       "invers"      "spars"      
[31] "densiti"     "nearoptim"   "convolut"    "fourier"     "upper"      
[36] "decay"       "chosen"      "block"       "basi"        "dens"       
[41] "attain"      "waveletbas"  "continu"     "mathemat"    "counterpart"
[46] "physic"      "possess"     "lower"       "global"      "achiev"     
[51] "boundari"    "distinct"    "belong"      "domin"       "estim"      
[56] "place"      

[[42]]
 [1] "forecast"      "predict"       "weather"       "northwest"    
 [5] "spatial"       "probabilist"   "pacif"         "calibr"       
 [9] "wind"          "meteorolog"    "hour"          "temperatur"   
[13] "speed"         "atmospher"     "energi"        "north"        
[17] "center"        "geostatist"    "event"         "futur"        
[21] "averag"        "ensembl"       "american"      "tempor"       
[25] "accur"         "resourc"       "precipit"      "daili"        
[29] "state"         "sharp"         "qualiti"       "site"         
[33] "spacetim"      "generat"       "transport"     "concentr"     
[37] "season"        "climat"        "regim"         "shortterm"    
[41] "numer"         "determinist"   "ozon"          "input"        
[45] "climatolog"    "previous"      "output"        "parsimoni"    
[49] "perturb"       "geograph"      "period"        "trend"        
[53] "correl"        "vari"          "break"         "favor"        
[57] "quantit"       "laplac"        "caus"          "merg"         
[61] "safeti"        "station"       "agricultur"    "accumul"      
[65] "oppos"         "benefit"       "vast"          "global"       
[69] "stateoftheart" "featur"        "system"        "activ"        
[73] "dispers"       "simpler"       "decad"         "organ"        
[77] "crossvalid"    "member"       

[[43]]
 [1] "spacetim"       "spatial"        "fit"            "year"          
 [5] "site"           "separ"          "intens"         "california"    
 [9] "thin"           "process"        "monitor"        "residu"        
[13] "tempor"         "activ"          "multidimension" "occurr"        
[17] "space"          "background"     "appear"         "origin"        
[21] "smoother"       "irregular"      "earthquak"      "indic"         
[25] "asymmetr"       "trend"          "hazard"         "spectral"      
[29] "symmetr"        "environment"    "ozon"           "wind"          
[33] "meteorolog"     "daili"          "allow"          "rescal"        
[37] "season"         "time"           "anisotrop"      "cross"         
[41] "insid"          "bear"           "arbitrari"      "autoregress"   
[45] "interact"       "magnitud"       "sequenc"        "homogen"       
[49] "widespread"     "sphere"         "coordin"        "highlight"     
[53] "elabor"         "extrem"         "ascertain"      "forest"        
[57] "counti"         "rotat"          "month"          "threat"        
[61] "govern"         "secondari"      "aic"            "account"       
[65] "aid"            "emphas"         "routin"         "assess"        
[69] "departur"       "rare"          

[[44]]
 [1] "inhomogen"   "intens"      "spatial"     "process"     "poisson"    
 [6] "point"       "thin"        "stationari"  "function"    "firstord"   
[11] "efficaci"    "secondord"   "caus"        "infecti"     "network"    
[16] "infect"      "transmiss"   "respiratori" "environ"     "epidem"     
[21] "unrealist"   "lend"        "syndrom"     "hospit"      "emphasi"    
[26] "unusu"       "paid"        "peak"       

[[45]]
 [1] "garch"         "process"       "seri"          "volatil"      
 [5] "stationari"    "paper"         "heteroscedast" "moment"       
 [9] "autoregress"   "local"         "financi"       "condit"       
[13] "standard"      "move"          "averag"        "sequenc"      
[17] "mont"          "carlo"         "innov"         "satisfi"      
[21] "iid"           "root"          "time"          "forecast"     
[25] "nonstationari" "fourth"        "capabl"        "residu"       
[29] "return"        "rescal"        "exponenti"     "exchang"      
[33] "reparameter"   "arma"          "ergod"         "homogen"      
[37] "simpli"        "normal"        "explain"       "uniqu"        
[41] "exist"        

[[46]]
 [1] "withinclust"   "cluster"       "correl"        "account"      
 [5] "frequent"      "frailti"       "varianc"       "carri"        
 [9] "arbitrari"     "abil"          "achiev"        "hormon"       
[13] "generalis"     "tackl"         "characteris"   "evalu"        
[17] "simplic"       "fashion"       "closedform"    "noninform"    
[21] "hamper"        "intuit"        "dementia"      "birth"        
[25] "errorpron"     "ill"           "copula"        "withinsubject"

[[47]]
[1] "polynomi"    "local"       "smooth"      "regress"     "nonparametr"
[6] "asymptot"    "spline"     

[[48]]
 [1] "elect"        "vote"         "poll"         "presidenti"   "evid"        
 [6] "candid"       "polit"        "count"        "station"      "proport"     
[11] "forecast"     "nonrespons"   "elimin"       "prefer"       "counti"      
[16] "scientist"    "permit"       "lower"        "incom"        "fisher"      
[21] "york"         "record"       "heterogen"    "purpos"       "respond"     
[26] "percentag"    "particip"     "quick"        "transfer"     "week"        
[31] "spatiotempor" "evolut"       "california"   "histor"       "krige"       
[36] "list"         "appar"        "outcom"       "invalid"      "nonignor"    
[41] "publish"      "nonrespond"  

[[49]]
 [1] "survey"      "nonrespons"  "census"      "nation"      "respond"    
 [6] "imput"       "popul"       "health"      "race"        "bureau"     
[11] "nonignor"    "unit"        "respons"     "item"        "incom"      
[16] "miss"        "person"      "year"        "state"       "bias"       
[21] "employ"      "higher"      "valu"        "sensit"      "interview"  
[26] "labor"       "nonrespond"  "age"         "feder"       "collect"    
[31] "measur"      "handl"       "assess"      "report"      "level"      
[36] "counti"      "domain"      "preval"      "agenc"       "confidenti" 
[41] "benchmark"   "incorpor"    "protect"     "status"      "cell"       
[46] "earn"        "produc"      "sourc"       "relat"       "weight"     
[51] "propens"     "public"      "household"   "area"        "geograph"   
[56] "nutrit"      "document"    "lower"       "plan"        "bodi"       
[61] "gender"      "extrapol"    "preliminari" "birth"       "polit"      
[66] "correct"     "american"    "proxi"       "requir"      "previous"   
[71] "children"    "york"        "unemploy"    "death"      

[[50]]
 [1] "jackknif"  "file"      "replic"    "varianc"   "inconsist" "strata"   
 [7] "analyt"    "unbias"    "met"       "domain"    "schedul"   "freedom"  
[13] "survey"    "attain"    "balanc"    "mix"       "ensur"     "public"   
[19] "repeat"    "upper"     "bootstrap" "uncondit"  "plausibl"  "person"   
[25] "pseudo"    "concern"   "linkag"   

[[51]]
[1] "variancecovari"  "matrix"          "analyz"          "respect"        
[5] "quasilikelihood" "criterion"       "coin"            "efron"          

[[52]]
[1] "root"     "squar"    "approxim"

[[53]]
[1] "maximum"    "likelihood" "estim"      "paramet"   

[[54]]
 [1] "pca"           "princip"       "compon"        "matrix"       
 [5] "eigenvector"   "size"          "dimension"     "reduct"       
 [9] "eigenvalu"     "analysi"       "spike"         "perturb"      
[13] "logp"          "succeed"       "transit"       "dimens"       
[17] "maxim"         "highdimension" "set"           "sampl"        
[21] "threshold"     "nonzero"       "oil"           "direct"       
[25] "critic"        "sophist"       "recov"         "hold"         
[29] "sharp"         "larger"        "theorem"       "relax"        
[33] "high"          "diagon"        "overlap"       "domin"        
[37] "success"       "geometr"       "regim"         "tractabl"     
[41] "popul"         "ill"           "behav"         "extract"      
[45] "exhibit"       "support"       "tool"          "crossov"      
[49] "sudden"        "track"         "lose"          "infinit"      
[53] "evolutionari"  "tree"          "complex"       "largest"      
[57] "phenomenon"    "program"       "describ"       "nonasymptot"  
[61] "branch"        "topolog"       "row"           "embed"        
[65] "euclidean"     "geodes"        "anim"          "nois"         
[69] "machin"        "phase"         "speci"         "twoway"       
[73] "rise"         

[[55]]
 [1] "eigenfunct"  "function"    "princip"     "compon"      "random"     
 [6] "analysi"     "smooth"      "eigenvalu"   "data"        "curv"       
[11] "spars"       "space"       "trajectori"  "score"       "noisi"      
[16] "deriv"       "lead"        "sampl"       "longitudin"  "eigenvector"
[21] "expans"      "impact"      "elucid"      "decomposit"  "firstord"   
[26] "repres"      "differenti"  "measur"      "dynam"       "intrins"    
[31] "similar"     "plan"       

[[56]]
 [1] "pathway"       "biolog"        "pattern"       "presenc"      
 [5] "gene"          "latent"        "viral"         "initi"        
 [9] "biomark"       "understand"    "protein"       "pronounc"     
[13] "infect"        "therapi"       "supplementari" "quantifi"     
[17] "concentr"      "chemic"        "tackl"         "incorrect"    
[21] "healthi"       "identifi"      "molecular"     "human"        
[25] "serum"         "hormon"        "investig"      "experiment"   
[29] "search"        "status"        "sort"          "drug"         
[33] "inflat"        "pertin"        "mediat"        "mutat"        
[37] "resist"        "absent"        "blood"         "exemplifi"    
[41] "valuabl"       "phenotyp"      "led"           "indic"        
[45] "subsequ"       "format"        "framework"    

[[57]]
[1] "establish" "asymptot"  "consist"   "converg"  

[[58]]
 [1] "classifi"        "classif"         "discrimin"       "distancebas"    
 [5] "vector"          "centroid"        "support"         "machin"         
 [9] "theoret"         "popul"           "featur"          "rule"           
[13] "poor"            "popular"         "produc"          "distanc"        
[17] "method"          "highdimension"   "accumul"         "varieti"        
[21] "heavytail"       "differ"          "diverg"          "nearest"        
[25] "train"           "median"          "difficulti"      "spectra"        
[29] "componentwis"    "replac"          "excess"          "convent"        
[33] "frequent"        "truncat"         "boundari"        "counterpart"    
[37] "insensit"        "encount"         "closest"         "entail"         
[41] "case"            "allevi"          "problemat"       "today"          
[45] "argument"        "euclidean"       "inconsist"       "caus"           
[49] "straightforward" "neighbour"       "suffer"          "anneal"         
[53] "attempt"         "perform"         "misclassif"      "alloc"          
[57] "volatil"         "believ"          "explor"          "help"           
[61] "inher"           "explos"          "earthquak"       "base"           
[65] "consequ"         "achiev"          "jin"             "kullbackleibl"  
[69] "contemporari"    "construct"       "drawback"        "tstatist"       

[[59]]
 [1] "robin"      "miss"       "zhao"       "rotnitzki"  "effici"    
 [6] "random"     "casecohort" "weight"     "invers"     "twophas"   
[11] "cohort"     "biometrika" "design"     "prentic"    "causal"    
[16] "purpos"     "lemma"      "exemplifi"  "unbias"     "mar"       
[21] "suit"       "amer"       "assoc"      "proceed"    "summar"    
[26] "ser"        "soc"        "roy"        "calcul"     "iid"       
[31] "appear"     "cox"        "imput"      "visit"      "ann"       
[36] "augment"    "percentag"  "schedul"    "direct"     "unbalanc"  
[41] "mediat"     "day"        "embed"      "mental"     "equat"     
[46] "nice"       "month"     

[[60]]
[1] "bootstrap" "confid"    "distribut" "sampl"     "interv"    "method"   
[7] "correct"   "seri"      "empir"    

[[61]]
 [1] "norm"          "matrix"        "frobenius"     "rank"         
 [5] "matric"        "nuclear"       "bound"         "regular"      
 [9] "optim"         "low"           "highdimension" "nonasymptot"  
[13] "convex"        "minimax"       "noisi"         "spars"        
[17] "vector"        "singular"      "element"       "error"        
[21] "minim"         "setup"         "predict"       "autoregress"  
[25] "recoveri"      "theori"        "trace"         "obtain"       
[29] "decomposit"    "class"         "excel"         "mean"         
[33] "lower"         "instanc"       "yield"         "sharp"        
[37] "agreement"     "precis"        "mestim"        "complementari"
[41] "lowdimension"  "entri"         "analyz"        "oper"         
[45] "meansquar"     "relax"         "hold"          "determinist"  
[49] "observ"        "condit"        "autocovari"    "decompos"     
[53] "notion"        "stay"          "restrict"      "stronger"     
[57] "krige"        

[[62]]
 [1] "minimax"  "rate"     "densiti"  "optim"    "unknown"  "adapt"   
 [7] "loss"     "class"    "prove"    "sens"     "converg"  "problem" 
[13] "bound"    "estim"    "risk"     "vector"   "set"      "gaussian"
[19] "lower"   

[[63]]
 [1] "imag"     "magnet"   "reson"    "field"    "brain"    "fmri"    
 [7] "activ"    "voxel"    "signal"   "detect"   "locat"    "volum"   
[13] "accur"    "follow"   "task"     "motion"   "region"   "visual"  
[19] "identifi" "exploit"  "tissu"    "aim"      "contigu"  "map"     
[25] "rotat"    "neuron"  

[[64]]
[1] NA

[[65]]
[1] NA

[[66]]
 [1] "bspline"   "kernel"    "tackl"     "represent" "spline"    "penal"    
 [7] "tempor"    "proceed"   "splinebas" "truncat"   "solut"     "rigor"    
[13] "account"  

[[67]]
[1] NA

[[68]]
 [1] "electr"        "forecast"      "renew"         "bivari"       
 [5] "load"          "market"        "daili"         "power"        
 [9] "serial"        "shortterm"     "wind"          "autoregress"  
[13] "diagon"        "speed"         "time"          "season"       
[17] "focus"         "difficult"     "peak"          "spectrum"     
[21] "temperatur"    "regressor"     "heteroscedast" "firstord"     
[25] "total"         "highlight"     "energi"        "justifi"      
[29] "simpl"         "week"          "vari"          "hour"         
[33] "trend"         "citi"          "recogn"        "stationari"   
[37] "autocovari"    "detail"        "promis"        "realiti"      
[41] "favor"         "reveal"        "year"          "longmemori"   
[45] "gain"          "accuraci"      "exploit"       "predict"      
[49] "option"        "reliabl"       "price"         "evolut"       
[53] "avail"         "superpopul"   

[[69]]
 [1] "highfrequ"       "financi"         "asset"           "volatil"        
 [5] "price"           "lowfrequ"        "exchang"         "dynam"          
 [9] "stock"           "matrix"          "daili"           "period"         
[13] "nois"            "realiz"          "pool"            "market"         
[17] "matric"          "infin"           "diffus"          "return"         
[21] "day"             "trade"           "captur"          "forecast"       
[25] "vast"            "overcom"         "variat"          "hundr"          
[29] "pertin"          "dimensionreduct" "econom"          "iii"            
[33] "alloc"           "noisi"           "industri"        "zhang"          
[37] "guidanc"         "merit"           "adequ"           "size"           
[41] "highdimension"   "fan"             "eigenvector"     "option"         
[45] "wavelet"         "built"           "avail"          

[[70]]
 [1] "day"        "daili"      "record"     "time"       "financi"   
 [6] "activ"      "short"      "peak"       "consecut"   "help"      
[11] "autocovari" "appropri"   "intens"     "physic"     "character" 
[16] "measur"     "children"   "trade"      "strength"   "scalar"    
[21] "superposit" "incomplet"  "copi"      

[[71]]
[1] "secondord"   "firstord"    "accur"       "expans"      "unbias"     
[6] "moment"      "approxim"    "frequentist" "exact"      

[[72]]
 [1] "treatment"  "assign"     "causal"     "score"      "outcom"    
 [6] "propens"    "averag"     "effect"     "grade"      "school"    
[11] "potenti"    "stratif"    "promot"     "confound"   "rubin"     
[16] "student"    "unit"       "regim"      "educ"       "adjust"    
[21] "children"   "plausibl"   "polici"     "program"    "evid"      
[26] "pretreat"   "posttreat"  "summar"     "stage"      "child"     
[31] "intermedi"  "assumpt"    "retain"     "multilevel" "block"     
[36] "econom"     "experiment" "stabl"      "arbitrari"  "nation"    
[41] "articl"     "balanc"     "learn"      "perspect"   "status"    
[46] "unmeasur"   "fewer"      "scalar"     "affect"     "low"       
[51] "mathemat"   "track"      "twostag"    "covari"     "tradeoff"  
[56] "recov"      "nonrandom"  "bind"       "pose"       "estimand"  
[61] "impos"      "feasibl"    "return"    

[[73]]
 [1] "extrapol"      "errorpron"     "posttreat"     "instrument"   
 [5] "classic"       "baselin"       "replic"        "subsampl"     
 [9] "nonlinear"     "daili"         "summari"       "air"          
[13] "encount"       "subset"        "bias"          "efficaci"     
[17] "heteroscedast" "frequenc"      "trajectori"    "spheric"      
[21] "supplementari" "correct"       "multiscal"     "scatter"      
[25] "reconstruct"   "subject"       "error"         "temperatur"   

[[74]]
 [1] "admiss"       "inadmiss"     "loss"         "bay"          "risk"        
 [6] "endpoint"     "action"       "ann"          "accept"       "math"        
[11] "genom"        "screen"       "stringent"    "result"       "complet"     
[16] "stepup"       "character"    "formul"       "treat"        "pearson"     
[21] "amer"         "assoc"        "biometrika"   "prototyp"     "vector"      
[26] "pay"          "reject"       "decad"        "revisit"      "metaanalysi" 
[31] "criteria"     "effort"       "bioassay"     "thought"      "hard"        
[36] "psycholog"    "nonneg"       "predetermin"  "fals"         "energi"      
[41] "earlier"      "educ"         "hoc"          "stein"        "emerg"       
[46] "fair"         "dna"          "appeal"       "sign"         "singlestep"  
[51] "drug"         "microarray"   "statistician" "jeffrey"      "year"        
[56] "fewer"        "fisher"       "paper"        "resembl"      "paradox"     
[61] "share"        "twodimension" "nonzero"      "stepdown"     "seek"        
[66] "expect"      

[[75]]
[1] "coeffici" "regress"  "linear"   "vari"    

[[76]]
 [1] "unbound"   "novelti"   "function"  "yield"     "oracl"     "tail"     
 [7] "decreas"   "satisfi"   "anisotrop" "inequ"     "median"    "slower"   
[13] "literatur" "bivari"    "free"      "vast"      "fast"      "input"    
[19] "setup"     "output"    "aggreg"    "aforement" "behav"     "influenti"
[25] "iii"       "bound"     "univers"   "main"      "nuclear"   "radius"   
[31] "need"      "tilt"      "hyperplan" "higherord" "symmetri"  "equivari" 
[37] "gee"       "scatter"   "bin"       "quadrat"  

[[77]]
 [1] "wishart"     "graph"       "cone"        "graphic"     "famili"     
 [6] "matric"      "conjug"      "gaussian"    "matrix"      "covari"     
[11] "decompos"    "prior"       "paramet"     "edg"         "definit"    
[16] "paper"       "homogen"     "space"       "correspond"  "posit"      
[21] "standard"    "shape"       "form"        "ann"         "miss"       
[26] "zero"        "equal"       "eigenvalu"   "dimens"      "close"      
[31] "respect"     "invers"      "sigma"       "chisquar"    "distinct"   
[36] "flexibl"     "margin"      "precis"      "bay"         "undirect"   
[41] "fix"         "refer"       "direct"      "constant"    "acycl"      
[46] "satisfi"     "expect"      "encod"       "entri"       "enrich"     
[51] "accept"      "phi"         "scalabl"     "omega"       "nonhomogen" 
[56] "probab"      "euclidean"   "dual"        "read"        "restrict"   
[61] "centr"       "characteris" "deep"        "tangent"     "fourth"     
[66] "perfect"    

[[78]]
 [1] "schedul"         "longitudin"      "followup"        "analys"         
 [5] "phase"           "generat"         "incomplet"       "flexibl"        
 [9] "respons"         "avail"           "ill"             "unbalanc"       
[13] "pursu"           "offer"           "enter"           "resourc"        
[17] "impact"          "merg"            "concret"         "intermitt"      
[21] "interim"         "preced"          "perfect"         "divid"          
[25] "maker"           "face"            "preliminari"     "fluctuat"       
[29] "missingatrandom" "versatil"        "alloc"           "timetoev"       
[33] "withinsubject"   "compromis"       "manag"           "metropoli"      
[37] "missingdata"     "walk"            "logrank"        

[[79]]
[1] "real"    "simul"   "data"    "illustr"

[[80]]
[1] NA

[[81]]
 [1] "chi"           "test"          "distribut"     "space"        
 [5] "ratio"         "restrict"      "statist"       "conveni"      
 [9] "tail"          "goodnessoffit" "pearson"      

[[82]]
[1] "size"   "sampl"  "number" "small"  "larg"  

[[83]]
[1] "misspecifi" "robust"     "misspecif" 

[[84]]
 [1] "climat"      "temperatur"  "chang"       "greenhous"   "global"     
 [6] "earth"       "uncertainti" "northern"    "atmospher"   "quantifi"   
[11] "trend"       "increas"     "reconstruct" "averag"      "region"     
[16] "separ"       "tempor"      "concentr"    "surfac"      "longterm"   
[21] "pollut"      "period"      "centuri"     "opposit"     "tree"       
[26] "gas"         "creat"       "purpos"      "futur"       "record"     
[31] "remot"       "understand"  "radiat"      "emiss"       "proxi"      
[36] "histor"      "air"         "ecolog"      "forest"      "magnitud"   
[41] "massiv"      "cloud"       "gather"      "forc"        "weather"    
[46] "synthet"     "actual"      "pattern"     "expert"      "extern"     
[51] "current"     "quantif"     "agreement"   "institut"    "act"        

For the future: it seems worth checking whether generalized binary priors on L would avoid this kind of fit. I also wonder whether document-specific variances could model outlying documents better.

Backfitting

I tried backfitting one fit; it did not change things much. (When I tried backfitting the fit with pseudocount 0.1, I got an error.)

fit.nn.s.1.2 = flash_backfit(fit.nn.s.1)
Backfitting 58 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+04...
  Difference between iterations is within 1.0e+03...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
compare(fit.nn.s.1,fit.nn.s.1.2)
[[1]]
[1] "singleindex" "function"    "link"        "compon"      "unknown"    

[[2]]
[1] "norm"      "matrix"    "rank"      "matric"    "frobenius" "bound"    

[[1]]
[1] "singleindex" "link"        "unknown"    

[[2]]
[1] "norm"   "estim"  "matrix"
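Backfitting refits the existing factors without reordering them, so one rough way to put a number on "did not change things much" is to correlate each factor's word loadings before and after backfitting. The base-R sketch below does this; the simulated matrices `F_before` and `F_after` are hypothetical stand-ins for `fit.nn.s.1$F_pm` and `fit.nn.s.1.2$F_pm`, and the 0.9 cutoff is borrowed from the factor-matching rule used later in this analysis.

```r
# Hypothetical stand-ins for F_pm before and after backfitting
# (rows = words, columns = factors). With real fits one would use
# fit.nn.s.1$F_pm and fit.nn.s.1.2$F_pm instead.
set.seed(1)
n_words <- 200
n_factors <- 5
F_before <- matrix(rexp(n_words * n_factors), n_words, n_factors)
# Small perturbation to mimic a backfit that changes little
F_after <- F_before + matrix(rnorm(n_words * n_factors, sd = 0.05), n_words, n_factors)

# Correlate factor k before with factor k after (factor order is preserved)
paired_cor <- function(A, B) {
  stopifnot(identical(dim(A), dim(B)))
  sapply(seq_len(ncol(A)), function(k) cor(A[, k], B[, k]))
}

r <- paired_cor(F_before, F_after)
round(r, 3)
# Factors whose word loadings moved noticeably under this cutoff
moved <- which(r < 0.9)
moved
```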

Adding a factor

I was struck by the “mri” factor in the fit with pseudocount = 0.01, which did not appear in the fit with pseudocount = 0.1. This factor makes a lot of sense, so I thought its absence from the 0.1 fit might just be a failure to “find” it, rather than an indication that the structure is absent from that data set. Here I check this by adding the factor to the 0.1 fit and backfitting it: the new factor is kept, indicating that it improves the ELBO.

fit.nn.s.01.b = flash_factors_init(fit.nn.s.01,init = list(u = cbind(fit.nn.s.001$L_pm[,64]) ,d=cbind(c(1),drop=FALSE), v=cbind(fit.nn.s.001$F_pm[,64]) ))
fit.nn.s.01.b %>% flash_backfit(kset=109)  # prints the backfitted object; the result is not assigned, so fit.nn.s.01.b itself is unchanged
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Flash object with 109 factors.
  Proportion of variance explained*:
    Factor 1: 0.146
    Factor 2: 0.003
    Factor 3: 0.001
    Factor 4: 0.003
    Factor 5: 0.003
    Factor 9: 0.004
    Factor 10: 0.002
    Factor 14: 0.003
    Factor 15: 0.001
    Factor 38: 0.002
    Factor 43: 0.003
    Factor 48: 0.002
    Factor 50: 0.001
    Factor 56: 0.002
    Factor 65: 0.001
    Factor 91: 0.002
    *Factors with PVE < 0.001 are omitted from this summary.
  Variational lower bound: -109847.549
get_keywords(fit.nn.s.01.b)[109]
[[1]]
 [1] "ozon"            "maxima"          "splinebas"       "nonlinear"      
 [5] "piecewiselinear" "concentr"        "pressur"         "cycl"           
 [9] "transport"       "variat"          "contribut"       "atmospher"      
[13] "peak"            "trend"           "measur"          "basi"           
[17] "evid"            "instrument"      "thought"         "greater"        
[21] "link"            "scientif"        "lag"             "dimensionreduct"
[25] "absenc"          "wave"            "global"          "separ"          
[29] "month"           "coincid"         "influenc"        "lowdimension"   
[33] "clear"           "contrast"        "lower"           "year"           
[37] "site"            "qualiti"         "profil"          "sequenc"        
[41] "sensit"          "origin"          "relat"           "presenc"        
[45] "satellit"        "partial"         "pattern"         "identifi"       
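The `d` entry of the init list above is built as `cbind(c(1),drop=FALSE)`. Base R's `cbind` has no `drop` argument, so `drop=FALSE` is bound as a second column (coerced to 0) and the result is a 1 x 2 matrix rather than a single scale of 1. The quick check below shows the shapes; a scalar `d = 1`, or the 1 x 1 `diag(1, nrow = 1)` used in the loop later in this section, matches the single column of `u` and `v`. Whether flashier silently tolerated the extra entry in this call was not checked here.

```r
# What the init list above used for d
d_orig <- cbind(c(1), drop = FALSE)
dim(d_orig)     # one row, two columns: FALSE became a second column
d_orig[1, ]     # the values 1 and 0 (second column named "drop")

# Single-factor alternatives whose size matches one column of u and v
d_scalar <- 1
d_matrix <- diag(1, nrow = 1)
dim(d_matrix)   # one row, one column
```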

Adding factors from 0.01 fit to 0.1 fit

I wondered how many of the differences between the two fits are due to this kind of issue, so I added every factor from the 0.01 fit that has no close counterpart in the original 0.1 fit (maximum correlation of the F loadings below 0.9) and backfitted. My initial attempt to add them all at once and then backfit gave an error (I may not have set it up correctly), so here I add and backfit them one at a time. Most, but not all, are kept: of the 28 factors added, two are estimated as numerically zero and removed by the null check. This suggests that many of the differences between the runs may simply reflect the runs finding different solutions, rather than differences in the structure present under each pseudocount.

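fit1 = fit.nn.s.01   # pseudocount 0.1
fit2 = fit.nn.s.001  # pseudocount 0.01
# cc[j, k]: correlation between factor j of fit1 and factor k of fit2 (word loadings F)
cc = cor(fit1$F_pm, fit2$F_pm)
# factors of fit2 with no counterpart in fit1 (max correlation below 0.9)
spec2 = which(apply(cc, 2, max) < 0.9)
fit.nn.s.01.2 = fit.nn.s.01
for (i in spec2) {
  # initialize one new factor from factor i of fit2
  init = list(u = cbind(fit2$L_pm[, i]), d = diag(1, nrow = 1), v = cbind(fit2$F_pm[, i]))
  fit.nn.s.01.2 <- flash_factors_init(fit.nn.s.01.2, init = init, ebnm_fn = ebnm_point_exponential)
  # backfit only the newly added factor (the last one)
  fit.nn.s.01.2 <- flash_backfit(fit.nn.s.01.2, kset = fit.nn.s.01.2$n_factors)
}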
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  --Estimate of factor 112 is numerically zero!
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  --Estimate of factor 115 is numerically zero!
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
Wrapping up...
Done.
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
Wrapping up...
Done.
fit.nn.s.01.2 <- flash_nullcheck(fit.nn.s.01.2)
Nullchecking 136 factors...
  2 factors are identically zero.
Wrapping up...
  Removed 2 factors.
Done.
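The selection step `spec2 = which(apply(cc,2,max)<0.9)` flags each factor of the 0.01 fit whose best correlation with any factor of the 0.1 fit is below 0.9. The base-R sketch below applies the same rule to simulated word loadings (`F_a`, `F_b`, and `L_b` are hypothetical stand-ins for the fits' `F_pm` and `L_pm`), then builds an SVD-like `u`, `d`, `v` list holding all flagged factors at once, with one column of `u` and `v` and one entry of `d` per factor. This is only the shape one might try for the all-at-once version; it was not run through `flash_factors_init` here and is not a diagnosis of the error mentioned above.

```r
set.seed(1)
n_words <- 100
n_docs <- 50

# Hypothetical loadings: fit A has factors 1-3; fit B reuses A's factors 1-2
# (plus small noise) and adds one unrelated factor, so only column 3 of B
# should be flagged as unmatched.
F_a <- matrix(rexp(n_words * 3), n_words, 3)
F_b <- cbind(F_a[, 1:2] + matrix(rnorm(n_words * 2, sd = 0.05), n_words, 2),
             rexp(n_words))
L_b <- matrix(rexp(n_docs * 3), n_docs, 3)

# cc[j, k] = correlation between factor j of A and factor k of B
cc <- cor(F_a, F_b)
# Factors of B with no close counterpart in A
unmatched <- which(apply(cc, 2, max) < 0.9)
unmatched

# SVD-like init holding every unmatched factor at once;
# drop = FALSE keeps u and v as matrices even when one factor is selected
init_all <- list(
  u = L_b[, unmatched, drop = FALSE],
  d = rep(1, length(unmatched)),
  v = F_b[, unmatched, drop = FALSE]
)
str(init_all)
```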
get_keywords(fit.nn.s.01.2,docfilter = 1)
[[1]]
 [1] "model"       "estim"       "data"        "method"      "propos"     
 [6] "studi"       "simul"       "distribut"   "function"    "sampl"      
[11] "paramet"     "approach"    "statist"     "base"        "asymptot"   
[16] "problem"     "general"     "regress"     "analysi"     "test"       
[21] "develop"     "procedur"    "perform"     "illustr"     "condit"     
[26] "set"         "applic"      "observ"      "variabl"     "likelihood" 
[31] "consist"     "time"        "appli"       "covari"      "properti"   
[36] "random"      "comput"      "articl"      "linear"      "case"       
[41] "process"     "infer"       "error"       "select"      "number"     
[46] "effici"      "rate"        "nonparametr" "deriv"       "measur"     
[51] "effect"      "algorithm"   "class"       "paper"       "compar"     
[56] "provid"      "includ"      "depend"     

[[2]]
 [1] "fals"       "control"    "procedur"   "test"       "rate"      
 [6] "discoveri"  "reject"     "hypothes"   "multipl"    "null"      
[11] "pvalu"      "fdr"        "hochberg"   "number"     "stepdown"  
[16] "kfwer"      "familywis"  "error"      "depend"     "proport"   
[21] "benjamini"  "fwer"       "statist"    "fdp"        "soc"       
[26] "divid"      "power"      "roy"        "stepup"     "alpha"     
[31] "deriv"      "abil"       "ser"        "individu"   "detect"    
[36] "gamma"      "total"      "hypothesi"  "conserv"    "toler"     
[41] "attent"     "defin"      "singlestep" "construct"  "fix"       
[46] "simultan"   "probabl"    "independ"   "ann"        "usual"     
[51] "sime"       "improv"     "increas"   

[[3]]
 [1] "treatment"   "random"      "trial"       "patient"     "effect"     
 [6] "assign"      "noncompli"   "assumpt"     "outcom"      "complianc"  
[11] "causal"      "adher"       "depress"     "placebo"     "receiv"     
[16] "care"        "subject"     "clinic"      "intervent"   "drug"       
[21] "arm"         "dose"        "improv"      "primari"     "treat"      
[26] "princip"     "analys"      "latent"      "elder"       "control"    
[31] "sever"       "contrast"    "instrument"  "stratif"     "activ"      
[36] "particip"    "framework"   "prevent"     "potenti"     "physician"  
[41] "benefit"     "infer"       "imperfect"   "children"    "encourag"   
[46] "estimand"    "doserespons"

[[4]]
 [1] "surviv"       "time"         "hazard"       "censor"       "failur"      
 [6] "studi"        "event"        "semiparametr" "proport"      "data"        
[11] "cancer"       "covari"       "estim"        "risk"         "cox"         
[16] "baselin"      "regress"      "cumul"        "illustr"      "rightcensor" 
[21] "consist"      "nonparametr"  "trial"       

[[5]]
[1] "null"      "test"      "hypothesi" "distribut" "altern"    "statist"  
[7] "power"     "asymptot"  "hypothes" 

[[6]]
 [1] "simex"              "simulationextrapol" "measur"            
 [4] "error"              "undersmooth"        "asymptot"          
 [7] "longer"             "accuraci"           "finitesampl"       
[10] "principl"           "bias"               "presenc"           
[13] "selector"           "wang"               "rootn"             

[[7]]
 [1] "wilk"       "ratio"      "phenomenon" "correct"    "relax"     
 [6] "conduct"    "newli"      "unspecifi"  "freedom"    "follow"    
[11] "backfit"    "nuisanc"    "theorem"    "degre"      "chisquar"  
[16] "likelihood" "empir"      "ask"        "hold"      

[[8]]
 [1] "mle"         "maximum"     "likelihood"  "main"        "prove"      
 [6] "asymptot"    "converg"     "limit"       "mles"        "status"     
[11] "rate"        "current"     "brownian"    "behavior"    "motion"     
[16] "estim"       "proof"       "uniqu"       "nonparametr"

[[9]]
 [1] "chain"     "markov"    "mont"      "carlo"     "bayesian"  "algorithm"
 [7] "posterior" "infer"     "prior"     "model"     "mcmc"     

[[10]]
 [1] "lasso"     "select"    "variabl"   "regress"   "coeffici"  "spars"    
 [7] "penalti"   "adapt"     "linear"    "oracl"     "penal"     "problem"  
[13] "sparsiti"  "algorithm" "regular"  

[[11]]
[1] "varyingcoeffici" "nonparametr"     "coeffici"        "linear"         
[5] "longitudin"      "conduct"         "propos"          "vari"           
[9] "regress"        

[[12]]
 [1] "rankbas"      "effici"       "asymptot"     "rank"         "ellipt"      
 [6] "cam"          "class"        "densiti"      "uniform"      "normal"      
[11] "version"      "sign"         "multivari"    "matric"       "symmetri"    
[16] "valid"        "finit"        "scatter"      "ann"          "contour"     
[21] "tradit"       "assumpt"      "sens"         "irrespect"    "rootn"       
[26] "semiparametr" "center"      

[[13]]
 [1] "nconsist" "root"     "reduct"   "dimens"   "exist"    "direct"  
 [7] "central"  "slice"    "exhaust"  "contour"  "ellipt"   "advantag"
[13] "mild"     "strong"   "regress"  "varianc"  "suffici"  "invers"  
[19] "averag"  

[[14]]
 [1] "semiparametr" "estim"        "nonparametr"  "parametr"     "paramet"     
 [6] "model"        "effici"       "asymptot"     "likelihood"   "regress"     
[11] "function"    

[[15]]
 [1] "bandwidth"  "kernel"     "local"      "select"     "smooth"    
 [6] "densiti"    "estim"      "crossvalid" "selector"   "polynomi"  

[[16]]
 [1] "nonconcav"     "penal"         "select"        "oracl"        
 [5] "penalti"       "variabl"       "likelihood"    "regular"      
 [9] "fan"           "challeng"      "nondifferenti" "maxim"        
[13] "sandwich"      "onestep"       "establish"     "concav"       
[17] "broad"         "enjoy"         "employ"        "selector"     
[21] "encourag"      "cost"         

[[17]]
[1] NA

[[18]]
[1] "homoscedast"   "heteroscedast" "varianc"       "transform"    
[5] "famili"        "error"        

[[19]]
[1] "nonnorm"   "normal"    "mix"       "linear"    "exponenti"

[[20]]
[1] "inhomogen"  "intens"     "process"    "spatial"    "point"     
[6] "poisson"    "thin"       "stationari" "function"  

[[21]]
 [1] "seem"           "unrel"          "spline"         "correl"        
 [5] "credit"         "retail"         "neglig"         "nongaussian"   
 [9] "dataadapt"      "vehicl"         "allevi"         "knot"          
[13] "leav"           "reversiblejump" "part"           "genotyp"       
[17] "conveni"        "residu"         "wang"           "withinclust"   

[[22]]
 [1] "memori"        "seri"          "differenc"     "longmemori"   
 [5] "taper"         "frequenc"      "long"          "fraction"     
 [9] "averag"        "depend"        "paramet"       "periodogram"  
[13] "stationari"    "move"          "slowli"        "whittl"       
[17] "eigenvector"   "local"         "nonstationari" "distinct"     
[21] "angl"         

[[23]]
 [1] "distort"         "respons"         "confound"        "predictor"      
 [5] "unobserv"        "under"           "explanatori"     "serum"          
 [9] "adjust"          "magnitud"        "indirect"        "identifi"       
[13] "coeffici"        "factor"          "absent"          "system"         
[17] "alter"           "observ"          "datagener"       "leastsquar"     
[21] "decid"           "straightforward" "generat"         "stepwis"        
[25] "intervent"       "sever"          

[[24]]
[1] "polynomi"    "local"       "regress"     "smooth"      "nonparametr"
[6] "asymptot"   

[[25]]
 [1] "equivari"   "affin"      "introduc"   "depth"      "breakdown" 
 [6] "scatter"    "locat"      "point"      "project"    "robust"    
[11] "concept"    "general"    "multivari"  "function"   "influenc"  
[16] "matrix"     "median"     "definit"    "hyperplan"  "high"      
[21] "heavytail"  "competitor" "fact"       "translat"   "comparison"
[26] "open"      

[[26]]
 [1] "save"      "sir"       "slice"     "averag"    "root"      "invers"   
 [7] "candid"    "reveal"    "theoret"   "reduct"    "comput"    "contrast" 
[13] "recommend"

[[27]]
 [1] "nonrespons" "survey"     "respons"    "imput"      "nonignor"  
 [6] "valu"       "miss"       "respond"    "nation"     "varianc"   
[11] "nonrespond" "weight"     "popul"      "requir"     "bias"      
[16] "probabl"    "unit"       "mechan"     "item"       "adjust"    
[21] "health"     "variabl"    "calibr"     "race"       "domain"    
[26] "handl"      "incom"     

[[28]]
 [1] "taper"    "approxim" "matrix"   "gaussian" "covari"   "spars"   
 [7] "consist"  "oper"     "block"    "norm"     "balanc"   "requir"  
[13] "spatial" 

[[29]]
 [1] "jackknif"  "mix"       "varianc"   "area"      "squar"     "appli"    
 [7] "inconsist" "uncondit"  "replic"    "strata"   

[[30]]
[1] "mestim"  "robust"  "weak"    "yield"   "outlier" "nuisanc"

[[31]]
 [1] "garch"         "process"       "seri"          "volatil"      
 [5] "stationari"    "paper"         "heteroscedast" "condit"       
 [9] "moment"        "autoregress"   "financi"       "local"        
[13] "standard"      "innov"         "sequenc"       "satisfi"      
[17] "move"          "iid"           "time"          "averag"       
[21] "root"          "mont"          "carlo"        

[[32]]
[1] "quantil" "regress"

[[33]]
 [1] "gee"       "equat"     "correl"    "general"   "sandwich"  "binari"   
 [7] "work"      "misspecif" "cluster"   "scientif"  "enhanc"    "effort"   
[13] "equival"   "lead"      "repeat"    "diverg"   

[[34]]
 [1] "popul"      "superpopul" "survey"     "finit"      "boxcox"    
 [6] "modelbas"   "design"     "predict"    "realiz"     "auxiliari" 
[11] "sampl"      "handl"      "twophas"    "revisit"    "mild"      
[16] "benchmark"  "rich"       "life"       "probabl"    "ensur"     

[[35]]
 [1] "claim"     "insur"     "vehicl"    "damag"     "age"       "year"     
 [7] "turn"      "compani"   "detail"    "tail"      "sever"     "coverag"  
[13] "record"    "risk"      "price"     "financi"   "describ"   "major"    
[19] "gender"    "discount"  "logit"     "amount"    "person"    "kind"     
[25] "multinomi" "frequenc"  "justif"    "surpris"   "binomi"    "oil"      
[31] "pointwis"  "split"     "negat"    

[[36]]
[1] "logit"       "finitesampl" "root"        "probit"      "variat"     
[6] "mix"         "fraction"    "multinomi"  

[[37]]
 [1] "expenditur"   "physician"    "servic"       "skew"         "care"        
 [6] "lognorm"      "profil"       "conduct"      "patient"      "person"      
[11] "contribut"    "health"       "randomeffect" "smoke"        "fact"        
[16] "survey"       "manag"        "incur"        "medic"        "debat"       
[21] "custom"       "qualiti"      "topic"        "industri"     "appropri"    
[26] "pulmonari"    "conceptu"     "monitor"      "regard"       "prescrib"    
[31] "subsequ"      "way"          "financi"      "hierarch"     "lung"        
[36] "percentil"    "attribut"     "closedform"  

[[38]]
[1] "confid"    "interv"    "construct" "coverag"   "bootstrap" "region"   

[[39]]
 [1] "singleindex" "unknown"     "link"        "compon"      "equat"      
 [6] "function"    "varianc"     "nonparametr" "beta"        "femal"      
[11] "structur"    "smaller"     "compos"      "vectorvalu"  "eigenfunct" 
[16] "composit"    "econometr"  

[[40]]
[1] "finitesampl" "propos"     

[[41]]
 [1] "wavelet"    "adapt"      "besov"      "minimax"    "ball"      
 [6] "threshold"  "rang"       "nois"       "wide"       "unknown"   
[11] "rate"       "risk"       "bound"      "deconvolut" "smooth"    
[16] "problem"    "function"   "signal"     "white"      "converg"   
[21] "gaussian"   "transform"  "recov"      "densiti"    "shape"     
[26] "view"       "noisi"      "discret"    "nearoptim"  "spars"     
[31] "blur"       "fourier"    "decay"      "upper"      "convolut"  
[36] "invers"    

[[42]]
 [1] "robin"      "miss"       "zhao"       "rotnitzki"  "effici"    
 [6] "weight"     "casecohort" "design"     "invers"     "twophas"   
[11] "cohort"     "random"     "causal"     "outcom"     "biometrika"
[16] "prentic"    "calcul"     "purpos"     "confound"   "lemma"     
[21] "mar"        "exemplifi"  "suit"       "amer"       "assoc"     
[26] "proceed"    "summar"     "cox"        "ser"        "soc"       
[31] "roy"        "iid"        "appear"     "unbias"    

[[43]]
[1] "maximum"    "likelihood" "estim"     

[[44]]
[1] "dimensionreduct" "invers"          "dimens"          "factor"         
[5] "highdimension"   "chisquar"        "reduct"         

[[45]]
[1] "lin"        "addit"      "work"       "carrol"     "bone"      
[6] "transplant" "margin"    

[[46]]
 [1] "withinclust" "cluster"     "correl"      "account"     "hamper"     
 [6] "frequent"    "carri"       "frailti"     "parsimoni"   "abil"       
[11] "birth"       "ill"         "generalis"   "impact"      "intuit"     
[16] "achiev"     

[[47]]
[1] "chi"       "test"      "distribut" "space"     "ratio"     "restrict" 
[7] "statist"  

[[48]]
[1] "coeffici" "regress" 

[[49]]
 [1] "norm"          "matrix"        "frobenius"     "rank"         
 [5] "matric"        "nuclear"       "bound"         "regular"      
 [9] "low"           "optim"         "nonasymptot"   "highdimension"
[13] "convex"        "spars"         "minimax"       "noisi"        
[17] "element"       "minim"         "error"         "singular"     
[21] "setup"         "vector"        "theori"        "precis"       
[25] "autoregress"   "predict"      

[[50]]
 [1] "minimax" "rate"    "densiti" "optim"   "adapt"   "unknown" "estim"  
 [8] "loss"    "converg" "class"   "prove"   "bound"  

[[51]]
[1] "unequ"     "designbas" "survey"    "weight"   

[[52]]
[1] "auxiliari" "survey"    "varianc"   "variabl"   "sampl"     "weight"   
[7] "design"    "calibr"    "popul"    

[[53]]
[1] "variancecovari" "matrix"         "analyz"        

[[54]]
[1] "contamin"    "robust"      "water"       "influenc"    "explanatori"

[[55]]
[1] "bspline" "kernel"  "penal"  

[[56]]
[1] "varianc"  "asymptot"

[[57]]
 [1] "eigenfunct" "function"   "princip"    "compon"     "random"    
 [6] "analysi"    "data"       "smooth"     "eigenvalu"  "deriv"     
[11] "curv"       "spars"      "trajectori" "space"      "score"     

[[58]]
 [1] "forecast"    "predict"     "weather"     "spatial"     "wind"       
 [6] "probabilist" "northwest"   "calibr"      "pacif"       "meteorolog" 
[11] "temperatur"  "speed"       "hour"        "energi"      "atmospher"  
[16] "averag"      "ensembl"     "geostatist"  "futur"       "center"     
[21] "north"       "precipit"    "accur"       "tempor"      "daili"      
[26] "event"       "resourc"     "site"        "american"    "state"      
[31] "sharp"       "spacetim"    "qualiti"     "climat"      "ozon"       
[36] "concentr"    "generat"     "regim"       "transport"   "season"     
[41] "shortterm"   "determinist" "input"      

[[59]]
 [1] "highfrequ" "volatil"   "financi"   "asset"     "price"     "lowfrequ" 
 [7] "exchang"   "nois"      "dynam"     "market"    "matrix"    "stock"    
[13] "period"    "daili"     "realiz"    "pool"      "matric"    "variat"   
[19] "diffus"   

[[60]]
 [1] "earthquak"      "process"        "discrimin"      "seri"          
 [5] "featur"         "explos"         "event"          "time"          
 [9] "form"           "california"     "spectra"        "transform"     
[13] "background"     "extract"        "occurr"         "intens"        
[17] "diverg"         "wavelet"        "step"           "occur"         
[21] "decomposit"     "thin"           "separ"          "basi"          
[25] "multidimension" "spacetim"       "rate"           "poisson"       
[29] "residu"         "spectrum"       "goal"           "rescal"        
[33] "magnitud"       "evolutionari"   "purpos"         "homogen"       

[[61]]
 [1] "climat"      "chang"       "temperatur"  "greenhous"   "global"     
 [6] "earth"       "trend"       "uncertainti" "increas"     "atmospher"  
[11] "northern"    "quantifi"    "reconstruct" "futur"       "separ"      
[16] "tempor"     

[[62]]
 [1] "motif"      "gene"       "sequenc"    "regul"      "transcript"
 [6] "bind"       "dna"        "protein"    "cluster"    "factor"    
[11] "nucleotid"  "discoveri"  "conserv"    "short"      "high"      
[16] "call"       "pattern"    "dirichlet"  "biolog"     "site"      
[21] "process"    "genom"      "mixtur"     "width"      "vari"      
[26] "priori"     "hierarch"   "strategi"   "cell"       "databas"   
[31] "repres"     "organ"      "delet"      "matric"     "similar"   
[36] "gibb"       "switch"     "technolog"  "generat"    "segment"   
[41] "refin"      "aid"        "substant"   "stochast"   "live"      
[46] "group"      "core"       "regulatori"

[[63]]
 [1] "wishart"    "graph"      "cone"       "famili"     "graphic"   
 [6] "matric"     "conjug"     "paramet"    "prior"      "gaussian"  
[11] "covari"     "matrix"     "decompos"   "edg"        "definit"   
[16] "homogen"    "paper"      "shape"      "invers"     "correspond"
[21] "standard"   "ann"        "posit"      "equal"      "space"     
[26] "respect"    "eigenvalu"  "zero"       "sigma"      "dimens"    
[31] "bay"        "chisquar"   "miss"       "form"       "precis"    
[36] "flexibl"    "distinct"   "close"     

[[64]]
 [1] "pca"          "princip"      "compon"       "matrix"       "eigenvector" 
 [6] "analysi"      "eigenvalu"    "reduct"       "dimension"    "set"         
[11] "perturb"      "size"         "transit"      "dimens"       "spike"       
[16] "direct"       "maxim"        "hold"         "popul"        "tool"        
[21] "tree"         "high"         "theorem"      "geometr"      "succeed"     
[26] "sharp"        "logp"         "oil"          "embed"        "evolutionari"

[[65]]
[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian"  "hierarch" 
[7] "posterior" "cluster"  

[[66]]
 [1] "famili"        "subfamili"     "symmetr"       "asymmetr"     
 [5] "skew"          "reparameter"   "discuss"       "transform"    
 [9] "properti"      "explor"        "mise"          "urn"          
[13] "behav"         "generat"       "pursu"         "adequ"        
[17] "distribut"     "adopt"         "emphasi"       "symmetri"     
[21] "map"           "submodel"      "option"        "stateoftheart"
[25] "heavytail"     "superior"      "attract"       "tractabl"     
[29] "place"         "member"        "counterpart"   "spacetim"     

[[67]]
[1] "bar"    "vertic" "cap"    "lambda"

[[68]]
[1] NA

[[69]]
[1] NA

[[70]]
 [1] "paradox"     "prior"       "surrog"      "true"        "bay"        
 [6] "posit"       "criteria"    "frequentist" "jeffrey"     "sign"       
[11] "point"       "avoid"       "causal"      "turn"        "negat"      
[16] "invari"     

[[71]]
 [1] "probab"  "appl"    "proc"    "situat"  "ann"     "shape"   "field"  
 [8] "point"   "gamma"   "univari" "roy"    

[[72]]
 [1] "chart"       "cusum"       "detect"      "shift"       "cumul"      
 [6] "control"     "sum"         "base"        "perform"     "length"     
[11] "refer"       "averag"      "ratio"       "monitor"     "likelihood" 
[16] "convent"     "delta"       "infin"       "articl"      "event"      
[21] "outlier"     "stop"        "alarm"       "changepoint" "small"      

[[73]]
 [1] "twoparamet" "focus"      "famili"     "choos"      "exampl"    
 [6] "basic"      "desir"      "popular"    "express"    "tune"      
[11] "stepup"     "compromis"  "conserv"    "shortcom"   "represent" 
[16] "lifetim"    "priori"     "meaning"    "prefer"     "segment"   
[21] "stepwis"    "convolut"   "feasibl"    "bay"       

[[74]]
[1] NA

[[75]]
 [1] "manifold"   "space"      "intrins"    "metric"     "shape"     
 [6] "riemannian" "tensor"     "euclidean"  "matric"     "diagnost"  
[11] "geodes"     "develop"    "planar"     "sphere"     "examin"    
[16] "imag"       "perturb"    "human"      "embed"      "gender"    
[21] "medic"      "dimens"     "differenti" "diffus"    

[[76]]
[1] "kendal"  "tau"     "truncat" "copula"  "shape"   "densiti" "symmetr"
[8] "reli"    "angl"   

[[77]]
 [1] "improp"    "proprieti" "posterior" "uniform"   "proper"    "prior"    
 [7] "miss"      "suffici"   "theorem"   "character" "complet"   "carri"    
[13] "examin"    "colon"     "beta"      "dataset"   "cumul"     "tree"     
[19] "glms"     

[[78]]
[1] "ser"     "soc"     "roy"     "stat"    "ann"     "particl" "central"
[8] "util"    "statist"

[[79]]
[1] "iid"   "prove"

[[80]]
 [1] "classifi"        "distancebas"     "centroid"        "classif"        
 [5] "discrimin"       "popul"           "vector"          "distanc"        
 [9] "theoret"         "machin"          "support"         "heavytail"      
[13] "median"          "differ"          "difficulti"      "popular"        
[17] "convent"         "replac"          "componentwis"    "produc"         
[21] "accumul"         "closest"         "varieti"         "truncat"        
[25] "poor"            "entail"          "highdimension"   "insensit"       
[29] "allevi"          "excess"          "problemat"       "today"          
[33] "euclidean"       "encount"         "inconsist"       "caus"           
[37] "suffer"          "nearest"         "counterpart"     "volatil"        
[41] "argument"        "alloc"           "straightforward" "attempt"        
[45] "frequent"        "boundari"        "believ"          "help"           
[49] "case"            "inher"           "neighbour"      

[[81]]
 [1] "administr"      "fda"            "secondari"      "endpoint"      
 [5] "drug"           "efficaci"       "food"           "health"        
 [9] "combin"         "record"         "agent"          "trial"         
[13] "clinic"         "benefit"        "primari"        "adjust"        
[17] "databas"        "prevent"        "path"           "cardiovascular"
[21] "make"           "separ"          "report"         "perspect"      
[25] "decis"          "simplifi"       "safeti"         "maintain"      

[[82]]
 [1] "supremum"    "shift"       "dataset"     "changepoint" "power"      
 [6] "test"        "debat"       "logrank"     "north"       "window"     
[11] "categor"     "record"      "speed"       "wind"        "controversi"
[16] "frequenc"    "elabor"      "opposit"     "pearson"     "discontinu" 
[21] "cumul"       "attribut"    "multinomi"   "bridg"       "mainten"    
[26] "formula"     "conclus"     "rigor"       "appear"      "sum"        
[31] "brownian"    "statist"     "strength"    "chisquar"    "autocovari" 
[36] "sequenc"     "receiv"     

[[83]]
[1] "theta"     "paramet"   "cap"       "distribut" "vector"    "unknown"  
[7] "nuisanc"  

[[84]]
 [1] "genet"       "loci"        "trait"       "diseas"      "quantit"    
 [6] "linkag"      "map"         "allel"       "phenotyp"    "gene"       
[11] "pedigre"     "popul"       "marker"      "associ"      "genotyp"    
[16] "frequenc"    "chromosom"   "locus"       "polymorph"   "genom"      
[21] "complex"     "haplotyp"    "interact"    "casecontrol" "involv"     
[26] "domin"       "individu"   

[[85]]
[1] "goodnessoffit" "test"          "includ"        "residu"       

[[86]]
[1] NA

[[87]]
 [1] "selector"    "dantzig"     "lregular"    "extend"      "path"       
 [6] "result"      "bound"       "nonasymptot" "uncertainti" "angl"       
[11] "remark"      "tune"        "entir"       "final"       "question"   
[16] "cost"        "principl"   

[[88]]
 [1] "subtl"    "jin"      "nonzero"  "critic"   "fraction" "boundari"
 [7] "tukey"    "higher"   "signific" "succeed"  "detect"   "normal"  
[13] "region"   "interest" "precis"   "amplitud" "alpha"    "concept" 
[19] "sparsiti" "concern"  "mention"  "high"     "work"     "resolv"  
[25] "nonnul"   "bodi"     "lower"   

[[89]]
 [1] "expert"      "languag"     "uncertainti" "abil"        "learn"      
 [6] "elicit"      "intermitt"   "system"      "natur"       "kind"       
[11] "amount"      "inform"      "peopl"       "mathemat"    "make"       
[16] "histor"      "need"        "content"     "respond"     "grow"       
[21] "happen"     

[[90]]
 [1] "absolut"       "deviat"        "clip"          "smooth"       
 [5] "scad"          "oracl"         "size"          "true"         
 [9] "microarray"    "nonzero"       "dimens"        "fan"          
[13] "highdimension" "identifi"      "sparsiti"      "confirm"      
[17] "slowli"        "larger"       

[[91]]
[1] "size"   "sampl"  "number"

[[92]]
[1] NA

[[93]]
[1] "spectral"   "densiti"    "time"       "seri"       "domain"    
[6] "stationari" "frequenc"  

[[94]]
[1] "tilt"       "exponenti"  "constraint" "employ"    

[[95]]
 [1] "earn"       "person"     "interview"  "employ"     "document"  
 [6] "survey"     "health"     "level"      "census"     "peopl"     
[11] "report"     "incom"      "higher"     "educ"       "feder"     
[16] "sensit"     "preval"     "analys"     "conduct"    "famili"    
[21] "imput"      "year"       "key"        "sourc"      "total"     
[26] "file"       "instrument" "ratio"      "status"     "encourag"  
[31] "nation"     "way"        "subsequ"    "monitor"    "lower"     
[36] "item"       "accept"     "multipli"   "rich"       "violat"    
[41] "previous"  

[[96]]
 [1] "statistician" "polici"       "scienc"       "statist"      "decis"       
 [6] "role"         "today"        "technolog"    "scientif"     "maker"       
[11] "bring"        "challeng"     "scientist"    "inform"       "integr"      
[16] "communic"     "individu"     "increas"      "knowledg"     "polit"       
[21] "live"         "disciplin"    "address"      "social"       "effort"      
[26] "essenti"      "organ"        "solv"         "engin"        "student"     
[31] "opportun"     "impact"       "face"         "grow"         "chang"       
[36] "play"         "govern"       "american"     "countri"      "mathemat"    
[41] "closer"       "centuri"      "modern"       "intern"       "spread"      
[46] "human"        "relev"        "ingredi"      "place"        "public"      
[51] "devic"        "success"      "explor"       "pressur"      "guarante"    
[56] "imposs"       "train"        "view"         "excel"        "presidenti"  
[61] "progress"     "edg"          "way"          "genom"        "support"     
[66] "communiti"    "promot"       "action"       "advanc"       "map"         
[71] "understand"  

[[97]]
 [1] "toxic"      "dose"       "trial"      "dosefind"   "phase"     
 [6] "clinic"     "target"     "design"     "probabl"    "escal"     
[11] "assign"     "patient"    "reassess"   "continu"    "ethic"     
[16] "prespecifi" "common"     "enhanc"     "concern"    "robust"    
[21] "parallel"   "previous"   "overcom"    "coher"      "variant"   
[26] "competit"  

[[98]]
 [1] "elect"      "vote"       "poll"       "evid"       "candid"    
 [6] "presidenti" "count"      "station"    "forecast"   "proport"   
[11] "polit"      "prefer"     "counti"     "record"     "lower"     

[[99]]
[1] NA

[[100]]
[1] NA

[[101]]
[1] NA

[[102]]
 [1] "delay"         "combin"        "issu"          "activ"        
 [5] "unit"          "year"          "monitor"       "program"      
 [9] "incid"         "concern"       "major"         "servic"       
[13] "surveil"       "develop"       "registri"      "populationbas"
[17] "trend"         "reason"       

[[103]]
[1] "laplac"    "approxim"  "posterior" "integr"    "mode"     

[[104]]
[1] "subjectspecif"    "random"           "longitudin"       "correl"          
[5] "populationaverag" "latent"           "logist"           "followup"        

[[105]]
[1] NA

[[106]]
[1] NA

[[107]]
[1] "oneparamet" "famili"     "normal"     "general"    "exponenti" 
[6] "detect"     "binomi"    

[[108]]
 [1] "intersect"  "close"      "hypothes"   "familywis"  "bonferroni"
 [6] "logic"      "critic"     "requir"     "elementari" "multipl"   
[11] "monoton"    "holm"       "valu"       "principl"  

[[109]]
 [1] "dichotom"    "exposur"     "outcom"      "genet"       "interact"   
 [6] "inherit"     "factor"      "alcohol"     "confound"    "trait"      
[11] "categor"     "assess"      "presenc"     "binari"      "ordin"      
[16] "disord"      "environment" "topic"       "examin"      "geneenviron"
[21] "cancer"      "causal"      "adequ"       "stage"       "alter"      
[26] "intermedi"   "conduct"     "continu"     "subgroup"    "postul"     
[31] "misspecif"  

[[110]]
[1] "virus"        "human"        "immunodefici" "hiv"          "infect"      
[6] "viral"       

[[111]]
 [1] "dropout"      "stratum"      "prevent"      "oil"          "reduc"       
 [6] "cancer"       "prostat"      "longitudin"   "find"         "adjust"      
[11] "stratifi"     "nuisanc"      "men"          "arm"          "trial"       
[16] "randomeffect" "mechan"       "sever"        "verif"        "frequent"    
[21] "conjectur"    "grade"        "colleagu"     "annual"       "agent"       
[26] "placebo"      "volum"        "drawn"        "doubleblind"  "caus"        
[31] "absolut"      "preval"       "daili"        "lie"          "reduct"      

[[112]]
 [1] "slice"   "invers"  "dimens"  "reduct"  "regress" "method"  "central"
 [8] "direct"  "goal"    "subspac"

[[113]]
[1] "band"   "consid"

[[114]]
 [1] "breakdown" "robust"    "point"     "outlier"   "definit"   "finit"    
 [7] "suggest"   "possess"   "previous"  "region"    "suffic"    "lead"     

[[115]]
 [1] "spacetim"    "site"        "spatial"     "monitor"     "year"       
 [6] "tempor"      "separ"       "fit"         "smoother"    "trend"      
[11] "ozon"        "environment" "relat"       "meteorolog"  "space"      
[16] "arbitrari"   "indic"       "daili"       "interact"    "wind"       
[21] "autoregress" "avoid"       "cross"      

[[116]]
 [1] "census"   "survey"   "bureau"   "relat"    "area"     "count"   
 [7] "incorpor" "collect"  "labor"    "protect"  "race"    

[[117]]
[1] "microarray" "gene"       "express"    "analysi"    "differenti"
[6] "data"       "experi"    

[[118]]
[1] "root"  "squar"

[[119]]
 [1] "pathway"       "biolog"        "pattern"       "presenc"      
 [5] "latent"        "gene"          "viral"         "protein"      
 [9] "biomark"       "initi"         "infect"        "therapi"      
[13] "understand"    "pronounc"      "supplementari" "concentr"     

[[120]]
[1] "establish"

[[121]]
[1] "bootstrap" "distribut"

[[122]]
 [1] "imag"   "magnet" "reson"  "field"  "brain"  "fmri"   "activ"  "signal"
 [9] "voxel"  "detect" "locat"  "volum" 

[[123]]
 [1] "alloc"         "responseadapt" "treatment"     "random"       
 [5] "design"        "optim"         "trial"         "clinic"       
 [9] "proport"       "target"        "criteria"      "coin"         
[13] "power"         "sequenti"      "procedur"      "rule"         
[17] "assign"        "taylor"        "relationship"  "reli"         
[21] "expans"        "bias"          "failur"        "patient"      
[25] "induc"         "paper"         "author"        "lower"        
[29] "efron"         "binari"        "expect"        "discontinu"   
[33] "prefer"        "nondifferenti" "earlier"       "lot"          
[37] "stop"          "stage"        

[[124]]
 [1] "design"        "aberr"         "factori"       "minimum"      
 [5] "construct"     "factor"        "theori"        "fraction"     
 [9] "doubl"         "project"       "pattern"       "run"          
[13] "twolevel"      "complementari" "defin"         "repeat"       
[17] "maxim"         "link"          "criteria"      "ident"        
[21] "import"       

[[125]]
 [1] "electr"        "load"          "power"         "forecast"     
 [5] "bivari"        "daili"         "market"        "wind"         
 [9] "serial"        "shortterm"     "speed"         "diagon"       
[13] "difficult"     "temperatur"    "price"         "season"       
[17] "spectrum"      "heteroscedast" "regressor"     "firstord"     
[21] "peak"          "citi"          "vari"          "justifi"      
[25] "highlight"     "energi"       

[[126]]
 [1] "day"       "daili"     "activ"     "financi"   "peak"      "record"   
 [7] "help"      "character" "account"   "appropri" 

[[127]]
[1] "secondord" "firstord" 

[[128]]
 [1] "school"     "promot"     "assign"     "treatment"  "grade"     
 [6] "children"   "score"      "averag"     "student"    "outcom"    
[11] "potenti"    "polici"     "propens"    "causal"     "retain"    
[16] "evid"       "child"      "regim"      "program"    "stratif"   
[21] "block"      "unit"       "rubin"      "nation"     "plausibl"  
[26] "stage"      "summar"     "multilevel" "educ"       "affect"    
[31] "fewer"      "stabl"      "impos"      "year"       "scalar"    
[36] "twostag"    "articl"     "consid"     "effect"     "learn"     
[41] "intermedi"  "low"        "pretreat"   "confound"   "track"     

[[129]]
 [1] "extrapol"      "errorpron"     "posttreat"     "baselin"      
 [5] "subsampl"      "instrument"    "replic"        "classic"      
 [9] "treatment"     "nonlinear"     "bias"          "daili"        
[13] "summari"       "air"           "encount"       "efficaci"     
[17] "heteroscedast" "supplementari" "spheric"       "frequenc"     
[21] "trajectori"    "multiscal"     "correct"       "subset"       
[25] "scatter"       "temperatur"   

[[130]]
 [1] "admiss"      "inadmiss"    "endpoint"    "loss"        "risk"       
 [6] "pearson"     "action"      "genom"       "screen"      "ann"        
[11] "biometrika"  "bay"         "amer"        "assoc"       "accept"     
[16] "math"        "paper"       "complet"     "character"   "stepup"     
[21] "revisit"     "stringent"   "metaanalysi" "year"        "thought"    
[26] "hard"        "nonneg"      "share"       "nonzero"     "fisher"     
[31] "formul"      "upper"       "reject"     

[[131]]
 [1] "unbound"   "novelti"   "oracl"     "function"  "anisotrop" "tail"     
 [7] "inequ"     "aforement" "satisfi"   "literatur" "decreas"   "vast"     
[13] "fast"      "setup"     "bivari"    "slower"    "free"      "input"    
[19] "output"    "aggreg"    "yield"     "iii"       "behav"     "residu"   
[25] "main"      "inform"    "univari"   "univers"  

[[132]]
 [1] "schedul"         "longitudin"      "miss"            "respons"        
 [5] "incomplet"       "analys"          "followup"        "missingatrandom"
 [9] "assess"          "intermitt"       "ill"             "account"        
[13] "avail"           "data"            "impact"          "offer"          
[17] "missingdata"     "joint"           "visit"           "unbalanc"       
[21] "merg"            "indic"           "naiv"            "equat"          
[25] "appeal"         

[[133]]
[1] "real"  "simul" "data" 

[[134]]
[1] "misspecifi" "robust"    

Constant variance

I thought I would try flash with constant variance (var_type = 0) to see what happens; with a single residual variance there is no need to regularize tau. It turns out to fit a very large number of single-word factors: when I ran it with greedy_Kmax = 200 it fit all 200 factors. Here I fit just 30 to illustrate more quickly. It reduces the mean squared error compared with the “maximum likelihood” fit, perhaps suggesting that the greedy approach helps find a better fit?

set.seed(1)
# Constant residual variance (var_type = 0); at most 30 greedily added factors
fit.nn.s.v0 = flash(lmat_s_1, ebnm_fn = c(ebnm::ebnm_point_exponential, ebnm::ebnm_point_exponential), var_type = 0, greedy_Kmax = 30)
Adding factor 1 to flash object...
Adding factor 2 to flash object...
Adding factor 3 to flash object...
Adding factor 4 to flash object...
Adding factor 5 to flash object...
Adding factor 6 to flash object...
Adding factor 7 to flash object...
Adding factor 8 to flash object...
Adding factor 9 to flash object...
Adding factor 10 to flash object...
Adding factor 11 to flash object...
Adding factor 12 to flash object...
Adding factor 13 to flash object...
Adding factor 14 to flash object...
Adding factor 15 to flash object...
Adding factor 16 to flash object...
Adding factor 17 to flash object...
Adding factor 18 to flash object...
Adding factor 19 to flash object...
Adding factor 20 to flash object...
Adding factor 21 to flash object...
Adding factor 22 to flash object...
Adding factor 23 to flash object...
Adding factor 24 to flash object...
Adding factor 25 to flash object...
Adding factor 26 to flash object...
Adding factor 27 to flash object...
Adding factor 28 to flash object...
Adding factor 29 to flash object...
Adding factor 30 to flash object...
Wrapping up...
Done.
Nullchecking 30 factors...
Done.
print(get_keywords(fit.nn.s.v0))
[[1]]
[1] "estim"    "model"    "method"   "data"     "propos"   "function" "studi"   

[[2]]
[1] "test"     "procedur" "statist"  "null"    

[[3]]
[1] "treatment" "random"    "effect"    "studi"     "outcom"    "trial"    
[7] "design"   

[[4]]
[1] "model"    "bayesian"

[[5]]
[1] "select"  "variabl" "regress"

[[6]]
[1] "data"    "analysi"

[[7]]
[1] "function"

[[8]]
[1] "sampl" "size" 

[[9]]
[1] "problem"

[[10]]
[1] "statist"

[[11]]
[1] "time" "seri"

[[12]]
[1] "method"

[[13]]
[1] "design"

[[14]]
[1] "estim"

[[15]]
[1] "rate"    "converg"

[[16]]
[1] "propos"

[[17]]
[1] "gene"       "express"    "microarray"

[[18]]
[1] "approach"

[[19]]
[1] "distribut"

[[20]]
[1] "error"  "measur"

[[21]]
[1] "general"

[[22]]
[1] "develop"

[[23]]
[1] "covari"

[[24]]
[1] "number"

[[25]]
[1] "process"

[[26]]
[1] "risk"

[[27]]
[1] "space"

[[28]]
[1] "predict"

[[29]]
[1] "articl"

[[30]]
[1] "level"
mean((lmat_s_1-fitted(fit.nn.s.v0))^2)
[1] 0.01649761
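To make the link between mean squared error and the constant-variance model a bit more concrete, here is a minimal base-R sketch on a made-up residual matrix (not the residuals of the fit above). It assumes that "regularizing tau" refers to per-column precisions blowing up when a column is fit almost perfectly; that is my reading, not something stated here. With one pooled variance (the analogue of var_type = 0) that cannot happen, and the Gaussian log-likelihood with the variance at its maximum likelihood value depends on the fit only through the MSE, so a lower MSE means a higher likelihood under that model. The names R, tau_const, tau_col and loglik_const are illustrative only.

```r
# Hypothetical residual matrix (50 x 4), NOT the lmat_s_1 residuals above.
# The last column is fit almost perfectly, as might happen for a rare word.
set.seed(1)
R <- matrix(rnorm(200, sd = 0.1), nrow = 50, ncol = 4)
R[, 4] <- R[, 4] * 1e-4

# Constant variance (analogue of var_type = 0): one precision pooled over all entries.
tau_const <- 1 / mean(R^2)

# Column-specific variances (analogue of var_type = 2): one precision per column.
# The nearly perfectly fit column gets an enormous precision.
tau_col <- 1 / colMeans(R^2)

# Gaussian log-likelihood of the residuals with the variance set to its MLE,
# mean(R^2) (assuming zero-mean residuals). It depends on R only via the MSE.
loglik_const <- function(R) {
  n <- length(R)
  -n / 2 * (log(2 * pi * mean(R^2)) + 1)
}

print(round(tau_const, 1))
print(signif(tau_col, 3))
print(loglik_const(R))
```

Replacing the made-up R with lmat_s_1 - fitted(fit.nn.s.v0) should give the corresponding quantities for the fit above.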

Topic model

Here I fit a topic model (Poisson NMF) with k = 100; visually, this fits the large values better.

# Rank-100 Poisson NMF (topic model), randomly initialized
fit_nmf_k100 = fit_poisson_nmf(mat, k = 100, init.method = "random")
Fitting rank-100 Poisson NMF to 1924 x 2172 sparse matrix.
Running 100 SCD updates, without extrapolation (fastTopics 0.6-158).
fvals.nmf.k100 = fit_nmf_k100$L %*% t(fit_nmf_k100$F)
plot(mat[sub],fvals.nmf.k100[sub])

Plot version: 0346f50 (Matthew Stephens, 2023-11-08)
plot(log(1+mat[sub]),log(1+fvals.nmf.k100[sub]))

I tried fitting flash to the log-transformed fitted values, log(1 + fitted values). The rationale is to use the topic model to “denoise” the data and then transform the denoised data. However, this approach has computational problems: it seems unlikely to be tractable in general, because the fitted values are dense, so it cannot exploit sparsity, which is essential for large datasets. Still, the keywords look promising, so it may be worth experimenting further.
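The sparsity concern can be seen in a toy example. The sketch below uses only base R and made-up matrices; X, L and Fmat are hypothetical stand-ins for mat and the topic-model factors, not the real objects. Because log(1 + 0) = 0, log-transforming a sparse count matrix keeps all its zeros, whereas a fit L F' with strictly positive factors has no zeros at all, so its transform has to be stored and processed as a dense matrix.

```r
# Hypothetical toy data: a sparse count matrix X (about 2% nonzero) and
# positive rank-k factors standing in for a fitted topic model.
set.seed(1)
n <- 200; p <- 300; k <- 5
X <- matrix(rpois(n * p, lambda = 0.02), n, p)
L <- matrix(rexp(n * k), n, k)
Fmat <- matrix(rexp(p * k), p, k)  # avoid the name F, which aliases FALSE in R

# log(1 + x) maps 0 to 0, so transforming the raw counts keeps them sparse.
nnz_raw <- sum(log1p(X) != 0)

# The "denoised" fit L %*% t(Fmat) is strictly positive everywhere, so its
# transform is fully dense: all n * p entries are nonzero.
fitted_vals <- L %*% t(Fmat)
nnz_fit <- sum(log1p(fitted_vals) != 0)

print(c(raw = nnz_raw, fitted = nnz_fit, total = n * p))
```

The product itself is still cheap to keep in factored form (n k + p k numbers); it is the elementwise log transform that destroys the low-rank structure and forces the dense representation.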

set.seed(1)
# Column-specific residual variances (var_type = 2); at most 200 greedily added factors
fit.nn.nmf.k100 = flash(log(fvals.nmf.k100 + 1), ebnm_fn = c(ebnm::ebnm_point_exponential, ebnm::ebnm_point_exponential), var_type = 2, greedy_Kmax = 200, S = 0.01)
Adding factor 1 to flash object...
[... output for factors 2 to 51 omitted ...]
Adding factor 52 to flash object...
Warning in scale.EF(EF): Fitting stopped after the initialization function
failed to find a non-zero factor.
Factor doesn't significantly increase objective and won't be added.
Wrapping up...
Done.
Nullchecking 51 factors...
Done.
plot(log(1+mat[sub]),fitted(fit.nn.nmf.k100)[sub])

print(get_keywords(fit.nn.nmf.k100))
[[1]]
 [1] "model"     "estim"     "data"      "method"    "studi"     "propos"   
 [7] "distribut" "statist"   "approach"  "function"  "asymptot"  "simul"    
[13] "general"   "base"      "sampl"     "problem"   "analysi"   "paramet"  
[19] "procedur"  "regress"   "test"     

[[2]]
 [1] "coeffici" "partial"  "hazard"   "proport"  "estim"    "model"   
 [7] "covari"   "surviv"   "studi"    "baselin"  "vari"     "regress" 

[[3]]
[1] "weight"  "miss"    "imput"   "handl"   "data"    "mechan"  "augment"
[8] "covari"  "effici" 

[[4]]
[1] "spars"    "lasso"    "select"   "sparsiti" "oracl"    "coeffici" "nonzero" 
[8] "adapt"   

[[5]]
[1] "local"     "kernel"    "bandwidth" "global"    "polynomi"  "estim"    
[7] "asymptot" 

[[6]]
[1] "likelihood" "maximum"    "ratio"      "estim"      "paramet"   
[6] "asymptot"   "distribut"  "normal"    

[[7]]
[1] "respons"   "predictor" "interpret" "regress"   "linear"    "function" 
[7] "anova"    

[[8]]
[1] "depend" "censor" "surviv" "copula" "compet" "bivari" "time"   "data"  

[[9]]
[1] "robust"    "correct"   "presenc"   "outcom"    "misspecif" "model"    
[7] "assumpt"  

[[10]]
[1] "smooth" "addit"  "spline" "select"

[[11]]
[1] "error"   "squar"   "measur"  "estim"   "predict" "price"  

[[12]]
[1] "group"   "activ"   "sourc"   "brain"   "imag"    "heart"   "analysi"
[8] "separ"  

[[13]]
[1] "structur"   "correl"     "screen"     "independ"   "longitudin"

[[14]]
 [1] "nonparametr"  "covari"       "parametr"     "semiparametr" "estim"       
 [6] "propos"       "model"        "function"     "asymptot"     "regress"     
[11] "effici"      

[[15]]
 [1] "procedur"  "control"   "fals"      "discoveri" "reject"    "test"     
 [7] "pvalu"     "fdr"       "rate"      "hypothes"  "multipl"   "null"     
[13] "power"     "conserv"  

[[16]]
[1] "matrix"    "covari"    "matric"    "eigenvalu" "vector"   

[[17]]
[1] "rank"     "sign"     "attribut" "rankbas" 

[[18]]
[1] "test"      "altern"    "hypothesi" "null"      "statist"   "power"    
[7] "hypothes"  "asymptot" 

[[19]]
[1] "popul"      "survey"     "calibr"     "sampl"      "nonrespons"
[6] "unit"       "auxiliari"  "census"     "modelbas"  

[[20]]
 [1] "project"   "depth"     "concept"   "robust"    "scatter"   "dispers"  
 [7] "trim"      "breakdown" "ellipt"    "definit"   "defin"     "equivari" 
[13] "median"    "point"     "introduc" 

[[21]]
[1] "high"          "dimens"        "dimension"     "reduct"       
[5] "invers"        "highdimension"

[[22]]
[1] "threshold" "rang"      "nois"      "signal"    "wavelet"   "wide"     
[7] "adapt"     "shrinkag" 

[[23]]
[1] "equat"      "stochast"   "dynam"      "diffus"     "differenti"
[6] "solut"      "infer"      "discret"   

[[24]]
[1] "select"  "penal"   "penalti" "variabl" "regular"

[[25]]
[1] "gaussian"    "fraction"    "expans"      "truncat"     "nongaussian"

[[26]]
[1] "bootstrap" "calcul"    "block"     "accuraci"  "resampl"   "mestim"   
[7] "accur"    

[[27]]
[1] "varianc" "mix"     "fix"     "sampl"   "outlier"

[[28]]
[1] "bayesian"  "prior"     "mixtur"    "posterior" "hierarch"  "model"    
[7] "dirichlet"

[[29]]
 [1] "point"     "prove"     "statist"   "consist"   "result"    "main"     
 [7] "condit"    "uniform"   "paper"     "weak"      "ann"       "assumpt"  
[13] "establish"

[[30]]
 [1] "implement" "nonlinear" "iter"      "step"      "easi"      "exploit"  
 [7] "filter"    "comput"    "algorithm" "recurs"   

[[31]]
[1] "theoret" "practic" "numer"   "improv"  "effici"  "adapt"  

[[32]]
[1] "sequenc"   "oper"      "volatil"   "financi"   "jump"      "surfac"   
[7] "pattern"   "highfrequ"

[[33]]
[1] "propos"   "procedur"

[[34]]
[1] "densiti"    "bound"      "constraint" "minimax"    "lower"     
[6] "upper"      "inequ"     

[[35]]
[1] "space"     "transform" "invari"   

[[36]]
[1] "compon"   "princip"  "analysi"  "function"

[[37]]
[1] "beta"     "bar"      "vertic"   "theta"    "cap"      "lambda"   "parallel"
[8] "vote"     "elect"   

[[38]]
[1] "class"   "unknown" "vector"  "element"

[[39]]
[1] "trend"    "tree"     "tempor"   "histor"   "time"     "year"     "spatial" 
[8] "spacetim" "season"  

[[40]]
[1] "seri"          "time"          "onlin"         "materi"       
[5] "autoregress"   "supplementari" "supplement"   

[[41]]
[1] "number" "size"   "larg"   "small"  "sampl" 

[[42]]
 [1] "factor"  "cancer"  "cure"    "breast"  "prostat" "incid"   "report" 
 [8] "diseas"  "assoc"   "amer"   

[[43]]
[1] "averag"   "diagnost" "imag"     "tensor"  

[[44]]
 [1] "scale"    "assess"   "distanc"  "continu"  "influenc" "degre"   
 [7] "perturb"  "tool"     "composit" "issu"     "freedom" 

[[45]]
[1] "approxim" "forecast" "accur"    "wind"     "speed"    "cost"    

[[46]]
[1] "framework" "area"      "unbias"    "unifi"     "basic"     "deal"     
[7] "great"    

[[47]]
[1] "variabl"     "latent"      "explanatori"

[[48]]
[1] "direct"   "type"     "classic"  "integr"   "locat"    "indirect" "claim"   

[[49]]
 [1] "effect"     "treatment"  "random"     "causal"     "assign"    
 [6] "outcom"     "assumpt"    "infer"      "instrument" "bias"      
[11] "studi"     

[[50]]
[1] "trial"     "treatment" "clinic"    "patient"   "stage"     "alloc"    
[7] "arm"       "placebo"  

[[51]]
[1] "design"     "orthogon"   "experiment" "balanc"     "nest"      
[6] "construct" 
# fitted values vs lmat_s_1, on a random subsample of 100,000 entries
fv = fitted(fit.nn.nmf.k100)
sub = sample(1:length(fv), 100000)
plot(lmat_s_1[sub], fv[sub])

Version Author Date
0346f50 Matthew Stephens 2023-11-08

Anscombe transform

Here is a very brief look at the Anscombe transform, sqrt(x + 3/8), for comparison (used here without the usual factor of 2, which only rescales the data):

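# same nonnegative flash settings as the fit above, applied to the Anscombe-transformed counts
fit.nn.a = flash(sqrt(mat + 3/8), ebnm_fn = c(ebnm::ebnm_point_exponential, ebnm::ebnm_point_exponential), var_type = 2, greedy_Kmax = 200, S = 0.01)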
Adding factor 1 to flash object...
[... output for factors 2 to 79 omitted ...]
Adding factor 80 to flash object...
Factor doesn't significantly increase objective and won't be added.
Wrapping up...
Done.
Nullchecking 79 factors...
Done.
print(get_keywords(fit.nn.a))
[[1]]
  [1] "estim"        "model"        "data"         "method"       "propos"      
  [6] "function"     "studi"        "distribut"    "sampl"        "paramet"     
 [11] "simul"        "test"         "statist"      "asymptot"     "regress"     
 [16] "approach"     "problem"      "base"         "general"      "procedur"    
 [21] "analysi"      "variabl"      "condit"       "covari"       "likelihood"  
 [26] "develop"      "observ"       "time"         "set"          "random"      
 [31] "perform"      "process"      "select"       "consist"      "applic"      
 [36] "illustr"      "linear"       "error"        "properti"     "comput"      
 [41] "case"         "rate"         "number"       "appli"        "infer"       
 [46] "effici"       "nonparametr"  "measur"       "algorithm"    "articl"      
 [51] "effect"       "class"        "deriv"        "depend"       "paper"       
 [56] "compar"       "provid"       "includ"       "normal"       "probabl"     
 [61] "optim"        "bayesian"     "approxim"     "varianc"      "design"      
 [66] "compon"       "assumpt"      "larg"         "structur"     "size"        
 [71] "smooth"       "predict"      "demonstr"     "independ"     "addit"       
 [76] "point"        "respons"      "construct"    "empir"        "exist"       
 [81] "converg"      "prior"        "densiti"      "introduc"     "standard"    
 [86] "correl"       "methodolog"   "local"        "maximum"      "treatment"   
 [91] "multipl"      "theoret"      "parametr"     "combin"       "requir"      
 [96] "investig"     "establish"    "space"        "theori"       "common"      
[101] "term"         "matrix"       "real"         "limit"        "work"        
[106] "multivari"    "practic"      "bias"         "finit"        "level"       
[111] "control"      "altern"       "coeffici"     "discuss"      "framework"   
[116] "semiparametr" "order"        "assum"        "simpl"        "weight"      
[121] "carlo"        "form"         "mont"         "fit"          "robust"      
[126] "identifi"     "lead"         "adapt"        "improv"       "factor"      
[131] "small"        "high"         "direct"       "seri"         "techniqu"    
[136] "power"        "numer"        "cluster"      "spatial"      "involv"      
[141] "predictor"    "unknown"      "increas"     

[[2]]
[1] "miss"      "robin"     "rotnitzki" "zhao"     

[[3]]
[1] "cancer" "studi"  "diseas" "data"  

[[4]]
[1] "rightcensor"  "surviv"       "estim"        "semiparametr"

[[5]]
[1] "retail"   "deliveri" "tradit"   "frequenc" "servic"   "birth"    "tail"    
[8] "compani"  "differ"  

[[6]]
[1] "wilk" "test"

[[7]]
[1] "simex"  "measur"

[[8]]
[1] "select"  "lasso"   "spars"   "penalti" "penal"  

[[9]]
[1] "forecast"    "predict"     "probabilist"

[[10]]
 [1] "motif"      "cluster"    "gene"       "transcript" "factor"    
 [6] "bind"       "sequenc"    "protein"    "discoveri"  "regul"     
[11] "conserv"    "pattern"    "dirichlet"  "call"      

[[11]]
[1] "climat"     "temperatur" "chang"      "model"      "futur"     

[[12]]
[1] "nonrespons" "survey"     "imput"      "respons"   

[[13]]
[1] "missingdata" "covari"      "miss"        "mechan"     

[[14]]
character(0)

[[15]]
[1] "markov"    "chain"     "mont"      "carlo"     "algorithm"

[[16]]
[1] "reml"      "smooth"    "criterion" "converg"   "akaik"     "maximum"  
[7] "direct"    "restrict"  "criteria" 

[[17]]
[1] "varyingcoeffici" "propos"         

[[18]]
[1] "hazard"  "proport" "surviv"  "time"   

[[19]]
[1] "nconsist"

[[20]]
[1] "elicit"   "interact" "exposur"  "prone"   

[[21]]
[1] "mles"       "likelihood"

[[22]]
[1] "singleindex"

[[23]]
[1] "semiparametr" "estim"        "model"       

[[24]]
 [1] "claim"  "insur"  "vehicl" "type"   "age"    "damag"  "year"   "turn"  
 [9] "detail" "experi"

[[25]]
[1] "pollut"   "air"      "nation"   "mortal"   "confound" "coeffici" "time"    

[[26]]
[1] "depth"    "project"  "function" "robust"  

[[27]]
[1] "loglinear" "model"     "tabl"     

[[28]]
 [1] "procedur"  "fals"      "control"   "test"      "reject"    "hypothes" 
 [7] "rate"      "discoveri" "null"      "multipl"   "pvalu"     "fdr"      
[13] "kfwer"     "stepdown"  "number"    "fwer"      "depend"   

[[29]]
[1] "spacetim" "site"     "time"    

[[30]]
 [1] "loci"         "genet"        "popul"        "genom"        "allel"       
 [6] "map"          "outlier"      "region"       "statist"      "diverg"      
[11] "relationship" "variat"      

[[31]]
[1] "dirichlet" "process"   "mixtur"    "prior"    

[[32]]
[1] "volatil"   "highfrequ" "asset"     "financi"   "price"     "matrix"   

[[33]]
[1] "bandwidth" "kernel"    "local"     "select"   

[[34]]
[1] "jackknif" "mix"      "squar"    "varianc"  "area"     "respons"  "uncondit"

[[35]]
[1] "tensor"      "diffus"      "imag"        "eigenvalu"   "eigenvector"
[6] "develop"     "nois"       

[[36]]
[1] "auxiliari" "survey"    "sampl"     "variabl"  

[[37]]
[1] "onestep"    "estim"      "likelihood"

[[38]]
 [1] "manifest" "variabl"  "latent"   "model"    "type"     "pseudo"  
 [7] "ordin"    "under"    "covari"   "induc"   

[[39]]
[1] "besov"      "wavelet"    "adapt"      "minimax"    "rang"      
[6] "deconvolut" "function"  

[[40]]
character(0)

[[41]]
[1] "tau"     "yield"   "factor"  "month"   "truncat"

[[42]]
[1] "gee"     "equat"   "correl"  "binari"  "work"    "general"

[[43]]
character(0)

[[44]]
[1] "propag"

[[45]]
[1] "homoscedast"

[[46]]
[1] "covari"    "error"     "errorpron" "studi"    

[[47]]
[1] "twostep"  "estim"    "submodel"

[[48]]
[1] "drift"   "process" "diffus" 

[[49]]
 [1] "flow"         "traffic"      "network"      "dynam"        "intervent"   
 [6] "causal"       "forecast"     "articl"       "identifi"     "manag"       
[11] "seri"         "relationship" "monitor"     

[[50]]
[1] "satur"    "shrinkag" "adapt"    "candid"   "oneway"  

[[51]]
[1] "quasilikelihood" "function"       

[[52]]
[1] "spatiotempor" "spatial"      "process"     

[[53]]
[1] "area"      "unemploy"  "benchmark" "census"   

[[54]]
 [1] "gene"       "microarray" "express"    "cdna"       "intens"    
 [6] "imag"       "normal"     "replic"     "array"      "background"
[11] "differenti" "outlier"   

[[55]]
[1] "taper"    "approxim" "matrix"   "covari"   "gaussian"

[[56]]
[1] "seem"   "spline"

[[57]]
[1] "errorsinvari" "error"       

[[58]]
[1] "nonnorm"

[[59]]
[1] "polynomi" "local"    "regress" 

[[60]]
[1] "axe"    "rotat"  "matric" "motion"

[[61]]
[1] "biascorrect"

[[62]]
[1] "equivari" "matrix"  

[[63]]
[1] "unbias" "estim" 

[[64]]
[1] "substitut"

[[65]]
[1] "equat" "estim"

[[66]]
[1] "trajectori" "function"   "time"       "longitudin" "data"      

[[67]]
[1] "test"      "null"      "hypothesi"

[[68]]
[1] "aic"       "select"    "criterion" "bic"       "akaik"    

[[69]]
[1] "nonidentifi" "identifi"   

[[70]]
[1] "net"     "elast"   "prior"   "regress" "path"   

[[71]]
[1] "instabl" "select"  "combin" 

[[72]]
[1] "robust" "out"    "curv"   "altern"

[[73]]
 [1] "depress"   "random"    "treatment" "care"      "patient"   "subject"  
 [7] "outcom"    "trial"     "adher"     "noncompli" "intervent" "health"   
[13] "meet"      "improv"    "receiv"    "primari"   "latent"   

[[74]]
character(0)

[[75]]
 [1] "trait"       "alcohol"     "genet"       "ordin"       "exist"      
 [6] "associ"      "complex"     "famili"      "dichotom"    "environment"

[[76]]
[1] "vanish"    "interact"  "nonlinear"

[[77]]
[1] "agre"

[[78]]
[1] "posterior" "proprieti" "miss"      "dataset"   "improp"   

[[79]]
[1] "subgroup" "interact"
# fitted values vs the transformed data, on a random subsample of 100,000 entries
fv = fitted(fit.nn.a)
sub = sample(1:length(fv), 100000)
plot(sqrt(mat + 3/8)[sub], fv[sub])
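For context on the transform used above, here is a short simulation on Poisson counts (a sketch on simulated data, not on `mat`). The classical Anscombe transform 2 * sqrt(x + 3/8) has variance close to 1 for moderate means, so sqrt(x + 3/8) should have variance close to 1/4; for the small means typical of sparse count data, the variance falls well below that, and log(x + 1) is shown alongside for comparison.

```r
# Sketch on simulated Poisson counts (not the data in this notebook):
# how well sqrt(x + 3/8) stabilizes the variance, compared with log(x + 1).
# Without the classical factor of 2, the target variance is about 1/4.
set.seed(1)
lambdas <- c(0.1, 1, 5, 20)   # Poisson means to try
n_draws <- 1e5

var_sqrt <- sapply(lambdas, function(l) var(sqrt(rpois(n_draws, l) + 3/8)))
var_log  <- sapply(lambdas, function(l) var(log1p(rpois(n_draws, l))))

print(data.frame(lambda   = lambdas,
                 var_sqrt = round(var_sqrt, 3),
                 var_log  = round(var_log, 3)))

# sqrt(0 + 3/8) is about 0.61, not 0, so zeros in a sparse count matrix
# become nonzero after this transform, whereas log(0 + 1) = 0.
c(anscombe_at_zero = sqrt(3/8), log1p_at_zero = log1p(0))
```

Because this transform does not map zeros to zero, sqrt(mat + 3/8) is a dense matrix, which suggests the sparsity concern raised for the denoised fitted values may apply here too.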


sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur ... 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RcppML_0.5.6       ebpmf_2.3.2        flashier_1.0.0     ggplot2_3.4.3     
 [5] magrittr_2.0.3     ebnm_1.0-55        fastTopics_0.6-158 tm_0.7-11         
 [9] NLP_0.2-1          readr_2.1.4        Matrix_1.5-3      

loaded via a namespace (and not attached):
  [1] Rtsne_0.16         ebpm_0.0.1.3       colorspace_2.1-0  
  [4] smashr_1.2-9       ellipsis_0.3.2     rprojroot_2.0.3   
  [7] fs_1.6.3           rstudioapi_0.14    farver_2.1.1      
 [10] MatrixModels_0.5-1 ggrepel_0.9.3      bit64_4.0.5       
 [13] fansi_1.0.5        mvtnorm_1.2-3      xml2_1.3.3        
 [16] splines_4.2.1      cachem_1.0.7       knitr_1.42        
 [19] jsonlite_1.8.7     workflowr_1.7.0    nloptr_2.0.3      
 [22] mcmc_0.9-7         ashr_2.2-63        smashrgen_1.2.5   
 [25] uwot_0.1.14        compiler_4.2.1     httr_1.4.5        
 [28] RcppZiggurat_0.1.6 fastmap_1.1.1      lazyeval_0.2.2    
 [31] cli_3.6.1          later_1.3.0        htmltools_0.5.4   
 [34] quantreg_5.94      prettyunits_1.2.0  tools_4.2.1       
 [37] coda_0.19-4        gtable_0.3.4       glue_1.6.2        
 [40] dplyr_1.1.3        Rcpp_1.0.11        softImpute_1.4-1  
 [43] slam_0.1-50        jquerylib_0.1.4    vctrs_0.6.4       
 [46] wavethresh_4.7.2   xfun_0.37          stringr_1.5.0     
 [49] trust_0.1-8        lifecycle_1.0.3    irlba_2.3.5.1     
 [52] MASS_7.3-58.2      scales_1.2.1       vroom_1.6.1       
 [55] hms_1.1.2          promises_1.2.0.1   parallel_4.2.1    
 [58] SparseM_1.81       yaml_2.3.7         pbapply_1.7-0     
 [61] sass_0.4.5         stringi_1.7.12     SQUAREM_2021.1    
 [64] highr_0.10         deconvolveR_1.2-1  caTools_1.18.2    
 [67] truncnorm_1.0-9    horseshoe_0.2.0    rlang_1.1.1       
 [70] pkgconfig_2.0.3    matrixStats_1.0.0  bitops_1.0-7      
 [73] evaluate_0.22      lattice_0.20-45    invgamma_1.1      
 [76] purrr_1.0.2        labeling_0.4.3     htmlwidgets_1.6.1 
 [79] bit_4.0.5          Rfast_2.0.8        cowplot_1.1.1     
 [82] tidyselect_1.2.0   R6_2.5.1           generics_0.1.3    
 [85] pillar_1.9.0       whisker_0.4.1      withr_2.5.1       
 [88] survival_3.5-3     mixsqp_0.3-48      tibble_3.2.1      
 [91] crayon_1.5.2       utf8_1.2.3         plotly_4.10.2     
 [94] tzdb_0.3.0         rmarkdown_2.20     progress_1.2.2    
 [97] grid_4.2.1         data.table_1.14.8  git2r_0.31.0      
[100] digest_0.6.33      vebpm_0.4.9        tidyr_1.3.0       
[103] httpuv_1.6.9       MCMCpack_1.6-3     RcppParallel_5.1.7
[106] munsell_0.5.0      viridisLite_0.4.2  bslib_0.4.2       
[109] quadprog_1.5-8