I want to try running flashier (non-negative) on some text data and see what happens. It is also a chance to try out the CRAN release of flashier.

I tried running flashier both on the log1p-transformed counts directly and on the log1p transform of fitted values from a topic model. Both produce somewhat promising results, and it is hard to beat the log1p transform for simplicity and speed.

library(Matrix)
library(flashier)
datax$data = Matrix(datax$data,sparse = TRUE)

Data filtering

Filter out some documents: keep the 60% longest ones, as in Ke and Wang (2022).

doc_to_use = order(rowSums(datax$data),decreasing = TRUE)[1:round(nrow(datax$data)*0.6)]
mat = datax$data[doc_to_use,]
sla = sla[doc_to_use,]
samples = datax$samples
samples = lapply(samples, function(z){z[doc_to_use]})

Filter out words that appear in fewer than 5 documents. Note: if you don't do this you can still get real factors that capture very rare words co-occurring, for example two authors who are cited together. If you are interested in those factors, there is no need to filter.

word_to_use = which(colSums(mat>0)>4)
mat = mat[,word_to_use]
mat = Matrix(mat,sparse=TRUE)
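The two filtering steps above can be wrapped into one function. This is a minimal sketch, not the code used for the fits: `filter_docs_words`, `doc_frac` and `min_docs` are hypothetical names, with defaults matching the choices above (keep the longest 60% of documents, then keep words that appear in at least 5 of the remaining documents, which is the same rule as `colSums(mat>0)>4`). It also returns the kept indices, so that metadata such as `sla` and `samples` can be subset the same way.

```r
library(Matrix)

# Hypothetical helper: keep the longest documents, then drop rare words.
# Words are filtered *after* documents, so "rare" is judged on the kept docs.
filter_docs_words = function(mat, doc_frac = 0.6, min_docs = 5){
  n_keep = round(nrow(mat) * doc_frac)
  doc_to_use = order(rowSums(mat), decreasing = TRUE)[seq_len(n_keep)]
  mat = mat[doc_to_use, , drop = FALSE]
  word_to_use = which(colSums(mat > 0) >= min_docs)  # same as > min_docs - 1
  list(mat = mat[, word_to_use, drop = FALSE],
       doc_to_use = doc_to_use,
       word_to_use = word_to_use)
}

# Toy example: 5 documents x 3 words (made-up counts)
toy = Matrix(c(3, 1, 0,
               0, 0, 1,
               2, 2, 1,
               1, 0, 1,
               4, 1, 1), nrow = 5, byrow = TRUE, sparse = TRUE)
res = filter_docs_words(toy, doc_frac = 0.6, min_docs = 3)
res$doc_to_use   # the three longest documents
res$word_to_use  # word 3 occurs in only 2 of the kept documents
```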

I tried the log1p transform on its own (no normalization for document size).

I also tried normalizing for document size, with different pseudocounts (10, 1, 0.1, 0.01), where I expected the first and last to be too big and too small, respectively (but in fact the results with 0.01 look quite reasonable in many ways). Note that to keep things sparse I use log(1+X/c), where X is the size-normalized count and c is the pseudocount.

lmat = Matrix(log(mat+1),sparse=TRUE)

docsize = rowSums(mat)
s = docsize/mean(docsize)
lmat_s_10 = Matrix(log(0.1*mat/s+1),sparse=TRUE)
lmat_s_1 = Matrix(log(mat/s+1),sparse=TRUE)
lmat_s_01 = Matrix(log(10*mat/s+1),sparse=TRUE)
lmat_s_001 = Matrix(log(100*mat/s+1),sparse=TRUE)
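Using log(1+X/c) loses nothing relative to the usual pseudocount transform, because \(\log(1+X/c) = \log(X+c) - \log c\): the two differ only by a constant, but the first maps zeros to zeros. The sketch below is a hypothetical helper (`size_norm_log1p` is not from the original code) that builds the same kind of matrix as `lmat_s_10` to `lmat_s_001` for a pseudocount `pc` without ever forming the dense `mat/s+1`. It assumes every document has a positive total count and a general (non-square) sparse input, and it applies `log1p` only to the stored nonzeros.

```r
library(Matrix)

# Hypothetical helper: log(1 + X_ij / (s_i * pc)), where s_i is document i's
# size relative to the mean size and pc is the pseudocount.
# Equals log(X_ij / s_i + pc) - log(pc), so zeros stay zero.
size_norm_log1p = function(mat, pc = 1){
  docsize = rowSums(mat)
  s = docsize / mean(docsize)                   # assumes all docsize > 0
  out = as(Diagonal(x = 1 / (s * pc)) %*% mat, "CsparseMatrix")  # row scaling
  out@x = log1p(out@x)                          # transform only the nonzeros
  out
}

# Toy example: 3 documents x 4 words (made-up counts)
toy = Matrix(c(3, 0, 1, 0,
               0, 2, 0, 2,
               5, 0, 0, 3), nrow = 3, byrow = TRUE, sparse = TRUE)
lt = size_norm_log1p(toy, pc = 0.1)

# compare with the dense formula used above, e.g. log(10*mat/s+1) for pc = 0.1
s = rowSums(toy) / mean(rowSums(toy))
dense = log(10 * as.matrix(toy) / s + 1)
max(abs(as.matrix(lt) - dense))   # zero up to rounding error
```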

In addition to the pseudocount, we also have to choose how to regularize the estimates of tau (the column-wise precision). It turns out this choice can have quite a big effect on the results. If tau is not regularized, then typically some tau become very large (very small variance) and, intuitively, the fit will "overfit" some words. In the following I implement a rule of thumb based on Jason's work: I compute the standard deviation of the transformed data for a Poisson random variable with rate \(\mu=4/n\), where \(n\) is the number of documents. The 4 comes from the word filtering above: every retained word appears in more than 4 documents, so 4/n is a (slightly conservative) lower bound on the average rate \(\mu\) for each word. (I ignore variation in document size in this calculation.) I think this rule of thumb can be justified as a realistic lower bound on the variance you would expect for the data under a Poisson model. (There are reasons to believe that text data may be underdispersed relative to Poisson, but I will ignore this for now.)

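mhat = 4/nrow(lmat) # lower bound on the mean count per document for each word
xx = rpois(1e7,mhat) # simulated Poisson counts
S10 = sd(log(0.1*xx+1)) # sd of log(X/10+1)
S1 = sd(log(xx+1)) # sd of log(X+1)
S01 = sd(log(10*xx+1)) # sd of log(10X+1)
S001 = sd(log(100*xx+1)) # sd of log(100X+1)
c(S10, S1, S01, S001)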
[1] 0.004339581 0.031536221 0.109033434 0.209811829
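The simulation can be replaced by an exact calculation, which also explains the pattern in these four numbers. For small \(\mu\), X is almost always 0 or 1, so \(\mathrm{sd}[\log(1+X/c)] \approx \log(1+1/c)\sqrt{\mu}\); the four values above are indeed close to \(\log(1.1)\), \(\log 2\), \(\log 11\) and \(\log 101\) times a common factor. A minimal sketch that sums over the Poisson probabilities instead of simulating (the function name `sd_log1p_pois` is hypothetical, and the pseudocount is called `pc` to avoid shadowing R's `c`):

```r
# Hypothetical helper: exact sd of log(1 + X/pc) for X ~ Poisson(mu),
# computed by summing over the Poisson pmf instead of simulating.
sd_log1p_pois = function(mu, pc = 1){
  k = 0:(qpois(1 - 1e-12, mu) + 20)   # support covering essentially all mass
  p = dpois(k, mu)
  y = log1p(k / pc)
  m1 = sum(p * y)
  sqrt(sum(p * y^2) - m1^2)
}

mu  = 1e-3                      # illustrative rate; in the text mu = 4/nrow(lmat)
pcs = c(10, 1, 0.1, 0.01)       # the four pseudocounts
exact  = sapply(pcs, function(pc) sd_log1p_pois(mu, pc))
approx = log1p(1 / pcs) * sqrt(mu)   # small-mu approximation
exact / approx                       # all close to 1
```

Because it is deterministic, this version can be called directly with `mu = mhat` for each pseudocount.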

Fit log1p transformed data

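I fit flashier with each of the four pseudocounts here. For comparison I also looked at the maximum likelihood estimates from RcppML::nmf (Frobenius norm minimization, which assumes a constant variance across columns, unlike the column-specific variances estimated by flash with var_type=2).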

fit.nn = flash(lmat,ebnm_fn = c(ebnm::ebnm_point_exponential,ebnm::ebnm_point_exponential),var_type=2,greedy_Kmax = 200, S=S1)
set.seed(1)
fit.ml = RcppML::nmf(lmat, k = 100) # hypothetical object name

set.seed(1)
fit.ml.s.1 = RcppML::nmf(lmat_s_1, k = 100) # hypothetical object name


Look at the keywords for each factor. The flash fits capture more interesting keywords than the ML fits. Generally the flash keywords seem to make some sense at all levels of the pseudocount (although I had to lower the keyword threshold for large pseudocounts).

The ML fits capture a lot of "single-word" factors, and it turns out that each of these factors is loaded on quite a lot of documents (not shown here). So what seems to be happening is that the ML fit chooses to fit single common words to explain lots of documents, rather than a small set of words to explain a small set of documents (which is perhaps what we want!).
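Before listing keywords, `get_keywords` below rescales each factor so that its largest loading is 1 and the factor values absorb the scale, which leaves the fitted matrix \(LF^\top\) unchanged. A word is then a keyword if its rescaled factor value exceeds log(thresh); roughly, a document fully loaded on that factor has its fitted log1p value for the word raised by at least log(thresh). The toy sketch below walks through those steps on made-up loadings and factors (not from these data):

```r
# Made-up loadings (3 documents x 2 factors) and factors (4 words x 2 factors)
LL = matrix(c(2, 0, 1,
              0, 3, 3), nrow = 3)
FF = matrix(c(0.5, 0.4, 0,   0.1,
              0,   0.1, 0.3, 0.25), nrow = 4)
words = c("lasso", "penalti", "kernel", "bandwidth")

lmax  = apply(LL, 2, max)
Lnorm = t(t(LL) / lmax)   # each loading column now has maximum 1
Fnorm = t(t(FF) * lmax)   # factor values absorb the scale
stopifnot(isTRUE(all.equal(LL %*% t(FF), Lnorm %*% t(Fnorm))))  # same fit

# number of documents "in" each factor (normalized loading > 0.5),
# which is the quantity compared with docfilter
colSums(Lnorm > 0.5)

# keywords for each factor at thresh = 2, sorted by rescaled factor value
kw = lapply(seq_len(ncol(Fnorm)), function(k){
  key = Fnorm[, k] > log(2)
  words[key][order(Fnorm[key, k], decreasing = TRUE)]
})
kw
```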

# Returns, for each factor, the words whose rescaled factor value exceeds
# log(thresh), in decreasing order. Sets keywords to NA if the number of
# documents with normalized loading > 0.5 does not exceed docfilter.
# Note: word names are taken from the global `mat`.
get_keywords = function(fit,thresh = 2,docfilter=0){
  if("flash" %in% class(fit)){
    LL <- fit$L_pm
    FF = fit$F_pm
  }
  if("nmf" %in% class(fit)){ # deals with RcppML::nmf fit
    LL = fit@w
    FF = t(fit@d*fit@h) 
  }

  Lnorm = t(t(LL)/apply(LL,2,max)) # scale each loading column to max 1
  Fnorm = t(t(FF)*apply(LL,2,max)) # rescale factors so the fit is unchanged
  khat = apply(Lnorm,1,which.max)
  Lmax = apply(Lnorm,1,max)
  khat[Lmax<0.1] = 0
  keyw.nn =list()

  for(k in 1:ncol(Fnorm)){
    if(sum(Lnorm[,k]>0.5)> docfilter){
      key = Fnorm[,k]>log(thresh)
      keyw.nn[[k]] = (colnames(mat)[key])[order(Fnorm[key,k],decreasing = TRUE)]
    } else { 
      keyw.nn[[k]] = NA
    }
  }
  return(keyw.nn)
}
 [1] "model"     "estim"     "data"      "method"    "propos"    "studi"    
 [7] "function"  "distribut" "sampl"     "simul"    

 [1] "fals"      "control"   "procedur"  "test"      "reject"    "hypothes" 
 [7] "rate"      "discoveri" "null"      "multipl"   "pvalu"     "fdr"      
[13] "kfwer"     "stepdown"  "number"    "fwer"      "familywis" "hochberg" 
[19] "error"     "depend"    "alpha"     "statist"  

[1] "cancer" "diseas" "studi" 

[1] "rightcensor"  "surviv"       "lengthbias"   "semiparametr" "failur"      
[6] "data"         "time"         "nonparametr"  "effici"      

[1] "simex"  "measur" "error" 

[1] "wilk"       "test"       "ratio"      "phenomenon" "demonstr"  
[6] "backfit"   

[1] "robin"      "miss"       "zhao"       "rotnitzki"  "effici"    
[6] "casecohort"

[1] "semiparametr" "estim"        "model"       

[1] "test"      "null"      "hypothesi"

[1] "select"  "lasso"   "spars"   "penalti" "penal"   "variabl" "oracl"  

[1] "equivari"  "depth"     "scatter"   "project"   "affin"     "multivari"
[7] "introduc"  "breakdown"

[1] "bandwidth" "kernel"    "local"     "select"   

[1] "markov"    "chain"     "mont"      "carlo"     "algorithm" "bayesian" 

[1] "nconsist"

[1] "varyingcoeffici"

[1] "jackknif" "mix"      "varianc"  "squar"    "area"     "uncondit"

[1] "singleindex"

[1] "choleski"   "matrix"     "covari"     "decomposit"

[1] "motion"

[1] "homoscedast"   "heteroscedast"

[1] "onestep"

[1] "spline" "smooth"

[1] "mle"        "likelihood" "maximum"   

[1] "survey" "popul"  "sampl" 

[1] "memori"

 [1] "retail"   "tradit"   "compani"  "deliveri" "frequenc" "onlin"   
 [7] "quantiti" "differ"   "tail"     "custom"   "consum"   "compon"  
[13] "week"     "time"     "market"   "daili"    "total"    "cost"    
[19] "decis"   

 [1] "instrument"    "birth"         "measur"        "biomark"      
 [5] "health"        "error"         "assess"        "epidemiolog"  
 [9] "likelihoodbas" "prevent"       "identifi"      "nutrit"       
[13] "hour"          "cohort"        "serniparametr" "valid"        
[17] "adjust"        "led"           "factor"        "mortal"       
[21] "deliveri"      "pathway"       "longterm"      "exposur"      
[25] "morbid"        "typic"         "odd"           "food"         
[29] "infant"       

[1] "disabl"  "assumpt" "health"  "report"  "debat"  

[1] "studi"      "errorpron"  "heart"      "covari"     "baselin"   
[6] "framingham" "hazard"    

[1] "nonnorm"

[1] "polynomi" "local"   

[1] "gee"     "equat"   "correl"  "binari"  "general" "work"   

[1] "trim"   "robust" "depth" 

[1] "secondord"

[1] "survivor"

[1] "equat" "estim"

[1] "wavelet"    "besov"      "adapt"      "minimax"    "deconvolut"
[6] "ball"       "function"   "rang"       "rate"      

[1] "volatil"   "highfrequ" "asset"     "financi"   "price"     "matrix"   
[7] "lowfrequ" 

 [1] "dirichlet" "process"   "mixtur"    "hierarch"  "prior"     "tie"      
 [7] "number"    "contain"   "experienc" "discret"   "heavili"   "priori"   

[1] "wild"      "bootstrap" "seri"      "depend"    "irregular" "resampl"  

[1] "toxic"    "dose"     "trial"    "dosefind" "phase"    "probabl"  "clinic"  
[8] "target"   "design"  

[1] "drift"   "diffus"  "process"

[1] "slice"   "invers"  "dimens"  "method"  "regress"

[1] "coverag" "confid"  "interv" 

 [1] "wishart"  "famili"   "graph"    "cone"     "paramet"  "conjug"  
 [7] "prior"    "shape"    "graphic"  "matric"   "covari"   "gaussian"
[13] "decompos" "invers"   "homogen"  "dimens"   "ann"      "type"    
[19] "definit"  "posit"   

[1] "mutual" "empir"  "genet"  "pair"  

[1] "chi"       "test"      "distribut"

[1] "garch"   "process" "seri"   

[1] "varianc" "estim"  

[1] "mestim" "robust"

[1] "densiti"   "anisotrop" "unbound"   "novelti"  

[1] "reweight"

[1] "maximum"    "likelihood"

[1] "function"   "eigenfunct" "random"     "analysi"    "compon"    
[6] "data"       "princip"   

[1] "forecast"    "predict"     "wind"        "weather"     "probabilist"
[6] "calibr"      "northwest"   "speed"      

[1] "tabl"      "conting"   "loglinear"

[1] "aic"       "select"    "criterion" "bic"       "akaik"    

[1] "pollut"   "air"      "mortal"   "nation"   "confound" "trend"    "unmeasur"
[8] "sensit"   "coeffici"

 [1] "motif"      "cluster"    "gene"       "transcript" "bind"      
 [6] "factor"     "regul"      "sequenc"    "protein"    "discoveri" 
[11] "conserv"    "dna"        "nucleotid"  "call"       "dirichlet" 
[16] "process"    "short"      "pattern"    "vari"       "databas"   

 [1] "claim"  "insur"  "vehicl" "age"    "damag"  "type"   "year"   "turn"  
 [9] "detail" "experi" "sever"  "record"

[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian" 

 [1] "treatment"  "random"     "depress"    "care"       "trial"     
 [6] "patient"    "outcom"     "adher"      "subject"    "noncompli" 
[11] "effect"     "intervent"  "receiv"     "complianc"  "assumpt"   
[16] "primari"    "improv"     "causal"     "latent"     "elder"     
[21] "health"     "meet"       "longitudin"

 [1] "tau"      "yield"    "factor"   "month"    "appear"   "price"   
 [7] "curv"     "output"   "econom"   "consider" "current"  "fit"     

[1] "vaccin"    "infect"    "individu"  "outcom"    "causal"    "transmiss"

[1] "week"     "count"    "outbreak" "presenc"  "detect"   "fit"      "diseas"  
[8] "isol"     "resist"  

[1] "quantil" "regress"

[1] "spacetim" "site"     "time"     "tempor"   "spatial" 

[1] "nonneg"

[1] "servic"  "care"    "provid"  "patient" "health" 

[1] "sobolev" "densiti" "minimax"

[1] "distort"   "respons"   "predictor"

 [1] "elicit"       "inform"       "question"     "prior"        "psycholog"   
 [6] "respond"      "peopl"        "statistician" "result"       "person"      
[11] "success"      "uncertain"    "issu"         "particip"     "reduc"       
[16] "lack"         "histor"       "repres"       "sens"         "task"        
[21] "answer"      

[1] "loci"         "genet"        "allel"        "popul"        "genom"       
[6] "relationship" "diseas"       "map"          "genotyp"     

[1] "event"  "termin" "recurr" "censor"

[1] "unstabl"   "problemat" "exponenti" "famili"    "discret"   "depend"   

[1] "load"   "factor" "time"  

[1] "cox"     "hazard"  "proport"
print(get_keywords(fit.nn.s.10,1.2)) #there are no keywords at the default threshold
[1] "estim" "model"

[1] "lengthbias" "surviv"     "preval"     "cohort"    

[1] "hazard"  "proport"

[1] "simex"

[1] "memori"  "paramet" "subspac"

[1] "semiparametr" "estim"        "model"       

[1] "meansquar" "predict"   "error"     "small"     "area"     

[1] "lasso"    "select"   "variabl"  "regress"  "coeffici" "adapt"   

[1] "bandwidth" "kernel"   

[1] "jackknif" "squar"    "mix"      "lead"     "area"     "error"   

[1] "matrix"     "choleski"   "covari"     "decomposit" "factor"    

[1] "function"    "singleindex" "compon"      "link"       

[1] "onestep"

[1] "mse"       "predictor" "linear"    "empir"    

[1] "polynomi" "local"    "estim"    "regress" 

[1] "procedur"  "fals"      "control"   "test"      "discoveri" "rate"     
[7] "reject"    "hypothes"  "fdr"      

[1] "polymorph" "genotyp"   "haplotyp"  "snp"      

[1] "gee"     "equat"   "correl"  "binari"  "general"

[1] "equivari"  "depth"     "breakdown" "concept"   "introduc" 

[1] "nonrespons" "imput"      "survey"     "respons"   

[1] "mle"        "likelihood"

[1] "robin"      "miss"       "zhao"       "casecohort" "rotnitzki" 

[1] "vote"   "elect"  "candid"

[1] "sampl"     "survey"    "designbas" "infer"     "weight"    "modelbas" 
[7] "popul"    

[1] "forecast" "wind"     "predict"  "weather" 

[1] "test"      "logrank"   "weight"    "treatment" "formula"   "patient"  
[7] "supremum"  "standard"  "twostag"  

[1] "track"  "replac" "usag"  

[1] "precipit" "spatial" 

[1] "secondord"


[1] "twostep"  "submodel"

[1] "design"  "paramet" "effici" 

[1] "timedepend" "covari"     "treatment" 

[1] "survivor"

[1] "miss" "data"

[1] "test"      "null"      "hypothesi"

[1] "densiti" "sobolev"

[1] "trim"   "robust" "depth" 

[1] "substitut" "euclidean"

[1] "empir"      "likelihood" "bartlett"   "adjust"    

[1] "equat" "estim"

[1] "volatil"   "highfrequ" "asset"     "price"    

[1] "statist" "assoc"   "amer"   

[1] "nonneg"

[1] "panel"

[1] "norm"   "matrix"

[1] "popul"      "superpopul"

[1] "homoscedast"

[1] "misspecif"

[1] "file"   "linkag"

[1] "varianc" "estim"  

[1] "adapt"   "besov"   "wavelet" "minimax" "risk"   

[1] "kaplanmei" "quantil"   "surviv"    "censor"   

[1] "axe"    "rotat"  "matric"

[1] "mutual" "empir"  "genet" 

[1] "innov"   "process" "residu" 


[1] "monoton"  "function"
[1] "model"  "estim"  "method" "data"  

 [1] "fals"      "procedur"  "control"   "test"      "discoveri" "rate"     
 [7] "reject"    "hypothes"  "fdr"       "multipl"   "pvalu"     "null"     
[13] "number"    "kfwer"    

[1] "test"      "null"      "hypothesi" "distribut"

 [1] "treatment" "trial"     "random"    "assign"    "patient"   "effect"   
 [7] "outcom"    "clinic"    "causal"    "placebo"   "assumpt"  

[1] "surviv" "time"   "hazard" "censor" "failur" "studi" 

[1] "simex"              "measur"             "simulationextrapol"
[4] "error"             

[1] "wilk"

[1] "lasso"    "select"   "variabl"  "regress"  "coeffici"

[1] "rankbas"  "effici"   "asymptot" "rank"    

[1] "nconsist"

[1] "assoc"   "amer"    "statist" "ann"    

[1] "mle"        "likelihood" "maximum"   

[1] "varyingcoeffici"

[1] "semiparametr" "estim"        "model"        "parametr"    

 [1] "adapt"      "wavelet"    "besov"      "minimax"    "ball"      
 [6] "rang"       "threshold"  "risk"       "deconvolut" "nois"      

[1] "memori"

[1] "bandwidth" "kernel"    "local"     "select"   

[1] "forecast"    "predict"     "wind"        "weather"     "spatial"    
[6] "calibr"      "speed"       "meteorolog"  "probabilist"

[1] "choleski"   "matrix"     "covari"     "decomposit" "factor"    
[6] "interpret" 

[1] "mse"       "predictor" "linear"    "error"     "squar"     "empir"    

[1] "depth"   "project"

[1] "singleindex" "function"    "link"        "compon"      "unknown"    

[1] "markov"    "chain"     "mont"      "carlo"     "algorithm"

[1] "penal"      "nonconcav"  "likelihood" "select"     "variabl"   
[6] "oracl"      "penalti"    "regular"   

[1] "jackknif" "mix"      "squar"    "area"     "varianc" 

[1] "homoscedast"   "heteroscedast"

[1] "spline" "smooth"

[1] "survey" "popul"  "sampl" 

[1] "equivari"  "affin"     "matrix"    "introduc"  "breakdown" "concept"  
[7] "scatter"  

[1] "onestep"

[1] "process"    "thin"       "point"      "fit"        "spatial"   
[6] "residu"     "stationari" "intens"    

[1] "nonnorm"

[1] "polynomi" "local"    "regress" 

[1] "gee"     "equat"   "correl"  "general" "binari"  "work"   

[1] "theta"   "paramet"

[1] "robin"     "miss"      "zhao"      "rotnitzki" "effici"   

[1] "mestim" "robust"

[1] "finitesampl"

[1] "sobolev" "densiti" "minimax" "rate"   

[1] "elect" "vote"  "poll" 

[1] "errorpron" "error"    

[1] "panel" "count"

[1] "stock"

[1] "garch"   "process" "volatil"

[1] "secondord"

[1] "equat" "estim"

[1] "slice"   "invers"  "regress" "dimens"  "method" 

[1] "norm"      "matrix"    "rank"      "matric"    "frobenius" "bound"    

[1] "survivor"

[1] "slope"

[1] "chi"  "test"

[1] "varianc"

[1] "function"   "eigenfunct" "analysi"    "random"     "princip"   
[6] "compon"     "data"      

[1] "tabl"    "conting"

[1] "criterion" "akaik"     "select"    "model"    

[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian" 

[1] "neighborhood"

[1] "maximum"    "welldefin"  "posteriori"
 [1] "model"       "estim"       "data"        "method"      "propos"     
 [6] "studi"       "simul"       "distribut"   "function"    "sampl"      
[11] "paramet"     "approach"    "statist"     "base"        "asymptot"   
[16] "problem"     "general"     "regress"     "analysi"     "test"       
[21] "develop"     "procedur"    "perform"     "illustr"     "condit"     
[26] "set"         "applic"      "observ"      "variabl"     "likelihood" 
[31] "consist"     "time"        "appli"       "covari"      "properti"   
[36] "random"      "comput"      "articl"      "linear"      "case"       
[41] "process"     "infer"       "error"       "select"      "number"     
[46] "effici"      "rate"        "nonparametr" "deriv"       "measur"     
[51] "effect"      "algorithm"   "class"       "paper"       "compar"     
[56] "provid"      "includ"      "depend"     

 [1] "fals"       "control"    "procedur"   "test"       "rate"      
 [6] "discoveri"  "reject"     "hypothes"   "multipl"    "null"      
[11] "pvalu"      "fdr"        "hochberg"   "number"     "stepdown"  
[16] "kfwer"      "familywis"  "error"      "depend"     "proport"   
[21] "benjamini"  "fwer"       "statist"    "fdp"        "soc"       
[26] "divid"      "power"      "roy"        "stepup"     "alpha"     
[31] "deriv"      "abil"       "ser"        "individu"   "detect"    
[36] "gamma"      "total"      "hypothesi"  "conserv"    "toler"     
[41] "attent"     "defin"      "singlestep" "construct"  "fix"       
[46] "simultan"   "probabl"    "independ"   "ann"        "usual"     
[51] "sime"       "improv"     "increas"   

 [1] "treatment"   "random"      "trial"       "patient"     "effect"     
 [6] "assign"      "noncompli"   "assumpt"     "outcom"      "complianc"  
[11] "causal"      "adher"       "depress"     "placebo"     "receiv"     
[16] "care"        "subject"     "clinic"      "intervent"   "drug"       
[21] "arm"         "dose"        "improv"      "primari"     "treat"      
[26] "princip"     "analys"      "latent"      "elder"       "control"    
[31] "sever"       "contrast"    "instrument"  "stratif"     "activ"      
[36] "particip"    "framework"   "prevent"     "potenti"     "physician"  
[41] "benefit"     "infer"       "imperfect"   "children"    "encourag"   
[46] "estimand"    "doserespons"

 [1] "surviv"       "time"         "hazard"       "censor"       "failur"      
 [6] "studi"        "event"        "semiparametr" "proport"      "data"        
[11] "cancer"       "covari"       "estim"        "risk"         "cox"         
[16] "baselin"      "regress"      "cumul"        "illustr"      "rightcensor" 
[21] "consist"      "nonparametr"  "trial"       

[1] "null"      "test"      "hypothesi" "distribut" "altern"    "statist"  
[7] "power"     "asymptot"  "hypothes" 

 [1] "simex"              "simulationextrapol" "measur"            
 [4] "error"              "undersmooth"        "asymptot"          
 [7] "longer"             "accuraci"           "finitesampl"       
[10] "principl"           "bias"               "presenc"           
[13] "selector"           "wang"               "rootn"             

 [1] "wilk"       "ratio"      "phenomenon" "correct"    "relax"     
 [6] "conduct"    "newli"      "unspecifi"  "freedom"    "follow"    
[11] "backfit"    "nuisanc"    "theorem"    "degre"      "chisquar"  
[16] "likelihood" "empir"      "ask"        "hold"      

 [1] "mle"         "maximum"     "likelihood"  "main"        "prove"      
 [6] "asymptot"    "converg"     "limit"       "mles"        "status"     
[11] "rate"        "current"     "brownian"    "behavior"    "motion"     
[16] "estim"       "proof"       "uniqu"       "nonparametr"

 [1] "chain"     "markov"    "mont"      "carlo"     "bayesian"  "algorithm"
 [7] "posterior" "infer"     "prior"     "model"     "mcmc"     

 [1] "lasso"     "select"    "variabl"   "regress"   "coeffici"  "spars"    
 [7] "penalti"   "adapt"     "linear"    "oracl"     "penal"     "problem"  
[13] "sparsiti"  "algorithm" "regular"  

[1] "varyingcoeffici" "nonparametr"     "coeffici"        "linear"         
[5] "longitudin"      "conduct"         "propos"          "vari"           
[9] "regress"        

 [1] "rankbas"      "effici"       "asymptot"     "rank"         "ellipt"      
 [6] "cam"          "class"        "densiti"      "uniform"      "normal"      
[11] "version"      "sign"         "multivari"    "matric"       "symmetri"    
[16] "valid"        "finit"        "scatter"      "ann"          "contour"     
[21] "tradit"       "assumpt"      "sens"         "irrespect"    "rootn"       
[26] "semiparametr" "center"      

 [1] "nconsist" "root"     "reduct"   "dimens"   "exist"    "direct"  
 [7] "central"  "slice"    "exhaust"  "contour"  "ellipt"   "advantag"
[13] "mild"     "strong"   "regress"  "varianc"  "suffici"  "invers"  
[19] "averag"  

 [1] "semiparametr" "estim"        "nonparametr"  "parametr"     "paramet"     
 [6] "model"        "effici"       "asymptot"     "likelihood"   "regress"     
[11] "function"    

 [1] "bandwidth"  "kernel"     "local"      "select"     "smooth"    
 [6] "densiti"    "estim"      "crossvalid" "selector"   "polynomi"  

 [1] "nonconcav"     "penal"         "select"        "oracl"        
 [5] "penalti"       "variabl"       "likelihood"    "regular"      
 [9] "fan"           "challeng"      "nondifferenti" "maxim"        
[13] "sandwich"      "onestep"       "establish"     "concav"       
[17] "broad"         "enjoy"         "employ"        "selector"     
[21] "encourag"      "cost"         

 [1] "penalis"       "newtonraphson" "framingham"    "penalti"      
 [5] "likelihood"    "heart"         "failur"        "carri"        
 [9] "algorithm"     "proper"        "conduct"       "advanc"       
[13] "grow"          "dropout"       "familiar"      "prospect"     

[1] "homoscedast"   "heteroscedast" "varianc"       "transform"    
[5] "famili"        "error"        

[1] "nonnorm"   "normal"    "mix"       "linear"    "exponenti"

[1] "inhomogen"  "intens"     "process"    "spatial"    "point"     
[6] "poisson"    "thin"       "stationari" "function"  

 [1] "seem"           "unrel"          "spline"         "correl"        
 [5] "credit"         "retail"         "neglig"         "nongaussian"   
 [9] "dataadapt"      "vehicl"         "allevi"         "knot"          
[13] "leav"           "reversiblejump" "part"           "genotyp"       
[17] "conveni"        "residu"         "wang"           "withinclust"   

 [1] "memori"        "seri"          "differenc"     "longmemori"   
 [5] "taper"         "frequenc"      "long"          "fraction"     
 [9] "averag"        "depend"        "paramet"       "periodogram"  
[13] "stationari"    "move"          "slowli"        "whittl"       
[17] "eigenvector"   "local"         "nonstationari" "distinct"     
[21] "angl"         

 [1] "distort"         "respons"         "confound"        "predictor"      
 [5] "unobserv"        "under"           "explanatori"     "serum"          
 [9] "adjust"          "magnitud"        "indirect"        "identifi"       
[13] "coeffici"        "factor"          "absent"          "system"         
[17] "alter"           "observ"          "datagener"       "leastsquar"     
[21] "decid"           "straightforward" "generat"         "stepwis"        
[25] "intervent"       "sever"          

[1] "polynomi"    "local"       "regress"     "smooth"      "nonparametr"
[6] "asymptot"   

 [1] "equivari"   "affin"      "introduc"   "depth"      "breakdown" 
 [6] "scatter"    "locat"      "point"      "project"    "robust"    
[11] "concept"    "general"    "multivari"  "function"   "influenc"  
[16] "matrix"     "median"     "definit"    "hyperplan"  "high"      
[21] "heavytail"  "competitor" "fact"       "translat"   "comparison"
[26] "open"      

 [1] "save"      "sir"       "slice"     "averag"    "root"      "invers"   
 [7] "candid"    "reveal"    "theoret"   "reduct"    "comput"    "contrast" 
[13] "recommend"

 [1] "nonrespons" "survey"     "respons"    "imput"      "nonignor"  
 [6] "valu"       "miss"       "respond"    "nation"     "varianc"   
[11] "nonrespond" "weight"     "popul"      "requir"     "bias"      
[16] "probabl"    "unit"       "mechan"     "item"       "adjust"    
[21] "health"     "variabl"    "calibr"     "race"       "domain"    
[26] "handl"      "incom"     

 [1] "taper"    "approxim" "matrix"   "gaussian" "covari"   "spars"   
 [7] "consist"  "oper"     "block"    "norm"     "balanc"   "requir"  
[13] "spatial" 

 [1] "jackknif"  "mix"       "varianc"   "area"      "squar"     "appli"    
 [7] "inconsist" "uncondit"  "replic"    "strata"   

[1] "mestim"  "robust"  "weak"    "yield"   "outlier" "nuisanc"

 [1] "garch"         "process"       "seri"          "volatil"      
 [5] "stationari"    "paper"         "heteroscedast" "condit"       
 [9] "moment"        "autoregress"   "financi"       "local"        
[13] "standard"      "innov"         "sequenc"       "satisfi"      
[17] "move"          "iid"           "time"          "averag"       
[21] "root"          "mont"          "carlo"        

[1] "quantil" "regress"

 [1] "gee"       "equat"     "correl"    "general"   "sandwich"  "binari"   
 [7] "work"      "misspecif" "cluster"   "scientif"  "enhanc"    "effort"   
[13] "equival"   "lead"      "repeat"    "diverg"   

 [1] "popul"      "superpopul" "survey"     "finit"      "boxcox"    
 [6] "modelbas"   "design"     "predict"    "realiz"     "auxiliari" 
[11] "sampl"      "handl"      "twophas"    "revisit"    "mild"      
[16] "benchmark"  "rich"       "life"       "probabl"    "ensur"     

 [1] "claim"     "insur"     "vehicl"    "damag"     "age"       "year"     
 [7] "turn"      "compani"   "detail"    "tail"      "sever"     "coverag"  
[13] "record"    "risk"      "price"     "financi"   "describ"   "major"    
[19] "gender"    "discount"  "logit"     "amount"    "person"    "kind"     
[25] "multinomi" "frequenc"  "justif"    "surpris"   "binomi"    "oil"      
[31] "pointwis"  "split"     "negat"    

[1] "logit"       "finitesampl" "root"        "probit"      "variat"     
[6] "mix"         "fraction"    "multinomi"  

 [1] "expenditur"   "physician"    "servic"       "skew"         "care"        
 [6] "lognorm"      "profil"       "conduct"      "patient"      "person"      
[11] "contribut"    "health"       "randomeffect" "smoke"        "fact"        
[16] "survey"       "manag"        "incur"        "medic"        "debat"       
[21] "custom"       "qualiti"      "topic"        "industri"     "appropri"    
[26] "pulmonari"    "conceptu"     "monitor"      "regard"       "prescrib"    
[31] "subsequ"      "way"          "financi"      "hierarch"     "lung"        
[36] "percentil"    "attribut"     "closedform"  

[1] "confid"    "interv"    "construct" "coverag"   "bootstrap" "region"   

 [1] "singleindex" "unknown"     "link"        "compon"      "equat"      
 [6] "function"    "varianc"     "nonparametr" "beta"        "femal"      
[11] "structur"    "smaller"     "compos"      "vectorvalu"  "eigenfunct" 
[16] "composit"    "econometr"  

[1] "finitesampl" "propos"     

 [1] "wavelet"    "adapt"      "besov"      "minimax"    "ball"      
 [6] "threshold"  "rang"       "nois"       "wide"       "unknown"   
[11] "rate"       "risk"       "bound"      "deconvolut" "smooth"    
[16] "problem"    "function"   "signal"     "white"      "converg"   
[21] "gaussian"   "transform"  "recov"      "densiti"    "shape"     
[26] "view"       "noisi"      "discret"    "nearoptim"  "spars"     
[31] "blur"       "fourier"    "decay"      "upper"      "convolut"  
[36] "invers"    

 [1] "robin"      "miss"       "zhao"       "rotnitzki"  "effici"    
 [6] "weight"     "casecohort" "design"     "invers"     "twophas"   
[11] "cohort"     "random"     "causal"     "outcom"     "biometrika"
[16] "prentic"    "calcul"     "purpos"     "confound"   "lemma"     
[21] "mar"        "exemplifi"  "suit"       "amer"       "assoc"     
[26] "proceed"    "summar"     "cox"        "ser"        "soc"       
[31] "roy"        "iid"        "appear"     "unbias"    

[1] "maximum"    "likelihood" "estim"     

[1] "dimensionreduct" "invers"          "dimens"          "factor"         
[5] "highdimension"   "chisquar"        "reduct"         

[1] "lin"        "addit"      "work"       "carrol"     "bone"      
[6] "transplant" "margin"    

 [1] "withinclust" "cluster"     "correl"      "account"     "hamper"     
 [6] "frequent"    "carri"       "frailti"     "parsimoni"   "abil"       
[11] "birth"       "ill"         "generalis"   "impact"      "intuit"     
[16] "achiev"     

[1] "chi"       "test"      "distribut" "space"     "ratio"     "restrict" 
[7] "statist"  

[1] "coeffici" "regress" 

 [1] "norm"          "matrix"        "frobenius"     "rank"         
 [5] "matric"        "nuclear"       "bound"         "regular"      
 [9] "low"           "optim"         "nonasymptot"   "highdimension"
[13] "convex"        "spars"         "minimax"       "noisi"        
[17] "element"       "minim"         "error"         "singular"     
[21] "setup"         "vector"        "theori"        "precis"       
[25] "autoregress"   "predict"      

 [1] "minimax" "rate"    "densiti" "optim"   "adapt"   "unknown" "estim"  
 [8] "loss"    "converg" "class"   "prove"   "bound"  

[1] "unequ"     "designbas" "survey"    "weight"   

[1] "auxiliari" "survey"    "varianc"   "variabl"   "sampl"     "weight"   
[7] "design"    "calibr"    "popul"    

[1] "variancecovari" "matrix"         "analyz"        

[1] "contamin"    "robust"      "water"       "influenc"    "explanatori"

[1] "bspline" "kernel"  "penal"  

[1] "varianc"  "asymptot"

 [1] "eigenfunct" "function"   "princip"    "compon"     "random"    
 [6] "analysi"    "data"       "smooth"     "eigenvalu"  "deriv"     
[11] "curv"       "spars"      "trajectori" "space"      "score"     

 [1] "forecast"    "predict"     "weather"     "spatial"     "wind"       
 [6] "probabilist" "northwest"   "calibr"      "pacif"       "meteorolog" 
[11] "temperatur"  "speed"       "hour"        "energi"      "atmospher"  
[16] "averag"      "ensembl"     "geostatist"  "futur"       "center"     
[21] "north"       "precipit"    "accur"       "tempor"      "daili"      
[26] "event"       "resourc"     "site"        "american"    "state"      
[31] "sharp"       "spacetim"    "qualiti"     "climat"      "ozon"       
[36] "concentr"    "generat"     "regim"       "transport"   "season"     
[41] "shortterm"   "determinist" "input"      

 [1] "highfrequ" "volatil"   "financi"   "asset"     "price"     "lowfrequ" 
 [7] "exchang"   "nois"      "dynam"     "market"    "matrix"    "stock"    
[13] "period"    "daili"     "realiz"    "pool"      "matric"    "variat"   
[19] "diffus"   

 [1] "earthquak"      "process"        "discrimin"      "seri"          
 [5] "featur"         "explos"         "event"          "time"          
 [9] "form"           "california"     "spectra"        "transform"     
[13] "background"     "extract"        "occurr"         "intens"        
[17] "diverg"         "wavelet"        "step"           "occur"         
[21] "decomposit"     "thin"           "separ"          "basi"          
[25] "multidimension" "spacetim"       "rate"           "poisson"       
[29] "residu"         "spectrum"       "goal"           "rescal"        
[33] "magnitud"       "evolutionari"   "purpos"         "homogen"       

 [1] "climat"      "chang"       "temperatur"  "greenhous"   "global"     
 [6] "earth"       "trend"       "uncertainti" "increas"     "atmospher"  
[11] "northern"    "quantifi"    "reconstruct" "futur"       "separ"      
[16] "tempor"     

 [1] "motif"      "gene"       "sequenc"    "regul"      "transcript"
 [6] "bind"       "dna"        "protein"    "cluster"    "factor"    
[11] "nucleotid"  "discoveri"  "conserv"    "short"      "high"      
[16] "call"       "pattern"    "dirichlet"  "biolog"     "site"      
[21] "process"    "genom"      "mixtur"     "width"      "vari"      
[26] "priori"     "hierarch"   "strategi"   "cell"       "databas"   
[31] "repres"     "organ"      "delet"      "matric"     "similar"   
[36] "gibb"       "switch"     "technolog"  "generat"    "segment"   
[41] "refin"      "aid"        "substant"   "stochast"   "live"      
[46] "group"      "core"       "regulatori"

 [1] "wishart"    "graph"      "cone"       "famili"     "graphic"   
 [6] "matric"     "conjug"     "paramet"    "prior"      "gaussian"  
[11] "covari"     "matrix"     "decompos"   "edg"        "definit"   
[16] "homogen"    "paper"      "shape"      "invers"     "correspond"
[21] "standard"   "ann"        "posit"      "equal"      "space"     
[26] "respect"    "eigenvalu"  "zero"       "sigma"      "dimens"    
[31] "bay"        "chisquar"   "miss"       "form"       "precis"    
[36] "flexibl"    "distinct"   "close"     

 [1] "pca"          "princip"      "compon"       "matrix"       "eigenvector" 
 [6] "analysi"      "eigenvalu"    "reduct"       "dimension"    "set"         
[11] "perturb"      "size"         "transit"      "dimens"       "spike"       
[16] "direct"       "maxim"        "hold"         "popul"        "tool"        
[21] "tree"         "high"         "theorem"      "geometr"      "succeed"     
[26] "sharp"        "logp"         "oil"          "embed"        "evolutionari"

[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian"  "hierarch" 
[7] "posterior" "cluster"  

 [1] "famili"        "subfamili"     "symmetr"       "asymmetr"     
 [5] "skew"          "reparameter"   "discuss"       "transform"    
 [9] "properti"      "explor"        "mise"          "urn"          
[13] "behav"         "generat"       "pursu"         "adequ"        
[17] "distribut"     "adopt"         "emphasi"       "symmetri"     
[21] "map"           "submodel"      "option"        "stateoftheart"
[25] "heavytail"     "superior"      "attract"       "tractabl"     
[29] "place"         "member"        "counterpart"   "spacetim"     

[1] "bar"    "vertic" "cap"    "lambda"

 [1] "integ"      "algebra"    "coher"      "ail"        "ident"     
 [6] "countabl"   "multist"    "system"     "appl"       "finit"     
[11] "classic"    "object"     "ideal"      "grid"       "util"      
[16] "math"       "fewer"      "state"      "call"       "binari"    
[21] "inequ"      "pure"       "geometri"   "comprehens" "alpha"     
[26] "posit"      "socal"      "repres"     "idea"       "complex"   
[31] "probabl"    "yield"      "failur"     "relat"      "type"      

 [1] "car"         "polytop"     "partit"      "height"      "combinatori"
 [6] "mechan"      "rais"        "hierarchi"   "convex"      "need"       
[11] "extrem"      "stein"       "descript"    "meaning"     "discret"    
[16] "object"      "geometr"     "parsimoni"   "oil"         "notion"     
[21] "satisfi"     "character"   "exponenti"   "interpret"   "unusu"      
[26] "maxim"       "neighbor"    "assumpt"     "uniform"     "dramat"     
[31] "class"       "point"       "sure"       

 [1] "paradox"     "prior"       "surrog"      "true"        "bay"        
 [6] "posit"       "criteria"    "frequentist" "jeffrey"     "sign"       
[11] "point"       "avoid"       "causal"      "turn"        "negat"      
[16] "invari"     

 [1] "probab"  "appl"    "proc"    "situat"  "ann"     "shape"   "field"  
 [8] "point"   "gamma"   "univari" "roy"    

 [1] "chart"       "cusum"       "detect"      "shift"       "cumul"      
 [6] "control"     "sum"         "base"        "perform"     "length"     
[11] "refer"       "averag"      "ratio"       "monitor"     "likelihood" 
[16] "convent"     "delta"       "infin"       "articl"      "event"      
[21] "outlier"     "stop"        "alarm"       "changepoint" "small"      

 [1] "twoparamet" "focus"      "famili"     "choos"      "exampl"    
 [6] "basic"      "desir"      "popular"    "express"    "tune"      
[11] "stepup"     "compromis"  "conserv"    "shortcom"   "represent" 
[16] "lifetim"    "priori"     "meaning"    "prefer"     "segment"   
[21] "stepwis"    "convolut"   "feasibl"    "bay"       

 [1] "digit"       "fals"        "alarm"       "imag"        "geometr"    
 [6] "definit"     "expect"      "sequenti"    "minim"       "principl"   
[11] "meaning"     "meet"        "framework"   "kind"        "priori"     
[16] "maxim"       "prove"       "theori"      "contain"     "mathemat"   
[21] "compat"      "align"       "display"     "part"        "occurr"     
[26] "explain"     "basic"       "structur"    "number"      "hidden"     
[31] "stop"        "delay"       "probabilist" "rigor"       "fine"       
[36] "walk"        "chang"       "changepoint" "renew"      

 [1] "manifold"   "space"      "intrins"    "metric"     "shape"     
 [6] "riemannian" "tensor"     "euclidean"  "matric"     "diagnost"  
[11] "geodes"     "develop"    "planar"     "sphere"     "examin"    
[16] "imag"       "perturb"    "human"      "embed"      "gender"    
[21] "medic"      "dimens"     "differenti" "diffus"    

[1] "kendal"  "tau"     "truncat" "copula"  "shape"   "densiti" "symmetr"
[8] "reli"    "angl"   

 [1] "improp"    "proprieti" "posterior" "uniform"   "proper"    "prior"    
 [7] "miss"      "suffici"   "theorem"   "character" "complet"   "carri"    
[13] "examin"    "colon"     "beta"      "dataset"   "cumul"     "tree"     
[19] "glms"     

[1] "ser"     "soc"     "roy"     "stat"    "ann"     "particl" "central"
[8] "util"    "statist"

[1] "iid"   "prove"

 [1] "classifi"        "distancebas"     "centroid"        "classif"        
 [5] "discrimin"       "popul"           "vector"          "distanc"        
 [9] "theoret"         "machin"          "support"         "heavytail"      
[13] "median"          "differ"          "difficulti"      "popular"        
[17] "convent"         "replac"          "componentwis"    "produc"         
[21] "accumul"         "closest"         "varieti"         "truncat"        
[25] "poor"            "entail"          "highdimension"   "insensit"       
[29] "allevi"          "excess"          "problemat"       "today"          
[33] "euclidean"       "encount"         "inconsist"       "caus"           
[37] "suffer"          "nearest"         "counterpart"     "volatil"        
[41] "argument"        "alloc"           "straightforward" "attempt"        
[45] "frequent"        "boundari"        "believ"          "help"           
[49] "case"            "inher"           "neighbour"      

 [1] "administr"      "fda"            "secondari"      "endpoint"      
 [5] "drug"           "efficaci"       "food"           "health"        
 [9] "combin"         "record"         "agent"          "trial"         
[13] "clinic"         "benefit"        "primari"        "adjust"        
[17] "databas"        "prevent"        "path"           "cardiovascular"
[21] "make"           "separ"          "report"         "perspect"      
[25] "decis"          "simplifi"       "safeti"         "maintain"      

 [1] "supremum"    "shift"       "dataset"     "changepoint" "power"      
 [6] "test"        "debat"       "logrank"     "north"       "window"     
[11] "categor"     "record"      "speed"       "wind"        "controversi"
[16] "frequenc"    "elabor"      "opposit"     "pearson"     "discontinu" 
[21] "cumul"       "attribut"    "multinomi"   "bridg"       "mainten"    
[26] "formula"     "conclus"     "rigor"       "appear"      "sum"        
[31] "brownian"    "statist"     "strength"    "chisquar"    "autocovari" 
[36] "sequenc"     "receiv"     

[1] "theta"     "paramet"   "cap"       "distribut" "vector"    "unknown"  
[7] "nuisanc"  

 [1] "genet"       "loci"        "trait"       "diseas"      "quantit"    
 [6] "linkag"      "map"         "allel"       "phenotyp"    "gene"       
[11] "pedigre"     "popul"       "marker"      "associ"      "genotyp"    
[16] "frequenc"    "chromosom"   "locus"       "polymorph"   "genom"      
[21] "complex"     "haplotyp"    "interact"    "casecontrol" "involv"     
[26] "domin"       "individu"   

[1] "goodnessoffit" "test"          "includ"        "residu"       

 [1] "collabor"    "nearest"     "item"        "user"        "consum"     
 [6] "tradit"      "recommend"   "system"      "neighbor"    "filter"     
[11] "frame"       "clear"       "fact"        "contribut"   "forc"       
[16] "grow"        "drive"       "probabilist" "mathemat"    "precis"     
[21] "socal"       "initi"       "deal"        "mild"        "attempt"    
[26] "offer"       "neighbour"   "provid"      "literatur"   "algorithm"  
[31] "sequenti"   

 [1] "selector"    "dantzig"     "lregular"    "extend"      "path"       
 [6] "result"      "bound"       "nonasymptot" "uncertainti" "angl"       
[11] "remark"      "tune"        "entir"       "final"       "question"   
[16] "cost"        "principl"   

 [1] "subtl"    "jin"      "nonzero"  "critic"   "fraction" "boundari"
 [7] "tukey"    "higher"   "signific" "succeed"  "detect"   "normal"  
[13] "region"   "interest" "precis"   "amplitud" "alpha"    "concept" 
[19] "sparsiti" "concern"  "mention"  "high"     "work"     "resolv"  
[25] "nonnul"   "bodi"     "lower"   

 [1] "expert"      "languag"     "uncertainti" "abil"        "learn"      
 [6] "elicit"      "intermitt"   "system"      "natur"       "kind"       
[11] "amount"      "inform"      "peopl"       "mathemat"    "make"       
[16] "histor"      "need"        "content"     "respond"     "grow"       
[21] "happen"     

 [1] "absolut"       "deviat"        "clip"          "smooth"       
 [5] "scad"          "oracl"         "size"          "true"         
 [9] "microarray"    "nonzero"       "dimens"        "fan"          
[13] "highdimension" "identifi"      "sparsiti"      "confirm"      
[17] "slowli"        "larger"       

[1] "size"   "sampl"  "number"

 [1] "seri"        "week"        "time"        "stationari"  "generat"    
 [6] "superposit"  "autoregress" "renew"       "autocovari"  "binomi"     
[11] "day"         "longmemori"  "count"       "predict"     "thin"       
[16] "focus"       "fit"         "contrast"    "consecut"    "integ"      
[21] "simpl"       "poisson"     "short"       "geometr"     "parsimoni"  
[26] "copi"        "bernoulli"   "previous"    "discret"     "electr"     
[31] "daili"       "key"         "differ"      "trial"       "market"     
[36] "margin"      "sequenc"     "forecast"    "load"       

[1] "spectral"   "densiti"    "time"       "seri"       "domain"    
[6] "stationari" "frequenc"  

[1] "tilt"       "exponenti"  "constraint" "employ"    

 [1] "earn"       "person"     "interview"  "employ"     "document"  
 [6] "survey"     "health"     "level"      "census"     "peopl"     
[11] "report"     "incom"      "higher"     "educ"       "feder"     
[16] "sensit"     "preval"     "analys"     "conduct"    "famili"    
[21] "imput"      "year"       "key"        "sourc"      "total"     
[26] "file"       "instrument" "ratio"      "status"     "encourag"  
[31] "nation"     "way"        "subsequ"    "monitor"    "lower"     
[36] "item"       "accept"     "multipli"   "rich"       "violat"    
[41] "previous"  

 [1] "statistician" "polici"       "scienc"       "statist"      "decis"       
 [6] "role"         "today"        "technolog"    "scientif"     "maker"       
[11] "bring"        "challeng"     "scientist"    "inform"       "integr"      
[16] "communic"     "individu"     "increas"      "knowledg"     "polit"       
[21] "live"         "disciplin"    "address"      "social"       "effort"      
[26] "essenti"      "organ"        "solv"         "engin"        "student"     
[31] "opportun"     "impact"       "face"         "grow"         "chang"       
[36] "play"         "govern"       "american"     "countri"      "mathemat"    
[41] "closer"       "centuri"      "modern"       "intern"       "spread"      
[46] "human"        "relev"        "ingredi"      "place"        "public"      
[51] "devic"        "success"      "explor"       "pressur"      "guarante"    
[56] "imposs"       "train"        "view"         "excel"        "presidenti"  
[61] "progress"     "edg"          "way"          "genom"        "support"     
[66] "communiti"    "promot"       "action"       "advanc"       "map"         
[71] "understand"  

 [1] "toxic"      "dose"       "trial"      "dosefind"   "phase"     
 [6] "clinic"     "target"     "design"     "probabl"    "escal"     
[11] "assign"     "patient"    "reassess"   "continu"    "ethic"     
[16] "prespecifi" "common"     "enhanc"     "concern"    "robust"    
[21] "parallel"   "previous"   "overcom"    "coher"      "variant"   
[26] "competit"  

 [1] "elect"      "vote"       "poll"       "evid"       "candid"    
 [6] "presidenti" "count"      "station"    "forecast"   "proport"   
[11] "polit"      "prefer"     "counti"     "record"     "lower"     

 [1] "extrem"      "precipit"    "spatial"     "pareto"      "station"    
 [6] "uncertainti" "climatolog"  "hierarchi"   "exceed"      "threshold"  
[11] "quantif"     "produc"      "return"      "captur"      "region"     
[16] "intens"      "frequenc"    "hierarch"    "plan"        "weather"    
[21] "interpol"    "map"         "purpos"      "binomi"      "coordin"    
[26] "driven"      "geograph"    "daili"       "separ"       "character"  
[31] "fulli"       "latent"      "improv"     

 [1] "enter"     "pursu"     "project"   "preced"    "phase"     "maker"    
 [7] "schedul"   "resourc"   "decis"     "minim"     "concret"   "perfect"  
[13] "divid"     "total"     "strategi"  "alloc"     "expect"    "face"     
[19] "generat"   "manag"     "chosen"    "state"     "formul"    "unknown"  
[25] "point"     "exampl"    "breakdown" "unit"     

 [1] "polya"       "appreci"     "tree"        "cancer"      "surveil"    
 [6] "spatial"     "sophist"     "epidemiolog" "unrealist"   "institut"   
[11] "offer"       "program"     "fulli"       "analyt"      "nation"     
[16] "flexibl"     "lattic"      "compet"      "orient"      "feasibl"    
[21] "impos"       "obtain"      "aspect"      "remain"      "timetoev"   
[26] "breast"      "ignor"       "urn"         "mixtur"      "advantag"   
[31] "framework"   "featur"     

 [1] "delay"         "combin"        "issu"          "activ"        
 [5] "unit"          "year"          "monitor"       "program"      
 [9] "incid"         "concern"       "major"         "servic"       
[13] "surveil"       "develop"       "registri"      "populationbas"
[17] "trend"         "reason"       

[1] "laplac"    "approxim"  "posterior" "integr"    "mode"     

[1] "subjectspecif"    "random"           "longitudin"       "correl"          
[5] "populationaverag" "latent"           "logist"           "followup"        

 [1] "underestim"    "overestim"     "lemma"         "abrupt"       
 [5] "respect"       "admit"         "stein"         "identif"      
 [9] "moder"         "satisfi"       "nontrivi"      "impli"        
[13] "detail"        "deviat"        "loglikelihood" "benchmark"    
[17] "moment"        "nest"          "yield"         "exponenti"    
[21] "decay"         "deal"          "difficulti"    "mild"         
[25] "posit"         "relat"         "version"       "prove"        

 [1] "retail"    "custom"    "compani"   "deliveri"  "consum"    "tradit"   
 [7] "onlin"     "tail"      "quantiti"  "frequenc"  "market"    "total"    
[13] "joint"     "differ"    "firm"      "articl"    "cost"      "week"     
[19] "daili"     "translat"  "tie"       "decis"     "intend"    "household"
[25] "prevent"   "bivari"    "activ"     "aid"       "simpli"    "accur"    
[31] "forecast"  "compon"    "element"   "commerci"  "success"   "bank"     
[37] "incur"     "period"    "center"    "repres"    "arriv"     "frequent" 
[43] "organ"     "concern"   "impact"    "descript" 

[1] "oneparamet" "famili"     "normal"     "general"    "exponenti" 
[6] "detect"     "binomi"    

 [1] "intersect"  "close"      "hypothes"   "familywis"  "bonferroni"
 [6] "logic"      "critic"     "requir"     "elementari" "multipl"   
[11] "monoton"    "holm"       "valu"       "principl"  

  [1] "model"        "estim"        "data"         "method"       "propos"      
  [6] "studi"        "simul"        "distribut"    "function"     "sampl"       
 [11] "base"         "paramet"      "approach"     "statist"      "asymptot"    
 [16] "problem"      "general"      "regress"      "analysi"      "develop"     
 [21] "illustr"      "perform"      "procedur"     "test"         "applic"      
 [26] "condit"       "set"          "observ"       "variabl"      "appli"       
 [31] "consist"      "properti"     "likelihood"   "articl"       "time"        
 [36] "comput"       "covari"       "random"       "case"         "linear"      
 [41] "process"      "infer"        "number"       "error"        "effici"      
 [46] "select"       "rate"         "nonparametr"  "deriv"        "effect"      
 [51] "compar"       "measur"       "includ"       "provid"       "paper"       
 [56] "algorithm"    "class"        "depend"       "normal"       "demonstr"    
 [61] "bayesian"     "larg"         "assumpt"      "probabl"      "approxim"    
 [66] "addit"        "size"         "structur"     "optim"        "varianc"     
 [71] "exist"        "independ"     "construct"    "introduc"     "smooth"      
 [76] "real"         "theoret"      "compon"       "point"        "methodolog"  
 [81] "investig"     "requir"       "predict"      "standard"     "respons"     
 [86] "establish"    "common"       "empir"        "practic"      "converg"     
 [91] "work"         "maximum"      "term"         "discuss"      "combin"      
 [96] "finit"        "framework"    "design"       "parametr"     "multipl"     
[101] "assum"        "form"         "theori"       "simpl"        "carlo"       
[106] "limit"        "mont"         "lead"         "altern"       "numer"       
[111] "improv"       "local"        "involv"       "high"         "identifi"    
[116] "space"        "techniqu"     "prior"        "level"        "multivari"   
[121] "correl"       "fit"          "semiparametr" "increas"      "unknown"     
[126] "bias"         "small"        "exampl"       "order"        "direct"      
[131] "extend"       "defin"        "matrix"       "coeffici"     "dataset"     
[136] "implement"    "weight"       "control"      "densiti"      "markov"      
[141] "extens"       "adapt"        "evalu"        "relat"        "power"       
[146] "consid"       "analyz"       "robust"       "type"         "result"      
[151] "valu"         "assess"       "vector"       "seri"         "factor"      
[156] "popul"       

 [1] "fals"       "control"    "procedur"   "rate"       "test"      
 [6] "discoveri"  "reject"     "hypothes"   "multipl"    "null"      
[11] "pvalu"      "familywis"  "hochberg"   "fdr"        "stepdown"  
[16] "error"      "kfwer"      "number"     "proport"    "benjamini" 
[21] "fwer"       "depend"     "statist"    "soc"        "divid"     
[26] "fdp"        "roy"        "abil"       "ser"        "alpha"     
[31] "deriv"      "individu"   "total"      "stepup"     "detect"    
[36] "toler"      "attent"     "power"      "gamma"      "defin"     
[41] "singlestep" "conserv"    "probabl"    "construct"  "hypothesi" 
[46] "fix"        "ann"        "simultan"   "restrict"   "usual"     
[51] "increas"    "structur"   "contrast"   "prove"      "goal"      
[56] "implicit"   "replac"     "resampl"    "independ"   "sime"      
[61] "holm"       "improv"     "sens"       "configur"   "stat"      
[66] "stringent"  "intersect"  "bonferroni" "der"        "appl"      
[71] "van"        "deal"       "order"     

 [1] "surviv"       "time"         "hazard"       "censor"       "failur"      
 [6] "studi"        "semiparametr" "proport"      "event"        "cancer"      
[11] "covari"       "data"         "estim"        "risk"         "cox"         
[16] "baselin"      "regress"      "cumul"        "illustr"      "consist"     
[21] "rightcensor"  "trial"        "subject"      "analysi"      "nonparametr" 
[26] "simul"        "equat"        "cohort"       "diseas"       "incid"       
[31] "patient"      "clinic"       "cure"         "recurr"       "compet"      
[36] "associ"       "joint"        "followup"     "frailti"      "timevari"    
[41] "bivari"       "margin"       "lengthbias"   "prostat"      "assumpt"     
[46] "coeffici"     "medic"        "breast"       "extens"       "propos"      

 [1] "simex"              "simulationextrapol" "undersmooth"       
 [4] "error"              "measur"             "asymptot"          
 [7] "accuraci"           "longer"             "bias"              
[10] "principl"           "finitesampl"        "selector"          
[13] "bandwidth"          "wang"               "epidemiolog"       
[16] "cook"               "rootn"              "difficulti"        
[19] "presenc"            "nutrit"             "decreas"           
[22] "compar"             "coverag"            "appropri"          
[25] "simul"              "tractabl"           "need"              
[28] "recommend"          "polynomi"           "engin"             
[31] "chisquar"           "scientist"          "errorpron"         

 [1] "wilk"           "ratio"          "phenomenon"     "correct"       
 [5] "relax"          "power"          "conduct"        "null"          
 [9] "newli"          "freedom"        "unspecifi"      "follow"        
[13] "hypothesi"      "degre"          "ask"            "nuisanc"       
[17] "chisquar"       "test"           "theorem"        "hold"          
[21] "backfit"        "attempt"        "admit"          "constant"      
[25] "demonstr"       "rescal"         "biascorrect"    "answer"        
[29] "zhang"          "scientif"       "fan"            "likelihood"    
[33] "withinsubject"  "pitman"         "asymptot"       "side"          
[37] "share"          "contemporari"   "popular"        "variancecovari"
[41] "singleindex"    "save"           "tau"            "kendal"        
[45] "coverag"       

 [1] "mle"         "maximum"     "likelihood"  "main"        "asymptot"   
 [6] "mles"        "prove"       "converg"     "limit"       "status"     
[11] "estim"       "brownian"    "current"     "motion"      "behavior"   
[16] "rate"        "proof"       "uniqu"       "siev"        "nonparametr"
[21] "ann"         "gap"         "drift"       "naiv"        "global"     
[26] "monoton"     "simpler"     "parametr"    "result"      "discuss"    
[31] "ergod"      

 [1] "varyingcoeffici" "nonparametr"     "linear"          "coeffici"       
 [5] "longitudin"      "conduct"         "vari"            "regress"        
 [9] "partial"         "propos"          "simul"           "backfit"        
[13] "thought"         "illustr"         "enjoy"           "fashion"        
[17] "twostep"         "contamin"        "pose"           

 [1] "rankbas"         "asymptot"        "effici"          "rank"           
 [5] "ellipt"          "cam"             "class"           "uniform"        
 [9] "test"            "densiti"         "version"         "multivari"      
[13] "normal"          "sign"            "valid"           "scatter"        
[17] "symmetri"        "matrix"          "matric"          "assumpt"        
[21] "finit"           "sens"            "ann"             "contour"        
[25] "irrespect"       "tradit"          "rootn"           "moment"         
[29] "actual"          "center"          "strict"          "equivari"       
[33] "gaussian"        "onestep"         "invari"          "finitesampl"    
[37] "concept"         "local"           "serial"          "bernoulli"      
[41] "shape"           "unspecifi"       "classic"         "acceler"        
[45] "respect"         "semiparametr"    "depth"           "null"           
[49] "univari"         "median"          "prespecifi"      "spheric"        
[53] "biometrika"      "distributionfre" "excel"          

 [1] "nconsist"   "root"       "reduct"     "exist"      "central"   
 [6] "direct"     "dimens"     "varianc"    "slice"      "exhaust"   
[11] "contour"    "mild"       "ellipt"     "strong"     "advantag"  
[16] "invers"     "averag"     "asymptot"   "suffici"    "predictor" 
[21] "regress"    "identif"    "subspac"    "guarante"   "space"     
[26] "attack"     "accuraci"   "span"       "plugin"     "synthes"   
[31] "digit"      "squar"      "complement" "normal"     "eas"       
[36] "variat"     "landmark"   "realdata"  

 [1] "null"      "test"      "hypothesi" "distribut" "altern"    "statist"  
 [7] "hypothes"  "power"     "asymptot"  "procedur"  "ratio"     "reject"   
[13] "control"  

 [1] "chain"     "markov"    "mont"      "carlo"     "bayesian"  "posterior"
 [7] "algorithm" "infer"     "prior"     "mcmc"      "model"     "hierarch" 
[13] "sampler"   "mixtur"    "space"    

 [1] "lasso"         "select"        "variabl"       "regress"      
 [5] "coeffici"      "spars"         "penalti"       "adapt"        
 [9] "linear"        "oracl"         "penal"         "sparsiti"     
[13] "problem"       "algorithm"     "regular"       "matrix"       
[17] "nonzero"       "path"          "shrinkag"      "vector"       
[21] "larger"        "absolut"       "high"          "highdimension"
[25] "true"          "method"        "group"         "dimension"    
[29] "nois"          "connect"      

[1] "bar"     "vertic"  "cap"     "lambda"  "beta"    "theta"   "alpha"  
[8] "element"

 [1] "singleindex"  "unknown"      "nonparametr"  "link"         "compon"      
 [6] "equat"        "structur"     "varianc"      "beta"         "smaller"     
[11] "function"     "semiparametr" "econometr"    "achiev"       "femal"       
[16] "compos"       "vectorvalu"   "linear"       "eigenfunct"   "rateoptim"   
[21] "composit"     "isol"         "ball"         "singl"       

 [1] "genet"       "trait"       "loci"        "quantit"     "diseas"     
 [6] "linkag"      "map"         "gene"        "phenotyp"    "pedigre"    
[11] "allel"       "marker"      "popul"       "associ"      "genotyp"    
[16] "locus"       "chromosom"   "frequenc"    "polymorph"   "genom"      
[21] "multipl"     "complex"     "involv"      "domin"       "interact"   
[26] "casecontrol" "haplotyp"    "treat"       "individu"    "nucleotid"  
[31] "unifi"       "singl"       "simultan"    "snp"         "inherit"    
[36] "geneenviron" "distinguish" "suscept"     "dichotom"    "score"      
[41] "mutat"       "aim"         "genomewid"   "member"      "dna"        
[46] "ascertain"   "parent"      "descent"     "crucial"     "arbitrari"  
[51] "retrospect"  "tau"         "softwar"    

 [1] "dichotom"        "outcom"          "exposur"         "genet"          
 [5] "inherit"         "confound"        "interact"        "causal"         
 [9] "trial"           "factor"          "binari"          "presenc"        
[13] "categor"         "assess"          "alcohol"         "continu"        
[17] "disord"          "misspecif"       "ordin"           "clinic"         
[21] "postul"          "trait"           "topic"           "environment"    
[25] "subgroup"        "potenti"         "geneenviron"     "alter"          
[29] "adequ"           "examin"          "adjust"          "intermedi"      
[33] "cancer"          "robin"           "stage"           "logist"         
[37] "arm"             "firststag"       "generic"         "latent"         
[41] "build"           "variabl"         "conduct"         "affect"         
[45] "accommod"        "prone"           "submodel"        "transmiss"      
[49] "mental"          "mediat"          "unspecifi"       "quantit"        
[53] "expos"           "major"           "multipli"        "sever"          
[57] "believ"          "gene"            "zhang"           "distributionfre"
[61] "routin"          "today"          

 [1] "treatment"     "random"        "trial"         "noncompli"    
 [5] "patient"       "assumpt"       "effect"        "adher"        
 [9] "complianc"     "assign"        "depress"       "outcom"       
[13] "causal"        "receiv"        "care"          "placebo"      
[17] "subject"       "intervent"     "clinic"        "improv"       
[21] "primari"       "drug"          "arm"           "treat"        
[25] "dose"          "elder"         "latent"        "princip"      
[29] "analys"        "contrast"      "sever"         "instrument"   
[33] "control"       "particip"      "stratif"       "benefit"      
[37] "physician"     "imperfect"     "encourag"      "prevent"      
[41] "fisher"        "strata"        "prescrib"      "children"     
[45] "activ"         "reason"        "strict"        "rubin"        
[49] "efron"         "behavior"      "educ"          "estimand"     
[53] "plausibl"      "doserespons"   "meet"          "suffer"       
[57] "protocol"      "framework"     "collabor"      "debat"        
[61] "doubleblind"   "potenti"       "blind"         "status"       
[65] "opposit"       "guidelin"      "logic"         "acknowledg"   
[69] "nonrandom"     "import"        "substanti"     "infer"        
[73] "prospect"      "summar"        "heart"         "childhood"    
[77] "subjectspecif" "access"       

 [1] "nonconcav"     "penal"         "select"        "penalti"      
 [5] "oracl"         "variabl"       "regular"       "nondifferenti"
 [9] "fan"           "likelihood"    "challeng"      "sandwich"     
[13] "establish"     "maxim"         "broad"         "find"         
[17] "concav"        "onestep"       "employ"        "encourag"     
[21] "enjoy"         "finit"         "cost"          "distinguish"  
[25] "dramat"        "selector"      "appropri"      "render"       
[29] "conduct"       "heavili"       "possess"       "newli"        
[33] "converg"       "paramet"       "function"      "discontinu"   
[37] "aic"           "algorithm"     "bic"           "encompass"    
[41] "guarante"      "object"        "metropoli"    

 [1] "semiparametr" "estim"        "parametr"     "nonparametr"  "paramet"     
 [6] "asymptot"     "model"        "effici"       "likelihood"   "regress"     
[11] "function"     "normal"       "simul"        "compon"       "achiev"      

 [1] "bandwidth"  "kernel"     "local"      "select"     "smooth"    
 [6] "densiti"    "estim"      "crossvalid" "selector"   "polynomi"  
[11] "choic"      "choos"      "squar"      "bootstrap"  "datadriven"
[16] "version"    "asymptot"   "global"     "chosen"    

 [1] "virus"        "human"        "immunodefici" "hiv"          "infect"      
 [6] "viral"        "transmiss"    "vaccin"       "subject"      "genet"       
[11] "drug"         "develop"      "efficaci"     "mutat"        "outcom"      
[16] "causal"       "cell"         "syndrom"      "medic"        "pathway"     
[21] "resist"       "evolutionari" "therapi"      "pressur"     

 [1] "dropout"       "stratum"       "prevent"       "reduc"        
 [5] "oil"           "trial"         "adjust"        "longitudin"   
 [9] "cancer"        "prostat"       "mechan"        "men"          
[13] "find"          "stratifi"      "arm"           "nuisanc"      
[17] "treatment"     "assign"        "grade"         "doubleblind"  
[21] "avoid"         "colleagu"      "randomeffect"  "sever"        
[25] "verif"         "agent"         "conjectur"     "annual"       
[29] "nonignor"      "placebo"       "volum"         "elect"        
[33] "caus"          "daili"         "visit"         "preval"       
[37] "absolut"       "lie"           "indic"         "sensit"       
[41] "frequent"      "particip"      "year"          "reduct"       
[45] "causal"        "report"        "newtonraphson" "adopt"        
[49] "question"      "women"         "elder"         "surrog"       
[53] "inform"        "elicit"        "prospect"      "collabor"     
[57] "drawn"         "ignor"         "differ"        "link"         
[61] "retain"        "tilt"          "random"        "constraint"   
[65] "status"        "impli"         "doubli"        "expert"       
[69] "nonidentifi"   "intermitt"     "satur"         "sex"          
[73] "characterist"  "invers"       

  [1] "polici"       "statistician" "maker"        "decis"        "scienc"      
  [6] "role"         "technolog"    "today"        "chang"        "live"        
 [11] "bring"        "social"       "communic"     "integr"       "individu"    
 [16] "futur"        "knowledg"     "disciplin"    "nation"       "public"      
 [21] "scientif"     "health"       "activ"        "human"        "impact"      
 [26] "organ"        "inform"       "protect"      "promot"       "qualiti"     
 [31] "understand"   "program"      "way"          "student"      "mathemat"    
 [36] "increas"      "face"         "foundat"      "play"         "essenti"     
 [41] "uncertainti"  "effort"       "engin"        "expect"       "advanc"      
 [46] "confidenti"   "children"     "relev"        "make"         "industri"    
 [51] "govern"       "countri"      "encourag"     "polit"        "place"       
 [56] "modern"       "intern"       "scientist"    "closer"       "benefit"     
 [61] "reflect"      "explor"       "stronger"     "purpos"       "univers"     
 [66] "spread"       "environment"  "network"      "grow"         "forc"        
 [71] "access"       "devic"        "ingredi"      "excel"        "comprehens"  
 [76] "pollut"       "attract"      "broader"      "elementari"   "evolv"       
 [81] "train"        "pressur"      "air"          "option"       "imposs"      
 [86] "secondari"    "map"          "edg"          "success"      "progress"    
 [91] "critic"       "global"       "action"       "year"         "agenc"       
 [96] "communiti"    "american"     "quantit"      "genom"        "system"      
[101] "fundament"    "discoveri"    "evid"         "guarante"     "mortal"      
[106] "address"      "citi"         "requir"       "technic"      "serv"        
[111] "path"         "statist"      "separ"        "climat"       "contribut"   
[116] "opportun"     "adequaci"     "disabl"       "affect"       "driven"      
[121] "grade"        "psycholog"    "diagnost"     "morbid"       "view"        
[126] "delay"        "primari"      "state"       

 [1] "penalis"       "framingham"    "newtonraphson" "heart"        
 [5] "penalti"       "carri"         "conduct"       "failur"       
 [9] "proper"        "advanc"        "costeffect"    "grow"         
[13] "dataset"       "familiar"      "longterm"      "likelihood"   
[17] "prospect"      "assess"        "choleski"      "extens"       
[21] "disabl"        "wang"         

 [1] "nonnorm"         "normal"          "mix"             "linear"         
 [5] "exponenti"       "piecewiselinear" "general"         "abund"          
 [9] "famili"          "examin"         

 [1] "seem"           "unrel"          "spline"         "retail"        
 [5] "credit"         "vehicl"         "dataadapt"      "correl"        
 [9] "knot"           "residu"         "conveni"        "nongaussian"   
[13] "univari"        "allevi"         "leav"           "reversiblejump"
[17] "part"           "neglig"         "difficulti"     "smooth"        
[21] "latent"         "sampler"        "compani"        "abil"          
[25] "wang"           "withinclust"    "smallest"       "consum"        

 [1] "slice"     "invers"    "dimens"    "reduct"    "regress"   "averag"   
 [7] "sir"       "direct"    "central"   "goal"      "respons"   "save"     
[13] "subset"    "method"    "predictor" "subspac"   "varianc"   "preserv"  
[19] "replac"    "suffici"   "systemat" 

 [1] "homoscedast"   "heteroscedast" "varianc"       "transform"    
 [5] "famili"        "multiscal"     "quadrat"       "respect"      
 [9] "poisson"       "regress"       "epidemiolog"   "stabil"       
[13] "wavelet"       "explain"       "contribut"    

 [1] "band"       "confid"     "simultan"   "consid"     "trajectori"
 [6] "extend"     "choos"      "regular"    "asymptot"   "ball"      
[11] "uniform"   

 [1] "administr"      "secondari"      "fda"            "food"          
 [5] "endpoint"       "drug"           "efficaci"       "health"        
 [9] "adjust"         "prevent"        "record"         "separ"         
[13] "agent"          "cardiovascular" "primari"        "instrument"    
[17] "simplifi"       "frequenc"       "dose"           "week"          
[21] "maintain"       "databas"        "deliveri"       "clinic"        
[25] "benefit"        "birth"          "path"           "trial"         
[29] "drastic"        "odd"            "guidanc"        "perspect"      
[33] "intersect"      "guid"           "biomark"        "morbid"        
[37] "emerg"          "fwer"           "serniparametr"  "hour"          
[41] "make"           "stepwis"        "safeti"         "led"           
[45] "nutrit"         "decis"          "describ"        "errorpron"     
[49] "infant"         "serum"          "exemplifi"      "insight"       
[53] "feder"          "advers"         "prospect"       "valid"         
[57] "follow"         "likelihoodbas"  "energi"         "combin"        

 [1] "distort"         "respons"         "unobserv"        "confound"       
 [5] "predictor"       "under"           "adjust"          "serum"          
 [9] "factor"          "magnitud"        "generat"         "alter"          
[13] "intens"          "absent"          "explanatori"     "indirect"       
[17] "likelihoodbas"   "straightforward" "multipl"         "datagener"      
[21] "leastsquar"      "identifi"        "decid"           "stepwis"        
[25] "observ"          "intervent"       "sever"           "relationship"   
[29] "recov"           "system"          "car"             "coeffici"       
[33] "census"          "releas"          "agenc"           "closest"        
[37] "electr"          "shortcom"        "analyst"        

 [1] "motif"       "regul"       "gene"        "dna"         "transcript" 
 [6] "bind"        "sequenc"     "protein"     "factor"      "short"      
[11] "conserv"     "discoveri"   "nucleotid"   "cluster"     "biolog"     
[16] "high"        "site"        "mixtur"      "process"     "call"       
[21] "width"       "genom"       "vari"        "hierarch"    "dirichlet"  
[26] "pattern"     "priori"      "cell"        "strategi"    "organ"      
[31] "databas"     "matric"      "group"       "technolog"   "repres"     
[36] "stochast"    "refin"       "switch"      "substant"    "segment"    
[41] "aid"         "delet"       "similar"     "gibb"        "reduct"     
[46] "regulatori"  "express"     "core"        "find"        "live"       
[51] "yeast"       "composit"    "dictionari"  "accompani"   "appear"     
[56] "missingdata" "genomewid"   "generat"     "principl"    "facilit"    
[61] "recurs"      "background"  "specif"      "chromosom"   "address"    
[66] "wish"        "cycl"        "name"        "understand"  "adjac"      
[71] "variabl"    

[1] "absolut"  "deviat"   "clip"     "oracl"    "progress"

[1] "quantil" "regress"

 [1] "breakdown"  "point"      "robust"     "depth"      "locat"     
 [6] "project"    "equivari"   "finit"      "function"   "possess"   
[11] "contamin"   "competitor" "affin"      "definit"    "introduc"  
[16] "lead"       "induc"      "influenc"   "high"       "outlier"   
[21] "strong"     "trim"       "median"     "region"     "york"      
[26] "scale"      "desir"      "favor"      "turn"       "pursu"     
[31] "enjoy"      "scatter"    "suffic"     "behav"      "uniform"   
[36] "relat"      "comparison" "suggest"    "fact"       "univari"   
[41] "ann"        "radius"    

 [1] "memori"        "seri"          "differenc"     "longmemori"   
 [5] "frequenc"      "long"          "taper"         "fraction"     
 [9] "averag"        "stationari"    "depend"        "periodogram"  
[13] "move"          "whittl"        "slowli"        "nonstationari"
[17] "local"         "process"       "eigenvector"   "angl"         
[21] "paramet"       "period"        "short"         "univari"      
[25] "distinct"      "autoregress"   "volatil"       "fourier"      
[29] "infin"         "longrang"      "delta"         "residu"       
[33] "trim"          "raw"           "log"           "question"     
[37] "break"         "stress"        "know"          "gamma"        
[41] "serniparametr" "subspac"      

 [1] "auxiliari" "survey"    "varianc"   "design"    "popul"     "sampl"    
 [7] "variabl"   "weight"    "calibr"    "designbas" "probabl"   "servic"   
[13] "total"     "finit"     "work"      "feasibl"   "explain"   "miss"     

 [1] "lin"           "addit"         "transplant"    "bone"         
 [5] "work"          "carrol"        "registri"      "intern"       
 [9] "termin"        "multist"       "complic"       "serv"         
[13] "progress"      "transit"       "death"         "domin"        
[17] "backfit"       "implicit"      "largesampl"    "longer"       
[21] "inconsist"     "withinsubject" "withinclust"   "margin"       

 [1] "taper"       "approxim"    "matrix"      "gaussian"    "consist"    
 [6] "spars"       "oper"        "spatial"     "covari"      "block"      
[11] "requir"      "balanc"      "norm"        "precipit"    "station"    
[16] "weather"     "technic"     "manipul"     "matern"      "infeas"     
[21] "multipli"    "wild"        "simpli"      "eigenvector" "sever"      
[26] "onestep"     "resampl"     "oil"         "lose"        "expans"     
[31] "finitesampl" "emphasi"    

[1] "finitesampl" "propos"      "properti"    "simul"      

 [1] "wavelet"     "adapt"       "besov"       "minimax"     "threshold"  
 [6] "rang"        "ball"        "nois"        "wide"        "rate"       
[11] "unknown"     "smooth"      "risk"        "bound"       "function"   
[16] "deconvolut"  "problem"     "white"       "converg"     "signal"     
[21] "recov"       "gaussian"    "transform"   "noisi"       "view"       
[26] "blur"        "discret"     "shape"       "invers"      "spars"      
[31] "densiti"     "nearoptim"   "convolut"    "fourier"     "upper"      
[36] "decay"       "chosen"      "block"       "basi"        "dens"       
[41] "attain"      "waveletbas"  "continu"     "mathemat"    "counterpart"
[46] "physic"      "possess"     "lower"       "global"      "achiev"     
[51] "boundari"    "distinct"    "belong"      "domin"       "estim"      
[56] "place"      

 [1] "forecast"      "predict"       "weather"       "northwest"    
 [5] "spatial"       "probabilist"   "pacif"         "calibr"       
 [9] "wind"          "meteorolog"    "hour"          "temperatur"   
[13] "speed"         "atmospher"     "energi"        "north"        
[17] "center"        "geostatist"    "event"         "futur"        
[21] "averag"        "ensembl"       "american"      "tempor"       
[25] "accur"         "resourc"       "precipit"      "daili"        
[29] "state"         "sharp"         "qualiti"       "site"         
[33] "spacetim"      "generat"       "transport"     "concentr"     
[37] "season"        "climat"        "regim"         "shortterm"    
[41] "numer"         "determinist"   "ozon"          "input"        
[45] "climatolog"    "previous"      "output"        "parsimoni"    
[49] "perturb"       "geograph"      "period"        "trend"        
[53] "correl"        "vari"          "break"         "favor"        
[57] "quantit"       "laplac"        "caus"          "merg"         
[61] "safeti"        "station"       "agricultur"    "accumul"      
[65] "oppos"         "benefit"       "vast"          "global"       
[69] "stateoftheart" "featur"        "system"        "activ"        
[73] "dispers"       "simpler"       "decad"         "organ"        
[77] "crossvalid"    "member"       

 [1] "spacetim"       "spatial"        "fit"            "year"          
 [5] "site"           "separ"          "intens"         "california"    
 [9] "thin"           "process"        "monitor"        "residu"        
[13] "tempor"         "activ"          "multidimension" "occurr"        
[17] "space"          "background"     "appear"         "origin"        
[21] "smoother"       "irregular"      "earthquak"      "indic"         
[25] "asymmetr"       "trend"          "hazard"         "spectral"      
[29] "symmetr"        "environment"    "ozon"           "wind"          
[33] "meteorolog"     "daili"          "allow"          "rescal"        
[37] "season"         "time"           "anisotrop"      "cross"         
[41] "insid"          "bear"           "arbitrari"      "autoregress"   
[45] "interact"       "magnitud"       "sequenc"        "homogen"       
[49] "widespread"     "sphere"         "coordin"        "highlight"     
[53] "elabor"         "extrem"         "ascertain"      "forest"        
[57] "counti"         "rotat"          "month"          "threat"        
[61] "govern"         "secondari"      "aic"            "account"       
[65] "aid"            "emphas"         "routin"         "assess"        
[69] "departur"       "rare"          

 [1] "inhomogen"   "intens"      "spatial"     "process"     "poisson"    
 [6] "point"       "thin"        "stationari"  "function"    "firstord"   
[11] "efficaci"    "secondord"   "caus"        "infecti"     "network"    
[16] "infect"      "transmiss"   "respiratori" "environ"     "epidem"     
[21] "unrealist"   "lend"        "syndrom"     "hospit"      "emphasi"    
[26] "unusu"       "paid"        "peak"       

 [1] "garch"         "process"       "seri"          "volatil"      
 [5] "stationari"    "paper"         "heteroscedast" "moment"       
 [9] "autoregress"   "local"         "financi"       "condit"       
[13] "standard"      "move"          "averag"        "sequenc"      
[17] "mont"          "carlo"         "innov"         "satisfi"      
[21] "iid"           "root"          "time"          "forecast"     
[25] "nonstationari" "fourth"        "capabl"        "residu"       
[29] "return"        "rescal"        "exponenti"     "exchang"      
[33] "reparameter"   "arma"          "ergod"         "homogen"      
[37] "simpli"        "normal"        "explain"       "uniqu"        
[41] "exist"        

 [1] "withinclust"   "cluster"       "correl"        "account"      
 [5] "frequent"      "frailti"       "varianc"       "carri"        
 [9] "arbitrari"     "abil"          "achiev"        "hormon"       
[13] "generalis"     "tackl"         "characteris"   "evalu"        
[17] "simplic"       "fashion"       "closedform"    "noninform"    
[21] "hamper"        "intuit"        "dementia"      "birth"        
[25] "errorpron"     "ill"           "copula"        "withinsubject"

[1] "polynomi"    "local"       "smooth"      "regress"     "nonparametr"
[6] "asymptot"    "spline"     

 [1] "elect"        "vote"         "poll"         "presidenti"   "evid"        
 [6] "candid"       "polit"        "count"        "station"      "proport"     
[11] "forecast"     "nonrespons"   "elimin"       "prefer"       "counti"      
[16] "scientist"    "permit"       "lower"        "incom"        "fisher"      
[21] "york"         "record"       "heterogen"    "purpos"       "respond"     
[26] "percentag"    "particip"     "quick"        "transfer"     "week"        
[31] "spatiotempor" "evolut"       "california"   "histor"       "krige"       
[36] "list"         "appar"        "outcom"       "invalid"      "nonignor"    
[41] "publish"      "nonrespond"  

 [1] "survey"      "nonrespons"  "census"      "nation"      "respond"    
 [6] "imput"       "popul"       "health"      "race"        "bureau"     
[11] "nonignor"    "unit"        "respons"     "item"        "incom"      
[16] "miss"        "person"      "year"        "state"       "bias"       
[21] "employ"      "higher"      "valu"        "sensit"      "interview"  
[26] "labor"       "nonrespond"  "age"         "feder"       "collect"    
[31] "measur"      "handl"       "assess"      "report"      "level"      
[36] "counti"      "domain"      "preval"      "agenc"       "confidenti" 
[41] "benchmark"   "incorpor"    "protect"     "status"      "cell"       
[46] "earn"        "produc"      "sourc"       "relat"       "weight"     
[51] "propens"     "public"      "household"   "area"        "geograph"   
[56] "nutrit"      "document"    "lower"       "plan"        "bodi"       
[61] "gender"      "extrapol"    "preliminari" "birth"       "polit"      
[66] "correct"     "american"    "proxi"       "requir"      "previous"   
[71] "children"    "york"        "unemploy"    "death"      

 [1] "jackknif"  "file"      "replic"    "varianc"   "inconsist" "strata"   
 [7] "analyt"    "unbias"    "met"       "domain"    "schedul"   "freedom"  
[13] "survey"    "attain"    "balanc"    "mix"       "ensur"     "public"   
[19] "repeat"    "upper"     "bootstrap" "uncondit"  "plausibl"  "person"   
[25] "pseudo"    "concern"   "linkag"   

[1] "variancecovari"  "matrix"          "analyz"          "respect"        
[5] "quasilikelihood" "criterion"       "coin"            "efron"          

[1] "root"     "squar"    "approxim"

[1] "maximum"    "likelihood" "estim"      "paramet"   

 [1] "pca"           "princip"       "compon"        "matrix"       
 [5] "eigenvector"   "size"          "dimension"     "reduct"       
 [9] "eigenvalu"     "analysi"       "spike"         "perturb"      
[13] "logp"          "succeed"       "transit"       "dimens"       
[17] "maxim"         "highdimension" "set"           "sampl"        
[21] "threshold"     "nonzero"       "oil"           "direct"       
[25] "critic"        "sophist"       "recov"         "hold"         
[29] "sharp"         "larger"        "theorem"       "relax"        
[33] "high"          "diagon"        "overlap"       "domin"        
[37] "success"       "geometr"       "regim"         "tractabl"     
[41] "popul"         "ill"           "behav"         "extract"      
[45] "exhibit"       "support"       "tool"          "crossov"      
[49] "sudden"        "track"         "lose"          "infinit"      
[53] "evolutionari"  "tree"          "complex"       "largest"      
[57] "phenomenon"    "program"       "describ"       "nonasymptot"  
[61] "branch"        "topolog"       "row"           "embed"        
[65] "euclidean"     "geodes"        "anim"          "nois"         
[69] "machin"        "phase"         "speci"         "twoway"       
[73] "rise"         

 [1] "eigenfunct"  "function"    "princip"     "compon"      "random"     
 [6] "analysi"     "smooth"      "eigenvalu"   "data"        "curv"       
[11] "spars"       "space"       "trajectori"  "score"       "noisi"      
[16] "deriv"       "lead"        "sampl"       "longitudin"  "eigenvector"
[21] "expans"      "impact"      "elucid"      "decomposit"  "firstord"   
[26] "repres"      "differenti"  "measur"      "dynam"       "intrins"    
[31] "similar"     "plan"       

 [1] "pathway"       "biolog"        "pattern"       "presenc"      
 [5] "gene"          "latent"        "viral"         "initi"        
 [9] "biomark"       "understand"    "protein"       "pronounc"     
[13] "infect"        "therapi"       "supplementari" "quantifi"     
[17] "concentr"      "chemic"        "tackl"         "incorrect"    
[21] "healthi"       "identifi"      "molecular"     "human"        
[25] "serum"         "hormon"        "investig"      "experiment"   
[29] "search"        "status"        "sort"          "drug"         
[33] "inflat"        "pertin"        "mediat"        "mutat"        
[37] "resist"        "absent"        "blood"         "exemplifi"    
[41] "valuabl"       "phenotyp"      "led"           "indic"        
[45] "subsequ"       "format"        "framework"    

[1] "establish" "asymptot"  "consist"   "converg"  

 [1] "classifi"        "classif"         "discrimin"       "distancebas"    
 [5] "vector"          "centroid"        "support"         "machin"         
 [9] "theoret"         "popul"           "featur"          "rule"           
[13] "poor"            "popular"         "produc"          "distanc"        
[17] "method"          "highdimension"   "accumul"         "varieti"        
[21] "heavytail"       "differ"          "diverg"          "nearest"        
[25] "train"           "median"          "difficulti"      "spectra"        
[29] "componentwis"    "replac"          "excess"          "convent"        
[33] "frequent"        "truncat"         "boundari"        "counterpart"    
[37] "insensit"        "encount"         "closest"         "entail"         
[41] "case"            "allevi"          "problemat"       "today"          
[45] "argument"        "euclidean"       "inconsist"       "caus"           
[49] "straightforward" "neighbour"       "suffer"          "anneal"         
[53] "attempt"         "perform"         "misclassif"      "alloc"          
[57] "volatil"         "believ"          "explor"          "help"           
[61] "inher"           "explos"          "earthquak"       "base"           
[65] "consequ"         "achiev"          "jin"             "kullbackleibl"  
[69] "contemporari"    "construct"       "drawback"        "tstatist"       

 [1] "robin"      "miss"       "zhao"       "rotnitzki"  "effici"    
 [6] "random"     "casecohort" "weight"     "invers"     "twophas"   
[11] "cohort"     "biometrika" "design"     "prentic"    "causal"    
[16] "purpos"     "lemma"      "exemplifi"  "unbias"     "mar"       
[21] "suit"       "amer"       "assoc"      "proceed"    "summar"    
[26] "ser"        "soc"        "roy"        "calcul"     "iid"       
[31] "appear"     "cox"        "imput"      "visit"      "ann"       
[36] "augment"    "percentag"  "schedul"    "direct"     "unbalanc"  
[41] "mediat"     "day"        "embed"      "mental"     "equat"     
[46] "nice"       "month"     

[1] "bootstrap" "confid"    "distribut" "sampl"     "interv"    "method"   
[7] "correct"   "seri"      "empir"    

 [1] "norm"          "matrix"        "frobenius"     "rank"         
 [5] "matric"        "nuclear"       "bound"         "regular"      
 [9] "optim"         "low"           "highdimension" "nonasymptot"  
[13] "convex"        "minimax"       "noisi"         "spars"        
[17] "vector"        "singular"      "element"       "error"        
[21] "minim"         "setup"         "predict"       "autoregress"  
[25] "recoveri"      "theori"        "trace"         "obtain"       
[29] "decomposit"    "class"         "excel"         "mean"         
[33] "lower"         "instanc"       "yield"         "sharp"        
[37] "agreement"     "precis"        "mestim"        "complementari"
[41] "lowdimension"  "entri"         "analyz"        "oper"         
[45] "meansquar"     "relax"         "hold"          "determinist"  
[49] "observ"        "condit"        "autocovari"    "decompos"     
[53] "notion"        "stay"          "restrict"      "stronger"     
[57] "krige"        

 [1] "minimax"  "rate"     "densiti"  "optim"    "unknown"  "adapt"   
 [7] "loss"     "class"    "prove"    "sens"     "converg"  "problem" 
[13] "bound"    "estim"    "risk"     "vector"   "set"      "gaussian"
[19] "lower"   

 [1] "imag"     "magnet"   "reson"    "field"    "brain"    "fmri"    
 [7] "activ"    "voxel"    "signal"   "detect"   "locat"    "volum"   
[13] "accur"    "follow"   "task"     "motion"   "region"   "visual"  
[19] "identifi" "exploit"  "tissu"    "aim"      "contigu"  "map"     
[25] "rotat"    "neuron"  

 [1] "ozon"            "maxima"          "splinebas"       "nonlinear"      
 [5] "piecewiselinear" "concentr"        "pressur"         "cycl"           
 [9] "transport"       "variat"          "contribut"       "atmospher"      
[13] "peak"            "trend"           "measur"          "basi"           
[17] "evid"            "instrument"      "thought"         "greater"        
[21] "link"            "scientif"        "lag"             "dimensionreduct"
[25] "absenc"          "wave"            "global"          "separ"          
[29] "month"           "coincid"         "influenc"        "lowdimension"   
[33] "clear"           "contrast"        "lower"           "year"           
[37] "site"            "qualiti"         "profil"          "sequenc"        
[41] "sensit"          "origin"          "relat"           "presenc"        
[45] "satellit"        "partial"         "pattern"         "identifi"       

 [1] "experienc" "event"     "deterior"  "trial"     "aberr"     "patient"  
 [7] "import"    "die"       "protocol"  "benefici"  "rank"      "treatment"
[13] "mention"   "wilcoxon"  "receiv"    "children"  "aspect"    "consequ"  
[19] "exact"     "preserv"   "fisher"    "placebo"   "sort"      "magnitud" 
[25] "longer"    "medic"     "exposur"   "adequ"     "discard"   "greatest" 
[31] "fact"      "need"      "invert"    "substanti" "subsequ"   "tabl"     
[37] "remov"     "exhibit"   "way"       "basic"     "singl"     "health"   
[43] "aim"       "care"      "interv"    "complet"   "specif"    "sum"      
[49] "question"  "cubic"     "cancer"    "situat"    "extrem"    "splinebas"
[55] "outcom"    "treat"     "rotat"     "control"   "binari"    "effect"   

 [1] "bspline"   "kernel"    "tackl"     "represent" "spline"    "penal"    
 [7] "tempor"    "proceed"   "splinebas" "truncat"   "solut"     "rigor"    
[13] "account"  

 [1] "integ"      "algebra"    "appl"       "ail"        "finit"     
 [6] "classic"    "countabl"   "coher"      "multist"    "util"      
[11] "math"       "ident"      "system"     "call"       "fewer"     
[16] "state"      "ideal"      "grid"       "object"     "binari"    
[21] "posit"      "pure"       "geometri"   "probabl"    "inequ"     
[26] "comprehens" "alpha"      "socal"      "repres"     "idea"      
[31] "failur"     "yield"      "type"       "relat"      "bernoulli" 
[36] "bound"      "probab"     "compon"     "superposit" "complex"   

 [1] "electr"        "forecast"      "renew"         "bivari"       
 [5] "load"          "market"        "daili"         "power"        
 [9] "serial"        "shortterm"     "wind"          "autoregress"  
[13] "diagon"        "speed"         "time"          "season"       
[17] "focus"         "difficult"     "peak"          "spectrum"     
[21] "temperatur"    "regressor"     "heteroscedast" "firstord"     
[25] "total"         "highlight"     "energi"        "justifi"      
[29] "simpl"         "week"          "vari"          "hour"         
[33] "trend"         "citi"          "recogn"        "stationari"   
[37] "autocovari"    "detail"        "promis"        "realiti"      
[41] "favor"         "reveal"        "year"          "longmemori"   
[45] "gain"          "accuraci"      "exploit"       "predict"      
[49] "option"        "reliabl"       "price"         "evolut"       
[53] "avail"         "superpopul"   

 [1] "highfrequ"       "financi"         "asset"           "volatil"        
 [5] "price"           "lowfrequ"        "exchang"         "dynam"          
 [9] "stock"           "matrix"          "daili"           "period"         
[13] "nois"            "realiz"          "pool"            "market"         
[17] "matric"          "infin"           "diffus"          "return"         
[21] "day"             "trade"           "captur"          "forecast"       
[25] "vast"            "overcom"         "variat"          "hundr"          
[29] "pertin"          "dimensionreduct" "econom"          "iii"            
[33] "alloc"           "noisi"           "industri"        "zhang"          
[37] "guidanc"         "merit"           "adequ"           "size"           
[41] "highdimension"   "fan"             "eigenvector"     "option"         
[45] "wavelet"         "built"           "avail"          

 [1] "day"        "daili"      "record"     "time"       "financi"   
 [6] "activ"      "short"      "peak"       "consecut"   "help"      
[11] "autocovari" "appropri"   "intens"     "physic"     "character" 
[16] "measur"     "children"   "trade"      "strength"   "scalar"    
[21] "superposit" "incomplet"  "copi"      

[1] "secondord"   "firstord"    "accur"       "expans"      "unbias"     
[6] "moment"      "approxim"    "frequentist" "exact"      

 [1] "treatment"  "assign"     "causal"     "score"      "outcom"    
 [6] "propens"    "averag"     "effect"     "grade"      "school"    
[11] "potenti"    "stratif"    "promot"     "confound"   "rubin"     
[16] "student"    "unit"       "regim"      "educ"       "adjust"    
[21] "children"   "plausibl"   "polici"     "program"    "evid"      
[26] "pretreat"   "posttreat"  "summar"     "stage"      "child"     
[31] "intermedi"  "assumpt"    "retain"     "multilevel" "block"     
[36] "econom"     "experiment" "stabl"      "arbitrari"  "nation"    
[41] "articl"     "balanc"     "learn"      "perspect"   "status"    
[46] "unmeasur"   "fewer"      "scalar"     "affect"     "low"       
[51] "mathemat"   "track"      "twostag"    "covari"     "tradeoff"  
[56] "recov"      "nonrandom"  "bind"       "pose"       "estimand"  
[61] "impos"      "feasibl"    "return"    

 [1] "extrapol"      "errorpron"     "posttreat"     "instrument"   
 [5] "classic"       "baselin"       "replic"        "subsampl"     
 [9] "nonlinear"     "daili"         "summari"       "air"          
[13] "encount"       "subset"        "bias"          "efficaci"     
[17] "heteroscedast" "frequenc"      "trajectori"    "spheric"      
[21] "supplementari" "correct"       "multiscal"     "scatter"      
[25] "reconstruct"   "subject"       "error"         "temperatur"   

 [1] "admiss"       "inadmiss"     "loss"         "bay"          "risk"        
 [6] "endpoint"     "action"       "ann"          "accept"       "math"        
[11] "genom"        "screen"       "stringent"    "result"       "complet"     
[16] "stepup"       "character"    "formul"       "treat"        "pearson"     
[21] "amer"         "assoc"        "biometrika"   "prototyp"     "vector"      
[26] "pay"          "reject"       "decad"        "revisit"      "metaanalysi" 
[31] "criteria"     "effort"       "bioassay"     "thought"      "hard"        
[36] "psycholog"    "nonneg"       "predetermin"  "fals"         "energi"      
[41] "earlier"      "educ"         "hoc"          "stein"        "emerg"       
[46] "fair"         "dna"          "appeal"       "sign"         "singlestep"  
[51] "drug"         "microarray"   "statistician" "jeffrey"      "year"        
[56] "fewer"        "fisher"       "paper"        "resembl"      "paradox"     
[61] "share"        "twodimension" "nonzero"      "stepdown"     "seek"        
[66] "expect"      

[1] "coeffici" "regress"  "linear"   "vari"    

 [1] "unbound"   "novelti"   "function"  "yield"     "oracl"     "tail"     
 [7] "decreas"   "satisfi"   "anisotrop" "inequ"     "median"    "slower"   
[13] "literatur" "bivari"    "free"      "vast"      "fast"      "input"    
[19] "setup"     "output"    "aggreg"    "aforement" "behav"     "influenti"
[25] "iii"       "bound"     "univers"   "main"      "nuclear"   "radius"   
[31] "need"      "tilt"      "hyperplan" "higherord" "symmetri"  "equivari" 
[37] "gee"       "scatter"   "bin"       "quadrat"  

 [1] "wishart"     "graph"       "cone"        "graphic"     "famili"     
 [6] "matric"      "conjug"      "gaussian"    "matrix"      "covari"     
[11] "decompos"    "prior"       "paramet"     "edg"         "definit"    
[16] "paper"       "homogen"     "space"       "correspond"  "posit"      
[21] "standard"    "shape"       "form"        "ann"         "miss"       
[26] "zero"        "equal"       "eigenvalu"   "dimens"      "close"      
[31] "respect"     "invers"      "sigma"       "chisquar"    "distinct"   
[36] "flexibl"     "margin"      "precis"      "bay"         "undirect"   
[41] "fix"         "refer"       "direct"      "constant"    "acycl"      
[46] "satisfi"     "expect"      "encod"       "entri"       "enrich"     
[51] "accept"      "phi"         "scalabl"     "omega"       "nonhomogen" 
[56] "probab"      "euclidean"   "dual"        "read"        "restrict"   
[61] "centr"       "characteris" "deep"        "tangent"     "fourth"     
[66] "perfect"    

 [1] "schedul"         "longitudin"      "followup"        "analys"         
 [5] "phase"           "generat"         "incomplet"       "flexibl"        
 [9] "respons"         "avail"           "ill"             "unbalanc"       
[13] "pursu"           "offer"           "enter"           "resourc"        
[17] "impact"          "merg"            "concret"         "intermitt"      
[21] "interim"         "preced"          "perfect"         "divid"          
[25] "maker"           "face"            "preliminari"     "fluctuat"       
[29] "missingatrandom" "versatil"        "alloc"           "timetoev"       
[33] "withinsubject"   "compromis"       "manag"           "metropoli"      
[37] "missingdata"     "walk"            "logrank"        

[1] "real"    "simul"   "data"    "illustr"

 [1] "underestim"    "overestim"     "lemma"         "abrupt"       
 [5] "respect"       "identif"       "stein"         "admit"        
 [9] "moder"         "satisfi"       "moment"        "nontrivi"     
[13] "iid"           "impli"         "detail"        "deviat"       
[17] "decay"         "loglikelihood" "benchmark"     "nest"         
[21] "yield"         "posit"         "deal"          "difficulti"   
[25] "mild"          "prove"         "exponenti"     "relat"        
[29] "version"       "preliminari"   "nation"        "ratio"        
[33] "populationbas" "order"         "specif"       

 [1] "chi"           "test"          "distribut"     "space"        
 [5] "ratio"         "restrict"      "statist"       "conveni"      
 [9] "tail"          "goodnessoffit" "pearson"      

[1] "size"   "sampl"  "number" "small"  "larg"  

[1] "misspecifi" "robust"     "misspecif" 

 [1] "climat"      "temperatur"  "chang"       "greenhous"   "global"     
 [6] "earth"       "uncertainti" "northern"    "atmospher"   "quantifi"   
[11] "trend"       "increas"     "reconstruct" "averag"      "region"     
[16] "separ"       "tempor"      "concentr"    "surfac"      "longterm"   
[21] "pollut"      "period"      "centuri"     "opposit"     "tree"       
[26] "gas"         "creat"       "purpos"      "futur"       "record"     
[31] "remot"       "understand"  "radiat"      "emiss"       "proxi"      
[36] "histor"      "air"         "ecolog"      "forest"      "magnitud"   
[41] "massiv"      "cloud"       "gather"      "forc"        "weather"    
[46] "synthet"     "actual"      "pattern"     "expert"      "extern"     
[51] "current"     "quantif"     "agreement"   "institut"    "act"

[1] "diseas"   "individu" "level"

[1] "model"

[1] "estim"

[1] "time" "seri"

[1] "control"   "fals"      "multipl"   "discoveri" "hypothes"  "fdr"      
[7] "reject"    "rate"      "pvalu"    

[1] "risk"      "bound"     "adapt"     "threshold"

[1] "vector"   "classif"  "classifi"

[1] "space" "shape"

[1] "time"    "surviv"  "hazard"  "event"   "censor"  "failur"  "proport"

[1] "popul"  "survey" "weight"

[1] "data"

[1] "treatment" "outcom"    "causal"    "assign"   

[1] "articl"

[1] "bayesian"  "posterior"

[1] "trial"     "patient"   "clinic"    "treatment"

[1] "statist"

[1] "select"

[1] "gene"       "express"    "microarray" "differenti"

[1] "process"

[1] "matrix"

[1] "predict"

[1] "method"

[1] "carlo"  "mont"   "markov" "chain" 

[1] "function"

[1] "algorithm"

[1] "test"

[1] "problem"

[1] "fit"

[1] "point"

[1] "likelihood" "maximum"   

[1] "confid"    "interv"    "construct" "bootstrap"

[1] "comput"

[1] "bias"

[1] "optim"

[1] "prior"

[1] "distribut"

[1] "paper"

[1] "propos"

[1] "observ"

[1] "densiti"

[1] "smooth"

[1] "rate"    "converg"

[1] "structur"

[1] "analysi"

[1] "design"

[1] "requir" "direct"

[1] "paramet"

[1] "random"

[1] "local" "deriv"

[1] "case"

[1] "robust"

[1] "respons"   "predictor"

[1] "develop"

[1] "studi"

[1] "properti"

[1] "measur"

[1] "empir"

[1] "assumpt"

[1] "number"

[1] "error"

[1] "simul"

[1] "general"

[1] "limit"

[1] "compar"

[1] "asymptot"

[1] "larg"

[1] "variabl"

[1] "probabl"

[1] "condit"

[1] "sampl"

[1] "infer"

[1] "compon"  "princip"

[1] "null"      "hypothesi" "altern"   

[1] "linear"

[1] "effect"

[1] "provid" "addit" 

[1] "size"

[1] "depend"

[1] "effici"

[1] "regress"

[1] "perform"

[1] "procedur"

[1] "class"

[1] "correl"

[1] "illustr"

[1] "approach"

[1] "applic"

[1] "consist"

[1] "coeffici"

[1] "cluster"

[1] "independ"

[1] "set"

[1] "nonparametr"

[1] "base"

[1] "semiparametr" "parametr"    

[1] "standard"

[1] "appli"

[1] "normal"

[1] "covari"

[1] "varianc"

[1] "level"    "popul"    "individu"

[1] "estim"

[1] "time"   "surviv" "hazard" "event"  "censor" "failur"

[1] "model"

[1] "space"  "dimens" "shape" 

[1] "control"   "fals"      "discoveri" "multipl"   "rate"      "fdr"      
[7] "hypothes"  "reject"    "pvalu"    

[1] "data"

[1] "time" "seri"

[1] "treatment" "trial"     "patient"  

[1] "statist"

[1] "articl"

[1] "gene"       "express"    "microarray"

[1] "outcom"

[1] "select"

[1] "risk"  "bound" "loss" 

[1] "matrix" "vector"

[1] "bayesian"

[1] "predict"

[1] "process"

[1] "adapt"

[1] "method"

[1] "test"

[1] "problem"

[1] "algorithm"

[1] "design"

[1] "function"

[1] "point"

[1] "likelihood" "maximum"   

[1] "fit"   "model"

[1] "distribut"

[1] "bias"

[1] "prior"

[1] "perform"

[1] "observ"

[1] "studi"

[1] "propos"

[1] "number"

[1] "properti"

[1] "confid"    "interv"    "construct" "bootstrap"

[1] "sampl"

[1] "size"

[1] "analysi"

[1] "rate"    "converg"

[1] "random"

[1] "probabl"

[1] "optim"

[1] "paramet"

[1] "structur"

[1] "comput"

[1] "respons"   "predictor"

[1] "smooth"

[1] "develop"

[1] "markov" "chain" 

[1] "assumpt"

[1] "densiti"

[1] "paper"

[1] "case"

[1] "empir"

[1] "error"

[1] "requir"

[1] "exist"    "demonstr"

[1] "general"

[1] "effici"

[1] "asymptot"

[1] "compon"  "princip"

[1] "measur"

[1] "simul"

[1] "effect"

[1] "condit"

[1] "local"

[1] "variabl"

[1] "infer"

[1] "procedur"

[1] "limit"

[1] "class"

[1] "linear"

[1] "provid"

[1] "regress"

[1] "null"      "hypothesi" "altern"   

[1] "approach"

[1] "base"

[1] "consist"

[1] "correl"

[1] "independ"

[1] "applic"

[1] "carlo" "mont" 

[1] "depend"

[1] "illustr"

[1] "set"

[1] "normal"

[1] "deriv"

[1] "semiparametr" "parametr"    

[1] "appli"

[1] "approxim"

[1] "coeffici"

[1] "cluster"

[1] "nonparametr"

[1] "covari"

[1] "standard"

[1] "varianc"

It turns out that the flash fit (with pseudocount 1) is actually a better fit by Frobenius norm than the maximum likelihood fit! Maybe the greedy approach of flash is helping it find better solutions? In general, though, these plots do not show a very close correspondence between the data and the fit.

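  # Scatter the transformed data against the flash fitted values, using the
  # same random subsample of 100,000 entries for every fit
  fv = fitted(fit.nn.s.10)
  sub = sample(1:length(fv), 100000)
  plot(lmat_s_10[sub], fv[sub], main = "flash fit (pseudocount 10)")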

  fv= fitted(fit.nn.s.1)
  plot(lmat_s_1[sub],fv[sub],main="flash fit (pseudocount 1)")

  fv= fitted(fit.nn.s.01)
  plot(lmat_s_01[sub],fv[sub],main="flash fit (pseudocount 0.1)")

  fv= fitted(fit.nn.s.001)
  plot(lmat_s_001[sub],fv[sub],main="flash fit (pseudocount 0.01)")

  fv= %*% (*
  plot(lmat_s_1[sub],fv[sub], main = "mle fit")

  mean(( %*% (*^2)
[1] 0.0316755
[1] 0.01768353
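For reference, the numbers printed above are mean squared differences between the transformed data and the fitted values, i.e. the squared Frobenius norm of the residual divided by the number of entries. The sketch below spells this out with a hypothetical helper `mse_fit`; the toy count matrix and the two simple rank-1 fits are stand-ins for the real data, `fitted(fit.nn.s.1)` and the MLE fitted values, not the actual objects.

```r
# Hypothetical helper: mean squared error between a data matrix X (a base
# matrix or a Matrix object) and a matrix of fitted values fv. This equals the
# squared Frobenius norm of the residual divided by the number of entries.
mse_fit = function(X, fv) {
  stopifnot(all(dim(X) == dim(fv)))
  mean((as.matrix(X) - fv)^2)
}

# Toy stand-in for the data: a small count matrix transformed as log(1 + X/c)
# with pseudocount c = 1
set.seed(1)
counts = matrix(rpois(20, 0.5), nrow = 4)
X = log(1 + counts)

# Two toy rank-1 "fits": row means (one value per document) and the grand mean
fv1 = outer(rowMeans(X), rep(1, ncol(X)))
fv2 = matrix(mean(X), nrow = nrow(X), ncol = ncol(X))

mse_fit(X, fv1)
mse_fit(X, fv2)
```

With the real objects this would be something like `mse_fit(lmat_s_1, fitted(fit.nn.s.1))`; note that `as.matrix` makes a dense copy, which can be large for the full document-by-word matrix.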

Comparing the fits

It is hard to go through all the different keyword lists, so I tried comparing fits pairwise. The idea is to focus on the factors found by one fit but not the other when deciding which fit you prefer.
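The matching rule can be illustrated on toy loadings. This is a minimal sketch, assuming each fit's word loadings are stored as a words-by-factors matrix (as `F_pm` is in a flashier fit); the helper name `fit_specific` and the simulated matrices are hypothetical. A factor is called fit-specific when its largest correlation with any factor of the other fit is below 0.9.

```r
# Hypothetical helper: flag the factors in each of two words-by-factors
# loading matrices that have no counterpart in the other matrix, i.e. whose
# largest correlation with any factor of the other fit is below `thresh`
fit_specific = function(F1, F2, thresh = 0.9) {
  cc = cor(F1, F2)                      # K1 x K2 matrix of correlations
  list(spec1 = which(apply(cc, 1, max) < thresh),
       spec2 = which(apply(cc, 2, max) < thresh))
}

# Toy loadings for 50 words: both fits share factors a and b (up to small
# noise and a different column order); fit 1 also has c1, fit 2 also has d
set.seed(42)
n_words = 50
a = rexp(n_words); b = rexp(n_words); c1 = rexp(n_words); d = rexp(n_words)
F1 = cbind(a, b, c1)
F2 = cbind(b + rnorm(n_words, sd = 0.05), d, a + rnorm(n_words, sd = 0.05))

res = fit_specific(F1, F2)
res  # spec1 flags factor 3 (c1), spec2 flags factor 2 (d)
```

Because only the maximum over the other fit's factors matters, the column order of the two fits is irrelevant.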

First I compare pseudocount 1 and 10:

cc = cor(fit.nn.s.1$F_pm,fit.nn.s.10$F_pm)
[1] 26

Next, see which factors are specific to one fit (no factor in the other fit has correlation 0.9 or higher with them):

spec1 = apply(cc,1,max)<0.9
spec2 = apply(cc,2,max)<0.9
 [1] "treatment" "trial"     "random"    "assign"    "patient"   "effect"   
 [7] "outcom"    "clinic"    "causal"    "placebo"   "assumpt"  

[1] "surviv" "time"   "hazard" "censor" "failur" "studi" 

[1] "wilk"

[1] "rankbas"  "effici"   "asymptot" "rank"    

[1] "varyingcoeffici"

[1] "depth"   "project"

[1] "markov"    "chain"     "mont"      "carlo"     "algorithm"

[1] "penal"      "nonconcav"  "likelihood" "select"     "variabl"   
[6] "oracl"      "penalti"    "regular"   

[1] "spline" "smooth"

[1] "survey" "popul"  "sampl" 

[1] "equivari"  "affin"     "matrix"    "introduc"  "breakdown" "concept"  
[7] "scatter"  

[1] "onestep"

[1] "process"    "thin"       "point"      "fit"        "spatial"   
[6] "residu"     "stationari" "intens"    

[1] "nonnorm"

[1] "theta"   "paramet"

[1] "robin"     "miss"      "zhao"      "rotnitzki" "effici"   

[1] "mestim" "robust"

[1] "finitesampl"

[1] "elect" "vote"  "poll" 

[1] "errorpron" "error"    

[1] "stock"

[1] "garch"   "process" "volatil"

[1] "slice"   "invers"  "regress" "dimens"  "method" 

[1] "norm"      "matrix"    "rank"      "matric"    "frobenius" "bound"    

[1] "slope"

[1] "chi"  "test"

[1] "function"   "eigenfunct" "analysi"    "random"     "princip"   
[6] "compon"     "data"      

[1] "tabl"    "conting"

[1] "criterion" "akaik"     "select"    "model"    

[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian" 

[1] "neighborhood"

[1] "maximum"    "welldefin"  "posteriori"

[1] "lengthbias" "surviv"     "preval"     "cohort"

[1] "hazard"  "proport"

[1] "meansquar" "predict"   "error"     "small"     "area"     

[1] "onestep"

[1] "polymorph" "genotyp"   "haplotyp"  "snp"      

[1] "equivari"  "depth"     "breakdown" "concept"   "introduc" 

[1] "nonrespons" "imput"      "survey"     "respons"   

[1] "robin"      "miss"       "zhao"       "casecohort" "rotnitzki" 

[1] "vote"   "elect"  "candid"

[1] "sampl"     "survey"    "designbas" "infer"     "weight"    "modelbas" 
[7] "popul"    

[1] "test"      "logrank"   "weight"    "treatment" "formula"   "patient"  
[7] "supremum"  "standard"  "twostag"  

[1] "track"  "replac" "usag"  

[1] "precipit" "spatial" 


[1] "twostep"  "submodel"

[1] "design"  "paramet" "effici" 

[1] "timedepend" "covari"     "treatment" 

[1] "miss" "data"

[1] "trim"   "robust" "depth" 

[1] "substitut" "euclidean"

[1] "empir"      "likelihood" "bartlett"   "adjust"    

[1] "volatil"   "highfrequ" "asset"     "price"    

[1] "nonneg"

[1] "norm"   "matrix"

[1] "popul"      "superpopul"

[1] "misspecif"

[1] "file"   "linkag"

[1] "kaplanmei" "quantil"   "surviv"    "censor"   

[1] "axe"    "rotat"  "matric"

[1] "mutual" "empir"  "genet" 

[1] "innov"   "process" "residu" 

[1] "monoton"  "function"
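The code that produced these keyword lists is not shown. As a hypothetical illustration only, one way to summarize a factor is to keep the words whose loading is at least a fraction `frac` of that factor's largest loading, sorted by loading; the helper name `key_words`, the rule, and the toy loadings are all assumptions.

```r
# Hypothetical helper: words whose loading on factor k is at least `frac`
# times that factor's largest loading, sorted by decreasing loading
key_words = function(Fmat, k, words, frac = 0.5) {
  f = Fmat[, k]
  keep = which(f >= frac * max(f))
  words[keep[order(f[keep], decreasing = TRUE)]]
}

# Toy non-negative loadings for 6 words on 2 factors
words = c("surviv", "hazard", "censor", "gene", "express", "microarray")
Fmat = cbind(c(0.9, 0.7, 0.5, 0.0, 0.1, 0.0),
             c(0.0, 0.1, 0.0, 0.8, 0.6, 0.3))

key_words(Fmat, 1, words)  # → "surviv" "hazard" "censor"
key_words(Fmat, 2, words)  # → "gene" "express"
```

With a real fit, and assuming the columns of `mat` carry the words as names, this could be applied as `key_words(fit.nn.s.1$F_pm, k, colnames(mat))` for each `k` in `which(spec1)`.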

Here I compare the fits with lower pseudocounts.

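compare = function(fit1,fit2){
  # correlate the word loadings of every factor in fit1 with every factor in fit2
  cc = cor(fit1$F_pm,fit2$F_pm)
  spec1 = apply(cc,1,max)<0.9   # fit1 factors with no match (cor >= 0.9) in fit2
  spec2 = apply(cc,2,max)<0.9   # fit2 factors with no match in fit1
  return(list(spec1 = which(spec1), spec2 = which(spec2)))
}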

Pseudocount 1 vs 0.1:

[1] "assoc"   "amer"    "statist" "ann"    

[1] "choleski"   "matrix"     "covari"     "decomposit" "factor"    
[6] "interpret" 

[1] "mse"       "predictor" "linear"    "error"     "squar"     "empir"    

[1] "depth"   "project"

[1] "jackknif" "mix"      "squar"    "area"     "varianc" 

[1] "spline" "smooth"

[1] "survey" "popul"  "sampl" 

[1] "equivari"  "affin"     "matrix"    "introduc"  "breakdown" "concept"  
[7] "scatter"  

[1] "onestep"

[1] "process"    "thin"       "point"      "fit"        "spatial"   
[6] "residu"     "stationari" "intens"    

[1] "sobolev" "densiti" "minimax" "rate"   

[1] "errorpron" "error"    

[1] "panel" "count"

[1] "stock"

[1] "secondord"

[1] "equat" "estim"

[1] "slice"   "invers"  "regress" "dimens"  "method" 

[1] "survivor"

[1] "slope"

[1] "tabl"    "conting"

[1] "criterion" "akaik"     "select"    "model"    

[1] "neighborhood"

[1] "maximum"    "welldefin"  "posteriori"

 [1] "penalis"       "newtonraphson" "framingham"    "penalti"      
 [5] "likelihood"    "heart"         "failur"        "carri"        
 [9] "algorithm"     "proper"        "conduct"       "advanc"       
[13] "grow"          "dropout"       "familiar"      "prospect"     

[1] "inhomogen"  "intens"     "process"    "spatial"    "point"     
[6] "poisson"    "thin"       "stationari" "function"  

 [1] "seem"           "unrel"          "spline"         "correl"        
 [5] "credit"         "retail"         "neglig"         "nongaussian"   
 [9] "dataadapt"      "vehicl"         "allevi"         "knot"          
[13] "leav"           "reversiblejump" "part"           "genotyp"       
[17] "conveni"        "residu"         "wang"           "withinclust"   

 [1] "distort"         "respons"         "confound"        "predictor"      
 [5] "unobserv"        "under"           "explanatori"     "serum"          
 [9] "adjust"          "magnitud"        "indirect"        "identifi"       
[13] "coeffici"        "factor"          "absent"          "system"         
[17] "alter"           "observ"          "datagener"       "leastsquar"     
[21] "decid"           "straightforward" "generat"         "stepwis"        
[25] "intervent"       "sever"          

 [1] "equivari"   "affin"      "introduc"   "depth"      "breakdown" 
 [6] "scatter"    "locat"      "point"      "project"    "robust"    
[11] "concept"    "general"    "multivari"  "function"   "influenc"  
[16] "matrix"     "median"     "definit"    "hyperplan"  "high"      
[21] "heavytail"  "competitor" "fact"       "translat"   "comparison"
[26] "open"      

 [1] "save"      "sir"       "slice"     "averag"    "root"      "invers"   
 [7] "candid"    "reveal"    "theoret"   "reduct"    "comput"    "contrast" 
[13] "recommend"

 [1] "nonrespons" "survey"     "respons"    "imput"      "nonignor"  
 [6] "valu"       "miss"       "respond"    "nation"     "varianc"   
[11] "nonrespond" "weight"     "popul"      "requir"     "bias"      
[16] "probabl"    "unit"       "mechan"     "item"       "adjust"    
[21] "health"     "variabl"    "calibr"     "race"       "domain"    
[26] "handl"      "incom"     

 [1] "taper"    "approxim" "matrix"   "gaussian" "covari"   "spars"   
 [7] "consist"  "oper"     "block"    "norm"     "balanc"   "requir"  
[13] "spatial" 

 [1] "jackknif"  "mix"       "varianc"   "area"      "squar"     "appli"    
 [7] "inconsist" "uncondit"  "replic"    "strata"   

[1] "quantil" "regress"

 [1] "popul"      "superpopul" "survey"     "finit"      "boxcox"    
 [6] "modelbas"   "design"     "predict"    "realiz"     "auxiliari" 
[11] "sampl"      "handl"      "twophas"    "revisit"    "mild"      
[16] "benchmark"  "rich"       "life"       "probabl"    "ensur"     

 [1] "claim"     "insur"     "vehicl"    "damag"     "age"       "year"     
 [7] "turn"      "compani"   "detail"    "tail"      "sever"     "coverag"  
[13] "record"    "risk"      "price"     "financi"   "describ"   "major"    
[19] "gender"    "discount"  "logit"     "amount"    "person"    "kind"     
[25] "multinomi" "frequenc"  "justif"    "surpris"   "binomi"    "oil"      
[31] "pointwis"  "split"     "negat"    

[1] "logit"       "finitesampl" "root"        "probit"      "variat"     
[6] "mix"         "fraction"    "multinomi"  

 [1] "expenditur"   "physician"    "servic"       "skew"         "care"        
 [6] "lognorm"      "profil"       "conduct"      "patient"      "person"      
[11] "contribut"    "health"       "randomeffect" "smoke"        "fact"        
[16] "survey"       "manag"        "incur"        "medic"        "debat"       
[21] "custom"       "qualiti"      "topic"        "industri"     "appropri"    
[26] "pulmonari"    "conceptu"     "monitor"      "regard"       "prescrib"    
[31] "subsequ"      "way"          "financi"      "hierarch"     "lung"        
[36] "percentil"    "attribut"     "closedform"  

[1] "confid"    "interv"    "construct" "coverag"   "bootstrap" "region"   

[1] "maximum"    "likelihood" "estim"     

[1] "dimensionreduct" "invers"          "dimens"          "factor"         
[5] "highdimension"   "chisquar"        "reduct"         

[1] "lin"        "addit"      "work"       "carrol"     "bone"      
[6] "transplant" "margin"    

 [1] "withinclust" "cluster"     "correl"      "account"     "hamper"     
 [6] "frequent"    "carri"       "frailti"     "parsimoni"   "abil"       
[11] "birth"       "ill"         "generalis"   "impact"      "intuit"     
[16] "achiev"     

[1] "coeffici" "regress" 

 [1] "minimax" "rate"    "densiti" "optim"   "adapt"   "unknown" "estim"  
 [8] "loss"    "converg" "class"   "prove"   "bound"  

[1] "unequ"     "designbas" "survey"    "weight"   

[1] "auxiliari" "survey"    "varianc"   "variabl"   "sampl"     "weight"   
[7] "design"    "calibr"    "popul"    

[1] "variancecovari" "matrix"         "analyz"        

[1] "contamin"    "robust"      "water"       "influenc"    "explanatori"

[1] "bspline" "kernel"  "penal"  

 [1] "highfrequ" "volatil"   "financi"   "asset"     "price"     "lowfrequ" 
 [7] "exchang"   "nois"      "dynam"     "market"    "matrix"    "stock"    
[13] "period"    "daili"     "realiz"    "pool"      "matric"    "variat"   
[19] "diffus"   

 [1] "earthquak"      "process"        "discrimin"      "seri"          
 [5] "featur"         "explos"         "event"          "time"          
 [9] "form"           "california"     "spectra"        "transform"     
[13] "background"     "extract"        "occurr"         "intens"        
[17] "diverg"         "wavelet"        "step"           "occur"         
[21] "decomposit"     "thin"           "separ"          "basi"          
[25] "multidimension" "spacetim"       "rate"           "poisson"       
[29] "residu"         "spectrum"       "goal"           "rescal"        
[33] "magnitud"       "evolutionari"   "purpos"         "homogen"       

 [1] "climat"      "chang"       "temperatur"  "greenhous"   "global"     
 [6] "earth"       "trend"       "uncertainti" "increas"     "atmospher"  
[11] "northern"    "quantifi"    "reconstruct" "futur"       "separ"      
[16] "tempor"     

 [1] "motif"      "gene"       "sequenc"    "regul"      "transcript"
 [6] "bind"       "dna"        "protein"    "cluster"    "factor"    
[11] "nucleotid"  "discoveri"  "conserv"    "short"      "high"      
[16] "call"       "pattern"    "dirichlet"  "biolog"     "site"      
[21] "process"    "genom"      "mixtur"     "width"      "vari"      
[26] "priori"     "hierarch"   "strategi"   "cell"       "databas"   
[31] "repres"     "organ"      "delet"      "matric"     "similar"   
[36] "gibb"       "switch"     "technolog"  "generat"    "segment"   
[41] "refin"      "aid"        "substant"   "stochast"   "live"      
[46] "group"      "core"       "regulatori"

 [1] "wishart"    "graph"      "cone"       "famili"     "graphic"   
 [6] "matric"     "conjug"     "paramet"    "prior"      "gaussian"  
[11] "covari"     "matrix"     "decompos"   "edg"        "definit"   
[16] "homogen"    "paper"      "shape"      "invers"     "correspond"
[21] "standard"   "ann"        "posit"      "equal"      "space"     
[26] "respect"    "eigenvalu"  "zero"       "sigma"      "dimens"    
[31] "bay"        "chisquar"   "miss"       "form"       "precis"    
[36] "flexibl"    "distinct"   "close"     

 [1] "pca"          "princip"      "compon"       "matrix"       "eigenvector" 
 [6] "analysi"      "eigenvalu"    "reduct"       "dimension"    "set"         
[11] "perturb"      "size"         "transit"      "dimens"       "spike"       
[16] "direct"       "maxim"        "hold"         "popul"        "tool"        
[21] "tree"         "high"         "theorem"      "geometr"      "succeed"     
[26] "sharp"        "logp"         "oil"          "embed"        "evolutionari"

 [1] "famili"        "subfamili"     "symmetr"       "asymmetr"     
 [5] "skew"          "reparameter"   "discuss"       "transform"    
 [9] "properti"      "explor"        "mise"          "urn"          
[13] "behav"         "generat"       "pursu"         "adequ"        
[17] "distribut"     "adopt"         "emphasi"       "symmetri"     
[21] "map"           "submodel"      "option"        "stateoftheart"
[25] "heavytail"     "superior"      "attract"       "tractabl"     
[29] "place"         "member"        "counterpart"   "spacetim"     

[1] "bar"    "vertic" "cap"    "lambda"

 [1] "integ"      "algebra"    "coher"      "ail"        "ident"     
 [6] "countabl"   "multist"    "system"     "appl"       "finit"     
[11] "classic"    "object"     "ideal"      "grid"       "util"      
[16] "math"       "fewer"      "state"      "call"       "binari"    
[21] "inequ"      "pure"       "geometri"   "comprehens" "alpha"     
[26] "posit"      "socal"      "repres"     "idea"       "complex"   
[31] "probabl"    "yield"      "failur"     "relat"      "type"      

 [1] "car"         "polytop"     "partit"      "height"      "combinatori"
 [6] "mechan"      "rais"        "hierarchi"   "convex"      "need"       
[11] "extrem"      "stein"       "descript"    "meaning"     "discret"    
[16] "object"      "geometr"     "parsimoni"   "oil"         "notion"     
[21] "satisfi"     "character"   "exponenti"   "interpret"   "unusu"      
[26] "maxim"       "neighbor"    "assumpt"     "uniform"     "dramat"     
[31] "class"       "point"       "sure"       

 [1] "paradox"     "prior"       "surrog"      "true"        "bay"        
 [6] "posit"       "criteria"    "frequentist" "jeffrey"     "sign"       
[11] "point"       "avoid"       "causal"      "turn"        "negat"      
[16] "invari"     

 [1] "probab"  "appl"    "proc"    "situat"  "ann"     "shape"   "field"  
 [8] "point"   "gamma"   "univari" "roy"    

 [1] "chart"       "cusum"       "detect"      "shift"       "cumul"      
 [6] "control"     "sum"         "base"        "perform"     "length"     
[11] "refer"       "averag"      "ratio"       "monitor"     "likelihood" 
[16] "convent"     "delta"       "infin"       "articl"      "event"      
[21] "outlier"     "stop"        "alarm"       "changepoint" "small"      

 [1] "twoparamet" "focus"      "famili"     "choos"      "exampl"    
 [6] "basic"      "desir"      "popular"    "express"    "tune"      
[11] "stepup"     "compromis"  "conserv"    "shortcom"   "represent" 
[16] "lifetim"    "priori"     "meaning"    "prefer"     "segment"   
[21] "stepwis"    "convolut"   "feasibl"    "bay"       

 [1] "digit"       "fals"        "alarm"       "imag"        "geometr"    
 [6] "definit"     "expect"      "sequenti"    "minim"       "principl"   
[11] "meaning"     "meet"        "framework"   "kind"        "priori"     
[16] "maxim"       "prove"       "theori"      "contain"     "mathemat"   
[21] "compat"      "align"       "display"     "part"        "occurr"     
[26] "explain"     "basic"       "structur"    "number"      "hidden"     
[31] "stop"        "delay"       "probabilist" "rigor"       "fine"       
[36] "walk"        "chang"       "changepoint" "renew"      

 [1] "manifold"   "space"      "intrins"    "metric"     "shape"     
 [6] "riemannian" "tensor"     "euclidean"  "matric"     "diagnost"  
[11] "geodes"     "develop"    "planar"     "sphere"     "examin"    
[16] "imag"       "perturb"    "human"      "embed"      "gender"    
[21] "medic"      "dimens"     "differenti" "diffus"    

[1] "kendal"  "tau"     "truncat" "copula"  "shape"   "densiti" "symmetr"
[8] "reli"    "angl"   

 [1] "improp"    "proprieti" "posterior" "uniform"   "proper"    "prior"    
 [7] "miss"      "suffici"   "theorem"   "character" "complet"   "carri"    
[13] "examin"    "colon"     "beta"      "dataset"   "cumul"     "tree"     
[19] "glms"     

[1] "ser"     "soc"     "roy"     "stat"    "ann"     "particl" "central"
[8] "util"    "statist"

[1] "iid"   "prove"

 [1] "classifi"        "distancebas"     "centroid"        "classif"        
 [5] "discrimin"       "popul"           "vector"          "distanc"        
 [9] "theoret"         "machin"          "support"         "heavytail"      
[13] "median"          "differ"          "difficulti"      "popular"        
[17] "convent"         "replac"          "componentwis"    "produc"         
[21] "accumul"         "closest"         "varieti"         "truncat"        
[25] "poor"            "entail"          "highdimension"   "insensit"       
[29] "allevi"          "excess"          "problemat"       "today"          
[33] "euclidean"       "encount"         "inconsist"       "caus"           
[37] "suffer"          "nearest"         "counterpart"     "volatil"        
[41] "argument"        "alloc"           "straightforward" "attempt"        
[45] "frequent"        "boundari"        "believ"          "help"           
[49] "case"            "inher"           "neighbour"      

 [1] "administr"      "fda"            "secondari"      "endpoint"      
 [5] "drug"           "efficaci"       "food"           "health"        
 [9] "combin"         "record"         "agent"          "trial"         
[13] "clinic"         "benefit"        "primari"        "adjust"        
[17] "databas"        "prevent"        "path"           "cardiovascular"
[21] "make"           "separ"          "report"         "perspect"      
[25] "decis"          "simplifi"       "safeti"         "maintain"      

 [1] "supremum"    "shift"       "dataset"     "changepoint" "power"      
 [6] "test"        "debat"       "logrank"     "north"       "window"     
[11] "categor"     "record"      "speed"       "wind"        "controversi"
[16] "frequenc"    "elabor"      "opposit"     "pearson"     "discontinu" 
[21] "cumul"       "attribut"    "multinomi"   "bridg"       "mainten"    
[26] "formula"     "conclus"     "rigor"       "appear"      "sum"        
[31] "brownian"    "statist"     "strength"    "chisquar"    "autocovari" 
[36] "sequenc"     "receiv"     

 [1] "genet"       "loci"        "trait"       "diseas"      "quantit"    
 [6] "linkag"      "map"         "allel"       "phenotyp"    "gene"       
[11] "pedigre"     "popul"       "marker"      "associ"      "genotyp"    
[16] "frequenc"    "chromosom"   "locus"       "polymorph"   "genom"      
[21] "complex"     "haplotyp"    "interact"    "casecontrol" "involv"     
[26] "domin"       "individu"   

[1] "goodnessoffit" "test"          "includ"        "residu"       

 [1] "collabor"    "nearest"     "item"        "user"        "consum"     
 [6] "tradit"      "recommend"   "system"      "neighbor"    "filter"     
[11] "frame"       "clear"       "fact"        "contribut"   "forc"       
[16] "grow"        "drive"       "probabilist" "mathemat"    "precis"     
[21] "socal"       "initi"       "deal"        "mild"        "attempt"    
[26] "offer"       "neighbour"   "provid"      "literatur"   "algorithm"  
[31] "sequenti"   

 [1] "selector"    "dantzig"     "lregular"    "extend"      "path"       
 [6] "result"      "bound"       "nonasymptot" "uncertainti" "angl"       
[11] "remark"      "tune"        "entir"       "final"       "question"   
[16] "cost"        "principl"   

 [1] "subtl"    "jin"      "nonzero"  "critic"   "fraction" "boundari"
 [7] "tukey"    "higher"   "signific" "succeed"  "detect"   "normal"  
[13] "region"   "interest" "precis"   "amplitud" "alpha"    "concept" 
[19] "sparsiti" "concern"  "mention"  "high"     "work"     "resolv"  
[25] "nonnul"   "bodi"     "lower"   

 [1] "expert"      "languag"     "uncertainti" "abil"        "learn"      
 [6] "elicit"      "intermitt"   "system"      "natur"       "kind"       
[11] "amount"      "inform"      "peopl"       "mathemat"    "make"       
[16] "histor"      "need"        "content"     "respond"     "grow"       
[21] "happen"     

 [1] "absolut"       "deviat"        "clip"          "smooth"       
 [5] "scad"          "oracl"         "size"          "true"         
 [9] "microarray"    "nonzero"       "dimens"        "fan"          
[13] "highdimension" "identifi"      "sparsiti"      "confirm"      
[17] "slowli"        "larger"       

[1] "size"   "sampl"  "number"

 [1] "seri"        "week"        "time"        "stationari"  "generat"    
 [6] "superposit"  "autoregress" "renew"       "autocovari"  "binomi"     
[11] "day"         "longmemori"  "count"       "predict"     "thin"       
[16] "focus"       "fit"         "contrast"    "consecut"    "integ"      
[21] "simpl"       "poisson"     "short"       "geometr"     "parsimoni"  
[26] "copi"        "bernoulli"   "previous"    "discret"     "electr"     
[31] "daili"       "key"         "differ"      "trial"       "market"     
[36] "margin"      "sequenc"     "forecast"    "load"       

[1] "spectral"   "densiti"    "time"       "seri"       "domain"    
[6] "stationari" "frequenc"  

[1] "tilt"       "exponenti"  "constraint" "employ"    

 [1] "earn"       "person"     "interview"  "employ"     "document"  
 [6] "survey"     "health"     "level"      "census"     "peopl"     
[11] "report"     "incom"      "higher"     "educ"       "feder"     
[16] "sensit"     "preval"     "analys"     "conduct"    "famili"    
[21] "imput"      "year"       "key"        "sourc"      "total"     
[26] "file"       "instrument" "ratio"      "status"     "encourag"  
[31] "nation"     "way"        "subsequ"    "monitor"    "lower"     
[36] "item"       "accept"     "multipli"   "rich"       "violat"    
[41] "previous"  

 [1] "statistician" "polici"       "scienc"       "statist"      "decis"       
 [6] "role"         "today"        "technolog"    "scientif"     "maker"       
[11] "bring"        "challeng"     "scientist"    "inform"       "integr"      
[16] "communic"     "individu"     "increas"      "knowledg"     "polit"       
[21] "live"         "disciplin"    "address"      "social"       "effort"      
[26] "essenti"      "organ"        "solv"         "engin"        "student"     
[31] "opportun"     "impact"       "face"         "grow"         "chang"       
[36] "play"         "govern"       "american"     "countri"      "mathemat"    
[41] "closer"       "centuri"      "modern"       "intern"       "spread"      
[46] "human"        "relev"        "ingredi"      "place"        "public"      
[51] "devic"        "success"      "explor"       "pressur"      "guarante"    
[56] "imposs"       "train"        "view"         "excel"        "presidenti"  
[61] "progress"     "edg"          "way"          "genom"        "support"     
[66] "communiti"    "promot"       "action"       "advanc"       "map"         
[71] "understand"  

 [1] "toxic"      "dose"       "trial"      "dosefind"   "phase"     
 [6] "clinic"     "target"     "design"     "probabl"    "escal"     
[11] "assign"     "patient"    "reassess"   "continu"    "ethic"     
[16] "prespecifi" "common"     "enhanc"     "concern"    "robust"    
[21] "parallel"   "previous"   "overcom"    "coher"      "variant"   
[26] "competit"  

 [1] "extrem"      "precipit"    "spatial"     "pareto"      "station"    
 [6] "uncertainti" "climatolog"  "hierarchi"   "exceed"      "threshold"  
[11] "quantif"     "produc"      "return"      "captur"      "region"     
[16] "intens"      "frequenc"    "hierarch"    "plan"        "weather"    
[21] "interpol"    "map"         "purpos"      "binomi"      "coordin"    
[26] "driven"      "geograph"    "daili"       "separ"       "character"  
[31] "fulli"       "latent"      "improv"     

 [1] "enter"     "pursu"     "project"   "preced"    "phase"     "maker"    
 [7] "schedul"   "resourc"   "decis"     "minim"     "concret"   "perfect"  
[13] "divid"     "total"     "strategi"  "alloc"     "expect"    "face"     
[19] "generat"   "manag"     "chosen"    "state"     "formul"    "unknown"  
[25] "point"     "exampl"    "breakdown" "unit"     

 [1] "polya"       "appreci"     "tree"        "cancer"      "surveil"    
 [6] "spatial"     "sophist"     "epidemiolog" "unrealist"   "institut"   
[11] "offer"       "program"     "fulli"       "analyt"      "nation"     
[16] "flexibl"     "lattic"      "compet"      "orient"      "feasibl"    
[21] "impos"       "obtain"      "aspect"      "remain"      "timetoev"   
[26] "breast"      "ignor"       "urn"         "mixtur"      "advantag"   
[31] "framework"   "featur"     

 [1] "delay"         "combin"        "issu"          "activ"        
 [5] "unit"          "year"          "monitor"       "program"      
 [9] "incid"         "concern"       "major"         "servic"       
[13] "surveil"       "develop"       "registri"      "populationbas"
[17] "trend"         "reason"       

[1] "laplac"    "approxim"  "posterior" "integr"    "mode"     

[1] "subjectspecif"    "random"           "longitudin"       "correl"          
[5] "populationaverag" "latent"           "logist"           "followup"        

 [1] "underestim"    "overestim"     "lemma"         "abrupt"       
 [5] "respect"       "admit"         "stein"         "identif"      
 [9] "moder"         "satisfi"       "nontrivi"      "impli"        
[13] "detail"        "deviat"        "loglikelihood" "benchmark"    
[17] "moment"        "nest"          "yield"         "exponenti"    
[21] "decay"         "deal"          "difficulti"    "mild"         
[25] "posit"         "relat"         "version"       "prove"        

 [1] "retail"    "custom"    "compani"   "deliveri"  "consum"    "tradit"   
 [7] "onlin"     "tail"      "quantiti"  "frequenc"  "market"    "total"    
[13] "joint"     "differ"    "firm"      "articl"    "cost"      "week"     
[19] "daili"     "translat"  "tie"       "decis"     "intend"    "household"
[25] "prevent"   "bivari"    "activ"     "aid"       "simpli"    "accur"    
[31] "forecast"  "compon"    "element"   "commerci"  "success"   "bank"     
[37] "incur"     "period"    "center"    "repres"    "arriv"     "frequent" 
[43] "organ"     "concern"   "impact"    "descript" 

[1] "oneparamet" "famili"     "normal"     "general"    "exponenti" 
[6] "detect"     "binomi"    

 [1] "intersect"  "close"      "hypothes"   "familywis"  "bonferroni"
 [6] "logic"      "critic"     "requir"     "elementari" "multipl"   
[11] "monoton"    "holm"       "valu"       "principl"  

Pseudocount 0.1 vs 0.01. The results with pseudocount 0.01 are not as bad as I expected.

 [1] "equivari"   "affin"      "introduc"   "depth"      "breakdown" 
 [6] "scatter"    "locat"      "point"      "project"    "robust"    
[11] "concept"    "general"    "multivari"  "function"   "influenc"  
[16] "matrix"     "median"     "definit"    "hyperplan"  "high"      
[21] "heavytail"  "competitor" "fact"       "translat"   "comparison"
[26] "open"      

 [1] "save"      "sir"       "slice"     "averag"    "root"      "invers"   
 [7] "candid"    "reveal"    "theoret"   "reduct"    "comput"    "contrast" 
[13] "recommend"

 [1] "nonrespons" "survey"     "respons"    "imput"      "nonignor"  
 [6] "valu"       "miss"       "respond"    "nation"     "varianc"   
[11] "nonrespond" "weight"     "popul"      "requir"     "bias"      
[16] "probabl"    "unit"       "mechan"     "item"       "adjust"    
[21] "health"     "variabl"    "calibr"     "race"       "domain"    
[26] "handl"      "incom"     

 [1] "jackknif"  "mix"       "varianc"   "area"      "squar"     "appli"    
 [7] "inconsist" "uncondit"  "replic"    "strata"   

[1] "mestim"  "robust"  "weak"    "yield"   "outlier" "nuisanc"

 [1] "gee"       "equat"     "correl"    "general"   "sandwich"  "binari"   
 [7] "work"      "misspecif" "cluster"   "scientif"  "enhanc"    "effort"   
[13] "equival"   "lead"      "repeat"    "diverg"   

 [1] "popul"      "superpopul" "survey"     "finit"      "boxcox"    
 [6] "modelbas"   "design"     "predict"    "realiz"     "auxiliari" 
[11] "sampl"      "handl"      "twophas"    "revisit"    "mild"      
[16] "benchmark"  "rich"       "life"       "probabl"    "ensur"     

 [1] "claim"     "insur"     "vehicl"    "damag"     "age"       "year"     
 [7] "turn"      "compani"   "detail"    "tail"      "sever"     "coverag"  
[13] "record"    "risk"      "price"     "financi"   "describ"   "major"    
[19] "gender"    "discount"  "logit"     "amount"    "person"    "kind"     
[25] "multinomi" "frequenc"  "justif"    "surpris"   "binomi"    "oil"      
[31] "pointwis"  "split"     "negat"    

[1] "logit"       "finitesampl" "root"        "probit"      "variat"     
[6] "mix"         "fraction"    "multinomi"  

 [1] "expenditur"   "physician"    "servic"       "skew"         "care"        
 [6] "lognorm"      "profil"       "conduct"      "patient"      "person"      
[11] "contribut"    "health"       "randomeffect" "smoke"        "fact"        
[16] "survey"       "manag"        "incur"        "medic"        "debat"       
[21] "custom"       "qualiti"      "topic"        "industri"     "appropri"    
[26] "pulmonari"    "conceptu"     "monitor"      "regard"       "prescrib"    
[31] "subsequ"      "way"          "financi"      "hierarch"     "lung"        
[36] "percentil"    "attribut"     "closedform"  

[1] "confid"    "interv"    "construct" "coverag"   "bootstrap" "region"   

[1] "dimensionreduct" "invers"          "dimens"          "factor"         
[5] "highdimension"   "chisquar"        "reduct"         

[1] "unequ"     "designbas" "survey"    "weight"   

[1] "contamin"    "robust"      "water"       "influenc"    "explanatori"

[1] "varianc"  "asymptot"

 [1] "earthquak"      "process"        "discrimin"      "seri"          
 [5] "featur"         "explos"         "event"          "time"          
 [9] "form"           "california"     "spectra"        "transform"     
[13] "background"     "extract"        "occurr"         "intens"        
[17] "diverg"         "wavelet"        "step"           "occur"         
[21] "decomposit"     "thin"           "separ"          "basi"          
[25] "multidimension" "spacetim"       "rate"           "poisson"       
[29] "residu"         "spectrum"       "goal"           "rescal"        
[33] "magnitud"       "evolutionari"   "purpos"         "homogen"       

[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian"  "hierarch" 
[7] "posterior" "cluster"  

 [1] "famili"        "subfamili"     "symmetr"       "asymmetr"     
 [5] "skew"          "reparameter"   "discuss"       "transform"    
 [9] "properti"      "explor"        "mise"          "urn"          
[13] "behav"         "generat"       "pursu"         "adequ"        
[17] "distribut"     "adopt"         "emphasi"       "symmetri"     
[21] "map"           "submodel"      "option"        "stateoftheart"
[25] "heavytail"     "superior"      "attract"       "tractabl"     
[29] "place"         "member"        "counterpart"   "spacetim"     

 [1] "car"         "polytop"     "partit"      "height"      "combinatori"
 [6] "mechan"      "rais"        "hierarchi"   "convex"      "need"       
[11] "extrem"      "stein"       "descript"    "meaning"     "discret"    
[16] "object"      "geometr"     "parsimoni"   "oil"         "notion"     
[21] "satisfi"     "character"   "exponenti"   "interpret"   "unusu"      
[26] "maxim"       "neighbor"    "assumpt"     "uniform"     "dramat"     
[31] "class"       "point"       "sure"       

 [1] "paradox"     "prior"       "surrog"      "true"        "bay"        
 [6] "posit"       "criteria"    "frequentist" "jeffrey"     "sign"       
[11] "point"       "avoid"       "causal"      "turn"        "negat"      
[16] "invari"     

 [1] "probab"  "appl"    "proc"    "situat"  "ann"     "shape"   "field"  
 [8] "point"   "gamma"   "univari" "roy"    

 [1] "chart"       "cusum"       "detect"      "shift"       "cumul"      
 [6] "control"     "sum"         "base"        "perform"     "length"     
[11] "refer"       "averag"      "ratio"       "monitor"     "likelihood" 
[16] "convent"     "delta"       "infin"       "articl"      "event"      
[21] "outlier"     "stop"        "alarm"       "changepoint" "small"      

 [1] "twoparamet" "focus"      "famili"     "choos"      "exampl"    
 [6] "basic"      "desir"      "popular"    "express"    "tune"      
[11] "stepup"     "compromis"  "conserv"    "shortcom"   "represent" 
[16] "lifetim"    "priori"     "meaning"    "prefer"     "segment"   
[21] "stepwis"    "convolut"   "feasibl"    "bay"       

 [1] "digit"       "fals"        "alarm"       "imag"        "geometr"    
 [6] "definit"     "expect"      "sequenti"    "minim"       "principl"   
[11] "meaning"     "meet"        "framework"   "kind"        "priori"     
[16] "maxim"       "prove"       "theori"      "contain"     "mathemat"   
[21] "compat"      "align"       "display"     "part"        "occurr"     
[26] "explain"     "basic"       "structur"    "number"      "hidden"     
[31] "stop"        "delay"       "probabilist" "rigor"       "fine"       
[36] "walk"        "chang"       "changepoint" "renew"      

 [1] "manifold"   "space"      "intrins"    "metric"     "shape"     
 [6] "riemannian" "tensor"     "euclidean"  "matric"     "diagnost"  
[11] "geodes"     "develop"    "planar"     "sphere"     "examin"    
[16] "imag"       "perturb"    "human"      "embed"      "gender"    
[21] "medic"      "dimens"     "differenti" "diffus"    

[1] "kendal"  "tau"     "truncat" "copula"  "shape"   "densiti" "symmetr"
[8] "reli"    "angl"   

 [1] "improp"    "proprieti" "posterior" "uniform"   "proper"    "prior"    
 [7] "miss"      "suffici"   "theorem"   "character" "complet"   "carri"    
[13] "examin"    "colon"     "beta"      "dataset"   "cumul"     "tree"     
[19] "glms"     

[1] "ser"     "soc"     "roy"     "stat"    "ann"     "particl" "central"
[8] "util"    "statist"

[1] "iid"   "prove"

 [1] "supremum"    "shift"       "dataset"     "changepoint" "power"      
 [6] "test"        "debat"       "logrank"     "north"       "window"     
[11] "categor"     "record"      "speed"       "wind"        "controversi"
[16] "frequenc"    "elabor"      "opposit"     "pearson"     "discontinu" 
[21] "cumul"       "attribut"    "multinomi"   "bridg"       "mainten"    
[26] "formula"     "conclus"     "rigor"       "appear"      "sum"        
[31] "brownian"    "statist"     "strength"    "chisquar"    "autocovari" 
[36] "sequenc"     "receiv"     

[1] "theta"     "paramet"   "cap"       "distribut" "vector"    "unknown"  
[7] "nuisanc"  

[1] "goodnessoffit" "test"          "includ"        "residu"       

 [1] "collabor"    "nearest"     "item"        "user"        "consum"     
 [6] "tradit"      "recommend"   "system"      "neighbor"    "filter"     
[11] "frame"       "clear"       "fact"        "contribut"   "forc"       
[16] "grow"        "drive"       "probabilist" "mathemat"    "precis"     
[21] "socal"       "initi"       "deal"        "mild"        "attempt"    
[26] "offer"       "neighbour"   "provid"      "literatur"   "algorithm"  
[31] "sequenti"   

 [1] "selector"    "dantzig"     "lregular"    "extend"      "path"       
 [6] "result"      "bound"       "nonasymptot" "uncertainti" "angl"       
[11] "remark"      "tune"        "entir"       "final"       "question"   
[16] "cost"        "principl"   

 [1] "subtl"    "jin"      "nonzero"  "critic"   "fraction" "boundari"
 [7] "tukey"    "higher"   "signific" "succeed"  "detect"   "normal"  
[13] "region"   "interest" "precis"   "amplitud" "alpha"    "concept" 
[19] "sparsiti" "concern"  "mention"  "high"     "work"     "resolv"  
[25] "nonnul"   "bodi"     "lower"   

 [1] "expert"      "languag"     "uncertainti" "abil"        "learn"      
 [6] "elicit"      "intermitt"   "system"      "natur"       "kind"       
[11] "amount"      "inform"      "peopl"       "mathemat"    "make"       
[16] "histor"      "need"        "content"     "respond"     "grow"       
[21] "happen"     

 [1] "absolut"       "deviat"        "clip"          "smooth"       
 [5] "scad"          "oracl"         "size"          "true"         
 [9] "microarray"    "nonzero"       "dimens"        "fan"          
[13] "highdimension" "identifi"      "sparsiti"      "confirm"      
[17] "slowli"        "larger"       

 [1] "seri"        "week"        "time"        "stationari"  "generat"    
 [6] "superposit"  "autoregress" "renew"       "autocovari"  "binomi"     
[11] "day"         "longmemori"  "count"       "predict"     "thin"       
[16] "focus"       "fit"         "contrast"    "consecut"    "integ"      
[21] "simpl"       "poisson"     "short"       "geometr"     "parsimoni"  
[26] "copi"        "bernoulli"   "previous"    "discret"     "electr"     
[31] "daili"       "key"         "differ"      "trial"       "market"     
[36] "margin"      "sequenc"     "forecast"    "load"       

[1] "spectral"   "densiti"    "time"       "seri"       "domain"    
[6] "stationari" "frequenc"  

[1] "tilt"       "exponenti"  "constraint" "employ"    

 [1] "earn"       "person"     "interview"  "employ"     "document"  
 [6] "survey"     "health"     "level"      "census"     "peopl"     
[11] "report"     "incom"      "higher"     "educ"       "feder"     
[16] "sensit"     "preval"     "analys"     "conduct"    "famili"    
[21] "imput"      "year"       "key"        "sourc"      "total"     
[26] "file"       "instrument" "ratio"      "status"     "encourag"  
[31] "nation"     "way"        "subsequ"    "monitor"    "lower"     
[36] "item"       "accept"     "multipli"   "rich"       "violat"    
[41] "previous"  

 [1] "statistician" "polici"       "scienc"       "statist"      "decis"       
 [6] "role"         "today"        "technolog"    "scientif"     "maker"       
[11] "bring"        "challeng"     "scientist"    "inform"       "integr"      
[16] "communic"     "individu"     "increas"      "knowledg"     "polit"       
[21] "live"         "disciplin"    "address"      "social"       "effort"      
[26] "essenti"      "organ"        "solv"         "engin"        "student"     
[31] "opportun"     "impact"       "face"         "grow"         "chang"       
[36] "play"         "govern"       "american"     "countri"      "mathemat"    
[41] "closer"       "centuri"      "modern"       "intern"       "spread"      
[46] "human"        "relev"        "ingredi"      "place"        "public"      
[51] "devic"        "success"      "explor"       "pressur"      "guarante"    
[56] "imposs"       "train"        "view"         "excel"        "presidenti"  
[61] "progress"     "edg"          "way"          "genom"        "support"     
[66] "communiti"    "promot"       "action"       "advanc"       "map"         
[71] "understand"  

 [1] "toxic"      "dose"       "trial"      "dosefind"   "phase"     
 [6] "clinic"     "target"     "design"     "probabl"    "escal"     
[11] "assign"     "patient"    "reassess"   "continu"    "ethic"     
[16] "prespecifi" "common"     "enhanc"     "concern"    "robust"    
[21] "parallel"   "previous"   "overcom"    "coher"      "variant"   
[26] "competit"  

 [1] "extrem"      "precipit"    "spatial"     "pareto"      "station"    
 [6] "uncertainti" "climatolog"  "hierarchi"   "exceed"      "threshold"  
[11] "quantif"     "produc"      "return"      "captur"      "region"     
[16] "intens"      "frequenc"    "hierarch"    "plan"        "weather"    
[21] "interpol"    "map"         "purpos"      "binomi"      "coordin"    
[26] "driven"      "geograph"    "daili"       "separ"       "character"  
[31] "fulli"       "latent"      "improv"     

 [1] "enter"     "pursu"     "project"   "preced"    "phase"     "maker"    
 [7] "schedul"   "resourc"   "decis"     "minim"     "concret"   "perfect"  
[13] "divid"     "total"     "strategi"  "alloc"     "expect"    "face"     
[19] "generat"   "manag"     "chosen"    "state"     "formul"    "unknown"  
[25] "point"     "exampl"    "breakdown" "unit"     

 [1] "polya"       "appreci"     "tree"        "cancer"      "surveil"    
 [6] "spatial"     "sophist"     "epidemiolog" "unrealist"   "institut"   
[11] "offer"       "program"     "fulli"       "analyt"      "nation"     
[16] "flexibl"     "lattic"      "compet"      "orient"      "feasibl"    
[21] "impos"       "obtain"      "aspect"      "remain"      "timetoev"   
[26] "breast"      "ignor"       "urn"         "mixtur"      "advantag"   
[31] "framework"   "featur"     

 [1] "delay"         "combin"        "issu"          "activ"        
 [5] "unit"          "year"          "monitor"       "program"      
 [9] "incid"         "concern"       "major"         "servic"       
[13] "surveil"       "develop"       "registri"      "populationbas"
[17] "trend"         "reason"       

[1] "laplac"    "approxim"  "posterior" "integr"    "mode"     

[1] "subjectspecif"    "random"           "longitudin"       "correl"          
[5] "populationaverag" "latent"           "logist"           "followup"        

 [1] "retail"    "custom"    "compani"   "deliveri"  "consum"    "tradit"   
 [7] "onlin"     "tail"      "quantiti"  "frequenc"  "market"    "total"    
[13] "joint"     "differ"    "firm"      "articl"    "cost"      "week"     
[19] "daili"     "translat"  "tie"       "decis"     "intend"    "household"
[25] "prevent"   "bivari"    "activ"     "aid"       "simpli"    "accur"    
[31] "forecast"  "compon"    "element"   "commerci"  "success"   "bank"     
[37] "incur"     "period"    "center"    "repres"    "arriv"     "frequent" 
[43] "organ"     "concern"   "impact"    "descript" 

[1] "oneparamet" "famili"     "normal"     "general"    "exponenti" 
[6] "detect"     "binomi"    

 [1] "intersect"  "close"      "hypothes"   "familywis"  "bonferroni"
 [6] "logic"      "critic"     "requir"     "elementari" "multipl"   
[11] "monoton"    "holm"       "valu"       "principl"  

 [1] "dichotom"        "outcom"          "exposur"         "genet"          
 [5] "inherit"         "confound"        "interact"        "causal"         
 [9] "trial"           "factor"          "binari"          "presenc"        
[13] "categor"         "assess"          "alcohol"         "continu"        
[17] "disord"          "misspecif"       "ordin"           "clinic"         
[21] "postul"          "trait"           "topic"           "environment"    
[25] "subgroup"        "potenti"         "geneenviron"     "alter"          
[29] "adequ"           "examin"          "adjust"          "intermedi"      
[33] "cancer"          "robin"           "stage"           "logist"         
[37] "arm"             "firststag"       "generic"         "latent"         
[41] "build"           "variabl"         "conduct"         "affect"         
[45] "accommod"        "prone"           "submodel"        "transmiss"      
[49] "mental"          "mediat"          "unspecifi"       "quantit"        
[53] "expos"           "major"           "multipli"        "sever"          
[57] "believ"          "gene"            "zhang"           "distributionfre"
[61] "routin"          "today"          

 [1] "virus"        "human"        "immunodefici" "hiv"          "infect"      
 [6] "viral"        "transmiss"    "vaccin"       "subject"      "genet"       
[11] "drug"         "develop"      "efficaci"     "mutat"        "outcom"      
[16] "causal"       "cell"         "syndrom"      "medic"        "pathway"     
[21] "resist"       "evolutionari" "therapi"      "pressur"     

 [1] "dropout"       "stratum"       "prevent"       "reduc"        
 [5] "oil"           "trial"         "adjust"        "longitudin"   
 [9] "cancer"        "prostat"       "mechan"        "men"          
[13] "find"          "stratifi"      "arm"           "nuisanc"      
[17] "treatment"     "assign"        "grade"         "doubleblind"  
[21] "avoid"         "colleagu"      "randomeffect"  "sever"        
[25] "verif"         "agent"         "conjectur"     "annual"       
[29] "nonignor"      "placebo"       "volum"         "elect"        
[33] "caus"          "daili"         "visit"         "preval"       
[37] "absolut"       "lie"           "indic"         "sensit"       
[41] "frequent"      "particip"      "year"          "reduct"       
[45] "causal"        "report"        "newtonraphson" "adopt"        
[49] "question"      "women"         "elder"         "surrog"       
[53] "inform"        "elicit"        "prospect"      "collabor"     
[57] "drawn"         "ignor"         "differ"        "link"         
[61] "retain"        "tilt"          "random"        "constraint"   
[65] "status"        "impli"         "doubli"        "expert"       
[69] "nonidentifi"   "intermitt"     "satur"         "sex"          
[73] "characterist"  "invers"       

  [1] "polici"       "statistician" "maker"        "decis"        "scienc"      
  [6] "role"         "technolog"    "today"        "chang"        "live"        
 [11] "bring"        "social"       "communic"     "integr"       "individu"    
 [16] "futur"        "knowledg"     "disciplin"    "nation"       "public"      
 [21] "scientif"     "health"       "activ"        "human"        "impact"      
 [26] "organ"        "inform"       "protect"      "promot"       "qualiti"     
 [31] "understand"   "program"      "way"          "student"      "mathemat"    
 [36] "increas"      "face"         "foundat"      "play"         "essenti"     
 [41] "uncertainti"  "effort"       "engin"        "expect"       "advanc"      
 [46] "confidenti"   "children"     "relev"        "make"         "industri"    
 [51] "govern"       "countri"      "encourag"     "polit"        "place"       
 [56] "modern"       "intern"       "scientist"    "closer"       "benefit"     
 [61] "reflect"      "explor"       "stronger"     "purpos"       "univers"     
 [66] "spread"       "environment"  "network"      "grow"         "forc"        
 [71] "access"       "devic"        "ingredi"      "excel"        "comprehens"  
 [76] "pollut"       "attract"      "broader"      "elementari"   "evolv"       
 [81] "train"        "pressur"      "air"          "option"       "imposs"      
 [86] "secondari"    "map"          "edg"          "success"      "progress"    
 [91] "critic"       "global"       "action"       "year"         "agenc"       
 [96] "communiti"    "american"     "quantit"      "genom"        "system"      
[101] "fundament"    "discoveri"    "evid"         "guarante"     "mortal"      
[106] "address"      "citi"         "requir"       "technic"      "serv"        
[111] "path"         "statist"      "separ"        "climat"       "contribut"   
[116] "opportun"     "adequaci"     "disabl"       "affect"       "driven"      
[121] "grade"        "psycholog"    "diagnost"     "morbid"       "view"        
[126] "delay"        "primari"      "state"       

 [1] "slice"     "invers"    "dimens"    "reduct"    "regress"   "averag"   
 [7] "sir"       "direct"    "central"   "goal"      "respons"   "save"     
[13] "subset"    "method"    "predictor" "subspac"   "varianc"   "preserv"  
[19] "replac"    "suffici"   "systemat" 

 [1] "band"       "confid"     "simultan"   "consid"     "trajectori"
 [6] "extend"     "choos"      "regular"    "asymptot"   "ball"      
[11] "uniform"   

[1] "absolut"  "deviat"   "clip"     "oracl"    "progress"

 [1] "breakdown"  "point"      "robust"     "depth"      "locat"     
 [6] "project"    "equivari"   "finit"      "function"   "possess"   
[11] "contamin"   "competitor" "affin"      "definit"    "introduc"  
[16] "lead"       "induc"      "influenc"   "high"       "outlier"   
[21] "strong"     "trim"       "median"     "region"     "york"      
[26] "scale"      "desir"      "favor"      "turn"       "pursu"     
[31] "enjoy"      "scatter"    "suffic"     "behav"      "uniform"   
[36] "relat"      "comparison" "suggest"    "fact"       "univari"   
[41] "ann"        "radius"    

 [1] "spacetim"       "spatial"        "fit"            "year"          
 [5] "site"           "separ"          "intens"         "california"    
 [9] "thin"           "process"        "monitor"        "residu"        
[13] "tempor"         "activ"          "multidimension" "occurr"        
[17] "space"          "background"     "appear"         "origin"        
[21] "smoother"       "irregular"      "earthquak"      "indic"         
[25] "asymmetr"       "trend"          "hazard"         "spectral"      
[29] "symmetr"        "environment"    "ozon"           "wind"          
[33] "meteorolog"     "daili"          "allow"          "rescal"        
[37] "season"         "time"           "anisotrop"      "cross"         
[41] "insid"          "bear"           "arbitrari"      "autoregress"   
[45] "interact"       "magnitud"       "sequenc"        "homogen"       
[49] "widespread"     "sphere"         "coordin"        "highlight"     
[53] "elabor"         "extrem"         "ascertain"      "forest"        
[57] "counti"         "rotat"          "month"          "threat"        
[61] "govern"         "secondari"      "aic"            "account"       
[65] "aid"            "emphas"         "routin"         "assess"        
[69] "departur"       "rare"          

 [1] "survey"      "nonrespons"  "census"      "nation"      "respond"    
 [6] "imput"       "popul"       "health"      "race"        "bureau"     
[11] "nonignor"    "unit"        "respons"     "item"        "incom"      
[16] "miss"        "person"      "year"        "state"       "bias"       
[21] "employ"      "higher"      "valu"        "sensit"      "interview"  
[26] "labor"       "nonrespond"  "age"         "feder"       "collect"    
[31] "measur"      "handl"       "assess"      "report"      "level"      
[36] "counti"      "domain"      "preval"      "agenc"       "confidenti" 
[41] "benchmark"   "incorpor"    "protect"     "status"      "cell"       
[46] "earn"        "produc"      "sourc"       "relat"       "weight"     
[51] "propens"     "public"      "household"   "area"        "geograph"   
[56] "nutrit"      "document"    "lower"       "plan"        "bodi"       
[61] "gender"      "extrapol"    "preliminari" "birth"       "polit"      
[66] "correct"     "american"    "proxi"       "requir"      "previous"   
[71] "children"    "york"        "unemploy"    "death"      

 [1] "jackknif"  "file"      "replic"    "varianc"   "inconsist" "strata"   
 [7] "analyt"    "unbias"    "met"       "domain"    "schedul"   "freedom"  
[13] "survey"    "attain"    "balanc"    "mix"       "ensur"     "public"   
[19] "repeat"    "upper"     "bootstrap" "uncondit"  "plausibl"  "person"   
[25] "pseudo"    "concern"   "linkag"   

[1] "root"     "squar"    "approxim"

 [1] "pathway"       "biolog"        "pattern"       "presenc"      
 [5] "gene"          "latent"        "viral"         "initi"        
 [9] "biomark"       "understand"    "protein"       "pronounc"     
[13] "infect"        "therapi"       "supplementari" "quantifi"     
[17] "concentr"      "chemic"        "tackl"         "incorrect"    
[21] "healthi"       "identifi"      "molecular"     "human"        
[25] "serum"         "hormon"        "investig"      "experiment"   
[29] "search"        "status"        "sort"          "drug"         
[33] "inflat"        "pertin"        "mediat"        "mutat"        
[37] "resist"        "absent"        "blood"         "exemplifi"    
[41] "valuabl"       "phenotyp"      "led"           "indic"        
[45] "subsequ"       "format"        "framework"    

[1] "establish" "asymptot"  "consist"   "converg"  

[1] "bootstrap" "confid"    "distribut" "sampl"     "interv"    "method"   
[7] "correct"   "seri"      "empir"    

 [1] "imag"     "magnet"   "reson"    "field"    "brain"    "fmri"    
 [7] "activ"    "voxel"    "signal"   "detect"   "locat"    "volum"   
[13] "accur"    "follow"   "task"     "motion"   "region"   "visual"  
[19] "identifi" "exploit"  "tissu"    "aim"      "contigu"  "map"     
[25] "rotat"    "neuron"  

 [1] "ozon"            "maxima"          "splinebas"       "nonlinear"      
 [5] "piecewiselinear" "concentr"        "pressur"         "cycl"           
 [9] "transport"       "variat"          "contribut"       "atmospher"      
[13] "peak"            "trend"           "measur"          "basi"           
[17] "evid"            "instrument"      "thought"         "greater"        
[21] "link"            "scientif"        "lag"             "dimensionreduct"
[25] "absenc"          "wave"            "global"          "separ"          
[29] "month"           "coincid"         "influenc"        "lowdimension"   
[33] "clear"           "contrast"        "lower"           "year"           
[37] "site"            "qualiti"         "profil"          "sequenc"        
[41] "sensit"          "origin"          "relat"           "presenc"        
[45] "satellit"        "partial"         "pattern"         "identifi"       

 [1] "experienc" "event"     "deterior"  "trial"     "aberr"     "patient"  
 [7] "import"    "die"       "protocol"  "benefici"  "rank"      "treatment"
[13] "mention"   "wilcoxon"  "receiv"    "children"  "aspect"    "consequ"  
[19] "exact"     "preserv"   "fisher"    "placebo"   "sort"      "magnitud" 
[25] "longer"    "medic"     "exposur"   "adequ"     "discard"   "greatest" 
[31] "fact"      "need"      "invert"    "substanti" "subsequ"   "tabl"     
[37] "remov"     "exhibit"   "way"       "basic"     "singl"     "health"   
[43] "aim"       "care"      "interv"    "complet"   "specif"    "sum"      
[49] "question"  "cubic"     "cancer"    "situat"    "extrem"    "splinebas"
[55] "outcom"    "treat"     "rotat"     "control"   "binari"    "effect"   

 [1] "electr"        "forecast"      "renew"         "bivari"       
 [5] "load"          "market"        "daili"         "power"        
 [9] "serial"        "shortterm"     "wind"          "autoregress"  
[13] "diagon"        "speed"         "time"          "season"       
[17] "focus"         "difficult"     "peak"          "spectrum"     
[21] "temperatur"    "regressor"     "heteroscedast" "firstord"     
[25] "total"         "highlight"     "energi"        "justifi"      
[29] "simpl"         "week"          "vari"          "hour"         
[33] "trend"         "citi"          "recogn"        "stationari"   
[37] "autocovari"    "detail"        "promis"        "realiti"      
[41] "favor"         "reveal"        "year"          "longmemori"   
[45] "gain"          "accuraci"      "exploit"       "predict"      
[49] "option"        "reliabl"       "price"         "evolut"       
[53] "avail"         "superpopul"   

 [1] "day"        "daili"      "record"     "time"       "financi"   
 [6] "activ"      "short"      "peak"       "consecut"   "help"      
[11] "autocovari" "appropri"   "intens"     "physic"     "character" 
[16] "measur"     "children"   "trade"      "strength"   "scalar"    
[21] "superposit" "incomplet"  "copi"      

[1] "secondord"   "firstord"    "accur"       "expans"      "unbias"     
[6] "moment"      "approxim"    "frequentist" "exact"      

 [1] "treatment"  "assign"     "causal"     "score"      "outcom"    
 [6] "propens"    "averag"     "effect"     "grade"      "school"    
[11] "potenti"    "stratif"    "promot"     "confound"   "rubin"     
[16] "student"    "unit"       "regim"      "educ"       "adjust"    
[21] "children"   "plausibl"   "polici"     "program"    "evid"      
[26] "pretreat"   "posttreat"  "summar"     "stage"      "child"     
[31] "intermedi"  "assumpt"    "retain"     "multilevel" "block"     
[36] "econom"     "experiment" "stabl"      "arbitrari"  "nation"    
[41] "articl"     "balanc"     "learn"      "perspect"   "status"    
[46] "unmeasur"   "fewer"      "scalar"     "affect"     "low"       
[51] "mathemat"   "track"      "twostag"    "covari"     "tradeoff"  
[56] "recov"      "nonrandom"  "bind"       "pose"       "estimand"  
[61] "impos"      "feasibl"    "return"    

 [1] "extrapol"      "errorpron"     "posttreat"     "instrument"   
 [5] "classic"       "baselin"       "replic"        "subsampl"     
 [9] "nonlinear"     "daili"         "summari"       "air"          
[13] "encount"       "subset"        "bias"          "efficaci"     
[17] "heteroscedast" "frequenc"      "trajectori"    "spheric"      
[21] "supplementari" "correct"       "multiscal"     "scatter"      
[25] "reconstruct"   "subject"       "error"         "temperatur"   

 [1] "admiss"       "inadmiss"     "loss"         "bay"          "risk"        
 [6] "endpoint"     "action"       "ann"          "accept"       "math"        
[11] "genom"        "screen"       "stringent"    "result"       "complet"     
[16] "stepup"       "character"    "formul"       "treat"        "pearson"     
[21] "amer"         "assoc"        "biometrika"   "prototyp"     "vector"      
[26] "pay"          "reject"       "decad"        "revisit"      "metaanalysi" 
[31] "criteria"     "effort"       "bioassay"     "thought"      "hard"        
[36] "psycholog"    "nonneg"       "predetermin"  "fals"         "energi"      
[41] "earlier"      "educ"         "hoc"          "stein"        "emerg"       
[46] "fair"         "dna"          "appeal"       "sign"         "singlestep"  
[51] "drug"         "microarray"   "statistician" "jeffrey"      "year"        
[56] "fewer"        "fisher"       "paper"        "resembl"      "paradox"     
[61] "share"        "twodimension" "nonzero"      "stepdown"     "seek"        
[66] "expect"      

 [1] "unbound"   "novelti"   "function"  "yield"     "oracl"     "tail"     
 [7] "decreas"   "satisfi"   "anisotrop" "inequ"     "median"    "slower"   
[13] "literatur" "bivari"    "free"      "vast"      "fast"      "input"    
[19] "setup"     "output"    "aggreg"    "aforement" "behav"     "influenti"
[25] "iii"       "bound"     "univers"   "main"      "nuclear"   "radius"   
[31] "need"      "tilt"      "hyperplan" "higherord" "symmetri"  "equivari" 
[37] "gee"       "scatter"   "bin"       "quadrat"  

 [1] "schedul"         "longitudin"      "followup"        "analys"         
 [5] "phase"           "generat"         "incomplet"       "flexibl"        
 [9] "respons"         "avail"           "ill"             "unbalanc"       
[13] "pursu"           "offer"           "enter"           "resourc"        
[17] "impact"          "merg"            "concret"         "intermitt"      
[21] "interim"         "preced"          "perfect"         "divid"          
[25] "maker"           "face"            "preliminari"     "fluctuat"       
[29] "missingatrandom" "versatil"        "alloc"           "timetoev"       
[33] "withinsubject"   "compromis"       "manag"           "metropoli"      
[37] "missingdata"     "walk"            "logrank"        

[1] "real"    "simul"   "data"    "illustr"

[1] "misspecifi" "robust"     "misspecif" 

Overall, the results for pseudocounts 0.01 to 1 look fairly reasonable.

Looking at memberships

Looking at the non-zero memberships, all four pseudocounts seem to give similar overall levels of sparsity in \(L\).

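# Histogram of the non-zero loadings after scaling each factor (column of L) to have maximum 1.
# The hist() call and closing brace are an assumed completion of a truncated listing.
hist_lnorm = function(fit,...){
  LL = fit$L_pm
  Lnorm = t(t(LL)/apply(LL,2,max))
  hist(Lnorm[Lnorm > 0],...)
}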
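To put a number on "similar overall levels of sparsity" rather than comparing histograms by eye, one could summarize the normalized loadings of each fit directly. This is a minimal sketch, assuming (as in the code above) that each fit stores its posterior mean loadings in fit$L_pm; the helper name loading_sparsity and the 0.2 cutoff are illustrative choices, not part of the original analysis.

```r
# Hypothetical helper (not from the original analysis): numeric summary of how
# sparse the column-normalized loadings are. Assumes LL is a non-negative
# documents x factors matrix, such as fit$L_pm, with a positive maximum in every column.
loading_sparsity = function(LL, cutoff = 0.2) {
  # scale each factor (column) so that its largest loading is 1
  Lnorm = t(t(LL) / apply(LL, 2, max))
  c(prop_nonzero = mean(Lnorm > 0),                # share of non-zero loadings
    prop_above   = mean(Lnorm > cutoff),           # share above the cutoff
    mean_nfac    = mean(rowSums(Lnorm > cutoff)))  # average factors per document
}

# toy example: 3 documents, 2 factors
toy_LL = matrix(c(2, 1, 1,
                  0, 0.1, 4), nrow = 3)
loading_sparsity(toy_LL)
# gives prop_nonzero = 5/6, prop_above = 2/3, mean_nfac = 4/3
```

Applied to the L_pm of each pseudocount's fit, this would give one short row of numbers per fit that can be compared side by side.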

Here I threshold the normalized L values at 0.2 to count how many factors each document loads on. Every document loads on the first (common) factor, so documents that load on only one factor can be thought of as not really being assigned to any topic.

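LL = fit.nn.s.01$L_pm
FF = fit.nn.s.01$F_pm
# scale each factor so its largest loading is 1, and rescale F to compensate
Lnorm = t(t(LL)/apply(LL,2,max))
Fnorm = t(t(FF)*apply(LL,2,max))

# number of factors on which each document has normalized loading > 0.2
nfac = rowSums(Lnorm>0.2)
hist(nfac,breaks = seq(0.5,9.5,length=10))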
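The rescaling above divides each column of L by its maximum and multiplies the matching column of F by the same constant, so the fitted matrix \(LF^\top\) is unchanged; only the split of scale between L and F differs. A quick numerical check on small random matrices (toy sizes, not the real fit):

```r
set.seed(1)
# toy non-negative loadings (documents x factors) and factors (words x factors)
toy_LL = matrix(rexp(20), nrow = 5, ncol = 4)
toy_FF = matrix(rexp(12), nrow = 3, ncol = 4)

toy_colmax = apply(toy_LL, 2, max)
toy_Lnorm = t(t(toy_LL) / toy_colmax)  # each column of L now has maximum 1
toy_Fnorm = t(t(toy_FF) * toy_colmax)  # scale the matching columns of F up

# the fitted matrix L %*% t(F) is unchanged by the rescaling
all.equal(toy_LL %*% t(toy_FF), toy_Lnorm %*% t(toy_Fnorm))
# [1] TRUE
```

This is why thresholding Lnorm at 0.2 is a statement about each document's loading relative to the strongest document on that factor, not about the absolute size of the loading.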

Here I make an initial structure plot of the results.

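library(fastTopics)  # structure_plot()
library(ggplot2)     # labs(), ggtitle(), theme()

# Structure plot for any non-negative loadings matrix, reusing fastTopics::structure_plot.
# The if() blocks, the gap default and the names 'row_max_1'/'col_norm_1' are a reconstruction
# of a truncated listing; only 'sum_to_1' and 'col_max_1' appear in the original.
structure_plot_general = function(Lhat,Fhat,grouping,title=NULL,
                                  loadings_order = 'embed',
                                  n_samples = NULL,
                                  gap = 1,
                                  std_L_method = 'sum_to_1',
                                  K = NULL){
  # with loadings_order = 'embed', subsample documents for the t-SNE ordering
  if(is.null(n_samples) && all(loadings_order == "embed")){
    n_samples = 2000
  }
  # standardize the loadings
  if(std_L_method == 'sum_to_1')   Lhat = Lhat/rowSums(Lhat)                         # rows sum to 1
  if(std_L_method == 'row_max_1')  Lhat = Lhat/c(apply(Lhat,1,max))                  # row max is 1
  if(std_L_method == 'col_max_1')  Lhat = apply(Lhat,2,function(z){z/max(z)})        # column max is 1
  if(std_L_method == 'col_norm_1') Lhat = apply(Lhat,2,function(z){z/norm(z,'2')})   # unit column norm
  # optionally keep only the first K factors
  if(!is.null(K)){
    Lhat = Lhat[,1:K,drop=FALSE]
    Fhat = Fhat[,1:K,drop=FALSE]
  }
  # structure_plot only draws L, so a dummy F with matching columns suffices
  Fhat = matrix(1,nrow=3,ncol=ncol(Lhat))
  colnames(Lhat) <- paste0("k",1:ncol(Lhat))
  fit_list     <- list(L = Lhat,F = Fhat)
  class(fit_list) <- c("multinom_topic_model_fit", "list")
  p <- structure_plot(fit_list,grouping = grouping,
                      loadings_order = loadings_order,
                      n = n_samples,gap = gap,verbose=F) +
    labs(y = "loading",color = "dim",fill = "dim") + ggtitle(title)
  p <- p + theme(legend.position="none")
  p
}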

This is the structure plot (with the first, common factor set to 0).

Running tsne on 1924 x 108 matrix.

structure_plot_general(Lnorm,Fnorm,std_L_method = "col_max_1")
Running tsne on 1924 x 108 matrix.

Repeat for the smaller pseudocount (0.01, fit.nn.s.001).

LL = fit.nn.s.001$L_pm
FF = fit.nn.s.001$F_pm
Lnorm = t(t(LL)/apply(LL,2,max))
Fnorm = t(t(FF)*apply(LL,2,max))

nfac = rowSums(Lnorm>0.2)
hist(nfac,breaks = seq(0.5,9.5,length=10))

This is the structure plot (with the first, common factor set to 0).

Running tsne on 1924 x 84 matrix.

Here is the same plot without normalizing each document’s memberships to sum to 1. Interestingly, the memberships here seem more “binary” than for the larger pseudocount.

structure_plot_general(Lnorm,Fnorm,std_L_method = "col_max_1")
Running tsne on 1924 x 84 matrix.

Thresholding factors

One thing I noticed is that some factors are “driven” by a single document (membership 1 in the normalized L), with no other document having appreciable membership (say, above 0.5), even though several documents have smaller non-zero memberships. For example, take factor 86 in the fit with pseudocount 0.1 (fit.nn.s.01). From its keywords it looks like a “recommender system” factor, but also a “nearest neighbor” factor. It seems to be driven by a single document that has both features.

 [1] "collabor"    "nearest"     "item"        "user"        "consum"     
 [6] "tradit"      "recommend"   "system"      "neighbor"    "filter"     
[11] "frame"       "clear"       "fact"        "contribut"   "forc"       
[16] "grow"        "drive"       "probabilist" "mathemat"    "precis"     
[21] "socal"       "initi"       "deal"        "mild"        "attempt"    
[26] "offer"       "neighbour"   "provid"      "literatur"   "algorithm"  
[31] "sequenti"   
LL = fit.nn.s.01$L_pm
FF = fit.nn.s.01$F_pm
Lnorm = t(t(LL)/apply(LL,2,max))
Fnorm = t(t(FF)*apply(LL,2,max))


order(Lnorm[,86],decreasing = TRUE)[1:4]
[1] 1181 1395 1460 1024
[1] "Collaborative recommendation is an information-filtering technique that attempts to present information items that are likely of interest to an Internet user. Traditionally, collaborative systems deal with situations with two types of variables, users and items. In its most common form, the problem is framed as trying to estimate ratings for items that have not yet been consumed by a user. Despite wide-ranging literature, little is known about the statistical properties of recommendation systems. In fact, no clear probabilistic model even exists which would allow us to precisely describe the mathematical forces driving collaborative filtering. To provide an initial contribution to this, we propose to set out a general sequential stochastic model for collaborative recommendation. We offer an in-depth analysis of the so-called cosine-type nearest neighbor collaborative method, which is one of the most widely used algorithms in collaborative filtering, and analyze its asymptotic performance as the number of users grows. We establish consistency of the procedure under mild assumptions on the model. Rates of convergence and examples are also provided."
[1] "It is shown that bagging, a computationally intensive method, asymptotically improves the performance of nearest neighbour classifiers provided that the resample size is less than 69% of the actual sample size, in the case of with-replacement bagging, or less than 50% of the sample size, for without-replacement bagging. However, for larger sampling fractions there is no asymptotic difference between the risk of the regular nearest neighbour classifier and its bagged version. In particular, neither achieves the large sample performance of the Bayes classifier. In contrast, when the sampling fractions converge to 0, but the resample sizes diverge to infinity, the bagged classifier converges to the optimal Bayes rule and its risk converges to the risk of the latter. These results are most readily seen when the two populations have well-defined densities, but they may also be derived in other cases, where densities exist in only a relative sense. Cross-validation can be used effectively to choose the sampling fraction. Numerical calculation is used to illustrate these theoretical properties."
[1] "Traditionally the neighbourhood size k in the k-nearest-neighbour algorithm is either fixed at the first nearest neighbour or is selected on the basis of a crossvalidation study. In this paper we present an alternative approach that develops the k-nearest-neighbour algorithm using likelihood-based inference. Our method takes the form of a generalised linear regression on a set of k-nearest-neighbour autocovariates. By defining the k-nearest-neighbour algorithm in this way we are able to extend the method to accommodate the original predictor variables as possible linear effects as well as allowing for the inclusion of multiple nearest-neighbour terms. The choice of the final model proceeds via a stepwise regression procedure. It is shown that our method incorporates a conventional generalised linear model and a conventional k-nearest-neighbour algorithm as special cases. Empirical results suggest that the method out-performs the standard k-nearest-neighbour method in terms of misclassification rate on a wide variety of data-sets."
[1] "In this article we study random forests through their connection with a new framework of adaptive nearest-neighbor methods. We introduce a concept of potential nearest neighbors (k-PNNs) and show that random forests can be viewed as adaptively weighted k-PNN methods. Various aspects of random forests can be studied from this perspective. We study the effect of terminal node sizes on the prediction accuracy of random forests. We further show that random forests with adaptive splitting schemes assign weights to k-PNNs in a desirable way: for the estimation at a given target point, these random forests assign voting weights to the k-PNNs of the target point according to the local importance of different input variables. We propose a new simple splitting scheme that achieves desirable adaptivity in a straightforward fashion. This simple scheme can be combined with existing algorithms. The resulting algorithm is computationally faster and gives comparable results. Other possible aspects of random forests, such as using linear combinations in splitting, are also discussed. Simulations and real datasets are used to illustrate the results."

It seems that this factor is being “polluted” by its strongest single document: the next three documents are all about nearest-neighbor methods, so it is perhaps really a “nearest neighbor” factor rather than a “recommender system” factor.

Here I look at some other factors that are dominated by a single outlying document, to see what they look like.
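The code that selected these factors is not shown. One way to flag them, following the description above (one document at normalized membership 1 and no other above about 0.5), is to compare each factor's second-largest normalized loading with a cutoff. This is a hypothetical sketch: the function name and the 0.5 cutoff are assumptions, and it is not guaranteed to reproduce the list of factor indices that follows.

```r
# Hypothetical helper: flag factors whose normalized loadings are dominated by
# a single document. Lnorm is documents x factors with column maxima equal to 1.
single_doc_factors = function(Lnorm, cutoff = 0.5) {
  # second-largest loading in each factor
  second_largest = apply(Lnorm, 2, function(z) sort(z, decreasing = TRUE)[2])
  which(second_largest < cutoff)
}

# toy example: factor 2 is driven by document 3 alone
toy_L = cbind(c(1, 0.8, 0.3, 0),
              c(0.1, 0, 1, 0.2),
              c(0, 0.6, 0.7, 1))
single_doc_factors(toy_L)
# [1] 2
```

For the fit above this would be called as single_doc_factors(Lnorm), with Lnorm computed from fit.nn.s.01.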

 [1]  17  68  69  74  86  92  99 100 101 105 106
 [1] "penalis"       "newtonraphson" "framingham"    "penalti"      
 [5] "likelihood"    "heart"         "failur"        "carri"        
 [9] "algorithm"     "proper"        "conduct"       "advanc"       
[13] "grow"          "dropout"       "familiar"      "prospect"     

 [1] "integ"      "algebra"    "coher"      "ail"        "ident"     
 [6] "countabl"   "multist"    "system"     "appl"       "finit"     
[11] "classic"    "object"     "ideal"      "grid"       "util"      
[16] "math"       "fewer"      "state"      "call"       "binari"    
[21] "inequ"      "pure"       "geometri"   "comprehens" "alpha"     
[26] "posit"      "socal"      "repres"     "idea"       "complex"   
[31] "probabl"    "yield"      "failur"     "relat"      "type"      

 [1] "car"         "polytop"     "partit"      "height"      "combinatori"
 [6] "mechan"      "rais"        "hierarchi"   "convex"      "need"       
[11] "extrem"      "stein"       "descript"    "meaning"     "discret"    
[16] "object"      "geometr"     "parsimoni"   "oil"         "notion"     
[21] "satisfi"     "character"   "exponenti"   "interpret"   "unusu"      
[26] "maxim"       "neighbor"    "assumpt"     "uniform"     "dramat"     
[31] "class"       "point"       "sure"       

 [1] "digit"       "fals"        "alarm"       "imag"        "geometr"    
 [6] "definit"     "expect"      "sequenti"    "minim"       "principl"   
[11] "meaning"     "meet"        "framework"   "kind"        "priori"     
[16] "maxim"       "prove"       "theori"      "contain"     "mathemat"   
[21] "compat"      "align"       "display"     "part"        "occurr"     
[26] "explain"     "basic"       "structur"    "number"      "hidden"     
[31] "stop"        "delay"       "probabilist" "rigor"       "fine"       
[36] "walk"        "chang"       "changepoint" "renew"      

 [1] "collabor"    "nearest"     "item"        "user"        "consum"     
 [6] "tradit"      "recommend"   "system"      "neighbor"    "filter"     
[11] "frame"       "clear"       "fact"        "contribut"   "forc"       
[16] "grow"        "drive"       "probabilist" "mathemat"    "precis"     
[21] "socal"       "initi"       "deal"        "mild"        "attempt"    
[26] "offer"       "neighbour"   "provid"      "literatur"   "algorithm"  
[31] "sequenti"   

 [1] "seri"        "week"        "time"        "stationari"  "generat"    
 [6] "superposit"  "autoregress" "renew"       "autocovari"  "binomi"     
[11] "day"         "longmemori"  "count"       "predict"     "thin"       
[16] "focus"       "fit"         "contrast"    "consecut"    "integ"      
[21] "simpl"       "poisson"     "short"       "geometr"     "parsimoni"  
[26] "copi"        "bernoulli"   "previous"    "discret"     "electr"     
[31] "daili"       "key"         "differ"      "trial"       "market"     
[36] "margin"      "sequenc"     "forecast"    "load"       

 [1] "extrem"      "precipit"    "spatial"     "pareto"      "station"    
 [6] "uncertainti" "climatolog"  "hierarchi"   "exceed"      "threshold"  
[11] "quantif"     "produc"      "return"      "captur"      "region"     
[16] "intens"      "frequenc"    "hierarch"    "plan"        "weather"    
[21] "interpol"    "map"         "purpos"      "binomi"      "coordin"    
[26] "driven"      "geograph"    "daili"       "separ"       "character"  
[31] "fulli"       "latent"      "improv"     

 [1] "enter"     "pursu"     "project"   "preced"    "phase"     "maker"    
 [7] "schedul"   "resourc"   "decis"     "minim"     "concret"   "perfect"  
[13] "divid"     "total"     "strategi"  "alloc"     "expect"    "face"     
[19] "generat"   "manag"     "chosen"    "state"     "formul"    "unknown"  
[25] "point"     "exampl"    "breakdown" "unit"     

 [1] "polya"       "appreci"     "tree"        "cancer"      "surveil"    
 [6] "spatial"     "sophist"     "epidemiolog" "unrealist"   "institut"   
[11] "offer"       "program"     "fulli"       "analyt"      "nation"     
[16] "flexibl"     "lattic"      "compet"      "orient"      "feasibl"    
[21] "impos"       "obtain"      "aspect"      "remain"      "timetoev"   
[26] "breast"      "ignor"       "urn"         "mixtur"      "advantag"   
[31] "framework"   "featur"     

 [1] "underestim"    "overestim"     "lemma"         "abrupt"       
 [5] "respect"       "admit"         "stein"         "identif"      
 [9] "moder"         "satisfi"       "nontrivi"      "impli"        
[13] "detail"        "deviat"        "loglikelihood" "benchmark"    
[17] "moment"        "nest"          "yield"         "exponenti"    
[21] "decay"         "deal"          "difficulti"    "mild"         
[25] "posit"         "relat"         "version"       "prove"        

 [1] "retail"    "custom"    "compani"   "deliveri"  "consum"    "tradit"   
 [7] "onlin"     "tail"      "quantiti"  "frequenc"  "market"    "total"    
[13] "joint"     "differ"    "firm"      "articl"    "cost"      "week"     
[19] "daili"     "translat"  "tie"       "decis"     "intend"    "household"
[25] "prevent"   "bivari"    "activ"     "aid"       "simpli"    "accur"    
[31] "forecast"  "compon"    "element"   "commerci"  "success"   "bank"     
[37] "incur"     "period"    "center"    "repres"    "arriv"     "frequent" 
[43] "organ"     "concern"   "impact"    "descript" 

order(Lnorm[,17],decreasing = TRUE)[1:4]
[1] 1789  475  792 1781
[1] "In this paper, we propose a penalised pseudo-partial likelihood method for variable selection with multivariate failure time data with a growing number of regression coefficients. Under certain regularity conditions, we show the consistency and asymptotic normality of the penalised likelihood estimators. We further demonstrate that, for certain penalty functions with proper choices of regularisation parameters, the resulting estimator can correctly identify the true model, as if it were known in advance. Based on a simple approximation of the penalty function, the proposed method can be easily carried out with the Newton-Raphson algorithm. We conduct extensive Monte Carlo simulation studies to assess the finite sample performance of the proposed procedures. We illustrate the proposed method by analysing a dataset from the Framingham Heart Study."
[1] "Pattern-mixture models are frequently used for longitudinal data analysis with dropouts because they do not require explicit specification Of the dropout mechanism. These models stratify the data according to time to dropout and formulate a model for each stratum. This usually results in underindentifiability, because we need to estimate many pattern-specific parameters even though the eventual interest is usually or, the marginal parameters. In this article we extend this framework to a random pattern-mixture model, where the pattern-specific parameters are treated as nuisance parameters and modeled as random instead of fixed. The pattern is defined according to a surrogate for the dropout process. A constraint is then put oil the pattern by linking it to the time to dropout using a random-effects survival model. We assume, conditional on the latent pattern effects. that the longitudinal outcome and the dropout process are independent. This model retains the robustness of the traditional pattern-mixture models. while avoiding the overparameterization problem. When we define each subject as a separate stratum. this model reduces to the shared parameter model. Maximum likelihood estimates are obtained using an EM Newton-Raphson algorithm. We apply the method to the depression data from the Prevention of Suicide in Primary Care Elderly Collaborative Trial (PROSPECT). We show when the dropout information is adjusted for under the proposed model, the treatment seems to reduce depression in the elderly."
[1] "We propose a nonparametric method for identifying parsimony and for producing a statistically efficient estimator of a large covariance matrix. We reparameterise a covariance matrix through the modified Cholesky decomposition of its inverse or the one-step-ahead predictive representation of the vector of responses and reduce the nonintuitive task of modelling covariance matrices to the familiar task of model selection and estimation for a sequence of regression models. The Cholesky factor containing these regression coefficients is likely to have many off-diagonal elements that are zero or close to zero. Penalised normal likelihoods in this situation with L-1 and L-2 penalities are shown to be closely related to Tibshirani's (1996) LASSO approach and to ridge regression. Adding either penalty to the likelihood helps to produce more stable estimators by introducing shrinkage to the elements in the Cholesky factor, while, because of its singularity, the L-1 penalty will set some elements to zero and produce interpretable models. An algorithm is developed for computing the estimator and selecting the tuning parameter. The proposed maximum penalised likelihood estimator is illustrated using simulation and a real dataset involving estimation of a 102 x 102 covariance matrix."
[1] "This paper extends the induced smoothing procedure of Brown & Wang (2006) for the semiparametric accelerated failure time model to the case of clustered failure time data. The resulting procedure permits fast and accurate computation of regression parameter estimates and standard errors using simple and widely available numerical methods, such as the Newton-Raphson algorithm. The regression parameter estimates are shown to be strongly consistent and asymptotically normal; in addition, we prove that the asymptotic distribution of the smoothed estimator coincides with that obtained without the use of smoothing. This establishes a key claim of Brown & Wang (2006) for the case of independent failure time data and also extends such results to the case of clustered data. Simulation results show that these smoothed estimates perform as well as those obtained using the best available methods at a fraction of the computational cost."

order(Lnorm[,69],decreasing = TRUE)[1:4]
[1] 1867  288 1751 1258
[1] "We show that the class of conditional distributions satisfying the coarsening at random (CAR) property for discrete data has a simple and robust algorithmic description based oil randomized uniform multicovers: combinatorial objects generalizing the notion of partition of a set. However, the complexity of a given CAR mechanism can be large: the maximal \"height\" of the needed multicovers can be exponential in the number of points, in the sample space. The results stein from a geometric interpretation of the set of CAR distributions as a convex polytope and a characterization of its extreme points. The hierarchy of CAR models defined in this way could be useful in parsimonious statistical modeling of CAR mechanisms, though the results also raise doubts in applied work as to the meaningfulness of the CAR assumption in its full generality."
[1] "Attachment loss, the extent of a tooth's root (in millimeters) that is no longer attached to surrounding bone by periodontal ligament, is often used to measure the current state of a patient's periodontal disease and monitor disease progression. Attachment loss data can be analyzed using a conditionally autoregressive (CAR) prior distribution that smooths fitted values toward neighboring values. However, it may be desirable to have more than one class of neighbor relation in the spatial structure, so the different classes of neighbor relations can induce different degrees of smoothing. For example, we may wish to allow smoothing of neighbor pairs bridging the gap between teeth to differ from smoothing of pairs that do not bridge such gaps. Adequately modeling the spatial structure may improve the monitoring of periodontal disease progression. This article develops a two-neighbor-relation CAR model to handle this situation and presents associated theory to help explain the sometimes unusual posterior distributions of the parameters controlling the different types of smoothing. The posterior of these smoothing parameters often has long upper tails, and its shape can change dramatically depending on the spatial structure. Like previous authors, we show that the prior distribution on these parameters has little effect on the posterior of the fixed effects but has a marked influence on the posterior of both the random effects and the smoothing parameters. Our analysis of attachment loss data also suggests that the spatial structure itself varies between individuals."
[1] "An easy-to-implement global procedure for testing the four assumptions of the linear model is proposed. The test can be viewed as a Neyman smooth test and relies only on the standardized residual vector. If the global procedure indicates a violation of at least one of the assumptions, then the components of the global test statistic can be used to gain insight into which assumptions have been violated. The procedure can also be used in conjunction with associated deletion statistics to detect unusual observations. Simulation results are presented indicating the sensitivity of the procedure in detecting model violations under a variety of situations, and its performance is compared with three potential competitors, including a procedure based on the Box-Cox power transformation. The procedure is demonstrated by applying it to a new car mileage dataset and a water salinity dataset that has been used earlier to illustrate model diagnostics."
[1] "This paper provides answers to questions regarding the almost sure limiting behavior of rooted, binary tree-structured rules for regression. Examples show that questions raised by Gordon and Olshen in 1984 have negative answers. For these examples of regression functions and sequences of their associated binary tree-structured approximations, for all regression functions except those in a set of the first category, almost sure consistency fails dramatically on events of full probability. One consequence is that almost sure consistency of binary tree-structured rules such as CART requires conditions beyond requiring that (1) the regression function be in L-1, (2) partitions of a Euclidean feature space be into polytopes with sides parallel to coordinate axes, (3) the mesh of the partitions becomes arbitrarily fine almost surely and (4) the empirical learning sample content of each polytope be \"large enough.\" The material in this paper includes the solution to a problem raised by Dudley in discussions. The main results have a corollary regarding the lack of almost sure consistency of certain Bayes-risk consistent rules for classification."

Generally speaking, these factors do not seem very interpretable and should perhaps be filtered out. That is what motivated me to implement the ‘docfilter’ argument in the ‘get_keywords’ function.
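The implementation of get_keywords is not shown here. Purely to illustrate the idea, a filter of this kind could skip any factor that has at most docfilter documents with normalized membership above a cutoff, and report the top-weighted words for the remaining factors. Everything in this sketch (the function name, ranking words by their value in the rescaled F, the 0.5 cutoff, the default of 10 words) is an assumption; it is not the author's get_keywords.

```r
# Hypothetical stand-in for a docfilter-style keyword summary (NOT get_keywords).
# Lnorm: documents x factors, columns scaled to max 1
# Fnorm: words x factors, rescaled to compensate, as computed earlier
# vocab: character vector of words, one per row of Fnorm
keywords_with_docfilter = function(Lnorm, Fnorm, vocab,
                                   docfilter = 1, cutoff = 0.5, n_words = 10) {
  # number of documents with appreciable membership in each factor
  n_docs = colSums(Lnorm > cutoff)
  keep = which(n_docs > docfilter)
  # for each kept factor, the n_words words with the largest loading
  out = lapply(keep, function(k) {
    head(vocab[order(Fnorm[, k], decreasing = TRUE)], n_words)
  })
  names(out) = paste0("k", keep)
  out
}

# toy example: factor 2 has only one document above 0.5, so it is dropped
toy_L = cbind(c(1, 0.9, 0.6), c(1, 0.1, 0))
toy_F = cbind(c(3, 1, 2), c(0, 5, 1))
toy_vocab = c("model", "lasso", "test")
keywords_with_docfilter(toy_L, toy_F, toy_vocab, n_words = 2)
# $k1
# [1] "model" "test"
```

Setting docfilter = 0 keeps every factor that has at least one document above the cutoff.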

print(get_keywords(fit.nn.s.1,docfilter = 1))
[1] "model"  "estim"  "method" "data"  

 [1] "fals"      "procedur"  "control"   "test"      "discoveri" "rate"     
 [7] "reject"    "hypothes"  "fdr"       "multipl"   "pvalu"     "null"     
[13] "number"    "kfwer"    

[1] "test"      "null"      "hypothesi" "distribut"

 [1] "treatment" "trial"     "random"    "assign"    "patient"   "effect"   
 [7] "outcom"    "clinic"    "causal"    "placebo"   "assumpt"  

[1] "surviv" "time"   "hazard" "censor" "failur" "studi" 

[1] "simex"              "measur"             "simulationextrapol"
[4] "error"             

[1] "wilk"

[1] "lasso"    "select"   "variabl"  "regress"  "coeffici"

[1] "rankbas"  "effici"   "asymptot" "rank"    

[1] "nconsist"

[1] "assoc"   "amer"    "statist" "ann"    

[1] "mle"        "likelihood" "maximum"   

[1] "varyingcoeffici"

[1] "semiparametr" "estim"        "model"        "parametr"    

 [1] "adapt"      "wavelet"    "besov"      "minimax"    "ball"      
 [6] "rang"       "threshold"  "risk"       "deconvolut" "nois"      

[1] "memori"

[1] "bandwidth" "kernel"    "local"     "select"   

[1] "forecast"    "predict"     "wind"        "weather"     "spatial"    
[6] "calibr"      "speed"       "meteorolog"  "probabilist"

[1] "choleski"   "matrix"     "covari"     "decomposit" "factor"    
[6] "interpret" 

[1] "mse"       "predictor" "linear"    "error"     "squar"     "empir"    

[1] "depth"   "project"

[1] "singleindex" "function"    "link"        "compon"      "unknown"    

[1] "markov"    "chain"     "mont"      "carlo"     "algorithm"

[1] "penal"      "nonconcav"  "likelihood" "select"     "variabl"   
[6] "oracl"      "penalti"    "regular"   

[1] "jackknif" "mix"      "squar"    "area"     "varianc" 

[1] "homoscedast"   "heteroscedast"

[1] "spline" "smooth"

[1] "survey" "popul"  "sampl" 

[1] "equivari"  "affin"     "matrix"    "introduc"  "breakdown" "concept"  
[7] "scatter"  

[1] "onestep"

[1] "process"    "thin"       "point"      "fit"        "spatial"   
[6] "residu"     "stationari" "intens"    

[1] "nonnorm"

[1] "polynomi" "local"    "regress" 

[1] "gee"     "equat"   "correl"  "general" "binari"  "work"   

[1] "theta"   "paramet"

[1] "robin"     "miss"      "zhao"      "rotnitzki" "effici"   

[1] "mestim" "robust"

[1] "finitesampl"

[1] "sobolev" "densiti" "minimax" "rate"   

[1] "elect" "vote"  "poll" 

[1] "errorpron" "error"    

[1] "panel" "count"

[1] "stock"

[1] "garch"   "process" "volatil"

[1] "secondord"

[1] "equat" "estim"

[1] "slice"   "invers"  "regress" "dimens"  "method" 

[1] "norm"      "matrix"    "rank"      "matric"    "frobenius" "bound"    

[1] "survivor"

[1] "slope"

[1] "chi"  "test"

[1] "varianc"

[1] "function"   "eigenfunct" "analysi"    "random"     "princip"   
[6] "compon"     "data"      

[1] "tabl"    "conting"

[1] "criterion" "akaik"     "select"    "model"    

[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian" 

[1] "neighborhood"

[1] "maximum"    "welldefin"  "posteriori"
print(get_keywords(fit.nn.s.01,docfilter = 1))
 [1] "model"       "estim"       "data"        "method"      "propos"     
 [6] "studi"       "simul"       "distribut"   "function"    "sampl"      
[11] "paramet"     "approach"    "statist"     "base"        "asymptot"   
[16] "problem"     "general"     "regress"     "analysi"     "test"       
[21] "develop"     "procedur"    "perform"     "illustr"     "condit"     
[26] "set"         "applic"      "observ"      "variabl"     "likelihood" 
[31] "consist"     "time"        "appli"       "covari"      "properti"   
[36] "random"      "comput"      "articl"      "linear"      "case"       
[41] "process"     "infer"       "error"       "select"      "number"     
[46] "effici"      "rate"        "nonparametr" "deriv"       "measur"     
[51] "effect"      "algorithm"   "class"       "paper"       "compar"     
[56] "provid"      "includ"      "depend"     

 [1] "fals"       "control"    "procedur"   "test"       "rate"      
 [6] "discoveri"  "reject"     "hypothes"   "multipl"    "null"      
[11] "pvalu"      "fdr"        "hochberg"   "number"     "stepdown"  
[16] "kfwer"      "familywis"  "error"      "depend"     "proport"   
[21] "benjamini"  "fwer"       "statist"    "fdp"        "soc"       
[26] "divid"      "power"      "roy"        "stepup"     "alpha"     
[31] "deriv"      "abil"       "ser"        "individu"   "detect"    
[36] "gamma"      "total"      "hypothesi"  "conserv"    "toler"     
[41] "attent"     "defin"      "singlestep" "construct"  "fix"       
[46] "simultan"   "probabl"    "independ"   "ann"        "usual"     
[51] "sime"       "improv"     "increas"   

 [1] "treatment"   "random"      "trial"       "patient"     "effect"     
 [6] "assign"      "noncompli"   "assumpt"     "outcom"      "complianc"  
[11] "causal"      "adher"       "depress"     "placebo"     "receiv"     
[16] "care"        "subject"     "clinic"      "intervent"   "drug"       
[21] "arm"         "dose"        "improv"      "primari"     "treat"      
[26] "princip"     "analys"      "latent"      "elder"       "control"    
[31] "sever"       "contrast"    "instrument"  "stratif"     "activ"      
[36] "particip"    "framework"   "prevent"     "potenti"     "physician"  
[41] "benefit"     "infer"       "imperfect"   "children"    "encourag"   
[46] "estimand"    "doserespons"

 [1] "surviv"       "time"         "hazard"       "censor"       "failur"      
 [6] "studi"        "event"        "semiparametr" "proport"      "data"        
[11] "cancer"       "covari"       "estim"        "risk"         "cox"         
[16] "baselin"      "regress"      "cumul"        "illustr"      "rightcensor" 
[21] "consist"      "nonparametr"  "trial"       

[1] "null"      "test"      "hypothesi" "distribut" "altern"    "statist"  
[7] "power"     "asymptot"  "hypothes" 

 [1] "simex"              "simulationextrapol" "measur"            
 [4] "error"              "undersmooth"        "asymptot"          
 [7] "longer"             "accuraci"           "finitesampl"       
[10] "principl"           "bias"               "presenc"           
[13] "selector"           "wang"               "rootn"             

 [1] "wilk"       "ratio"      "phenomenon" "correct"    "relax"     
 [6] "conduct"    "newli"      "unspecifi"  "freedom"    "follow"    
[11] "backfit"    "nuisanc"    "theorem"    "degre"      "chisquar"  
[16] "likelihood" "empir"      "ask"        "hold"      

 [1] "mle"         "maximum"     "likelihood"  "main"        "prove"      
 [6] "asymptot"    "converg"     "limit"       "mles"        "status"     
[11] "rate"        "current"     "brownian"    "behavior"    "motion"     
[16] "estim"       "proof"       "uniqu"       "nonparametr"

 [1] "chain"     "markov"    "mont"      "carlo"     "bayesian"  "algorithm"
 [7] "posterior" "infer"     "prior"     "model"     "mcmc"     

 [1] "lasso"     "select"    "variabl"   "regress"   "coeffici"  "spars"    
 [7] "penalti"   "adapt"     "linear"    "oracl"     "penal"     "problem"  
[13] "sparsiti"  "algorithm" "regular"  

[1] "varyingcoeffici" "nonparametr"     "coeffici"        "linear"         
[5] "longitudin"      "conduct"         "propos"          "vari"           
[9] "regress"        

 [1] "rankbas"      "effici"       "asymptot"     "rank"         "ellipt"      
 [6] "cam"          "class"        "densiti"      "uniform"      "normal"      
[11] "version"      "sign"         "multivari"    "matric"       "symmetri"    
[16] "valid"        "finit"        "scatter"      "ann"          "contour"     
[21] "tradit"       "assumpt"      "sens"         "irrespect"    "rootn"       
[26] "semiparametr" "center"      

 [1] "nconsist" "root"     "reduct"   "dimens"   "exist"    "direct"  
 [7] "central"  "slice"    "exhaust"  "contour"  "ellipt"   "advantag"
[13] "mild"     "strong"   "regress"  "varianc"  "suffici"  "invers"  
[19] "averag"  

 [1] "semiparametr" "estim"        "nonparametr"  "parametr"     "paramet"     
 [6] "model"        "effici"       "asymptot"     "likelihood"   "regress"     
[11] "function"    

 [1] "bandwidth"  "kernel"     "local"      "select"     "smooth"    
 [6] "densiti"    "estim"      "crossvalid" "selector"   "polynomi"  

 [1] "nonconcav"     "penal"         "select"        "oracl"        
 [5] "penalti"       "variabl"       "likelihood"    "regular"      
 [9] "fan"           "challeng"      "nondifferenti" "maxim"        
[13] "sandwich"      "onestep"       "establish"     "concav"       
[17] "broad"         "enjoy"         "employ"        "selector"     
[21] "encourag"      "cost"         

[1] NA

[1] "homoscedast"   "heteroscedast" "varianc"       "transform"    
[5] "famili"        "error"        

[1] "nonnorm"   "normal"    "mix"       "linear"    "exponenti"

[1] "inhomogen"  "intens"     "process"    "spatial"    "point"     
[6] "poisson"    "thin"       "stationari" "function"  

 [1] "seem"           "unrel"          "spline"         "correl"        
 [5] "credit"         "retail"         "neglig"         "nongaussian"   
 [9] "dataadapt"      "vehicl"         "allevi"         "knot"          
[13] "leav"           "reversiblejump" "part"           "genotyp"       
[17] "conveni"        "residu"         "wang"           "withinclust"   

 [1] "memori"        "seri"          "differenc"     "longmemori"   
 [5] "taper"         "frequenc"      "long"          "fraction"     
 [9] "averag"        "depend"        "paramet"       "periodogram"  
[13] "stationari"    "move"          "slowli"        "whittl"       
[17] "eigenvector"   "local"         "nonstationari" "distinct"     
[21] "angl"         

 [1] "distort"         "respons"         "confound"        "predictor"      
 [5] "unobserv"        "under"           "explanatori"     "serum"          
 [9] "adjust"          "magnitud"        "indirect"        "identifi"       
[13] "coeffici"        "factor"          "absent"          "system"         
[17] "alter"           "observ"          "datagener"       "leastsquar"     
[21] "decid"           "straightforward" "generat"         "stepwis"        
[25] "intervent"       "sever"          

[1] "polynomi"    "local"       "regress"     "smooth"      "nonparametr"
[6] "asymptot"   

 [1] "equivari"   "affin"      "introduc"   "depth"      "breakdown" 
 [6] "scatter"    "locat"      "point"      "project"    "robust"    
[11] "concept"    "general"    "multivari"  "function"   "influenc"  
[16] "matrix"     "median"     "definit"    "hyperplan"  "high"      
[21] "heavytail"  "competitor" "fact"       "translat"   "comparison"
[26] "open"      

 [1] "save"      "sir"       "slice"     "averag"    "root"      "invers"   
 [7] "candid"    "reveal"    "theoret"   "reduct"    "comput"    "contrast" 
[13] "recommend"

 [1] "nonrespons" "survey"     "respons"    "imput"      "nonignor"  
 [6] "valu"       "miss"       "respond"    "nation"     "varianc"   
[11] "nonrespond" "weight"     "popul"      "requir"     "bias"      
[16] "probabl"    "unit"       "mechan"     "item"       "adjust"    
[21] "health"     "variabl"    "calibr"     "race"       "domain"    
[26] "handl"      "incom"     

 [1] "taper"    "approxim" "matrix"   "gaussian" "covari"   "spars"   
 [7] "consist"  "oper"     "block"    "norm"     "balanc"   "requir"  
[13] "spatial" 

 [1] "jackknif"  "mix"       "varianc"   "area"      "squar"     "appli"    
 [7] "inconsist" "uncondit"  "replic"    "strata"   

[1] "mestim"  "robust"  "weak"    "yield"   "outlier" "nuisanc"

 [1] "garch"         "process"       "seri"          "volatil"      
 [5] "stationari"    "paper"         "heteroscedast" "condit"       
 [9] "moment"        "autoregress"   "financi"       "local"        
[13] "standard"      "innov"         "sequenc"       "satisfi"      
[17] "move"          "iid"           "time"          "averag"       
[21] "root"          "mont"          "carlo"        

[1] "quantil" "regress"

 [1] "gee"       "equat"     "correl"    "general"   "sandwich"  "binari"   
 [7] "work"      "misspecif" "cluster"   "scientif"  "enhanc"    "effort"   
[13] "equival"   "lead"      "repeat"    "diverg"   

 [1] "popul"      "superpopul" "survey"     "finit"      "boxcox"    
 [6] "modelbas"   "design"     "predict"    "realiz"     "auxiliari" 
[11] "sampl"      "handl"      "twophas"    "revisit"    "mild"      
[16] "benchmark"  "rich"       "life"       "probabl"    "ensur"     

 [1] "claim"     "insur"     "vehicl"    "damag"     "age"       "year"     
 [7] "turn"      "compani"   "detail"    "tail"      "sever"     "coverag"  
[13] "record"    "risk"      "price"     "financi"   "describ"   "major"    
[19] "gender"    "discount"  "logit"     "amount"    "person"    "kind"     
[25] "multinomi" "frequenc"  "justif"    "surpris"   "binomi"    "oil"      
[31] "pointwis"  "split"     "negat"    

[1] "logit"       "finitesampl" "root"        "probit"      "variat"     
[6] "mix"         "fraction"    "multinomi"  

 [1] "expenditur"   "physician"    "servic"       "skew"         "care"        
 [6] "lognorm"      "profil"       "conduct"      "patient"      "person"      
[11] "contribut"    "health"       "randomeffect" "smoke"        "fact"        
[16] "survey"       "manag"        "incur"        "medic"        "debat"       
[21] "custom"       "qualiti"      "topic"        "industri"     "appropri"    
[26] "pulmonari"    "conceptu"     "monitor"      "regard"       "prescrib"    
[31] "subsequ"      "way"          "financi"      "hierarch"     "lung"        
[36] "percentil"    "attribut"     "closedform"  

[1] "confid"    "interv"    "construct" "coverag"   "bootstrap" "region"   

 [1] "singleindex" "unknown"     "link"        "compon"      "equat"      
 [6] "function"    "varianc"     "nonparametr" "beta"        "femal"      
[11] "structur"    "smaller"     "compos"      "vectorvalu"  "eigenfunct" 
[16] "composit"    "econometr"  

[1] "finitesampl" "propos"     

 [1] "wavelet"    "adapt"      "besov"      "minimax"    "ball"      
 [6] "threshold"  "rang"       "nois"       "wide"       "unknown"   
[11] "rate"       "risk"       "bound"      "deconvolut" "smooth"    
[16] "problem"    "function"   "signal"     "white"      "converg"   
[21] "gaussian"   "transform"  "recov"      "densiti"    "shape"     
[26] "view"       "noisi"      "discret"    "nearoptim"  "spars"     
[31] "blur"       "fourier"    "decay"      "upper"      "convolut"  
[36] "invers"    

 [1] "robin"      "miss"       "zhao"       "rotnitzki"  "effici"    
 [6] "weight"     "casecohort" "design"     "invers"     "twophas"   
[11] "cohort"     "random"     "causal"     "outcom"     "biometrika"
[16] "prentic"    "calcul"     "purpos"     "confound"   "lemma"     
[21] "mar"        "exemplifi"  "suit"       "amer"       "assoc"     
[26] "proceed"    "summar"     "cox"        "ser"        "soc"       
[31] "roy"        "iid"        "appear"     "unbias"    

[1] "maximum"    "likelihood" "estim"     

[1] "dimensionreduct" "invers"          "dimens"          "factor"         
[5] "highdimension"   "chisquar"        "reduct"         

[1] "lin"        "addit"      "work"       "carrol"     "bone"      
[6] "transplant" "margin"    

 [1] "withinclust" "cluster"     "correl"      "account"     "hamper"     
 [6] "frequent"    "carri"       "frailti"     "parsimoni"   "abil"       
[11] "birth"       "ill"         "generalis"   "impact"      "intuit"     
[16] "achiev"     

[1] "chi"       "test"      "distribut" "space"     "ratio"     "restrict" 
[7] "statist"  

[1] "coeffici" "regress" 

 [1] "norm"          "matrix"        "frobenius"     "rank"         
 [5] "matric"        "nuclear"       "bound"         "regular"      
 [9] "low"           "optim"         "nonasymptot"   "highdimension"
[13] "convex"        "spars"         "minimax"       "noisi"        
[17] "element"       "minim"         "error"         "singular"     
[21] "setup"         "vector"        "theori"        "precis"       
[25] "autoregress"   "predict"      

 [1] "minimax" "rate"    "densiti" "optim"   "adapt"   "unknown" "estim"  
 [8] "loss"    "converg" "class"   "prove"   "bound"  

[1] "unequ"     "designbas" "survey"    "weight"   

[1] "auxiliari" "survey"    "varianc"   "variabl"   "sampl"     "weight"   
[7] "design"    "calibr"    "popul"    

[1] "variancecovari" "matrix"         "analyz"        

[1] "contamin"    "robust"      "water"       "influenc"    "explanatori"

[1] "bspline" "kernel"  "penal"  

[1] "varianc"  "asymptot"

 [1] "eigenfunct" "function"   "princip"    "compon"     "random"    
 [6] "analysi"    "data"       "smooth"     "eigenvalu"  "deriv"     
[11] "curv"       "spars"      "trajectori" "space"      "score"     

 [1] "forecast"    "predict"     "weather"     "spatial"     "wind"       
 [6] "probabilist" "northwest"   "calibr"      "pacif"       "meteorolog" 
[11] "temperatur"  "speed"       "hour"        "energi"      "atmospher"  
[16] "averag"      "ensembl"     "geostatist"  "futur"       "center"     
[21] "north"       "precipit"    "accur"       "tempor"      "daili"      
[26] "event"       "resourc"     "site"        "american"    "state"      
[31] "sharp"       "spacetim"    "qualiti"     "climat"      "ozon"       
[36] "concentr"    "generat"     "regim"       "transport"   "season"     
[41] "shortterm"   "determinist" "input"      

 [1] "highfrequ" "volatil"   "financi"   "asset"     "price"     "lowfrequ" 
 [7] "exchang"   "nois"      "dynam"     "market"    "matrix"    "stock"    
[13] "period"    "daili"     "realiz"    "pool"      "matric"    "variat"   
[19] "diffus"   

 [1] "earthquak"      "process"        "discrimin"      "seri"          
 [5] "featur"         "explos"         "event"          "time"          
 [9] "form"           "california"     "spectra"        "transform"     
[13] "background"     "extract"        "occurr"         "intens"        
[17] "diverg"         "wavelet"        "step"           "occur"         
[21] "decomposit"     "thin"           "separ"          "basi"          
[25] "multidimension" "spacetim"       "rate"           "poisson"       
[29] "residu"         "spectrum"       "goal"           "rescal"        
[33] "magnitud"       "evolutionari"   "purpos"         "homogen"       

 [1] "climat"      "chang"       "temperatur"  "greenhous"   "global"     
 [6] "earth"       "trend"       "uncertainti" "increas"     "atmospher"  
[11] "northern"    "quantifi"    "reconstruct" "futur"       "separ"      
[16] "tempor"     

 [1] "motif"      "gene"       "sequenc"    "regul"      "transcript"
 [6] "bind"       "dna"        "protein"    "cluster"    "factor"    
[11] "nucleotid"  "discoveri"  "conserv"    "short"      "high"      
[16] "call"       "pattern"    "dirichlet"  "biolog"     "site"      
[21] "process"    "genom"      "mixtur"     "width"      "vari"      
[26] "priori"     "hierarch"   "strategi"   "cell"       "databas"   
[31] "repres"     "organ"      "delet"      "matric"     "similar"   
[36] "gibb"       "switch"     "technolog"  "generat"    "segment"   
[41] "refin"      "aid"        "substant"   "stochast"   "live"      
[46] "group"      "core"       "regulatori"

 [1] "wishart"    "graph"      "cone"       "famili"     "graphic"   
 [6] "matric"     "conjug"     "paramet"    "prior"      "gaussian"  
[11] "covari"     "matrix"     "decompos"   "edg"        "definit"   
[16] "homogen"    "paper"      "shape"      "invers"     "correspond"
[21] "standard"   "ann"        "posit"      "equal"      "space"     
[26] "respect"    "eigenvalu"  "zero"       "sigma"      "dimens"    
[31] "bay"        "chisquar"   "miss"       "form"       "precis"    
[36] "flexibl"    "distinct"   "close"     

 [1] "pca"          "princip"      "compon"       "matrix"       "eigenvector" 
 [6] "analysi"      "eigenvalu"    "reduct"       "dimension"    "set"         
[11] "perturb"      "size"         "transit"      "dimens"       "spike"       
[16] "direct"       "maxim"        "hold"         "popul"        "tool"        
[21] "tree"         "high"         "theorem"      "geometr"      "succeed"     
[26] "sharp"        "logp"         "oil"          "embed"        "evolutionari"

[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian"  "hierarch" 
[7] "posterior" "cluster"  

 [1] "famili"        "subfamili"     "symmetr"       "asymmetr"     
 [5] "skew"          "reparameter"   "discuss"       "transform"    
 [9] "properti"      "explor"        "mise"          "urn"          
[13] "behav"         "generat"       "pursu"         "adequ"        
[17] "distribut"     "adopt"         "emphasi"       "symmetri"     
[21] "map"           "submodel"      "option"        "stateoftheart"
[25] "heavytail"     "superior"      "attract"       "tractabl"     
[29] "place"         "member"        "counterpart"   "spacetim"     

[1] "bar"    "vertic" "cap"    "lambda"

[1] NA

[1] NA

 [1] "paradox"     "prior"       "surrog"      "true"        "bay"        
 [6] "posit"       "criteria"    "frequentist" "jeffrey"     "sign"       
[11] "point"       "avoid"       "causal"      "turn"        "negat"      
[16] "invari"     

 [1] "probab"  "appl"    "proc"    "situat"  "ann"     "shape"   "field"  
 [8] "point"   "gamma"   "univari" "roy"    

 [1] "chart"       "cusum"       "detect"      "shift"       "cumul"      
 [6] "control"     "sum"         "base"        "perform"     "length"     
[11] "refer"       "averag"      "ratio"       "monitor"     "likelihood" 
[16] "convent"     "delta"       "infin"       "articl"      "event"      
[21] "outlier"     "stop"        "alarm"       "changepoint" "small"      

 [1] "twoparamet" "focus"      "famili"     "choos"      "exampl"    
 [6] "basic"      "desir"      "popular"    "express"    "tune"      
[11] "stepup"     "compromis"  "conserv"    "shortcom"   "represent" 
[16] "lifetim"    "priori"     "meaning"    "prefer"     "segment"   
[21] "stepwis"    "convolut"   "feasibl"    "bay"       

[1] NA

 [1] "manifold"   "space"      "intrins"    "metric"     "shape"     
 [6] "riemannian" "tensor"     "euclidean"  "matric"     "diagnost"  
[11] "geodes"     "develop"    "planar"     "sphere"     "examin"    
[16] "imag"       "perturb"    "human"      "embed"      "gender"    
[21] "medic"      "dimens"     "differenti" "diffus"    

[1] "kendal"  "tau"     "truncat" "copula"  "shape"   "densiti" "symmetr"
[8] "reli"    "angl"   

 [1] "improp"    "proprieti" "posterior" "uniform"   "proper"    "prior"    
 [7] "miss"      "suffici"   "theorem"   "character" "complet"   "carri"    
[13] "examin"    "colon"     "beta"      "dataset"   "cumul"     "tree"     
[19] "glms"     

[1] "ser"     "soc"     "roy"     "stat"    "ann"     "particl" "central"
[8] "util"    "statist"

[1] "iid"   "prove"

 [1] "classifi"        "distancebas"     "centroid"        "classif"        
 [5] "discrimin"       "popul"           "vector"          "distanc"        
 [9] "theoret"         "machin"          "support"         "heavytail"      
[13] "median"          "differ"          "difficulti"      "popular"        
[17] "convent"         "replac"          "componentwis"    "produc"         
[21] "accumul"         "closest"         "varieti"         "truncat"        
[25] "poor"            "entail"          "highdimension"   "insensit"       
[29] "allevi"          "excess"          "problemat"       "today"          
[33] "euclidean"       "encount"         "inconsist"       "caus"           
[37] "suffer"          "nearest"         "counterpart"     "volatil"        
[41] "argument"        "alloc"           "straightforward" "attempt"        
[45] "frequent"        "boundari"        "believ"          "help"           
[49] "case"            "inher"           "neighbour"      

 [1] "administr"      "fda"            "secondari"      "endpoint"      
 [5] "drug"           "efficaci"       "food"           "health"        
 [9] "combin"         "record"         "agent"          "trial"         
[13] "clinic"         "benefit"        "primari"        "adjust"        
[17] "databas"        "prevent"        "path"           "cardiovascular"
[21] "make"           "separ"          "report"         "perspect"      
[25] "decis"          "simplifi"       "safeti"         "maintain"      

 [1] "supremum"    "shift"       "dataset"     "changepoint" "power"      
 [6] "test"        "debat"       "logrank"     "north"       "window"     
[11] "categor"     "record"      "speed"       "wind"        "controversi"
[16] "frequenc"    "elabor"      "opposit"     "pearson"     "discontinu" 
[21] "cumul"       "attribut"    "multinomi"   "bridg"       "mainten"    
[26] "formula"     "conclus"     "rigor"       "appear"      "sum"        
[31] "brownian"    "statist"     "strength"    "chisquar"    "autocovari" 
[36] "sequenc"     "receiv"     

[1] "theta"     "paramet"   "cap"       "distribut" "vector"    "unknown"  
[7] "nuisanc"  

 [1] "genet"       "loci"        "trait"       "diseas"      "quantit"    
 [6] "linkag"      "map"         "allel"       "phenotyp"    "gene"       
[11] "pedigre"     "popul"       "marker"      "associ"      "genotyp"    
[16] "frequenc"    "chromosom"   "locus"       "polymorph"   "genom"      
[21] "complex"     "haplotyp"    "interact"    "casecontrol" "involv"     
[26] "domin"       "individu"   

[1] "goodnessoffit" "test"          "includ"        "residu"       

[1] NA

 [1] "selector"    "dantzig"     "lregular"    "extend"      "path"       
 [6] "result"      "bound"       "nonasymptot" "uncertainti" "angl"       
[11] "remark"      "tune"        "entir"       "final"       "question"   
[16] "cost"        "principl"   

 [1] "subtl"    "jin"      "nonzero"  "critic"   "fraction" "boundari"
 [7] "tukey"    "higher"   "signific" "succeed"  "detect"   "normal"  
[13] "region"   "interest" "precis"   "amplitud" "alpha"    "concept" 
[19] "sparsiti" "concern"  "mention"  "high"     "work"     "resolv"  
[25] "nonnul"   "bodi"     "lower"   

 [1] "expert"      "languag"     "uncertainti" "abil"        "learn"      
 [6] "elicit"      "intermitt"   "system"      "natur"       "kind"       
[11] "amount"      "inform"      "peopl"       "mathemat"    "make"       
[16] "histor"      "need"        "content"     "respond"     "grow"       
[21] "happen"     

 [1] "absolut"       "deviat"        "clip"          "smooth"       
 [5] "scad"          "oracl"         "size"          "true"         
 [9] "microarray"    "nonzero"       "dimens"        "fan"          
[13] "highdimension" "identifi"      "sparsiti"      "confirm"      
[17] "slowli"        "larger"       

[1] "size"   "sampl"  "number"

[1] NA

[1] "spectral"   "densiti"    "time"       "seri"       "domain"    
[6] "stationari" "frequenc"  

[1] "tilt"       "exponenti"  "constraint" "employ"    

 [1] "earn"       "person"     "interview"  "employ"     "document"  
 [6] "survey"     "health"     "level"      "census"     "peopl"     
[11] "report"     "incom"      "higher"     "educ"       "feder"     
[16] "sensit"     "preval"     "analys"     "conduct"    "famili"    
[21] "imput"      "year"       "key"        "sourc"      "total"     
[26] "file"       "instrument" "ratio"      "status"     "encourag"  
[31] "nation"     "way"        "subsequ"    "monitor"    "lower"     
[36] "item"       "accept"     "multipli"   "rich"       "violat"    
[41] "previous"  

 [1] "statistician" "polici"       "scienc"       "statist"      "decis"       
 [6] "role"         "today"        "technolog"    "scientif"     "maker"       
[11] "bring"        "challeng"     "scientist"    "inform"       "integr"      
[16] "communic"     "individu"     "increas"      "knowledg"     "polit"       
[21] "live"         "disciplin"    "address"      "social"       "effort"      
[26] "essenti"      "organ"        "solv"         "engin"        "student"     
[31] "opportun"     "impact"       "face"         "grow"         "chang"       
[36] "play"         "govern"       "american"     "countri"      "mathemat"    
[41] "closer"       "centuri"      "modern"       "intern"       "spread"      
[46] "human"        "relev"        "ingredi"      "place"        "public"      
[51] "devic"        "success"      "explor"       "pressur"      "guarante"    
[56] "imposs"       "train"        "view"         "excel"        "presidenti"  
[61] "progress"     "edg"          "way"          "genom"        "support"     
[66] "communiti"    "promot"       "action"       "advanc"       "map"         
[71] "understand"  

 [1] "toxic"      "dose"       "trial"      "dosefind"   "phase"     
 [6] "clinic"     "target"     "design"     "probabl"    "escal"     
[11] "assign"     "patient"    "reassess"   "continu"    "ethic"     
[16] "prespecifi" "common"     "enhanc"     "concern"    "robust"    
[21] "parallel"   "previous"   "overcom"    "coher"      "variant"   
[26] "competit"  

 [1] "elect"      "vote"       "poll"       "evid"       "candid"    
 [6] "presidenti" "count"      "station"    "forecast"   "proport"   
[11] "polit"      "prefer"     "counti"     "record"     "lower"     

[1] NA

[1] NA

[1] NA

 [1] "delay"         "combin"        "issu"          "activ"        
 [5] "unit"          "year"          "monitor"       "program"      
 [9] "incid"         "concern"       "major"         "servic"       
[13] "surveil"       "develop"       "registri"      "populationbas"
[17] "trend"         "reason"       

[1] "laplac"    "approxim"  "posterior" "integr"    "mode"     

[1] "subjectspecif"    "random"           "longitudin"       "correl"          
[5] "populationaverag" "latent"           "logist"           "followup"        

[1] NA

[1] NA

[1] "oneparamet" "famili"     "normal"     "general"    "exponenti" 
[6] "detect"     "binomi"    

 [1] "intersect"  "close"      "hypothes"   "familywis"  "bonferroni"
 [6] "logic"      "critic"     "requir"     "elementari" "multipl"   
[11] "monoton"    "holm"       "valu"       "principl"  

print(get_keywords(fit.nn.s.001, docfilter = 1))
  [1] "model"        "estim"        "data"         "method"       "propos"      
  [6] "studi"        "simul"        "distribut"    "function"     "sampl"       
 [11] "base"         "paramet"      "approach"     "statist"      "asymptot"    
 [16] "problem"      "general"      "regress"      "analysi"      "develop"     
 [21] "illustr"      "perform"      "procedur"     "test"         "applic"      
 [26] "condit"       "set"          "observ"       "variabl"      "appli"       
 [31] "consist"      "properti"     "likelihood"   "articl"       "time"        
 [36] "comput"       "covari"       "random"       "case"         "linear"      
 [41] "process"      "infer"        "number"       "error"        "effici"      
 [46] "select"       "rate"         "nonparametr"  "deriv"        "effect"      
 [51] "compar"       "measur"       "includ"       "provid"       "paper"       
 [56] "algorithm"    "class"        "depend"       "normal"       "demonstr"    
 [61] "bayesian"     "larg"         "assumpt"      "probabl"      "approxim"    
 [66] "addit"        "size"         "structur"     "optim"        "varianc"     
 [71] "exist"        "independ"     "construct"    "introduc"     "smooth"      
 [76] "real"         "theoret"      "compon"       "point"        "methodolog"  
 [81] "investig"     "requir"       "predict"      "standard"     "respons"     
 [86] "establish"    "common"       "empir"        "practic"      "converg"     
 [91] "work"         "maximum"      "term"         "discuss"      "combin"      
 [96] "finit"        "framework"    "design"       "parametr"     "multipl"     
[101] "assum"        "form"         "theori"       "simpl"        "carlo"       
[106] "limit"        "mont"         "lead"         "altern"       "numer"       
[111] "improv"       "local"        "involv"       "high"         "identifi"    
[116] "space"        "techniqu"     "prior"        "level"        "multivari"   
[121] "correl"       "fit"          "semiparametr" "increas"      "unknown"     
[126] "bias"         "small"        "exampl"       "order"        "direct"      
[131] "extend"       "defin"        "matrix"       "coeffici"     "dataset"     
[136] "implement"    "weight"       "control"      "densiti"      "markov"      
[141] "extens"       "adapt"        "evalu"        "relat"        "power"       
[146] "consid"       "analyz"       "robust"       "type"         "result"      
[151] "valu"         "assess"       "vector"       "seri"         "factor"      
[156] "popul"       

 [1] "fals"       "control"    "procedur"   "rate"       "test"      
 [6] "discoveri"  "reject"     "hypothes"   "multipl"    "null"      
[11] "pvalu"      "familywis"  "hochberg"   "fdr"        "stepdown"  
[16] "error"      "kfwer"      "number"     "proport"    "benjamini" 
[21] "fwer"       "depend"     "statist"    "soc"        "divid"     
[26] "fdp"        "roy"        "abil"       "ser"        "alpha"     
[31] "deriv"      "individu"   "total"      "stepup"     "detect"    
[36] "toler"      "attent"     "power"      "gamma"      "defin"     
[41] "singlestep" "conserv"    "probabl"    "construct"  "hypothesi" 
[46] "fix"        "ann"        "simultan"   "restrict"   "usual"     
[51] "increas"    "structur"   "contrast"   "prove"      "goal"      
[56] "implicit"   "replac"     "resampl"    "independ"   "sime"      
[61] "holm"       "improv"     "sens"       "configur"   "stat"      
[66] "stringent"  "intersect"  "bonferroni" "der"        "appl"      
[71] "van"        "deal"       "order"     

 [1] "surviv"       "time"         "hazard"       "censor"       "failur"      
 [6] "studi"        "semiparametr" "proport"      "event"        "cancer"      
[11] "covari"       "data"         "estim"        "risk"         "cox"         
[16] "baselin"      "regress"      "cumul"        "illustr"      "consist"     
[21] "rightcensor"  "trial"        "subject"      "analysi"      "nonparametr" 
[26] "simul"        "equat"        "cohort"       "diseas"       "incid"       
[31] "patient"      "clinic"       "cure"         "recurr"       "compet"      
[36] "associ"       "joint"        "followup"     "frailti"      "timevari"    
[41] "bivari"       "margin"       "lengthbias"   "prostat"      "assumpt"     
[46] "coeffici"     "medic"        "breast"       "extens"       "propos"      

 [1] "simex"              "simulationextrapol" "undersmooth"       
 [4] "error"              "measur"             "asymptot"          
 [7] "accuraci"           "longer"             "bias"              
[10] "principl"           "finitesampl"        "selector"          
[13] "bandwidth"          "wang"               "epidemiolog"       
[16] "cook"               "rootn"              "difficulti"        
[19] "presenc"            "nutrit"             "decreas"           
[22] "compar"             "coverag"            "appropri"          
[25] "simul"              "tractabl"           "need"              
[28] "recommend"          "polynomi"           "engin"             
[31] "chisquar"           "scientist"          "errorpron"         

 [1] "wilk"           "ratio"          "phenomenon"     "correct"       
 [5] "relax"          "power"          "conduct"        "null"          
 [9] "newli"          "freedom"        "unspecifi"      "follow"        
[13] "hypothesi"      "degre"          "ask"            "nuisanc"       
[17] "chisquar"       "test"           "theorem"        "hold"          
[21] "backfit"        "attempt"        "admit"          "constant"      
[25] "demonstr"       "rescal"         "biascorrect"    "answer"        
[29] "zhang"          "scientif"       "fan"            "likelihood"    
[33] "withinsubject"  "pitman"         "asymptot"       "side"          
[37] "share"          "contemporari"   "popular"        "variancecovari"
[41] "singleindex"    "save"           "tau"            "kendal"        
[45] "coverag"       

 [1] "mle"         "maximum"     "likelihood"  "main"        "asymptot"   
 [6] "mles"        "prove"       "converg"     "limit"       "status"     
[11] "estim"       "brownian"    "current"     "motion"      "behavior"   
[16] "rate"        "proof"       "uniqu"       "siev"        "nonparametr"
[21] "ann"         "gap"         "drift"       "naiv"        "global"     
[26] "monoton"     "simpler"     "parametr"    "result"      "discuss"    
[31] "ergod"      

 [1] "varyingcoeffici" "nonparametr"     "linear"          "coeffici"       
 [5] "longitudin"      "conduct"         "vari"            "regress"        
 [9] "partial"         "propos"          "simul"           "backfit"        
[13] "thought"         "illustr"         "enjoy"           "fashion"        
[17] "twostep"         "contamin"        "pose"           

 [1] "rankbas"         "asymptot"        "effici"          "rank"           
 [5] "ellipt"          "cam"             "class"           "uniform"        
 [9] "test"            "densiti"         "version"         "multivari"      
[13] "normal"          "sign"            "valid"           "scatter"        
[17] "symmetri"        "matrix"          "matric"          "assumpt"        
[21] "finit"           "sens"            "ann"             "contour"        
[25] "irrespect"       "tradit"          "rootn"           "moment"         
[29] "actual"          "center"          "strict"          "equivari"       
[33] "gaussian"        "onestep"         "invari"          "finitesampl"    
[37] "concept"         "local"           "serial"          "bernoulli"      
[41] "shape"           "unspecifi"       "classic"         "acceler"        
[45] "respect"         "semiparametr"    "depth"           "null"           
[49] "univari"         "median"          "prespecifi"      "spheric"        
[53] "biometrika"      "distributionfre" "excel"          

 [1] "nconsist"   "root"       "reduct"     "exist"      "central"   
 [6] "direct"     "dimens"     "varianc"    "slice"      "exhaust"   
[11] "contour"    "mild"       "ellipt"     "strong"     "advantag"  
[16] "invers"     "averag"     "asymptot"   "suffici"    "predictor" 
[21] "regress"    "identif"    "subspac"    "guarante"   "space"     
[26] "attack"     "accuraci"   "span"       "plugin"     "synthes"   
[31] "digit"      "squar"      "complement" "normal"     "eas"       
[36] "variat"     "landmark"   "realdata"  

 [1] "null"      "test"      "hypothesi" "distribut" "altern"    "statist"  
 [7] "hypothes"  "power"     "asymptot"  "procedur"  "ratio"     "reject"   
[13] "control"  

 [1] "chain"     "markov"    "mont"      "carlo"     "bayesian"  "posterior"
 [7] "algorithm" "infer"     "prior"     "mcmc"      "model"     "hierarch" 
[13] "sampler"   "mixtur"    "space"    

 [1] "lasso"         "select"        "variabl"       "regress"      
 [5] "coeffici"      "spars"         "penalti"       "adapt"        
 [9] "linear"        "oracl"         "penal"         "sparsiti"     
[13] "problem"       "algorithm"     "regular"       "matrix"       
[17] "nonzero"       "path"          "shrinkag"      "vector"       
[21] "larger"        "absolut"       "high"          "highdimension"
[25] "true"          "method"        "group"         "dimension"    
[29] "nois"          "connect"      

[1] "bar"     "vertic"  "cap"     "lambda"  "beta"    "theta"   "alpha"  
[8] "element"

 [1] "singleindex"  "unknown"      "nonparametr"  "link"         "compon"      
 [6] "equat"        "structur"     "varianc"      "beta"         "smaller"     
[11] "function"     "semiparametr" "econometr"    "achiev"       "femal"       
[16] "compos"       "vectorvalu"   "linear"       "eigenfunct"   "rateoptim"   
[21] "composit"     "isol"         "ball"         "singl"       

 [1] "genet"       "trait"       "loci"        "quantit"     "diseas"     
 [6] "linkag"      "map"         "gene"        "phenotyp"    "pedigre"    
[11] "allel"       "marker"      "popul"       "associ"      "genotyp"    
[16] "locus"       "chromosom"   "frequenc"    "polymorph"   "genom"      
[21] "multipl"     "complex"     "involv"      "domin"       "interact"   
[26] "casecontrol" "haplotyp"    "treat"       "individu"    "nucleotid"  
[31] "unifi"       "singl"       "simultan"    "snp"         "inherit"    
[36] "geneenviron" "distinguish" "suscept"     "dichotom"    "score"      
[41] "mutat"       "aim"         "genomewid"   "member"      "dna"        
[46] "ascertain"   "parent"      "descent"     "crucial"     "arbitrari"  
[51] "retrospect"  "tau"         "softwar"    

 [1] "dichotom"        "outcom"          "exposur"         "genet"          
 [5] "inherit"         "confound"        "interact"        "causal"         
 [9] "trial"           "factor"          "binari"          "presenc"        
[13] "categor"         "assess"          "alcohol"         "continu"        
[17] "disord"          "misspecif"       "ordin"           "clinic"         
[21] "postul"          "trait"           "topic"           "environment"    
[25] "subgroup"        "potenti"         "geneenviron"     "alter"          
[29] "adequ"           "examin"          "adjust"          "intermedi"      
[33] "cancer"          "robin"           "stage"           "logist"         
[37] "arm"             "firststag"       "generic"         "latent"         
[41] "build"           "variabl"         "conduct"         "affect"         
[45] "accommod"        "prone"           "submodel"        "transmiss"      
[49] "mental"          "mediat"          "unspecifi"       "quantit"        
[53] "expos"           "major"           "multipli"        "sever"          
[57] "believ"          "gene"            "zhang"           "distributionfre"
[61] "routin"          "today"          

 [1] "treatment"     "random"        "trial"         "noncompli"    
 [5] "patient"       "assumpt"       "effect"        "adher"        
 [9] "complianc"     "assign"        "depress"       "outcom"       
[13] "causal"        "receiv"        "care"          "placebo"      
[17] "subject"       "intervent"     "clinic"        "improv"       
[21] "primari"       "drug"          "arm"           "treat"        
[25] "dose"          "elder"         "latent"        "princip"      
[29] "analys"        "contrast"      "sever"         "instrument"   
[33] "control"       "particip"      "stratif"       "benefit"      
[37] "physician"     "imperfect"     "encourag"      "prevent"      
[41] "fisher"        "strata"        "prescrib"      "children"     
[45] "activ"         "reason"        "strict"        "rubin"        
[49] "efron"         "behavior"      "educ"          "estimand"     
[53] "plausibl"      "doserespons"   "meet"          "suffer"       
[57] "protocol"      "framework"     "collabor"      "debat"        
[61] "doubleblind"   "potenti"       "blind"         "status"       
[65] "opposit"       "guidelin"      "logic"         "acknowledg"   
[69] "nonrandom"     "import"        "substanti"     "infer"        
[73] "prospect"      "summar"        "heart"         "childhood"    
[77] "subjectspecif" "access"       

 [1] "nonconcav"     "penal"         "select"        "penalti"      
 [5] "oracl"         "variabl"       "regular"       "nondifferenti"
 [9] "fan"           "likelihood"    "challeng"      "sandwich"     
[13] "establish"     "maxim"         "broad"         "find"         
[17] "concav"        "onestep"       "employ"        "encourag"     
[21] "enjoy"         "finit"         "cost"          "distinguish"  
[25] "dramat"        "selector"      "appropri"      "render"       
[29] "conduct"       "heavili"       "possess"       "newli"        
[33] "converg"       "paramet"       "function"      "discontinu"   
[37] "aic"           "algorithm"     "bic"           "encompass"    
[41] "guarante"      "object"        "metropoli"    

 [1] "semiparametr" "estim"        "parametr"     "nonparametr"  "paramet"     
 [6] "asymptot"     "model"        "effici"       "likelihood"   "regress"     
[11] "function"     "normal"       "simul"        "compon"       "achiev"      

 [1] "bandwidth"  "kernel"     "local"      "select"     "smooth"    
 [6] "densiti"    "estim"      "crossvalid" "selector"   "polynomi"  
[11] "choic"      "choos"      "squar"      "bootstrap"  "datadriven"
[16] "version"    "asymptot"   "global"     "chosen"    

 [1] "virus"        "human"        "immunodefici" "hiv"          "infect"      
 [6] "viral"        "transmiss"    "vaccin"       "subject"      "genet"       
[11] "drug"         "develop"      "efficaci"     "mutat"        "outcom"      
[16] "causal"       "cell"         "syndrom"      "medic"        "pathway"     
[21] "resist"       "evolutionari" "therapi"      "pressur"     

 [1] "dropout"       "stratum"       "prevent"       "reduc"        
 [5] "oil"           "trial"         "adjust"        "longitudin"   
 [9] "cancer"        "prostat"       "mechan"        "men"          
[13] "find"          "stratifi"      "arm"           "nuisanc"      
[17] "treatment"     "assign"        "grade"         "doubleblind"  
[21] "avoid"         "colleagu"      "randomeffect"  "sever"        
[25] "verif"         "agent"         "conjectur"     "annual"       
[29] "nonignor"      "placebo"       "volum"         "elect"        
[33] "caus"          "daili"         "visit"         "preval"       
[37] "absolut"       "lie"           "indic"         "sensit"       
[41] "frequent"      "particip"      "year"          "reduct"       
[45] "causal"        "report"        "newtonraphson" "adopt"        
[49] "question"      "women"         "elder"         "surrog"       
[53] "inform"        "elicit"        "prospect"      "collabor"     
[57] "drawn"         "ignor"         "differ"        "link"         
[61] "retain"        "tilt"          "random"        "constraint"   
[65] "status"        "impli"         "doubli"        "expert"       
[69] "nonidentifi"   "intermitt"     "satur"         "sex"          
[73] "characterist"  "invers"       

  [1] "polici"       "statistician" "maker"        "decis"        "scienc"      
  [6] "role"         "technolog"    "today"        "chang"        "live"        
 [11] "bring"        "social"       "communic"     "integr"       "individu"    
 [16] "futur"        "knowledg"     "disciplin"    "nation"       "public"      
 [21] "scientif"     "health"       "activ"        "human"        "impact"      
 [26] "organ"        "inform"       "protect"      "promot"       "qualiti"     
 [31] "understand"   "program"      "way"          "student"      "mathemat"    
 [36] "increas"      "face"         "foundat"      "play"         "essenti"     
 [41] "uncertainti"  "effort"       "engin"        "expect"       "advanc"      
 [46] "confidenti"   "children"     "relev"        "make"         "industri"    
 [51] "govern"       "countri"      "encourag"     "polit"        "place"       
 [56] "modern"       "intern"       "scientist"    "closer"       "benefit"     
 [61] "reflect"      "explor"       "stronger"     "purpos"       "univers"     
 [66] "spread"       "environment"  "network"      "grow"         "forc"        
 [71] "access"       "devic"        "ingredi"      "excel"        "comprehens"  
 [76] "pollut"       "attract"      "broader"      "elementari"   "evolv"       
 [81] "train"        "pressur"      "air"          "option"       "imposs"      
 [86] "secondari"    "map"          "edg"          "success"      "progress"    
 [91] "critic"       "global"       "action"       "year"         "agenc"       
 [96] "communiti"    "american"     "quantit"      "genom"        "system"      
[101] "fundament"    "discoveri"    "evid"         "guarante"     "mortal"      
[106] "address"      "citi"         "requir"       "technic"      "serv"        
[111] "path"         "statist"      "separ"        "climat"       "contribut"   
[116] "opportun"     "adequaci"     "disabl"       "affect"       "driven"      
[121] "grade"        "psycholog"    "diagnost"     "morbid"       "view"        
[126] "delay"        "primari"      "state"       

[1] NA

 [1] "nonnorm"         "normal"          "mix"             "linear"         
 [5] "exponenti"       "piecewiselinear" "general"         "abund"          
 [9] "famili"          "examin"         

 [1] "seem"           "unrel"          "spline"         "retail"        
 [5] "credit"         "vehicl"         "dataadapt"      "correl"        
 [9] "knot"           "residu"         "conveni"        "nongaussian"   
[13] "univari"        "allevi"         "leav"           "reversiblejump"
[17] "part"           "neglig"         "difficulti"     "smooth"        
[21] "latent"         "sampler"        "compani"        "abil"          
[25] "wang"           "withinclust"    "smallest"       "consum"        

 [1] "slice"     "invers"    "dimens"    "reduct"    "regress"   "averag"   
 [7] "sir"       "direct"    "central"   "goal"      "respons"   "save"     
[13] "subset"    "method"    "predictor" "subspac"   "varianc"   "preserv"  
[19] "replac"    "suffici"   "systemat" 

 [1] "homoscedast"   "heteroscedast" "varianc"       "transform"    
 [5] "famili"        "multiscal"     "quadrat"       "respect"      
 [9] "poisson"       "regress"       "epidemiolog"   "stabil"       
[13] "wavelet"       "explain"       "contribut"    

 [1] "band"       "confid"     "simultan"   "consid"     "trajectori"
 [6] "extend"     "choos"      "regular"    "asymptot"   "ball"      
[11] "uniform"   

 [1] "administr"      "secondari"      "fda"            "food"          
 [5] "endpoint"       "drug"           "efficaci"       "health"        
 [9] "adjust"         "prevent"        "record"         "separ"         
[13] "agent"          "cardiovascular" "primari"        "instrument"    
[17] "simplifi"       "frequenc"       "dose"           "week"          
[21] "maintain"       "databas"        "deliveri"       "clinic"        
[25] "benefit"        "birth"          "path"           "trial"         
[29] "drastic"        "odd"            "guidanc"        "perspect"      
[33] "intersect"      "guid"           "biomark"        "morbid"        
[37] "emerg"          "fwer"           "serniparametr"  "hour"          
[41] "make"           "stepwis"        "safeti"         "led"           
[45] "nutrit"         "decis"          "describ"        "errorpron"     
[49] "infant"         "serum"          "exemplifi"      "insight"       
[53] "feder"          "advers"         "prospect"       "valid"         
[57] "follow"         "likelihoodbas"  "energi"         "combin"        

 [1] "distort"         "respons"         "unobserv"        "confound"       
 [5] "predictor"       "under"           "adjust"          "serum"          
 [9] "factor"          "magnitud"        "generat"         "alter"          
[13] "intens"          "absent"          "explanatori"     "indirect"       
[17] "likelihoodbas"   "straightforward" "multipl"         "datagener"      
[21] "leastsquar"      "identifi"        "decid"           "stepwis"        
[25] "observ"          "intervent"       "sever"           "relationship"   
[29] "recov"           "system"          "car"             "coeffici"       
[33] "census"          "releas"          "agenc"           "closest"        
[37] "electr"          "shortcom"        "analyst"        

 [1] "motif"       "regul"       "gene"        "dna"         "transcript" 
 [6] "bind"        "sequenc"     "protein"     "factor"      "short"      
[11] "conserv"     "discoveri"   "nucleotid"   "cluster"     "biolog"     
[16] "high"        "site"        "mixtur"      "process"     "call"       
[21] "width"       "genom"       "vari"        "hierarch"    "dirichlet"  
[26] "pattern"     "priori"      "cell"        "strategi"    "organ"      
[31] "databas"     "matric"      "group"       "technolog"   "repres"     
[36] "stochast"    "refin"       "switch"      "substant"    "segment"    
[41] "aid"         "delet"       "similar"     "gibb"        "reduct"     
[46] "regulatori"  "express"     "core"        "find"        "live"       
[51] "yeast"       "composit"    "dictionari"  "accompani"   "appear"     
[56] "missingdata" "genomewid"   "generat"     "principl"    "facilit"    
[61] "recurs"      "background"  "specif"      "chromosom"   "address"    
[66] "wish"        "cycl"        "name"        "understand"  "adjac"      
[71] "variabl"    

[1] "absolut"  "deviat"   "clip"     "oracl"    "progress"

[1] "quantil" "regress"

 [1] "breakdown"  "point"      "robust"     "depth"      "locat"     
 [6] "project"    "equivari"   "finit"      "function"   "possess"   
[11] "contamin"   "competitor" "affin"      "definit"    "introduc"  
[16] "lead"       "induc"      "influenc"   "high"       "outlier"   
[21] "strong"     "trim"       "median"     "region"     "york"      
[26] "scale"      "desir"      "favor"      "turn"       "pursu"     
[31] "enjoy"      "scatter"    "suffic"     "behav"      "uniform"   
[36] "relat"      "comparison" "suggest"    "fact"       "univari"   
[41] "ann"        "radius"    

 [1] "memori"        "seri"          "differenc"     "longmemori"   
 [5] "frequenc"      "long"          "taper"         "fraction"     
 [9] "averag"        "stationari"    "depend"        "periodogram"  
[13] "move"          "whittl"        "slowli"        "nonstationari"
[17] "local"         "process"       "eigenvector"   "angl"         
[21] "paramet"       "period"        "short"         "univari"      
[25] "distinct"      "autoregress"   "volatil"       "fourier"      
[29] "infin"         "longrang"      "delta"         "residu"       
[33] "trim"          "raw"           "log"           "question"     
[37] "break"         "stress"        "know"          "gamma"        
[41] "serniparametr" "subspac"      

 [1] "auxiliari" "survey"    "varianc"   "design"    "popul"     "sampl"    
 [7] "variabl"   "weight"    "calibr"    "designbas" "probabl"   "servic"   
[13] "total"     "finit"     "work"      "feasibl"   "explain"   "miss"     

 [1] "lin"           "addit"         "transplant"    "bone"         
 [5] "work"          "carrol"        "registri"      "intern"       
 [9] "termin"        "multist"       "complic"       "serv"         
[13] "progress"      "transit"       "death"         "domin"        
[17] "backfit"       "implicit"      "largesampl"    "longer"       
[21] "inconsist"     "withinsubject" "withinclust"   "margin"       

 [1] "taper"       "approxim"    "matrix"      "gaussian"    "consist"    
 [6] "spars"       "oper"        "spatial"     "covari"      "block"      
[11] "requir"      "balanc"      "norm"        "precipit"    "station"    
[16] "weather"     "technic"     "manipul"     "matern"      "infeas"     
[21] "multipli"    "wild"        "simpli"      "eigenvector" "sever"      
[26] "onestep"     "resampl"     "oil"         "lose"        "expans"     
[31] "finitesampl" "emphasi"    

[1] "finitesampl" "propos"      "properti"    "simul"      

 [1] "wavelet"     "adapt"       "besov"       "minimax"     "threshold"  
 [6] "rang"        "ball"        "nois"        "wide"        "rate"       
[11] "unknown"     "smooth"      "risk"        "bound"       "function"   
[16] "deconvolut"  "problem"     "white"       "converg"     "signal"     
[21] "recov"       "gaussian"    "transform"   "noisi"       "view"       
[26] "blur"        "discret"     "shape"       "invers"      "spars"      
[31] "densiti"     "nearoptim"   "convolut"    "fourier"     "upper"      
[36] "decay"       "chosen"      "block"       "basi"        "dens"       
[41] "attain"      "waveletbas"  "continu"     "mathemat"    "counterpart"
[46] "physic"      "possess"     "lower"       "global"      "achiev"     
[51] "boundari"    "distinct"    "belong"      "domin"       "estim"      
[56] "place"      

 [1] "forecast"      "predict"       "weather"       "northwest"    
 [5] "spatial"       "probabilist"   "pacif"         "calibr"       
 [9] "wind"          "meteorolog"    "hour"          "temperatur"   
[13] "speed"         "atmospher"     "energi"        "north"        
[17] "center"        "geostatist"    "event"         "futur"        
[21] "averag"        "ensembl"       "american"      "tempor"       
[25] "accur"         "resourc"       "precipit"      "daili"        
[29] "state"         "sharp"         "qualiti"       "site"         
[33] "spacetim"      "generat"       "transport"     "concentr"     
[37] "season"        "climat"        "regim"         "shortterm"    
[41] "numer"         "determinist"   "ozon"          "input"        
[45] "climatolog"    "previous"      "output"        "parsimoni"    
[49] "perturb"       "geograph"      "period"        "trend"        
[53] "correl"        "vari"          "break"         "favor"        
[57] "quantit"       "laplac"        "caus"          "merg"         
[61] "safeti"        "station"       "agricultur"    "accumul"      
[65] "oppos"         "benefit"       "vast"          "global"       
[69] "stateoftheart" "featur"        "system"        "activ"        
[73] "dispers"       "simpler"       "decad"         "organ"        
[77] "crossvalid"    "member"       

 [1] "spacetim"       "spatial"        "fit"            "year"          
 [5] "site"           "separ"          "intens"         "california"    
 [9] "thin"           "process"        "monitor"        "residu"        
[13] "tempor"         "activ"          "multidimension" "occurr"        
[17] "space"          "background"     "appear"         "origin"        
[21] "smoother"       "irregular"      "earthquak"      "indic"         
[25] "asymmetr"       "trend"          "hazard"         "spectral"      
[29] "symmetr"        "environment"    "ozon"           "wind"          
[33] "meteorolog"     "daili"          "allow"          "rescal"        
[37] "season"         "time"           "anisotrop"      "cross"         
[41] "insid"          "bear"           "arbitrari"      "autoregress"   
[45] "interact"       "magnitud"       "sequenc"        "homogen"       
[49] "widespread"     "sphere"         "coordin"        "highlight"     
[53] "elabor"         "extrem"         "ascertain"      "forest"        
[57] "counti"         "rotat"          "month"          "threat"        
[61] "govern"         "secondari"      "aic"            "account"       
[65] "aid"            "emphas"         "routin"         "assess"        
[69] "departur"       "rare"          

 [1] "inhomogen"   "intens"      "spatial"     "process"     "poisson"    
 [6] "point"       "thin"        "stationari"  "function"    "firstord"   
[11] "efficaci"    "secondord"   "caus"        "infecti"     "network"    
[16] "infect"      "transmiss"   "respiratori" "environ"     "epidem"     
[21] "unrealist"   "lend"        "syndrom"     "hospit"      "emphasi"    
[26] "unusu"       "paid"        "peak"       

 [1] "garch"         "process"       "seri"          "volatil"      
 [5] "stationari"    "paper"         "heteroscedast" "moment"       
 [9] "autoregress"   "local"         "financi"       "condit"       
[13] "standard"      "move"          "averag"        "sequenc"      
[17] "mont"          "carlo"         "innov"         "satisfi"      
[21] "iid"           "root"          "time"          "forecast"     
[25] "nonstationari" "fourth"        "capabl"        "residu"       
[29] "return"        "rescal"        "exponenti"     "exchang"      
[33] "reparameter"   "arma"          "ergod"         "homogen"      
[37] "simpli"        "normal"        "explain"       "uniqu"        
[41] "exist"        

 [1] "withinclust"   "cluster"       "correl"        "account"      
 [5] "frequent"      "frailti"       "varianc"       "carri"        
 [9] "arbitrari"     "abil"          "achiev"        "hormon"       
[13] "generalis"     "tackl"         "characteris"   "evalu"        
[17] "simplic"       "fashion"       "closedform"    "noninform"    
[21] "hamper"        "intuit"        "dementia"      "birth"        
[25] "errorpron"     "ill"           "copula"        "withinsubject"

[1] "polynomi"    "local"       "smooth"      "regress"     "nonparametr"
[6] "asymptot"    "spline"     

 [1] "elect"        "vote"         "poll"         "presidenti"   "evid"        
 [6] "candid"       "polit"        "count"        "station"      "proport"     
[11] "forecast"     "nonrespons"   "elimin"       "prefer"       "counti"      
[16] "scientist"    "permit"       "lower"        "incom"        "fisher"      
[21] "york"         "record"       "heterogen"    "purpos"       "respond"     
[26] "percentag"    "particip"     "quick"        "transfer"     "week"        
[31] "spatiotempor" "evolut"       "california"   "histor"       "krige"       
[36] "list"         "appar"        "outcom"       "invalid"      "nonignor"    
[41] "publish"      "nonrespond"  

 [1] "survey"      "nonrespons"  "census"      "nation"      "respond"    
 [6] "imput"       "popul"       "health"      "race"        "bureau"     
[11] "nonignor"    "unit"        "respons"     "item"        "incom"      
[16] "miss"        "person"      "year"        "state"       "bias"       
[21] "employ"      "higher"      "valu"        "sensit"      "interview"  
[26] "labor"       "nonrespond"  "age"         "feder"       "collect"    
[31] "measur"      "handl"       "assess"      "report"      "level"      
[36] "counti"      "domain"      "preval"      "agenc"       "confidenti" 
[41] "benchmark"   "incorpor"    "protect"     "status"      "cell"       
[46] "earn"        "produc"      "sourc"       "relat"       "weight"     
[51] "propens"     "public"      "household"   "area"        "geograph"   
[56] "nutrit"      "document"    "lower"       "plan"        "bodi"       
[61] "gender"      "extrapol"    "preliminari" "birth"       "polit"      
[66] "correct"     "american"    "proxi"       "requir"      "previous"   
[71] "children"    "york"        "unemploy"    "death"      

 [1] "jackknif"  "file"      "replic"    "varianc"   "inconsist" "strata"   
 [7] "analyt"    "unbias"    "met"       "domain"    "schedul"   "freedom"  
[13] "survey"    "attain"    "balanc"    "mix"       "ensur"     "public"   
[19] "repeat"    "upper"     "bootstrap" "uncondit"  "plausibl"  "person"   
[25] "pseudo"    "concern"   "linkag"   

[1] "variancecovari"  "matrix"          "analyz"          "respect"        
[5] "quasilikelihood" "criterion"       "coin"            "efron"          

[1] "root"     "squar"    "approxim"

[1] "maximum"    "likelihood" "estim"      "paramet"   

 [1] "pca"           "princip"       "compon"        "matrix"       
 [5] "eigenvector"   "size"          "dimension"     "reduct"       
 [9] "eigenvalu"     "analysi"       "spike"         "perturb"      
[13] "logp"          "succeed"       "transit"       "dimens"       
[17] "maxim"         "highdimension" "set"           "sampl"        
[21] "threshold"     "nonzero"       "oil"           "direct"       
[25] "critic"        "sophist"       "recov"         "hold"         
[29] "sharp"         "larger"        "theorem"       "relax"        
[33] "high"          "diagon"        "overlap"       "domin"        
[37] "success"       "geometr"       "regim"         "tractabl"     
[41] "popul"         "ill"           "behav"         "extract"      
[45] "exhibit"       "support"       "tool"          "crossov"      
[49] "sudden"        "track"         "lose"          "infinit"      
[53] "evolutionari"  "tree"          "complex"       "largest"      
[57] "phenomenon"    "program"       "describ"       "nonasymptot"  
[61] "branch"        "topolog"       "row"           "embed"        
[65] "euclidean"     "geodes"        "anim"          "nois"         
[69] "machin"        "phase"         "speci"         "twoway"       
[73] "rise"         

 [1] "eigenfunct"  "function"    "princip"     "compon"      "random"     
 [6] "analysi"     "smooth"      "eigenvalu"   "data"        "curv"       
[11] "spars"       "space"       "trajectori"  "score"       "noisi"      
[16] "deriv"       "lead"        "sampl"       "longitudin"  "eigenvector"
[21] "expans"      "impact"      "elucid"      "decomposit"  "firstord"   
[26] "repres"      "differenti"  "measur"      "dynam"       "intrins"    
[31] "similar"     "plan"       

 [1] "pathway"       "biolog"        "pattern"       "presenc"      
 [5] "gene"          "latent"        "viral"         "initi"        
 [9] "biomark"       "understand"    "protein"       "pronounc"     
[13] "infect"        "therapi"       "supplementari" "quantifi"     
[17] "concentr"      "chemic"        "tackl"         "incorrect"    
[21] "healthi"       "identifi"      "molecular"     "human"        
[25] "serum"         "hormon"        "investig"      "experiment"   
[29] "search"        "status"        "sort"          "drug"         
[33] "inflat"        "pertin"        "mediat"        "mutat"        
[37] "resist"        "absent"        "blood"         "exemplifi"    
[41] "valuabl"       "phenotyp"      "led"           "indic"        
[45] "subsequ"       "format"        "framework"    

[1] "establish" "asymptot"  "consist"   "converg"  

 [1] "classifi"        "classif"         "discrimin"       "distancebas"    
 [5] "vector"          "centroid"        "support"         "machin"         
 [9] "theoret"         "popul"           "featur"          "rule"           
[13] "poor"            "popular"         "produc"          "distanc"        
[17] "method"          "highdimension"   "accumul"         "varieti"        
[21] "heavytail"       "differ"          "diverg"          "nearest"        
[25] "train"           "median"          "difficulti"      "spectra"        
[29] "componentwis"    "replac"          "excess"          "convent"        
[33] "frequent"        "truncat"         "boundari"        "counterpart"    
[37] "insensit"        "encount"         "closest"         "entail"         
[41] "case"            "allevi"          "problemat"       "today"          
[45] "argument"        "euclidean"       "inconsist"       "caus"           
[49] "straightforward" "neighbour"       "suffer"          "anneal"         
[53] "attempt"         "perform"         "misclassif"      "alloc"          
[57] "volatil"         "believ"          "explor"          "help"           
[61] "inher"           "explos"          "earthquak"       "base"           
[65] "consequ"         "achiev"          "jin"             "kullbackleibl"  
[69] "contemporari"    "construct"       "drawback"        "tstatist"       

 [1] "robin"      "miss"       "zhao"       "rotnitzki"  "effici"    
 [6] "random"     "casecohort" "weight"     "invers"     "twophas"   
[11] "cohort"     "biometrika" "design"     "prentic"    "causal"    
[16] "purpos"     "lemma"      "exemplifi"  "unbias"     "mar"       
[21] "suit"       "amer"       "assoc"      "proceed"    "summar"    
[26] "ser"        "soc"        "roy"        "calcul"     "iid"       
[31] "appear"     "cox"        "imput"      "visit"      "ann"       
[36] "augment"    "percentag"  "schedul"    "direct"     "unbalanc"  
[41] "mediat"     "day"        "embed"      "mental"     "equat"     
[46] "nice"       "month"     

[1] "bootstrap" "confid"    "distribut" "sampl"     "interv"    "method"   
[7] "correct"   "seri"      "empir"    

 [1] "norm"          "matrix"        "frobenius"     "rank"         
 [5] "matric"        "nuclear"       "bound"         "regular"      
 [9] "optim"         "low"           "highdimension" "nonasymptot"  
[13] "convex"        "minimax"       "noisi"         "spars"        
[17] "vector"        "singular"      "element"       "error"        
[21] "minim"         "setup"         "predict"       "autoregress"  
[25] "recoveri"      "theori"        "trace"         "obtain"       
[29] "decomposit"    "class"         "excel"         "mean"         
[33] "lower"         "instanc"       "yield"         "sharp"        
[37] "agreement"     "precis"        "mestim"        "complementari"
[41] "lowdimension"  "entri"         "analyz"        "oper"         
[45] "meansquar"     "relax"         "hold"          "determinist"  
[49] "observ"        "condit"        "autocovari"    "decompos"     
[53] "notion"        "stay"          "restrict"      "stronger"     
[57] "krige"        

 [1] "minimax"  "rate"     "densiti"  "optim"    "unknown"  "adapt"   
 [7] "loss"     "class"    "prove"    "sens"     "converg"  "problem" 
[13] "bound"    "estim"    "risk"     "vector"   "set"      "gaussian"
[19] "lower"   

 [1] "imag"     "magnet"   "reson"    "field"    "brain"    "fmri"    
 [7] "activ"    "voxel"    "signal"   "detect"   "locat"    "volum"   
[13] "accur"    "follow"   "task"     "motion"   "region"   "visual"  
[19] "identifi" "exploit"  "tissu"    "aim"      "contigu"  "map"     
[25] "rotat"    "neuron"  

[1] NA

[1] NA

 [1] "bspline"   "kernel"    "tackl"     "represent" "spline"    "penal"    
 [7] "tempor"    "proceed"   "splinebas" "truncat"   "solut"     "rigor"    
[13] "account"  

[1] NA

 [1] "electr"        "forecast"      "renew"         "bivari"       
 [5] "load"          "market"        "daili"         "power"        
 [9] "serial"        "shortterm"     "wind"          "autoregress"  
[13] "diagon"        "speed"         "time"          "season"       
[17] "focus"         "difficult"     "peak"          "spectrum"     
[21] "temperatur"    "regressor"     "heteroscedast" "firstord"     
[25] "total"         "highlight"     "energi"        "justifi"      
[29] "simpl"         "week"          "vari"          "hour"         
[33] "trend"         "citi"          "recogn"        "stationari"   
[37] "autocovari"    "detail"        "promis"        "realiti"      
[41] "favor"         "reveal"        "year"          "longmemori"   
[45] "gain"          "accuraci"      "exploit"       "predict"      
[49] "option"        "reliabl"       "price"         "evolut"       
[53] "avail"         "superpopul"   

 [1] "highfrequ"       "financi"         "asset"           "volatil"        
 [5] "price"           "lowfrequ"        "exchang"         "dynam"          
 [9] "stock"           "matrix"          "daili"           "period"         
[13] "nois"            "realiz"          "pool"            "market"         
[17] "matric"          "infin"           "diffus"          "return"         
[21] "day"             "trade"           "captur"          "forecast"       
[25] "vast"            "overcom"         "variat"          "hundr"          
[29] "pertin"          "dimensionreduct" "econom"          "iii"            
[33] "alloc"           "noisi"           "industri"        "zhang"          
[37] "guidanc"         "merit"           "adequ"           "size"           
[41] "highdimension"   "fan"             "eigenvector"     "option"         
[45] "wavelet"         "built"           "avail"          

 [1] "day"        "daili"      "record"     "time"       "financi"   
 [6] "activ"      "short"      "peak"       "consecut"   "help"      
[11] "autocovari" "appropri"   "intens"     "physic"     "character" 
[16] "measur"     "children"   "trade"      "strength"   "scalar"    
[21] "superposit" "incomplet"  "copi"      

[1] "secondord"   "firstord"    "accur"       "expans"      "unbias"     
[6] "moment"      "approxim"    "frequentist" "exact"      

 [1] "treatment"  "assign"     "causal"     "score"      "outcom"    
 [6] "propens"    "averag"     "effect"     "grade"      "school"    
[11] "potenti"    "stratif"    "promot"     "confound"   "rubin"     
[16] "student"    "unit"       "regim"      "educ"       "adjust"    
[21] "children"   "plausibl"   "polici"     "program"    "evid"      
[26] "pretreat"   "posttreat"  "summar"     "stage"      "child"     
[31] "intermedi"  "assumpt"    "retain"     "multilevel" "block"     
[36] "econom"     "experiment" "stabl"      "arbitrari"  "nation"    
[41] "articl"     "balanc"     "learn"      "perspect"   "status"    
[46] "unmeasur"   "fewer"      "scalar"     "affect"     "low"       
[51] "mathemat"   "track"      "twostag"    "covari"     "tradeoff"  
[56] "recov"      "nonrandom"  "bind"       "pose"       "estimand"  
[61] "impos"      "feasibl"    "return"    

 [1] "extrapol"      "errorpron"     "posttreat"     "instrument"   
 [5] "classic"       "baselin"       "replic"        "subsampl"     
 [9] "nonlinear"     "daili"         "summari"       "air"          
[13] "encount"       "subset"        "bias"          "efficaci"     
[17] "heteroscedast" "frequenc"      "trajectori"    "spheric"      
[21] "supplementari" "correct"       "multiscal"     "scatter"      
[25] "reconstruct"   "subject"       "error"         "temperatur"   

 [1] "admiss"       "inadmiss"     "loss"         "bay"          "risk"        
 [6] "endpoint"     "action"       "ann"          "accept"       "math"        
[11] "genom"        "screen"       "stringent"    "result"       "complet"     
[16] "stepup"       "character"    "formul"       "treat"        "pearson"     
[21] "amer"         "assoc"        "biometrika"   "prototyp"     "vector"      
[26] "pay"          "reject"       "decad"        "revisit"      "metaanalysi" 
[31] "criteria"     "effort"       "bioassay"     "thought"      "hard"        
[36] "psycholog"    "nonneg"       "predetermin"  "fals"         "energi"      
[41] "earlier"      "educ"         "hoc"          "stein"        "emerg"       
[46] "fair"         "dna"          "appeal"       "sign"         "singlestep"  
[51] "drug"         "microarray"   "statistician" "jeffrey"      "year"        
[56] "fewer"        "fisher"       "paper"        "resembl"      "paradox"     
[61] "share"        "twodimension" "nonzero"      "stepdown"     "seek"        
[66] "expect"      

[1] "coeffici" "regress"  "linear"   "vari"    

 [1] "unbound"   "novelti"   "function"  "yield"     "oracl"     "tail"     
 [7] "decreas"   "satisfi"   "anisotrop" "inequ"     "median"    "slower"   
[13] "literatur" "bivari"    "free"      "vast"      "fast"      "input"    
[19] "setup"     "output"    "aggreg"    "aforement" "behav"     "influenti"
[25] "iii"       "bound"     "univers"   "main"      "nuclear"   "radius"   
[31] "need"      "tilt"      "hyperplan" "higherord" "symmetri"  "equivari" 
[37] "gee"       "scatter"   "bin"       "quadrat"  

 [1] "wishart"     "graph"       "cone"        "graphic"     "famili"     
 [6] "matric"      "conjug"      "gaussian"    "matrix"      "covari"     
[11] "decompos"    "prior"       "paramet"     "edg"         "definit"    
[16] "paper"       "homogen"     "space"       "correspond"  "posit"      
[21] "standard"    "shape"       "form"        "ann"         "miss"       
[26] "zero"        "equal"       "eigenvalu"   "dimens"      "close"      
[31] "respect"     "invers"      "sigma"       "chisquar"    "distinct"   
[36] "flexibl"     "margin"      "precis"      "bay"         "undirect"   
[41] "fix"         "refer"       "direct"      "constant"    "acycl"      
[46] "satisfi"     "expect"      "encod"       "entri"       "enrich"     
[51] "accept"      "phi"         "scalabl"     "omega"       "nonhomogen" 
[56] "probab"      "euclidean"   "dual"        "read"        "restrict"   
[61] "centr"       "characteris" "deep"        "tangent"     "fourth"     
[66] "perfect"    

 [1] "schedul"         "longitudin"      "followup"        "analys"         
 [5] "phase"           "generat"         "incomplet"       "flexibl"        
 [9] "respons"         "avail"           "ill"             "unbalanc"       
[13] "pursu"           "offer"           "enter"           "resourc"        
[17] "impact"          "merg"            "concret"         "intermitt"      
[21] "interim"         "preced"          "perfect"         "divid"          
[25] "maker"           "face"            "preliminari"     "fluctuat"       
[29] "missingatrandom" "versatil"        "alloc"           "timetoev"       
[33] "withinsubject"   "compromis"       "manag"           "metropoli"      
[37] "missingdata"     "walk"            "logrank"        

[1] "real"    "simul"   "data"    "illustr"

[1] NA

 [1] "chi"           "test"          "distribut"     "space"        
 [5] "ratio"         "restrict"      "statist"       "conveni"      
 [9] "tail"          "goodnessoffit" "pearson"      

[1] "size"   "sampl"  "number" "small"  "larg"  

[1] "misspecifi" "robust"     "misspecif" 

 [1] "climat"      "temperatur"  "chang"       "greenhous"   "global"     
 [6] "earth"       "uncertainti" "northern"    "atmospher"   "quantifi"   
[11] "trend"       "increas"     "reconstruct" "averag"      "region"     
[16] "separ"       "tempor"      "concentr"    "surfac"      "longterm"   
[21] "pollut"      "period"      "centuri"     "opposit"     "tree"       
[26] "gas"         "creat"       "purpos"      "futur"       "record"     
[31] "remot"       "understand"  "radiat"      "emiss"       "proxi"      
[36] "histor"      "air"         "ecolog"      "forest"      "magnitud"   
[41] "massiv"      "cloud"       "gather"      "forc"        "weather"    
[46] "synthet"     "actual"      "pattern"     "expert"      "extern"     
[51] "current"     "quantif"     "agreement"   "institut"    "act"        

For the future: it seems worth checking whether generalized binary priors on L help avoid this kind of fit. I also wonder whether document-specific variances could better model outlying documents.


I tried backfitting one fit (pseudocount 1); it did not change things much. (When I tried backfitting the fit with pseudocount 0.1, I got an error.)

# backfit all factors of the pseudocount-1 fit
fit.nn.s.1.2 = flash_backfit(fit.nn.s.1)
Backfitting 58 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+04...
  Difference between iterations is within 1.0e+03...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
[1] "singleindex" "function"    "link"        "compon"      "unknown"    

[1] "norm"      "matrix"    "rank"      "matric"    "frobenius" "bound"    

[1] "singleindex" "link"        "unknown"    

[1] "norm"   "estim"  "matrix"
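One way to put a number on "did not change things much" is to correlate each factor's column of F before and after backfitting. The helper `factor_change` below is hypothetical (it is not part of flashier), and the toy matrices are simulated rather than taken from the text data; it is a minimal sketch assuming both fits keep the same factors in the same order.

```r
# Hypothetical helper (not part of flashier): correlation between each
# factor's F column before and after backfitting. Assumes both matrices
# have the same factors in the same column order.
factor_change <- function(F_before, F_after) {
  stopifnot(identical(dim(F_before), dim(F_after)))
  # diag() of the cross-correlation matrix pairs factor k with factor k
  diag(cor(F_before, F_after))
}

# Toy check on simulated non-negative factors (not the text data):
# a small perturbation should leave every correlation close to 1
set.seed(1)
F_before <- matrix(rexp(200 * 3), nrow = 200, ncol = 3)
F_after  <- F_before + matrix(rnorm(200 * 3, sd = 0.01), nrow = 200)
round(factor_change(F_before, F_after), 3)
```

Applied to the fits above this would be something like `summary(factor_change(fit.nn.s.1$F_pm, fit.nn.s.1.2$F_pm))`; a factor that became numerically zero during the backfit would have zero variance and give `NA` (with a warning) rather than a correlation.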

Adding a factor

I was struck by the “mri” factor in the fit with pseudocount 0.01, which did not appear in the fit with pseudocount 0.1. This factor makes a lot of sense, so I thought its absence might just reflect a failure to “find” it in the 0.1 fit, rather than an indication that it is absent from that transformed data set. Here I confirm this by adding the factor to the 0.1 fit and backfitting it: the new factor is kept, indicating that it improves the ELBO.

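# initialize a new factor from factor 64 of the pseudocount-0.01 fit
fit.nn.s.01.b = flash_factors_init(fit.nn.s.01, init = list(u = cbind(fit.nn.s.001$L_pm[, 64]), d = diag(1, nrow = 1), v = cbind(fit.nn.s.001$F_pm[, 64])))
# backfit only the newly added factor (factor 109); %>% is the magrittr pipe
fit.nn.s.01.b %>% flash_backfit(kset = 109)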
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Flash object with 109 factors.
  Proportion of variance explained*:
    Factor 1: 0.146
    Factor 2: 0.003
    Factor 3: 0.001
    Factor 4: 0.003
    Factor 5: 0.003
    Factor 9: 0.004
    Factor 10: 0.002
    Factor 14: 0.003
    Factor 15: 0.001
    Factor 38: 0.002
    Factor 43: 0.003
    Factor 48: 0.002
    Factor 50: 0.001
    Factor 56: 0.002
    Factor 65: 0.001
    Factor 91: 0.002
    *Factors with PVE < 0.001 are omitted from this summary.
  Variational lower bound: -109847.549
 [1] "ozon"            "maxima"          "splinebas"       "nonlinear"      
 [5] "piecewiselinear" "concentr"        "pressur"         "cycl"           
 [9] "transport"       "variat"          "contribut"       "atmospher"      
[13] "peak"            "trend"           "measur"          "basi"           
[17] "evid"            "instrument"      "thought"         "greater"        
[21] "link"            "scientif"        "lag"             "dimensionreduct"
[25] "absenc"          "wave"            "global"          "separ"          
[29] "month"           "coincid"         "influenc"        "lowdimension"   
[33] "clear"           "contrast"        "lower"           "year"           
[37] "site"            "qualiti"         "profil"          "sequenc"        
[41] "sensit"          "origin"          "relat"           "presenc"        
[45] "satellit"        "partial"         "pattern"         "identifi"       
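To see what the `init` list encodes, the sketch below builds the same kind of list (one-column `u` and `v`, and a 1 x 1 `d` as in the `diag(1, nrow = 1)` used in the loop further down) from toy matrices standing in for `L_pm` and `F_pm`. It assumes flashier reads `u`, `d`, `v` in the `svd()` sense, so the term being added is u d v^T, i.e. the outer product of the chosen loading and factor columns. Base R only; the matrices are simulated.

```r
# Toy stand-ins for fit.nn.s.001$L_pm and fit.nn.s.001$F_pm
set.seed(2)
L_pm <- matrix(rexp(50 * 4), nrow = 50)  # 50 "documents" x 4 factors
F_pm <- matrix(rexp(30 * 4), nrow = 30)  # 30 "words" x 4 factors
k <- 3                                   # factor to transplant into another fit

# Same shape as the init list used in the text: n x 1, 1 x 1, p x 1
init <- list(u = cbind(L_pm[, k]), d = diag(1, nrow = 1), v = cbind(F_pm[, k]))

# Assuming the svd() convention, the rank-one term is u d v^T ...
term <- init$u %*% init$d %*% t(init$v)
# ... which equals the outer product of loading k and factor k
all.equal(term, outer(L_pm[, k], F_pm[, k]))
```

Because the two fits are on differently transformed data, this term is only a starting point; the backfit then re-estimates it under the new fit's priors and residual variances.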

Adding factors from 0.01 fit to 0.1 fit

I wondered how many of the differences between the fits are due to this kind of issue, so I tried adding every factor from the 0.01 fit that did not appear in the original 0.1 fit (a 0.01-fit factor whose maximum correlation with any 0.1-fit factor is below 0.9) and backfitting. My initial attempt to add them all at once and then backfit gave an error (I may not have done it correctly), so here I add them one at a time. Most of them (26 of the 28) are kept, suggesting that many of the differences between the runs simply reflect the runs finding different solutions, rather than differences in the structure present in the data at each pseudocount.
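The matching rule used in the code below (a factor of the second fit counts as "new" if no factor of the first fit correlates with it at 0.9 or more) can be checked on a toy example. The matrices here are simulated stand-ins for `fit1$F_pm` and `fit2$F_pm`; only the 0.9 threshold comes from the text.

```r
# Toy word-by-factor matrices standing in for fit1$F_pm and fit2$F_pm
set.seed(3)
p <- 100
shared <- rexp(p)
F1 <- cbind(shared, rexp(p))                                  # fit 1: 2 factors
F2 <- cbind(rexp(p), shared + rnorm(p, sd = 0.05), rexp(p))   # fit 2: 3 factors

# cc[j, i] is the correlation between factor j of fit 1 and factor i of fit 2
cc <- cor(F1, F2)
# fit-2 factors whose best match in fit 1 has correlation below 0.9
spec2 <- which(apply(cc, 2, max) < 0.9)
spec2
```

Here factor 2 of the toy second fit is a noisy copy of a first-fit factor and is matched, while factors 1 and 3 are independent and get flagged. Using the raw (signed) maximum rather than the absolute correlation seems reasonable for non-negative factors, where strong negative correlations are unlikely.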

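fit1 = fit.nn.s.01   # pseudocount 0.1
fit2 = fit.nn.s.001  # pseudocount 0.01
# correlation between every F column of fit1 (rows) and of fit2 (columns)
cc = cor(fit1$F_pm, fit2$F_pm)
# fit2 factors with no counterpart in fit1 (max correlation < 0.9)
spec2 = which(apply(cc, 2, max) < 0.9)
fit.nn.s.01.2 = fit.nn.s.01
for (i in spec2) {
  # initialize a new rank-one factor from factor i of fit2
  init = list(u = cbind(fit2$L_pm[, i]), d = diag(1, nrow = 1), v = cbind(fit2$F_pm[, i]))
  fit.nn.s.01.2 <- flash_factors_init(fit.nn.s.01.2, init = init, ebnm_fn = ebnm_point_exponential)
  # backfit only the newly added (last) factor
  fit.nn.s.01.2 <- flash_backfit(fit.nn.s.01.2, kset = fit.nn.s.01.2$n_factors)
}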
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  --Estimate of factor 112 is numerically zero!
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  --Estimate of factor 115 is numerically zero!
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
  Difference between iterations is within 1.0e-01...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
Wrapping up...
Backfitting 1 factors (tolerance: 6.23e-02)...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
  Difference between iterations is within 1.0e+00...
Wrapping up...
fit.nn.s.01.2 <- flash_nullcheck(fit.nn.s.01.2)
Nullchecking 136 factors...
  2 factors are identically zero.
Wrapping up...
  Removed 2 factors.
get_keywords(fit.nn.s.01.2,docfilter = 1)
 [1] "model"       "estim"       "data"        "method"      "propos"     
 [6] "studi"       "simul"       "distribut"   "function"    "sampl"      
[11] "paramet"     "approach"    "statist"     "base"        "asymptot"   
[16] "problem"     "general"     "regress"     "analysi"     "test"       
[21] "develop"     "procedur"    "perform"     "illustr"     "condit"     
[26] "set"         "applic"      "observ"      "variabl"     "likelihood" 
[31] "consist"     "time"        "appli"       "covari"      "properti"   
[36] "random"      "comput"      "articl"      "linear"      "case"       
[41] "process"     "infer"       "error"       "select"      "number"     
[46] "effici"      "rate"        "nonparametr" "deriv"       "measur"     
[51] "effect"      "algorithm"   "class"       "paper"       "compar"     
[56] "provid"      "includ"      "depend"     

 [1] "fals"       "control"    "procedur"   "test"       "rate"      
 [6] "discoveri"  "reject"     "hypothes"   "multipl"    "null"      
[11] "pvalu"      "fdr"        "hochberg"   "number"     "stepdown"  
[16] "kfwer"      "familywis"  "error"      "depend"     "proport"   
[21] "benjamini"  "fwer"       "statist"    "fdp"        "soc"       
[26] "divid"      "power"      "roy"        "stepup"     "alpha"     
[31] "deriv"      "abil"       "ser"        "individu"   "detect"    
[36] "gamma"      "total"      "hypothesi"  "conserv"    "toler"     
[41] "attent"     "defin"      "singlestep" "construct"  "fix"       
[46] "simultan"   "probabl"    "independ"   "ann"        "usual"     
[51] "sime"       "improv"     "increas"   

 [1] "treatment"   "random"      "trial"       "patient"     "effect"     
 [6] "assign"      "noncompli"   "assumpt"     "outcom"      "complianc"  
[11] "causal"      "adher"       "depress"     "placebo"     "receiv"     
[16] "care"        "subject"     "clinic"      "intervent"   "drug"       
[21] "arm"         "dose"        "improv"      "primari"     "treat"      
[26] "princip"     "analys"      "latent"      "elder"       "control"    
[31] "sever"       "contrast"    "instrument"  "stratif"     "activ"      
[36] "particip"    "framework"   "prevent"     "potenti"     "physician"  
[41] "benefit"     "infer"       "imperfect"   "children"    "encourag"   
[46] "estimand"    "doserespons"

 [1] "surviv"       "time"         "hazard"       "censor"       "failur"      
 [6] "studi"        "event"        "semiparametr" "proport"      "data"        
[11] "cancer"       "covari"       "estim"        "risk"         "cox"         
[16] "baselin"      "regress"      "cumul"        "illustr"      "rightcensor" 
[21] "consist"      "nonparametr"  "trial"       

[1] "null"      "test"      "hypothesi" "distribut" "altern"    "statist"  
[7] "power"     "asymptot"  "hypothes" 

 [1] "simex"              "simulationextrapol" "measur"            
 [4] "error"              "undersmooth"        "asymptot"          
 [7] "longer"             "accuraci"           "finitesampl"       
[10] "principl"           "bias"               "presenc"           
[13] "selector"           "wang"               "rootn"             

 [1] "wilk"       "ratio"      "phenomenon" "correct"    "relax"     
 [6] "conduct"    "newli"      "unspecifi"  "freedom"    "follow"    
[11] "backfit"    "nuisanc"    "theorem"    "degre"      "chisquar"  
[16] "likelihood" "empir"      "ask"        "hold"      

 [1] "mle"         "maximum"     "likelihood"  "main"        "prove"      
 [6] "asymptot"    "converg"     "limit"       "mles"        "status"     
[11] "rate"        "current"     "brownian"    "behavior"    "motion"     
[16] "estim"       "proof"       "uniqu"       "nonparametr"

 [1] "chain"     "markov"    "mont"      "carlo"     "bayesian"  "algorithm"
 [7] "posterior" "infer"     "prior"     "model"     "mcmc"     

 [1] "lasso"     "select"    "variabl"   "regress"   "coeffici"  "spars"    
 [7] "penalti"   "adapt"     "linear"    "oracl"     "penal"     "problem"  
[13] "sparsiti"  "algorithm" "regular"  

[1] "varyingcoeffici" "nonparametr"     "coeffici"        "linear"         
[5] "longitudin"      "conduct"         "propos"          "vari"           
[9] "regress"        

 [1] "rankbas"      "effici"       "asymptot"     "rank"         "ellipt"      
 [6] "cam"          "class"        "densiti"      "uniform"      "normal"      
[11] "version"      "sign"         "multivari"    "matric"       "symmetri"    
[16] "valid"        "finit"        "scatter"      "ann"          "contour"     
[21] "tradit"       "assumpt"      "sens"         "irrespect"    "rootn"       
[26] "semiparametr" "center"      

 [1] "nconsist" "root"     "reduct"   "dimens"   "exist"    "direct"  
 [7] "central"  "slice"    "exhaust"  "contour"  "ellipt"   "advantag"
[13] "mild"     "strong"   "regress"  "varianc"  "suffici"  "invers"  
[19] "averag"  

 [1] "semiparametr" "estim"        "nonparametr"  "parametr"     "paramet"     
 [6] "model"        "effici"       "asymptot"     "likelihood"   "regress"     
[11] "function"    

 [1] "bandwidth"  "kernel"     "local"      "select"     "smooth"    
 [6] "densiti"    "estim"      "crossvalid" "selector"   "polynomi"  

 [1] "nonconcav"     "penal"         "select"        "oracl"        
 [5] "penalti"       "variabl"       "likelihood"    "regular"      
 [9] "fan"           "challeng"      "nondifferenti" "maxim"        
[13] "sandwich"      "onestep"       "establish"     "concav"       
[17] "broad"         "enjoy"         "employ"        "selector"     
[21] "encourag"      "cost"         

[1] NA

[1] "homoscedast"   "heteroscedast" "varianc"       "transform"    
[5] "famili"        "error"        

[1] "nonnorm"   "normal"    "mix"       "linear"    "exponenti"

[1] "inhomogen"  "intens"     "process"    "spatial"    "point"     
[6] "poisson"    "thin"       "stationari" "function"  

 [1] "seem"           "unrel"          "spline"         "correl"        
 [5] "credit"         "retail"         "neglig"         "nongaussian"   
 [9] "dataadapt"      "vehicl"         "allevi"         "knot"          
[13] "leav"           "reversiblejump" "part"           "genotyp"       
[17] "conveni"        "residu"         "wang"           "withinclust"   

 [1] "memori"        "seri"          "differenc"     "longmemori"   
 [5] "taper"         "frequenc"      "long"          "fraction"     
 [9] "averag"        "depend"        "paramet"       "periodogram"  
[13] "stationari"    "move"          "slowli"        "whittl"       
[17] "eigenvector"   "local"         "nonstationari" "distinct"     
[21] "angl"         

 [1] "distort"         "respons"         "confound"        "predictor"      
 [5] "unobserv"        "under"           "explanatori"     "serum"          
 [9] "adjust"          "magnitud"        "indirect"        "identifi"       
[13] "coeffici"        "factor"          "absent"          "system"         
[17] "alter"           "observ"          "datagener"       "leastsquar"     
[21] "decid"           "straightforward" "generat"         "stepwis"        
[25] "intervent"       "sever"          

[1] "polynomi"    "local"       "regress"     "smooth"      "nonparametr"
[6] "asymptot"   

 [1] "equivari"   "affin"      "introduc"   "depth"      "breakdown" 
 [6] "scatter"    "locat"      "point"      "project"    "robust"    
[11] "concept"    "general"    "multivari"  "function"   "influenc"  
[16] "matrix"     "median"     "definit"    "hyperplan"  "high"      
[21] "heavytail"  "competitor" "fact"       "translat"   "comparison"
[26] "open"      

 [1] "save"      "sir"       "slice"     "averag"    "root"      "invers"   
 [7] "candid"    "reveal"    "theoret"   "reduct"    "comput"    "contrast" 
[13] "recommend"

 [1] "nonrespons" "survey"     "respons"    "imput"      "nonignor"  
 [6] "valu"       "miss"       "respond"    "nation"     "varianc"   
[11] "nonrespond" "weight"     "popul"      "requir"     "bias"      
[16] "probabl"    "unit"       "mechan"     "item"       "adjust"    
[21] "health"     "variabl"    "calibr"     "race"       "domain"    
[26] "handl"      "incom"     

 [1] "taper"    "approxim" "matrix"   "gaussian" "covari"   "spars"   
 [7] "consist"  "oper"     "block"    "norm"     "balanc"   "requir"  
[13] "spatial" 

 [1] "jackknif"  "mix"       "varianc"   "area"      "squar"     "appli"    
 [7] "inconsist" "uncondit"  "replic"    "strata"   

[1] "mestim"  "robust"  "weak"    "yield"   "outlier" "nuisanc"

 [1] "garch"         "process"       "seri"          "volatil"      
 [5] "stationari"    "paper"         "heteroscedast" "condit"       
 [9] "moment"        "autoregress"   "financi"       "local"        
[13] "standard"      "innov"         "sequenc"       "satisfi"      
[17] "move"          "iid"           "time"          "averag"       
[21] "root"          "mont"          "carlo"        

[1] "quantil" "regress"

 [1] "gee"       "equat"     "correl"    "general"   "sandwich"  "binari"   
 [7] "work"      "misspecif" "cluster"   "scientif"  "enhanc"    "effort"   
[13] "equival"   "lead"      "repeat"    "diverg"   

 [1] "popul"      "superpopul" "survey"     "finit"      "boxcox"    
 [6] "modelbas"   "design"     "predict"    "realiz"     "auxiliari" 
[11] "sampl"      "handl"      "twophas"    "revisit"    "mild"      
[16] "benchmark"  "rich"       "life"       "probabl"    "ensur"     

 [1] "claim"     "insur"     "vehicl"    "damag"     "age"       "year"     
 [7] "turn"      "compani"   "detail"    "tail"      "sever"     "coverag"  
[13] "record"    "risk"      "price"     "financi"   "describ"   "major"    
[19] "gender"    "discount"  "logit"     "amount"    "person"    "kind"     
[25] "multinomi" "frequenc"  "justif"    "surpris"   "binomi"    "oil"      
[31] "pointwis"  "split"     "negat"    

[1] "logit"       "finitesampl" "root"        "probit"      "variat"     
[6] "mix"         "fraction"    "multinomi"  

 [1] "expenditur"   "physician"    "servic"       "skew"         "care"        
 [6] "lognorm"      "profil"       "conduct"      "patient"      "person"      
[11] "contribut"    "health"       "randomeffect" "smoke"        "fact"        
[16] "survey"       "manag"        "incur"        "medic"        "debat"       
[21] "custom"       "qualiti"      "topic"        "industri"     "appropri"    
[26] "pulmonari"    "conceptu"     "monitor"      "regard"       "prescrib"    
[31] "subsequ"      "way"          "financi"      "hierarch"     "lung"        
[36] "percentil"    "attribut"     "closedform"  

[1] "confid"    "interv"    "construct" "coverag"   "bootstrap" "region"   

 [1] "singleindex" "unknown"     "link"        "compon"      "equat"      
 [6] "function"    "varianc"     "nonparametr" "beta"        "femal"      
[11] "structur"    "smaller"     "compos"      "vectorvalu"  "eigenfunct" 
[16] "composit"    "econometr"  

[1] "finitesampl" "propos"     

 [1] "wavelet"    "adapt"      "besov"      "minimax"    "ball"      
 [6] "threshold"  "rang"       "nois"       "wide"       "unknown"   
[11] "rate"       "risk"       "bound"      "deconvolut" "smooth"    
[16] "problem"    "function"   "signal"     "white"      "converg"   
[21] "gaussian"   "transform"  "recov"      "densiti"    "shape"     
[26] "view"       "noisi"      "discret"    "nearoptim"  "spars"     
[31] "blur"       "fourier"    "decay"      "upper"      "convolut"  
[36] "invers"    

 [1] "robin"      "miss"       "zhao"       "rotnitzki"  "effici"    
 [6] "weight"     "casecohort" "design"     "invers"     "twophas"   
[11] "cohort"     "random"     "causal"     "outcom"     "biometrika"
[16] "prentic"    "calcul"     "purpos"     "confound"   "lemma"     
[21] "mar"        "exemplifi"  "suit"       "amer"       "assoc"     
[26] "proceed"    "summar"     "cox"        "ser"        "soc"       
[31] "roy"        "iid"        "appear"     "unbias"    

[1] "maximum"    "likelihood" "estim"     

[1] "dimensionreduct" "invers"          "dimens"          "factor"         
[5] "highdimension"   "chisquar"        "reduct"         

[1] "lin"        "addit"      "work"       "carrol"     "bone"      
[6] "transplant" "margin"    

 [1] "withinclust" "cluster"     "correl"      "account"     "hamper"     
 [6] "frequent"    "carri"       "frailti"     "parsimoni"   "abil"       
[11] "birth"       "ill"         "generalis"   "impact"      "intuit"     
[16] "achiev"     

[1] "chi"       "test"      "distribut" "space"     "ratio"     "restrict" 
[7] "statist"  

[1] "coeffici" "regress" 

 [1] "norm"          "matrix"        "frobenius"     "rank"         
 [5] "matric"        "nuclear"       "bound"         "regular"      
 [9] "low"           "optim"         "nonasymptot"   "highdimension"
[13] "convex"        "spars"         "minimax"       "noisi"        
[17] "element"       "minim"         "error"         "singular"     
[21] "setup"         "vector"        "theori"        "precis"       
[25] "autoregress"   "predict"      

 [1] "minimax" "rate"    "densiti" "optim"   "adapt"   "unknown" "estim"  
 [8] "loss"    "converg" "class"   "prove"   "bound"  

[1] "unequ"     "designbas" "survey"    "weight"   

[1] "auxiliari" "survey"    "varianc"   "variabl"   "sampl"     "weight"   
[7] "design"    "calibr"    "popul"    

[1] "variancecovari" "matrix"         "analyz"        

[1] "contamin"    "robust"      "water"       "influenc"    "explanatori"

[1] "bspline" "kernel"  "penal"  

[1] "varianc"  "asymptot"

 [1] "eigenfunct" "function"   "princip"    "compon"     "random"    
 [6] "analysi"    "data"       "smooth"     "eigenvalu"  "deriv"     
[11] "curv"       "spars"      "trajectori" "space"      "score"     

 [1] "forecast"    "predict"     "weather"     "spatial"     "wind"       
 [6] "probabilist" "northwest"   "calibr"      "pacif"       "meteorolog" 
[11] "temperatur"  "speed"       "hour"        "energi"      "atmospher"  
[16] "averag"      "ensembl"     "geostatist"  "futur"       "center"     
[21] "north"       "precipit"    "accur"       "tempor"      "daili"      
[26] "event"       "resourc"     "site"        "american"    "state"      
[31] "sharp"       "spacetim"    "qualiti"     "climat"      "ozon"       
[36] "concentr"    "generat"     "regim"       "transport"   "season"     
[41] "shortterm"   "determinist" "input"      

 [1] "highfrequ" "volatil"   "financi"   "asset"     "price"     "lowfrequ" 
 [7] "exchang"   "nois"      "dynam"     "market"    "matrix"    "stock"    
[13] "period"    "daili"     "realiz"    "pool"      "matric"    "variat"   
[19] "diffus"   

 [1] "earthquak"      "process"        "discrimin"      "seri"          
 [5] "featur"         "explos"         "event"          "time"          
 [9] "form"           "california"     "spectra"        "transform"     
[13] "background"     "extract"        "occurr"         "intens"        
[17] "diverg"         "wavelet"        "step"           "occur"         
[21] "decomposit"     "thin"           "separ"          "basi"          
[25] "multidimension" "spacetim"       "rate"           "poisson"       
[29] "residu"         "spectrum"       "goal"           "rescal"        
[33] "magnitud"       "evolutionari"   "purpos"         "homogen"       

 [1] "climat"      "chang"       "temperatur"  "greenhous"   "global"     
 [6] "earth"       "trend"       "uncertainti" "increas"     "atmospher"  
[11] "northern"    "quantifi"    "reconstruct" "futur"       "separ"      
[16] "tempor"     

 [1] "motif"      "gene"       "sequenc"    "regul"      "transcript"
 [6] "bind"       "dna"        "protein"    "cluster"    "factor"    
[11] "nucleotid"  "discoveri"  "conserv"    "short"      "high"      
[16] "call"       "pattern"    "dirichlet"  "biolog"     "site"      
[21] "process"    "genom"      "mixtur"     "width"      "vari"      
[26] "priori"     "hierarch"   "strategi"   "cell"       "databas"   
[31] "repres"     "organ"      "delet"      "matric"     "similar"   
[36] "gibb"       "switch"     "technolog"  "generat"    "segment"   
[41] "refin"      "aid"        "substant"   "stochast"   "live"      
[46] "group"      "core"       "regulatori"

 [1] "wishart"    "graph"      "cone"       "famili"     "graphic"   
 [6] "matric"     "conjug"     "paramet"    "prior"      "gaussian"  
[11] "covari"     "matrix"     "decompos"   "edg"        "definit"   
[16] "homogen"    "paper"      "shape"      "invers"     "correspond"
[21] "standard"   "ann"        "posit"      "equal"      "space"     
[26] "respect"    "eigenvalu"  "zero"       "sigma"      "dimens"    
[31] "bay"        "chisquar"   "miss"       "form"       "precis"    
[36] "flexibl"    "distinct"   "close"     

 [1] "pca"          "princip"      "compon"       "matrix"       "eigenvector" 
 [6] "analysi"      "eigenvalu"    "reduct"       "dimension"    "set"         
[11] "perturb"      "size"         "transit"      "dimens"       "spike"       
[16] "direct"       "maxim"        "hold"         "popul"        "tool"        
[21] "tree"         "high"         "theorem"      "geometr"      "succeed"     
[26] "sharp"        "logp"         "oil"          "embed"        "evolutionari"

[1] "dirichlet" "process"   "mixtur"    "prior"     "bayesian"  "hierarch" 
[7] "posterior" "cluster"  

 [1] "famili"        "subfamili"     "symmetr"       "asymmetr"     
 [5] "skew"          "reparameter"   "discuss"       "transform"    
 [9] "properti"      "explor"        "mise"          "urn"          
[13] "behav"         "generat"       "pursu"         "adequ"        
[17] "distribut"     "adopt"         "emphasi"       "symmetri"     
[21] "map"           "submodel"      "option"        "stateoftheart"
[25] "heavytail"     "superior"      "attract"       "tractabl"     
[29] "place"         "member"        "counterpart"   "spacetim"     

[1] "bar"    "vertic" "cap"    "lambda"

[1] NA

[1] NA

 [1] "paradox"     "prior"       "surrog"      "true"        "bay"        
 [6] "posit"       "criteria"    "frequentist" "jeffrey"     "sign"       
[11] "point"       "avoid"       "causal"      "turn"        "negat"      
[16] "invari"     

 [1] "probab"  "appl"    "proc"    "situat"  "ann"     "shape"   "field"  
 [8] "point"   "gamma"   "univari" "roy"    

 [1] "chart"       "cusum"       "detect"      "shift"       "cumul"      
 [6] "control"     "sum"         "base"        "perform"     "length"     
[11] "refer"       "averag"      "ratio"       "monitor"     "likelihood" 
[16] "convent"     "delta"       "infin"       "articl"      "event"      
[21] "outlier"     "stop"        "alarm"       "changepoint" "small"      

 [1] "twoparamet" "focus"      "famili"     "choos"      "exampl"    
 [6] "basic"      "desir"      "popular"    "express"    "tune"      
[11] "stepup"     "compromis"  "conserv"    "shortcom"   "represent" 
[16] "lifetim"    "priori"     "meaning"    "prefer"     "segment"   
[21] "stepwis"    "convolut"   "feasibl"    "bay"       

[1] NA

 [1] "manifold"   "space"      "intrins"    "metric"     "shape"     
 [6] "riemannian" "tensor"     "euclidean"  "matric"     "diagnost"  
[11] "geodes"     "develop"    "planar"     "sphere"     "examin"    
[16] "imag"       "perturb"    "human"      "embed"      "gender"    
[21] "medic"      "dimens"     "differenti" "diffus"    

[1] "kendal"  "tau"     "truncat" "copula"  "shape"   "densiti" "symmetr"
[8] "reli"    "angl"   

 [1] "improp"    "proprieti" "posterior" "uniform"   "proper"    "prior"    
 [7] "miss"      "suffici"   "theorem"   "character" "complet"   "carri"    
[13] "examin"    "colon"     "beta"      "dataset"   "cumul"     "tree"     
[19] "glms"     

[1] "ser"     "soc"     "roy"     "stat"    "ann"     "particl" "central"
[8] "util"    "statist"

[1] "iid"   "prove"

 [1] "classifi"        "distancebas"     "centroid"        "classif"        
 [5] "discrimin"       "popul"           "vector"          "distanc"        
 [9] "theoret"         "machin"          "support"         "heavytail"      
[13] "median"          "differ"          "difficulti"      "popular"        
[17] "convent"         "replac"          "componentwis"    "produc"         
[21] "accumul"         "closest"         "varieti"         "truncat"        
[25] "poor"            "entail"          "highdimension"   "insensit"       
[29] "allevi"          "excess"          "problemat"       "today"          
[33] "euclidean"       "encount"         "inconsist"       "caus"           
[37] "suffer"          "nearest"         "counterpart"     "volatil"        
[41] "argument"        "alloc"           "straightforward" "attempt"        
[45] "frequent"        "boundari"        "believ"          "help"           
[49] "case"            "inher"           "neighbour"      

 [1] "administr"      "fda"            "secondari"      "endpoint"      
 [5] "drug"           "efficaci"       "food"           "health"        
 [9] "combin"         "record"         "agent"          "trial"         
[13] "clinic"         "benefit"        "primari"        "adjust"        
[17] "databas"        "prevent"        "path"           "cardiovascular"
[21] "make"           "separ"          "report"         "perspect"      
[25] "decis"          "simplifi"       "safeti"         "maintain"      

 [1] "supremum"    "shift"       "dataset"     "changepoint" "power"      
 [6] "test"        "debat"       "logrank"     "north"       "window"     
[11] "categor"     "record"      "speed"       "wind"        "controversi"
[16] "frequenc"    "elabor"      "opposit"     "pearson"     "discontinu" 
[21] "cumul"       "attribut"    "multinomi"   "bridg"       "mainten"    
[26] "formula"     "conclus"     "rigor"       "appear"      "sum"        
[31] "brownian"    "statist"     "strength"    "chisquar"    "autocovari" 
[36] "sequenc"     "receiv"     

[1] "theta"     "paramet"   "cap"       "distribut" "vector"    "unknown"  
[7] "nuisanc"  

 [1] "genet"       "loci"        "trait"       "diseas"      "quantit"    
 [6] "linkag"      "map"         "allel"       "phenotyp"    "gene"       
[11] "pedigre"     "popul"       "marker"      "associ"      "genotyp"    
[16] "frequenc"    "chromosom"   "locus"       "polymorph"   "genom"      
[21] "complex"     "haplotyp"    "interact"    "casecontrol" "involv"     
[26] "domin"       "individu"   

[1] "goodnessoffit" "test"          "includ"        "residu"       

[1] NA

 [1] "selector"    "dantzig"     "lregular"    "extend"      "path"       
 [6] "result"      "bound"       "nonasymptot" "uncertainti" "angl"       
[11] "remark"      "tune"        "entir"       "final"       "question"   
[16] "cost"        "principl"   

 [1] "subtl"    "jin"      "nonzero"  "critic"   "fraction" "boundari"
 [7] "tukey"    "higher"   "signific" "succeed"  "detect"   "normal"  
[13] "region"   "interest" "precis"   "amplitud" "alpha"    "concept" 
[19] "sparsiti" "concern"  "mention"  "high"     "work"     "resolv"  
[25] "nonnul"   "bodi"     "lower"   

 [1] "expert"      "languag"     "uncertainti" "abil"        "learn"      
 [6] "elicit"      "intermitt"   "system"      "natur"       "kind"       
[11] "amount"      "inform"      "peopl"       "mathemat"    "make"       
[16] "histor"      "need"        "content"     "respond"     "grow"       
[21] "happen"     

 [1] "absolut"       "deviat"        "clip"          "smooth"       
 [5] "scad"          "oracl"         "size"          "true"         
 [9] "microarray"    "nonzero"       "dimens"        "fan"          
[13] "highdimension" "identifi"      "sparsiti"      "confirm"      
[17] "slowli"        "larger"       

[1] "size"   "sampl"  "number"

[1] NA

[1] "spectral"   "densiti"    "time"       "seri"       "domain"    
[6] "stationari" "frequenc"  

[1] "tilt"       "exponenti"  "constraint" "employ"    

 [1] "earn"       "person"     "interview"  "employ"     "document"  
 [6] "survey"     "health"     "level"      "census"     "peopl"     
[11] "report"     "incom"      "higher"     "educ"       "feder"     
[16] "sensit"     "preval"     "analys"     "conduct"    "famili"    
[21] "imput"      "year"       "key"        "sourc"      "total"     
[26] "file"       "instrument" "ratio"      "status"     "encourag"  
[31] "nation"     "way"        "subsequ"    "monitor"    "lower"     
[36] "item"       "accept"     "multipli"   "rich"       "violat"    
[41] "previous"  

 [1] "statistician" "polici"       "scienc"       "statist"      "decis"       
 [6] "role"         "today"        "technolog"    "scientif"     "maker"       
[11] "bring"        "challeng"     "scientist"    "inform"       "integr"      
[16] "communic"     "individu"     "increas"      "knowledg"     "polit"       
[21] "live"         "disciplin"    "address"      "social"       "effort"      
[26] "essenti"      "organ"        "solv"         "engin"        "student"     
[31] "opportun"     "impact"       "face"         "grow"         "chang"       
[36] "play"         "govern"       "american"     "countri"      "mathemat"    
[41] "closer"       "centuri"      "modern"       "intern"       "spread"      
[46] "human"        "relev"        "ingredi"      "place"        "public"      
[51] "devic"        "success"      "explor"       "pressur"      "guarante"    
[56] "imposs"       "train"        "view"         "excel"        "presidenti"  
[61] "progress"     "edg"          "way"          "genom"        "support"     
[66] "communiti"    "promot"       "action"       "advanc"       "map"         
[71] "understand"  

 [1] "toxic"      "dose"       "trial"      "dosefind"   "phase"     
 [6] "clinic"     "target"     "design"     "probabl"    "escal"     
[11] "assign"     "patient"    "reassess"   "continu"    "ethic"     
[16] "prespecifi" "common"     "enhanc"     "concern"    "robust"    
[21] "parallel"   "previous"   "overcom"    "coher"      "variant"   
[26] "competit"  

 [1] "elect"      "vote"       "poll"       "evid"       "candid"    
 [6] "presidenti" "count"      "station"    "forecast"   "proport"   
[11] "polit"      "prefer"     "counti"     "record"     "lower"     

[1] NA

[1] NA

[1] NA

 [1] "delay"         "combin"        "issu"          "activ"        
 [5] "unit"          "year"          "monitor"       "program"      
 [9] "incid"         "concern"       "major"         "servic"       
[13] "surveil"       "develop"       "registri"      "populationbas"
[17] "trend"         "reason"       

[1] "laplac"    "approxim"  "posterior" "integr"    "mode"     

[1] "subjectspecif"    "random"           "longitudin"       "correl"          
[5] "populationaverag" "latent"           "logist"           "followup"        

[1] NA

[1] NA

[1] "oneparamet" "famili"     "normal"     "general"    "exponenti" 
[6] "detect"     "binomi"    

 [1] "intersect"  "close"      "hypothes"   "familywis"  "bonferroni"
 [6] "logic"      "critic"     "requir"     "elementari" "multipl"   
[11] "monoton"    "holm"       "valu"       "principl"  

 [1] "dichotom"    "exposur"     "outcom"      "genet"       "interact"   
 [6] "inherit"     "factor"      "alcohol"     "confound"    "trait"      
[11] "categor"     "assess"      "presenc"     "binari"      "ordin"      
[16] "disord"      "environment" "topic"       "examin"      "geneenviron"
[21] "cancer"      "causal"      "adequ"       "stage"       "alter"      
[26] "intermedi"   "conduct"     "continu"     "subgroup"    "postul"     
[31] "misspecif"  

[1] "virus"        "human"        "immunodefici" "hiv"          "infect"      
[6] "viral"       

 [1] "dropout"      "stratum"      "prevent"      "oil"          "reduc"       
 [6] "cancer"       "prostat"      "longitudin"   "find"         "adjust"      
[11] "stratifi"     "nuisanc"      "men"          "arm"          "trial"       
[16] "randomeffect" "mechan"       "sever"        "verif"        "frequent"    
[21] "conjectur"    "grade"        "colleagu"     "annual"       "agent"       
[26] "placebo"      "volum"        "drawn"        "doubleblind"  "caus"        
[31] "absolut"      "preval"       "daili"        "lie"          "reduct"      

 [1] "slice"   "invers"  "dimens"  "reduct"  "regress" "method"  "central"
 [8] "direct"  "goal"    "subspac"

[1] "band"   "consid"

 [1] "breakdown" "robust"    "point"     "outlier"   "definit"   "finit"    
 [7] "suggest"   "possess"   "previous"  "region"    "suffic"    "lead"     

 [1] "spacetim"    "site"        "spatial"     "monitor"     "year"       
 [6] "tempor"      "separ"       "fit"         "smoother"    "trend"      
[11] "ozon"        "environment" "relat"       "meteorolog"  "space"      
[16] "arbitrari"   "indic"       "daili"       "interact"    "wind"       
[21] "autoregress" "avoid"       "cross"      

 [1] "census"   "survey"   "bureau"   "relat"    "area"     "count"   
 [7] "incorpor" "collect"  "labor"    "protect"  "race"    

[1] "microarray" "gene"       "express"    "analysi"    "differenti"
[6] "data"       "experi"    

[1] "root"  "squar"

 [1] "pathway"       "biolog"        "pattern"       "presenc"      
 [5] "latent"        "gene"          "viral"         "protein"      
 [9] "biomark"       "initi"         "infect"        "therapi"      
[13] "understand"    "pronounc"      "supplementari" "concentr"     

[1] "establish"

[1] "bootstrap" "distribut"

 [1] "imag"   "magnet" "reson"  "field"  "brain"  "fmri"   "activ"  "signal"
 [9] "voxel"  "detect" "locat"  "volum" 

 [1] "alloc"         "responseadapt" "treatment"     "random"       
 [5] "design"        "optim"         "trial"         "clinic"       
 [9] "proport"       "target"        "criteria"      "coin"         
[13] "power"         "sequenti"      "procedur"      "rule"         
[17] "assign"        "taylor"        "relationship"  "reli"         
[21] "expans"        "bias"          "failur"        "patient"      
[25] "induc"         "paper"         "author"        "lower"        
[29] "efron"         "binari"        "expect"        "discontinu"   
[33] "prefer"        "nondifferenti" "earlier"       "lot"          
[37] "stop"          "stage"        

 [1] "design"        "aberr"         "factori"       "minimum"      
 [5] "construct"     "factor"        "theori"        "fraction"     
 [9] "doubl"         "project"       "pattern"       "run"          
[13] "twolevel"      "complementari" "defin"         "repeat"       
[17] "maxim"         "link"          "criteria"      "ident"        
[21] "import"       

 [1] "electr"        "load"          "power"         "forecast"     
 [5] "bivari"        "daili"         "market"        "wind"         
 [9] "serial"        "shortterm"     "speed"         "diagon"       
[13] "difficult"     "temperatur"    "price"         "season"       
[17] "spectrum"      "heteroscedast" "regressor"     "firstord"     
[21] "peak"          "citi"          "vari"          "justifi"      
[25] "highlight"     "energi"       

 [1] "day"       "daili"     "activ"     "financi"   "peak"      "record"   
 [7] "help"      "character" "account"   "appropri" 

[1] "secondord" "firstord" 

 [1] "school"     "promot"     "assign"     "treatment"  "grade"     
 [6] "children"   "score"      "averag"     "student"    "outcom"    
[11] "potenti"    "polici"     "propens"    "causal"     "retain"    
[16] "evid"       "child"      "regim"      "program"    "stratif"   
[21] "block"      "unit"       "rubin"      "nation"     "plausibl"  
[26] "stage"      "summar"     "multilevel" "educ"       "affect"    
[31] "fewer"      "stabl"      "impos"      "year"       "scalar"    
[36] "twostag"    "articl"     "consid"     "effect"     "learn"     
[41] "intermedi"  "low"        "pretreat"   "confound"   "track"     

 [1] "extrapol"      "errorpron"     "posttreat"     "baselin"      
 [5] "subsampl"      "instrument"    "replic"        "classic"      
 [9] "treatment"     "nonlinear"     "bias"          "daili"        
[13] "summari"       "air"           "encount"       "efficaci"     
[17] "heteroscedast" "supplementari" "spheric"       "frequenc"     
[21] "trajectori"    "multiscal"     "correct"       "subset"       
[25] "scatter"       "temperatur"   

 [1] "admiss"      "inadmiss"    "endpoint"    "loss"        "risk"       
 [6] "pearson"     "action"      "genom"       "screen"      "ann"        
[11] "biometrika"  "bay"         "amer"        "assoc"       "accept"     
[16] "math"        "paper"       "complet"     "character"   "stepup"     
[21] "revisit"     "stringent"   "metaanalysi" "year"        "thought"    
[26] "hard"        "nonneg"      "share"       "nonzero"     "fisher"     
[31] "formul"      "upper"       "reject"     

 [1] "unbound"   "novelti"   "oracl"     "function"  "anisotrop" "tail"     
 [7] "inequ"     "aforement" "satisfi"   "literatur" "decreas"   "vast"     
[13] "fast"      "setup"     "bivari"    "slower"    "free"      "input"    
[19] "output"    "aggreg"    "yield"     "iii"       "behav"     "residu"   
[25] "main"      "inform"    "univari"   "univers"  

 [1] "schedul"         "longitudin"      "miss"            "respons"        
 [5] "incomplet"       "analys"          "followup"        "missingatrandom"
 [9] "assess"          "intermitt"       "ill"             "account"        
[13] "avail"           "data"            "impact"          "offer"          
[17] "missingdata"     "joint"           "visit"           "unbalanc"       
[21] "merg"            "indic"           "naiv"            "equat"          
[25] "appeal"         

[1] "real"  "simul" "data" 

[1] "misspecifi" "robust"    

Constant variance

I thought I would try constant-variance flash (var_type = 0) to see what happens; with a single residual variance there is no need to regularize tau. It turns out to fit a very large number of single-word factors: when I ran it with greedy_Kmax = 200 it fit all 200 factors. Here I fit just 30, to illustrate more quickly. You can see that it reduces the mean squared error compared with the “maximum likelihood” fit, perhaps suggesting that the greedy approach helps find a better fit?
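A small base-R sketch may help make the difference between the variance structures concrete. It uses a simulated residual matrix and simple mean-square estimates, not flashier's actual estimator. All names and sizes are hypothetical. With one variance for all entries there is a single tau. With column-specific variances, a word whose residuals are nearly all zero gets an enormous tau. The last step assumes, as I read the flashier documentation, that a fixed S is combined with the estimated variance as S^2 + sigma_j^2. That would cap every precision at 1/S^2.

```r
# Simplified illustration of constant vs column-specific residual variance.
# This is NOT flashier's internal estimator; it only shows the arithmetic.
set.seed(1)
n <- 200  # hypothetical number of documents
p <- 5    # hypothetical number of words

# Hypothetical residual matrix. Column 5 mimics a rare word whose
# residuals are zero in every document but one.
R <- matrix(rnorm(n * p, sd = 0.3), nrow = n, ncol = p)
R[, 5] <- 0
R[1, 5] <- 0.01

# Constant variance (in the spirit of var_type = 0): one value for all entries.
sigma2_const <- mean(R^2)

# Column-specific variance (in the spirit of var_type = 2): one value per word.
sigma2_col <- colMeans(R^2)

# Precisions with no regularization: the rare word gets an enormous tau.
tau_raw <- 1 / sigma2_col

# Assumption: the estimated variance is added to a fixed S^2, so every
# precision is bounded above by 1 / S^2 (= 10000 for S = 0.01).
S <- 0.01
tau_floor <- 1 / (S^2 + sigma2_col)

print(signif(sigma2_const, 3))
print(signif(tau_raw, 3))
print(signif(tau_floor, 3))
```

Without the floor, the near-empty column gets tau = 1 / (0.01^2 / 200) = 2 × 10^6. With S = 0.01 (the value used in the later flash() calls in this document) it gets just under 10^4. The other columns are essentially unchanged.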

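# Non-negative flash (point-exponential priors on loadings and factors) with a
# single residual variance shared by all entries (var_type = 0)
fit.nn.s.v0 = flash(lmat_s_1,
                    ebnm_fn = c(ebnm::ebnm_point_exponential, ebnm::ebnm_point_exponential),
                    var_type = 0,
                    greedy_Kmax = 30)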
Adding factor 1 to flash object...
...
Adding factor 30 to flash object...
Wrapping up...
Nullchecking 30 factors...
[1] "estim"    "model"    "method"   "data"     "propos"   "function" "studi"   

[1] "test"     "procedur" "statist"  "null"    

[1] "treatment" "random"    "effect"    "studi"     "outcom"    "trial"    
[7] "design"   

[1] "model"    "bayesian"

[1] "select"  "variabl" "regress"

[1] "data"    "analysi"

[1] "function"

[1] "sampl" "size" 

[1] "problem"

[1] "statist"

[1] "time" "seri"

[1] "method"

[1] "design"

[1] "estim"

[1] "rate"    "converg"

[1] "propos"

[1] "gene"       "express"    "microarray"

[1] "approach"

[1] "distribut"

[1] "error"  "measur"

[1] "general"

[1] "develop"

[1] "covari"

[1] "number"

[1] "process"

[1] "risk"

[1] "space"

[1] "predict"

[1] "articl"

[1] "level"
[1] 0.01649761

Topic model

Here I fit a topic model (Poisson NMF, via fastTopics) with k = 100. Visually, this gives a better fit to the large values.

# Poisson NMF (topic model) with k = 100 factors, from a random initialization
fit_nmf_k100 = fit_poisson_nmf(mat, k = 100, init.method = "random")
Fitting rank-100 Poisson NMF to 1924 x 2172 sparse matrix.
Running 100 SCD updates, without extrapolation (fastTopics 0.6-158).
# Fitted Poisson rates L F^T (a dense 1924 x 2172 matrix)
fvals.nmf.k100 = fit_nmf_k100$L %*% t(fit_nmf_k100$F)

I tried fitting flash to the log1p transform of the fitted values. The rationale is to use the topic model to “denoise” the data and then transform the denoised data. However, this approach has a computational problem. The fitted values are essentially all non-zero, so the transformed matrix is dense and the approach cannot exploit sparsity, which is essential for big datasets. It therefore seems unlikely to be tractable in general. Still, the keywords look promising, so maybe we should experiment some more(?)
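The sparsity problem shows up clearly in a toy simulation. This is a sketch with hypothetical sizes, using the Matrix package and random gamma-distributed matrices as stand-ins for the fastTopics L and F. Because log(1 + 0) = 0, log1p of a sparse count matrix has exactly as many non-zero entries as the counts themselves. A product of strictly positive loadings and factors has no zeros, so log(1 + L F^T) is completely dense.

```r
library(Matrix)

set.seed(2)
n_docs  <- 300  # hypothetical dimensions, much smaller than the real data
n_words <- 400
k       <- 5

# Hypothetical sparse count matrix with 2% non-zero entries (all counts >= 1).
X <- rsparsematrix(n_docs, n_words, density = 0.02,
                   rand.x = function(m) rpois(m, 2) + 1)

# log1p maps 0 to 0, so the transformed counts keep the same sparsity.
nnz_counts     <- nnzero(X)
nnz_log_counts <- nnzero(log1p(X))

# Stand-ins for topic-model loadings and factors: strictly positive entries.
Lmat <- matrix(rgamma(n_docs * k, shape = 1), n_docs, k)
Fmat <- matrix(rgamma(n_words * k, shape = 1), n_words, k)
fitted_vals <- Lmat %*% t(Fmat)  # dense n_docs x n_words matrix

# Every fitted value is positive, so log1p of the fitted values has no zeros.
nnz_log_fitted <- sum(log1p(fitted_vals) != 0)

cat("non-zeros, log1p(counts):", nnz_log_counts, "of", n_docs * n_words, "\n")
cat("non-zeros, log1p(fitted):", nnz_log_fitted, "of", n_docs * n_words, "\n")
```

The same reasoning applies to the 1924 x 2172 matrix of fitted values used here. Working with it costs time and memory roughly proportional to the full number of entries, rather than to the number of non-zero counts.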

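# Non-negative flash on log(1 + fitted values) from the topic model, with
# column-specific residual variances (var_type = 2) and S = 0.01
fit.nn.nmf.k100 = flash(log(fvals.nmf.k100 + 1),
                        ebnm_fn = c(ebnm::ebnm_point_exponential, ebnm::ebnm_point_exponential),
                        var_type = 2,
                        greedy_Kmax = 200,
                        S = 0.01)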
Adding factor 1 to flash object...
...
Adding factor 52 to flash object...
Warning in scale.EF(EF): Fitting stopped after the initialization function
failed to find a non-zero factor.
Factor doesn't significantly increase objective and won't be added.
Wrapping up...
Nullchecking 51 factors...

 [1] "model"     "estim"     "data"      "method"    "studi"     "propos"   
 [7] "distribut" "statist"   "approach"  "function"  "asymptot"  "simul"    
[13] "general"   "base"      "sampl"     "problem"   "analysi"   "paramet"  
[19] "procedur"  "regress"   "test"     

 [1] "coeffici" "partial"  "hazard"   "proport"  "estim"    "model"   
 [7] "covari"   "surviv"   "studi"    "baselin"  "vari"     "regress" 

[1] "weight"  "miss"    "imput"   "handl"   "data"    "mechan"  "augment"
[8] "covari"  "effici" 

[1] "spars"    "lasso"    "select"   "sparsiti" "oracl"    "coeffici" "nonzero" 
[8] "adapt"   

[1] "local"     "kernel"    "bandwidth" "global"    "polynomi"  "estim"    
[7] "asymptot" 

[1] "likelihood" "maximum"    "ratio"      "estim"      "paramet"   
[6] "asymptot"   "distribut"  "normal"    

[1] "respons"   "predictor" "interpret" "regress"   "linear"    "function" 
[7] "anova"    

[1] "depend" "censor" "surviv" "copula" "compet" "bivari" "time"   "data"  

[1] "robust"    "correct"   "presenc"   "outcom"    "misspecif" "model"    
[7] "assumpt"  

[1] "smooth" "addit"  "spline" "select"

[1] "error"   "squar"   "measur"  "estim"   "predict" "price"  

[1] "group"   "activ"   "sourc"   "brain"   "imag"    "heart"   "analysi"
[8] "separ"  

[1] "structur"   "correl"     "screen"     "independ"   "longitudin"

 [1] "nonparametr"  "covari"       "parametr"     "semiparametr" "estim"       
 [6] "propos"       "model"        "function"     "asymptot"     "regress"     
[11] "effici"      

 [1] "procedur"  "control"   "fals"      "discoveri" "reject"    "test"     
 [7] "pvalu"     "fdr"       "rate"      "hypothes"  "multipl"   "null"     
[13] "power"     "conserv"  

[1] "matrix"    "covari"    "matric"    "eigenvalu" "vector"   

[1] "rank"     "sign"     "attribut" "rankbas" 

[1] "test"      "altern"    "hypothesi" "null"      "statist"   "power"    
[7] "hypothes"  "asymptot" 

[1] "popul"      "survey"     "calibr"     "sampl"      "nonrespons"
[6] "unit"       "auxiliari"  "census"     "modelbas"  

 [1] "project"   "depth"     "concept"   "robust"    "scatter"   "dispers"  
 [7] "trim"      "breakdown" "ellipt"    "definit"   "defin"     "equivari" 
[13] "median"    "point"     "introduc" 

[1] "high"          "dimens"        "dimension"     "reduct"       
[5] "invers"        "highdimension"

[1] "threshold" "rang"      "nois"      "signal"    "wavelet"   "wide"     
[7] "adapt"     "shrinkag" 

[1] "equat"      "stochast"   "dynam"      "diffus"     "differenti"
[6] "solut"      "infer"      "discret"   

[1] "select"  "penal"   "penalti" "variabl" "regular"

[1] "gaussian"    "fraction"    "expans"      "truncat"     "nongaussian"

[1] "bootstrap" "calcul"    "block"     "accuraci"  "resampl"   "mestim"   
[7] "accur"    

[1] "varianc" "mix"     "fix"     "sampl"   "outlier"

[1] "bayesian"  "prior"     "mixtur"    "posterior" "hierarch"  "model"    
[7] "dirichlet"

 [1] "point"     "prove"     "statist"   "consist"   "result"    "main"     
 [7] "condit"    "uniform"   "paper"     "weak"      "ann"       "assumpt"  
[13] "establish"

 [1] "implement" "nonlinear" "iter"      "step"      "easi"      "exploit"  
 [7] "filter"    "comput"    "algorithm" "recurs"   

[1] "theoret" "practic" "numer"   "improv"  "effici"  "adapt"  

[1] "sequenc"   "oper"      "volatil"   "financi"   "jump"      "surfac"   
[7] "pattern"   "highfrequ"

[1] "propos"   "procedur"

[1] "densiti"    "bound"      "constraint" "minimax"    "lower"     
[6] "upper"      "inequ"     

[1] "space"     "transform" "invari"   

[1] "compon"   "princip"  "analysi"  "function"

[1] "beta"     "bar"      "vertic"   "theta"    "cap"      "lambda"   "parallel"
[8] "vote"     "elect"   

[1] "class"   "unknown" "vector"  "element"

[1] "trend"    "tree"     "tempor"   "histor"   "time"     "year"     "spatial" 
[8] "spacetim" "season"  

[1] "seri"          "time"          "onlin"         "materi"       
[5] "autoregress"   "supplementari" "supplement"   

[1] "number" "size"   "larg"   "small"  "sampl" 

 [1] "factor"  "cancer"  "cure"    "breast"  "prostat" "incid"   "report" 
 [8] "diseas"  "assoc"   "amer"   

[1] "averag"   "diagnost" "imag"     "tensor"  

 [1] "scale"    "assess"   "distanc"  "continu"  "influenc" "degre"   
 [7] "perturb"  "tool"     "composit" "issu"     "freedom" 

[1] "approxim" "forecast" "accur"    "wind"     "speed"    "cost"    

[1] "framework" "area"      "unbias"    "unifi"     "basic"     "deal"     
[7] "great"    

[1] "variabl"     "latent"      "explanatori"

[1] "direct"   "type"     "classic"  "integr"   "locat"    "indirect" "claim"   

 [1] "effect"     "treatment"  "random"     "causal"     "assign"    
 [6] "outcom"     "assumpt"    "infer"      "instrument" "bias"      
[11] "studi"     

[1] "trial"     "treatment" "clinic"    "patient"   "stage"     "alloc"    
[7] "arm"       "placebo"  

[1] "design"     "orthogon"   "experiment" "balanc"     "nest"      
[6] "construct" 
fv = fitted(fit.nn.nmf.k100)
# random subsample of 100,000 entries of the fitted matrix
sub = sample(seq_along(fv), 100000)

(Figure not shown; figure version 0346f50, Matthew Stephens, 2023-11-08.)

Anscombe transform

This is a very brief look at the Anscombe transform, for comparison:
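As a quick check on what the transform does, the sketch below simulates Poisson counts at a few arbitrary rates. It compares the variance after three transforms:

- the standard Anscombe transform 2√(x + 3/8);
- √(x + 3/8), the version used in the call below, which lacks the factor of 2 and so has a quarter of the variance;
- log1p.

The helper name `anscombe`, the rates and the simulation size are illustrative choices.

```r
# Anscombe transform for counts: A(x) = 2 * sqrt(x + 3/8).
anscombe <- function(x) 2 * sqrt(x + 3/8)

set.seed(3)
rates <- c(0.01, 0.1, 1, 10, 100)  # arbitrary Poisson rates to check
nsim  <- 1e5                       # arbitrary simulation size

res <- sapply(rates, function(mu) {
  x <- rpois(nsim, mu)
  c(mu           = mu,
    var_anscombe = var(anscombe(x)),
    var_no2      = var(sqrt(x + 3/8)),  # no factor of 2, as in the flash call
    var_log1p    = var(log1p(x)))
})
print(round(t(res), 4))
```

The Anscombe variance should be close to 1 for the larger rates but far below 1 at rates like 0.01. So, like log1p, it does not stabilize the variance for very rare words.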

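# Non-negative flash on sqrt(x + 3/8): the Anscombe transform without its usual
# factor of 2. Adding 3/8 makes every entry non-zero, so the input is dense.
fit.nn.a = flash(sqrt(mat + 3/8),
                 ebnm_fn = c(ebnm::ebnm_point_exponential, ebnm::ebnm_point_exponential),
                 var_type = 2,
                 greedy_Kmax = 200,
                 S = 0.01)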
Adding factor 1 to flash object...
...
Adding factor 80 to flash object...
Factor doesn't significantly increase objective and won't be added.
Wrapping up...
Nullchecking 79 factors...
  [1] "estim"        "model"        "data"         "method"       "propos"      
  [6] "function"     "studi"        "distribut"    "sampl"        "paramet"     
 [11] "simul"        "test"         "statist"      "asymptot"     "regress"     
 [16] "approach"     "problem"      "base"         "general"      "procedur"    
 [21] "analysi"      "variabl"      "condit"       "covari"       "likelihood"  
 [26] "develop"      "observ"       "time"         "set"          "random"      
 [31] "perform"      "process"      "select"       "consist"      "applic"      
 [36] "illustr"      "linear"       "error"        "properti"     "comput"      
 [41] "case"         "rate"         "number"       "appli"        "infer"       
 [46] "effici"       "nonparametr"  "measur"       "algorithm"    "articl"      
 [51] "effect"       "class"        "deriv"        "depend"       "paper"       
 [56] "compar"       "provid"       "includ"       "normal"       "probabl"     
 [61] "optim"        "bayesian"     "approxim"     "varianc"      "design"      
 [66] "compon"       "assumpt"      "larg"         "structur"     "size"        
 [71] "smooth"       "predict"      "demonstr"     "independ"     "addit"       
 [76] "point"        "respons"      "construct"    "empir"        "exist"       
 [81] "converg"      "prior"        "densiti"      "introduc"     "standard"    
 [86] "correl"       "methodolog"   "local"        "maximum"      "treatment"   
 [91] "multipl"      "theoret"      "parametr"     "combin"       "requir"      
 [96] "investig"     "establish"    "space"        "theori"       "common"      
[101] "term"         "matrix"       "real"         "limit"        "work"        
[106] "multivari"    "practic"      "bias"         "finit"        "level"       
[111] "control"      "altern"       "coeffici"     "discuss"      "framework"   
[116] "semiparametr" "order"        "assum"        "simpl"        "weight"      
[121] "carlo"        "form"         "mont"         "fit"          "robust"      
[126] "identifi"     "lead"         "adapt"        "improv"       "factor"      
[131] "small"        "high"         "direct"       "seri"         "techniqu"    
[136] "power"        "numer"        "cluster"      "spatial"      "involv"      
[141] "predictor"    "unknown"      "increas"     

[1] "miss"      "robin"     "rotnitzki" "zhao"     

[1] "cancer" "studi"  "diseas" "data"  

[1] "rightcensor"  "surviv"       "estim"        "semiparametr"

[1] "retail"   "deliveri" "tradit"   "frequenc" "servic"   "birth"    "tail"    
[8] "compani"  "differ"  

[1] "wilk" "test"

[1] "simex"  "measur"

[1] "select"  "lasso"   "spars"   "penalti" "penal"  

[1] "forecast"    "predict"     "probabilist"

 [1] "motif"      "cluster"    "gene"       "transcript" "factor"    
 [6] "bind"       "sequenc"    "protein"    "discoveri"  "regul"     
[11] "conserv"    "pattern"    "dirichlet"  "call"      

[1] "climat"     "temperatur" "chang"      "model"      "futur"     

[1] "nonrespons" "survey"     "imput"      "respons"   

[1] "missingdata" "covari"      "miss"        "mechan"     


[1] "markov"    "chain"     "mont"      "carlo"     "algorithm"

[1] "reml"      "smooth"    "criterion" "converg"   "akaik"     "maximum"  
[7] "direct"    "restrict"  "criteria" 

[1] "varyingcoeffici" "propos"         

[1] "hazard"  "proport" "surviv"  "time"   

[1] "nconsist"

[1] "elicit"   "interact" "exposur"  "prone"   

[1] "mles"       "likelihood"

[1] "singleindex"

[1] "semiparametr" "estim"        "model"       

 [1] "claim"  "insur"  "vehicl" "type"   "age"    "damag"  "year"   "turn"  
 [9] "detail" "experi"

[1] "pollut"   "air"      "nation"   "mortal"   "confound" "coeffici" "time"    

[1] "depth"    "project"  "function" "robust"  

[1] "loglinear" "model"     "tabl"     

 [1] "procedur"  "fals"      "control"   "test"      "reject"    "hypothes" 
 [7] "rate"      "discoveri" "null"      "multipl"   "pvalu"     "fdr"      
[13] "kfwer"     "stepdown"  "number"    "fwer"      "depend"   

[1] "spacetim" "site"     "time"    

 [1] "loci"         "genet"        "popul"        "genom"        "allel"       
 [6] "map"          "outlier"      "region"       "statist"      "diverg"      
[11] "relationship" "variat"      

[1] "dirichlet" "process"   "mixtur"    "prior"    

[1] "volatil"   "highfrequ" "asset"     "financi"   "price"     "matrix"   

[1] "bandwidth" "kernel"    "local"     "select"   

[1] "jackknif" "mix"      "squar"    "varianc"  "area"     "respons"  "uncondit"

[1] "tensor"      "diffus"      "imag"        "eigenvalu"   "eigenvector"
[6] "develop"     "nois"       

[1] "auxiliari" "survey"    "sampl"     "variabl"  

[1] "onestep"    "estim"      "likelihood"

 [1] "manifest" "variabl"  "latent"   "model"    "type"     "pseudo"  
 [7] "ordin"    "under"    "covari"   "induc"   

[1] "besov"      "wavelet"    "adapt"      "minimax"    "rang"      
[6] "deconvolut" "function"  


[1] "tau"     "yield"   "factor"  "month"   "truncat"

[1] "gee"     "equat"   "correl"  "binari"  "work"    "general"


[1] "propag"

[1] "homoscedast"

[1] "covari"    "error"     "errorpron" "studi"    

[1] "twostep"  "estim"    "submodel"

[1] "drift"   "process" "diffus" 

 [1] "flow"         "traffic"      "network"      "dynam"        "intervent"   
 [6] "causal"       "forecast"     "articl"       "identifi"     "manag"       
[11] "seri"         "relationship" "monitor"     

[1] "satur"    "shrinkag" "adapt"    "candid"   "oneway"  

[1] "quasilikelihood" "function"       

[1] "spatiotempor" "spatial"      "process"     

[1] "area"      "unemploy"  "benchmark" "census"   

 [1] "gene"       "microarray" "express"    "cdna"       "intens"    
 [6] "imag"       "normal"     "replic"     "array"      "background"
[11] "differenti" "outlier"   

[1] "taper"    "approxim" "matrix"   "covari"   "gaussian"

[1] "seem"   "spline"

[1] "errorsinvari" "error"       

[1] "nonnorm"

[1] "polynomi" "local"    "regress" 

[1] "axe"    "rotat"  "matric" "motion"

[1] "biascorrect"

[1] "equivari" "matrix"  

[1] "unbias" "estim" 

[1] "substitut"

[1] "equat" "estim"

[1] "trajectori" "function"   "time"       "longitudin" "data"      

[1] "test"      "null"      "hypothesi"

[1] "aic"       "select"    "criterion" "bic"       "akaik"    

[1] "nonidentifi" "identifi"   

[1] "net"     "elast"   "prior"   "regress" "path"   

[1] "instabl" "select"  "combin" 

[1] "robust" "out"    "curv"   "altern"

 [1] "depress"   "random"    "treatment" "care"      "patient"   "subject"  
 [7] "outcom"    "trial"     "adher"     "noncompli" "intervent" "health"   
[13] "meet"      "improv"    "receiv"    "primari"   "latent"   


 [1] "trait"       "alcohol"     "genet"       "ordin"       "exist"      
 [6] "associ"      "complex"     "famili"      "dichotom"    "environment"

[1] "vanish"    "interact"  "nonlinear"

[1] "agre"

[1] "posterior" "proprieti" "miss"      "dataset"   "improp"   

[1] "subgroup" "interact"
# fitted values of the Anscombe fit, and a random subsample of 100,000 entries for plotting
fv = fitted(fit.nn.a)
sub = sample(1:length(fv), 100000)

