Last updated: 2021-05-22
Knit directory: stat34800/analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20180411) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 1c35453. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Untracked files:
Untracked: analysis/currency_analysis.Rmd
Untracked: analysis/currency_read_transform.Rmd
Untracked: analysis/stocks_analysis.Rmd
Untracked: data/currency.csv
Untracked: data/prices.csv
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/stocks.Rmd) and HTML (docs/stocks.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | 1c35453 | Matthew Stephens | 2021-05-22 | workflowr::wflow_publish("stocks.Rmd") |
| html | 1dd5515 | Matthew Stephens | 2021-05-22 | Build site. |
| Rmd | e6d3362 | Matthew Stephens | 2021-05-22 | workflowr::wflow_publish("stocks.Rmd") |
| html | e0195f2 | Matthew Stephens | 2021-05-22 | Build site. |
| Rmd | 147eb45 | Matthew Stephens | 2021-05-22 | workflowr::wflow_publish("stocks.Rmd") |
Here I download and save some stock price data. I got some help from https://www.codingfinance.com/post/2018-03-27-download-price/
Here are the stocks I download:
# AAPL: Apple
# NFLX: Netflix
# AMZN: Amazon
# MMM: 3M
# K: Kellogg
# O: Realty Income Corp
# NSRGY: Nestle
# LDSVF: Lindt
# JPM: JP Morgan Chase
# JNJ: Johnson and Johnson
# TSLA: Tesla
# V: Visa
# PFE: Pfizer
Here I use the quantmod package (loaded as a dependency of tidyquant) to download and save the data:
library(tidyquant)
Warning: package 'tidyquant' was built under R version 3.6.2
Loading required package: lubridate
Warning: package 'lubridate' was built under R version 3.6.2
Attaching package: 'lubridate'
The following objects are masked from 'package:base':
date, intersect, setdiff, union
Loading required package: PerformanceAnalytics
Loading required package: xts
Warning: package 'xts' was built under R version 3.6.2
Loading required package: zoo
Warning: package 'zoo' was built under R version 3.6.2
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
Attaching package: 'PerformanceAnalytics'
The following object is masked from 'package:graphics':
legend
Loading required package: quantmod
Warning: package 'quantmod' was built under R version 3.6.2
Loading required package: TTR
Warning: package 'TTR' was built under R version 3.6.2
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
library(purrr)
Warning: package 'purrr' was built under R version 3.6.2
tickers = c("AAPL", "NFLX", "AMZN", "MMM", "K", "O", "NSRGY", "LDSVF", "JPM", "JNJ", "TSLA", "V", "PFE")
getSymbols(tickers)
'getSymbols' currently uses auto.assign=TRUE by default, but will
use auto.assign=FALSE in 0.5-0. You will still be able to use
'loadSymbols' to automatically load data. getOption("getSymbols.env")
and getOption("getSymbols.auto.assign") will still be checked for
alternate defaults.
This message is shown once per session and may be disabled by setting
options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
pausing 1 second between requests for more than 5 symbols
pausing 1 second between requests for more than 5 symbols
pausing 1 second between requests for more than 5 symbols
pausing 1 second between requests for more than 5 symbols
pausing 1 second between requests for more than 5 symbols
pausing 1 second between requests for more than 5 symbols
pausing 1 second between requests for more than 5 symbols
pausing 1 second between requests for more than 5 symbols
pausing 1 second between requests for more than 5 symbols
[1] "AAPL" "NFLX" "AMZN" "MMM" "K" "O" "NSRGY" "LDSVF" "JPM"
[10] "JNJ" "TSLA" "V" "PFE"
prices <- map(tickers,function(x) Ad(get(x))) # gets the adjusted prices of each stock
prices <- reduce(prices,merge)
colnames(prices) <- tickers
head(prices)
AAPL NFLX AMZN MMM K O NSRGY LDSVF
2007-01-03 2.573566 3.801429 38.70 52.71381 32.51891 13.37398 23.23697 NA
2007-01-04 2.630688 3.621429 38.90 52.50500 32.26753 13.47530 23.21061 NA
2007-01-05 2.611954 3.544286 38.37 52.14801 32.04837 13.07003 22.86792 NA
2007-01-08 2.624853 3.404286 37.50 52.26250 32.12571 13.03626 22.81520 NA
2007-01-09 2.842900 3.427143 37.78 52.32314 32.19663 13.17135 22.78884 NA
2007-01-10 2.978950 3.438571 37.15 52.43765 32.36421 13.22924 22.86792 NA
JPM JNJ TSLA V PFE
2007-01-03 33.34949 43.58500 NA NA 13.82006
2007-01-04 33.43275 44.12983 NA NA 13.86737
2007-01-05 33.15523 43.72942 NA NA 13.82532
2007-01-08 33.26624 43.65722 NA NA 13.75173
2007-01-09 33.12750 43.49312 NA NA 13.75698
2007-01-10 33.37030 43.42091 NA NA 13.77275
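The NA entries above arise because merging aligns every series on the union of dates. Here is a minimal base-R sketch of the same fold-and-merge idea on made-up data; the tickers AAA, NEWCO and BBB, the prices, and the helper outer_join are all hypothetical, and plain data frames stand in for the xts objects used above.

```r
# Toy per-ticker tables (made-up prices); NEWCO starts trading one day late.
aaa <- data.frame(date = as.Date(c("2020-01-02", "2020-01-03", "2020-01-06")),
                  AAA = c(10.0, 10.2, 10.1))
newco <- data.frame(date = as.Date(c("2020-01-03", "2020-01-06")),
                    NEWCO = c(5.0, 5.5))
bbb <- data.frame(date = as.Date(c("2020-01-02", "2020-01-03", "2020-01-06")),
                  BBB = c(20.0, 19.8, 20.4))

# Outer join on date: keep every date seen in either table, filling gaps with NA.
outer_join <- function(x, y) merge(x, y, by = "date", all = TRUE)

# Fold the list of tables into one wide table, one column per ticker.
toy_merged <- Reduce(outer_join, list(aaa, newco, bbb))
toy_merged
```

The first row of toy_merged has NA for NEWCO, which is the same pattern as LDSVF, TSLA and V in the real data.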
Some companies (e.g. TSLA) were not listed for the entire period available. I’m going to narrow down the time window so that there are no missing data. This leaves the period 2010-10-15 to 2021-05-21.
nomiss = function(x){all(!is.na(x))}
prices = prices[apply(prices,1,nomiss),]
head(prices)
AAPL NFLX AMZN MMM K O NSRGY LDSVF
2010-10-15 9.665921 22.24571 164.64 66.58378 35.79071 21.30972 40.26300 2262.3
2010-10-18 9.766039 21.85714 163.56 66.97220 35.80500 21.59911 40.36640 2262.3
2010-10-19 9.504692 21.33286 158.67 66.14308 35.66207 21.46981 39.44311 2262.3
2010-10-20 9.536628 21.87857 158.67 66.83025 35.74784 21.95006 40.10788 2262.3
2010-10-21 9.505616 24.67000 164.97 67.63698 35.55486 21.89465 40.78743 2262.3
2010-10-22 9.442654 24.01429 169.13 67.55482 35.41193 21.85154 39.79028 2262.3
JPM JNJ TSLA V PFE
2010-10-15 27.92472 46.70719 4.108 17.96467 11.31108
2010-10-18 28.71398 46.92027 4.046 18.14062 11.34294
2010-10-19 28.33062 46.50147 4.010 17.95310 11.08167
2010-10-20 28.63881 46.72923 4.130 18.40916 11.25372
2010-10-21 28.33813 47.01579 4.150 18.33507 11.23461
2010-10-22 28.33813 46.88353 4.144 18.35591 11.15177
tail(prices)
AAPL NFLX AMZN MMM K O NSRGY LDSVF JPM
2021-05-14 127.45 493.37 3222.90 202.8870 66.54 65.34 120.89 9200.0 164.01
2021-05-17 126.27 488.94 3270.39 203.6117 66.32 65.40 121.31 9300.0 164.67
2021-05-18 124.85 486.28 3232.28 201.5469 66.04 65.48 121.57 9300.0 162.35
2021-05-19 124.69 487.70 3231.80 201.1200 66.04 65.66 121.07 9299.8 161.11
2021-05-20 127.31 501.67 3247.68 201.6500 66.32 66.91 122.78 9250.0 160.83
2021-05-21 125.43 497.89 3203.08 201.8600 66.50 66.58 123.07 9250.0 162.66
JNJ TSLA V PFE
2021-05-14 170.22 589.74 226.94 40.02
2021-05-17 170.39 576.83 226.44 40.11
2021-05-18 170.45 577.87 225.57 40.05
2021-05-19 170.08 563.46 224.59 39.83
2021-05-20 171.07 586.78 226.44 40.12
2021-05-21 170.96 580.88 226.77 39.95
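As a small check of the row filter used above, here is a sketch on a made-up matrix (the names toy, d1 to d4 and A to C are hypothetical). It also shows that base R's complete.cases picks out the same rows as the apply/nomiss approach. Note that this kind of filter drops any row containing an NA, so it only corresponds to narrowing the time window when the missing values all sit at the start of the series.

```r
# Made-up price matrix: column B is missing for the first two dates.
toy <- matrix(c(1.0, NA,  3.0,
                1.1, NA,  3.1,
                1.2, 2.2, 3.2,
                1.3, 2.3, 3.3),
              ncol = 3, byrow = TRUE,
              dimnames = list(c("d1", "d2", "d3", "d4"), c("A", "B", "C")))

# Same helper as in the analysis: TRUE when a row has no missing values.
nomiss <- function(x) all(!is.na(x))
keep_apply <- apply(toy, 1, nomiss)

# Base-R alternative that flags complete rows directly.
keep_cc <- complete.cases(toy)

# Keep only the complete rows (drop = FALSE preserves the matrix shape).
toy_complete <- toy[keep_cc, , drop = FALSE]
toy_complete
```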
write.csv(prices, file="../data/prices.csv",quote=FALSE,row.names=FALSE)
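With row.names=FALSE the saved file appears to contain only the price columns, not the dates. If the dates are wanted later, one option is to store them as an explicit column. The sketch below uses a made-up data frame and a temporary file; for an xts object, something like data.frame(date = index(prices), coredata(prices)) might play the role of toy here, though I have not run that on the real data.

```r
# Made-up table with an explicit date column (hypothetical tickers AAA, BBB).
toy <- data.frame(date = as.Date(c("2020-01-02", "2020-01-03")),
                  AAA = c(10.0, 10.2),
                  BBB = c(20.0, 19.8))

# Write to a temporary file so nothing in the project is touched.
path <- tempfile(fileext = ".csv")
write.csv(toy, file = path, quote = FALSE, row.names = FALSE)

# Read back and restore the Date class.
back <- read.csv(path)
back$date <- as.Date(back$date)

# The date column must be dropped before taking logs of the prices.
log_back <- log(back[, -1])
```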
Here are brief suggestions for reading and processing the data. First, since stock prices are positive, and what generally matters is the percentage change in price, it makes sense to take logs and look at day-to-day differences. These differences are called “log-returns” in finance. Here I load in the data and compute the log-returns.
prices = read.csv("../data/prices.csv")
log_prices = log(prices)
log_returns = apply(log_prices,2, diff)
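To make the link between log-returns and percentage changes concrete, here is a small worked sketch on made-up prices (the vector p and the other names are hypothetical). The log-return on day t is log(P_t) - log(P_{t-1}) = log(P_t / P_{t-1}), which is close to the simple return (P_t - P_{t-1}) / P_{t-1} when changes are small, and log-returns add up across days.

```r
# Made-up prices that move +2%, -2%, +2%.
p <- c(100, 102, 99.96, 101.9592)

# Simple (percentage) returns and log-returns.
simple_returns <- diff(p) / head(p, -1)
toy_log_returns <- diff(log(p))

# Log-returns sum to the log of the overall price ratio.
total_log_return <- sum(toy_log_returns)

# Column-wise version, as in the analysis: rescaling a price series
# (column B is twice column A) leaves its log-returns unchanged.
toy_prices <- cbind(A = p, B = 2 * p)
toy_lr_matrix <- apply(log(toy_prices), 2, diff)
```

The matrix toy_lr_matrix has one fewer row than toy_prices, because each return needs two consecutive prices.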
Note that while the stock prices from day to day are highly correlated (as they form a time series), log returns are much less correlated (and so easier to study and model statistically).
plot(log_prices[,1], main="Apple (log) stock price over time")
plot(log_returns[,1], main="Apple log-returns over time")
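The contrast between prices and returns can be illustrated with a simulation. This is only a sketch under an idealized assumption (independent normal log-returns, so that the log price is a random walk); the seed, sample size, standard deviation and the helper lag1 are arbitrary choices of mine, not part of the analysis above.

```r
set.seed(1)  # arbitrary seed, for reproducibility of the sketch only
n <- 1000

# Simulated independent daily log-returns, and the implied log-price path.
sim_returns <- rnorm(n, mean = 0, sd = 0.01)
sim_log_prices <- cumsum(sim_returns)

# Lag-1 autocorrelation: correlation of a series with itself one day earlier.
lag1 <- function(x) cor(x[-1], x[-length(x)])

lag1_prices <- lag1(sim_log_prices)   # expected to be close to 1
lag1_returns <- lag1(sim_returns)     # expected to be close to 0
```

Applying lag1 to a column of log_prices and to the matching column of log_returns would be one way to check the same thing on the real data.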
One question of interest is to what extent changes in stocks are correlated with one another. Here is an initial plot showing some of the correlation structure in the data.
S = cor(log_returns)
heatmap(S, symm=TRUE) # the row/column names of S already label the axes; xlab expects a single label string
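Besides the heatmap, it can help to look at the correlation matrix directly. Below is a sketch on simulated returns, assuming two hypothetical stocks A and B that share a common "market" component and a third stock C that is independent of them. All names and parameters are made up for illustration.

```r
set.seed(2)  # arbitrary seed for the sketch
n_days <- 500

# Shared component plus stock-specific noise for A and B; C is independent.
market <- rnorm(n_days, sd = 0.01)
sim <- cbind(
  A = market + rnorm(n_days, sd = 0.005),
  B = market + rnorm(n_days, sd = 0.005),
  C = rnorm(n_days, sd = 0.01)
)

# Correlation matrix of the simulated log-returns.
S_toy <- cor(sim)

# Find the most strongly correlated pair, ignoring the diagonal.
off_diag <- S_toy
diag(off_diag) <- NA
idx <- which(off_diag == max(off_diag, na.rm = TRUE), arr.ind = TRUE)[1, ]
top_pair <- colnames(S_toy)[idx]
```

The same steps applied to S from the real data would list which pair of stocks moves together most closely.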
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.16
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] purrr_0.3.4 tidyquant_1.0.3
[3] quantmod_0.4.18 TTR_0.24.2
[5] PerformanceAnalytics_2.0.4 xts_0.12.1
[7] zoo_1.8-8 lubridate_1.7.9.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 pillar_1.4.6 compiler_3.6.0 later_1.1.0.1
[5] git2r_0.27.1 workflowr_1.6.2 tools_3.6.0 digest_0.6.27
[9] gtable_0.3.0 jsonlite_1.7.2 evaluate_0.14 lifecycle_1.0.0
[13] tibble_3.0.4 lattice_0.20-41 pkgconfig_2.0.3 rlang_0.4.10
[17] cli_2.4.0 rstudioapi_0.13 curl_4.3 yaml_2.2.1
[21] xfun_0.16 dplyr_1.0.2 httr_1.4.2 stringr_1.4.0
[25] knitr_1.29 generics_0.0.2 fs_1.5.0 vctrs_0.3.8
[29] tidyselect_1.1.0 rprojroot_1.3-2 grid_3.6.0 glue_1.4.2
[33] R6_2.4.1 Quandl_2.10.0 rmarkdown_2.3 ggplot2_3.3.2
[37] magrittr_1.5 whisker_0.4 scales_1.1.1 backports_1.1.10
[41] promises_1.1.1 ellipsis_0.3.1 htmltools_0.5.0 colorspace_1.4-1
[45] httpuv_1.5.4 quadprog_1.5-8 stringi_1.4.6 munsell_0.5.0
[49] crayon_1.3.4