shorten readme for CRAN, minor corrections

jamesdunham · May 30, 2017 · commit 6523686 · 1 parent 7aba43f

Showing 2 changed files with 54 additions and 736 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -1,12 +1,12 @@
 ---
-output:
-  md_document:
-    variant: markdown_github
+output: github_document
 ---
 [![Build Status](https://travis-ci.org/jamesdunham/dgo.svg?branch=master)](https://travis-ci.org/jamesdunham/dgo)
 [![Build status](https://ci.appveyor.com/api/projects/status/1ta36kmoqen98k87?svg=true)](https://ci.appveyor.com/project/jamesdunham/dgo)
 [![codecov](https://codecov.io/gh/jamesdunham/dgo/branch/master/graph/badge.svg)](https://codecov.io/gh/jamesdunham/dgo)
 
+# Introduction
+
 dgo is an R package for the dynamic estimation of group-level opinion. The
 package can be used to estimate subpopulation groups' average latent
 conservatism (or other latent trait) from individuals' responses to dichotomous
@@ -44,317 +44,53 @@ knitr::opts_chunk$set(
 
 # Installation
 
-dgo requires a working installation of [RStan](http://mc-stan.org/interfaces/rstan.html).
-If you don't have already have RStan, follow its
-"[Getting Started](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started)"
-guide before continuing.
-
-dgo can be installed from [GitHub](https://github.com/jamesdunham/dgo) using
-[devtools](https://github.com/hadley/devtools/):
+dgo can be installed from CRAN:
 
 ```{r, eval = FALSE}
-if (!require(devtools, quietly = TRUE)) install.packages("devtools")
-devtools::install_github("jamesdunham/dgo")
+install.packages("dgo")
 ```
 
-# Getting started
+Or get the latest version from [GitHub](https://github.com/jamesdunham/dgo)
+using [devtools](https://github.com/hadley/devtools/):
 
-```{r}
-library(dgo)
+```{r, eval = FALSE}
+if (!require(devtools, quietly = TRUE)) install.packages("devtools")
+devtools::install_github("jamesdunham/dgo")
 ```
 
-The minimal workflow from raw data to estimation is:
-
-1.  shape input data using the `shape` function; and
-2.  pass the result to the `dgirt` function to estimate a latent trait (e.g.,
-    conservatism) or `dgmrp` function to estimate opinion on a single survey
-    question.
-
+dgo requires a working installation of [RStan](http://mc-stan.org/interfaces/rstan.html).
+If you don't have already have RStan, follow its
+"[Getting Started](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started)" guide.
 
-### Set RStan options
+# Usage
 
-These are RStan's recommended options on a local, multicore machine with excess
-RAM:
+Load the package and set RStan's recommended options for a local, multicore
+machine with excess RAM:
 
 ```{r}
+library(dgo)
 rstan_options(auto_write = TRUE)
 options(mc.cores = parallel::detectCores())
 ```
 
-## Abortion Attitudes
-
-### Prepare input data with `shape`
-
-DGIRT models are *dynamic*, so we need to specify which variable in the data
-represents time. They are also *group-level*, with groups defined by one
-variable for respondents' local geographic area and one or more variables for
-respondent characteristics.
-
-The `time_filter` and `geo_filter` arguments optionally subset the data.
-Finally, `shape` requires the names of the survey identifier and survey weight
-variables in the data.
-
-```{r}
-dgirt_in_abortion <- shape(opinion,
-                  item_names = "abortion",
-                  time_name = "year",
-                  geo_name = "state",
-                  group_names = "race3",
-                  geo_filter = c("CA", "GA", "LA", "MA"),
-                  id_vars = "source")
-```
-
-The reshaped and subsetted data can be summarized in a few ways before model
-fitting.
-
-```{r}
-summary(dgirt_in_abortion)
-```
-
-Response counts by state:
-
-```{r}
-get_n(dgirt_in_abortion, by = c("state"))
-```
-
-Response counts by item-year:
-
-```{r}
-get_item_n(dgirt_in_abortion, by = "year")
-```
-
-### Fit a model with `dgirt` or `dgmrp`
-
-`dgirt` and `dgmrp` fit estimation models to data from `shape`. `dgirt` can be
-used to estimate a latent variable based on responses to multiple survey
-questions (e.g., latent policy conservatism), while `dgmrp` can be used to
-estimate public opinion on an individual survey question (e.g., abortion) using
-a dynamic multi-level regression and post-stratification (MRP) model. In this
-case, we use `dgmrp` to model abortion attitudes.
-
-Under the hood, these functions use RStan for MCMC sampling, and arguments can
-be passed to RStan's `stan` via the `...` argument of `dgirt` and `dgmrp`. This
-will almost always be desirable, at a minimum to specify the number of sampler
-iterations, chains, and cores.
-
-```{r, warning = FALSE, message = FALSE, results = 'hide'}
-dgmrp_out_abortion <- dgmrp(dgirt_in_abortion, iter = 1500, chains = 4, cores =
-  4, seed = 42)
-```
-
-The model results are held in a `dgirtfit` object. Methods from RStan like
-`extract` are available if needed because `dgirtfit` is a subclass of `stanfit`.
-But dgo provides its own methods for typical post-estimation tasks.
-
-### Work with `dgirt` or `dgmrp` results
-
-For a high-level summary of the result, use `summary`.
-
-```{r}
-summary(dgmrp_out_abortion)
-```
-
-To summarize posterior samples, use `summarize`. The default output gives
-summary statistics for the `theta_bar` parameters, which represent the mean of
-the latent outcome for the groups defined by time, local geographic area, and
-the demographic characteristics specified in the earlier call to `shape`.
-
-```{r}
-head(summarize(dgmrp_out_abortion))
-```
-
-Alternatively, `summarize` can apply arbitrary functions to posterior samples
-for whatever parameter is given by its `pars` argument. Enclose function names
-with quotes. For convenience, `"q_025"` and `"q_975"` give the 2.5th and 97.5th
-posterior quantiles.
-
-```{r}
-summarize(dgmrp_out_abortion, pars = "xi", funs = "var")
-```
-
-To access posterior samples in tabular form use `as.data.frame`. By default,
-this method returns post-warmup samples for the `theta_bar` parameters, but like
-other methods takes a `pars` argument.
-
-```{r}
-head(as.data.frame(dgmrp_out_abortion))
-```
-
-To poststratify the results use `poststratify`. The following example uses the
-group population proportions bundled as `annual_state_race_targets` to reweight
-and aggregate estimates to strata defined by state-years.
-
-Read `help("poststratify")` for more details.
-
-```{r}
-poststratify(dgmrp_out_abortion, annual_state_race_targets, strata_names =
-  c("state", "year"), aggregated_names = "race3")
-```
-
-To plot the results use `dgirt_plot`. This method plots summaries of posterior
-samples by time period. By default, it shows a 95% credible interval around
-posterior medians for the `theta_bar` parameters, for each local geographic
-area. For this (unconverged) toy example we omit the CIs.
-
-```{r dgmrp_plot, fig.show = 'hide'}
-dgirt_plot(dgmrp_out_abortion, y_min = NULL, y_max = NULL)
-```
-
-![](https://raw.githubusercontent.com/jamesdunham/dgo/master/README/dgmrp_plot-1.png)
-
-Output from `dgirt_plot` can be customized to some extent using objects from the
-ggplot2 package.
-
-```{r dgmrp_plot_plus, fig.show = 'hide'}
-dgirt_plot(dgmrp_out_abortion, y_min = NULL, y_max = NULL) + theme_classic()
-```
-
-![](https://raw.githubusercontent.com/jamesdunham/dgo/master/README/dgmrp_plot_plus-1.png)
-
-`dgirt_plot` can also plot the `data.frame` output from `poststratify`. This
-requires arguments that identify the relevant variables in the `data.frame`.
-Below, `poststratify` aggregates over the demographic grouping variable `race3`,
-resulting in a `data.frame` of estimates by state-year. So, in the subsequent
-call to `dgirt_plot`, we pass the names of the state and year variables. The
-`group_names` argument is `NULL` because there are no grouping variables left
-after aggregating over `race3`.
-
-```{r dgmrp_plot_ps, fig.show = 'hide'}
-ps <- poststratify(dgmrp_out_abortion, annual_state_race_targets, strata_names =
-  c("state", "year"), aggregated_names = "race3")
-head(ps)
-dgirt_plot(ps, group_names = NULL, time_name = "year", geo_name = "state")
-```
-
-![](https://raw.githubusercontent.com/jamesdunham/dgo/master/README/dgmrp_plot_ps-1.png)
-
-## Policy Liberalism
-
-### Prepare input data with `shape`
-
-```{r}
-dgirt_in_liberalism <- shape(opinion, item_names = c("abortion",
-    "affirmative_action","stemcell_research" , "gaymarriage_amendment",
-    "partialbirth_abortion") , time_name = "year", geo_name = "state",
-  group_names = "race3", geo_filter = c("CA", "GA", "LA", "MA"))
-```
-
-The reshaped and subsetted data can be summarized in a few ways before model
-fitting.
-
-```{r}
-summary(dgirt_in_liberalism)
-```
-
-Response counts by item-year:
-
-```{r}
-get_item_n(dgirt_in_liberalism, by = "year")
-```
-
-### Fit a model with `dgirt`
-
-`dgirt` and `dgmrp` fit estimation models to data from `shape`. `dgirt` can be
-used to estimate a latent variable based on responses to multiple survey
-questions (e.g., latent policy conservatism), while `dgmrp` can be used to
-estimate public opinion on an individual survey question using a dynamic
-multi-level regression and post-stratification (MRP) model.  
-
-Under the hood, these functions use RStan for MCMC sampling, and arguments can
-be passed to RStan's `stan` via the `...` argument of `dgirt` and `dgmrp`. This
-will almost always be desirable, at a minimum to specify the number of sampler
-iterations, chains, and cores.
-
-```{r, warning = FALSE, message = FALSE, results = 'hide'}
-dgirt_out_liberalism <- dgirt(dgirt_in_liberalism, iter = 3000, chains = 4,
-  cores = 4, seed = 42)
-```
-
-The model results are held in a `dgirtfit` object. Methods from RStan like
-`extract` are available if needed because `dgirtfit` is a subclass of `stanfit`.
-But dgo provides its own methods for typical post-estimation tasks.
-
-### Work with `dgirt` results
-
-For a high-level summary of the result, use `summary`.
-
-```{r}
-summary(dgirt_out_liberalism)
-```
-
-To summarize posterior samples, use `summarize`. The default output gives
-summary statistics for the `theta_bar` parameters, which represent the mean of
-the latent outcome for the groups defined by time, local geographic area, and
-the demographic characteristics specified in the earlier call to `shape`.
-
-```{r}
-head(summarize(dgirt_out_liberalism))
-```
-
-Alternatively, `summarize` can apply arbitrary functions to posterior samples
-for whatever parameter is given by its `pars` argument. Enclose function names
-with quotes. For convenience, `"q_025"` and `"q_975"` give the 2.5th and 97.5th
-posterior quantiles.
-
-```{r}
-summarize(dgirt_out_liberalism, pars = "xi", funs = "var")
-```
-
-To access posterior samples in tabular form use `as.data.frame`. By default,
-this method returns post-warmup samples for the `theta_bar` parameters, but like
-other methods takes a `pars` argument.
-
-```{r}
-head(as.data.frame(dgirt_out_liberalism))
-```
-
-To poststratify the results use `poststratify`. The following example uses the
-group population proportions bundled as `annual_state_race_targets` to reweight and aggregate
-estimates to strata defined by state-years. Read `help("poststratify")` for more
-details.
-
-```{r}
-poststratify(dgirt_out_liberalism, annual_state_race_targets, strata_names = c("state",
-    "year"), aggregated_names = "race3")
-```
-
-To plot the results use `dgirt_plot`. This method plots summaries of posterior
-samples by time period. By default, it shows a 95% credible interval around
-posterior medians for the `theta_bar` parameters, for each local geographic
-area. For this (unconverged) toy example we omit the CIs.
-
-```{r dgirt_plot, fig.show = 'hide'}
-dgirt_plot(dgirt_out_liberalism, y_min = NULL, y_max = NULL)
-```
-
-![](https://raw.githubusercontent.com/jamesdunham/dgo/master/README/dgirt_plot-1.png)
-
-`dgirt_plot` can also plot the `data.frame` output from `poststratify`. This
-requires arguments that identify the relevant variables in the `data.frame`.
-Below, `poststratify` aggregates over the demographic grouping variable `race3`,
-resulting in a `data.frame` of estimates by state-year. So, in the subsequent
-call to `dgirt_plot`, we pass the names of the state and year variables. The
-`group_names` argument is `NULL` because there are no grouping variables left
-after aggregating over `race3`.
+The minimal workflow from raw data to estimation is:
 
-```{r dgirt_plot_ps, fig.show = 'hide'}
-ps <- poststratify(dgirt_out_liberalism, annual_state_race_targets, strata_names = c("state",
-    "year"), aggregated_names = "race3")
-head(ps)
-dgirt_plot(ps, group_names = NULL, time_name = "year", geo_name = "state")
-```
+1.  shape input data using the `shape()` function; and
+2.  pass the result to the `dgirt()` function to estimate a latent trait (e.g.,
+    conservatism) or `dgmrp()` function to estimate opinion on a single survey
+    question.
 
-![](https://raw.githubusercontent.com/jamesdunham/dgo/master/README/dgirt_plot_ps-1.png)
+See the [package site](https://jdunham.io/dgo) for worked examples. 
 
-## Troubleshooting
+# Troubleshooting
 
 Please [report issues](https://github.com/jamesdunham/dgo/issues) that you
 encounter.
 
   * OS X only: RStan creates temporary files during estimation in a location
-    given by `tempdir`, typically an arbitrary location in `/var/folders`. If a
-    model runs for days, these files can be cleaned up while still needed, which
-    induces an error. A good solution is to set a safer path for temporary
+    given by `tempdir()`, typically an arbitrary location in `/var/folders`. If
+    a model runs for days, these files can be cleaned up while still needed,
+    which induces an error. A good solution is to set a safer path for temporary
     files, using an environment variable checked at session startup. For help
     setting environment variables, see the Stack Overflow question
     [here](https://stackoverflow.com/questions/17107206/change-temporary-directory).
@@ -363,7 +99,7 @@ encounter.
 
   * Models fitted before October 2016 (specifically <
     [#8e6a2cf](https://github.com/jamesdunham/dgo/commit/8e6a2cfbe00b2cd4a908b3067241e06124d143cd))
-    using dgirtfit are not fully compatible with dgo. Their contents can be
+    using dgirt are not fully compatible with dgo. Their contents can be
     extracted without using dgo, however, with the `$` indexing operator. For
     example: `as.data.frame(dgirtfit_object$stan.cmb)`.
 
@@ -372,13 +108,11 @@ encounter.
     compilation. These are safe to ignore, or can be suppressed by following the
     linked instructions.
 
-## Contributing and citing
+# Contributing and citing
 
 dgo is under development and we welcome
 [suggestions](https://github.com/jamesdunham/dgo/issues).
-
 The package citation is 
 
 > Dunham, James, Devin Caughey, and Christopher Warshaw. 2017. dgo: Dynamic
-> Estimation of Group-level Opinion. R package.
-> https://jamesdunham.github.io/dgo/.
+> Estimation of Group-level Opinion. R package. https://jdunham.io/dgo/.