\documentclass{article}
\usepackage{natbib}
\usepackage{graphics}
\usepackage{amsmath,amssymb}
\usepackage{indentfirst}
\usepackage[utf8]{inputenc}
\usepackage[tableposition=top]{caption}
\usepackage{url}
\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\cov}{cov}
\DeclareMathOperator{\E}{E}
\newcommand{\inner}[1]{\langle #1 \rangle}
% \VignetteIndexEntry{MCMC Morph Example}
\begin{document}
<<echo=FALSE>>=
options(keep.source = TRUE, width = 60)
foo <- packageDescription("mcmc")
@
\title{Morphometric MCMC (mcmc Package Ver.~\Sexpr{foo$Version})}
% $ (Just to make emacs syntax highlighting work properly)
\author{Leif T. Johnson \and Charles J. Geyer}
\maketitle
\section{Overview}
This is an example of how to use morphometric Markov chains as implemented in
the \texttt{mcmc} package in R.
Let $X$ be an $\mathbb{R}^k$ valued random variable with probability density
function, $f_X$. Let $g$ be a diffeomorphism, and $Y=g(X)$. Then the
probability density function of $Y$, $f_Y$ is given by
\begin{equation}\label{eq:def-fy}
f_Y(y) = f_X\bigl(g^{-1}(y)\bigr) \bigl\lvert \det\bigl( \nabla g^{-1}(y) \bigr) \bigr\rvert.
\end{equation}
Since $g$ is a diffeomorphism, we can draw inference about $X$ from information
about $Y$ (and vice versa).
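As a quick numerical check of \eqref{eq:def-fy} (a toy illustration, not part
of the package), take $X \sim N(0,1)$ and $g = \exp$, so that $Y = g(X)$ is
lognormal, $g^{-1}(y) = \log y$, and the Jacobian term is $1/y$.  The
right-hand side of \eqref{eq:def-fy} then agrees with the lognormal density
that R computes directly.
<<>>=
# change-of-variables check: X ~ N(0,1), g = exp, so Y = g(X) is lognormal
y <- c(0.5, 1, 2)
all.equal(dnorm(log(y)) / y, dlnorm(y))
@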
It is not unusual for $f_X$ to either be known only up to a normalizing
constant, or to be analytically intractable in other ways --- such as
being high dimensional.
A common solution to this problem is to use Markov chain
Monte Carlo (MCMC) methods to learn about $f_X$.
When using MCMC, a primary concern of the practitioner should be the question
``Does the Markov chain converge fast enough to be useful?'' One very useful
property is \emph{geometric ergodicity}, which says the chain converges to its
stationary distribution at a geometric rate \citep[Chapter~1]{johnson-thesis}.
The \texttt{mcmc} package implements the Metropolis random-walk algorithm for
arbitrary log unnormalized probability densities. But the Metropolis
random-walk algorithm does not always perform well. As is demonstrated in
\citet{johnson-geyer}, for $f_X$ and $f_Y$ related by diffeomorphism as in
\eqref{eq:def-fy}, a Metropolis random-walk for $f_Y$ can be geometrically
ergodic
even though a Metropolis random-walk for $f_X$ is not.
Since the transformation is
one-to-one, inference about $f_X$ can be drawn from the Markov chain for $f_Y$.
The \texttt{morph.metrop} and \texttt{morph} functions in the \texttt{mcmc}
package provide this functionality, and this vignette gives a demonstration
on how to use them.
\section{T Distribution} \label{sec:toy}
We start with a univariate example, which is a Student $t$ distribution
with three degrees of freedom.
Of course, one doesn't need MCMC to simulate this distribution
(the R function \texttt{rt} does that), so this is just a toy problem.
But it does illustrate some aspects of using variable transformation.
A necessary condition for geometric ergodicity of a random-walk Metropolis
algorithm is that the target density $\pi$ have a moment generating function
\citep{jarner-tweedie}.
For a univariate target density, which we have in this section,
a sufficient condition for geometric ergodicity of a random-walk Metropolis
algorithm is that the target density $\pi$ be exponentially light
\citep{mengersen-tweedie}.
A $t$ distribution has no moment generating function, so if we do not use
variable transformation, the Markov chain simulated by the \texttt{metrop}
function will not be geometrically ergodic.
\citet[Example 4.2]{johnson-geyer} show that a $t$ distribution is
sub-exponentially light. Hence using the transformations
described in their Corollaries~1 and~2 will induce a target density
$\pi_\gamma$ for which a Metropolis random-walk will be geometrically
ergodic. In particular, the transformation described as $h_2$ in
\citet[Corollary~2]{johnson-geyer} induces such a target density.
Passing a positive value for \texttt{b} to the \texttt{morph} function will
create the aforementioned transformation, $h_2$. It's as simple as
<<>>=
library(mcmc)
h2 <- morph(b=1)
@
We can now see the induced density. Note that \texttt{morph} works for
log unnormalized densities, so we need to exponentiate the induced density to
plot it on the usual scale.
<<>>=
lud <- function(x) dt(x, df=3, log=TRUE)
lud.induced <- h2$lud(lud)
@
We can plot the two densities,
<<fig=TRUE>>=
curve(exp(Vectorize(lud.induced)(x)), from = -3, to = 3, lty = 2,
xlab = "t", ylab = "density")
curve(exp(lud(x)), add = TRUE)
legend("topright", c("t density", "induced density"), lty=1:2)
@
The \texttt{Vectorize} in this example is necessary because
the function \texttt{lud.induced} is not vectorized.
Instead, it treats any vector passed to it as a single input, which
is rescaled (using the specified diffeomorphism) and passed to
\texttt{lud}. Compare the behavior of \texttt{lud} and
\texttt{lud.induced} in the following example.
<<>>=
lud(1:4)
lud(1)
foo <- try(lud.induced(1:4))
class(foo)
cat(foo, "\n")
lud.induced(1)
@
Because the function \texttt{dt} is vectorized, the function \texttt{lud}
is also vectorized, mapping vectors to vectors,
whereas the function \texttt{lud.induced} is not vectorized,
mapping vectors to scalars.
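If a vectorized version is wanted more than once, one can wrap
\texttt{lud.induced} a single time with \texttt{Vectorize} (this is ordinary
R, not special \texttt{mcmc} functionality) and reuse the result.
<<>>=
# wrap once, then call like any vectorized log density
lud.induced.vec <- Vectorize(lud.induced)
lud.induced.vec(1:4)
@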
Before we start using random numbers, we set the seed of the random number
generator so this document always produces the same results.
<<>>=
set.seed(42)
@
To change the results, change the seed or delete the \texttt{set.seed}
statement.
Running a Markov chain for the induced density is done with
\texttt{morph.metrop}.
<<>>=
out <- morph.metrop(lud, 0, blen=100, nbatch=100, morph=morph(b=1))
@
The content of \texttt{out\$batch} is on the scale used by
\texttt{lud}. Once the transformation has been set, no adjustment is
needed (unless you want to change transformations). We start by adjusting
the scale of the proposal distribution.
<<>>=
# adjust scale to find a roughly 20% acceptance rate
out$accept
@
An acceptance rate of \Sexpr{round(100 * out$accept, 1)}\%
%$ to fix emacs highlighting
is probably too high. By increasing the scale of the proposal distribution
we can bring it down towards 20\%.
<<>>=
out <- morph.metrop(out, scale=4)
out$accept
@
We now use this Markov chain to estimate the expectation of the target
distribution.
But first we need to check whether our batch length is good.
The following code
<