pyears {survival}  R Documentation 
This function computes the personyears of followup time contributed by a cohort of subjects, stratified into subgroups. It also computes the number of subjects who contribute to each cell of the output table, and optionally the number of events and/or expected number of events in each cell.
pyears(formula, data, weights, subset, na.action, rmap, ratetable, scale=365.25, expect=c('event', 'pyears'), model=FALSE, x=FALSE, y=FALSE, data.frame=FALSE)
formula 
a formula object.
The response variable will be a vector of followup times
for each subject, or a 
data 
a data frame in which to interpret the variables named in
the 
weights 
case weights. 
subset 
expression saying that only a subset of the rows of the data should be used in the fit. 
na.action 
a missingdata filter function, applied to the model.frame, after any

rmap 
an optional list that maps data set names to the ratetable names. See the details section below. 
ratetable 
a table of event rates, such as 
scale 
a scaling for the results. As most rate tables are in units/day, the default value of 365.25 causes the output to be reported in years. 
expect 
should the output table include the expected number of events, or the expected number of personyears of observation. This is only valid with a rate table. 
data.frame 
return a data frame rather than a set of arrays. 
model, x, y 
If any of these is true, then the model frame, the model matrix, and/or the vector of response times will be returned as components of the final result. 
Because pyears
may have several time variables, it is necessary that all
of them be in the same units. For instance, in the call
py < pyears(futime ~ rx, rmap=list(age=age, sex=sex, year=entry.dt), ratetable=survexp.us)
the natural unit of the ratetable is hazard per day, it is important that
futime
,
age
and entry.dt
all be in days.
Given the wide range of possible inputs,
it is difficult for the routine to do sanity checks of this aspect.
The ratetable being used may have different variable names than the user's
data set, this is dealt with by the rmap
argument.
The rate table for the above calculation was survexp.us
, a call to
summary{survexp.us}
reveals that it expects to have variables
age
= age in days, sex
, and year
= the date of study
entry, we create them in the rmap
line. The sex variable is not
mapped, therefore the code assumes that it exists in mydata
in the
correct format. (Note: for factors such as sex, the program will match on
any unique abbreviation, ignoring case.)
A special function tcut
is needed to specify timedependent cutpoints.
For instance, assume that age is in years, and that the desired final
arrays have as one of their margins the age groups 02, 210, 1025, and 25+.
A subject who enters the study at age 4 and remains under observation for
10 years will contribute followup time to both the 210 and 1025
subsets. If cut(age, c(0,2,10,25,100))
were used in the formula, the
subject would be classified according to his starting age only.
The tcut
function has the same arguments as cut
,
but produces a different output object which allows the pyears
function
to correctly track the subject.
The results of pyears
are normally used as input to further calculations.
The print
routine, therefore, is designed to give only a summary of the
table.
a list with components:
pyears 
an array containing the personyears of exposure. (Or other units, depending on the rate table and the scale). The dimension and dimnames of the array correspond to the variables on the right hand side of the model equation. 
n 
an array containing the number of subjects who contribute time to each cell
of the 
event 
an array containing the observed number of events. This will be present only
if the response variable is a 
expected 
an array containing the expected number of events (or person years if

data 
if the 
offtable 
the number of personyears of exposure in the cohort that was not part of
any cell in the 
tcut 
whether the call included any timedependent cutpoints. 
summary 
a summary of the ratetable matching. This is also useful as an error check. 
call 
an image of the call to the function. 
observations 
the number of observations in the input data set, after any missings were removed. 
na.action 
the 
# Look at progression rates jointly by calendar date and age # temp.yr < tcut(mgus$dxyr, 55:92, labels=as.character(55:91)) temp.age < tcut(mgus$age, 34:101, labels=as.character(34:100)) ptime < ifelse(is.na(mgus$pctime), mgus$futime, mgus$pctime) pstat < ifelse(is.na(mgus$pctime), 0, 1) pfit < pyears(Surv(ptime/365.25, pstat) ~ temp.yr + temp.age + sex, mgus, data.frame=TRUE) # Turn the factor back into numerics for regression tdata < pfit$data tdata$age < as.numeric(as.character(tdata$temp.age)) tdata$year< as.numeric(as.character(tdata$temp.yr)) fit1 < glm(event ~ year + age+ sex +offset(log(pyears)), data=tdata, family=poisson) ## Not run: # fit a gam model gfit.m < gam(y ~ s(age) + s(year) + offset(log(time)), family = poisson, data = tdata) ## End(Not run) # Example #2 Create the hearta data frame: hearta < by(heart, heart$id, function(x)x[x$stop == max(x$stop),]) hearta < do.call("rbind", hearta) # Produce pyears table of death rates on the surgical arm # The first is by age at randomization, the second by current age fit1 < pyears(Surv(stop/365.25, event) ~ cut(age + 48, c(0,50,60,70,100)) + surgery, data = hearta, scale = 1) fit2 < pyears(Surv(stop/365.25, event) ~ tcut(age + 48, c(0,50,60,70,100)) + surgery, data = hearta, scale = 1) fit1$event/fit1$pyears #death rates on the surgery and nonsurg arm fit2$event/fit2$pyears #death rates on the surgery and nonsurg arm