Misspecified Poisson regression models for large-scale registry data: inference for 'large n and small p'

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Misspecified Poisson regression models for large-scale registry data: inference for 'large n and small p'. / Grøn, Randi; Gerds, Thomas A.; Andersen, Per K.

In: Statistics in Medicine, Vol. 35, No. 7, 30.03.2016, p. 1117-1129.

Harvard

Grøn, R, Gerds, TA & Andersen, PK 2016, 'Misspecified Poisson regression models for large-scale registry data: inference for "large n and small p"', Statistics in Medicine, vol. 35, no. 7, pp. 1117-1129. https://doi.org/10.1002/sim.6755

APA

Grøn, R., Gerds, T. A., & Andersen, P. K. (2016). Misspecified Poisson regression models for large-scale registry data: inference for 'large n and small p'. Statistics in Medicine, 35(7), 1117-1129. https://doi.org/10.1002/sim.6755

Vancouver

Grøn R, Gerds TA, Andersen PK. Misspecified Poisson regression models for large-scale registry data: inference for 'large n and small p'. Statistics in Medicine. 2016 Mar 30;35(7):1117-1129. https://doi.org/10.1002/sim.6755

Author

Grøn, Randi; Gerds, Thomas A.; Andersen, Per K. / Misspecified Poisson regression models for large-scale registry data: inference for 'large n and small p'. In: Statistics in Medicine. 2016; Vol. 35, No. 7. pp. 1117-1129.

Bibtex

@article{fce7e4685cf540be8bdc8d07af4b02dc,
title = "Misspecified {P}oisson regression models for large-scale registry data: inference for 'large n and small p'",
abstract = "Poisson regression is an important tool in register-based epidemiology where it is used to study the association between exposure variables and event rates. In this paper, we will discuss the situation with 'large n and small p', where n is the sample size and p is the number of available covariates. Specifically, we are concerned with modeling options when there are time-varying covariates that can have time-varying effects. One problem is that tests of the proportional hazards assumption, of no interactions between exposure and other observed variables, or of other modeling assumptions have large power due to the large sample size and will often indicate statistical significance even for numerically small deviations that are unimportant for the subject matter. Another problem is that information on important confounders may be unavailable. In practice, this situation may lead to simple working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods are illustrated using data from the Danish national registries investigating the diabetes incidence for individuals treated with antipsychotics compared with the general unexposed population.",
author = "Gr{\o}n, Randi and Gerds, {Thomas A.} and Andersen, {Per K.}",
note = "Copyright {\textcopyright} 2015 John Wiley \& Sons, Ltd.",
year = "2016",
month = mar,
day = "30",
doi = "10.1002/sim.6755",
language = "English",
volume = "35",
pages = "1117--1129",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley \& Sons Ltd",
number = "7",

}

RIS

TY - JOUR

T1 - Misspecified Poisson regression models for large-scale registry data

T2 - inference for 'large n and small p'

AU - Grøn, Randi

AU - Gerds, Thomas A.

AU - Andersen, Per K.

N1 - Copyright © 2015 John Wiley & Sons, Ltd.

PY - 2016/3/30

Y1 - 2016/3/30

AB - Poisson regression is an important tool in register-based epidemiology where it is used to study the association between exposure variables and event rates. In this paper, we will discuss the situation with 'large n and small p', where n is the sample size and p is the number of available covariates. Specifically, we are concerned with modeling options when there are time-varying covariates that can have time-varying effects. One problem is that tests of the proportional hazards assumption, of no interactions between exposure and other observed variables, or of other modeling assumptions have large power due to the large sample size and will often indicate statistical significance even for numerically small deviations that are unimportant for the subject matter. Another problem is that information on important confounders may be unavailable. In practice, this situation may lead to simple working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods are illustrated using data from the Danish national registries investigating the diabetes incidence for individuals treated with antipsychotics compared with the general unexposed population.

U2 - 10.1002/sim.6755

DO - 10.1002/sim.6755

M3 - Journal article

C2 - 26423319

VL - 35

SP - 1117

EP - 1129

JO - Statistics in Medicine

JF - Statistics in Medicine

SN - 0277-6715

IS - 7

ER -

ID: 157491044