Linear and Generalized Linear Models
Simulation Studies
Linear Models
Generalized Linear Models
Nonparametric Models
Generalized Linear Models
A distribution belongs to the exponential family if its probability density (or mass) function can be written in the following form:
\[ f(y; \theta,\phi) = a(y,\phi)\exp\left\{\frac{y\theta-\kappa(\theta)}{\phi}\right\} \]
\(\theta\): the canonical parameter (a function of the distribution's other parameters)
\(\kappa(\theta)\): a known cumulant function
\(\phi>0\): the dispersion parameter
\(a(y,\phi)\): a normalizing function
The canonical parameter \(\theta\) expresses the relationship between the distribution and its mean \(E(Y)=\mu\); in particular, \(\mu = \kappa'(\theta)\).
Normal:
\[ f(y;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(y-\mu)^2}{2\sigma^2}} \]
Binomial:
\[ f(y;n,p)=\binom{n}{y} p^y(1-p)^{n-y} \]
Poisson:
\[ f(y;\lambda) = \frac{e^{-\lambda}\lambda^y}{y!} \]
Negative Binomial:
\[ f(y;\mu, \theta) = \binom{y+\theta-1}{y} \left(\frac{\mu}{\mu+\theta}\right)^y \left(\frac{\theta}{\mu+\theta}\right)^\theta \]
Random Variable | Canonical Parameter |
---|---|
Normal | \(\mu\) |
Binomial | \(\log\left(\frac{\mu}{1-\mu}\right)\) |
Negative Binomial | \(\log\left(\frac{\mu}{\mu+k}\right)\) |
Poisson | \(\log(\mu)\) |
Gamma | \(-\frac{1}{\mu}\) |
Inverse Gaussian | \(-\frac{1}{2\mu^2}\) |
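As an example of where the entries in this table come from, the Poisson density can be rearranged into the exponential family form:
\[ f(y;\lambda) = \frac{e^{-\lambda}\lambda^y}{y!} = \frac{1}{y!}\exp\left\{y\log\lambda - \lambda\right\} \]
so \(\theta=\log\lambda=\log\mu\), \(\kappa(\theta)=e^\theta\), \(\phi=1\), and \(a(y,\phi)=1/y!\).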
A generalized linear model (GLM) is used to model the association between an outcome variable (continuous, binary, count, and so on) and a set of predictors. We estimate a set of regression coefficients \(\boldsymbol \beta\) that describe how each predictor is related to the expected value of the outcome.
A GLM is composed of a random component and a systematic component.
The random component is the distribution of the outcome variable (a member of the exponential family), which describes its variation.
The systematic component is the linear predictor, which is related to the expected value of \(Y\) through the link function:
\[ g(\mu)=\eta=\boldsymbol X_i^\mathrm T \boldsymbol \beta \]
\(\boldsymbol\beta\): regression coefficients
\(\boldsymbol X_i=(1, X_{i1}, \ldots, X_{ip})^\mathrm T\): design vector
\(\eta\): linear predictor
\(\mu=E(Y)\)
\(g(\cdot)\): link function
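In R, the link \(g(\cdot)\) is chosen through the family argument of glm(); each family object bundles the link and its inverse. A quick illustration:
# family objects store the link g() and the inverse link g^{-1}()
binomial()$linkfun(0.5)          # logit(0.5) = 0
binomial()$linkinv(0)            # inverse logit of 0 = 0.5
poisson()$linkfun(10)            # log(10)
Gamma(link = "log")$linkfun(2)   # log(2); the canonical Gamma link is "inverse"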
Logistic Regression is used when the outcome is binary:
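\[ \log\left(\frac{\mu_i}{1-\mu_i}\right) = \boldsymbol X_i^\mathrm T \boldsymbol\beta, \qquad Y_i \sim \mathrm{Bernoulli}(\mu_i) \]
A minimal simulation sketch, with illustrative coefficient values and sample size:
set.seed(10)
x <- rnorm(500)
p <- plogis(-1 + 0.8 * x)               # inverse logit of the linear predictor
y <- rbinom(500, size = 1, prob = p)    # binary outcome
fit_logit <- glm(y ~ x, family = binomial(link = "logit"))
summary(fit_logit)$coefficients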
Poisson Regression is used when the outcome is count data:
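\[ \log(\mu_i) = \boldsymbol X_i^\mathrm T \boldsymbol\beta, \qquad Y_i \sim \mathrm{Poisson}(\mu_i) \]
A matching simulation sketch, again with illustrative values:
set.seed(11)
x <- rnorm(500)
mu <- exp(0.5 + 0.3 * x)                # log link: exponentiate the linear predictor
y <- rpois(500, lambda = mu)
fit_pois <- glm(y ~ x, family = poisson(link = "log"))
summary(fit_pois)$coefficients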
Gamma Regression is used when the outcome is positive and continuous (often right-skewed):
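The log link is a common choice here, even though the canonical link for the Gamma is \(-1/\mu\). A simulation sketch with illustrative values:
set.seed(12)
x <- rnorm(500)
mu <- exp(1 + 0.4 * x)                      # positive mean via a log link
y <- rgamma(500, shape = 2, rate = 2 / mu)  # Gamma draws with mean mu (shape / rate)
fit_gamma <- glm(y ~ x, family = Gamma(link = "log"))
summary(fit_gamma)$coefficients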
Negative Binomial Regression is used for overdispersed count data, where the variance is larger than the mean (so a Poisson model would understate the variability):
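\[ \log(\mu_i) = \boldsymbol X_i^\mathrm T \boldsymbol\beta, \qquad \mathrm{Var}(Y_i) = \mu_i + \frac{\mu_i^2}{\theta} \]
A simulation sketch using glm.nb() from the MASS package (parameter values are illustrative):
library(MASS)                            # glm.nb()
set.seed(13)
x <- rnorm(500)
mu <- exp(0.5 + 0.3 * x)
y <- rnbinom(500, size = 1.5, mu = mu)   # overdispersed counts: Var = mu + mu^2 / size
fit_nb <- glm.nb(y ~ x)
summary(fit_nb)$coefficients
fit_nb$theta                             # estimated dispersion parameter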
Nonparametric Models
\[ Y = f(X) + \varepsilon \]
\[ \hat Y = \hat f(X) \]
# assumed packages (the original chunk did not show its library() calls)
library(tibble)
library(ggplot2)
library(rms)    # rcs(): restricted cubic spline basis
library(mgcv)   # gam()

set.seed(1)  # reproducible simulation

# simulate two predictors with nonlinear effects on y
x1 <- rnorm(1000, 2)
x2 <- rnorm(1000, -4)
y <- sinpi(x1 / 2) + cospi(x2 / 2) + rnorm(1000, sd = 0.5)
df <- tibble(x1, x2, y)

# additive model: restricted cubic spline basis (10 knots) for each predictor
xgam <- gam(y ~ rcs(x1, 10) + rcs(x2, 10), data = df)
xgam$coefficients

# fitted partial effect of each predictor (its spline basis times its coefficients);
# partial effects are identified only up to an additive constant, so the red
# curves below may be vertically shifted relative to the true functions
df1 <- tibble(x = x1,
              y = sinpi(x1 / 2),
              pred = drop(rcs(x1, 10) %*% xgam$coefficients[2:10]))
df2 <- tibble(x = x2,
              y = cospi(x2 / 2),
              pred = drop(rcs(x2, 10) %*% xgam$coefficients[11:19]))

# true function (points) versus fitted partial effect (red line)
df1 |> ggplot(aes(x, y)) +
  geom_point() +
  geom_line(aes(x, pred), col = "red")
df2 |> ggplot(aes(x, y)) +
  geom_point() +
  geom_line(aes(x, pred), col = "red")

plot(xgam)  # built-in term plots (mgcv draws smooth terms; here all terms enter parametrically)
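As an alternative sketch (using the mgcv package loaded above), gam() can also estimate the smooths directly with penalized splines instead of a hand-built rcs() basis:
fit_s <- gam(y ~ s(x1) + s(x2), data = df)   # penalized regression splines
summary(fit_s)
plot(fit_s, pages = 1)                       # estimated smooth for each predictor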