
Linear discriminant analysis (LDA) using covariance matrices projected via the gips framework to enforce permutation symmetry and improve numerical stability.

Usage

gipslda(x, ...)

# S3 method for class 'formula'
gipslda(formula, data, ..., subset, na.action)

# Default S3 method
gipslda(x, grouping, prior = proportions,
  tol = 1e-4, weighted_avg = FALSE,
  MAP = TRUE, optimizer = NULL, max_iter = NULL, ...)

# S3 method for class 'data.frame'
gipslda(x, ...)

# S3 method for class 'matrix'
gipslda(x, grouping, ..., subset, na.action)

Arguments

x

(required if no formula is given as the principal argument) a matrix or data frame or Matrix containing the explanatory variables.

...

Arguments passed to or from other methods.

formula

A formula of the form groups ~ x1 + x2 + .... The response is the grouping factor and the right-hand side specifies the (non-factor) discriminators.

data

An optional data frame, list or environment from which variables specified in formula are preferentially taken.

grouping

(required if no formula principal argument is given) a factor specifying the class for each observation.

prior

The prior probabilities of class membership. If unspecified, the class proportions for the training set are used.

tol

A tolerance to decide if a matrix is singular; variables whose variance is less than tol^2 are rejected.

subset

An index vector specifying the cases to be used in the training sample. (NOTE: if given, this argument must be named.)

na.action

A function specifying the action for NAs.

MAP

Logical; whether to compute a Maximum A Posteriori gips projection of the covariance matrix.

optimizer

Character; optimization method used by gips (e.g. "BF" or "MH").

max_iter

Maximum number of iterations for the optimizer.

weighted_avg

Logical; if TRUE, a covariance matrix is computed within each class and the class-specific matrices are combined as an average weighted by class proportions. If FALSE (the default), the scatter is computed from all classes at once (pooled covariance).
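The difference between the two weighted_avg settings can be sketched in base R. This is an illustration only, not the package's code: the divisor N - K for the pooled estimate and the weights n_k / N are assumptions, and the unbalanced subset of iris is used only because the two estimates coincide when all classes have the same size.

```r
# Illustration only: base R, not the gipslda implementation.
# Unbalanced subset of iris (50 / 30 / 20 rows per species), chosen so that
# the two estimates below do not coincide.
idx <- c(1:50, 51:80, 101:120)
x <- as.matrix(iris[idx, 1:4])
g <- iris$Species[idx]
n_k <- as.vector(table(g))  # class sizes, in factor-level order
N <- nrow(x)
K <- nlevels(g)

# Pooled: centre every row by its class mean, then form one scatter matrix.
class_means <- apply(x, 2, function(col) tapply(col, g, mean))
centered <- x - class_means[as.character(g), ]
pooled <- crossprod(centered) / (N - K)  # assumed divisor

# Weighted average: one covariance per class, combined with weights n_k / N.
class_covs <- lapply(split(as.data.frame(x), g), cov)
weighted <- Reduce(`+`, Map(function(S, w) w * S, class_covs, n_k / N))

# Entry-wise difference between the two estimates.
round(pooled - weighted, 4)
```

With equal class sizes the two formulas above give the same matrix, so the choice matters mainly for unbalanced training sets.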

Value

An object of class "gipslda" containing:

  • prior: prior class probabilities

  • counts: number of observations per class

  • means: group means

  • scaling: linear discriminant coefficients

  • svd: singular values of the between-class scatter

  • N: number of observations

  • optimization_info: information about the gips optimization

  • call: matched call
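The components prior, means, scaling and svd have the same names as in objects returned by MASS::lda(). The sketch below uses lda() as a stand-in to show how such components relate to the printed "Proportion of trace" and to discriminant scores. That a "gipslda" object can be used in exactly the same way is an assumption, not something this page confirms.

```r
library(MASS)  # lda() is used here as a stand-in for gipslda()

fit <- lda(Species ~ ., data = iris)

# "Proportion of trace" as printed: squared singular values, normalised.
prop_trace <- fit$svd^2 / sum(fit$svd^2)
round(prop_trace, 4)

# Discriminant scores: centre the data, then multiply by the coefficients.
x <- as.matrix(iris[, 1:4])
center <- colSums(fit$prior * fit$means)  # prior-weighted mean of group means
scores <- scale(x, center = center, scale = FALSE) %*% fit$scaling
head(scores, 3)
```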

Details

This function is a minor modification of lda() from MASS: the classical sample covariance estimators are replaced with projected covariance matrices obtained using project_covs().

Unlike classical LDA, the within-class covariance matrix is first projected onto a permutation-invariant structure using the gips framework. This can stabilize covariance estimation in high dimensions or when symmetry assumptions are justified.
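The idea of projecting a covariance matrix onto a permutation-invariant structure can be sketched in base R. The helper project_onto_perm() below is hypothetical and written for illustration only; it is not project_covs() or any gips function. It averages the matrix over all powers of a single permutation, which is one standard way to obtain a matrix that is invariant under that permutation.

```r
# Illustration only: average S over the cyclic group generated by `perm`.
# `perm` is a permutation of 1..p in one-line notation.
project_onto_perm <- function(S, perm) {
  id <- seq_len(ncol(S))
  acc <- matrix(0, nrow(S), ncol(S), dimnames = dimnames(S))
  current <- id
  m <- 0
  repeat {
    acc <- acc + S[current, current]  # S with rows and columns permuted
    m <- m + 1
    current <- perm[current]          # next power of the permutation
    if (all(current == id)) break     # back at the identity: all powers used
  }
  acc / m
}

S <- cov(iris[, 1:3])
perm <- c(1, 3, 2)  # the permutation (23): swaps variables 2 and 3
S_proj <- project_onto_perm(S, perm)
S_proj
```

In the projected matrix, variables 2 and 3 share one variance and have the same covariance with variable 1, so fewer free parameters have to be estimated.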

The choice of optimizer and MAP estimation affects both the covariance estimate and the resulting discriminant directions.

See Chojecki et al. (2025) for theoretical background.

Note

This function is inspired by lda but is not a drop-in replacement. The covariance estimator, optimization procedure, and returned object differ substantially.

References

Chojecki, A., et al. (2025). Learning Permutation Symmetry of a Gaussian Vector with gips in R. Journal of Statistical Software, 112(7), 1–38. doi:10.18637/jss.v112.i07

See also

Examples

Iris <- data.frame(rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3]),
  Sp = rep(c("s", "c", "v"), rep(50, 3))
)
train <- sample(1:150, 75)
z <- gipslda(Sp ~ ., Iris, prior = c(1, 1, 1) / 3, subset = train)
predict(z, Iris[-train, ])$class
#>  [1] s s s s s s s s s s s s s s s s s s s s s s s s s c c c c c c c c c c c c c
#> [39] c c c v c c c c c c v v v v v v v v v v v v v v v c v v v v v v v v v v v
#> Levels: c s v
(z1 <- update(z, . ~ . - Petal.W.))
#> Call:
#> gipslda(Sp ~ Sepal.L. + Sepal.W. + Petal.L., data = Iris, prior = c(1, 
#>     1, 1)/3, subset = train)
#> 
#> Prior probabilities of groups:
#>         c         s         v 
#> 0.3333333 0.3333333 0.3333333 
#> 
#> Group means:
#>   Sepal.L. Sepal.W. Petal.L.
#> c 5.885185 2.766667 4.166667
#> s 5.020000 3.476000 1.452000
#> v 6.630435 2.952174 5.508696
#> 
#> Coefficients of linear discriminants:
#>                 LD1       LD2
#> Sepal.L.  0.5096716 -1.145938
#> Sepal.W.  1.0822205  3.458572
#> Petal.L. -2.5577491  0.969699
#> 
#> Proportion of trace:
#>    LD1    LD2 
#> 0.9884 0.0116 
#> 
#> Permutations with their estimated probabilities:
#> [1] (23)