sweepystats package#

Submodules#

sweepystats.linreg module#

class sweepystats.linreg.LinearRegression(X, y, weights=None)#

Bases: object

A class to perform linear regression based on the sweep operation.

Parameters:

X (array-like): Design matrix of shape (n, p)
y (array-like): Response vector of shape (n,)
weights (array-like, optional): Weight vector of shape (n,). If provided, performs weighted least squares. Weights should be non-negative. If None (default), performs ordinary least squares.

R2()#: Computes the R² (coefficient of determination) of fit. For weighted least squares, uses weighted statistics.

coef()#: Fitted coefficient values (beta hat). Only returns the beta for variables that have been swept in.

coef_std()#: Standard deviation of the fitted coefficient values

cov()#: Estimated variance-covariance of beta hat, i.e. Var(b) = sigma2 * inv(X’X)
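To illustrate how these quantities can fall out of the sweep operation, the sketch below builds the augmented Gram matrix [[X'X, X'y], [y'X, y'y]] in plain Python and sweeps its first p diagonal entries. It is not the package's implementation: the helper `sweep_on`, the sign convention (swept block equals minus the inverse), and the n - p denominator for sigma2 are assumptions made for this example.

```python
def sweep_on(A, k):
    """Hypothetical helper: sweep square matrix A in place on index k."""
    d = A[k][k]
    n = len(A)
    for i in range(n):
        for j in range(n):
            if i != k and j != k:
                A[i][j] -= A[i][k] * A[k][j] / d
    for i in range(n):
        if i != k:
            A[i][k] /= d
            A[k][i] /= d
    A[k][k] = -1.0 / d

# Toy data: intercept column plus one covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 2.0, 4.0]
n, p = len(X), len(X[0])

# Augmented Gram matrix [[X'X, X'y], [y'X, y'y]].
Z = [row + [yi] for row, yi in zip(X, y)]
G = [[sum(Z[r][i] * Z[r][j] for r in range(n)) for j in range(p + 1)]
     for i in range(p + 1)]

# Sweep in every variable (the first p diagonal entries).
for k in range(p):
    sweep_on(G, k)

beta = [G[i][p] for i in range(p)]        # last column holds beta hat
rss = G[p][p]                             # corner holds ||y - yhat||^2
sigma2 = rss / (n - p)                    # assumed unbiased estimator
cov = [[-sigma2 * G[i][j] for j in range(p)] for i in range(p)]  # sigma2 * inv(X'X)
coef_std = [cov[i][i] ** 0.5 for i in range(p)]
```

Under this convention the upper-left p-by-p block of the swept matrix is minus inv(X'X), which is why the covariance estimate carries a minus sign.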

exclude_k(k, force=False)#: Exclude the `k`th variable from the regression

f_test(k)#

Tests whether the `k`th variable is significant by performing an F-test. The model must already be fitted.

Returns:

+ f_stat: The F-statistic
+ pval: The associated p-value
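One way such a test can be obtained from sweeps is to compare the residual sum of squares with and without the `k`th variable. The sketch below does this in plain Python by sweeping all variables in and then inverse-sweeping variable k. Whether the package computes its statistic this way is an assumption; `sweep_on` is a hypothetical helper, and the p-value is left out because it needs an F-distribution tail function.

```python
def sweep_on(A, k, inv=False):
    """Hypothetical helper: sweep (or inverse-sweep) A in place on index k."""
    d = A[k][k]
    n = len(A)
    sign = -1.0 if inv else 1.0
    for i in range(n):
        for j in range(n):
            if i != k and j != k:
                A[i][j] -= A[i][k] * A[k][j] / d
    for i in range(n):
        if i != k:
            A[i][k] = sign * A[i][k] / d
            A[k][i] = sign * A[k][i] / d
    A[k][k] = -1.0 / d

# Toy data: intercept column plus one covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 4.0, 4.0]
n, p = len(X), len(X[0])

# Augmented Gram matrix [[X'X, X'y], [y'X, y'y]].
Z = [row + [yi] for row, yi in zip(X, y)]
G = [[sum(Z[r][i] * Z[r][j] for r in range(n)) for j in range(p + 1)]
     for i in range(p + 1)]

for j in range(p):
    sweep_on(G, j)                  # full model: all variables swept in
rss_full = G[p][p]

k = 1                               # variable under test
sweep_on(G, k, inv=True)            # reduced model: variable k swept out
rss_reduced = G[p][p]
sweep_on(G, k)                      # sweep k back in to restore the full fit

# F-statistic with 1 and n - p degrees of freedom.
f_stat = (rss_reduced - rss_full) / (rss_full / (n - p))
```

The p-value would then be the upper tail of an F(1, n - p) distribution at `f_stat`, for instance via `scipy.stats.f.sf(f_stat, 1, n - p)` if SciPy is available.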

fit(verbose=True)#: Perform least squares fitting by sweeping in all variables.

include_k(k, force=False)#: Include the `k`th variable in the regression

is_fitted()#

resid()#: Residual sum of squares, ||y - yhat||^2

sigma2()#: Estimate of sigma squared (the error variance). For weighted least squares, returns the weighted variance estimate.

sweepystats.sweep_matrix module#

class sweepystats.sweep_matrix.SweepMatrix(A, storage=None)#

Bases: object

Thin wrapper over a numpy array. The original array will not be copied if it is a double-precision 2D array stored in column-major (Fortran-style) order.

det(restore=True, verbose=True)#: Computes the determinant by sweeping the entire matrix. If restore=True (default), then the original matrix is untouched.

property dtype#

isposdef(restore=True, verbose=True, tol=1e-12)#: Checks whether the matrix is positive definite by checking if A[k, k] > tol (note: strict inequality) for each k before being swept. If restore=True (default), then the original matrix is untouched.

property ndim#

rank(restore=True, verbose=True, tol=1e-12)#: Computes matrix rank by sweeping the entire matrix. If restore=True (default), then the original matrix is untouched.
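The three methods above all rest on the diagonal pivots A[k, k] observed just before each sweep. The plain-Python sketch below shows one way this can work for a symmetric positive semidefinite matrix: the determinant is the product of the pivots, positive definiteness means every pivot exceeds tol, and the rank is the number of pivots that are not numerically zero. The helper names are hypothetical, and the sketch works on a copy rather than restoring the matrix; how the package handles restore=True is not shown here.

```python
def sweep_pivots(A, tol=1e-12):
    """Sweep a copy of A on every index; return the pivot seen at each step.

    Pivots with |pivot| <= tol are recorded but not swept (treated as zero).
    """
    A = [row[:] for row in A]       # leave the caller's matrix untouched
    n = len(A)
    pivots = []
    for k in range(n):
        d = A[k][k]
        pivots.append(d)
        if abs(d) <= tol:
            continue
        for i in range(n):
            for j in range(n):
                if i != k and j != k:
                    A[i][j] -= A[i][k] * A[k][j] / d
        for i in range(n):
            if i != k:
                A[i][k] /= d
                A[k][i] /= d
        A[k][k] = -1.0 / d
    return pivots

def summarize(A, tol=1e-12):
    """Determinant, positive-definiteness flag, and rank from the pivots."""
    pivots = sweep_pivots(A, tol)
    det = 1.0
    for d in pivots:
        det *= d
    return {
        "det": det,
        "isposdef": all(d > tol for d in pivots),   # strict inequality
        "rank": sum(abs(d) > tol for d in pivots),
    }

full = summarize([[4.0, 2.0], [2.0, 3.0]])       # positive definite
singular = summarize([[1.0, 2.0], [2.0, 4.0]])   # rank-deficient
```

For matrices that are not positive semidefinite, a zero pivot does not imply a zero determinant without row exchanges, so this shortcut should not be read as a general-purpose algorithm.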

property shape#

property size#

sweep(inv=False, verbose=True, symmetrize=True, tol=1e-12)#: Sweeps the entire matrix. If inv=True, the inverse sweep is performed instead. If symmetrize=False, then only the upper triangle is read/swept. A progress bar is displayed unless verbose=False.

sweep_k(k, inv=False, symmetrize=True, tol=1e-12)#

Sweeps on the kth row/column and returns the value of A[k, k] before the sweep.

If inv=True, then the inverse-sweep is performed. If symmetrize = False, then only the upper-triangular matrix is touched. tol is the smallest diagonal element that is treated as numerically 0.
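For readers unfamiliar with the operation, here is a minimal plain-Python sketch of a sweep and its inverse under one common sign convention, in which sweeping every index of an invertible matrix yields minus its inverse. The package's exact convention is not stated above, so treat the signs as an assumption; the helper `sweep_on` is hypothetical and ignores the symmetrize and tol options.

```python
def sweep_on(A, k, inv=False):
    """Sweep (or inverse-sweep) square matrix A in place on index k.

    Returns the pivot A[k][k] as it was before the sweep.
    """
    d = A[k][k]
    if d == 0:
        raise ZeroDivisionError("cannot sweep on a zero pivot")
    n = len(A)
    sign = -1.0 if inv else 1.0
    # Entries outside row k and column k: A[i][j] - A[i][k] * A[k][j] / d.
    for i in range(n):
        for j in range(n):
            if i != k and j != k:
                A[i][j] -= A[i][k] * A[k][j] / d
    # Row k and column k are scaled by 1/d (sign flipped for the inverse sweep).
    for i in range(n):
        if i != k:
            A[i][k] = sign * A[i][k] / d
            A[k][i] = sign * A[k][i] / d
    A[k][k] = -1.0 / d
    return d

A = [[4.0, 2.0], [2.0, 3.0]]
original = [row[:] for row in A]

pivots = [sweep_on(A, k) for k in range(2)]   # A now holds minus the inverse
swept = [row[:] for row in A]

for k in range(2):
    sweep_on(A, k, inv=True)                  # inverse sweeps undo the sweeps
```

In this example the pivots are 4 and 2, whose product is the determinant 8, and the inverse sweeps return A to its original entries up to rounding.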

Module contents#

class sweepystats.ANOVA(df, formula)#

Bases: object

A class to perform (k-way) ANOVA based on the sweep operation.

Parameters:

+ df: A pandas dataframe containing the covariates and outcome.
+ formula: A formula string to define the model, e.g. 'y ~ Group + Factor + Group:Factor'.

f_test(variable)#

Tests whether variable in self.formula is significant by performing an F-test. The model must already be fitted.

Returns:

+ f_stat: The F-statistic
+ pval: The associated p-value

fit(verbose=True)#: Fit the ANOVA model using the sweep operation

sum_sq()#: Computes sum of squared error for all variables that are currently swept in

class sweepystats.LinearRegression(X, y, weights=None)#

Bases: object

A class to perform linear regression based on the sweep operation.

Parameters:

X (array-like): Design matrix of shape (n, p)
y (array-like): Response vector of shape (n,)
weights (array-like, optional): Weight vector of shape (n,). If provided, performs weighted least squares. Weights should be non-negative. If None (default), performs ordinary least squares.

R2()#: Computes the R² (coefficient of determination) of fit. For weighted least squares, uses weighted statistics.

coef()#: Fitted coefficient values (beta hat). Only returns the beta for variables that have been swept in.

coef_std()#: Standard deviation of the fitted coefficient values

cov()#: Estimated variance-covariance of beta hat, i.e. Var(b) = sigma2 * inv(X’X)

exclude_k(k, force=False)#: Exclude the `k`th variable from the regression

f_test(k)#

Tests whether the `k`th variable is significant by performing an F-test. The model must already be fitted.

Returns:

+ f_stat: The F-statistic
+ pval: The associated p-value

fit(verbose=True)#: Perform least squares fitting by sweeping in all variables.

include_k(k, force=False)#: Include the `k`th variable in the regression

is_fitted()#

resid()#: Residual sum of squares, ||y - yhat||^2

sigma2()#: Estimate of sigma squared (the error variance). For weighted least squares, returns the weighted variance estimate.

class sweepystats.Normal(mu, sigma)#

Bases: object

A class that computes the density and conditional distributions of the multivariate Gaussian using the sweep operation.

cond_mean(y, yidx)#: Computes the conditional expectation E(Z | Y = y) where (Y, Z) is assumed to be jointly Gaussian with mean mu and cov sigma. The vector yidx indicates the indices of the observed y.

cond_var(y, yidx)#: Computes the conditional variance Var(Z | Y = y) where (Y, Z) is assumed to be jointly Gaussian with mean mu and cov sigma. The vector yidx indicates the indices of the observed y.
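The link between conditioning and sweeping can be sketched in plain Python: sweeping the covariance matrix on the observed indices leaves the conditional covariance (a Schur complement) in the unobserved block and the regression coefficients of Z on Y in the off-diagonal block. The helper names below are hypothetical, the sign convention is an assumption, and the package's return shapes may differ.

```python
def sweep_on(A, k):
    """Hypothetical helper: sweep square matrix A in place on index k."""
    d = A[k][k]
    n = len(A)
    for i in range(n):
        for j in range(n):
            if i != k and j != k:
                A[i][j] -= A[i][k] * A[k][j] / d
    for i in range(n):
        if i != k:
            A[i][k] /= d
            A[k][i] /= d
    A[k][k] = -1.0 / d

def conditional(mu, sigma, y, yidx):
    """Return (E[Z | Y = y], Var[Z | Y = y]) for the entries not in yidx."""
    S = [row[:] for row in sigma]            # work on a copy
    for k in yidx:
        sweep_on(S, k)
    zidx = [i for i in range(len(mu)) if i not in yidx]
    # Swept Z-by-Y block holds Sigma_ZY * inv(Sigma_YY).
    mean = [mu[i] + sum(S[i][k] * (yk - mu[k]) for k, yk in zip(yidx, y))
            for i in zidx]
    # Swept Z-by-Z block holds Sigma_ZZ - Sigma_ZY * inv(Sigma_YY) * Sigma_YZ.
    var = [[S[i][j] for j in zidx] for i in zidx]
    return mean, var

mu = [1.0, 2.0, 0.0]
sigma = [[4.0, 2.0, 0.0],
         [2.0, 3.0, 1.0],
         [0.0, 1.0, 2.0]]

# Observe the first coordinate at 3.0 and condition the other two on it.
cond_mean, cond_var = conditional(mu, sigma, y=[3.0], yidx=[0])
```

Here the conditional mean is [3.0, 0.0] and the conditional covariance is [[2.0, 1.0], [1.0, 2.0]], matching the closed-form Gaussian conditioning formulas.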

loglikelihood(x, verbose=True)#: Evaluates the loglikelihood of observing X=x.
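A Gaussian loglikelihood needs both log det(sigma) and the quadratic form (x - mu)' inv(sigma) (x - mu), and a single pass of sweeps can supply both. The plain-Python sketch below sweeps the augmented matrix [[sigma, r], [r', 0]] with r = x - mu: the pivots multiply to the determinant and the corner entry ends up as minus the quadratic form. This is an illustration under an assumed sign convention, with a hypothetical helper `sweep_on`, not the package's code.

```python
import math

def sweep_on(A, k):
    """Hypothetical helper: sweep A in place on index k; return the pivot."""
    d = A[k][k]
    n = len(A)
    for i in range(n):
        for j in range(n):
            if i != k and j != k:
                A[i][j] -= A[i][k] * A[k][j] / d
    for i in range(n):
        if i != k:
            A[i][k] /= d
            A[k][i] /= d
    A[k][k] = -1.0 / d
    return d

def gaussian_loglik(x, mu, sigma):
    """Log density of N(mu, sigma) at x; sigma must be positive definite."""
    n = len(mu)
    r = [xi - mi for xi, mi in zip(x, mu)]
    # Augmented matrix [[sigma, r], [r', 0]].
    M = [sigma[i][:] + [r[i]] for i in range(n)] + [r + [0.0]]
    logdet = 0.0
    for k in range(n):
        logdet += math.log(sweep_on(M, k))   # pivots multiply to det(sigma)
    quad = -M[n][n]                          # corner is -(r' inv(sigma) r)
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)

mu = [1.0, 2.0]
sigma = [[4.0, 2.0], [2.0, 3.0]]
loglik = gaussian_loglik([3.0, 2.0], mu, sigma)
```

For this input the determinant is 8 and the quadratic form is 1.5, so the result equals -0.5 * (2 log(2 pi) + log 8 + 1.5).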

class sweepystats.SweepMatrix(A, storage=None)#

Bases: object

Thin wrapper over a numpy array. The original array will not be copied if it is a double-precision 2D array stored in column-major (Fortran-style) order.

det(restore=True, verbose=True)#: Computes the determinant by sweeping the entire matrix. If restore=True (default), then the original matrix is untouched.

property dtype#

isposdef(restore=True, verbose=True, tol=1e-12)#: Checks whether the matrix is positive definite by checking if A[k, k] > tol (note: strict inequality) for each k before being swept. If restore=True (default), then the original matrix is untouched.

property ndim#

rank(restore=True, verbose=True, tol=1e-12)#: Computes matrix rank by sweeping the entire matrix. If restore=True (default), then the original matrix is untouched.

property shape#

property size#

sweep(inv=False, verbose=True, symmetrize=True, tol=1e-12)#: Sweeps the entire matrix. If inv=True, the inverse sweep is performed instead. If symmetrize=False, then only the upper triangle is read/swept. A progress bar is displayed unless verbose=False.

sweep_k(k, inv=False, symmetrize=True, tol=1e-12)#

Sweeps on the kth row/column and returns the value of A[k, k] before the sweep.

If inv=True, then the inverse-sweep is performed. If symmetrize = False, then only the upper-triangular matrix is touched. tol is the smallest diagonal element that is treated as numerically 0.