sweepystats package

Submodules

sweepystats.linreg module

class sweepystats.linreg.LinearRegression(X, y, weights=None)

Bases: object

A class to perform linear regression based on the sweep operation.

Parameters:
X : array-like

Design matrix of shape (n, p)

y : array-like

Response vector of shape (n,)

weights : array-like, optional

Weight vector of shape (n,). If provided, performs weighted least squares. Weights should be non-negative. If None (default), performs ordinary least squares.

R2()

Computes the R² (coefficient of determination) of the fit. For weighted least squares, weighted statistics are used.

coef()

Fitted coefficient values (beta hat). Only the coefficients of variables that have been swept in are returned.

coef_std()

Standard deviations (standard errors) of the fitted coefficient values.

cov()

Estimated variance-covariance matrix of beta hat, i.e. Var(b) = sigma2 * inv(X’X)
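To make the formula Var(b) = sigma2 * inv(X’X) concrete, here is a minimal NumPy sketch that reads the covariance off a swept Gram matrix. It does not call sweepystats: the helper `sweep_k`, the sign convention (sweeping every diagonal of A yields -inv(A)), and the n - p denominator for sigma2 are all assumptions made for this illustration.

```python
import numpy as np

def sweep_k(A, k, inv=False):
    """Sweep the symmetric float matrix A in place on diagonal k (assumed convention)."""
    pivot = A[k, k]
    row = A[k, :].copy()
    A -= np.outer(row, row) / pivot              # A[i, j] -= A[i, k] * A[k, j] / pivot
    A[k, :] = A[:, k] = (-row if inv else row) / pivot
    A[k, k] = -1.0 / pivot
    return pivot

rng = np.random.default_rng(1)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=n)

# Augmented Gram matrix [[X'X, X'y], [y'X, y'y]]
Xy = np.column_stack([X, y])
A = Xy.T @ Xy
for k in range(p):
    sweep_k(A, k)

rss = A[p, p]                    # residual sum of squares
sigma2 = rss / (n - p)           # assumption: unbiased variance estimate
cov = sigma2 * (-A[:p, :p])      # top-left block holds -inv(X'X)
coef_std = np.sqrt(np.diag(cov))
```

Under these assumptions the standard errors are just the square roots of the diagonal of `cov`, so no separate matrix inversion is needed once the variables are swept in.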

exclude_k(k, force=False)

Exclude the `k`th variable from the regression.

f_test(k)

Tests whether the `k`th variable is significant by performing an F-test. The model must already be fitted.

Returns:
+ f_stat: The F-statistic
+ pval: The associated p-value
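One way such an F-test can be carried out with sweeps is to compare the residual sum of squares with and without variable k. The sketch below is an illustration under assumptions, not the package's implementation: `sweep_k` is a hypothetical helper, the sign convention is assumed, and the test is taken to be a single-degree-of-freedom test with n - p denominator degrees of freedom.

```python
import numpy as np

def sweep_k(A, k, inv=False):
    """Sweep the symmetric float matrix A in place on diagonal k (assumed convention)."""
    pivot = A[k, k]
    row = A[k, :].copy()
    A -= np.outer(row, row) / pivot
    A[k, :] = A[:, k] = (-row if inv else row) / pivot
    A[k, k] = -1.0 / pivot
    return pivot

rng = np.random.default_rng(2)
n, p = 60, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.3, 0.0]) + rng.normal(size=n)

Xy = np.column_stack([X, y])
A = Xy.T @ Xy
for j in range(p):
    sweep_k(A, j)                 # full model: all variables swept in
rss_full = A[p, p]

k = 1                             # variable under test
sweep_k(A, k, inv=True)           # inverse sweep removes variable k
rss_reduced = A[p, p]
sweep_k(A, k)                     # sweep it back in to restore the full model

# F statistic with (1, n - p) degrees of freedom
f_stat = (rss_reduced - rss_full) / (rss_full / (n - p))
# A p-value would be the upper tail of F(1, n - p),
# e.g. scipy.stats.f.sf(f_stat, 1, n - p) if SciPy is available.
```

For a single variable this statistic equals the square of the usual t statistic, which gives a convenient cross-check.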

fit(verbose=True)

Perform least squares fitting by sweeping in all variables.
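The idea of fitting by sweeping in all variables can be sketched in plain NumPy. The code below does not use sweepystats; `sweep_k` and `sweep_lstsq` are hypothetical names, and both the sign convention and the way weights enter the Gram matrix are assumptions made for this example.

```python
import numpy as np

def sweep_k(A, k, inv=False):
    """Sweep the symmetric float matrix A in place on diagonal k (assumed convention)."""
    pivot = A[k, k]
    row = A[k, :].copy()
    A -= np.outer(row, row) / pivot
    A[k, :] = A[:, k] = (-row if inv else row) / pivot
    A[k, k] = -1.0 / pivot
    return pivot

def sweep_lstsq(X, y, weights=None):
    """Return (beta_hat, rss) by sweeping the (weighted) augmented Gram matrix."""
    n, p = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    Xy = np.column_stack([X, y])
    A = Xy.T @ (w[:, None] * Xy)          # [[X'WX, X'Wy], [y'WX, y'Wy]]
    for k in range(p):
        sweep_k(A, k)                     # sweep in variable k
    return A[:p, p].copy(), A[p, p]       # coefficients, residual sum of squares

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)
w = rng.uniform(0.5, 2.0, size=50)

beta_ols, rss_ols = sweep_lstsq(X, y)
beta_wls, rss_wls = sweep_lstsq(X, y, weights=w)
```

After the first p diagonals are swept, the last column of the matrix holds the coefficients and the bottom-right entry holds the (weighted) residual sum of squares.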

include_k(k, force=False)

Include the `k`th variable in the regression.

is_fitted()

resid()

Residual sum of squares, ||y - yhat||^2

sigma2()

Estimate of the error variance sigma squared. For weighted least squares, returns the weighted variance estimate.

sweepystats.sweep_matrix module

class sweepystats.sweep_matrix.SweepMatrix(A, storage=None)

Bases: object

Thin wrapper over a numpy array. The original array will not be copied if it is a double-precision 2D array stored in column-major (Fortran-style) order.

det(restore=True, verbose=True)

Computes the determinant by sweeping the entire matrix. If restore=True (default), then the original matrix is untouched.
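A determinant can be obtained from sweeps because the pivots A[k, k], read just before each sweep, multiply to det(A). The sketch below illustrates this in NumPy, and also restores the matrix with inverse sweeps. It is an illustration only: `sweep_k` and `sweep_det` are hypothetical helpers, the sign convention is assumed, and every pivot is assumed to be nonzero.

```python
import numpy as np

def sweep_k(A, k, inv=False):
    """Sweep the symmetric float matrix A in place on diagonal k (assumed convention)."""
    pivot = A[k, k]
    row = A[k, :].copy()
    A -= np.outer(row, row) / pivot
    A[k, :] = A[:, k] = (-row if inv else row) / pivot
    A[k, k] = -1.0 / pivot
    return pivot

def sweep_det(A, restore=True):
    """Determinant as the product of pivots; optionally undo the sweeps afterwards."""
    n = A.shape[0]
    det = 1.0
    for k in range(n):
        det *= sweep_k(A, k)              # pivot read before the sweep
    if restore:
        for k in range(n):
            sweep_k(A, k, inv=True)       # inverse sweeps bring A back (up to rounding)
    return det

A = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])
A_before = A.copy()
det_A = sweep_det(A)
```

In this sketch restoring costs a second pass of inverse sweeps and is exact only up to floating-point rounding; whether the package restores this way or by other means is not stated in the documentation above.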

property dtype

isposdef(restore=True, verbose=True, tol=1e-12)

Checks whether the matrix is positive definite by checking that A[k, k] > tol (note: strict inequality) for each k immediately before it is swept. If restore=True (default), then the original matrix is untouched.
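The pivot check described above can be sketched as follows. This is a NumPy illustration working on a copy, not the package's code; `sweep_k` and `isposdef` here are hypothetical stand-ins and the sign convention is assumed.

```python
import numpy as np

def sweep_k(A, k, inv=False):
    """Sweep the symmetric float matrix A in place on diagonal k (assumed convention)."""
    pivot = A[k, k]
    row = A[k, :].copy()
    A -= np.outer(row, row) / pivot
    A[k, :] = A[:, k] = (-row if inv else row) / pivot
    A[k, k] = -1.0 / pivot
    return pivot

def isposdef(A, tol=1e-12):
    """True if every pivot exceeds tol just before it is swept."""
    S = np.array(A, dtype=float)          # work on a copy; the input is untouched
    for k in range(S.shape[0]):
        if S[k, k] <= tol:                # strict inequality fails: stop early
            return False
        sweep_k(S, k)
    return True

is_pd = isposdef(np.array([[4.0, 2.0], [2.0, 3.0]]))
is_indefinite_pd = isposdef(np.array([[1.0, 2.0], [2.0, 1.0]]))
```

The second matrix has positive diagonal entries but fails at the second pivot, which becomes 1 - 4 = -3 after the first sweep.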

property ndim

rank(restore=True, verbose=True, tol=1e-12)

Computes matrix rank by sweeping the entire matrix. If restore=True (default), then the original matrix is untouched.
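A rough sketch of rank by sweeping: count the diagonals whose pivot is not numerically zero, skipping the rest. The version below assumes a positive semidefinite input such as a Gram matrix X'X, where this count equals the rank; `sweep_k` and `sweep_rank` are hypothetical helpers and may differ from what the package does for general matrices.

```python
import numpy as np

def sweep_k(A, k, inv=False):
    """Sweep the symmetric float matrix A in place on diagonal k (assumed convention)."""
    pivot = A[k, k]
    row = A[k, :].copy()
    A -= np.outer(row, row) / pivot
    A[k, :] = A[:, k] = (-row if inv else row) / pivot
    A[k, k] = -1.0 / pivot
    return pivot

def sweep_rank(A, tol=1e-12):
    """Number of diagonals that can be swept (assumes A is positive semidefinite)."""
    S = np.array(A, dtype=float)          # copy, so the input is left as-is
    rank = 0
    for k in range(S.shape[0]):
        if abs(S[k, k]) > tol:            # pivot is usable: sweep and count it
            sweep_k(S, k)
            rank += 1
    return rank

# Third column is the sum of the first two, so X'X has rank 2
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
G = X.T @ X
rank_G = sweep_rank(G)
```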

property shape

property size

sweep(inv=False, verbose=True, symmetrize=True, tol=1e-12)

Sweeps the entire matrix. If inv=True, the inverse sweep is performed on every row/column. If symmetrize=False, then only the upper triangle is read/swept. A progress bar is displayed unless verbose=False.

sweep_k(k, inv=False, symmetrize=True, tol=1e-12)

Sweeps on the kth row/column and returns the value of A[k, k] before it is swept.

If inv=True, then the inverse sweep is performed. If symmetrize=False, then only the upper triangle of the matrix is touched. tol is the threshold below which a diagonal element is treated as numerically 0.
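For readers unfamiliar with the operation, here is a minimal NumPy sketch of a sweep and its inverse on a dense symmetric matrix. It is written under one common sign convention (sweeping every diagonal yields -inv(A)), which is an assumption about the package rather than something stated above; the function below is a stand-in, and it ignores the `symmetrize` and `tol` options.

```python
import numpy as np

def sweep_k(A, k, inv=False):
    """Sweep the symmetric float matrix A in place on diagonal k; return the old A[k, k]."""
    pivot = A[k, k]
    row = A[k, :].copy()                         # kth row (= kth column) before the sweep
    A -= np.outer(row, row) / pivot              # A[i, j] -= A[i, k] * A[k, j] / pivot
    A[k, :] = A[:, k] = (-row if inv else row) / pivot   # sign flips for the inverse sweep
    A[k, k] = -1.0 / pivot
    return pivot

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
A_original = A.copy()

pivots = [sweep_k(A, k) for k in range(2)]       # sweep every diagonal
swept = A.copy()                                 # equals -inv(A_original) under this convention

for k in range(2):
    sweep_k(A, k, inv=True)                      # inverse sweeps undo the sweeps
```

Sweeps on different diagonals commute, so the order of the inverse sweeps does not matter here.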

Module contents

class sweepystats.ANOVA(df, formula)

Bases: object

A class to perform (k-way) ANOVA based on the sweep operation.

Parameters:
+ df: A pandas dataframe containing the covariates and outcome.
+ formula: A formula string to define the model, e.g. ‘y ~ Group + Factor + Group:Factor’.

f_test(variable)

Tests whether `variable`, a term in self.formula, is significant by performing an F-test. The model must already be fitted.

Returns:
+ f_stat: The F-statistic
+ pval: The associated p-value
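To illustrate how an ANOVA F statistic can come out of sweeps, the sketch below handles the one-way case with a hand-built dummy-coded design matrix. It does not use sweepystats, pandas, or a formula parser; the treatment coding, the helper `sweep_k`, and the toy data are all assumptions made for this example.

```python
import numpy as np

def sweep_k(A, k, inv=False):
    """Sweep the symmetric float matrix A in place on diagonal k (assumed convention)."""
    pivot = A[k, k]
    row = A[k, :].copy()
    A -= np.outer(row, row) / pivot
    A[k, :] = A[:, k] = (-row if inv else row) / pivot
    A[k, k] = -1.0 / pivot
    return pivot

# Toy data: three groups of three observations each
groups = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y = np.array([1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
n = y.size
levels = np.unique(groups)
g = levels.size

# Intercept plus one indicator column per non-reference level
dummies = [(groups == lv).astype(float) for lv in levels[1:]]
X = np.column_stack([np.ones(n)] + dummies)
p = X.shape[1]

Xy = np.column_stack([X, y])
A = Xy.T @ Xy
sweep_k(A, 0)                     # intercept only: bottom-right is the total sum of squares
ss_total = A[p, p]
for k in range(1, p):
    sweep_k(A, k)                 # sweep in the group indicators
ss_within = A[p, p]               # residual (within-group) sum of squares

ss_between = ss_total - ss_within
f_stat = (ss_between / (g - 1)) / (ss_within / (n - g))
```

The drop in the bottom-right entry when a block of columns is swept in is the sum of squares attributed to that term, which is what makes the sweep convenient for testing terms one at a time.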

fit(verbose=True)

Fits the ANOVA model using the sweep operation.

sum_sq()

Computes the sum of squared errors for all variables that are currently swept in.

class sweepystats.LinearRegression(X, y, weights=None)

Bases: object

A class to perform linear regression based on the sweep operation.

Parameters:
X : array-like

Design matrix of shape (n, p)

y : array-like

Response vector of shape (n,)

weights : array-like, optional

Weight vector of shape (n,). If provided, performs weighted least squares. Weights should be non-negative. If None (default), performs ordinary least squares.

R2()

Computes the R² (coefficient of determination) of the fit. For weighted least squares, weighted statistics are used.

coef()

Fitted coefficient values (beta hat). Only the coefficients of variables that have been swept in are returned.

coef_std()

Standard deviations (standard errors) of the fitted coefficient values.

cov()

Estimated variance-covariance matrix of beta hat, i.e. Var(b) = sigma2 * inv(X’X)

exclude_k(k, force=False)

Exclude the `k`th variable from the regression.

f_test(k)

Tests whether the `k`th variable is significant by performing an F-test. The model must already be fitted.

Returns:
+ f_stat: The F-statistic
+ pval: The associated p-value

fit(verbose=True)

Perform least squares fitting by sweeping in all variables.

include_k(k, force=False)

Include the `k`th variable in the regression.

is_fitted()

resid()

Residual sum of squares, ||y - yhat||^2

sigma2()

Estimate of the error variance sigma squared. For weighted least squares, returns the weighted variance estimate.

class sweepystats.Normal(mu, sigma)

Bases: object

A class that computes the density and conditional distributions of the multivariate Gaussian using the sweep operation.

cond_mean(y, yidx)

Computes the conditional expectation E(Z | Y = y), where (Y, Z) is assumed to be jointly Gaussian with mean mu and covariance sigma. The vector yidx gives the indices of the observed y.

cond_var(y, yidx)

Computes the conditional variance Var(Z | Y = y), where (Y, Z) is assumed to be jointly Gaussian with mean mu and covariance sigma. The vector yidx gives the indices of the observed y.
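Both conditional quantities fall out of sweeping the covariance matrix on the observed indices. The NumPy sketch below illustrates this under an assumed sign convention; it does not call sweepystats, and `sweep_k`, the example numbers, and the variable names are hypothetical.

```python
import numpy as np

def sweep_k(A, k, inv=False):
    """Sweep the symmetric float matrix A in place on diagonal k (assumed convention)."""
    pivot = A[k, k]
    row = A[k, :].copy()
    A -= np.outer(row, row) / pivot
    A[k, :] = A[:, k] = (-row if inv else row) / pivot
    A[k, k] = -1.0 / pivot
    return pivot

mu = np.array([0.0, 1.0, -1.0])
sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
yidx = np.array([0, 2])                              # indices of the observed block Y
y = np.array([0.4, -0.2])                            # observed values
zidx = np.setdiff1d(np.arange(mu.size), yidx)        # remaining indices form Z

S = sigma.copy()
for k in yidx:
    sweep_k(S, k)                                    # sweep on the observed indices only

# After sweeping: S[Z, Y] = Sigma_ZY inv(Sigma_YY), S[Z, Z] = Schur complement
cond_mean = mu[zidx] + S[np.ix_(zidx, yidx)] @ (y - mu[yidx])
cond_var = S[np.ix_(zidx, zidx)]
```

In this sketch the conditional variance does not depend on the observed values y, only on which indices are observed, as expected for a Gaussian.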

loglikelihood(x, verbose=True)

Evaluates the log-likelihood of observing X = x.
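A Gaussian log-likelihood needs a log-determinant and a quadratic form, and one pass of sweeps can supply both. The following NumPy sketch shows one way to do that; the bordered-matrix layout, the helper `sweep_k`, and the sign convention are assumptions for illustration, not a description of the package internals.

```python
import numpy as np

def sweep_k(A, k, inv=False):
    """Sweep the symmetric float matrix A in place on diagonal k (assumed convention)."""
    pivot = A[k, k]
    row = A[k, :].copy()
    A -= np.outer(row, row) / pivot
    A[k, :] = A[:, k] = (-row if inv else row) / pivot
    A[k, k] = -1.0 / pivot
    return pivot

mu = np.array([0.0, 1.0, -1.0])
sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
x = np.array([0.3, 0.8, -1.4])

p = mu.size
d = x - mu
# Bordered matrix [[sigma, d], [d', 0]]
B = np.zeros((p + 1, p + 1))
B[:p, :p] = sigma
B[:p, p] = d
B[p, :p] = d

logdet = 0.0
for k in range(p):
    logdet += np.log(sweep_k(B, k))      # pivots multiply to det(sigma)
quad = -B[p, p]                          # bottom-right is now -d' inv(sigma) d

loglik = -0.5 * (p * np.log(2 * np.pi) + logdet + quad)
```

The logarithm of each pivot is well defined here because the pivots of a positive definite matrix are all positive.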

class sweepystats.SweepMatrix(A, storage=None)

Bases: object

Thin wrapper over a numpy array. The original array will not be copied if it is a double-precision 2D array stored in column-major (Fortran-style) order.

det(restore=True, verbose=True)

Computes the determinant by sweeping the entire matrix. If restore=True (default), then the original matrix is untouched.

property dtype

isposdef(restore=True, verbose=True, tol=1e-12)

Checks whether the matrix is positive definite by checking that A[k, k] > tol (note: strict inequality) for each k immediately before it is swept. If restore=True (default), then the original matrix is untouched.

property ndim

rank(restore=True, verbose=True, tol=1e-12)

Computes matrix rank by sweeping the entire matrix. If restore=True (default), then the original matrix is untouched.

property shape

property size

sweep(inv=False, verbose=True, symmetrize=True, tol=1e-12)

Sweeps the entire matrix. If inv=True, the inverse sweep is performed on every row/column. If symmetrize=False, then only the upper triangle is read/swept. A progress bar is displayed unless verbose=False.

sweep_k(k, inv=False, symmetrize=True, tol=1e-12)

Sweeps on the kth row/column and returns the value of A[k, k] before it is swept.

If inv=True, then the inverse sweep is performed. If symmetrize=False, then only the upper triangle of the matrix is touched. tol is the threshold below which a diagonal element is treated as numerically 0.