sweepystats package
Submodules
sweepystats.linreg module
- class sweepystats.linreg.LinearRegression(X, y, weights=None)
Bases: object
A class to perform linear regression based on the sweep operation.
- Parameters:
- X : array-like
Design matrix of shape (n, p).
- y : array-like
Response vector of shape (n,).
- weights : array-like, optional
Weight vector of shape (n,). If provided, weighted least squares is performed; weights should be non-negative. If None (default), ordinary least squares is performed.
- R2()
Computes the R² (coefficient of determination) of the fit. For weighted least squares, weighted statistics are used.
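For reference, the usual definition of R² in terms of the residual and total sums of squares is given below. The weighted form is an assumption (the standard weighted analogue); the exact weighted statistics used by R2() are not spelled out here.

```latex
R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}, \qquad
\mathrm{RSS} = \sum_{i=1}^{n} w_i \,(y_i - \hat{y}_i)^2, \qquad
\mathrm{TSS} = \sum_{i=1}^{n} w_i \,(y_i - \bar{y}_w)^2, \qquad
\bar{y}_w = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}
```

Ordinary least squares corresponds to w_i = 1 for all i.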
- coef()
Fitted coefficient values (beta hat). Only the coefficients of variables that have been swept in are returned.
- coef_std()
Standard deviation of each fitted coefficient.
- cov()
Estimated variance-covariance matrix of beta hat, i.e. Var(b) = sigma2 * inv(X'X).
- f_test(k)
Tests whether the `k`th variable is significant by performing an F-test. The model must already be fitted.
Returns:
+ f_stat: The F-statistic
+ pval: The associated p-value
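As a point of reference, the textbook F-statistic for dropping a single variable k from a model with p columns and n observations is shown below. That f_test(k) computes exactly this quantity is an assumption; RSS_{-k} denotes the residual sum of squares with variable k swept back out.

```latex
F = \frac{\mathrm{RSS}_{-k} - \mathrm{RSS}}{\mathrm{RSS} / (n - p)},
\qquad
\text{pval} = \Pr\left( F_{1,\, n-p} \ge F \right)
```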
- fit(verbose=True)
Performs least squares fitting by sweeping in all variables.
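The idea behind fitting by sweeping can be sketched in pure Python. This is an illustration of the technique, not the package's implementation: the helper `sweep_k` below is hypothetical, and the sign convention (a swept pivot becomes -1/d) is an assumption. Sweeping the first p diagonal entries of the augmented Gram matrix [[X'X, X'y], [y'X, y'y]] leaves beta hat in the last column and the residual sum of squares in the bottom-right corner.

```python
def sweep_k(A, k):
    """Sweep the symmetric matrix A (list of lists) in place on index k."""
    n = len(A)
    d = A[k][k]
    # Schur-complement update of every entry outside row/column k.
    for i in range(n):
        for j in range(n):
            if i != k and j != k:
                A[i][j] -= A[i][k] * A[k][j] / d
    # Scale row/column k, then replace the pivot.
    for i in range(n):
        if i != k:
            A[i][k] = A[i][k] / d
            A[k][i] = A[i][k]
    A[k][k] = -1.0 / d
    return d

# Toy data: an intercept column plus one covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 4.0, 7.0]
n, p = len(X), len(X[0])

# Augmented Gram matrix of [X, y], of size (p + 1) x (p + 1).
Z = [row + [yi] for row, yi in zip(X, y)]
T = [[sum(Z[r][i] * Z[r][j] for r in range(n)) for j in range(p + 1)]
     for i in range(p + 1)]

# "Sweeping in" all p variables.
for k in range(p):
    sweep_k(T, k)

beta = [T[i][p] for i in range(p)]  # fitted coefficients
rss = T[p][p]                       # ||y - yhat||^2
```

Sweeping in only a subset of the p indices would give the fit for that sub-model, which is why coef() is described as returning only the coefficients of swept-in variables.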
- is_fitted()
- resid()
Residual sum of squares, ||y - yhat||^2.
- sigma2()
Estimate of the error variance sigma². For weighted least squares, returns the weighted variance estimate.
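A small sketch of how the quantities above relate once the tableau has been swept. The numbers are a toy swept tableau (n = 4 observations, p = 2 columns), and the denominator n - p is an assumption (the usual unbiased estimator); the layout with -inv(X'X) in the top-left block follows the common sweep convention and may differ from the package's internals.

```python
import math

# Swept tableau for a toy fit: the top-left p x p block is -inv(X'X),
# the last column holds beta hat, and the bottom-right entry is the
# residual sum of squares.
T = [[-0.7, 0.3, 0.9],
     [0.3, -0.2, 1.9],
     [0.9, 1.9, 0.7]]
n, p = 4, 2

rss = T[p][p]
sigma2 = rss / (n - p)  # assumed estimator: RSS / (n - p)

# Var(b) = sigma2 * inv(X'X); the swept block stores -inv(X'X).
cov = [[-sigma2 * T[i][j] for j in range(p)] for i in range(p)]

# Standard deviations are the square roots of the diagonal of cov.
coef_std = [math.sqrt(cov[i][i]) for i in range(p)]
```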
sweepystats.sweep_matrix module
- class sweepystats.sweep_matrix.SweepMatrix(A, storage=None)
Bases: object
Thin wrapper over a numpy array. The original array is not copied if it is a double-precision 2D array stored in column-major (Fortran-style) order.
- det(restore=True, verbose=True)
Computes the determinant by sweeping the entire matrix. If restore=True (default), the original matrix is left untouched.
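Why sweeping yields the determinant: the pivot A[k, k] seen just before each sweep is a successive Schur complement, and the product of these pivots is det(A). The pure-Python sketch below illustrates this with hypothetical helpers `sweep_k` and `det_by_sweep`; it works on a copy and assumes every pivot is nonzero (true for positive definite input). It is not the package's code.

```python
def sweep_k(A, k):
    """Sweep the symmetric matrix A in place on index k; return the old pivot."""
    n = len(A)
    d = A[k][k]
    for i in range(n):
        for j in range(n):
            if i != k and j != k:
                A[i][j] -= A[i][k] * A[k][j] / d
    for i in range(n):
        if i != k:
            A[i][k] = A[i][k] / d
            A[k][i] = A[i][k]
    A[k][k] = -1.0 / d
    return d

def det_by_sweep(A):
    """Determinant as the product of pivots, computed on a copy of A."""
    A = [row[:] for row in A]
    det = 1.0
    for k in range(len(A)):
        if A[k][k] == 0.0:
            raise ValueError("zero pivot: this sketch needs nonzero pivots")
        det *= sweep_k(A, k)
    return det

d2 = det_by_sweep([[4.0, 2.0], [2.0, 3.0]])
d3 = det_by_sweep([[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]])
```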
- property dtype
- isposdef(restore=True, verbose=True, tol=1e-12)
Checks whether the matrix is positive definite by verifying that A[k, k] > tol (strict inequality) for each k immediately before it is swept. If restore=True (default), the original matrix is left untouched.
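The pivot test described above can be sketched as follows. The helpers `sweep_k` and `isposdef_by_sweep` are hypothetical stand-ins written in pure Python; the sketch works on a copy rather than restoring the input, which is a simplification.

```python
def sweep_k(A, k):
    """Sweep the symmetric matrix A in place on index k; return the old pivot."""
    n = len(A)
    d = A[k][k]
    for i in range(n):
        for j in range(n):
            if i != k and j != k:
                A[i][j] -= A[i][k] * A[k][j] / d
    for i in range(n):
        if i != k:
            A[i][k] = A[i][k] / d
            A[k][i] = A[i][k]
    A[k][k] = -1.0 / d
    return d

def isposdef_by_sweep(A, tol=1e-12):
    """True if every pivot is strictly greater than tol just before its sweep."""
    A = [row[:] for row in A]
    for k in range(len(A)):
        if not A[k][k] > tol:
            return False  # stop at the first failing pivot
        sweep_k(A, k)
    return True

pd = isposdef_by_sweep([[4.0, 2.0], [2.0, 3.0]])
indefinite = isposdef_by_sweep([[1.0, 2.0], [2.0, 1.0]])
singular = isposdef_by_sweep([[1.0, 2.0], [2.0, 4.0]])
```

In the second matrix the first pivot is 1, but after sweeping it the remaining diagonal entry becomes 1 - 4 = -3, so the check fails.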
- property ndim
- rank(restore=True, verbose=True, tol=1e-12)
Computes the matrix rank by sweeping the entire matrix. If restore=True (default), the original matrix is left untouched.
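One way to read rank off a sweep pass is to count the pivots that are not numerically zero and skip the rest. The sketch below does this for positive semidefinite input such as a Gram matrix; that restriction, the skipping rule, and the helper names are assumptions for illustration, not a description of the package's internals.

```python
def sweep_k(A, k):
    """Sweep the symmetric matrix A in place on index k; return the old pivot."""
    n = len(A)
    d = A[k][k]
    for i in range(n):
        for j in range(n):
            if i != k and j != k:
                A[i][j] -= A[i][k] * A[k][j] / d
    for i in range(n):
        if i != k:
            A[i][k] = A[i][k] / d
            A[k][i] = A[i][k]
    A[k][k] = -1.0 / d
    return d

def rank_by_sweep(A, tol=1e-12):
    """Count the pivots whose magnitude exceeds tol (PSD input assumed)."""
    A = [row[:] for row in A]
    rank = 0
    for k in range(len(A)):
        if abs(A[k][k]) > tol:
            sweep_k(A, k)
            rank += 1
        # otherwise the pivot is treated as zero and index k is skipped
    return rank

r_full = rank_by_sweep([[4.0, 2.0], [2.0, 3.0]])
r_one = rank_by_sweep([[1.0, 2.0], [2.0, 4.0]])
r_two = rank_by_sweep([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
```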
- property shape
- property size
- sweep(inv=False, verbose=True, symmetrize=True, tol=1e-12)
Sweeps the entire matrix. If inv=True, the inverse sweep is performed on every row/column instead. If symmetrize=False, only the upper triangle is read/swept. A progress bar is displayed unless verbose=False.
- sweep_k(k, inv=False, symmetrize=True, tol=1e-12)
Sweeps on the kth row/column and returns A[k, k] as it was before the sweep.
If inv=True, the inverse sweep is performed. If symmetrize=False, only the upper triangle of the matrix is touched. tol is the threshold for treating a diagonal element as numerically 0.
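To make the operation concrete, here is a minimal pure-Python sketch of a sweep and its inverse on a small symmetric matrix. The function below is a hypothetical stand-in, not the package's method, and it assumes the convention in which a swept pivot d becomes -1/d, so that sweeping every index turns A into -inv(A). It ignores the symmetrize and tol options.

```python
def sweep_k(A, k, inv=False):
    """Sweep (or inverse-sweep) the symmetric matrix A in place on index k.

    Returns the pivot A[k][k] as it was before the sweep.
    """
    n = len(A)
    d = A[k][k]
    # Update every entry outside row/column k.
    for i in range(n):
        for j in range(n):
            if i != k and j != k:
                A[i][j] -= A[i][k] * A[k][j] / d
    # Row/column k is scaled by 1/d (sweep) or -1/d (inverse sweep).
    sign = -1.0 if inv else 1.0
    for i in range(n):
        if i != k:
            A[i][k] = sign * A[i][k] / d
            A[k][i] = A[i][k]
    A[k][k] = -1.0 / d
    return d

A = [[4.0, 2.0], [2.0, 3.0]]
original = [row[:] for row in A]

# Sweep every index; A now holds -inv(original).
pivots = [sweep_k(A, k) for k in range(len(A))]
neg_inverse = [row[:] for row in A]

# Inverse-sweep every index to recover the original matrix.
for k in range(len(A)):
    sweep_k(A, k, inv=True)
```

The recorded pivots are the values that the determinant, rank, and positive-definiteness checks inspect.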
Module contents
- class sweepystats.ANOVA(df, formula)
Bases: object
A class to perform (k-way) ANOVA based on the sweep operation.
Parameters:
+ df: A pandas dataframe containing the covariates and outcome.
+ formula: A formula string that defines the model, e.g. 'y ~ Group + Factor + Group:Factor'.
- f_test(variable)
Tests whether variable in self.formula is significant by performing an F-test. The model must already be fitted.
Returns:
+ f_stat: The F-statistic
+ pval: The associated p-value
- fit(verbose=True)
Fits the ANOVA model using the sweep operation.
- sum_sq()
Computes the sum of squared errors for all variables that are currently swept in.
- class sweepystats.LinearRegression(X, y, weights=None)
Bases: object
Documented above as sweepystats.linreg.LinearRegression; the parameters and methods are identical.
- class sweepystats.Normal(mu, sigma)
Bases: object
A class that computes the density and conditional distributions of the multivariate Gaussian using the sweep operation.
- cond_mean(y, yidx)
Computes the conditional expectation E(Z | Y = y), where (Y, Z) is assumed to be jointly Gaussian with mean mu and covariance sigma. The vector yidx gives the indices of the observed components y.
- cond_var(y, yidx)
Computes the conditional variance Var(Z | Y = y), where (Y, Z) is assumed to be jointly Gaussian with mean mu and covariance sigma. The vector yidx gives the indices of the observed components y.
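The connection between sweeping and conditioning can be sketched in pure Python. Sweeping the covariance matrix on the observed indices leaves Var(Z | Y) in the Z-by-Z block and the regression weights sigma_ZY * inv(sigma_YY) in the Z-by-Y block. The helpers `sweep_k` and `conditional` are hypothetical, and the sign convention is an assumption; this is not the package's implementation.

```python
def sweep_k(A, k):
    """Sweep the symmetric matrix A in place on index k; return the old pivot."""
    n = len(A)
    d = A[k][k]
    for i in range(n):
        for j in range(n):
            if i != k and j != k:
                A[i][j] -= A[i][k] * A[k][j] / d
    for i in range(n):
        if i != k:
            A[i][k] = A[i][k] / d
            A[k][i] = A[i][k]
    A[k][k] = -1.0 / d
    return d

def conditional(mu, sigma, y, yidx):
    """Return (E[Z | Y = y], Var(Z | Y = y)) for a jointly Gaussian (Y, Z)."""
    S = [row[:] for row in sigma]  # work on a copy
    for k in yidx:
        sweep_k(S, k)
    zidx = [i for i in range(len(mu)) if i not in yidx]
    # E[Z | Y = y] = mu_Z + sigma_ZY inv(sigma_YY) (y - mu_Y)
    mean = [mu[i] + sum(S[i][j] * (yj - mu[j]) for j, yj in zip(yidx, y))
            for i in zidx]
    # Var(Z | Y = y) is the Schur complement left in the Z-by-Z block.
    var = [[S[i][j] for j in zidx] for i in zidx]
    return mean, var

mu = [1.0, 2.0]
sigma = [[4.0, 2.0], [2.0, 3.0]]
cond_mean, cond_var = conditional(mu, sigma, y=[3.0], yidx=[0])
```

Here the conditional mean is 2 + (2/4)(3 - 1) and the conditional variance is 3 - 2·2/4, which does not depend on the observed value y.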
- loglikelihood(x, verbose=True)
Evaluates the log-likelihood of observing X = x.
- class sweepystats.SweepMatrix(A, storage=None)
Bases: object
Documented above as sweepystats.sweep_matrix.SweepMatrix; the methods and properties are identical.