spreg.OLS
class spreg.OLS(y, x, w=None, robust=None, gwk=None, sig2n_k=True, nonspat_diag=True, spat_diag=False, moran=False, white_test=False, vm=False, name_y=None, name_x=None, name_w=None, name_gwk=None, name_ds=None)
Ordinary least squares with results and diagnostics.
- Parameters
- y : array
nx1 array for dependent variable
- x : array
Two dimensional array with n rows and one column for each independent (exogenous) variable, excluding the constant
- w : pysal W object
Spatial weights object (required if running spatial diagnostics)
- robust : string
If 'white', then a White consistent estimator of the variance-covariance matrix is given. If 'hac', then a HAC consistent estimator of the variance-covariance matrix is given. Default set to None.
- gwk : pysal W object
Kernel spatial weights needed for HAC estimation. Note: matrix must have ones along the main diagonal.
- sig2n_k : boolean
If True, then use n-k to estimate sigma^2. If False, use n.
- nonspat_diag : boolean
If True, then compute non-spatial diagnostics on the regression.
- spat_diag : boolean
If True, then compute Lagrange multiplier tests (requires w). Note: see moran for further tests.
- moran : boolean
If True, compute Moran's I on the residuals. Note: requires spat_diag=True.
- white_test : boolean
If True, compute White's specification robust test (requires nonspat_diag=True).
- vm : boolean
If True, include variance-covariance matrix in summary results
- name_y : string
Name of dependent variable for use in output
- name_x : list of strings
Names of independent variables for use in output
- name_w : string
Name of weights matrix for use in output
- name_gwk : string
Name of kernel weights matrix for use in output
- name_ds : string
Name of dataset for use in output
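The effect of sig2n_k can be checked by hand against the Columbus summary printed in the Examples section of this page. The following is a minimal sketch, assuming sigma^2 is the sum of squared residuals divided by either n-k or n, as the parameter description states. The helper name sigma2 is illustrative and not part of spreg; the three inputs are copied from that summary.

```python
# Values copied from the Columbus summary output on this page.
utu = 10647.015   # sum of squared residuals
n = 49            # number of observations
k = 3             # estimated coefficients, including the constant


def sigma2(utu, n, k, sig2n_k=True):
    """Residual variance with n-k (sig2n_k=True) or n in the denominator."""
    denom = n - k if sig2n_k else n
    return utu / denom


sig2_unbiased = sigma2(utu, n, k, sig2n_k=True)
sig2_ml = sigma2(utu, n, k, sig2n_k=False)
print(round(sig2_unbiased, 3))
print(round(sig2_ml, 3))
```

The two results correspond to the "Sigma-square" and "Sigma-square ML" entries of the summary.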
Examples
>>> import numpy as np
>>> import libpysal
>>> from spreg import OLS
Open data on Columbus neighborhood crime (49 areas) using libpysal.io.open(). This is the DBF associated with the Columbus shapefile. libpysal.io.open() also reads data in CSV format. Because the OLS class only requires that data be passed in as numpy arrays, users can read their data with any method they prefer.
>>> db = libpysal.io.open(libpysal.examples.get_path('columbus.dbf'),'r')
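Since any reading method is acceptable, a column-oriented reader built on the standard library is enough to feed the later steps. The sketch below is only an illustration: read_columns is a hypothetical helper, and the inline CSV holds made-up values rather than records from the Columbus dataset.

```python
import csv
import io

# Hypothetical inline CSV standing in for a file on disk; the three rows
# are made-up values, not records from the Columbus dataset.
raw = io.StringIO(
    "HOVAL,INC,CRIME\n"
    "80.0,19.5,15.7\n"
    "44.5,21.2,18.8\n"
    "26.3,16.0,30.6\n"
)


def read_columns(handle):
    """Return {column name: list of floats} from a CSV with a header row."""
    reader = csv.DictReader(handle)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns


cols = read_columns(raw)
hoval = cols["HOVAL"]   # plays the role of db.by_col("HOVAL")
print(hoval)
```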
Extract the HOVAL column (home values) from the DBF file and make it the dependent variable for the regression. Note that PySAL requires this to be an nx1 numpy array.
>>> hoval = db.by_col("HOVAL")
>>> y = np.array(hoval)
>>> y.shape = (len(hoval), 1)
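The nx1 requirement is the difference between a one-dimensional array of length n and a two-dimensional array with a single column. A small sketch with made-up values shows the two shapes; reshape(-1, 1) is used here as an alternative to assigning to y.shape and gives the same shape.

```python
import numpy as np

hoval = [80.0, 44.5, 26.3, 33.2]   # made-up values for illustration

y_flat = np.array(hoval)           # shape (4,): one-dimensional
y = y_flat.reshape(-1, 1)          # shape (4, 1): n rows, one column

print(y_flat.shape, y.shape)
```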
Extract CRIME (crime) and INC (income) vectors from the DBF to be used as independent variables in the regression. Note that PySAL requires this to be an nxj numpy array, where j is the number of independent variables (not including a constant). spreg.OLS adds a vector of ones to the independent variables passed in.
>>> X = []
>>> X.append(db.by_col("INC"))
>>> X.append(db.by_col("CRIME"))
>>> X = np.array(X).T
The minimum parameters needed to run an ordinary least squares regression are two numpy arrays: the dependent variable and the independent variables, in that order. To make the printed results more meaningful, the user can pass in explicit names for the variables used; this is optional.
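As a rough illustration of the nxj layout and of what adding a vector of ones amounts to, the sketch below builds X from two made-up columns and then prepends a constant column. Placing the constant first is an assumption made here because it matches the ordering of the betas output on this page; the variable X_with_const is illustrative and not a spreg attribute name.

```python
import numpy as np

inc = [19.5, 21.2, 16.0, 4.5]      # made-up values
crime = [15.7, 18.8, 30.6, 32.4]   # made-up values

# Same result as appending the lists and transposing: n rows, j columns.
X = np.column_stack([inc, crime])

# What "adds a vector of ones" amounts to: a leading constant column.
X_with_const = np.hstack([np.ones((X.shape[0], 1)), X])

print(X.shape, X_with_const.shape)
```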
>>> ols = OLS(y, X, name_y='home value', name_x=['income','crime'], name_ds='columbus', white_test=True)
spreg.OLS computes the regression coefficients and their standard errors, t-statistics and p-values. It also computes a large battery of diagnostics on the regression. This example also requests White's test, which is not computed by default, by passing white_test=True. All of these results can be accessed individually as attributes of the regression object created by spreg.OLS, or all at once by printing its summary attribute. In the example below, the coefficient on crime is -0.4849, with a t-statistic of -2.6544 and a p-value of 0.01087.
>>> ols.betas
array([[ 46.42818268],
       [  0.62898397],
       [ -0.48488854]])
>>> print(round(ols.t_stat[2][0],3))
-2.654
>>> print(round(ols.t_stat[2][1],3))
0.011
>>> print(round(ols.r2,3))
0.35
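To make the coefficient and R-squared attributes concrete, here is a minimal sketch of an ordinary least squares fit in plain numpy on synthetic data. It is not spreg's implementation; the seed, sample size and true_betas are arbitrary choices for illustration, and the variable names merely mirror the attribute names on this page.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 2))                       # two regressors, no constant
true_betas = np.array([[2.0], [0.5], [-1.0]])     # constant first
Xc = np.hstack([np.ones((n, 1)), X])              # add the constant column
y = Xc @ true_betas + rng.normal(scale=0.1, size=(n, 1))

# Least-squares solution of y = Xc b + u
betas, *_ = np.linalg.lstsq(Xc, y, rcond=None)    # shape (k, 1)
predy = Xc @ betas                                # fitted values
u = y - predy                                     # residuals
utu = float((u ** 2).sum())                       # sum of squared residuals
r2 = 1.0 - utu / float(((y - y.mean()) ** 2).sum())

print(betas.shape, round(r2, 3))
```

With so little noise the estimates land close to true_betas, and the residuals are orthogonal to every column of Xc, which is the defining property of the least-squares solution.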
Or we can easily obtain a full summary of all the results nicely formatted and ready to be printed:
>>> print(ols.summary)
REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :    columbus
Dependent Variable  :  home value                Number of Observations:          49
Mean dependent var  :     38.4362                Number of Variables   :           3
S.D. dependent var  :     18.4661                Degrees of Freedom    :          46
R-squared           :      0.3495
Adjusted R-squared  :      0.3212
Sum squared residual:   10647.015                F-statistic           :     12.3582
Sigma-square        :     231.457                Prob(F-statistic)     :   5.064e-05
S.E. of regression  :      15.214                Log likelihood        :    -201.368
Sigma-square ML     :     217.286                Akaike info criterion :     408.735
S.E of regression ML:     14.7406                Schwarz criterion     :     414.411

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      46.4281827      13.1917570       3.5194844       0.0009867
               crime      -0.4848885       0.1826729      -2.6544086       0.0108745
              income       0.6289840       0.5359104       1.1736736       0.2465669
------------------------------------------------------------------------------------

REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER           12.538

TEST ON NORMALITY OF ERRORS
TEST                             DF        VALUE           PROB
Jarque-Bera                       2          39.706           0.0000

DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST                             DF        VALUE           PROB
Breusch-Pagan test                2           5.767           0.0559
Koenker-Bassett test              2           2.270           0.3214

SPECIFICATION ROBUST TEST
TEST                             DF        VALUE           PROB
White                             5           2.906           0.7145
================================ END OF REPORT =====================================
If the optional parameters w and spat_diag are passed to spreg.OLS, spatial diagnostics will also be computed for the regression. These include Lagrange multiplier tests and, when moran=True is also passed, Moran's I of the residuals. The w parameter is a PySAL spatial weights matrix. In this example, w is built directly from the shapefile columbus.shp, but w can also be read in from a GAL or GWT file. In this case a rook contiguity weights matrix is built, but PySAL also offers queen contiguity, distance weights and k nearest neighbor weights among others. In the example, the Moran's I of the residuals is 0.204 with a standardized value of 2.592 and a p-value of 0.0095.
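Several entries in the summary are simple functions of others, which gives a quick consistency check. The sketch below assumes the standard textbook definitions of adjusted R-squared, the overall F statistic, and the Akaike and Schwarz criteria; it does not claim to show how spreg computes them internally. All inputs are copied from the summary output above.

```python
import math

# Inputs copied from the summary output above.
n, k = 49, 3
r2 = 0.3495
logll = -201.368

ar2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)          # adjusted R-squared
f_stat = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))     # overall F statistic
aic = -2.0 * logll + 2.0 * k                         # Akaike criterion
schwarz = -2.0 * logll + k * math.log(n)             # Schwarz criterion

print(round(ar2, 4), round(f_stat, 2), round(aic, 3), round(schwarz, 3))
```

The results agree with the reported values up to the rounding of the inputs.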
>>> w = libpysal.weights.Rook.from_shapefile(libpysal.examples.get_path("columbus.shp"))
>>> ols = OLS(y, X, w, spat_diag=True, moran=True, name_y='home value', name_x=['income','crime'], name_ds='columbus')
>>> ols.betas
array([[ 46.42818268],
       [  0.62898397],
       [ -0.48488854]])
>>> print(round(ols.moran_res[0],3))
0.204
>>> print(round(ols.moran_res[1],3))
2.592
>>> print(round(ols.moran_res[2],4))
0.0095
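The ideas behind rook contiguity and Moran's I can be sketched without PySAL. The code below is an illustration under stated assumptions, not spreg's implementation: rook_neighbors and morans_i are hypothetical helpers, the weights are row-standardized, the residuals are made up, and the standardization of the statistic is not reproduced. The last step assumes the reported p-value is a two-sided normal p-value for the standardized value, which is consistent with the numbers printed above.

```python
import math


def rook_neighbors(rows, cols):
    """Rook contiguity on a regular grid: cells sharing an edge."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            cell = r * cols + c
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbors[cell] = [
                rr * cols + cc
                for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return neighbors


def morans_i(u, neighbors):
    """Moran's I with row-standardized weights: u'Wu / u'u."""
    lag = [sum(u[j] for j in neighbors[i]) / len(neighbors[i])
           for i in range(len(u))]
    return sum(ui * li for ui, li in zip(u, lag)) / sum(ui * ui for ui in u)


# Made-up zero-mean residuals on a 1x4 strip: like values sit side by side.
u = [1.0, 1.0, -1.0, -1.0]
i_stat = morans_i(u, rook_neighbors(1, 4))

# Two-sided normal p-value for the standardized value reported above.
z = 2.592
p_value = math.erfc(z / math.sqrt(2.0))
print(i_stat, round(p_value, 4))
```

A positive statistic, as in this toy strip and in the Columbus residuals, indicates that neighboring residuals tend to have the same sign.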
- Attributes
- summary : string
Summary of regression results and diagnostics (note: use in conjunction with the print command)
- betas : array
kx1 array of estimated coefficients
- u : array
nx1 array of residuals
- predy : array
nx1 array of predicted y values
- n : integer
Number of observations
- k : integer
Number of variables for which coefficients are estimated (including the constant)
- y : array
nx1 array for dependent variable
- x : array
Two dimensional array with n rows and one column for each independent (exogenous) variable, including the constant
- robust : string
Adjustment for robust standard errors
- mean_y : float
Mean of dependent variable
- std_y : float
Standard deviation of dependent variable
- vm : array
Variance covariance matrix (kxk)
- r2 : float
R squared
- ar2 : float
Adjusted R squared
- utu : float
Sum of squared residuals
- sig2 : float
Sigma squared used in computations
- sig2ML : float
Sigma squared (maximum likelihood)
- f_stat : tuple
Statistic (float), p-value (float)
- logll : float
Log likelihood
- aic : float
Akaike information criterion
- schwarz : float
Schwarz information criterion
- std_err : array
1xk array of standard errors of the betas
- t_stat : list of tuples
t statistic; each tuple contains the pair (statistic, p-value), where each is a float
- mulColli : float
Multicollinearity condition number
- jarque_bera : dictionary
'jb': Jarque-Bera statistic (float); 'pvalue': p-value (float); 'df': degrees of freedom (int)
- breusch_pagan : dictionary
'bp': Breusch-Pagan statistic (float); 'pvalue': p-value (float); 'df': degrees of freedom (int)
- koenker_bassett : dictionary
'kb': Koenker-Bassett statistic (float); 'pvalue': p-value (float); 'df': degrees of freedom (int)
- white : dictionary
'wh': White statistic (float); 'pvalue': p-value (float); 'df': degrees of freedom (int)
- lm_error : tuple
Lagrange multiplier test for spatial error model; tuple contains the pair (statistic, p-value), where each is a float
- lm_lag : tuple
Lagrange multiplier test for spatial lag model; tuple contains the pair (statistic, p-value), where each is a float
- rlm_error : tuple
Robust Lagrange multiplier test for spatial error model; tuple contains the pair (statistic, p-value), where each is a float
- rlm_lag : tuple
Robust Lagrange multiplier test for spatial lag model; tuple contains the pair (statistic, p-value), where each is a float
- lm_sarma : tuple
Lagrange multiplier test for spatial SARMA model; tuple contains the pair (statistic, p-value), where each is a float
- moran_res : tuple
Moran's I for the residuals; tuple containing the triple (Moran's I, standardized Moran's I, p-value)
- name_y : string
Name of dependent variable for use in output
- name_x : list of strings
Names of independent variables for use in output
- name_w : string
Name of weights matrix for use in output
- name_gwk : string
Name of kernel weights matrix for use in output
- name_ds : string
Name of dataset for use in output
- title : string
Name of the regression method used
- sig2n : float
Sigma squared (computed with n in the denominator)
- sig2n_k : float
Sigma squared (computed with n-k in the denominator)
- xtx : array
\(X'X\) (kxk)
- xtxi : array
\((X'X)^{-1}\) (kxk)
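The relationship between several of these attributes can be sketched in plain numpy. This is a minimal illustration on synthetic data, assuming the classical (non-robust) variance-covariance matrix sigma^2 (X'X)^{-1} with the n-k denominator; it is not spreg's code, and the local variable names only mirror the attribute names above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 2))])  # constant + 2 regressors
y = X @ np.array([[1.0], [2.0], [-0.5]]) + rng.normal(size=(n, 1))
k = X.shape[1]

xtx = X.T @ X                            # X'X, shape (k, k)
xtxi = np.linalg.inv(xtx)                # (X'X)^{-1}, shape (k, k)
betas = xtxi @ X.T @ y                   # shape (k, 1)
u = y - X @ betas                        # residuals, shape (n, 1)
sig2 = float((u ** 2).sum()) / (n - k)   # n-k denominator, as with sig2n_k=True
vm = sig2 * xtxi                         # classical variance-covariance matrix
std_err = np.sqrt(np.diag(vm))           # one standard error per coefficient
t_ratios = betas.ravel() / std_err       # statistic part of each t_stat pair

print(xtx.shape, vm.shape, std_err.shape)
```

The same ratio can be read off the Columbus summary: for the crime row, -0.4848885 / 0.1826729 is approximately -2.6544, the reported t-statistic.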
__init__(self, y, x, w=None, robust=None, gwk=None, sig2n_k=True, nonspat_diag=True, spat_diag=False, moran=False, white_test=False, vm=False, name_y=None, name_x=None, name_w=None, name_gwk=None, name_ds=None)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(self, y, x[, w, robust, gwk, …]): Initialize self.
Attributes
mean_y
sig2n
sig2n_k
std_y
utu
vm