
Purposeful Selection of Variables
Download: Purposeful Selection Macro, Beta Version 1.1, September 2007
Programmers:
Zoran Bursac
Heath Gauss
Keith Williams
Dave Hosmer
Macro variables:
DATASET - the input data set. We placed ours in the WORK library, but you can use a data set from any library; just specify the library in the macro call, e.g. SASUSER.YOUR_DATASET_NAME
OUTCOME - binary variable, ideally coded as 0 or 1. The macro is set up to use the DESCENDING option, that is, to model the probability of Y=1, so be aware of that when coding the outcome.
COVARIATES - either binary variables coded as 0 or 1, or continuous variables. While you could insert dummy variables, the macro will retain only the significant ones or the confounders, so you would have to force the other, not-retained dummies back in after the selection is complete. We have not tested this or designed the macro to handle it yet, so use it at your own risk.
PVALUEI - inclusion criterion for entering covariates into the multivariable model. This p-value threshold is applied to the univariate test between Y and each X separately, and it creates the subset of candidate variables for inclusion in the multivariable model. We recommend setting it liberally to 0.25, because a lower value could miss potentially important variables.
PVALUER - once the candidate variables are fitted in the multivariable model, this becomes their retention criterion. We recommend setting this to 0.1.
CHBETA - the percent change in parameter estimates that we consider to indicate confounding once any X variable is removed from the model. In our simulations we found that setting this to 15 (a 15% change) seems to give optimal results. You can also test other levels, such as 20 or 25, and compare the findings.
PVALUENC - the newest macro variable, implemented before JSM 2007. It is the inclusion criterion for non-candidate variables, i.e. the variables that did not make it into the initial multivariable model. In our simulation studies we found that setting this to 0.15 gives the best inclusion/retention results. The retention criterion for non-candidate variables is preset in the macro at the 0.1 level.
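
The CHBETA criterion described above can be written as a percent change in a coefficient. The expression below is the conventional "delta beta-hat percent" form and is only a sketch: the notation is illustrative, and whether the macro takes the absolute value or uses exactly this denominator is an assumption that should be checked against the macro source.

```latex
\Delta\hat{\beta}_j\,\% \;=\; 100 \times
  \left| \frac{\hat{\theta}_j - \hat{\beta}_j}{\hat{\beta}_j} \right|
```

Here \hat{\beta}_j is the estimate for covariate X_j in the model that still contains the variable being considered for removal, and \hat{\theta}_j is the estimate for X_j after that variable is removed. Under this reading, with CHBETA set to 15, a coefficient that moves from 0.50 to 0.60 changes by 100 x 0.10 / 0.50 = 20%, so the removed variable would be treated as a confounder and kept.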
User instructions:
1. Place your data set in the WORK library and recode your variables to match the instructions above.
2. Run this macro.
3. Call the macro as follows:
%PurposefulSelection (YOUR_DATASET_NAME,
YOUR_Y,
YOUR_X1 YOUR_X2 YOUR_X3 YOUR_XN, 0.25, 0.1, 15, 0.15);
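
The three steps above might look like the following in practice. This is a minimal sketch under stated assumptions: the data set SASUSER.STUDY_RAW, the variables STATUS, SEX, AGE and BMI, and the file path are all hypothetical, and the positional order of the arguments (DATASET, OUTCOME, COVARIATES, PVALUEI, PVALUER, CHBETA, PVALUENC) is inferred from the example call and the recommended values rather than confirmed from the macro source.

```sas
/* Sketch only: data set, variable names, and file path are hypothetical. */

/* Step 1: copy the data into WORK and recode variables to 0/1. */
data work.study;
    set sasuser.study_raw;                    /* hypothetical source data */
    if status = 'D' then died = 1;            /* event of interest        */
    else if status = 'A' then died = 0;       /* no event                 */
    else died = .;                            /* anything else: missing   */
    if sex in ('M', 'F') then male = (sex = 'M');   /* binary covariate   */
run;

/* Step 2: run the downloaded macro code so the macro is defined. */
%include 'C:\macros\PurposefulSelection.sas';  /* hypothetical path */

/* Step 3: call the macro with the recommended settings. */
%PurposefulSelection (work.study, died, age male bmi, 0.25, 0.1, 15, 0.15);
```

Because the macro models the probability of Y=1, the recode in step 1 assigns 1 to the event of interest.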
Note that the LOG screen will show more notes than you would ideally like to see. We may suppress some of them in the future, but we haven't yet. On the positive side, they give you more step-by-step information on what happened along the way.
The OUTPUT screen, like other selection procedures, will also give you step-by-step analysis results. The last output should be your "final" main effects model. Examine your model carefully and determine why the selected variables are there. Compare your findings with other available selection procedures.
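
For the comparison suggested above, one option is the stepwise selection built into PROC LOGISTIC. The sketch below is an illustration only, not part of the macro: the YOUR_* names are the same placeholders used in the macro call, and reusing 0.25 and 0.1 as the entry and stay levels is simply a choice made here to mirror PVALUEI and PVALUER. Stepwise selection follows a different algorithm and has no counterpart to CHBETA, so the two approaches may legitimately select different models.

```sas
/* Sketch only: YOUR_* names are placeholders, as in the macro call. */
proc logistic data=YOUR_DATASET_NAME descending;   /* model P(Y=1) */
    model YOUR_Y = YOUR_X1 YOUR_X2 YOUR_X3 YOUR_XN
          / selection=stepwise   /* add and remove variables step by step */
            slentry=0.25         /* significance level to enter the model */
            slstay=0.1;          /* significance level to stay in model   */
run;
```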
If you use this macro for work to be published, please use the following citations:
Bursac Z, Gauss CH, Williams DK, Hosmer DW. (2008). Purposeful Selection of Variables in Logistic Regression. Source Code for Biology and Medicine, 3(17): 1-8.
Bursac Z, Gauss CH, Williams DK, Hosmer DW. (2008). Purposeful Selection of Variables in Logistic Regression: Macro and Simulation Results. ASA Proceedings of the Joint Statistical Meetings, Statistical Computing Section.
Bursac Z, Gauss CH, Williams DK, Hosmer DW. (2007). A Purposeful Selection of Variables Macro for Logistic Regression. SAS Global Forum Proceedings, Paper 173: 1-5. http://www2.sas.com/proceedings/forum2007/173-2007.pdf