Purposeful Selection of Variables

 

 


 

Purposeful Selection Macro Beta Version 1.1 September 2007

 

Programmers: Zoran Bursac, Heath Gauss, Keith Williams, Dave Hosmer

 

Macro variables:

 

DATASET - the input data set. We placed ours in the WORK library, but the data set can live in any library; just give its two-level name in the macro call, e.g. SASUSER.YOUR_DATASET_NAME.

 

OUTCOME - a binary variable, ideally coded as 0 or 1. The macro uses the DESCENDING option, i.e. it models the probability that Y=1, so make sure the event of interest is coded as 1.
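If the outcome is stored as text labels, it has to be recoded before the macro is called. The following is a minimal Python sketch of that recoding, not part of the SAS macro; the helper name `recode_outcome` is hypothetical. It checks that the outcome really has two levels and maps the chosen event level to 1 and the other level to 0.

```python
from typing import Hashable, List, Sequence


def recode_outcome(values: Sequence[Hashable], event: Hashable) -> List[int]:
    """Recode a two-level outcome so the event of interest is 1 and the other level is 0."""
    levels = set(values)
    if len(levels) != 2:
        raise ValueError(f"outcome must have exactly two levels, found {len(levels)}")
    if event not in levels:
        raise ValueError(f"event level {event!r} does not occur in the outcome")
    # With the event coded 1, modeling P(Y=1) models the event of interest.
    return [1 if v == event else 0 for v in values]


print(recode_outcome(["yes", "no", "no", "yes"], event="yes"))  # [1, 0, 0, 1]
```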

 

COVARIATES - either binary variables coded as 0 or 1, or continuous variables. You could include dummy variables, but the macro will retain only the significant ones or the confounders, so you would have to force the remaining dummies of the same categorical variable back in after the selection is complete. We have not tested this or designed the macro to handle it yet, so use at your own risk.

 

PVALUEI - the inclusion criterion for covariates entering the multivariable model. Each X is tested against Y separately in a univariate model, and covariates whose p-values fall below this cutoff form the subset of candidate variables for the multivariable model. We recommend a liberal setting of 0.25, because a lower cutoff could miss potentially important variables.
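The screening step can be sketched in Python as follows. This assumes the univariate test is a two-sided Wald test computed from each univariate estimate and its standard error, and that "below the cutoff" means strictly less than PVALUEI; the macro itself gets its p-values from SAS, so its exact test and tie handling may differ. The estimates shown are made-up illustration values.

```python
import math
from typing import Dict, List


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided Wald p-value for H0: beta = 0 (normal approximation)."""
    z = abs(beta / se)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


def screen_candidates(univariate_p: Dict[str, float], pvaluei: float = 0.25) -> List[str]:
    """Return the covariates whose univariate p-value is below PVALUEI."""
    return [name for name, p in univariate_p.items() if p < pvaluei]


# Hypothetical univariate results: (estimate, standard error) per covariate.
fits = {"AGE": (0.05, 0.02), "SEX": (0.10, 0.30), "BMI": (0.30, 0.25)}
pvals = {name: wald_p_value(b, se) for name, (b, se) in fits.items()}
print(screen_candidates(pvals))  # ['AGE', 'BMI']
```

BMI (p of about 0.23) would be dropped at a stricter cutoff such as 0.05, which is the kind of variable the liberal 0.25 setting is meant to keep.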

 

PVALUER - the retention criterion for candidate variables once they are fitted in the multivariable model. A candidate that does not meet this level is removed unless its removal causes confounding (see CHBETA). We recommend setting this to 0.1.
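One way to picture a single removal step is the Python sketch below. It assumes the least significant covariate in the current multivariable fit is considered first and that a covariate is kept on significance grounds when p < PVALUER; the helper `removal_candidate` is hypothetical, and the removal is only tentative until the confounding check is done.

```python
from typing import Dict, Optional


def removal_candidate(multivariable_p: Dict[str, float], pvaluer: float = 0.1) -> Optional[str]:
    """Return the least significant covariate if it fails PVALUER, else None."""
    if not multivariable_p:
        return None
    worst = max(multivariable_p, key=multivariable_p.get)
    # Covariates with p < PVALUER are retained on significance grounds.
    return worst if multivariable_p[worst] >= pvaluer else None


print(removal_candidate({"AGE": 0.01, "BMI": 0.34, "SMOKE": 0.08}))  # BMI
print(removal_candidate({"AGE": 0.01, "SMOKE": 0.08}))  # None
```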

 

CHBETA - the percentage change in the remaining parameter estimates, after an X variable is removed from the model, that we treat as evidence of confounding. In our simulations, setting this to 15 (a 15% change) appeared to give the best results. You can also try other levels, such as 20 or 25, and compare the findings.
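The confounding check can be sketched in Python as below. It assumes the usual definition of the change, 100 * |beta_reduced - beta_full| / |beta_full|, with the model that still contains the removed X as the reference, and that a change strictly greater than CHBETA flags confounding; the function names and estimates are hypothetical, and the macro's exact comparison (for example, whether the intercept is included) may differ.

```python
import math
from typing import Dict


def percent_change(beta_full: float, beta_reduced: float) -> float:
    """Absolute % change of an estimate relative to the model that still contains the removed X."""
    if beta_full == 0.0:
        return 0.0 if beta_reduced == 0.0 else math.inf
    return 100.0 * abs(beta_reduced - beta_full) / abs(beta_full)


def is_confounder(full: Dict[str, float], reduced: Dict[str, float], chbeta: float = 15.0) -> bool:
    """True if removing a variable moves any remaining estimate by more than CHBETA percent."""
    # Only covariates present in both models are compared.
    return any(percent_change(full[name], beta) > chbeta
               for name, beta in reduced.items() if name in full)


full = {"AGE": 0.50, "SMOKE": 1.00, "BMI": 0.02}  # model including BMI
reduced = {"AGE": 0.60, "SMOKE": 1.05}            # BMI removed
print(is_confounder(full, reduced))               # True: AGE moved by about 20%
print(is_confounder(full, reduced, chbeta=25))    # False
```

In this example BMI would stay in the model as a confounder at CHBETA = 15 but could be removed at CHBETA = 25, which is why comparing several levels is worthwhile.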

 

PVALUENC - the newest macro variable, implemented before JSM 2007. It is the inclusion criterion for non-candidate variables, i.e. the variables that did not make it into the initial multivariable model. After the multivariable model has been reduced, each non-candidate is added back to it one at a time, and those significant at this level enter the model. Our simulation studies showed that setting this to 0.15 gave the best inclusion/retention results.

 

The retention criterion for non-candidate variables is preset in the macro at the 0.1 level.
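The non-candidate step can be sketched in Python as follows. `fit_p` is a hypothetical callback standing in for a logistic fit that returns a p-value per covariate; the sketch assumes non-candidates enter when p < PVALUENC, and that the re-entered variables are then reduced one at a time, dropping the least significant until all of them have p < 0.1. The confounding check is left out to keep the example short, so this is an outline of the idea rather than the macro's exact logic.

```python
from typing import Callable, Dict, List

# Hypothetical callback: fits a model on the given covariates, returns {covariate: p-value}.
FitPValues = Callable[[List[str]], Dict[str, float]]


def add_noncandidates(model: List[str], noncandidates: List[str], fit_p: FitPValues,
                      pvaluenc: float = 0.15, retain: float = 0.1) -> List[str]:
    """Re-enter non-candidates one at a time, then reduce only the re-entered ones."""
    # Step 1: test each non-candidate added on its own to the current model.
    entered = [v for v in noncandidates if fit_p(model + [v])[v] < pvaluenc]
    # Step 2: refit with all entered variables and drop the least significant
    # re-entered variable until every one meets the preset retention level.
    while entered:
        p = fit_p(model + entered)
        worst = max(entered, key=lambda v: p[v])
        if p[worst] < retain:
            break
        entered.remove(worst)
    return model + entered


# Toy stand-in for a real model fit: fixed p-values regardless of the model.
toy_p = {"Z1": 0.05, "Z2": 0.12, "Z3": 0.40}


def fake_fit(covs: List[str]) -> Dict[str, float]:
    return {c: toy_p.get(c, 0.01) for c in covs}


print(add_noncandidates(["AGE", "SMOKE"], ["Z1", "Z2", "Z3"], fake_fit))
# ['AGE', 'SMOKE', 'Z1']
```

Here Z2 enters at 0.15 but fails the 0.1 retention level, which shows how the two preset thresholds interact.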

 

 

User instructions:

 

1. Place your data set in the WORK library and recode your variables to match the instructions above.

2. Submit the macro code so that the macro is compiled in your SAS session.

3. Call the macro as follows:

 

%PurposefulSelection (YOUR_DATASET_NAME, YOUR_Y,

YOUR_X1 YOUR_X2 YOUR_X3 YOUR_XN, 0.25, 0.1, 15, 0.15);

 

Note that the LOG window will contain more notes than you would ideally like to see. We may suppress some of them in the future, but we have not done so yet. On the positive side, they give you step-by-step information on what happened along the way.

 

As with other selection procedures, the OUTPUT window also shows step-by-step analysis results. The last output should be your "final" main-effects model. Examine this model carefully and determine why each selected variable is there, and compare your findings with those from other available selection procedures.

 

If you use this macro in work to be published, please cite the following:

 

Bursac Z, Gauss CH, Williams DK, Hosmer DW. (2008). Purposeful Selection of Variables in Logistic Regression. Source Code for Biology and Medicine, 3:17, 1-8.

 


 

Bursac Z, Gauss CH, Williams DK, Hosmer DW. (2008). Purposeful Selection of Variables in Logistic Regression: Macro and Simulation Results. ASA Proceedings of the Joint Statistical Meetings, Statistical Computing Section, Alexandria, VA: 1886-1891.

 


 

Bursac Z, Gauss CH, Williams DK, Hosmer DW. (2007). A Purposeful Selection of Variables Macro for Logistic Regression. SAS Global Forum Proceedings, Paper 173: 1-5.

 

http://www2.sas.com/proceedings/forum2007/173-2007.pdf