Data mining is a procedure for finding relationships among a group of variables in a database. SAFE TOOLBOXES® comes with two data mining models: forward stepwise regression and backward stepwise regression. Both methods fit multivariate linear regression models repeatedly, but they differ in the model they start from and in how variables are included or excluded.
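Forward stepwise regression starts from an empty model and adds variables to it; a variable added at an earlier step can still be removed later. Backward stepwise regression starts from the complete model, which contains every candidate variable, and removes variables from it; a variable removed at an earlier step can still be added back later. The result of either process is a multivariate linear regression that contains the “best” set of explanatory variables according to the selection criterion, though not necessarily the best of all possible subsets.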
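The search described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the code that SAFE TOOLBOXES® runs: the selection criterion the add-in uses is not stated here, so the sketch assumes the Bayesian information criterion (BIC), and the names `solve`, `fit_ols`, `bic` and `stepwise` are hypothetical helpers. The data are synthetic, generated only for the demonstration.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.
    Assumes the system is non-singular (no perfectly collinear columns)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(columns, y):
    """Least-squares fit of y on an intercept plus the given columns.
    Returns (coefficients, residual sum of squares)."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[p] * r[q] for r in design) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * y[i] for i, r in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(b * v for b, v in zip(beta, r))) ** 2
              for i, r in enumerate(design))
    return beta, rss


def bic(data, names, target):
    """BIC of the linear model target ~ names (lower is better).
    Using BIC as the criterion is an assumption of this sketch."""
    y = data[target]
    n = len(y)
    _, rss = fit_ols([data[v] for v in names], y)
    return n * math.log(max(rss, 1e-12) / n) + (len(names) + 1) * math.log(n)


def stepwise(data, target, start_full=False):
    """Stepwise search: start empty (forward) or complete (backward), then
    repeatedly apply the single addition or removal that lowers BIC most."""
    candidates = [v for v in data if v != target]
    selected = list(candidates) if start_full else []
    best = bic(data, selected, target)
    improved = True
    while improved:
        improved = False
        trials = [selected + [c] for c in candidates if c not in selected]
        trials += [[s for s in selected if s != r] for r in selected]
        for trial in trials:
            score = bic(data, trial, target)
            if score < best - 1e-9:
                best, best_trial, improved = score, trial, True
        if improved:
            selected = best_trial
    return selected


# Synthetic demonstration data: y depends on x1 and x3 only.
random.seed(0)
rows = 200
data = {f"x{i}": [random.gauss(0, 1) for _ in range(rows)] for i in range(1, 6)}
data["y"] = [5 + 2 * a - 3 * b + random.gauss(0, 0.5)
             for a, b in zip(data["x1"], data["x3"])]

print("forward :", sorted(stepwise(data, "y")))
print("backward:", sorted(stepwise(data, "y", start_full=True)))
```

Both searches use the same loop and differ only in the starting model, which mirrors the distinction drawn above. Because each step only considers changing one variable, the two searches can end on different variable sets when the explanatory variables are strongly correlated.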
Now, let’s illustrate this tool with an example.
Suppose that you have the following database:
|     | A | B | C | D | E | F |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Y | X1 | X2 | X3 | X4 | X5 |
| 2 | 80.9 | 21.3 | 79.3 | 92.1 | 96.0 | 191.8 |
| 3 | 46.3 | 11.9 | 46.8 | 135.8 | 100.5 | 89.9 |
| 4 | 61.2 | 23.6 | 62.1 | 101.3 | 83.7 | 241.8 |
| 5 | 61.5 | 25.7 | 62.8 | 91.9 | 80.9 | 231.7 |
| 6 | 78.0 | 11.4 | 77.4 | 78.1 | 68.6 | 325.9 |
| 7 | 58.6 | 5.5 | 57.6 | 104.0 | 65.7 | 213.1 |
| 8 | 95.2 | 12.2 | 97.4 | 73.8 | 72.4 | 136.7 |
| 9 | 47.0 | 21.3 | 47.7 | 148.5 | 87.2 | 77.1 |
| 10 | 95.3 | 25.3 | 92.7 | 63.3 | 86.6 | 187.3 |
| 11 | 48.8 | 8.5 | 50.3 | 102.2 | 79.1 | 197.0 |
| 12 | 74.8 | 14.8 | 74.1 | 86.3 | 87.0 | 391.9 |
| 13 | 57.7 | 10.2 | 57.0 | 141.0 | 85.6 | 207.2 |
| 14 | 59.8 | 17.0 | 57.6 | 70.8 | 70.9 | 169.3 |
| ... | ... | ... | ... | ... | ... | ... |
| 288 | 59.2 | -0.7 | 60.7 | 99.3 | 44.9 | 240.0 |
| 289 | 65.2 | 13.8 | 65.8 | 118.7 | 90.0 | 198.9 |
| 290 | 62.0 | 16.4 | 64.6 | 143.7 | 59.6 | 187.7 |
| 291 | 92.8 | 26.7 | 91.7 | 115.8 | 62.9 | 230.1 |
| 292 | 88.9 | 3.6 | 88.5 | 73.7 | 100.8 | 182.2 |
| 293 | 63.7 | 1.6 | 62.7 | 27.4 | 89.1 | 78.3 |
| 294 | 73.3 | 18.7 | 72.3 | 121.3 | 71.1 | 92.6 |
| 295 | 37.7 | 7.0 | 38.0 | 82.0 | 90.1 | 135.8 |
| 296 | 56.3 | 16.5 | 57.9 | 92.0 | 101.0 | 86.2 |
| 297 | 97.9 | 3.5 | 97.4 | 66.5 | 94.1 | 203.6 |
| 298 | 51.1 | 8.5 | 51.5 | 61.0 | 79.9 | 249.9 |
| 299 | 37.2 | 27.3 | 37.7 | 87.4 | 76.7 | 218.4 |
| 300 | 60.5 | 21.7 | 60.4 | 167.9 | 105.6 | 214.5 |
| 301 | 44.9 | 16.6 | 44.3 | 68.2 | 98.8 | 178.2 |
| 302 |  |  |  |  |  |  |
If you want, for instance, to run a forward stepwise regression, follow these steps:
Examining the final equation model of the regression in tab , we will find: