Sheet1
Page 1 | ||||||
Regression # 1 with Demand vs. price, advertising, Income | ||||||
Demand | Price | Advertising | Income | |||
42100 | 11.77 | 46100 | 38000 | |||
55500 | 9.96 | 47200 | 39100 | |||
71100 | 12.36 | 60900 | 40100 | |||
63200 | 12.49 | 55600 | 44200 | |||
77200 | 10.68 | 64400 | 41800 | |||
70900 | 12.07 | 60700 | 44800 | |||
55600 | 11.97 | 52100 | 39900 | |||
70700 | 11.23 | 57900 | 43600 | |||
71400 | 11.26 | 55600 | 41700 | |||
79400 | 9.79 | 60100 | 41200 | |||
60600 | 12.29 | 50700 | 44000 | |||
50800 | 12.7 | 46500 | 43300 | |||
61800 | 12.33 | 58600 | 41000 | |||
40500 | 10.88 | 42800 | 38300 | |||
85300 | 10.14 | 64800 | 42100 | |||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.9745062006 | |||||
R Square | 0.9496623349 | |||||
Adjusted R Square | 0.9359338808 | |||||
Standard Error | 3333.092314398 | |||||
Observations | 15 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 3 | 2305491451.86071 | 768497150.620236 | 69.1747466484 | 0.0000002002 | |
Residual | 11 | 122204548.139292 | 11109504.3762993 | |||
Total | 14 | 2427696000 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | -33301.7280624811 | 17898.4143542384 | -1.8605965536 | 0.0897240189 | -72695.8923688769 | 6092.4362439147 |
Price | -4041.5338282552 | 1040.6404220588 | -3.883698675 | 0.0025470799 | -6331.9691125829 | -1751.0985439276 |
Advertising | 1.4543896909 | 0.1516168655 | 9.5925323722 | 0.0000011186 | 1.120683051 | 1.7880963308 |
Income | 1.527891689 | 0.5128017369 | 2.9794978821 | 0.0125307541 | 0.3992221052 | 2.6565612727 |
Regression # 2 with Demand vs. price | ||||||
Demand | Price | |||||
42100 | 11.77 | |||||
55500 | 9.96 | |||||
71100 | 12.36 | |||||
63200 | 12.49 | |||||
77200 | 10.68 | |||||
70900 | 12.07 | |||||
55600 | 11.97 | |||||
70700 | 11.23 | |||||
71400 | 11.26 | Page 2 | ||||
79400 | 9.79 | |||||
60600 | 12.29 | |||||
50800 | 12.7 | |||||
61800 | 12.33 | |||||
40500 | 10.88 | |||||
85300 | 10.14 | |||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.350203681 | |||||
R Square | 0.1226426182 | |||||
Adjusted R Square | 0.0551535888 | |||||
Standard Error | 12800.1111966511 | |||||
Observations | 15 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 1 | 297738993.593787 | 297738993.593787 | 1.8172230261 | 0.2006653343 | |
Residual | 13 | 2129957006.40621 | 163842846.646632 | |||
Total | 14 | 2427696000 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 117762.55345144 | 40210.8157751149 | 2.9286288075 | 0.0117442613 | 30892.3841521681 | 204632.722750712 |
Price | -4713.4615040228 | 3496.5186485085 | -1.3480441484 | 0.2006653343 | -12267.2293423258 | 2840.3063342802 |
Regression # 3 with Demand vs. advertising, Income | ||||||
Demand | Advertising | Income | ||||
42100 | 46100 | 38000 | ||||
55500 | 47200 | 39100 | ||||
71100 | 60900 | 40100 | ||||
63200 | 55600 | 44200 | ||||
77200 | 64400 | 41800 | ||||
70900 | 60700 | 44800 | ||||
55600 | 52100 | 39900 | ||||
70700 | 57900 | 43600 | ||||
71400 | 55600 | 41700 | ||||
79400 | 60100 | 41200 | ||||
60600 | 50700 | 44000 | ||||
50800 | 46500 | 43300 | ||||
61800 | 58600 | 41000 | ||||
40500 | 42800 | 38300 | ||||
85300 | 64800 | 42100 | ||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.9384240597 | |||||
R Square | 0.8806397157 | |||||
Adjusted R Square | 0.860746335 | |||||
Standard Error | 4914.0146915052 | |||||
Observations | 15 | Page 3 | ||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 2 | 2137925515.34005 | 1068962757.67003 | 44.2679767993 | 0.0000028917 | |
Residual | 12 | 289770484.659947 | 24147540.388329 | |||
Total | 14 | 2427696000 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | -53658.5380752887 | 25230.920508409 | -2.1266975994 | 0.0548767297 | -108631.990443247 | 1314.9142926693 |
Advertising | 1.6733636428 | 0.2074992554 | 8.0644320378 | 0.0000034631 | 1.2212616108 | 2.1254656748 |
Income | 0.6132666177 | 0.671586564 | 0.9131609394 | 0.3791392279 | -0.8499947791 | 2.0765280145 |
Regression # 4 with Demand vs. price, advertising | ||||||
Demand | Price | Advertising | ||||
42100 | 11.77 | 46100 | ||||
55500 | 9.96 | 47200 | ||||
71100 | 12.36 | 60900 | ||||
63200 | 12.49 | 55600 | ||||
77200 | 10.68 | 64400 | ||||
70900 | 12.07 | 60700 | ||||
55600 | 11.97 | 52100 | ||||
70700 | 11.23 | 57900 | ||||
71400 | 11.26 | 55600 | ||||
79400 | 9.79 | 60100 | ||||
60600 | 12.29 | 50700 | ||||
50800 | 12.7 | 46500 | ||||
61800 | 12.33 | 58600 | ||||
40500 | 10.88 | 42800 | ||||
85300 | 10.14 | 64800 | ||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.9534348294 | |||||
R Square | 0.9090379738 | |||||
Adjusted R Square | 0.8938776362 | |||||
Standard Error | 4289.7955181119 | |||||
Observations | 15 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 2 | 2206867852.95345 | 1103433926.47672 | 59.9615913769 | 0.0000005664 | |
Residual | 12 | 220828147.046554 | 18402345.5872128 | |||
Total | 14 | 2427696000 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 817.610310666 | 17704.0238072218 | 0.0461821742 | 0.9639247114 | -37756.1432376096 | 39391.3638589416 |
Price | -2617.5932710115 | 1189.7435071254 | -2.200132428 | 0.0481253769 | -5209.8216440211 | -25.364898002 |
Advertising | 1.6915685443 | 0.1660766479 | 10.1854689741 | 0.0000002935 | 1.3297186194 | 2.0534184692 |
Page 4 | ||||||
Month | Ads Cost | # of Ads | Sales | |||
1 | 13.9 | 12 | 43.6 | |||
2 | 12 | 11 | 38 | |||
3 | 9.3 | 9 | 30.1 | |||
4 | 9.7 | 7 | 35.3 | |||
5 | 12.3 | 12 | 46.4 | |||
6 | 11.4 | 8 | 34.2 | |||
7 | 9.3 | 6 | 30.2 | |||
8 | 14.3 | 13 | 40.7 | |||
9 | 10.2 | 8 | 38.5 | |||
10 | 8.4 | 6 | 22.6 | |||
11 | 11.2 | 8 | 37.6 | |||
12 | 11.1 | 10 | 35.2 | |||
Cost and Sales Data in thousands. | ||||||
Regression Run # 1 Sales vs. # of Ads | ||||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.7808328324 | |||||
R Square | 0.6096999122 | |||||
Adjusted R Square | 0.5706699034 | |||||
Standard Error | 4.2056960876 | |||||
Observations | 12 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 1 | 276.3078708551 | 276.3078708551 | 15.6213111689 | 0.0027197818 | |
Residual | 10 | 176.8787958115 | 17.6878795812 | |||
Total | 11 | 453.1866666667 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 16.9369109948 | 4.9818265158 | 3.3997392204 | 0.00677404 | 5.8367078603 | 28.0371141293 |
# of Ads | 2.0832460733 | 0.5270864238 | 3.9523804433 | 0.0027197818 | 0.9088241309 | 3.2576680157 |
Regression Run # 2 Sales vs. Cost of Ads | ||||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.8204249527 | |||||
R Square | 0.6730971031 | |||||
Adjusted R Square | 0.6404068134 | |||||
Standard Error | 3.849000314 | |||||
Observations | 12 | |||||
Page 5 | ||||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 1 | 305.038632494 | 305.038632494 | 20.590123534 | 0.0010786286 | |
Residual | 10 | 148.1480341726 | 14.8148034173 | |||
Total | 11 | 453.1866666667 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 4.1727002232 | 7.1087903141 | 0.5869775361 | 0.5702325128 | -11.6666744073 | 20.0120748536 |
Ads Cost | 2.8724838266 | 0.6330355736 | 4.537634134 | 0.0010786286 | 1.4619924264 | 4.2829752268 |
Regression Run # 3 Sales vs. Cost of Ads and # of Ads | ||||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.8270511549 | |||||
R Square | 0.6840136128 | |||||
Adjusted R Square | 0.6137944156 | |||||
Standard Error | 3.9888848001 | |||||
Observations | 12 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 2 | 309.9858491336 | 154.9929245668 | 9.7411198143 | 0.0056041233 | |
Residual | 9 | 143.200817533 | 15.9112019481 | |||
Total | 11 | 453.1866666667 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 6.5835783983 | 8.5421553852 | 0.7707163007 | 0.4606279385 | -12.7401343195 | 25.907291116 |
Ads Cost | 2.1388637996 | 1.4701498969 | 1.4548610343 | 0.1796798793 | -1.1868488547 | 5.4645764539 |
# of Ads | 0.6246753408 | 1.1202769287 | 0.5576079671 | 0.5907093723 | -1.9095690692 | 3.1589197509 |
Analysis of Residuals (Autocorrelation) | ||||||
Sex | Months Employed | Base Salary | Months Employed | Gender | ||
Men | 6 | 7.5 | 6 | 0 | ||
Men | 10 | 8.6 | 10 | 0 | ||
Men | 12 | 9.1 | 12 | 0 | ||
Men | 18 | 10.3 | 18 | 0 | ||
Men | 30 | 13 | 30 | 0 | ||
Women | 5 | 6.2 | 5 | 1 | ||
Women | 13 | 8.7 | 13 | 1 | ||
Women | 15 | 9.4 | 15 | 1 | ||
Women | 21 | 9.8 | 21 | 1 | ||
Page 6 | ||||||
Regression of Salary vs. Months Employed | ||||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.9622940283 | |||||
R Square | 0.926009797 | |||||
Adjusted R Square | 0.915439768 | |||||
Standard Error | 0.5493933476 | |||||
Observations | 9 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 1 | 26.4427242027 | 26.4427242027 | 87.6071197878 | 0.0000330326 | |
Residual | 7 | 2.1128313528 | 0.3018330504 | |||
Total | 8 | 28.5555555556 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 5.8092778793 | 0.4038020095 | 14.3864511385 | 0.000001867 | 4.8544385379 | 6.7641172208 |
Months Employed | 0.2332038391 | 0.0249152928 | 9.3598675091 | 0.0000330326 | 0.1742885758 | 0.2921191025 |
RESIDUAL OUTPUT | ||||||
Observation | Predicted Base Salary | Residuals | ||||
1 | 7.2085009141 | 0.2914990859 | ||||
2 | 8.1413162706 | 0.4586837294 | ||||
3 | 8.6077239488 | 0.4922760512 | ||||
4 | 10.0069469835 | 0.2930530165 | ||||
5 | 12.805393053 | 0.194606947 | ||||
6 | 6.975297075 | -0.775297075 | ||||
7 | 8.8409277879 | -0.1409277879 | ||||
8 | 9.3073354662 | 0.0926645338 | ||||
9 | 10.7065585009 | -0.9065585009 | ||||
Note that the residuals are not random. Most of these are negative for Women | ||||||
and mostly positive for men. This is perhaps the most important item to | ||||||
analyze in a regression. If the model is good then the residuals would | ||||||
be scattered randomly around the regression line. This causes us to | ||||||
conclude that something else is going on which must be analyzed. | ||||||
The obvious thing is to test for possible discrimination. | ||||||
To analyze this we will add a variable called Gender and give men a value of 0 | ||||||
and Women a value of 1. We will now do a multiple regression with both | ||||||
gender and months employed as independent variables. | ||||||
Page 7 | ||||||
Regression of Salary vs. Months Employed and Gender | ||||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.9868191554 | |||||
R Square | 0.9738120456 | |||||
Adjusted R Square | 0.9650827274 | |||||
Standard Error | 0.3530372002 | |||||
Observations | 9 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 2 | 27.8077439674 | 13.9038719837 | 111.5564845782 | 0.0000179599 | |
Residual | 6 | 0.7478115882 | 0.1246352647 | |||
Total | 8 | 28.5555555556 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 6.2484785327 | 0.2914501108 | 21.4392731454 | 0.0000006718 | 5.5353252812 | 6.9616317843 |
Months Employed | 0.2270737807 | 0.016117228 | 14.0888855955 | 0.0000079815 | 0.1876363158 | 0.2665112457 |
Gender | -0.7889745727 | 0.2384042577 | -3.3093979969 | 0.0162168909 | -1.3723292028 | -0.2056199427 |
RESIDUAL OUTPUT | ||||||
Observation | Predicted Base Salary | Residuals | ||||
1 | 7.6109212172 | -0.1109212172 | ||||
2 | 8.5192163401 | 0.0807836599 | ||||
3 | 8.9733639016 | 0.1266360984 | ||||
4 | 10.3358065861 | -0.0358065861 | ||||
5 | 13.060691955 | -0.060691955 | ||||
6 | 6.5948728637 | -0.3948728637 | ||||
7 | 8.4114631096 | 0.2885368904 | ||||
8 | 8.8656106711 | 0.5343893289 | ||||
9 | 10.2280533556 | -0.4280533556 | ||||
Note that the regression equation now has | ||||||
* Better Coefficient of Correlation | ||||||
* Higher F Value | ||||||
* All Variables pass T Test meaning all are significant | ||||||
* Lower Standard Error | ||||||
* Residuals are random | ||||||
In every category tested it is a better model. | ||||||
Also note that Gender passed T Test. Our Null was no discrimination | ||||||
and we rejected this hypothesis. It means that we have proven | ||||||
discrimination from a legal perspective. | ||||||
Page 8 | ||||||
Nonlinear Regression | RESIDUAL OUTPUT | |||||
Batch Size | # Defect | Observation | Predicted # Defect | Residuals | ||
100 | 5 | 1 | -11.1875482377 | 16.1875482377 | ||
125 | 10 | 2 | -2.0092616414 | 12.0092616414 | ||
125 | 6 | 3 | -2.0092616414 | 8.0092616414 | ||
125 | 7 | 4 | -2.0092616414 | 9.0092616414 | ||
150 | 6 | 5 | 7.169024955 | -1.169024955 | ||
150 | 7 | 6 | 7.169024955 | -0.169024955 | ||
175 | 17 | 7 | 16.3473115513 | 0.6526884487 | ||
175 | 15 | 8 | 16.3473115513 | -1.3473115513 | ||
200 | 24 | 9 | 25.5255981477 | -1.5255981477 | ||
200 | 21 | 10 | 25.5255981477 | -4.5255981477 | ||
200 | 22 | 11 | 25.5255981477 | -3.5255981477 | ||
225 | 26 | 12 | 34.703884744 | -8.703884744 | ||
225 | 29 | 13 | 34.703884744 | -5.703884744 | ||
225 | 25 | 14 | 34.703884744 | -9.703884744 | ||
250 | 34 | 15 | 43.8821713404 | -9.8821713404 | ||
250 | 37 | 16 | 43.8821713404 | -6.8821713404 | ||
250 | 41 | 17 | 43.8821713404 | -2.8821713404 | ||
250 | 34 | 18 | 43.8821713404 | -9.8821713404 | ||
275 | 49 | 19 | 53.0604579367 | -4.0604579367 | ||
300 | 53 | 20 | 62.2387445331 | -9.2387445331 | ||
300 | 54 | 21 | 62.2387445331 | -8.2387445331 | ||
325 | 69 | 22 | 71.4170311294 | -2.4170311294 | ||
350 | 82 | 23 | 80.5953177258 | 1.4046822742 | ||
350 | 81 | 24 | 80.5953177258 | 0.4046822742 | ||
350 | 84 | 25 | 80.5953177258 | 3.4046822742 | ||
375 | 92 | 26 | 89.7736043221 | 2.2263956779 | ||
375 | 96 | 27 | 89.7736043221 | 6.2263956779 | ||
375 | 97 | 28 | 89.7736043221 | 7.2263956779 | ||
400 | 109 | 29 | 98.9518909184 | 10.0481090816 | ||
400 | 112 | 30 | 98.9518909184 | 13.0481090816 | ||
SUMMARY OUTPUT | ||||||
Note that residuals have positive signs | ||||||
Regression Statistics | followed by negative signs followed by | |||||
Multiple R | 0.9764238729 | positive signs. This is a clear sign that | ||||
R Square | 0.9534035796 | we are trying to fit a straight line to a set | ||||
Adjusted R Square | 0.9517394217 | data points which may be quadratic. | ||||
Standard Error | 7.5601037854 | We will add a new variable called Batch | ||||
Observations | 30 | squared and try a multiple regression. | ||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 1 | 32744.4552611268 | 32744.4552611268 | 572.9045280313 | 3.47597579694458E-20 | |
Residual | 28 | 1600.3447388732 | 57.1551692455 | |||
Total | 29 | 34344.8 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | -47.9006946231 | 4.111558341 | -11.6502529335 | 0 | -56.3228495503 | -39.4785396959 |
Batch Size | 0.3671314639 | 0.0153384148 | 23.9354241247 | 3.47597579694456E-20 | 0.3357121101 | 0.3985508176 |
RESIDUAL OUTPUT | Page 9 | |||||
# Defect | Batch Size | Batch Sq. | Observation | Predicted # Defect | Residuals | |
5 | 100 | 10000 | 1 | 4.3827161756 | 0.6172838244 | |
10 | 125 | 15625 | 2 | 6.7213072886 | 3.2786927114 | |
6 | 125 | 15625 | 3 | 6.7213072886 | -0.7213072886 | |
7 | 125 | 15625 | 4 | 6.7213072886 | 0.2786927114 | |
6 | 150 | 22500 | 5 | 10.2468217618 | -4.2468217618 | |
7 | 150 | 22500 | 6 | 10.2468217618 | -3.2468217618 | |
17 | 175 | 30625 | 7 | 14.959259595 | 2.040740405 | |
15 | 175 | 30625 | 8 | 14.959259595 | 0.040740405 | |
24 | 200 | 40000 | 9 | 20.8586207885 | 3.1413792115 | |
21 | 200 | 40000 | 10 | 20.8586207885 | 0.1413792115 | |
22 | 200 | 40000 | 11 | 20.8586207885 | 1.1413792115 | |
26 | 225 | 50625 | 12 | 27.944905342 | -1.944905342 | |
29 | 225 | 50625 | 13 | 27.944905342 | 1.055094658 | |
25 | 225 | 50625 | 14 | 27.944905342 | -2.944905342 | |
34 | 250 | 62500 | 15 | 36.2181132557 | -2.2181132557 | |
37 | 250 | 62500 | 16 | 36.2181132557 | 0.7818867443 | |
41 | 250 | 62500 | 17 | 36.2181132557 | 4.7818867443 | |
34 | 250 | 62500 | 18 | 36.2181132557 | -2.2181132557 | |
49 | 275 | 75625 | 19 | 45.6782445296 | 3.3217554704 | |
53 | 300 | 90000 | 20 | 56.3252991635 | -3.3252991635 | |
54 | 300 | 90000 | 21 | 56.3252991635 | -2.3252991635 | |
69 | 325 | 105625 | 22 | 68.1592771577 | 0.8407228423 | |
82 | 350 | 122500 | 23 | 81.1801785119 | 0.8198214881 | |
81 | 350 | 122500 | 24 | 81.1801785119 | -0.1801785119 | |
84 | 350 | 122500 | 25 | 81.1801785119 | 2.8198214881 | |
92 | 375 | 140625 | 26 | 95.3880032263 | -3.3880032263 | |
96 | 375 | 140625 | 27 | 95.3880032263 | 0.6119967737 | |
97 | 375 | 140625 | 28 | 95.3880032263 | 1.6119967737 | |
109 | 400 | 160000 | 29 | 110.7827513008 | -1.7827513008 | |
112 | 400 | 160000 | 30 | 110.7827513008 | 1.2172486992 | |
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.9976895195 | |||||
R Square | 0.9953843772 | |||||
Adjusted R Square | 0.9950424793 | |||||
Standard Error | 2.4230577581 | |||||
Observations | 30 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 2 | 34186.2773597291 | 17093.1386798645 | 2911.3490891121 | 2.93162976922614E-32 | |
Residual | 27 | 158.5226402709 | 5.8712088989 | |||
Total | 29 | 34344.8 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 6.8975853249 | 3.7368943622 | 1.8458068803 | 0.075912814 | -0.7698834287 | 14.5650540785 |
Batch Size | -0.1201025603 | 0.0314780261 | -3.8154412804 | 0.0007196857 | -0.1846900916 | -0.055515029 |
Batch Sq. | 0.0009495387 | 0.0000605927 | 15.670832634 | 0 | 0.0008252127 | 0.0010738646 |
Sheet1
1 |
Quantity
Predicted Quantity
Sheet1
Page 1 | ||||||
Regression # 1 with Demand vs. price, advertising, Income | ||||||
Demand | Price | Advertising | Income | |||
42100 | 11.77 | 46100 | 38000 | |||
55500 | 9.96 | 47200 | 39100 | |||
71100 | 12.36 | 60900 | 40100 | |||
63200 | 12.49 | 55600 | 44200 | |||
77200 | 10.68 | 64400 | 41800 | |||
70900 | 12.07 | 60700 | 44800 | |||
55600 | 11.97 | 52100 | 39900 | |||
70700 | 11.23 | 57900 | 43600 | |||
71400 | 11.26 | 55600 | 41700 | |||
79400 | 9.79 | 60100 | 41200 | |||
60600 | 12.29 | 50700 | 44000 | |||
50800 | 12.7 | 46500 | 43300 | |||
61800 | 12.33 | 58600 | 41000 | |||
40500 | 10.88 | 42800 | 38300 | |||
85300 | 10.14 | 64800 | 42100 | |||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.9745062006 | |||||
R Square | 0.9496623349 | |||||
Adjusted R Square | 0.9359338808 | |||||
Standard Error | 3333.092314398 | |||||
Observations | 15 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 3 | 2305491451.86071 | 768497150.620236 | 69.1747466484 | 0.0000002002 | |
Residual | 11 | 122204548.139292 | 11109504.3762993 | |||
Total | 14 | 2427696000 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | -33301.7280624811 | 17898.4143542384 | -1.8605965536 | 0.0897240189 | -72695.8923688769 | 6092.4362439147 |
Price | -4041.5338282552 | 1040.6404220588 | -3.883698675 | 0.0025470799 | -6331.9691125829 | -1751.0985439276 |
Advertising | 1.4543896909 | 0.1516168655 | 9.5925323722 | 0.0000011186 | 1.120683051 | 1.7880963308 |
Income | 1.527891689 | 0.5128017369 | 2.9794978821 | 0.0125307541 | 0.3992221052 | 2.6565612727 |
Regression # 2 with Demand vs. price | ||||||
Demand | Price | |||||
42100 | 11.77 | |||||
55500 | 9.96 | |||||
71100 | 12.36 | |||||
63200 | 12.49 | |||||
77200 | 10.68 | |||||
70900 | 12.07 | |||||
55600 | 11.97 | |||||
70700 | 11.23 | |||||
71400 | 11.26 | Page 2 | ||||
79400 | 9.79 | |||||
60600 | 12.29 | |||||
50800 | 12.7 | |||||
61800 | 12.33 | |||||
40500 | 10.88 | |||||
85300 | 10.14 | |||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.350203681 | |||||
R Square | 0.1226426182 | |||||
Adjusted R Square | 0.0551535888 | |||||
Standard Error | 12800.1111966511 | |||||
Observations | 15 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 1 | 297738993.593787 | 297738993.593787 | 1.8172230261 | 0.2006653343 | |
Residual | 13 | 2129957006.40621 | 163842846.646632 | |||
Total | 14 | 2427696000 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 117762.55345144 | 40210.8157751149 | 2.9286288075 | 0.0117442613 | 30892.3841521681 | 204632.722750712 |
Price | -4713.4615040228 | 3496.5186485085 | -1.3480441484 | 0.2006653343 | -12267.2293423258 | 2840.3063342802 |
Regression # 3 with Demand vs. advertising, Income | ||||||
Demand | Advertising | Income | ||||
42100 | 46100 | 38000 | ||||
55500 | 47200 | 39100 | ||||
71100 | 60900 | 40100 | ||||
63200 | 55600 | 44200 | ||||
77200 | 64400 | 41800 | ||||
70900 | 60700 | 44800 | ||||
55600 | 52100 | 39900 | ||||
70700 | 57900 | 43600 | ||||
71400 | 55600 | 41700 | ||||
79400 | 60100 | 41200 | ||||
60600 | 50700 | 44000 | ||||
50800 | 46500 | 43300 | ||||
61800 | 58600 | 41000 | ||||
40500 | 42800 | 38300 | ||||
85300 | 64800 | 42100 | ||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.9384240597 | |||||
R Square | 0.8806397157 | |||||
Adjusted R Square | 0.860746335 | |||||
Standard Error | 4914.0146915052 | |||||
Observations | 15 | Page 3 | ||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 2 | 2137925515.34005 | 1068962757.67003 | 44.2679767993 | 0.0000028917 | |
Residual | 12 | 289770484.659947 | 24147540.388329 | |||
Total | 14 | 2427696000 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | -53658.5380752887 | 25230.920508409 | -2.1266975994 | 0.0548767297 | -108631.990443247 | 1314.9142926693 |
Advertising | 1.6733636428 | 0.2074992554 | 8.0644320378 | 0.0000034631 | 1.2212616108 | 2.1254656748 |
Income | 0.6132666177 | 0.671586564 | 0.9131609394 | 0.3791392279 | -0.8499947791 | 2.0765280145 |
Regression # 4 with Demand vs. price, advertising | ||||||
Demand | Price | Advertising | ||||
42100 | 11.77 | 46100 | ||||
55500 | 9.96 | 47200 | ||||
71100 | 12.36 | 60900 | ||||
63200 | 12.49 | 55600 | ||||
77200 | 10.68 | 64400 | ||||
70900 | 12.07 | 60700 | ||||
55600 | 11.97 | 52100 | ||||
70700 | 11.23 | 57900 | ||||
71400 | 11.26 | 55600 | ||||
79400 | 9.79 | 60100 | ||||
60600 | 12.29 | 50700 | ||||
50800 | 12.7 | 46500 | ||||
61800 | 12.33 | 58600 | ||||
40500 | 10.88 | 42800 | ||||
85300 | 10.14 | 64800 | ||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.9534348294 | |||||
R Square | 0.9090379738 | |||||
Adjusted R Square | 0.8938776362 | |||||
Standard Error | 4289.7955181119 | |||||
Observations | 15 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 2 | 2206867852.95345 | 1103433926.47672 | 59.9615913769 | 0.0000005664 | |
Residual | 12 | 220828147.046554 | 18402345.5872128 | |||
Total | 14 | 2427696000 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 817.610310666 | 17704.0238072218 | 0.0461821742 | 0.9639247114 | -37756.1432376096 | 39391.3638589416 |
Price | -2617.5932710115 | 1189.7435071254 | -2.200132428 | 0.0481253769 | -5209.8216440211 | -25.364898002 |
Advertising | 1.6915685443 | 0.1660766479 | 10.1854689741 | 0.0000002935 | 1.3297186194 | 2.0534184692 |
Page 4 | ||||||
Month | Ads Cost | # of Ads | Sales | |||
1 | 13.9 | 12 | 43.6 | |||
2 | 12 | 11 | 38 | |||
3 | 9.3 | 9 | 30.1 | |||
4 | 9.7 | 7 | 35.3 | |||
5 | 12.3 | 12 | 46.4 | |||
6 | 11.4 | 8 | 34.2 | |||
7 | 9.3 | 6 | 30.2 | |||
8 | 14.3 | 13 | 40.7 | |||
9 | 10.2 | 8 | 38.5 | |||
10 | 8.4 | 6 | 22.6 | |||
11 | 11.2 | 8 | 37.6 | |||
12 | 11.1 | 10 | 35.2 | |||
Cost and Sales Data in thousands. | ||||||
Regression Run # 1 Sales vs. # of Ads | ||||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.7808328324 | |||||
R Square | 0.6096999122 | |||||
Adjusted R Square | 0.5706699034 | |||||
Standard Error | 4.2056960876 | |||||
Observations | 12 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 1 | 276.3078708551 | 276.3078708551 | 15.6213111689 | 0.0027197818 | |
Residual | 10 | 176.8787958115 | 17.6878795812 | |||
Total | 11 | 453.1866666667 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 16.9369109948 | 4.9818265158 | 3.3997392204 | 0.00677404 | 5.8367078603 | 28.0371141293 |
# of Ads | 2.0832460733 | 0.5270864238 | 3.9523804433 | 0.0027197818 | 0.9088241309 | 3.2576680157 |
Regression Run # 2 Sales vs. Cost of Ads | ||||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.8204249527 | |||||
R Square | 0.6730971031 | |||||
Adjusted R Square | 0.6404068134 | |||||
Standard Error | 3.849000314 | |||||
Observations | 12 | |||||
Page 5 | ||||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 1 | 305.038632494 | 305.038632494 | 20.590123534 | 0.0010786286 | |
Residual | 10 | 148.1480341726 | 14.8148034173 | |||
Total | 11 | 453.1866666667 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 4.1727002232 | 7.1087903141 | 0.5869775361 | 0.5702325128 | -11.6666744073 | 20.0120748536 |
Ads Cost | 2.8724838266 | 0.6330355736 | 4.537634134 | 0.0010786286 | 1.4619924264 | 4.2829752268 |
Regression Run # 3 Sales vs. Cost of Ads and # of Ads | ||||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.8270511549 | |||||
R Square | 0.6840136128 | |||||
Adjusted R Square | 0.6137944156 | |||||
Standard Error | 3.9888848001 | |||||
Observations | 12 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 2 | 309.9858491336 | 154.9929245668 | 9.7411198143 | 0.0056041233 | |
Residual | 9 | 143.200817533 | 15.9112019481 | |||
Total | 11 | 453.1866666667 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 6.5835783983 | 8.5421553852 | 0.7707163007 | 0.4606279385 | -12.7401343195 | 25.907291116 |
Ads Cost | 2.1388637996 | 1.4701498969 | 1.4548610343 | 0.1796798793 | -1.1868488547 | 5.4645764539 |
# of Ads | 0.6246753408 | 1.1202769287 | 0.5576079671 | 0.5907093723 | -1.9095690692 | 3.1589197509 |
Analysis of Residuals (Autocorrelation) | ||||||
Sex | Months Employed | Base Salary | Months Employed | Gender | ||
Men | 6 | 7.5 | 6 | 0 | ||
Men | 10 | 8.6 | 10 | 0 | ||
Men | 12 | 9.1 | 12 | 0 | ||
Men | 18 | 10.3 | 18 | 0 | ||
Men | 30 | 13 | 30 | 0 | ||
Women | 5 | 6.2 | 5 | 1 | ||
Women | 13 | 8.7 | 13 | 1 | ||
Women | 15 | 9.4 | 15 | 1 | ||
Women | 21 | 9.8 | 21 | 1 | ||
Page 6 | ||||||
Regression of Salary vs. Months Employed | ||||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.9622940283 | |||||
R Square | 0.926009797 | |||||
Adjusted R Square | 0.915439768 | |||||
Standard Error | 0.5493933476 | |||||
Observations | 9 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 1 | 26.4427242027 | 26.4427242027 | 87.6071197878 | 0.0000330326 | |
Residual | 7 | 2.1128313528 | 0.3018330504 | |||
Total | 8 | 28.5555555556 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 5.8092778793 | 0.4038020095 | 14.3864511385 | 0.000001867 | 4.8544385379 | 6.7641172208 |
Months Employed | 0.2332038391 | 0.0249152928 | 9.3598675091 | 0.0000330326 | 0.1742885758 | 0.2921191025 |
RESIDUAL OUTPUT | ||||||
Observation | Predicted Base Salary | Residuals | ||||
1 | 7.2085009141 | 0.2914990859 | ||||
2 | 8.1413162706 | 0.4586837294 | ||||
3 | 8.6077239488 | 0.4922760512 | ||||
4 | 10.0069469835 | 0.2930530165 | ||||
5 | 12.805393053 | 0.194606947 | ||||
6 | 6.975297075 | -0.775297075 | ||||
7 | 8.8409277879 | -0.1409277879 | ||||
8 | 9.3073354662 | 0.0926645338 | ||||
9 | 10.7065585009 | -0.9065585009 | ||||
Note that the residuals are not random. Most of these are negative for Women | ||||||
and mostly positive for men. This is perhaps the most important item to | ||||||
analyze in a regression. If the model is good then the residuals would | ||||||
be scattered randomly around the regression line. This causes us to | ||||||
conclude that something else is going on which must be analyzed. | ||||||
The obvious thing is to test for possible discrimination. | ||||||
To analyze this we will add a variable called Gender and give men a value of 0 | ||||||
and Women a value of 1. We will now do a multiple regression with both | ||||||
gender and months employed as independent variables. | ||||||
Page 7 | ||||||
Regression of Salary vs. Months Employed and Gender | ||||||
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.9868191554 | |||||
R Square | 0.9738120456 | |||||
Adjusted R Square | 0.9650827274 | |||||
Standard Error | 0.3530372002 | |||||
Observations | 9 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 2 | 27.8077439674 | 13.9038719837 | 111.5564845782 | 0.0000179599 | |
Residual | 6 | 0.7478115882 | 0.1246352647 | |||
Total | 8 | 28.5555555556 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 6.2484785327 | 0.2914501108 | 21.4392731454 | 0.0000006718 | 5.5353252812 | 6.9616317843 |
Months Employed | 0.2270737807 | 0.016117228 | 14.0888855955 | 0.0000079815 | 0.1876363158 | 0.2665112457 |
Gender | -0.7889745727 | 0.2384042577 | -3.3093979969 | 0.0162168909 | -1.3723292028 | -0.2056199427 |
RESIDUAL OUTPUT | ||||||
Observation | Predicted Base Salary | Residuals | ||||
1 | 7.6109212172 | -0.1109212172 | ||||
2 | 8.5192163401 | 0.0807836599 | ||||
3 | 8.9733639016 | 0.1266360984 | ||||
4 | 10.3358065861 | -0.0358065861 | ||||
5 | 13.060691955 | -0.060691955 | ||||
6 | 6.5948728637 | -0.3948728637 | ||||
7 | 8.4114631096 | 0.2885368904 | ||||
8 | 8.8656106711 | 0.5343893289 | ||||
9 | 10.2280533556 | -0.4280533556 | ||||
Note that the regression equation now has | ||||||
* Better Coefficient of Correlation | ||||||
* Higher F Value | ||||||
* All Variables pass T Test meaning all are significant | ||||||
* Lower Standard Error | ||||||
* Residuals are random | ||||||
In every category tested it is a better model. | ||||||
Also note that Gender passed T Test. Our Null was no discrimination | ||||||
and we rejected this hypothesis. It means that we have proven | ||||||
discrimination from a legal perspective. | ||||||
Page 8 | ||||||
Nonlinear Regression | RESIDUAL OUTPUT | |||||
Batch Size | # Defect | Observation | Predicted # Defect | Residuals | ||
100 | 5 | 1 | -11.1875482377 | 16.1875482377 | ||
125 | 10 | 2 | -2.0092616414 | 12.0092616414 | ||
125 | 6 | 3 | -2.0092616414 | 8.0092616414 | ||
125 | 7 | 4 | -2.0092616414 | 9.0092616414 | ||
150 | 6 | 5 | 7.169024955 | -1.169024955 | ||
150 | 7 | 6 | 7.169024955 | -0.169024955 | ||
175 | 17 | 7 | 16.3473115513 | 0.6526884487 | ||
175 | 15 | 8 | 16.3473115513 | -1.3473115513 | ||
200 | 24 | 9 | 25.5255981477 | -1.5255981477 | ||
200 | 21 | 10 | 25.5255981477 | -4.5255981477 | ||
200 | 22 | 11 | 25.5255981477 | -3.5255981477 | ||
225 | 26 | 12 | 34.703884744 | -8.703884744 | ||
225 | 29 | 13 | 34.703884744 | -5.703884744 | ||
225 | 25 | 14 | 34.703884744 | -9.703884744 | ||
250 | 34 | 15 | 43.8821713404 | -9.8821713404 | ||
250 | 37 | 16 | 43.8821713404 | -6.8821713404 | ||
250 | 41 | 17 | 43.8821713404 | -2.8821713404 | ||
250 | 34 | 18 | 43.8821713404 | -9.8821713404 | ||
275 | 49 | 19 | 53.0604579367 | -4.0604579367 | ||
300 | 53 | 20 | 62.2387445331 | -9.2387445331 | ||
300 | 54 | 21 | 62.2387445331 | -8.2387445331 | ||
325 | 69 | 22 | 71.4170311294 | -2.4170311294 | ||
350 | 82 | 23 | 80.5953177258 | 1.4046822742 | ||
350 | 81 | 24 | 80.5953177258 | 0.4046822742 | ||
350 | 84 | 25 | 80.5953177258 | 3.4046822742 | ||
375 | 92 | 26 | 89.7736043221 | 2.2263956779 | ||
375 | 96 | 27 | 89.7736043221 | 6.2263956779 | ||
375 | 97 | 28 | 89.7736043221 | 7.2263956779 | ||
400 | 109 | 29 | 98.9518909184 | 10.0481090816 | ||
400 | 112 | 30 | 98.9518909184 | 13.0481090816 | ||
SUMMARY OUTPUT | ||||||
Note that residuals have positive signs | ||||||
Regression Statistics | followed by negative signs followed by | |||||
Multiple R | 0.9764238729 | positive signs. This is a clear sign that | ||||
R Square | 0.9534035796 | we are trying to fit a straight line to a set | ||||
Adjusted R Square | 0.9517394217 | data points which may be quadratic. | ||||
Standard Error | 7.5601037854 | We will add a new variable called Batch | ||||
Observations | 30 | squared and try a multiple regression. | ||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 1 | 32744.4552611268 | 32744.4552611268 | 572.9045280313 | 3.47597579694458E-20 | |
Residual | 28 | 1600.3447388732 | 57.1551692455 | |||
Total | 29 | 34344.8 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | -47.9006946231 | 4.111558341 | -11.6502529335 | 0 | -56.3228495503 | -39.4785396959 |
Batch Size | 0.3671314639 | 0.0153384148 | 23.9354241247 | 3.47597579694456E-20 | 0.3357121101 | 0.3985508176 |
RESIDUAL OUTPUT | Page 9 | |||||
# Defect | Batch Size | Batch Sq. | Observation | Predicted # Defect | Residuals | |
5 | 100 | 10000 | 1 | 4.3827161756 | 0.6172838244 | |
10 | 125 | 15625 | 2 | 6.7213072886 | 3.2786927114 | |
6 | 125 | 15625 | 3 | 6.7213072886 | -0.7213072886 | |
7 | 125 | 15625 | 4 | 6.7213072886 | 0.2786927114 | |
6 | 150 | 22500 | 5 | 10.2468217618 | -4.2468217618 | |
7 | 150 | 22500 | 6 | 10.2468217618 | -3.2468217618 | |
17 | 175 | 30625 | 7 | 14.959259595 | 2.040740405 | |
15 | 175 | 30625 | 8 | 14.959259595 | 0.040740405 | |
24 | 200 | 40000 | 9 | 20.8586207885 | 3.1413792115 | |
21 | 200 | 40000 | 10 | 20.8586207885 | 0.1413792115 | |
22 | 200 | 40000 | 11 | 20.8586207885 | 1.1413792115 | |
26 | 225 | 50625 | 12 | 27.944905342 | -1.944905342 | |
29 | 225 | 50625 | 13 | 27.944905342 | 1.055094658 | |
25 | 225 | 50625 | 14 | 27.944905342 | -2.944905342 | |
34 | 250 | 62500 | 15 | 36.2181132557 | -2.2181132557 | |
37 | 250 | 62500 | 16 | 36.2181132557 | 0.7818867443 | |
41 | 250 | 62500 | 17 | 36.2181132557 | 4.7818867443 | |
34 | 250 | 62500 | 18 | 36.2181132557 | -2.2181132557 | |
49 | 275 | 75625 | 19 | 45.6782445296 | 3.3217554704 | |
53 | 300 | 90000 | 20 | 56.3252991635 | -3.3252991635 | |
54 | 300 | 90000 | 21 | 56.3252991635 | -2.3252991635 | |
69 | 325 | 105625 | 22 | 68.1592771577 | 0.8407228423 | |
82 | 350 | 122500 | 23 | 81.1801785119 | 0.8198214881 | |
81 | 350 | 122500 | 24 | 81.1801785119 | -0.1801785119 | |
84 | 350 | 122500 | 25 | 81.1801785119 | 2.8198214881 | |
92 | 375 | 140625 | 26 | 95.3880032263 | -3.3880032263 | |
96 | 375 | 140625 | 27 | 95.3880032263 | 0.6119967737 | |
97 | 375 | 140625 | 28 | 95.3880032263 | 1.6119967737 | |
109 | 400 | 160000 | 29 | 110.7827513008 | -1.7827513008 | |
112 | 400 | 160000 | 30 | 110.7827513008 | 1.2172486992 | |
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.9976895195 | |||||
R Square | 0.9953843772 | |||||
Adjusted R Square | 0.9950424793 | |||||
Standard Error | 2.4230577581 | |||||
Observations | 30 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 2 | 34186.2773597291 | 17093.1386798645 | 2911.3490891121 | 2.93162976922614E-32 | |
Residual | 27 | 158.5226402709 | 5.8712088989 | |||
Total | 29 | 34344.8 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 6.8975853249 | 3.7368943622 | 1.8458068803 | 0.075912814 | -0.7698834287 | 14.5650540785 |
Batch Size | -0.1201025603 | 0.0314780261 | -3.8154412804 | 0.0007196857 | -0.1846900916 | -0.055515029 |
Batch Sq. | 0.0009495387 | 0.0000605927 | 15.670832634 | 0 | 0.0008252127 | 0.0010738646 |
Sheet1
1 |
Quantity
Predicted Quantity