I want to explore the relationship between width and length of shoes.


Link to data:
https://docs.google.com/spreadsheets/d/1-4FVia132mdr2GQmLz5RcaQRLgmY0N1k_Q5GYGHHRxM




Simple linear regression results:
Dependent Variable: Length
Independent Variable: Width
Length = 12.325859 + 1.5597885 Width
Sample size: 29
R (correlation coefficient) = 0.85807111
R-sq = 0.73628603
Estimate of error standard deviation: 1.2631656

Parameter estimates:
Parameter   Estimate     Std. Err.    Alternative   DF   T-Stat      P-value
Intercept   12.325859    1.8283346    ≠ 0           27   6.741577    <0.0001
Slope       1.5597885    0.17964996   ≠ 0           27   8.6823761   <0.0001

Analysis of variance table for regression model:
Source   DF   SS          MS          F-stat      P-value
Model    1    120.28121   120.28121   75.383655   <0.0001
Error    27   43.08086    1.5955874
Total    28   163.36207
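As a quick sanity check, the summary numbers in the regression output above can be recomputed from the ANOVA table. This is only a sketch in Python using the reported values; the variable names are my own and are not part of the original output.

```python
import math

# Values copied from the regression output above.
ss_model, ss_error, ss_total = 120.28121, 43.08086, 163.36207
df_model, df_error = 1, 27
slope, slope_se = 1.5597885, 0.17964996

ms_model = ss_model / df_model   # mean square for the model
ms_error = ss_error / df_error   # mean square for the error

r_sq = ss_model / ss_total       # share of variation in Length explained by Width
r = math.sqrt(r_sq)              # positive root, because the slope is positive
f_stat = ms_model / ms_error     # F statistic for the model
s = math.sqrt(ms_error)          # estimate of error standard deviation
t_slope = slope / slope_se       # t statistic for the slope

print(f"R-sq = {r_sq:.4f}, R = {r:.4f}")
print(f"F = {f_stat:.2f}, s = {s:.4f}, t(slope) = {t_slope:.3f}")
```

With a single predictor, the F statistic is the square of the slope's t statistic, which is why both have the same P-value.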



R (0.858) and R^2 (0.736) are moderately strong, which suggests a roughly linear relationship between width and length.
[external image]
[external image]
There is a strong positive association in both the scatter plot and the fitted line, and the residual plot has no glaringly obvious patterns.


[external image]


I am going to try to straighten the data and see if I can reduce the residuals.


Squaring the data resulted in this:


[external image]


[external image]


It looks essentially identical to the original.


Finding the log of the lengths gave me these results:
[external image]
[external image]


The log of the lengths gave the smallest residuals, so I chose it for straightening the data. The improvement is slight, though: R-sq only rises from 0.736 to 0.738.
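One caution when comparing re-expressions: residuals from a log model are in log units, so their size cannot be compared directly with residuals in the original units. R-sq has no units, so it is a fairer yardstick. Here is a minimal Python sketch of that comparison. The widths and lengths are made-up illustrative numbers, not my measured shoes, `fit_line` is a helper I wrote for this example, and I am assuming "squaring" means squaring the lengths and that the log is base 10.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of ys on xs; returns (intercept, slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared

# Made-up widths and lengths for illustration only, NOT the measured shoes.
widths = [8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5]
lengths = [25.0, 26.5, 27.0, 28.5, 28.0, 30.0, 30.5]

# Each re-expression changes only the response variable.
re_expressions = {
    "length": lengths,
    "length squared": [y ** 2 for y in lengths],
    "log10(length)": [math.log10(y) for y in lengths],
}

for name, ys in re_expressions.items():
    _, _, r_squared = fit_line(widths, ys)
    print(f"{name:>15}: R-sq = {r_squared:.4f}")
```

Swapping in the real widths and lengths from the spreadsheet would give the comparison for my data.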



Simple linear regression results:
Dependent Variable: log of length
Independent Variable: Width
log of length = 1.2091506 + 0.023538532 Width
Sample size: 29
R (correlation coefficient) = 0.85890519
R-sq = 0.73771813
Estimate of error standard deviation: 0.018991951


Parameter estimates:
Parameter   Estimate      Std. Err.      Alternative   DF   T-Stat      P-value
Intercept   1.2091506     0.027489381    ≠ 0           27   43.986098   <0.0001
Slope       0.023538532   0.0027010735   ≠ 0           27   8.7145099   <0.0001


Analysis of variance table for regression model:
Source   DF   SS             MS             F-stat      P-value
Model    1    0.027392086    0.027392086    75.942684   <0.0001
Error    27   0.0097387435   0.0003606942
Total    28   0.037130829
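To use the log model for a prediction, the fitted value has to be converted back out of log units. The sketch below does that in Python with the coefficients from the two regression outputs. It assumes the log is base 10, which seems to fit the size of the intercept but is my assumption. The function names and the example width of 10 are hypothetical and chosen only for illustration.

```python
# Coefficients copied from the two regression outputs.
LINEAR_INTERCEPT, LINEAR_SLOPE = 12.325859, 1.5597885
LOG_INTERCEPT, LOG_SLOPE = 1.2091506, 0.023538532

def predict_linear(width):
    """Predicted length from the original model: Length = a + b * Width."""
    return LINEAR_INTERCEPT + LINEAR_SLOPE * width

def predict_log(width):
    """Predicted length from the log model, converted back from log units.

    Assumes a base-10 log (an assumption, not stated in the output).
    """
    return 10 ** (LOG_INTERCEPT + LOG_SLOPE * width)

example_width = 10.0  # hypothetical width, same units as the measurements
print(f"linear model: {predict_linear(example_width):.2f}")
print(f"log model:    {predict_log(example_width):.2f}")
```

For a width in the middle of the range the two models should give very similar predictions, which matches how little R-sq changed.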






Bringing this back to my mini proposal, I’m not sure the data is strong enough to show a very strong correlation between the width of a shoe and its length. Other factors, such as style and brand, could affect the width of the shoe. I also think where on the shoe you measure would make a difference. I measured the widest part of the shoe, where the ball of your foot goes, but measuring the middle of the shoe might have brought more variety.



Here is my picture of all the shoes measured:
[external image]