I was thinking that, because it would be illegal for me to take signs away from their proper positions, I could collect data on the mass of a bottle and the max volume of water it can contain. I know this is not extraordinary. I would need a scale and a graduated cylinder, which I could perhaps borrow from Dr. Carson. I'll send her an email in advance.
I believe there should be a moderate positive association between mass and max volume of water. Bottles come in different heights and are made of different materials. I should need only Monday to collect all the data. I apologize for not getting this done sooner.


Data: mass of containers in grams and their max capacity of water in milliliters.

Mass of container in grams (g)    Max volume of water in milliliters (mL)
28.5                              350
172.8                             740
164.6                             800
256                               630
191.8                             580
127.6                             740
111.4                             840
130.8                             700
270                               1000
148                               800
245                               650
9.7                               520
181.7                             1120
129.9                             900
268                               980
132.2                             860
248.7                             430
193.3                             480
189.1                             800
241.9                             1000
61.9                              810
120.5                             480
290                               520
151.9                             1140
170.5                             800
119.7                             600
179.6                             1150
89.8                              510
72.5                              570
197.2                             800
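
The statistics reported in this write-up came from statistics software. As a rough cross-check, the sketch below recomputes the correlation coefficient and the least-squares line from the 30 measurements using the textbook formulas. The names DATA and fit_line are made up for this illustration and are not part of the original analysis.

```python
from math import sqrt

# (mass in g, max volume in mL) pairs copied from the data table
DATA = [
    (28.5, 350), (172.8, 740), (164.6, 800), (256, 630), (191.8, 580),
    (127.6, 740), (111.4, 840), (130.8, 700), (270, 1000), (148, 800),
    (245, 650), (9.7, 520), (181.7, 1120), (129.9, 900), (268, 980),
    (132.2, 860), (248.7, 430), (193.3, 480), (189.1, 800), (241.9, 1000),
    (61.9, 810), (120.5, 480), (290, 520), (151.9, 1140), (170.5, 800),
    (119.7, 600), (179.6, 1150), (89.8, 510), (72.5, 570), (197.2, 800),
]


def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


masses = [m for m, _ in DATA]
volumes = [v for _, v in DATA]
intercept, slope, r = fit_line(masses, volumes)

print(f"intercept = {intercept:.3f} mL, slope = {slope:.4f} mL per g")
print(f"R = {r:.4f}, R^2 = {r * r:.3%}")
```

If the table was copied correctly, the printed values should land close to the R, R^2, slope, and intercept reported in this write-up.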

This is the scatterplot of my collected data.

chart (2).png

The best fit line is below this sentence.

Mass vs Volume Fitted Line.PNG

Mass vs Volume.PNG

The correlation coefficient (R) was 0.26035746

The coefficient of determination (R^2) was 6.779%

The least-squares regression equation is

→ Predicted max volume of water in milliliters (mL) = 614.37686 + 0.79040046(Mass of container in grams (g))


chart (3).png

This residual plot compares the residuals to the mass of containers.






It seems my prediction did not hold up. I expected a moderate positive association, but the positive linear association between the mass of the containers and their max capacity of water was weak. A variety of containers was measured. Their mass was measured in grams (g) and their max capacity of water was measured in milliliters (mL). It does not matter which units they were measured in, because changing the units would not change the correlation coefficient. The y-intercept and the slope, however, would change.
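
The point about units can be illustrated with a small sketch. It refits the line after converting volume from milliliters to liters, using only the first six rows of the data table to keep the example short. The names SAMPLE and fit_line are made up for this illustration; the numbers it prints describe those six rows only, not the full data set.

```python
from math import sqrt

# First six rows of the data table: (mass in g, max volume in mL)
SAMPLE = [(28.5, 350), (172.8, 740), (164.6, 800),
          (256, 630), (191.8, 580), (127.6, 740)]


def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy / sqrt(sxx * syy)


masses_g = [m for m, _ in SAMPLE]
volumes_ml = [v for _, v in SAMPLE]
volumes_l = [v / 1000 for v in volumes_ml]  # the same volumes in liters

fit_ml = fit_line(masses_g, volumes_ml)
fit_l = fit_line(masses_g, volumes_l)

# r is identical; intercept and slope shrink by the conversion factor of 1000
print("mL fit (intercept, slope, r):", fit_ml)
print("L fit  (intercept, slope, r):", fit_l)
```

Converting volume to liters divides both the intercept and the slope by 1000 and leaves r alone, since r is computed from standardized values and has no units.
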
In reality, no container without mass could hold 614 milliliters of water, so the y-intercept has no practical meaning. The least-squares regression equation accounts for only 6.779% of the variation in the capacity of the containers. That seems fitting considering the correlation coefficient (R) was 0.26035746, and the scatterplot also suggests that the equation cannot make predictions with small residuals. The residuals are immense because the equation underestimates or overestimates many of the capacities. Within the range of 120 grams to 200 grams the equation might be able to predict the capacity of a container, though that would depend on the container.
Two data points stand out strongly among the others. Two bottles had measured masses under 50 grams. The one with the lowest mass of all measured bottles (9.7 g) had the larger capacity of the two (520 mL) and the smaller residual. The other (28.5 g) had the lowest capacity of all measured containers (350 mL), yet it did not have the greatest residual. These two containers had something in common.
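
To make the residuals for the two lightest containers concrete, here is a minimal sketch that plugs their masses into the regression equation reported in this write-up and subtracts the prediction from the observed capacity. The function names predicted_volume and residual are made up for this illustration.

```python
# Coefficients as reported in this write-up (volume in mL, mass in g)
INTERCEPT = 614.37686
SLOPE = 0.79040046


def predicted_volume(mass_g):
    """Capacity in mL predicted by the fitted line for a given mass in g."""
    return INTERCEPT + SLOPE * mass_g


def residual(mass_g, observed_ml):
    """Observed capacity minus predicted capacity, in mL."""
    return observed_ml - predicted_volume(mass_g)


# The two containers with masses under 50 g, taken from the data table
light_containers = [(9.7, 520), (28.5, 350)]

for mass_g, observed_ml in light_containers:
    print(f"{mass_g} g: predicted {predicted_volume(mass_g):.1f} mL, "
          f"residual {residual(mass_g, observed_ml):.1f} mL")

# A container with no mass is predicted to hold the intercept, about 614 mL
print(f"0 g: predicted {predicted_volume(0):.1f} mL")
```

Both residuals come out negative, meaning the line overestimates the capacity of both light containers, and the 28.5 g container is missed by the larger amount.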

These residuals are immense, so a transformation is needed to make them smaller.


Mass vs Volume Reciprocal.png

Reciprocal 1.PNG

Reciprocal Fitted Line.PNG
Mass vs Volume Reciprocal Residuals.png

Taking the reciprocal of the volume values produced the scatterplot shown at the top. The data have been straightened only slightly. The slope is now negative: the reciprocal of the volume decreases with every additional gram. The correlation is slightly stronger.
R is -0.27916906
R^2 is 7.794%
The least-squares regression equation is → 1 / (Predicted max volume of water in milliliters (mL)) = 0.0017756328 - 0.0000018594894(Mass of container in grams (g))
The linear model accounts for 7.794% of the variability in the reciprocal of max volume of water. It seems to be the best transformation because the other transformations I tried left residuals that were too large. It would be almost impossible to straighten the data further because of the variability in the data. Looking at the residual plot, there is not a lot of randomness. The least-squares regression equation tended to overestimate a majority of the observed values.
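
The reciprocal model can be rechecked in the same spirit. The sketch below fits a least-squares line to 1/volume against mass for the 30 measurements, then inverts the fitted value to get back a prediction in milliliters. The names DATA, fit_line, and predicted_volume_ml, and the 150 g example mass, are made up for this illustration.

```python
from math import sqrt

# (mass in g, max volume in mL) pairs copied from the data table
DATA = [
    (28.5, 350), (172.8, 740), (164.6, 800), (256, 630), (191.8, 580),
    (127.6, 740), (111.4, 840), (130.8, 700), (270, 1000), (148, 800),
    (245, 650), (9.7, 520), (181.7, 1120), (129.9, 900), (268, 980),
    (132.2, 860), (248.7, 430), (193.3, 480), (189.1, 800), (241.9, 1000),
    (61.9, 810), (120.5, 480), (290, 520), (151.9, 1140), (170.5, 800),
    (119.7, 600), (179.6, 1150), (89.8, 510), (72.5, 570), (197.2, 800),
]


def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy / sqrt(sxx * syy)


masses = [m for m, _ in DATA]
reciprocals = [1 / v for _, v in DATA]  # response is 1 / volume, in 1/mL
intercept, slope, r = fit_line(masses, reciprocals)


def predicted_volume_ml(mass_g):
    """Invert the fitted reciprocal to get a prediction in mL."""
    return 1 / (intercept + slope * mass_g)


print(f"intercept = {intercept:.7f}, slope = {slope:.3e}")
print(f"R = {r:.4f}, R^2 = {r * r:.3%}")
print(f"150 g container: about {predicted_volume_ml(150):.0f} mL")
```

Note that the line predicts 1/volume, so a prediction in milliliters comes from taking the reciprocal of the fitted value, and R^2 describes the variability in 1/volume rather than in volume itself.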



It was previously mentioned that two containers had something in common, and that the residual would depend on the container. This is because a variety of containers was used, as shown below in the photos taken during data collection. The containers were made of different materials. The two containers with the least mass were plastic. The metal containers tended to have more mass and not hold as much water. There were also very round containers that held a lot of water. Material could be considered a lurking variable. Looking at the data alone is sometimes not enough: a plastic gallon jug could weigh about 35 grams yet hold 3785 milliliters.





Photos Taken During Data Collection


Graduated Liter.jpg
I used a graduated liter because it would have taken way too long to measure with a graduated cylinder.
Scale and Containers.jpg
The masses are being measured on a scale.
There are a variety of containers.