Since we have two variables that may bear some relation to one another, a scatter plot is probably the best option for visualizing this data.
A scatter plot would show us whether there is any relationship between the two variables, and give us a place to start in devising models to analyze and predict that relationship.
For example, if all the points lie close to a straight line, we have a linear correlation, and we could run linear regressions to understand the relationship between the two variables. But if they follow a differently shaped curve, such as a logarithmic curve or a parabola, a simple linear regression may understate or entirely miss the relationship, even though it could be a real effect.
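To make the parabola case concrete, here is a minimal sketch in plain Python. The data are made up for illustration, and `pearson` is a small hand-written helper, not a library function. A noise-free parabola over inputs symmetric around zero has a Pearson correlation of exactly zero, even though y is completely determined by x. Correlating y with x squared instead recovers the relationship.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = list(range(-5, 6))            # illustrative inputs, symmetric around 0
linear = [2 * x + 1 for x in xs]   # points exactly on a straight line
parabola = [x * x for x in xs]     # points exactly on a parabola

r_linear = pearson(xs, linear)     # linear measure sees the straight line
r_parabola = pearson(xs, parabola) # linear measure sees nothing here
# Using x**2 as the predictor turns the parabola into a straight line.
r_transformed = pearson([x * x for x in xs], parabola)

print(f"linear: {r_linear:.3f}  parabola: {r_parabola:.3f}  x^2: {r_transformed:.3f}")
# prints: linear: 1.000  parabola: 0.000  x^2: 1.000
```

This is why looking at the scatter plot first matters: the shape of the cloud suggests which transformation (squaring, taking logs) to try before fitting a line.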
The hardest part in this case would be deciding which variable to put on the X axis (normally the independent variable) and which to put on the Y axis (normally the dependent variable). There are plausible theories that would make causation run in both directions. Perhaps fish survive better in highly oxygenated water, so more oxygen causes more fish; on the other hand, perhaps large fish populations deplete the oxygen in the water, so more fish causes less oxygen. Both effects could exist and push the system toward an equilibrium, in which case we might observe little or no correlation even though the two variables are closely related. (This happens in economics all the time: observed prices and quantities often show no consistent correlation, even though price and quantity are jointly determined by supply and demand.)
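The equilibrium argument can be checked with a toy simulation. This is a sketch under loudly stated assumptions: the two linear equations, the coefficients `B`, `C`, `D`, and the grid of shocks `u` and `v` are all hypothetical, chosen only to illustrate the point. Oxygen raises fish, fish lower oxygen, and each observed point is the equilibrium that solves both equations at once. With these particular numbers, the observed correlation and the fitted slope both come out at zero, even though the true effect of oxygen on fish is 1.0.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical feedback model (made-up coefficients, for illustration only):
#   fish   = B * oxygen + u      (more oxygen -> more fish)
#   oxygen = C - D * fish + v    (more fish   -> less oxygen)
# u and v are outside "shocks" to each equation.
B, C, D = 1.0, 10.0, 1.0

def equilibrium(u, v):
    """Solve both equations simultaneously for (oxygen, fish)."""
    oxygen = (C - D * u + v) / (1 + D * B)
    fish = B * oxygen + u
    return oxygen, fish

shocks = [-2, -1, 0, 1, 2]  # every combination of u and v is observed once
points = [equilibrium(u, v) for u in shocks for v in shocks]
oxygen = [o for o, _ in points]
fish = [f for _, f in points]

r = pearson(oxygen, fish)     # about 0: no visible correlation
fitted = slope(oxygen, fish)  # about 0, although the true B is 1.0
print(f"correlation: {r:.3f}  fitted slope: {fitted:.3f}")
```

The zero result depends on the shocks to the two equations being equally large. If only the oxygen equation were shocked (set `u` to 0), the points would instead trace out the fish equation exactly, so a scatter plot of equilibrium data shows whichever relationship the *other* variable's shocks happen to reveal.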