Eric J Ma's Website

Collaborations Between Experimentalists and Statisticians

written by Eric J. Ma on 2016-10-11


I just saw an awesome pair of blog posts by Roger Peng and Elizabeth Matsui detailing how scientists and statisticians can work together more effectively. Highly recommended read! Over the course of my PhD, I've been one of two resident statisticians for our research group (it's a relative expertise thing - I know enough to be dangerous), and so the posts really resonated with me. Here's my summary.

Firstly, both ought to take the time to learn each other's core concepts and methods. For the experimentalist, learn some basics about distributions of a random variable and what 'inference' really means beyond p=0.05 (FYI, p<0.05 ≠ inference). For the statistician, take the time to learn some of the basic (bio)chemistry, the experimental setup, and how difficult it is to collect data.

Secondly, have some patience! Both sides will need some time to learn the ins and outs of data collection, analysis, inferential reasoning, and the like.

Thirdly, involve the statistician earlier on in a collaboration, not towards the end. The statistician can help make sense of the scientific question in terms of the data collected, and thus can help with the experimental design.

Fourthly, and this point really matters for PIs, give the statistician sufficient support, particularly on the grants. 1% effort doesn't cut it. Even 5% is on the low side of reasonable. Something more like 10-20% effort makes more sense.

Finally, a statistician's primary goal should be to advance the science through the proper deployment of statistics. A scientist's primary goal is to help identify the most important problems of a (sub-(sub-))field to solve. Together as collaborators, they can find the right experimental design towards this pair of goals.

I'll add in a pointer of my own here - as a statistician, it really shouldn't be about helping your scientist collaborator find "what's significant", which (ahem) often means p<0.05. My view of inference has, over the past year, turned Bayesian, and I'm now increasingly confident in the notion that Bayesian-style reasoning is not only the most natural way to reason through the data, it's actually the right way to do inference on scientific data. Given priors, some perhaps uninformative, about the state of nature, our goal in science is to compute the probability distribution over some parameters of nature.
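
To make "compute the probability distribution over some parameters" a bit more concrete, here is a minimal sketch using the simplest case I can think of: a Beta prior on a success probability, updated with binomial data. Everything in it is an assumption for illustration only: the counts (14 successes in 20 trials) are made up, the flat Beta(1, 1) prior is just one possible uninformative choice, and the helper names are hypothetical.

```python
from math import exp, lgamma, log


def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution at theta, for 0 < theta < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(theta) + (b - 1) * log(1 - theta))


def posterior_params(prior_a, prior_b, successes, trials):
    """Conjugate update: Beta prior + binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + (trials - successes)


def prob_greater_than(threshold, a, b, n_grid=10_000):
    """Approximate P(theta > threshold) under Beta(a, b) by midpoint integration."""
    width = (1.0 - threshold) / n_grid
    midpoints = (threshold + (i + 0.5) * width for i in range(n_grid))
    return sum(beta_pdf(t, a, b) for t in midpoints) * width


# Made-up data: 14 successes out of 20 trials, with a flat Beta(1, 1) prior.
a, b = posterior_params(1.0, 1.0, successes=14, trials=20)
posterior_mean = a / (a + b)
p_above_half = prob_greater_than(0.5, a, b)

print(f"Posterior: Beta({a:.0f}, {b:.0f})")
print(f"Posterior mean of theta: {posterior_mean:.3f}")
print(f"P(theta > 0.5 | data): {p_above_half:.3f}")
```

The output here is a full distribution over the parameter, from which statements like "the probability that theta exceeds 0.5" fall out directly, rather than a single yes/no call on significance.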


Cite this blog post:
@article{
    ericmjl-2016-collaborations-statisticians,
    author = {Eric J. Ma},
    title = {Collaborations Between Experimentalists and Statisticians},
    year = {2016},
    month = {10},
    day = {11},
    howpublished = {\url{https://ericmjl.github.io}},
    journal = {Eric J. Ma's Blog},
    url = {https://ericmjl.github.io/blog/2016/10/11/collaborations-between-experimentalists-and-statisticians},
}
  
