Eric J Ma's Website

Principled Git-based Workflow in Collaborative Data Science Projects

written by Eric J. Ma on 2019-11-09 | tags: data science git workflow


Having worked with GitFlow on a data science project and had a few epiphanies about it, I decided to share some of my thoughts in an essay.

One of my thoughts here is that most data scientists resist using GitFlow (and, more generally, being more intentional about what gets worked on) not because it's a bad idea, but because there's a lack of incentives to do so. In the essay, I try to address this concern.

And because GitFlow does require knowledge of Git, it can trigger an "Oh no, one more thing to learn!" response. These things do take time to learn, yes, but I also see it as an investment of time with a future payoff.

Apart from that, I hope you enjoy the essay; writing it was also a great opportunity for me to pick up more advanced features of pymdownx, a package that extends Markdown syntax with other really cool features.


Cite this blog post:
@article{
    ericmjl-2019-principled-projects,
    author = {Eric J. Ma},
    title = {Principled Git-based Workflow in Collaborative Data Science Projects},
    year = {2019},
    month = {11},
    day = {09},
    howpublished = {\url{https://ericmjl.github.io}},
    journal = {Eric J. Ma's Blog},
    url = {https://ericmjl.github.io/blog/2019/11/9/principled-git-based-workflow-in-collaborative-data-science-projects},
}
  
