written by Eric J. Ma on 2023-08-30 | tags: code review unit tests research chemistry ml chemistry data splitting property prediction software testing code correctness
In this blog post, I discuss the importance of unit tests in research code. I share an experience from a code review with our intern, Matthieu, where we realized the need for rigorous testing of a non-standard splitting strategy in our ML model. We concluded that even research code, which might be discarded eventually, can benefit from thorough testing to ensure its correctness. This is particularly crucial when the code is used for comparisons or collaborations.
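The summary doesn't say what the non-standard splitting strategy was, so this is a minimal sketch under an assumption. It uses a hypothetical `group_split` function that holds out whole groups (for example, molecular scaffold IDs) so that no group appears in both train and test. The pytest-style tests below check the properties such a split should guarantee: no leakage between sets, every sample assigned exactly once, and reproducibility for a fixed seed.

```python
"""Unit tests for a hypothetical group-based train/test split.

`group_split` is illustrative only; it is not the splitting strategy
discussed in the post.
"""
import random


def group_split(groups, test_fraction=0.2, seed=0):
    """Split sample indices so that no group appears in both train and test.

    `groups[i]` is the group label (e.g. a scaffold ID) of sample i.
    Whole groups are moved into the test set, in a seeded random order,
    until at least `test_fraction` of the samples are in it.
    """
    unique_groups = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique_groups)

    n_test_target = test_fraction * len(groups)
    test_groups = set()
    n_test = 0
    for g in unique_groups:
        if n_test >= n_test_target:
            break
        test_groups.add(g)
        n_test += groups.count(g)

    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


def test_no_group_leakage():
    groups = ["a", "a", "b", "c", "c", "c", "d", "e"]
    train_idx, test_idx = group_split(groups, test_fraction=0.25)
    train_groups = {groups[i] for i in train_idx}
    test_groups = {groups[i] for i in test_idx}
    assert train_groups.isdisjoint(test_groups)


def test_every_sample_assigned_exactly_once():
    groups = ["a", "a", "b", "c", "c", "c", "d", "e"]
    train_idx, test_idx = group_split(groups, test_fraction=0.25)
    assert sorted(train_idx + test_idx) == list(range(len(groups)))


def test_split_is_deterministic_for_a_seed():
    groups = ["a", "b", "c", "d", "e", "f"]
    assert group_split(groups, seed=42) == group_split(groups, seed=42)
```

Running `pytest` on this file would exercise all three tests. Tests like these are cheap to write and catch the subtle leakage bugs that silently inflate model comparisons.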
Read on... (473 words, approximately 3 minutes reading time)
written by Eric J. Ma on 2023-08-28 | tags: automation biotech collaboration data insights data product data science model building predictive models product oriented protein engineering service oriented software engineering team collaboration tool building
In this blog post, I explore the two flavours of data science work: service-oriented and product-oriented. Service-oriented data science serves others in a one-off fashion, while product-oriented data science builds a reusable tool for a well-defined problem. Both have their value depending on the situation. I discuss the challenges in navigating between the two and emphasize the importance of adopting a product-first orientation. As an individual contributor or team lead, it's crucial to shift from being mere consumers of tooling to makers of tools, enhancing efficiency and scalability!
Read on... (935 words, approximately 5 minutes reading time)
written by Eric J. Ma on 2023-08-27 | tags: til github actions cicd continuous integration continuous delivery dokku digitalocean deployment coding devops cost efficiency dokku server git dokku deployment
In my latest blog post, I share my experience of hosting a Dokku server on DigitalOcean and how I've managed to automate the deployment process using GitHub Actions. I delve into the cost benefits of using Dokku on DigitalOcean over other services like Heroku and Fly.io. I also provide a step-by-step guide on how to configure GitHub Actions to deploy apps to DigitalOcean automatically. If you're interested in saving time and money on app deployment, this post is a must-read.
Read on... (379 words, approximately 2 minutes reading time)
written by Eric J. Ma on 2023-08-26 | tags: til github github actions github workflow git configuration workflow runner github permissions repo settings workflow permissions github token
Today, I learned about a hidden setting that's needed to enable GitHub Actions to push code to its associated repo, and wrote it out as a tutorial. As a bonus, I also share how to correctly configure git within a GitHub Actions workflow. Discover the trick with me!
written by Eric J. Ma on 2023-08-22 | tags: automation command execution cron cron jobs init.d linux linux commands linux startup linux tutorial rc.local startup scripts systemd til
Today I learned how to execute arbitrary commands on startup on a Linux machine. It's pretty simple. Curious to hear more?
Read on... (113 words, approximately 1 minute reading time)
written by Eric J. Ma on 2023-08-07 | tags: machine learning phylogenetics protein engineering protein sequences ancestral sequences bioinformatics computational biology deep learning differentiable computing gradient-based optimization phylogenetic trees sequence representation tree adjacency vae protein design jax python
I've just explored a fascinating paper on differentiable search of evolutionary trees. It's a creative blend of math and biology, using mathematical structures to solve a biological problem. The authors have developed a way to infer both phylogenetic trees and ancestral protein sequences in a continuous, differentiable manner. This opens up exciting new avenues for protein engineering and design. Plus, the paper's figures are top-notch! 🧬🌳📊
Read on... (4001 words, approximately 21 minutes reading time)
written by Eric J. Ma on 2023-08-04 | tags: wayne gretzky tech adoption data science emerging tech competitive advantage foresight execution capability risk management teamwork profit innovation strategic thinking technology ladder individual contributor team strategy thought framework
In this post, I discuss the 'Wayne Gretzky move' in tech adoption, especially in data science. It's about foreseeing the future of tech and making bold moves to stay ahead of the game. It's risky but with the right team, it can lead to significant gains. 🏒💻🚀
Read on... (269 words, approximately 2 minutes reading time)
written by Eric J. Ma on 2023-07-24 | tags: data science technology ladder technological trinity decision making protein engineering methodologies innovation team building technical strategy bayesian optimization graph neural networks variational autoencoder machine learning library design lead optimization
In this post, I discuss how to decide which technological models to adopt in our work. I introduce two frameworks: the 'technological trinity' and the 'technology ladder'. The former helps us evaluate if a new technology is worth investing in, while the latter places it within a broader context. I illustrate these concepts using protein engineering as an example.
Read on... (1474 words, approximately 8 minutes reading time)
written by Eric J. Ma on 2023-07-21 | tags: llamabot openai gpt4 chatbot software development ai innovation langchain llama_index code ghostwriting zoterobot
Just built LlamaBot, a chatbot using OpenAI GPT-4, and it's been a wild ride! 🎢 From dealing with rapid innovation and versioning issues to discovering the power of code ghostwriting, it's been a learning curve. 🧠💡 Also, I explored some cool LLM frameworks. Dive in to learn more! 🏊‍♂️
Read on... (2403 words, approximately 13 minutes reading time)
written by Eric J. Ma on 2023-07-12 | tags: til python python310 pythonversioning pyproject.toml pythontips llamabot
In today's blog post, we dive into setting the minimum and maximum Python versions in pyproject.toml 🐍. We explore how this impacts llamabot and discuss the new syntax for indicating the union of types in Python 3.10. This new syntax is visually easier to understand, making our coding journey a bit smoother! 🚀👩‍💻👨‍💻