written by Eric J. Ma on 2024-01-10 | tags: llamabot api chromadb openai mistral anthropic claude mixtral simplebot chatbot querybot llm large language model
In this blog post, I discuss the major changes I've made to LlamaBot, a project I've been working on in my spare time. I've integrated LiteLLM for text-based models, created a new DocumentStore class, and reimagined the SimpleBot interface. I've also experimented with the Mixin pattern to create more complex bots and switched to character-based lengths for more user-friendly calculations. How did these changes improve the functionality and efficiency of LlamaBot? Read on to find out!
Read on... (2451 words, approximately 13 minutes reading time)

written by Eric J. Ma on 2023-12-13 | tags: alphafold moderna technology cloudcomputing aws infrastructure code optimization refactoring scaling challenges continuous integration
In this blog post, I share my experience of refactoring the AlphaFold execution script at Moderna, which led to significant cost savings and efficiency gains. I discuss the challenges faced, including hitting AWS' GPU instance availability limits, and the lessons learned, such as the importance of static analysis tools, CI/CD caching, reading code before writing, and working openly. Curious about the technical details and the lessons I learned from this experience?
Read on... (1333 words, approximately 7 minutes reading time)

written by Eric J. Ma on 2023-12-12 | tags: data science programming style function-based programming class-based programming data processing object-oriented data structures neural network data transformation callable objects
In this blog post, I discuss the choice between class- or function-based programming for data scientists. I argue that objects are best for grouping data, while functions are ideal for processing data. However, configurable functions that need to be reused can be implemented both ways. I lean towards a functional programming style, using classes to organize related data. But sometimes, like with callable objects, I adopt a different approach. Curious about when to use each style in your data science projects? Read on!
Read on... (1177 words, approximately 6 minutes reading time)

written by Eric J. Ma on 2023-12-11 | tags: data science team management culture feedback coaching code review asynchronous feedback technical feedback team morale continuous improvement
In this blog post, I share my experiences and insights on providing effective feedback in a data science team. I discuss the importance of positivity, specificity, self-reflection, effusiveness, in situ technical feedback, connecting accomplishments to broader impacts, and uplifting when mistakes occur. These strategies foster a supportive environment, promote continuous improvement, and align team members with the broader mission. How can these feedback strategies improve your team's dynamics and performance? I hope my experiences shared here can give you inspiration!
Read on... (1773 words, approximately 9 minutes reading time)

written by Eric J. Ma on 2023-12-03 | tags: leadership podcast hesitation coding perfectionism team power environment technical credibility bayesian network deep learning llms growth moderna skiplevels feedback psychological safety google mistakes graduation development obama humility servant
In this blog post, I explore my thoughts on leadership as a data science team lead. I discuss the power of leadership, the importance of technical skill and credibility, encouraging independent growth, fostering psychological safety, embracing graduation, sharing credit, and the challenge of humility. I conclude with my belief in servant leadership. What does leadership mean to you and how do you navigate its challenges?
Read on... (723 words, approximately 4 minutes reading time)

written by Eric J. Ma on 2023-11-26 | tags: continuous integration python micromamba llamabot conda yaml mambaforge caching
In this blog post, I experimented with speeding up LlamaBot's CI system by switching from Miniconda to micromamba. The results were impressive, with more consistent timings and a significant reduction in build and test times. The primary advantage was the built-in, turnkey caching of the entire environment. This change made a noticeable difference, especially when testing against bleeding-edge packages. Could micromamba be the solution to your CI delays? Read on to find out!
Read on... (402 words, approximately 3 minutes reading time)

written by Eric J. Ma on 2023-11-19 | tags: huggingface zephyr gpt4 benchmarking gitbot llm language models code summarization prompt engineering machine learning
In this blog post, I benchmarked Zephyr, a new language model by HuggingFace, against GPT-4 using GitBot. I found that while Zephyr shows promise, GPT-4 seems to offer a more out-of-the-box solution for accurately interpreting and summarizing code changes. However, different models may require different prompts to perform optimally. Curious about how these language models could change up your coding workflow?
Read on... (2045 words, approximately 11 minutes reading time)

written by Eric J. Ma on 2023-11-12 | tags: job hunt hiring process career advice application tips interview preparation talent acquisition job offer negotiation tips career development job seeking
In this blog post, I share insights from my experience as both a job candidate and a hiring manager. I break down the hiring process into five stages: application submission, TA phone screen, hiring manager phone screen, onsite interview, and offer negotiation. Each stage has its own expectations and timelines, which can help you plan your job-seeking journey. Patience is key, and remember, each step is an opportunity to showcase your skills. Curious about what to expect at each stage and how long it might take? Read on!
Read on... (991 words, approximately 5 minutes reading time)

written by Eric J. Ma on 2023-11-05 | tags: professional transition public profile skill adaptation networking phd careers masters careers career strategies career development career advice
In this blog post, I share strategies to bridge the gap between academia and industry. I discuss enhancing your public profile, adapting your academic skills for industry, networking effectively, and communicating your value. These strategies can help PhDs, Master's grads, and those considering their academic future navigate their career more confidently. Curious about how to translate your academic achievements into industry value? Read on!
Read on... (957 words, approximately 5 minutes reading time)

written by Eric J. Ma on 2023-10-29 | tags: pre-commit hooks code quality debugging python version setuptools dependency management github actions code style checks
In this blog post, I discuss a ModuleNotFoundError I encountered while using the pre-commit hook, interrogate, in Python 3.12. The issue arose due to a missing package, setuptools, which is no longer included by default in Python 3.12's virtual environments. I proposed a solution and provided a workaround by using Python<3.12 for pre-commit installation. This experience highlights the importance of tracking dependencies and adapting to language and library updates. Have you ever faced similar issues in your development workflow? Read on to find out more about my debugging journey.