written by Eric J. Ma on 2018-12-25 | tags: data science conda hacks

The conda package manager has, over the years, become an integral part of my workflow. I use it to manage project environments, and have built a bunch of very simple hacks around it that you can adopt too. I'd like to share them with... (read more)

(801 words, approximately 5 minutes reading time)

Gaussian Process Notes

written by Eric J. Ma on 2018-12-16 | tags: data science bayesian

Here are my notes from learning about Gaussian Processes. It's been a long intellectual journey; hope you find my notes useful.

Read on... (283 words, approximately 2 minutes reading time)

Mathematical Intuition

written by Eric J. Ma on 2018-12-09 | tags: deep learning bayesian math data science

Last week, I picked up Jeremy Kun's book, "A Programmer's Introduction to Mathematics". In it, I finally found an explanation for my frustrations when reading math papers:

What programmers would consider... (read more)

(777 words, approximately 4 minutes reading time)

Solving Problems Actionably

written by Eric J. Ma on 2018-11-13 | tags: data science insight data science

There's a quote by John Tukey that has been a recurrent theme at work.

It's better to solve the right problem approximately than to solve the wrong problem exactly.

Continuing on the theme of quoting two... (read more)

(328 words, approximately 2 minutes reading time)

Thoughts on Black

written by Eric J. Ma on 2018-11-12 | tags: python code style

Having used Black for quite a while now, I have a hunch that its popularity amongst projects will continue to grow.

It's one thing to be opinionated about things that matter for a project, but don't matter personally. Like code... (read more)

(181 words, approximately 1 minute reading time)

Bayesian Modelling is Hard Work!

written by Eric J. Ma on 2018-11-07 | tags: bayesian data science statistics

It’s definitely not easy work; anybody trying to tell you that you can "just apply this model and just be done with it" is probably wrong.

Simple Models

Let me clarify: I agree that doing the first half of the statement,... (read more)

(1015 words, approximately 6 minutes reading time)

More Dask: Pre-Scattering Data

written by Eric J. Ma on 2018-10-26 | tags: dask parallel data science optimization gridengine

I learned a new thing about dask yesterday: pre-scattering data properly!

Turns out, you can pre-scatter your data across worker nodes, and have them access that data later when submitting functions to the... (read more)

(519 words, approximately 3 minutes reading time)

Parallel Processing with Dask on GridEngine Clusters

written by Eric J. Ma on 2018-10-11 | tags: parallel dask gridengine data science optimization

I just figured out how to get this working... and it's awesome! :D

Motivation

If I'm developing an analysis in the Jupyter notebook, and I have one semi-long-running function (e.g. takes dozens of seconds) that I need to... (read more)

(1999 words, approximately 10 minutes reading time)

Optimizing Block Sparse Matrix Creation with Python

written by Eric J. Ma on 2018-09-04 | tags: graph optimization numba python data science sparse matrix

Introduction

At work, I recently encountered a neat problem. I'd like to share it with you all.

One of my projects involves graphs; specifically, it involves taking individual graphs and turning them into one big graph. If you've... (read more)

(1220 words, approximately 7 minutes reading time)

3D Printed WiFi Access QR Codes: Part 2

written by Eric J. Ma on 2018-09-02 | tags: 3d printing python qr code

Part 2 of how to create 3D-printed QR codes!

Read on... (850 words, approximately 5 minutes reading time)

Eric J Ma's Website
