written by Eric J. Ma on 2018-12-25 | tags: data science conda hacks
The conda package manager has, over the years, become an integral part of my workflow. I use it to manage project environments, and have built a bunch of very simple hacks around it that you can adopt too. I'd like to share them with...
(read more)
written by Eric J. Ma on 2018-12-16 | tags: data science bayesian
Here are my notes from learning about Gaussian Processes. It's been a long intellectual journey; hope you find my notes useful.
Read on... (283 words, approximately 2 minutes reading time)
written by Eric J. Ma on 2018-12-09 | tags: deep learning bayesian math data science
Last week, I picked up Jeremy Kun's book, "A Programmer's Introduction to Mathematics". In it, I finally found an explanation for my frustrations when reading math papers:
What programmers would consider... (read more)
(777 words, approximately 4 minutes reading time)
written by Eric J. Ma on 2018-11-13 | tags: data science insight data science
There's a quote by John Tukey that has been a recurrent theme at work:
"It's better to solve the right problem approximately than to solve the wrong problem exactly."
Continuing on the theme of quoting two... (read more)
(328 words, approximately 2 minutes reading time)
written by Eric J. Ma on 2018-11-12 | tags: python code style
Having used Black for quite a while now, I have a hunch that its popularity amongst projects will keep growing.
It's one thing to be opinionated about things that matter for a project, but don't matter personally. Like code... (read more)
(181 words, approximately 1 minute reading time)
written by Eric J. Ma on 2018-11-07 | tags: bayesian data science statistics
It’s definitely not easy work; anybody trying to tell you that you can "just apply this model and just be done with it" is probably wrong.
Let me clarify: I agree that doing the first half of the statement,... (read more)
(1015 words, approximately 6 minutes reading time)
written by Eric J. Ma on 2018-10-26 | tags: dask parallel data science optimization gridengine
I learned a new thing about dask yesterday: pre-scattering data properly!
Turns out, you can pre-scatter your data across worker nodes, and have them access that data later when submitting functions to the... (read more)
(519 words, approximately 3 minutes reading time)
written by Eric J. Ma on 2018-10-11 | tags: parallel dask gridengine data science optimization
I just figured out how to get this working... and it's awesome! :D
If I'm developing an analysis in the Jupyter notebook, and I have one semi-long-running function (e.g. takes dozens of seconds) that I need to... (read more)
(1999 words, approximately 10 minutes reading time)
written by Eric J. Ma on 2018-09-04 | tags: graph optimization numba python data science sparse matrix
At work, I recently encountered a neat problem. I'd like to share it with you all.
One of my projects involves graphs; specifically, it involves taking individual graphs and turning them into one big graph. If you've... (read more)
(1220 words, approximately 7 minutes reading time)
written by Eric J. Ma on 2018-09-02 | tags: 3d printing python qr code
Part 2 of how to create 3D-printed QR codes!
Read on... (850 words, approximately 5 minutes reading time)