written by Eric J. Ma on 2017-01-17
I finally did it - I built a Bokeh server app for myself!
All of last week, I brought my UROP student Vivian and a fellow church friend Lin to the "Data Science with Python" skill-building workshop series, hosted by the Harvard IACS.
As I already knew most of what was going to be taught, I decided that it'd be a fun thing to try playing with the dataset that they used for the contest. (Having a contest was also extra motivation to try building something I've never done before.)
The goal of the contest was "to build a visualization of the UCI forest dataset that would help an analyst make decisions". The UCI forest dataset basically asks us to predict forest cover type from cartographic variables. As I had always wanted to use Bokeh to build a data dashboard, I thought I should give it a shot with the UCI dataset.
What I eventually built were two things. The first one was a linked scatter-plot dashboard to visualize how the quantitative variables varied with one another and how they grouped by forest cover type. Here, the user can select the two variables that they want to visualize together, using drop-down menus.
The second one was a bar chart showing the mutual information between the categorical variable values and the forest cover type. Here, the user can select from over 40+ variables, and their mutual information score will be displayed as one of the bars in the bar chart.
As usual, I copied heavily from the examples and then modified it for my own use. Breaking the copied app helped me figure out where my cognitive blind spots were.
Here's what I've learned.
Firstly, I finally have wrapped my head around callback functions and their use in GUI applications. The idea behind callbacks is that when I click on a GUI element (e.g. button) or change the GUI element's state (e.g. selecting a checkbox, typing into a textbox), that function is called that "does something". What's that something? Well, it's any action I decide that needs to be taken.
Secondly, I finally figured out how non-intimidating Bokeh server really is. It's like... Flask, except for data visualizations. And it takes care of most of the HTML hard lifting for the data visualization portion.
Thirdly, I finally got 'linked plots' working, in which the axes scales or data selections are shared across plots. I noticed it works when using the mid-level "plotting" interface, but doesn't work when using the high-level "charts" interface. Definitely something to keep in mind.
Fourthly, I saw how Bokeh server allows us to write custom Python code for callbacks. At least for me, it's much more user-friendly this way, as I find JS idioms to be a bit wonky. Maybe I'm just not used to them. Either way, being able to use custom Python means I'm able to do things like filtering Pandas DataFrames before pushing the data onto the plot.
Finally, I got to use the VBar contribution I made to the Bokeh library last year! Though I eventually moved my bar plots to the "charts"-level API, my initial implementation was done using the "plotting"-level API. It brought me a modicum of satisfaction to use the vbar
glyph, something that I had contributed (of course, acknowledging the amount of hand-holding provided by Sarah Bird and Bryan Van De Ven, who are the lead developers of Bokeh).
My code can be found online here.
@article{
ericmjl-2017-building-apps,
author = {Eric J. Ma},
title = {Building Bokeh Apps!},
year = {2017},
month = {01},
day = {17},
howpublished = {\url{https://ericmjl.github.io}},
journal = {Eric J. Ma's Blog},
url = {https://ericmjl.github.io/blog/2017/1/17/building-bokeh-apps},
}
I send out a newsletter with tips and tools for data scientists. Come check it out at Substack.
If you would like to sponsor the coffee that goes into making my posts, please consider GitHub Sponsors!
Finally, I do free 30-minute GenAI strategy calls for teams that are looking to leverage GenAI for maximum impact. Consider booking a call on Calendly if you're interested!