written by Eric J. Ma on 2021-11-28 | tags: data science job hunt career careers
Having been involved in quite a few rounds of hiring data scientists in a biomedical research context, I'd like to share some perspectives that may help candidates who desire a move into a data science role in biomedical research. I'll start off with the usual disclaimer that these are personal observations and thoughts; they may not apply uniformly to all biomedical data science teams, and may reflect personal biases. With that disclaimer out of the way, here are my observations.
As a candidate, you probably want to focus on just the things that are in your realm of control while not fretting over the things that are out of your control.
Here are a few examples of what I mean by things you cannot control: the team's hiring budget and headcount, the specific skill gaps the team is looking to fill, and the mix of other candidates already in the pipeline.
A candidate can't know any of these constraints unless they have contacts inside the company who can help them find out. Because these factors are inherently dependent on what I call the "local" hiring context, you're also unable to directly control how they will affect your chances of being hired.
For your own sanity and peace of mind, I would strongly advise not fretting over these matters and instead focusing on the 30-40% of things you can control.
Here is a sampling of what I know to be within a candidate's realm of control:

1. The depth of their technical skills and how cutting-edge their choice of methods is.
2. The professionalism of their searchable digital footprint.
3. The development of the creative imagination needed to imagine the problem space of the firm they are applying to.
4. The amount of mock interview practice they get before interviewing.
I think these examples adequately highlight the kinds of things within a candidate's realm of agency as they embark on the job hunt. Once in a while, I see junior candidates (fresh grads) complain publicly about the state of data science hiring. While I can empathize with the underlying emotion, I also think it is more productive to direct that energy towards things they can control, including better curating the very digital footprint that those publicly aired complaints become part of.
When I was involved in hiring, I was laser-focused on technical excellence. I chose this aspect out of the many factors because of my personal interest: I wanted to see what new things I could learn from candidates. I also wanted to continue raising the bar on technical excellence in my team, because I know this is fundamental to a data science team's collective ability to deliver on projects. Having worked with individuals who possess the seeds of technical excellence matching my own mental model of it, I had a blueprint of MSc-level and PhD-level excellence from prior experience that I could benchmark candidates against.
Here are the criteria by which I have evaluated technical excellence.
Firstly, I looked for evidence of special sauce modelling skills. "Special sauce" skills refer to something uncommon, rare, and technically challenging. I looked for probabilistic modelling, causal inference, graph algorithms, and graph neural networks. Having taken a class counts only as weak evidence; strong evidence meant actual projects that the candidate had to dedicate a significant amount of time to solving, usually half a year or longer. This may be in a Master's or PhD research thesis, or it may be in an internship.
Candidates may or may not be able to go in-depth into specific business-related project details. However, they should usually be able to go in-depth into the technical details. To tease out these details, I would question and probe very deeply. I would also ask counterfactual questions, say, about their choice of modelling strategy. In questioning, I placed equal weight on technical correctness and interpersonal response; a candidate who gets defensive or starts name-dropping terms without justification would usually end up with low ratings from me.
Special sauce skills, in my opinion, signal intellectual agility and the relentlessness to go technically deep; hence the premium I placed on this skillset.
Secondly, I looked for evidence of solid software development skills.
My preferred way of evaluating this skill is not programming puzzles; in my opinion, these are artificial tests mostly divorced from the day-to-day reality of data science programming. Instead, I looked for evidence that a candidate can structure their code in a way that enables others to use it. When it comes to data science projects, I hold a firm conviction that it is primarily by structuring the project like an idiomatically-built software package, with notebooks alongside it, that we can (1) smooth out the path from prototype to production and (2) enable team ownership of the project. Hence, I looked for evidence of this skillset in candidates' work.
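As a hypothetical illustration (the layout and file names below are my own sketch, not a prescription from any particular project or tool), such a package-plus-notebooks structure might look like this:

```text
my_project/
├── pyproject.toml          # makes the project pip-installable
├── README.md
├── src/
│   └── my_project/
│       ├── __init__.py
│       ├── data.py         # data loading and cleaning functions
│       ├── models.py       # model definitions and training logic
│       └── plots.py        # reusable plotting functions
├── notebooks/
│   └── analysis.ipynb      # imports from my_project; stays thin
└── tests/
    └── test_models.py
```

The notebooks stay thin and import functions from the package; that separation is what smooths the path from prototype to production and lets teammates share ownership of the code.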
To assess these skillsets, I would ask a candidate to walk through the source code they were most proud of. This served as an excellent simulator for code review sessions where we bring others up to speed on the status of a project. While they introduced the code to me, I put on the persona that I would have in an actual code review session. I would ask questions about the structure of the code, counterfactual questions about alternative code architecture, and how they would improve the code. It was okay if their best public work was from a few years ago and less-than-polished; using that code as a base, it is still possible to probe deeply with questions to tease out the candidate's thought processes.
As an example, there was one candidate I interviewed (left anonymized to protect privacy) whose main thesis work I read via a preprint and whose project code I read via GitHub. That candidate's work had excellence written throughout. The preprint contained clear and structured writing that made it easy to understand the premise of the project. Furthermore, the candidate's codebase was already pip-installable via PyPI! In addition, when I spoke with the candidate in a pre-interview setting, the candidate graciously handled a very pointed technical question about setting priors on one of their model's hyperparameters to induce regularization.
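To make that last exchange concrete, here is a minimal sketch of the idea, assuming a Bayesian linear regression written with PyMC (this is my own illustration, not the candidate's actual model): a tight, zero-centered Normal prior on the regression weights shrinks them toward zero, playing the same role as an L2 (ridge) penalty.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)

# Simulated data: 100 observations, 5 features, sparse true weights.
X = rng.normal(size=(100, 5))
true_w = np.array([1.5, 0.0, -2.0, 0.0, 0.5])
y = X @ true_w + rng.normal(scale=0.5, size=100)

with pm.Model():
    # A tight, zero-centered prior on the weights shrinks estimates
    # toward zero, which is equivalent to L2 (ridge) regularization.
    w = pm.Normal("w", mu=0.0, sigma=0.5, shape=5)
    noise = pm.HalfNormal("noise", sigma=1.0)
    pm.Normal("y_obs", mu=pm.math.dot(X, w), sigma=noise, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=42)
```

What a question like this probes is not syntax, but whether the candidate can explain why tightening the prior's sigma trades variance for bias, exactly as increasing a ridge penalty does.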
When job hunting, a candidate needs to remember: evidence is everything! If a hiring manager is comparing two candidates, the only way to be fair to both candidates is to use the evidence presented by both candidates to assess their merits. We cannot use intent to learn something as a justification to hire someone. Companies hire people who are equipped to do a job; companies do not hire people who intend to be equipped to do a job. We can help provide the environment for our colleagues to expand their skillsets after they accumulate enough credibility points.
Evidence of software development skills can be assessed by publicly available code, and barring that, through a code walkthrough on code that has been approved for public viewing. Evidence of technical depth can be assessed by looking at work products such as blog posts, papers (and preprints), and code. Evidence of mental agility can be assessed by deep, probing questions.
Because evidence matters so much, the digital footprint can be a powerful tool to leverage if you are in a position to do so. (Unfortunately, I am not sure how to advise those whose work doesn't permit them to have a digital footprint.) By investing time in curating your digital footprint, you invest in a public profile that will yield career dividends for many years. Your digital footprint, and more generally your production of intellectual work products, serve as evidence of your capabilities.
To summarize my points here:

1. Focus your energy on what you can control in the job hunt, not on the local hiring context that you can neither see nor influence.
2. Hiring managers evaluate evidence, not intent; your work products are that evidence.
3. Invest in curating a professional digital footprint: code, writing, and projects that demonstrate technical depth will pay dividends for years.