Chapter 4: Choosing a bioinformatician
The most difficult step in the project plan may be step 2, building wet and dry laboratory infrastructure. Building sequencing infrastructure has been extensively covered in APHL’s Next Generation Sequencing Guide, so this section will instead focus on building bioinformatics capability. Material in this chapter is reproduced or modified from course materials provided by Ewan Birney of EMBL-EBI and Conrad Bessant of Queen Mary University for the 2018 EMBL-EBI courses “Bioinformatics for Core Facility Managers” and “Bioinformatics for Principal Investigators.”
For bioinformatics analyses, before any equipment or software is purchased, the first consideration should be whether the laboratory should 1) collaborate or consult with a bioinformatician, or 2) hire one or more bioinformaticians. A hired bioinformatician is advantageous in that they will be solely dedicated to the laboratory’s projects, but the laboratory will need to handle hiring, training, management, and compute infrastructure. In contrast, a collaborator may already have the appropriate compute power and expertise for the specific project. When evaluating a potential collaborator, the laboratory should mainly assess their track record, motivations, and data security practices.
Most state PHLs do not have a bioinformatician. Still, bioinformatics analyses are no longer limited to those who know the command line. As seen in the NEEPHLD Bioinformatics calls, most projects that involve sequencing and bioinformatics (such as PulseNet and NARMS) have outsourced the bioinformatics to the CDC. For example, the whole genome multi-locus sequence typing (wgMLST) pipeline for PulseNet is currently written by CDC bioinformaticians and run on the CDC cluster through the BioNumerics 7.6 interface. If the needed bioinformatics tools or custom pipelines are already packaged for a graphical user interface (GUI), a laboratorian can run the analyses and learn to interpret the data. Commercial GUIs include the CLC Genomics Workbench, Geneious, BioNumerics 7.6, and DNAnexus, while Galaxy is an open-source GUI. If the PHL needs help with interpreting the results, it could consult with or hire a bioinformatician. An in-house bioinformatician (or longer-term collaboration) is generally not required until the PHL needs to build a custom bioinformatics pipeline. Such a bioinformatician should have knowledge of the command line, which will give them 1) full control of existing software and the ability to update it, as well as 2) the capability to write and develop new software.
There are four main questions that should be asked before deciding which option to choose (Figure 5). First, the laboratory should consider i) the type and ii) the amount of bioinformatics work needed per project (from Chapter 3). If there are no pipelines to develop, or if there are only a few projects, it may be more beneficial to collaborate or consult with a bioinformatician rather than hire one. Next, the laboratory should determine whether compute infrastructure is in place, or could quickly be put into place, for the bioinformatician to do analyses (see Section “Setting up compute infrastructure”). Lastly, the laboratory should try to identify potential training resources that allow the bioinformatician to grow, which will help with staff retention. Currently, state PHLs that employ a bioinformatician typically have only one, which can make the work isolating. A great resource for dealing with this is the blog post “A guide for the lonely bioinformatician” by Mick Watson.
Figure 5. Flowchart for deciding whether to collaborate with or hire a bioinformatician. A laboratory should consider these four steps in sequential order. Each question can be answered “yes” or “no”. If at any point an answer leads to a pink arrow, the PHL should consider consulting instead. If the answers to all four questions follow the black arrows, the PHL should consider hiring.
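The decision logic described for Figure 5 can be sketched in a few lines of Python. This is only an illustration: the class `LabAssessment`, its field names, and the function `recommend` are hypothetical, and the four questions are paraphrased from the text above rather than copied from the figure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LabAssessment:
    # Hypothetical field names paraphrasing the four questions in the text;
    # they are not the wording used in Figure 5.
    needs_custom_pipelines: bool   # type of work: are there pipelines to develop?
    has_many_projects: bool        # amount of work: more than a few projects?
    compute_ready: bool            # compute infrastructure in place or quickly available?
    training_available: bool       # training resources identified for the bioinformatician?


def recommend(lab: LabAssessment) -> Tuple[str, Optional[str]]:
    """Walk the questions in order and stop at the first "no".

    Returns ("consult", <name of the blocking question>) on the first "no",
    or ("hire", None) if every question is answered "yes".
    """
    ordered_questions = [
        ("needs_custom_pipelines", lab.needs_custom_pipelines),
        ("has_many_projects", lab.has_many_projects),
        ("compute_ready", lab.compute_ready),
        ("training_available", lab.training_available),
    ]
    for name, answer in ordered_questions:
        if not answer:
            return ("consult", name)
    return ("hire", None)


if __name__ == "__main__":
    # Example: everything is in place except compute infrastructure.
    lab = LabAssessment(True, True, False, True)
    print(recommend(lab))
```

Returning the name of the first blocking question, rather than only the recommendation, is a design choice of this sketch: it shows the laboratory which gap would need to be closed before hiring becomes the better option.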
Regardless of whether the laboratory chooses to collaborate or hire, the laboratory will need to consider the type of bioinformatician needed (Figure 6). Bioinformatics is an interdisciplinary field, where bioinformaticians may specialize in specific molecules (DNA, RNA, proteins, metabolites) and/or organisms (viruses, bacteria, fungi, or humans). For example, a bioinformatician who understands gene expression profiling may not necessarily know how to predict protein structure or a cell’s metabolic activity, and the tools used for microbial DNA may differ from those used for human DNA. Furthermore, some bioinformaticians will be more familiar with wet laboratory techniques, others with mathematics or statistics, and others with computer science or informatics. It will be up to the PHL to determine which type of bioinformatician will be the best fit for the laboratory.
Figure 6. Different types of bioinformaticians. Provided by Ewan Birney of EMBL-EBI at ftp://ftp.ebi.ac.uk/pub/training/2018/Bioinformatics_for_Core_Facilities_Managers/Day_2/Ewan_Birney_data_challenges.
How can a PHL determine what type of bioinformatician to hire? Bioinformatics is an interdisciplinary field, where bioinformaticians are often a “jack-of-many-trades” but “master of none”. A PHL can safely assume that a bioinformatician knows how to analyze large biological datasets (as in Step 3 of Figure 4), but should screen for the bioinformatician’s specialty. In contrast, a PHL cannot assume that the bioinformatician can independently generate project ideas, set up compute infrastructure, or develop production-level pipelines (as in Steps 1, 2, or 4 of Figure 4). Bioinformaticians with wet laboratory training or biological backgrounds will tend to be better at communicating with the laboratory and epidemiology divisions (Step 1, Figure 4), whereas a bioinformatician with a computer science or information technology background will tend to be better at choosing and setting up compute infrastructure (Step 2, Figure 4). Listed below are skills that a bioinformatician may need in a public health setting but does not necessarily have. If they do not have these skills, the laboratory should hire or identify support staff (column 3) who can help achieve the project tasks and objectives (Table 1).
Skills | Example Tasks | Other Professionals |
---|---|---|
Laboratory or clinical knowledge | | Bacteriologists, scientists, clinicians, epidemiologists |
Network administration | | Network administrators, computer scientists |
System administration | | System administrators |
Database management | | Data architects, database administrators |
Software development / engineering | | Software engineers, software developers |
Table 1. Skills and other professionals a bioinformatician may need. Since bioinformatics is an interdisciplinary field, bioinformaticians may have skills ranging from laboratory experience to database management. When collaborating or hiring, laboratories should assess what skills they may need for their project goals (column 2). If the bioinformatician does not have these skills, they may have to collaborate with other professionals (titles in column 3).
Some laboratories may not have the budget or the need to hire a bioinformatician. That is okay! Bioinformaticians are expensive, and are generally only needed if you consistently need to modify existing pipelines or write new ones (Figure 5). If the laboratory does not hire a bioinformatician, laboratory staff should at a minimum be trained to:
- Run pipelines via graphical user interfaces (such as BioNumerics) and the basic command line, and understand how the two are related
- Understand the biological concepts underlying the bioinformatic tools
- Adjust the bioinformatic tools at a basic level for troubleshooting
- Interpret the results for epidemiologists
This knowledge can be obtained at the trainings offered by the Bioinformatics Training Laboratories (Chapter 7), CDC AMD Academy, and APHL.