Words a bioinformatician never wants to hear
(This first appeared on biocodershub.net courtesy of Rad, and has since popped up on coderscrowd.com. It enjoyed some moments of viral popularity, with many aggrieved practitioners chipping in through the article's comments. With my website resurrected, this is a good opportunity to bring the piece home.)
“The data is all in these [proprietary and undocumented format] files.”
“What I want is a program to browse, edit and validate gigabyte-size whole genome sequencing runs. It should import and export all known formats. And it has to run in a browser. And some of our staff refuse to use anything but IE6.”
(After delivering an insignificant or negative result) “Can’t you analyse it again?”
“Why don’t we put the new server rack in your office?”
“That software you wrote is buggy!
[What happened?]
It’s not working!
[How do you know that?]
It’s broken!
[In what way?]
Can’t you just fix it?
[How? I don’t know what’s wrong ...]”
“I don’t understand - [large research institute / multinational commercial company] has software that can do this. Can’t you just write something similar?”
“This is a great / exciting opportunity …”
“This program is great. But could you rewrite it in another programming language?”
“That database and web service you wrote for X? We need one that works just like that. Except for …” [lists dozens of ways in which the new service actually differs entirely from the previous one]
“You want to know what feature or task is the most important? They all are!”
(After being told that the data sample is too small, or was sampled so incorrectly that analysis is impossible) “You don’t understand – we really need this result.”
“Here’s the data. I haven’t had time to clean it up, so it might be incomplete. And some of the identifiers might not agree. And there are mis-spellings …”
(After delivering the outcome of an analysis) “Pth – that result is obvious.”
“Don’t worry about who’s going to [maintain the new database / monitor the new service / curate the data / come in on the weekends to restart the system]. We’ll work that out later …”
“So X wrote us this pipeline before he left. I’m not sure if he finished it. No, there’s no documentation. Can you get it working? By next week?”
“I think I read a way to do this: it was in a journal, maybe. Or on a webpage. Done by some lab in France. Or was it China? Anyway, it should be simple.”
“Do you really need that much disk space for this NGS data?”
“So your program crashed when I tried to load data. What format? Does that matter? They were Word documents. Really, the program doesn’t read those?”
“So, what you’re saying is that a Word document isn’t a text file. But I used Courier as a font.”
“We need this program. It’s really simple … [30 minutes of essential features follow]”
“I’m sure it can’t be that complicated ...”
(While waiting for the result of a Bayesian calculation) “Why does it take so long to get this answer? Can’t you just make it go faster?”
“I know you said that 30 data points were the minimum for statistical rigour. But we only got 5. Can’t you analyse it anyway?”
“We keep all those records in Excel files … uh, I think this is the most current version …”
“The Z lab showed you could do this [with 10 genes and a computing cluster]. So do you think you could do this with our data [200 whole genomes, on a PC]?”
“Good news – we got a huge grant for sequencing and annotating 6 squillion whole genomes. You’re not on the grant and we didn’t budget for any bioinformatic work but here’s the data. Can you have this done by next week?”
(After being told that an analysis is impossible or ill-considered) “But X over in Y’s lab does it all the time.”
“Uh, so what is it that you do again?”