Words a data scientist never wants to hear

It never stops.

Some years ago I wrote a popular piece, "Words a bioinformatician doesn't want to hear". Sadly, a lot of practitioners found it all too real. This time, I'm concentrating on the data end of things.


"We really need a database / webapp, so I can query / analyse / visualise this data!"

[database / webapp is completed]

"Great! Now, can you use it to query / analyse / visualise this data for me ..."


"The software doesn't automatically correct all the malformed and garbage data that we give it? Doesn't it 'know' that it's wrong?"


"Oh my god, this data is completely incorrect! The figures are corrupted, the rows are in the wrong order, and the values are wrong! This is a disaster!"

[Email returned, politely pointing out they are looking at the wrong dataset, have misunderstood what it says, are looking for information that was not in the original data, or have loaded it into Excel, which has helpfully 'corrected' it.]

[No reply is received.]

[Repeat three weeks later.]


[Looking at a reporting interface for a complex dataset]

"There's a lot of buttons here to click and things to select. Can't you just include a button that will select and analyse everything I'm interested in?"


"That database is a real problem. It doesn't work at all. You should fix it."

Uh, yes. But that's not ours. It's run by an external organisation.

"I don't understand. Why won't you fix it?"


[Job ad]

"We're looking for someone who is deeply passionate about time series analysis of train movements / analysing advertising revenue for off-shore sneaker vendors / functional JavaScript handling of streaming video ..."


"Let's dockerize the instance and use Spark to visualise it with D3 on tablets."

Yes. Yes, let's do that.


"You've messed up. This value used to be 3.7 and your spreadsheet is showing it as 3.69999999999."

[You explain floating point precision]

"Uh ... no, you don't understand. You've messed up. This value used to be ..."
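
[For the record, the explanation goes roughly like the sketch below. It is a minimal Python illustration, not what any particular spreadsheet does internally: the helper names (stored_value, same_for_practical_purposes, as_displayed) and the tolerance are my own assumptions. The point is that 3.7 has no exact binary representation, so the sensible fixes are to compare with a tolerance and to round for display.]

```python
import math
from decimal import Decimal


def stored_value(x: float) -> Decimal:
    """Return the exact value the machine holds for the float x."""
    # Decimal(float) converts the binary value exactly, with no rounding.
    return Decimal(x)


def same_for_practical_purposes(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Compare two floats with a relative tolerance instead of ==."""
    # rel_tol is an illustrative choice; pick one that suits your data.
    return math.isclose(a, b, rel_tol=rel_tol)


def as_displayed(x: float, places: int = 1) -> str:
    """Round only at display time, leaving the stored value alone."""
    return f"{x:.{places}f}"


if __name__ == "__main__":
    original = 3.7
    spreadsheet = 3.69999999999

    # The stored value is close to, but not exactly, decimal 3.7.
    print(stored_value(original))
    print(same_for_practical_purposes(original, spreadsheet))
    print(as_displayed(spreadsheet))
```

[It will not help.]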


"I was thinking about the project the other day and had the idea [... BLAH BLAH BLAH ...] but on the other hand perhaps it would introduce problems similar to those encountered by [... BLAH BLAH BLAH ...] so perhaps it would be useful to have a feature like this. I'm not really sure of how it would work but [... BLAH BLAH BLAH ...] did some creditable work that is maybe of relevance to our current situation. Perhaps you can email her and [... BLAH BLAH BLAH ...] off to Switzerland next week, which is a terrible annoyance. Anyway, while I'm there I may talk to [... BLAH BLAH BLAH ...] will be joining us and Hal is very eager to get her involved [...]"

"Anyway, what do you think?"

Did you ask a question in there?