Posts categorized under: science

2020-04-05

by Paul Agapow

in science

tagged data-science, rwe, healthcare-data

Pointers for analytics of healthcare data

A start

Recently I've been asked for pointers on analysing complex healthcare data. This is a difficult issue. Healthcare analytics / health informatics / medical informatics / etc. range over a wide area, driven by a wide variety of interests and outcomes, overlapping hugely in some areas and not at all in others. The …

2020-03-29

by Paul Agapow

in science

tagged data-science, data, analysis

The rules of analysis

Opinions, I got 'em

While mentoring some juniors, I started to think about the rules of thumb for analysing data that I've built up over the years. While I'm certainly not the world's greatest data scientist (or its greatest bioinformatician, statistician, biomedical scientist, etc.), it seems worthwhile trying to capture them here. These are …

2020-03-26

by Paul Agapow

in science

tagged data-science, bioinformatics, infectious-disease

Fighting COVID19 with data science and bioinformatics

Last updated 2020/4/4. Originally published on LinkedIn

I won't be updating this list further as the intention was to point people to the right places to help in a crisis.

Should you have a potentially useful information set or data source, the best course of action is you …

2018-11-19

by Paul Agapow

in science

tagged science, academia, career

Have you looked on Evoldir?

Reflections on the academic job search

For those who don’t know, EvolDir is a worldwide mailing list on evolutionary biology that has been running since approximately forever. Everyone who works even vaguely in the area of evolution subscribes to it. Every day it typically carries several posts on conferences, book announcements, funding opportunities and job …

2018-02-20

by Paul Agapow

in science

tagged biology

Tentacles vs arms

Solving (partly) an old conundrum

You might have come across confusing statements like "octopuses have arms, squids have tentacles" and wondered what's the difference. I did and here's the (unsatisfactory) answer.

Cephalopods (squid, octopus, nautilus and a number of other aquatic creatures, which are a class of mollusc) have a number of muscular "limbs". Traditionally …

2017-06-11

by Paul Agapow

in science

tagged bioinformatics, academia

Why bioinformaticians don't get no respect

It's a common cry amongst working bioinformaticians that they're unappreciated, undervalued and generally "get no respect". While people love to complain, from discussions with peers and colleagues, the same stories come up again and again:

Not being consulted when projects are planned
Technical advice and results not taken seriously
Being …

2017-05-01

by Paul Agapow

in science

tagged programming, software, research, amazon, aws

Using AWS for research computing

Your own private computing cluster?

This is based upon my reply to a question on reddit concerning experiences with using Amazon Web Services (including Elastic Computing, Glacier, etc.) for research.

I was part of a pan-European research consortium that used it for our shared computing infrastructure: databases, a few web apps and web sites, mailers …

2017-02-01

by Paul Agapow

in science

tagged career, job-search

Academic job ad red flags

Words you don't want to read in a job description

"competitive salary": Like several other terms on this list, this phrase can be used in a completely sincere manner: We pay decently. If we offer you the job, then we'll negotiate. Unfortunately, it is just as frequently used as a way of avoiding the subject of remuneration, in the …

2016-09-10

by Paul Agapow

in science

tagged data-science, redcap, databases

Some more things I done learned about REDCap

A few more surprises

Refer back to the previous article for the background and some introduction to REDCap.

Note

These notes are based upon REDCap version 6.4.4. Different versions may have fixed or adjusted some of this behaviour.

Adding users

To add a user, you need an email address to send a …

2016-09-01

by Paul Agapow

in science

tagged academia, job-search, interviews

Random observations on the academic-scientific job search

Written without any bitterness at all

Maintain employable, valuable skills. Never evince any technical skills. Never casually offer to help a colleague with computer problems, etc. First you'll end up being known for that. Second, people will keep asking you to encrypt their hard-drive, build a database, etc. Third, technical matters are low status "working class …

2016-08-15

by Paul Agapow

in science

tagged data-science, databases, career, humour

Words a data scientist never wants to hear

It never stops.

Some years ago I wrote a popular piece Words a bioinformatician doesn't want to hear. Sadly, a lot of practitioners found it all too real. This time, I'm concentrating on the data end of things.

"We really need a database / webapp, so I can query / analyse / visualise this data!"

[database …

2016-07-01

by Paul Agapow

in science

tagged bayesian, statistics, likelihood

Bayesian stats in very plain language

My pass at explaining the often misunderstood.

Introduction

Some years ago, I got into an argument with someone about the relative merits of Bayesian versus Maximum Likelihood in phylogenetics. They asserted the two were basically the same or would come to the same answers. I countered that while they would often agree, they were measuring different things …

2015-07-15

by Paul Agapow

in science

tagged data-science, redcap, databases

What I done learned about REDCap

A few surprises

For those not in the know, REDCap is a platform for creating and editing databases through the web. And by and large, it works fine. It saves a lot of development effort. It provides good reporting tools for users. It's secure and robust. But there are some things to be …

2015-05-01

by Paul Agapow

in science

tagged r, r-studio, markdown, markup, knitr

Markdown in R Studio

What Markdown markup is allowed.

If you're doing reproducibility in R Studio, you're probably using knitr. And if you're using knitr, you're probably using Markdown. Unfortunately, due to the lack of a standard for Markdown (and the subsequent proliferation of various flavours and extensions), it's sometimes not clear what syntax is available to you. Consequently …

2015-03-24

by Paul Agapow

in science

tagged database, data-science, csv, cssql, csvkit

(Re-)building databases with csvsql

Tips and traps when going from a database dump to a database

The scenario

You have a bunch of related CSV files.

Maybe they're the result of a raw database dump. Maybe they've been generated in some other way: experimental results, various public data sets, whatever. But the important thing is that you need to make a database from them. Perhaps because …

2015-02-25

by Paul Agapow

in science

tagged bioinformatics, career, humour

Words a bioinformatician never wants to hear

Based on hard-won experience.

(This first appeared on biocodershub.net courtesy of Rad, and has since popped up on coderscrowd.com. It enjoyed some moments of viral popularity, with many aggrieved practitioners chipping in on the comments of the article. Following the resurrection of my website, it's a good opportunity to bring this piece …

2015-02-20

by Paul Agapow

in science

tagged Ruby-on-Rails, Rails, data science, CSV, JSON, Excel, SQLite, XML, REDCap

Tools for data

How to store data, what to use

Prompted by a recent tweet asking what people used for storing and managing their data, I wrote down my own hard-won lessons on the topic. In rough order of preference and data complexity:

A hierarchical strategy

Use restructured text for documentation

Or markdown / asciidoc. The advantages of this being:

It's …

2013-11-11

by Paul Agapow

in science

tagged academia, humour, publications

Philosophical considerations in manuscript preparation

Thought experiments and musings.

Zeno's paradox of manuscript completeness

No matter how many drafts you go through, the number of helpful suggestions made by your co-authors will approach but never quite reach zero.

Plato's allegory of collaborators and the cave wall

Distinguished or influential co-authors have a tendency to invite, introduce or insist upon …

2013-07-11

by Paul Agapow

in science

tagged BioPython, biosequences, computational biology, Python

Hitchhikers guide to Biopython: Sequences & alphabets

(Originally published on BiocodersHub.)

If you’re doing bioinformatics in Python, you’re probably using Biopython. Actually, Biopython is a good reason for using Python. But it can be formidable to newcomers: there’s a lot there and there’s not a huge amount of learning material. This then is …

2012-09-01

by Paul Agapow

in science

tagged knitr, r, reproducibility, restructured-text

Writing knitr in restructured text

Swapping out Markdown for a different markup.

knitr is a useful R package/tool for documenting analysis. Basically, it allows the embedding of R code "chunks" within a simple text document. This document can then be "knitted", which means that the R code is interpreted and reinserted in the document along with the results of that code …

2012-06-14

by Paul Agapow

in science

tagged computational biology, galaxy

Common tasks in Galaxy

It's all there in the documentation, but sometimes it's hard to find. This document gives you another place to look.

So how do I ...

... create admin users?

Curiously, the identity of admin users is hardcoded into the Galaxy configuration file. (Which makes it secure, I guess, but separate from the …

2012-06-14

by Paul Agapow

in science

Compiling Quickjoin and file formats

Problems with building qjoin and getting it to read Stockholm files.

Quickjoin / qjoin is an excellent commandline program for rapid construction of neighbour-joining trees. However, while using it recently, I had a few problems getting it to read Stockholm files, the most accessible of the formats it can use.

The …

2012-06-14

by Paul Agapow

in science

tagged computational biology, galaxy

Galaxy toolsheds

Relatively painless tool-sharing

This is a more recent innovation in Galaxy, which can make it a somewhat confused one: the concept of the toolshed has changed over its lifetime, the documentation is incomplete, and there's a slightly strange emphasis in the documentation that exists. So …

Mile-high description

Toolsheds …

2012-06-01

by Paul Agapow

in science

tagged computational-biology, programming, programming-langauges, python, ruby

Language Wars

"What language would you recommend to introduce programming to an audience of life science students at a bachelor level?"

(Originally published on BiocodersHub)

Following several lengthy and passionate discussions in different venues on what language to use for teaching bioinformatics, I've started cutting and pasting my reply. And here it is.

You'll get a lot of different opinions on this because:

It's a religious issue. That is, it comes …

2012-02-01

by Paul Agapow

in science

tagged BioPython, biosequences, computational biology, programming, Python

Hitchhikers guide to BioPython: SeqRecords

For the novice, more-than-raw sequences.

(Previously published on BiocodersHub.)

Previously I'd spoken about how Biopython represents sequence data with the Seq class. But there is also the SeqRecord class:

A Seq is just raw sequence data and information about what type of sequence it is.
A SeqRecord is a Seq and all the other information …

2012-01-01

by Paul Agapow

in science

Cleaning biosequences

A simple script to check and purge sequence files of possible problems.

Sometimes you need sequences that are unambiguous (i.e. only 'ACGT', lacking gaps) whether it's because of the limitations or assumptions of tools (like omegaMap) or just because you want to know where SNPs or sequencing ambiguities …

2012-01-01

by Paul Agapow

in science

tagged Bioruby, phylogenetics, visualisation, Dendroscope

Coloring dendroscope files

How to programmatically label phylogenies.

The need had arisen for the tips of a large phylogeny to be labelled in a systematic way. Rather than "point and click" within Dendroscope, this script takes a .den/dendro file and colors the tips according to a "color description" file. This is a simple csv file with taxa …

2012-01-01

by Paul Agapow

in science

tagged bioinformatics, BioRuby, sequence analysis

Consensus in BioRuby

Explaining the ill-explained ways to obtain a consensus sequence in BioRuby.

In BioRuby, alignments are equipped with several methods for obtaining consensus sequences. Unfortunately, these have terse descriptions which point you at the BioPerl documentation, with the added bonus of not quite working like the BioPerl equivalents.

First, let's create a very simple alignment, where everything agrees except the last sequence …

2012-01-01

by Paul Agapow

in science

tagged computational biology, galaxy

Galaxy miscellanea

Odds and ends and the surprising.

Redirects

If you are serving the installation with a proxy redirect (e.g. the galaxy server is running on port 7070 but is being redirected by Apache to appear at port 80 on /galaxy), while you can access Galaxy at both addresses, login will …

2012-01-01

by Paul Agapow

in science

More about MrBayes

Some (more) notes about the venerable Bayesian reconstruction program.

Error when setting parameter "Gap" (2)

When attempting to execute a Nexus file, MrBayes kept spitting back this cryptic error upon loading:

Executing file "c_vp1_nuc_seqs.nxs" [...]
Reading data block Allocated matrix [...]
Data is Dna Gap character matches matching or missing characters …

2012-01-01

by Paul Agapow

in science

tagged computational biology, galaxy

More things I done learned about Galaxy tool development

A potpourri of titbits that are probably documented somewhere, but weren't obvious to me.

2012-01-01

by Paul Agapow

in science

Ross Crozier 1943-2009

The sudden death of Ross Crozier on the 12th of November was heralded largely by a slow ripple of email, phone calls and Facebook messages across the globe. I found out from an email that started with a short but singularly complete sentence:

Terrible news.

It is sobering to think …

2012-01-01

by Paul Agapow

in science

tagged science, bioinformatics, ngs

What works - NGS assemblers

A quick paper review on picking the best assembler

You could spend all day just keeping up with developments in next-generation sequencing. Companies announce new and revolutionary technologies seemingly every month, promising to do more, better and for less. Yet at the same time, it’s difficult to hack your way through the marketing talk and get hard figures …

2011-01-01

by Paul Agapow

in science

tagged ruby, bioinformatics, sequence-analysis

Drawing sequence logos

A very simple script to do a simple but tedious task.

Sequence logos are a common way of representing SNPs and diversity in groups of sequences. This script automates the task. It's a bit rough around the edges and serves mainly as a base for further hacking.

Usage is:

drawlogo.rb [options] FILE1 [FILE2 ...]

where options are:

-h, --help Display this …

2010-06-01

by Paul Agapow

in science

tagged imported, sequence-analysis, script, ruby, bioruby, bioinformatics

Attention: This article has been imported from a previous website and has not yet been checked. It may be malformed or incomplete.

Reducing a sequence to SNPs

A script for a simple task.

A largely self-explanatory script. This will "shrink" an alignment, deleting all sites that don't contain a polymorphism in some member sequence. A little bit of script candy as well, this takes any number of files and saves the results in a new file named according to a definable schema:

#!/usr …

2010-03-01

by Paul Agapow

in science

tagged science, tools, phylogenetics, web-development

jsPhyloSVG

jsPhyloSVG (or JPS from this point) is a nifty Javascript library for displaying phylogenetic trees in your browser. It can:

accept trees in a number of formats, encoded in the page or on file or loaded via Ajax
allow these trees to be exported or saved as SVG
display trees …

2009-06-01

by Paul Agapow

in science

tagged osgb

Ordnance Survey locations

Converting between OS grid references and longitudes-latitudes.

The Ordnance Survey is a UK-peculiar geospatial format, ubiquitous via street atlases, hiking charts and (yes) farming and epidemiological maps. It is explained in great detail in several places, but here's a quick overview:

The OS grid is a set of 25 squares, 500 kilometers a side, arranged 5-by-5 that …

2009-01-01

by Paul Agapow

in science

tagged mr-bayes, phylogenetics, possibly-obselete

Attention: This article may refer to information that is outdated or no longer relevant. It is left here for historical purposes.

Error 1 for Mrbayes

What happens when a make fails.

If this happens to you when trying to compile MrBayes:

% make
gcc -DUNIX_VERSION -DUSE_READLINE -O3 -Wall    -c -o mb.o mb.c
gcc -DUNIX_VERSION -DUSE_READLINE -O3 -Wall    -c -o mcmc.o mcmc.c
gcc -DUNIX_VERSION -DUSE_READLINE -O3 -Wall    -c -o bayes.o bayes.c
bayes.c:45:31: readline/readline …

2007-06-14

by Paul Agapow

in science

tagged bioruby, sequences, database

Fetch sequences from db

A simple script to grab bioseqs by accession.

This just wraps the BioRuby fetch functionality in a friendly commandline interface. In brief, it can accept accession ids on the commandline or from a piped file (one accession per line) and save the corresponding sequences from the db. Sequences may be downloaded via the bioruby or EBI servers. The …

2007-06-14

by Paul Agapow

in science

tagged computational biology, galaxy

Installing Galaxy

Setting up a production version of GMOD Galaxy for general use.

This presents one way to create an optimized production Galaxy instance. Variations are certainly possible and some of the choices presented are/were dictated by local culture. Certain settings may be more suitable for production or development environments. Nonetheless, this presents a start-to-stop process for installation and setup.

Note: this …

2007-01-01

by Paul Agapow

in science

Parsing Dendroscope nodes

For when you have to do lots to a big tree.

Previously, I showed how Dendroscope files can be easily manipulated with brute-force regex, so you can write scripts to color a mass of nodes, rather than having to format them one-by-one in the GUI. However, more complex manipulations require …
