Some more things I done learned about REDCap

2016-09-10

by Paul Agapow

in science

tagged data-science, redcap, databases

A few more surprises

Refer back to the previous article for the background and some introduction to REDCap.

Note

These notes are based upon REDCap version 6.4.4. Different versions may have fixed or adjusted some of this behaviour.

Adding users

To add a user, you need an email address to send a registration link to. No getting around it. (Which may seem like a trivial requirement, but this can be an issue when new staff are joining and IT is yet to assign them an address.)

A small irritation is that if a new user has to be added to multiple projects, this has to be done one-by-one, project-by-project. There's no getting around this either.

Design

You can set the type of a field via validation and use this to control the number of decimal places a numerical value gets. For example, integer, number, number_1dp and number_2dp allow no, any, one or two figures after the decimal point respectively. You can also set minimum and maximum values for numeric fields.

Here's the catch:

Only integer and number pay any attention to the min/max fields. number_1dp and number_2dp ignore them. (Some REDCap gurus asserted that you shouldn't even see the min/max options for number_1dp and number_2dp but maybe that's version-specific.)

So here's your choice: do you want to limit the number of decimal figures or the minimum and maximum values?
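If you need both constraints, one option is to let REDCap enforce one of them and check the other on exported data. The following is a minimal sketch of such a check in Python, not anything REDCap provides: the function name `check_number` and the idea of passing the limits as strings are assumptions made for illustration.

```python
from decimal import Decimal, InvalidOperation


def check_number(raw, max_dp, minimum, maximum):
    """Return a list of problems with one exported value (empty if it is fine).

    Hypothetical helper: checks both decimal places and range, which the
    number_1dp / number_2dp validation described above does not do together.
    """
    try:
        value = Decimal(raw.strip())
    except InvalidOperation:
        return ["not a number"]
    if not value.is_finite():
        return ["not a number"]

    problems = []
    # The exponent is negative for digits after the point, e.g. -2 for "1.25".
    decimals = max(0, -value.as_tuple().exponent)
    if decimals > max_dp:
        problems.append(f"more than {max_dp} decimal places")
    if not (Decimal(minimum) <= value <= Decimal(maximum)):
        problems.append(f"outside {minimum}..{maximum}")
    return problems
```

For a body temperature field limited to one decimal place and a range of 30 to 45, you would call `check_number(value, 1, "30", "45")` on each exported value and report any non-empty result.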

Making project changes

While it would be best for any project to be completely designed before it is released to production, in reality changes will often have to be made to datasets that are already in use: adding & renaming columns, tweaking validation, etc. REDCap is good about preserving data in the face of schema change, but caution still needs to be exercised.

When in doubt, save and re-upload all your data
Making changes one-by-one in the browser (using the design GUI) tends to be much more robust and preserve more data than by uploading a new data dictionary. REDCap knows what data is being changed from and to in the first, but not in the second.
Changing the primary key / identifier will likely bork all your data.
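
If you do have to upload a new data dictionary, it can help to compare it against the old one first, so you know which fields REDCap will see as removed or new. Here is a rough sketch in Python. It assumes the dictionary is a CSV with one header row and the field name in the first column; the function names are made up for this example.

```python
import csv
import io


def load_fields(dictionary_csv):
    """Map field name -> full row, taking the name from the first column (assumed)."""
    rows = csv.reader(io.StringIO(dictionary_csv))
    next(rows, None)  # skip the header row
    return {row[0]: row for row in rows if row}


def diff_dictionaries(old_csv, new_csv):
    """Summarise which fields were removed, added or altered between two dictionaries."""
    old, new = load_fields(old_csv), load_fields(new_csv)
    return {
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }
```

Note that a renamed field shows up as one "removed" and one "added" entry. That is the same view an upload gives REDCap, and it is why renames are safer done in the browser.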

Choices

You can trip yourself up with single / multiple-choice fields in REDCap, although you have to try hard. Choices are written as id-title pairs, where the first is the value that is stored internally and the second is the value that is shown to users:

1=foo
2=bar
3=baz

Note that they appear in the order that they are written in the choice options. So in this case:

3=baz
2=bar
1=foo

sorting does not occur on the internal values; they appear in the order baz-bar-foo. And should you be crazy enough to later edit and shuffle up the ids and titles:

1=baz
2=foo
3=bar

REDCap will happily let you.
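
Since REDCap won't stop you, you may want to catch this yourself before saving an edited choice list. The sketch below, in Python, parses the id=title lines shown above and reports any id whose title has changed. The function names are hypothetical and the parsing only covers the simple format used in this article.

```python
def parse_choices(text):
    """Parse 'id=title' lines into a dict of id -> title, keeping the written order."""
    choices = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        code, _, title = line.partition("=")
        choices[code.strip()] = title.strip()
    return choices


def remapped_ids(old_text, new_text):
    """Return {id: (old_title, new_title)} for ids that now mean something else."""
    old, new = parse_choices(old_text), parse_choices(new_text)
    return {
        code: (old[code], new[code])
        for code in old
        if code in new and old[code] != new[code]
    }
```

Reordering the lines alone produces an empty result, since the stored ids still point at the same titles. Shuffling ids and titles as in the last example flags all three ids.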

Memory

REDCap report generation is incredibly memory hungry. I've had cases where a 20 MB report needed more than 2 GB of memory. The explanation for this is that any report has to be assembled in memory, to allow for the various filters and selections, before it is converted to the downloadable text. In short: that's just the way it is.

Downloading only part of the dataset (i.e. a subset of instruments or columns) will consume a correspondingly smaller amount of memory. Thus, a common riposte is to say that users shouldn't download the entire dataset. Good luck with that: you can provide an extensive list of report building and filtering mechanisms but people will insist on just downloading the entire dataset. One practical workaround is to install a plugin that dumps the data without filters. There's a useful implementation of this on the REDCap mailing list.
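
If a plugin isn't an option, the "smaller downloads use less memory" observation can be turned into a script that pulls the dataset a few columns at a time and stitches the pieces together afterwards. This is only a sketch of the idea in Python, not something I've run against a production server. The `fetch` argument is a hypothetical stand-in for whatever actually performs the download, and the key column name `record_id` is an assumption.

```python
def export_in_batches(fields, fetch, key="record_id", batch_size=50):
    """Download `fields` a batch of columns at a time and merge rows on `key`.

    `fetch` is a caller-supplied function (hypothetical here) that takes a list
    of column names and returns a list of dicts, one per record.
    """
    merged = {}
    for start in range(0, len(fields), batch_size):
        # Always request the key column so the batches can be joined up.
        batch = [key] + fields[start:start + batch_size]
        for row in fetch(batch):
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())
```

The merged result still has to fit in memory on your own machine. The aim is only to keep each individual request on the server small.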

Elastic beanstalk

REDCap is easily deployable to Amazon's EB service, which not only saves you the work of setting up a system but also gets you auto-scaling behaviour. The way I set up an EB-based REDCap system was:

Set it up the conventional way first. This probably isn't necessary but is a handy way to check the configuration values.
Set up your database to use Amazon RDS (their database service).
Set up the reports and files to use Amazon S3. You need to do this because the EB disk space is completely ephemeral. See the previous article for some gotchas about S3.

You may have to insert a few config files via .ebextensions. I've used two to adjust some environmental and PHP variables:

# configure environment variables:
option_settings:
  - option_name: PHP_MEMORY_LIMIT
    value: 4000M

# extra php configuration
# this file will be placed in `php.d` and read after `php.ini`
files:
  "/etc/php.d/project.ini" :
    mode: "000644"
    owner: root
    group: root
    content: |
      upload_max_filesize = 64M
      post_max_size = 64M
      memory_limit = 3900M

You may have to adjust some variables on your EB dashboard to configure
Otherwise, it's a simple case of dropping the REDCap source code into the EB deploy directory and writing the db settings in the appropriate place.

Email trapped by spam

This is not a REDCap problem per se, but can be an issue depending on how you deploy it. If you use Elastic Beanstalk or one of the other Amazon deployment methods that pushes mail through Amazon's servers, many systems will mark this email as spam. It seems that many of the other systems sharing the email servers with you may be spamming and so the servers have ended up on blacklists. About the only thing you can do is use your own servers, Amazon SES (Simple Email Service) or something similar.

Weird characters

Generally, REDCap is alright with handling extended characters (accents, umlauts, etc.) but CSV files and the associated tools often will only handle ASCII. So it may be necessary to normalise everything to plain text or play around with encodings, which is never fun.
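
If you go the "normalise everything to plain text" route, the usual trick in Python is to decompose accented characters and then drop whatever isn't ASCII. A minimal sketch follows; the function name `to_ascii` is my own, and this is a lossy conversion, so keep the original data.

```python
import unicodedata


def to_ascii(text):
    """Strip accents and drop any character with no ASCII equivalent (lossy)."""
    # NFKD splits e.g. "é" into "e" plus a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Characters that don't decompose into an ASCII letter plus an accent, such as "ß" or "ø", simply vanish. If your data contains many of those, you'll need an explicit replacement table or a tool that can cope with UTF-8.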
