Practices for programming in a scientific environment? [closed]

Background

Last year, I did an internship in a physics research group at a university. In this group, we mostly used LabVIEW to write programs for controlling our setups, doing data acquisition and analyzing our data. For the first two purposes, that works quite OK, but for data analysis, it's a real pain. On top of that, everyone was mostly self-taught, so the code that got written was generally quite a mess (no wonder that every PhD student quickly decided to rewrite everything from scratch). Version control was unknown, and impossible to set up because of strict software and network regulations from the IT department.

Now, things actually worked out surprisingly OK, but how do people in the natural sciences do their software development?

Questions

Some concrete questions:

  • What languages/environments have you used for developing scientific software, especially data analysis? What libraries? (for example, what do you use for plotting?)
  • Was there any training for people without any significant background in programming?
  • Did you have anything like version control, and bug tracking?
  • How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists? (Physicists especially are stubborn people!)

Summary of answers thus far

The answers (or my interpretation of them) thus far: (2008-10-11)

  • Languages/packages that seem to be the most widely used:
    • LabVIEW
    • Python
      • with SciPy, NumPy, PyLab, etc. (See also Brandon's reply for downloads and links)
    • C/C++
    • MATLAB
  • Version control is used by nearly all respondents; bug tracking and other processes are much less common.
  • The Software Carpentry course is a good way to teach programming and development techniques to scientists.
  • How to improve things?
    • Don't force people to follow strict protocols.
    • Set up an environment yourself, and show the benefits to others. Help them to start working with version control, bug tracking, etc. themselves.
    • Reviewing other people's code can help, but be aware that not everyone may appreciate that.

Solution 1:

What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)

I used to work for Enthought, the primary corporate sponsor of SciPy. We collaborated with scientists from the companies that contracted Enthought for custom software development. Python/SciPy seemed to be a comfortable environment for scientists. It's much less intimidating to get started with than say C++ or Java if you're a scientist without a software background.

The Enthought Python Distribution comes with all the scientific computing libraries, including analysis, plotting, 3D visualization, etc.
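To give a sense of how little ceremony a small analysis needs in Python, here is a minimal sketch of a straight-line fit. It deliberately uses only the standard library so it runs anywhere; the data values are made up and `fit_line` is a hypothetical helper, not part of any library. With NumPy installed, `numpy.polyfit(x, y, 1)` would do the same fit in one call.

```python
# Made-up calibration data: applied voltage (V) and measured current (mA).
voltages = [0.0, 1.0, 2.0, 3.0, 4.0]
currents = [0.1, 2.1, 4.1, 6.1, 8.1]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(voltages, currents)
print(f"slope = {slope:.3f} mA/V, intercept = {intercept:.3f} mA")
```

For real work you would reach for the libraries in the distribution rather than hand-rolling the fit; the point is only that the whole analysis fits on one screen.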

Was there any training for people without any significant background in programming?

Enthought does offer SciPy training and the SciPy community is pretty good about answering questions on the mailing lists.

Did you have anything like version control, bug tracking?

Yes, and yes (Subversion and Trac). Since we were working collaboratively with the scientists (and typically remotely from them), version control and bug tracking were essential. It took some coaching to get some scientists to internalize the benefits of version control.

How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists? (Esp. physicists are stubborn people!)

Make sure they are familiarized with the tool chain. It takes an investment up front, but it will make them feel less inclined to reject it in favor of something more familiar (Excel). When the tools fail them (and they will), make sure they have a place to go for help — mailing lists, user groups, other scientists and software developers in the organization. The more help there is to get them back to doing physics the better.

Solution 2:

The course Software Carpentry is aimed specifically at people doing scientific computing and aims to teach the basics and lessons of software engineering, and how best to apply them to projects.

It covers topics like version control, debugging, testing, scripting and various other issues.

I've listened to about 8 or 9 of the lectures and highly recommend it.

Edit: The MP3s of the lectures are available as well.

Solution 3:

Nuclear/particle physics here.

  • Major programming work used to be done mostly in Fortran using CERNLIB (PAW, MINUIT, ...) and GEANT3; recently it has mostly been done in C++ with ROOT and Geant4. There are a number of other libraries and tools in specialized use, and LabVIEW sees some use here and there.
  • Data acquisition in my end of this business has often meant fairly low level work. Often in C, sometimes even in assembly, but this is dying out as the hardware gets more capable. On the other hand, many of the boards are now built with FPGAs which need gate twiddling...
  • One-offs, graphical interfaces, etc. use almost anything (Tcl/Tk used to be big, and I've been seeing more Perl/Tk and Python/Tk lately) including a number of packages that exist mostly inside the particle physics community.
  • Many people writing code have little or no formal training, and process is transmitted very unevenly by oral tradition, but most of the software group leaders take process seriously and read as much as necessary to make up their deficiencies in this area.
  • Version control for the main tools is ubiquitous. But many individual programmers neglect it for their smaller tasks. Formal bug tracking tools are less common, as are nightly builds, unit testing, and regression tests.
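For anyone who has never seen a unit test, this is roughly what one looks like in Python using the standard `unittest` module. It is only a sketch: `rms` is a made-up stand-in for whatever analysis routine you actually care about, and `run_suite` is a hypothetical convenience wrapper.

```python
import math
import unittest


def rms(values):
    """Root-mean-square of a sequence of numbers (stand-in analysis routine)."""
    if not values:
        raise ValueError("rms() needs at least one value")
    return math.sqrt(sum(v * v for v in values) / len(values))


class RmsTest(unittest.TestCase):
    def test_known_value(self):
        # Hand-computed reference: sqrt((9 + 16) / 2) = sqrt(12.5).
        self.assertAlmostEqual(rms([3.0, 4.0]), math.sqrt(12.5))

    def test_constant_signal(self):
        # The RMS of a constant signal is the constant itself.
        self.assertAlmostEqual(rms([2.0] * 10), 2.0)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            rms([])


def run_suite():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RmsTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

Pinning a hand-checked number like this is also the simplest form of regression test: if someone later "optimizes" `rms` and the answer changes, the suite says so before the plots do.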

To improve things:

  1. Get on the good side of the local software leaders
  2. Implement the process you want to use in your own area, and encourage those you let in to use it too.
  3. Wait. Physicists are empirical people. If it helps, they will (eventually!) notice.

One more suggestion for improving things.

  1. Put a little time into helping anyone you work directly with. Review their code. Tell them about algorithmic complexity/code generation/DRY or whatever basic thing they never learned because some professor threw a Fortran book at them once and said "make it work". Indoctrinate them on process issues. They are smart people, and they will learn if you give them a chance.

Solution 4:

This might be slightly tangential, but hopefully relevant.

I used to work for National Instruments, R&D, where I wrote software for NI RF & Communication toolkits. We used LabVIEW quite a bit, and here are the practices we followed:

  1. Source control. NI uses Perforce. We did the regular thing - dev/trunk branches, continuous integration, the works.
  2. We wrote automated test suites.
  3. We had a few people who came in with a background in signal processing and communication. We used to have regular code reviews, and best practices documents to make sure their code was up to the mark.
  4. Despite the code reviews, there were a few occasions when "software guys" like me had to rewrite some of this code for efficiency.
  5. I know exactly what you mean about stubborn people! We had folks who used to think that pointing out a potential performance improvement in their code was a direct personal insult! It goes without saying that this calls for good management. I thought the best way to deal with these folks was to go slowly, not press too hard for changes, and if necessary be prepared to do the dirty work. [Example: write a test suite for their code].

Solution 5:

I'm not exactly a 'natural' scientist (I study transportation) but am an academic who writes a lot of my own software for data analysis. I try to write as much as I can in Python, but sometimes I'm forced to use other languages when I'm working on extending or customizing an existing software tool. There is very little programming training in my field. Most folks are either self-taught, or learned their programming skills from classes taken previously or outside the discipline.

I'm a big fan of version control. I used Vault running on my home server for all the code for my dissertation. Right now I'm trying to get the department to set up a Subversion server, but my guess is I will be the only one who uses it, at least at first. I've played around a bit with FogBugz, but unlike version control, I don't think that's nearly as useful for a one-man team.

As for encouraging others to use version control and the like, that's really the problem I'm facing now. I'm planning on forcing my grad students to use it on research projects they're doing for me, and encouraging them to use it for their own research. If I teach a class involving programming, I'll probably force the students to use version control there too (grading them on what's in the repository). As far as my colleagues and their grad students go, all I can really do is make a server available and rely on gentle persuasion and setting a good example. Frankly, at this point I think it's more important to get them doing regular backups than get them on source control (some folks are carrying around the only copy of their research data on USB flash drives).
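As a stopgap for the backup problem, something like the following could be run by hand or from a scheduler. It is a rough sketch under stated assumptions, not a replacement for version control or a proper backup system: `backup_directory` and `sha256_of` are hypothetical helper names, and the archive naming scheme is just one reasonable choice.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_directory(source_dir, backup_root):
    """Archive the contents of source_dir into backup_root as a timestamped .tar.gz.

    A sidecar .sha256 file is written next to the archive so it can be
    verified later. Returns the path of the archive.
    """
    source_dir = Path(source_dir).resolve()
    backup_root = Path(backup_root).resolve()
    backup_root.mkdir(parents=True, exist_ok=True)

    # One archive per run, e.g. mydata-20081011-153000.tar.gz
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = backup_root / f"{source_dir.name}-{stamp}"
    archive = Path(shutil.make_archive(str(base_name), "gztar", root_dir=str(source_dir)))

    # Record the checksum in the usual "<digest>  <filename>" layout.
    checksum_file = archive.parent / (archive.name + ".sha256")
    checksum_file.write_text(f"{sha256_of(archive)}  {archive.name}\n")
    return archive
```

Calling `backup_directory("thesis_data", "/path/to/backup/disk")` (both paths are placeholders) leaves a dated archive plus its checksum on the backup disk, which already beats the single USB stick. Keep `backup_root` outside `source_dir`, otherwise old archives end up inside new ones.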