Roy and Niels

Thursday, December 30, 2010

Happy new year...

... and a few wishes for 2011:

libdEdx: I hoped I could announce the first official release of the stopping power library before 2011. Unfortunately, there are a few minor problems with the Bethe implementation which Jakob wanted to fix first. It is almost done...
Just to give an idea of what's coming up: libdEdx 1.0 will probably be a Linux-only release, and we most likely won't include ESTAR tables. The good news is that PSTAR, ASTAR, MSTAR and the ICRU 73 (old and new) tables seem to work. The entire ESTAR ICRU material composition table is supported, so if the user of the library requests a non-standard compound, Bragg's additivity rule is automatically applied according to the stoichiometric compositions defined by ICRU (see list here).
The Bethe-equation implementation allows the user to override the mean excitation potential for elements. Also, an uninstall target is now provided in the CMake configuration.
If you can't wait for the 1.0 release of libdEdx, you may test revision 85 in the SVN repository, which is quite close to something functioning.
Enough said about that.

So, regarding the current status of SHIELD-HIT10A: The main developer and maintainer, Prof. Nikolaj Sobolevsky from INR Moscow, visited us again in Aarhus for a month (a few pictures to be added later). Basically, we discussed the changes from 08 to 10A, the fitting of nuclear models to recently published experimental data, and the development framework in general. More importantly, we are trying to encourage a clearer road map for SHIELD-HIT. In particular, this involves settling on a clear license model and terms of use. This process is ongoing and takes time, but we still see SHIELD-HIT filling a gap which neither FLUKA nor Geant4 fills when it comes to combining ease of use with access to the source code. Stay tuned for more on this.

Now for something completely different: Since this blog (<-- beware of the recursion) quite unintentionally has turned more or less into a blog on topics in computing, medical physics and particle therapy, I have invited a fellow blogger, Roy Keyes from the University of New Mexico (Albuquerque), to contribute. Roy and I have shared many night shifts at CERN running our antiproton experiments.
Roy Keyes' (to the left) inexhaustible repository of real-life anecdotes helped me stay awake during the long night shifts at CERN in October 2010. Thanks, Roy. Everyone, say hi to Roy...
We also share common work topics: Monte Carlo simulations, mostly FLUKA, which UNM is running in the Amazon cloud (awesome idea!). In addition, Roy works on an open source DICOM-RT viewer, dicompyler. Dicompyler resembles in many ways my PyTRiP project, which is supposed to be a versatile Python visualization tool (including a GUI) for the heavy ion treatment planning program TRiP. Probably these projects will merge at some point and take over the world. Again, more on this later.

BTW: You can meet Roy in a little test video I made about the CERN antiproton experiments. Alas, since the current version of kdenlive crashes big time at project loading, I never made it further than the intro and gave up on the editing. I'll have to wait until kdenlive is updated in the Debian testing repo. But I got looooots of wonderful footage, including details of antiproton production and French-speaking technicians fixing dead synchrotrons! :-)

Plenty of plans, only little time, but surely this blog will become much more lively in 2011. With these words, I wish you all a happy new year!


P.S.: Future non-work related blog entries from my side will be published on (The similarity to a nuclear waste dump is not coincidental.)

Monday, November 29, 2010

The Stopping Power of Frozen Water

In my last blog entry I commented on stopping powers of fast ions in medical physics, and announced the libdEdx library. (Stopping powers describe the energy loss of fast charged particles in a material, thereby transferring energy to the target matter.) Stopping powers directly relate to the deposited dose, but there are plenty of subtler effects where they may or may not have a profound influence:
  • Range of ions in matter. Often the mean excitation energy (not to be confused with the ionization potential or w value...) in the Bethe equation is used as a macroscopic fitting parameter for the range of ions. Effects such as a primary-particle-dependent I-value are reported, even though this is unphysical. There is an ongoing discussion about what the I-value for water actually is, covering the 75 to 85 eV interval. PSTAR claims 75 eV. More recent studies seem to agree on a value close to 80 eV.
  • Ionization chambers rely on a solid assertion of stopping power ratios, since these detectors measure dose to air. In order to translate this to dose to water, you should know the particle spectra and the stopping power ratio of water to air (see e.g. the IAEA TRS-398 dosimetry protocol, so far one of the best out there, even though it has its flaws...). Or you can use a parametrization, as Armin tries to show in our recent, as yet unpublished, paper (pre-print).
  • Detector and biological response models such as the Katz or LEM model rely on the stopping power of ions. How large these effects are is still to be investigated, and is something we want to look at using libdEdx and libamtrack.
There may be many more applications, but the first two mentioned here are quite well researched. Frustratingly enough, if you have to calculate e.g. the stopping power ratios for a given particle spectrum, you have to rely on the ICRU 49 (PSTAR/ASTAR) tables and ICRU 73, and they are not calculated consistently. OK, the errors may be minor for practical dosimetry purposes, but for primary standards laboratories such as PTB in Braunschweig or NPL in London, who try to increase the precision by at least one order of magnitude, you may get into difficulties.

How can this be? Don't we have a large database of experiments for various ions on various targets? Well, yes, for some ion/target combinations, but not for all of them. Peter Sigmund from the University of Southern Denmark (now Professor Emeritus) once showed a very nice matrix of combinations at our 4th Danish Workshop, indicating where all the experimental gaps are.
Even worse is the situation for compounds, for which little or no data are available.

So, we decided to take this up at the 5th Danish Workshop on Particle Therapy, in order to sort out the field and give the research some direction.
This brings me back to the title of this blog entry: the workshop was scheduled to take place tomorrow (30th November) in Aarhus, but precisely due to the stopping power of frozen water, we had to cancel it. Several key persons were stuck in various airports because of snowstorms and could not make it.

Now... this massive amount of snow in Aarhus at this time of the year is not common, and frankly, I wonder if I am going to make it to work tomorrow. Instead, I would like to invite you, dear reader of this blog, to stop for a few minutes with me and silently enjoy the scenery below, accompanied by a piece of J.S. Bach.

Sunday, November 7, 2010

One Stopping Power Library to Rule Them All: libdEdx

Stopping powers describe the energy loss of charged particles traversing matter. In particle therapy, stopping powers are an essential ingredient for calculating the dose distribution of ion beams.

A direct way of calculating the stopping power is using the Bethe equation. However, this equation requires good knowledge of the mean excitation potential (in particle therapy jargon: "the I-value"), and for compounds this value is rather poorly researched. The I-value can be found experimentally, but only a few experimental data are available for compounds relevant to particle therapy, such as various tissue types, and even for water. Our postdoc Armin Lühr has recently submitted a paper where he takes a closer look at these issues; it is available on the arXiv:
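For reference, the Bethe formula can be sketched as follows (a textbook form in Gaussian units, without shell, Barkas or density-effect corrections; not necessarily the exact variant implemented in any of the codes discussed here):

```latex
-\frac{dE}{dx} \;=\; \frac{4\pi e^4 z^2 n_e}{m_e v^2}
  \left[ \ln\frac{2 m_e v^2}{I} \;-\; \ln\!\left(1-\beta^2\right) \;-\; \beta^2 \right]
```

where z and v are the charge number and velocity of the projectile, n_e is the electron density of the target, and I is the mean excitation potential discussed above.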

Several programs provide stopping powers, either as analytic calculations or as inter-/extrapolated experimental data. The list below may be incomplete, but these are the codes I have been in touch with so far:
  • ESTAR: for electrons. (Fortran77)
  • PSTAR: for protons. (Fortran77)
  • ASTAR: for alpha particles. (Fortran77)
  • MSTAR: for alpha particles and heavier ions up to Z=18. Basically, MSTAR scales ASTAR data, with scaling factors fitted to experimental data. (Fortran77)
  • ATIMA: code developed by GSI. (Java wrapping a Fortran core)
  • TRIM/SRIM: Application which can simulate any ion on any compound.
And various tables provided by the ICRU:
  • ICRU 49: proton and alpha data, equivalent to PSTAR/ASTAR.
  • ICRU 73: ions heavier than helium, old version.
  • ICRU 73: new version after the erratum; quite similar to MSTAR, even if calculated differently.
In summary, they don't agree too well at lower energies (say, below 10 MeV/u).

Nonetheless, from an application developer's point of view this leaves you with the dilemma of choosing the proper code for your application, and most likely we will see more updates, since the issues are not yet solved with the erratum of ICRU 73.

In addition, from a technical point of view, most stopping power codes are not written in a way that allows clean integration into your application. This is a common thing that happens when physicists develop code: the code will always be optimized to work nicely on the developer's computer and fulfill their specific needs. Platform portability, an API, and install scripts following de-facto standards are mostly absent.

This is why we started the development of libdEdx.

libdEdx is meant to be a platform-independent stopping power library which contains multiple tables to choose from and can be extended with additional algorithms/tables. Currently libdEdx includes ESTAR, PSTAR, ASTAR, MSTAR and ICRU 73 data, as well as an implementation of the Bethe equation akin to that found in SHIELD-HIT.
The material database is based on the extensive ICRU set found in the ESTAR table, and using Bragg's additivity rule libdEdx can extend the MSTAR/ICRU 73 data set to cover the ESTAR material list.
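Bragg's additivity rule mentioned above simply states that the mass stopping power of a compound is the mass-fraction-weighted sum of the mass stopping powers of its constituent elements:

```latex
\left(\frac{S}{\rho}\right)_{\text{compound}}
  \;=\; \sum_j w_j \left(\frac{S}{\rho}\right)_j ,
\qquad \sum_j w_j = 1
```

where the mass fractions w_j follow from the stoichiometric composition of the compound.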

libdEdx comes with an installer based on CMake, and the first official release should run on Windoze just as well as on Unix/POSIX/Linux-compliant platforms.

Surely, Geant4 also offers a range of stopping power tables to choose from, but you do not want to install the entire Geant4 just to access stopping power values.

Our bachelor student Jakob Toftegaard (who very conveniently has a background in physics and computer science) did most of the coding. A very early experimental pre-release is available on sourceforge, but if you want to experiment with it and even contribute you can also grab the most recent version from the SVN repository.

P.S.: Armin and David will present libdEdx at the MC2010 conference in Stockholm, 9th to 12th November. Posters can be found here.

Wednesday, October 6, 2010

Monte Carlo Programs in Particle Therapy Research

- a note on software design.

In my research, I use Monte Carlo particle transport codes a lot in order to simulate the interactions of particle beams with matter. Several of these particle transport codes are available for free, but if I have to simulate ions heavier than protons, then only a few codes are capable of simulating the physics processes at clinically relevant energies. The four most common codes are FLUKA, Geant4, MCNPX and SHIELD-HIT.

Each of these transport codes has its scope of applicability, advantages and disadvantages. Here in our research group APTG, we work with FLUKA, Geant4 and SHIELD-HIT. In fact, we are actively developing the SHIELD-HIT code (visit our SHIELD-HIT developer page). Recently, I had a discussion with a colleague (and non-programmer) where I wanted to clarify why we put so much effort into SHIELD-HIT, now that basically all its functionality is more or less available in FLUKA or can be built in Geant4. I found it difficult to explain the reasons to him.

That is why I came up with the idea of comparing the particle transport codes with cars. I am not particularly fond of cars, quite the contrary, but in the spirit of particle transportation (pun intended) I could not resist the temptation.

A FLUKA car, Fortran 77 style.
A FLUKA car would be a regular car which does the job it is supposed to do. Imagine a VW Passat or whatever. If you enter such a car for the first time, it will take you less than 10 seconds to orient yourself, since things are where you expect them: clutch, throttle, brakes etc. If you forget to fasten your seat belt, you will get an error message which tells you exactly what is wrong. If you need extras which go beyond the standard equipment, you can add them yourself (via the FLUKA user routines), although the developers limit what you can attach to your car. You are not allowed to touch the engine at all; the developers tune it for you to the best possible performance.
FLUKA is closed source and black-box-like. If something fails with the engine (the physics models inside FLUKA), and this happens very rarely, you should take the car in for repair (= notifying the developers). Only the developers have access to the engine interior and can fix it. While they do this, you realize that the interior may look rather old-fashioned (the source is written in Fortran 77), and you imagine it might be difficult to maintain, but the developers are experts and have worked with it for many years. No reason to change this, as long as the engine runs smoothly and you can still find the special gasoline for the car (= a Fortran 77 compiler, deprecated on many newer Linux distributions).
If you want to publish benchmarks against other cars (other codes or experimental data), you must first check with the developers, who want to ensure that you have been operating the car correctly. (Unlike removing the brakes and then publishing how the car crashed.) The FLUKA car is increasingly popular, since it is easy to learn, easy to run, and the driver does not need to know how the engine works.

A Geant4 car. Geant4 provides you all the pieces you need to build a car. ANY car.
Geant4 is not a car. It is a large box of Lego where you, in principle, can build your own car. We are not talking about normal standard Lego bricks, but the fanciest kind of them: Lego Technic C++-style bricks! If you have ever programmed in C++ and played with Lego, you know that, just as with Lego, you can also attach these code bricks in ways they are not really supposed to go together (weak typing). Installing the Lego bricks in your laboratory may require some expert knowledge.
No one has ever built a Lego-based car similar to the FLUKA car, where you just step in, start the engine and go places. But there is no doubt that, given infinite resources, you can build yourself a Ferrari if you want. You can build a diamond-inlaid SUV, if such things turn you on. In principle anything is possible, but you need a lot of good developers and plenty of time. No research group has access to infinite money (or time), so instead research groups using Geant4 focus on their specific needs. For instance, a famous research group in Boston has developed an engine (doing protons) which works well for their specific needs. In Japan, a group is working on a special vehicle, let's say a Caterpillar (think of gMocren as a part of this), which eventually can do specialized tasks (treatment planning with ions). That is fine, but if you want to adapt it to your own needs, which go beyond what the application was originally designed for, it can again be quite some task and requires a good deal of programming knowledge. (Assuming you get the source code at all. And if you get it, you still need to understand what is going on.)
Perhaps some group has already developed higher-level parts such as a carburetor or a light generator which are available to other researchers, yet they are not obliged to give these parts away. Geant4 is not GPL.

A SHIELD-HIT type car. Note, the driver found the light switch.
SHIELD-HIT is a Russian Niva. Well, not precisely a regular Niva. At first glance you think this is a normal car, not unlike a FLUKA car. You have to get a special contract with the code owners at INR before you may access the car.
Once you have it, you feel confident you can run this car effortlessly. However, as soon as you get inside, you realize something is very different. First of all, you need three different keys to start the engine. The clutch is mounted on the steering wheel, the light switch is hidden under the seat, and if you need to make left turns, you must configure the car to do so before you start it. You need an English-speaking Russian to tell you all this, because essential parts of the manual are written in Russian, and the manual itself is incomplete, covering only the light switch part.
If you look at the engine, you again realize it has a lot in common with the FLUKA engine (SHIELD-HIT is also written in Fortran 77, and the geometry parser is also CG).
The key difference from the FLUKA car is that, if something is broken, you are allowed (or even encouraged) to repair it yourself. Clear error messages or other indications of what is wrong are rare. You may simply have operated your Niva wrongly, or there may be a real bug in the engine. But you do have access to the source code, and this enables you to make all the modifications you want. If you know exactly what you want, this is actually a big advantage. Imagine yourself stranded in a village in the middle of Siberia; you will be happy you drive a Niva. When your Niva is fixed, the car runs smoothly, just as smoothly as a FLUKA car or a custom Geant4-based "forward going vehicle" would. And the Niva is tolerant of various gasoline types (you can compile with g77, gfortran, ifort etc.).

So, our MSc student David has invested a lot of time in building the next-generation Niva, the Niva version 2.0 (currently also known as SHIELD-HIT10A). It is still supposed to resemble a regular car, but with improved user-friendliness. The clutch and light switch have been moved to more intuitive positions. Only one key should be necessary for starting the engine. And, as I mentioned in an earlier blog post, I am currently preparing an English manual for the new Niva 2.0.

David is also benchmarking the Niva 2.0 and overhauling the engine (meaning better parameters for the physics models, matching the recently released experimental data by E. Hättner). Finally, the Niva 2.0 will offer a lot of new features which are not readily available in any other Monte Carlo particle transport vehicle, such as air conditioning and cup holders (one for each passenger).

SHIELD-HIT10A, aka Niva 2.0. Fortran 77-style artwork is still visible despite the upgrade. (APTG developer impression.)

Now, let us assume that when David finishes his MSc in February 2011, he would like to do a PhD project involving a pair of windscreen wipers and a light generator.
He has three codes to choose from. Of course, we all like playing with Lego. Lego is fun, and someone has already developed a Lego light generator ... so we only need to build the windscreen wipers, and off we go, doing a lot of research. (I really get carried away now.)
Alternatively, David could choose to continue working with the Niva 2.0. As a side effect, the Niva will be upgraded with windscreen wipers, which may benefit the continuation of the SHIELD-HIT project. Not as much fun as working with Lego, but probably as useful. Personally, I'd go for it, but working with Fortran 77 is really demotivating. Especially if the code is full of GOTO statements.
Finally, we have the choice of working with FLUKA. Perhaps this is the easiest, but in my opinion a bit dull, since we would not be contributing much new on the developer side of the code.

There will be more on the Niva 2.0 at the MC2010 conference in Stockholm, where David will present his work (either as a poster or a talk, we don't know yet).

Come and meet us!

Tuesday, October 5, 2010

Pretty Good Privacy and Evolution

This year, the Danish government forced all its internet users to use the "NemID" solution for digital communication between citizens and the various public institutions, banks etc. The NemID concept is based on regular asymmetric encryption with a public and secret key pair. The developers of NemID realized very wisely that a significant number of users (if not the majority) won't be capable of storing their secret key safely on their respective computers running Windows.

The solution is that a private company, "DanID", contracted by the Danish government, stores the secret key for the user (imagine this happening in Germany!), and any interaction is realized with a Java-based login portal and a TAN list.

Without commenting on the trustworthiness of "DanID", this solution obviously does not integrate with typical Linux mailers such as Evolution.

Therefore, my colleague Bjarne Thomsen recently urged me (multiple times, thanks) to encrypt/sign my emails using GPG. The last time I did this was in 2004, but I must admit I cannot remember where I stored my old secret key (it is probably lost), and the revocation file is probably also gone. So I had to start again from scratch. Here is the recipe:

First, generate a new key pair. I was very paranoid and shut down any closed source processes which I do not trust (Skype, Flash, Google Earth...) while generating this key.

$ gpg --gen-key

Let it be valid for 2 years; you will probably lose your secret key, forget your passphrase and/or lose your revocation file sooner or later, and there is no way you can delete keys from the key server.
You can also choose between RSA and DSA/ElGamal signing/encryption. I chose RSA, for no specific reason. The default bit length is 2048, but I chose 4096 bits, which should be safe until about the year 2030.
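As an aside, gpg can also generate keys unattended from a parameter file (gpg's documented batch mode). A minimal sketch, with placeholder name and e-mail, and noting that passphrase handling in batch mode depends on your gpg version:

```
# parameters.txt -- fed to gpg with: gpg --batch --gen-key parameters.txt
Key-Type: RSA
Key-Length: 4096
Name-Real: Your Name
Name-Email: you@example.org
Expire-Date: 2y
%commit
```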

You will get a response akin to:

pub 4096R/xxxxxxxx 2010-10-04 [expires: 2012-10-03]

where the xxxxxxxx value is your public key identifier.

Next, you submit your public key to the key server:

$ gpg --keyserver --send-keys xxxxxxxx

And finally, I recommend that you generate the aforementioned revocation file:

$ gpg --output revoke.asc --gen-revoke xxxxxxxx

Anyone who has this file can revoke your key. You can print the file on paper and store it in a safe place, if you wish.

So, now you are ready to go. Fire up Evolution and go to your mail account setup. There is a tab which says "Security", and in this tab there is a field where you can enter your secret key ID. Don't worry, your key identifier is not secret in that sense; the actual key is protected by your passphrase.

Now you are able to send signed emails. Evolution will ask for a pass phrase when accessing your key.

But you probably also want to send encrypted emails. In order to do so, you need to import the public key of the recipient. Evolution does not do this automatically; this is a very old bug in Evolution which still has not been fixed, see #259665.

Instead, you must manually import the recipient's public key. I look up the recipient's key ID on a key server, such as this one:

$ gpg --keyserver --recv-keys xxxxxxxx

If you are sure you got the right key, sign it:
$ gpg --sign-key xxxxxxxx

Ideally, you meet the person and exchange the key in person (e.g. at a key-signing party).

If you need to sign against a specific secret key, use:
$ gpg --default-key xx(yoursecretkeyID)xx --sign-key xx(keyIDtobesigned)xx

List your keys with:
$ gpg --list-keys

Tadaaa, now you can encrypt mails in Evolution. Note that the email address in the recipient's public key must match the address you are sending to.

If you ever need to revoke your key, do:
$ gpg --output revoke.asc --gen-revoke xxxxxxxx
$ gpg --import revoke.asc
$ gpg --keyserver --send-keys xxxxxxxx

Oh yes, and here is my ASCII armored public key:
Version: SKS 1.1.0


Sunday, September 26, 2010

Cyrillic letters in LaTeX

Currently, I am writing a manual for the Russian Monte Carlo particle transport code SHIELD-HIT. LaTeX is just perfect for this purpose due to its vast capabilities while still, say, "forcing" the user to be well structured and use a clean layout and formatting.

Unlike some years ago, most Linux distributions now run in a UTF-8 environment, and most applications endorse UTF-8, which I think is a blessing for internationally minded spirits such as myself. LaTeX documents can be written in UTF-8, without the use of awkward escape sequences such as \"u for an ü.

But LaTeX and dvips still need the appropriate fonts installed. The aforementioned manual may contain text in Cyrillic letters, so my header is formatted this way:

\usepackage[utf8]{inputenc} % make weird characters work
\usepackage[T2A,OT1]{fontenc} % enable Cyrillic fonts

Most editors, such as emacs or gedit, will recognize the character encoding for you. However, emacs is also sensitive to the \usepackage[utf8]{inputenc} line in the .tex file; presumably this is a feature coming along with the auctex package. If you specify something which is inconsistent with the character encoding of the file, you will end up with a terrible mess.

Compiling the .tex file, I was confronted with an error:


! Package fontenc Error: Encoding file `t2aenc.def' not found.
(fontenc) You might have misspelt the name of the encoding.

OK, so I looked in Synaptic to see which package provides t2aenc.def. But Synaptic did not return any results. Then I searched for the "T2A" font. According to Synaptic, this is provided by the "cm-super" package. But after installing the 60 MB package, the error still persisted.

The trick was to install the texlive-lang-cyrillic package. Then the .tex file could be compiled. This is rather unintuitive, since there is no mention of the t2aenc.def file in the texlive-lang-cyrillic package.
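For completeness, a minimal document that exercises the setup described above (assuming texlive-lang-cyrillic and cm-super are installed) could look like this:

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}   % the source file is UTF-8 encoded
\usepackage[T2A,OT1]{fontenc} % T2A supplies the Cyrillic glyphs
\begin{document}
Some English text, followed by Cyrillic: Привет, мир!
\end{document}
```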

The next step is to convert the .dvi file to a PDF file. If you do not have the cm-super package installed, you may end up with:

~/Projects/shieldhit/trunk/doc/tex$ dvipdf ShieldHIT_Manual.dvi
dvips: Font tctt0800 not found, using cmr10 instead.
dvips: Design size mismatch in font tctt0800
dvips: Checksum mismatch in font tctt0800
dvips: ! invalid char 246 from font tctt0800

and a rather empty .pdf file. The cm-super-minimal package may be insufficient; in my case I had to install the full cm-super package.

Conclusion: UTF-8 is good for you, it makes life easier and more enjoyable. So use it whenever you can. :-)

Friday, September 24, 2010

Yet another ssh wonder: SSHFS

SSHFS is a filesystem which can mount a remote filesystem locally on a computer, as long as the remote system can be accessed via ssh.

This is a very interesting alternative to NFS- or Samba-based filesystems. First of all, it operates entirely in userspace, which is cool. Contrary to NFS, any user can mount any remote directory he or she has access to via ssh.

There are no locking issues with SSHFS, which makes it robust and free of the infamous stale NFS handles. The speed is inferior to NFS, so it is not recommended for intensive use, but it is perfect for occasional work.

On Debian you merely need to install the sshfs package and its dependencies, e.g.: (as root)

$ apt-get install sshfs

Then you need to add the relevant users to the fuse group. E.g. the user bassler can be added with:

$ adduser bassler fuse

After this, the user must log out of the system (entirely, yes: like exiting your X session and logging in again; just closing and opening the terminal window won't be sufficient).
That's it.

Now the (non-root) user you have added to the fuse group can mount remote directories:

$ sshfs /local/directory

Unmounting can be realized with

$ fusermount -u /local/directory

What I still miss is an automount solution.
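A halfway step is an /etc/fstab entry of type fuse.sshfs, so the share can at least be mounted with a plain mount command (host and paths below are placeholders, and key-based ssh authentication is assumed):

```
# /etc/fstab (hypothetical host and paths)
user@remote.example.org:/remote/dir  /local/directory  fuse.sshfs  noauto,user,_netdev  0  0
```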

Thursday, September 16, 2010

Downloading a youtube video and extract audio

Ever played a game of the STALKER series? I am not all that fond of first-person shooters, but I enjoy the movies of the Russian director Tarkovsky, and the STALKER trilogy is based upon his famous movie of the same name. You can find plenty of movie snippets on YouTube.

Anyway, I also came across this little benchmark test (yeah, there is no visible difference between DirectX 10 and 11, but that's not the point here), which had a very catchy tune. A few visitors asked where the music was taken from, but no one replied. Now, it can be rather difficult to search for an instrumental piece of music on the net when all you have got is the music itself.
Update: Or you may find the tune in another vid and actually get a reply. The track above is Dream Catchers by A.L.N.P.S.V from Tunguska Chillout Grooves vol. 1.

So I searched for a way of somehow extracting the audio into an .ogg file. All my attempts (even trying some sort of audio capture) failed, until my friend Alexandru Csete told me about this nifty Python script which can download YouTube videos.

Unpack it and execute it from the command line:

$ ./youtube-dl

which returns an .mp4 file. The audio channel can now be extracted with ffmpeg:

$ ffmpeg -i 6QpAMzTCDpg.mp4 -vn -acodec vorbis -aq 50 audio.ogg

The "-vn" option tells ffmpeg to ignore the video part. If "-acodec vorbis" is not specified, the audio ends up in some AAC format, which e.g. my mobile phone cannot handle, so this had to be stated explicitly.

I spent some time figuring out the quality setting. First I tried the "-ab 64000" setting, but somehow ffmpeg did all the encoding at about 60 kbps irrespective of the -ab setting. The default settings were pretty bad, and especially in this example you can easily hear the losses. (If you listen to the faint woodwind-like sounds close to 5:30, you can hear they are almost gone at the lower quality setting.) Using the "-aq 50" option I got a good result.

I checked with the "file" command to see what format the audio was:

$ file audio.ogg
audio.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~0 bps

Apparently the file command cannot determine the bit rate correctly.

I also tried to convert to mp3 format, but even though I had installed the "lame" library, ffmpeg refused to find it. I did not pursue this further, since I am actually quite happy with the Ogg/Vorbis format.

Wednesday, September 15, 2010

Wonders of chmod: set executeable bit only on directories

On some systems, the default file permissions are rather restrictive. E.g. at our university the policy is umask=077, which means that any new file created or copied from a remote computer will only have owner read/write access enabled, and directories will be read/write/executable for the owner only.
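A quick sketch to check what such a restrictive umask yields for new files and directories (uses GNU stat; run in a scratch directory):

```shell
# Create a file and a directory under a restrictive umask and show their modes.
tmp=$(mktemp -d)
(
  cd "$tmp"
  umask 077        # owner-only access for anything created from now on
  touch newfile
  mkdir newdir
  stat -c '%a %n' newfile newdir   # prints: 600 newfile / 700 newdir
)
rm -rf "$tmp"
```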

This is sometimes annoying when files are uploaded to the ~/public_html/ directory and are meant to be accessed by external users. Usually one forgets to set the correct permissions, leaving the files inaccessible to the general public.

This morning I got an email from my PhD student (he is currently in Australia, which explains why he was up that early): he had to turn in an exam project via his web page, but he had forgotten to set the permissions right.

His WiFi access did not allow ssh (lame!), so he gave me his password so I could fix this for him. Entering his directory, I encountered a large directory structure with several sub-directories and files.

Issuing a recursive chmod

chmod -R a+r *

from the base directory would still leave the sub-directories inaccessible.

chmod -R a+rx *

is an ugly thing to do. Sure, it works, but it messes up auto-completion, as bash/dash will think all files can be executed. Also, on a coloured console this would produce psychedelic output when listing files. Manually setting permissions on all directories would be a very stupid thing to do.

That is when I realized that chmod can also be told explicitly to set the execute bit only on directories (and on files that already carry an execute bit), using a capital X. Awesome, I thought, so here we go:

chmod -R a+rX *
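A quick way to convince yourself of what the capital X does, sketched with a throwaway directory tree:

```shell
# X sets the execute bit only on directories (and on files that
# already had an execute bit somewhere).
tmp=$(mktemp -d)
mkdir -p "$tmp/web/sub"
touch "$tmp/web/page.html" "$tmp/web/sub/data.txt"
chmod 700 "$tmp/web" "$tmp/web/sub"            # start out owner-only
chmod 600 "$tmp/web/page.html" "$tmp/web/sub/data.txt"
chmod -R a+rX "$tmp/web"                       # readable for all, +x only on dirs
stat -c '%a %n' "$tmp/web" "$tmp/web/page.html" "$tmp/web/sub"
# prints 755 for the directories, 644 for the files
rm -rf "$tmp"
```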

Yeah, this may be trivial, but I was not aware of this option before.

Tuesday, September 14, 2010

Transferring files via tar and a compressed ssh tunnel.

I was confronted with an old AIX computer with a bad network card and had to secure 60 GB of data located on a SCSI disk. We did not have any available SCSI controller, so we could not move the disk out of the AIX machine. The machine did not have any USB ports either, so the only way to secure the data was via the network.

However, the network card frequently stalled and provided transmission rates of at most 30-40 kB/s. Copying the data by scp over ssh would be way too slow; initial attempts indicated that the file transfer could take several weeks.

Rsync could have been an option, but it was not installed on the AIX machine. Installing rsync was not considered, because no one really dared to touch the AIX machine with root rights.

The solution was to pipe the data through a compressed ssh tunnel. Most data on the AIX machine were CT scans, which compress quite well.
The command line is slightly cryptic though (it is a one-liner; ignore the line wrapping):

tar cvf - /tp_data | gzip -9c | ssh root@gorm 'cd /target_dir; gunzip - | tar xpf -'

which will place the /tp_data directory inside the /target_dir directory on the gorm computer, running Linux.
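The same pipeline can be tried locally with the ssh hop removed (a sketch; the directory and file names are made up for the demonstration):

```shell
# Pack, compress, decompress and unpack in one stream, all on one machine.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir "$src/tp_data"
echo "CT slice 001" > "$src/tp_data/scan001.dat"
tar -C "$src" -cf - tp_data | gzip -9c | ( cd "$dst" && gunzip | tar xpf - )
cat "$dst/tp_data/scan001.dat"   # prints: CT slice 001
rm -rf "$src" "$dst"
```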

Result: all data were transferred in less than a week.