Roy and Niels

Sunday, September 26, 2010

Cyrillic letters in LaTeX

Currently, I am writing a manual for SHIELD-HIT, a Russian Monte Carlo particle transport code. LaTeX is perfect for this purpose: it is enormously capable while still, so to speak, "forcing" the user to be well structured and to use a clean layout and formatting.

Unlike some years ago, most Linux distributions now run in a UTF-8 environment, and most applications support UTF-8, which I think is a blessing for internationally minded spirits such as myself. LaTeX documents can be written in UTF-8, without awkward escape sequences such as \"u for an ü.

But LaTeX and dvips still need to have the appropriate fonts installed. The aforementioned manual may contain text in Cyrillic letters, so my header is formatted this way:

\usepackage[utf8]{inputenc} % make weird characters work
\usepackage[T2A,OT1]{fontenc} % enable Cyrillic fonts
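A minimal compilable sketch, assuming the file itself is saved as UTF-8 (the Cyrillic word is just sample text; since OT1 is the default encoding here, one has to switch to T2A explicitly before typing Cyrillic):

```latex
% save this file as UTF-8
\documentclass{article}
\usepackage[utf8]{inputenc}   % interpret the source file as UTF-8
\usepackage[T2A,OT1]{fontenc} % T2A provides the Cyrillic glyphs
\begin{document}
SHIELD-HIT is a Russian ({\fontencoding{T2A}\selectfont русский}) code.
\end{document}
```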

Most editors such as emacs or gedit will recognize the character encoding of the file for you.
However, emacs is also sensitive to the \usepackage[utf8]{inputenc} line in the .tex file; presumably this feature comes with the auctex package. If you specify an input encoding that is inconsistent with the actual character encoding of the file, you will end up with a terrible mess.
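One way to keep emacs and inputenc in agreement is a file-local variable on the first line of the .tex file, which pins emacs to a specific coding system; a small sketch:

```latex
% -*- coding: utf-8 -*-
\documentclass{article}
\usepackage[utf8]{inputenc} % must match the coding cookie above
```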

Compiling the .tex file, I was confronted with an error:


! Package fontenc Error: Encoding file `t2aenc.def' not found.
(fontenc) You might have misspelt the name of the encoding.

Ok, so I checked in synaptic which package provides t2aenc.def, but synaptic did not return any results. Then I searched for the "T2A" font; according to synaptic, this is provided by the cm-super package. But after installing that 60 MB package, the error still persisted.

The trick was to install the texlive-lang-cyrillic package; after that, the .tex file compiled. This is rather unintuitive, since there is no mention of the t2aenc.def file in the texlive-lang-cyrillic package description.

The next step is to convert the .dvi file to a PDF. If you do not have the cm-super package installed, you may end up with:

~/Projects/shieldhit/trunk/doc/tex$ dvipdf ShieldHIT_Manual.dvi
dvips: Font tctt0800 not found, using cmr10 instead.
dvips: Design size mismatch in font tctt0800
dvips: Checksum mismatch in font tctt0800
dvips: ! invalid char 246 from font tctt0800

and a rather empty .pdf file. The cm-super-minimal package may be insufficient; in my case I had to install the full cm-super package.

Conclusion: UTF-8 is good for you, it makes life easier and more enjoyable. So use it whenever you can. :-)

Friday, September 24, 2010

Yet another ssh wonder: SSHFS

SSHFS is a file system capable of mounting a remote filesystem locally, provided the remote system can be accessed via ssh.

This is a very interesting alternative to NFS or Samba based filesystems. First of all, it operates entirely in userspace, which is cool. Unlike NFS, any user can mount any remote directory he or she has access to via ssh.

There are no locking issues with SSHFS, which makes it robust and free of the infamous "NFS stale handles". The speed is inferior to NFS, so it is not recommended for intensive use, but it is perfect for occasional work.

On Debian you merely need to install the sshfs package and its dependencies (as root):

$ apt-get install sshfs

Then you need to add the relevant users to the fuse group. E.g. the user bassler can be added with:

$ adduser bassler fuse

After this, you must log out of the system entirely (yes, entirely: exit your X session and log in again; just closing and reopening the terminal window won't be sufficient).
That's it.

Now the (non-root) user you have added to the fuse group can mount remote directories:

$ sshfs user@host:/remote/directory /local/directory

Unmounting can be realized with

$ fusermount -u /local/directory

What I still miss is an automount solution.
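An entry in /etc/fstab can serve as a poor man's automount; here is a sketch with made-up host and paths, using the fuse.sshfs filesystem type:

```
# /etc/fstab -- hypothetical example entry
user@host:/remote/directory  /local/directory  fuse.sshfs  noauto,user,idmap=user  0  0
```

With the noauto,user options an ordinary user should then be able to bring the mount up with a plain `mount /local/directory`.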

Thursday, September 16, 2010

Downloading a YouTube video and extracting the audio

Ever played a game of the STALKER series? I am not all that fond of first-person shooters, but I enjoy the films of the Russian director Tarkovsky, and the STALKER trilogy is based upon his famous movie of the same name. You can find plenty of movie snippets on YouTube.

Anyway, I also came across this little benchmark test (yeah, there is no visible difference between DirectX 10 and 11, but that's not the point here), which had a very catchy tune. A few visitors asked where the music was taken from, but no one replied. It can be rather difficult to search the net for an instrumental piece of music when all you have is the music itself.
Update: Or you may find the tune in another vid, and actually get a reply. The track above is Dream Catchers by A.L.N.P.S.V from the Tunguska Chillout Grooves vol. 1.

So I searched for a way of extracting the audio into an .ogg file. All my attempts (even trying some sort of audio capture) failed, until my friend Alexandru Csete told me about youtube-dl, a nifty Python script which can download YouTube videos.

Unpack it and run it from the command line with the video URL as argument:

$ ./youtube-dl http://www.youtube.com/watch?v=6QpAMzTCDpg

which returns a .mp4 file. The audio channel can now be extracted with ffmpeg:

$ ffmpeg -i 6QpAMzTCDpg.mp4 -vn -acodec vorbis -aq 50 audio.ogg

The "-vn" option tells ffmpeg to ignore the video stream. If "-acodec vorbis" is not specified, the audio ends up in some AAC format, which e.g. my mobile phone cannot handle, so the codec had to be stated explicitly.

I spent some time figuring out the quality setting. First I tried "-ab 64000", but somehow ffmpeg did all the encoding at about 60 kbps irrespective of the -ab setting. The default settings were pretty bad, and in this example in particular you can easily hear the losses. (If you listen to the faint woodwind-like sounds close to 5:30, you hear they are almost gone at lower quality settings.) With the "-aq 50" option I got a good result.

I checked with the "file" command to see what format the audio was:

$ file audio.ogg
audio.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~0 bps

Apparently the file command cannot recognize the bps correctly.

I also tried converting to mp3 format, but even though I had installed the "lame" library, ffmpeg refused to find it (presumably my ffmpeg build was not compiled with libmp3lame support). I did not pursue this further, since I am actually quite happy with the Ogg/Vorbis format.

Wednesday, September 15, 2010

Wonders of chmod: set the executable bit only on directories

On some systems, the default file permissions are rather restrictive. E.g. at our university the policy is umask=077, which means that any new file created or copied from a remote computer gets owner-only read/write access, and new directories owner-only read/write/execute.

This is sometimes annoying when files meant to be accessed by external users are uploaded to the ~/public_html/ directory. Usually one forgets to set the correct permissions, leaving the files inaccessible to the general public.

This morning I got an email from my PhD student (he is currently in Australia, which explains why he was up that early) - he had to turn in an exam project via his web page, but he had forgotten to set the permission right.

His WiFi access did not allow ssh (lame!), so he gave me his password so I could fix it for him. Entering his directory, I encountered a large directory tree with several sub-directories and files.

Issuing a recursive chmod

chmod -R a+r *

from the base directory would still leave the sub-directories inaccessible.

chmod -R a+rx *

is an ugly thing to do. Sure, it works, but it messes up auto-completion, as bash/dash will think all files can be executed. On a coloured console it also produces psychedelic output when listing files. Manually setting permissions on all directories would be a very stupid thing to do.

That is when I realized that chmod can be told to set the executable bit only on directories (and on files that already have an execute bit for someone) using a capital X. Awesome, I thought, so here we go:

chmod -R a+rX *

Yeah, this may be trivial, but I was not aware of this option before.
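A quick demonstration in a throwaway directory (all paths are made up) shows the difference: capital X gives directories the execute bit but leaves plain files alone:

```shell
#!/bin/sh
# Build a small tree with restrictive permissions
mkdir -p /tmp/chmod_demo/sub
touch /tmp/chmod_demo/file.txt
chmod 700 /tmp/chmod_demo /tmp/chmod_demo/sub
chmod 600 /tmp/chmod_demo/file.txt

# a+rX: everyone gets read; execute is added only where it makes
# sense, i.e. on directories (and files that already had an x bit)
chmod -R a+rX /tmp/chmod_demo

stat -c '%a %n' /tmp/chmod_demo /tmp/chmod_demo/sub /tmp/chmod_demo/file.txt
# 755 /tmp/chmod_demo
# 755 /tmp/chmod_demo/sub
# 644 /tmp/chmod_demo/file.txt
```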

Tuesday, September 14, 2010

Transferring files via tar and a compressed ssh tunnel.

I was confronted with an old AIX computer with a bad network card, and had to secure 60 GB of data located on a SCSI disk. We did not have any spare SCSI controller, so we could not move the disk out of the AIX machine. The machine did not have any USB ports either, so the only way to secure the data was via the network.

However, the network card frequently stalled and provided transmission rates of at most 30-40 kB/s. Copying the data with scp over ssh would be way too slow; initial attempts indicated that the transfer could take several weeks.

Rsync could have been an option, but it was not installed on the AIX machine, and installing it was not considered because no one really dared to touch the AIX machine with root rights.

The solution was to pipe the data through a compressed ssh tunnel. Most of the data on the AIX machine were CT scans, which compress very well.
The command line is slightly cryptic, though (it is a one-liner; ignore the line wrapping):

tar cvf - /tp_data | gzip -9c | ssh root@gorm 'cd /target_dir; gunzip - | tar xpf -'

which will place the /tp_data directory inside the /target_dir directory on the computer gorm, running Linux.
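The same pipeline can be tried out locally, with the ssh hop replaced by a plain subshell; the directories below are made up for the sketch:

```shell
#!/bin/sh
# /tmp/pipe_demo/tp_data stands in for /tp_data on the AIX machine,
# /tmp/pipe_demo/target_dir for /target_dir on gorm
mkdir -p /tmp/pipe_demo/tp_data /tmp/pipe_demo/target_dir
echo "ct scan" > /tmp/pipe_demo/tp_data/scan001.dat

# tar -> gzip -> (what would be the remote side) gunzip -> tar,
# all through a single pipe
tar cf - -C /tmp/pipe_demo tp_data \
  | gzip -9c \
  | ( cd /tmp/pipe_demo/target_dir && gunzip -c | tar xpf - )

cat /tmp/pipe_demo/target_dir/tp_data/scan001.dat   # prints "ct scan"
```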

Result: all data were transferred in less than a week.