I did something very clever today!

I have been working on the Android version of Tunepal for the past few days. I’ve been using a friend’s Nexus One that he got at the GDC this year for debugging, and yesterday I went out and bought an HTC Desire. Wow, what a lovely machine. It’s like the iPhone on steroids! Really fast UI, friend feed, widgets, tight Google integration. I can’t imagine the iPhone 4 could be any better than this. I think Android would make a great platform for tablets actually.

Anyway, I’ve been working on the transcription engine, and my first attempt was to take the transcription engine from tunepal.org/MATT2, which is all written in Java. It just about worked, but was really slow, taking about two minutes to transcribe 12 seconds of audio. I did some profiling and the bottleneck was the FFT running on the Dalvik VM on Android. Even though the HTC Desire has a 1 GHz CPU, the Dalvik VM is very slow at performing vector maths. Each 2048-sample FFT was taking 500 ms on average!

So then I decided to do the FFT in native code. I downloaded the Android NDK (Native Development Kit). I found the documentation OK for me, but it’s pretty high level and not step by step, so someone who is not familiar with cygwin and the Unix shell might find it tough. Anyway, the Android NDK uses JNI and a shell script called ndk-build to do its magic, so because I am using NetBeans on Windows 7 to code Tunepal, I first had to install cygwin, GCC, GDB etc. I also installed C/C++ support into NetBeans while I was at it, so I could edit C++ code with code completion. I highly recommend this, as the code completion and refactoring in NetBeans is the best, better IMO than Visual Studio for editing C++.

OK, I am using this FFT library, which is based on the Audacity codebase, so I wrote a short JNI wrapper for it:

extern "C" {
jint Java_org_tunepal_Transcriber_PowerSpectrum(JNIEnv* env, jobject thiz, jint frameSize, jfloatArray in, jfloatArray out);
}

It took a while to figure out that you can’t just pass plain C ints, floats and arrays in and out of a JNI method. You have to use the JNI types (jint, jfloat, jfloatArray etc.) and marshal the arrays using:

jfloat *signal = env->GetFloatArrayElements(in, 0);

Then you can use signal as if it were a normal C float array. I compiled the C++ code from the cygwin shell by running ndk-build with this Android.mk file in the jni folder:

LOCAL_PATH := $(call my-dir)
include $(CLEAR_VARS)
LOCAL_MODULE    := tunepal
LOCAL_SRC_FILES := tunepal.cpp
include $(BUILD_SHARED_LIBRARY)

This generates the file libtunepal.so, which you load in Java by creating the class Transcriber as follows:

public class Transcriber extends java.lang.Thread
{
    // Load the native library (libtunepal.so)
    static {
        System.loadLibrary("tunepal");
    }

    // The native method signature:
    public native int PowerSpectrum(int frameSize, float[] in, float[] out);
}
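
To give a feel for what sits behind that native declaration, here is a rough sketch of the sort of computation a power spectrum function performs on a plain float array. It is not the Audacity-based FFT used above: it is a naive O(n²) DFT, chosen only because it fits in a few lines, and the name powerSpectrumNaive and the convention of writing frameSize / 2 bins are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>

// Naive O(n^2) DFT power spectrum, for illustration only. A real FFT
// produces the same values far more quickly.
//   in:  frameSize real-valued samples
//   out: frameSize / 2 values, out[k] = re^2 + im^2 of frequency bin k
// Returns the number of bins written.
int powerSpectrumNaive(int frameSize, const float* in, float* out) {
    assert(frameSize > 0 && in != 0 && out != 0);
    const double twoPi = 6.283185307179586;
    const int bins = frameSize / 2;
    for (int k = 0; k < bins; ++k) {
        double re = 0.0;
        double im = 0.0;
        for (int n = 0; n < frameSize; ++n) {
            const double angle = twoPi * k * n / frameSize;
            re += in[n] * std::cos(angle);
            im -= in[n] * std::sin(angle);
        }
        out[k] = static_cast<float>(re * re + im * im);
    }
    return bins;
}
```

In a JNI wrapper, a function like this would be called on the float pointers obtained from the two jfloatArray arguments.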

I compiled everything and tested it, and discovered a few interesting facts:

  1. It was way faster than the pure Java implementation. The whole transcription was now taking about 12 seconds, roughly the same as the iPhone version.
  2. printfs and couts get lost. They don’t show up on the Android Java console, which you can view by running adb.exe logcat on the Windows host.
  3. The system crashed after performing 513 FFTs, with the error ReferenceTable overflow (max=1024) being printed out on the logcat console.

Hmmm not good.
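
A likely culprit, judging only from the numbers: every GetFloatArrayElements call needs a matching ReleaseFloatArrayElements, and if the wrapper pins two arrays (in and out) per call and never releases them, 512 calls would fill a 1024-entry table and the 513th would overflow. That is a guess, not something confirmed here. A minimal sketch of one way to guarantee the release is a small guard object whose destructor does it. The class name PinnedFloats is made up for this example, and FakeEnv is a hypothetical stand-in for JNIEnv so the sketch can be run without a JVM.

```cpp
#include <cassert>

// Pins a Java float[] for the lifetime of this object and releases it in
// the destructor. In real JNI code Env would be JNIEnv and Array would be
// jfloatArray; they are template parameters only so that the sketch can be
// exercised with the stand-in below.
template <typename Env, typename Array>
class PinnedFloats {
public:
    PinnedFloats(Env* env, Array array, int releaseMode)
        : env_(env), array_(array), releaseMode_(releaseMode),
          data_(env->GetFloatArrayElements(array, 0)) {}

    ~PinnedFloats() {
        if (data_ != 0) {
            env_->ReleaseFloatArrayElements(array_, data_, releaseMode_);
        }
    }

    float* get() const { return data_; }

private:
    PinnedFloats(const PinnedFloats&);             // non-copyable
    PinnedFloats& operator=(const PinnedFloats&);  // non-assignable

    Env* env_;
    Array array_;
    int releaseMode_;
    float* data_;
};

// Hypothetical stand-in for JNIEnv: it only counts how many arrays are
// currently pinned.
struct FakeEnv {
    int livePins;
    float storage[4];
    FakeEnv() : livePins(0) {
        for (int i = 0; i < 4; ++i) storage[i] = 0.0f;
    }
    float* GetFloatArrayElements(int /*array*/, void* /*isCopy*/) {
        ++livePins;
        return storage;
    }
    void ReleaseFloatArrayElements(int /*array*/, float* /*elems*/, int /*mode*/) {
        --livePins;
    }
};

// Mimics one native call: pin the input and output arrays, touch them,
// and report how many pins are live while both guards are in scope.
int fakePowerSpectrum(FakeEnv* env) {
    PinnedFloats<FakeEnv, int> in(env, 1, 2);   // 2 is the value of JNI_ABORT
    PinnedFloats<FakeEnv, int> out(env, 2, 0);  // 0 copies back and frees
    out.get()[0] = in.get()[0];
    return env->livePins;
}
```

With the real headers the guard would be declared as PinnedFloats<JNIEnv, jfloatArray> signal(env, in, JNI_ABORT); and signal.get() used wherever the raw pointer was used before. JNI_ABORT suits an array that is only read, while mode 0 copies any changes back to the Java array.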

I tried to solve this problem for a bit and then it struck me: why not do all the transcription in C++? That means the FFT, the onset detection, the pitch spelling, the quantisation and the post-processing to filter out transcription errors. I had all this code written in C++ from the iPhone version, so I copied all the C++ files into the jni folder, edited them a bit and added them to Android.mk:

LOCAL_SRC_FILES := tunepal.cpp FFT.cpp FuzzyHistogram.cpp PitchDetector.cpp pitchspeller.cpp transcriber.cpp

I also made a main.cpp file and a NetBeans C++ project so I could test that the transcription algorithm was working outside of the iPhone environment. After a bit of messing around, I had a Windows console app compiled and running in NetBeans to do the transcription, so then it was on to making it compile into libtunepal.so using ndk-build. Big problem. It turns out the C++ support in the Android NDK is very limited. No std namespace, so no string and no STL. My transcription system could be rewritten without these, but that would be at least a day’s work. Bummer!
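
To illustrate the kind of dependency involved, here is a hypothetical pitch-spelling helper that leans on std::string. It is not taken from the Tunepal source: the function name spellPitch, the A4 = 440 Hz reference and the sharps-only note names are all assumptions made for the example. Code of this shape compiles on a desktop toolchain but not against a toolchain with no std namespace.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical helper: returns the nearest equal-tempered note name for a
// frequency, assuming A4 = 440 Hz. Valid for frequencies from roughly
// 16 Hz upwards, within octaves 0 to 9.
std::string spellPitch(double frequencyHz) {
    assert(frequencyHz >= 16.0);
    static const char* const names[12] = {
        "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"
    };
    // Semitones away from A4, rounded to the nearest whole semitone and
    // offset so that A4 is note number 69 (the MIDI convention).
    const double semitones = 12.0 * std::log(frequencyHz / 440.0) / std::log(2.0);
    const int note = static_cast<int>(std::floor(69.0 + semitones + 0.5));
    const int octave = note / 12 - 1;

    std::string name = names[note % 12];
    name += static_cast<char>('0' + octave);
    return name;
}
```

Rewriting something like this without std means falling back to fixed char buffers and C string functions, which is where the estimated day of work would go.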

It turns out I’m not the first person to run into this problem, and a guy called Dmitry Moskalchuk aka CrystaX has made custom versions of the NDK with std and STL built in! So I downloaded the custom NDK 4 from his website and used it to compile my C++ code instead of the official Android NDK and, lo and behold, success! My C++ transcription engine compiled fine. The performance of the native transcription engine on the HTC Desire is quite amazing. It now takes just a couple of seconds to transcribe 12 seconds of audio, compared with 12 seconds on my iPhone 3.

Tunepal on Android is a big step closer to reality today.

Now I have no intention of releasing this for free, so perhaps someone could tell me when paid apps are going to be available on the Android marketplace in Ireland?
