18 November 2015

Windows compatibility fixes and envelope & pitch bend demo

I haven't shown all the possibilities yet, so here is another demo showing envelope editing in action and a keyboard made of faders that can be used for pitch bend or aftertouch.

I still have to document all the available components and, more generally, how to customize the app ; not a small task.

Also, I updated the HTML5 control surface to make sure it's Windows compatible. I had to change the communication layer once again, because loading compiled extensions is not an option on Windows ( they depend on python25.dll, while the interpreter is statically compiled into Ableton Live ), and won't be possible at all after the changes made in the current Live beta, which ships Python 2.7 with all binary loading features disabled : the .pyd files are stripped out, the module loader has apparently been customized to ignore .pyd/.so files, and ctypes won't load.

So I'm back to a more traditional and verbose protocol : plain newline-separated JSON requests through a TCP socket between the node server and the control surface script. At this level I can disable Nagle's algorithm, and since the two components are on the same machine, the latency is negligible. I used the old-school asyncore module, which is nowhere near as nice as ZeroMQ bindings or asyncio, but it's built into the standard library and simple enough.
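
For illustration, here is a minimal sketch of the control-surface end of that protocol, using the stdlib asyncore/asynchat modules ; the port, the request fields and the dispatching are placeholders, not the actual implementation :

import asyncore
import asynchat
import json
import socket

class JSONLineHandler(asynchat.async_chat):
    def __init__(self, sock):
        asynchat.async_chat.__init__(self, sock)
        # Nagle's algorithm can be disabled at this level
        self.socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        self.set_terminator("\n")  # one JSON document per line
        self.buffer = []

    def collect_incoming_data(self, data):
        self.buffer.append(data)

    def found_terminator(self):
        request = json.loads("".join(self.buffer))
        self.buffer = []
        # ... dispatch the request to the Live API here ( not shown ) ...
        self.push(json.dumps({"id": request.get("id"), "result": "ok"}) + "\n")

class JSONLineServer(asyncore.dispatcher):
    def __init__(self, port):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.set_reuse_addr()
        self.bind(("127.0.0.1", port))
        self.listen(1)

    def handle_accept(self):
        pair = self.accept()
        if pair is not None:
            JSONLineHandler(pair[0])

# the asyncore polling loop still has to be pumped regularly, for instance :
# asyncore.loop(timeout=0.005, count=1)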

12 October 2015

Nagle's algorithm workaround with websockets

Finally found a workaround for the random 200-300ms latency problem that made MIDI playing practically unusable from the app. I isolated the problem in this project : https://github.com/valsteen/socketio-nagle-experiment

I tried 3 approaches :
  • send an ack from the server after each client message
  • send filler packets every 10ms from the client
  • send filler packets every 10ms from the server
Conclusion on Chrome for Android, on the same wifi network :
  • without any strategy, pure client->server data : about 25% of the messages arrive with a latency > 50ms
  • with acks from the server after each client message : unexpectedly, no significant change
  • with filler packets every 10ms from the server to the client : actually great ! 0% of messages over 50ms, and an average latency of 4.47ms
  • filler packets from the client to the server every 10ms give similar results
I applied this trick to the control surface app, with some fine-tuning because the browser tends to be overwhelmed if I send packets every 10ms. So the trick is this : whenever MIDI is in use ( because I'm playing on an instrument widget or a USB MIDI device is plugged into the tablet ), I send filler packets every 50ms for at most one second after the last MIDI note was sent. This tricks the algorithm which, I guess, sees those messages as responses and immediately flushes the send buffer.

Here is how I fixed it in my app, in case it helps anyone with a similar problem : https://github.com/valsteen/ableton-live-html5-control-surface/commit/43551ef5d32c7fe5a4af7e15b66d6ee7f2d1410b

08 October 2015

HTML5 Control surface : first public release

I just opened the sources ; the project is on GitHub.

For now it's mostly to allow peer review and for the sake of opening the sources. The license is the UNLICENSE, so go ahead, help yourself.

Later on I'll work on making it installable for everyone, but evolving from a proof of concept to deployable and configurable software takes quite some time.

07 October 2015

Project update : application state as an Aurelia template

The last important design decision I had to make was how to express dependencies between widgets and Live controls, or between widgets themselves, without relying on code. In other words, some kind of "save state".

Instead of creating yet another JSON state format that itself creates objects and layout, I figured that directly using Aurelia templates lets me express those dependencies. Live's parameters happen to be custom tags without any view, but they can then be bound to visible and controllable widgets.

Hopefully it won't become tag soup and will stay focussed on layout and dependencies. Additional behavior can always be declared in the ViewModel or as new custom tags.

This is one of the last things I wanted to implement before the first public release, so stay tuned, it's coming soon.

29 September 2015

Building an HTML5 control surface, part 2 : networking

I started the project with a few critical unknowns - things that must be possible in order to make it actually worth anything:
  • full API access
  • low network latency
  • touch precision, responsiveness and actual practicability for the task
  • fast UI updates

In part 1 I already showed how to access Live's API and, more importantly, how to make it practical and meaningful. I've seen all kinds of APIs and I can tell that the ability to interact and experiment is more important than a dump of method signatures.

In this second part I'll focus on networking. Thanks to existing software such as TouchDAW and LK, I already knew that acceptable latency was achievable. But I had no idea whether I could achieve it in a browser.

Here is a simplified schema showing the communication between components. Promises are a programming concept, but they are really critical to handle all the complexity caused by asynchronous communication, so I present them here as an internal communication bus.

|   Browser       |                                                     
|                 |                                                     
|                 |                                                     
|   Widgets       |                                                     
|     ^           |                                                     
|     |           |                                                     
|     | Promises  |                                                     
|     | + Updates |      +---------------+            +-----------------+
|     v           |      |               | ZeroMQ     |                 |
| Live Proxy <-----------+>   Server   <---------------> Ableton Live   |
|                 | WebRTC               |            |                 |
+-----------------+      +---------------+            +-----------------+

Reality is a bit more complex, but the parts not shown here are mostly workarounds.

Message format and messaging pattern

Format of a request:
|    Envelope          |                    Message                                      |
| clientID | messageID | Object reference                        method        parameter |
|  xxxx    |  yyyyy    | tracks.0.devices.1.parameters.2.value   set           100       |
|          |           |                                                                 |

The simplest design would be to send one request at a time, wait for the response, then send the next request :

+        +
| Request|
| Request|
| Request|
|  ...   |
+        +

But this approach is really too slow and resources are underused : at any given time only one of the client, the network or the server is busy. We want to use all the throughput we can get.

Instead, if we have several requests to make at the same time, we just push them all through the pipe, then collect the responses as they come back.

+   Request 1    +
| +------------> |
|   Request 2    |
| +------------> |
|   Request 3    |
| +------------> |
|   Reply 1      |
| <------------+ |
|   Reply 2      |
| <------------+ |
|   Request 4    |
| +------------> |
|   Reply 3      |
| <------------+ |
|   Request 5    |
| +------------> |
|   Reply 4      |
| <------------+ |
|   Reply 5      |
+ <------------+ +

This way we use all the bandwidth available for requests and keep the CPU as busy as possible. At startup, several dozen API calls are necessary to set up the widgets : initial values for knobs ( aka encoders ), clip length and notes, etc.

Since all the layers involved are supposed to guarantee ordering and reliability, we could in principle just use a queue to match each reply to its requester, but somehow I managed to lose replies after a while. I have no idea where it happens, but the fact is that if you don't tag your messages because you expect a perfect sequence, then once the sequence is lost it's a catastrophic failure : you have no choice but to restart all components to a clean state, so that nobody keeps a polluted queue. And it's indeed a mess to troubleshoot. It turns out that adding a messageId to match requests and replies adds no significant overhead and is easier to program and manage. I lost a few hours refactoring that part because I wanted to save a few bytes per message, which is ridiculous.

So here is the sequence of events :
  • browser-side
    • a widget needs to call the API to get its initial value or to set a new one
    • the widget calls the « API proxy » with the message that has to be sent to the API
    • the proxy adds a messageId and sends it to the server
  • server-side
    • add a clientId to the request, to know which browser made it
    • call the API with that request
  • Live API
    • process the request, generate the answer
    • send the reply with the original envelope ( that is, clientId and messageId )
  • server-side
    • read the reply's clientId, send it to the corresponding browser
  • browser-side
    • the API proxy receives the message and checks the messageId
    • the corresponding callback is called with the reply content
      ( it's actually promises, more on that below )

All of this can happen in parallel with multiple messages.

I suggest that anyone always use messageIds, even when a strict order is expected ; it makes troubleshooting much easier. Most problems happen under high load, and you don't want to manually count dozens of messages and replies in the logs. Grepping for a unique ID is much simpler.
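
To make the bookkeeping concrete, here is a small, hypothetical sketch of reply matching by messageId, written in Python for brevity ( in the real application this lives browser-side, where the callbacks are promise resolvers ) :

import itertools

class ApiProxy(object):
    def __init__(self, send):
        self.send = send          # function that pushes a message to the server
        self.pending = {}         # messageId -> callback
        self.ids = itertools.count()

    def request(self, payload, callback):
        message_id = str(next(self.ids))
        self.pending[message_id] = callback
        self.send({"messageId": message_id, "payload": payload})

    def on_reply(self, reply):
        # a lost or duplicated reply only affects its own messageId,
        # instead of shifting a whole queue out of sync
        callback = self.pending.pop(reply["messageId"], None)
        if callback is not None:
            callback(reply["payload"])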


WebRTC

WebRTC is normally meant for peer-to-peer communication between browsers, allowing realtime video, voice and chat, and the only browsers that implement it are Chrome and Firefox. It's even more restricted on mobile devices, as Apple only allows its own rendering engine ; as a consequence, no browser on iOS supports WebRTC. That means I'm limited to Android and desktops.

So why not just use websockets, which are so easy to set up and widely supported ? That is indeed what I used in my early experiments. Unfortunately you can't disable Nagle's algorithm : small messages are grouped together, which regularly causes a latency of a few tenths of a second ; that's totally unacceptable, as I want to play notes without any perceptible delay between the touch and the sound coming out of Ableton Live. WebRTC, on the other hand, can run on top of UDP, and so far I haven't heard any network-induced delay.

So I ended up using WebRTC's data channel, which can be used for message-oriented communication, perfect for the purpose. I tried to stream the sound as well, but there is way too much latency ; I'm not sure it can be achieved, maybe streaming it in a more classical way from the server is a better option.

Setting up a WebRTC connection is not straightforward ; fortunately it's possible to find great examples out there, like this blogpost : http://blog.printf.net/articles/2014/07/01/serverless-webrtc-continued/ . So, during the lifetime of the web application, HTTP is only used to load the usual assets ( html, css and javascript ), plus one more HTTP call to set up the WebRTC channel ( the offer/answer exchange in WebRTC terminology ). After that, everything else goes through WebRTC.

Server-side, each WebRTC connection is associated with an ID that is used in the envelope.

That said, as the node bindings for WebRTC are not stable, I'll explain my workaround at the end of the article.


ZeroMQ

ZeroMQ is ridiculously simple to use, but you only grasp its essence once you need it in a project. It was a bit tricky to build it for Python 2.5, but it's totally worth it ( details here : http://www.djcrontab.com/2015/09/building-html5-control-surface-for.html#comment-2274750354 ).

There is no « connection refused » with ZeroMQ ; it's just a peer that is not there yet. You can indeed detect the situation and handle it, but that is a separate event handler. This forces clean, focussed, event-oriented code ; setup, data processing and exception handling belong in different places.

Also, while it implements explicit high-level roles such as publisher/subscriber or push/pull, the notion of bind and connect is independent from these roles. For instance, several publishers can connect to a single subscriber. Apart from the initialization, none of your code has to care about it.

That’s plenty of details that just makes it easy to experiment until you find the right implementation. I’d like to see an API similar to ZeroMQ on top of WebRTC datachannels, this would make distributed applications involving a browser much more consistent.

ZeroMQ comes with three basic messaging patterns, on top of which more complex schemes can be built :
  • Request/reply is the simplest, as it requires only one socket on each side, but the throughput is limited, as explained above.

    Instead we need two pairs of sockets : one to pipeline requests and another to pipeline replies.
  • Pub/sub is an option, but its default behavior is to drop packets once the send buffer is full. Pub/sub also suffers from the « slow joiner » problem ; overall, that pattern is more convenient for one-way event broadcasting that doesn't involve state changes.
  • Push/pull blocks the call if the send buffer is full and doesn't suffer from the slow joiner problem. Therefore push/pull seems the best option for pipelining requests and replies ; a sketch of the wiring follows this list.
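
Here is a minimal pyzmq sketch of the two push/pull pairs. The endpoints are illustrative and both ends are shown in one process for brevity ; in the real setup one end lives in the node server and the other in the control surface script.

import zmq

context = zmq.Context.instance()

# pair 1 : requests flow one way
requests_push = context.socket(zmq.PUSH)
requests_push.bind("tcp://127.0.0.1:5555")   # illustrative endpoint
requests_pull = context.socket(zmq.PULL)
requests_pull.connect("tcp://127.0.0.1:5555")

# pair 2 : replies flow the other way
replies_push = context.socket(zmq.PUSH)
replies_push.bind("tcp://127.0.0.1:5556")    # illustrative endpoint
replies_pull = context.socket(zmq.PULL)
replies_pull.connect("tcp://127.0.0.1:5556")

# a full PUSH buffer blocks the sender instead of silently dropping messages,
# so pipelined requests and replies are never lost
requests_push.send_multipart(["client-1", "msg-1",
                              "tracks.0.devices.1.parameters.2.value set 100"])
envelope_and_request = requests_pull.recv_multipart()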

Python side ( Ableton Live API )

While asyncio has been around for a while, it seems that the most common way to do IO in Python, including with ZeroMQ, is to just use threads and let IO methods block as needed. This is the way it is done in the official documentation. I'm principally a Python developer and I find this approach idiomatic. An alternative implementation, aiozmq, allows using coroutines. This really feels strange and unnatural to me ; I'm not sure whether it's a question of habit or whether it just doesn't fit the overall syntax. I'd prefer callbacks or some other form of inversion of control ; I didn't try it, but those "yield" calls in the middle of a method feel error prone, actually adding complexity, and may lead to forgetting to free resources. Or maybe it doesn't follow the "explicit is better than implicit" principle. It's only a feeling.

So, threading and blocking calls are fine, but here they lead to a problem : we can't rely on blocking calls and threading inside an Ableton Live control surface script. I said in the previous article that I did JSON-to-MIDI serialization, but that was a temporary solution : serialization is too slow, I have to escape the end-of-sysex byte ( 247 ), and I had to split big messages because of a ( totally reasonable ) buffer limitation in node-midi ; overall it's not a viable solution.

Here is the trick I use instead. I have two pairs of push/pull sockets, one for requests, one for replies. When I need to make a request to Live's API :
  • send an empty sysex message ( [240, 247] ) from the server
  • then immediately send the request via the server's push socket
  • the sysex message wakes up the control surface script in its MIDI handler method
  • the script reads one message from its pull socket
  • once the request is processed, the control surface script sends the response from its push socket
  • the server gets the reply from its pull socket

Which just gives, in a simplified version :

import json
import zmq
from _Framework.ControlSurface import ControlSurface

class NodeControlSurface(ControlSurface):
    def __init__(self, c_instance):
        ControlSurface.__init__(self, c_instance)
        self.context = zmq.Context.instance()
        # receiver socket
        self.pull_socket = self.context.socket(zmq.PULL)
        self.pull_socket.RCVTIMEO = 5 # never block longer than 5ms
        self.pull_socket.bind("tcp://127.0.0.1:8000")  # illustrative endpoint
        # reply socket
        self.push_socket = self.context.socket(zmq.PUSH)
        self.push_socket.bind("tcp://127.0.0.1:8001")  # illustrative endpoint

    def receive_midi(self, midi_bytes):
        # this method is called as soon as a sysex message is received
        # ( not called for any other midi message since no mapping has been set up )
        # receive the request ; this blocks at worst 5ms
        clientId, request = self.pull_socket.recv_multipart()
        # process the request against the Live API ( dispatching not shown here )
        reply = json.dumps({"status": "ok"})
        # send the reply, keeping the original envelope
        self.push_socket.send_multipart([clientId, reply])

So instead of an explicit event loop blocking on a pull socket, it's just an event handler. It processes one request at a time, and that's it. It feels less cluttered than a loop.

Javascript side ( web application )

I didn't mention it yet, but the web application uses aurelia.io. Since the app is very specific, the framework isn't used extensively, but there is one important point : Aurelia enforces the use of ES6 ( check out the ES6 features, it's amazing ). It allowed me to discover what the latest trend in JavaScript looks like, and it's really, really neat. Python still has strong points thanks to its extensive standard library, but when it comes to the language itself, I feel I'm close to calling JavaScript my favourite language. It's not about purity, it's about solving today's problems ; it was once the language everyone was forced to use to program browser-side, but it's more than that today, it has real practical value.

Coroutines ?

There are some attempts at implementing coroutines in JavaScript as well, but I don't like them either. Event handlers are just fine and, more importantly, match what's really happening ; even when it comes to handling state, closures are just fine. Indeed, some examples show that coroutines can get rid of some clutter, but I feel that the value of the removed clutter is not balanced by the added obscurity in real situations. Learning to think in terms of state and events provides more value than using coroutines, in my opinion.


Promises

Promises are absolutely critical for such an application to even exist. It's not about performance, and it doesn't enable anything internal to the program : humans have hard limits too. Consider complexity as proportional to the number of arbitrary items you have to hold in your mind to understand something, in order to troubleshoot or modify a program. One trick is to make those items not arbitrary anymore, so that they are connected somehow. See this great TED talk about memory : Feats of memory anyone can do. You must organize knowledge to tell a story, present logical relationships, meet expectations from previous experiences, be consistent and meaningful.

One of the most complex parts of the application is the clip editor. The initialization depends on plenty of parameters which must be fetched remotely : does a clip exist in the given slot ? If not, create it. Then fetch how long it is and where it starts, so I can request the notes to display. Also, create listeners for play status, play position and note changes, but not before I've made sure the clip exists. Some parts must be strictly ordered, some parts can run in parallel ; some parts can fail.

Imagine this on ten levels. Add checks, branches, temporary state variables.

This would be an unmanageable spaghetti of callbacks, with no hope of properly pipelining parallelizable calls to make it fast. One could indeed be creative and come up with an original solution, but you don't have to : promises to the rescue.

This is the initialization code for the clip slot object ( think of something similar to an active record ) :

this.ready = Promise.reduce([
    () => this.get("has_clip"),  // this.get automatically initializes this.has_clip on reply
    () => {
        if (!this.has_clip) {
            return this.call("create_clip", 4); // create a clip of 4 beats
        }
    },
    () => this.get("clip"), // creates a local reference to the remote "clip" object
    () => Promise.settle([ // wait for all promises to either succeed or fail, in any order
        this.clip.set("looping", true),
        this.clip.listen("playing_status", () => this.get("is_playing")),
        this.clip.listen("notes", () => this.updateDisplay()) // notes have changed
    ]),
    () => {
        // we now have all the references we need, so now we can listen for changes
        Object.observe(this.clip, (changes) => {
            var attributes = new Set(changes.map((x) => x.name));
            if (attributes.has("loop_end") || attributes.has("loop_start")) {
                this.updateDisplay(); // update display on change
            }
        });
        this.updateDisplay(); // first update of the display
    }
], (_, c) => c(_), null /* "reduce" needs an initial value and an accumulator function */).catch((error) => {
    throw error;
});

It's still quite involved indeed, but it combines ordered initialization when needed, parallelization when possible, and error tolerance when applicable. All of this while staying as flat as possible instead of becoming a christmas tree.

  • First, notice it's all arrow functions, new in ES6. Not only do they clutter the source code less, but "this" stays the same as in the enclosing scope. Perfect for callbacks.
  • this.get, this.set and this.call all pass a request to the API proxy and immediately return a promise. On reply, "get" updates the matching attribute of "this" and resolves the promise.
  • Promise.reduce executes items in sequence. It expects an array of functions which either return an immediate value or a promise, in which case the sequence is blocked until it resolves. A promise can resolve to another promise, which is really another way of expressing a sequence. Notice they're all functions ; if the promises were expressed directly they would execute immediately, whereas here their execution depends on the completion of previous steps in the sequence.
  • "listen" calls a remote function that calls us back on change. Notice how an event differs from a promise : a promise is resolved once, while an event fires as many times as needed. In our specific implementation "listen" can fail if we are already listening, which may happen depending on the order of initialization relative to other components.
  • "settle" just waits for the completion of an array of promises, whether they fail or succeed.
  • "observe" is a recent addition to JavaScript, and is a lightweight way to be notified of changes on a local object. It's called once per event loop iteration, which is why it receives an array of changes.
Not visible here but important as well : all promises are created with a 30s timeout. That feature made me use bluebird, recommended by Aurelia's documentation, instead of q. This way all delayed actions can be considered failed after a timeout, no matter the origin, whether it's an AJAX call or another communication protocol, or even as a temporary fix for a failure you can't spot in a series of callbacks : no matter what, you'll never get a promise stuck forever.

WebRTC Workaround

I couldn't get wrtc to run more than 10 minutes without crashing ; I went as far as isolating it inside a subprocess, so I'm sure it's the one causing the segmentation faults. Strangely enough, that's the only binding I could find for an interpreted language. I haven't taken the time so far to get a nice isolated core dump and either open a ticket or try to fix it myself ; I knew this would add weeks of delay to my progress. So I went for the option that would give me a 100% chance of working for now : run the WebRTC part of the server inside a browser. I get CPU peaks from time to time, but never a crash.

Chrome's native messaging allows extensions to run a local program. The best wrapper I could find for node is https://github.com/jdiamond/chrome-native-messaging.git . But it does complicate the infrastructure again, as Chrome cannot directly talk to an arbitrary external program, even from an extension : it must be the one executing it. So this extension comes in two parts :
  • The frontend of the extension is a background page managing the WebRTC connections. I extracted the logic from my original nodejs attempt ; it didn't require much rework.
  • The extension launches a native application, which is really just a node script. This native part is needed to talk to the server via ZeroMQ, while it talks to the extension via standard input/output streams.
Setting up the WebRTC connection then follows this workflow :
  • an offer/answer exchange must be done to set up the WebRTC connection. This is done via HTTP on the main server.
  • the server forwards the offer to the native part of the extension via ZeroMQ
  • the native extension then talks to the background page via native messaging
  • once the WebRTC connection is set up, the extension routes messages to Ableton if they are API commands, or to the server if they are MIDI
It's a bit messy for my taste, especially because there is only one communication channel between the native extension and the background page, and the background page is the only one able to talk to the browser via WebRTC. This means the stream between the extension and the native application carries messages of mixed nature : the offer/answer exchange, MIDI that must be relayed to the server, commands to send to Ableton, replies coming from Ableton, or listener notifications coming from Ableton. Each message has a specific signature so it can be properly routed.

Fortunately, the only purpose of this extension is to route messages. It also adds the "clientId" to the requests, so it's almost stateless ; the only state is the connections. Messages and replies contain all the state needed for routing.

I'd like to see that part merged back into the server once the WebRTC instability is fixed ; it's a workaround, not a proper architecture.


Due to browser limitations, we can't have a web application talking directly to Ableton. Browser-side, communication protocols are limited ; in Ableton Live, control surface scripts can't rely on threads. This is what justifies the mix of ZeroMQ and WebRTC I presented here. If you intend to build a web application with the lowest network latency you can get, I hope this helped.

19 September 2015

Building an HTML5 Control Surface for Ableton Live, part 1

Before I release the code, I'd like to clarify my thinking by sharing the challenges I encountered, the choices I made and, most importantly, the trials and failures I had to admit before considering other options.

In this first part I will focus on the general approach, what inspired me, and how I got the most out of Ableton Live's API.

I’ll explain the networking choices and the web application in another article, but here is an overview of the architecture :

│                    Server                                      │
│                                                                │
│                                       +----------------+       │
│ +-----------------------------+       │ WebRTC bridge  <---------------------+
│ │                             │       +-^---^----------+       │WebRTC       │
│ │                            ZeroMQ     │   │                  │             │
│ │     Ableton Live       +----+---------+   │                  │             │
│ │           ^            │    │             │ZeroMQ            │             │
│ │           │Python API  │    │             │                  │          +--v------------+
│ │ +---------v------------+-+  │ MIDI   +----v--------------+   │ HTTP     │               │
│ │ │ Control surface script <-----------> nodejs webserver  <-------------->  Web browser  │
│ │ │                        │  │        +-------------------+   │          │               │
│ │ +------------------------+  │                                │          +---------------+
│ │                             │                                │
│ +-----------------------------+                                │
│                                                                │
If you can’t draw it in ASCII-art, then it’s too complicated.

This looks crazy indeed, but this is the combination that gave me the best results so far.1

Unlike too many software engineers who are proud to show off their overly complicated architecture, I don't like complexity ; I'm totally ashamed of this mess. But if it's necessary to get something done, I've got to bite the bullet.

What I’d prefer to have is indeed just a single page that directly talks to Ableton Live, but due to today’s hard limits and the short latency we aim for, this is not possible. Now indeed HTML5 is a strange choice if we want low latency, but I think it’s the best option for quick prototyping. Artists know that best ideas come from the fact that you can try many of them fast, and this is how you keep being inspired and motivated. Let’s put it this way : HTML is not the fastest way to browse a map, yet everyone just open maps.google.com because it’s convenient and fast enough. To generate electronic sounds you can’t beat the speed of a power source connected to oscillators, filters and whatnot, yet nowadays hardware synthesizers are hardly used and mostly for the vintage look and the physical interface, not because it’s hardwired inside.

The way we design software controllers should follow the same path. Aiming for convenience creates an environment in which the most brilliant and disruptive ideas appear, just because it's so convenient to try many of them.

How current software is trying to control Ableton Live

I tried various applications ; most of them are actually just MIDI controllers that need to be mapped to controls. They look great, and are indeed more polished than my proof of concept, but at most they can launch clips, control the volume and pan of tracks, and set parameter values for devices. Fair enough, but not something that gives you access to the full potential of Live, and customization is limited and tedious.

Two options stand out : Max for Live and Liime. But they suffer from the same syndrome I've encountered in so many « visual programming » environments that plague the corporate world for all kinds of purposes : while they help get stuff done, they are designed for a non-existing species of half-developers addicted to mouse clicks and end up frustrating everyone. Accidental complexity quickly builds up, it's error prone, it lacks means to get proper feedback, and versioning is limited to ctrl-z and saving as .bak.

Here is an excerpt from a document produced by Liime :

A screenshot from Max4Live:

Sorry, I’m just poking some fun to make my point. I totally respect what Liime and Max developers have done all these years with the limited set of tools and practices that were available when they started, considering they’re targeting nerdish yet non-developper musicians. But I think it’s time to move on, today’s tools give us tons of opportunities that will benefit both musicians and developers at the same time. It’s time to apply the devops attitude to Electronic Music.

We need an approach where simple stuff is simple and complicated stuff is possible, allowing everyone to evolve within the same environment.

Software controllers need to follow the modern good practices that apply to any business :
  • Use a modern programming language with a widely tested and optimized interpreter or compiler.
  • A wide variety of libraries that fit any need must be available for that language. You don't want to reinvent the wheel, ever. Focus on the new thing you have to offer ; the rest is just glue code between your stuff, the libraries and the reality you want to control.
  • Separate different concerns. Code and view belong in different files in different formats.
  • You know you're doing it right when you can swap out a part for a better implementation. I swapped the networking part at least five times (!)
  • The whole thing must be versionable, deployable, and testable in reproducible conditions.
  • The deployment procedure must be fast. The only incompressible time is the time it takes to actually code ; everything else should be automated. I've seen countless situations where businesses miss opportunities just because they lack deployment automation : they are afraid to change anything, so they keep outdated technology for ages and skilled developers are hired just to become maintenance zombies.

Accessing the Ableton Live API

What I have tried :
  • Pure midi mapping : boring, very limited.
  • Max for live:
    I tried to access the Live API through JavaScript executed by Max for Live, which then receives and sends messages through a drag-and-dropped UDP socket.

    It’s hard to know what’s happening, crashes are silent, you are never sure if it’s running the current version, Live API documentation is mostly just a dump of the method headers while it’s quite hard and tedious to discover what all objects and methods actually do.

    I tried to make it execute arbitrary commands ; eval() is not available but there is this trick:

    var a='console.log("hello world")'; (new Function(a))()

    Still, it was too slow, and by the way, it totally crashes both Max for Live and Ableton Live if the UDP packet is too big. So you have to cut it into frames and add a sequence number yourself.

    (╯°□°)╯︵ ┻━┻

At that point I figured that using a control surface script was probably the best option. I decompiled existing Python scripts and barely understood anything. This would have taken ages, and I was not even sure it would lead me anywhere.

I spent some time googling until I found what would become my Rosetta stone : showtime

Thanks to what the author has done, I finally knew I was getting somewhere. I relied on it for quite some time ; it helped me make progress and good choices. For receiving parameter updates, ZeroMQ is a great choice because it's so easy to set up while being blazing fast ; I wouldn't have come up with this idea by myself.

But then I hit another brick wall : I can browse tracks and devices, change parameters, receive parameter updates, but that's about it. Among other things, what about clip editing ? Was I going to expand the control surface script, which requires restarting Ableton Live every time I want to validate my changes ? When I tried to use Max for Live as a gateway, I was using it to send arbitrary commands ; it was too slow and buggy, but otherwise I felt it was the way it should be done.

It was time to put to use all the experience I got from reverse-engineering poorly documented projects in the corporate world. « DJ Crontab » really comes from the fact that I once had to debug an architecture made of at least 200 lines of crontab spread over 6 servers, executing countless scripts that communicate via CSVs moved around with « mv » and « scp » ( yes, you read that right, it exists, and people get away with it ).

Digging a manhole

To execute control surface scripts, Live uses an embedded Python 2.5 interpreter. It's an outdated, no longer supported version with most of the standard library stripped out, but you can still drop what you need into the folders and mess around with sys.path. Just make sure your script gets executed by creating a folder in the right place ; it will then appear as an available control surface. Write some logging to a file, and you start getting somewhere.
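
For illustration, a minimal script skeleton looks something like this ( the folder name, log path and class name are made up for the example ) :

# MIDI Remote Scripts/MySurface/__init__.py  ( illustrative layout )
from _Framework.ControlSurface import ControlSurface

LOG = open("/tmp/my_surface.log", "a")  # poor man's logging

def log(message):
    LOG.write(message + "\n")
    LOG.flush()

class MySurface(ControlSurface):
    def __init__(self, c_instance):
        ControlSurface.__init__(self, c_instance)
        log("control surface script loaded")

def create_instance(c_instance):
    # Live looks for this factory function when loading the script
    return MySurface(c_instance)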

What I needed at this point was an ipython-like REPL embedded in the script and remotely accessible. It exists : it's rconsole, part of the rfoo library.
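
Embedding it boils down to a couple of lines in the control surface script ( assuming rfoo has been dropped somewhere on sys.path ) :

from rfoo.utils import rconsole

# start the remote console server in a background thread ;
# the rconsole client shown below then attaches to it over a local socket
rconsole.spawn_server()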

Once it's set up, here is what's possible :

$ rconsole
Python 2.5 (r25:51908, Jul 21 2015, 18:07:15)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> # note: when I write <tab> below, it's me tapping the tab key
>>> import Live
>>> app=Live.Application.get_application()
>>> app.<tab>
app.View                               app.__hash__                           ap
app.__class__                          app.__init__                           ap
app.__delattr__                        app.__module__                         ap
app.__dict__                           app.__ne__                             ap
app.__doc__                            app.__new__                            ap
app.__eq__                             app.__nonzero__                        ap
app.__getattribute__                   app.__reduce__                         ap
>>> doc=app.get_document()
>>> doc.tracks[0].devices[0].name
>>> device = doc.tracks[0].devices[0]
>>> device.<tab>
device.View                               device.__repr__                       
device.__class__                          device.__setattr__                    
device.__delattr__                        device.__str__                        
device.__dict__                           device.__weakref__                    
device.__doc__                            device.add_chains_listener            
device.__eq__                             device.add_drum_pads_listener         
device.__getattribute__                   device.add_has_drum_pads_listener     
device.__hash__                           device.add_name_listener              
device.__init__                           device.add_parameters_listener        
device.__module__                         device.add_return_chains_listener     
device.__ne__                             device.add_visible_drum_pads_listener 
device.__new__                            device.can_have_chains                
device.__nonzero__                        device.can_have_drum_pads             
device.__reduce__                         device.canonical_parent               
device.__reduce_ex__                      device.chains                         
>>> parameter = device.parameters[0]
>>> parameter.<tab>
parameter.__class__                         parameter.__new__                   
parameter.__delattr__                       parameter.__nonzero__               
parameter.__dict__                          parameter.__reduce__                
parameter.__doc__                           parameter.__reduce_ex__             
parameter.__eq__                            parameter.__repr__                  
parameter.__getattribute__                  parameter.__setattr__               
parameter.__hash__                          parameter.__str__                   
parameter.__init__                          parameter.__weakref__               
parameter.__module__                        parameter.add_automation_state_liste
parameter.__ne__                            parameter.add_value_listener        
>>> parameter.begin_gesture.__doc__
'\nbegin_gesture( (DeviceParameter)arg1) -> None :\n    Notify the begin of a mo


Typing commands right in the execution context is exactly what you need to discover what an API is doing.

What I figured out is that all you need is to manipulate the document instance, which gives you as much control as Ableton Push, programmatically. As far as I know, Push's control surface script is the only one using "begin_gesture" ; the only mentions of it, if you google “ "begin_gesture" ableton ”, are decompiled files, so it seems nobody else is using it. begin_gesture is triggered when you touch one of Push's rotary controls, so you don't need to start moving it for Live to know you're using it. This is why you don't get any latch behavior when using Push, contrary to other controllers. I'll come back to it in another article, but this is how I'm able to smoothly record fine-tuned effects on the bassline in the demo.

So in fact, while control surface scripts make extensive use of the base classes provided by the _Framework package, that is totally superfluous in our use case. It's useful for hardware devices, because hardware is not flexible and writing a firmware is no walk in the park ; having an interpreted language directly embedded in the DAW is therefore a much more flexible option. Ableton provides framework classes that help manage typical hardware controller features like touch pads and dials, which is great for manufacturers.

But what about software controllers ? You have all the flexibility you want in your own software, but when you need to change the way it interacts with Live, you have to edit the control surface script and then restart the whole thing. When you're used to just hitting reload on a web page to see your changes, this feels like tying your shoelaces while wearing boxing gloves.

What you need is just a way to access the document tree, call methods, get and set properties, and receive updates.

What I did then was to make a dummy class that just opens a ZeroMQ REP socket. I serialize commands, a request handler receives them and sends back the response. Listeners are just dynamically added methods exposing updates through a ZeroMQ PUB socket that clients subscribe to.
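
A rough sketch of that idea ( the endpoints are made up and the request handling itself is left out ; the real class is more involved ) :

import json
import zmq
import Live

context = zmq.Context.instance()

# request handler : a worker thread polls this REP socket and replies ( loop not shown )
rep = context.socket(zmq.REP)
rep.bind("tcp://127.0.0.1:9000")   # illustrative endpoint

# publisher : parameter updates are pushed to whoever subscribes
pub = context.socket(zmq.PUB)
pub.bind("tcp://127.0.0.1:9001")   # illustrative endpoint

doc = Live.Application.get_application().get_document()
parameter = doc.tracks[0].devices[0].parameters[0]

def on_value_changed():
    # listener executed by Live's main thread : keep it fast, just serialize and send
    pub.send(json.dumps({"path": "tracks.0.devices.0.parameters.0.value",
                         "value": parameter.value}))

parameter.add_value_listener(on_value_changed)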

A word of warning, though : as the request handler runs in a thread, as does rconsole, I got strange side effects. Directly calling some methods or setting some parameters completely locked up Live ; I had to kill the program. It's a well known limitation of Python, in particular CPython, the most widely used implementation : if an embedded interpreter gives control back to the host by calling a method, and the host calls back into the interpreter without releasing the Global Interpreter Lock, you get a deadlock. What you need to do is schedule a callback that is executed by the main thread ( this is done via schedule_message ).
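
Concretely, inside the ControlSurface subclass, the safe pattern looks roughly like this ( the method names are hypothetical ) :

class NodeControlSurface(ControlSurface):
    # ... __init__ as shown earlier ...

    def handle_request_from_thread(self, value):
        # called from the worker thread : don't touch the Live API here,
        # schedule the work so that Live runs it from its main thread instead
        self.schedule_message(1, self._apply_value, value)

    def _apply_value(self, value):
        # executed by Live's main thread, so no deadlock on the GIL
        self.song().tracks[0].devices[0].parameters[0].value = value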

Embedded interpreter limitations

At first Showtime was still active in the Live configuration, but I didn't need it anymore, so I disabled it. And then suddenly my script was super slow : rconsole took 4 seconds to reply, and my ZeroMQ thread took just as long to reply to anything. What happened ?

As it turns out, the embedded interpreter only runs when needed, that is, when events are sent to the control surface script. The ZeroMQ thread was only being executed as a side effect of Showtime scheduling events with schedule_message, which kept the interpreter busy. I didn't want to rely on that trick, but I'm not sure I made the best choice to deal with it.

So far, the best way I've found to communicate with the script, without running a busy loop just to keep it awake, is to write a MIDI hook and send commands via MIDI sysex messages.

Thus what I ended up with is ... JSON serialized in MIDI.

( EDIT : I have since changed the way messages are passed. An empty sysex message now signals that a message is available on a ZeroMQ socket, since I can't have ZeroMQ running in the background. )

This just works and I'm quite happy with the low latency. Sometimes you need ridiculously convoluted workarounds, and in those cases it's absolutely critical to contain the problem : it must be a black box to all other components. The nodejs server serializes the JSON coming from the web application into MIDI, reads the MIDI responses and sends them back as JSON. Needless to say, you don't want your web application to care about any of that.

On the other hand, I could still use ZeroMQ to send updates from Live because, like MIDI event hooks, parameter update listeners just wake up the interpreter and execute Python code in the main thread. You can do whatever you need from there, just make sure it's fast, because it blocks Live. Note that you can't call any setter from an event listener, so it's not the right place to respond to an external command. Just serialize the event and send it through the socket.

Coming up next

This is what I ended up with, from the web application :

It's just like AJAX calls, except they are sent to the server via a WebRTC DataChannel instead of HTTP. A request/response cycle for arbitrary commands takes less than 20ms, and pure MIDI notes are even faster.

But all of this is for another article.


I spent quite some time figuring out the best way to remotely control Ableton Live, and I found out that other people who tried struggled as well. That's why I wanted to write this article first : yes, I found a solution, it's fast, I know what kind of problems you had and how to solve them. I don't know if anyone else has found a better solution, but as far as I know it's not something available on the interwebs, so I hope it helps.

The next article will be less focussed on Ableton Live ; I will explain how the rest of the application was made. I will also mention what I tried and what didn't work, so my failures can at least be useful to people trying to make similar time-critical web applications.

Thanks for reading, and by the way, I'll be attending Berlin Loop, a summit for music makers, 30 Oct - 1 Nov 2015.

1 You may wonder what this WebRTC bridge is : it exists because there are no stable WebRTC bindings. Under the network-intensive conditions needed for this application, wrtc, the bindings for NodeJS, crashes about every 10 minutes. As strange as it sounds, WebRTC bindings are almost non-existent. I figured the most stable option was to make a Chrome extension, which runs on the same computer as the webserver and Live. More on that in another article.