19 September 2015

Building an HTML5 Control Surface for Ableton Live, part 1

Before I release the code, I'd like to clarify my ideas by sharing the challenges I encountered, the choices I made, and most importantly, the trials and failures I had to admit before considering other options.

In this first part I will focus on the general approach, what inspired me, and how I got the most out of Ableton Live's API.

I'll explain the networking choices and the web application in another article, but here is an overview of the architecture:

│                    Server                                      │
│                                                                │
│                                       +----------------+       │
│ +-----------------------------+       │ WebRTC bridge  <---------------------+
│ │                             │       +-^---^----------+       │WebRTC       │
│ │                            ZeroMQ     │   │                  │             │
│ │     Ableton Live       +----+---------+   │                  │             │
│ │           ^            │    │             │ZeroMQ            │             │
│ │           │Python API  │    │             │                  │          +--v------------+
│ │ +---------v------------+-+  │ MIDI   +----v--------------+   │ HTTP     │               │
│ │ │ Control surface script <-----------> nodejs webserver  <-------------->  Web browser  │
│ │ │                        │  │        +-------------------+   │          │               │
│ │ +------------------------+  │                                │          +---------------+
│ │                             │                                │
│ +-----------------------------+                                │
│                                                                │
If you can’t draw it in ASCII-art, then it’s too complicated.

This looks crazy indeed, but this is the combination that gave me the best results so far.1

Unlike too many software engineers who are proud to show off their overly complicated architectures, I don't like complexity; I'm ashamed of this mess. But if it's necessary to get something done, I've got to bite the bullet.

What I'd prefer is just a single page that talks directly to Ableton Live, but given today's hard limits and the low latency we aim for, this is not possible. Granted, HTML5 is a strange choice if we want low latency, but I think it's the best option for quick prototyping. Artists know that the best ideas come from being able to try many of them quickly; that is how you stay inspired and motivated. Let's put it this way: HTML is not the fastest way to browse a map, yet everyone just opens maps.google.com because it's convenient and fast enough. To generate electronic sounds you can't beat the speed of a power source connected to oscillators, filters and whatnot, yet nowadays hardware synthesizers are rarely used, and mostly for the vintage look and the physical interface, not because they're hardwired inside.

The way we design software controllers should follow the same path. Aiming for convenience will create an environment in which the most brilliant and disruptive ideas will appear just because it’s so convenient to try many of them.

How current software is trying to control Ableton Live

I tried various applications; most of them are actually just MIDI controllers that need to be mapped to controls. They look great, and are certainly more polished than my proof of concept, but at most they can launch clips, control the volume and pan of tracks, and set parameter values for devices. Fair enough, but with such limited and tedious customization, it's not something that gives you access to the full potential of Live.

Two options stand out: Max for Live and Liime. But they suffer from the same syndrome I've encountered in so many "visual programming" environments that plague the corporate world for all kinds of purposes: while they help get stuff done, they are designed for a non-existent species of half-developers addicted to mouse clicks, and end up frustrating everyone. Accidental complexity quickly builds up, they're error-prone, they lack the means to get proper feedback, and versioning is limited to ctrl-z and saving as .bak.

Here is an excerpt from a document produced by Liime:

A screenshot from Max4Live:

Sorry, I'm just poking fun to make my point. I totally respect what the Liime and Max developers have done all these years with the limited set of tools and practices available when they started, considering they're targeting nerdish yet non-developer musicians. But I think it's time to move on: today's tools give us tons of opportunities that will benefit musicians and developers alike. It's time to apply the DevOps attitude to electronic music.

We need an approach where simple stuff is simple and complicated stuff is possible, allowing everyone to evolve within the same environment.

Software controllers need to follow the modern good practices that apply to any business:
  • Use a modern programming language with a widely tested and optimized interpreter or compiler.
  • A wide variety of libraries covering any need must be available for that language. You don't want to reinvent the wheel, ever. Focus on the new thing you have to offer; the rest is just glue code between your stuff, the libraries, and the reality you want to control.
  • Separate concerns. Code and view belong in different files, in different formats.
  • You know you're doing it right when you can swap out a part for a better implementation. I swapped the networking part at least five times (!)
  • The whole thing must be versionable, deployable, and testable in reproducible conditions.
  • The deployment procedure must be fast. The only incompressible time is the time it takes to actually write code; everything else should be automated. I've seen countless businesses miss opportunities just because they lack deployment automation: they are afraid to change anything, so they keep outdated technology for ages and hire skilled developers just to become maintenance zombies.

Accessing the Ableton Live API

What I have tried:
  • Pure MIDI mapping: boring, very limited.
  • Max for Live:
    I tried to access the Live API through JavaScript executed by Max for Live, which then receives and sends messages through a drag-and-dropped UDP socket.

    It's hard to know what's happening, crashes are silent, and you are never sure it's running the current version. The Live API documentation is mostly just a dump of the method headers, so it's quite hard and tedious to discover what all the objects and methods actually do.

    I tried to make it execute arbitrary commands; eval() is not available, but there is this trick:

    var a='console.log("hello world")'; (new Function(a))()

    Still, it was too slow, and by the way, it totally crashes both Max for Live and Ableton Live if the UDP packet is too big. So you have to cut the payload into frames and add a sequence number yourself.

    (╯°□°)╯︵ ┻━┻

At that point I figured that using a control surface script was probably the best option. I decompiled existing Python scripts and barely understood anything. This would take ages, and I was not even sure it would lead me anywhere.

I spent some time googling until I found what would be my Rosetta Stone: Showtime.

Thanks to what its author has done, I finally knew I was going somewhere. I relied on it for quite some time; it helped me make progress and good choices. For receiving parameter updates, ZeroMQ is a great choice because it's so easy to set up while being blazing fast; I wouldn't have come up with that idea by myself.

But then I hit another brick wall: I could browse tracks and devices, change parameters, and receive parameter updates, but that was about it. Among other things, what about clip editing? Was I going to expand the control surface script, which requires restarting Ableton Live every time I want to validate my changes? When I tried to use Max for Live as a gateway, I was using it to send arbitrary commands; it was too slow and buggy, but otherwise I felt that was the way it should be done.

It was time to use all the experience I got from reverse-engineering poorly documented projects in the corporate world. "DJ Crontab" really comes from the fact that I once had to debug an architecture made of at least 200 lines of crontab spread across 6 servers, executing countless scripts that communicate via CSVs moved around using mv and scp (yes, you read that right: it exists, and people just get away with it).

Digging a manhole

To execute control surface scripts, Live uses an embedded Python 2.5 interpreter. It's an outdated, no longer supported version with most of the standard library stripped out, but you can still drop what you need into the folders and mess around with sys.path. Create a folder in the right place and your script will appear as an available control surface. Write some logging to a file, and you start getting somewhere.
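
A minimal sketch of that bootstrap, assuming the script sits next to a `libs` subfolder holding its bundled pure-Python dependencies (the folder layout and the log helper are my own choices, not part of any API):

```python
from __future__ import with_statement  # needed on the embedded Python 2.5
import os
import sys

# Hypothetical layout: this file lives in Live's remote scripts folder,
# with third-party pure-python modules dropped into ./libs.
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(SCRIPT_DIR, 'libs'))

def log(msg, path=os.path.join(SCRIPT_DIR, 'surface.log')):
    # Live swallows stdout, so append to a file and `tail -f` it instead.
    with open(path, 'a') as f:
        f.write('%s\n' % msg)
```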

What I needed at this point was to embed an IPython-like REPL, remotely accessible. It exists: it's rconsole, part of the rfoo library.
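
Starting it from the control surface script can be as small as this; a sketch, assuming rfoo has been dropped into an importable folder as described above:

```python
# rfoo must be bundled alongside the script, since the embedded
# interpreter has no usable site-packages.
try:
    from rfoo.utils import rconsole
    rconsole.spawn_server()  # listens locally; connect with the `rconsole` CLI
except ImportError:
    rconsole = None  # rfoo not present; the remote REPL is optional
```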

Once it's set up, here is what's possible:

$ rconsole
Python 2.5 (r25:51908, Jul 21 2015, 18:07:15)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> # note: when I write <tab> below, it's me tapping the tab key
>>> import Live
>>> app=Live.Application.get_application()
>>> app.<tab>
app.View                               app.__hash__                           ap
app.__class__                          app.__init__                           ap
app.__delattr__                        app.__module__                         ap
app.__dict__                           app.__ne__                             ap
app.__doc__                            app.__new__                            ap
app.__eq__                             app.__nonzero__                        ap
app.__getattribute__                   app.__reduce__                         ap
>>> doc=app.get_document()
>>> doc.tracks[0].devices[0].name
>>> device = doc.tracks[0].devices[0]
>>> device.<tab>
device.View                               device.__repr__                       
device.__class__                          device.__setattr__                    
device.__delattr__                        device.__str__                        
device.__dict__                           device.__weakref__                    
device.__doc__                            device.add_chains_listener            
device.__eq__                             device.add_drum_pads_listener         
device.__getattribute__                   device.add_has_drum_pads_listener     
device.__hash__                           device.add_name_listener              
device.__init__                           device.add_parameters_listener        
device.__module__                         device.add_return_chains_listener     
device.__ne__                             device.add_visible_drum_pads_listener 
device.__new__                            device.can_have_chains                
device.__nonzero__                        device.can_have_drum_pads             
device.__reduce__                         device.canonical_parent               
device.__reduce_ex__                      device.chains                         
>>> parameter = device.parameters[0]
>>> parameter.<tab>
parameter.__class__                         parameter.__new__                   
parameter.__delattr__                       parameter.__nonzero__               
parameter.__dict__                          parameter.__reduce__                
parameter.__doc__                           parameter.__reduce_ex__             
parameter.__eq__                            parameter.__repr__                  
parameter.__getattribute__                  parameter.__setattr__               
parameter.__hash__                          parameter.__str__                   
parameter.__init__                          parameter.__weakref__               
parameter.__module__                        parameter.add_automation_state_liste
parameter.__ne__                            parameter.add_value_listener        
>>> parameter.begin_gesture.__doc__
'\nbegin_gesture( (DeviceParameter)arg1) -> None :\n    Notify the begin of a mo


Typing commands right in the execution context is exactly what you need to discover what an API is doing.

What I figured out is that all you need is to manipulate the document instance, which programmatically gives you as much control as Ableton Push. As far as I know, Push's control surface script is the only one using begin_gesture: if you google « "begin_gesture" ableton », the only mentions are decompiled files, so it seems nobody else uses it. begin_gesture is triggered when you touch one of Push's rotary controls, so you don't need to start moving it to let Live know you're using it. This is why you don't get any latch behavior when using Push, contrary to other controllers. I'll come back to it in another article, but this is how I'm able to smoothly record fine-tuned effects on the bassline in the demo.
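
As an illustration, a gesture-bracketed parameter write could look like the sketch below; the helper name is mine, and it assumes begin_gesture/end_gesture behave as just described:

```python
def set_value_with_gesture(parameter, value):
    # Mirrors what Push appears to do when you touch an encoder:
    # announce the gesture, move the value, then release it.
    parameter.begin_gesture()
    parameter.value = value
    parameter.end_gesture()
```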

So in fact, while control surface scripts extensively use the base classes provided by the _Framework package, all of that is superfluous in our use case. It's useful for hardware devices: hardware is not flexible and writing firmware is no walk in the park, so having an interpreted language directly embedded in the DAW is a much more flexible option. Ableton provides framework classes that help manage typical hardware controller features like touch pads and dials, which is great for manufacturers.

But what about software controllers? You have all the flexibility you want in your own software, but when you need to change the way it interacts with Live, you have to edit the control surface script and then restart the whole thing. When you're used to just hitting reload on a web page to see your changes, this feels like tying your shoelaces while wearing boxing gloves.

What you need is just a way to access the document tree, call methods, get and set properties, and receive updates.

What I did then was to make a dummy class that just opens a ZeroMQ REP socket. I serialize commands; a request handler receives them and sends back the response. Listeners are just dynamic methods exposing updates via a ZeroMQ PUB socket.
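
To give an idea, here is a minimal sketch of such a request handler; the message format and every name in it are mine, not from the released code:

```python
import json

# A request names a path into the Live object tree, an operation, and
# its arguments, e.g.:
#   {"path": "tracks.0.devices.0", "op": "get", "attr": "name"}

def resolve(root, path):
    """Walk dotted path segments; numeric segments index into lists."""
    node = root
    for seg in (path.split('.') if path else []):
        node = node[int(seg)] if seg.isdigit() else getattr(node, seg)
    return node

def handle(root, request):
    msg = json.loads(request)
    node = resolve(root, msg['path'])
    if msg['op'] == 'get':
        result = getattr(node, msg['attr'])
    elif msg['op'] == 'set':
        setattr(node, msg['attr'], msg['value'])
        result = True
    else:  # 'call'
        result = getattr(node, msg['attr'])(*msg.get('args', []))
    return json.dumps({'result': result})

# In the real script, a zmq.REP socket loops around this:
#   while True: rep.send(handle(doc, rep.recv()))
```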

A word of warning, though: since the request handler runs in a thread, as does rconsole, I got strange side effects. Directly calling some methods or setting some parameters locked Live up completely, and I had to kill the process. It's a well-known limitation of Python, in particular CPython, the most widely used implementation: if an embedded interpreter gives control back to the host by calling a method, and the host calls back into the interpreter without the Global Interpreter Lock having been released, you get a deadlock. What you need to do instead is schedule a callback that is executed by the main thread (this is done via schedule_message).
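
The pattern boils down to a queue drained from the main thread; a sketch (written to run on modern Python too, hence the dual import; schedule_message is the control surface method mentioned above, the rest of the names are mine):

```python
try:
    from queue import Queue   # Python 3
except ImportError:
    from Queue import Queue   # the embedded Python 2.5

pending = Queue()

def enqueue(fn):
    # Called from any thread; never touches the Live API itself.
    pending.put(fn)
    # In the real script, follow this with: self.schedule_message(1, drain_pending)

def drain_pending():
    # Scheduled via schedule_message, so it runs on Live's main thread,
    # where Live API calls are safe.
    while not pending.empty():
        pending.get()()
```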

Embedded interpreter limitations

At first, Showtime was still active in Live's configuration, but I didn't need it anymore, so I disabled it. Suddenly my script was super slow: rconsole took 4 seconds to reply, and my ZeroMQ thread took just as long to reply to anything. What happened?

As it turns out, the embedded interpreter only runs when needed, that is, when events are sent to the control surface script. The ZeroMQ thread was being executed as a side effect of Showtime scheduling events with schedule_message, which kept the interpreter busy. I didn't want to use that trick, though I'm still not sure I made the best choice in how I dealt with it.

So far, I've found that the best way to communicate with the script, without running a busy loop to keep it awake, is to write a MIDI hook and send commands via MIDI sysex messages.

Thus what I end up with is ... JSON serialized in MIDI.
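
Framing the JSON into sysex is straightforward, since ASCII-encoded JSON already fits in the 7-bit data bytes sysex requires; a sketch with function names of my own:

```python
import json

SYSEX_START, SYSEX_END = 0xF0, 0xF7

def encode_command(obj):
    # json.dumps defaults to ensure_ascii=True, so every payload byte
    # is < 128, which is exactly what sysex data bytes require.
    payload = json.dumps(obj)
    return [SYSEX_START] + [ord(c) for c in payload] + [SYSEX_END]

def decode_command(midi_bytes):
    assert midi_bytes[0] == SYSEX_START and midi_bytes[-1] == SYSEX_END
    return json.loads(''.join(chr(b) for b in midi_bytes[1:-1]))
```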

(EDIT: I have since changed the way messages are passed. An empty sysex message signals that a message is available on a ZeroMQ socket, since I can't keep ZeroMQ running in the background.)

This just works, and I'm quite happy with the low latency. Sometimes you need ridiculously convoluted workarounds, and in those cases it's absolutely critical to contain the problem: it must be a black box to all other components. The nodejs server serializes JSON coming from the web application into MIDI, reads MIDI responses, and sends them back as JSON. Needless to say, you don't want your web application to care about any of that.

On the other hand, I could still use ZeroMQ to send updates from Live because, like MIDI event hooks, parameter update listeners just wake up the interpreter and execute Python code in the main thread. You can do whatever you need from there, but make sure it's fast, because it blocks Live. Note that you can't call any setter from an event listener, so it's not the right place to respond to an external command; just serialize the event and send it through the socket.
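
A listener along those lines might look like this; a sketch, where publish stands in for whatever sends one frame on the PUB socket (the names are mine; add_value_listener is the Live API method shown in the REPL session above):

```python
import json

def make_value_listener(publish, path, parameter):
    # The callback runs on Live's main thread, so keep it cheap:
    # serialize and hand off, nothing more.
    def on_value():
        publish(json.dumps({'path': path, 'value': parameter.value}))
    return on_value

# Real wiring would look roughly like:
#   parameter.add_value_listener(make_value_listener(pub.send, path, parameter))
```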

Coming up next

This is what I end up with, from the web application:

It's just like AJAX calls, except they're sent to the server via a WebRTC DataChannel instead of HTTP. A request/response cycle for arbitrary commands takes less than 20 ms, and pure MIDI notes are even faster.

But all of this is for another article.


I spent quite some time figuring out the best way to remotely control Ableton Live, and I found out that the people who have tried struggled as well. That's why I wanted to write this article first: yes, I found a solution, it's fast, I know what kind of problems you've had and how to solve them. I don't know if anyone else has found a better solution, but as far as I know it's not available on the interwebs, so I hope this helps.

The next article will be less focused on Ableton Live: I will explain how the rest of the application was made. I will also mention what I tried and what didn't work, so my failures can at least be useful to people trying to build similar time-critical web applications.

Thanks for reading, and by the way, I'll be attending Loop in Berlin, a summit for music makers, 30 Oct - 1 Nov 2015.

1 You may wonder what this WebRTC bridge is: it's there because there are no stable WebRTC bindings. Under the network-intensive conditions this application needs, wrtc, the bindings for NodeJS, crashes about every 10 minutes. As strange as it sounds, WebRTC bindings are almost non-existent. I figured the most stable option was to make a Chrome extension that runs on the same computer as the webserver and Live. More on that in another article.
