One of the most frequent questions on /r/podcasting is “how do I record a podcast with my co-hosts/guests being far away?” Here I’ll share my experience.
First of all, we can roughly divide all remote recording options into two categories:
- Centralized recording, where all participants are recorded into a single audio track.
- Double-ender style recording, where each participant gets their own audio track, usually recorded on that participant’s own computer.
Each approach has its own advantages and disadvantages. A double-ender recording is usually of higher quality than a centralized one, for two reasons:
- Real-time VoIP audio is severely compressed[1], and this is the audio stream that’s typically captured in the centralized recording scenario. In the double-ender scenario, each participant’s audio stream is saved locally before it is compressed and transmitted over the network, so the recorded track has a much higher bitrate than the VoIP stream.
- The double-ender is not affected by network glitches. If your guest’s voice turned robotic or simply vanished, you still know that their speech is being captured by their own computer in its original form.
Another advantage of a double-ender, besides quality, is that having each participant on their own audio track gives you more control in post-production. You can apply equalization, compression, and set levels for each voice independently.
That said, double-enders have two potential drawbacks:
- Unless you use specialized services (which I discuss below), each participant needs to record their own side of the conversation. A lot can go wrong here: they may not know how to record sound on their computer, or their recording may fail, or they may simply forget to press record… and if at least one participant’s track is lost, so is the whole episode.
- The tracks recorded on different computers tend to get out of sync over time, an effect known as “audio drift”.
When I started podcasting, I leaned towards centralized recording for its simplicity and robustness. I even created my own web tool which would join a Skype group call and record the conversation. (This was in 2015, before Skype introduced its call recording feature.)
Since then, I have become a huge fan of double-enders, for the reasons described above and because we now have a few services that address the potential drawbacks I mentioned: they don’t require any extra actions from you and your guests, are reliable, and help with the audio drift problem.
The services I have experience with are Zencastr and Cast (also known as “Try Cast”). They are very similar for my purposes, except one of them works much better than the other one. I’ll give the details below, but feel free to skip straight to the conclusions. There are other offerings in this space, such as Ringr or SquadCast, but I haven’t tried them.
I used Zencastr for a year, from April 2017 to May 2018. At first I was quite happy with it, but in October 2017 I started to notice some weird clicks in Zencastr’s mp3 files.
I always do a local recording as a backup (using the method described here), and when I found the corresponding piece in my local recording, it didn’t have these artifacts.
I reported this issue to the Zencastr developers on October 20, 2017, and they replied that they would look into it.
They didn’t get back to me after that, but in February 2018 a post was published on Zencastr’s blog which alluded to these issues and claimed they were fixed. Except they weren’t.
On February 28, 2018, I wrote this to Zencastr support:
Thank you for the recent update — there are many great features there.
However, the cracking sound I reported back in October, sadly, wasn’t resolved by this update.
Here is an example of a recording I did just today (zencastr vs local, made at the same time). I used Chrome 64 (specifically, google-chrome-stable-64.0.3282.186-1.x86_64), and zencastr gave me no warnings.
I understood that these problems were incredibly hard to track down and didn’t expect any more than “we’ll look into it”. But the response really disappointed me:
It appears one of your guests was using Chrome 63 at the time of your most recent recording.
We know you can’t always control what your guests are running, but an up to date browser will likely improve things going forward. Since Zencastr is very reliant on the browser, we would recommend double checking that Chrome up to date before each recording.
So they were saying that the clicks in my track, recorded locally on my computer, were somehow induced by my guest’s older browser. It might be that I don’t fully understand how Zencastr works, but this sounded like BS to me.
I continued using Zencastr for some time, using my local recording for my track and using Zencastr only to get my guests’ tracks. It was tolerable, but the cracking sound was only getting worse with time (even when all participants were on Chrome 64), so in May 2018 I finally switched to Cast.
Also, Zencastr couldn’t fully solve the drift problem. Their February 2018 update improved things, but as the screenshot shows, the drift was still about 2 seconds per 2 hours of recording time (which is a lot).
Cast has worked much better for me than Zencastr. There are occasional artifacts with Cast, too, such as this one (which I reported in July 2018):
But this one is subtle—you’ll probably need to wear a decent pair of headphones and turn the volume up to hear the difference—and that was the only time when I noticed it.
Cast recordings also suffer less from audio drift. Though not perfect, they have about 1 second or less deviation per two hours of recording in my experience.
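To put these drift figures in perspective, here is a rough back-of-the-envelope sketch. It converts the numbers quoted in this post (about 2 seconds per 2 hours for Zencastr after the February 2018 update, about 1 second per 2 hours for Cast) into parts per million, and computes the factor by which a drifted track would have to be scaled to line up again. The function names are hypothetical, and the code assumes that drift accumulates linearly over the recording, which may not hold in practice.

```python
def drift_ppm(drift_seconds: float, duration_seconds: float) -> float:
    """Relative drift in parts per million (assumes linear drift)."""
    return drift_seconds / duration_seconds * 1_000_000


def stretch_factor(drift_seconds: float, duration_seconds: float) -> float:
    """Factor to multiply a track's length by so that it matches the reference.

    Assumes the track came out `drift_seconds` LONGER than the reference,
    whose length is `duration_seconds`. Pass a negative drift if it is shorter.
    """
    return duration_seconds / (duration_seconds + drift_seconds)


TWO_HOURS = 2 * 60 * 60  # seconds

# Drift figures taken from the text; everything else is illustrative.
for service, drift in [("Zencastr", 2.0), ("Cast", 1.0)]:
    ppm = drift_ppm(drift, TWO_HOURS)
    factor = stretch_factor(drift, TWO_HOURS)
    print(f"{service}: {ppm:.0f} ppm, stretch factor {factor:.6f}")
```

A factor like this can be fed to the change-speed or time-stretch effect of an audio editor, although measuring the actual drift at the end of each recording is more reliable than assuming a fixed value.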
On the other hand, some things were better in Zencastr. For instance, in Zencastr when you create a recording session, it remains accessible indefinitely for anyone with the link. So I could create a recording link a month in advance and send it to my guests. Not only was that convenient, but my guests could use the link to test that Zencastr works in their browser and that their microphone is detected. In Cast, session links are only valid until the host leaves the session, so this workflow no longer works. They have a feature to schedule a session, where they’d send the recording link to the guests 30 minutes before the start, but it’s not very convenient and doesn’t allow the guest to test their browser and microphone in advance.
I should also mention that there was one time, in April 2017, when my guest didn’t manage to get Cast working with her browser/OS, but Zencastr worked. (That’s when I started using Zencastr.) But since I switched back to Cast in May 2018, Cast has always worked for my guests.
(Added on 2018-11-03.)
Shortly after I published this article, Mark Hills from Cleanfeed got in touch with me and recommended that I give it a try. Cleanfeed is not a double-ender service per se; instead, it’s a web-based centralized recording service. From what I understand, recording and mixing happen in the host’s browser, not on their server.
I only played with Cleanfeed for an hour or two, so take this with a grain of salt.
Mark wrote to me:
Though I see you were looking at double-ender, I would be grateful if you’d give it a try, at least just to experience VoIP that is as good as local audio.
So I did, and I have to say that Cleanfeed is to my ear indistinguishable from local audio. Take a listen yourself to these audio samples. One of them is local, the other one is from Cleanfeed.
If you hear a difference, try to guess which one is local and which one is remote, then hover/tap below to reveal the correct answer:
To conduct the above experiment, I opened two Chrome windows, emulating a “host” and a “guest”. To make sure Cleanfeed doesn’t cheat by going through the high-bandwidth and low-latency loopback network, my guest Chrome window was configured to go through an http proxy in Texas, while the host window would use my normal Internet connection in Ukraine.
To achieve this audio quality, Cleanfeed uses the awesome Opus audio codec with bitrates averaging to 72kbps for mono audio and 172kbps for stereo music. If 72kbps seems low to you, it shouldn’t; it’s an excellent bitrate for speech even for mp3, let alone Opus.
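For a sense of scale, the sketch below compares the 72 kbps mono figure with uncompressed audio in terms of data per hour. The 44.1 kHz, 16-bit PCM baseline is an assumption chosen for illustration (it is not a number from Cleanfeed), and the function name is hypothetical. The calculation only shows how much data is involved; it says nothing about perceived quality.

```python
def megabytes_per_hour(bitrate_kbps: float) -> float:
    """Data produced in one hour at the given bitrate (1 kbps = 1000 bit/s)."""
    bits = bitrate_kbps * 1000 * 3600
    return bits / 8 / 1_000_000


OPUS_MONO_KBPS = 72.0  # average mono bitrate quoted in the text

# Assumption: uncompressed mono PCM at 44.1 kHz, 16 bits per sample.
PCM_MONO_KBPS = 44_100 * 16 / 1000

opus_mb = megabytes_per_hour(OPUS_MONO_KBPS)
pcm_mb = megabytes_per_hour(PCM_MONO_KBPS)

print(f"Opus at 72 kbps:   {opus_mb:.1f} MB per hour")
print(f"PCM 44.1 kHz/16:   {pcm_mb:.1f} MB per hour")
print(f"Ratio:             {pcm_mb / opus_mb:.1f}x")
```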
Thus Cleanfeed gives you great audio quality without audio drift. So what’s the catch? Here are a few limitations of Cleanfeed that I’ve discovered during my testing.
- With Cleanfeed you can only have two separate audio tracks, which will become two stereo channels in the wav file. To enable this, choose “Separate tracks” when starting the recording. This works if you have a single guest or co-host, but otherwise you’ll get an already mixed recording. For comparison, Cast allows up to 3 guests and Zencastr allows up to 2 guests on their free plan and unlimited guests on the $20/mo plan (all of which get their separate tracks).
- Unlike Zencastr and Cast, Cleanfeed does not save the recording on the server, nor does it have an option to retrieve the recording from the browser’s local storage. You have to download your track before you close the window; otherwise it’s lost. It’s very scary to me how easy it is to lose a recording by accidentally closing the Chrome window or by having your laptop run out of battery. On the other hand, with Cleanfeed you can download the recording up to this moment without stopping the recording session, a feature I haven’t seen in any other services.
- Cleanfeed tolerates packet loss of up to 30% surprisingly well. But if the packet loss reaches a certain level or the network connection drops for some time, the guests will disconnect and won’t be able to connect to this session again. In that case you’ll need to start a new session (and don’t forget to download your audio track before doing that). None of the services handle network disconnects particularly well, but with Cast and Zencastr guests can usually reconnect to the current session and what they were saying during the disconnect is still saved.
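Regarding the “Separate tracks” option mentioned above: since the two voices end up as the left and right channels of one wav file, you may want to split them into two mono files before editing. Most audio editors can do this; the sketch below does the same with Python’s standard `wave` module. It assumes the file is plain PCM (the only kind `wave` reads), and the function and file names are hypothetical. I have not checked which channel Cleanfeed assigns to the host and which to the guest, so listen to the results to find out.

```python
import wave

CHUNK_FRAMES = 65536  # frames read per iteration, to keep memory use low


def split_stereo_wav(src_path: str, left_path: str, right_path: str) -> None:
    """Write the two channels of a stereo PCM wav file to two mono wav files."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a 2-channel wav file")
        width = src.getsampwidth()   # bytes per sample
        frame_size = 2 * width       # one frame = left sample + right sample

        with wave.open(left_path, "wb") as left, \
             wave.open(right_path, "wb") as right:
            for out in (left, right):
                out.setnchannels(1)
                out.setsampwidth(width)
                out.setframerate(src.getframerate())

            while True:
                chunk = src.readframes(CHUNK_FRAMES)
                if not chunk:
                    break
                n = len(chunk) // frame_size
                chunk = chunk[:n * frame_size]  # drop a trailing partial frame
                l_buf = bytearray(n * width)
                r_buf = bytearray(n * width)
                # De-interleave one byte position at a time using strided slices.
                for b in range(width):
                    l_buf[b::width] = chunk[b::frame_size]
                    r_buf[b::width] = chunk[width + b::frame_size]
                left.writeframes(bytes(l_buf))
                right.writeframes(bytes(r_buf))


# Example call (hypothetical file names):
# split_stereo_wav("cleanfeed.wav", "channel_left.wav", "channel_right.wav")
```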
At this point I would definitely not recommend Zencastr because of its cracking/clicking issues. Cast, on the other hand, proved to be reliable, and I will continue to use and recommend it. Still, with any service, you should always do a local backup recording in case your service fails. (If you use Linux, see my audio recording guide.)
As for Ringr and SquadCast, they are pricier than Cast ($19 and $20 vs $10 per month, respectively, for the plans that allow split-track recording), but if I ever run into problems with Cast, they will be next in queue to try out.
If the audio drift drives you crazy, or if you are currently using Skype or Hangouts to record your podcast, give Cleanfeed a try. Subjectively, Cleanfeed offers much better quality than those, although I haven’t rigorously compared them head to head. But don’t forget to save your track and always have a backup.
In the coming years, I hope to see the relevant browser APIs standardized and made interoperable (most of these services are Chrome-only) and an open source version of Cast/Zencastr emerge that I could host on my own server for free for as long as I like.
[1] In the sense of “lossy data compression”, not “dynamic range compression”.