How Remoto Playback Handles Multichannel Audio Fold Down

Written by Sara Griffith

Updated at May 8th, 2026

Table of Contents

  • How Remoto Playback Handles Multichannel Audio Fold Down
  • At a Glance
  • Overview
  • What Is a Fold Down?
  • Supported Audio Input Modes
  • Fold Down Coefficients
  • 5.1 to Stereo (RFC 7845, Figure 7)
  • 7.1 to Stereo (RFC 7845, Figure 9)
  • Discrete 16-Channel Streams
  • How Each Client Handles the Fold Down
  • Web access (Browser)
  • Desktop Application (macOS / Windows)
  • iOS Application (iPhone / iPad)
  • Apple TV Application (tvOS)
  • Platform Comparison Matrix
  • What Will I Hear Differently?
  • Production Guidance
  • What to Expect as a Viewer
  • What to Expect as a Host / Organizer
  • Recommendations for Critical Listening
  • Good to Know
  • Frequently Asked Questions
  • Glossary

At a Glance

If you are joining a Remoto Playback session and the host is streaming surround sound (5.1 or 7.1), what you hear depends on your output device. If your device outputs stereo (laptop speakers, headphones, iPhone, or the web browser), you will automatically hear a stereo version of the mix. Dialogue, music, and effects are all preserved. Surrounds are blended into Left and Right, and the LFE (subwoofer) channel is mixed in at a reduced level. No setup is required on your end. If you have a surround-capable output (Apple TV with a receiver, or Desktop with a multichannel audio interface), you will hear the full discrete surround mix.

For 16-channel sessions, the Desktop app provides full discrete channel routing. iOS and Apple TV allow you to select a stereo pair to listen to. 16-channel sessions are not available in the web browser — browser users are guided to join from a native app (Desktop, iOS, or Apple TV).


Overview

When a Remoto Playback session is configured to stream multichannel audio (5.1, 7.1, or up to 16 discrete channels), participants who are monitoring on a stereo device, or who explicitly select stereo output, will hear a fold down (also called a downmix) of the multichannel stream into two channels (Left and Right).

This article explains:

  • What a fold down is and why it happens
  • How Remoto performs the fold down on each client platform
  • The exact fold down coefficients used
  • What production teams should keep in mind

What Is a Fold Down?

A fold down is the process of combining a multichannel audio signal (for example, 5.1 surround with six discrete channels) into a signal with fewer channels (typically stereo). The goal is to preserve the spatial intent of the original mix as faithfully as possible, while ensuring all content (dialogue, music, effects, surrounds) remains audible in the reduced channel count.

Remoto uses the downmix matrices defined in RFC 7845 (Ogg Encapsulation for the Opus Audio Codec, Section 5.1.1.5), which are designed to preserve perceived spatial intensity while limiting the risk of clipping through global normalization.


Supported Audio Input Modes

The Remoto host (Desktop application) can configure a session with any of the following audio input modes. The input mode defines the channel layout and ordering, and this is critical because the fold down matrix must know which channel is which.

Input Mode | Channels | Channel Order
Stereo | 2 | L, R
5.1 Film | 6 | L, C, R, Ls, Rs, LFE
5.1 SMPTE | 6 | L, R, C, LFE, Ls, Rs
7.1 Film | 8 | L, C, R, Ls, Rs, Lrs, Rrs, LFE
7.1 SMPTE | 8 | L, R, C, LFE, Ls, Rs, Lrs, Rrs
16 Discrete | 16 | Discrete (no spatial relationship assumed)

Important: The distinction between Film and SMPTE orderings affects where the Center, LFE, and Surround channels sit in the stream. The fold down engine must respect this ordering to avoid misassigning channels (for example, treating LFE as a Center channel).
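
The reordering concern in the note above can be sketched as a label-based permutation. This is a minimal illustration, not Remoto's implementation: the `INPUT_ORDERS` table copies the channel orders listed above, while the `reorder` function and its signature are hypothetical names introduced here.

```python
# Channel orders copied from the Supported Audio Input Modes table.
INPUT_ORDERS = {
    "5.1 Film": ["L", "C", "R", "Ls", "Rs", "LFE"],
    "5.1 SMPTE": ["L", "R", "C", "LFE", "Ls", "Rs"],
    "7.1 Film": ["L", "C", "R", "Ls", "Rs", "Lrs", "Rrs", "LFE"],
    "7.1 SMPTE": ["L", "R", "C", "LFE", "Ls", "Rs", "Lrs", "Rrs"],
}


def reorder(frame, src_mode, dst_mode):
    """Rearrange one frame of samples from src_mode order to dst_mode order.

    Hypothetical helper: it looks channels up by label, so LFE can never be
    mistaken for Center just because the two sit at different positions.
    """
    src = INPUT_ORDERS[src_mode]
    dst = INPUT_ORDERS[dst_mode]
    if sorted(src) != sorted(dst):
        raise ValueError("modes do not share the same channel set")
    if len(frame) != len(src):
        raise ValueError("frame length does not match the source mode")
    position = {label: i for i, label in enumerate(src)}
    return [frame[position[label]] for label in dst]


# One sample per channel, in SMPTE order: L, R, C, LFE, Ls, Rs.
smpte_frame = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
film_frame = reorder(smpte_frame, "5.1 SMPTE", "5.1 Film")
print(film_frame)  # [0.1, 0.3, 0.2, 0.5, 0.6, 0.4]
```

Reading position 3 of the SMPTE frame as if it were Film order would treat the LFE sample as Ls, which is the kind of misassignment the ordering metadata is there to prevent.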


Fold Down Coefficients

Remoto uses the stereo downmix matrices specified in RFC 7845 (Section 5.1.1.5), which is the standard for Opus multichannel audio. These matrices are designed so that:

  • Front-left and front-right channels are passed through directly (at normalized gain).
  • Surround channels are cross-mixed into both left and right outputs, weighted more heavily toward their own side. Before normalization, the two surround weights (√3/2 and 1/2) have squares that sum to 1, which preserves perceived intensity.
  • The LFE channel is included in the downmix at a reduced level (same weight as the Center channel).
  • All coefficients are globally normalized so each row sums to 2, as a compromise between preventing clipping and preserving dynamic range.

5.1 to Stereo (RFC 7845, Figure 7)

Channel order in Opus mapping family 1 for 5.1: FL, FC, FR, RL, RR, LFE

Source Channel | Left Output Coefficient | Right Output Coefficient | Relative Gain (dB, Left / Right)
FL (Front Left) | 0.529067 | 0.0 | -5.5 dB
FC (Front Center) | 0.374107 | 0.374107 | -8.5 dB
FR (Front Right) | 0.0 | 0.529067 | -5.5 dB
RL (Rear Left) | 0.458186 | 0.264534 | -6.8 / -11.6 dB
RR (Rear Right) | 0.264534 | 0.458186 | -11.6 / -6.8 dB
LFE | 0.374107 | 0.374107 | -8.5 dB
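
The Relative Gain column appears to be the usual amplitude-to-decibel conversion, 20 × log10(coefficient). The sketch below reproduces a few of the table values under that assumption; `coefficient_to_db` is a name introduced here for illustration only.

```python
import math


def coefficient_to_db(coefficient):
    """Convert a linear gain coefficient to decibels (20 * log10)."""
    if coefficient <= 0:
        return float("-inf")  # a zero coefficient means the channel is absent
    return 20 * math.log10(coefficient)


for name, coeff in [
    ("FL to Left", 0.529067),
    ("FC to Left", 0.374107),
    ("RL to Left", 0.458186),
]:
    print(f"{name}: {coefficient_to_db(coeff):.1f} dB")
```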

Resulting formula:

L_out = 0.529067×FL + 0.374107×FC + 0.458186×RL + 0.264534×RR + 0.374107×LFE
R_out = 0.529067×FR + 0.374107×FC + 0.264534×RL + 0.458186×RR + 0.374107×LFE
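
As a minimal sketch, the two formulas above translate directly into a per-sample function. The function name `downmix_51` and its argument order (Opus mapping family 1) are illustrative choices, not part of any Remoto API.

```python
def downmix_51(fl, fc, fr, rl, rr, lfe):
    """Fold one 5.1 sample frame (Opus family 1 order) down to (left, right)."""
    left = (0.529067 * fl + 0.374107 * fc + 0.458186 * rl
            + 0.264534 * rr + 0.374107 * lfe)
    right = (0.529067 * fr + 0.374107 * fc + 0.264534 * rl
             + 0.458186 * rr + 0.374107 * lfe)
    return left, right


# Center-only content (typical dialogue) lands equally in both outputs.
print(downmix_51(0.0, 1.0, 0.0, 0.0, 0.0, 0.0))  # (0.374107, 0.374107)
# Rear-left content leans left but is also present on the right.
print(downmix_51(0.0, 0.0, 0.0, 1.0, 0.0, 0.0))  # (0.458186, 0.264534)
```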

Exact unnormalized coefficient values are 1, 1/√2, √3/2, and 1/2, multiplied by 2/(1 + 1/√2 + √3/2 + 1/2 + 1/√2) for normalization.
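
The normalization stated above can be checked numerically. The following sketch rebuilds the 5.1 coefficients from the exact values and confirms that a row sums to 2; the variable names are chosen here for readability and do not come from the RFC.

```python
import math

# Unnormalized weights as stated above.
FRONT = 1.0
CENTER = 1 / math.sqrt(2)
SURROUND_SAME_SIDE = math.sqrt(3) / 2
SURROUND_OPPOSITE_SIDE = 0.5
LFE = 1 / math.sqrt(2)

# One output row: front, center, same-side surround, opposite-side surround, LFE.
row = [FRONT, CENTER, SURROUND_SAME_SIDE, SURROUND_OPPOSITE_SIDE, LFE]
scale = 2 / sum(row)
normalized = [weight * scale for weight in row]

print([round(c, 6) for c in normalized])
print(round(sum(normalized), 6))  # 2.0
```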

Note on LFE: Unlike the traditional ITU-R BS.775 broadcast standard (which omits LFE), the Opus specification includes the LFE in the stereo downmix at the same coefficient as the Center channel. The global normalization reduces the risk of overload. This means deep bass content carried exclusively in the LFE channel will still be audible in the stereo fold down, though at a reduced level.

Note on surround cross-mixing: Each surround channel is mixed into both stereo outputs — more heavily to its own side (√3/2 relative weight) and less to the opposite side (1/2 relative weight). This preserves perceived spatial intensity better than routing each surround exclusively to one side.

7.1 to Stereo (RFC 7845, Figure 9)

Channel order in Opus mapping family 1 for 7.1: FL, FC, FR, SL, SR, RL, RR, LFE

Source Channel | Left Output Coefficient | Right Output Coefficient | Relative Gain (dB, Left / Right)
FL (Front Left) | 0.388631 | 0.0 | -8.2 dB
FC (Front Center) | 0.274804 | 0.274804 | -11.2 dB
FR (Front Right) | 0.0 | 0.388631 | -8.2 dB
SL (Side Left) | 0.336565 | 0.194316 | -9.5 / -14.2 dB
SR (Side Right) | 0.194316 | 0.336565 | -14.2 / -9.5 dB
RL (Rear Left) | 0.336565 | 0.194316 | -9.5 / -14.2 dB
RR (Rear Right) | 0.194316 | 0.336565 | -14.2 / -9.5 dB
LFE | 0.274804 | 0.274804 | -11.2 dB

Resulting formula:

L_out = 0.388631×FL + 0.274804×FC + 0.336565×SL + 0.194316×SR + 0.336565×RL + 0.194316×RR + 0.274804×LFE
R_out = 0.388631×FR + 0.274804×FC + 0.194316×SL + 0.336565×SR + 0.194316×RL + 0.336565×RR + 0.274804×LFE

Exact unnormalized coefficient values are 1, 1/√2, √3/2, and 1/2, multiplied by 2/(2 + 2/√2 + √3) for normalization.
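
For 7.1, the same idea can be written as two coefficient rows applied to an eight-sample frame. This is a sketch under the channel order and exact values given above; `downmix_71`, `LEFT_ROW`, and `RIGHT_ROW` are hypothetical names, and the code is not taken from any Remoto client.

```python
import math

# 7.1 channel order in Opus mapping family 1: FL, FC, FR, SL, SR, RL, RR, LFE.
s = 2 / (2 + 2 / math.sqrt(2) + math.sqrt(3))  # normalization factor quoted above
c = s / math.sqrt(2)         # Center and LFE weight
near = s * math.sqrt(3) / 2  # surround weight toward its own side
far = s / 2                  # surround weight toward the opposite side

LEFT_ROW = [s, c, 0.0, near, far, near, far, c]
RIGHT_ROW = [0.0, c, s, far, near, far, near, c]


def downmix_71(frame):
    """Fold one 8-sample 7.1 frame down to a (left, right) pair."""
    if len(frame) != 8:
        raise ValueError("expected 8 samples in family 1 order")
    left = sum(w * x for w, x in zip(LEFT_ROW, frame))
    right = sum(w * x for w, x in zip(RIGHT_ROW, frame))
    return left, right


# Side-left content only: stronger on the left, still present on the right.
print(downmix_71([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]))
```

Keeping the matrix as two rows makes the "each row sums to 2" property easy to verify with `sum(LEFT_ROW)`.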

Discrete 16-Channel Streams

When the session is configured as 16 Discrete (16 channels with no defined spatial relationship), a standard surround fold down matrix does not apply. Each client handles this mode differently:

  • Web access (Browser): 16-channel sessions are not available in the web browser. Users attempting to join from a browser are guided to use a native app (Desktop, iOS, or Apple TV).
  • iOS (iPhone / iPad): iOS receives all 16 channels via SRT and decodes locally. The guest is presented with a stereo pair selector (audio drawer with left/right arrows) to choose which channel pair to listen to. Defaults to channels 1+2.
  • Apple TV (tvOS): Apple TV receives all 16 channels via SRT and decodes locally. The guest is presented with a stereo pair selector (defaults to channels 1+2). If a surround-capable receiver or soundbar is connected, surround passthrough is available for applicable formats.
  • Desktop App: The only client that supports full 16-channel discrete routing. Can route individual discrete channels to specific output pairs. When stereo output is selected, the behavior depends on the host's routing configuration. If no custom routing is defined, the first two channels are passed through as L/R.
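
The stereo pair selector described for iOS and Apple TV can be modeled roughly as follows. Everything here is a hypothetical sketch: the class name, the assumption that pairs are adjacent and non-overlapping (1+2, 3+4, and so on), and the choice to stop at the first and last pair rather than wrap around are illustrative and not confirmed by the article. Only the default of channels 1+2 comes from the text above.

```python
class StereoPairSelector:
    """Hypothetical model of a stereo pair selector for a 16-channel stream."""

    def __init__(self, channel_count=16):
        if channel_count < 2 or channel_count % 2:
            raise ValueError("channel_count must be a positive even number")
        self.pair_count = channel_count // 2
        self.index = 0  # pair 0 is channels 1+2, the documented default

    def channels(self):
        """Return the selected pair as 1-based channel numbers."""
        first = 2 * self.index + 1
        return first, first + 1

    def next(self):
        """Move one pair to the right (assumed to stop at the last pair)."""
        self.index = min(self.index + 1, self.pair_count - 1)
        return self.channels()

    def previous(self):
        """Move one pair to the left (assumed to stop at the first pair)."""
        self.index = max(self.index - 1, 0)
        return self.channels()

    def extract(self, frame):
        """Pick the selected (left, right) samples out of one multichannel frame."""
        left, right = self.channels()
        return frame[left - 1], frame[right - 1]


selector = StereoPairSelector()
print(selector.channels())  # (1, 2)
print(selector.next())      # (3, 4)
```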

How Each Client Handles the Fold Down

Web access (Browser)

Aspect | Detail
Fold down location | Client-side (browser, Web Audio API)
When it happens | At audio output, after the browser decodes the multichannel Opus stream
Mechanism | The browser receives the full multichannel Opus stream via WebRTC. The local libopus decoder applies the RFC 7845 downmix to produce stereo output for the device's speakers or headphones.
Codec | Opus (compressed) multichannel via WebRTC, decoded locally
User control | None. Web access always outputs stereo. Volume and mute controls are available, but output channel configuration is not.
Audio processing chain | WebRTC MediaStream (multichannel Opus) → libopus decode + RFC 7845 downmix → AudioProcessor (Web Audio API: source → delay → gain → mute → destination)
16-channel support | Not available in the web browser. For 16-channel sessions, join using a native app (Desktop, iOS, or Apple TV).

Desktop Application (macOS / Windows)

Aspect | Detail
Fold down location | Client-side (local audio engine within the Desktop app)
When it happens | When the user selects a stereo output device or stereo output mode while a multichannel session is active
Mechanism | The Desktop app receives the full multichannel stream (up to 16 channels) via SRT. Its native C++ audio engine applies the RFC 7845 downmix matrix locally, using the session's AudioInputMode to determine the correct channel ordering (Film or SMPTE).
Codec | Opus over SRT for streaming; PCM internally for mixing
User control | Full. The user can select output mode (stereo, 5.1, 7.1, discrete), choose audio output devices, and adjust per-channel routing. The fold down is only applied when the output mode has fewer channels than the input.
Quality | Highest fidelity fold down, as the app has access to discrete channels before mixing

iOS Application (iPhone / iPad)

Aspect | Detail
Fold down location | Client-side (Opus → PCM → Obj-C++ downmix)
When it happens | Automatically when the device outputs to stereo speakers or headphones
Mechanism | The iOS app receives the full multichannel SRT stream (stereo, 5.1, 7.1, or 16 discrete channels). After Opus decode, the downmix is performed in PCM space in Obj-C++ on the device.
Codec | Opus over SRT, decoded locally to PCM, downmixed in Obj-C++
User control | For stereo, 5.1, and 7.1 sessions: none (automatic stereo downmix). For 16-channel sessions: the guest can select a stereo pair to listen to via an audio drawer (left/right arrows to navigate pairs, defaults to channels 1+2).
Spatial Audio note | When using AirPods Pro or AirPods Max with Spatial Audio enabled, iOS may apply head-tracked spatialization to the signal. This is an Apple system-level feature and is independent of Remoto's stream delivery.
16-channel support | Yes. iOS receives all 16 channels via SRT. The guest selects a stereo pair to listen to (defaults to channels 1+2). For full discrete channel routing, use the Desktop app.

Apple TV Application (tvOS)

Aspect | Detail
Fold down location | Client-side (Opus → PCM → Obj-C++ downmix)
When it happens | Automatically based on the audio output configuration of the Apple TV
Mechanism | The Apple TV app receives the full multichannel SRT stream (stereo, 5.1, 7.1, or 16 discrete channels). After Opus decode, the downmix is performed in PCM space in Obj-C++. Apple TV detects the output device's capabilities and outputs in the format the attached hardware supports (stereo, 5.1, or 7.1). If connected to a surround receiver or soundbar, it passes through the full multichannel audio.
Codec | Opus over SRT, decoded locally to PCM, downmixed in Obj-C++
User control | Users can configure their Apple TV's audio output format in tvOS Settings (Stereo, Dolby Digital 5.1, Dolby Atmos). The Remoto stream is delivered in full multichannel, so the Apple TV will output surround if the connected hardware supports it. For 16-channel sessions, the guest can select a stereo pair to listen to (defaults to channels 1+2).
16-channel support | Yes. Apple TV receives all 16 channels via SRT. The guest selects a stereo pair to listen to (defaults to channels 1+2). If a surround receiver or soundbar is connected, surround passthrough is available. For full discrete channel routing, use the Desktop app.

Platform Comparison Matrix

Aspect | Desktop App | Web access | iOS App | Apple TV App
Transport | SRT (end-to-end) | WebRTC (full multichannel) | SRT (end-to-end) | SRT (end-to-end)
Receives multichannel? | Yes (up to 16ch) | Yes (up to 7.1) | Yes (up to 16ch) | Yes (up to 16ch)
Fold down location | Client-side | Client-side (browser) | Client-side (Obj-C++) | Client-side (Obj-C++)
Fold down engine | Native C++ audio engine | libopus / Web Audio API | Opus → PCM → Obj-C++ downmix | Opus → PCM → Obj-C++ downmix
Fold down standard | RFC 7845 (Opus) | RFC 7845 (Opus) | Custom Obj-C++ downmix (PCM space) | Custom Obj-C++ downmix (PCM space)
LFE handling | Included at reduced level | Included at reduced level | Included in downmix | Included in downmix
User can select output mode | Yes | No | No (stereo only); 16ch: stereo pair selector | Via tvOS audio settings; 16ch: stereo pair selector
Channel ordering aware | Yes (Film/SMPTE) | Yes (via Opus channel mapping) | Yes (via SRT stream metadata) | Yes (via SRT stream metadata)
16-channel support | Yes (full discrete routing) | No (guided to use native app) | Yes (stereo pair selector, defaults to 1+2) | Yes (stereo pair selector, defaults to 1+2; surround passthrough if hardware supports it)
Adjustable fold down coefficients | No | No | No | No

What Will I Hear Differently?

If you are used to hearing content in surround and are now listening through a stereo fold down (because your output device is stereo, or you are on Web access), here is what changes perceptually:

What you're listening for | What happens in the fold down
Dialogue | Sounds the same or very similar. The Center channel (where dialogue typically lives) is mixed equally into Left and Right at a reduced level (-8.5 dB for 5.1). Dialogue remains clear and centered in the stereo image.
Music and effects in the front L/R channels | Passed through at the highest coefficient in the matrix (-5.5 dB for 5.1). These remain the dominant elements in the stereo image.
Surround effects (ambience, rear sound design, crowd noise) | Still audible, but no longer "behind" you. They are cross-mixed into both Left and Right channels (more strongly to their own side), so they become part of the front soundstage rather than coming from behind.
Subwoofer / LFE (deep bass rumbles, explosions) | Still audible at a reduced level (-8.5 dB for 5.1, same weight as Center). The LFE is included in the Opus downmix, so deep bass content carried in the LFE channel will still be present in stereo, though quieter than on a full-range surround system.
Overall loudness | The global normalization in the Opus downmix (each row sums to 2) is a compromise between limiting clipping and preserving dynamic range. Overall level may differ slightly from the surround version.
Spatial sense | Reduced. Surround sound creates a 360-degree experience. The stereo fold down preserves Left-Right separation but collapses front-back depth into the front stereo field.

In short: You will hear all the important content (dialogue, music, effects, and bass) clearly. The main difference is that surround envelopment is reduced to a front-facing stereo image.
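
To make the loudness point concrete, the sketch below estimates the worst-case output peak for the 5.1 left row, assuming every source channel hits full scale with the same polarity at the same instant. That scenario is a deliberately pessimistic assumption, and the helper names are introduced here for illustration.

```python
import math

# 5.1 left-output row from the coefficient table: FL, FC, FR, RL, RR, LFE.
LEFT_ROW_51 = [0.529067, 0.374107, 0.0, 0.458186, 0.264534, 0.374107]


def worst_case_peak(row, full_scale=1.0):
    """Largest output magnitude if every channel is at full scale and in phase."""
    return sum(abs(weight) for weight in row) * full_scale


def worst_case_peak_db(row):
    """The same worst case, expressed in dB relative to full scale."""
    return 20 * math.log10(worst_case_peak(row))


print(round(worst_case_peak(LEFT_ROW_51), 3))     # 2.0
print(round(worst_case_peak_db(LEFT_ROW_51), 1))  # 6.0
```

Real mixes rarely have identical full-scale content in every channel, so typical peaks sit well below this bound; the figure only shows why a row sum of 2 is a compromise rather than a guarantee against clipping.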


Production Guidance

What to Expect as a Viewer

  • If you are joining a session and your output device is stereo (laptop speakers, headphones, etc.), you will automatically hear a stereo fold down. No action is required.
  • The fold down is a reference-quality stereo preview. All dialogue (Center channel) will be clearly audible, mixed equally into both Left and Right. Surround content is cross-mixed into both outputs.
  • The LFE channel is included in the stereo fold down at a reduced level. Deep bass content will still be audible, though quieter than on a dedicated subwoofer system.

What to Expect as a Host / Organizer

  • The fold down is automatic and transparent. You do not need to configure it.
  • Desktop, iOS, and Apple TV all receive the full multichannel stream via SRT. If you or your guests select stereo output (or are using stereo hardware), the fold down is applied locally on each device.
  • The fold down does not alter the source stream. All clients receive the full multichannel stream (via SRT or WebRTC). The downmix is always performed locally on each device.
  • 16-channel sessions: All native apps (Desktop, iOS, Apple TV) can join. iOS and Apple TV guests select a stereo pair to listen to. Web browser guests are not able to join and will be guided to use a native app.

Recommendations for Critical Listening

  • For critical stereo evaluation of multichannel content, perform the fold down in your DAW or NLE before streaming, where you have full control over coefficients, limiter settings, and monitoring levels.
  • Use the Desktop app with a calibrated stereo output for the most faithful fold down during a live session.
  • All fold downs happen locally on the client device. The Desktop app decodes to PCM before downmixing via native C++ (highest fidelity). The web browser downmixes within the Opus decode chain (libopus). iOS and Apple TV decode Opus to PCM, then downmix in Obj-C++.

Good to Know

Behavior | Detail
Fixed coefficients per Opus standard | The fold down matrix follows the RFC 7845 Opus specification and is applied automatically across all clients.
LFE is included at reduced level | The .1 channel is mixed into stereo at the same coefficient as the Center channel, not at full subwoofer level. LFE-only content will be quieter than on a dedicated subwoofer system.
16-channel discrete routing is Desktop-only | Only the Desktop app supports routing individual channels to specific output pairs. iOS and Apple TV can join 16-channel sessions and select a stereo pair to listen to. Web browser users are guided to use a native app.
Web access outputs stereo only | The web client receives the full multichannel stream (up to 7.1) but always outputs stereo via the browser's libopus downmix. 16-channel sessions are not available in the browser.
Codec differences | Desktop decodes to PCM before downmixing via native C++ (highest fidelity). Web downmixes within the Opus decode chain (libopus). iOS and Apple TV decode Opus to PCM, then downmix in Obj-C++.

Frequently Asked Questions

Q: I'm joining a session from my browser. Will I hear the surround mix?
Your browser receives the full multichannel stream (up to 7.1), but always outputs stereo. The libopus decoder in the browser applies the RFC 7845 downmix locally. All content is still audible (dialogue, music, effects) but spatial surround information is reduced to a stereo Left/Right image. This happens automatically and requires no setup. Note: 16-channel sessions are not available in the web browser.

Q: Why does the mix sound slightly different on my laptop compared to the screening room?
In a screening room with a surround speaker system, you hear discrete channels placed around you (front, sides, rear, subwoofer). On your laptop, all of those channels are folded into two speakers. The center channel (dialogue) is blended into both left and right, surround elements are cross-mixed to the front, and the LFE is included at a reduced level. The artistic intent is preserved, but the immersive spatial experience is inherently limited by stereo speakers.

Q: Can I hear the full surround mix on my iPhone or Apple TV?
The iOS and Apple TV apps receive the same full multichannel SRT stream as the Desktop app. On Apple TV connected to a surround receiver or soundbar, you will hear the full 5.1 or 7.1 mix. On iPhone with stereo speakers or headphones, the app will automatically downmix to stereo. For 16-channel sessions, iOS and Apple TV receive all 16 channels and allow you to select a stereo pair to listen to (defaults to channels 1+2). For full discrete 16-channel routing to individual output pairs, use the Desktop app. 16-channel sessions are not available in the web browser.

Q: Will I miss any dialogue or important audio?
No. All channels are included in the stereo downmix. Dialogue is typically carried on the Center channel, which is mixed into both Left and Right. Music and effects in the front Left/Right channels pass through at the highest gain in the matrix. Surround effects are still audible, cross-mixed to both outputs. The LFE (subwoofer) channel is also included at a reduced level, so deep bass content remains present.

Q: Is the subwoofer / LFE channel included in the stereo fold down?
Yes. Unlike the traditional broadcast standard (ITU-R BS.775), which omits LFE, the Opus downmix (RFC 7845) includes the LFE at a reduced level, using the same coefficient as the Center channel. The global normalization of the matrix reduces the risk of overload. This means bass-only content in the LFE will be present in stereo, though quieter than on a dedicated subwoofer system.

Q: The stereo fold down sounds louder than the surround version. Is that normal?
It can be. Each row of the downmix matrix sums to 2 rather than 1, so content that is strongly correlated across several channels can sum to a higher level in stereo than it has in any single source channel. This is expected behavior. If the increase is distracting, reduce your output volume. The relative balance between dialogue, music, and effects remains correct.

Q: Can I adjust the fold down settings (for example, make surrounds louder or quieter)?
No. The fold down uses the fixed coefficients defined in RFC 7845, the specification Remoto follows for Opus multichannel audio, and they cannot be adjusted on any client. For critical work requiring custom fold down settings, we recommend performing the fold down in your DAW or NLE before streaming.

Q: Does the fold down affect the source stream or what other participants hear?
No. The fold down is applied independently per client. All clients receive the full multichannel stream — the fold down only happens locally on each device if the output is stereo. The original multichannel stream is never altered. A guest on a Desktop or Apple TV with a 5.1 system will hear full surround, while a web guest or someone on headphones will hear stereo, simultaneously, in the same session.

Q: I'm connected to a soundbar / HomePod / AirPods. What will I hear?
All clients receive the full multichannel stream. What you hear depends on your output device: a surround-capable soundbar or receiver (on Apple TV or Desktop) will output the full 5.1/7.1 mix; stereo speakers, AirPods, or a web browser will output a stereo downmix performed locally. Apple's Spatial Audio (on AirPods Pro/Max) may apply head-tracked spatialization for a wider feel — this is an Apple system feature independent of Remoto.

Q: Can I join a 16-channel session from my iPhone or Apple TV?
Yes. iOS and Apple TV receive all 16 channels via SRT. Since there is no standard downmix matrix for 16 unrelated discrete channels, these apps present a stereo pair selector that lets you choose which channel pair to listen to (defaults to channels 1+2). Use the left/right arrows in the audio drawer to navigate between pairs. For full discrete 16-channel routing to individual output pairs, use the Desktop app.



Glossary

Term | Definition
RFC 7845 | IETF standard defining the Ogg encapsulation for the Opus audio codec. Section 5.1.1.5 specifies the stereo downmix matrices used by Remoto for multichannel-to-stereo fold down.
LFE | Low Frequency Effects, the ".1" channel in 5.1/7.1. Carries band-limited bass content that is conventionally reproduced with 10 dB of additional gain relative to the main channels, intended for subwoofer reproduction. Included in the Opus downmix at a reduced coefficient.
Opus | An open, royalty-free audio codec optimized for interactive speech and music. Used by Remoto for real-time streaming over SRT and WebRTC. Supports multichannel audio through multiple Opus streams combined with a channel mapping table (RFC 7845, Section 5.1.1); see MultiOpus.
MultiOpus | Extension of the Opus codec for more than 2 channels. Uses multiple coupled and uncoupled Opus streams with a channel mapping table to represent 5.1, 7.1, and higher channel counts.
SRT | Secure Reliable Transport. An open-source, low-latency streaming protocol used by Remoto to deliver full multichannel audio to Desktop, iOS, and Apple TV clients end-to-end.
WHEP | WebRTC-HTTP Egress Protocol. The standard used by clients to subscribe to a WebRTC stream from the streaming server.
SDP | Session Description Protocol. Describes the media capabilities (codecs, channels, etc.) negotiated between WebRTC peers. The channel count and Opus parameters are set here.
AVAudioSession | Apple's iOS/tvOS framework for managing audio behavior, routing, and session configuration on Apple devices.


 
