How to Virtually Develop Sound Systems for Professional Audio Applications

August 26, 2019:

The fully virtual multidisciplinary development environment for professional audio systems has already been applied to the first real-life industrial applications, which validated its maturity. This approach substantially improves engineering efficiency and yields significant business benefits.


Mvoid developed a complete virtual development environment for professional audio systems that allows listening to a virtual prototype. This makes it possible to assess product quality at a very early development stage, before any hardware prototypes exist.


We will explain how to virtually develop audio systems for professional applications. For a common understanding of what Professional Audio (abbreviated Pro Audio) applications include, we would first like to share our definition: for Mvoid, “Professional Audio (Pro Audio)” refers to the amplification of audio signals in pretty much any situation outside of a home (and not in a car). This may be reproduction of recorded sound, as in recording studios, cinemas, clubs etc., or reinforcement of live music, speech, special effects or artistic performances. In all cases the loudspeaker systems need to be tailored to the venue.


Fundamental Considerations for the Development of a Virtual Environment
Multidisciplinary Approach

To be able to listen to an audio system based on a virtual prototype, the following engineering disciplines need to be added to multiphysical engineering analysis: Digital signal processing, psychoacoustics, subjective evaluation, sound tuning, auralization (binaural audio). Hence, the virtual prototypes are extended from a multiphysical to a multidisciplinary approach.


Workflow Process Steps

The multiphysical CAE-based simulation model needs to cover the electroacoustics of a transducer, as well as the acoustical and mechanical interaction with loudspeaker enclosures and the listening space (vibro-acoustics). Thus, Mvoid’s workflow covers the following physical domains, fully coupled within the multiphysical model: electromagnetics, structural vibrations, and acoustics (propagation of sound waves).

Mvoid follows a bottom-up approach in a methodical workflow process, starting with the subcomponents of the transducer. For more details on this so-called vibro-electroacoustics model, see [1].


Complete Virtual Development Environment for Professional Audio Systems

When general acoustic principles are applied to a room, we are confronted with many degrees of complexity: the room’s particular geometry, including cavities, edges, and small and wide angles between uneven surfaces, as well as very different absorption and reflection properties due to a complex mix of materials. One of the keys to the success of Mvoid’s approach is the careful consideration of boundary conditions.


Another requirement for room simulations is the need to cover the complete audible frequency range. In fact, classic Finite Element Analysis (FEA) fails due to numerical pollution at frequencies above 100 Hz to 500 Hz (depending on the room size). Therefore, we uniquely combine low and mid frequency simulations in the frequency domain based on FEA with high frequency simulations in the time domain based on geometrical acoustics. Note that, conversely, geometrical acoustics typically fails at frequencies below 100 Hz to 500 Hz.
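To illustrate the idea of merging two band-limited results into one broadband response, here is a minimal Python sketch. It is not Mvoid’s splicing technique, which is not described in this article. It assumes that both simulations deliver complex responses on the same frequency grid and that they are already level and phase matched in the overlap region. The function names and the transition band of 200 Hz to 400 Hz are hypothetical choices for illustration.

```python
import math


def crossfade_weight(f_hz, f_lo, f_hi):
    """Weight of the high-frequency (geometrical acoustics) result at f_hz.

    Returns 0 below f_lo, 1 above f_hi, and a raised-cosine transition on a
    logarithmic frequency axis in between.
    """
    if f_hz <= f_lo:
        return 0.0
    if f_hz >= f_hi:
        return 1.0
    x = math.log(f_hz / f_lo) / math.log(f_hi / f_lo)
    return 0.5 - 0.5 * math.cos(math.pi * x)


def splice(freqs_hz, fea_response, ga_response, f_lo=200.0, f_hi=400.0):
    """Blend two complex responses sampled on the same frequency grid.

    Assumption: both responses are already aligned in level and phase
    within the transition band [f_lo, f_hi].
    """
    spliced = []
    for f, h_fea, h_ga in zip(freqs_hz, fea_response, ga_response):
        w = crossfade_weight(f, f_lo, f_hi)
        spliced.append((1.0 - w) * h_fea + w * h_ga)
    return spliced
```

In practice the transition band would be chosen per room, since the article notes that the usable range of each method depends on the room size.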


Finally, the degree of complexity and the level of detail considered is another key to the success of Mvoid’s approach. Modeling a very wide dimensional scale of elements (from the 0.1 mm air gap between the voice coil and magnet of a transducer’s motor structure up to the 10 to 40 m length of a venue), complicated mechanical structures and detailed physical phenomena (e.g. transducer and enclosure nonlinearities) can lead to very complex simulation models.


For this reason, Mvoid has developed a “step-by-step” approach. The first step, which is called the concept or reference model, aims at the true reproduction of the room acoustics of a given room. All loudspeaker sources are considered as perfect pistons, perfectly integrated into a structure as a rigid configuration. Figure 1 depicts a concert hall with line arrays in each corner at left. As a result, the concept model simply shows us the best possible system in a given acoustic environment (that’s why we also call it a reference system). Once this is achieved and the locations of all loudspeakers have been chosen, we can consider adding, step by step, further details to the concept model.

Figure 1: Concert hall with line arrays


Virtual Tuning of Professional Audio Systems

According to S. Olive [2], W. Klippel [3], and Mvoid’s practical experience, the major factors in perceived sound quality are:

  • Frequency response smoothness on- and off-axis
  • Perceived bass extension
  • Naturalness


Naturalness can be defined as “the feeling of space”, i.e., the correct sense of ambience, or the appropriate levels of direct and reflected information.


Misalignment of Loudspeakers

There is no single speaker that can reproduce the whole audible frequency range with a balanced response. As a result, several loudspeakers of different sizes (subwoofers, woofers, midranges and tweeters) are used, each covering a specific bandwidth. Ideally, they should be placed in close proximity to each other and time aligned, so that differences in distance to the listener position are minimal and the sound of all frequency bands arrives as a single wave front with the requisite amplitude in each band. Ultimately, the directivity of a loudspeaker system is derived from the sum of the acoustic sources. Figure 2 depicts a generic example of a single enclosure intended to be used in multiples in a line array system.

Figure 2: Generic example of a single enclosure



Figure 3 shows the polar axis with 2D and 3D polar dispersion characteristics of the single enclosure.

Figure 3: 2D and 3D polar dispersion characteristics of the single enclosure


Professional loudspeakers face packaging and integration constraints arising from installation space and transportation requirements: the available installation space is always kept to a minimum, resulting in small enclosures for the woofers that reproduce low frequencies. These constraints have a severe impact on the vibrations of the speaker, and thus deteriorate frequency response smoothness and bandwidth.


By using methods of digital signal processing, these degradations in sound quality can be mitigated. This process of improving the perceived sound quality of an audio system is called sound tuning. Frequency response smoothness can be enhanced by equalization based on diverse filters. Equalization here means changing the sound pressure level at frequencies where we have significant peaks and notches by means of electronic filters that can boost or cut specific frequency regions.
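As an illustration of a filter that boosts or cuts a specific frequency region, the following Python sketch computes a peaking equalizer as a second-order (biquad) filter, using the widely published Audio EQ Cookbook formulas. This is an assumption made for illustration; the article does not state which filter types Mvoid’s tuning environment uses. The function names and the example values (a 6 dB cut at 1 kHz with Q = 2 at 48 kHz) are hypothetical.

```python
import cmath
import math


def peaking_eq_coeffs(f0_hz, gain_db, q, fs_hz):
    """Biquad peaking EQ coefficients (Audio EQ Cookbook form).

    Returns (b, a), both normalized so that a[0] == 1.
    """
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def magnitude_db(b, a, f_hz, fs_hz):
    """Magnitude response of the biquad at f_hz, in dB."""
    z1 = cmath.exp(-2j * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


# Example: tame a hypothetical 6 dB peak at 1 kHz.
b, a = peaking_eq_coeffs(f0_hz=1000.0, gain_db=-6.0, q=2.0, fs_hz=48000.0)
```

With these coefficients the response is cut by the requested amount at the center frequency and left unchanged at DC; Q controls how wide the affected region is.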


Filters are also used for crossing over different types of speakers acting in different frequency regions. As said before, one single transducer cannot cover the whole audible frequency range. Audio systems usually have up to three-way or four-way arrangements (woofer, midrange and tweeter, optionally with a subwoofer). It is therefore necessary to delimit the field of action of each contributor by properly setting high-pass and low-pass filters. In doing so, specific care has to be taken with the phase alignment of all speakers in order to achieve a good overlap.
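Why phase alignment matters at a crossover can be shown with a small Python sketch. It uses a second-order Linkwitz-Riley crossover in its analog prototype form, which is a common textbook choice and an assumption here, not a filter named by the article. With this filter pair, the low-pass and high-pass outputs are 180 degrees apart, so they only sum flat when one branch is polarity inverted. All names are hypothetical.

```python
import math


def lr2_lowpass(f_hz, fc_hz):
    """Second-order Linkwitz-Riley low-pass, analog prototype H(s) = 1 / (1 + s)^2."""
    s = 1j * f_hz / fc_hz  # normalized s = j * f / fc
    return 1.0 / (1.0 + s) ** 2


def lr2_highpass(f_hz, fc_hz):
    """Second-order Linkwitz-Riley high-pass, H(s) = s^2 / (1 + s)^2."""
    s = 1j * f_hz / fc_hz
    return s ** 2 / (1.0 + s) ** 2


def summed_level_db(f_hz, fc_hz, invert_highpass):
    """Level of the summed low-pass and high-pass branches, in dB."""
    sign = -1.0 if invert_highpass else 1.0
    total = lr2_lowpass(f_hz, fc_hz) + sign * lr2_highpass(f_hz, fc_hz)
    # Floor the magnitude so that a perfect null does not break log10.
    return 20.0 * math.log10(max(abs(total), 1e-12))
```

Evaluating `summed_level_db` at the crossover frequency gives a flat sum with the high-pass branch inverted and a deep notch without the inversion, which is the kind of overlap problem the tuning process has to avoid.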


The other two main features used in the tuning process are:

  • Changes in channel gain
    That is done to compensate for different sound pressure levels of channels.
  • Adding channel delay
    The signal is delayed to compensate for differences in the arrival time of sound waves from different transducers, e.g. caused by misalignment.
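The two adjustments above reduce to simple arithmetic, sketched here in Python. The sketch assumes free-field propagation at a speed of sound of 343 m/s (air at roughly 20 °C) and aligns every source to the farthest one; the function names and the alignment strategy are illustrative assumptions, not a description of Mvoid’s tool.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius


def alignment_delays(distances_m, c=SPEED_OF_SOUND):
    """Delay in seconds per source so that all arrivals coincide with
    the arrival from the farthest source."""
    farthest = max(distances_m)
    return [(farthest - d) / c for d in distances_m]


def delay_in_samples(delay_s, fs_hz):
    """Convert a delay in seconds to (possibly fractional) samples."""
    return delay_s * fs_hz


def gain_trims_db(levels_db, target_db=None):
    """Gain change in dB per channel to bring every channel to target_db.

    If no target is given, the quietest channel is used as the target,
    so all trims are cuts.
    """
    target = min(levels_db) if target_db is None else target_db
    return [target - level for level in levels_db]
```

For example, a source that is 3.43 m closer to the listener than the farthest one needs about 10 ms of delay, which is 480 samples at 48 kHz. Delays that do not fall on whole samples call for fractional delay filtering.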


The third major factor, “naturalness”, can be classified as a spatial attribute. A frequency response gives almost no indication of spatial attributes. In the time domain, too, it is almost impossible to judge the spatial attributes of an audio system by looking at the room impulse response (RIR).


While the spectral performance of an audio system can to some extent be evaluated purely on visual data, i.e. frequency responses visualized by X/Y graphs, there is no alternative to a listening test for evaluating spatial attributes. However, since no hardware exists in the concept phase, there is only one possibility for a listening test in early development phases: in the virtual domain, by means of auralizations based on simulated RIRs.


Auralization of Audio Systems

For a realistic virtual listening experience, all major psychoacoustic features need to be included, and thus we cannot directly use the RIRs at ear locations. Spatial effects and localization of sound are based on binaural hearing, and such effects are not included in RIRs. At our two ears, sound events arrive with differences in time and level, caused by reflection and diffraction due to our head and torso. These differences, which are called interaural time difference (ITD) and interaural level difference (ILD), make significant changes to the incident sound waves and depend on direction. Binaural cues, which result from diffraction of incident sound waves on head and torso, are added to the ear sound pressure. These cues (referenced to a plane wave) are described by the so-called head-related transfer function (HRTF) [4]. HRTFs describe the relationship of sound pressure at the eardrum or ear canal entrance to an incident plane wave without reflection and diffraction effects, i.e. without head and torso.


These HRTFs can be obtained from measurements on individual persons, which might then not fit well for listeners with a different physiology, from measurements on artificial heads (so-called dummy heads), or, as we propose, from an analytical model with simplified geometry. Figure 4 shows a set-up in which a rigid sphere is used as a head model. Due to the simplified geometry, it is possible to derive closed-form solutions that approximate the left-ear and right-ear HRTFs resulting from time and level differences of incident sound waves. Note that the model is a full 3D model, and thus the HRTFs vary with azimuth (horizontally) as well as elevation (vertically) of the arriving sound waves.
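To give a feeling for what a closed-form solution on a rigid sphere looks like, the Python sketch below evaluates Woodworth’s classical formula for the interaural time difference of a distant source in the horizontal plane, ITD = (a / c) · (θ + sin θ). This is a well-known textbook approximation and is shown only as an example; it is not necessarily the expression used in Mvoid’s model, which also covers level differences and elevation. The head radius of 8.75 cm and the speed of sound of 343 m/s are assumed typical values.

```python
import math


def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Interaural time difference in seconds for a far-field source.

    azimuth_deg is measured from straight ahead, within -90 to 90 degrees.
    The sign of the result follows the sign of the azimuth.
    """
    if abs(azimuth_deg) > 90:
        raise ValueError("this sketch covers azimuths from -90 to 90 degrees only")
    theta = math.radians(abs(azimuth_deg))
    itd = head_radius_m / c * (theta + math.sin(theta))
    return math.copysign(itd, azimuth_deg)
```

For a source directly to one side, this gives an ITD of roughly 0.66 ms; for a source straight ahead, the ITD is zero.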

Figure 4: HRTF model set-up


Now that the HRTFs are known, it is possible to calculate the binaural room impulse response (BRIR) for the left and right ear by applying the HRTFs to the simulated RIRs coming from our multiphysical simulation model. Unlike typical binaural audio applications, where usually only one microphone located at the center of the head is used to derive a single RIR, we use two RIRs from the simulation model, corresponding to the ear canal entrance locations of the left and right ear. Thus, the ITD is directly based on the simulation model, resulting in improved accuracy and a more natural listening experience.


An additional advantage of our analytical HRTF model is that the sound path through the ear canal is not included. During playback through headphones, this part of the sound path is added by the listener’s own ear canal. Our HRTF model calculates the sound at the ear canal entrance, which is very close to the transducer of a headphone, and thus gives superior results. If a measured HRTF, from either individual persons or artificial heads, were used, the ear canal sound path would be included twice during playback and would need to be removed once. The HRTF filters are implemented in the discrete time domain, using first-order IIR filters and fourth-order Lagrange fractional delays, to enable real-time processing. Real-time processing is crucial for realizing a virtual reality (VR)-like environment for tuning an audio system.
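The fractional delay part can be illustrated with the standard Lagrange interpolation formula, sketched below in Python. For an order-N filter and a delay of D samples, the tap weights are h[n] = ∏ (D − k) / (n − k) over all k ≠ n, with n, k = 0 … N. The function name is hypothetical, and how the delay values are derived from the head model is not covered here.

```python
def lagrange_fd_coeffs(delay_samples, order=4):
    """FIR taps of a Lagrange fractional delay filter.

    delay_samples is the total delay D in samples. For the best accuracy,
    D should lie near the middle of the filter, i.e. around order / 2.
    """
    coeffs = []
    for n in range(order + 1):
        h = 1.0
        for k in range(order + 1):
            if k != n:
                h *= (delay_samples - k) / (n - k)
        coeffs.append(h)
    return coeffs


# Example: a fourth-order filter (5 taps) delaying by 2.3 samples.
taps = lagrange_fd_coeffs(2.3, order=4)
```

When D is a whole number, the filter collapses to a pure delay (a single tap equal to 1). For any D, the taps sum to 1, so the gain at DC is exactly unity.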


The BRIRs are finally convolved with acoustical test files, sound files containing music, speech or noise, and ultimately a binaural listening experience is created by sending the final signal to headphones.
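The convolution step can be sketched in a few lines of Python. This is a direct-form convolution, written for clarity rather than speed; it assumes a single source channel and BRIRs given as plain lists of samples. A real-time system would more likely use a block-based FFT convolution, which is an assumption on our part and not stated in this article. The function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono_signal, brir_left, brir_right):
    """Return the (left, right) headphone signals for one source channel."""
    return convolve(mono_signal, brir_left), convolve(mono_signal, brir_right)
```

For a sound system with several channels, each channel would be rendered with its own pair of BRIRs, and the left and right results would be summed per ear.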


Figure 5 shows the signal flow of our auralization environment. First, the test signals are sent to a routing matrix, which distributes the two-channel (stereo) signal to the individual channels of the sound system. Then each channel’s tuning parameters are applied, and the resulting signal is passed through the simulation data, i.e. the RIRs. Finally, the result is binaurally rendered by means of HRTFs and sent to headphones for listening.


Figure 5: Signal flow for a VR-like tuning environment


All building blocks are designed for real-time processing, so that by changing any tuning parameter, we immediately get a change in the visual response, i.e. the graphics showing the frequency response, as well as a change in the audible response via the headphones. Thus, we have an acoustically VR-like development environment for listening to audio systems based on purely computer-generated models.



Mvoid has developed Virtual Product Development processes that reduce development time and cost for all aspects of system development. Moreover, through engineering analysis virtual products may be optimized to refine the design to meet multiple best practice objectives.


The process enables simulation-based analysis of audio systems over the audible frequency spectrum in professional audio applications. Mvoid has introduced a unique splicing technique to merge low to mid frequency FEA with mid to high frequency ray tracing, resulting in full-frequency broadband analysis at any listening point in a room. This enables a realistic prediction of the performance of the audio system on a fully virtual basis.


In the product development concept phase, where no physical hardware exists, engineers will have access to digital prototypes that represent the functional behavior of products in a realistic way, so that product definition and specification can be refined and improved in the virtual domain. The quality can be improved, development time can be significantly shortened, and costs can be saved.



  [1] Svobodnik, A. J. (2013). Virtual Development of Audio Systems – Application of CAE Methods. 135th AES Convention.
  [2] Olive, S. (2004). A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements. Audio Engineering Society.
  [3] Klippel, W. (1990). Assessing the Subjectively Perceived Loudspeaker Quality on the Basis of Objective Parameters. Audio Engineering Society.
  [4] Faller, C., Menzer, F., Tournery, C. (2015). Binaural Audio with Relative and Pseudo Head Tracking. Audio Engineering Society.


For more information on this topic, we recommend our paper presented at the NAFEMS Conference 2019:


Picture Source:
Head: Adobe, Svetazi # 202996450
All other pictures, Mvoid Technologies GmbH