Setting up with SoundID VoiceAI

This step-by-step guide for SoundID VoiceAI covers the entire process: downloading and installing, starting a free trial, loading the plugin in your DAW, and using its features and functionality.

 

What is SoundID VoiceAI?

SoundID VoiceAI is an AI voice and instrument transformation plugin for DAWs. It changes a recorded singing voice into that of another human being or an instrument:

  • Voice model library: transform your vocal track into a realistic singing voice using a studio-grade library of AI voice models
  • Instrument model library: transform your melodic humming or beatbox to sound like drums, guitar, strings, or other instruments from a studio-grade library of AI instrument models

Use it to transform singing or speaking voice tracks, generate choir and backing vocals from a single voice track, turn beatboxing into drums, and transform vocal inputs into realistic instruments for quick transfer of melodic ideas into a DAW or for creative sound generation.

 

 

Download and install

The SoundID VoiceAI plugin can be used in a DAW (e.g. Cubase, Logic Pro X, or Pro Tools). Before installing, check the system requirements for using SoundID VoiceAI:

  • macOS 11 Big Sur, 12 Monterey, 13 Ventura, 14 Sonoma, 15 Sequoia, 26 Tahoe.
  • Windows 10, 11.
  • DAW or other plugin host app that supports AU, AAX, or VST3 plugin formats.
  • SoundID VoiceAI Perpetual license or Pay-as-you-go tokens registered and available in your Sonarworks Account.
    • Note: A free 7-day trial is available.
  • Stable internet connection (offline use not supported).

 

SoundID VoiceAI installer (download here) will install the VST3, AU, and AAX plugin formats in these default install directories:

 

macOS

Macintosh HD/Library/Audio/Plug-Ins/Components/SoundIDVoiceAI.component
Macintosh HD/Library/Application Support/Avid/Audio/Plug-Ins/SoundIDVoiceAI.aaxplugin
Macintosh HD/Library/Audio/Plug-Ins/VST3/SoundIDVoiceAI.vst3

 

Windows

C:\Program Files\Common Files\VST3\Sonarworks\SoundIDVoiceAI\SoundIDVoiceAI.vst3
C:\Program Files\Common Files\Avid\Audio\Plug-Ins\SoundIDVoiceAI.aaxplugin\Contents\x64\SoundIDVoiceAI.aaxplugin
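
To confirm that the installer placed the plugin files where a DAW expects them, the default directories listed above can be checked with a short script. This is a minimal sketch, not a Sonarworks tool: the function name find_installed_formats is hypothetical, and it assumes "Macintosh HD" is the boot volume, so the macOS paths are written relative to "/".

```python
import sys
from pathlib import Path

# Default install locations copied from the list above.
# Assumption: "Macintosh HD" is the macOS boot volume, i.e. the root "/".
DEFAULT_PLUGIN_PATHS = {
    "darwin": {
        "AU": "/Library/Audio/Plug-Ins/Components/SoundIDVoiceAI.component",
        "AAX": "/Library/Application Support/Avid/Audio/Plug-Ins/SoundIDVoiceAI.aaxplugin",
        "VST3": "/Library/Audio/Plug-Ins/VST3/SoundIDVoiceAI.vst3",
    },
    "win32": {
        "VST3": r"C:\Program Files\Common Files\VST3\Sonarworks\SoundIDVoiceAI\SoundIDVoiceAI.vst3",
        "AAX": r"C:\Program Files\Common Files\Avid\Audio\Plug-Ins\SoundIDVoiceAI.aaxplugin\Contents\x64\SoundIDVoiceAI.aaxplugin",
    },
}


def find_installed_formats(platform=sys.platform):
    """Return {plugin format: True/False} for each default path on the platform."""
    paths = DEFAULT_PLUGIN_PATHS.get(platform, {})
    return {fmt: Path(p).exists() for fmt, p in paths.items()}


if __name__ == "__main__":
    for fmt, present in find_installed_formats().items():
        print(f"{fmt}: {'found' if present else 'not found'}")
```

A format reported as "not found" may simply mean that a custom location was used, so treat the output as a hint rather than a verdict.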

 

Note: Custom locations can optionally be set for the locally stored presets (for the Perpetual mode) and for the audio Cache folder. Learn more here: File locations of SoundID VoiceAI.

 

Processing modes (license types)

There are two processing modes in SoundID VoiceAI, and they are equivalent to the license types. The processing mode toggle switch at the top of the plugin allows quick switching between the modes:

  • Perpetual mode: Requires a one-time payment for a Perpetual license. Featuring unlimited local processing and a catalogue of 50+ Factory Presets, this is the best license option for most users. Preset Expansion Packs are available as a separate purchase.
  • Pay-as-you-go mode: Purchase Token packs and use Tokens based on actual processing time. This mode uses cloud processing with no subscription fees and is suitable for occasional use. All presets are available, including the Expansion Packs.

 

 

[SVAI] Processing modes toggle switch.png

 

Learn more about the difference between the Perpetual and Pay-as-you-go modes here: SoundID VoiceAI license types: Perpetual and Pay-as-you-go.

 

Preset Expansion Packs

Preset Expansion Packs in SoundID VoiceAI are genre-focused sets of new preset releases, separate from the default catalogue of 50+ Factory Presets. The Expansion Packs are available as a separate purchase to Perpetual license owners. For Pay-as-you-go mode users (cloud processing), the Expansion Packs are available at no additional cost.

 

Currently available Preset Expansion Packs

  • Rock Voices: Includes 10 distinctive rock voices; 5 male and 5 female (Jackie, Cliff, Vince, Andi, Tyson, Rose, Patti, Raven, Ember, Grace). Voices are tailored for heavy music styles - with natural rasp, rich tone, and wide dynamic range, delivering grit, energy, and expressive power. Preview here.
  • Kids Voices: Includes 10 bright and playful children's voices; 5 boys and 5 girls (Ryan, Jamie, Luke, Ethan, Jordan, Daisy, Kate, Lucy, Claire, Eve). Designed for a variety of styles. With a clear tone, playful expression, and natural dynamics, they bring an authentic and engaging children’s vocal character to any suitable production. Preview here.
  • Pop Voices: Includes 10 versatile pop presets; 5 male and 5 female (Wesley, Cameron, Liam, Adrian, Oscar, Valerie, Natalie, Vivian, Celeste, Serena). Voices are crafted for contemporary pop styles - with smooth tone and clear articulation, they deliver polished vocals perfect for pop or any production needing modern, radio-ready vocal presence. Preview here.
  • K-Pop Voices: 10 voices for K-pop; 5 male and 5 female (Jin, Tae, Kai, Minho, Joon, Yuna, Sora, Hana, Minji, Jisoo). Purpose-built for K-pop, anime productions, and genre-blending tracks that demand that distinctive Korean vocal character. Preview here.

 

These presets can be found in the "Expansion packs" section in the SoundID VoiceAI plugin, displaying the exact list of preset names, properties, and which expansion pack they belong to. If you have purchased a preset pack, the pack download can also be initiated from here (there will be a "Download" icon available next to available presets after registering and activating an expansion pack). 

 

[SVAI] Plugin - Expansion packs.png

 

Where to buy Expansion Packs 

Preset Expansion Packs are available in the Sonarworks Store and are registered as separate licenses in your Sonarworks Account. Once activated, the new presets are available for download through the SoundID Download Manager. See this article for step-by-step instructions: How to register and activate SoundID VoiceAI.

 

[SVAI] Preset Expansion Packs.png

 

Where to find the Expansion Pack keys in user accounts

The license keys for Expansion Packs are registered the same way as a full license key or add-on keys, in the top-right corner of the account. Expansion Packs follow the license state of the SoundID VoiceAI Perpetual license. You can view and manage your registered Expansion Packs by clicking on 'Explore and manage preset expansion packs' below your SoundID VoiceAI license.

 

preset accounts.png

 

Activate your license/tokens, or start a free trial

To start working with SoundID VoiceAI, install the software (download here) and load the plugin on any voice or instrument track in your DAW project. Proceed to activate your license/tokens, or launch a free trial:

  1. Log in to your Sonarworks Account, or create a new account.
  2. Navigate to SoundID VoiceAI, and click on Start a full trial, to start a full-featured free 7-day trial.
  3. If you have already purchased a Perpetual license or a token pack, activate as follows:
    • Perpetual license: Click on Register a new license and enter your activation key.
    • Pay-as-you-go: Tokens are automatically added to your account balance upon purchase.
  4. Click on Activate > Open app > Open SoundID Download Manager.
    • Note: Successful activation is indicated by the "Successful login or activation" dialog.
  5. Return to DAW - the plugin will be logged in and activated.

 

Note: A fully detailed guide for setting up the trial and activating a license is available here: How to register and activate SoundID VoiceAI.

 

VoiceAI - Logic Pro.png

 

Register a new license.png

 

Activate SoundID VoiceAI.png

 

Open SoundID Download Manager - Successful login or activation.png

Capture audio

Before the target voice or instrument model AI processing can be applied, the input audio of the DAW project track must be captured:

  1. Click on Capture to Arm the plugin.
  2. Select your DAW playback position and start playback.
  3. Click on Stop to complete the capture.
  4. Click on Remove to delete the last capture and start over.

 

Once the capture is Stopped, the exact audio capture duration and region timestamps will be displayed.

 

Audio capture - VoiceAI.png

 

Captured audio - SoundID VoiceAI.png

 

Note: In the Pay-as-you-go mode, the token cost needed for each processing instance will be displayed on the 'Start processing' button.

 

Note: A single plugin instance can record up to 5 minutes of audio to be processed at a time. 
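
For material longer than the 5-minute limit, the capture has to be divided into several regions, each handled by its own plugin instance. The sketch below only does the arithmetic of that split. The function name plan_captures and its parameters are hypothetical, and positions are plain seconds rather than anything read from a DAW.

```python
MAX_CAPTURE_SECONDS = 5 * 60  # per-instance capture limit stated above


def plan_captures(start_s, end_s, max_len_s=MAX_CAPTURE_SECONDS):
    """Split [start_s, end_s) into consecutive regions no longer than max_len_s."""
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    regions = []
    pos = start_s
    while pos < end_s:
        nxt = min(pos + max_len_s, end_s)  # never run past the end of the material
        regions.append((pos, nxt))
        pos = nxt
    return regions


# A 12-minute vocal take needs three captures: 5 + 5 + 2 minutes.
print(plan_captures(0, 12 * 60))
```

In practice it is usually better to move the split points to silent passages so that no phrase is cut in half.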

 

Captured voice cleanup

If the input voice was captured in a noisy environment, voice cleanup can be applied before processing to filter out background noise and enhance voice clarity, improving the end result. Click the Captured voice cleanup toggle switch to enable the feature.

 

 

 

Select your preset and apply processing

  1. Click on Voices or Creative to select the target voice or instrument preset.
  2. Click on '▶' ("Preview") to preview how the preset sounds at its best vocal range.
  3. If your source pitch is similar to the preset preview, proceed to Start processing.
  4. If the results sound too high or low, use Transpose to adjust the output pitch by semitones, and process again.
  5. Use the AI voice button to Enable/Disable the transformation on the track.

Processed tab

  1. Click on Processed to see and interact with the processed history on the audio track.
  2. Select a previous result and play back the audio track to hear the processed audio.
  3. Drag the item(s) onto an existing or new audio track to use that take.
  4. Click on 🗑️ ("Delete") to remove the previous result.
  5. Click on 🔍 ("Show file") to locate the source file.

Preset selection and Transpose

The primary use case for SoundID VoiceAI is transforming a singing voice into a realistic singing voice of another human being. Ideally, the original input should match the best input pitch - see the preset descriptions for what recorded audio pitch will generate the best results. If the natural vocal range difference is significant between the input audio and the applied preset, pitch adjustments can be made with the Transpose feature. 

 

Transpose allows pitch adjustments by semitones (half steps) for the generated audio. 12 steps of the Transpose parameter value correspond to an octave. Transpose can be adjusted to +/- 4 octaves (48 steps up or down). If the Transpose value is unaltered, the pitch will remain the same.
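
The relationship between Transpose steps and pitch can be worked out with the standard equal-temperament formula, where each semitone multiplies frequency by 2^(1/12). The sketch below is an illustration of that arithmetic only. The function names are hypothetical, and how the plugin shifts pitch internally is not described in this article.

```python
MAX_TRANSPOSE_STEPS = 48  # +/- 4 octaves, as stated above


def transpose_ratio(steps):
    """Frequency ratio for a Transpose value in semitones (equal temperament)."""
    if not -MAX_TRANSPOSE_STEPS <= steps <= MAX_TRANSPOSE_STEPS:
        raise ValueError("Transpose must be between -48 and +48 steps")
    return 2.0 ** (steps / 12)


def transposed_frequency(freq_hz, steps):
    """Frequency of a note after shifting it by the given number of semitones."""
    return freq_hz * transpose_ratio(steps)


# +12 steps doubles the frequency (one octave up), -12 halves it.
print(transposed_frequency(440.0, 12))
print(transposed_frequency(440.0, -12))
```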

 

 

Achieving optimal results becomes more straightforward and efficient when certain parameters are considered, particularly when a project is fixed to a specific key. Before processing a vocal track, we recommend taking the following steps:

  • Preview the preset by clicking on '▶' ("Preview").
  • Evaluate the best input pitch to find a suitable preset without Transposing the output pitch.
  • Use Transpose according to the preset model's vocal range:
    • If the target preset sings in a higher pitch than your input voice track, increase the value of the Transpose parameter.
    • If the target preset sings in a lower pitch than your input voice track, decrease the value of the Transpose parameter.
  • Process a small section and evaluate the results before committing to process the entire track.

 

Note: Transpose values below -12 or above +12 might produce unexpected results. Using Transpose with Drums will have a small impact on the overall sound and is not advised.

 

Auto-transpose

By default, an additional Auto-transpose feature is enabled. When it is active, the Transpose knob is unavailable for adjustments, and the plugin automatically detects and applies the optimal Transpose value for the combination of the captured audio and the applied preset.

  • For Voice presets, the auto-transpose values can be -12, 0, or +12.
  • For Creative (instrument) presets, the auto-transpose values can be -24, -12, 0, +12, or +24.
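
One way to picture Auto-transpose is as snapping the pitch difference between the input and the preset to the nearest allowed octave shift. The sketch below illustrates that idea only. How the plugin actually detects pitch and chooses a value is not documented here, and the representative pitches input_hz and target_hz, like the function names, are hypothetical.

```python
import math

# Allowed auto-transpose values per preset type, as listed above.
ALLOWED_AUTO_TRANSPOSE = {
    "voice": (-12, 0, 12),
    "creative": (-24, -12, 0, 12, 24),
}


def semitone_offset(input_hz, target_hz):
    """Signed distance in semitones from the input pitch to the target pitch."""
    return 12 * math.log2(target_hz / input_hz)


def snap_auto_transpose(input_hz, target_hz, preset_type="voice"):
    """Pick the allowed value closest to the measured offset (assumed behaviour)."""
    offset = semitone_offset(input_hz, target_hz)
    return min(ALLOWED_AUTO_TRANSPOSE[preset_type], key=lambda v: abs(v - offset))


# An input around 110 Hz with a preset that sounds best around 220 Hz
# is one octave apart, so the snapped value is +12.
print(snap_auto_transpose(110.0, 220.0))
```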

 

To switch back to manual Transpose adjustments, disable the 'Auto' checkbox - manual adjustments will become available again (by default, the last set value of Auto-transpose will be retained).

 

 

Unison Mode

The Unison Mode in SoundID VoiceAI allows you to create natural-sounding double-tracks from a single vocal source. Various controls are available for realistic-sounding results: some are "Offline" controls (they require reprocessing after making adjustments), while others are "Real-time" effects.

Note that each additional voice instance adds the same CPU load and processing time, so both will increase as more voices are added.

 

Offline controls

  • Number of voices: Add up to eight double tracks with a single plugin instance.
  • Pitch variance: Introduce subtle differences between voices for a more realistic layered effect.

 

Real-time controls

  • Timing variance: Shift the timing between voices to create a more natural feel. The control knob works in the 2–50 ms range, and the effect applies to each voice independently. Lower values provide tighter sync while higher values give a looser, more natural feel.
  • Width: Spread vocals across the stereo field or keep them centered. Setting Width to "0" places all voices in the center, creating a mono sound. Increasing Width spreads the voices further apart, creating a wider stereo effect. For an odd number of voices, the middle voice in the set will remain centered, while the rest are panned symmetrically. For an even number of voices, they will be evenly distributed across the stereo field.
    When Width is set to "50", the voices are spread evenly between the left and right channels. At "100", they are pushed further apart, creating a wider stereo field.
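
The Width behaviour described above can be sketched as a simple pan calculation. This is an assumption-laden illustration: it spaces the voices linearly between the left and right extremes and scales that spread by Width/100. The plugin's actual panning law is not documented here, and pan_positions is a hypothetical name.

```python
def pan_positions(num_voices, width):
    """Pan value per voice in [-1.0, 1.0]; -1 is hard left, +1 is hard right.

    Assumption: voices are spaced linearly and the spread scales with width/100.
    """
    if not 1 <= num_voices <= 8:
        raise ValueError("num_voices must be between 1 and 8")
    if not 0 <= width <= 100:
        raise ValueError("width must be between 0 and 100")
    if num_voices == 1:
        return [0.0]
    spread = width / 100.0
    step = 2.0 / (num_voices - 1)  # equal spacing across the stereo field
    return [spread * (-1.0 + i * step) for i in range(num_voices)]


# Odd voice count: the middle voice stays centered, the rest mirror each other.
print(pan_positions(3, 100))
# Even voice count: voices are distributed symmetrically, none in the center.
print(pan_positions(4, 50))
```

With Width at 0 every value collapses to the center, which matches the mono result described above.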

 

Note: if the Width feature is grayed out, it's likely due to the plugin being loaded on a mono track. To enable it, make sure that the plugin is loaded on a stereo input track instead. 

 

 

Double-tracking workflow

  1. Capture the audio. Place a plugin on a track and capture your audio. Or you can copy the lead vocal part you want to double onto a new track and load a plugin there.
  2. Choose the processing settings. Set the number of voices (2-8). Adjust processing settings to add more randomness using the Pitch Variance knob (a recommended starting point is 30–40).
  3. Start processing. Multi-voice processing takes longer than single-voice processing.
  4. Shape the sound with Real-time settings. Once the double-tracked vocals are set up and processed, the next step is to fine-tune their stereo placement and timing with Width adjustment and Timing var. adjustment.
  5. Finalize the sound. After adjusting Width and Timing Variance, listen to the double-tracked vocals in the mix and make final refinements: adjust the Width knob to ensure the spread fits the intended sound. Modify the Timing var. knob to achieve the right balance between tightness and looseness.

Creative presets (instrument transformation)

With the Creative presets, you can transform humming and beatboxing into tracks that sound like instruments, discover new ways of generating sounds and melodies, and create demo songs quickly. Here are some ideas to consider:

  • Mimic instruments with your voice and transform vocal inputs into realistic instruments for quick transfer of melodic ideas into DAW or creative sound generation.
  • Turn beatboxing into drums. Record a few bars of beatboxing to create a drum track.
  • Transform existing instrument tracks. Convert your guitar solo into a saxophone solo, use your guitar to create a realistic bass guitar track, or use a trumpet track to harmonize, and create an entire brass section of various instruments, and much more.
  • Use virtual instruments for creative AI processing.

 

 

Reprocessing

All processing (including Reprocessing) is unlimited in the Perpetual mode. In the Pay-as-you-go mode, results can be Reprocessed for free up to 10 times per hour to minimize excessive artifacts (additional reprocessing will deduct tokens, see below). Free Reprocessing is only available with the same Preset.

After the captured audio has been processed, clicking on the Reprocess button starts processing again with the same source, Preset, and Transpose combination by default. The previous processing result will get overwritten. 

If the hourly free Reprocessing limit is reached, the "Free processing is unavailable" message will be displayed, and the option to Use tokens is available. 
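
To estimate how many free Reprocess attempts are left in a Pay-as-you-go session, the hourly limit can be modelled as a counter over a time window. The sketch below assumes a rolling one-hour window. The article does not say how the hour is measured, so that is an assumption, and the class and method names are hypothetical.

```python
from collections import deque

FREE_REPROCESS_LIMIT = 10   # free reprocesses per hour, as stated above
WINDOW_SECONDS = 60 * 60    # assumption: a rolling one-hour window


class FreeReprocessTracker:
    """Counts free reprocess attempts inside a rolling time window."""

    def __init__(self, limit=FREE_REPROCESS_LIMIT, window_s=WINDOW_SECONDS):
        self.limit = limit
        self.window_s = window_s
        self._stamps = deque()  # times (in seconds) of recorded attempts

    def _expire(self, now_s):
        # Drop attempts that are at least one full window old.
        while self._stamps and now_s - self._stamps[0] >= self.window_s:
            self._stamps.popleft()

    def remaining(self, now_s):
        """Number of free attempts still available at time now_s."""
        self._expire(now_s)
        return self.limit - len(self._stamps)

    def try_free_reprocess(self, now_s):
        """Record a free attempt; return False if tokens would be needed."""
        if self.remaining(now_s) <= 0:
            return False
        self._stamps.append(now_s)
        return True


tracker = FreeReprocessTracker()
# Eleven attempts, one per minute: the eleventh is no longer free.
print([tracker.try_free_reprocess(minute * 60) for minute in range(11)])
```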

 

Screenshot 2024-04-29 at 15.00.28.png

 

Free processing unavailable.png

 

Frequently asked questions

 

Can I try it for free?

Yes, SoundID VoiceAI offers a fully functional free trial for the plugin. See detailed trial instructions here.

  • Full trial: Free 7-day trial for SoundID VoiceAI, fully featured with all presets (Factory and Expansion Pack), and both processing modes (Perpetual and Pay-as-you-go).

 

How does it work?

SoundID VoiceAI extracts audio information from the source voice track, passes it onto the AI model preset selected in the DAW plugin, and applies the target voice (similar to a virtual instrument). The resulting voice track keeps most of the key melodic properties of the input voice but replaces all the details with sounds generated by the target voice model the user has selected.

 

Important to know when capturing audio

  • The audio capture mechanics depend on smooth, continuous playback. Don't change the playback position while an audio capture is in progress.
  • The positioning of the AI replacement audio will depend on the captured audio region timestamps. Don't change the audio content position on the track after capturing.
  • The plugin supports a single audio capture per plugin instance only. If two fragments on the same track need to be captured and processed, there are two solutions:
    • Use two plugin instances on the same track.
    • Capture a single (longer) clip with both fragments.
  • If loop mode is enabled in DAW, the capture might become corrupt when the playhead reaches the loop point and jumps back, and re-capturing might be needed.
  • There is no "Undo" functionality in the plugin. Any Removed captures that have already been processed with AI replacement audio can only be recovered using the raw audio files generated and stored in the cache folder.
  • There is a five-minute capture limit per plugin instance.

 

Learn more about the plugin mechanics here: How to use SoundID VoiceAI in your DAW.

 

Important to know when processing audio

  • Repeated AI processing of the same audio source will not produce identical results. Due to the creative nature of the AI models in SoundID VoiceAI, results will be slightly different each time.
  • It is possible to Reprocess the results for free (limited to 10 times per hour) to minimize excessive artifacts.
  • The positioning of the AI replacement audio relies on the captured audio region timestamps. Don't change the audio content position on the track after capturing.
  • The plugin supports a single audio capture and processing per plugin instance only. If two fragments on the same track need to be captured and processed, there are two solutions:
    • Use two plugin instances on the same track.
    • Capture a single (longer) clip with both fragments.
  • If loop mode is enabled in DAW, the capture might become corrupt when the playhead reaches the loop point and jumps back, and re-capturing might be needed.
  • There is no "Undo" functionality in the plugin. Any Removed captures that have already been processed with AI replacement audio can only be recovered using the raw audio files generated and stored in the Cache folder.

 

Where can I buy SoundID VoiceAI?

The Perpetual license and Token packs for SoundID VoiceAI can be purchased in the Sonarworks Store, see here: SoundID VoiceAI | Pricing.

 

Can I train and create my own voice models?

No, it is currently not possible to create your own presets/vocal processing models with SoundID VoiceAI. This would require the capability to train and create a voice model based on the vocal data sets/samples of your choosing (for example, your own voice) - such a feature is not available yet. 

 

What are the input/output audio properties and quality?

SoundID VoiceAI plugin can cater to a relatively wide range of recording quality for the input track. Regular phone microphone recordings in a random space with reverb are perfectly okay to use - after processing, the output results will have the properties of studio-quality audio captured with a great microphone.

There are, however, some limits to take into consideration:

  • Repeated AI processing on the same audio capture will not produce identical results. Due to the creative nature of the AI models in SoundID VoiceAI, results will be slightly different each time.
  • Excessive reverb on the input audio can lead to melodic artifacts in the output.
  • When applied to non-English singing, some amount of English accent might bleed over into the processed voice, depending on the preset applied.
  • The AI models can sometimes introduce artifacts such as clipping "s'es" into the processing results. This is typically resolved by re-processing or adjusting the Transpose setting to a value closer to the input track pitch.
  • The AI models work great for normal spoken voice tracks, too; however, when applied to extreme emotional states of speech such as whispering or shouting, artifacts are possible.
  • The intonation of the input voice audio is a key aspect of the AI models. Raspiness in the voice can lead to artifacts in the processing results (rough, raspy, strained, or breathy properties).

 

 
