Getting started with SoundID VoiceAI: this step-by-step guide covers the entire process, from downloading the plugin and setting up a trial to loading it in your DAW and exploring its features.
What is SoundID VoiceAI?
SoundID VoiceAI is an AI voice and instrument transformation plugin for DAWs. It uses AI to change a recorded singing voice into the voice of another singer or into an instrument:
- Voice model library: transform your vocal track into a realistic singing voice from a studio-grade AI library of 26 voice models
- Instrument model library: transform your melodic humming or beatboxing to sound like drums, guitar, violin, or other instruments from a studio-grade AI library of 23 instrument models
Use it to transform singing or speaking voice tracks, generate backing vocals from a single voice track, mimic instruments with your voice to quickly transfer melodic ideas into your DAW or generate creative sounds, turn beatboxing into drums, and more.
Learn more about the use cases and advantages here: What is SoundID VoiceAI?
Processing modes (license types)
SoundID VoiceAI is available in two processing modes:
- Perpetual mode (unlimited local processing): one-time payment for a Perpetual license purchase.
- Pay-as-you-go mode (cloud processing): purchase and spend tokens as you go.
Learn more about the difference between the Perpetual and Pay-as-you-go modes here: SoundID VoiceAI license types: Perpetual and Pay-as-you-go.
Download and install
The SoundID VoiceAI plugin can be used in a DAW (e.g. Cubase, Logic Pro X, or Pro Tools). Before installing, check the system requirements for using SoundID VoiceAI:
- macOS 11 Big Sur, 12 Monterey, 13 Ventura, 14 Sonoma
- Windows 10, 11
- DAW or other plugin host app that supports AU, AAX, or VST3 plugin formats
- SoundID VoiceAI Perpetual license or Pay-as-you-go tokens available in your Sonarworks Account
- Stable internet connection (offline use not supported)
The SoundID VoiceAI installer can be downloaded here. It installs the plugins in the default plugin install directories on macOS and Windows:
macOS:
Macintosh HD/Library/Audio/Plug-Ins/Components/SoundIDVoiceAI.component
Macintosh HD/Library/Application Support/Avid/Audio/Plug-Ins/SoundIDVoiceAI.aaxplugin
Macintosh HD/Library/Audio/Plug-Ins/VST3/SoundIDVoiceAI.vst3
Windows:
C:\Program Files\Common Files\VST3\Sonarworks\SoundIDVoiceAI\SoundIDVoiceAI.vst3
C:\Program Files\Common Files\Avid\Audio\Plug-Ins\SoundIDVoiceAI.aaxplugin\Contents\x64\SoundIDVoiceAI.aaxplugin
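To confirm that the installer placed the plugin files where the guide says, a small script can check the default locations. This is only a sketch: it assumes "Macintosh HD" is the boot volume (so the paths start at `/Library`), and the helper names `find_missing` and `default_paths` are made up for illustration. A missing entry may simply mean that plugin format was not installed.

```python
import sys
from pathlib import Path

# Default install locations quoted from this guide.
# ASSUMPTION: "Macintosh HD" is the boot volume, so paths start at /Library.
MACOS_PLUGIN_PATHS = [
    "/Library/Audio/Plug-Ins/Components/SoundIDVoiceAI.component",
    "/Library/Application Support/Avid/Audio/Plug-Ins/SoundIDVoiceAI.aaxplugin",
    "/Library/Audio/Plug-Ins/VST3/SoundIDVoiceAI.vst3",
]
WINDOWS_PLUGIN_PATHS = [
    r"C:\Program Files\Common Files\VST3\Sonarworks\SoundIDVoiceAI\SoundIDVoiceAI.vst3",
    r"C:\Program Files\Common Files\Avid\Audio\Plug-Ins\SoundIDVoiceAI.aaxplugin"
    r"\Contents\x64\SoundIDVoiceAI.aaxplugin",
]


def find_missing(paths):
    """Return the entries of `paths` that do not exist on this machine."""
    return [p for p in paths if not Path(p).exists()]


def default_paths(platform=sys.platform):
    """Pick the path list for the current operating system."""
    if platform == "darwin":
        return MACOS_PLUGIN_PATHS
    if platform.startswith("win"):
        return WINDOWS_PLUGIN_PATHS
    return []  # other systems are not listed in the requirements


if __name__ == "__main__":
    for path in default_paths():
        status = "missing" if find_missing([path]) else "found"
        print(f"{status}: {path}")
```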
Load and trial/activate the plugin in your DAW
To start working with SoundID VoiceAI, load the plugin on any voice or instrument track in your DAW project:
- Download and install the SoundID VoiceAI plugin
- Launch your DAW and load the SoundID VoiceAI plugin on an audio track
- Log in to your Sonarworks Account, or create a new account
- Click on Start trial to start a free 7-day trial
- Click on Activate if you already have a Perpetual license or Pay-as-you-go tokens available in your account
- Click on Activate > Activate on this device
- Return to your DAW - the plugin will be activated
Capture audio
Before AI processing with the target voice or instrument model can be applied, the input audio on the DAW project track must be captured:
- Click on Capture to Arm the plugin
- Select your DAW playback position and start playback
- Click on Stop to complete the capture
- Click on Remove to delete the last capture and start over
Once the capture is Stopped, the exact audio capture duration and region timestamps will be displayed.
Note: In the Pay-as-you-go mode, the token cost needed for each processing instance will be displayed on the 'Start processing' button.
Important to know when capturing audio
- The audio capture mechanics depend on smooth continuous playback. Don't change the playback position while an audio capture is in progress.
- The positioning of the AI replacement audio will depend on the captured audio region timestamps. Don't change the audio content position on the track after capturing.
- The plugin supports a single audio capture per plugin instance only. If two fragments on the same track need to be captured and processed, there are two solutions:
  - Use two plugin instances on the same track
  - Capture a single (longer) clip with both fragments
- If loop mode is enabled in your DAW, the capture might become corrupt when the playhead reaches the loop point and jumps back, and re-capturing might be needed.
- There is no "Undo" functionality in the plugin. Any Removed captures that have already been processed with AI replacement audio can only be recovered using the raw audio files generated and stored in the cache folder.
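When two fragments are covered with a single longer capture, the capture also includes whatever lies between them. The sketch below shows how much extra audio that is; the function names and the `(start_seconds, end_seconds)` tuple format are hypothetical, and the fragments are assumed not to overlap.

```python
def enclosing_capture(fragment_a, fragment_b):
    """Return the (start, end) region, in seconds, that covers both fragments."""
    for start, end in (fragment_a, fragment_b):
        if end <= start:
            raise ValueError("each fragment must end after it starts")
    return min(fragment_a[0], fragment_b[0]), max(fragment_a[1], fragment_b[1])


def capture_overhead_seconds(fragment_a, fragment_b):
    """Extra seconds captured compared with the two fragments on their own.

    ASSUMPTION: the fragments do not overlap.
    """
    start, end = enclosing_capture(fragment_a, fragment_b)
    wanted = (fragment_a[1] - fragment_a[0]) + (fragment_b[1] - fragment_b[0])
    return (end - start) - wanted


# Fragments at 10-18 s and 30-42 s: one capture from 10 s to 42 s,
# which includes 12 s of audio between the two fragments.
print(enclosing_capture((10, 18), (30, 42)))
print(capture_overhead_seconds((10, 18), (30, 42)))
```

If the gap between fragments is long, two plugin instances avoid capturing and processing the audio in between.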
Learn more about the plugin mechanics here: How to use SoundID VoiceAI in your DAW
Select your preset and apply AI processing
- Click on Voices or Creative to select the target voice or instrument preset
- Click on '▶' ("play") to preview how the preset sounds at its best vocal range
- If your source pitch is similar to the preset preview, proceed to Start processing
- If the results sound too high or low, use Transpose to adjust the output pitch by semitones, and process again
- Use the AI voice button to Enable/Disable the transformation on the track
Processing cost in the Pay-as-you-go mode
Before committing to process the entire track, it's a good idea to highlight and process a smaller section of the track first and ensure the results sound good. The cloud-based processing takes approximately 2.5x the time of the captured audio duration. It is possible to Reprocess the results for free (limited to 10 times per hour) to minimize excessive artifacts.
1 minute of audio processing costs 600 tokens. The token amount needed for processing will always be displayed on the Start processing button. You can check your balance in the plugin, or your Sonarworks Account. Learn more about tokens below.
Note: Learn more about optimal preset selection and Transpose use below.
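As a rough planning aid, the "approximately 2.5x" figure can be turned into a wait-time estimate. This is a minimal sketch; the function name is hypothetical and actual processing time will vary.

```python
# Approximate ratio of cloud processing time to captured audio duration,
# as stated in this guide. Real processing times will vary.
PROCESSING_TIME_FACTOR = 2.5


def estimate_cloud_processing_seconds(capture_seconds):
    """Estimate how long cloud processing takes for a capture of this length."""
    if capture_seconds < 0:
        raise ValueError("capture duration cannot be negative")
    return capture_seconds * PROCESSING_TIME_FACTOR


# A 12-second test section versus a full 2.5-minute (150-second) vocal track.
print(estimate_cloud_processing_seconds(12))   # 30.0
print(estimate_cloud_processing_seconds(150))  # 375.0
```

A short test section returns results in well under a minute, which is one reason to audition presets on a small capture first.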
Important to know when processing
- Repeated AI processing of the same audio source will not produce identical results. Due to the creative nature of the AI models in SoundID VoiceAI, results will be slightly different each time.
- It is possible to Reprocess the results for free (limited to 10 times per hour) to minimize excessive artifacts.
- The positioning of the AI replacement audio relies on the captured audio region timestamps. Don't change the audio content position on the track after capturing.
- The plugin supports a single audio capture and processing per plugin instance only. If two fragments on the same track need to be captured and processed, there are two solutions:
  - Use two plugin instances on the same track
  - Capture a single (longer) clip with both fragments
- If loop mode is enabled in your DAW, the capture might become corrupt when the playhead reaches the loop point and jumps back, and re-capturing might be needed.
- There is no "Undo" functionality in the plugin. Any Removed captures that have already been processed with AI replacement audio can only be recovered using the raw audio files generated and stored in the Cache folder.
Important for Pay-as-you-go mode
- Processing requires a minimum of 70 tokens (7 seconds), followed by increments of 10 tokens (1 second).
- Tokens will still be deducted if processing is Canceled while in progress.
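The pricing rule above can be sketched as a small function. The rate and minimum come from this guide; rounding partial seconds up to the next whole second is an assumption, since the guide only states that charges grow in 10-token (1-second) increments. The plugin's Start processing button remains the authoritative source for the actual cost.

```python
import math

TOKENS_PER_SECOND = 10  # 600 tokens per minute of audio
MINIMUM_TOKENS = 70     # minimum charge, equal to 7 seconds


def processing_cost_tokens(capture_seconds):
    """Estimate the token cost of processing one capture."""
    if capture_seconds <= 0:
        raise ValueError("capture duration must be positive")
    # ASSUMPTION: a partial second is billed as a full second.
    billed_seconds = math.ceil(capture_seconds)
    return max(MINIMUM_TOKENS, billed_seconds * TOKENS_PER_SECOND)


for seconds in (3, 7, 8, 60):
    print(seconds, "s ->", processing_cost_tokens(seconds), "tokens")
```

Under these assumptions, any capture of 7 seconds or less costs the same 70 tokens, so very short test captures do not save tokens below that point.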
Reprocessing in the Pay-as-you-go mode
It is possible to Reprocess the AI processing results for free up to 10 times per hour to minimize excessive artifacts (additional reprocessing will deduct tokens, see below). Free Reprocessing is only available with the same Preset.
After the captured audio has been processed, clicking on the Reprocess button starts processing again with the same source, Preset, and Transpose combination by default. The previous processing result will get overwritten.
Note: If the Reprocessing limit is reached, you will see a message indicating that free Reprocessing is unavailable. If you choose to 'Use tokens', the displayed token amount will be deducted from your token balance.
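To keep track of how many free Reprocess runs remain, the limit can be modeled as a simple counter. This is a hypothetical sketch, not the plugin's logic: it assumes a rolling one-hour window and that paid reprocessing does not count toward the limit, neither of which is specified in this guide.

```python
from collections import deque

FREE_REPROCESS_LIMIT = 10  # free Reprocess runs per hour, from this guide
WINDOW_SECONDS = 3600      # ASSUMPTION: a rolling one-hour window


class FreeReprocessTracker:
    """Counts free Reprocess runs inside the most recent hour."""

    def __init__(self):
        self._timestamps = deque()

    def _drop_expired(self, now):
        # Forget runs that happened an hour or more before `now`.
        while self._timestamps and now - self._timestamps[0] >= WINDOW_SECONDS:
            self._timestamps.popleft()

    def free_remaining(self, now):
        """Number of free runs still available at time `now` (in seconds)."""
        self._drop_expired(now)
        return FREE_REPROCESS_LIMIT - len(self._timestamps)

    def record_reprocess(self, now):
        """Record a run; return True if it was free, False if it needs tokens."""
        self._drop_expired(now)
        if len(self._timestamps) < FREE_REPROCESS_LIMIT:
            self._timestamps.append(now)
            return True
        return False


tracker = FreeReprocessTracker()
results = [tracker.record_reprocess(second) for second in range(11)]
print(results.count(True), "free runs, then tokens are needed")
```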
Preset selection and Transpose
The primary use case for SoundID VoiceAI is transforming a singing voice into a realistic singing voice of another human being. Ideally, the original input should match the best input pitch - see the preset descriptions for what recorded audio pitch will generate the best results. If the natural vocal range difference is significant between the input audio and the applied preset, pitch adjustments can be made with the Transpose feature.
Transpose allows pitch adjustments by semitones (half steps) for the generated audio. 12 steps of the Transpose parameter value correspond to one octave. Transpose can be adjusted by up to +/- 4 octaves (48 steps up or down). If the Transpose value is left at 0, the pitch will remain the same.
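To get a feel for what a Transpose value means in terms of pitch, semitones can be converted to a frequency ratio. The sketch below assumes standard twelve-tone equal temperament, where each semitone multiplies the frequency by 2^(1/12); the function names are illustrative only.

```python
def transpose_frequency_ratio(semitones):
    """Frequency ratio for a shift of `semitones` half steps.

    ASSUMPTION: standard twelve-tone equal temperament.
    """
    return 2.0 ** (semitones / 12)


def transpose_octaves(semitones):
    """Express a Transpose value in octaves (12 steps per octave)."""
    return semitones / 12


print(transpose_frequency_ratio(12))   # 2.0, one octave up doubles the frequency
print(transpose_frequency_ratio(-12))  # 0.5, one octave down halves it
print(transpose_octaves(48))           # 4.0, the maximum upward shift
```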
Achieving optimal results becomes more straightforward and efficient when certain parameters are considered, particularly when a project is fixed to a specific key. Before processing a vocal track, we recommend taking the following steps:
- Preview the preset by clicking on "▶" (play button).
- Evaluate the best input pitch to find a suitable preset without Transposing the output pitch.
- Use Transpose according to the preset model's vocal range:
  - If the target preset sings in a higher pitch than your input voice track, increase the value of the Transpose parameter.
  - If the target preset sings in a lower pitch than your input voice track, decrease the value of the Transpose parameter.
- Process a small section and evaluate the results before committing to process the entire track.
Note: Transpose values below -12 or above +12 might produce unexpected results. Using Transpose with Drums will have a small impact on the overall sound and is not advised.
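The limits described in this section can be collected into a quick sanity check before processing. This sketch reads the note as applying to values outside the -12 to +12 range, which is an interpretation; the function name and warning texts are hypothetical.

```python
MAX_TRANSPOSE = 48      # +/- 4 octaves, from this guide
RECOMMENDED_LIMIT = 12  # beyond one octave, results might be unexpected


def transpose_warnings(semitones, is_drum_preset=False):
    """Return a list of warnings for a manual Transpose value."""
    if not -MAX_TRANSPOSE <= semitones <= MAX_TRANSPOSE:
        raise ValueError("Transpose must be between -48 and +48 semitones")
    warnings = []
    if abs(semitones) > RECOMMENDED_LIMIT:
        warnings.append("more than 12 semitones: results might be unexpected")
    if is_drum_preset and semitones != 0:
        warnings.append("Transpose is not advised with Drums presets")
    return warnings


print(transpose_warnings(5))
print(transpose_warnings(14, is_drum_preset=True))
```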
By default, an additional Auto-transpose feature is enabled. When it is active, the Transpose knob is unavailable for adjustments, and the plugin automatically detects and applies the optimal Transpose value for the combination of the captured audio and the applied preset.
- For Voice presets, the auto-transpose values can be -12, 0, or +12
- For Creative (instrument) presets, the auto-transpose values can be -24, -12, 0, +12, or +24
To switch back to manual Transpose adjustments, disable the 'Auto' checkbox - manual adjustments will become available again (by default, the last set value of Auto-transpose will be retained).
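Because Auto-transpose only moves in whole octaves, its possible outcomes can be illustrated with a small lookup. The allowed values below come from this guide, but the selection rule (snapping a pitch gap to the nearest allowed value) is purely an assumption for illustration; how the plugin actually detects the optimal value is not described here.

```python
# Allowed auto-transpose values per preset type, from this guide.
AUTO_TRANSPOSE_VALUES = {
    "voice": (-12, 0, 12),
    "creative": (-24, -12, 0, 12, 24),
}


def snap_to_auto_transpose(pitch_gap_semitones, preset_kind):
    """Pick the allowed value closest to the gap between preset and input pitch.

    ASSUMPTION: nearest-value snapping; this is not the plugin's documented method.
    `pitch_gap_semitones` is a hypothetical input: preset pitch minus input pitch.
    """
    allowed = AUTO_TRANSPOSE_VALUES[preset_kind]
    return min(allowed, key=lambda value: abs(value - pitch_gap_semitones))


print(snap_to_auto_transpose(9, "voice"))       # 12
print(snap_to_auto_transpose(-20, "voice"))     # -12
print(snap_to_auto_transpose(-20, "creative"))  # -24
```

The example shows why Creative presets can end up two octaves away from the input while Voice presets never move more than one.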
Creative (instrument) transformation
With the Creative presets you can transform humming and beatboxing into tracks that sound like instruments, discover new ways of generating sounds and melodies, and create demo songs quickly. Here are some ideas to consider:
- Mimic instruments with your voice and transform vocal inputs into realistic instruments to quickly transfer melodic ideas into your DAW or generate creative sounds.
- Turn beatboxing into drums. Record a few bars of beatboxing to create a drum track.
- Transform existing instrument tracks. Convert your guitar solo into a saxophone solo, use your guitar to create a realistic bass guitar track, or use a trumpet track to harmonize, and create an entire brass section of various instruments, and much more.
- Use virtual instruments for creative AI processing.
Input/output audio quality and properties
The SoundID VoiceAI plugin can handle a relatively wide range of recording quality on the input track. Regular phone microphone recordings made in an ordinary room with some reverb are perfectly okay to use: after processing, the output will have the properties of studio-quality audio captured with a great microphone.
There are, however, some limits to take into consideration:
- Repeated AI processing on the same audio capture will not produce identical results. Due to the creative nature of the AI models in SoundID VoiceAI, results will be slightly different each time.
- Excessive reverb on the input audio can lead to melodic artifacts in the output.
- In the Pay-as-you-go mode, it is possible to Reprocess the results for free (limited to 10 times per hour) to minimize excessive artifacts.
- When applied to non-English singing, some amount of English accent might bleed over into the processed voice depending on the preset applied.
- The AI models can sometimes introduce artifacts, such as clipped "s" sounds, into the processing results. This is typically resolved by re-processing or by adjusting the Transpose setting to a value closer to the input track pitch.
- The AI models also work great for normal spoken voice tracks. However, when they are applied to extreme forms of speech such as whispering or shouting, artifacts are possible.
- The intonation of the input voice audio is a key aspect of the AI models. Raspiness in the voice (rough, raspy, strained, or breathy properties) can lead to artifacts in the processing results.
Pay-as-you-go tokens (cloud processing)
The Pay-as-you-go mode enables you to pay with tokens for the processed audio. There are no subscription fees or other hidden charges involved in this mode - you only pay for what you process. Here's how tokens & minutes are calculated in the Pay-as-you-go mode:
- Processing cost: 600 tokens per 1 minute of audio processing.
- A minimum charge of 70 tokens (7 seconds) applies for each processing instance, followed by increments of 10 tokens (1 second).
- Transpose adjustments to an already processed audio capture will require re-processing
- Free reprocessing with different Transpose settings is available. There is a limit of 10 free reprocessing instances, which resets hourly.
Here's a realistic example of tokens spent in a specific scenario - vocal replacement for a full song:
- Capturing a 12-second audio sample of a voice track
- Processing the sample with 5 voice presets and trying 3 different Transpose settings on each preset to find the best fit: 12 x 5 x 3 = 180 seconds (3 minutes) = 1800 tokens
- Processing the entire vocal track of 2.5 minutes = 1500 tokens
- Total processing time and token cost: 5.5 minutes = 3300 tokens
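The arithmetic in this example can be reproduced in a few lines. The sketch uses only the numbers quoted above; like the example, it counts every audition run as paid and does not account for free reprocessing. Each 12-second run costs 120 tokens, which is above the 70-token minimum, so the flat per-minute rate applies.

```python
TOKENS_PER_MINUTE = 600

sample_seconds = 12
presets_tried = 5
transpose_settings_per_preset = 3
full_track_minutes = 2.5

# Auditioning: every preset/Transpose combination processes the 12-second sample.
audition_seconds = sample_seconds * presets_tried * transpose_settings_per_preset
audition_tokens = audition_seconds * TOKENS_PER_MINUTE // 60

# Final pass over the entire vocal track.
full_track_tokens = int(full_track_minutes * TOKENS_PER_MINUTE)

total_minutes = audition_seconds / 60 + full_track_minutes
total_tokens = audition_tokens + full_track_tokens

print(f"Auditioning: {audition_seconds} s = {audition_tokens} tokens")
print(f"Full track:  {full_track_minutes} min = {full_track_tokens} tokens")
print(f"Total:       {total_minutes} min = {total_tokens} tokens")
```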
SoundID VoiceAI token packs can be purchased from your Sonarworks Account:
- Small token pack: 72,000 tokens (120 minutes of audio processing) - 19.99 EUR/USD
- Medium token pack: 180,000 tokens (300 minutes of audio processing) - 39.99 EUR/USD
- Large token pack: 360,000 tokens (600 minutes of audio processing) - 69.99 EUR/USD
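To compare the packs, the listed prices can be converted to a cost per minute of processed audio. This is a simple sketch based on the prices and the 600 tokens per minute rate quoted in this guide; prices may change, so treat the figures as illustrative.

```python
TOKENS_PER_MINUTE = 600

# (tokens, price in EUR/USD) as listed in this guide.
TOKEN_PACKS = {
    "Small": (72_000, 19.99),
    "Medium": (180_000, 39.99),
    "Large": (360_000, 69.99),
}


def pack_summary(tokens, price):
    """Return (minutes of processing, price per minute) for a token pack."""
    minutes = tokens / TOKENS_PER_MINUTE
    return minutes, price / minutes


for name, (tokens, price) in TOKEN_PACKS.items():
    minutes, per_minute = pack_summary(tokens, price)
    print(f"{name}: {minutes:.0f} min, about {per_minute:.3f} per minute")
```

The per-minute cost drops as the pack size grows, from roughly 0.17 for the Small pack to roughly 0.12 for the Large pack.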
A 7-day trial with 9000 free tokens is available in your Sonarworks Account. If you haven't created a Sonarworks Account in the past, sign up here.
Note: The trial tokens will expire once the 7-day trial runs out, or once a token purchase is made.
Learn more about the token system here: SoundID VoiceAI license types: Perpetual and Pay-as-you-go