This step-by-step guide for SoundID VoiceAI covers the entire process: download, installation, setup, the free trial, loading the plugin in your DAW, and the plugin's features and functionality.
What is SoundID VoiceAI?
SoundID VoiceAI is a voice and instrument AI transformation plugin for DAWs. It uses AI to transform a recorded singing voice into the voice of another person or the sound of an instrument:
- Voice model library: transform your vocal track into a realistic singing voice using a studio-grade library of AI voice models
- Instrument model library: transform your melodic humming or beatbox to sound like drums, guitar, strings, or other instruments from a studio-grade library of AI instrument models
With SoundID VoiceAI, you can transform singing and speaking voice tracks, generate choir and backing vocals from a single voice track, mimic instruments with your voice to quickly transfer melodic ideas into your DAW or to generate creative sounds, turn beatboxing into drums, and more.
Download and install
The SoundID VoiceAI plugin runs inside a DAW (e.g. Cubase, Logic Pro X, Pro Tools). Before installing, check the system requirements for using SoundID VoiceAI:
- macOS 11 Big Sur, 12 Monterey, 13 Ventura, 14 Sonoma, 15 Sequoia
- Windows 10, 11
- DAW or other plugin host app that supports AU, AAX, or VST3 plugin formats
- SoundID VoiceAI Perpetual license or Pay-as-you-go tokens registered and available in your Sonarworks Account
- Note: A free 7-day trial is available
- Stable internet connection (offline use not supported)
The SoundID VoiceAI installer can be downloaded here. By default, it installs the plugins in these directories on macOS and Windows:
macOS
Macintosh HD/Library/Audio/Plug-Ins/Components/SoundIDVoiceAI.component
Macintosh HD/Library/Application Support/Avid/Audio/Plug-Ins/SoundIDVoiceAI.aaxplugin
Macintosh HD/Library/Audio/Plug-Ins/VST3/SoundIDVoiceAI.vst3
Windows
C:\Program Files\Common Files\VST3\Sonarworks\SoundIDVoiceAI\SoundIDVoiceAI.vst3
C:\Program Files\Common Files\Avid\Audio\Plug-Ins\SoundIDVoiceAI.aaxplugin\Contents\x64\SoundIDVoiceAI.aaxplugin
Note: Custom locations can optionally be set for the locally stored presets (in the Perpetual mode) and for the audio Cache folder. Learn more here: File locations of SoundID VoiceAI
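As a quick sanity check after installation, the default locations listed above can be verified with a short script. This is a minimal sketch and not part of SoundID VoiceAI: the function name `find_installed_plugins` is made up for illustration, and the macOS paths assume that "Macintosh HD" is the startup volume (mounted at `/`).

```python
import platform
from pathlib import Path

# Default install locations as listed in this guide.
# Assumption: "Macintosh HD" is the startup volume, so paths start at "/".
DEFAULT_PLUGIN_PATHS = {
    "Darwin": [
        "/Library/Audio/Plug-Ins/Components/SoundIDVoiceAI.component",
        "/Library/Application Support/Avid/Audio/Plug-Ins/SoundIDVoiceAI.aaxplugin",
        "/Library/Audio/Plug-Ins/VST3/SoundIDVoiceAI.vst3",
    ],
    "Windows": [
        r"C:\Program Files\Common Files\VST3\Sonarworks\SoundIDVoiceAI\SoundIDVoiceAI.vst3",
        r"C:\Program Files\Common Files\Avid\Audio\Plug-Ins\SoundIDVoiceAI.aaxplugin\Contents\x64\SoundIDVoiceAI.aaxplugin",
    ],
}


def find_installed_plugins(system=None):
    """Map each default plugin path for the given OS to True/False (exists or not)."""
    system = system or platform.system()  # "Darwin" on macOS, "Windows" on Windows
    return {p: Path(p).exists() for p in DEFAULT_PLUGIN_PATHS.get(system, [])}


if __name__ == "__main__":
    for path, present in find_installed_plugins().items():
        print("found  " if present else "missing", path)
```

A "missing" entry does not necessarily indicate a problem: a plugin format may have been deselected during installation, or a custom location may be in use.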
Processing modes and license types
There are two processing modes in SoundID VoiceAI, and they correspond to the two license types. The processing mode toggle switch at the top of the plugin allows quick switching between the modes:
- Perpetual mode: Unlimited local processing; requires a one-time payment for a Perpetual license. A simple and fully-featured license option for most users.
- Pay-as-you-go mode: Online processing; requires Tokens (buy Token packs and spend tokens for the amount of processing used). Suitable for infrequent users.
Learn more about the difference between the Perpetual and Pay-as-you-go modes here: SoundID VoiceAI license types: Perpetual and Pay-as-you-go.
Activate your license/tokens, or start a free trial
To start working with SoundID VoiceAI, install the software (download here) and load the plugin on any voice or instrument track in your DAW project. Proceed to activate your license/tokens, or launch a free trial:
- Log in to your Sonarworks Account, or create a new account
- Navigate to SoundID VoiceAI, and click on Start trial to start a free 7-day trial
- If you have already purchased a Perpetual license or a token pack, activate as follows:
- Perpetual license: Click on Register a new license and enter your activation key
- Pay-as-you-go: Tokens are automatically added to your account balance upon purchase
- Click on Activate > Activate on this device > Open SoundID Download Manager
- Note: Successful activation is indicated by the "Successful login or activation" dialog.
- Return to your DAW - the plugin will be logged in and activated
Note: Fully detailed guides for setting up the trial and activating a license are available here:
Capture audio
Before the target voice or instrument model AI processing can be applied, the input audio of the DAW project track must be captured:
- Click on Capture to Arm the plugin
- Select your DAW playback position and start playback
- Click on Stop to complete the capture
- Click on Remove to delete the last capture and start over
Once the capture is Stopped, the exact audio capture duration and region timestamps will be displayed.
Note: In the Pay-as-you-go mode, the token cost needed for each processing instance will be displayed on the 'Start processing' button.
Important to know when capturing audio
- The audio capture mechanics depend on smooth continuous playback. Don't change the playback position while an audio capture is in progress.
- The positioning of the AI replacement audio will depend on the captured audio region timestamps. Don't change the audio content position on the track after capturing.
- The plugin supports a single audio capture per plugin instance only. If two fragments on the same track need to be captured and processed, there are two solutions:
- Use two plugin instances on the same track
- Capture a single (longer) clip with both fragments
- If loop mode is enabled in your DAW, the capture might become corrupt when the playhead reaches the loop point and jumps back, and re-capturing might be needed.
- There is no "Undo" functionality in the plugin. Any Removed captures that have already been processed with AI replacement audio can only be recovered using the raw audio files generated and stored in the cache folder.
Learn more about the plugin mechanics here: How to use SoundID VoiceAI in your DAW
Select your preset and apply AI processing
- Click on Voices or Creative to select the target voice or instrument preset
- Click on '▶' ("play") to preview how the preset sounds at its best vocal range
- If your source pitch is similar to the preset preview, proceed to Start processing
- If the results sound too high or low, use Transpose to adjust the output pitch in semitones, and process again
- Use the AI voice button to Enable/Disable the transformation on the track
Important to know when processing
- Repeated AI processing of the same audio source will not produce identical results. Due to the creative nature of the AI models in SoundID VoiceAI, results will be slightly different each time.
- It is possible to Reprocess the results for free (limited to 10 times per hour) to minimize excessive artifacts.
- The positioning of the AI replacement audio relies on the captured audio region timestamps. Don't change the audio content position on the track after capturing.
- The plugin supports a single audio capture and processing per plugin instance only. If two fragments on the same track need to be captured and processed, there are two solutions:
- Use two plugin instances on the same track
- Capture a single (longer) clip with both fragments
- If loop mode is enabled in your DAW, the capture might become corrupt when the playhead reaches the loop point and jumps back, and re-capturing might be needed.
- There is no "Undo" functionality in the plugin. Any Removed captures that have already been processed with AI replacement audio can only be recovered using the raw audio files generated and stored in the Cache folder.
Reprocessing in the Pay-as-you-go mode
It is possible to Reprocess the AI processing results for free up to 10 times per hour to minimize excessive artifacts (additional reprocessing will deduct tokens, see below). Free Reprocessing is only available with the same Preset.
After the captured audio has been processed, clicking on the Reprocess button starts processing again with the same source, Preset, and Transpose combination by default. The previous processing result will get overwritten.
Note: If the Reprocessing limit is reached, you will see a message indicating that free Reprocessing is unavailable. If you choose to 'Use tokens', the displayed token amount will be deducted from your token balance.
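To plan a session around the free Reprocessing allowance, the "10 times per hour" rule can be modeled as a simple counter. This is a sketch under stated assumptions only: the guide does not say whether the hour is a rolling window or a fixed clock hour, so a rolling window is assumed here, and the `ReprocessTracker` class is hypothetical rather than anything exposed by the plugin.

```python
from collections import deque

FREE_REPROCESS_LIMIT = 10  # free Reprocess runs per hour, per this guide
WINDOW_SECONDS = 3600      # assumption: a rolling one-hour window


class ReprocessTracker:
    """Hypothetical helper that counts Reprocess runs in the last hour."""

    def __init__(self):
        self._timestamps = deque()  # times (in seconds) of free Reprocess runs

    def free_remaining(self, now):
        """Return how many free Reprocess runs are left at time `now`."""
        # Drop runs that are at least one hour old.
        while self._timestamps and now - self._timestamps[0] >= WINDOW_SECONDS:
            self._timestamps.popleft()
        return FREE_REPROCESS_LIMIT - len(self._timestamps)

    def record_reprocess(self, now):
        """Record a run; return True if it was free, False if it would cost tokens."""
        if self.free_remaining(now) > 0:
            self._timestamps.append(now)
            return True
        return False
```

In this model, the eleventh run within an hour returns `False`, which corresponds to the point where the plugin offers the 'Use tokens' option.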
Preset selection and Transpose
The primary use case for SoundID VoiceAI is transforming a singing voice into a realistic singing voice of another human being. Ideally, the original input should match the best input pitch - see the preset descriptions for what recorded audio pitch will generate the best results. If the natural vocal range difference is significant between the input audio and the applied preset, pitch adjustments can be made with the Transpose feature.
Transpose allows pitch adjustments by semitones (half steps) for the generated audio. 12 steps of the Transpose parameter value correspond to one octave. Transpose can be adjusted by up to +/- 4 octaves (48 steps up or down). If the Transpose value is left at 0, the pitch will remain the same.
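The octave arithmetic can be sketched in a few lines. The frequency ratio uses the standard equal-temperament relation (each semitone multiplies the frequency by 2^(1/12)); whether the plugin applies exactly this relation internally is an assumption, and the function names are illustrative only.

```python
MAX_TRANSPOSE = 48  # +/- 4 octaves, as stated in this guide


def transpose_to_octaves(semitones):
    """Convert a Transpose value (in semitones) to octaves; 12 steps = 1 octave."""
    if not -MAX_TRANSPOSE <= semitones <= MAX_TRANSPOSE:
        raise ValueError(f"Transpose must be within +/-{MAX_TRANSPOSE} semitones")
    return semitones / 12


def frequency_ratio(semitones):
    """Equal-temperament frequency ratio for a shift of `semitones` half steps."""
    return 2 ** (semitones / 12)


if __name__ == "__main__":
    for steps in (-12, 0, 7, 12):
        print(steps, transpose_to_octaves(steps), round(frequency_ratio(steps), 4))
```

For example, a Transpose value of +12 is one octave up and doubles the frequency, while -12 is one octave down and halves it.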
Achieving optimal results becomes more straightforward and efficient when certain parameters are considered, particularly when a project is fixed to a specific key. Before processing a vocal track, we recommend taking the following steps:
- Preview the preset by clicking on "▶" (play button).
- Evaluate the best input pitch to find a suitable preset without Transposing the output pitch.
- Use Transpose according to the preset model's vocal range:
- If the target preset sings in a higher pitch than your input voice track, increase the value of the Transpose parameter.
- If the target preset sings in a lower pitch than your input voice track, decrease the value of the Transpose parameter.
- Process a small section and evaluate the results before committing to process the entire track.
Note: Transpose values below -12 or above +12 might produce unexpected results. Using Transpose with Drums will have a small impact on the overall sound and is not advised.
Auto-transpose
By default, an additional Auto-transpose feature is enabled. When it is active, the Transpose knob is unavailable for adjustments, and the plugin automatically detects and applies the optimal Transpose value for the combination of the captured audio and the applied preset.
- For Voice presets, the auto-transpose values can be -12, 0, or +12
- For Creative (instrument) presets, the auto-transpose values can be -24, -12, 0, +12, or +24
To switch back to manual Transpose adjustments, disable the 'Auto' checkbox - manual adjustments will become available again (by default, the last set value of Auto-transpose will be retained).
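The idea of snapping to whole-octave values can be illustrated with a small sketch. This is not the plugin's actual detection algorithm, which is not documented here. It assumes, purely for illustration, that the input pitch and the preset's best pitch are each summarized as a single MIDI note number, and it picks the allowed shift that brings the two closest together. The function name and both parameters are hypothetical.

```python
# Allowed Auto-transpose values per preset type, as listed in this guide.
AUTO_TRANSPOSE_VALUES = {
    "voice": (-12, 0, 12),
    "creative": (-24, -12, 0, 12, 24),
}


def pick_auto_transpose(input_midi, preset_midi, preset_type="voice"):
    """Pick the allowed octave shift that moves the input closest to the preset.

    `input_midi` and `preset_midi` are hypothetical single-number pitch
    summaries (MIDI note numbers); the real plugin may work differently.
    """
    allowed = AUTO_TRANSPOSE_VALUES[preset_type]
    return min(allowed, key=lambda shift: abs(input_midi + shift - preset_midi))


if __name__ == "__main__":
    # Input around C3 (48), preset sounds best around C4 (60).
    print(pick_auto_transpose(48, 60))
```

Under these assumptions, an input that sits roughly an octave below the preset's best range gets a shift of +12, while an input already close to that range gets 0.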
Unison Mode
The Perpetual mode in SoundID VoiceAI introduces a new AI-powered approach to double tracking called Unison Mode.
- Create natural-sounding double tracks from a single vocal source (up to eight double tracks with a single plugin instance).
- Adjust pitch variations to introduce subtle differences between voices for a more realistic layered effect.
- Control timing shifts between voices using the Timing variance knob to create a more natural feel.
- Use the Width control to spread vocals across the stereo field or keep them centered.
- Enhance vocal production for music, post-production, and sound design.
Note: Unison Mode is available in the Perpetual mode only (offline processing); it is not available for the Pay-as-you-go mode. Learn more about Unison Mode here: Unison Mode in SoundID VoiceAI.
Creative/instrument presets
With the Creative presets, you can transform humming and beatboxing into tracks that sound like instruments, discover new ways of generating sounds and melodies, and create demo songs quickly. Here are some ideas to consider:
- Mimic instruments with your voice and transform vocal inputs into realistic instruments for quick transfer of melodic ideas into your DAW or for creative sound generation.
- Turn beatboxing into drums. Record a few bars of beatboxing to create a drum track.
- Transform existing instrument tracks. Convert your guitar solo into a saxophone solo, use your guitar to create a realistic bass guitar track, or use a trumpet track to harmonize, and create an entire brass section of various instruments, and much more.
- Use virtual instruments for creative AI processing.
Frequently asked questions
How does it work?
SoundID VoiceAI extracts audio information from the source voice track, passes it on to the AI model preset selected in the DAW plugin, and applies the target voice (similar to a virtual instrument). The resulting voice track keeps most of the key melodic properties of the input voice but replaces all the details with sounds generated by the selected target voice model.
Where can I buy SoundID VoiceAI?
The Perpetual license and Token packs for SoundID VoiceAI can be purchased in the Sonarworks Store, see here: SoundID VoiceAI | Pricing.
Can I train and create my own voice models?
No, it is currently not possible to create your own presets/vocal processing models with SoundID VoiceAI. This would require the capability to train and create a voice model based on the vocal data sets/samples of your choosing (for example, your own voice) - such a feature is not available yet.
What are the input/output audio properties and quality?
The SoundID VoiceAI plugin accommodates a relatively wide range of recording quality for the input track. Regular phone microphone recordings made in an untreated room with reverb are fine to use - after processing, the output will have the properties of studio-quality audio captured with a great microphone.
There are, however, some limits to take into consideration:
- Repeated AI processing on the same audio capture will not produce identical results. Due to the creative nature of the AI models in SoundID VoiceAI, results will be slightly different each time.
- Excessive reverb on the input audio can lead to melodic artifacts in the output.
- When applied to non-English singing, some amount of English accent might bleed over into the processed voice, depending on the preset applied.
- The AI models can sometimes introduce artifacts, such as clipped "s" sounds, into the processing results. This is typically resolved by reprocessing or by adjusting the Transpose setting to a value closer to the input track pitch.
- The AI models work great for normal spoken voice tracks, too; however, when applied to extreme emotional states of speech such as whispering or shouting, artifacts are possible.
- The intonation of the input voice audio is a key aspect of the AI models. Raspiness in the voice can lead to artifacts in the processing results (rough, raspy, strained, or breathy properties).