SAM Audio
SAM Audio uses Meta's Segment Anything Audio Model to isolate vocals, instruments, speech, and effects from mixed audio via multimodal prompts (text, visual, or time-span). It produces paired target and residual stems at the original sample rate, suitable for music production, post-production, and research.
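The "target and residual" pairing means the two stems partition the mix: summing them recovers the original signal. A minimal numerical sketch of that relationship (plain Python with illustrative sample values; no actual separation model or SAM Audio API is invoked):

```python
# Illustrative sample values standing in for audio samples
# at the original sample rate; purely a toy example.
mix = [0.5, -0.2, 0.8, 0.1]

# Pretend a prompt (e.g. "vocals") selected this target stem.
target = [0.3, -0.1, 0.6, 0.0]

# The residual stem is everything the prompt did not select.
residual = [m - t for m, t in zip(mix, target)]

# Reconstruction: summing the two stems recovers the mix.
reconstructed = [t + r for t, r in zip(target, residual)]
assert all(abs(m - x) < 1e-12 for m, x in zip(mix, reconstructed))

print([round(r, 2) for r in residual])  # [0.2, -0.1, 0.2, 0.1]
```

Because the residual is defined as mix minus target, nothing in the original audio is lost: any content not captured by the prompt lands in the residual stem.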
Use Cases
- Create isolated vocal, instrument, and effect stems from mixed tracks using SAM Audio's multimodal prompts (text, visual, time-span), producing target and residual stems at the original sample rate for remixing, mastering, and collaborative production.
- Prepare clean, intelligible dialogue for podcasts and film post-production: use time-span and visual selection to extract speech precisely, remove background ambience, and deliver stems ready for ADR, noise reduction, and loudness-compliant delivery.
- Build reusable instrument samples and sound-design assets, or run large-scale audio research locally: prompt-driven, offline inference extracts specific sound events, builds sample libraries, and analyzes mixes while preserving privacy.