AI Deep Dive Nov 11: State-of-the-art Audio Models – Speech to Text, but what else?

Posted on Monday, November 7, 2022 in Newsletter.

Whisper is OpenAI’s new addition to the growing portfolio of open source artificial intelligence (AI) models for speech-to-text and audio processing. Trained to predict text captions for audio, the model uses text tokens intermixed with special tokens, which enables a single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.
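For readers curious how special tokens can steer one model toward different tasks, here is a minimal sketch in plain Python. The token strings follow the format described in the Whisper paper; the helper `build_task_prefix` and its parameters are hypothetical, made up for illustration, and are not part of any Whisper library.

```python
# Illustrative sketch only: assembles the kind of special-token prefix that
# tells a Whisper-style decoder which task to perform. The helper name and
# signature are hypothetical; only the token strings come from the paper.

def build_task_prefix(language, task="transcribe", timestamps=False):
    """Return the list of special tokens that would precede the text tokens."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")

    tokens = ["<|startoftranscript|>"]
    # Language slot: for language identification, the model itself predicts
    # this token; here the caller supplies it (e.g. "en", "fr").
    tokens.append(f"<|{language}|>")
    # Task slot: same-language transcription or to-English translation.
    tokens.append(f"<|{task}|>")
    # Timestamps are predicted unless this token switches them off.
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens


if __name__ == "__main__":
    print(build_task_prefix("en"))
    print(build_task_prefix("fr", task="translate", timestamps=True))
```

Changing one token in the prefix switches the task, which is why a single set of model weights can cover all four capabilities listed above.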

Given its architecture, what other tasks might it be able to perform? What research objectives do you have that the model could accelerate? Let’s discuss!

Join us Friday, November 11th for an AI Deep Dive on Whisper and other audio models, an open discussion hosted by Data Science Institute (DSI) data scientists. Faculty, researchers, and students at all levels and from all disciplines are welcome to join and chat over a full popcorn bar. The weekly deep dives are held at the VU Data Science Institute (DSI), 1400 18th Avenue S, Suite 2000, on the main floor.

We hope to see you there!
