<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>OpenAI&#039;s Whisper Archives - Good Shepherd News - Fastest Growing Religious, Free Speech &amp; Political Content</title>
	<atom:link href="https://goodshepherdmedia.net/tag/openais-whisper/feed/" rel="self" type="application/rss+xml" />
	<link>https://goodshepherdmedia.net/tag/openais-whisper/</link>
	<description>Christian, Political, Social &#38; Legal Free Speech News &#124; Ⓒ2024 Good News Media LLC &#124; Shepherd for the Herd! God 1st Programming</description>
	<lastBuildDate>Sat, 28 Jan 2023 10:34:07 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.1</generator>

<image>
	<url>https://goodshepherdmedia.net/wp-content/uploads/2023/08/Good-Shepherd-News-Logo-150x150.png</url>
	<title>OpenAI&#039;s Whisper Archives - Good Shepherd News - Fastest Growing Religious, Free Speech &amp; Political Content</title>
	<link>https://goodshepherdmedia.net/tag/openais-whisper/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Fixing YouTube Search with OpenAI&#8217;s Whisper</title>
		<link>https://goodshepherdmedia.net/fixing-youtube-search-with-openais-whisper/</link>
		
		<dc:creator><![CDATA[The Truth News]]></dc:creator>
		<pubDate>Wed, 18 Jan 2023 10:27:41 +0000</pubDate>
				<category><![CDATA[Science & Engineering]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[Top Stories]]></category>
		<category><![CDATA[Zee Truthful News]]></category>
		<category><![CDATA[💻Tech History]]></category>
		<category><![CDATA[🤖 AI Artificial Intelligence]]></category>
		<category><![CDATA[🤖Open AI]]></category>
		<category><![CDATA[🤖🗣️Whisper]]></category>
		<category><![CDATA[Open AI]]></category>
		<category><![CDATA[Open AI's Whisper]]></category>
		<category><![CDATA[OpenAI's Whisper]]></category>
		<category><![CDATA[Whisper AI]]></category>
		<guid isPermaLink="false">https://goodshepherdmedia.net/?p=10212</guid>

					<description><![CDATA[Fixing YouTube Search with OpenAI&#8217;s Whisper OpenAI’s Whisper is a new state-of-the-art (SotA) model in speech-to-text. It is able to almost flawlessly transcribe speech across dozens of languages and even handle poor audio quality or excessive background noise. The domain of spoken word has always been somewhat out of reach for ML use-cases. Whisper changes that for [&#8230;]]]></description>
										<content:encoded><![CDATA[<h1>Fixing YouTube Search with OpenAI&#8217;s Whisper</h1>
<p>OpenAI’s <em>Whisper</em> is a new state-of-the-art (SotA) model in speech-to-text. It is able to almost flawlessly transcribe speech across dozens of languages and even handle poor audio quality or excessive background noise.</p>
<p>The domain of spoken word has always been somewhat out of reach for ML use-cases. Whisper changes that for speech-centric use cases. We will demonstrate the power of Whisper alongside other technologies like transformers and vector search by building a new and improved YouTube search.</p>
<p>Search on YouTube is good but has its limitations, especially when it comes to answering questions. With billions of hours of content, there should be an answer to almost every question. Yet, if we have a specific question like <em>“what is OpenAI’s CLIP?&#8221;</em>, instead of a concise answer we get lots of very long videos that we must watch through.</p>
<p>What if all we want is a short 20-second explanation? The current YouTube search has no solution for this. Maybe there’s a good reason to encourage users to watch as much of a video as possible (more ads, anyone?).</p>
<p>Whisper is the solution to this problem <em>and many others involving the spoken word</em>. In this article, we’ll explore the idea behind a better speech-enabled search.</p>
<p><iframe title="How to Use OpenAI Whisper to Fix YouTube Search" width="640" height="360" src="https://www.youtube.com/embed/vpU_6x3jowg?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe></p>
<p><iframe title="How to do Free Speech-to-Text Transcription Better Than Google Premium API with OpenAI Whisper Model" width="640" height="360" src="https://www.youtube.com/embed/msj3wuYf3d8?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe></p>
<hr />
<h2 id="the-idea">The Idea</h2>
<p>We want to get specific timestamps that answer our search queries. YouTube does support time-specific links in videos, so a more precise search with these links should be possible.</p>
<p><small>Timestamp URLs can be copied directly from a video, we can use the same URL format in our search app.</small></p>
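<p>Since YouTube accepts a start time via the <code>?t=</code> query parameter, building these links programmatically is trivial. A minimal sketch (the helper name and example ID are hypothetical):</p>

```python
# Hypothetical helper: build a timestamped YouTube URL from a video ID
# and a start time in seconds, using YouTube's "?t=" query parameter.
def timestamp_url(video_id: str, start_seconds: float) -> str:
    # YouTube expects whole seconds, so truncate fractional starts
    return f"https://youtu.be/{video_id}?t={int(start_seconds)}"

print(timestamp_url("vpU_6x3jowg", 125.7))
# https://youtu.be/vpU_6x3jowg?t=125
```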
<p>To build something like this, we first need to transcribe the audio in our videos to text. YouTube automatically captions every video, and the captions are okay — <em>but</em> OpenAI just open-sourced something called “Whisper”.</p>
<p>Whisper is best described as the GPT-3 or DALL-E 2 of speech-to-text. It’s open source and can transcribe audio in real-time <em>or faster</em> with <em>unparalleled performance</em>. That seems like the most exciting option.</p>
<p>Once we have our transcribed text and the timestamps for each text snippet, we can move on to the <a href="https://www.pinecone.io/learn/question-answering">question-answering (QA)</a> part. QA is a form of search where given a natural language query like <em>“what is OpenAI’s Whisper?&#8221;</em> we can return accurate natural language answers.</p>
<p>We can think of QA as the most intuitive form of searching for information because it is how we ask other people for information. The only difference is that we type the question into a search bar rather than verbally communicate it — for now.</p>
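<p>Under the hood, this kind of QA search compares an embedding of the query against embeddings of every transcribed segment and returns the closest matches. A toy sketch of that retrieval step, with made-up 4-dimensional vectors standing in for the sentence-transformer embeddings and Pinecone index used in the real pipeline:</p>

```python
import numpy as np

# Made-up 4-d "embeddings" of three transcript segments; in the real
# pipeline these come from a sentence-transformer model and are stored
# in a vector database like Pinecone.
segment_vectors = np.array([
    [0.9, 0.1, 0.0, 0.2],   # "Whisper is a speech-to-text model..."
    [0.1, 0.8, 0.3, 0.0],   # "CLIP maps images and text..."
    [0.2, 0.1, 0.9, 0.1],   # "Vector search finds nearest neighbors..."
])
query_vector = np.array([0.85, 0.15, 0.05, 0.1])  # "what is Whisper?"

# cosine similarity between the query and every segment
sims = segment_vectors @ query_vector
sims = sims / (np.linalg.norm(segment_vectors, axis=1)
               * np.linalg.norm(query_vector))

best = int(np.argmax(sims))  # index of the best-matching segment
```

The segment with the highest cosine similarity is returned as the answer, along with its video and timestamp.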
<p>How does all of this look?</p>
<p><img fetchpriority="high" decoding="async" class="" src="https://d33wubrfki0l68.cloudfront.net/b04cbd7e64c2cbfe65bfe1f6b9035e239d845871/2c500/images/openai-whisper-2.png" alt="whisper-architecture" width="787" height="535" /></p>
<p><small>Overview of the process used in our demo. Covering OpenAI’s Whisper, sentence transformers, the Pinecone vector database, and more.</small></p>
<p>Now let’s color in the details and walk through the steps.</p>
<h2 id="video-data">Video Data</h2>
<p>The first step is to download our YouTube video data and extract the audio attached to each video. Fortunately, there’s a Python library for exactly that called <code>pytube</code>.</p>
<p>With <code>pytube</code>, we provide a video ID (found in the URL bar or downloadable if you have a channel). I directly downloaded a summary of channel content, including IDs, titles, publication dates, etc., via YouTube. This same data is available via Hugging Face <em>Datasets</em> in a dataset called <code>jamescalam/channel-metadata</code>.</p>
<p>The full notebook for this step is on GitHub: <a href="https://gist.github.com/jamescalam/b4280d1f40895b0fb10582f326724c21#file-whisper-yt-search-channel-meta-ipynb">whisper-yt-search-channel-meta.ipynb</a></p>
<p>We’re most interested in the <code>Title</code> and <code>Video ID</code> fields. With the video ID, we can begin downloading the videos and saving the audio files with <code>pytube</code>.</p>
<pre><code>from pytube import YouTube  # !pip install pytube
from pytube.exceptions import RegexMatchError
from tqdm.auto import tqdm  # !pip install tqdm

# where to save
save_path = "./mp3"

for i, row in tqdm(videos_meta):
    # URL of the video to be downloaded
    url = f"https://youtu.be/{row['Video ID']}"

    # try to create a YouTube video object
    try:
        yt = YouTube(url)
    except RegexMatchError:
        print(f"RegexMatchError for '{url}'")
        continue

    itag = None
    # we only want audio files
    files = yt.streams.filter(only_audio=True)
    for file in files:
        # from the audio files, grab the first audio/mp4 stream (e.g. mp3)
        if file.mime_type == 'audio/mp4':
            itag = file.itag
            break
    if itag is None:
        # just in case no MP3 audio is found (shouldn't happen)
        print("NO MP3 AUDIO FOUND")
        continue

    # get the correct mp3 'stream'
    stream = yt.streams.get_by_itag(itag)
    # download the audio
    stream.download(
        output_path=save_path,
        filename=f"{row['Video ID']}.mp3"
    )
</code></pre>
<p><small>Full script: <a href="https://gist.github.com/jamescalam/51c7d1cd1ad4e8c4b581f05c59881e39#file-whisper-yt-search-pytube-py">whisper-yt-search-pytube.py</a></small></p>
<p>After this, we should find ~108 audio MP3 files stored in the <code>./mp3</code> directory.</p>
<p><img decoding="async" class="" src="https://d33wubrfki0l68.cloudfront.net/9027b8e7c1bd21239f83d39c1c122da12f679524/8b071/images/openai-whisper-3.png" alt="mp3 files directory" width="1031" height="646" /></p>
<p><small>Downloaded MP3 files in the <code>./mp3</code> directory.</small></p>
<p>With these, we can move on to transcription with OpenAI’s Whisper.</p>
<h2 id="speech-to-text-with-whisper">Speech-to-Text with Whisper</h2>
<p>OpenAI’s Whisper speech-to-text model is completely open source and distributed via the <a href="https://github.com/openai/whisper">Whisper library</a>, which we can <code>pip install</code> directly from GitHub:</p>
<pre><code>!pip install git+https://github.com/openai/whisper.git</code></pre>
<p>Whisper relies on FFmpeg to convert video and audio files. The installation varies by OS [1]; the following commands cover the primary systems:</p>
<pre><code># on Ubuntu or Debian
sudo apt update &amp;&amp; sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
</code></pre>
<p>After installation, we download and initialize the <em>large</em> model, moving it to GPU if CUDA is available.</p>
<pre><code>import whisper
import torch  # install steps: pytorch.org

device = "cuda" if torch.cuda.is_available() else "cpu"

model = whisper.load_model("large").to(device)
</code></pre>
<p><small>Full script: <a href="https://gist.github.com/jamescalam/9b41191c65c029602d85c08043f6f683#file-whisper-yt-search-init-whisper-py">whisper-yt-search-init-whisper.py</a></small></p>
<p>Other model sizes are available and, given a smaller GPU (or even a CPU), should be considered. We transcribe the audio like so:</p>
<p>The full notebook for this step is on GitHub: <a href="https://gist.github.com/jamescalam/8cd9c1112045938dfee0e9954222bc1f#file-whisper-yt-search-transcribe-ipynb">whisper-yt-search-transcribe.ipynb</a></p>
<p>From this, we have a list of ~27K transcribed audio segments, including text alongside start and end seconds. If you are waiting a long time for this to process, a pre-built version of the dataset is available. Download instructions are in the following section.</p>
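<p>Each segment is a small dictionary with (at least) <code>start</code>, <code>end</code>, and <code>text</code> fields, which maps naturally onto timestamped records for our search index. A minimal sketch, with illustrative segment text and a hypothetical video ID:</p>

```python
# Sketch assuming Whisper-style segment dicts (each with 'start',
# 'end', and 'text' keys); the segments and video ID below are
# illustrative, not real transcriptions.
segments = [
    {"start": 0.0, "end": 7.5, "text": " Hi, welcome to the video."},
    {"start": 7.5, "end": 12.2, "text": " Today we're looking at Whisper."},
]
video_id = "abc123xyz"  # hypothetical video ID

# build one record per segment, each with a timestamped YouTube link
records = [
    {
        "id": f"{video_id}-t{int(seg['start'])}",
        "text": seg["text"].strip(),
        "url": f"https://youtu.be/{video_id}?t={int(seg['start'])}",
    }
    for seg in segments
]
```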
<p>The last cell from above is missing the logic required to extract and add the metadata from our <code>videos_dict</code> that we initialized earlier. We add that like so:</p>
<div id="gist118848525" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-build-segments-py" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-python  ">
<div class="js-check-bidi js-blob-code-container blob-code-content">
<table class="highlight tab-size js-file-line-container js-code-nav-container js-tagsearch-file" data-hpc="" data-tab-size="8" data-paste-markdown-skip="" data-tagsearch-lang="Python" data-tagsearch-path="whisper-yt-search-build-segments.py">
<tbody>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L1" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="1"></td>
<td id="file-whisper-yt-search-build-segments-py-LC1" class="blob-code blob-code-inner js-file-line"><span class="pl-s1">data</span> <span class="pl-c1">=</span> []</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L2" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="2"></td>
<td id="file-whisper-yt-search-build-segments-py-LC2" class="blob-code blob-code-inner js-file-line"></td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L3" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="3"></td>
<td id="file-whisper-yt-search-build-segments-py-LC3" class="blob-code blob-code-inner js-file-line"><span class="pl-k">for</span> <span class="pl-s1">i</span>, <span class="pl-s1">path</span> <span class="pl-c1">in</span> <span class="pl-en">enumerate</span>(<span class="pl-en">tqdm</span>(<span class="pl-s1">paths</span>)):</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L4" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="4"></td>
<td id="file-whisper-yt-search-build-segments-py-LC4" class="blob-code blob-code-inner js-file-line">    <span class="pl-s1">_id</span> <span class="pl-c1">=</span> <span class="pl-s1">path</span>.<span class="pl-en">split</span>(<span class="pl-s">&#39;/&#39;</span>)[<span class="pl-c1">-</span><span class="pl-c1">1</span>][:<span class="pl-c1">-</span><span class="pl-c1">4</span>]</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L5" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="5"></td>
<td id="file-whisper-yt-search-build-segments-py-LC5" class="blob-code blob-code-inner js-file-line">    <span class="pl-c"># transcribe to get speech-to-text data</span></td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L6" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="6"></td>
<td id="file-whisper-yt-search-build-segments-py-LC6" class="blob-code blob-code-inner js-file-line">    <span class="pl-s1">result</span> <span class="pl-c1">=</span> <span class="pl-s1">model</span>.<span class="pl-en">transcribe</span>(<span class="pl-s1">path</span>)</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L7" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="7"></td>
<td id="file-whisper-yt-search-build-segments-py-LC7" class="blob-code blob-code-inner js-file-line">    <span class="pl-s1">segments</span> <span class="pl-c1">=</span> <span class="pl-s1">result</span>[<span class="pl-s">&#39;segments&#39;</span>]</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L8" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="8"></td>
<td id="file-whisper-yt-search-build-segments-py-LC8" class="blob-code blob-code-inner js-file-line">    <span class="pl-c"># get the video metadata&#8230;</span></td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L9" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="9"></td>
<td id="file-whisper-yt-search-build-segments-py-LC9" class="blob-code blob-code-inner js-file-line">    <span class="pl-s1">video_meta</span> <span class="pl-c1">=</span> <span class="pl-s1">videos_dict</span>[<span class="pl-s1">_id</span>]</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L10" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="10"></td>
<td id="file-whisper-yt-search-build-segments-py-LC10" class="blob-code blob-code-inner js-file-line">    <span class="pl-k">for</span> <span class="pl-s1">segment</span> <span class="pl-c1">in</span> <span class="pl-s1">segments</span>:</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L11" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="11"></td>
<td id="file-whisper-yt-search-build-segments-py-LC11" class="blob-code blob-code-inner js-file-line">        <span class="pl-c"># merge segments data and videos_meta data</span></td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L12" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="12"></td>
<td id="file-whisper-yt-search-build-segments-py-LC12" class="blob-code blob-code-inner js-file-line">        <span class="pl-s1">meta</span> <span class="pl-c1">=</span> {</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L13" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="13"></td>
<td id="file-whisper-yt-search-build-segments-py-LC13" class="blob-code blob-code-inner js-file-line">            <span class="pl-c1">**</span><span class="pl-s1">video_meta</span>,</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L14" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="14"></td>
<td id="file-whisper-yt-search-build-segments-py-LC14" class="blob-code blob-code-inner js-file-line">            <span class="pl-c1">**</span>{</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L15" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="15"></td>
<td id="file-whisper-yt-search-build-segments-py-LC15" class="blob-code blob-code-inner js-file-line">                <span class="pl-s">&quot;id&quot;</span>: <span class="pl-s">f&quot;<span class="pl-s1"><span class="pl-kos">{</span>_id<span class="pl-kos">}</span></span>-t<span class="pl-s1"><span class="pl-kos">{</span>segment[&#39;start&#39;]<span class="pl-kos">}</span></span>&quot;</span>,</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L16" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="16"></td>
<td id="file-whisper-yt-search-build-segments-py-LC16" class="blob-code blob-code-inner js-file-line">                <span class="pl-s">&quot;text&quot;</span>: <span class="pl-s1">segment</span>[<span class="pl-s">&quot;text&quot;</span>].<span class="pl-en">strip</span>(),</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L17" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="17"></td>
<td id="file-whisper-yt-search-build-segments-py-LC17" class="blob-code blob-code-inner js-file-line">                <span class="pl-s">&quot;start&quot;</span>: <span class="pl-s1">segment</span>[<span class="pl-s">&#39;start&#39;</span>],</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L18" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="18"></td>
<td id="file-whisper-yt-search-build-segments-py-LC18" class="blob-code blob-code-inner js-file-line">                <span class="pl-s">&quot;end&quot;</span>: <span class="pl-s1">segment</span>[<span class="pl-s">&#39;end&#39;</span>]</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L19" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="19"></td>
<td id="file-whisper-yt-search-build-segments-py-LC19" class="blob-code blob-code-inner js-file-line">            }</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L20" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="20"></td>
<td id="file-whisper-yt-search-build-segments-py-LC20" class="blob-code blob-code-inner js-file-line">        }</td>
</tr>
<tr>
<td id="file-whisper-yt-search-build-segments-py-L21" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="21"></td>
<td id="file-whisper-yt-search-build-segments-py-LC21" class="blob-code blob-code-inner js-file-line">        <span class="pl-s1">data</span>.<span class="pl-en">append</span>(<span class="pl-s1">meta</span>)</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/4e6e978b5dcf7c4277d46f5f4a74798f/raw/7c18e20e8895e6cf46f11497a2a87e0799aad226/whisper-yt-search-build-segments.py">view raw</a><a href="https://gist.github.com/jamescalam/4e6e978b5dcf7c4277d46f5f4a74798f#file-whisper-yt-search-build-segments-py">whisper-yt-search-build-segments.py </a>hosted with <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/2764.png" alt="❤" class="wp-smiley" style="height: 1em; max-height: 1em;" /> by <a href="https://github.com/">GitHub</a></div>
</div>
</div>
<p>After processing all of the segments, we save them to a JSON Lines file with:</p>
<div id="gist118848786" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-save-transcriptions-py" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-python  ">
<div class="js-check-bidi js-blob-code-container blob-code-content">
<table class="highlight tab-size js-file-line-container js-code-nav-container js-tagsearch-file" data-hpc="" data-tab-size="8" data-paste-markdown-skip="" data-tagsearch-lang="Python" data-tagsearch-path="whisper-yt-search-save-transcriptions.py">
<tbody>
<tr>
<td id="file-whisper-yt-search-save-transcriptions-py-L1" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="1"></td>
<td id="file-whisper-yt-search-save-transcriptions-py-LC1" class="blob-code blob-code-inner js-file-line"><span class="pl-k">import</span> <span class="pl-s1">json</span></td>
</tr>
<tr>
<td id="file-whisper-yt-search-save-transcriptions-py-L2" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="2"></td>
<td id="file-whisper-yt-search-save-transcriptions-py-LC2" class="blob-code blob-code-inner js-file-line"></td>
</tr>
<tr>
<td id="file-whisper-yt-search-save-transcriptions-py-L3" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="3"></td>
<td id="file-whisper-yt-search-save-transcriptions-py-LC3" class="blob-code blob-code-inner js-file-line"><span class="pl-k">with</span> <span class="pl-en">open</span>(<span class="pl-s">&quot;youtube-transcriptions.jsonl&quot;</span>, <span class="pl-s">&quot;w&quot;</span>, <span class="pl-s1">encoding</span><span class="pl-c1">=</span><span class="pl-s">&quot;utf-8&quot;</span>) <span class="pl-k">as</span> <span class="pl-s1">fp</span>:</td>
</tr>
<tr>
<td id="file-whisper-yt-search-save-transcriptions-py-L4" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="4"></td>
<td id="file-whisper-yt-search-save-transcriptions-py-LC4" class="blob-code blob-code-inner js-file-line">    <span class="pl-k">for</span> <span class="pl-s1">line</span> <span class="pl-c1">in</span> <span class="pl-en">tqdm</span>(<span class="pl-s1">data</span>):</td>
</tr>
<tr>
<td id="file-whisper-yt-search-save-transcriptions-py-L5" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="5"></td>
<td id="file-whisper-yt-search-save-transcriptions-py-LC5" class="blob-code blob-code-inner js-file-line">        <span class="pl-s1">json</span>.<span class="pl-en">dump</span>(<span class="pl-s1">line</span>, <span class="pl-s1">fp</span>)</td>
</tr>
<tr>
<td id="file-whisper-yt-search-save-transcriptions-py-L6" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="6"></td>
<td id="file-whisper-yt-search-save-transcriptions-py-LC6" class="blob-code blob-code-inner js-file-line">        <span class="pl-s1">fp</span>.<span class="pl-en">write</span>(<span class="pl-s">&#39;<span class="pl-cce">\n</span>&#39;</span>)</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/e645ea4499e3e7984d4392325566573e/raw/6a0f2547e80e4fea7890719bb455ace3f7bc1d40/whisper-yt-search-save-transcriptions.py">view raw</a><a href="https://gist.github.com/jamescalam/e645ea4499e3e7984d4392325566573e#file-whisper-yt-search-save-transcriptions-py">whisper-yt-search-save-transcriptions.py </a>hosted with <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/2764.png" alt="❤" class="wp-smiley" style="height: 1em; max-height: 1em;" /> by <a href="https://github.com/">GitHub</a></div>
</div>
</div>
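<p>Each line of the resulting file is a standalone JSON object, so it can be read back with one <code>json.loads</code> per line. A minimal, self-contained sketch (the two sample records below are hypothetical, shaped like the segments built above):</p>

```python
import json

# two hypothetical segment records in the shape the earlier loop produces
sample = [
    {"id": "abcd-t0.0", "text": "Hi, welcome to the video.", "start": 0.0, "end": 9.4},
    {"id": "abcd-t9.4", "text": "Today we look at Whisper.", "start": 9.4, "end": 15.0},
]

# write one JSON object per line (JSON Lines format)
with open("youtube-transcriptions.jsonl", "w", encoding="utf-8") as fp:
    for line in sample:
        json.dump(line, fp)
        fp.write("\n")

# read it back: one json.loads per line
with open("youtube-transcriptions.jsonl", encoding="utf-8") as fp:
    records = [json.loads(line) for line in fp]
```

The format keeps each record independent, so very large files can be streamed line by line rather than loaded in one go.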
<p>With that ready, let’s build the QA embeddings and vector search component.</p>
<h2 id="question-answering">Question-Answering</h2>
<p>On Hugging Face <em>Datasets</em>, you can find the data I scraped in a dataset called <code>jamescalam/youtube-transcriptions</code>:</p>
<div id="gist118848889" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-get-transcriptions-ipynb" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-jupyter-notebook  ">
<div class="render-wrapper ">
<div class="render-container is-render-pending js-render-target " data-identity="6605bf6a-ed33-423c-b81e-63895d87dff6" data-host="https://notebooks.githubusercontent.com" data-type="ipynb"><iframe class="render-viewer " title="File display" src="https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&amp;commit=8fbb91a064a2898da35d4c095ca51a93487421f1&amp;enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f6a616d657363616c616d2f33653531376332656535666166666232626364626539663732376539643232372f7261772f386662623931613036346132383938646133356434633039356361353161393334383734323166312f776869737065722d79742d7365617263682d6765742d7472616e736372697074696f6e732e6970796e62&amp;logged_in=false&amp;nwo=jamescalam%2F3e517c2ee5faffb2bcdbe9f727e9d227&amp;path=whisper-yt-search-get-transcriptions.ipynb&amp;repository_id=118848889&amp;repository_type=Gist#6605bf6a-ed33-423c-b81e-63895d87dff6" name="6605bf6a-ed33-423c-b81e-63895d87dff6" sandbox="allow-scripts allow-same-origin allow-top-navigation" data-mce-fragment="1"></iframe></div>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/3e517c2ee5faffb2bcdbe9f727e9d227/raw/8fbb91a064a2898da35d4c095ca51a93487421f1/whisper-yt-search-get-transcriptions.ipynb">view raw</a><a href="https://gist.github.com/jamescalam/3e517c2ee5faffb2bcdbe9f727e9d227#file-whisper-yt-search-get-transcriptions-ipynb">whisper-yt-search-get-transcriptions.ipynb </a>hosted with <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/2764.png" alt="❤" class="wp-smiley" style="height: 1em; max-height: 1em;" /> by <a href="https://github.com/">GitHub</a></div>
</div>
</div>
<p>For now, the dataset only contains videos from my personal channel, but I will add more videos from other ML-focused channels in the future.</p>
<p>Each record in the data includes a short chunk of text (the transcribed audio). On its own, each chunk carries relatively little meaning:</p>
<div id="gist118848859" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-short-segments-ipynb" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-jupyter-notebook  ">
<div class="render-wrapper ">
<div class="render-container is-render-pending js-render-target " data-identity="a43b2443-da86-43b9-bb97-c2d9bdface4d" data-host="https://notebooks.githubusercontent.com" data-type="ipynb"><iframe class="render-viewer " title="File display" src="https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&amp;commit=5548ec17e93f0f54cb0b8b1f968709a3fa3cc597&amp;enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f6a616d657363616c616d2f64326562653032376336636438313236636166643362346632643338353235612f7261772f353534386563313765393366306635346362306238623166393638373039613366613363633539372f776869737065722d79742d7365617263682d73686f72742d7365676d656e74732e6970796e62&amp;logged_in=false&amp;nwo=jamescalam%2Fd2ebe027c6cd8126cafd3b4f2d38525a&amp;path=whisper-yt-search-short-segments.ipynb&amp;repository_id=118848859&amp;repository_type=Gist#a43b2443-da86-43b9-bb97-c2d9bdface4d" name="a43b2443-da86-43b9-bb97-c2d9bdface4d" sandbox="allow-scripts allow-same-origin allow-top-navigation" data-mce-fragment="1"></iframe></div>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/d2ebe027c6cd8126cafd3b4f2d38525a/raw/5548ec17e93f0f54cb0b8b1f968709a3fa3cc597/whisper-yt-search-short-segments.ipynb">view raw</a><a href="https://gist.github.com/jamescalam/d2ebe027c6cd8126cafd3b4f2d38525a#file-whisper-yt-search-short-segments-ipynb">whisper-yt-search-short-segments.ipynb </a>hosted with <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/2764.png" alt="❤" class="wp-smiley" style="height: 1em; max-height: 1em;" /> by <a href="https://github.com/">GitHub</a></div>
</div>
</div>
<p>Ideally, we want chunks of text 4-6x larger than this to capture enough meaning to be helpful. We do this by simply iterating over the dataset and merging every <em>six</em> segments.</p>
<div id="gist118848939" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-longer-segments-ipynb" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-jupyter-notebook  ">
<div class="render-wrapper ">
<div class="render-container is-render-pending js-render-target " data-identity="5ecd865a-1dd5-4a85-93ba-e6cf986307da" data-host="https://notebooks.githubusercontent.com" data-type="ipynb"><iframe class="render-viewer " title="File display" src="https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&amp;commit=9ab91b16bf41870cc4949fd2107c8b8295aa03cb&amp;enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f6a616d657363616c616d2f31376466343031333364313163336332356161396634303435633964313134352f7261772f396162393162313662663431383730636334393439666432313037633862383239356161303363622f776869737065722d79742d7365617263682d6c6f6e6765722d7365676d656e74732e6970796e62&amp;logged_in=false&amp;nwo=jamescalam%2F17df40133d11c3c25aa9f4045c9d1145&amp;path=whisper-yt-search-longer-segments.ipynb&amp;repository_id=118848939&amp;repository_type=Gist#5ecd865a-1dd5-4a85-93ba-e6cf986307da" name="5ecd865a-1dd5-4a85-93ba-e6cf986307da" sandbox="allow-scripts allow-same-origin allow-top-navigation" data-mce-fragment="1"></iframe></div>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/17df40133d11c3c25aa9f4045c9d1145/raw/9ab91b16bf41870cc4949fd2107c8b8295aa03cb/whisper-yt-search-longer-segments.ipynb">view raw</a><a href="https://gist.github.com/jamescalam/17df40133d11c3c25aa9f4045c9d1145#file-whisper-yt-search-longer-segments-ipynb">whisper-yt-search-longer-segments.ipynb </a>hosted with <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/2764.png" alt="❤" class="wp-smiley" style="height: 1em; max-height: 1em;" /> by <a href="https://github.com/">GitHub</a></div>
</div>
</div>
<p>A few things are happening here. First, we’re merging every six segments, as explained before. However, merging alone would still cut related text wherever two chunks meet.</p>
<p><img decoding="async" class="" src="https://d33wubrfki0l68.cloudfront.net/a0373a295dbbbe3c3ca686687b47a4e6c1aba11b/0c6f2/images/openai-whisper-4.png" alt="window-no-overlap" width="827" height="364" /></p>
<p><small>Even when merging segments, we’re still left with points where we must split the text (annotated with the red cross-mark above). This can cause us to miss important information.</small></p>
<p>A common technique to avoid cutting between related segments is to add some <em>overlap</em>, controlled by a <code>stride</code>. At each step, we move <em>three</em> segments forward while merging <em>six</em> segments. By doing this, any meaningful text cut at a boundary in one step will be included whole in the next.</p>
<p><img loading="lazy" decoding="async" class="" src="https://d33wubrfki0l68.cloudfront.net/06f2cfe89c666aeddb1204b9741ae8a964460fb3/f6190/images/openai-whisper-5.png" alt="window-overlap" width="829" height="362" /></p>
<p><small>We can avoid this loss of meaning by adding an overlap when merging segments. It produces more data but makes us much less likely to cut between related segments.</small></p>
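<p>The windowing just described can be sketched in plain Python. The helper name <code>merge_segments</code> and the toy segment list are illustrative rather than the notebook’s exact code, but the window of six and stride of three match the description above:</p>

```python
def merge_segments(segments, window=6, stride=3):
    """Merge transcript segments into overlapping chunks of `window`
    segments, stepping forward `stride` segments each time."""
    chunks = []
    for i in range(0, len(segments), stride):
        batch = segments[i:i + window]
        chunks.append({
            "text": " ".join(s["text"] for s in batch),
            "start": batch[0]["start"],   # chunk inherits first segment's start
            "end": batch[-1]["end"],      # ...and last segment's end
        })
    return chunks

# toy segments: twelve one-second snippets
segments = [{"text": f"s{i}", "start": i, "end": i + 1} for i in range(12)]
chunks = merge_segments(segments)
```

Because the stride is half the window, every boundary in one chunk falls in the middle of a neighboring chunk, so no adjacent pair of segments is ever split in <em>both</em> chunks that contain it.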
<p>With this, we have larger and more meaningful chunks of text. Now we need to encode them with a QA embedding model. Many high-performing, pretrained QA models are available via Hugging Face <em>Transformers</em> and the <em>Sentence Transformers</em> library. We will use one called <a href="https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1"><code>multi-qa-mpnet-base-dot-v1</code></a>.</p>
<div id="gist118848976" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-init-encoder-ipynb" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-jupyter-notebook  ">
<div class="render-wrapper ">
<div class="render-container is-render-pending js-render-target " data-identity="2e2ede93-df7b-44ce-b037-8355f0809b1b" data-host="https://notebooks.githubusercontent.com" data-type="ipynb"><iframe class="render-viewer " title="File display" src="https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&amp;commit=7b8b0b3094761e66ebf905ddf933be9872f644f6&amp;enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f6a616d657363616c616d2f30643266616337326636643630613339666666343737613163326364663362612f7261772f376238623062333039343736316536366562663930356464663933336265393837326636343466362f776869737065722d79742d7365617263682d696e69742d656e636f6465722e6970796e62&amp;logged_in=false&amp;nwo=jamescalam%2F0d2fac72f6d60a39fff477a1c2cdf3ba&amp;path=whisper-yt-search-init-encoder.ipynb&amp;repository_id=118848976&amp;repository_type=Gist#2e2ede93-df7b-44ce-b037-8355f0809b1b" name="2e2ede93-df7b-44ce-b037-8355f0809b1b" sandbox="allow-scripts allow-same-origin allow-top-navigation" data-mce-fragment="1"></iframe></div>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/0d2fac72f6d60a39fff477a1c2cdf3ba/raw/7b8b0b3094761e66ebf905ddf933be9872f644f6/whisper-yt-search-init-encoder.ipynb">view raw</a><a href="https://gist.github.com/jamescalam/0d2fac72f6d60a39fff477a1c2cdf3ba#file-whisper-yt-search-init-encoder-ipynb">whisper-yt-search-init-encoder.ipynb </a>hosted with <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/2764.png" alt="❤" class="wp-smiley" style="height: 1em; max-height: 1em;" /> by <a href="https://github.com/">GitHub</a></div>
</div>
</div>
<p>Using this model, we can encode a passage of text into a <em>meaningful</em> 768-dimensional vector with <code>model.encode("&lt;some text&gt;")</code>. Encoding all of our segments at once, or storing them all locally, would require too much compute and memory, so we first initialize the vector database where they will be stored:</p>
<div id="gist118849004" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-init-pinecone-ipynb" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-jupyter-notebook  ">
<div class="render-wrapper ">
<div class="render-container is-render-pending js-render-target " data-identity="38e5e5c8-4247-42f6-946a-56879b89ede5" data-host="https://notebooks.githubusercontent.com" data-type="ipynb"><iframe class="render-viewer " title="File display" src="https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&amp;commit=a11d7c1d105515aa0327dcc0a42fed4c2d7b77f2&amp;enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f6a616d657363616c616d2f65656261613262643435346635383237623736343330386137313437373436632f7261772f613131643763316431303535313561613033323764636330613432666564346332643762373766322f776869737065722d79742d7365617263682d696e69742d70696e65636f6e652e6970796e62&amp;logged_in=false&amp;nwo=jamescalam%2Feebaa2bd454f5827b764308a7147746c&amp;path=whisper-yt-search-init-pinecone.ipynb&amp;repository_id=118849004&amp;repository_type=Gist#38e5e5c8-4247-42f6-946a-56879b89ede5" name="38e5e5c8-4247-42f6-946a-56879b89ede5" sandbox="allow-scripts allow-same-origin allow-top-navigation" data-mce-fragment="1"></iframe></div>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/eebaa2bd454f5827b764308a7147746c/raw/a11d7c1d105515aa0327dcc0a42fed4c2d7b77f2/whisper-yt-search-init-pinecone.ipynb">view raw</a><a href="https://gist.github.com/jamescalam/eebaa2bd454f5827b764308a7147746c#file-whisper-yt-search-init-pinecone-ipynb">whisper-yt-search-init-pinecone.ipynb </a>hosted with <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/2764.png" alt="❤" class="wp-smiley" style="height: 1em; max-height: 1em;" /> by <a href="https://github.com/">GitHub</a></div>
</div>
</div>
<p>We should see that the index (vector database) is currently empty with a <code>total_vector_count</code> of <code>0</code>. Now we can begin encoding our segments and inserting the embeddings (and metadata) into our index.</p>
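<p>The indexing step batches the work. A rough sketch of the loop, with the encoder and the index stubbed out so it runs standalone (in the notebook these stand-ins would be <code>model.encode</code> and <code>index.upsert</code>; the helper name below is illustrative):</p>

```python
def upsert_in_batches(chunks, encode, upsert, batch_size=64):
    """Encode chunk texts and upsert (id, vector, metadata) tuples in batches."""
    total = 0
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        # model.encode in the notebook; any callable texts -> vectors works here
        vectors = encode([c["text"] for c in batch])
        # index.upsert in the notebook; here, any callable taking the tuples
        upsert([
            (c["id"], vec, {"text": c["text"], "start": c["start"], "end": c["end"]})
            for c, vec in zip(batch, vectors)
        ])
        total += len(batch)
    return total

# stand-ins so the sketch runs without a model or a vector database
store = []
chunks = [{"id": f"v-t{i}", "text": f"chunk {i}", "start": i, "end": i + 1}
          for i in range(150)]
total = upsert_in_batches(
    chunks,
    encode=lambda texts: [[0.0] * 768 for _ in texts],  # fake 768-d embeddings
    upsert=store.extend,
)
```

Batching keeps memory bounded: only <code>batch_size</code> embeddings exist at any moment, and each network round-trip carries a full batch instead of one vector.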
<div id="gist118849042" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-index-vecs-ipynb" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-jupyter-notebook  ">
<div class="render-wrapper ">
<div class="render-container is-render-pending js-render-target " data-identity="e0517d3d-2c66-455a-b588-75842ec40888" data-host="https://notebooks.githubusercontent.com" data-type="ipynb"><iframe class="render-viewer " title="File display" src="https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&amp;commit=c582150a54ba8fe07d9e5e4b702e730b070b9bd2&amp;enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f6a616d657363616c616d2f34386134316239363663303634646164656431663638636561393164336638362f7261772f633538323135306135346261386665303764396535653462373032653733306230373062396264322f776869737065722d79742d7365617263682d696e6465782d766563732e6970796e62&amp;logged_in=false&amp;nwo=jamescalam%2F48a41b966c064daded1f68cea91d3f86&amp;path=whisper-yt-search-index-vecs.ipynb&amp;repository_id=118849042&amp;repository_type=Gist#e0517d3d-2c66-455a-b588-75842ec40888" name="e0517d3d-2c66-455a-b588-75842ec40888" sandbox="allow-scripts allow-same-origin allow-top-navigation" data-mce-fragment="1"></iframe></div>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/48a41b966c064daded1f68cea91d3f86/raw/c582150a54ba8fe07d9e5e4b702e730b070b9bd2/whisper-yt-search-index-vecs.ipynb">view raw</a><a href="https://gist.github.com/jamescalam/48a41b966c064daded1f68cea91d3f86#file-whisper-yt-search-index-vecs-ipynb">whisper-yt-search-index-vecs.ipynb </a>hosted with <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/2764.png" alt="❤" class="wp-smiley" style="height: 1em; max-height: 1em;" /> by <a href="https://github.com/">GitHub</a></div>
</div>
</div>
<p>That is everything needed to prepare our data and add it to the vector database. All that is left is querying and returning results.</p>
<h2 id="making-queries">Making Queries</h2>
<p>Queries are straightforward to make; we:</p>
<ol>
<li>Encode the query using the same embedding model we used to encode the segments.</li>
<li>Pass the query to our index.</li>
</ol>
<p>We do that with the following:</p>
<div id="gist118849355" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-query-ipynb" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-jupyter-notebook  ">
<div class="render-wrapper ">
<div class="render-container is-render-pending js-render-target " data-identity="2c15dc64-c8b8-45f6-bfe4-c3cb9423181e" data-host="https://notebooks.githubusercontent.com" data-type="ipynb"><iframe class="render-viewer " title="File display" src="https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&amp;commit=2a439b5f3964c0d504b2478b7477b2ce247377bc&amp;enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f6a616d657363616c616d2f32396235656564346135303338636436363533313135663434633332353761322f7261772f326134333962356633393634633064353034623234373862373437376232636532343733373762632f776869737065722d79742d7365617263682d71756572792e6970796e62&amp;logged_in=false&amp;nwo=jamescalam%2F29b5eed4a5038cd6653115f44c3257a2&amp;path=whisper-yt-search-query.ipynb&amp;repository_id=118849355&amp;repository_type=Gist#2c15dc64-c8b8-45f6-bfe4-c3cb9423181e" name="2c15dc64-c8b8-45f6-bfe4-c3cb9423181e" sandbox="allow-scripts allow-same-origin allow-top-navigation" data-mce-fragment="1"></iframe></div>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/29b5eed4a5038cd6653115f44c3257a2/raw/2a439b5f3964c0d504b2478b7477b2ce247377bc/whisper-yt-search-query.ipynb">view raw</a><a href="https://gist.github.com/jamescalam/29b5eed4a5038cd6653115f44c3257a2#file-whisper-yt-search-query-ipynb">whisper-yt-search-query.ipynb </a>hosted with <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/2764.png" alt="❤" class="wp-smiley" style="height: 1em; max-height: 1em;" /> by <a href="https://github.com/">GitHub</a></div>
</div>
</div>
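<p>Conceptually, the query step reduces to: embed the question, score it against every stored vector by dot product, and return the best matches. A toy, self-contained version (a real query would call <code>model.encode</code> and <code>index.query</code>; the bag-of-words encoder and tiny vocabulary here are stand-ins):</p>

```python
def query_index(question, encode, vectors, top_k=2):
    """Return the top_k stored items ranked by dot product with the query vector."""
    q = encode(question)
    scored = [(sum(a * b for a, b in zip(q, vec)), meta) for vec, meta in vectors]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [meta for _, meta in scored[:top_k]]

# toy encoder: word counts over a tiny fixed vocabulary
vocab = ["clip", "vector", "search", "transformer"]
encode = lambda text: [text.lower().count(w) for w in vocab]

# three "indexed" chunks with their metadata
vectors = [
    (encode("openai clip is a multi-modal model"), {"id": "a-t0"}),
    (encode("vector search with dense embeddings"), {"id": "b-t0"}),
    (encode("training a sentence transformer"), {"id": "c-t0"}),
]
results = query_index("what is openai's clip?", encode, vectors)
```

The chunk mentioning CLIP scores highest, which mirrors how the dense embeddings rank transcript chunks against the question.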
<p>These results are relevant to the question; three, in particular, come from a similar location in the same video. Still, we might want a search interface that is more user-friendly than a Jupyter notebook.</p>
<p>One of the easiest ways to get a web-based search UI up and running is with Hugging Face <em>Spaces</em> and Streamlit (or Gradio if preferred).</p>
<p>We won’t go through the code here, but if you’re familiar with Streamlit, you can build a search app quite easily within a few hours. Or you can use our example and do it in 5-10 minutes.</p>
<p>When querying again for <code>"what is OpenAI's clip?"</code> we can see that multiple results from a single video are merged. With this, we can jump to each segment by clicking on the part of the text that is most interesting to us.</p>
<p>Try a few more queries like:</p>
<pre><code>What is the best unsupervised method to train a sentence transformer?
What is vector search?
How can I train a sentence transformer with little-to-no data?</code></pre>
<hr />
<p>We can build incredible speech-enabled search apps very quickly using Whisper alongside Hugging Face, sentence transformers, and Pinecone’s <a href="https://www.pinecone.io/learn/vector-database">vector database</a>.</p>
<p>Whisper has unlocked an entire modality — the spoken word — and it’s only a matter of time before we see a significant increase in speech-enabled search and other speech-centric use cases.</p>
<p>Both machine learning and vector search have seen exponential growth in recent years. These technologies already seem like sci-fi, and despite the incredible performance of everything we used here, it’s only a matter of time before all of this gets <em>even better</em>.</p>
<p><a href="https://www.pinecone.io/learn/openai-whisper/" target="_blank" rel="noopener">source</a></p>
<p><iframe title="How to do Free Speech-to-Text Transcription Better Than Google Premium API with OpenAI Whisper Model" width="640" height="360" src="https://www.youtube.com/embed/msj3wuYf3d8?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe></p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
