<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Open AI Archives - Good Shepherd News - Fastest Growing Religious, Free Speech &amp; Political Content</title>
	<atom:link href="https://goodshepherdmedia.net/tag/open-ai/feed/" rel="self" type="application/rss+xml" />
	<link>https://goodshepherdmedia.net/tag/open-ai/</link>
	<description>Christian, Political, Social &#38; Legal Free Speech News &#124; Ⓒ2024 Good News Media LLC &#124; Shepherd for the Herd! God 1st Programming</description>
	<lastBuildDate>Fri, 03 Feb 2023 08:12:15 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.1</generator>

<image>
	<url>https://goodshepherdmedia.net/wp-content/uploads/2023/08/Good-Shepherd-News-Logo-150x150.png</url>
	<title>Open AI Archives - Good Shepherd News - Fastest Growing Religious, Free Speech &amp; Political Content</title>
	<link>https://goodshepherdmedia.net/tag/open-ai/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Big Tech was moving cautiously on AI. Then came ChatGPT.</title>
		<link>https://goodshepherdmedia.net/big-tech-was-moving-cautiously-on-ai-then-came-chatgpt/</link>
		
		<dc:creator><![CDATA[The Truth News]]></dc:creator>
		<pubDate>Sat, 28 Jan 2023 00:27:10 +0000</pubDate>
				<category><![CDATA[How To]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[Top Stories]]></category>
		<category><![CDATA[Zee Truthful News]]></category>
		<category><![CDATA[💻Tech History]]></category>
		<category><![CDATA[🤖 AI Artificial Intelligence]]></category>
		<category><![CDATA[🤖Open AI]]></category>
		<category><![CDATA[🤖🗣️ChatGPT]]></category>
		<category><![CDATA[Amazon Warns of ChatGPT]]></category>
		<category><![CDATA[Beware of ChatGPT]]></category>
		<category><![CDATA[Big Tech]]></category>
		<category><![CDATA[Chat GPT]]></category>
		<category><![CDATA[ChatGPT]]></category>
		<category><![CDATA[Open AI]]></category>
		<guid isPermaLink="false">https://goodshepherdmedia.net/?p=10105</guid>

					<description><![CDATA[Big Tech was moving cautiously on AI. Then came ChatGPT. Google, Facebook and Microsoft helped build the scaffolding of AI. Smaller companies are taking it to the masses, forcing Big Tech to react. What is ChatGPT? Three months before ChatGPT debuted in November, Facebook’s parent company Meta released a similar chatbot. But unlike the phenomenon [&#8230;]]]></description>
										<content:encoded><![CDATA[<h1 style="text-align: center;"><span style="font-size: 24pt;">Big Tech was moving cautiously on AI.<br />
Then came ChatGPT.</span></h1>
<div>
<blockquote>
<h3 class="font--subhead font-light offblack mb-sm pb-xxs-ns subheadline" style="text-align: center;" data-qa="subheadline"><em>Google, Facebook and Microsoft helped build the scaffolding of AI. Smaller companies are taking it to the masses, forcing Big Tech to react.</em></h3>
</blockquote>
</div>
<div>
<h1 style="text-align: center;">What is ChatGPT?</h1>
</div>
<p><iframe title="AI Unveiled: ChatGPT and Deepfake - Hear this Critical Warning!" width="640" height="360" src="https://www.youtube.com/embed/1EDqNyefUmM?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe></p>
<div>Three months before ChatGPT debuted in November, Facebook’s parent company Meta released a similar chatbot. But unlike the phenomenon that ChatGPT instantly became, with more than a million users in its first five days, Meta’s Blenderbot was boring, said Meta’s chief artificial intelligence scientist, Yann LeCun.</div>
<div></div>
<div>“The reason it was boring was because it was made safe,” LeCun said last week at a forum hosted by AI consulting company Collective[i]. He blamed the tepid public response on Meta being “overly careful about content moderation,” like directing the chatbot to change the subject if a user asked about religion. ChatGPT, on the other hand, will converse about the concept of falsehoods in the Quran, write a prayer for a rabbi to deliver to Congress and compare God to a flyswatter.</div>
<div></div>
<div>ChatGPT is quickly going mainstream now that Microsoft — which recently invested billions of dollars in the company behind the chatbot, OpenAI — is working to incorporate it into its popular office software and selling access to the tool to other businesses. The surge of attention around ChatGPT is prompting pressure inside tech giants including Meta and Google to move faster, potentially sweeping safety concerns aside, according to interviews with six current and former Google and Meta employees, some of whom spoke on the condition of anonymity because they were not authorized to speak.</div>
<div></div>
<div>At Meta, employees have recently shared internal memos urging the company to speed up its AI approval process to take advantage of the latest technology, according to one of them. Google, which helped pioneer some of the technology underpinning ChatGPT, recently issued a “code red” around launching AI products and proposed a “green lane” to shorten the process of assessing and mitigating potential harms, according to a report in the New York Times.</div>
<div></div>
<div>ChatGPT, along with text-to-image tools such as DALL-E 2 and Stable Diffusion, is part of a new wave of software called generative AI. They create works of their own by drawing on patterns they’ve identified in vast troves of existing, human-created content. This technology was pioneered at big tech companies like Google that in recent years have grown more secretive, announcing new models or offering demos but keeping the full product under lock and key. Meanwhile, research labs like OpenAI rapidly launched their latest versions, raising questions about how corporate offerings, like Google’s language model LaMDA, stack up.</div>
<div></div>
<div>Tech giants have been skittish since public debacles like Microsoft’s Tay, which it took down in less than a day in 2016 after trolls prompted the bot to call for a race war, suggest Hitler was right and tweet “Jews did 9/11.” Meta defended Blenderbot and left it up after it made racist comments in August, but pulled down another AI tool, called Galactica, in November after just three days amid criticism over its inaccurate and sometimes biased summaries of scientific research.</div>
<div></div>
<div>“People feel like OpenAI is newer, fresher, more exciting and has fewer sins to pay for than these incumbent companies, and they can get away with this for now,” said a Google employee who works in AI, referring to the public’s willingness to accept ChatGPT with less scrutiny. Some top talent has jumped ship to nimbler start-ups, like OpenAI and Stable Diffusion.</div>
<div></div>
<div>Some AI ethicists fear that Big Tech’s rush to market could expose billions of people to potential harms — such as sharing inaccurate information, generating fake photos or giving students the ability to cheat on school tests — before trust and safety experts have been able to study the risks. Others in the field share OpenAI’s philosophy that releasing the tools to the public, often nominally in a “beta” phase after mitigating some predictable risks, is the only way to assess real world harms.</div>
<div></div>
<div>“The pace of progress in AI is incredibly fast, and we are always keeping an eye on making sure we have efficient review processes, but the priority is to make the right decisions, and release AI models and products that best serve our community,” said Joelle Pineau, managing director of Fundamental AI Research at Meta.</div>
<div></div>
<div>“We believe that AI is foundational and transformative technology that is incredibly useful for individuals, businesses and communities,” said Lily Lin, a Google spokesperson. “We need to consider the broader societal impacts these innovations can have. We continue to test our AI technology internally to make sure it’s helpful and safe.”</div>
<div></div>
<div>Microsoft’s chief of communications, Frank Shaw, said his company works with OpenAI to build in extra safety mitigations when it uses AI tools like DALL-E 2 in its products. “Microsoft has been working for years to both advance the field of AI and publicly guide how these technologies are created and used on our platforms in responsible and ethical ways,” Shaw said.</div>
<div>OpenAI declined to comment.</div>
<div></div>
<div>The technology underlying ChatGPT isn’t necessarily better than what Google and Meta have developed, said Mark Riedl, professor of computing at Georgia Tech and an expert on machine learning. <strong>But OpenAI’s practice of releasing its language models for public use has given it a real advantage.</strong></div>
<div></div>
<div>“For the last two years they’ve been using a crowd of humans to provide feedback to GPT,” said Riedl, such as giving a “thumbs down” for an inappropriate or unsatisfactory answer, a process called “reinforcement learning from human feedback.”</div>
<div>Silicon Valley’s sudden willingness to consider taking more reputational risk arrives as tech stocks are tumbling. When Google laid off 12,000 employees last week, CEO Sundar Pichai wrote that the company had undertaken a rigorous review to focus on its highest priorities, twice referencing its early investments in AI.</div>
<div></div>
<div>A decade ago, Google was the undisputed leader in the field. It acquired the cutting edge AI lab DeepMind in 2014 and open-sourced its machine learning software TensorFlow in 2015. By 2016, Pichai pledged to transform Google into an “AI first” company.</div>
<div>The next year, Google released transformers — a pivotal piece of software architecture that made the current wave of generative AI possible.</div>
<div></div>
<div>The company kept rolling out state-of-the-art technology that propelled the entire field forward, deploying some AI breakthroughs in understanding language to improve Google search. Inside big tech companies, the system of checks and balances for vetting the ethical implications of cutting-edge AI isn’t as established as privacy or data security. Typically teams of AI researchers and engineers publish papers on their findings, incorporate their technology into the company’s existing infrastructure or develop new products, a process that can sometimes clash with other teams working on responsible AI over pressure to see innovation reach the public sooner.</div>
<div></div>
<div>Google released its AI principles in 2018, after facing employee protest over Project Maven, a contract to provide computer vision for Pentagon drones, and consumer backlash over a demo for Duplex, an AI system that would call restaurants and make a reservation without disclosing it was a bot. In August last year, Google began giving consumers access to a limited version of LaMDA through its app AI Test Kitchen. It has not yet released it fully to the general public, despite Google’s plans to do so at the end of 2022, according to former Google software engineer Blake Lemoine, who told The Washington Post that he had come to believe LaMDA was sentient.</div>
<div></div>
<p>But the top AI talent behind these developments grew restless.</p>
<div></div>
<div>In the past year or so, top AI researchers from Google have left to launch start-ups around large language models, including Character.AI, Cohere, Adept, Inflection.AI and Inworld AI, in addition to search start-ups using similar models to develop a chat interface, such as Neeva, run by former Google executive Sridhar Ramaswamy.</div>
<div></div>
<div>Character.AI founder Noam Shazeer, who helped invent the transformer and other core machine learning architecture, said the flywheel effect of user data has been invaluable. The first time he applied user feedback to Character.AI, which allows anyone to generate chatbots based on short descriptions of real people or imaginary figures, engagement rose by more than 30 percent.</div>
<div></div>
<div>Bigger companies like Google and Microsoft are generally focused on using AI to improve their massive existing business models, said Nick Frosst, who worked at Google Brain for three years before co-founding Cohere, a Toronto-based start-up building large language models that can be customized to help businesses. One of his co-founders, Aidan Gomez, also helped invent transformers when he worked at Google.</div>
<div></div>
<div>“The space moves so quickly, it’s not surprising to me that the people leading are smaller companies,” said Frosst.</div>
<div></div>
<div>AI has been through several hype cycles over the past decade, but the furor over DALL-E and ChatGPT has reached new heights.</div>
<div></div>
<div>Soon after OpenAI released ChatGPT, tech influencers on Twitter began to predict that generative AI would spell the demise of Google search. ChatGPT delivered simple answers in an accessible way and didn’t ask users to rifle through blue links. Besides, after a quarter of a century, Google’s search interface had grown bloated with ads and marketers trying to game the system.</div>
<div></div>
<div>“Thanks to their monopoly position, the folks over at Mountain View have [let] their once-incredible search experience degenerate into a spam-ridden, SEO-fueled hellscape,” technologist Can Duruk wrote in his newsletter Margins, referring to Google’s hometown.</div>
<div>On the anonymous app Blind, tech workers posted dozens of questions about whether the Silicon Valley giant could compete.</div>
<div></div>
<div>“If Google doesn’t get their act together and start shipping, they will go down in history as the company who nurtured and trained an entire generation of machine learning researchers and engineers who went on to deploy the technology at other companies,” tweeted David Ha, a renowned research scientist who recently left Google Brain for the open source text-to-image start-up Stable Diffusion.</div>
<div></div>
<div>AI engineers still inside Google shared his frustration, employees say. For years, employees had sent memos about incorporating chat functions into search, viewing it as an obvious evolution, according to employees. But they also understood that Google had justifiable reasons not to be hasty about switching up its search product, beyond the fact that responding to a query with one answer eliminates valuable real estate for online ads. A chatbot that pointed to one answer directly from Google could increase its liability if the response was found to be harmful or plagiarized.</div>
<div></div>
<div>Chatbots like ChatGPT routinely make factual errors and often switch their answers depending on how a question is asked. Moving from providing a range of answers to queries that link directly to their source material, to using a chatbot to give a single, authoritative answer, would be a big shift that makes many inside Google nervous, said one former Google AI researcher. The company doesn’t want to take on the role or responsibility of providing single answers like that, the person said. Previous updates to search, such as adding Instant Answers, were done slowly and with great caution.</div>
<div></div>
<div>Inside Google, however, some of the frustration with the AI safety process came from the sense that cutting-edge technology was never released as a product because of fears of bad publicity — if, say, an AI model showed bias.</div>
<div>Meta employees have also had to deal with the company’s concerns about bad PR, according to a person familiar with the company’s internal deliberations who spoke on the condition of anonymity to discuss internal conversations. Before launching new products or publishing research, Meta employees have to answer questions about the potential risks of publicizing their work, including how it could be misinterpreted, the person said. Some projects are reviewed by public relations staff, as well as internal compliance experts who ensure the company’s products comply with its 2011 Federal Trade Commission agreement on how it handles user data.</div>
<div></div>
<div>To Timnit Gebru, executive director of the nonprofit Distributed AI Research Institute, the prospect of Google sidelining its responsible AI team doesn’t necessarily signal a shift in power or safety concerns, because those warning of the potential harms were never empowered to begin with. “If we were lucky, we’d get invited to a meeting,” said Gebru, who helped lead Google’s Ethical AI team until she was fired for a paper criticizing large language models.</div>
<div></div>
<div>From Gebru’s perspective, Google was slow to release its AI tools because the company lacked a strong enough business incentive to risk a hit to its reputation.</div>
<div></div>
<div>Rumman Chowdhury, who led Twitter’s machine-learning ethics team until Elon Musk disbanded it in November, said she expects companies like Google to increasingly sideline internal critics and ethicists as they scramble to catch up with OpenAI.</div>
<div></div>
<div>“We thought it was going to be China pushing the U.S., but looks like it’s start-ups,” she said.</div>
<div></div>
<div><span class="gray-darkest" data-qa="attribution-text">By </span><a class="gray-darkest hover-gray-dark decoration-gray-dark underline hover-none decoration-1 underline-offset-1" href="https://www.washingtonpost.com/people/nitasha-tiku/" rel="author" data-qa="author-name">Nitasha Tiku</a>, <a class="gray-darkest hover-gray-dark decoration-gray-dark underline hover-none decoration-1 underline-offset-1" href="https://www.washingtonpost.com/people/gerrit-de-vynck/" rel="author" data-qa="author-name">Gerrit De Vynck</a> and <a class="gray-darkest hover-gray-dark decoration-gray-dark underline hover-none decoration-1 underline-offset-1" href="https://www.washingtonpost.com/people/will-oremus/" rel="author" data-qa="author-name">Will Oremus</a> <a href="https://www.washingtonpost.com/technology/2023/01/27/chatgpt-google-meta/" target="_blank" rel="noopener">source</a></div>
<div>
<hr />
</div>
<div></div>
<div>
<div class="sc-157agsr-1 sc-157agsr-2 iHvAmE">
<div class="sc-9tztzq-2 fkvsnH">
<header class="sc-1efpnfq-1 hQgdUv">
<h1 class="sc-1efpnfq-0 joZwQS">Amazon Warns Employees to Beware of ChatGPT</h1>
</header>
<h2 class="sc-1xcxnn7-0 iBCShr js_regular-subhead">At the same time, OpenAI&#8217;s ChatGPT gave correct answers to interview questions for a software coding position.</h2>
</div>
</div>
<div class="sc-157agsr-0 knYNlh">
<div class="sc-1jc3ukb-3 bPVaPw">By <a class="sc-1out364-0 hMndXN js_link" href="https://gizmodo.com/author/kevinhurler">Kevin Hurler</a> <a href="https://gizmodo.com/amazon-chatgpt-ai-software-job-coding-1850034383" target="_blank" rel="noopener">source</a></div>
</div>
</div>
<div></div>
<div>
<p class="sc-77igqf-0 bOfvBY">ChatGPT has been making the tech industry sweat since its rise in popularity last year, and now Amazon is feeling the heat too. According to internal communications from the company as viewed by Insider, an Amazon lawyer has urged employees not to share code with the AI chatbot.</p>
<p class="sc-77igqf-0 bOfvBY">Insider reported earlier this week that the lawyer specifically requested that employees not share “any Amazon confidential information (including Amazon code you are working on)” with ChatGPT, according to screenshots of Slack messages reviewed by the outlet. The guidance comes after the company reportedly witnessed ChatGPT responses that have mimicked internal Amazon data.</p>
<p class="sc-77igqf-0 bOfvBY">“This is important because your inputs may be used as training data for a further iteration of ChatGPT, and we wouldn’t want its output to include or resemble our confidential information (and I’ve already seen instances where its output closely matches existing material),” the lawyer wrote further, according to Insider.</p>
<p class="sc-77igqf-0 bOfvBY">Amazon is probably right about ChatGPT obtaining its data since, in a similar story, ChatGPT allegedly answered interview questions correctly for a software coding position at the company. According to Slack channel transcripts also reviewed by Insider, the AI provided correct solutions to software coding questions and even made improvements to some of Amazon’s code.</p>
<p class="sc-77igqf-0 bOfvBY">“I was honestly impressed!” an employee reportedly wrote on Slack. “I’m both scared and excited to see what impact this will have on the way that we conduct coding interviews.”</p>
<p class="sc-77igqf-0 bOfvBY">The novelty of ChatGPT has not worn off just yet, but questions surrounding how it may intersect with our daily lives have sprung up in recent weeks. While the chatbot was able to pass a final exam in an MBA-level course at Wharton (where it did struggle with some basic arithmetic), ChatGPT’s role in education—among other fields—is dubious. Some school systems, like the New York City Department of Education, have decided to ban the tech over fears of cheating, but OpenAI’s CEO simply believes school administrators need to get over it.</p>
</div>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Fixing YouTube Search with OpenAI&#8217;s Whisper</title>
		<link>https://goodshepherdmedia.net/fixing-youtube-search-with-openais-whisper/</link>
		
		<dc:creator><![CDATA[The Truth News]]></dc:creator>
		<pubDate>Wed, 18 Jan 2023 10:27:41 +0000</pubDate>
				<category><![CDATA[Science & Engineering]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[Top Stories]]></category>
		<category><![CDATA[Zee Truthful News]]></category>
		<category><![CDATA[💻Tech History]]></category>
		<category><![CDATA[🤖 AI Artificial Intelligence]]></category>
		<category><![CDATA[🤖Open AI]]></category>
		<category><![CDATA[🤖🗣️Whisper]]></category>
		<category><![CDATA[Open AI]]></category>
		<category><![CDATA[Open AI's Whisper]]></category>
		<category><![CDATA[OpenAI's Whisper]]></category>
		<category><![CDATA[Whisper AI]]></category>
		<guid isPermaLink="false">https://goodshepherdmedia.net/?p=10212</guid>

					<description><![CDATA[Fixing YouTube Search with OpenAI&#8217;s Whisper OpenAI’s Whisper is a new state-of-the-art (SotA) model in speech-to-text. It is able to almost flawlessly transcribe speech across dozens of languages and even handle poor audio quality or excessive background noise. The domain of spoken word has always been somewhat out of reach for ML use-cases. Whisper changes that for [&#8230;]]]></description>
										<content:encoded><![CDATA[<h1>Fixing YouTube Search with OpenAI&#8217;s Whisper</h1>
<p>OpenAI’s <em>Whisper</em> is a new state-of-the-art (SotA) model in speech-to-text. It is able to almost flawlessly transcribe speech across dozens of languages and even handle poor audio quality or excessive background noise.</p>
<p>The domain of spoken word has always been somewhat out of reach for ML use-cases. Whisper changes that for speech-centric use cases. We will demonstrate the power of Whisper alongside other technologies like transformers and vector search by building a new and improved YouTube search.</p>
<p>Search on YouTube is good but has its limitations, especially when it comes to answering questions. With such a vast amount of content, there should be an answer to almost every question. Yet, if we have a specific question like <em>“what is OpenAI’s CLIP?&#8221;</em>, instead of a concise answer we get lots of very long videos that we must watch through.</p>
<p>What if all we want is a short 20-second explanation? The current YouTube search has no solution for this. Maybe there’s a good reason to encourage users to watch as much of a video as possible (more ads, anyone?).</p>
<p>Whisper is the solution to this problem <em>and many others involving the spoken word</em>. In this article, we’ll explore the idea behind a better speech-enabled search.</p>
<p><iframe title="How to Use OpenAI Whisper to Fix YouTube Search" width="640" height="360" src="https://www.youtube.com/embed/vpU_6x3jowg?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe></p>
<p><iframe title="How to do Free Speech-to-Text Transcription Better Than Google Premium API with OpenAI Whisper Model" width="640" height="360" src="https://www.youtube.com/embed/msj3wuYf3d8?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe></p>
<div></div>
<div></div>
<div></div>
<hr />
<h2 id="the-idea">The Idea</h2>
<p>We want to get specific timestamps that answer our search queries. YouTube does support time-specific links in videos, so a more precise search with these links should be possible.</p>
<p><small>Timestamp URLs can be copied directly from a video; we can use the same URL format in our search app.</small></p>
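YouTube's time-specific links are just the video URL with a `?t=<seconds>` parameter, so any transcript match at a known offset can become a direct jump link. A minimal sketch (the helper name is ours; the video ID is from the embed above):

```python
# YouTube's time-specific links append ?t=<seconds> to the video URL, so a
# transcript match at a known offset becomes a direct jump link.

def timestamp_url(video_id: str, seconds: float) -> str:
    return f"https://youtu.be/{video_id}?t={int(seconds)}"

print(timestamp_url("vpU_6x3jowg", 95.4))  # https://youtu.be/vpU_6x3jowg?t=95
```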
<p>To build something like this, we first need to transcribe the audio in our videos to text. YouTube automatically captions every video, and the captions are okay — <em>but</em> OpenAI just open-sourced something called “Whisper”.</p>
<p>Whisper is best described as the GPT-3 or DALL-E 2 of speech-to-text. It’s open source and can transcribe audio in real-time <em>or faster</em> with <em>unparalleled performance</em>. That seems like the most exciting option.</p>
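Transcription itself takes only a few lines with the open-source `whisper` package. In this hedged sketch the model call is shown as comments (it needs the package, model weights, and ffmpeg installed), while the post-processing below runs on the same `start`/`end`/`text` segment structure Whisper returns:

```python
# Hedged sketch of a Whisper transcription call (requires
# `pip install openai-whisper` plus ffmpeg, so it is left as comments):
#
#   import whisper
#   model = whisper.load_model("base")
#   result = model.transcribe("audio/vpU_6x3jowg.mp4")
#   segments = result["segments"]
#
# Whisper returns one dict per segment, with times in seconds:
segments = [
    {"start": 0.0, "end": 4.2, "text": " Whisper is a speech-to-text model."},
    {"start": 4.2, "end": 9.0, "text": " It handles noisy audio well."},
]

def clean_segments(segments):
    """Strip whitespace and round timestamps, ready for indexing."""
    return [
        {"start": round(s["start"], 1), "text": s["text"].strip()}
        for s in segments
    ]

print(clean_segments(segments)[0]["text"])  # Whisper is a speech-to-text model.
```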
<p>Once we have our transcribed text and the timestamps for each text snippet, we can move on to the <a href="https://www.pinecone.io/learn/question-answering">question-answering (QA)</a> part. QA is a form of search where given a natural language query like <em>“what is OpenAI’s Whisper?&#8221;</em> we can return accurate natural language answers.</p>
<p>We can think of QA as the most intuitive form of searching for information because it is how we ask other people for information. The only difference is that we type the question into a search bar rather than verbally communicate it — for now.</p>
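The QA step boils down to embedding the query and every transcript snippet, then ranking by similarity. A toy sketch with numpy: the 3-d vectors here stand in for real sentence-transformer embeddings, and a vector database like Pinecone would replace the brute-force loop:

```python
import numpy as np

# Toy QA ranking: score each transcript snippet against the query by cosine
# similarity. The 3-d vectors are stand-ins for real embeddings.

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_snippet(query_vec, snippet_vecs, snippets):
    scores = [cosine(query_vec, v) for v in snippet_vecs]
    return snippets[int(np.argmax(scores))]

snippets = ["Whisper transcribes speech", "CLIP links images and text"]
vecs = [np.array([0.9, 0.1, 0.0]), np.array([0.1, 0.8, 0.3])]
query = np.array([0.85, 0.15, 0.05])  # pretend embedding of "what is Whisper?"
print(top_snippet(query, vecs, snippets))  # Whisper transcribes speech
```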
<p>How does all of this look?</p>
<p><img fetchpriority="high" decoding="async" class="" src="https://d33wubrfki0l68.cloudfront.net/b04cbd7e64c2cbfe65bfe1f6b9035e239d845871/2c500/images/openai-whisper-2.png" alt="whisper-architecture" width="787" height="535" /></p>
<p><small>Overview of the process used in our demo. Covering OpenAI’s Whisper, sentence transformers, the Pinecone vector database, and more.</small></p>
<p>Now let’s color in the details and walk through the steps.</p>
<h2 id="video-data">Video Data</h2>
<p>The first step is to download our YouTube video data and extract the audio attached to each video. Fortunately, there’s a Python library for exactly that called <code>pytube</code>.</p>
<p>With <code>pytube</code>, we provide a video ID (found in the URL bar or downloadable if you have a channel). I directly downloaded a summary of channel content, including IDs, titles, publication dates, etc., via YouTube. This same data is available via Hugging Face <em>Datasets</em> in a dataset called <code>jamescalam/channel-metadata</code>.</p>
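For illustration, pulling that channel metadata could look like the following hedged sketch (assumes `pip install datasets`; the import is deferred so the definition stands on its own):

```python
# Hedged sketch: load the channel metadata from Hugging Face Datasets.
# Assumes `pip install datasets`; the import is deferred so the sketch
# remains a plain definition when the package isn't installed.

def load_channel_metadata():
    from datasets import load_dataset
    meta = load_dataset("jamescalam/channel-metadata", split="train")
    # Keep just the fields the search app needs.
    return [(row["Title"], row["Video ID"]) for row in meta]
```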
<p><a href="https://gist.github.com/jamescalam/b4280d1f40895b0fb10582f326724c21#file-whisper-yt-search-channel-meta-ipynb">whisper-yt-search-channel-meta.ipynb</a> (notebook on GitHub Gist)</p>
<p>We’re most interested in the <code>Title</code> and <code>Video ID</code> fields. With the video ID, we can begin downloading the videos and saving the audio files with <code>pytube</code>.</p>
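A hedged sketch of the download step (assumes `pip install pytube`; the function name and output layout are our own):

```python
# Hedged sketch: download only the audio track for one video with pytube.
# Assumes `pip install pytube`; the import is deferred so the sketch runs
# as a definition even without the package installed.

def download_audio(video_id, out_dir="./audio"):
    from pytube import YouTube
    yt = YouTube(f"https://youtu.be/{video_id}")
    # Audio-only streams avoid pulling the full video; take the best bitrate.
    stream = yt.streams.filter(only_audio=True).order_by("abr").desc().first()
    return stream.download(output_path=out_dir, filename=f"{video_id}.mp4")
```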
<pre><code>from pytube import YouTube  # !pip install pytube
from pytube.exceptions import RegexMatchError
from tqdm.auto import tqdm  # !pip install tqdm

# where to save
save_path = "./mp3"

for i, row in tqdm(videos_meta.iterrows(), total=len(videos_meta)):
    # url of video to be downloaded
    url = f"https://youtu.be/{row['Video ID']}"

    # try to create a YouTube video object
    try:
        yt = YouTube(url)
    except RegexMatchError:
        print(f"RegexMatchError for '{url}'")
        continue

    itag = None
    # we only want audio files
    files = yt.streams.filter(only_audio=True)
    for file in files:
        # from the audio files, grab the first audio/mp4 stream
        if file.mime_type == 'audio/mp4':
            itag = file.itag
            break
    if itag is None:
        # just in case no MP3 audio is found (shouldn't happen)
        print("NO MP3 AUDIO FOUND")
        continue

    # get the correct mp3 'stream'
    stream = yt.streams.get_by_itag(itag)
    # download the audio
    stream.download(
        output_path=save_path,
        filename=f"{row['Video ID']}.mp3"
    )</code></pre>
<p>After this, we should find roughly 108 MP3 audio files stored in the <code>./mp3</code> directory.</p>
<p><img decoding="async" class="" src="https://d33wubrfki0l68.cloudfront.net/9027b8e7c1bd21239f83d39c1c122da12f679524/8b071/images/openai-whisper-3.png" alt="mp3 files directory" width="1031" height="646" /></p>
<p><small>Downloaded MP3 files in the <code>./mp3</code> directory.</small></p>
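<p>The later snippets iterate over a <code>paths</code> list of these files. One minimal way to build it with the standard library (a sketch assuming the <code>./mp3</code> layout shown above; the <code>paths</code> name matches the later code) is:</p>

```python
from pathlib import Path

# directory where pytube saved the audio files (see save_path above)
save_path = "./mp3"

# collect every MP3 file path as a string, e.g. "mp3/abc123.mp3"
paths = sorted(str(p) for p in Path(save_path).glob("*.mp3"))
print(f"found {len(paths)} audio files")
```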
<p>With these, we can move on to transcription with OpenAI’s Whisper.</p>
<h2 id="speech-to-text-with-whisper">Speech-to-Text with Whisper</h2>
<p>OpenAI’s Whisper speech-to-text model is fully open source and available through <a href="https://github.com/openai/whisper">OpenAI’s Whisper library</a>, which can be <code>pip</code>-installed directly from GitHub:</p>
<pre><code>!pip install git+https://github.com/openai/whisper.git</code></pre>
<p>Whisper relies on a separate piece of software called FFmpeg to convert video and audio files. Installation varies by OS [1]; the following commands cover the primary systems:</p>
<pre><code># on Ubuntu or Debian
sudo apt update &amp;&amp; sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg</code></pre>
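<p>A missing FFmpeg binary only surfaces later as a confusing error during transcription, so it is worth checking up front that Python can see it on the <code>PATH</code>. This small check is not part of the original walkthrough, just a stdlib convenience:</p>

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None

if not ffmpeg_available():
    print("ffmpeg not found - install it before running Whisper")
```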
<p>After installation, we download and initialize the <em>large</em> model, moving it to GPU if CUDA is available.</p>
<pre><code>import whisper
import torch  # install steps: pytorch.org

device = "cuda" if torch.cuda.is_available() else "cpu"

model = whisper.load_model("large").to(device)</code></pre>
<p>Other, smaller models are available and should be considered when working with a smaller GPU (or even a CPU). We transcribe the audio like so:</p>
<div id="gist118848187" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-transcribe-ipynb" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-jupyter-notebook  ">
<div class="render-wrapper ">
<div class="render-container is-render-pending js-render-target " data-identity="4cab2014-8712-4bbb-b47c-93b6e1abda54" data-host="https://notebooks.githubusercontent.com" data-type="ipynb"><iframe class="render-viewer " title="File display" src="https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&amp;commit=03b3d637baaad2ae9c0d294f87815d273d2e7b7c&amp;enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f6a616d657363616c616d2f38636439633131313230343539333864666565306539393534323232626331662f7261772f303362336436333762616161643261653963306432393466383738313564323733643265376237632f776869737065722d79742d7365617263682d7472616e7363726962652e6970796e62&amp;logged_in=false&amp;nwo=jamescalam%2F8cd9c1112045938dfee0e9954222bc1f&amp;path=whisper-yt-search-transcribe.ipynb&amp;repository_id=118848187&amp;repository_type=Gist#4cab2014-8712-4bbb-b47c-93b6e1abda54" name="4cab2014-8712-4bbb-b47c-93b6e1abda54" sandbox="allow-scripts allow-same-origin allow-top-navigation" data-mce-fragment="1"></iframe></div>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/8cd9c1112045938dfee0e9954222bc1f/raw/03b3d637baaad2ae9c0d294f87815d273d2e7b7c/whisper-yt-search-transcribe.ipynb">view raw</a><a href="https://gist.github.com/jamescalam/8cd9c1112045938dfee0e9954222bc1f#file-whisper-yt-search-transcribe-ipynb">whisper-yt-search-transcribe.ipynb </a>hosted with <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/2764.png" alt="❤" class="wp-smiley" style="height: 1em; max-height: 1em;" /> by <a href="https://github.com/">GitHub</a></div>
</div>
</div>
<p>From this, we have a list of ~27K transcribed audio segments, each containing the text alongside its start and end times in seconds. If transcription is taking a long time, a pre-built version of the dataset is available; download instructions are in the following section.</p>
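<p>For reference, each segment is a plain dictionary. The <code>text</code>, <code>start</code>, and <code>end</code> fields below follow Whisper’s segment output, and the ID format mirrors the snippet that follows; the concrete values are invented for illustration:</p>

```python
# a hypothetical transcribed segment, with illustrative values
segment = {
    "start": 3.42,   # seconds from the start of the video
    "end": 7.16,
    "text": " Hi, welcome to the video. ",
}

_id = "abc123"  # hypothetical YouTube video ID

# build the unique segment ID used later: "<video id>-t<start seconds>"
segment_id = f"{_id}-t{segment['start']}"
duration = segment["end"] - segment["start"]  # segment length in seconds

print(segment_id)               # abc123-t3.42
print(segment["text"].strip())  # Hi, welcome to the video.
```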
<p>The last cell above is missing the logic required to extract the metadata from the <code>videos_dict</code> we initialized earlier and attach it to each segment. We add that like so:</p>
<pre><code>data = []

for i, path in enumerate(tqdm(paths)):
    # extract the video ID from the path, e.g. "mp3/abc123.mp3" -> "abc123"
    _id = path.split('/')[-1][:-4]
    # transcribe to get speech-to-text data
    result = model.transcribe(path)
    segments = result['segments']
    # get the video metadata...
    video_meta = videos_dict[_id]
    for segment in segments:
        # merge segment data and video metadata
        meta = {
            **video_meta,
            **{
                "id": f"{_id}-t{segment['start']}",
                "text": segment["text"].strip(),
                "start": segment['start'],
                "end": segment['end']
            }
        }
        data.append(meta)</code></pre>
<p>After processing all of the segments, we save them to a JSON lines (<code>.jsonl</code>) file with:</p>
<div id="gist118848786" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-save-transcriptions-py" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-python  ">
<div class="js-check-bidi js-blob-code-container blob-code-content">
<table class="highlight tab-size js-file-line-container js-code-nav-container js-tagsearch-file" data-hpc="" data-tab-size="8" data-paste-markdown-skip="" data-tagsearch-lang="Python" data-tagsearch-path="whisper-yt-search-save-transcriptions.py">
<tbody>
<tr>
<td id="file-whisper-yt-search-save-transcriptions-py-L1" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="1"></td>
<td id="file-whisper-yt-search-save-transcriptions-py-LC1" class="blob-code blob-code-inner js-file-line"><span class="pl-k">import</span> <span class="pl-s1">json</span></td>
</tr>
<tr>
<td id="file-whisper-yt-search-save-transcriptions-py-L2" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="2"></td>
<td id="file-whisper-yt-search-save-transcriptions-py-LC2" class="blob-code blob-code-inner js-file-line"></td>
</tr>
<tr>
<td id="file-whisper-yt-search-save-transcriptions-py-L3" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="3"></td>
<td id="file-whisper-yt-search-save-transcriptions-py-LC3" class="blob-code blob-code-inner js-file-line"><span class="pl-k">with</span> <span class="pl-en">open</span>(<span class="pl-s">&quot;youtube-transcriptions.jsonl&quot;</span>, <span class="pl-s">&quot;w&quot;</span>, <span class="pl-s1">encoding</span><span class="pl-c1">=</span><span class="pl-s">&quot;utf-8&quot;</span>) <span class="pl-k">as</span> <span class="pl-s1">fp</span>:</td>
</tr>
<tr>
<td id="file-whisper-yt-search-save-transcriptions-py-L4" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="4"></td>
<td id="file-whisper-yt-search-save-transcriptions-py-LC4" class="blob-code blob-code-inner js-file-line">    <span class="pl-k">for</span> <span class="pl-s1">line</span> <span class="pl-c1">in</span> <span class="pl-en">tqdm</span>(<span class="pl-s1">data</span>):</td>
</tr>
<tr>
<td id="file-whisper-yt-search-save-transcriptions-py-L5" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="5"></td>
<td id="file-whisper-yt-search-save-transcriptions-py-LC5" class="blob-code blob-code-inner js-file-line">        <span class="pl-s1">json</span>.<span class="pl-en">dump</span>(<span class="pl-s1">line</span>, <span class="pl-s1">fp</span>)</td>
</tr>
<tr>
<td id="file-whisper-yt-search-save-transcriptions-py-L6" class="blob-num js-line-number js-code-nav-line-number js-blob-rnum" data-line-number="6"></td>
<td id="file-whisper-yt-search-save-transcriptions-py-LC6" class="blob-code blob-code-inner js-file-line">        <span class="pl-s1">fp</span>.<span class="pl-en">write</span>(<span class="pl-s">&#39;<span class="pl-cce">\n</span>&#39;</span>)</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/e645ea4499e3e7984d4392325566573e/raw/6a0f2547e80e4fea7890719bb455ace3f7bc1d40/whisper-yt-search-save-transcriptions.py">view raw</a> <a href="https://gist.github.com/jamescalam/e645ea4499e3e7984d4392325566573e#file-whisper-yt-search-save-transcriptions-py">whisper-yt-search-save-transcriptions.py</a></div>
</div>
</div>
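<p>The save step above round-trips cleanly: each record is one JSON object per line. Here is a minimal self-contained sketch, with toy records standing in for the real transcription data:</p>

```python
import json

def read_jsonl(path):
    """Load a JSON-lines file: one JSON object per line."""
    with open(path, "r", encoding="utf-8") as fp:
        return [json.loads(line) for line in fp if line.strip()]

# Round-trip demo with toy records in place of the real segments
records = [{"id": 0, "text": "hello"}, {"id": 1, "text": "world"}]
with open("youtube-transcriptions.jsonl", "w", encoding="utf-8") as fp:
    for rec in records:
        json.dump(rec, fp)
        fp.write("\n")

print(read_jsonl("youtube-transcriptions.jsonl")[1]["text"])  # world
```

Writing one object per line (rather than one big JSON array) lets the file be streamed record-by-record later without loading everything into memory.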
<p>With that ready, let’s build the QA embeddings and vector search component.</p>
<h2 id="question-answering">Question-Answering</h2>
<p>On Hugging Face <em>Datasets</em>, you can find the data I scraped in a dataset called <code>jamescalam/youtube-transcriptions</code>:</p>
<div id="gist118848889" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-get-transcriptions-ipynb" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-jupyter-notebook  ">
<div class="render-wrapper ">
<div class="render-container is-render-pending js-render-target " data-identity="6605bf6a-ed33-423c-b81e-63895d87dff6" data-host="https://notebooks.githubusercontent.com" data-type="ipynb"><iframe class="render-viewer " title="File display" src="https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&amp;commit=8fbb91a064a2898da35d4c095ca51a93487421f1&amp;enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f6a616d657363616c616d2f33653531376332656535666166666232626364626539663732376539643232372f7261772f386662623931613036346132383938646133356434633039356361353161393334383734323166312f776869737065722d79742d7365617263682d6765742d7472616e736372697074696f6e732e6970796e62&amp;logged_in=false&amp;nwo=jamescalam%2F3e517c2ee5faffb2bcdbe9f727e9d227&amp;path=whisper-yt-search-get-transcriptions.ipynb&amp;repository_id=118848889&amp;repository_type=Gist#6605bf6a-ed33-423c-b81e-63895d87dff6" name="6605bf6a-ed33-423c-b81e-63895d87dff6" sandbox="allow-scripts allow-same-origin allow-top-navigation" data-mce-fragment="1"></iframe></div>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/3e517c2ee5faffb2bcdbe9f727e9d227/raw/8fbb91a064a2898da35d4c095ca51a93487421f1/whisper-yt-search-get-transcriptions.ipynb">view raw</a> <a href="https://gist.github.com/jamescalam/3e517c2ee5faffb2bcdbe9f727e9d227#file-whisper-yt-search-get-transcriptions-ipynb">whisper-yt-search-get-transcriptions.ipynb</a></div>
</div>
</div>
<p>For now, the dataset only contains videos from my personal channel, but I will add more videos from other ML-focused channels in the future.</p>
<p>The data consists of short chunks of text (the transcribed audio). On its own, each chunk carries relatively little meaning:</p>
<div id="gist118848859" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-short-segments-ipynb" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-jupyter-notebook  ">
<div class="render-wrapper ">
<div class="render-container is-render-pending js-render-target " data-identity="a43b2443-da86-43b9-bb97-c2d9bdface4d" data-host="https://notebooks.githubusercontent.com" data-type="ipynb"><iframe class="render-viewer " title="File display" src="https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&amp;commit=5548ec17e93f0f54cb0b8b1f968709a3fa3cc597&amp;enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f6a616d657363616c616d2f64326562653032376336636438313236636166643362346632643338353235612f7261772f353534386563313765393366306635346362306238623166393638373039613366613363633539372f776869737065722d79742d7365617263682d73686f72742d7365676d656e74732e6970796e62&amp;logged_in=false&amp;nwo=jamescalam%2Fd2ebe027c6cd8126cafd3b4f2d38525a&amp;path=whisper-yt-search-short-segments.ipynb&amp;repository_id=118848859&amp;repository_type=Gist#a43b2443-da86-43b9-bb97-c2d9bdface4d" name="a43b2443-da86-43b9-bb97-c2d9bdface4d" sandbox="allow-scripts allow-same-origin allow-top-navigation" data-mce-fragment="1"></iframe></div>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/d2ebe027c6cd8126cafd3b4f2d38525a/raw/5548ec17e93f0f54cb0b8b1f968709a3fa3cc597/whisper-yt-search-short-segments.ipynb">view raw</a> <a href="https://gist.github.com/jamescalam/d2ebe027c6cd8126cafd3b4f2d38525a#file-whisper-yt-search-short-segments-ipynb">whisper-yt-search-short-segments.ipynb</a></div>
</div>
</div>
<p>Ideally, we want chunks of text 4-6x larger than this to capture enough meaning to be helpful. We do this by simply iterating over the dataset and merging every <em>six</em> segments.</p>
<div id="gist118848939" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-longer-segments-ipynb" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-jupyter-notebook  ">
<div class="render-wrapper ">
<div class="render-container is-render-pending js-render-target " data-identity="5ecd865a-1dd5-4a85-93ba-e6cf986307da" data-host="https://notebooks.githubusercontent.com" data-type="ipynb"><iframe class="render-viewer " title="File display" src="https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&amp;commit=9ab91b16bf41870cc4949fd2107c8b8295aa03cb&amp;enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f6a616d657363616c616d2f31376466343031333364313163336332356161396634303435633964313134352f7261772f396162393162313662663431383730636334393439666432313037633862383239356161303363622f776869737065722d79742d7365617263682d6c6f6e6765722d7365676d656e74732e6970796e62&amp;logged_in=false&amp;nwo=jamescalam%2F17df40133d11c3c25aa9f4045c9d1145&amp;path=whisper-yt-search-longer-segments.ipynb&amp;repository_id=118848939&amp;repository_type=Gist#5ecd865a-1dd5-4a85-93ba-e6cf986307da" name="5ecd865a-1dd5-4a85-93ba-e6cf986307da" sandbox="allow-scripts allow-same-origin allow-top-navigation" data-mce-fragment="1"></iframe></div>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/17df40133d11c3c25aa9f4045c9d1145/raw/9ab91b16bf41870cc4949fd2107c8b8295aa03cb/whisper-yt-search-longer-segments.ipynb">view raw</a> <a href="https://gist.github.com/jamescalam/17df40133d11c3c25aa9f4045c9d1145#file-whisper-yt-search-longer-segments-ipynb">whisper-yt-search-longer-segments.ipynb</a></div>
</div>
</div>
<p>A few things are happening here. First, we&#8217;re merging every six segments, as explained before. However, merging alone still cuts between related segments, losing meaning at each boundary.</p>
<p><img decoding="async" class="" src="https://d33wubrfki0l68.cloudfront.net/a0373a295dbbbe3c3ca686687b47a4e6c1aba11b/0c6f2/images/openai-whisper-4.png" alt="window-no-overlap" width="827" height="364" /></p>
<p><small>Even when merging segments we&#8217;re still left with points where we must split the text (annotated with the red cross-mark above). This can cause us to miss important information.</small></p>
<p>A common technique to avoid cutting between related segments is adding some <em>overlap</em>, controlled by a <code>stride</code> parameter. At each step, we move <em>three</em> segments forward while merging <em>six</em> segments. This way, any meaningful passage cut in one step is captured whole in the next.</p>
<p><img loading="lazy" decoding="async" class="" src="https://d33wubrfki0l68.cloudfront.net/06f2cfe89c666aeddb1204b9741ae8a964460fb3/f6190/images/openai-whisper-5.png" alt="window-overlap" width="829" height="362" /></p>
<p><small>We can avoid this loss of meaning by adding an overlap when merging segments. This produces more data but makes us much less likely to cut between meaningful segments.</small></p>
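<p>The windowing described above (merge <em>six</em> segments, step <em>three</em> forward each time) can be sketched in plain Python. This is a minimal illustration; the <code>text</code>, <code>start</code>, and <code>end</code> field names are assumptions based on the Whisper segment format:</p>

```python
def merge_segments(segments, window=6, stride=3):
    """Merge every `window` segments, stepping `stride` at a time,
    so consecutive chunks overlap by `window - stride` segments."""
    merged = []
    for i in range(0, len(segments), stride):
        batch = segments[i:i + window]
        merged.append({
            "text": " ".join(seg["text"] for seg in batch),
            "start": batch[0]["start"],   # timestamp of the first segment
            "end": batch[-1]["end"],      # timestamp of the last segment
        })
    return merged

# 12 toy segments -> 4 overlapping chunks
segments = [{"text": f"s{i}", "start": i, "end": i + 1} for i in range(12)]
chunks = merge_segments(segments)
print(len(chunks))        # 4
print(chunks[0]["text"])  # s0 s1 s2 s3 s4 s5
```

Because the stride is half the window, every boundary between two chunks falls in the middle of some other chunk, so no passage is ever only seen split.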
<p>With this, we have larger and more meaningful chunks of text. Now we need to encode them with a QA embedding model. Many high-performing, pretrained QA models are available via Hugging Face <em>Transformers</em> and the <em>Sentence Transformers</em> library. We will use one called <a href="https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1"><code>multi-qa-mpnet-base-dot-v1</code></a>.</p>
<div id="gist118848976" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-init-encoder-ipynb" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-jupyter-notebook  ">
<div class="render-wrapper ">
<div class="render-container is-render-pending js-render-target " data-identity="2e2ede93-df7b-44ce-b037-8355f0809b1b" data-host="https://notebooks.githubusercontent.com" data-type="ipynb"><iframe class="render-viewer " title="File display" src="https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&amp;commit=7b8b0b3094761e66ebf905ddf933be9872f644f6&amp;enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f6a616d657363616c616d2f30643266616337326636643630613339666666343737613163326364663362612f7261772f376238623062333039343736316536366562663930356464663933336265393837326636343466362f776869737065722d79742d7365617263682d696e69742d656e636f6465722e6970796e62&amp;logged_in=false&amp;nwo=jamescalam%2F0d2fac72f6d60a39fff477a1c2cdf3ba&amp;path=whisper-yt-search-init-encoder.ipynb&amp;repository_id=118848976&amp;repository_type=Gist#2e2ede93-df7b-44ce-b037-8355f0809b1b" name="2e2ede93-df7b-44ce-b037-8355f0809b1b" sandbox="allow-scripts allow-same-origin allow-top-navigation" data-mce-fragment="1"></iframe></div>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/0d2fac72f6d60a39fff477a1c2cdf3ba/raw/7b8b0b3094761e66ebf905ddf933be9872f644f6/whisper-yt-search-init-encoder.ipynb">view raw</a> <a href="https://gist.github.com/jamescalam/0d2fac72f6d60a39fff477a1c2cdf3ba#file-whisper-yt-search-init-encoder-ipynb">whisper-yt-search-init-encoder.ipynb</a></div>
</div>
</div>
<p>Using this model, we can encode a passage of text into a <em>meaningful</em> 768-dimensional vector with <code>model.encode("&lt;some text&gt;")</code>. Encoding all of our segments at once would require too much compute, and storing the embeddings locally too much memory, so we first initialize the vector database where they will be stored:</p>
<div id="gist118849004" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-init-pinecone-ipynb" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-jupyter-notebook  ">
<div class="render-wrapper ">
<div class="render-container is-render-pending js-render-target " data-identity="38e5e5c8-4247-42f6-946a-56879b89ede5" data-host="https://notebooks.githubusercontent.com" data-type="ipynb"><iframe class="render-viewer " title="File display" src="https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&amp;commit=a11d7c1d105515aa0327dcc0a42fed4c2d7b77f2&amp;enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f6a616d657363616c616d2f65656261613262643435346635383237623736343330386137313437373436632f7261772f613131643763316431303535313561613033323764636330613432666564346332643762373766322f776869737065722d79742d7365617263682d696e69742d70696e65636f6e652e6970796e62&amp;logged_in=false&amp;nwo=jamescalam%2Feebaa2bd454f5827b764308a7147746c&amp;path=whisper-yt-search-init-pinecone.ipynb&amp;repository_id=118849004&amp;repository_type=Gist#38e5e5c8-4247-42f6-946a-56879b89ede5" name="38e5e5c8-4247-42f6-946a-56879b89ede5" sandbox="allow-scripts allow-same-origin allow-top-navigation" data-mce-fragment="1"></iframe></div>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/eebaa2bd454f5827b764308a7147746c/raw/a11d7c1d105515aa0327dcc0a42fed4c2d7b77f2/whisper-yt-search-init-pinecone.ipynb">view raw</a> <a href="https://gist.github.com/jamescalam/eebaa2bd454f5827b764308a7147746c#file-whisper-yt-search-init-pinecone-ipynb">whisper-yt-search-init-pinecone.ipynb</a></div>
</div>
</div>
<p>We should see that the index (vector database) is currently empty with a <code>total_vector_count</code> of <code>0</code>. Now we can begin encoding our segments and inserting the embeddings (and metadata) into our index.</p>
<div id="gist118849042" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-index-vecs-ipynb" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-jupyter-notebook  ">
<div class="render-wrapper ">
<div class="render-container is-render-pending js-render-target " data-identity="e0517d3d-2c66-455a-b588-75842ec40888" data-host="https://notebooks.githubusercontent.com" data-type="ipynb"><iframe class="render-viewer " title="File display" src="https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&amp;commit=c582150a54ba8fe07d9e5e4b702e730b070b9bd2&amp;enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f6a616d657363616c616d2f34386134316239363663303634646164656431663638636561393164336638362f7261772f633538323135306135346261386665303764396535653462373032653733306230373062396264322f776869737065722d79742d7365617263682d696e6465782d766563732e6970796e62&amp;logged_in=false&amp;nwo=jamescalam%2F48a41b966c064daded1f68cea91d3f86&amp;path=whisper-yt-search-index-vecs.ipynb&amp;repository_id=118849042&amp;repository_type=Gist#e0517d3d-2c66-455a-b588-75842ec40888" name="e0517d3d-2c66-455a-b588-75842ec40888" sandbox="allow-scripts allow-same-origin allow-top-navigation" data-mce-fragment="1"></iframe></div>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/48a41b966c064daded1f68cea91d3f86/raw/c582150a54ba8fe07d9e5e4b702e730b070b9bd2/whisper-yt-search-index-vecs.ipynb">view raw</a> <a href="https://gist.github.com/jamescalam/48a41b966c064daded1f68cea91d3f86#file-whisper-yt-search-index-vecs-ipynb">whisper-yt-search-index-vecs.ipynb</a></div>
</div>
</div>
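<p>The indexing step follows a simple batching pattern: encode a batch of chunks, then upsert the resulting (id, vector, metadata) tuples. The sketch below illustrates only the pattern; <code>fake_encode</code> and <code>fake_upsert</code> are toy stand-ins, not the real embedding model or Pinecone client:</p>

```python
def upsert_in_batches(chunks, encode, upsert, batch_size=64):
    """Encode text chunks in batches and hand (id, vector, metadata)
    tuples to an index's upsert function."""
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        vectors = encode([c["text"] for c in batch])
        upsert([(c["id"], vec, {"text": c["text"]})
                for c, vec in zip(batch, vectors)])

# Toy stand-ins for the encoder and the index
store = {}
fake_encode = lambda texts: [[float(len(t))] for t in texts]
fake_upsert = lambda items: store.update({i: (v, m) for i, v, m in items})

chunks = [{"id": str(n), "text": "x" * n} for n in range(1, 131)]
upsert_in_batches(chunks, fake_encode, fake_upsert)
print(len(store))  # 130
```

Batching keeps memory use bounded no matter how many segments we index, and attaching the text as metadata means query results can be displayed without a second lookup.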
<p>That covers preparing our data and adding it to the vector database. All that is left is querying and returning results.</p>
<h2 id="making-queries">Making Queries</h2>
<p>Queries are straightforward to make; we:</p>
<ol>
<li>Encode the query using the same embedding model we used to encode the segments.</li>
<li>Pass the query to our index.</li>
</ol>
<p>We do that with the following:</p>
<div id="gist118849355" class="gist">
<div class="gist-file" translate="no">
<div class="gist-data">
<div class="js-gist-file-update-container js-task-list-container file-box">
<div id="file-whisper-yt-search-query-ipynb" class="file my-2">
<div class="Box-body p-0 blob-wrapper data type-jupyter-notebook  ">
<div class="render-wrapper ">
<div class="render-container is-render-pending js-render-target " data-identity="2c15dc64-c8b8-45f6-bfe4-c3cb9423181e" data-host="https://notebooks.githubusercontent.com" data-type="ipynb"><iframe class="render-viewer " title="File display" src="https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&amp;commit=2a439b5f3964c0d504b2478b7477b2ce247377bc&amp;enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f6a616d657363616c616d2f32396235656564346135303338636436363533313135663434633332353761322f7261772f326134333962356633393634633064353034623234373862373437376232636532343733373762632f776869737065722d79742d7365617263682d71756572792e6970796e62&amp;logged_in=false&amp;nwo=jamescalam%2F29b5eed4a5038cd6653115f44c3257a2&amp;path=whisper-yt-search-query.ipynb&amp;repository_id=118849355&amp;repository_type=Gist#2c15dc64-c8b8-45f6-bfe4-c3cb9423181e" name="2c15dc64-c8b8-45f6-bfe4-c3cb9423181e" sandbox="allow-scripts allow-same-origin allow-top-navigation" data-mce-fragment="1"></iframe></div>
</div>
</div>
</div>
</div>
</div>
<div class="gist-meta"><a href="https://gist.github.com/jamescalam/29b5eed4a5038cd6653115f44c3257a2/raw/2a439b5f3964c0d504b2478b7477b2ce247377bc/whisper-yt-search-query.ipynb">view raw</a> <a href="https://gist.github.com/jamescalam/29b5eed4a5038cd6653115f44c3257a2#file-whisper-yt-search-query-ipynb">whisper-yt-search-query.ipynb</a></div>
</div>
</div>
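<p>Under the hood, the index ranks stored vectors by dot-product similarity with the query vector (matching the <code>dot</code> in the model name). Here is a toy sketch of that ranking over an in-memory dictionary, not the Pinecone API:</p>

```python
def dot_product_search(index, query_vec, top_k=3):
    """Rank stored (id, vector) pairs by dot product with the query vector."""
    scored = [(vid, sum(q * v for q, v in zip(query_vec, vec)))
              for vid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Three toy 2-d "embeddings" in place of real 768-d vectors
index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
results = dot_product_search(index, [1.0, 0.1], top_k=2)
print([vid for vid, _ in results])  # ['a', 'c']
```

A real vector database performs the same ranking, but with approximate-nearest-neighbor structures so it scales far beyond a linear scan.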
<p>These results are relevant to the question; three, in particular, are from a similar location in the same video. We might want to improve the search interface to be more user-friendly than a Jupyter Notebook.</p>
<p>One of the easiest ways to get a web-based search UI up and running is with Hugging Face <em>Spaces</em> and Streamlit (or Gradio if preferred).</p>
<p>We won’t go through the code here, but if you’re familiar with Streamlit, you can build a search app quite easily within a few hours. Or you can use our example and do it in 5-10 minutes.</p>
<p>When querying again for <code>"what is OpenAI's clip?"</code> we can see that multiple results from a single video are merged. With this, we can jump to each segment by clicking on the part of the text that is most interesting to us.</p>
<p>Try a few more queries like:</p>
<pre><code>What is the best unsupervised method to train a sentence transformer?
What is vector search?
How can I train a sentence transformer with little-to-no data?</code></pre>
<hr />
<p>We can build incredible speech-enabled search apps very quickly using Whisper alongside Hugging Face, sentence transformers, and Pinecone’s <a href="https://www.pinecone.io/learn/vector-database">vector database</a>.</p>
<p>Whisper has unlocked an entire modality, the spoken word, and it&#8217;s only a matter of time before we see a significant increase in speech-enabled search and other speech-centric use cases.</p>
<p>Both machine learning and vector search have seen exponential growth in recent years. These technologies already seem like sci-fi. Yet, despite the incredible performance of everything we used here, all of this will only get <em>even better</em>.</p>
<p><a href="https://www.pinecone.io/learn/openai-whisper/" target="_blank" rel="noopener">source</a></p>
<p><iframe title="How to do Free Speech-to-Text Transcription Better Than Google Premium API with OpenAI Whisper Model" width="640" height="360" src="https://www.youtube.com/embed/msj3wuYf3d8?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>What is Chat GPT? What is Chat GPT Potential Uses</title>
		<link>https://goodshepherdmedia.net/what-is-chat-gpt-what-is-chat-gpt-potential-uses/</link>
		
		<dc:creator><![CDATA[The Truth News]]></dc:creator>
		<pubDate>Sun, 01 Jan 2023 08:28:42 +0000</pubDate>
				<category><![CDATA[Science & Engineering]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[Top Stories]]></category>
		<category><![CDATA[Zee Truthful News]]></category>
		<category><![CDATA[🤖 AI Artificial Intelligence]]></category>
		<category><![CDATA[🤖Open AI]]></category>
		<category><![CDATA[🤖🗣️ChatGPT]]></category>
		<category><![CDATA[Chat GPT]]></category>
		<category><![CDATA[Chat GPT-3]]></category>
		<category><![CDATA[ChatGPT]]></category>
		<category><![CDATA[Elon]]></category>
		<category><![CDATA[Elon Musk]]></category>
		<category><![CDATA[Musk]]></category>
		<category><![CDATA[Open AI]]></category>
		<category><![CDATA[Open AI Chat GPT-3]]></category>
		<guid isPermaLink="false">https://goodshepherdmedia.net/?p=10153</guid>

					<description><![CDATA[What is Chat GPT? What is Chat GPT Potential Uses Chat GPT Potential Use Cases &#160; How Chat GPT Can Solve Real World Problems! The latest tech product by Elon Musk founded Open AI is Chat GPT and it is nothing short of extraordinary. There hasn&#8217;t been a single day I did not use it [&#8230;]]]></description>
										<content:encoded><![CDATA[<blockquote>
<h1>What is Chat GPT? What is Chat GPT Potential Uses</h1>
<h2 style="text-align: center;"><span style="color: #0000ff;"><em>Chat GPT Potential Use Cases</em></span></h2>
</blockquote>
<p>&nbsp;</p>
<h2 class="title is-4 has-text-weight-bold mb-1" style="text-align: center;">How Chat GPT Can Solve Real World Problems!</h2>
<p>The latest tech product from the Elon Musk co-founded OpenAI is Chat GPT, and it is nothing short of extraordinary. There hasn&#8217;t been a single day I haven&#8217;t used it since its public release on 1 December. GPT-3 (short for &#8220;Generative Pre-trained Transformer 3&#8221;) is a state-of-the-art language processing model developed by OpenAI. It has a wide range of potential use cases, including natural language generation, translation, summarization, text classification, and more.</p>
<p>Chat GPT is one of its applications, optimized for human-like conversation. When you give it a prompt, the AI replies as a human would. You can even train it with your commands and change its tone &amp; style of chatting. No wonder it can save you a ton of time and effort in researching topics on Google. The possibilities are endless.</p>
<p>In this blog, we are going to explore some of the possible use cases of Chat GPT that can solve real-world problems.</p>
<p>&nbsp;</p>
<h3>Chat GPT Potential Use Cases</h3>
<ol>
<li><strong>Chatbot Development:</strong> By generating human-like text, a chatbot powered by GPT-3 could provide more natural and engaging interactions with users.</li>
<li><strong>Content Creation:</strong> Generate unique and compelling articles, blog posts, or social media content, saving time and effort for content marketers and writers.</li>
<li><strong>Virtual Assistants &amp; Digital Assistants:</strong> By generating responses to user queries, a virtual assistant could provide more accurate and natural-sounding answers to a wider range of questions.</li>
<li><strong>Translation:</strong> Translate text of any length into any language the AI is trained on. You don&#8217;t need to learn multiple languages to communicate.</li>
<li><strong>Summarization:</strong> Summarize a book, a script, or any other long piece of text. Even create different interpretations of the same text.</li>
<li><strong>Question Answering:</strong> Ask a question and get the answer. This has applications in many Edtech products. It can even provide an explanation behind the answer.</li>
<li><strong>Social Media Moderation:</strong> Detect and flag inappropriate or offensive content on social media automatically.</li>
<li><strong>Sentence Completion &amp; Paraphrasing:</strong> Complete or paraphrase sentences of any length. Even decide the tone &amp; style of the paraphrased text.</li>
<li><strong>Text-to-speech and speech-to-text:</strong> Convert text to speech or speech to text.</li>
<li><strong>Named Entity Recognition:</strong> Automatically identify and classify named entities (people, places, organizations, etc.) in text.</li>
<li><strong>Write Novels, Stories &amp; Scripts:</strong> Develop a story or a film script with different scenes, plots, characters, dialogues etc. Creative storytelling with realistic and engaging elements.</li>
</ol>
<h3>Conclusion</h3>
<p>This language-processing AI model is so powerful that it will change the whole world. Chat GPT has the potential to revolutionize the way we interact with technology and each other. From providing personalized customer service to enhancing language learning, the potential use cases for Chat GPT are numerous and varied. As technology continues to evolve and improve, we can expect to see more and more creative and innovative applications for Chat GPT in the future. It&#8217;s an exciting time for this technology, and I can&#8217;t wait to see what the future holds. What about you?</p>
<p><a class="nuxt-link-active" href="https://leofinance.io/@finguru" target="_blank" rel="noopener" data-v-4b9f4c1a=""><img loading="lazy" decoding="async" class="is-rounded" src="https://images.hive.blog/u/finguru/avatar/small" width="45" height="45" data-v-30d64927="" data-v-4b9f4c1a="" /></a><a href="https://leofinance.io/@finguru/how-chat-gpt-can-solve-real-world-problems" target="_blank" rel="noopener">source</a> <a href="https://leofinance.io/@finguru" target="_blank" rel="noopener">@finguru</a></p>
<hr />
<div class="is-flex is-justify-content-start mr-2" data-v-55ff7877="">
<div class="ml-2 is-flex is-flex-direction-column" data-v-55ff7877="">
<div class="is-size-7 has-tooltip" data-v-55ff7877="" data-original-title="null">
<h1 class="fs-headline speakable-headline font-base font-size should-redesign" style="text-align: center;">12 Ways Chat GPT Will Change U.S. Forever</h1>
<blockquote>
<p style="text-align: center;"><strong><em><span style="color: #ff00ff;">Even Touching Areas Such as Immigration</span></em></strong></p>
</blockquote>
<figure class="embed-base image-embed embed-0" role="presentation">
<div><img loading="lazy" decoding="async" class="alignnone size-full wp-image-10161" src="https://goodshepherdmedia.net/wp-content/uploads/2023/01/960x0.jpg" alt="" width="960" height="422" srcset="https://goodshepherdmedia.net/wp-content/uploads/2023/01/960x0.jpg 960w, https://goodshepherdmedia.net/wp-content/uploads/2023/01/960x0-300x132.jpg 300w, https://goodshepherdmedia.net/wp-content/uploads/2023/01/960x0-768x338.jpg 768w" sizes="(max-width: 960px) 100vw, 960px" /></div><figcaption>
<p class="color-body light-text" aria-expanded="false">Chat GPT is a new technology that could change our world. It holds great promise to make our lives &#8230; GETTY</p>
</figcaption></figure>
<p>On November 30th, 2022, OpenAI, an American artificial intelligence research laboratory with a mission of developing friendly AI in a way that benefits humanity as a whole, came out with its cutting-edge Chat GPT chatbot. If Chat GPT lives up to its promise, it will cause a tectonic shift in our lives. Chat GPT is said to rival Google in its utility and has taken the world by storm, attracting over 1 million users in just five days. It appears ready to turn things upside down by impacting occupations such as law, journalism, translation, and social media marketing. It works like Google in that you type a question into a box and hit enter; the chatbot displays the answer, usually in essay format, responding to the question posed. Let&#8217;s take a look at how Chat GPT can impact U.S. immigration as an example of what to expect.</p>
<h3 class="subhead3-embed color-body bg-base font-accent font-size text-align">1. Outlining Immigration Options</h3>
<p>Immigrants might ask Chat GPT to outline what kinds of options they have for immigrating to the United States, then ask it to present the answer in a table format. Applicants might then ask Chat GPT to detail the requirements for a specific work visa and what will need to be included in an application. Chat GPT will respond as requested in real time.</p>
<h3 class="subhead3-embed color-body bg-base font-accent font-size text-align">2. Explaining Legislation</h3>
<p>Applicants for immigration benefits will be able to ask Chat GPT to explain immigration legislation in simplified form. For example, investor immigrants might ask Chat GPT to explain the recent U.S. EB-5 Reform and Integrity Act of 2022 to better understand what is required of them. The resulting simplified version will make life much easier for the reader, who is spared the agony of wading through the densely worded act and its explanations.</p>
<h3 class="subhead3-embed color-body bg-base font-accent font-size text-align">3. Specialized Instructions</h3>
<p>Immigration applicants will be able to ask Chat GPT to provide them with a specialized instruction guide on how they should prepare their immigrant applications to ensure the most likely approval of their cases. For example, this might include a request for a “Possible Table of Contents for an introduction to the U.S. E-2 work visa.” Chat GPT will type out the table. Then the applicant will be able to study up on the visa by typing in the various subtitles or dig deeper into any particular aspect of the visa, such as the quickest way to get one.</p>
<h3 class="subhead3-embed color-body bg-base font-accent font-size text-align">4. Preparing Persuasive Resumes</h3>
<p>Chat GPT will be able to help applicants prepare persuasive resumes, for example by listing skills, education, experience, and projects, gathering the information from online sources or from text provided to Chat GPT in written form. Users will be able to type in the requirements for a position and ask Chat GPT to suggest resume formats that better match those requirements.</p>
<h3 class="subhead3-embed color-body bg-base font-accent font-size text-align">5. Preparing Covering Letters</h3>
<p>Chat GPT will be able to help applicants prepare cover letters drafted to reference their geographic location, expertise, or other details that match them to the purpose of the letter.</p>
<h3 class="subhead3-embed color-body bg-base font-accent font-size text-align">6. Doing Business Plans</h3>
<p>Chat GPT will be able to help investor immigrant applicants prepare business plans for their proposed investments. For example, an immigrant investor proposing to buy a Holiday Inn hotel in North Carolina will be able to ask Chat GPT to prepare the plan for an E-2 visa application for such a hotel, using information published about Holiday Inns online.</p>
<h3 class="subhead3-embed color-body bg-base font-accent font-size text-align">7. Helping To Learn English</h3>
<p>Chat GPT will be able to help applicants prepare for English and other language tests, such as the International English Language Testing System (IELTS), by providing examples of questions previously asked and answered. Indeed, the chatbot will be useful for learning new languages. For example, it could be asked, “What is the quickest way to learn the conjugations of verbs in Spanish?”</p>
<h3 class="subhead3-embed color-body bg-base font-accent font-size text-align">8. Preparing Country Profiles For Asylum Claimants</h3>
<p>Chat GPT will be available to asylum claimants to prepare detailed country profile information, strengthening submissions related to their claim of a well-founded fear of persecution in their home country.</p>
<h3 class="subhead3-embed color-body bg-base font-accent font-size text-align">9. Preparing Histories Of Events</h3>
<p>Government adjudicators of immigration applications will be able to use Chat GPT to prepare a complete history of events to verify the truth of answers given on application forms and in immigration submissions. Similarly, in depositions, attorneys will be able to search out histories of events to test the veracity of the evidence. But Chat GPT will not provide personal information such as the location, age, or habits of individuals.</p>
<h3 class="subhead3-embed color-body bg-base font-accent font-size text-align">10. Finding The Whereabouts Of Places</h3>
<p>Chat GPT will help users find places and their histories. But it will not be able to locate the backgrounds of individuals, such as lost relatives who could sponsor users.</p>
<h3 class="subhead3-embed color-body bg-base font-accent font-size text-align">11. Legal Research</h3>
<p>Chat GPT can search for key precedents, provide model pleadings, and outline legal arguments for cases headed to the immigration courts or the federal courts.</p>
<h3 class="subhead3-embed color-body bg-base font-accent font-size text-align">12. Translations</h3>
<p>Chat GPT will be able to translate documents for submission with immigration applications.</p>
<p>Chat GPT is still in beta and is offered for free to collect feedback on how to improve it. Its training data only extends to 2021, so it cannot answer questions about current affairs or make projections about the future. Eventually, it will be made available commercially. You can try it <a class="color-link" title="https://chat.openai.com/chat" href="https://chat.openai.com/chat" target="_blank" rel="nofollow noopener noreferrer" data-ga-track="ExternalLink:https://chat.openai.com/chat" aria-label="here">here</a>. It’s great!</p>
<p>To conclude, here is a question I posed to Chat GPT about this article, along with its answer:</p>
<h2 class="subhead-embed color-accent bg-base font-accent font-size text-align">What outline would be best to introduce the benefits of Chat GPT to immigrants?</h2>
<p>Here is the actual Chat GPT answer:</p>
<p>Here is an outline that you could use to introduce the benefits of Chat GPT to immigrants:</p>
<ol>
<li>Introduction: Begin by explaining what Chat GPT is and why it might be useful for immigrants. You could mention that Chat GPT is a language model that can generate human-like text, and that it could potentially be used to assist with tasks such as answering frequently asked questions or generating documents.</li>
<li>Benefits: Next, outline the benefits of using Chat GPT for immigrants. For example, you could mention that Chat GPT could help immigrants save time by automating certain tasks, or that it could help them access information more easily. You could also mention that Chat GPT could potentially help immigrants navigate complex legal processes more effectively.</li>
<li>Examples: Provide some specific examples of how Chat GPT could be used by immigrants. For example, you could mention that it could be used to generate personalized responses to questions about the immigration process, or to generate documents such as visas or green card applications.</li>
<li>Limitations: It&#8217;s important to be upfront about the limitations of Chat GPT as well. For example, you could mention that it is not a substitute for legal advice, and that immigrants should always review any documents generated by Chat GPT before submitting them.</li>
<li>Conclusion: Summarize the main points of your introduction to Chat GPT, and encourage immigrants to consider using it as a tool to assist with their immigration needs.</li>
</ol>
<p><a class="contrib-link--name remove-underline not-premium-contrib-link--name" href="https://www.forbes.com/sites/andyjsemotiuk/">Andy J. Semotiuk</a> <a href="https://www.forbes.com/sites/andyjsemotiuk/2022/12/16/12-ways-chat-gpt-will-change-us-immigration-law-forever/?sh=2fc2562f6a5e" target="_blank" rel="noopener">source</a></p>
<hr />
<h1 class="story-title" style="text-align: center;">Chat GPT 3: A Look At The Artificial Intelligence Backed Chat Bot</h1>
<h3 class="story-description" style="text-align: center;">If you are on Twitter, even as a lurker and not an active user,<br />
it is quite likely you have come across Chat GPT 3. The free research preview of it is out now.</h3>
</div>
</div>
</div>
<p>If you are on Twitter, even as a lurker and not an active user, it is quite likely you have come across Chat GPT 3. The free research preview of it is out now. Chat GPT 3 made quite the impression on Twitter, just like the <a href="https://www.republicworld.com/world-news/us-news/researchers-ai-in-connected-cars-eased-rush-hour-congestion-articleshow.html">AI</a> art-generating app Midjourney did.</p>
<p>There has been talk of GPT 3 replacing Google. Chamath Palihapitiya, a venture capitalist who worked at Facebook in its early days, said that Chat GPT 3 was much better than Google search. However, there are some problems. For starters, Chat GPT 3&#8217;s training data ends in 2021. As a result, it is completely unaware of anything that happened in the world after 2021. This means that if we want to search for something about the Russia-Ukraine war, Google remains the go-to destination.</p>
<h2>Initial impressions of Chat GPT 3</h2>
<p>&#8220;As I just learned that OpenAI had access to Twitter database for training. I put that on pause for now,&#8221; Musk tweeted, as people posted screenshots of their chats with Chat GPT 3. Chat GPT 3 &#8220;was started as open-source and non-profit. Neither are still true,&#8221; he added. Marc Andreessen, of the venture capital firm Andreessen Horowitz, has complained about bias in the responses of Chat GPT 3. Some users on Twitter have also complained about its inability to do basic linear algebra.</p>
<p><a href="https://mobile.twitter.com/pmarca/status/1605746351349735425" target="_blank" rel="noopener">Marc Andreessen&#8217;s tweet</a></p>
<p>Chat GPT 3 can do some things that Google can&#8217;t. For example, Google can&#8217;t write a play. Chat GPT 3 on the other hand can. If you use it to write multiple plays, a pattern can be detected and the plays start becoming monotonous, but when one commands Chat GPT 3 to write a play for the first time, the result can be interesting, if not perfect. One merely has to give it a few prompts and voila, a play is there.</p>
<h2>An example of Chat GPT 3&#8217;s work</h2>
<blockquote><p>Consider this prompt &#8211; &#8220;Write a drama on a horror story set in a British University. Characters &#8211; Professor James; post graduate student Emily; Undergrad student Mark; Emily&#8217;s neighbour William&#8221;.</p>
<p>The result of the prompt was this &#8211; &#8221;</p>
<p>Professor James sat at his desk, surrounded by piles of papers and old books. He was a well-respected scholar of ancient history at the University of Oxford, and he had been working on his latest research project for months.</p>
<p>As he sorted through his notes, there was a knock at the door. &#8220;Come in,&#8221; he called out, not looking up from his work.</p>
<p>The door opened, and a young woman named Emily walked in. She was a postgraduate student, and she had been working with Professor James on his research.</p>
<p>&#8220;Hello, Professor,&#8221; Emily said, taking a seat across from him. &#8220;I&#8217;ve been going through some of the old documents you gave me, and I think I&#8217;ve found something interesting.&#8221;</p>
<p>Professor James looked up, intrigued. &#8220;What have you found?&#8221; he asked.</p>
<p>&#8220;It&#8217;s a letter from the 18th century,&#8221; Emily explained. &#8220;It&#8217;s written in Latin, and it mentions a mysterious ritual that was performed at the university. It&#8217;s supposed to be able to bring the dead back to life.&#8221;</p>
<p>Professor James raised an eyebrow. &#8220;That sounds like a tall tale,&#8221; he said. &#8220;But I suppose it&#8217;s worth looking into.&#8221;</p>
<p>As they discussed the letter, there was another knock at the door. It was an undergraduate student named Mark, who had also been working on the research project.</p>
<p>&#8220;Come in, Mark,&#8221; Professor James said.</p>
<p>Mark walked in, looking worried. &#8220;Professor, I think I saw something strange in the library,&#8221; he said. &#8220;I was working on some research, and I thought I saw a ghost.&#8221;</p>
<p>Professor James and Emily exchanged a skeptical glance. &#8220;A ghost?&#8221; Professor James said. &#8220;That&#8217;s quite a claim, Mark.&#8221;</p>
<p>But Mark was insistent. &#8220;I swear, I saw something. It was a woman, and she looked just like Emily.&#8221;</p>
<p>Emily&#8217;s face went pale. &#8220;What did she look like?&#8221; she asked.</p>
<p>&#8220;She had long blonde hair, and she was wearing a white dress,&#8221; Mark said. &#8220;She just appeared out of nowhere, and then vanished.&#8221;</p>
<p>Emily&#8217;s eyes widened in horror. &#8220;That&#8217;s my old neighbor, William,&#8221; she said. &#8220;He died in a car accident a few months ago. I always thought he was a little creepy, and I think he was in love with me.&#8221;</p>
<p>Professor James and Mark looked at each other in shock. Could it be that the ghostly woman was really Emily&#8217;s old neighbor, William?</p>
<p>As they tried to figure out what to do, they heard a knock at the door. It was Emily&#8217;s old neighbor, William, standing there in the flesh.</p>
<p>&#8220;I&#8217;m so sorry to bother you,&#8221; William said. &#8220;But I was just walking by and thought I saw Emily&#8217;s ghost in the window. I had to come and see if she was okay.&#8221;</p>
<p>Professor James, Emily, and Mark looked at each other in confusion. If William was alive, then who was the ghost they had seen?</p>
<p>As they tried to piece together the mystery, they realized that the letter from the 18th century had been talking about a real ritual. And somehow, Emily&#8217;s old neighbor, William, had stumbled upon it and brought Emily&#8217;s ghost back to life.</p>
<p>But as they tried to reverse the ritual and send Emily&#8217;s ghost back to the other side, they realized that it was too late. The ghost of Emily was now forever trapped in this world, doomed to haunt the halls of the University of Oxford for all eternity&#8221;.</p></blockquote>
<h2>Is Chat GPT3 the only AI-backed chatbot on the market?</h2>
<p>Chat GPT 3 is not the only AI-backed chatbot on the market, although it is the only one that is free, which is perhaps why it is grabbing attention. A chat interface backed by <a href="https://www.republicworld.com/world-news/us-news/researchers-ai-in-connected-cars-eased-rush-hour-congestion-articleshow.html">AI</a> called &#8220;Replika&#8221; has been on the market for quite some time. Unlike Chat GPT 3, it has a voice to communicate back with you, and admittedly it is much better than Siri or anything else on the market. Replika is also reportedly used as a therapist, since it has a specific therapy mode in which one can have a voice conversation with the AI-powered chatbot. Replika costs around £5 per month.</p>
<h2>Comparison with Google</h2>
<p>Does Chat GPT 3 have access to the amount of data that Google has? Even a Wikipedia page surfaced through Google offers far more information on a given topic than Chat GPT 3 does. Try typing &#8220;Lysenkoism&#8221; into both Google and Chat GPT 3 and compare the results. When one considers Google Scholar, the contrast is even more stark.</p>
<p>Google has its own AI-powered chatbot called LaMDA, but Google is not keen to release it to the public anytime soon because it is concerned about the &#8220;reputational risk&#8221;. If Google&#8217;s chatbot gives a wrong answer or makes an error in response to a prompt, as Chat GPT 3 does, Google&#8217;s reputation will be damaged, since expectations of Google are higher. The point Musk made is worth flagging: the data sets Chat GPT 3 was given access to for training were provided under the presumption that Open <a href="https://www.republicworld.com/world-news/us-news/researchers-ai-in-connected-cars-eased-rush-hour-congestion-articleshow.html">AI</a> was not for profit. It is unclear why other tech giants would give it access to data if it competes with them.<br />
<a href="https://www.republicworld.com/technology-news/other-tech-news/chat-gpt-3-a-look-at-the-artificial-intelligence-backed-chat-bot-articleshow.html" target="_blank" rel="noopener">source</a></p>
<p>&nbsp;</p>
<hr />
<h1 class="entry-title">How does Chat GPT work?</h1>
<p><a href="https://www.atriainnovation.com/en/how-does-chat-gpt-work/" target="_blank" rel="noopener">source</a></p>
<div class="et_pb_row et_pb_row_0">
<div class="et_pb_column et_pb_column_4_4 et_pb_column_0 et_pb_css_mix_blend_mode_passthrough et-last-child">
<div class="et_pb_module et_pb_text et_pb_text_0 et_pb_text_align_justified et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<p>There has been a lot of talk about Chat GPT since its launch in November 2022. This <strong>‘smart chat’</strong> has surprised even the most skeptical. In this post we will discuss how it works and how you can use Chat GPT in your projects.</p>
</div>
</div>
</div>
</div>
<div class="et_pb_row et_pb_row_1">
<div class="et_pb_column et_pb_column_4_4 et_pb_column_1 et_pb_css_mix_blend_mode_passthrough et-last-child">
<div class="et_pb_module et_pb_text et_pb_text_1 et_pb_text_align_left et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<h2><strong>What is Chat GPT?</strong></h2>
<div class="et_pb_row et_pb_row_2">
<div class="et_pb_column et_pb_column_1_2 et_pb_column_2 et_pb_css_mix_blend_mode_passthrough">
<div class="et_pb_module et_pb_text et_pb_text_2 et_pb_text_align_justified et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<p>Chat GPT is defined as a <strong>generative language model</strong>. However, in practice it is understood as an artificial intelligence chat that has been trained and designed to hold natural conversations. Chat GPT belongs to the research company OpenAI, founded in San Francisco in 2015 by Sam Altman, Elon Musk, Greg Brockman, Ilya Sutskever and Wojciech Zaremba.</p>
</div>
</div>
</div>
<div class="et_pb_column et_pb_column_1_2 et_pb_column_3 et_pb_css_mix_blend_mode_passthrough et-last-child">
<div class="et_pb_module et_pb_image et_pb_image_0"><img loading="lazy" decoding="async" class="size-full wp-image-10166 alignright" src="https://goodshepherdmedia.net/wp-content/uploads/2023/01/que_es-scaled-e1672824073123.jpg" alt="" width="880" height="495" srcset="https://goodshepherdmedia.net/wp-content/uploads/2023/01/que_es-scaled-e1672824073123.jpg 880w, https://goodshepherdmedia.net/wp-content/uploads/2023/01/que_es-scaled-e1672824073123-300x169.jpg 300w, https://goodshepherdmedia.net/wp-content/uploads/2023/01/que_es-scaled-e1672824073123-768x432.jpg 768w" sizes="(max-width: 880px) 100vw, 880px" /></div>
</div>
</div>
<div class="et_pb_row et_pb_row_3">
<div class="et_pb_column et_pb_column_4_4 et_pb_column_4 et_pb_css_mix_blend_mode_passthrough et-last-child">
<div class="et_pb_module et_pb_text et_pb_text_3 et_pb_text_align_left et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<h2><strong>What is Chat GPT used for?</strong></h2>
</div>
</div>
<div class="et_pb_module et_pb_text et_pb_text_4 et_pb_text_align_justified et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<p>But what are the applications of Chat GPT? Some of the applications for which you can use Chat GPT (besides having a good time asking questions) are discussed below:</p>
</div>
</div>
</div>
</div>
<div class="et_pb_row et_pb_row_4">
<div class="et_pb_column et_pb_column_1_2 et_pb_column_5 et_pb_css_mix_blend_mode_passthrough">
<div class="et_pb_module et_pb_blurb et_pb_blurb_0 et_pb_text_align_justified et_pb_blurb_position_left et_pb_bg_layout_light">
<div class="et_pb_blurb_content">
<ul>
<li class="et_pb_main_blurb_image"><span class="et_pb_image_wrap"><span class="et-waypoint et_pb_animation_top et_pb_animation_top_tablet et_pb_animation_top_phone et-pb-icon et-animated"></span></span>With GPT you can generate coherent and well-written texts in a wide range of styles, topics and languages. In addition, news summaries, product descriptions or stories can be generated.</li>
<li class="et_pb_main_blurb_image"><span class="et-waypoint et_pb_animation_top et_pb_animation_top_tablet et_pb_animation_top_phone et-pb-icon et-animated"></span>Thanks to this chat, problems can be analyzed and solutions or answers to questions can be generated.</li>
<li class="et_pb_main_blurb_image"><span class="et-waypoint et_pb_animation_top et_pb_animation_top_tablet et_pb_animation_top_phone et-pb-icon et-animated"></span>GPT can be used to generate appropriate and consistent responses for a chatbot in a wide range of contexts.</li>
<li class="et_pb_main_blurb_image"><span class="et-waypoint et_pb_animation_top et_pb_animation_top_tablet et_pb_animation_top_phone et-pb-icon et-animated"></span>It can be used to generate attractive posts and messages for social networks.</li>
<li class="et_pb_main_blurb_image"><span class="et-waypoint et_pb_animation_top et_pb_animation_top_tablet et_pb_animation_top_phone et-pb-icon et-animated"></span>With GPT you can generate reports, e-mails and other content for productivity applications.</li>
<li class="et_pb_main_blurb_image"><span class="et-waypoint et_pb_animation_top et_pb_animation_top_tablet et_pb_animation_top_phone et-pb-icon et-animated"></span>Thanks to chat GPT, large data sets can be analyzed and valuable information can be extracted from them.</li>
</ul>
</div>
</div>
</div>
</div>
<div class="et_pb_row et_pb_row_5">
<div class="et_pb_column et_pb_column_4_4 et_pb_column_7 et_pb_css_mix_blend_mode_passthrough et-last-child">
<div class="et_pb_module et_pb_text et_pb_text_5 et_pb_text_align_left et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<h2><strong>How does Chat GPT work?</strong></h2>
</div>
</div>
<div class="et_pb_module et_pb_text et_pb_text_6 et_pb_text_align_justified et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<p>As its acronym indicates (Generative Pre-trained Transformer), Chat GPT is a generative language model based on the <strong>‘transformer’ architecture</strong>. These models can process large amounts of text and learn to perform natural language processing tasks very effectively. The GPT-3 model, in particular, is <strong>175 billion parameters in size</strong>, making it one of the largest language models ever trained. To work, GPT needs to be “trained” on a large amount of text. For example, the GPT-3 model was trained on a text set that <strong>included over 8 million documents and over 10 billion words</strong>. From this text, the model learns to perform natural language processing tasks and generate coherent, well-written text. Once the model is trained, GPT can be used to perform a wide range of tasks, as we saw in the previous section. Training then continued with reinforcement learning from <strong>human feedback</strong>, preceded by supervised fine-tuning: human AI trainers provided conversations in which they played both the user and the AI assistant, and were given written suggestions to help them compose their replies. This new dataset was then mixed with the InstructGPT dataset, which had been transformed into a dialogue format.</p>
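The idea of learning to generate text from a large training corpus can be illustrated with a toy next-word model. This is a deliberately minimal sketch: the corpus, function names, and bigram approach are our own illustration, nothing like GPT-3's actual scale or transformer architecture, but the generative principle (predict a plausible next word, repeatedly) is the same.

```python
import random
from collections import defaultdict

def train_bigram_model(text):
    """Toy stand-in for generative pre-training: record which word follows
    which in the training text."""
    model = defaultdict(list)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)
    return model

def generate(model, start, n_words=8, seed=0):
    """Generate text by repeatedly sampling a plausible next word."""
    random.seed(seed)
    out = [start]
    for _ in range(n_words):
        candidates = model.get(out[-1])
        if not candidates:
            break
        out.append(random.choice(candidates))
    return " ".join(out)

corpus = "the model learns to generate text and the model learns to answer questions"
model = train_bigram_model(corpus)
print(generate(model, "the"))
```

With more text and a far more expressive model, the same predict-the-next-token objective yields the fluent output described above.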
<p>But how did they create the reward model for reinforcement learning?<br />
The first step was to collect comparison data, consisting of two or more model responses ranked by quality. To collect this data, they took conversations that the trainers had had with Chat GPT, selected some at random, and sampled several alternative completions for the trainers to rank.</p>
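These rankings are what the reward model is trained on. A minimal numpy sketch of the pairwise loss commonly used for this kind of preference learning follows; the function name and the toy scores are illustrative assumptions, not OpenAI's code.

```python
import numpy as np

def pairwise_ranking_loss(reward_preferred, reward_rejected):
    """Loss that pushes a reward model to score the trainer-preferred
    completion higher than the rejected one: -log(sigmoid(r_w - r_l))."""
    margin = reward_preferred - reward_rejected
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# Toy scores a reward model might assign to two candidate completions:
loss_good = pairwise_ranking_loss(2.0, -1.0)  # preferred scored higher -> low loss
loss_bad = pairwise_ranking_loss(-1.0, 2.0)   # preferred scored lower -> high loss
print(loss_good, loss_bad)
```

Minimizing this loss over many ranked pairs teaches the reward model to assign higher scores to the completions human trainers preferred.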
</div>
</div>
<div class="et_pb_module et_pb_text et_pb_text_7 et_pb_text_align_justified et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<p>With this reward model in place, the policy could be fine-tuned using <strong>Proximal Policy Optimization</strong>. The training runs were carried out on a supercomputer on the <strong>Microsoft Azure</strong> platform. In conclusion, to use GPT in a chat, the model is given an input in the form of text, such as a question or a context sentence. From this input, GPT generates an appropriate and coherent response, which can be used in a chatbot or any other application where text must be generated from a given input.</p>
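The key idea of Proximal Policy Optimization is to clip the policy update so the new policy cannot stray too far from the old one in a single step. A simplified scalar sketch of that clipped surrogate objective is below; this is our own illustration of the published technique, not OpenAI's training code.

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate: ratio = new_prob / old_prob for an action;
    clipping keeps a single update from moving the policy too far."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    # Taking the minimum makes the objective a pessimistic (lower) bound.
    return np.minimum(unclipped, clipped)

# A large ratio with positive advantage is capped at 1 + eps:
print(ppo_clipped_objective(1.5, 1.0))  # 1.2, not 1.5
print(ppo_clipped_objective(0.5, 1.0))  # 0.5 (already below the clip range's value)
```

In RLHF, the "advantage" comes from the reward model's score, so this objective nudges the language model toward responses humans rated highly without destabilizing it.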
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="et_pb_row et_pb_row_7">
<div class="et_pb_column et_pb_column_4_4 et_pb_column_9 et_pb_css_mix_blend_mode_passthrough et-last-child">
<div class="et_pb_module et_pb_image et_pb_image_2"><img loading="lazy" decoding="async" class="wp-image-10165 aligncenter" src="https://goodshepherdmedia.net/wp-content/uploads/2023/01/Diagrama.jpg" alt="" width="846" height="465" srcset="https://goodshepherdmedia.net/wp-content/uploads/2023/01/Diagrama.jpg 2021w, https://goodshepherdmedia.net/wp-content/uploads/2023/01/Diagrama-300x165.jpg 300w, https://goodshepherdmedia.net/wp-content/uploads/2023/01/Diagrama-1024x562.jpg 1024w, https://goodshepherdmedia.net/wp-content/uploads/2023/01/Diagrama-768x422.jpg 768w, https://goodshepherdmedia.net/wp-content/uploads/2023/01/Diagrama-1536x844.jpg 1536w" sizes="(max-width: 846px) 100vw, 846px" /></div>
</div>
</div>
<div class="et_pb_row et_pb_row_8">
<div class="et_pb_column et_pb_column_4_4 et_pb_column_10 et_pb_css_mix_blend_mode_passthrough et-last-child">
<div class="et_pb_module et_pb_text et_pb_text_8 et_pb_text_align_left et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<h2><span id="What_does_it_mean_that_Chat_GPT_is_transformer-based" class="ez-toc-section"></span><strong>What does it mean that Chat GPT is transformer-based?</strong></h2>
</div>
</div>
<div class="et_pb_module et_pb_text et_pb_text_9 et_pb_text_align_justified et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<p>A “transformation” is a type of data processing that is performed on a sequence of elements, such as words in a sentence or characters in a word. And “transformers” are machine learning models that are specifically designed to process sequences of elements using transformations.</p>
</div>
</div>
</div>
</div>
<div class="et_pb_row et_pb_row_9">
<div class="et_pb_column et_pb_column_1_2 et_pb_column_11 et_pb_css_mix_blend_mode_passthrough">
<div class="et_pb_module et_pb_image et_pb_image_3"><img loading="lazy" decoding="async" class=" wp-image-10164 alignright" src="https://goodshepherdmedia.net/wp-content/uploads/2023/01/transformadores-768x614-1.jpg" alt="" width="580" height="464" srcset="https://goodshepherdmedia.net/wp-content/uploads/2023/01/transformadores-768x614-1.jpg 768w, https://goodshepherdmedia.net/wp-content/uploads/2023/01/transformadores-768x614-1-300x240.jpg 300w" sizes="(max-width: 580px) 100vw, 580px" /></div>
</div>
<div class="et_pb_column et_pb_column_1_2 et_pb_column_12 et_pb_css_mix_blend_mode_passthrough et-last-child">
<div class="et_pb_module et_pb_text et_pb_text_10 et_pb_text_align_justified et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<p>The architecture of the transformers is based on the use of attention, which is a technique that allows the model to pay attention to different parts of the input sequence at different times while processing the sequence. This allows the transformers to process information more effectively and perform natural language processing tasks more accurately. GPT (Generative Pre-training Transformer) is a generative language model based on the transformer architecture. This means that the model has been designed to process sequences of elements, such as words in a sentence, using transformations and attention. This architecture is very effective for performing natural language processing tasks and has revolutionized the way many NLP tasks are approached.</p>
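The attention mechanism described above can be sketched in a few lines of numpy: each output position is a weighted mix of all value vectors, with weights derived from query-key similarity. This is a single-head, unbatched sketch for illustration; real transformers use many heads, masking, and learned projections.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable softmax over each row:
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Three token positions with 4-dimensional embeddings:
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)           # one mixed vector per position
print(w.sum(axis=-1))      # each row of attention weights sums to 1
```

The weight matrix `w` is what lets the model "pay attention to different parts of the input sequence": each row shows how much one position draws on every other position.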
</div>
</div>
</div>
</div>
<div class="et_pb_row et_pb_row_10">
<div class="et_pb_column et_pb_column_4_4 et_pb_column_13 et_pb_css_mix_blend_mode_passthrough et-last-child">
<div class="et_pb_module et_pb_text et_pb_text_11 et_pb_text_align_left et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<h2><strong>We chat with Chat GPT: How could its creators benefit from Chat GPT?</strong></h2>
</div>
</div>
<div class="et_pb_module et_pb_text et_pb_text_12 et_pb_text_align_justified et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<p>In a 2019 interview, OpenAI CEO Sam Altman said the company would keep its promise to investors and task the AI itself with finding ways to generate revenue. At the time the audience took it as a joke, but looking at how the system works, that moment is closer than we think. To check, we ran the test ourselves. Below, you can read Chat GPT’s response: “ATRIA: Can you name 5 ways in which OpenAI can make money through Chat GPT? Chat GPT: Sure, with pleasure. Here are five ways OpenAI could make money through GPT (Generative Pre-trained Transformer):</p>
<ol>
<li>Offering paid APIs for access to GPT: OpenAI has developed APIs for some of its more advanced language models, such as GPT-3, that allow companies to use these models in their own applications and services. Enterprises can pay to access these models and use them to perform natural language processing tasks in their own applications.</li>
<li>Offering GPT-based application development services: OpenAI can collaborate with companies and organizations to develop applications and services that use GPT and, in return, get paid for these services.</li>
<li>Selling GPT-generated content: OpenAI could sell GPT-generated content to companies or individuals interested in using this type of content for their purposes.</li>
<li>Offering training and consulting in the use of GPT: OpenAI could offer training and consulting to companies and organizations that wish to use GPT in their own projects and applications.</li>
<li>Licensing the use of GPT to other companies: OpenAI could license the use of GPT to other companies for a fee. This could include selling exclusive or non-exclusive usage rights.&#8221;</li>
</ol>
<p>The response is coherent and logical. In our opinion, we are facing a new technological revolution in language models.</p>
</div>
</div>
</div>
</div>
<p>&nbsp;</p>
<p><img loading="lazy" decoding="async" class="size-full wp-image-10163 alignleft" src="https://goodshepherdmedia.net/wp-content/uploads/2023/01/monetizar.jpg.webp" alt="" width="505" height="343" srcset="https://goodshepherdmedia.net/wp-content/uploads/2023/01/monetizar.jpg.webp 505w, https://goodshepherdmedia.net/wp-content/uploads/2023/01/monetizar.jpg-300x204.webp 300w" sizes="(max-width: 505px) 100vw, 505px" /></p>
<div class="et_pb_row et_pb_row_11">
<div class="et_pb_column et_pb_column_4_4 et_pb_column_14 et_pb_css_mix_blend_mode_passthrough et-last-child">
<div class="et_pb_module et_pb_text et_pb_text_13 et_pb_text_align_left et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<h2><span id="Other_OpenAI_language_developments" class="ez-toc-section"></span><strong>Other OpenAI language developments </strong></h2>
</div>
</div>
<div class="et_pb_module et_pb_text et_pb_text_14 et_pb_text_align_left et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<p>OpenAI has developed some of the world’s most advanced and highest-performing language models. Some of its most prominent models include:</p>
</div>
</div>
</div>
</div>
<div class="et_pb_row et_pb_row_12">
<div class="et_pb_column et_pb_column_1_2 et_pb_column_15 et_pb_css_mix_blend_mode_passthrough">
<div class="et_pb_module et_pb_blurb et_pb_blurb_6 et_pb_text_align_left et_pb_blurb_position_left et_pb_bg_layout_light">
<div class="et_pb_blurb_content">
<ul>
<li class="et_pb_main_blurb_image"><span class="et_pb_image_wrap"><span class="et-waypoint et_pb_animation_top et_pb_animation_top_tablet et_pb_animation_top_phone et-pb-icon et-animated"></span></span>It is a generative language model that has been trained on a large number of texts and can generate high quality content on a wide range of tasks.</li>
<li class="et_pb_main_blurb_image"><span class="et-waypoint et_pb_animation_top et_pb_animation_top_tablet et_pb_animation_top_phone et-pb-icon et-animated"></span>It is an even more advanced generative language model than GPT, with significantly more processing power and performance.</li>
<li class="et_pb_main_blurb_image"><span class="et-waypoint et_pb_animation_top et_pb_animation_top_tablet et_pb_animation_top_phone et-pb-icon et-animated"></span>It is a natural language processing model that has revolutionized the way many NLP tasks are approached and has set new standards in performance, across a wide range of tasks.</li>
<li class="et_pb_main_blurb_image"><span class="et-waypoint et_pb_animation_top et_pb_animation_top_tablet et_pb_animation_top_phone et-pb-icon et-animated"></span>It is a text-based image generation model that can generate realistic images from natural language descriptions.</li>
<li class="et_pb_main_blurb_image"><span class="et-waypoint et_pb_animation_top et_pb_animation_top_tablet et_pb_animation_top_phone et-pb-icon et-animated"></span>It is the largest and most advanced language model that has been developed to date by OpenAI, with even greater processing power and performance than its predecessors.</li>
</ul>
</div>
</div>
</div>
</div>
<div class="et_pb_row et_pb_row_13">
<div class="et_pb_column et_pb_column_1_2 et_pb_column_18 et_pb_css_mix_blend_mode_passthrough et-last-child et_pb_column_empty"></div>
</div>
<div class="et_pb_row et_pb_row_14">
<div class="et_pb_column et_pb_column_4_4 et_pb_column_19 et_pb_css_mix_blend_mode_passthrough et-last-child">
<div class="et_pb_module et_pb_text et_pb_text_15 et_pb_text_align_left et_pb_bg_layout_light">
<div class="et_pb_text_inner">
<p>These are just a few examples of the language models developed by OpenAI. The company has developed many other models and has contributed significantly to the advancement of the AI field through its research and publications. Do you want us to implement an Artificial Intelligence system in your company? Contact us!</p>
</div>
</div>
</div>
</div>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Whisper AI Approaches Human Level Robustness and Accuracy On English</title>
		<link>https://goodshepherdmedia.net/whisper-ai-approaches-human-level-robustness-and-accuracy-on-english/</link>
		
		<dc:creator><![CDATA[The Truth News]]></dc:creator>
		<pubDate>Fri, 30 Dec 2022 10:18:29 +0000</pubDate>
				<category><![CDATA[Science & Engineering]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[Top Stories]]></category>
		<category><![CDATA[Zee Truthful News]]></category>
		<category><![CDATA[🤖 AI Artificial Intelligence]]></category>
		<category><![CDATA[🤖Open AI]]></category>
		<category><![CDATA[🤖🗣️Whisper]]></category>
		<category><![CDATA[Accuracy]]></category>
		<category><![CDATA[Human Level Robustness]]></category>
		<category><![CDATA[Open AI]]></category>
		<category><![CDATA[Open AI's Whisper]]></category>
		<category><![CDATA[Whisper AI]]></category>
		<guid isPermaLink="false">https://goodshepherdmedia.net/?p=10208</guid>

					<description><![CDATA[Introducing Whisper Open AI&#8217;s Whisper Whisper AI Approaches Human Level Robustness and Accuracy On English We’ve trained and are open-sourcing a neural net called Whisper that approaches human level robustness and accuracy on English speech recognition. Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from [&#8230;]]]></description>
										<content:encoded><![CDATA[<p style="text-align: center;">
<div class="row">
<div class="col-12 col-md-9 col-lg-8 col-xl-6 offset-xl-3">
<h1 class="balance-text mb-0.75" style="text-align: center;">Introducing Whisper: OpenAI&#8217;s Speech Recognition Model</h1>
<blockquote>
<p style="text-align: center;"><em>Whisper AI Approaches Human Level Robustness and Accuracy On English</em></p>
</blockquote>
</div>
</div>
<div class="row">
<div class="col-12 col-md-9 col-lg-8 col-xl-6 order-xl-1">
<div class="post-excerpt content no-col js-excerpt-container js-widow">
<p style="text-align: center;">We’ve trained and are open-sourcing a neural net called Whisper that approaches human level robustness and accuracy on English speech recognition.</p>
<p>Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing.</p>
<div class="d-none d-md-block wide my-1.5">
<div class="mx-xl-auto"><img decoding="async" class="w-100" src="https://cdn.openai.com/whisper/asr-summary-of-model-architecture-desktop.svg" /></div>
</div>
<p>The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.</p>
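<p>The chunking and task-token steps just described can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI&#8217;s implementation: the waveform is represented as a plain list of samples, the log-Mel spectrogram conversion is omitted, and the helper names are ours. The 16&nbsp;kHz sample rate, 30-second window, and special-token strings mirror those used by Whisper&#8217;s released code and tokenizer.</p>

```python
# Sketch of Whisper-style input preparation (illustrative helpers; the
# log-Mel spectrogram step that follows chunking is omitted here).

CHUNK_SECONDS = 30
SAMPLE_RATE = 16_000                         # Whisper operates on 16 kHz audio
CHUNK_SAMPLES = CHUNK_SECONDS * SAMPLE_RATE  # 480,000 samples per chunk

def split_into_chunks(samples):
    """Split a waveform into fixed 30-second chunks, zero-padding the last."""
    chunks = []
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = samples[start:start + CHUNK_SAMPLES]
        chunk = chunk + [0.0] * (CHUNK_SAMPLES - len(chunk))  # pad to full length
        chunks.append(chunk)
    return chunks

def decoder_prompt(language="en", task="transcribe"):
    """Special tokens that direct the single model to a task:
    language identification, transcription, or to-English translation."""
    return ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]

# A 70-second recording becomes three 30-second chunks (the last one padded):
audio = [0.0] * (70 * SAMPLE_RATE)
chunks = split_into_chunks(audio)
```

<p>Swapping <code>task="transcribe"</code> for <code>task="translate"</code> is all it takes to ask the same model for to-English translation instead of same-language transcription.</p>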
<div class="d-none d-lg-block wide my-2">
<div class="mx-xl-auto"><img decoding="async" class="w-100" src="https://cdn.openai.com/whisper/draft-20220919a/asr-details-desktop.svg" /></div>
</div>
<p>Other existing approaches frequently use smaller, more closely paired audio-text training datasets,<span class="js-rfref" data-id="simply-mix"><sup class="reference-ref">1</sup></span><span class="js-rfref" data-id="the-peoples-speech"><sup class="reference-ref grouped">2</sup></span><span class="js-rfref" data-id="gigaspeech"><sup class="reference-ref grouped">3</sup></span> or use broad but unsupervised audio pretraining.<span class="js-rfref" data-id="self-supervised-learning"><sup class="reference-ref">4</sup></span><span class="js-rfref" data-id="unsupervised-speech-recognition"><sup class="reference-ref grouped">5</sup></span><span class="js-rfref" data-id="the-frontier"><sup class="reference-ref grouped">6</sup></span> Because Whisper was trained on a large and diverse dataset and was not fine-tuned to any specific one, it does not beat models that specialize in LibriSpeech performance, a famously competitive benchmark in speech recognition. However, when we measure Whisper’s zero-shot performance across many diverse datasets we find it is much more robust and makes 50% fewer errors than those models.</p>
<p>About a third of Whisper’s audio dataset is non-English, and it is alternately given the task of transcribing in the original language or translating to English. We find this approach is particularly effective at learning speech to text translation and outperforms the supervised SOTA on CoVoST2 to English translation zero-shot.</p>
<div class="d-none d-lg-block wide my-2">
<div class="mx-xl-auto"><img decoding="async" class="w-100" src="https://cdn.openai.com/whisper/draft-20220920a/asr-training-data-desktop.svg" /></div>
</div>
<p>We hope Whisper’s high accuracy and ease of use will allow developers to add voice interfaces to a much wider set of applications. Check out the paper, model card, and code to learn more details and to try out Whisper.</p>
</div>
</div>
</div>
<p><a href="https://openai.com/blog/whisper/" target="_blank" rel="noopener">source</a></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>ChatGPT: Optimizing Language Models for Dialogue</title>
		<link>https://goodshepherdmedia.net/chatgpt-optimizing-language-models-for-dialogue/</link>
		
		<dc:creator><![CDATA[The Truth News]]></dc:creator>
		<pubDate>Tue, 27 Dec 2022 09:00:25 +0000</pubDate>
				<category><![CDATA[Science & Engineering]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[Top Stories]]></category>
		<category><![CDATA[Zee Truthful News]]></category>
		<category><![CDATA[🤖 AI Artificial Intelligence]]></category>
		<category><![CDATA[🤖Open AI]]></category>
		<category><![CDATA[🤖🗣️ChatGPT]]></category>
		<category><![CDATA[Chat GPT]]></category>
		<category><![CDATA[ChatGPT]]></category>
		<category><![CDATA[Dialogue]]></category>
		<category><![CDATA[Elon Musk]]></category>
		<category><![CDATA[Open AI]]></category>
		<category><![CDATA[Optimizing Language Models]]></category>
		<category><![CDATA[Optimizing Language Models for Dialogue]]></category>
		<guid isPermaLink="false">https://goodshepherdmedia.net/?p=10168</guid>

					<description><![CDATA[ChatGPT: Optimizing Language Models for Dialogue We’ve trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a [&#8230;]]]></description>
<content:encoded><![CDATA[<h1 class="balance-text font-sans-serif mb-0.75" style="text-align: center;">ChatGPT: Optimizing Language Models for Dialogue</h1>
<div class="post-excerpt content no-col js-excerpt-container js-widow">
<p>We’ve trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to <a href="https://openai.com/blog/instruction-following/">InstructGPT</a>, which is trained to follow an instruction in a prompt and provide a detailed response.</p>
</div>
<p>We are excited to introduce ChatGPT to get users’ feedback and learn about its strengths and weaknesses. During the research preview, usage of ChatGPT is free. Try it now at <a href="https://chat.openai.com/" target="_blank" rel="noopener">chat.openai.com</a>.</p>
<div id="samples" class="samples mt-2.5 mb-1.5 position-relative">
<h2>Samples</h2>
<div class="samples-note mb-1" data-active="0" data-carousel-id="samples">
<div data-id="0">In the following sample, <span class="chatgpt-inline">ChatGPT</span> asks clarifying questions to debug code.</div>
</div>
<div class="samples-note medium-small-copy" data-active="0" data-carousel-id="samples"><span class="color-fg-50">Sample <span data-id="0">1</span> of 4</span> <span class="faded-light js-carousel-previous" data-carousel-id="samples">Previous</span> <span class="faded-light js-carousel-next" data-carousel-id="samples">Next</span></div>
<div id="samples-content" class="full my-0.75 position-relative overflow-hidden">
<div class="js-carousel--samples carousel flickity-enabled" tabindex="0" data-id="samples">
<div class="flickity-viewport">
<div class="flickity-slider">
<div id="code" class="js-carousel-item carousel-item is-selected">
<div class="sample-line user">
<div class="container">
<div class="row">
<div class="col-12 col-xl-3 mb-0.25">
<div class="xsmall-copy color-fg-50 text-xl-right mt-xl-0.125">User</div>
</div>
<div class="col-12 col-md-9 col-lg-8 col-xl-6">
<div class="medium-small-copy">
<p>this code is not working like i expect — how do i fix it?</p>
<pre><code>resultWorkerErr := make(chan error)
defer close(resultWorkerErr)
go func() {
	defer cancel()
	resultWorkerErr &lt;- b.resultWorker(ctx)
}()

err := b.worker(ctx)
cancel()
if err == nil {
	return &lt;-resultWorkerErr
}
return multierror.Append(err, &lt;-resultWorkerErr)
</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="sample-line chatgpt">
<div class="container">
<div class="row">
<div class="col-12 col-xl-3 mb-0.25"></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="mb-1"></div>
<div>
<div class="sample-line chatgpt">
<div class="container">
<div class="row">
<div class="col-12 col-md-9 col-lg-8 col-xl-6">
<div class="medium-small-copy">
<p>It’s difficult to say what’s wrong with the code without more context. Can you provide more information about what the code is supposed to do and what isn’t working as expected? Also, is this the entire code or just a part of it?</p>
</div>
</div>
</div>
</div>
</div>
<div class="sample-line user">
<div class="container">
<div class="row">
<div class="col-12 col-xl-3 mb-0.25">
<div class="xsmall-copy color-fg-50 text-xl-right mt-xl-0.125">User</div>
</div>
<div class="col-12 col-md-9 col-lg-8 col-xl-6">
<div class="medium-small-copy">
<p>just part of it — the error never surfaces. I think there is something wrong with the channel</p>
</div>
</div>
</div>
</div>
</div>
<div class="sample-line chatgpt">
<div class="container">
<div class="row">
<div class="col-12 col-xl-3 mb-0.25">
<div class="xsmall-copy color-fg-50 text-xl-right mt-xl-0.125">ChatGPT</div>
</div>
<div class="col-12 col-md-9 col-lg-8 col-xl-6">
<div class="medium-small-copy">
<p>It’s difficult to say without more information about what the code is supposed to do and what’s happening when it’s executed. One potential issue with the code you provided is that the <code>resultWorkerErr</code> channel is never closed, which means that the code could potentially hang if the <code>resultWorkerErr</code> channel is never written to. This could happen if <code>b.resultWorker</code> never returns an error or if it’s canceled before it has a chance to return an error.</p>
<p>To fix this issue, you could close the <code>resultWorkerErr</code> channel after writing to it. For example, you could add the following line of code after the line that sends the error on the channel:</p>
<pre><code>close(resultWorkerErr)
</code></pre>
<p>This would ensure that the <code>resultWorkerErr</code> channel is always closed, even if <code>b.resultWorker</code> never returns an error. Again, it’s difficult to say whether this is the issue you’re experiencing without more information about the problem you’re seeing. Can you provide more details about what’s happening when you run the code?</p>
<h2><strong><em><span style="color: #0000ff;"><a style="color: #0000ff;" href="https://chat.openai.com/" target="_blank" rel="noopener">Try ChatGPT Out</a></span></em></strong></h2>
</div>
</div>
</div>
</div>
</div>
</div>
<div>
<h2 id="methods">Methods</h2>
<p>We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup. We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides—the user and an AI assistant. We gave the trainers access to model-written suggestions to help them compose their responses. We mixed this new dialogue dataset with the InstructGPT dataset, which we transformed into a dialogue format.</p>
<p>To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality. To collect this data, we took conversations that AI trainers had with the chatbot. We randomly selected a model-written message, sampled several alternative completions, and had AI trainers rank them. Using these reward models, we can fine-tune the model using Proximal Policy Optimization. We performed several iterations of this process.</p>
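<p>The ranking step above can be made concrete with a pairwise ranking loss, the standard objective for fitting a reward model to ranked comparisons. A minimal sketch, operating on illustrative scalar scores (a real reward model scores full conversations with a neural network):</p>

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward_ranking_loss(scores):
    """Pairwise ranking loss over reward-model scores for responses
    listed best-to-worst: each pair where a lower-ranked response
    scores at least as high as a higher-ranked one is penalized."""
    loss, pairs = 0.0, 0
    for i in range(len(scores)):              # response i is ranked above j
        for j in range(i + 1, len(scores)):
            loss += -math.log(sigmoid(scores[i] - scores[j]))
            pairs += 1
    return loss / pairs

# Scores that agree with the human ranking give a lower loss:
good = reward_ranking_loss([2.0, 1.0, 0.0])  # ordering matches the ranking
bad = reward_ranking_loss([0.0, 1.0, 2.0])   # ordering is reversed
```

<p>Minimizing this loss pushes the reward model to assign higher scores to preferred responses; the resulting scalar reward is then what Proximal Policy Optimization maximizes during fine-tuning.</p>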
<div class="wide">
<div class="mx-xl-auto"><img loading="lazy" decoding="async" class="alignnone wp-image-10165 aligncenter" src="https://goodshepherdmedia.net/wp-content/uploads/2023/01/Diagrama.jpg" alt="" width="1526" height="838" srcset="https://goodshepherdmedia.net/wp-content/uploads/2023/01/Diagrama.jpg 2021w, https://goodshepherdmedia.net/wp-content/uploads/2023/01/Diagrama-300x165.jpg 300w, https://goodshepherdmedia.net/wp-content/uploads/2023/01/Diagrama-1024x562.jpg 1024w, https://goodshepherdmedia.net/wp-content/uploads/2023/01/Diagrama-768x422.jpg 768w, https://goodshepherdmedia.net/wp-content/uploads/2023/01/Diagrama-1536x844.jpg 1536w" sizes="(max-width: 1526px) 100vw, 1526px" /></div>
</div>
<p style="text-align: center;">ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022. You can learn more about the 3.5 series <a href="https://beta.openai.com/docs/model-index-for-researchers" target="_blank" rel="noopener">here</a>. ChatGPT and GPT-3.5 were trained on an Azure AI supercomputing infrastructure.</p>
<h2 id="limitations">Limitations</h2>
<ul>
<li>ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is challenging, as: (1) during RL training, there’s currently no source of truth; (2) training the model to be more cautious causes it to decline questions that it can answer correctly; and (3) supervised training misleads the model because the ideal answer depends on what the model knows, rather than what the human demonstrator knows.</li>
<li>ChatGPT is sensitive to small tweaks in the input phrasing, and to retries of the same prompt. For example, given one phrasing of a question, the model can claim to not know the answer, but given a slight rephrase, can answer correctly.</li>
<li>The model is often excessively verbose and overuses certain phrases, such as restating that it’s a language model trained by OpenAI. These issues arise from biases in the training data (trainers prefer longer answers that look more comprehensive) and well-known over-optimization issues.<span class="js-rfref" data-id="stiennon"><sup class="reference-ref">1</sup></span><span class="js-rfref" data-id="gao"><sup class="reference-ref grouped">2</sup></span></li>
<li>Ideally, the model would ask clarifying questions when the user provided an ambiguous query. Instead, our current models usually guess what the user intended.</li>
<li>While we’ve made efforts to make the model refuse inappropriate requests, it will sometimes respond to harmful instructions or exhibit biased behavior. We’re using the Moderation API to warn or block certain types of unsafe content, but we expect it to have some false negatives and positives for now. We’re eager to collect user feedback to aid our ongoing work to improve this system.</li>
</ul>
<h2 id="iterative-deployment">Iterative deployment</h2>
<p>Today’s research release of ChatGPT is the latest step in OpenAI’s iterative deployment of increasingly safe and useful AI systems. Many lessons from deployment of earlier models like GPT-3 and Codex have informed the safety mitigations in place for this release, including substantial reductions in harmful and untruthful outputs achieved by the use of reinforcement learning from human feedback (RLHF).</p>
<div id="safety-samples" class="samples position-relative mt-3 mb-1.5">
<div class="mb-1">The following samples compare <span class="chatgpt-inline">ChatGPT</span> with <span class="instructgpt-inline">InstructGPT</span> and demonstrate safety mitigations for ChatGPT.</div>
</div>
<div>
<div class="sample-line user">
<div class="container">
<div class="row">
<div class="col-12 col-md-9 col-lg-8 col-xl-6">
<div class="medium-small-copy">
<p>Tell me about when Christopher Columbus came to the US in 2015</p>
</div>
</div>
</div>
</div>
</div>
<div class="sample-line chatgpt">
<div class="container">
<div class="row">
<div class="col-12 col-xl-3 mb-0.25">
<div class="xsmall-copy color-fg-50 text-xl-right mt-xl-0.125">ChatGPT</div>
</div>
<div class="col-12 col-md-9 col-lg-8 col-xl-6">
<div class="medium-small-copy">
<p>This question is a bit tricky because Christopher Columbus died in 1506, so he could not have come to the US in 2015. But let’s pretend for a moment that he did!</p>
<p>If Columbus arrived in the US in 2015, he would likely be very surprised at the changes that have occurred since he first landed in the “New World” in 1492. For one, he would probably be shocked to find out that the land he “discovered” was actually already inhabited by Native Americans, and that now the United States is a multicultural nation with people from all over the world. He would likely also be amazed by the advances in technology, from the skyscrapers in our cities to the smartphones in our pockets. Lastly, he might be surprised to find out that many people don’t view him as a hero anymore; in fact, some people argue that he was a brutal conqueror who enslaved and killed native people. All in all, it would be a very different experience for Columbus than the one he had over 500 years ago.</p>
</div>
</div>
</div>
</div>
</div>
<div class="sample-line instructgpt">
<div class="container">
<div class="row">
<div class="col-12 col-xl-3 mb-0.25">
<div class="xsmall-copy color-fg-50 text-xl-right mt-xl-0.125">InstructGPT</div>
</div>
<div class="col-12 col-md-9 col-lg-8 col-xl-6">
<div class="medium-small-copy">
<p>Christopher Columbus came to the US in 2015 and he was very excited to be here. He had always wanted to visit the US and he was very curious about our country. He was very impressed with our country and he enjoyed his time here.</p>
<p>We know that many limitations remain as discussed above and we plan to make regular model updates to improve in such areas. But we also hope that by providing an accessible interface to ChatGPT, we will get valuable user feedback on issues that we are not already aware of.</p>
<p>Users are encouraged to provide feedback on problematic model outputs through the UI, as well as on false positives/negatives from the external content filter which is also part of the interface. We are particularly interested in feedback regarding harmful outputs that could occur in real-world, non-adversarial conditions, as well as feedback that helps us uncover and understand novel risks and possible mitigations. You can choose to enter the ChatGPT Feedback Contest<span class="js-rfref" data-id="contest"><sup class="reference-ref">3</sup></span> for a chance to win up to $500 in API credits.<sup class="footnote-ref">[1]</sup></p>
<aside id="fa1" class="footnote-aside">No purchase necessary, void where prohibited. Must be at least 18 to enter. For contest details, see the Official Rules.</aside>
<p> Entries can be submitted via the feedback form that is linked in the ChatGPT interface.</p>
<p>We are excited to carry the lessons from this release into the deployment of more capable systems, just as earlier deployments informed this one.</p>
<p><a href="https://openai.com/blog/chatgpt/" target="_blank" rel="noopener">source</a></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div>
<h2><strong><em><span style="color: #0000ff;"><a style="color: #0000ff;" href="https://chat.openai.com/" target="_blank" rel="noopener">Try ChatGPT Out</a></span></em></strong></h2>
</div>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
