In all of our previous tasks, we trained a neural network to perform a specific task using a labeled dataset. With large transformer models such as BERT, we use language modeling in a self-supervised fashion to build a language model, which is then specialized for a specific downstream task with further domain-specific training. However, it has been demonstrated that large language models can also solve many tasks without ANY domain-specific training. A family of models capable of doing that is called GPT: Generative Pre-Trained Transformer.
The idea of a neural network being able to do general tasks without downstream training is presented in the Language Models are Unsupervised Multitask Learners paper. The main idea is that many other tasks can be modeled using text generation, because understanding text essentially means being able to produce it. Because the model is trained on a huge amount of text that encompasses human knowledge, it also becomes knowledgeable about a wide variety of subjects.
Understanding and being able to produce text also entails knowing something about the world around us. People also learn by reading to a large extent, and the GPT network is similar in this respect.
Text generation networks work by predicting the probability of the next word $$P(w_N)$$ However, the unconditional probability of the next word is simply the frequency of this word in the text corpus. GPT is able to give us the conditional probability of the next word, given the previous ones: $$P(w_N | w_{N-1}, ..., w_0)$$
You can read more about probabilities in our Data Science for Beginners Curriculum.
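To make the difference between the two probabilities concrete, here is a minimal sketch that estimates both from a tiny toy corpus. It conditions only on the single previous word (a bigram model), which is a drastic simplification of what GPT does; the corpus and the function names `p_unigram` and `p_bigram` are made up for illustration.

```python
from collections import Counter

# Toy corpus (an assumption for illustration, not real training data).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Unconditional probability P(w): relative frequency of the word in the corpus.
unigram_counts = Counter(corpus)
total_words = len(corpus)

def p_unigram(word):
    return unigram_counts[word] / total_words

# Conditional probability P(w | prev): estimated from counts of adjacent word pairs.
bigram_counts = Counter(zip(corpus, corpus[1:]))

def p_bigram(word, prev):
    # Number of pairs that start with `prev`, used as the normalizer.
    prev_total = sum(c for (first, _), c in bigram_counts.items() if first == prev)
    if prev_total == 0:
        return 0.0
    return bigram_counts[(prev, word)] / prev_total

print("P(mat)       =", p_unigram("mat"))
print("P(mat | the) =", p_bigram("mat", "the"))
```

In this toy corpus, knowing that the previous word is "the" makes "mat" noticeably more likely than its raw frequency suggests. GPT applies the same idea, but conditions on the whole preceding context rather than one word.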
The quality of a language generation model can be measured using perplexity. It is an intrinsic metric that allows us to measure model quality without any task-specific dataset. It is based on the notion of the probability of a sentence: the model assigns high probability to a sentence that is likely to be real (i.e. the model is not perplexed by it), and low probability to sentences that make less sense (e.g. "Can it does what?"). When we give our model sentences from a real text corpus, we would expect them to have high probability and low perplexity. Mathematically, it is defined as the normalized inverse probability of the test set: $$ \mathrm{Perplexity}(W) = \sqrt[N]{1\over P(W_1,...,W_N)} $$
You can experiment with text generation using a GPT-powered text editor from Hugging Face. In this editor, you start writing your text, and pressing [TAB] will offer you several completion options. If they are too short, or you are not satisfied with them, press [TAB] again, and you will get more options, including longer pieces of text.
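The perplexity formula can be sketched in a few lines of Python. The sketch assumes we already have the conditional probability the model assigned to each of the N tokens, so that their product is the sentence probability by the chain rule; the probability values below are invented for illustration, and `perplexity` is a hypothetical helper name.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the probability assigned to each token."""
    n = len(token_probs)
    # Sum log-probabilities instead of multiplying raw probabilities,
    # which avoids numerical underflow on long sequences.
    log_prob = sum(math.log(p) for p in token_probs)
    # exp(-log P / N) is the same as the N-th root of 1 / P.
    return math.exp(-log_prob / n)

# Made-up per-token probabilities for a plausible and an implausible sentence.
plausible = [0.5, 0.5, 0.5, 0.5]
implausible = [0.1, 0.1, 0.1, 0.1]

print(perplexity(plausible))
print(perplexity(implausible))
```

A useful intuition: if the model assigns every token probability 1/k, the perplexity is k, as if the model were choosing uniformly among k words at each step.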
GPT is not a single model, but rather a collection of models developed and trained by OpenAI.
Under the GPT models, we have:
GPT-2 | GPT-3 | GPT-4 |
---|---|---|
Language model with up to 1.5 billion parameters. | Language model with up to 175 billion parameters. | Parameter count not publicly disclosed by OpenAI; accepts both image and text inputs and outputs text. |
The GPT-3 and GPT-4 models are available as a cognitive service from Microsoft Azure, and through the OpenAI API.
Because GPT models have been trained on vast volumes of data to understand language and code, they provide outputs in response to inputs (prompts). Prompts are GPT inputs or queries in which you give the model instructions on the task you want it to complete. To elicit a desired outcome, you need an effective prompt, which involves selecting the right words, formats, phrases or even symbols. This approach is called Prompt Engineering.
This documentation provides you with more information on prompt engineering.
Continue your learning in the following notebooks:
New general pre-trained language models do not only model language structure, but also contain a vast amount of natural language. Thus, they can be effectively used to solve some NLP tasks in zero-shot or few-shot settings.
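The difference between zero-shot and few-shot settings shows up only in the prompt text, which the following sketch illustrates. It just assembles strings and does not call any model; the `build_prompt` helper, the sentiment task and the `Text:` / `Sentiment:` layout are assumptions chosen for illustration, not a format required by GPT.

```python
def build_prompt(instruction, query, examples=()):
    """Assemble a zero-shot (no examples) or few-shot prompt as plain text."""
    parts = [instruction]
    # Few-shot: show the model a handful of solved examples first.
    for text, label in examples:
        parts.append(f"Text: {text}\nSentiment: {label}")
    # End with an open slot that the model is expected to complete.
    parts.append(f"Text: {query}\nSentiment:")
    return "\n\n".join(parts)

instruction = "Classify the sentiment of the text as positive or negative."

zero_shot = build_prompt(instruction, "The plot was dull.")
few_shot = build_prompt(
    instruction,
    "The plot was dull.",
    examples=[("I loved this movie.", "positive"),
              ("Terrible acting.", "negative")],
)

print(zero_shot)
print("-----")
print(few_shot)
```

Either string would then be sent to a text generation model, whose continuation after the final `Sentiment:` is read as the answer.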