SignalPlus: An Introduction to Generative AI

Original Author: Steven Wang


“What I cannot create, I do not understand.”

- Richard Feynman

Preface

You embrace Stable Diffusion and Midjourney to create stunning images.

You are proficient with ChatGPT and LLaMA, using them to craft elegant prose.

You switch back and forth between MuseNet and MuseGAN to compose sublime music.

Undoubtedly, the most distinctive human ability is creation, yet in today's fast-moving technology landscape we create by building machines that themselves create. Given a style, a machine can draw original artwork, write long coherent articles, compose melodious music, and devise winning strategies for complex games. This technology is Generative Artificial Intelligence (GenAI). The GenAI revolution is only just beginning, and now is the best time to learn it.

1. Generative and Discriminative Models

GenAI is a buzzword; the essence behind it is the generative model, a branch of machine learning whose goal is to train a model that can generate new data similar to a given dataset.

Suppose we have a dataset of horse images. We can train a generative model on this dataset to capture the rules governing the complex relationships between pixels in horse images. We can then sample from this model to create realistic images of horses that did not exist in the original dataset, as shown in the figure below.

[Figure: a generative model trained on a dataset of horse images is sampled to produce new, realistic horse images that were not in the original dataset]

To truly understand the goals and importance of generative models, it helps to compare them with discriminative models. In fact, most problems in machine learning are solved with discriminative models, as the following example shows.

Suppose we have a dataset of paintings, some by Van Gogh and some by other artists. With enough data, we can train a discriminative model to predict whether a given painting is by Van Gogh, as shown in the figure below.

[Figure: a discriminative model takes a painting as input and outputs the probability that it is by Van Gogh, here 0.83]

When training a discriminative model, each example in the training set has a label. For this binary classification problem, a painting by Van Gogh is labeled 1 and a painting by anyone else is labeled 0. In the figure above, the model's predicted probability is 0.83, so the painting is very likely a Van Gogh. Unlike a discriminative model, a generative model does not need labeled examples, because its goal is to generate new data, not to predict labels for existing data.

With the example in hand, let us define the two kinds of model precisely in mathematical notation:

  • A discriminative model estimates P(y|x), the conditional probability of a label y given features x.
  • A generative model estimates P(x), the probability of the features x themselves, and samples from this distribution to generate new features. A minimal code sketch contrasting the two follows.
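
To make the distinction concrete, here is a minimal sketch in Python. It uses made-up one-dimensional toy data (not from the article): the discriminative side estimates P(y|x) by counting labels within histogram bins, and the generative side estimates P(x) as a normalized histogram and samples new points from it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D dataset: feature x, label y (1 = "Van Gogh", 0 = "other artist").
x = np.concatenate([rng.normal(2, 1, 500), rng.normal(-2, 1, 500)])
y = np.concatenate([np.ones(500), np.zeros(500)])

bins = np.linspace(-6, 6, 25)

# Discriminative: estimate P(y=1 | x) by averaging labels inside each bin.
idx = np.digitize(x, bins)
p_y_given_x = {b: y[idx == b].mean() for b in np.unique(idx)}
print("P(y=1 | x ~ 2):", round(float(p_y_given_x[np.digitize(2.0, bins)]), 2))

# Generative: estimate P(x) as a normalized histogram, then sample new x's.
counts, edges = np.histogram(x, bins=bins)
p_x = counts / counts.sum()
new_bins = rng.choice(len(p_x), size=5, p=p_x)             # pick bins with prob P(x)
new_x = rng.uniform(edges[new_bins], edges[new_bins + 1])  # a point inside each bin
print("new samples from P(x):", np.round(new_x, 2))
```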

Note that even if we built a perfect discriminative model for identifying paintings by Van Gogh, it would still not know how to create a painting in Van Gogh's style; it could only output the probability that a given image came from Van Gogh's hand. Generative modeling is therefore a much harder task than discriminative modeling.

2. The Generative Modeling Framework

Before getting into the generative modeling framework, let's play a game. Suppose the points in the figure below were generated by some rule, which we call Pdata. Your task is to choose a new point x = (x1, x2) that looks as if it were generated by the same rule Pdata.

[Figure: a scatter of black points in two dimensions, generated by the unknown rule Pdata]

How would you generate such a point? You would probably use the given points to form a mental model, Pmodel, of where points can appear, and then pick a point from the region this model occupies. Pmodel is thus an estimate of Pdata. The simplest such Pmodel is the orange box in the figure below: points can be generated inside the box, but not outside it.

[Figure: an orange box bounding the observed points, serving as the model Pmodel]

To generate a new point, we can randomly pick a point inside the box, or, more rigorously, sample from the distribution Pmodel. This is a minimalist generative model: you build a model (the orange box) from the training data (the black points), and then sample from the model, hoping the generated points look similar to the training points.
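
Here is a minimal sketch of this box model in Python, with made-up training points standing in for the black dots; the "model" is just the axis-aligned bounding box of the data, and sampling is uniform inside it.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up training points (stand-ins for the black dots in the figure).
data = rng.normal(loc=[2.0, 1.0], scale=[1.5, 0.8], size=(50, 2))

# "Training": the model is just the axis-aligned bounding box of the data.
lo, hi = data.min(axis=0), data.max(axis=0)

# "Sampling": draw new points uniformly inside the box (i.e., from Pmodel).
new_points = rng.uniform(lo, hi, size=(5, 2))
print(new_points)
```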

Now we can formally propose a framework for generative learning.

[Figure: the generative modeling framework. We have a dataset of observations assumed to be generated by an unknown distribution Pdata; we train a model Pmodel to estimate Pdata; if the estimate is good, sampling from Pmodel yields new observations that appear to have been drawn from Pdata]

Let us now reveal the true data-generating distribution Pdata and see how the framework applies to this example. As the figure below shows, the rule Pdata is simply that points are distributed uniformly over land and never appear in the ocean.

[Figure: the true rule Pdata revealed as a world map, with points uniform over land; the orange box Pmodel is overlaid, and three points A, B, and C are marked]

Clearly, our model Pmodel is a simplification of the rule Pdata. Examining points A, B, and C in the figure above helps us judge whether Pmodel successfully imitates Pdata.

  • Point A does not conform to the rule Pdata, because it lies in the sea; yet it could be generated by the model Pmodel, because it lies inside the orange box.
  • Point B could never be generated by the model Pmodel, because it lies outside the orange box; yet it conforms to the rule Pdata, because it lies on land.
  • Point C could be generated by the model Pmodel, and it also conforms to the rule Pdata.

This example illustrates the basic concepts behind generative modeling. Real-world generative modeling is far more complicated, but the underlying framework is the same.

3. Our First Generative Model

Suppose you are the Chief Fashion Officer (CFO) of a company, and your job is to create new, trendy clothing styles. This year you receive a dataset of 50 outfits (shown below), and you need to create 10 new ones.

[Figure: the 50 outfit images in the training dataset]

Although you are the Chief Fashion Officer, you are also a data scientist, so you decide to use a generative model to solve the problem. After studying the 50 images, you decide to describe each outfit with five features: accessories type, clothing color, clothing type, hair color, and hair type.

The feature values of the first 10 images are as follows.

[Table: the values of the five features for each of the first 10 images]

Each feature can take a different number of values:

  • 3 accessories types:

Blank, Round, Sunglasses

  • 8 clothing colors:

Black, Blue 01, Gray 01, PastelGreen, PastelOrange, Pink, Red, White

  • 4 clothing types:

Hoodie, Overall, ShirtScoopNeck, ShirtVNeck

  • 6 hair colors:

Black, Blonde, Brown, PastelPink, Red, SilverGray

  • 7 hair types:

NoHair, LongHairBun, LongHairCurly, LongHairStraight, ShortHairShortWaved, ShortHairShortFlat, ShortHairFrizzle

Altogether there are 3 × 8 × 4 × 6 × 7 = 4032 feature combinations, so the sample space contains 4032 points. The 50 given data points show that Pdata prefers certain values for some features: the table above has noticeably more white clothing and silver-gray hair than other values. Since we do not know the true Pdata, we can only use these 50 observations to build a Pmodel that approximates it. As a quick check, the sketch below enumerates the sample space.
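
This minimal Python sketch builds the sample space from the feature value lists given above with itertools.product and confirms its size:

```python
from itertools import product

# Feature values as listed above.
features = {
    "accessories": ["Blank", "Round", "Sunglasses"],
    "clothing_color": ["Black", "Blue 01", "Gray 01", "PastelGreen",
                       "PastelOrange", "Pink", "Red", "White"],
    "clothing_type": ["Hoodie", "Overall", "ShirtScoopNeck", "ShirtVNeck"],
    "hair_color": ["Black", "Blonde", "Brown", "PastelPink", "Red", "SilverGray"],
    "hair_type": ["NoHair", "LongHairBun", "LongHairCurly", "LongHairStraight",
                  "ShortHairShortWaved", "ShortHairShortFlat", "ShortHairFrizzle"],
}

sample_space = list(product(*features.values()))
print(len(sample_space))  # 3 * 8 * 4 * 6 * 7 = 4032
```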

3.1 Minimalist model

One of the simplest approaches is to assign a probability parameter θj to each of the 4032 feature combinations. The model then has 4031 independent parameters, since all the probabilities must sum to 1. We now go through the 50 observations one by one and estimate the parameters (θ1, θ2, ..., θ4031), where each parameter is given by:

θj = nj / N

where N is the number of observations, here 50, and nj is the number of times the j-th feature combination appears among those 50 observations.

For example, the feature combination (LongHairStraight, Red, Round, ShirtScoopNeck, White), call it combination 1, appears twice, so

θ1 = 2 / 50 = 0.04

The feature combination (LongHairStraight, Red, Round, ShirtScoopNeck, Blue 01), call it combination 2, does not appear at all, so

θ2 = 0 / 50 = 0

Following this rule, we compute a θ value for every one of the 4032 combinations, and it is easy to see that many of them are 0. Worse, the model can never generate an unseen image: θ = 0 means that no image with that feature combination was ever observed, so it can never be sampled. To fix this, we simply add 1 to each numerator and the total number of feature combinations, d = 4032, to the denominator, a technique called Laplace smoothing:

θj = (nj + 1) / (N + d)
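
Here is a minimal sketch of this counting model in Python. The three observation rows are hypothetical stand-ins for the 50 images, so the printed numbers illustrate the formulas rather than the article's actual values.

```python
from collections import Counter

# Hypothetical observations (accessories, clothing color, clothing type,
# hair color, hair type) standing in for the 50 outfit images.
observations = [
    ("Round", "White", "ShirtScoopNeck", "Red", "LongHairStraight"),
    ("Round", "White", "ShirtScoopNeck", "Red", "LongHairStraight"),
    ("Blank", "White", "Hoodie", "SilverGray", "ShortHairShortFlat"),
    # ... imagine 47 more rows here
]
N = len(observations)
d = 3 * 8 * 4 * 6 * 7  # 4032 feature combinations in total
counts = Counter(observations)

def theta_ml(combo):
    """Maximum-likelihood estimate: theta_j = n_j / N (zero if unseen)."""
    return counts[combo] / N

def theta_laplace(combo):
    """Laplace-smoothed estimate: theta_j = (n_j + 1) / (N + d), never zero."""
    return (counts[combo] + 1) / (N + d)

combo1 = ("Round", "White", "ShirtScoopNeck", "Red", "LongHairStraight")
combo2 = ("Round", "Blue 01", "ShirtScoopNeck", "Red", "LongHairStraight")
print(theta_ml(combo1), theta_ml(combo2))            # seen vs unseen: 2/3 vs 0 here
print(theta_laplace(combo1), theta_laplace(combo2))  # both nonzero after smoothing
```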

Now every combination, including those absent from the original dataset, has a nonzero sampling probability. However, this is still not a satisfactory generative model, because every point outside the original dataset receives the same constant probability. If we used such a model to generate paintings in the style of Van Gogh, it would assign equal probability to these two paintings:

  1. Reproductions of original Van Gogh paintings (not in the original dataset)
  2. Paintings made of random pixels (not in the original dataset)

This is obviously not the generative model we want. We want a model that learns some inherent structure from the data, so that it can raise the probability weight of regions of the sample space it judges more likely, rather than placing all the probability weight only on points that happen to exist in the dataset.

3.2 The Naive Bayes Model

The Naive Bayes model can greatly reduce the number of parameters needed above by assuming that all features are mutually independent. For our data, this means that a person's hair color (feature xj) has nothing to do with their clothing color (feature xk); in mathematical terms:

p(xj | xk) = p(xj)

With this assumption, the joint probability factorizes as

p(x) = p(x1) × p(x2) × p(x3) × p(x4) × p(x5)

The Naive Bayes model thus reduces the original problem of estimating a probability for every feature combination to estimating a probability for every feature value. Where we previously needed 4031 parameters (3 × 8 × 4 × 6 × 7 = 4032 combinations, minus one because the probabilities sum to 1), we now need only 23 (3 + 8 + 4 + 6 + 7 = 28 feature values, minus one per feature because each feature's probabilities sum to 1). Each parameter is given by:

θkl = nkl / N

where N is the number of observations, here 50, and nkl is the number of times the k-th feature takes its l-th value in the data.

Counting over the 50 observations gives the Naive Bayes parameter values in the table below.

[Table: the estimated probability θkl of each value of each of the five features]

To compute the probability of the model generating a particular feature combination, we simply multiply the corresponding probabilities from the table above. For example:

p(LongHairStraight, Red, Round, ShirtScoopNeck, Blue 01) = p(LongHairStraight) × p(Red) × p(Round) × p(ShirtScoopNeck) × p(Blue 01) > 0

This combination does not appear in the original dataset, yet the model still assigns it a nonzero probability, so the model can still generate it. The Naive Bayes model is therefore able to learn some structure from the data and use it to generate new examples never seen in the original dataset. The sketch and figure below show how 10 new outfits can be generated by the model.
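
Here is a minimal Python sketch of this Naive Bayes generator, again with hypothetical stand-in observations: it estimates one categorical distribution per feature and then samples each feature independently.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical observations (accessories, clothing color, clothing type,
# hair color, hair type) standing in for the 50 outfit images.
observations = [
    ("Round", "White", "ShirtScoopNeck", "Red", "LongHairStraight"),
    ("Blank", "White", "Hoodie", "SilverGray", "ShortHairShortFlat"),
    ("Sunglasses", "Blue 01", "Overall", "Black", "LongHairBun"),
    # ... imagine 47 more rows here
]
N = len(observations)

# Estimate one categorical distribution per feature: theta_kl = n_kl / N.
num_features = len(observations[0])
per_feature = [Counter(obs[k] for obs in observations) for k in range(num_features)]

def sample_outfit():
    """Sample each feature independently from its estimated distribution."""
    outfit = []
    for counts in per_feature:
        values = list(counts)
        weights = [counts[v] / N for v in values]
        outfit.append(random.choices(values, weights=weights)[0])
    return tuple(outfit)

# Generate 10 new outfits; combinations unseen in the data can appear.
for _ in range(10):
    print(sample_outfit())
```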

[Figure: 10 new outfits generated by the Naive Bayes model]

In this problem there are only five features, so the data is low-dimensional, and the Naive Bayes assumption that the features are mutually independent is reasonable; the generated results are therefore quite good. Next, let's look at an example where the model breaks down.

4. The Difficulties of Generative Models

4.1 High-dimensional data

As Chief Fashion Officer, you have successfully generated 10 new outfits using Naive Bayes. You are convinced your model is invincible, until you encounter the following dataset.

[Figure: a new dataset of 32 × 32 grayscale images]

This dataset is no longer described by five features but by 32 × 32 = 1024 pixels, where each pixel takes a value between 0 and 255 (0 means white, 255 means black). The table below lists the values of pixels 1 to 5 for the first 10 images.

[Table: the values of pixels 1 to 5 for the first 10 images]

Using the same Naive Bayes model to generate 10 brand-new outfits yields the results below. Every one of them is ugly, they all look alike, and individual features cannot be distinguished. Why is this so?

[Figure: 10 images generated by the Naive Bayes model from raw pixels; all are noisy and nearly indistinguishable]

First, the Naive Bayes model samples each pixel independently, while in reality adjacent pixels are highly similar: the pixels making up a shirt, for example, should be roughly the same color, but because the model samples every pixel at random, the clothes in the images above come out speckled with many colors. Second, a high-dimensional sample space contains an enormous number of possibilities, only a tiny fraction of which look like recognizable images. When a Naive Bayes model treats such highly correlated pixel values as independent, its chances of stumbling on a satisfying combination are vanishingly small.
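
To see the failure concretely, this sketch uses a made-up training set of structured images (dark squares at random positions, standing in for faces) and samples each pixel independently from the values observed at that pixel position:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up training set: 50 grayscale 32x32 images, each containing one dark
# square at a random position (a stand-in for structured images like faces).
train = []
for _ in range(50):
    img = np.full((32, 32), 230, dtype=np.uint8)  # light background
    r, c = rng.integers(0, 16, size=2)            # top-left corner of the square
    img[r:r + 16, c:c + 16] = 30                  # dark 16x16 square
    train.append(img)
train = np.stack(train)

# Naive-Bayes-style generation: sample each of the 1024 pixels independently
# from the values observed at that pixel position across the training set.
flat = train.reshape(50, -1)                      # shape (50, 1024)
rows = rng.integers(0, 50, size=flat.shape[1])    # a random training image per pixel
generated = flat[rows, np.arange(flat.shape[1])].reshape(32, 32)

# Every individual pixel value is plausible on its own, but neighboring pixels
# are no longer correlated, so the result is salt-and-pepper noise rather than
# one coherent square.
print(generated[:8, :8])
```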

To sum up: for low-dimensional sample spaces with weakly correlated features, Naive Bayes works very well by sampling each feature independently; but in high-dimensional sample spaces with highly correlated features, finding a valid human face by sampling pixels independently is practically impossible.

This example highlights two difficulties that generative models must overcome in order to be successful:

  1. How does the model handle the conditional dependencies among a huge number of features?
  2. How does the model find, within an enormous high-dimensional sample space, the tiny fraction of points that correspond to valid observations?

For generative models to succeed in high-dimensional, highly correlated sample spaces, we must turn to deep learning. We need a model that can infer the relevant structure from the data itself, rather than being told in advance which assumptions to make. Deep learning can learn its own features in a low-dimensional space, which is a form of representation learning.

4.2 Representation Learning

Representation learning means learning a meaningful lower-dimensional representation of high-dimensional data.

Suppose you are going to meet an online friend you have never seen in person. The meeting spot is crowded and she cannot find you, so you call her and describe your appearance. You would surely not say that pixel 1 of your image is black, pixel 2 is dark gray, pixel 3 is gray, and so on. Instead, you would assume she has a general idea of what ordinary people look like and describe whole groups of pixels in those terms: for example, that you have short black hair and wear gold-rimmed glasses. From no more than ten such descriptions she can build an image of you in her mind. The image may be rough, but that does not stop her from finding you among hundreds of people, even though she has never seen you.

This is the core idea behind representation learning: instead of trying to model the high-dimensional sample space directly, we describe each observation in the training set with a low-dimensional latent space, and then learn a mapping function that can take a point in the latent space and map it to the original sample space. In other words, each point in the latent space is a compact representation of a high-dimensional observation.

If this sounds abstract, consider the training set below, consisting of grayscale images of jars.

[Figure: a training set of grayscale jar images of varying heights and widths]

It is not hard to see that each jar can be described by just two features: its height and its width. We can therefore map the high-dimensional pixel space of the images to a two-dimensional latent space, as shown in the figure below. This lets us sample a point (the blue dot) from the latent space and convert it into an image via the mapping function f.

[Figure: the two-dimensional latent space of jar width and height; a sampled blue point is mapped by the function f back to a jar image]

It is not easy for a machine to discover that the original dataset can be represented by a simpler latent space. It must first determine that height and width are the two latent dimensions that best describe the dataset, and then learn the mapping function f that takes a point in this space and maps it to a grayscale jar image. Deep learning lets us train machines to find such complex relationships without human guidance.
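
Here is a minimal sketch of such a mapping function f, hand-coded purely for illustration; in a real system, f would be learned by a neural network rather than written by hand, and the rendering details below are entirely made up.

```python
import numpy as np

def f(width: float, height: float, size: int = 32) -> np.ndarray:
    """Map a latent point (width, height) in [0, 1]^2 to a grayscale 'jar' image."""
    img = np.full((size, size), 255, dtype=np.uint8)       # white background
    w = max(2, int(width * (size - 4)))                    # jar width in pixels
    h = max(2, int(height * (size - 4)))                   # jar height in pixels
    top, left = size - 2 - h, (size - w) // 2
    img[top:size - 2, left:left + w] = 80                  # jar body
    img[top:top + 2, left + w // 4:left + 3 * w // 4] = 0  # jar rim
    return img

# Sample a point from the latent space and decode it into an image.
rng = np.random.default_rng(7)
z = rng.uniform(0.2, 0.9, size=2)
jar = f(*z)
print(jar.shape, jar.min(), jar.max())
```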

5. Classification of Generative Models

All generative models ultimately aim at the same task, but they model the density function in slightly different ways, and generally fall into two categories:

  • Explicitly modeling the density function, either by:

constraining the model in some way so that the density function can be computed exactly, as in the normalizing flow model; or

approximating the density function, as in the variational autoencoder (VAE) and the diffusion model.

  • Implicitly modeling the density function, through a stochastic process that generates data directly, as in the generative adversarial network (GAN).

[Figure: a taxonomy of generative models: explicit density models (tractable, e.g. normalizing flows; approximate, e.g. VAE and diffusion models) and implicit density models (e.g. GAN)]

Summary

Generative artificial intelligence (GenAI) is a type of artificial intelligence that can create new content and ideas, including text, images, video, and music. Like much of modern AI, GenAI relies on very large models pre-trained with deep learning on enormous amounts of data, often called foundation models (FM). With GenAI we can draw cooler images, write more beautiful text, and compose more moving music, but the first step is to understand how GenAI creates new things; as Richard Feynman, quoted at the head of this article, put it: “What I cannot create, I do not understand.”
