- Markov chain (either word- or character-level):
It is trained by splitting the original text into groups and recording the probability of the next word or letter. The first thing that distinguishes it from the other methods is that it has no memory: it cannot exist as an independent trained model, but must be given the sample text as input before it can produce any output.
Secondly, its output is generated in sequence, from one group to the next and then to the next, so the direct relationship between, say, the first and the third group may no longer be taken into account.
In practice, I think it does a good job of retaining some of the source's sentences and phrases, but repetitive sentences can occur.
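To make the idea concrete, here is a minimal sketch of a word-level Markov chain in Python. The order-1 transition table and the sample text are illustrative assumptions of mine, not the exact code I used.

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each group of `order` words to the words that can follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain

def generate(chain, order=1, length=30):
    """Walk the chain: each step only looks at the current group, never anything earlier."""
    key = random.choice(list(chain.keys()))
    output = list(key)
    for _ in range(length):
        options = chain.get(tuple(output[-order:]))
        if not options:
            break
        output.append(random.choice(options))
    return " ".join(output)

sample = "the cat sat on the mat and the cat slept on the mat"  # placeholder sample text
print(generate(build_chain(sample)))
```

Because each step conditions only on the current group, any phrase that appears often in the sample tends to come back again and again, which is exactly the repetition I noticed.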
- Grammars (such as Tracery):
Grammars feel like writing the frame of the sentence entirely by hand and then randomizing the components inside that frame. It is probably the simplest method and the furthest from machine learning: a completely manual way of taking apart the sample sentences and finding what the sentences and their structures have in common. For that reason it is not universal; by the time the grammar is written, it is already, in effect, a specific trained model.
The beauty of grammars is that they are completely controllable: you know exactly what the result can be as you write the code. So if you're willing to spend enough time, the result can be a brilliantly realistic piece of work.
What's really amazing is that the process of making it feels more like writing than coding, and it's a very unique experience to write many similar pieces at once.
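As a small illustration, here is a toy grammar using the pytracery port of Tracery. The rules themselves are made-up placeholders, not the grammar from my actual piece.

```python
import tracery
from tracery.modifiers import base_english

# Every sentence frame and word list here is hand-written; the output can only
# ever be a recombination of what the author chose to put into the rules.
rules = {
    "origin": "#character.capitalize# #verb# the #object# #ending#",
    "character": ["the sailor", "a stranger", "my neighbor"],
    "verb": ["found", "buried", "painted"],
    "object": ["lantern", "map", "letter"],
    "ending": ["at dawn.", "without a word.", "and never returned."],
}

grammar = tracery.Grammar(rules)
grammar.add_modifiers(base_english)  # enables modifiers like .capitalize
print(grammar.flatten("#origin#"))
```

Writing the rules is essentially writing the piece itself, just with blanks where the random choices go.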
- Character-level Recurrent Neural Networks (such as charRNN):
Finally, it's time to talk about the machine learning models. The difference between this one and the first two is that it is logically much more complex, so in principle it should work better. It also has adjustable parameters, which gives it more room for control compared to the Markov chain and more room for freedom compared to grammars.
In practice, when it is given a very short sample, because it works at the character level, it has trouble forming real words. So the result will look something like this:
But if the input sample is long enough, it can form correct words, which I find very impressive. This time, though, it tends to produce very long sentences (without punctuation), and the words can easily make no sense grammatically; it's not the words themselves but the composition of the sentence that fails.
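As a rough sketch of what a character-level model involves, here is a minimal PyTorch version. The architecture, sizes, and training text are illustrative assumptions, not the charRNN implementation I actually ran.

```python
import torch
import torch.nn as nn

text = "some long sample text would go here"  # placeholder corpus
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}

class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, h=None):
        x = self.embed(x)      # (batch, seq, hidden)
        x, h = self.rnn(x, h)  # hidden state carries context between characters
        return self.out(x), h

model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One toy training step: the model learns P(next character | previous characters).
ids = torch.tensor([[stoi[c] for c in text]])
opt.zero_grad()
logits, _ = model(ids[:, :-1])
loss = loss_fn(logits.reshape(-1, len(chars)), ids[:, 1:].reshape(-1))
loss.backward()
opt.step()

# Sampling: feed one character at a time and sample the next from the distribution.
idx = torch.tensor([[stoi[text[0]]]])
h, out = None, [text[0]]
for _ in range(100):
    logits, h = model(idx, h)
    probs = torch.softmax(logits[0, -1], dim=-1)
    idx = torch.multinomial(probs, 1).view(1, 1)
    out.append(itos[idx.item()])
print("".join(out))
```

Since the model only ever predicts one character at a time, with too little training data it never learns which character sequences form real words, which is exactly the gibberish I got from short samples.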
- Large deep neural network language models (such as GPT-2):
Although I'm not entirely sure about the principles behind large deep neural network language models, they are clearly the most complete of the four. Compared to the previous three, they expose more adjustable parameters, which means we have more control over the output. The output is also longer than that of the character-level recurrent neural network.
From the perspective of results, it is surprisingly accurate. The vocabulary may still be illogical, for example an object appearing where the word should be an animal, but it can reproduce the sample accurately at the level of syntax. As far as I know, Chuck's article often has some abrupt affirmative phrases, and this is perfectly reflected in the output. It also succeeds in capturing many of the corresponding keywords. So, setting semantics aside, at first glance the overall style works fascinatingly well.
So if I could run these models successfully, I would be willing to use them. But whereas with the previous models I could tune the output by adjusting the samples, with this type of model I feel it may be more necessary to tune the output by adjusting the parameters.
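For reference, this is roughly what adjusting those parameters could look like with a pretrained GPT-2 model through the Hugging Face transformers pipeline. The prompt and parameter values here are placeholders I chose for illustration, not settings from my experiments.

```python
from transformers import pipeline

# Load the small pretrained GPT-2 model.
generator = pipeline("text-generation", model="gpt2")

# Rather than reshaping the training sample, the main knobs here are sampling
# parameters: temperature, top_k, and output length.
result = generator(
    "The first thing you notice about the harbor is",  # placeholder prompt
    max_length=80,
    do_sample=True,
    temperature=0.9,   # lower = more conservative, higher = more surprising
    top_k=50,          # restrict sampling to the 50 most likely next tokens
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```

With the other methods the sample text is the main thing you edit; here the sample mostly sets the prompt or fine-tuning data, and the character of the output is shaped by these sampling parameters.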