I Forced an AI Bot to Write a Warrior Cats Book

If you know me, then you probably know that one of my biggest obsessions through elementary and middle school was the Warrior Cats book series by Erin Hunter. Part of the reason I fell in love with the series and its universe is how much content there is to get through: with over 100 main-series books, novellas, and super editions combined, there was a never-ending flow of content for little me to consume. I used to read every single new book as soon as it came out, and even though I haven’t been keeping up as closely since graduating middle school, the series still holds a very special place in my heart.

With the rise of ChatGPT, I thought it might be fun to see if I could train my own neural network with the specific task of generating text that mimics the Warrior Cats books. I watched Andrej Karpathy’s video on nanoGPT and decided to see if I could modify it to write a Warrior Cats book, because you can never have too many Warrior Cats books, of course. I set my expectations pretty low, considering that I was running everything on my 2016 MacBook Pro.

I first used just the very first installment of the Warrior Cats series, Into the Wild, as the input text file, and tokenized it at the character level, so that each character was its own token. My training set had 1,003,854 tokens, which left 111,540 tokens in my validation set, a roughly 90/10 split. I ran this for around 2000 iterations on my laptop, which produced some… interesting results…

“Ay, The other cats no share the sleep. Firepaw, their catching from below silence the warness Tigerclaw. The ThunderClan would for his paws!” Tigerclaw suched.

There was also a fair share of gibberish:

Do prowlected his eyes like him might. The main-andch as alongi0 confuly. He sharried his paws to her across eit.

Still, you could definitely see the influence of the characters and settings of the Warrior Cats books in the samples produced, so this was a decent start. I tried to improve the coherence of the generated text by giving the model more to learn from: I appended the second book in the series, Fire and Ice, to the input file, and also slightly increased the number of iterations to 2500. The training loop ran for around 17 minutes, and the samples produced looked ever so slightllyyyy better.

Cinderpaw himself and took and qluack of here.

I’m still not sure what a qluack is…

Clan in the cats. The cats that had enemy with be reached and with it out.

This is a pretty good description of the book series as a whole.
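
For reference, the character-level setup used in these first two runs can be sketched roughly like this. It is a minimal illustration, not nanoGPT's actual code: the function names and the stand-in sample text are made up for the example, and a real run would read the book's text file instead.

```python
# Character-level tokenization with a 90/10 train/validation split.
# All names here are illustrative; this is not code from nanoGPT.

def build_vocab(text):
    """Map every distinct character to an integer id, and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

def encode(text, stoi):
    """Turn a string into a list of token ids, one per character."""
    return [stoi[ch] for ch in text]

def decode(ids, itos):
    """Turn a list of token ids back into a string."""
    return "".join(itos[i] for i in ids)

def train_val_split(ids, train_frac=0.9):
    """Keep the first 90% of tokens for training, the rest for validation."""
    n = int(len(ids) * train_frac)
    return ids[:n], ids[n:]

# Stand-in text; a real run would load the book's text file here.
sample = "Firepaw padded into the ThunderClan camp. " * 10

stoi, itos = build_vocab(sample)
ids = encode(sample, stoi)
train_ids, val_ids = train_val_split(ids)

print("vocab size:", len(stoi))
print("train tokens:", len(train_ids), "val tokens:", len(val_ids))
```

Because every character is its own token, the model has to learn spelling from scratch, which goes some way toward explaining words like “qluack”.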

I was still seeing a pretty large number of unintelligible words though, so I thought that OpenAI’s tiktoken tokenizer might be a better way of tokenizing the input text. tiktoken uses byte pair encoding (BPE) to convert text into tokens. Two significant advantages are that it compresses the text and that it lets the model see common subwords when words are split into tokens. Instead of splitting a word like “meowed” into 6 individual tokens, it might split it into just two, “meow” and “ed”, which could help the model better understand such grammar structures.

In addition to changing the tokenization method, I also added a third book, Forest of Secrets, to the input text, and increased the number of iterations to 4500. Despite tripling the number of books represented in the input text, the number of tokens in the training set dropped to 763,576 with the new tokenization method. This took around 4 hours to train on my laptop, so I would recommend either using a GPU or simply letting it run on your laptop overnight :)

I woke up the next morning to a nice surprise: a new Warrior Cats book generated just for me 😎

    The moon stood up at Yellowfang. The other cats was sitting at Lionheart’s den. “Bluestar always been more here.”
    “I was that—he knows,” Bluestar whispered, “There’s no need to patrol from ThunderClan?”
    Fireheart felt disappointed. He felt a kittypetet, but would be only left on the forest, but for her.
    “No, Yellowfang till she needs to leave, and you have been left their own battles in our territory.”
    “No,” Fireheart agreed. “But I’m not training for you’s.”
    “Who must be?” growled Tigerclaw.
    “It was the day we can only need to be an apprentice alone!” spat Graystripe went to Yellowfang.
    “Barley,” Sandpaw offered. “And I think you’ve got to her own to Tallstar’s territory.”
    Fireheart followed the RiverClan leader and looked thoughtful.

I’m sure I was also looking thoughtful, trying to understand what was going on here.

“Well?” Fireheart agreed. “He’s an accident!”

ouch.

Some other interesting excerpts:

“They are round,” Bluestar interrupted him. “You will take bringing back of them.”

“You may be long kittypet!” spat Graystripe, sounding strong and haunches on her eyes. “I’m better check the whole Clan you.”

“That’s all right,” Tigerclaw meowed. “The dogs is very battle.”

Even though I’m not sure that what was produced is quality literature, it certainly is interesting to read. Maybe I’ll be able to improve the model one day and produce a 115th installment of the series that would make Erin Hunter proud.

Until next time!
