Reverse engineering LLMs: transformer technology in a few words

How long does it take a serious algorithmic engineer to learn machine learning? I would say a few years. From what I see as an AI outsider, LLMs are only one of the many available deep learning techniques. And at the heart of most LLMs, you find the transformer. But what is a transformer?

Well, the word "transformer" seems to describe both the principle, as laid out by Google Brain researchers in 2017 in their famous paper "Attention Is All You Need", and the programs that implement the idea, as popularized by OpenAI's ChatGPT.
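The core principle of that paper is the attention mechanism, Attention(Q, K, V) = softmax(Q·Kᵀ / √d_k)·V. Each token's query is compared with every token's key, and the resulting weights decide how much of each token's value gets mixed into the output. Below is a minimal pure-Python sketch of that one formula, under simplifying assumptions: there is no multi-head split and no masking, and the learned projection matrices are left out, so the toy input is used directly as Q, K and V. The function names are my own, not from any library.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating, for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Plain-list matrix product: (n x k) times (k x m) gives (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # one row of scores per query token
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Three toy "tokens", each a 2-number vector (hypothetical data).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = attention(x, x, x)
print(w[0])  # token 0 attends equally to tokens 0 and 2, less to token 1
```

Each row of weights sums to 1, so every output row is a weighted average of the value vectors.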

By the way, thanks to Growtika for the nice 3D image: exactly what I needed to illustrate technical articles.

I first met transformers, not when I was a kid, but a few minutes ago, in this introductory Medium article on LLM technology, which divides the process into 10 steps, among them:

  • data gathering: prepare a serious amount of text, images, or whatever else relates to the subject you want the machine to be trained on. I suppose the data can be scraped from the web, but that really depends on your objective.
  • data tokenization: chunk text into small units such as words, subwords, special characters and punctuation. Images can be tokenized too, either by dividing the image evenly into small square patches of pixels, or more cleverly into variable-size patches depending on the level of detail in each area, so that low-saliency areas of the image are processed at low resolution, as in this mixed-resolution tokenization model.
  • choose a transformer: this is the machine that will process your data! Open-source options include transformer models built on Meta's PyTorch or Google's TensorFlow deep learning frameworks, or the Transformers library from our French friends at Hugging Face.
  • then let the machine do the work. There are many more steps before your LLM actually works. Training can run on your desktop if you have a small data set, or in the cloud if you need more power. You can train from scratch on your own data set, or start from a pretrained model...
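The two tokenization ideas from the list can be sketched in plain Python. For text, the sketch uses a toy byte-pair encoding (BPE) learner, one common way to build a subword vocabulary. It is simplified: there is no end-of-word marker and no byte-level fallback. For images, it uses the simple "equal square patches" split. The mixed-resolution scheme is not implemented here, and all function names are hypothetical.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(words, pair):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = Counter()
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] += freq
    return merged


def learn_bpe(corpus, num_merges):
    """Toy BPE: start from single characters, repeatedly merge the most frequent pair."""
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break  # every word is already a single token
        best = max(pairs, key=pairs.get)
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words


def image_to_patches(image, patch):
    """Split an H x W grid of pixel values into flattened patch x patch squares."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of the patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


merges, vocab = learn_bpe("low low low lower lowest", num_merges=2)
print(merges)
print(list(vocab))  # "low" has become a single subword token

image = [[r * 4 + c for c in range(4)] for r in range(4)]
print(image_to_patches(image, 2))  # 4 patches of 4 pixels, the first is [0, 1, 4, 5]
```

Each resulting subword or patch would then be mapped to an integer id (or, for patches, a vector) before being fed to the transformer.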

In short, transformer technology is a set of tools that do the training work on the data and then evaluate the prompt sentence to produce an answer, all of this based on neural networks.
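To make the split between "training" and "prompt evaluation" concrete, here is a deliberately tiny stand-in, which is not a transformer. A bigram count table, recording which token tends to follow which, replaces the neural network. A real LLM instead learns its weights by gradient descent and looks at the whole context through attention. What carries over is the shape of the prompt loop: predict one next token, append it, repeat. The names `train_bigram` and `generate` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    # "Training": count which token follows which in the data.
    table = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table


def generate(table, prompt, max_new_tokens):
    # "Prompt evaluation": repeatedly predict the next token from the last one.
    # Greedy decoding: always take the most frequent follower.
    out = list(prompt)
    for _ in range(max_new_tokens):
        followers = table.get(out[-1])
        if not followers:
            break  # token never seen with a follower: nothing to predict
        out.append(followers.most_common(1)[0][0])
    return out


tokens = "the cat sat on the mat . the cat ran".split()
model = train_bigram(tokens)
print(generate(model, ["the"], 3))  # ['the', 'cat', 'sat', 'on']
```

Greedy decoding is the simplest choice. Chat models usually sample from the predicted probabilities instead, which is one reason the same prompt can give different answers.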
