
Understanding Differential Attention

Introduction: Over the last few years, Transformers have emerged as the de-facto deep learning architecture for language models, fundamentally changing the field of machine learning and artificial intelligence as a whole. Their unprecedented success at complex language tasks, and at reasoning (or mimicking it) through math and coding problems, has ushered in a new era of AI, powering products like ChatGPT. The key innovation of Transformers lies in the self-attention mechanism, which allows each token in the input sequence to directly interact with every other token in the sequence....
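For readers unfamiliar with the mechanism, here is a minimal sketch of single-head scaled dot-product self-attention; the tensor names, dimensions, and random weights are illustrative, not taken from the post:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence x of shape (seq_len, d_model)."""
    q = x @ w_q  # queries: (seq_len, d_k)
    k = x @ w_k  # keys:    (seq_len, d_k)
    v = x @ w_v  # values:  (seq_len, d_v)
    d_k = q.shape[-1]
    # Every token's query is scored against every token's key, so each
    # position attends directly to all positions in the sequence.
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)
    return weights @ v  # (seq_len, d_v)

# Toy usage: 4 tokens, model dimension 8
x = torch.randn(4, 8)
w_q, w_k, w_v = (torch.randn(8, 8) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)  # shape (4, 8)
```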

December 11, 2024 · 7 min · 1424 words · Damilola John

Finetuning GPT2 to Reconstruct Sentences

Two words are anagrams if one can be formed by permuting the letters of the other. Applying the same logic to a sentence would mean that two sentences are anagrams (no such thing, really) if their component words can be permuted to form clones of each other. I thought it would be interesting to teach a language model to do this. You might be thinking that simply rearranging words in a sentence doesn't require intelligence and can be done with very trivial algorithms, and you would be right, but I added an edge to this task: given a random sequence of words, the language model has to return a grammatically correct sequence using the same set of words....
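One plausible way to frame this as a causal-LM finetuning task is to pair each scrambled word sequence with its original sentence in a single training string; the separator and helper below are illustrative assumptions, not the post's actual data format:

```python
import random

def make_descramble_pair(sentence, sep=" = "):
    """Build a training string: scrambled words, separator, original sentence."""
    words = sentence.split()
    shuffled = words[:]
    random.shuffle(shuffled)
    # Trained with the standard causal LM loss on the full string, GPT-2
    # learns to continue the scrambled prefix with the grammatical original.
    return " ".join(shuffled) + sep + sentence

print(make_descramble_pair("the quick brown fox jumps over the lazy dog"))
# e.g. "fox lazy the jumps dog brown over quick the = the quick brown fox jumps over the lazy dog"
```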

June 15, 2024 · 10 min · 2047 words · Damilola John

Descrambling Sentences with GPT2

Finetuning GPT2 to Reconstruct Sentences: Two words are anagrams if one can be formed by permuting the letters of the other. Applying the same logic to a sentence would mean that two sentences are anagrams (no such thing, really) if their component words can be permuted to form clones of each other. I thought it would be interesting to finetune a language model to do this. You might be thinking that simply rearranging words in a sentence doesn't require intelligence and can be done with very trivial algorithms, and you would be right, but I added an edge to this task: given a random sequence of words, the language model has to return a grammatically correct sequence using the same set of words....

January 10, 2024 · 1 min · 153 words · Damilola John

Classifying Code Snippets with BERT

This is a fun side project where I explored transformer-based sequence classification for the first time by training BERT to identify 15 of the most popular programming languages. I started with simple machine learning approaches and gradually worked my way up to more complex methods until I had a satisfactory solution. The Dataset: Our dataset is a CSV containing 45,000 samples, made up of two columns: the 'code' column contains the code snippets we want to classify, and the 'language' column, which is our label, contains the programming language each snippet belongs to....
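A minimal sketch of the setup this describes, using the Hugging Face transformers library; the file name code_snippets.csv and the bert-base-uncased checkpoint are assumptions, and the freshly initialized classification head would of course need finetuning before its predictions mean anything:

```python
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed file name; columns 'code' and 'language' match the post's description.
df = pd.read_csv("code_snippets.csv")
labels = sorted(df["language"].unique())  # the 15 programming languages
label2id = {lang: i for i, lang in enumerate(labels)}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(labels),
    id2label={i: lang for lang, i in label2id.items()},
    label2id=label2id,
)

# Classify one snippet (random output until the head is finetuned).
enc = tokenizer(df["code"].iloc[0], truncation=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
print(labels[logits.argmax(-1).item()])
```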