Writings
All Articles
Cut Cross Entropy: 20x Memory reduction in LLM Pre-training through optimized cross entropy kernels
Jan 2026
Introduction: Whilst working on pretraining SabiYarn in 2025, I came across a really interesting paper by a team at Apple called “Cut Your Losses In Large-Vocabulary Language...
Handling Blocking Coroutines effectively in Asyncio
Dec 2024
Introduction: I recently had to implement a feature at work that sent a large number of emails with custom attachments to buyers across the world. Since we had a broker,...
Finetuning GPT2 to Reconstruct Sentences
Jun 2024
Two words are anagrams if one can be formed by permuting the letters of the other. Applying the same logic to a sentence would mean that two sentences are anagrams (no such...
Classifying Code snippets with BERT.
Aug 2023
This is a fun side project where I explored transformer-based sentiment classification for the first time by training BERT to identify 15 of the most popular programming...
Byte-Pair Encoding, The Tokenization algorithm powering Large Language Models.
Jul 2023
Tokenization is an umbrella term for the methods used to turn text into chunks of words or sub-words. Tokenization has a lot of applications in computer science, from compilers...
A guide on how AI is changing Computational Photography
May 2023
And Enhance!! (from Blade Runner), that’s Computational Photography. Computational photography describes signal processing techniques and algorithms that allow computers...