Cross Entropy

Cut Cross Entropy: 20x Memory Reduction in LLM Pre-training Through Optimized Cross-Entropy Kernels

Introduction

Whilst working on pretraining SabiYarn in 2025, I came across a really interesting paper by a team at Apple called “Cut Your Losses in Large-Vocabulary Language Models”. It made a compelling point: the cross-entropy loss function has a memory problem that has quietly grown alongside a recent trend in LLM development, large vocabulary sizes. DeepSeek’s emergence in December 2024 marked a significant turning point in the LLM industry. While major AI labs continued to scale model performance through ever-increasing compute budgets, DeepSeek showed that gains in performance, cost and scalability could come from optimizing the whole stack, from compute kernels to memory access, networking and storage. While pretraining DeepSeek-V3, the team developed 3FS (Fire-Flyer File System), an open-source distributed file system optimized for high-throughput training; a new attention mechanism, Multi-head Latent Attention (MLA), with custom kernels; DeepEP, a highly tuned communication library for mixture-of-experts models; and DeepGEMM, an FP8-optimized matrix multiplication kernel library. ...
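Why do large vocabularies hurt? A standard cross-entropy implementation first materializes an N x V logit matrix (N tokens in the batch, V vocabulary entries) and only then computes the loss. The pure-Python sketch below, using hypothetical helper names (logits_memory_gib, chunked_cross_entropy), shows two things under simple assumptions: how large that matrix gets, and the underlying arithmetic trick of computing loss = logsumexp(logits) - logit[target] chunk by chunk, so a full row of V logits never has to exist at once. It is not the paper's Cut Cross-Entropy kernel, which as I understand it does this inside fused GPU kernels using fast on-chip memory; it only illustrates the math.

```python
import math
from typing import List, Sequence


def logits_memory_gib(num_tokens: int, vocab_size: int, bytes_per_elem: int = 2) -> float:
    """Size in GiB of a dense (num_tokens x vocab_size) logit matrix (2 bytes = bf16)."""
    return num_tokens * vocab_size * bytes_per_elem / 2**30


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_cross_entropy(hidden: Sequence[float], classifier: List[Sequence[float]], target: int) -> float:
    """Reference version: builds every logit for this token, then applies log-softmax."""
    logits = [dot(hidden, w) for w in classifier]
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[target]


def chunked_cross_entropy(hidden, classifier, target, chunk_size=4):
    """Same loss, but only chunk_size logits exist at any time.

    loss = logsumexp(all logits) - logit[target]; the log-sum-exp is accumulated
    with a running max and a rescaled running sum (an online log-sum-exp).
    """
    running_max = -math.inf
    running_sum = 0.0  # sum of exp(logit - running_max) over chunks seen so far
    target_logit = None
    for start in range(0, len(classifier), chunk_size):
        # Each classifier row is one vocabulary entry's output embedding.
        chunk = [dot(hidden, w) for w in classifier[start:start + chunk_size]]
        if start <= target < start + len(chunk):
            target_logit = chunk[target - start]
        new_max = max(running_max, max(chunk))
        # Rescale the old sum to the new max, then add this chunk's terms.
        running_sum = running_sum * math.exp(running_max - new_max) + sum(
            math.exp(z - new_max) for z in chunk
        )
        running_max = new_max
    if target_logit is None:
        raise ValueError("target index outside the vocabulary")
    return running_max + math.log(running_sum) - target_logit


if __name__ == "__main__":
    print(logits_memory_gib(8192, 256_000))  # → 3.90625 (GiB in bf16)
    h = [0.5, -1.0, 2.0]
    W = [[0.1 * i, -0.2 * i, 0.05 * i] for i in range(10)]  # toy 10-entry "vocabulary"
    print(abs(chunked_cross_entropy(h, W, 7, chunk_size=3) - naive_cross_entropy(h, W, 7)) < 1e-9)
```

For 8,192 tokens and a hypothetical 256,000-entry vocabulary, the bf16 logit matrix alone is about 3.9 GiB (twice that in fp32), before counting its gradient. The chunked version only ever holds chunk_size logits, at the cost of recomputing nothing extra here; a real kernel also has to handle the backward pass, which this sketch ignores.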

January 16, 2026 · 15 min · 2990 words · Damilola John

Classifying Code Snippets with BERT

This is a fun side project where I explored transformer-based text classification for the first time by training BERT to identify 15 of the most popular programming languages. I started with simple machine learning approaches and gradually worked my way up to more complex methods until I had a satisfactory solution.

The Dataset

Our dataset is a CSV file containing 45,000 samples in two columns: the ‘code’ column contains the code snippets we want to classify, and the ‘language’ column, our label, contains the programming language each snippet belongs to. Our train and test sets were created by stratified sampling on the target variable. ...
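The summary doesn't show how the split was made. Stratified sampling simply means splitting each language's rows separately, so every class keeps the same proportion in the train and test sets. Below is a minimal stdlib sketch, assuming the data is loaded as a list of (code, language) pairs and a hypothetical 20% test fraction; the function name stratified_split is illustrative. In practice, scikit-learn's train_test_split with its stratify argument is a common way to do the same thing.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (code snippet, language label)


def stratified_split(rows: List[Row], test_fraction: float = 0.2, seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Split rows so each language keeps (roughly) the same share in train and test."""
    by_label: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: List[Row] = []
    test: List[Row] = []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)  # per-class test size
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


if __name__ == "__main__":
    from collections import Counter

    toy = [(f"snippet {i}", lang) for lang in ("python", "go", "rust") for i in range(10)]
    train, test = stratified_split(toy)
    print(Counter(lang for _, lang in test))  # 2 test rows per language
```

Splitting per class matters most when some languages are rare: a plain random split could leave a minority language almost absent from the test set, making its accuracy meaningless.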

August 19, 2023 · 4 min · 841 words · Damilola John
tokenizers

Byte-Pair Encoding: The Tokenization Algorithm Powering Large Language Models

Tokenization is an umbrella term for the methods used to turn text into chunks of words or sub-words. Tokenization has many applications in computer science, from compilers to natural language processing. In this article, we will focus on tokenizers in language models, in particular a method of tokenization called Byte-Pair Encoding (BPE). The last few years have witnessed a revolution in NLP, catalyzed mainly by the transformer architecture introduced in 2017 in the paper ‘Attention Is All You Need’ and epitomized by the release of ChatGPT in late 2022. ...
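To make the core idea concrete, here is a minimal character-level sketch of how BPE learns its merge rules: start from individual characters, count adjacent symbol pairs weighted by word frequency, merge the most frequent pair into a new symbol, and repeat. The toy word counts and function names are assumptions for illustration; production tokenizers such as GPT-2's work on bytes and add pre-tokenization and end-of-word handling, none of which is modelled here.

```python
from collections import Counter
from typing import Dict, List, Tuple

Symbols = Tuple[str, ...]
Pair = Tuple[str, str]


def get_pair_counts(vocab: Dict[Symbols, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    counts: Counter = Counter()
    for symbols, freq in vocab.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts


def merge_pair(vocab: Dict[Symbols, int], pair: Pair) -> Dict[Symbols, int]:
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged: Dict[Symbols, int] = {}
    for symbols, freq in vocab.items():
        out: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Pair]:
    """Learn up to num_merges merge rules from word frequencies."""
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}  # start from characters
    merges: List[Pair] = []
    for _ in range(num_merges):
        counts = get_pair_counts(vocab)
        if not counts:
            break  # every word is already a single symbol
        best = max(counts, key=counts.get)  # ties go to the pair counted first
        vocab = merge_pair(vocab, best)
        merges.append(best)
    return merges


if __name__ == "__main__":
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    print(learn_bpe(corpus, 3))  # → [('e', 's'), ('es', 't'), ('l', 'o')]
```

On this toy corpus, "est" becomes a token within two merges because it is shared by the frequent words "newest" and "widest"; that frequency-driven reuse of common sub-words is what lets BPE handle unseen words with a fixed-size vocabulary.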

July 20, 2023 · 13 min · 2564 words · Damilola John
image sensor

A Guide to How AI Is Changing Computational Photography

And… Enhance!! (from Blade Runner). That’s computational photography. Computational photography describes signal-processing techniques and algorithms that allow computers to replicate photographic processes such as motion-blur correction, autofocus, depth sensing and zoom, features that would otherwise be impossible without dedicated optics. While some of these processes use artificial intelligence techniques, computational photography is more than just AI: it involves a series of processes that take an image from the ones and zeros captured by the image sensor to the final image displayed on screen. This article focuses mainly on computational photography techniques that employ AI. Smartphone cameras face hardware limitations: there is little space to fit actual optics (like movable lenses to alter focus or depth of field), and the technology behind digital cameras (CMOS sensors) has limitations of its own. Smartphones compensate with the enormous computational power of their processors, using clever algorithms to provide features like zoom and object-sensitive focus, among others. In recent times these algorithms have incorporated AI techniques to provide previously unimaginable features, such as the Google Pixel’s night mode, which lets you take high-definition pictures in extremely low light. ...