This repository provides a re-implementation of Google DeepMind's ICLR 2024 paper *Language Modeling Is Compression*. The original implementation released by Google uses the Haiku framework (paired with JAX), a setup that can present a steep learning curve for beginners, particularly those less familiar with functional programming.
To address this accessibility gap, we have reimplemented the key components of the paper's method in standard PyTorch, using a conventional, non-functional programming style. Crucially, our reimplementation still reproduces the exact results reported in the original paper.
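The central idea of the paper is that an autoregressive language model can act as a lossless compressor: its next-token probabilities drive an arithmetic coder, and the resulting compressed length is, up to a small constant overhead, the model's negative log2-likelihood of the data. The sketch below illustrates that connection in plain PyTorch; the model class and function here (`TinyByteLM`, `ideal_code_length_bits`) are illustrative stand-ins, not code from this repository.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyByteLM(nn.Module):
    """Toy byte-level autoregressive model (a stand-in, not this repo's architecture)."""

    def __init__(self, vocab_size: int = 256, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(self.embed(x))
        return self.head(h)  # logits over the next byte at each position


@torch.no_grad()
def ideal_code_length_bits(model: nn.Module, data: bytes) -> float:
    """Bits an arithmetic coder driven by `model` would need for `data`
    (ignoring the coder's small constant overhead and the first byte)."""
    x = torch.tensor(list(data), dtype=torch.long).unsqueeze(0)   # shape (1, T)
    log_probs = F.log_softmax(model(x[:, :-1]), dim=-1)           # predict byte t+1 from bytes <= t
    target = log_probs.gather(-1, x[:, 1:].unsqueeze(-1)).squeeze(-1)
    return -target.sum().item() / math.log(2)                     # nats -> bits


model = TinyByteLM()  # an untrained model compresses poorly; training shrinks the code length
data = b"hello hello hello hello"
print(f"{ideal_code_length_bits(model, data):.1f} bits vs. {8 * len(data)} raw bits")
```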
Please follow the installation guide in the original repo: language_modeling_is_compression
To compress data with a language model, you first need to train one:
```bash
python train_enwik_torch.py -e 3 -b 128
```
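For readers who want to see roughly what the training step amounts to, here is a minimal sketch of next-byte prediction training on raw bytes. It is a simplification under assumptions: the actual `train_enwik_torch.py` (its architecture, optimizer, and the exact meaning of `-e`/`-b`) may differ, and `train_byte_lm` and the data path are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def train_byte_lm(model: nn.Module, data: bytes, epochs: int = 3,
                  batch_size: int = 128, seq_len: int = 256) -> None:
    """Minimal next-byte prediction loop (a sketch, not the repository's actual script)."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).train()
    opt = torch.optim.Adam(model.parameters(), lr=3e-4)

    ids = torch.tensor(list(data), dtype=torch.long)
    n_seqs = len(ids) // seq_len
    seqs = ids[: n_seqs * seq_len].view(n_seqs, seq_len)           # fixed-length training chunks

    for epoch in range(epochs):
        perm = torch.randperm(n_seqs)
        for i in range(0, n_seqs, batch_size):
            batch = seqs[perm[i : i + batch_size]].to(device)
            logits = model(batch[:, :-1])                          # (B, L-1, 256) next-byte logits
            loss = F.cross_entropy(logits.reshape(-1, 256), batch[:, 1:].reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()
        print(f"epoch {epoch}: {loss.item():.3f} nats/byte")


# Hypothetical usage (the data path and saving convention are assumptions):
# model = ...  # any byte-level model mapping (B, L) byte ids to (B, L, 256) logits
# train_byte_lm(model, open("enwik8", "rb").read(), epochs=3, batch_size=128)
# torch.save(model, "trained_model.pth")
```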
To evaluate the compression rate, run the following (assuming the model produced by the training step was saved as trained_model.pth):

```bash
python compress_enwik_torch.py -m trained_model.pth
```
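Conceptually, the reported compression rate is the compressed size divided by the raw size, where the compressed size is (up to arithmetic-coding overhead) the model's negative log2-likelihood of the evaluation bytes. The sketch below shows one way to compute that ratio; it assumes the checkpoint was saved as a full module with `torch.save(model, "trained_model.pth")` and that the model maps a `(batch, seq_len)` tensor of byte ids to `(batch, seq_len, 256)` logits. `compress_enwik_torch.py` may do this differently (for example, by running an actual arithmetic coder).

```python
import math

import torch
import torch.nn.functional as F


@torch.no_grad()
def compression_rate(model: torch.nn.Module, data: bytes, seq_len: int = 2048) -> float:
    """Ideal compressed bits under the model, divided by the raw bits of `data`."""
    model.eval()
    total_bits, total_bytes = 0.0, 0
    for start in range(0, len(data) - seq_len + 1, seq_len):
        chunk = torch.tensor(list(data[start:start + seq_len]), dtype=torch.long).unsqueeze(0)
        logits = model(chunk[:, :-1])                              # next-byte logits
        nats = F.cross_entropy(logits.reshape(-1, 256), chunk[:, 1:].reshape(-1), reduction="sum")
        total_bits += nats.item() / math.log(2)                    # nats -> bits
        total_bytes += seq_len - 1
    return total_bits / (8 * total_bytes)


# Hypothetical usage (paths and the saved-model format are assumptions):
# model = torch.load("trained_model.pth", map_location="cpu")
# data = open("enwik9", "rb").read(10_000_000)   # evaluate on a slice of the raw bytes
# print(f"compression rate: {compression_rate(model, data):.3f}")
```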