How to download models from Hugging Face?
Downloading models from Hugging Face is a straightforward process, and there are several ways to do it depending on your needs. Hugging Face provides both a user-friendly web interface and programmatic methods (via Python libraries) for downloading and using models. Below is a step-by-step guide on how to download models from Hugging Face:
1. Using the Hugging Face Website
Step 1: Create an Account
- If you don’t already have a Hugging Face account, you can create one at huggingface.co. (An account is only required for private or gated models; most public models can be downloaded without logging in.)
- Once registered, you can log in to access the model repository.
Step 2: Search for a Model
- Go to the Hugging Face Model Hub (huggingface.co/models).
- Use the search bar to find the model you’re interested in (e.g., Llama, GPT-NeoX, BERT, etc.).
- You can filter models by task (e.g., text generation, classification), library (e.g., PyTorch, TensorFlow), or language.
Step 3: Download the Model Files
- Once you’ve found the model you want, click on its page.
- Scroll down to the Files and versions section.
- You will see a list of files associated with the model (e.g., config.json, pytorch_model.bin, tokenizer.json, etc.).
- To download individual files, click the download icon next to each file.
- The web interface downloads one file at a time; to grab every file in a repository at once, use the CLI or Git methods described below.
2. Using the Hugging Face CLI (huggingface-cli)
Hugging Face provides a command-line tool called huggingface-cli that allows you to download models directly from the terminal.
Step 1: Install the Hugging Face CLI
You can install the Hugging Face CLI via pip:
pip install huggingface_hub
Step 2: Log In to Hugging Face
To download private models or models that require authentication, you’ll need to log in:
huggingface-cli login
This will prompt you to paste a Hugging Face access token, which you can create under Settings → Access Tokens in your Hugging Face account.
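For scripted or CI environments, recent versions of huggingface_hub let you skip the interactive prompt and pass the token directly (here $HF_TOKEN is assumed to be an environment variable holding your access token):
# Non-interactive login; $HF_TOKEN is a placeholder for your own token
huggingface-cli login --token $HF_TOKEN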
Step 3: Download the Model
Use the following command to download a model:
huggingface-cli download <model_name>
For example, to download the bert-base-uncased model:
huggingface-cli download bert-base-uncased
By default, the files are stored in your local Hugging Face cache (typically ~/.cache/huggingface/hub) rather than in the current working directory.
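If you would rather have the files in a specific folder, recent versions of huggingface_hub accept a --local-dir flag (the ./bert-base-uncased path below is just an example):
huggingface-cli download bert-base-uncased --local-dir ./bert-base-uncased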
3. Using the transformers Library (Python)
If you’re working with Python, the easiest way to download and use a model is through the transformers library provided by Hugging Face.
Step 1: Install the transformers Library
First, install the transformers library if you haven’t already:
pip install transformers
Step 2: Load and Download the Model
You can use the from_pretrained() method to download and load a model. This method automatically downloads the model files and caches them locally.
Example: Downloading a Pre-trained Model
from transformers import AutoModel, AutoTokenizer
# Specify the model name
model_name = "bert-base-uncased"
# Download and load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Download and load the model
model = AutoModel.from_pretrained(model_name)
The first time you run this code, the model and tokenizer will be downloaded and cached locally (usually under ~/.cache/huggingface/hub; older versions of transformers used ~/.cache/huggingface/transformers). Subsequent runs will load the model from the cache, so it won’t need to be re-downloaded.
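Once the files are cached, you can tell transformers to load exclusively from the cache, which is handy on machines without network access. A minimal sketch using the local_files_only flag:
from transformers import AutoModel
# Loads from the local cache and raises an error instead of downloading
model = AutoModel.from_pretrained("bert-base-uncased", local_files_only=True)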
Step 3: Use the Model
Once the model is loaded, you can use it for inference or fine-tuning. For example:
# Tokenize input text
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
# Run a forward pass (an encoder model like BERT returns hidden states, not generated text)
outputs = model(**inputs)
# Print the output
print(outputs)
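For this encoder model, outputs is a model-output object whose main field is last_hidden_state, one vector per input token. A quick way to inspect it:
# Shape: (batch_size, sequence_length, hidden_size), e.g. (1, 8, 768) for bert-base-uncased
print(outputs.last_hidden_state.shape)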
4. Downloading Specific Versions or Files
Sometimes you may want to download a specific version of a model or only certain files (e.g., the tokenizer or configuration files). You can do this with the transformers library by specifying a revision, or by fetching individual files directly.
Example: Downloading a Specific Revision
from transformers import AutoModel
# Download a specific revision of the model (a branch name, tag, or commit hash)
model = AutoModel.from_pretrained("bert-base-uncased", revision="main")
Example: Downloading Only the Tokenizer
from transformers import AutoTokenizer
# Download only the tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
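If you only need a single file (say, just config.json), the underlying huggingface_hub library provides hf_hub_download, which fetches one file into the cache and returns its local path:
from huggingface_hub import hf_hub_download
# Download only the configuration file and print where it was stored
config_path = hf_hub_download(repo_id="bert-base-uncased", filename="config.json")
print(config_path)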
5. Using Git to Clone the Model Repository
If you want to download the entire model repository (including all files and metadata), you can use git
to clone the repository.
Step 1: Install Git
Make sure you have git
installed on your system. You can check by running:
git --version
Step 2: Clone the Repository
Navigate to the model’s page on Hugging Face and copy the repository URL (it is the same as the model page URL, e.g., https://huggingface.co/bert-base-uncased). Then, clone the repository using:
git lfs install
git clone https://huggingface.co/<model_name>
For example, to clone the bert-base-uncased model:
git lfs install
git clone https://huggingface.co/bert-base-uncased
This will download the entire model repository, including all files and metadata. The git lfs install step is important: the large weight files are stored with Git LFS, and without it the clone will contain small pointer files instead of the actual weights.
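If you want the repository layout without immediately pulling gigabytes of weights, a common Git LFS pattern is to skip the large files during the clone and fetch them selectively afterwards:
# Clone only the small files plus LFS pointer stubs
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/bert-base-uncased
cd bert-base-uncased
# Fetch just the weight file when you actually need it
git lfs pull --include="pytorch_model.bin"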
6. Caching Models Locally
When you download models using the transformers library, they are cached locally (under ~/.cache/huggingface/hub in current versions; older versions used ~/.cache/huggingface/transformers). You can specify a custom cache directory by setting the TRANSFORMERS_CACHE environment variable (recent releases also honor HF_HOME and HF_HUB_CACHE):
export TRANSFORMERS_CACHE=/path/to/custom/cache
Alternatively, you can pass the cache_dir
argument when loading the model:
model = AutoModel.from_pretrained("bert-base-uncased", cache_dir="/path/to/custom/cache")
7. Handling Large Models
Some models on Hugging Face are very large (e.g., GPT-NeoX, LLaMA). If you’re working with large models, consider the following:
- Quantized Models: Some models have quantized versions (e.g., GGUF format) that reduce the size and improve inference speed. You can find these versions on the model’s page.
- Memory-Efficient Loading: When loading a large checkpoint with the transformers library, options such as low_cpu_mem_usage=True and device_map="auto" can substantially reduce peak RAM usage; see the sketch after this list.
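A minimal sketch of memory-efficient loading, assuming the accelerate package is installed (pip install accelerate) and using GPT-NeoX as an example of a large model:
from transformers import AutoModelForCausalLM
# device_map="auto" places weights across available GPUs and CPU RAM;
# low_cpu_mem_usage avoids first materializing a full copy of the model in memory
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",  # example repo id; substitute the model you actually use
    low_cpu_mem_usage=True,
    device_map="auto",
)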
8. Conclusion
Downloading models from Hugging Face can be done in several ways, depending on your needs:
- Web Interface: Use the Hugging Face website to manually download model files.
- CLI: Use the huggingface-cli tool to download models from the terminal.
- Python: Use the transformers library to programmatically download and load models.
- Git: Clone the entire model repository using git.
Each method has its own advantages, and the best choice depends on your workflow. If you’re working in Python, the transformers library is the most convenient option, while the CLI or Git may be better for manual downloads or offline use.
By following these steps, you can easily download and use models from Hugging Face for a wide range of NLP tasks!