{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# AMULETY CLI Tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "This tutorial demonstrates how to use the AMULETY command line interface (CLI) to translate and embed both BCR (B-cell receptor) and TCR (T-cell receptor) sequences.\n", "\n", "AMULETY supports a wide range of embedding models for different immune receptor types. For a full list of the supported models, see the [Usage](../usage.md) documentation page.\n", "\n", "## Installation\n", "\n", "Before getting started, install AMULETY through conda or pip. The conda package installs the IgBlast dependency automatically. If you install via pip, IgBlast must be installed separately." ] }, { "cell_type": "markdown", "metadata": { "vscode": { "languageId": "shellscript" } }, "source": [ "Install AMULETY through conda:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "conda install -c conda-forge -c bioconda amulety --strict-channel-priority" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To verify the installation and print the help message, run:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " █████ ███ ███ ██ ██ ██ ███████ ████████ ██ ██\n", "██ ██ ████ ████ ██ ██ ██ ██ ██ ██ ██\n", "███████ ██ ████ ██ ██ ██ ██ █████ ██ ████\n", "██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██\n", "██ ██ ██ ██ ██████ ███████ ███████ ██ ██\n", "\n", "AMULETY: Adaptive imMUne receptor Language model Embedding tool for TCR and \n", "antibodY\n", " version \u001b[1;36m2.0\u001b[0m\n", "\n", "\u001b[1m \u001b[0m\n", "\u001b[1m \u001b[0m\u001b[1;33mUsage: \u001b[0m\u001b[1mamulety [OPTIONS] COMMAND [ARGS]...\u001b[0m\u001b[1m 
\u001b[0m\u001b[1m \u001b[0m\n", "\u001b[1m \u001b[0m\n", "\u001b[2m╭─\u001b[0m\u001b[2m Options \u001b[0m\u001b[2m───────────────────────────────────────────────────────────────────\u001b[0m\u001b[2m─╮\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-install\u001b[0m\u001b[1;36m-completion\u001b[0m Install completion for the current shell. \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-show\u001b[0m\u001b[1;36m-completion\u001b[0m Show completion for the current shell, to copy \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m it or customize the installation. \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-help\u001b[0m Show this message and exit. \u001b[2m│\u001b[0m\n", "\u001b[2m╰──────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n", "\u001b[2m╭─\u001b[0m\u001b[2m Commands \u001b[0m\u001b[2m──────────────────────────────────────────────────────────────────\u001b[0m\u001b[2m─╮\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36mtranslate-igblast \u001b[0m\u001b[1;36m \u001b[0m Translates nucleotide sequences to amino acid sequences \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m \u001b[0m using IgBlast. \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36membed \u001b[0m\u001b[1;36m \u001b[0m Embeds sequences from an AIRR rearrangement file using \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m \u001b[0m the specified model. It returns the \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36mcheck-deps \u001b[0m\u001b[1;36m \u001b[0m Check if optional embedding dependencies and tools are \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m \u001b[0m installed. \u001b[2m│\u001b[0m\n", "\u001b[2m╰──────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n", "\n" ] } ], "source": [ "! 
amulety --help" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Downloading example data and reference database\n", "\n", "The following commands download an example AIRR format file of BCR sequences and the IgBlast reference database." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2025-09-24 13:00:01-- https://zenodo.org/records/17186858/files/AIRR_subject1_FNA_d0_1_Y1.tsv\n", "Resolving zenodo.org (zenodo.org)... 188.185.45.92, 188.185.43.25, 188.185.48.194, ...\n", "Connecting to zenodo.org (zenodo.org)|188.185.45.92|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 479753 (469K) [application/octet-stream]\n", "Saving to: 'tutorial/AIRR_subject1_FNA_d0_1_Y1.tsv.2'\n", "\n", "AIRR_subject1_FNA_d 100%[===================>] 468.51K 578KB/s in 0.8s \n", "\n", "2025-09-24 13:00:02 (578 KB/s) - 'tutorial/AIRR_subject1_FNA_d0_1_Y1.tsv.2' saved [479753/479753]\n", "\n", "--2025-09-24 13:00:03-- https://github.com/nf-core/test-datasets/raw/airrflow/database-cache/igblast_base.zip\n", "Resolving github.com (github.com)... 140.82.112.3\n", "Connecting to github.com (github.com)|140.82.112.3|:443... connected.\n", "HTTP request sent, awaiting response... 302 Found\n", "Location: https://raw.githubusercontent.com/nf-core/test-datasets/airrflow/database-cache/igblast_base.zip [following]\n", "--2025-09-24 13:00:03-- https://raw.githubusercontent.com/nf-core/test-datasets/airrflow/database-cache/igblast_base.zip\n", "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.110.133, ...\n", "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 1204742 (1.1M) [application/zip]\n", "Saving to: 'tutorial/igblast_base.zip'\n", "\n", "igblast_base.zip 100%[===================>] 1.15M --.-KB/s in 0.1s \n", "\n", "2025-09-24 13:00:03 (7.89 MB/s) - 'tutorial/igblast_base.zip' saved [1204742/1204742]\n", "\n", "Archive: tutorial/igblast_base.zip\n", " creating: tutorial/igblast_base/\n", " creating: tutorial/igblast_base/internal_data/\n", " creating: tutorial/igblast_base/internal_data/rhesus_monkey/\n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_V.pin \n", " creating: tutorial/igblast_base/internal_data/rhesus_monkey/CVS/\n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/CVS/Repository \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/CVS/Entries \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/CVS/Root \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey.pdm.imgt \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey.pdm.kabat \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_V.nsd \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_V.nin \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_J.nog \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_J.nsq \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_D.nhr \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_J.nsi \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_V.psq \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_V.nhr \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_V.phr \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_V.pog \n", " inflating: 
tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_V.nog \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_J.nsd \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_D.nsi \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_J.nhr \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_D.nsd \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_V.nsi \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_V.psd \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_V.nsq \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_D.nsq \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey.ndm.kabat \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_D.nin \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_D.nog \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_V.psi \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey_J.nin \n", " inflating: tutorial/igblast_base/internal_data/rhesus_monkey/rhesus_monkey.ndm.imgt \n", " creating: tutorial/igblast_base/internal_data/human/\n", " inflating: tutorial/igblast_base/internal_data/human/human.ndm.imgt \n", " inflating: tutorial/igblast_base/internal_data/human/human_TR_V.nsq \n", " inflating: tutorial/igblast_base/internal_data/human/human_TR_V.nsd \n", " inflating: tutorial/igblast_base/internal_data/human/human_TR_V.psq \n", " inflating: tutorial/igblast_base/internal_data/human/human_TR_V.pog \n", " inflating: tutorial/igblast_base/internal_data/human/human_V.pog \n", " inflating: tutorial/igblast_base/internal_data/human/human_TR_V.nog \n", " inflating: tutorial/igblast_base/internal_data/human/human_V.psd \n", " inflating: 
tutorial/igblast_base/internal_data/human/human_TR_V.phr \n", " inflating: tutorial/igblast_base/internal_data/human/human_TR_V.nsi \n", " inflating: tutorial/igblast_base/internal_data/human/human_V.nsi \n", " inflating: tutorial/igblast_base/internal_data/human/human_V.nsq \n", " inflating: tutorial/igblast_base/internal_data/human/human_V.pin \n", " inflating: tutorial/igblast_base/internal_data/human/human.ndm.kabat \n", " inflating: tutorial/igblast_base/internal_data/human/human_V.nin \n", " inflating: tutorial/igblast_base/internal_data/human/human_V.nhr \n", " inflating: tutorial/igblast_base/internal_data/human/human_TR_V.psi \n", " inflating: tutorial/igblast_base/internal_data/human/human_TR_V.psd \n", " inflating: tutorial/igblast_base/internal_data/human/human_V.nog \n", " inflating: tutorial/igblast_base/internal_data/human/human_V.psi \n", " inflating: tutorial/igblast_base/internal_data/human/human_V.phr \n", " inflating: tutorial/igblast_base/internal_data/human/human_V.psq \n", " inflating: tutorial/igblast_base/internal_data/human/human_TR_V.nin \n", " inflating: tutorial/igblast_base/internal_data/human/human_TR_V.nhr \n", " inflating: tutorial/igblast_base/internal_data/human/human.pdm.kabat \n", " inflating: tutorial/igblast_base/internal_data/human/human.pdm.imgt \n", " inflating: tutorial/igblast_base/internal_data/human/human_V.nsd \n", " inflating: tutorial/igblast_base/internal_data/human/human_TR_V.pin \n", " creating: tutorial/igblast_base/internal_data/rat/\n", " inflating: tutorial/igblast_base/internal_data/rat/rat_V.nhr \n", " inflating: tutorial/igblast_base/internal_data/rat/rat_V.nin \n", " inflating: tutorial/igblast_base/internal_data/rat/rat_V.nog \n", " inflating: tutorial/igblast_base/internal_data/rat/rat_V.pin \n", " inflating: tutorial/igblast_base/internal_data/rat/rat_V.nsd \n", " inflating: tutorial/igblast_base/internal_data/rat/rat.pdm.kabat \n", " inflating: tutorial/igblast_base/internal_data/rat/rat_V.psq \n", " 
inflating: tutorial/igblast_base/internal_data/rat/rat.ndm.imgt \n", " inflating: tutorial/igblast_base/internal_data/rat/rat_V.pog \n", " inflating: tutorial/igblast_base/internal_data/rat/rat_V.nsq \n", " inflating: tutorial/igblast_base/internal_data/rat/rat_V.phr \n", " inflating: tutorial/igblast_base/internal_data/rat/rat_V.psi \n", " inflating: tutorial/igblast_base/internal_data/rat/rat.ndm.kabat \n", " inflating: tutorial/igblast_base/internal_data/rat/rat_V.psd \n", " inflating: tutorial/igblast_base/internal_data/rat/rat.pdm.imgt \n", " inflating: tutorial/igblast_base/internal_data/rat/rat_V.nsi \n", " creating: tutorial/igblast_base/internal_data/mouse/\n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_V.nsi \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_TR_V.nin \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_V.nsd \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_V.nog \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_TR_V.nhr \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_V.pog \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse.pdm.kabat \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse.ndm.kabat \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_V.pin \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_V.nsq \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_TR_V.pin \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_V.psq \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_V.phr \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_TR_V.psq \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_TR_V.nsd \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse.ndm.imgt \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_TR_V.nsq \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_V.psi 
\n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse.pdm.imgt \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_TR_V.phr \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_TR_V.nog \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_TR_V.pog \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_V.nhr \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_TR_V.psd \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_TR_V.psi \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_V.nin \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_V.psd \n", " inflating: tutorial/igblast_base/internal_data/mouse/mouse_TR_V.nsi \n", " inflating: tutorial/igblast_base/internal_data/readme \n", " creating: tutorial/igblast_base/internal_data/rabbit/\n", " inflating: tutorial/igblast_base/internal_data/rabbit/rabbit_V.psq \n", " inflating: tutorial/igblast_base/internal_data/rabbit/rabbit.ndm.imgt \n", " inflating: tutorial/igblast_base/internal_data/rabbit/rabbit.pdm.imgt \n", " inflating: tutorial/igblast_base/internal_data/rabbit/rabbit.pdm.kabat \n", " inflating: tutorial/igblast_base/internal_data/rabbit/rabbit_V.nsq \n", " inflating: tutorial/igblast_base/internal_data/rabbit/rabbit_V.phr \n", " inflating: tutorial/igblast_base/internal_data/rabbit/rabbit_V.nin \n", " inflating: tutorial/igblast_base/internal_data/rabbit/rabbit_V.psi \n", " inflating: tutorial/igblast_base/internal_data/rabbit/rabbit_V.nsi \n", " inflating: tutorial/igblast_base/internal_data/rabbit/rabbit_V.nog \n", " inflating: tutorial/igblast_base/internal_data/rabbit/rabbit.ndm.kabat \n", " inflating: tutorial/igblast_base/internal_data/rabbit/rabbit_V.nhr \n", " inflating: tutorial/igblast_base/internal_data/rabbit/rabbit_V.pin \n", " inflating: tutorial/igblast_base/internal_data/rabbit/rabbit_V.pog \n", " inflating: tutorial/igblast_base/internal_data/rabbit/rabbit_V.nsd \n", " 
inflating: tutorial/igblast_base/internal_data/rabbit/rabbit_V.psd \n", " creating: tutorial/igblast_base/fasta/\n", " inflating: tutorial/igblast_base/fasta/imgt_human_tr_d.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_human_ig_d.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_aa_human_ig_v.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_aa_human_tr_v.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_mouse_ig_d.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_aa_mouse_ig_v.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_human_tr_c.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_human_ig_v.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_mouse_tr_j.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_mouse_tr_c.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_mouse_ig_v.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_human_tr_j.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_mouse_ig_j.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_human_ig_c.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_aa_mouse_tr_v.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_mouse_ig_c.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_mouse_tr_d.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_mouse_tr_v.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_human_ig_j.fasta \n", " inflating: tutorial/igblast_base/fasta/imgt_human_tr_v.fasta \n", " creating: tutorial/igblast_base/optional_file/\n", " inflating: tutorial/igblast_base/optional_file/human_gl.aux \n", " inflating: tutorial/igblast_base/optional_file/human_gl.aux.testonly \n", " inflating: tutorial/igblast_base/optional_file/rabbit_gl.aux \n", " inflating: tutorial/igblast_base/optional_file/mouse_gl.aux \n", " inflating: tutorial/igblast_base/optional_file/readme \n", " inflating: tutorial/igblast_base/optional_file/rat_gl.aux \n", " inflating: 
tutorial/igblast_base/optional_file/rhesus_monkey_gl.aux \n", " creating: tutorial/igblast_base/database/\n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_v.nhr \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_j.ntf \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_c.nhr \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_V.pin \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_ig_v.pjs \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_v.ndb \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_d.ntf \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_c.nog \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_d.nos \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_j.nhr \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_j.njs \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_d.ntf \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_v.not \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_c.nin \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_c.njs \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_j.nhr \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_c.nos \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_c.nog \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_c.nin \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_v.nos \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_j.njs \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_j.nin \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_d.nos \n", " inflating: tutorial/igblast_base/database/mouse_gl_V.pog \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_v.njs \n", " inflating: tutorial/igblast_base/database/mouse_gl_V.nsd \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_d.not \n", " inflating: 
tutorial/igblast_base/database/imgt_human_tr_c.nin \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_c.ndb \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_d.nog \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_tr_v.ptf \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_v.nos \n", " inflating: tutorial/igblast_base/database/ncbi_human_c_genes.tar \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_c.nto \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_ig_v.pto \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_d.nos \n", " inflating: tutorial/igblast_base/database/mouse_gl_V.nsq \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_v.nhr \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_c.ndb \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_v.ntf \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_d.ndb \n", " inflating: tutorial/igblast_base/database/mouse_gl_D.nsd \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_j.ndb \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_ig_v.pdb \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_j.nsq \n", " extracting: tutorial/igblast_base/database/imgt_mouse_tr_d.nsq \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_v.nto \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_ig_v.pot \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_d.ndb \n", " inflating: tutorial/igblast_base/database/mouse_gl_D.nsi \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_v.nos \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_d.not \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_ig_v.pog \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_j.nog \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_c.njs \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_d.njs \n", " 
inflating: tutorial/igblast_base/database/imgt_aa_mouse_ig_v.pto \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_j.njs \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_c.nto \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_v.nhr \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_d.nos \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_v.ndb \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_v.nin \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_c.nin \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_V.nsd \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_V.nin \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_tr_v.pos \n", " inflating: tutorial/igblast_base/database/mouse_gl_J.nog \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_v.nhr \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_d.nin \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_c.not \n", " inflating: tutorial/igblast_base/database/mouse_gl_D.nsq \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_c.nsq \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_tr_v.pdb \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_v.nog \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_J.nog \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_v.ntf \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_j.ntf \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_c.not \n", " inflating: tutorial/igblast_base/database/mouse_gl_J.nsi \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_c.not \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_d.nog \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_tr_v.psq \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_tr_v.pot \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_c.njs \n", 
" inflating: tutorial/igblast_base/database/imgt_human_tr_d.nto \n", " inflating: tutorial/igblast_base/database/mouse_gl_V.nog \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_d.nsq \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_j.nog \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_v.nog \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_j.not \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_v.ntf \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_tr_v.pin \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_d.nsq \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_ig_v.pos \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_v.nog \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_d.not \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_d.nhr \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_c.nos \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_j.ndb \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_j.ndb \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_v.nos \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_c.nto \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_j.nog \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_d.nto \n", " inflating: tutorial/igblast_base/database/mouse_gl_J.nsd \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_d.nin \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_J.nsq \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_d.njs \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_c.ntf \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_ig_v.phr \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_d.nto \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_j.not \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_j.nos \n", 
" inflating: tutorial/igblast_base/database/imgt_human_ig_v.nto \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_c.nog \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_J.nsi \n", " inflating: tutorial/igblast_base/database/mouse_gl_V.phr \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_v.nin \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_j.nin \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_V.psq \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_ig_v.pot \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_v.nin \n", " inflating: tutorial/igblast_base/database/mouse_gl_V.pin \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_v.nsq \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_j.nhr \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_tr_v.pog \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_j.nin \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_v.njs \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_v.nsq \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_j.nsq \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_tr_v.psq \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_ig_v.ptf \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_tr_v.pin \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_j.ntf \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_V.nhr \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_V.phr \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_ig_v.phr \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_j.ndb \n", " inflating: tutorial/igblast_base/database/mouse_gl_V.nsi \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_d.ndb \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_j.nsq \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_d.ntf 
\n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_d.nto \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_d.nhr \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_d.ndb \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_c.nos \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_d.not \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_tr_v.pto \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_V.pog \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_V.nog \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_ig_v.psq \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_d.nin \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_j.njs \n", " extracting: tutorial/igblast_base/database/mouse_gl_J.nsq \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_c.ntf \n", " inflating: tutorial/igblast_base/database/mouse_gl_D.nog \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_j.not \n", " extracting: tutorial/igblast_base/database/imgt_human_tr_d.nsq \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_j.not \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_v.nto \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_c.ntf \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_tr_v.pjs \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_v.ndb \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_d.nin \n", " creating: tutorial/igblast_base/database/airr/\n", " inflating: tutorial/igblast_base/database/airr/airr_c_human.tar \n", " inflating: tutorial/igblast_base/database/airr/airr_c_mouse.tar \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_v.not \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_j.nos \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_c.njs \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_J.nsd \n", " 
inflating: tutorial/igblast_base/database/imgt_human_tr_c.nsq \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_v.nsq \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_v.njs \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_j.nog \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_ig_v.pin \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_j.nto \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_J.nhr \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_v.nto \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_c.nhr \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_j.nos \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_VJ.tar \n", " inflating: tutorial/igblast_base/database/mouse_gl_VDJ.tar \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_j.nto \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_ig_v.ptf \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_v.nog \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_v.not \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_tr_v.phr \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_c.ntf \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_v.ntf \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_c.nto \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_V.nsi \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_v.nin \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_v.nsq \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_j.nos \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_ig_v.pdb \n", " inflating: tutorial/igblast_base/database/mouse_gl_V.nhr \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_tr_v.phr \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_d.njs \n", " inflating: 
tutorial/igblast_base/database/imgt_mouse_ig_c.nhr \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_tr_v.pog \n", " inflating: tutorial/igblast_base/database/mouse_gl_V.psq \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_V.psd \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_tr_v.ptf \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_c.nog \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_tr_v.pdb \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_j.nhr \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_v.njs \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_v.ndb \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_tr_v.pjs \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_ig_v.pog \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_c.not \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_V.nsq \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_j.nin \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_tr_v.pos \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_d.ntf \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_ig_v.pos \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_d.nhr \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_tr_v.pto \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_tr_v.pot \n", " inflating: tutorial/igblast_base/database/mouse_gl_V.nin \n", " inflating: tutorial/igblast_base/database/imgt_aa_mouse_ig_v.pjs \n", " inflating: tutorial/igblast_base/database/mouse_gl_D.nin \n", " inflating: tutorial/igblast_base/database/mouse_gl_J.nin \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_c.ndb \n", " inflating: tutorial/igblast_base/database/mouse_gl_D.nhr \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_d.nog \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_c.nsq \n", " 
inflating: tutorial/igblast_base/database/imgt_mouse_tr_j.nto \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_j.ntf \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_j.nsq \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_j.nto \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_ig_v.psq \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_v.not \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_d.nhr \n", " inflating: tutorial/igblast_base/database/imgt_human_ig_c.nos \n", " inflating: tutorial/igblast_base/database/mouse_gl_V.psd \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_d.nog \n", " inflating: tutorial/igblast_base/database/imgt_human_tr_c.nhr \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_V.psi \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_c.nsq \n", " inflating: tutorial/igblast_base/database/imgt_aa_human_ig_v.pin \n", " inflating: tutorial/igblast_base/database/mouse_gl_J.nhr \n", " inflating: tutorial/igblast_base/database/rhesus_monkey_J.nin \n", " inflating: tutorial/igblast_base/database/imgt_mouse_tr_c.ndb \n", " inflating: tutorial/igblast_base/database/imgt_mouse_ig_d.njs \n", " inflating: tutorial/igblast_base/database/mouse_gl_V.psi \n" ] } ], "source": [ "# Create tutorial directory and download example data\n", "! mkdir -p tutorial\n", "! wget -P tutorial https://zenodo.org/records/17186858/files/AIRR_subject1_FNA_d0_1_Y1.tsv\n", "\n", "# Download and extract IgBlast reference database\n", "! wget -P tutorial -c https://github.com/nf-core/test-datasets/raw/airrflow/database-cache/igblast_base.zip\n", "! unzip tutorial/igblast_base.zip -d tutorial\n", "! 
rm tutorial/igblast_base.zip" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Translating nucleotides to amino acid sequences\n", "\n", "The inputs to the embedding models are [AIRR format files](https://docs.airr-community.org/en/stable/datarep/overview.html#datarepresentations) with immune receptor amino acid sequences. If the AIRR file only contains nucleotide sequences, the `amulety translate-igblast` command can help with the translation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " █████ ███ ███ ██ ██ ██ ███████ ████████ ██ ██\n", "██ ██ ████ ████ ██ ██ ██ ██ ██ ██ ██\n", "███████ ██ ████ ██ ██ ██ ██ █████ ██ ████\n", "██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██\n", "██ ██ ██ ██ ██████ ███████ ███████ ██ ██\n", "\n", "AMULETY: Adaptive imMUne receptor Language model Embedding tool for TCR and \n", "antibodY\n", " version \u001b[1;36m2.0\u001b[0m\n", "\n", "2025-09-24 11:28:42,713 - INFO - Converting AIRR table to FastA for IgBlast translation...\n", "2025-09-24 11:28:42,720 - INFO - Calling IgBlast for running translation...\n", "2025-09-24 11:28:44,404 - INFO - Saved the translations in the dataframe (sequence_aa contains the full translation and sequence_vdj_aa contains the VDJ translation).\n", "2025-09-24 11:28:44,407 - INFO - Took 1.69 seconds\n", "2025-09-24 11:28:44,408 - INFO - Saved the translations in tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv file.\n" ] } ], "source": [ "! amulety translate-igblast -i tutorial/AIRR_subject1_FNA_d0_1_Y1.tsv -o tutorial -r tutorial/igblast_base" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Embedding sequences\n", "\n", "Now we are ready to embed the sequences using various models. 
AMULETY uses a unified `embed` command that supports all available models.\n", "\n", "To print the help message for the embedding command run:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " █████ ███ ███ ██ ██ ██ ███████ ████████ ██ ██\n", "██ ██ ████ ████ ██ ██ ██ ██ ██ ██ ██\n", "███████ ██ ████ ██ ██ ██ ██ █████ ██ ████\n", "██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██\n", "██ ██ ██ ██ ██████ ███████ ███████ ██ ██\n", "\n", "AMULETY: Adaptive imMUne receptor Language model Embedding tool for TCR and \n", "antibodY\n", " version \u001b[1;36m2.0\u001b[0m\n", "\n", "\u001b[1m \u001b[0m\n", "\u001b[1m \u001b[0m\u001b[1;33mUsage: \u001b[0m\u001b[1mamulety embed [OPTIONS]\u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m\n", "\u001b[1m \u001b[0m\n", " Embeds sequences from an AIRR rearrangement file using the specified model. It \n", " returns the \n", " \n", " \u001b[2mExample usage:\u001b[0m \n", " \u001b[2mamulety embed \u001b[0m\u001b[1;2;36m-\u001b[0m\u001b[1;2;36m-chain\u001b[0m\u001b[2m HL \u001b[0m\u001b[1;2;36m-\u001b[0m\u001b[1;2;36m-model\u001b[0m\u001b[2m antiberta2 \u001b[0m\u001b[1;2;36m-\u001b[0m\u001b[1;2;36m-output\u001b[0m\u001b[1;2;36m-file-path\u001b[0m\u001b[2m out.pt \u001b[0m \n", " \u001b[2mairr_rearrangement.tsv\u001b[0m \n", " \n", "\u001b[2m╭─\u001b[0m\u001b[2m Options \u001b[0m\u001b[2m───────────────────────────────────────────────────────────────────\u001b[0m\u001b[2m─╮\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[31m*\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-input\u001b[0m\u001b[1;36m-airr\u001b[0m \u001b[1;33mTEXT \u001b[0m The path to the input data file. \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m The data file should be in AIRR \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m format. 
\u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2m[default: None] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2;31m[required] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[31m*\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-chain\u001b[0m \u001b[1;33mTEXT \u001b[0m Input sequences. For BCR: H=Heavy, \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m L=Light, HL=Heavy-Light pairs, \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m LH=Light-Heavy pairs, H+L=Both \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m chains separately. For TCR: \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m H=Beta/Delta, L=Alpha/Gamma, \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m HL=Beta-Alpha/Delta-Gamma pairs, \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m LH=Alpha-Beta/Gamma-Delta pairs, \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m H+L=Both chains separately. \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2m[default: None] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2;31m[required] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[31m*\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-model\u001b[0m \u001b[1;33mTEXT \u001b[0m The embedding model to use. BCR: \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m ['ablang', 'antiberta2', \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m 'antiberty', 'balm-paired']. TCR: \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m ['tcr-bert', 'tcrt5']. Immune (BCR \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m & TCR): ['immune2vec']. Protein: \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m ['esm2', 'prott5', 'custom']. 
Use \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m 'custom' for fine-tuned models with \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-model\u001b[0m\u001b[1;36m-path\u001b[0m, \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-embedding\u001b[0m\u001b[1;36m-dimension\u001b[0m, and \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-max\u001b[0m\u001b[1;36m-length\u001b[0m parameters. \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2m[default: None] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2;31m[required] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[31m*\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-output\u001b[0m\u001b[1;36m-file-path\u001b[0m \u001b[1;33mTEXT \u001b[0m The path where the generated \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m embeddings will be saved. The file \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m extension should be .csv, or .tsv. \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m for a dataframe, .pt for a pickled \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m torch object, or .h5ad for an \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m anndata object. \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2m[default: None] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2;31m[required] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-cache\u001b[0m\u001b[1;36m-dir\u001b[0m \u001b[1;33mTEXT \u001b[0m Cache dir for storing the \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m pre-trained model weights. \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2m[default: /tmp/amulety] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-sequence\u001b[0m\u001b[1;36m-col\u001b[0m \u001b[1;33mTEXT \u001b[0m The name of the column containing \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m the amino acid sequences to embed. 
\u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2m[default: sequence_vdj_aa] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-cell\u001b[0m\u001b[1;36m-id-col\u001b[0m \u001b[1;33mTEXT \u001b[0m The name of the column containing \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m the single-cell barcode. \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2m[default: cell_id] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-batch\u001b[0m\u001b[1;36m-size\u001b[0m \u001b[1;33mINTEGER\u001b[0m The batch size of sequences to \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m embed. \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2m[default: 50] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-model\u001b[0m\u001b[1;36m-path\u001b[0m \u001b[1;33mTEXT \u001b[0m Path to custom model (HuggingFace \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m model name or local path). Required \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m for 'custom' model. \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2m[default: None] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-embedding\u001b[0m\u001b[1;36m-dimension\u001b[0m \u001b[1;33mINTEGER\u001b[0m Embedding dimension for custom \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m model. Required for 'custom' model. \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2m[default: None] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-max\u001b[0m\u001b[1;36m-length\u001b[0m \u001b[1;33mINTEGER\u001b[0m Maximum sequence length for custom \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m model. Required for 'custom' model. 
\u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2m[default: None] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-duplicate\u001b[0m\u001b[1;36m-col\u001b[0m \u001b[1;33mTEXT \u001b[0m The name of the numeric column used \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m to select the best chain when \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m multiple chains of the same type \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m exist per cell. Default: \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m 'duplicate_count'. Custom columns \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m must be numeric and user-defined. \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2m[default: duplicate_count] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-installation\u001b[0m\u001b[1;36m-path\u001b[0m \u001b[1;33mTEXT \u001b[0m Custom path to Immune2Vec \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m installation directory. Only \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m applies to 'immune2vec' model. \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[2m[default: None] \u001b[0m \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-residue\u001b[0m\u001b[1;36m-level\u001b[0m \u001b[1;33m \u001b[0m If True, returns residue-level \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m embeddings of dimension sequence \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m length x embedding dimension (L x \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m D) instead of sequence-level (1 x \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m D). \u001b[2m│\u001b[0m\n", "\u001b[2m│\u001b[0m \u001b[1;36m-\u001b[0m\u001b[1;36m-help\u001b[0m \u001b[1;33m \u001b[0m Show this message and exit. \u001b[2m│\u001b[0m\n", "\u001b[2m╰──────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n", "\n" ] } ], "source": [ "! 
amulety embed --help" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### BCR embedding examples\n", "\n", "Let's demonstrate embedding BCR sequences using different models:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### AntiBERTy (BCR-specific model)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " █████ ███ ███ ██ ██ ██ ███████ ████████ ██ ██\n", "██ ██ ████ ████ ██ ██ ██ ██ ██ ██ ██\n", "███████ ██ ████ ██ ██ ██ ██ █████ ██ ████\n", "██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██\n", "██ ██ ██ ██ ██████ ███████ ███████ ██ ██\n", "\n", "AMULETY: Adaptive imMUne receptor Language model Embedding tool for TCR and \n", "antibodY\n", " version \u001b[1;36m2.0\u001b[0m\n", "\n", "2025-09-24 11:28:55,583 - INFO - Detected single-cell data format\n", "2025-09-24 11:28:55,585 - INFO - Single-cell AIRR data detected (all entries have cell_id).\n", "2025-09-24 11:28:55,586 - INFO - Removed 102 sequences not matching H chain\n", "2025-09-24 11:29:02,850 - INFO - AntiBERTy loaded. 
Size: 26.03 M\n", "2025-09-24 11:29:02,850 - INFO - Batch 1/48\n", "2025-09-24 11:29:02,887 - INFO - Batch 2/48\n", "2025-09-24 11:29:02,912 - INFO - Batch 3/48\n", "2025-09-24 11:29:02,933 - INFO - Batch 4/48\n", "2025-09-24 11:29:02,955 - INFO - Batch 5/48\n", "2025-09-24 11:29:02,976 - INFO - Batch 6/48\n", "2025-09-24 11:29:02,997 - INFO - Batch 7/48\n", "2025-09-24 11:29:03,017 - INFO - Batch 8/48\n", "2025-09-24 11:29:03,037 - INFO - Batch 9/48\n", "2025-09-24 11:29:03,059 - INFO - Batch 10/48\n", "2025-09-24 11:29:03,079 - INFO - Batch 11/48\n", "2025-09-24 11:29:03,099 - INFO - Batch 12/48\n", "2025-09-24 11:29:03,119 - INFO - Batch 13/48\n", "2025-09-24 11:29:03,140 - INFO - Batch 14/48\n", "2025-09-24 11:29:03,161 - INFO - Batch 15/48\n", "2025-09-24 11:29:03,181 - INFO - Batch 16/48\n", "2025-09-24 11:29:03,202 - INFO - Batch 17/48\n", "2025-09-24 11:29:03,222 - INFO - Batch 18/48\n", "2025-09-24 11:29:03,243 - INFO - Batch 19/48\n", "2025-09-24 11:29:03,290 - INFO - Batch 20/48\n", "2025-09-24 11:29:03,312 - INFO - Batch 21/48\n", "2025-09-24 11:29:03,332 - INFO - Batch 22/48\n", "2025-09-24 11:29:03,352 - INFO - Batch 23/48\n", "2025-09-24 11:29:03,373 - INFO - Batch 24/48\n", "2025-09-24 11:29:03,393 - INFO - Batch 25/48\n", "2025-09-24 11:29:03,414 - INFO - Batch 26/48\n", "2025-09-24 11:29:03,433 - INFO - Batch 27/48\n", "2025-09-24 11:29:03,452 - INFO - Batch 28/48\n", "2025-09-24 11:29:03,472 - INFO - Batch 29/48\n", "2025-09-24 11:29:03,492 - INFO - Batch 30/48\n", "2025-09-24 11:29:03,514 - INFO - Batch 31/48\n", "2025-09-24 11:29:03,534 - INFO - Batch 32/48\n", "2025-09-24 11:29:03,554 - INFO - Batch 33/48\n", "2025-09-24 11:29:03,575 - INFO - Batch 34/48\n", "2025-09-24 11:29:03,594 - INFO - Batch 35/48\n", "2025-09-24 11:29:03,614 - INFO - Batch 36/48\n", "2025-09-24 11:29:03,635 - INFO - Batch 37/48\n", "2025-09-24 11:29:03,657 - INFO - Batch 38/48\n", "2025-09-24 11:29:03,680 - INFO - Batch 39/48\n", "2025-09-24 11:29:03,700 - INFO - Batch 
40/48\n", "2025-09-24 11:29:03,721 - INFO - Batch 41/48\n", "2025-09-24 11:29:03,743 - INFO - Batch 42/48\n", "2025-09-24 11:29:03,763 - INFO - Batch 43/48\n", "2025-09-24 11:29:03,783 - INFO - Batch 44/48\n", "2025-09-24 11:29:03,804 - INFO - Batch 45/48\n", "2025-09-24 11:29:03,825 - INFO - Batch 46/48\n", "2025-09-24 11:29:03,845 - INFO - Batch 47/48\n", "2025-09-24 11:29:03,866 - INFO - Batch 48/48\n", "2025-09-24 11:29:03,879 - INFO - Took 1.03 seconds\n", "2025-09-24 11:29:03,880 - INFO - Generated embeddings with dimensions torch.Size([95, 512])\n", "2025-09-24 11:29:03,880 - INFO - Saving embedding as a pickled torch object.\n", "2025-09-24 11:29:03,881 - INFO - Saving sequence filtered metadata as TSV file.\n", "2025-09-24 11:29:03,885 - INFO - Saved embedding at tutorial/test_embedding.pt\n" ] } ], "source": [ "! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H --model antiberty --batch-size 2 --output-file-path tutorial/test_embedding.pt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### AntiBERTa2 (BCR-specific model)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " █████ ███ ███ ██ ██ ██ ███████ ████████ ██ ██\n", "██ ██ ████ ████ ██ ██ ██ ██ ██ ██ ██\n", "███████ ██ ████ ██ ██ ██ ██ █████ ██ ████\n", "██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██\n", "██ ██ ██ ██ ██████ ███████ ███████ ██ ██\n", "\n", "AMULETY: Adaptive imMUne receptor Language model Embedding tool for TCR and \n", "antibodY\n", " version \u001b[1;36m2.0\u001b[0m\n", "\n", "2025-09-24 11:44:05,686 - INFO - Detected single-cell data format\n", "2025-09-24 11:44:05,688 - INFO - Single-cell AIRR data detected (all entries have cell_id).\n", "2025-09-24 11:44:05,688 - INFO - Removed 102 sequences not matching H chain\n", "tokenizer_config.json: 100%|████████████████████| 116/116 [00:00<00:00, 339kB/s]\n", 
"vocab.txt: 100%|█████████████████████████████| 80.0/80.0 [00:00<00:00, 1.56MB/s]\n", "special_tokens_map.json: 100%|█████████████████| 124/124 [00:00<00:00, 1.24MB/s]\n", "config.json: 100%|█████████████████████████████| 575/575 [00:00<00:00, 1.76MB/s]\n", "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n", "2025-09-24 11:44:08,458 - WARNING - Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n", "model.safetensors: 100%|█████████████████████| 811M/811M [00:20<00:00, 40.2MB/s]\n", "RoFormerForMaskedLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.\n", " - If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes\n", " - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).\n", " - If you are not the owner of the model architecture class, please contact the model code owner to update it.\n", "2025-09-24 11:44:29,008 - INFO - AntiBERTa2 loaded. 
Size: 202.642462 M\n", "2025-09-24 11:44:29,009 - INFO - Batch 1/48.\n", "2025-09-24 11:44:30,501 - INFO - Batch 2/48.\n", "2025-09-24 11:44:30,743 - INFO - Batch 3/48.\n", "2025-09-24 11:44:30,963 - INFO - Batch 4/48.\n", "2025-09-24 11:44:31,181 - INFO - Batch 5/48.\n", "2025-09-24 11:44:31,400 - INFO - Batch 6/48.\n", "2025-09-24 11:44:31,614 - INFO - Batch 7/48.\n", "2025-09-24 11:44:31,839 - INFO - Batch 8/48.\n", "2025-09-24 11:44:32,061 - INFO - Batch 9/48.\n", "2025-09-24 11:44:32,275 - INFO - Batch 10/48.\n", "2025-09-24 11:44:32,496 - INFO - Batch 11/48.\n", "2025-09-24 11:44:32,714 - INFO - Batch 12/48.\n", "2025-09-24 11:44:32,926 - INFO - Batch 13/48.\n", "2025-09-24 11:44:33,146 - INFO - Batch 14/48.\n", "2025-09-24 11:44:33,370 - INFO - Batch 15/48.\n", "2025-09-24 11:44:33,588 - INFO - Batch 16/48.\n", "2025-09-24 11:44:33,812 - INFO - Batch 17/48.\n", "2025-09-24 11:44:34,033 - INFO - Batch 18/48.\n", "2025-09-24 11:44:34,255 - INFO - Batch 19/48.\n", "2025-09-24 11:44:34,474 - INFO - Batch 20/48.\n", "2025-09-24 11:44:34,692 - INFO - Batch 21/48.\n", "2025-09-24 11:44:34,916 - INFO - Batch 22/48.\n", "2025-09-24 11:44:35,129 - INFO - Batch 23/48.\n", "2025-09-24 11:44:35,391 - INFO - Batch 24/48.\n", "2025-09-24 11:44:35,613 - INFO - Batch 25/48.\n", "2025-09-24 11:44:35,832 - INFO - Batch 26/48.\n", "2025-09-24 11:44:36,059 - INFO - Batch 27/48.\n", "2025-09-24 11:44:36,281 - INFO - Batch 28/48.\n", "2025-09-24 11:44:36,500 - INFO - Batch 29/48.\n", "2025-09-24 11:44:36,714 - INFO - Batch 30/48.\n", "2025-09-24 11:44:36,934 - INFO - Batch 31/48.\n", "2025-09-24 11:44:37,151 - INFO - Batch 32/48.\n", "2025-09-24 11:44:37,365 - INFO - Batch 33/48.\n", "2025-09-24 11:44:37,590 - INFO - Batch 34/48.\n", "2025-09-24 11:44:37,813 - INFO - Batch 35/48.\n", "2025-09-24 11:44:38,037 - INFO - Batch 36/48.\n", "2025-09-24 11:44:38,285 - INFO - Batch 37/48.\n", "2025-09-24 11:44:38,503 - INFO - Batch 38/48.\n", "2025-09-24 11:44:38,729 - INFO - Batch 
39/48.\n", "2025-09-24 11:44:38,949 - INFO - Batch 40/48.\n", "2025-09-24 11:44:39,168 - INFO - Batch 41/48.\n", "2025-09-24 11:44:39,391 - INFO - Batch 42/48.\n", "2025-09-24 11:44:39,608 - INFO - Batch 43/48.\n", "2025-09-24 11:44:39,838 - INFO - Batch 44/48.\n", "2025-09-24 11:44:40,064 - INFO - Batch 45/48.\n", "2025-09-24 11:44:40,283 - INFO - Batch 46/48.\n", "2025-09-24 11:44:40,507 - INFO - Batch 47/48.\n", "2025-09-24 11:44:40,735 - INFO - Batch 48/48.\n", "2025-09-24 11:44:40,868 - INFO - Took 11.86 seconds\n", "2025-09-24 11:44:40,872 - INFO - Generated embeddings with dimensions torch.Size([95, 1024])\n", "2025-09-24 11:44:40,873 - INFO - Saving embedding as a pickled torch object.\n", "2025-09-24 11:44:40,875 - INFO - Saving sequence filtered metadata as TSV file.\n", "2025-09-24 11:44:40,881 - INFO - Saved embedding at tutorial/AIRR_subject1_FNA_d0_1_Y1_antiberta2.pt\n" ] } ], "source": [ "# Embed heavy chains using AntiBERTa2\n", "! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H --model antiberta2 --batch-size 2 --output-file-path tutorial/AIRR_subject1_FNA_d0_1_Y1_antiberta2.pt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### AbLang (BCR-specific model with separate heavy/light models)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " █████ ███ ███ ██ ██ ██ ███████ ████████ ██ ██\n", "██ ██ ████ ████ ██ ██ ██ ██ ██ ██ ██\n", "███████ ██ ████ ██ ██ ██ ██ █████ ██ ████\n", "██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██\n", "██ ██ ██ ██ ██████ ███████ ███████ ██ ██\n", "\n", "AMULETY: Adaptive imMUne receptor Language model Embedding tool for TCR and \n", "antibodY\n", " version \u001b[1;36m2.0\u001b[0m\n", "\n", "2025-09-24 11:45:05,420 - INFO - Detected single-cell data format\n", "2025-09-24 11:45:05,421 - INFO - Single-cell AIRR data detected (all entries 
have cell_id).\n", "2025-09-24 11:45:06,418 - INFO - AbLang heavy chain model loaded\n", "2025-09-24 11:45:06,418 - INFO - Batch 1/99\n", "2025-09-24 11:45:06,500 - INFO - Batch 2/99\n", "2025-09-24 11:45:06,570 - INFO - Batch 3/99\n", "2025-09-24 11:45:06,642 - INFO - Batch 4/99\n", "2025-09-24 11:45:06,712 - INFO - Batch 5/99\n", "2025-09-24 11:45:06,790 - INFO - Batch 6/99\n", "2025-09-24 11:45:06,864 - INFO - Batch 7/99\n", "2025-09-24 11:45:06,939 - INFO - Batch 8/99\n", "2025-09-24 11:45:07,011 - INFO - Batch 9/99\n", "2025-09-24 11:45:07,081 - INFO - Batch 10/99\n", "2025-09-24 11:45:07,153 - INFO - Batch 11/99\n", "2025-09-24 11:45:07,224 - INFO - Batch 12/99\n", "2025-09-24 11:45:07,298 - INFO - Batch 13/99\n", "2025-09-24 11:45:07,372 - INFO - Batch 14/99\n", "2025-09-24 11:45:07,447 - INFO - Batch 15/99\n", "2025-09-24 11:45:07,521 - INFO - Batch 16/99\n", "2025-09-24 11:45:07,594 - INFO - Batch 17/99\n", "2025-09-24 11:45:07,672 - INFO - Batch 18/99\n", "2025-09-24 11:45:07,748 - INFO - Batch 19/99\n", "2025-09-24 11:45:07,823 - INFO - Batch 20/99\n", "2025-09-24 11:45:07,902 - INFO - Batch 21/99\n", "2025-09-24 11:45:07,981 - INFO - Batch 22/99\n", "2025-09-24 11:45:08,057 - INFO - Batch 23/99\n", "2025-09-24 11:45:08,137 - INFO - Batch 24/99\n", "2025-09-24 11:45:08,221 - INFO - Batch 25/99\n", "2025-09-24 11:45:08,307 - INFO - Batch 26/99\n", "2025-09-24 11:45:08,392 - INFO - Batch 27/99\n", "2025-09-24 11:45:08,476 - INFO - Batch 28/99\n", "2025-09-24 11:45:08,553 - INFO - Batch 29/99\n", "2025-09-24 11:45:08,630 - INFO - Batch 30/99\n", "2025-09-24 11:45:08,707 - INFO - Batch 31/99\n", "2025-09-24 11:45:08,783 - INFO - Batch 32/99\n", "2025-09-24 11:45:08,859 - INFO - Batch 33/99\n", "2025-09-24 11:45:08,935 - INFO - Batch 34/99\n", "2025-09-24 11:45:09,012 - INFO - Batch 35/99\n", "2025-09-24 11:45:09,087 - INFO - Batch 36/99\n", "2025-09-24 11:45:09,161 - INFO - Batch 37/99\n", "2025-09-24 11:45:09,237 - INFO - Batch 38/99\n", "2025-09-24 
11:45:09,311 - INFO - Batch 39/99\n", "2025-09-24 11:45:09,389 - INFO - Batch 40/99\n", "2025-09-24 11:45:09,467 - INFO - Batch 41/99\n", "2025-09-24 11:45:09,545 - INFO - Batch 42/99\n", "2025-09-24 11:45:09,621 - INFO - Batch 43/99\n", "2025-09-24 11:45:09,699 - INFO - Batch 44/99\n", "2025-09-24 11:45:09,775 - INFO - Batch 45/99\n", "2025-09-24 11:45:09,851 - INFO - Batch 46/99\n", "2025-09-24 11:45:09,927 - INFO - Batch 47/99\n", "2025-09-24 11:45:10,004 - INFO - Batch 48/99\n", "2025-09-24 11:45:10,082 - INFO - Batch 49/99\n", "2025-09-24 11:45:10,155 - INFO - Batch 50/99\n", "2025-09-24 11:45:10,234 - INFO - Batch 51/99\n", "2025-09-24 11:45:10,307 - INFO - Batch 52/99\n", "2025-09-24 11:45:10,380 - INFO - Batch 53/99\n", "2025-09-24 11:45:10,455 - INFO - Batch 54/99\n", "2025-09-24 11:45:10,525 - INFO - Batch 55/99\n", "2025-09-24 11:45:10,599 - INFO - Batch 56/99\n", "2025-09-24 11:45:10,672 - INFO - Batch 57/99\n", "2025-09-24 11:45:10,744 - INFO - Batch 58/99\n", "2025-09-24 11:45:10,817 - INFO - Batch 59/99\n", "2025-09-24 11:45:10,893 - INFO - Batch 60/99\n", "2025-09-24 11:45:10,967 - INFO - Batch 61/99\n", "2025-09-24 11:45:11,042 - INFO - Batch 62/99\n", "2025-09-24 11:45:11,112 - INFO - Batch 63/99\n", "2025-09-24 11:45:11,197 - INFO - Batch 64/99\n", "2025-09-24 11:45:11,305 - INFO - Batch 65/99\n", "2025-09-24 11:45:11,380 - INFO - Batch 66/99\n", "2025-09-24 11:45:11,448 - INFO - Batch 67/99\n", "2025-09-24 11:45:11,521 - INFO - Batch 68/99\n", "2025-09-24 11:45:11,597 - INFO - Batch 69/99\n", "2025-09-24 11:45:11,660 - INFO - Batch 70/99\n", "2025-09-24 11:45:11,734 - INFO - Batch 71/99\n", "2025-09-24 11:45:11,810 - INFO - Batch 72/99\n", "2025-09-24 11:45:11,883 - INFO - Batch 73/99\n", "2025-09-24 11:45:11,955 - INFO - Batch 74/99\n", "2025-09-24 11:45:12,028 - INFO - Batch 75/99\n", "2025-09-24 11:45:12,102 - INFO - Batch 76/99\n", "2025-09-24 11:45:12,177 - INFO - Batch 77/99\n", "2025-09-24 11:45:12,252 - INFO - Batch 78/99\n", "2025-09-24 
11:45:12,325 - INFO - Batch 79/99\n", "2025-09-24 11:45:12,399 - INFO - Batch 80/99\n", "2025-09-24 11:45:12,472 - INFO - Batch 81/99\n", "2025-09-24 11:45:12,546 - INFO - Batch 82/99\n", "2025-09-24 11:45:12,627 - INFO - Batch 83/99\n", "2025-09-24 11:45:12,704 - INFO - Batch 84/99\n", "2025-09-24 11:45:12,780 - INFO - Batch 85/99\n", "2025-09-24 11:45:12,853 - INFO - Batch 86/99\n", "2025-09-24 11:45:12,926 - INFO - Batch 87/99\n", "2025-09-24 11:45:12,998 - INFO - Batch 88/99\n", "2025-09-24 11:45:13,073 - INFO - Batch 89/99\n", "2025-09-24 11:45:13,144 - INFO - Batch 90/99\n", "2025-09-24 11:45:13,211 - INFO - Batch 91/99\n", "2025-09-24 11:45:13,286 - INFO - Batch 92/99\n", "2025-09-24 11:45:13,358 - INFO - Batch 93/99\n", "2025-09-24 11:45:13,424 - INFO - Batch 94/99\n", "2025-09-24 11:45:13,495 - INFO - Batch 95/99\n", "2025-09-24 11:45:13,566 - INFO - Batch 96/99\n", "2025-09-24 11:45:13,638 - INFO - Batch 97/99\n", "2025-09-24 11:45:13,709 - INFO - Batch 98/99\n", "2025-09-24 11:45:13,779 - INFO - Batch 99/99\n", "2025-09-24 11:45:13,817 - INFO - AbLang embedding completed. Took 7.4 seconds\n", "2025-09-24 11:45:13,828 - INFO - Generated embeddings with dimensions torch.Size([197, 768])\n", "2025-09-24 11:45:13,829 - INFO - Saving embedding as a pickled torch object.\n", "2025-09-24 11:45:13,831 - INFO - Saving sequence filtered metadata as TSV file.\n", "2025-09-24 11:45:13,838 - INFO - Saved embedding at tutorial/AIRR_subject1_FNA_d0_1_Y1_ablang.pt\n", "\u001b[0m" ] } ], "source": [ "# Embed both heavy and light chains separately using AbLang\n", "! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H+L --model ablang --batch-size 2 --output-file-path tutorial/AIRR_subject1_FNA_d0_1_Y1_ablang.pt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### BALM-paired model (BCR paired chains)\n", "\n", "BALM-paired is a specialized model for BCR trained on paired heavy-light chains. 
To embed concatenated heavy and light chains, pass the `--chain HL` option to `amulety embed`." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " █████ ███ ███ ██ ██ ██ ███████ ████████ ██ ██\n", "██ ██ ████ ████ ██ ██ ██ ██ ██ ██ ██\n", "███████ ██ ████ ██ ██ ██ ██ █████ ██ ████\n", "██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██\n", "██ ██ ██ ██ ██████ ███████ ███████ ██ ██\n", "\n", "AMULETY: Adaptive imMUne receptor Language model Embedding tool for TCR and \n", "antibodY\n", " version \u001b[1;36m2.0\u001b[0m\n", "\n", "2025-09-24 11:51:36,752 - INFO - Detected single-cell data format\n", "2025-09-24 11:51:36,754 - INFO - Single-cell AIRR data detected (all entries have cell_id).\n", "2025-09-24 12:02:54,987 - INFO - Model size: 303.92M\n", "Batch 1/48\n", "\n", "Batch 2/48\n", "\n", "Batch 3/48\n", "\n", "Batch 4/48\n", "\n", "Batch 5/48\n", "\n", "Batch 6/48\n", "\n", "Batch 7/48\n", "\n", "Batch 8/48\n", "\n", "Batch 9/48\n", "\n", "Batch 10/48\n", "\n", "Batch 11/48\n", "\n", "Batch 12/48\n", "\n", "Batch 13/48\n", "\n", "Batch 14/48\n", "\n", "Batch 15/48\n", "\n", "Batch 16/48\n", "\n", "Batch 17/48\n", "\n", "Batch 18/48\n", "\n", "Batch 19/48\n", "\n", "Batch 20/48\n", "\n", "Batch 21/48\n", "\n", "Batch 22/48\n", "\n", "Batch 23/48\n", "\n", "Batch 24/48\n", "\n", "Batch 25/48\n", "\n", "Batch 26/48\n", "\n", "Batch 27/48\n", "\n", "Batch 28/48\n", "\n", "Batch 29/48\n", "\n", "Batch 30/48\n", "\n", "Batch 31/48\n", "\n", "Batch 32/48\n", "\n", "Batch 33/48\n", "\n", "Batch 34/48\n", "\n", "Batch 35/48\n", "\n", "Batch 36/48\n", "\n", "Batch 37/48\n", "\n", "Batch 38/48\n", "\n", "Batch 39/48\n", "\n", "Batch 40/48\n", "\n", "Batch 41/48\n", "\n", "Batch 42/48\n", "\n", "Batch 43/48\n", "\n", "Batch 44/48\n", "\n", "Batch 45/48\n", "\n", "Batch 46/48\n", "\n", "Batch 47/48\n", "\n", "Batch 48/48\n", "\n", "2025-09-24 12:03:21,260 - 
INFO - Took 26.27 seconds\n", "2025-09-24 12:03:21,266 - INFO - Generated embeddings with dimensions torch.Size([95, 1024])\n", "2025-09-24 12:03:21,267 - INFO - Saving embedding as a pickled torch object.\n", "2025-09-24 12:03:21,270 - INFO - Saving sequence filtered metadata as TSV file.\n", "2025-09-24 12:03:21,273 - INFO - Saved embedding at tutorial/AIRR_subject1_FNA_d0_1_Y1_balm_paired.pt\n" ] } ], "source": [ "# Embed heavy-light chain pairs using BALM-paired\n", "# The model will be automatically downloaded on first use\n", "! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain HL --model balm-paired --batch-size 2 --output-file-path tutorial/AIRR_subject1_FNA_d0_1_Y1_balm_paired.pt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Protein Language Models\n", "\n", "Next, we embed the same dataset using general-purpose protein language models.\n", "\n", "#### ESM2 (Protein language model)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " █████ ███ ███ ██ ██ ██ ███████ ████████ ██ ██\n", "██ ██ ████ ████ ██ ██ ██ ██ ██ ██ ██\n", "███████ ██ ████ ██ ██ ██ ██ █████ ██ ████\n", "██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██\n", "██ ██ ██ ██ ██████ ███████ ███████ ██ ██\n", "\n", "AMULETY: Adaptive imMUne receptor Language model Embedding tool for TCR and \n", "antibodY\n", " version \u001b[1;36m2.0\u001b[0m\n", "\n", "2025-09-24 11:29:55,935 - INFO - Detected single-cell data format\n", "2025-09-24 11:29:55,935 - INFO - Processing both BCR and TCR sequences from the file.\n", "2025-09-24 11:29:55,936 - INFO - Single-cell AIRR data detected (all entries have cell_id).\n", "2025-09-24 11:29:55,936 - INFO - Removed 102 sequences not matching H chain\n", "tokenizer_config.json: 100%|██████████████████| 95.0/95.0 [00:00<00:00, 157kB/s]\n", "vocab.txt: 100%|█████████████████████████████| 
93.0/93.0 [00:00<00:00, 1.33MB/s]\n", "special_tokens_map.json: 100%|██████████████████| 125/125 [00:00<00:00, 448kB/s]\n", "config.json: 100%|█████████████████████████████| 724/724 [00:00<00:00, 2.76MB/s]\n", "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n", "2025-09-24 11:29:58,760 - WARNING - Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n", "model.safetensors: 100%|███████████████████| 2.61G/2.61G [01:08<00:00, 38.2MB/s]\n", "2025-09-24 11:31:07,501 - INFO - ESM2 650M model size: 652.36 M\n", "2025-09-24 11:31:07,501 - INFO - Batch 1/95.\n", "2025-09-24 11:31:13,329 - INFO - Batch 2/95.\n", "2025-09-24 11:31:14,066 - INFO - Batch 3/95.\n", "2025-09-24 11:31:14,759 - INFO - Batch 4/95.\n", "2025-09-24 11:31:15,492 - INFO - Batch 5/95.\n", "2025-09-24 11:31:16,153 - INFO - Batch 6/95.\n", "2025-09-24 11:31:16,798 - INFO - Batch 7/95.\n", "2025-09-24 11:31:17,454 - INFO - Batch 8/95.\n", "2025-09-24 11:31:18,110 - INFO - Batch 9/95.\n", "2025-09-24 11:31:18,772 - INFO - Batch 10/95.\n", "2025-09-24 11:31:19,412 - INFO - Batch 11/95.\n", "2025-09-24 11:31:20,058 - INFO - Batch 12/95.\n", "2025-09-24 11:31:20,715 - INFO - Batch 13/95.\n", "2025-09-24 11:31:21,671 - INFO - Batch 14/95.\n", "2025-09-24 11:31:22,346 - INFO - Batch 15/95.\n", "2025-09-24 11:31:23,040 - INFO - Batch 16/95.\n", "2025-09-24 11:31:23,723 - INFO - Batch 17/95.\n", "2025-09-24 11:31:24,406 - INFO - Batch 18/95.\n", "2025-09-24 11:31:25,055 - INFO - Batch 19/95.\n", "2025-09-24 11:31:25,714 - INFO - Batch 20/95.\n", "2025-09-24 11:31:26,358 - INFO - Batch 21/95.\n", "2025-09-24 11:31:27,010 - INFO - Batch 22/95.\n", "2025-09-24 
11:31:27,664 - INFO - Batch 23/95.\n", "2025-09-24 11:31:28,306 - INFO - Batch 24/95.\n", "2025-09-24 11:31:28,956 - INFO - Batch 25/95.\n", "2025-09-24 11:31:29,610 - INFO - Batch 26/95.\n", "2025-09-24 11:31:30,291 - INFO - Batch 27/95.\n", "2025-09-24 11:31:30,959 - INFO - Batch 28/95.\n", "2025-09-24 11:31:31,616 - INFO - Batch 29/95.\n", "2025-09-24 11:31:32,260 - INFO - Batch 30/95.\n", "2025-09-24 11:31:32,915 - INFO - Batch 31/95.\n", "2025-09-24 11:31:33,563 - INFO - Batch 32/95.\n", "2025-09-24 11:31:34,215 - INFO - Batch 33/95.\n", "2025-09-24 11:31:34,877 - INFO - Batch 34/95.\n", "2025-09-24 11:31:35,533 - INFO - Batch 35/95.\n", "2025-09-24 11:31:36,186 - INFO - Batch 36/95.\n", "2025-09-24 11:31:36,835 - INFO - Batch 37/95.\n", "2025-09-24 11:31:37,492 - INFO - Batch 38/95.\n", "2025-09-24 11:31:38,145 - INFO - Batch 39/95.\n", "2025-09-24 11:31:38,793 - INFO - Batch 40/95.\n", "2025-09-24 11:31:39,455 - INFO - Batch 41/95.\n", "2025-09-24 11:31:40,097 - INFO - Batch 42/95.\n", "2025-09-24 11:31:40,755 - INFO - Batch 43/95.\n", "2025-09-24 11:31:41,418 - INFO - Batch 44/95.\n", "2025-09-24 11:31:42,113 - INFO - Batch 45/95.\n", "2025-09-24 11:31:42,801 - INFO - Batch 46/95.\n", "2025-09-24 11:31:43,463 - INFO - Batch 47/95.\n", "2025-09-24 11:31:44,124 - INFO - Batch 48/95.\n", "2025-09-24 11:31:44,776 - INFO - Batch 49/95.\n", "2025-09-24 11:31:45,437 - INFO - Batch 50/95.\n", "2025-09-24 11:31:46,097 - INFO - Batch 51/95.\n", "2025-09-24 11:31:46,754 - INFO - Batch 52/95.\n", "2025-09-24 11:31:47,421 - INFO - Batch 53/95.\n", "2025-09-24 11:31:48,078 - INFO - Batch 54/95.\n", "2025-09-24 11:31:48,740 - INFO - Batch 55/95.\n", "2025-09-24 11:31:49,402 - INFO - Batch 56/95.\n", "2025-09-24 11:31:50,067 - INFO - Batch 57/95.\n", "2025-09-24 11:31:50,727 - INFO - Batch 58/95.\n", "2025-09-24 11:31:51,390 - INFO - Batch 59/95.\n", "2025-09-24 11:31:52,045 - INFO - Batch 60/95.\n", "2025-09-24 11:31:52,711 - INFO - Batch 61/95.\n", "2025-09-24 
11:31:53,381 - INFO - Batch 62/95.\n", "2025-09-24 11:31:54,035 - INFO - Batch 63/95.\n", "2025-09-24 11:31:54,692 - INFO - Batch 64/95.\n", "2025-09-24 11:31:55,358 - INFO - Batch 65/95.\n", "2025-09-24 11:31:56,032 - INFO - Batch 66/95.\n", "2025-09-24 11:31:56,698 - INFO - Batch 67/95.\n", "2025-09-24 11:31:57,364 - INFO - Batch 68/95.\n", "2025-09-24 11:31:58,028 - INFO - Batch 69/95.\n", "2025-09-24 11:31:58,678 - INFO - Batch 70/95.\n", "2025-09-24 11:31:59,360 - INFO - Batch 71/95.\n", "2025-09-24 11:32:00,035 - INFO - Batch 72/95.\n", "2025-09-24 11:32:00,710 - INFO - Batch 73/95.\n", "2025-09-24 11:32:01,464 - INFO - Batch 74/95.\n", "2025-09-24 11:32:02,132 - INFO - Batch 75/95.\n", "2025-09-24 11:32:02,799 - INFO - Batch 76/95.\n", "2025-09-24 11:32:03,452 - INFO - Batch 77/95.\n", "2025-09-24 11:32:04,112 - INFO - Batch 78/95.\n", "2025-09-24 11:32:04,775 - INFO - Batch 79/95.\n", "2025-09-24 11:32:05,430 - INFO - Batch 80/95.\n", "2025-09-24 11:32:06,089 - INFO - Batch 81/95.\n", "2025-09-24 11:32:06,751 - INFO - Batch 82/95.\n", "2025-09-24 11:32:07,423 - INFO - Batch 83/95.\n", "2025-09-24 11:32:08,078 - INFO - Batch 84/95.\n", "2025-09-24 11:32:08,731 - INFO - Batch 85/95.\n", "2025-09-24 11:32:09,408 - INFO - Batch 86/95.\n", "2025-09-24 11:32:10,059 - INFO - Batch 87/95.\n", "2025-09-24 11:32:10,719 - INFO - Batch 88/95.\n", "2025-09-24 11:32:11,371 - INFO - Batch 89/95.\n", "2025-09-24 11:32:12,039 - INFO - Batch 90/95.\n", "2025-09-24 11:32:12,692 - INFO - Batch 91/95.\n", "2025-09-24 11:32:13,349 - INFO - Batch 92/95.\n", "2025-09-24 11:32:14,010 - INFO - Batch 93/95.\n", "2025-09-24 11:32:14,681 - INFO - Batch 94/95.\n", "2025-09-24 11:32:15,338 - INFO - Batch 95/95.\n", "2025-09-24 11:32:15,998 - INFO - Took 68.5 seconds\n", "2025-09-24 11:32:16,012 - INFO - Generated embeddings with dimensions torch.Size([95, 1280])\n", "2025-09-24 11:32:16,013 - INFO - Saving embedding as a pickled torch object.\n", "2025-09-24 11:32:16,015 - INFO - Saving 
sequence filtered metadata as TSV file.\n", "2025-09-24 11:32:16,033 - INFO - Saved embedding at tutorial/AIRR_subject1_FNA_d0_1_Y1_esm2.pt\n" ] } ], "source": [ "# Embed heavy chains only using ESM2\n", "! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H --model esm2 --batch-size 1 --output-file-path tutorial/AIRR_subject1_FNA_d0_1_Y1_esm2.pt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Custom/Fine-tuned models\n", "\n", "To embed with a custom or fine-tuned model, hosted on Hugging Face or stored at a local path, set `--model custom` and give the model location with `--model-path`:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " █████ ███ ███ ██ ██ ██ ███████ ████████ ██ ██\n", "██ ██ ████ ████ ██ ██ ██ ██ ██ ██ ██\n", "███████ ██ ████ ██ ██ ██ ██ █████ ██ ████\n", "██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██\n", "██ ██ ██ ██ ██████ ███████ ███████ ██ ██\n", "\n", "AMULETY: Adaptive imMUne receptor Language model Embedding tool for TCR and \n", "antibodY\n", " version \u001b[1;36m2.0\u001b[0m\n", "\n", "2025-09-24 12:30:43,593 - INFO - Detected single-cell data format\n", "2025-09-24 12:30:43,595 - INFO - Processing both BCR and TCR sequences from the file.\n", "2025-09-24 12:30:43,596 - INFO - Single-cell AIRR data detected (all entries have cell_id).\n", "2025-09-24 12:30:43,597 - INFO - Removed 102 sequences not matching H chain\n", "Some weights of EsmForMaskedLM were not initialized from the model checkpoint at AmelieSchreiber/esm2_t6_8M_UR50D-finetuned-localization and are newly initialized: ['lm_head.bias', 'lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight']\n", "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n", "2025-09-24 12:30:46,532 - INFO - Model size: 7.84M\n", "Batch 1/48\n", "\n", "Batch 2/48\n", "\n", "Batch 
3/48\n", "\n", "Batch 4/48\n", "\n", "Batch 5/48\n", "\n", "Batch 6/48\n", "\n", "Batch 7/48\n", "\n", "Batch 8/48\n", "\n", "Batch 9/48\n", "\n", "Batch 10/48\n", "\n", "Batch 11/48\n", "\n", "Batch 12/48\n", "\n", "Batch 13/48\n", "\n", "Batch 14/48\n", "\n", "Batch 15/48\n", "\n", "Batch 16/48\n", "\n", "Batch 17/48\n", "\n", "Batch 18/48\n", "\n", "Batch 19/48\n", "\n", "Batch 20/48\n", "\n", "Batch 21/48\n", "\n", "Batch 22/48\n", "\n", "Batch 23/48\n", "\n", "Batch 24/48\n", "\n", "Batch 25/48\n", "\n", "Batch 26/48\n", "\n", "Batch 27/48\n", "\n", "Batch 28/48\n", "\n", "Batch 29/48\n", "\n", "Batch 30/48\n", "\n", "Batch 31/48\n", "\n", "Batch 32/48\n", "\n", "Batch 33/48\n", "\n", "Batch 34/48\n", "\n", "Batch 35/48\n", "\n", "Batch 36/48\n", "\n", "Batch 37/48\n", "\n", "Batch 38/48\n", "\n", "Batch 39/48\n", "\n", "Batch 40/48\n", "\n", "Batch 41/48\n", "\n", "Batch 42/48\n", "\n", "Batch 43/48\n", "\n", "Batch 44/48\n", "\n", "Batch 45/48\n", "\n", "Batch 46/48\n", "\n", "Batch 47/48\n", "\n", "Batch 48/48\n", "\n", "2025-09-24 12:30:51,159 - INFO - Took 4.63 seconds\n", "2025-09-24 12:30:51,159 - INFO - Generated embeddings with dimensions torch.Size([95, 320])\n", "2025-09-24 12:30:51,160 - INFO - Saving embedding as a pickled torch object.\n", "2025-09-24 12:30:51,161 - INFO - Saving sequence filtered metadata as TSV file.\n", "2025-09-24 12:30:51,165 - INFO - Saved embedding at tutorial/custom_model_embeddings.pt\n" ] } ], "source": [ "# Example: Using a fine-tuned ESM2 model from HuggingFace\n", "! 
amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H --model custom \\\n", " --model-path \"AmelieSchreiber/esm2_t6_8M_UR50D-finetuned-localization\" \\\n", " --embedding-dimension 320 \\\n", " --max-length 512 \\\n", " --batch-size 2 \\\n", " --output-file-path tutorial/custom_model_embeddings.pt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TCR embedding examples\n", "\n", "AMULETY also supports TCR-specific models. We provide example TCR data that you can download to try them out:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2025-09-24 11:35:16-- https://zenodo.org/records/17186858/files/AIRR_tcr_sample.tsv\n", "Resolving zenodo.org (zenodo.org)... 188.185.45.92, 188.185.48.194, 188.185.43.25, ...\n", "Connecting to zenodo.org (zenodo.org)|188.185.45.92|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 40915 (40K) [application/octet-stream]\n", "Saving to: 'tutorial/AIRR_tcr_sample.tsv'\n", "\n", "AIRR_tcr_sample.tsv 100%[===================>] 39.96K 166KB/s in 0.2s \n", "\n", "2025-09-24 11:35:17 (166 KB/s) - 'tutorial/AIRR_tcr_sample.tsv' saved [40915/40915]\n", "\n" ] } ], "source": [ "# Download TCR example data\n", "! 
wget -P tutorial https://zenodo.org/records/17186858/files/AIRR_tcr_sample.tsv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### TCR-BERT (TCR-specific model)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " █████ ███ ███ ██ ██ ██ ███████ ████████ ██ ██\n", "██ ██ ████ ████ ██ ██ ██ ██ ██ ██ ██\n", "███████ ██ ████ ██ ██ ██ ██ █████ ██ ████\n", "██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██\n", "██ ██ ██ ██ ██████ ███████ ███████ ██ ██\n", "\n", "AMULETY: Adaptive imMUne receptor Language model Embedding tool for TCR and \n", "antibodY\n", " version \u001b[1;36m2.0\u001b[0m\n", "\n", "2025-09-24 12:31:02,594 - INFO - Detected single-cell data format\n", "2025-09-24 12:31:02,595 - INFO - Single-cell AIRR data detected (all entries have cell_id).\n", "2025-09-24 12:31:02,599 - INFO - Dropping 100 cells with missing heavy or light chain...\n", "2025-09-24 12:31:02,600 - INFO - Loading TCR-BERT model for TCR embedding...\n", "2025-09-24 12:31:04,618 - INFO - Successfully loaded TCR-BERT model\n", "2025-09-24 12:31:04,619 - INFO - TCR-BERT model loaded. 
Size: 57.39 M\n", "2025-09-24 12:31:04,619 - INFO - TCR-BERT Batch 1/25.\n", "2025-09-24 12:31:04,663 - INFO - TCR-BERT Batch 2/25.\n", "2025-09-24 12:31:04,689 - INFO - TCR-BERT Batch 3/25.\n", "2025-09-24 12:31:04,712 - INFO - TCR-BERT Batch 4/25.\n", "2025-09-24 12:31:04,735 - INFO - TCR-BERT Batch 5/25.\n", "2025-09-24 12:31:04,756 - INFO - TCR-BERT Batch 6/25.\n", "2025-09-24 12:31:04,780 - INFO - TCR-BERT Batch 7/25.\n", "2025-09-24 12:31:04,802 - INFO - TCR-BERT Batch 8/25.\n", "2025-09-24 12:31:04,827 - INFO - TCR-BERT Batch 9/25.\n", "2025-09-24 12:31:04,849 - INFO - TCR-BERT Batch 10/25.\n", "2025-09-24 12:31:04,872 - INFO - TCR-BERT Batch 11/25.\n", "2025-09-24 12:31:04,895 - INFO - TCR-BERT Batch 12/25.\n", "2025-09-24 12:31:04,917 - INFO - TCR-BERT Batch 13/25.\n", "2025-09-24 12:31:04,940 - INFO - TCR-BERT Batch 14/25.\n", "2025-09-24 12:31:04,961 - INFO - TCR-BERT Batch 15/25.\n", "2025-09-24 12:31:04,984 - INFO - TCR-BERT Batch 16/25.\n", "2025-09-24 12:31:05,006 - INFO - TCR-BERT Batch 17/25.\n", "2025-09-24 12:31:05,028 - INFO - TCR-BERT Batch 18/25.\n", "2025-09-24 12:31:05,052 - INFO - TCR-BERT Batch 19/25.\n", "2025-09-24 12:31:05,074 - INFO - TCR-BERT Batch 20/25.\n", "2025-09-24 12:31:05,097 - INFO - TCR-BERT Batch 21/25.\n", "2025-09-24 12:31:05,119 - INFO - TCR-BERT Batch 22/25.\n", "2025-09-24 12:31:05,143 - INFO - TCR-BERT Batch 23/25.\n", "2025-09-24 12:31:05,166 - INFO - TCR-BERT Batch 24/25.\n", "2025-09-24 12:31:05,188 - INFO - TCR-BERT Batch 25/25.\n", "2025-09-24 12:31:05,211 - INFO - TCR-BERT embedding took 0.59 seconds\n", "2025-09-24 12:31:05,212 - INFO - Generated embeddings with dimensions torch.Size([50, 768])\n", "2025-09-24 12:31:05,212 - INFO - Saving embedding as a pickled torch object.\n", "2025-09-24 12:31:05,213 - INFO - Saving sequence filtered metadata as TSV file.\n", "2025-09-24 12:31:05,215 - INFO - Saved embedding at tutorial/tcr_embeddings_tcrbert.pt\n" ] } ], "source": [ "# Embed TCR beta-alpha chain pairs using 
TCR-BERT\n", "# Note: This assumes you have TCR data in AIRR format\n", "! amulety embed --input-airr tutorial/AIRR_tcr_sample.tsv --chain HL --model tcr-bert --batch-size 2 --output-file-path tutorial/tcr_embeddings_tcrbert.pt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### TCRT5 (TCR beta chain only)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " █████ ███ ███ ██ ██ ██ ███████ ████████ ██ ██\n", "██ ██ ████ ████ ██ ██ ██ ██ ██ ██ ██\n", "███████ ██ ████ ██ ██ ██ ██ █████ ██ ████\n", "██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██\n", "██ ██ ██ ██ ██████ ███████ ███████ ██ ██\n", "\n", "AMULETY: Adaptive imMUne receptor Language model Embedding tool for TCR and \n", "antibodY\n", " version \u001b[1;36m2.0\u001b[0m\n", "\n", "2025-09-24 12:31:11,221 - INFO - Detected single-cell data format\n", "2025-09-24 12:31:11,221 - INFO - Single-cell AIRR data detected (all entries have cell_id).\n", "2025-09-24 12:31:11,222 - INFO - Removed 100 sequences not matching H chain\n", "2025-09-24 12:31:11,222 - INFO - Loading TCRT5 model for TCR embedding...\n", "tokenizer_config.json: 21.1kB [00:00, 23.3MB/s]\n", "spiece.model: 100%|██████████████████████████| 238k/238k [00:00<00:00, 2.78MB/s]\n", "added_tokens.json: 2.35kB [00:00, 16.2MB/s]\n", "special_tokens_map.json: 2.64kB [00:00, 12.0MB/s]\n", "The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. \n", "The tokenizer class you load from this checkpoint is 'TCRT5Tokenizer'. 
\n", "The class this function is called from is 'T5Tokenizer'.\n", "config.json: 100%|█████████████████████████████| 970/970 [00:00<00:00, 8.67MB/s]\n", "model.safetensors: 100%|█████████████████████| 168M/168M [00:03<00:00, 46.0MB/s]\n", "/opt/anaconda3/envs/torchen/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:817: UserWarning: `return_dict_in_generate` is NOT set to `True`, but `output_attentions` is. When `return_dict_in_generate` is not `True`, `output_attentions` is ignored.\n", " warnings.warn(\n", "/opt/anaconda3/envs/torchen/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:817: UserWarning: `return_dict_in_generate` is NOT set to `True`, but `output_hidden_states` is. When `return_dict_in_generate` is not `True`, `output_hidden_states` is ignored.\n", " warnings.warn(\n", "/opt/anaconda3/envs/torchen/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:817: UserWarning: `return_dict_in_generate` is NOT set to `True`, but `output_scores` is. 
When `return_dict_in_generate` is not `True`, `output_scores` is ignored.\n", " warnings.warn(\n", "generation_config.json: 100%|██████████████████| 249/249 [00:00<00:00, 3.32MB/s]\n", "2025-09-24 12:31:17,681 - INFO - TCRT5 Batch 1/50.\n", "2025-09-24 12:31:17,707 - INFO - TCRT5 Batch 2/50.\n", "2025-09-24 12:31:17,719 - INFO - TCRT5 Batch 3/50.\n", "2025-09-24 12:31:17,730 - INFO - TCRT5 Batch 4/50.\n", "2025-09-24 12:31:17,740 - INFO - TCRT5 Batch 5/50.\n", "2025-09-24 12:31:17,753 - INFO - TCRT5 Batch 6/50.\n", "2025-09-24 12:31:17,763 - INFO - TCRT5 Batch 7/50.\n", "2025-09-24 12:31:17,773 - INFO - TCRT5 Batch 8/50.\n", "2025-09-24 12:31:17,784 - INFO - TCRT5 Batch 9/50.\n", "2025-09-24 12:31:17,795 - INFO - TCRT5 Batch 10/50.\n", "2025-09-24 12:31:17,806 - INFO - TCRT5 Batch 11/50.\n", "2025-09-24 12:31:17,817 - INFO - TCRT5 Batch 12/50.\n", "2025-09-24 12:31:17,827 - INFO - TCRT5 Batch 13/50.\n", "2025-09-24 12:31:17,837 - INFO - TCRT5 Batch 14/50.\n", "2025-09-24 12:31:17,848 - INFO - TCRT5 Batch 15/50.\n", "2025-09-24 12:31:17,860 - INFO - TCRT5 Batch 16/50.\n", "2025-09-24 12:31:17,871 - INFO - TCRT5 Batch 17/50.\n", "2025-09-24 12:31:17,882 - INFO - TCRT5 Batch 18/50.\n", "2025-09-24 12:31:17,893 - INFO - TCRT5 Batch 19/50.\n", "2025-09-24 12:31:17,904 - INFO - TCRT5 Batch 20/50.\n", "2025-09-24 12:31:17,914 - INFO - TCRT5 Batch 21/50.\n", "2025-09-24 12:31:17,924 - INFO - TCRT5 Batch 22/50.\n", "2025-09-24 12:31:17,934 - INFO - TCRT5 Batch 23/50.\n", "2025-09-24 12:31:17,944 - INFO - TCRT5 Batch 24/50.\n", "2025-09-24 12:31:17,955 - INFO - TCRT5 Batch 25/50.\n", "2025-09-24 12:31:17,966 - INFO - TCRT5 Batch 26/50.\n", "2025-09-24 12:31:17,976 - INFO - TCRT5 Batch 27/50.\n", "2025-09-24 12:31:17,987 - INFO - TCRT5 Batch 28/50.\n", "2025-09-24 12:31:17,997 - INFO - TCRT5 Batch 29/50.\n", "2025-09-24 12:31:18,008 - INFO - TCRT5 Batch 30/50.\n", "2025-09-24 12:31:18,018 - INFO - TCRT5 Batch 31/50.\n", "2025-09-24 12:31:18,028 - INFO - TCRT5 Batch 32/50.\n", 
"2025-09-24 12:31:18,038 - INFO - TCRT5 Batch 33/50.\n", "2025-09-24 12:31:18,049 - INFO - TCRT5 Batch 34/50.\n", "2025-09-24 12:31:18,060 - INFO - TCRT5 Batch 35/50.\n", "2025-09-24 12:31:18,070 - INFO - TCRT5 Batch 36/50.\n", "2025-09-24 12:31:18,080 - INFO - TCRT5 Batch 37/50.\n", "2025-09-24 12:31:18,090 - INFO - TCRT5 Batch 38/50.\n", "2025-09-24 12:31:18,100 - INFO - TCRT5 Batch 39/50.\n", "2025-09-24 12:31:18,111 - INFO - TCRT5 Batch 40/50.\n", "2025-09-24 12:31:18,121 - INFO - TCRT5 Batch 41/50.\n", "2025-09-24 12:31:18,131 - INFO - TCRT5 Batch 42/50.\n", "2025-09-24 12:31:18,141 - INFO - TCRT5 Batch 43/50.\n", "2025-09-24 12:31:18,150 - INFO - TCRT5 Batch 44/50.\n", "2025-09-24 12:31:18,160 - INFO - TCRT5 Batch 45/50.\n", "2025-09-24 12:31:18,171 - INFO - TCRT5 Batch 46/50.\n", "2025-09-24 12:31:18,181 - INFO - TCRT5 Batch 47/50.\n", "2025-09-24 12:31:18,191 - INFO - TCRT5 Batch 48/50.\n", "2025-09-24 12:31:18,201 - INFO - TCRT5 Batch 49/50.\n", "2025-09-24 12:31:18,212 - INFO - TCRT5 Batch 50/50.\n", "2025-09-24 12:31:18,224 - INFO - TCRT5 embedding took 7.0 seconds\n", "2025-09-24 12:31:18,225 - INFO - Generated embeddings with dimensions torch.Size([100, 256])\n", "2025-09-24 12:31:18,226 - INFO - Saving embedding as a pickled torch object.\n", "2025-09-24 12:31:18,226 - INFO - Saving sequence filtered metadata as TSV file.\n", "2025-09-24 12:31:18,228 - INFO - Saved embedding at tutorial/tcr_embeddings_tcrt5.pt\n" ] } ], "source": [ "# Embed TCR beta chains using TCRT5 (only supports H/beta chains)\n", "! amulety embed --input-airr tutorial/AIRR_tcr_sample.tsv --chain H --model tcrt5 --batch-size 2 --output-file-path tutorial/tcr_embeddings_tcrt5.pt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checking dependencies\n", "\n", "Some models require additional dependencies that are not installed by default. 
You can check which dependencies are missing:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " █████ ███ ███ ██ ██ ██ ███████ ████████ ██ ██\n", "██ ██ ████ ████ ██ ██ ██ ██ ██ ██ ██\n", "███████ ██ ████ ██ ██ ██ ██ █████ ██ ████\n", "██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██\n", "██ ██ ██ ██ ██████ ███████ ███████ ██ ██\n", "\n", "AMULETY: Adaptive imMUne receptor Language model Embedding tool for TCR and \n", "antibodY\n", " version \u001b[1;36m2.0\u001b[0m\n", "\n", "Checking AMULETY dependencies...\n", "\n", "IgBlast (for translate-igblast command):\n", " IgBlast (igblastn) is available\n", "\n", "Embedding model dependencies:\n", "2025-09-24 12:51:20,234 - INFO - Available models: AntiBERTy, AbLang, TCR-BERT, TCRT5, ESM2, ProtT5\n", "2025-09-24 12:51:20,234 - WARNING - Missing model dependencies: Immune2Vec\n", " 1 dependencies are missing.\n", " AMULETY will raise ImportError with installation instructions when these models are used.\n", "\n", " To install missing dependencies:\n", " • Immune2Vec: git clone https://bitbucket.org/yaarilab/immune2vec_model.git && add to Python path\n", "\n", " Note: Models will provide detailed installation instructions when used.\n", "\u001b[0m" ] } ], "source": [ "# Check which optional dependencies are missing\n", "! amulety check-deps" ] } ], "metadata": { "kernelspec": { "display_name": "torchen", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }