amulety.tcr_embeddings
TCR embedding functions using various models.
Functions
check_tcr_dependencies | Check if optional TCR embedding dependencies are installed and provide installation instructions.
tcr_bert | Embeds T-Cell Receptor (TCR) sequences using the TCR-BERT model.
tcrt5 | Embeds T-Cell Receptor (TCR) sequences using the TCRT5 model.
- check_tcr_dependencies()
Check if optional TCR embedding dependencies are installed and provide installation instructions.
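As an illustration of the general idea behind a dependency check, the sketch below uses only the standard library to test whether packages can be imported and to build an installation hint. It is not the implementation of check_tcr_dependencies(); the helper names missing_packages and install_hint are hypothetical, and the package names passed in are placeholders, since this page does not list which packages the function actually checks.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found on this system."""
    return [name for name in names if find_spec(name) is None]


def install_hint(names):
    """Build a short message listing missing packages and a pip command."""
    missing = missing_packages(names)
    if not missing:
        return "All optional dependencies found."
    return (
        "Missing: " + ", ".join(missing)
        + ". Try: pip install " + " ".join(missing)
    )


# Placeholder names for illustration only: "json" ships with Python,
# the second name is deliberately one that should not exist.
print(install_hint(["json", "not_a_real_pkg_xyz123"]))
```

In practice, calling check_tcr_dependencies() itself is the supported route; the sketch only shows the kind of check such a function typically performs.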
- tcr_bert(sequences, cache_dir: str | None = None, batch_size: int = 32, residue_level: bool = False)
Embeds T-Cell Receptor (TCR) sequences using the TCR-BERT model.
- Parameters:
sequences – Input TCR sequences (pd.Series for single chain or pd.DataFrame for H+L mode)
cache_dir – Directory to cache model files
batch_size – Number of sequences to process in each batch
Note:
Pretrained on 88,403 human TRA/TRB sequences from VDJdb and PIRD. This is the non-fine-tuned version and covers human TCR data only. Maximum sequence length: 64.
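Because the note gives a maximum length of 64, it can help to separate over-length sequences before embedding. The following is a minimal sketch, not part of the library: split_by_length and embed_with_tcr_bert are hypothetical helpers, and reading the limit as 64 residues (rather than 64 tokens including any special tokens) is an assumption. How tcr_bert itself treats longer inputs is not stated on this page.

```python
# Maximum input length stated in the tcr_bert note.
TCR_BERT_MAX_LEN = 64


def split_by_length(sequences, max_len=TCR_BERT_MAX_LEN):
    """Return (kept, too_long) lists, preserving input order."""
    kept, too_long = [], []
    for seq in sequences:
        (kept if len(seq) <= max_len else too_long).append(seq)
    return kept, too_long


def embed_with_tcr_bert(sequences, batch_size=32):
    """Hypothetical wrapper: filter by length, then call tcr_bert.

    Imports are deferred so split_by_length works even when the
    optional embedding dependencies are not installed.
    """
    import pandas as pd
    from amulety.tcr_embeddings import tcr_bert

    kept, too_long = split_by_length(sequences)
    if too_long:
        print(f"Skipping {len(too_long)} sequence(s) longer than {TCR_BERT_MAX_LEN}")
    # A pd.Series is the documented input type for single-chain mode.
    return tcr_bert(pd.Series(kept), batch_size=batch_size)


# Example input: two short CDR3-like strings and one over-length string.
example = ["CASSLGQAYEQYF", "CAVRDSNYQLIW", "A" * 70]
kept, too_long = split_by_length(example)
```

Calling embed_with_tcr_bert(example) would then pass only the two short sequences to tcr_bert; it requires amulety, pandas, and the model files to be available.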
- tcrt5(sequences, cache_dir: str | None = None, batch_size: int = 32, residue_level: bool = False)
Embeds T-Cell Receptor (TCR) sequences using the TCRT5 model.
- Parameters:
sequences – Input TCR sequences (pd.Series for single chain or pd.DataFrame for H+L mode)
cache_dir – Directory to cache model files
batch_size – Number of sequences to process in each batch
Note:
TCRT5 was pretrained on masked span reconstruction using ~14M CDR3 β sequences from TCRdb and ~780k peptide-pseudosequence pairs from IEDB. This model supports beta chains only (the H chain for TCR data). Maximum sequence length: 20 amino acids. Embedding dimension: 256.
Reference: https://huggingface.co/dkarthikeyan1/tcrt5_pre_tcrdb
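Since TCRT5 supports beta chains only and has a 20 amino acid limit, inputs can be screened on both criteria before embedding. This is a rough sketch under stated assumptions: the record layout (dicts with "locus" and "cdr3" keys) and the helpers select_tcrt5_inputs and embed_with_tcrt5 are hypothetical and not part of the library; only the tcrt5 call and its pd.Series input come from the signature documented here.

```python
# Maximum sequence length stated in the tcrt5 note.
TCRT5_MAX_LEN = 20


def select_tcrt5_inputs(records, max_len=TCRT5_MAX_LEN):
    """Keep beta-chain CDR3 sequences no longer than max_len.

    `records` is a hypothetical list of dicts with "locus" and "cdr3" keys.
    """
    return [
        rec["cdr3"]
        for rec in records
        if rec["locus"] == "TRB" and len(rec["cdr3"]) <= max_len
    ]


def embed_with_tcrt5(records, batch_size=32):
    """Hypothetical wrapper: screen the records, then call tcrt5."""
    # Deferred imports: the screening helper needs no optional dependencies.
    import pandas as pd
    from amulety.tcr_embeddings import tcrt5

    return tcrt5(pd.Series(select_tcrt5_inputs(records)), batch_size=batch_size)


# Example records: one usable beta chain, one alpha chain,
# and one beta chain that exceeds the length limit (25 residues).
records = [
    {"locus": "TRB", "cdr3": "CASSLGQAYEQYF"},
    {"locus": "TRA", "cdr3": "CAVRDSNYQLIW"},
    {"locus": "TRB", "cdr3": "CASS" + "G" * 20 + "F"},
]
selected = select_tcrt5_inputs(records)
```

Only the first record passes both checks, so embed_with_tcrt5(records) would embed a single sequence.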