Research
HD-Prot: A Protein Language Model for Joint Sequence-Structure Modeling with Continuous Structure Tokens
The paper introduces HD-Prot, a hybrid diffusion protein language model that integrates continuous structure tokens with discrete sequence tokens for joint sequence-structure modeling. By placing a continuous-valued diffusion head on top of a discrete pLM, HD-Prot captures inter-token dependencies and achieves competitive performance on tasks such as protein structure prediction and motif-scaffolding, despite being developed with significantly fewer computational resources. The approach demonstrates that categorical and continuous distributions can be combined within a single model architecture, opening a new avenue for enhancing multimodal protein language models.
protein-language-model, sequence-structure, diffusion
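
The summary above describes a categorical distribution over sequence tokens combined with a continuous-valued diffusion head over structure tokens. The following is a minimal NumPy sketch of how such a hybrid training objective might be assembled. It is not the paper's implementation: the linear denoiser, the noise schedule, the equal loss weighting, and every name (`diffusion_head`, `hybrid_loss`, `W`, the tensor shapes) are assumptions chosen for illustration, and random arrays stand in for the language model's hidden states.

```python
import numpy as np

def softmax_cross_entropy(logits, targets):
    """Mean categorical cross-entropy over positions (discrete sequence tokens)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def diffusion_head(noisy, t, hidden, W):
    """Hypothetical linear denoiser: predicts the added noise from the noisy
    structure token, the noise level t, and the per-position hidden state."""
    features = np.concatenate([noisy, t[:, None], hidden], axis=-1)
    return features @ W

def hybrid_loss(hidden, seq_logits, seq_targets, struct_tokens, W, rng):
    """Sum of a categorical loss (sequence) and a denoising loss (structure)."""
    length = struct_tokens.shape[0]
    t = rng.uniform(0.0, 1.0, size=length)          # one noise level per position
    noise = rng.standard_normal(struct_tokens.shape)
    # Assumed schedule: x_t = sqrt(1 - t) * x_0 + sqrt(t) * noise
    noisy = np.sqrt(1.0 - t)[:, None] * struct_tokens + np.sqrt(t)[:, None] * noise
    pred_noise = diffusion_head(noisy, t, hidden, W)
    diff_loss = np.mean((pred_noise - noise) ** 2)  # continuous branch
    ce_loss = softmax_cross_entropy(seq_logits, seq_targets)  # discrete branch
    return ce_loss + diff_loss, ce_loss, diff_loss

# Toy sizes (all assumed): 8 residues, 20 amino-acid classes,
# 16-dim hidden states, 4-dim continuous structure tokens.
rng = np.random.default_rng(0)
L, V, H, D = 8, 20, 16, 4
hidden = rng.standard_normal((L, H))        # stand-in for pLM hidden states
seq_logits = rng.standard_normal((L, V))    # stand-in for sequence-head logits
seq_targets = rng.integers(0, V, size=L)
struct_tokens = rng.standard_normal((L, D))
W = rng.standard_normal((D + 1 + H, D)) * 0.1

total, ce, diff = hybrid_loss(hidden, seq_logits, seq_targets, struct_tokens, W, rng)
print(f"total={total:.3f}  cross_entropy={ce:.3f}  denoising={diff:.3f}")
```

The point of the sketch is only that both branches read the same per-position hidden state, so one backbone can serve a discrete and a continuous output; a real system would use a learned network for the denoiser and a tuned weighting between the two terms.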