Dia model cannot load / generation error #43016

@jiqing-feng

Description

System Info

transformers main branch

Who can help?

@zucchini-nlp

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import set_seed, AutoProcessor, DiaForConditionalGeneration

set_seed(42)

torch_device = "cuda"
model_checkpoint = "nari-labs/Dia-1.6B-0626"

# Input 1 (longer prompt): the output audio is correct.
#text = ["[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."]
# Input 2 (short prompt): the output audio is noise.
text = ["[S1] Hello, my dog is cooler than you!"]
processor = AutoProcessor.from_pretrained(model_checkpoint)
inputs = processor(text=text, padding=True, return_tensors="pt").to(torch_device)

model = DiaForConditionalGeneration.from_pretrained(model_checkpoint).to(torch_device)
outputs = model.generate(**inputs, max_new_tokens=3072, guidance_scale=3.0, temperature=1.8, top_p=0.90, top_k=45)

print(outputs)
outputs = processor.batch_decode(outputs)
processor.save_audio(outputs, "dia_example.mp3")

I ran the script above on an A100.
With input 2, ["[S1] Hello, my dog is cooler than you!"], the output audio is just noise. With input 1 (commented out in the script), ["[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."], the output audio is correct. Is this a bug in the model? Thanks!
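To make the comparison between the two prompts easier to reproduce, both can be run in one process with the seed reset before each generation. This is a minimal sketch that reuses only the calls from the script above. The labels "long" and "short", the helper names `output_path` and `run_comparison`, and the output file names are assumptions made for illustration.

```python
# Prompts copied from the report; the keys are labels chosen for this sketch.
PROMPTS = {
    "long": (
        "[S1] Dia is an open weights text to dialogue model. "
        "[S2] You get full control over scripts and voices. "
        "[S1] Wow. Amazing. (laughs) "
        "[S2] Try it now on Git hub or Hugging Face."
    ),
    "short": "[S1] Hello, my dog is cooler than you!",
}


def output_path(label: str) -> str:
    """Hypothetical naming scheme: one audio file per prompt label."""
    return f"dia_{label}.mp3"


def run_comparison(checkpoint="nari-labs/Dia-1.6B-0626", device="cuda", seed=42):
    """Generate audio for every prompt with identical settings and seed."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoProcessor, DiaForConditionalGeneration, set_seed

    processor = AutoProcessor.from_pretrained(checkpoint)
    model = DiaForConditionalGeneration.from_pretrained(checkpoint).to(device)

    for label, prompt in PROMPTS.items():
        set_seed(seed)  # reset the RNG so each run starts from the same state
        inputs = processor(text=[prompt], padding=True, return_tensors="pt").to(device)
        outputs = model.generate(
            **inputs,
            max_new_tokens=3072,
            guidance_scale=3.0,
            temperature=1.8,
            top_p=0.90,
            top_k=45,
        )
        audio = processor.batch_decode(outputs)
        processor.save_audio(audio, output_path(label))
```

Calling `run_comparison()` on the GPU machine should write `dia_long.mp3` and `dia_short.mp3`, which can then be compared side by side.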

BTW, loading the most commonly used Dia checkpoint, nari-labs/Dia-1.6B, fails:

from transformers import pipeline

pipe = pipeline("text-to-audio", model="nari-labs/Dia-1.6B")

The error is:

>>> pipe = pipeline("text-to-audio", model="nari-labs/Dia-1.6B")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jiqing/transformers/src/transformers/pipelines/__init__.py", line 781, in pipeline
    config = AutoConfig.from_pretrained(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/models/auto/configuration_auto.py", line 1352, in from_pretrained
    raise ValueError(
ValueError: Unrecognized model in nari-labs/Dia-1.6B. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: afmoe, aimv2, aimv2_vision_model, albert, align, altclip, apertus, arcee, aria, aria_text, audio-spectrogram-transformer, audioflamingo3, audioflamingo3_encoder, autoformer, aya_vision, bamba, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, bitnet, blenderbot, blenderbot-small, blip, blip-2, blip_2_qformer, bloom, blt, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, cohere2, cohere2_vision, colpali, colqwen2, conditional_detr, convbert, convnext, convnextv2, cpmant, csm, ctrl, cvt, cwm,
...
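The error message points at a missing `model_type` key in the checkpoint's config.json. One way to check this is to inspect a locally downloaded copy of the file. This is a minimal sketch using only the standard library. The helper names `has_model_type` and `check_config_file` are hypothetical and not part of transformers, and the actual contents of the nari-labs/Dia-1.6B config have not been verified here.

```python
import json
from pathlib import Path


def has_model_type(config: dict) -> bool:
    """Return True if the config dict carries a non-empty `model_type` string."""
    value = config.get("model_type")
    return isinstance(value, str) and bool(value)


def check_config_file(path: str) -> bool:
    """Load a local config.json and report whether it declares `model_type`."""
    with Path(path).open(encoding="utf-8") as f:
        return has_model_type(json.load(f))
```

If `check_config_file` returns False for the downloaded config.json, the checkpoint does not declare the key the message asks for, which would be consistent with the ValueError above.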

Expected behavior

Hi @zucchini-nlp. Could you please take a look at this issue? Perhaps you could also tag the Dia maintainer so you can look into it together. Thanks!
