System Info
transformers main branch
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
```python
from transformers import set_seed, AutoProcessor, DiaForConditionalGeneration

set_seed(42)
torch_device = "cuda"
model_checkpoint = "nari-labs/Dia-1.6B-0626"

# text = ["[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."]
text = ["[S1] Hello, my dog is cooler than you!"]

processor = AutoProcessor.from_pretrained(model_checkpoint)
inputs = processor(text=text, padding=True, return_tensors="pt").to(torch_device)

model = DiaForConditionalGeneration.from_pretrained(model_checkpoint).to(torch_device)
outputs = model.generate(**inputs, max_new_tokens=3072, guidance_scale=3.0, temperature=1.8, top_p=0.90, top_k=45)
print(outputs)

outputs = processor.batch_decode(outputs)
processor.save_audio(outputs, "dia_example.mp3")
```

I ran the previous script on an A100.
With the second input (["[S1] Hello, my dog is cooler than you!"]) the output audio is pure noise, while with the first input (["[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."]) the output audio is correct. I want to know whether this is a bug in the model. Thanks!
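One way to narrow this down (a suggestion on my side, not something I have verified): rerun the short prompt with the same seed and guidance scale but without the temperature/top_p/top_k overrides, to see whether the noise tracks the aggressive sampling settings or the prompt itself. The sketch below reuses `model`, `inputs`, and `processor` from the script above; the output filename is arbitrary.

```python
from transformers import set_seed

# Sanity-check sketch: assumes `model`, `inputs`, and `processor` are the
# objects created in the reproduction script above.
set_seed(42)
# Keep the same CFG scale, but fall back to the model's default sampling
# settings instead of temperature=1.8 / top_p=0.90 / top_k=45.
outputs = model.generate(**inputs, max_new_tokens=3072, guidance_scale=3.0)
audio = processor.batch_decode(outputs)
processor.save_audio(audio, "dia_example_default_sampling.mp3")  # hypothetical filename
```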
BTW, if we load the most common Dia model, nari-labs/Dia-1.6B:

```python
from transformers import pipeline

pipe = pipeline("text-to-audio", model="nari-labs/Dia-1.6B")
```

the error will be:
```
>>> pipe = pipeline("text-to-audio", model="nari-labs/Dia-1.6B")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jiqing/transformers/src/transformers/pipelines/__init__.py", line 781, in pipeline
    config = AutoConfig.from_pretrained(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/models/auto/configuration_auto.py", line 1352, in from_pretrained
    raise ValueError(
ValueError: Unrecognized model in nari-labs/Dia-1.6B. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: afmoe, aimv2, aimv2_vision_model, albert, align, altclip, apertus, arcee, aria, aria_text, audio-spectrogram-transformer, audioflamingo3, audioflamingo3_encoder, autoformer, aya_vision, bamba, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, bitnet, blenderbot, blenderbot-small, blip, blip-2, blip_2_qformer, bloom, blt, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, cohere2, cohere2_vision, colpali, colqwen2, conditional_detr, convbert, convnext, convnextv2, cpmant, csm, ctrl, cvt, cwm,
...
```
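The ValueError itself suggests that the config.json in nari-labs/Dia-1.6B (the original, non-transformers checkpoint) carries no `model_type` key, so AutoConfig cannot resolve it. A quick way to confirm (my sketch, assuming the repo ships a config.json at all):

```python
import json

from huggingface_hub import hf_hub_download

# Diagnostic sketch: download the repo's config.json and check for the
# `model_type` key that AutoConfig needs to resolve the architecture.
path = hf_hub_download(repo_id="nari-labs/Dia-1.6B", filename="config.json")
with open(path) as f:
    config = json.load(f)
print(config.get("model_type"))  # None would explain the ValueError above
```

If that is the case, a possible workaround might be to point the pipeline at the transformers-converted checkpoint used in the reproduction above:

```python
from transformers import pipeline

# Assumption: nari-labs/Dia-1.6B-0626 is the transformers-format checkpoint
# (as in the reproduction script), so AutoConfig can resolve its model_type.
pipe = pipeline("text-to-audio", model="nari-labs/Dia-1.6B-0626")
```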
Expected behavior
Hi @zucchini-nlp, could you please take a look at this issue? Maybe also ask the Dia maintainers to look into it together. Thanks!