System Info
transformers main branch
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
```python
from transformers import set_seed, AutoProcessor, DiaForConditionalGeneration

set_seed(42)
torch_device = "cuda"
model_checkpoint = "nari-labs/Dia-1.6B-0626"

# text = ["[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."]
text = ["[S1] Hello, my dog is cooler than you!"]

processor = AutoProcessor.from_pretrained(model_checkpoint)
inputs = processor(text=text, padding=True, return_tensors="pt").to(torch_device)

model = DiaForConditionalGeneration.from_pretrained(model_checkpoint).to(torch_device)
outputs = model.generate(**inputs, max_new_tokens=3072, guidance_scale=3.0, temperature=1.8, top_p=0.90, top_k=45)
print(outputs)

outputs = processor.batch_decode(outputs)
processor.save_audio(outputs, "dia_example.mp3")
```

I ran the previous script on an A100.
With the second input (["[S1] Hello, my dog is cooler than you!"]) the output audio is pure noise, while with the first input (["[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."]) the output audio is correct. I want to know whether this is a bug in the model. Thanks!
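One way to narrow this down (a suggestion on my side, not something I have verified): rerun the short prompt with the same seed and guidance scale but without the temperature/top_p/top_k overrides, to see whether the noise tracks the aggressive sampling settings or the prompt itself. The sketch below reuses `model`, `inputs`, and `processor` from the script above; the output filename is arbitrary.

```python
from transformers import set_seed

# Sanity-check sketch: assumes `model`, `inputs`, and `processor` are the
# objects created in the reproduction script above.
set_seed(42)
# Keep the same CFG scale, but fall back to the model's default sampling
# settings instead of temperature=1.8 / top_p=0.90 / top_k=45.
outputs = model.generate(**inputs, max_new_tokens=3072, guidance_scale=3.0)
audio = processor.batch_decode(outputs)
processor.save_audio(audio, "dia_example_default_sampling.mp3")  # hypothetical filename
```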
BTW, if we load the most common Dia model, nari-labs/Dia-1.6B:

```python
from transformers import pipeline

pipe = pipeline("text-to-audio", model="nari-labs/Dia-1.6B")
```

the error will be:
```
>>> pipe = pipeline("text-to-audio", model="nari-labs/Dia-1.6B")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jiqing/transformers/src/transformers/pipelines/__init__.py", line 781, in pipeline
    config = AutoConfig.from_pretrained(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/models/auto/configuration_auto.py", line 1352, in from_pretrained
    raise ValueError(
ValueError: Unrecognized model in nari-labs/Dia-1.6B. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: afmoe, aimv2, aimv2_vision_model, albert, align, altclip, apertus, arcee, aria, aria_text, audio-spectrogram-transformer, audioflamingo3, audioflamingo3_encoder, autoformer, aya_vision, bamba, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, bitnet, blenderbot, blenderbot-small, blip, blip-2, blip_2_qformer, bloom, blt, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, cohere2, cohere2_vision, colpali, colqwen2, conditional_detr, convbert, convnext, convnextv2, cpmant, csm, ctrl, cvt, cwm,
...
```
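The ValueError itself suggests that the config.json in nari-labs/Dia-1.6B (the original, non-transformers checkpoint) carries no `model_type` key, so AutoConfig cannot resolve it. A quick way to confirm (my sketch, assuming the repo ships a config.json at all):

```python
import json

from huggingface_hub import hf_hub_download

# Diagnostic sketch: download the repo's config.json and check for the
# `model_type` key that AutoConfig needs to resolve the architecture.
path = hf_hub_download(repo_id="nari-labs/Dia-1.6B", filename="config.json")
with open(path) as f:
    config = json.load(f)
print(config.get("model_type"))  # None would explain the ValueError above
```

If that is the case, a possible workaround might be to point the pipeline at the transformers-converted checkpoint used in the reproduction above:

```python
from transformers import pipeline

# Assumption: nari-labs/Dia-1.6B-0626 is the transformers-format checkpoint
# (as in the reproduction script), so AutoConfig can resolve its model_type.
pipe = pipeline("text-to-audio", model="nari-labs/Dia-1.6B-0626")
```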
Expected behavior
Hi @zucchini-nlp, could you please take a look at this issue? Maybe also ask the Dia maintainers to look into it together. Thanks!