Studio Ghibli style LoRA. I did not train it with audio. I used Seruva's dataset. I had to bucket the videos by runtime length, but it still worked out pretty decently. It is rank 16; I don't know if rank 32 would be better, but the results are still pretty decent. All examples were generated using the LTX2 distilled model.
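
For anyone wondering what bucketing by runtime length can look like, here is a minimal sketch. It is not the actual preprocessing used for this LoRA: the bucket edges, file names, and function names are all hypothetical, and it assumes clip durations in seconds are already known. Each clip is assigned to the longest bucket it can fill, so it could be trimmed down to that length, and clips shorter than every bucket are skipped.

```python
from collections import defaultdict

# Hypothetical bucket lengths in seconds (assumption: the real values are not stated).
BUCKET_EDGES_S = (2.0, 4.0, 8.0)


def bucket_for(duration_s, edges=BUCKET_EDGES_S):
    """Return the largest bucket length that fits inside the clip, or None."""
    fitting = [edge for edge in edges if edge <= duration_s]
    return max(fitting) if fitting else None


def bucket_clips(durations, edges=BUCKET_EDGES_S):
    """Group clip names by bucket length; clips shorter than every bucket are dropped."""
    buckets = defaultdict(list)
    for name, duration_s in durations.items():
        edge = bucket_for(duration_s, edges)
        if edge is not None:
            buckets[edge].append(name)
    return dict(buckets)


if __name__ == "__main__":
    # Hypothetical clip durations, for illustration only.
    clips = {"a.mp4": 2.5, "b.mp4": 5.0, "c.mp4": 1.0, "d.mp4": 9.3}
    for edge, names in sorted(bucket_clips(clips).items()):
        print(f"{edge:.0f}s bucket: {names}")
```

Grouping this way means every clip in a batch can share one length, at the cost of trimming longer clips and dropping very short ones.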
