Foreword
While preparing a Python script that modifies the metadata in a .safetensor file, I was stuck on an error message for a while. Reading, modifying and writing the .safetensor file worked fine from the very beginning. The error only occurred afterwards, when the newly written .safetensor file was used in the AI web UI.
Error Message at Runtime
When the AI web UI tried to load the new .safetensor file I got the following error message in the terminal window:
safetensors_rust.SafetensorError: Error while deserializing header: InvalidHeaderDeserialization
At first glance the error message seems clear, but at second glance it is not. It took a while to figure out why this error occurred.
Troubleshooting
At first glance, the old and new files looked correct, at least in the hex editor. A hex editor is used in such cases because .safetensor files are binary files. Since I had to disassemble and reassemble the .safetensor file, it was clear that the error had to be somewhere in the metadata, or directly before or after it.
The Meaning of the Error Message
After I could not see any obvious error in the new .safetensor file, I consulted the documentation [1].
According to the reference [1], the error message
InvalidHeaderDeserialization
means:
The header does contain a valid string, but it is not valid JSON.
Now the error was clearer. The problem had to be in the writing of the newly assembled JSON data.
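A check like the following can make such a problem visible without a hex editor. It is only a minimal sketch, assuming the layout described in [2]: the file starts with an 8-byte unsigned little-endian integer that holds the header length, followed by the header as a UTF-8 JSON string. The helper names read_header and check_header are made up for illustration and are not part of any library.

```python
import json
import struct


def read_header(path):
    """Return the raw header bytes of a .safetensor file."""
    with open(path, "rb") as fh:
        # Assumption: the first 8 bytes hold the header length as an
        # unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", fh.read(8))
        return fh.read(header_len)


def check_header(raw):
    """Return (True, parsed header) or (False, error text)."""
    try:
        return True, json.loads(raw.decode("utf-8"))
    except json.JSONDecodeError as err:
        # The error text contains the position of the first invalid character.
        return False, str(err)
```

Calling check_header(read_header("new_file.safetensor")) on a file with a broken header should return False together with the position at which the JSON parser gave up. The file name here is a placeholder.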
Explanation of the Error Message
Now for more detail. I took a closer look at the old and the new .safetensor file and found the difference.
The metadata section is organised as JSON data. In my case, the error was caused by the following dictionary entry:
{'1': {'tag_keyword': 170}}
When the new metadata was written as JSON data, the quotes were somehow mangled, and that produced the error message.
In the hex editor, the original file looked like this:
{"ss_tag_frequency":"{\"1\": {\"tag_keyword\": 170}}"}
In the hex editor, the new file looked like this:
{"ss_tag_frequency":"{"1": {"tag_keyword": 170}}"}
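A small difference, but a big problem: the backslash in front of the inner double quotes is missing, so the string value ends prematurely at the first inner double quote and the rest is no longer valid JSON.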
The solution was to convert the single quotes to double quotes, as in the following code block:
from
{'1': {'tag_keyword': 163}}
to
{"1": {"tag_keyword": 163}}
With the double-quoted dictionary entry, the error no longer occurs.
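One possible way to get from the single-quoted to the double-quoted form is to let the json module from the Python standard library do the quoting and escaping. The following is a minimal sketch, assuming the tag frequency data is available as a Python dictionary. It is not necessarily how the original script does it, and the variable names are chosen for illustration only.

```python
import json

tag_frequency = {"1": {"tag_keyword": 170}}

# str() returns the Python representation with single quotes.
# This is not valid JSON.
as_repr = str(tag_frequency)
print(as_repr)
# {'1': {'tag_keyword': 170}}

# json.dumps() returns a JSON string with double quotes.
as_json = json.dumps(tag_frequency)

# The metadata value is itself a string. Serializing the outer
# dictionary escapes the inner double quotes with backslashes.
metadata = {"ss_tag_frequency": as_json}
header_fragment = json.dumps(metadata, separators=(",", ":"))
print(header_fragment)
# {"ss_tag_frequency":"{\"1\": {\"tag_keyword\": 170}}"}
```

The second printed line has the same shape as the entry seen in the original file in the hex editor, including the backslashes.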
Technical Background
The metadata I am talking about is part of the header of the .safetensor file. The header can contain a block with metadata; according to the documentation, metadata is optional. The header also contains a section with pointers to the positions of the tensors in the binary data.
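The layout described above can be sketched in a few lines of Python. This is an illustration only, based on the format description in [2]: an 8-byte header length, a JSON header with an optional "__metadata__" block and one entry per tensor, then the raw tensor bytes. The function names build_file and split_file and the tensor name "weight" are made up, and the result has not been checked against the safetensors library itself.

```python
import json
import struct


def build_file(metadata, tensor_bytes):
    """Assemble header length, JSON header and raw tensor data."""
    header = {
        # Optional block; keys and values are plain strings.
        "__metadata__": metadata,
        # One entry per tensor; "weight" is a made-up tensor name.
        "weight": {
            "dtype": "U8",
            "shape": [len(tensor_bytes)],
            # Byte range of the tensor, counted from the end of the header.
            "data_offsets": [0, len(tensor_bytes)],
        },
    }
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + tensor_bytes


def split_file(blob):
    """Return (header dictionary, raw tensor data) from the assembled bytes."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    return header, blob[8 + header_len:]


blob = build_file({"note": "example"}, bytes([1, 2, 3, 4]))
header, data = split_file(blob)
begin, end = header["weight"]["data_offsets"]
print(header["__metadata__"], data[begin:end])
```

Because the metadata lives inside the JSON header, any change to it also changes the header length stored in the first 8 bytes, which is why the file has to be disassembled and reassembled rather than patched in place.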
Final Words
This topic needs some more investigation; for me it is still an open issue. The process that leads to the error is not yet completely clear to me. It is also not quite clear to me why the JSON format looks the way it does in the correct file [2].
Finally
Have a nice day! Have fun! Be inspired!
Abbreviations
JSON → JavaScript Object Notation
References
[1] https://docs.rs/safetensors/latest/safetensors/tensor/enum.SafeTensorError.html
[2] https://github.com/huggingface/safetensors/blob/main/README.md
[3] https://github.com/zentrocdot/artificial-intelligence-tools/tree/main/python