Hi,
I made a tool for managing word captions and tags on datasets used for training.
I am not doing much training myself (especially right now), and it is probably more of a craftsman's tool than an industrial one, but please give it a try.
It runs on Python, and you can find the source here:
https://github.com/Ravath/WordMaster
I may not be able to offer much support right now, but I would still appreciate any feedback.