🖥️You are welcome to try the open-source GPT4V-Image-Captioner, developed by my friend and me. It offers one-click installation and integrates several features, including image pre-compression, image tagging, and tag statistics. We also recently launched a webui plugin version of the tool, and everyone is welcome to use it!
📖2024.2.22 Introducing "HW5.0_Euler_a_Lightning"
This model is an accelerated version of the HelloWorld SDXL base model that incorporates SDXL-Lightning technology. With the Euler a sampler and CFG 1, it can generate images in 6-8 steps, about three times faster than the original SDXL version. In my comparisons, its results are also better than those of the LCM or Turbo versions.
The recommended parameters for generating images with this model are:
Sampler: Euler a (Important! The model is specifically adapted to Euler a; other samplers may not yield results as good)
CFG scale: 1
Sampling steps: 8 steps (6~8 steps are acceptable)
Hires algorithm: ESRGAN 4x / 8x_NMKD-Faces_160000_G
Hires Upscale factor: 1.5x
Hires steps: 8 steps
Hires Denoising strength: 0.3
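For readers who drive the webui through its HTTP API instead of the interface, the settings above can be written as a request body. This is only a minimal sketch, assuming the AUTOMATIC1111 webui `/sdapi/v1/txt2img` endpoint and its field names, and assuming the sampler and upscaler are registered under the strings shown (these vary by installation). The helper name `lightning_payload`, the example prompt, and the 1024x1024 base size are assumptions and not part of the recommended parameters.

```python
import json

def lightning_payload(prompt: str, steps: int = 8) -> dict:
    """Build a txt2img request body from the recommended Lightning settings."""
    if not 6 <= steps <= 8:
        raise ValueError("the recommended range is 6 to 8 sampling steps")
    return {
        "prompt": prompt,
        "sampler_name": "Euler a",      # the model is adapted to this sampler
        "cfg_scale": 1,
        "steps": steps,
        "width": 1024,                  # assumption: SDXL's usual base size
        "height": 1024,
        "enable_hr": True,              # turn on Hires. fix
        "hr_upscaler": "ESRGAN_4x",     # assumption: name as listed in your webui
        "hr_scale": 1.5,
        "hr_second_pass_steps": 8,
        "denoising_strength": 0.3,
    }

# Hypothetical prompt; the payload is only printed here, not sent anywhere.
payload = lightning_payload("a red panda resting on a mossy branch")
print(json.dumps(payload, indent=2))
```

If the face upscaler is preferred, `hr_upscaler` would be set to whatever name `8x_NMKD-Faces_160000_G` has in your installation.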
📖2024.2.11 Introducing "HelloWorld 5.0 GPT4V"
HelloWorld 5.0 is the most substantial update in the history of the HelloWorld series. Its training set was tagged with GPT-4V, and the model has undergone significant fine-tuning in areas such as science fiction, animals, architecture, and illustration.
Comparative tests show improvements in this version include:
1. More varied and dynamic character poses and image compositions, creating visually engaging pictures;
2. The film dataset has been trained extensively. The film texture was weak in versions 2.0 to 4.0, and many fans missed the leogirl style of version 1.0, so this update specifically strengthens the film texture without compromising other photographic qualities. The film texture can be triggered with phrases such as "film grain texture" and "analog photography aesthetic";
3. Enhanced expressiveness in themes like science fiction, thriller, and animals, with mechas and other subjects having a more designed feel. Animals such as snow leopards, red pandas, giant pandas, tigers, Pallas's cats, and domestic cats and dogs are more lifelike;
4. Thanks to GPT tagging, prompt adherence and conceptual accuracy have been further improved.
However, the drawbacks of this version include:
1. Because this is a substantial fine-tuning update, the error rate for limbs may increase slightly, which is normal when a model moves out of its comfort zone into newly optimized areas. Previous versions went through extensive limb testing, while this version had limited time for it. Even so, limb accuracy in this version is at least higher than in version 1.0, and I will keep improving it in future updates.
2. Because of the reinforced film texture, images can have an unavoidable default warm tone, even though the GPT tagging is as accurate as possible. However, you can use prompts like "studio light" or "sharp focus" to produce high-definition, studio-quality images, and with well-chosen prompts the output can have better skin tones and visual appeal than previous versions.
3. This version includes more full-body character images to improve full-body results, so the model may produce wider scenes than before if no character composition is specified. Currently, facial details in full-body shots at 1024 resolution may be less sharp than in half-body or close-up shots. This can be improved with adetailer and a 1.5x Hires. fix at 0.3 denoising strength, or by specifying the composition in the prompt to avoid generating full-body images.
4. Since a small number of high-quality illustration datasets have been added, there is a chance that prompts related to animated styles will produce animated images. If this concerns you, please adjust your prompts accordingly.
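The prompt-based workarounds in this list can be sketched as a small helper. The phrases are the ones quoted in these notes; the grouping into "film" and "studio" looks, the function name `with_style`, and the example prompt are illustrative assumptions, not an official prompt recipe.

```python
# Phrases quoted in the notes above; the two-way grouping is illustrative.
STYLE_PHRASES = {
    "film": ["film grain texture", "analog photography aesthetic"],
    "studio": ["studio light", "sharp focus"],
}

def with_style(prompt: str, style: str) -> str:
    """Append the phrases for one look, skipping any already in the prompt."""
    lowered = prompt.lower()
    extra = [p for p in STYLE_PHRASES[style] if p not in lowered]
    return ", ".join([prompt.strip().rstrip(",")] + extra)

# Naming the composition ("half-body portrait") steers away from full-body shots.
base = "half-body portrait of a woman in a rainy street, sharp focus"
print(with_style(base, "film"))
print(with_style(base, "studio"))
```

Keeping the look separate from the subject makes it easy to render the same scene with and without the warm film tone and compare the results.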
These are the main updates for this version. Training the SDXL base model is challenging, and when the training set approaches ten thousand images, the cost for tagging and training for each model exceeds 300 USD. I welcome everyone to use the model and appreciate any feedback you can provide! If you find this model satisfactory, I would be immensely grateful if you could help spread the word about it.
📖2024.1.31 Introducing "HelloWorld 4.0"
HelloWorld 4.0 is a transitional version between BLIP+CLIP tagging and GPT4V tagging. I first trained a pure GPT4V-tagged model, then merged it with a large proportion of HelloWorld 3.2 and a 0.05 proportion of Juggernaut XL (to adjust the skin tone). Compared with version 3.2, the new version shows improved prompt compliance and concept coverage.
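To make the idea of merging by proportion concrete, here is a toy sketch of a weighted-sum merge. It assumes a plain per-parameter weighted average, which is one common merging method and not necessarily the one used for this release. Only the 0.05 share for Juggernaut XL comes from the text above; the other two weights, the model keys, and the tiny list "tensors" are placeholders invented for illustration.

```python
def weighted_merge(models: dict, weights: dict) -> dict:
    """Weighted sum of same-shaped parameter dicts; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    names = list(models)
    merged = {}
    for key in models[names[0]]:
        # Walk the same parameter position across all models at once.
        columns = zip(*(models[n][key] for n in names))
        merged[key] = [
            sum(weights[n] * value for n, value in zip(names, column))
            for column in columns
        ]
    return merged

# Toy two-element "tensors"; real checkpoints hold millions of values per key.
models = {
    "gpt4v_pure": {"w": [1.0, 0.0]},
    "helloworld_3_2": {"w": [0.0, 1.0]},
    "juggernaut_xl": {"w": [2.0, 2.0]},
}
# 0.05 is stated above; 0.25 and 0.70 are hypothetical placeholders.
weights = {"gpt4v_pure": 0.25, "helloworld_3_2": 0.70, "juggernaut_xl": 0.05}

merged = weighted_merge(models, weights)
print(merged)
```

A small share such as 0.05 nudges every parameter only slightly toward the third model, which fits its stated role as a skin-tone adjustment.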
The new GPT4V-tagged training set has doubled from the 4000 images of the HelloWorld 3 series to 8000 images, covering not only portraits but also animals, architecture, nature, food, illustrations, and more. However, the pure GPT4V version ran into an overfitting problem, which I tentatively attribute to the doubling of the training set. One of the next optimization steps is to find out how to include as many non-portrait concepts as possible while still training portraits sufficiently. For now, a merge of the new and old versions is used to ensure a smooth transition between versions, so the expanded concept set and the advantages of GPT4V tagging are not yet very noticeable. These advantages will become increasingly apparent in generations 5 and 6 of the model.
📖2024.1.5 Introducing "HelloWorld 3.2"
Version 3.2 is an iteration optimized with DPO technology. Compared to version 3.0, skin tone and limb accuracy are better, but the improvements are not significant, which is why this version is numbered 3.2 and not 4.0.
📖2023.12.15 Introducing "HelloWorld 3.0"
The new version has an expanded training set, which improves the model's expressiveness across different artistic styles, including science fiction and art.
It integrates a self-made quality enhancement LoCon (created using slider technology) to improve image texture and reduce distortion in fingers and limbs.
📖2023.11.17 Introducing "HelloWorld 2.0"
Thank you all for your patience. After overcoming various challenges, the HelloWorld 2.0 version is finally ready to be presented to you all in a state that I'm satisfied with. The main differences between HelloWorld 2.0 and 1.0 are as follows:
HelloWorld 2.0 no longer requires trigger words, and the results are comparable in quality to version 1.0 with trigger words. The trigger word 'leogirl' in 1.0 was highly associated with East Asians. With the trigger word removed, words like '1girl' will still likely generate East Asian portraits when race is not specified, but you can now specify the race with keywords such as nationality or skin color. For example, the trigger effects for words like 'Chinese', 'Russian', 'Iranian', 'Jamaican', 'Kenyan', 'dark-skinned', 'pale-skinned', etc., are listed below.
You can also get different styles of characters by writing the names of people from different countries and genders in the prompt, such as Han Meimei (China), Sophie Martin (France), Priya Patel (India), Fatima Al-Hassan (Arab), Wanjiru Mwangi (Kenya). These prompts are just examples. There are many other prompts and ways to play, and you're welcome to explore and share them.
HelloWorld 2.0 has balanced the quality/color and offers more style options. The 1.0 version, when used with 'leogirl', would likely produce images with a strong film texture. HelloWorld 2.0 is no longer tied to a film texture and can be customized with some quality-related prompts. Some prompts that have been tested and work well include:
high-end fashion photoshoot, product introduction photo, popular Korean makeup, aegyo sal, Sharp High-Quality Photo, studio light, medium format photo, Mamiya photography, analog film, Medium Portrait with Soft Light, real-life image, refined editorial photograph, raw photo, real photo, Scanned Photo, film still
The color effects of these prompts are as follows:
The training set for HelloWorld 2.0 significantly increased the proportion of full-body photos to improve SDXL's results for full-body and distant portraits. Although this is an improvement over version 1.0, using 'adetailer' is still strongly recommended when generating full-body photos. Users with enough video memory (24 GB) are also advised to apply a 1.5x Hires. fix to the image, which can significantly improve facial details.
📖2023.8.29 Introducing "HelloWorld" SDXL Base Model
Special reminder: When using the HelloWorld 1.0 model, please remember to add the trigger word "leogirl".
Distinct from the SD1.5 base model “MoonFilm”, “HelloWorld” is a brand-new realistic SDXL base model series. To help more users discover HelloWorld, I have kept the original MoonFilm model link. HelloWorld can be seen as the spiritual successor to MoonFilm on the new SDXL platform, but it aims for more than realism and film-like quality in portraits. Thanks to SDXL's far greater information capacity and text understanding compared with SD1.5, HelloWorld is a base model that seeks to depict all things realistically; in other words, I hope to gradually build a virtual photography world with HelloWorld.
Realistic SD1.5 base models have reached a fairly mature stage and are unlikely to improve significantly. Unless a breakthrough technology appears for the SD1.5 platform, the MoonFilm & MoonMix series will largely stop updating, and I will devote most of my energy to developing the HelloWorld SDXL base model. Version 1.0 is now available for download; version 2.0 is under active development and is expected in early September.
As a brand new SDXL model, there are three differences between HelloWorld and traditional SD1.5 models:
1. Unlike SD1.5 base models, which typically do not need trigger words, HelloWorld 1.0 should be used with the trigger word "leogirl". This makes the SDXL model reproduce the training set effect more reliably.
2. The HelloWorld model supports direct output at 1024x1024 pixels, so high-resolution upscaling is not required. Close-up portraits output directly are not inferior in quality to the SD1.5 version, but distant portraits output directly still have flaws. The ADetailer plugin is therefore recommended, as it effectively corrects problems with distant faces.
3. SDXL works well with simple natural language prompts. Try using more natural language in your prompts, which gives better results when generating realistic AI photos.
After multiple rounds of testing, the suggested drawing parameter settings are:
Steps ≥ 25
Sampler: DPM++ 2M Karras
CFG scale: 10
Size ≥ 1024x1024
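As a rough sketch, the suggested settings and the trigger-word reminder for version 1.0 can be bundled into one helper. The result is a plain dictionary and is not tied to any tool's API; the function name `helloworld_v1_settings`, the key names, and the example prompt are hypothetical. Reading "Size ≥ 1024x1024" as "each side is at least 1024" is one interpretation and is also an assumption.

```python
TRIGGER = "leogirl"  # trigger word required by HelloWorld 1.0

def helloworld_v1_settings(prompt: str, width: int = 1024,
                           height: int = 1024, steps: int = 25) -> dict:
    """Return the suggested 1.0 settings, adding the trigger word if missing."""
    if steps < 25:
        raise ValueError("suggested steps are 25 or more")
    if min(width, height) < 1024:
        # Assumption: the size rule is applied to each side separately.
        raise ValueError("suggested size is at least 1024x1024")
    if TRIGGER not in prompt.lower():
        prompt = f"{TRIGGER}, {prompt}"
    return {
        "prompt": prompt,
        "steps": steps,
        "sampler": "DPM++ 2M Karras",
        "cfg_scale": 10,
        "width": width,
        "height": height,
    }

settings = helloworld_v1_settings("a woman reading in a sunlit cafe, film still")
print(settings["prompt"])
```

Putting the trigger-word check in code guards against the most common mistake with version 1.0, which is simply forgetting to type "leogirl".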
Everyone is welcome to try HelloWorld and provide plenty of feedback. Your valuable opinions are very important for the next step of model improvement!
The HelloWorld series of models (hereinafter "the Model") has been crafted by myself (hereinafter "the Owner") with the assistance of the LiblibAI platform. Republishing the Model on platforms excluding LiblibAI and Civitai is unauthorized by the Owner.
The Owner permits the use of images generated by the Model for non-commercial educational or informative purposes at no cost, on the condition that:
- Users adhere to applicable laws and do not violate the rights of the Model or any third party.
- Users clearly attribute the images as "created by LEOSAM's HelloWorld base model".
For any form of commercial utilization, a prior commercial license agreement with the Owner is required. For inquiries related to commercial licensing and model personalization, please reach out to the Owner via the contact information available on the Owner's homepage.
The development and free distribution of the SDXL model represent significant endeavors. The Owner pledges ongoing complimentary updates to the HelloWorld model for individual enthusiasts as a token of appreciation for the community's contributions to open-source development. Collaborative commercial engagements are vital for the Model's advancement and refinement. The Owner appreciates every user for their understanding and support.
Unauthorized use may breach applicable laws and carry legal repercussions. The Owner retains exclusive rights to interpret this statement, which is governed by prevailing laws and regulations.