image-caption-comfyui

Image caption node for ComfyUI. You can load your own image caption model and generate prompts from a given picture.

An Insert Prompt node is also included to help users add their own prompts easily.

Setup

Clone the repository to the custom_nodes folder
Run ComfyUI
Place the folder that contains your model under the models/image_captioners folder
Click the Refresh button in ComfyUI; if that does not work, restart ComfyUI
The processor files have to be in the same folder as the model
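
If a model does not show up after a refresh, it can help to check the folder layout by hand. The sketch below is not part of the node; it only lists the folders under models/image_captioners and reports whether each one looks complete. The file names `config.json` and `preprocessor_config.json` are an assumption based on how Hugging Face models and processors are usually saved, and `find_caption_models` is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: a Hugging Face style export, where the model writes config.json
# and the processor writes preprocessor_config.json. Adjust if your files differ.
MODEL_FILE = "config.json"
PROCESSOR_FILE = "preprocessor_config.json"


def find_caption_models(models_root):
    """Map each folder under image_captioners to True if both files are present."""
    captioners = Path(models_root) / "image_captioners"
    if not captioners.is_dir():
        return {}
    result = {}
    for folder in sorted(captioners.iterdir()):
        if folder.is_dir():
            result[folder.name] = (
                (folder / MODEL_FILE).is_file()
                and (folder / PROCESSOR_FILE).is_file()
            )
    return result


# Example: point this at the models folder of your ComfyUI installation.
print(find_caption_models("ComfyUI/models"))
```

A folder reported as `False` is missing either the model config or the processor config under the assumed file names.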

You can find the models at this link
To use a pretrained model, follow these steps:
- Download the model and unzip it into the models/image_captioners folder.
- Click the Refresh button in ComfyUI.
- Then select the image caption model with the node's model_name variable (if you can't see the generator, restart ComfyUI).

female_image_caption_blip | (Training In Process)
- Base model
- using Salesforce/blip-image-captioning-base
female_image_caption_git | (Training In Process)
- Base model
- using microsoft/git-base

Variable Names	Definitions
model_name	Folder name that contains the model
min_new_tokens	The minimum number of tokens to generate, ignoring the number of tokens in the prompt.
max_new_tokens	The maximum number of tokens to generate, ignoring the number of tokens in the prompt.
num_beams	Number of beams for beam search. 1 means no beam search.
penalty_alpha	Balances the model confidence and the degeneration penalty in contrastive search decoding.
top_k	The number of highest probability vocabulary tokens to keep for top-k-filtering.
repetition_penalty	The parameter for repetition penalty. 1.0 means no penalty.
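
As a rough illustration of how these variables fit together, the sketch below collects them into a dictionary of keyword arguments. The key names match the generation parameters of the transformers package; how the node passes them internally is not shown in this README, so `build_generation_kwargs` and the example values are assumptions for illustration only.

```python
def build_generation_kwargs(min_new_tokens, max_new_tokens, num_beams,
                            penalty_alpha, top_k, repetition_penalty):
    """Collect the node variables into keyword arguments for text generation."""
    if min_new_tokens > max_new_tokens:
        raise ValueError("min_new_tokens must not exceed max_new_tokens")
    return {
        "min_new_tokens": min_new_tokens,
        "max_new_tokens": max_new_tokens,
        "num_beams": num_beams,
        "penalty_alpha": penalty_alpha,
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
    }


# Hypothetical values; with a loaded model these would typically be used as
# model.generate(**inputs, **kwargs) in the transformers package.
kwargs = build_generation_kwargs(5, 50, 1, 0.6, 4, 1.0)
print(kwargs)
```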

For more information, follow this link.
Check this link for text generation strategies.
Increasing the values of min_new_tokens and max_new_tokens lets the model generate longer captions, which may help to produce more detailed prompts.

Variable Names	Definitions
prompt_string	The prompt to insert. It replaces the `{prompt_string}` placeholder in the prompt_format variable.
prompt_format	Template for the new prompt, which includes the value of `prompt_string` through the `{prompt_string}` placeholder. For example, if `prompt_string` is `hdr` and `prompt_format` is `1girl, solo, {prompt_string}`, the output is `1girl, solo, hdr`. The `{prompt_string}` placeholder can be placed anywhere in the string.
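
The substitution described in the table can be sketched in a few lines. This is a minimal illustration, not the node's actual source; `insert_prompt` is a hypothetical name.

```python
def insert_prompt(prompt_string, prompt_format):
    """Replace every {prompt_string} placeholder with the given prompt."""
    return prompt_format.replace("{prompt_string}", prompt_string)


print(insert_prompt("hdr", "1girl, solo, {prompt_string}"))  # 1girl, solo, hdr
```

The sketch uses `str.replace` rather than `str.format` so that any other curly braces in the format string are left untouched.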

After adding the node, convert the prompt_string variable to an input.
For the string output, an additional comma is appended for compatibility with other nodes that append new prompts to the given string, for example Prompt Generator.

If the solutions below do not fix your issue, please create an issue with the bug label.

The image caption node is based on the transformers package, so most problems are likely to come from this package. To resolve them, try updating the package:
For a manual installation of ComfyUI
1. Activate the virtual environment if there is one.
2. Run the pip install --upgrade transformers command.
For a portable installation of ComfyUI
1. Go to the ComfyUI_windows_portable folder.
2. Open the command prompt in this folder.
3. Run the .\python_embeded\python.exe -s -m pip install --upgrade transformers command.
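
To confirm that the upgrade took effect, you can print the installed version. This is a small sketch using only the Python standard library; `installed_version` is a hypothetical helper, and it should be run with the same Python interpreter that ComfyUI uses (the virtual environment or the embedded Python).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="transformers"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version())
```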

Variables sometimes change with updates, which may break an existing workflow. If that happens, delete the node from the workflow and add it again.

Contributions are welcome. If you have an idea and want to implement it yourself, please follow these steps:
1. Create a fork
2. Open a pull request from the fork with a description that explains the new feature
If you have an idea but don't know how to implement it, please create an issue with the enhancement label.
You can contribute in several ways, for example to the code or to the README file.

Reference (Used) Image	Output
