Image and caption share one map.

This page uses the same Jina v5 omni model for both modalities. Each image point connects to its caption, and the hover text shows the exact pair plus the relation between them.

6 paired image-caption examples from a small Hugging Face dataset.
12 embedded points across both modalities.
0.689 mean cosine distance for the matching image-caption pairs.
0.458 mean cosine distance for non-matching cross-pair comparisons.
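
The two mean distances above can be reproduced from the embeddings with a few lines of code. The following is a minimal sketch, not the page's actual implementation: it assumes cosine distance is defined as 1 minus cosine similarity, and that "non-matching cross-pair comparisons" means every image compared against every caption from a different pair. The function names and the toy 2-D vectors are hypothetical stand-ins for real model embeddings.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pair_distance_summary(image_vecs, caption_vecs):
    """Mean distance for matching pairs (i, i) and for cross pairs (i, j), i != j.

    Assumes image_vecs[i] and caption_vecs[i] belong to the same pair.
    """
    n = len(image_vecs)
    matching = [cosine_distance(image_vecs[i], caption_vecs[i]) for i in range(n)]
    non_matching = [
        cosine_distance(image_vecs[i], caption_vecs[j])
        for i in range(n)
        for j in range(n)
        if i != j
    ]
    return sum(matching) / len(matching), sum(non_matching) / len(non_matching)


# Toy 2-D vectors standing in for real embeddings (illustrative values only).
image_vecs = [[1.0, 0.0], [0.0, 1.0]]
caption_vecs = [[1.0, 1.0], [0.0, 2.0]]

mean_matching, mean_non_matching = pair_distance_summary(image_vecs, caption_vecs)
print(f"matching: {mean_matching:.3f}, non-matching: {mean_non_matching:.3f}")
```

With 6 pairs, this scheme averages 6 matching distances and 30 cross-pair distances. Cosine distance ignores vector length, so the result does not depend on whether the embeddings are normalized first.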
[Interactive map. Legend: image, caption. Each of the 6 pairs has one image point and one caption point joined by a connecting line. Hover text on a connecting line shows the pair's image and caption with the relation "image content should align with its caption". Hover text on an image point shows the pair number, its caption, and the relation "visual content from the image". Hover text on a caption point shows the pair number, its text, and the relation "natural-language caption for the image".]
Sample Pairs
pair 1
a drawing of a green pokemon with red eyes

pair 2
a green and yellow toy with a red nose

pair 3
a red and white ball with an angry look on its face

pair 4
a cartoon ball with a smile on it's face

pair 5
a bunch of balls with faces drawn on them

pair 6
a cartoon character with a potted plant on his head