Image and caption share one map.

This page uses the same Jina v5 omni model for both modalities. Each image point connects to its caption, and the hover text shows the exact pair plus the relation between them.

6 paired image-caption examples from a small Hugging Face dataset.

12 embedded points across both modalities.

0.689 mean cosine distance for the matching image-caption pairs.

0.458 mean cosine distance for non-matching cross-pair comparisons.
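The two summary figures above could be computed along the following lines. This is a minimal sketch, not the page's actual code. It assumes that cosine distance means 1 minus cosine similarity, and that "non-matching cross-pair" means image i compared with caption j for i != j. The function names and the random placeholder vectors are hypothetical stand-ins for the real model embeddings.

```python
import numpy as np


def cosine_distance_matrix(image_vecs, caption_vecs):
    """Return D where D[i, j] = 1 - cosine_similarity(image_i, caption_j)."""
    a = np.asarray(image_vecs, dtype=float)
    b = np.asarray(caption_vecs, dtype=float)
    # Normalize each row to unit length so the dot product is the cosine.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def pair_statistics(image_vecs, caption_vecs):
    """Mean distance for matching pairs (diagonal) and for all other pairs."""
    d = cosine_distance_matrix(image_vecs, caption_vecs)
    n = d.shape[0]
    matching = np.diag(d)                   # image i vs. its own caption i
    non_matching = d[~np.eye(n, dtype=bool)]  # image i vs. caption j, i != j
    return float(matching.mean()), float(non_matching.mean())


if __name__ == "__main__":
    # Placeholder data: 6 image vectors and 6 caption vectors of dimension 8.
    # Real use would substitute the embeddings produced by the model.
    rng = np.random.default_rng(0)
    images = rng.normal(size=(6, 8))
    captions = rng.normal(size=(6, 8))
    match_mean, non_match_mean = pair_statistics(images, captions)
    print(f"matching: {match_mean:.3f}, non-matching: {non_match_mean:.3f}")
```

With 6 pairs, the matching mean averages 6 distances and the non-matching mean averages the remaining 30 image-caption combinations.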

Legend: image, caption

Sample Pairs

pair 1

a drawing of a green pokemon with red eyes

pair 2

a green and yellow toy with a red nose

pair 3

a red and white ball with an angry look on its face

pair 4

a cartoon ball with a smile on it's face

pair 5

a bunch of balls with faces drawn on them

pair 6

a cartoon character with a potted plant on his head