References

Visual grounding

Using the source input as ground truth helps people trust the system and makes it easier to interpret its process and see what might have gone wrong.
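One way to picture this is to store each extracted value together with the region of the source image it was read from, so the interface can highlight that region on request. The sketch below is only an illustration under assumptions: the names `BoundingBox`, `GroundedValue`, and `highlight_for`, the pixel coordinates, and the top-left origin convention are all hypothetical and not taken from any particular system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BoundingBox:
    """Region of the source image in pixels (assumed origin: top-left)."""
    x: float
    y: float
    width: float
    height: float

    def scaled(self, factor: float) -> "BoundingBox":
        """Return the box mapped onto an image shown at `factor` times source size."""
        return BoundingBox(self.x * factor, self.y * factor,
                           self.width * factor, self.height * factor)


@dataclass(frozen=True)
class GroundedValue:
    """An extracted value paired with the source region it came from."""
    field: str
    value: str
    source: BoundingBox


def highlight_for(values: List[GroundedValue], field: str,
                  display_scale: float = 1.0) -> Optional[BoundingBox]:
    """Find the region to highlight for `field`, or None if it was not extracted."""
    for item in values:
        if item.field == field:
            return item.source.scaled(display_scale)
    return None


# Illustrative data: the coordinates are made up for this sketch.
extracted = [
    GroundedValue("amount", "€ 35,00", BoundingBox(x=120, y=80, width=60, height=20)),
]

# The image is displayed at half its original size, so the box is scaled to match.
box = highlight_for(extracted, "amount", display_scale=0.5)
print(box)
```

Keeping the source region next to the value means the highlight can always be traced back to the original input instead of being reconstructed after the fact.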

A screenshot of a pricing table or spreadsheet. The main focus is a highlighted cell displaying the amount "€ 35,00". This same amount is also shown in a separate box or label within the image. The surrounding cells contain various other monetary amounts ranging from around €27 to €165. Based on the layout and formatting, this seems to be a financial or pricing-related document.

Human needs

When checking data, I want to be able to see how the system arrived at its answer, so I can trust the data and identify any potential errors in the process.

Considerations
  • AI Transparency and Explainability: Make AI systems transparent and understandable by explaining how and why decisions are made.
  • Multimodal Context: In this example we used the context of an image of a receipt, but it can also include other modalities, such as audio.

Explore Further

More of the Witlist

Comparing embedding shapes

Comprehend and compare large documents by visualizing their embeddings and scores, giving a clear and concise view of vast data sources in a single, intuitive visualization.

Evaluate predictions

When an observation is added to the context from an implicit action and a prediction is made, users should be able to easily evaluate and dismiss it.

Realtime prompt feedback

Guiding users to understand what makes a good prompt helps them learn how to craft prompts that result in better outputs.

Realtime image generation

Realtime generation allows people to manipulate content instantly, giving them more agency in using generative AI as a tool for exploration.

Exploring language spatially

Use a spatial dimension to explore and manipulate language. By pulling text around on a map, you can experiment with different features in a playful and meaningful way.

Pinch to summarize

In Arc, a playful pinch interaction lets you quickly distill any webpage into a brief summary, capturing the essence of the content in moments.