Apple Releases A Curated AI Dataset For Image Editing Research


Apple has released Pico-Banana-400K, a highly curated 400,000-image research dataset which, interestingly, was built using Google’s Gemini-2.5 models. Here are the details.

Apple’s research team has published an interesting study called “Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing”.

In addition to the study, they also released the full 400,000-image dataset it produced under a non-commercial research license. This means anyone can use and explore it for academic work or AI research, but not for commercial purposes.

Right, but what is it?

A few months ago, Google released the Gemini-2.5-Flash-Image model, also known as Nano-Banana, which is arguably the state of the art among image editing models.

Other models have also shown significant improvements, but, as Apple’s researchers put it:

“Despite these advances, open research remains limited by the lack of large-scale, high-quality, and fully shareable editing datasets. Existing datasets often rely on synthetic generations from proprietary models or limited human-curated subsets. Furthermore, these datasets frequently exhibit domain shifts, unbalanced edit type distributions, and inconsistent quality control, hindering the development of robust editing models.”

So, Apple set out to do something about it.

Building Pico-Banana-400K

The first thing Apple did was pull an unspecified number of real photographs from the OpenImages dataset, “selected to ensure coverage of humans, objects, and textual scenes.”

Yes, they actually used Comic Sans

Then, it came up with a list of 35 different types of changes a user could ask the model to make, grouped into eight categories. For instance:

  • Pixel & Photometric: Add film grain or vintage filter
  • Human-Centric: Funko-Pop–style toy figure of the person
  • Scene Composition & Multi-Subject: Change weather conditions (sunny/rainy/snowy)
  • Object-Level Semantic: Relocate an object (change its position/spatial relation)
  • Scale: Zoom in

Next, the researchers would upload an image to Nano-Banana, alongside one of these prompts. Once Nano-Banana was done generating the edited image, the researchers would then have Gemini-2.5-Pro analyze the result, either approving it or rejecting it, based on instruction compliance and visual quality.
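The generate-then-judge loop described above can be sketched roughly as follows. This is a minimal illustration, not Apple's actual pipeline: the function names (`generate_edit`, `judge_edit`) and the toy acceptance rule are hypothetical stand-ins for calls to Nano-Banana and the Gemini-2.5-Pro judge.

```python
# Hedged sketch of the generate-then-judge data pipeline described above.
# generate_edit and judge_edit are hypothetical placeholders, not real APIs.

def generate_edit(image, instruction):
    """Stand-in for a call to an image-editing model (e.g. Nano-Banana)."""
    return {
        "source": image,
        "instruction": instruction,
        "edited": f"{image}+{instruction}",  # toy placeholder for the edited image
    }

def judge_edit(result):
    """Stand-in for the Gemini-2.5-Pro judge, which scores the edit on
    instruction compliance and visual quality. Here: a trivial toy rule."""
    return "blurry" not in result["edited"]

def build_dataset(images, instructions):
    """Generate one edit per (image, prompt) pair and sort by judge verdict."""
    accepted, rejected = [], []
    for img in images:
        for prompt in instructions:
            result = generate_edit(img, prompt)
            (accepted if judge_edit(result) else rejected).append(result)
    return accepted, rejected

accepted, rejected = build_dataset(
    ["photo_001.jpg"], ["add film grain", "make it blurry"]
)
```

Note that the rejected results are not discarded: as the article explains next, they feed the dataset's preference pairs.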

The result became Pico-Banana-400K, which includes images produced through single-turn edits (a single prompt), multi-turn edit sequences (multiple iterative prompts), and preference pairs comparing successful and failed results (so models can also learn what undesirable outcomes look like).
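The three example types could be modeled with a record structure along these lines. The field names and layout are my own illustrative assumptions, not the dataset's published schema.

```python
# Hedged sketch of how the three example types might be represented.
# Field names are hypothetical, not Pico-Banana-400K's actual schema.
from dataclasses import dataclass, field

@dataclass
class EditTurn:
    instruction: str      # the text prompt for this turn
    edited_image: str     # identifier of the resulting image

@dataclass
class EditExample:
    source_image: str
    turns: list           # one EditTurn = single-turn; several = multi-turn
    preferred: str = ""   # preference pairs: the judge-approved result...
    rejected: str = ""    # ...and the failed alternative to learn from

single = EditExample("img_0001.jpg",
                     [EditTurn("add film grain", "img_0001_grain.jpg")])
multi = EditExample("img_0002.jpg", [
    EditTurn("zoom in", "img_0002_a.jpg"),
    EditTurn("change weather to snowy", "img_0002_b.jpg"),
])
pair = EditExample("img_0003.jpg",
                   [EditTurn("relocate the dog", "img_0003_ok.jpg")],
                   preferred="img_0003_ok.jpg",
                   rejected="img_0003_fail.jpg")
```

Keeping both the preferred and rejected images in one record is what lets preference-based training methods compare a good edit directly against a bad one.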

While acknowledging Nano-Banana’s limitations in fine-grained spatial editing, layout extrapolation, and typography, the researchers say that they hope Pico-Banana-400K will serve as “a robust foundation for training and benchmarking the next generation of text-guided image editing models.”

You can find the study on arXiv, and the dataset is freely available on GitHub.



Disclaimer: This news article has been republished exactly as it appeared on its original source, without any modification.
We do not take any responsibility for its content, which remains solely the responsibility of the original publisher.




Author: uaetodaynews
Published on: 2025-10-29 06:57:00
Source: uaetodaynews.com
