Visual Understanding Beyond Naming


If you torture data long enough, it will confess.

“90% of the traffic will be visual data” – Cisco

This is “Big Data”, and it is too big for humans to inspect by hand.

Visual data is “digital dark matter”: it is noisy, unsegmented, high-entropy, and two- or three-dimensional, so it is difficult to handle.

Visual similarity via semantics

Using words to understand/index pictures.

Object naming -> Object categorization

Image labeling (e.g., with CNNs + ImageNet)

Two problems

  1. Long tails! We have a huge amount of data, but most categories appear only rarely.
  2. There are many more things in our visual world.
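
To make the long-tail point concrete, here is a minimal sketch that is not part of the original notes. It assumes category frequencies follow a Zipf-like law (frequency proportional to 1/rank). The numbers (10,000 categories, 1,000,000 images, a threshold of 100 examples) are made up for illustration, and the helper names `zipf_counts` and `tail_summary` are hypothetical.

```python
def zipf_counts(num_categories, total_images, exponent=1.0):
    """Split total_images across categories, with frequency ~ 1 / rank**exponent.

    Assumption: a Zipf-like law is a reasonable stand-in for a long-tailed
    label distribution. Real datasets will differ.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_categories + 1)]
    norm = sum(weights)
    # Rounding down means a few images are left unassigned; fine for a sketch.
    return [int(total_images * w / norm) for w in weights]


def tail_summary(counts, min_examples):
    """Report how lopsided the distribution is."""
    rare = sum(1 for c in counts if c < min_examples)
    # "Head" = the most frequent 1% of categories (at least one category).
    head = sorted(counts, reverse=True)[: max(1, len(counts) // 100)]
    return {
        "rare_fraction": rare / len(counts),    # categories with too few examples
        "head_share": sum(head) / sum(counts),  # share of images in the top 1%
    }


# Illustrative, made-up sizes.
counts = zipf_counts(num_categories=10_000, total_images=1_000_000)
summary = tail_summary(counts, min_examples=100)
print(summary)
```

Under these assumptions, the large majority of categories fall below the 100-example threshold even though the dataset as a whole is large, while a small head of frequent categories accounts for a large share of the images.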

Visual world

  • Not one-to-one (Visual world -> City)
  • The language bottleneck