AI Tool Promises Better Automated Analysis of Datasets with Rare Items, a Key Real-World Limitation

The MiikeMineStamps dataset of stamps provides a unique window into the workings of a large Japanese corporation, opening unprecedented possibilities for researchers in the humanities and social sciences. But some of the stamps in this archive appear in only a small number of instances. This makes for a "long tail" distribution that poses particular challenges for AI learning, including in fields in which AI has experienced serious failures. A collaboration between scientists at the University of Pittsburgh (Pitt), PSC, DeepMap Inc. of California and Carnegie Mellon University (CMU) took up this challenge, using PSC's Bridges and Bridges-2 systems to build a new machine learning (ML) based tool for analyzing "long tail" distributions.