Atlantic reporter Alex Reisner recently uncovered four datasets of music being used to train AI models and made them fully searchable for the general public. Two of the sets are enormous, at 12 million and 9 million tracks. The other two are much smaller, but still represent a significant amount of training data at over 100,000 songs each.
According to Reisner, the sets have been downloaded thousands of times and, while it’s impossible to know exactly who has used them, Google and Stability have both confirmed in research papers that they have. Some of the sources, like the Free Music Archive dataset, are free to stream for personal use but require licensing for commercial applications.
While the datasets are, in theory, freely available on the internet, using them as training data isn’t as simple as downloading a ZIP file and feeding it to an AI model. As Reisner explains:
Three of the datasets I found are distributed as a list of links to songs on YouTube or Spotify. AI developers download the actual audio using tools that automate the job, some of which allow developers to bypass logins, advertisements, and mechanisms that would earn money or subscribers for creators. Such tools violate the terms of service of these platforms.
