The Atlantic Unveils Searchable Database of Music Used for AI Training

AI-assisted draft.

The lack of transparency in generative AI training has now been addressed by a landmark investigative effort. The Atlantic has launched a public, searchable database that reveals the vast scale of copyrighted music being used by artificial intelligence models.

Massive Datasets Revealed: Millions of Tracks Exposed

Investigative reporter Alex Reisner has identified four major datasets that currently serve as the backbone of AI music training. The scale of these repositories is staggering: two datasets contain 12 million and 9 million tracks, respectively, while two smaller sets contain more than 100,000 songs.

This revelation highlights a systemic problem in the AI industry, where vast amounts of media are gathered into training sets without the explicit permission of the original creators. The database lets anyone search these collections, which span a wide range of musical talent, from mainstream icons such as Lady Gaga, Bruce Springsteen, and Radiohead to experimental musicians such as Hainbach and electronic artists such as Aphex Twin.

The Technical Loophole: Bypassing Platform Protections

The investigation reveals a complex technical workaround that AI developers use to obtain training data. Most of these datasets are not audio files themselves but lists of links to platforms such as YouTube and Spotify.

To turn these links into usable training data, developers use automated scraping tools designed to download the audio directly. These tools are built specifically to bypass logins, skip advertisements, and evade the mechanisms, such as subscription models and paywalls, that allow creators to earn income from their work. Although these datasets may be "available" on the internet, this method of extracting data often violates the hosting platforms' terms of service and undermines the digital rights management (DRM) meant to protect artists.

Industry Impact and the AI Watchdog

The impact of this data ingestion is not theoretical; major industry players have already acknowledged it. Both Google and Stability AI have confirmed their use of these datasets in official research papers. That confirmation underscores a growing tension between the rapid advancement of multimodal AI and the legal frameworks governing intellectual property.

By hosting this information on The Atlantic’s "AI Watchdog" site, the publication gives developers, legal experts, and artists a tool for tracking how their intellectual property is being used. The move shifts the conversation from speculation to empirical evidence and lays groundwork for upcoming copyright litigation and regulatory debates over fair use in the age of machine learning.

Key Takeaways

Massive Scale of Ingestion: AI training datasets contain millions of tracks, including two massive sets of 12 million and 9 million songs.
Circumvention of Terms: Developers use automated tools to bypass YouTube and Spotify protections, effectively stripping creators of ad revenue and subscription fees.
Corporate Accountability: Major AI entities, including Google and Stability AI, have verified the use of these datasets in their published research.
