OpenAI has unveiled Jukebox, an open-source artificial intelligence system that can generate complete songs with music, meaningful lyrics and vocals.
Researchers have trained Jukebox on 1.2 million pieces of music in almost all genres (for example, ristenk). Now the AI can create its own songs, which are often similar to the works of the artists it was trained on. Jukebox is able to mimic a particular genre of music and can recreate the style of a particular artist.

"Our models can create songs from a wide variety of musical genres, such as rock, hip-hop and jazz. They can mimic the melody, rhythm and sound of a wide variety of instruments, as well as vocals that will go along with the music."
OpenAI has been working on creating music for several years. The company's previous development, MuseNet, was capable of creating MIDI tracks, but before Jukebox there was no AI that could write songs with full vocal parts in different genres.
According to The Next Web, although Jukebox outperforms other music-generating neural networks, the project is far from perfect. The AI still cannot reproduce a standard song structure with choruses and repeating motifs. In addition, Jukebox requires huge computational resources, so using OpenAI's new development in a home or studio setting is not yet possible.
"We have shared Jukebox with several musicians, and these musicians are not yet able to apply it to their creative process," the company points out.
Some musicians also point out that Jukebox could cause copyright issues.
"The new OpenAI tool, which automatically generates songs in the style of world-famous celebrities and includes reproducing their voices, is not only a technologically impressive and very exciting project, but also a terrifying phenomenon in terms of copyright law. Have Kanye West, Katy Perry, Aretha Franklin, Elvis Presley and other artists given OpenAI permission to use their audio recordings as training material for this algorithm? I don't think so," musician Cherie Hu tweeted.
Nevertheless, the developers hope that in the future Jukebox will be able not only to imitate compositions, but also to create entirely new tracks that will be indistinguishable from the work of real musicians.
For now, it takes Jukebox about nine hours to generate one minute of a song. You can listen to the compositions written by the AI on OpenAI's website.
Automated music creation with artificial intelligence
As more and more people spend time at home, creating, listening to and using music in various projects is becoming more important in their lives. The early successes in creating, producing and editing music using artificial intelligence are stunning and will accelerate this trend even further.
However, automatically generating music is quite a challenge for many reasons. The biggest obstacle is that a simple three-minute song, which a group of people can easily memorize, contains too many variables for a computer. In addition, there is as yet no perfect way to train an artificial intelligence to be a musician.
And the goal itself is also far from obvious to us developers. Are we trying to create music out of thin air or from some form of input values? Or do we want to create a system that can accompany a person as they play?
We believe there is currently no reason for musicians to worry about their career prospects. It is unlikely that artificial intelligence will fully automate the industry in 2021. But creating professional-quality music will clearly become easier and cheaper in the near future.
Let's take a look at three companies that are trying to automatically generate music, and assess the possibility that a data scientist will soon be awarded the first Grammy.
OpenAI's Jukebox - for the futurists
OpenAI is one of the companies co-founded by Elon Musk, that deer-loving inventor of rockets and cars. OpenAI has several creative projects, the most notable of which is GPT-3, which generates text. But as a music lover, I've given a special place in my heart to Jukebox.
"We're introducing Jukebox, a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles."
- OpenAI
The basic idea is to take raw audio and encode it using convolutional neural networks (CNNs). Think of it as a way to compress a large number of variables down to a smaller number. This step is necessary because a single second of audio contains 44,100 variables (samples), and a song lasts many seconds. The model then generates new music in this reduced set of variables and decompresses the result back up to 44,100 values per second.
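To get a feel for the numbers, here is a minimal Python sketch of the compress, then decompress idea. It is not Jukebox's actual encoder: simple block averaging stands in for the learned CNN, and the compression factor of 8 is an assumption chosen only for illustration. The function names are hypothetical as well.

```python
import math

SAMPLE_RATE = 44_100  # variables (samples) per second of audio, as in the text


def samples_in(seconds, sample_rate=SAMPLE_RATE):
    """Number of raw values needed to store `seconds` of audio."""
    return int(seconds * sample_rate)


def encode(samples, factor):
    """Toy stand-in for the encoder: average each block of `factor` samples."""
    codes = []
    for i in range(0, len(samples), factor):
        block = samples[i:i + factor]
        codes.append(sum(block) / len(block))
    return codes


def decode(codes, factor, length):
    """Toy stand-in for the decoder: repeat each code `factor` times."""
    restored = []
    for value in codes:
        restored.extend([value] * factor)
    return restored[:length]  # trim the padding from the last partial block


FACTOR = 8  # assumed compression factor, for illustration only

# One second of a 440 Hz sine tone as example input.
one_second = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
              for n in range(SAMPLE_RATE)]

codes = encode(one_second, FACTOR)
restored = decode(codes, FACTOR, len(one_second))

print("values in a three-minute song:", samples_in(180))
print("raw:", len(one_second), "compressed:", len(codes),
      "restored:", len(restored))
```

The restored signal has the original length again, but it is only an approximation of the input. A learned encoder and decoder aim to lose far less information than this averaging does while working on a similarly shortened sequence.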
Amper Music - for everyone
A completely different approach is taken by Amper Music. The Amper generator creates the music itself and does not allow a human to control the process. It does this by using so-called descriptors.
"Descriptors are musical algorithms that reproduce a particular style of music. One descriptor might be created to play New York punk rock and another to play laid-back beach folk."
- Amper Music
When generating, you can choose two parameters: the length of the song and a set of characteristics for the descriptor. I chose "playful futuristic documentary," and the result is quite nice and potentially usable. After that, you are prompted to choose a set of instruments to go with the descriptor. I settled on forks and knives.
The difference in approaches, using the piano roll as an example
The approach used in Amper most likely combines the first two solutions, and it's hard for me to speculate exactly how it works.
The main difference between AIVA and Jukebox is the data structure, that is, the way the music is stored. To understand it, we must first understand the difference between an audio recording and the MIDI standard. In our case, MIDI can be understood as a set of several rolls for a player piano (pianola), where each roll is essentially a single instrument.
The piano roll is one of the oldest specialized data structures. Its foundation was laid in 1896. Originally developed for automatic piano playing, today it serves as a canvas for music.
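The piano roll idea can be sketched in a few lines of Python. This is an illustration of the data structure only, not of how AIVA or any MIDI library stores music: the `Note` class, the `to_piano_roll` function and the two example tracks are all hypothetical. The one real convention used is that MIDI note number 60 is middle C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    pitch: int     # MIDI note number (60 is middle C)
    start: int     # time step at which the note begins
    duration: int  # length of the note in time steps


def to_piano_roll(notes, n_steps, low=60, high=72):
    """Render notes as a grid: one row per pitch, one column per time step."""
    roll = [[0] * n_steps for _ in range(high - low + 1)]
    for note in notes:
        if not low <= note.pitch <= high:
            continue  # ignore notes outside the displayed pitch range
        end = min(note.start + note.duration, n_steps)
        for step in range(note.start, end):
            roll[note.pitch - low][step] = 1
    return roll


# One "roll" per instrument, as in the pianola analogy.
tracks = {
    "piano": [Note(60, 0, 2), Note(64, 2, 2), Note(67, 4, 4)],
    "bass": [Note(60, 0, 8)],
}

for name, notes in tracks.items():
    print(name)
    roll = to_piano_roll(notes, n_steps=8, low=60, high=67)
    for row in reversed(roll):  # highest pitch on top
        print("".join("#" if cell else "." for cell in row))
```

The whole piano part here is three notes, each described by three numbers, whereas the same passage as an audio recording would need tens of thousands of values per second. That gap is why symbolic formats such as MIDI are so much easier for a model to handle than raw audio.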