Metadata - Musical fingerprinting

Identifying works is of prime importance for everyone who earns a living from royalties. Metadata come into play in tracing how works are used and identifying them, so that the revenue they generate can be redistributed.

With usage volumes exploding, managing metadata has become a crucial issue for SACEM.
 

"Metadata form the identity card of any work. They are used to identify it and to direct the amounts associated with its use to its owners." According to Louis Diringer, SACEM Membership Director, behind this term, which sounds strange to the uninitiated, lies a great deal of information that must accompany the work.
The first stage in acquiring metadata is registering the work. Since February 2014, in response to its members' demands, SACEM has offered an online registration service.
This has proven a great success: almost 50% of the 170,000 works registered in 2014 were registered online. In total, including works from other countries, SACEM has recorded data for more than 1.7 million works.
Today its database holds detailed information on 17,995,300 different works.


Standardised international codes

With the proliferation of broadcasting channels, tracing the use of a work is becoming increasingly complex, and with the arrival of online uses, volumes have exploded.
In 2014, SACEM processed 2.6 billion usage report lines, representing 251.6 billion download and streaming transactions.

How are such astronomical quantities of data analysed? One answer lies in the use of unique identifiers.
The move to standardise work identifiers began at the end of the 1990s. Modelled on the ISRC, the code used to identify sound recordings, the community of authors' societies developed a unique code for identifying musical works: the ISWC.
This ISO-standardised number (ISO 15707) is managed by CISAC, the international confederation of authors' societies. In each country, a national agency is responsible for assigning ISWCs to musical works created in its territory. In France, SACEM automatically assigns them to registered works.
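An ISWC has a fixed shape: the letter T, nine digits identifying the work and a final check digit, usually written T-DDD.DDD.DDD-C. The check digit lets a receiving system reject many typing errors before any database lookup. The Python below is a minimal sketch of that validation, assuming the weighting commonly documented for ISO 15707 (verify against the standard before relying on it); the function names are illustrative and not part of any CISAC or SACEM tool.

```python
import re

# Accepts the usual "T-034.524.680-1" form as well as the compact "T0345246801".
ISWC_PATTERN = re.compile(r"^T-?([0-9]{3})\.?([0-9]{3})\.?([0-9]{3})-?([0-9])$")


def iswc_check_digit(work_digits: str) -> int:
    """Compute the check digit for the 9-digit work part of an ISWC.

    The prefix "T" contributes a fixed 1; each digit is weighted by its
    position (1 to 9); the check digit brings the total to a multiple of 10.
    """
    if len(work_digits) != 9 or not all(c in "0123456789" for c in work_digits):
        raise ValueError("expected exactly 9 ASCII digits")
    total = 1 + sum(i * int(d) for i, d in enumerate(work_digits, start=1))
    return (10 - total % 10) % 10


def is_valid_iswc(code: str) -> bool:
    """Return True if `code` is well-formed and its check digit is consistent."""
    match = ISWC_PATTERN.match(code.strip().upper())
    if match is None:
        return False
    work_digits = "".join(match.group(1, 2, 3))
    return iswc_check_digit(work_digits) == int(match.group(4))
```

For instance, `is_valid_iswc("T-034.524.680-1")` is true, while changing the final digit to 2 makes it false. A check digit only catches errors in the code itself; it cannot tell whether the code was attached to the right work.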
The first ISWC was issued in 1995 to the song Dancing Queen by ABBA. Since then, more than 19 million codes have been assigned worldwide. "Assigning ISWCs is important, but it takes a constant, ongoing effort to convince broadcasters to include them in their broadcast records," warns Louis Diringer.
Another answer involves the exchange of data between authors' societies. This takes place through the CIS-Net network, which interconnects the databases of 97 authors' societies across the world, giving each of them access to documentation for more than 44 million musical works.
Finally, SACEM also stores a large amount of external data: codes or identifiers assigned internally by broadcasters or internet platforms.
According to Ali Mouhoub, managing director of Yacast, a service provider that works with SACEM, "metadata are a central issue. With the appearance and proliferation of digital services, basic information is no longer sufficient. The various databases must be able to 'talk' to each other. Interoperability between systems is achieved through clean metadata, in particular the systematic use of standardised international codes."

 

Of men and machines

When metadata are entered correctly, processing usage reports and identifying works is automated, and mostly carried out overnight and at weekends.
This makes the best use of the available IT resources (up to 174 automatic codification transactions per second). "These very high volumes will continue to increase in the years to come. We have also just acquired EMC's new all-flash XtremIO array to speed up processing," explains Veronique Sinclair, SACEM's Chief Information Officer. When automatic identification is not enough, SACEM's teams match works manually.

Lyne Tastet-Yonke, International Distribution Management Director, explains: "Usage reports for a single work may contain spelling mistakes or typing errors (missing accents, hyphens or spaces, etc.) or be fragmented. Not to mention homonyms: how many John Smiths or Martin Duponts are there in the databases? How many songs entitled 'I Love You'? We have an intelligent search engine that learns: as identification proceeds, it remembers the different occurrences and links for each broadcaster. Errors only need to be corrected once."
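SACEM's search engine is its own and its internals are not described here. As a minimal sketch of the two ideas in this description, assuming a simple in-memory catalogue, the Python below normalises titles so that missing accents, hyphens, spaces and case no longer block a match, returns every candidate when titles are homonyms, and remembers a manual correction per broadcaster so it only has to be made once. `normalise_title`, `WorkMatcher` and its methods are hypothetical names.

```python
import unicodedata


def normalise_title(title: str) -> str:
    """Reduce a title to a comparison key without accents, case, spaces or punctuation."""
    # NFKD splits "à" into "a" plus a combining grave accent, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", title)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Keep only letters and digits, so "Là-bas", "La bas" and "LABAS" share one key.
    return "".join(c for c in without_accents.casefold() if c.isalnum())


class WorkMatcher:
    """Hypothetical matcher: normalised lookup plus per-broadcaster memory."""

    def __init__(self, catalogue: dict[str, str]) -> None:
        # catalogue maps work id -> registered title
        self._by_key: dict[str, list[str]] = {}
        for work_id, title in catalogue.items():
            self._by_key.setdefault(normalise_title(title), []).append(work_id)
        # (broadcaster, title exactly as reported) -> work id confirmed by a person
        self._confirmed: dict[tuple[str, str], str] = {}

    def identify(self, broadcaster: str, reported_title: str) -> list[str]:
        """Return candidate work ids; more than one candidate signals homonyms."""
        confirmed = self._confirmed.get((broadcaster, reported_title))
        if confirmed is not None:
            return [confirmed]
        return list(self._by_key.get(normalise_title(reported_title), []))

    def confirm(self, broadcaster: str, reported_title: str, work_id: str) -> None:
        """Store a manual correction so the same report resolves automatically next time."""
        self._confirmed[(broadcaster, reported_title)] = work_id
```

With a catalogue in which two different works are both titled "I Love You", `identify("Radio X", "i love you!")` returns both ids; after one `confirm` call for that broadcaster, the same report resolves to a single work, while other broadcasters still see both candidates.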

In addition to its own staff and technical resources, SACEM uses service providers.
For TV, radio and nightclub broadcasts, Yacast monitors several hundred media outlets in France. The company has developed a platform for SACEM that archives these outlets' broadcasts for ten years. "We provide SACEM with all radio and TV broadcasts. With Media Archiver, you can check whether a musical work has been used in just a few clicks," explains Ali Mouhoub. While tracking traditional media is now a well-established process, new online uses are more complex to identify.

As part of Armonia, a partnership between several European authors' societies, SACEM has called on the services of the Spanish company BMAT, which monitors content on web platforms and compares it with its database of digital audio recordings, which holds fifty million references. This service has proven enormously useful, particularly for so-called UGC (user-generated content) uploaded by individuals to platforms such as YouTube. Metadata are usually non-existent in these cases, and the volumes on YouTube are colossal: more than three hundred hours of video are added every minute!
A recent test showed that around 30% of the music that conventional systems had been unable to identify was identified this way. "With the growth of video and music sharing on this type of platform, or on social networks like Twitter, Facebook and Instagram, where 'conventional' metadata are almost non-existent, fingerprinting technologies are currently the best means of identification. There is also great potential for extending identification beyond the reference master to live performances, mixes and, some day perhaps, even Aunty Chantal, who sings out of tune in the shower and is a big hit on YouTube," explains Xavier Costaz, project director at SACEM.
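BMAT's own technology is not described in this article. As a generic illustration of how audio fingerprint matching can work, the Python below follows the widely published "landmark" approach: pairs of spectral peaks are hashed, and a query is matched by counting hashes that agree on one consistent time offset. It assumes peak extraction (spectrogram and peak picking) has already been done upstream; `landmark_hashes`, `FingerprintIndex` and the `fan_out`, `max_dt` and `min_votes` parameters are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterator, Optional

# A spectral peak: (frame index, frequency bin), assumed to be extracted upstream.
Peak = tuple[int, int]
Hash = tuple[int, int, int]


def landmark_hashes(peaks: list[Peak], fan_out: int = 3,
                    max_dt: int = 50) -> Iterator[tuple[Hash, int]]:
    """Pair each peak with the next few peaks; yield ((f1, f2, dt), t1)."""
    ordered = sorted(peaks)
    for i, (t1, f1) in enumerate(ordered):
        for t2, f2 in ordered[i + 1:i + 1 + fan_out]:
            dt = t2 - t1
            if 0 < dt <= max_dt:
                yield (f1, f2, dt), t1


class FingerprintIndex:
    """Hypothetical reference database of fingerprinted recordings."""

    def __init__(self) -> None:
        # hash -> list of (recording id, time of the anchor peak in that recording)
        self._index = defaultdict(list)

    def add(self, recording_id: str, peaks: list[Peak]) -> None:
        for h, t in landmark_hashes(peaks):
            self._index[h].append((recording_id, t))

    def match(self, query: list[Peak],
              min_votes: int = 3) -> Optional[tuple[str, int, int]]:
        """Return (recording id, time offset, votes) for the best match, or None."""
        votes = Counter()
        for h, t_query in landmark_hashes(query):
            for recording_id, t_ref in self._index.get(h, []):
                # A genuine excerpt yields many shared hashes at the same offset.
                votes[(recording_id, t_ref - t_query)] += 1
        if not votes:
            return None
        (recording_id, offset), count = votes.most_common(1)[0]
        return (recording_id, offset, count) if count >= min_votes else None
```

Voting on the time offset, rather than simply counting shared hashes, is what separates a real excerpt from chance collisions: unrelated audio may share a few hashes, but rarely at one consistent offset. Production systems add noise-robust peak picking and far larger indexes.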

 

"ONI" (œuvre non identifiée): unidentified works

Despite all the care and attention paid, not all works and uses can be identified. What happens to these unidentified works then?
"If we can't identify a work after a manual search, we create what we call a forced work. The money associated with its use is held back until the work is identified," explains Cynthia Lipskier, repertoire manager in the Members' Directorate. At each distribution (four per year), unidentified works are listed in members' secure area on SACEM.fr, where they remain accessible for five years.
Members can then search the list to see whether any of their works are present. This takes ingenuity and a good dose of intuition: if SACEM has been unable to identify these works, it is because the available metadata did not allow it.

The main reasons for failing to identify works include rights holders not registering their works (40% of the claims processed), even though an efficient online registration service has been available since early 2014.

Another reason is incorrect characters or incomplete information supplied by broadcasters.
"We have regular meetings with major broadcasters to encourage them to be as accurate as possible," emphasises Louis Diringer. This ongoing awareness effort has helped SACEM distribute royalties in 2014 to more than 276,000 songwriters, composers and publishers, for the use of two million different works.

Published August 11, 2015