Callout research tells you not only how your target audience feels about your main individual songs, but also how it feels about your essential music clusters (and the dynamics between them).

In our previous article in this series, we looked at setting up your callout research, including the criteria for choosing between commissioning panel building and using live recruiting. Once you have your sample in place, you can create the music list and prepare the song hooks for the next callout wave of your music test. Radio music research expert Stephen Ryan explains how.


Callout research participants should be able to recognise every song you’re testing, which is especially important when it’s a relatively new release (image: YouTube / Ed Sheeran)


Callout focuses on tracking the life cycle of your Currents and Recurrents. Each wave lets you rank your tested songs in a new hierarchy, so that you get the most out of your rotations and give the best songs the highest exposure. While some fieldwork companies may mix CATI (Computer-Assisted Telephone Interviewing) and online methodology to get the sample for each wave, CATI is likely to dominate. You can experiment with how many titles you test each time, but when rating song hooks played down a telephone line, respondents can typically handle around 30 titles in one session while retaining their full concentration.


When you have the luxury of 44 weeks of almost consecutive callout throughout a year, you may have an opportunity to occasionally test a few older Recurrents, maybe even some Golds. But basically, you want to leave testing the older repertoire to an AMT (described in our series on Auditorium Music Testing) and prioritise your Currents and Recurrents for callout. This is especially true when you have 26 weeks of callout (one wave every alternate week), where every slot in your song list becomes even more precious.


When designing your song list, start with all songs from your highest-rotating category and work downwards from there. Begin with your Power Currents, followed by your Power Recurrents and your Secondary Currents. Depending on how many slots are still available, you can then include some songs from your Tertiary Currents and/or New Songs. Play new releases on air for a couple of weeks before testing them in callout. All songs on your test list will have varying degrees of familiarity to your audience, but every tested song must be familiar, and that includes those new additions. There is a way to make sure that only familiar songs are included in your music test results.
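

The slot-filling order described above can be sketched in Python. This is a minimal illustration, not a tool from the article: the category names mirror the text, but the `build_callout_list` function, the `(title, weeks_on_air)` input format and the two-week minimum (our reading of "a couple of weeks") are assumptions.

```python
# Categories in descending order of rotation, as described in the text.
CATEGORY_PRIORITY = [
    "Power Current",
    "Power Recurrent",
    "Secondary Current",
    "Tertiary Current",
    "New Song",
]


def build_callout_list(songs_by_category, slots=30, min_weeks_on_air=2):
    """Fill callout slots category by category, highest rotation first.

    songs_by_category maps a category name to a list of (title, weeks_on_air)
    tuples (a hypothetical input format). Songs with fewer than
    min_weeks_on_air weeks of airplay are skipped, so new releases are only
    tested once they have had some exposure.
    """
    selected = []
    for category in CATEGORY_PRIORITY:
        for title, weeks_on_air in songs_by_category.get(category, []):
            if len(selected) >= slots:
                return selected
            if weeks_on_air < min_weeks_on_air:
                continue
            selected.append((category, title))
    return selected
```

Because the loop stops as soon as the slots are full, Tertiary Currents and New Songs only make the list when the higher categories leave room, which follows the priority order in the text.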


Radio exposure is still a major factor in building song familiarity (image: 123RF, Antonio Li Piani)


As outlined before, you can use a 6-point scale to distinguish between Favourite, Like, Neutral, Burn, Negative and Unfamiliar. Instruct your participants not to rate a song they’re not familiar with, and to select ‘Unfamiliar’ instead. Some research methods aim to predict the potential of a song that may be heard for the first time, but callout is not the place for this kind of experimentation. If a song is tested too early and has a high Unfamiliar score (over 20-25%), this is often accompanied by a high Negative rating. However, for many songs, the negativity falls as the Unfamiliar score falls. If you acted on the early ratings of such a song, you might remove it from the playlist prematurely.
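

A simple screen for songs that are being judged too early might look like the sketch below. It assumes ratings arrive as response counts per answer on the 6-point scale, and uses 20% as the cut-off (the low end of the 20-25% range mentioned above); the function names and the threshold are illustrative choices, not a fixed rule.

```python
# Cut-off for a "too early" test, in percent. The article cites an Unfamiliar
# score over 20-25% as high; 20 is an assumed, conservative choice.
UNFAMILIAR_THRESHOLD = 20.0


def unfamiliar_pct(counts):
    """Share of respondents (in %) who answered 'Unfamiliar'."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses recorded")
    return 100.0 * counts.get("Unfamiliar", 0) / total


def too_early_to_judge(counts, threshold=UNFAMILIAR_THRESHOLD):
    """True when a song's ratings (especially Negative) should not yet drive playlist decisions."""
    return unfamiliar_pct(counts) > threshold


# Hypothetical response counts for one song on the 6-point scale.
new_song = {"Favourite": 10, "Like": 20, "Neutral": 15,
            "Burn": 0, "Negative": 25, "Unfamiliar": 30}
print(too_early_to_judge(new_song))  # True: 30% Unfamiliar
```

For a flagged song, the high Negative share is better read as "not yet familiar" than as a verdict on the song.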


There’s a smarter way to see the potential of a new song that is still somewhat unfamiliar. When you’re using an index-based metric, such as a Pop (Popularity) score, you can calculate a PTL (Potential) score, which predicts what the Pop score would be if the Unfamiliar score were reduced to zero. Take the existing Unfamiliar score, apportion it pro-rata to the existing Favourite, Like, Neutral, Burn and Negative percentages, and then recalculate the score from those adjusted percentages. Potential is a helpful indicator, but the result can be distorted if a song’s Unfamiliar score is too high. So make sure that all tested songs are likely to have an appropriate level of familiarity.
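

The PTL calculation can be sketched in Python. The article does not define the Pop score, so the weights in `POP_WEIGHTS` below are purely hypothetical placeholders; the sketch only assumes that Pop is a weighted index of the five familiar answer percentages, and the function names are illustrative.

```python
FAMILIAR = ("Favourite", "Like", "Neutral", "Burn", "Negative")

# Hypothetical weights for an index-based Pop score; substitute the weights
# your research company actually uses.
POP_WEIGHTS = {"Favourite": 100, "Like": 75, "Neutral": 50, "Burn": 25, "Negative": 0}


def pop_score(pcts, weights=POP_WEIGHTS):
    """Weighted index over the familiar answers (percentages of all respondents)."""
    return sum(weights[c] * pcts[c] for c in FAMILIAR) / 100.0


def redistribute_unfamiliar(pcts):
    """Apportion the Unfamiliar share pro-rata across the familiar answers."""
    familiar_total = sum(pcts[c] for c in FAMILIAR)
    if familiar_total == 0:
        raise ValueError("no familiar responses to apportion to")
    scale = (familiar_total + pcts["Unfamiliar"]) / familiar_total
    adjusted = {c: pcts[c] * scale for c in FAMILIAR}
    adjusted["Unfamiliar"] = 0.0
    return adjusted


def potential_score(pcts, weights=POP_WEIGHTS):
    """PTL: the Pop score the song would get if Unfamiliar fell to zero."""
    return pop_score(redistribute_unfamiliar(pcts), weights)


song = {"Favourite": 20, "Like": 30, "Neutral": 20,
        "Burn": 0, "Negative": 10, "Unfamiliar": 20}
print(pop_score(song), potential_score(song))  # 52.5 65.625
```

Under this linear assumption, PTL works out to Pop × 100 / (100 - Unfamiliar), so 20% Unfamiliar lifts a Pop of 52.5 to a PTL of 65.625. The multiplier also shows why a very high Unfamiliar score distorts the result: at 50% Unfamiliar, every familiar rating is effectively counted twice.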


Building a song’s familiarity used to depend only on radio exposure. In a New Song category, a song would get limited exposure for 3 weeks, until its total number of plays reached around 60-70. It would then be familiar enough to put into callout. Today, we consider a song’s exposure on many different platforms, including streaming, download and online services. While this may reduce the amount of radio exposure a song needs to reach an appropriate level of familiarity, it’s still important to add new songs and expose them to your specific audience before testing. Streaming service stats are a good indicator of a song’s potential, but the way a radio listener consumes a song is quite different.


Similar to music scheduling, you want to apply some sort of artist separation as well as genre, tempo & gender spread to your callout research music list (image: Ryan Research)


Once you have constructed your list of songs, carefully consider the order in which they will be played during the test. Approach the order of your callout song list the same way you would schedule a perfect music hour: separate similar genres and tempos, and avoid playing too many male-led or female-led songs in a row. In recent years, the window between the release of a current song and the next song from the same artist has shortened dramatically. At the time of writing, Ed Sheeran has released two songs simultaneously (described as two ‘A’ sides), Castle On The Hill and Shape Of You, which have hit the number 1 and number 2 positions in charts across many markets. Justin Bieber and Adele are examples of current artists who often have even more songs in high rotation at the same time. Your callout music list therefore needs appropriate spacing between songs by the same artist.
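

One way to draft a first running order automatically is a greedy pass that always picks the remaining song breaking the fewest "music hour" rules. The sketch below is an illustration under stated assumptions, not how any CATI or scheduling system works: the `Song` fields, the penalty weights and the five-position artist gap are all hypothetical, and you would still review the result by ear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    title: str
    artist: str
    genre: str
    tempo: str   # hypothetical labels, e.g. "slow", "mid", "up"
    gender: str  # lead vocal, e.g. "male", "female"


def penalty(candidate, placed, artist_gap=5):
    """Score how badly `candidate` would fit as the next song; lower is better."""
    score = 0
    # Artist separation is weighted far above the other rules (an assumption).
    if any(s.artist == candidate.artist for s in placed[-artist_gap:]):
        score += 100
    if placed:
        prev = placed[-1]
        score += prev.genre == candidate.genre  # back-to-back genre
        score += prev.tempo == candidate.tempo  # back-to-back tempo
        # Third male-led (or female-led) song in a row.
        if len(placed) >= 2 and placed[-2].gender == prev.gender == candidate.gender:
            score += 1
    return score


def order_song_list(songs, artist_gap=5):
    """Greedy ordering: repeatedly append the remaining song with the lowest penalty."""
    remaining = list(songs)
    placed = []
    while remaining:
        best = min(remaining, key=lambda s: penalty(s, placed, artist_gap))
        placed.append(best)
        remaining.remove(best)
    return placed
```

A greedy pass cannot guarantee the best possible order, but it never gets stuck: when every remaining song breaks a rule, it simply takes the least bad one, so the list is always completed.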


We should mention at this stage the issue of possible ‘list bias’ from using the same list order for every respondent, as people may give higher (or at least different) importance to items near the top of a list than to those lower down. In music testing, the concern is that songs heard at the start of the test could be viewed (or heard) differently from those toward the middle and end. To alleviate this concern, some researchers consider randomising the order of songs played to each respondent. However, randomising removes the ability to create a balanced, well-spread list of songs.


Across the thousands of callout waves we have processed, analysed and reported on, we have seen no real evidence that any form of list bias has become a significant issue: if such a bias existed, you would see noticeable bouncing in the trends across multiple waves. Remember, this is not an isolated AMT; a callout wave is one among many. If there is a concern, make sure that your song list for the next wave is in a different order (if it contains a significant number of the same songs). If your CATI system allows it, you can use an inverted list, where 50% of the sample hears your test songs in the order 1-30, and the other 50% hears them in the order 30-1. This way, you’ll retain a good spread of artist, gender and tempo.
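

The inverted-list approach can be sketched in a few lines of Python. How respondents are actually assigned is up to your fieldwork company’s CATI system; `hook_order` and the even/odd split by a respondent index are assumptions for illustration. The second function shows why the split helps: with an even sample, every song’s average position lands in the middle of the list.

```python
def hook_order(song_list, respondent_index):
    """Play order for one respondent: even-numbered respondents hear the list
    as designed (1-30), odd-numbered respondents hear it inverted (30-1)."""
    if respondent_index % 2 == 0:
        return list(song_list)
    return list(reversed(song_list))


def average_positions(song_list, n_respondents):
    """Average play position of each song across the whole sample."""
    totals = {song: 0 for song in song_list}
    for respondent in range(n_respondents):
        for position, song in enumerate(hook_order(song_list, respondent), start=1):
            totals[song] += position
    return {song: total / n_respondents for song, total in totals.items()}


songs = [f"Song {n}" for n in range(1, 31)]
print(hook_order(songs, 1)[:3])  # ['Song 30', 'Song 29', 'Song 28']
print(set(average_positions(songs, 100).values()))  # {15.5}
```

Reversing the list keeps every song next to the same neighbours, which is why the artist, gender and tempo spread you designed survives in both directions.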


Keep your song hooks short, and their duration consistent (image: Ryan Research)


Having designed the list of songs, you now need to prepare the respective hooks. The required format (wav, mp3, ogg, etc.) will depend on your fieldwork company’s CATI system. At times, you may simply be asked to provide them in wav quality, and the fieldwork company will convert them on receipt. From a preparation point of view, the main thing is to make every hook truly represent the song, by basing it on the song’s most recognisable part (usually the chorus). Selecting the best part of the song really comes down to your expertise and skill.


The most difficult songs to hook are often Dance songs, which are primarily instrumental and/or have elongated chorus sections that are difficult to capture in a short sequence. You may have to test such songs over a longer period of time. In a number of radio markets, Manuel Riva & Eneli’s Mhm Mhm, which you could describe as a ‘catchy’ track, retained a high Unfamiliar score for some time, despite consistent exposure (both on air and across streaming services). It took longer than usual for the hook to sink into people’s minds as representing a song they recognise.


Once you are happy with the hooks, make sure they are not unnecessarily long. Most hooks can be edited down to 7 to 9 seconds. Keep in mind that when you are testing 30 songs, and each hook is 9 seconds long, the result will be 270 seconds (4.5 minutes) of audio. Take that to 12 seconds each, and you’ll get to 6 minutes, and that is without the gaps in between to take the respondents’ answers. In addition, keep all song hooks to a consistent duration. As people go through the test, they get into a rhythm and subconsciously get used to the average duration of each hook. If they have listened to a series of hooks with an average length of 9 seconds, and are suddenly presented with a hook that is 14 seconds long, they may feel there is a deliberate emphasis on that particular hook (and view it differently).
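

Both checks in this paragraph, total running time and consistent hook length, are easy to automate before the hooks go out. The sketch below assumes you already know each hook’s duration in seconds; the 2-second tolerance is an assumed value, not an industry standard.

```python
from statistics import mean


def hook_audio_minutes(hook_seconds):
    """Total hook audio in minutes, excluding the gaps for respondents' answers."""
    return sum(hook_seconds) / 60


def flag_inconsistent_hooks(hooks, tolerance=2.0):
    """Titles whose hook length differs from the list average by more than
    `tolerance` seconds (2 s is an assumed, not a standard, tolerance)."""
    average = mean(hooks.values())
    return [title for title, seconds in hooks.items() if abs(seconds - average) > tolerance]


print(hook_audio_minutes([9] * 30))   # 4.5
print(hook_audio_minutes([12] * 30))  # 6.0

hooks = {f"Song {n}": 9 for n in range(1, 30)}
hooks["Song 30"] = 14  # one noticeably longer hook
print(flag_inconsistent_hooks(hooks))  # ['Song 30']
```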


It’s good to spot shifts in the popularity of music genres, and in the relations between them (image: 123RF / joris484)


Once your music list and song hooks are prepared and sent over to the fieldwork company, there is one more important thing to get into the habit of doing. Prior to the launch of every callout wave, get the fieldwork company to call you and go through the survey. It allows you to check that the song list and song hooks are in matching sort order, and to hear how the song hooks are sounding when they’re played from the CATI system down a telephone line. While they will be in mono, and not exactly in the best audio quality, check if they’re loud enough and not unnecessarily distorted. Distortion may occur when you forward a wav version which the fieldwork company then converts to another format.
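

The phone run-through remains the real test of how the hooks sound, but the "matching sort order" part can also be checked before the files are sent. This sketch assumes a hypothetical convention where each hook file is named with its running-order position and the song title (for example `01_Shape Of You.wav`); adapt the pattern to whatever naming your fieldwork company expects.

```python
import re

# Hypothetical naming convention: running-order number, underscore, title.
HOOK_NAME = re.compile(r"^(\d+)_(.+)\.(wav|mp3|ogg)$", re.IGNORECASE)


def check_hook_order(song_list, hook_filenames):
    """Return a list of problems; an empty list means list and hooks match."""
    problems = []
    hooks_by_position = {}
    for name in hook_filenames:
        match = HOOK_NAME.match(name)
        if match is None:
            problems.append(f"unrecognised file name: {name}")
            continue
        hooks_by_position[int(match.group(1))] = match.group(2)

    for position, title in enumerate(song_list, start=1):
        hook_title = hooks_by_position.get(position)
        if hook_title is None:
            problems.append(f"position {position}: no hook for '{title}'")
        elif hook_title != title:
            problems.append(f"position {position}: list says '{title}', hook is '{hook_title}'")

    # Hooks numbered beyond the end of the song list.
    for position in sorted(set(hooks_by_position) - set(range(1, len(song_list) + 1))):
        problems.append(f"position {position}: hook '{hooks_by_position[position]}' is not on the list")
    return problems
```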


Every callout survey should be a blind test with regard to two elements. The first, as mentioned in previous articles in this music research series, is that panellists should not be aware of why they’ve been selected, other than ‘we are interested in your view on songs as a radio listener’. The second is that the interviewer should give or imply no information about title or artist. Respondents should simply listen to the hooks and give their response.


Finally, consider adding music genre (or music cluster) testing. Looking at the music styles that define your format lets you notice any changes in your overall music appeal, as well as any changes in the popularity of certain music genres (and the dynamics between them). However, you’ll also get some of these insights from testing your individual songs: when many songs in a particular genre start to show less potential, the overall exposure of that style may need a review. If you intend to test genres, do so before testing individual songs, and clearly explain that you want respondents to judge the overall style rather than the individual fragments. Then, before testing the individual hooks, explain that they should now focus on rating each song one by one.


To keep YOU hooked: the next article in this series on callout research is going to cover the interpretation and analysis of the results you’ll get from each callout wave!


Thomas Giger is a European radio broadcasting specialist and publisher of Radio))) ILOVEIT, based in the Netherlands, and serving the radio industry worldwide.