• C26000 said...
    • User
    • May 3 2006, 15:23

    The Formula

    We now have 4 members, thanks sheermonkey and minusblinfold for joining :)

    We also have our first rejected pending member; maybe he didn't read the group description carefully. He had AEP = -1.00.

    I'm not sure about the join condition. What do you think about it?

    In my tests I have found AEP = -6 (for a fake top fan) and AEP = 4.6, and it seems to be working well for me, but I'm open to hearing your opinions.

  • Re: The Formula

    Quoth C26000:
    We now have 4 members, thanks sheermonkey and minusblinfold for joining :)


    Wahey, what do I win? Lol :-)
    To be honest, I didn't really get the whole formula thing, but it's your group, so you can do what you like ;-)

    Like the sentiment that being a true fan doesn't mean you have to listen to one artist over and over at home, school, work, car, iPod, et cetera, et cetera... makes even more sense to me as I have such a wide taste in music. (Although this week I can tell you there might be some bias: I've just got hold of a top new track I've been looking for, for ages - see sig, and do check it out, it's fantastic - but hey, swings and roundabouts, it's not like I'm going to play it 500 times before the end of the week...)

    Anyways, hope I can be of some use, I'd recommend some friends, but they all tend to have the 1-5 artists that stick out miles into their blank space on their profile pages ;-)

    See you about,

    • [Deleted user] said...
    • User
    • May 6 2006, 8:10
    That is a mighty big sig you've got there... Thankfully it's rather more tasteful than most, but still... ;)

    It's an interesting idea and statistic/formula. Maybe with some of the new updates in charts and graphs that may be coming to a Last.fm near you, such a statistic can be more integrated into a user's profile. It might also be interesting to calculate on a weekly/monthly/other time period basis in addition to just overall. And when my profile has filled up a bit, maybe I'll be able to join.
    Cheers,
    Nils

  • Hey, no soap radio, great name!

    I thought any formula should have some relationship between the top 50 and the total number of tracks listened to.

    Love peace and truth incorporated for all who seek
    • C26000 said...
    • User
    • Jul 15 2006, 4:40
    I also thought about that, blackadders,

    but I thought that most people listen to music in an exponential way (some people more than others) and the effect of the total tracks is approximately included in the AverageTop50 value.

    But I'm not sure about this. Any ideas?

  • In your current formula: divide the total for your top 50 by the total plays and then multiply the result by a factor of, say, 3. Experimentation would have to be done first to find out what results to expect.

    The result would tend to "flatter" someone who listens to lots of different music.

    Actually, I've rethought - there's a slight problem with using the total figure, which is that the total plays is updated constantly but the top 50 chart is only updated weekly so the later in the week you do the calculation, the "better" the result. I suppose the longer you've been on last.fm, the smaller the deviation.

    Love peace and truth incorporated for all who seek
    • Anrky said...
    • User
    • Jul 27 2006, 22:53
    I like yer formula a lot. It's complex for most music listeners, but then again I think we'd only want the more intelligent music fans on last.fm. It's more accurate than the simplistic version I created when the "How Mainstream Are You?" meme...

    http://www.last.fm/user/Anrky/journal/2006/04/20/120355/


    It seemed like everywhere I went I found people who listened to 1-2 bands and practically nothing else. How can they call themselves fans of Music?

    Maybe I should be happy my meme didn't catch on...

    • leonelf said...
    • User
    • Sep 1 2006, 16:37
    Hi, I've been toying around with your formula, and I'm not sure it's the best way to solve your problem.
    I've taken 3 profiles as an example: C26000, which would be an example of a "non exponential profile", koal, a member of the We Have Exponential Profiles, as an example of an "exponential profile", and myself, somewhere in between (if I calculated correctly, my AEP is 4.08).

    In my opinion, someone with a "non exponential profile" is someone who has weak preferences for every artist he likes, whereas an "exponential profile" would be someone that would rather listen to his number 1 favourite artist than any of the others, but, among the less favourite artists, also has a weak preference for any of them.

    Taking my stats as an example, my number 1 group are the Cocteau Twins, with 319 listens, number 2 is Tricky, with 297 listens, and my number 3 is Felix da Housecat, with 265 listens.
    For now, I listen to number 2 93% as often (297/319) as number 1, and to number 3 83% as often as number 1 or 89% as often as number 2.
    Taking koal's stats, she has listened to her number 1 artist, Jack Johnson, 660 times, her number 2 artist has 291 listens and her number 3, 180 listens.
    For now, she listens to number 2 44% as often as number 1, and to number 3 27% as often as number 1 or 62% as often as number 2.
    She appears to have stronger preferences for her favourite artist than I have for mine (44% versus 93%).
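
A quick sketch of the arithmetic above, in Python. The function name `relative_listens` is made up for illustration; the play counts are the ones quoted in the post.

```python
def relative_listens(plays):
    """Return each artist's play count as a fraction of the top artist's."""
    top = plays[0]
    return [count / top for count in plays]

leonelf = [319, 297, 265]   # Cocteau Twins, Tricky, Felix da Housecat
koal = [660, 291, 180]      # Jack Johnson, then her number 2 and number 3

for name, plays in (("leonelf", leonelf), ("koal", koal)):
    percentages = [round(100 * r) for r in relative_listens(plays)]
    print(name, percentages)
```

The second and third entries of each list are the "as often as number 1" percentages quoted in the post.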

    It appears, following this logic, that a "non exponential profile" will have a constant ratio <number of listens to artist n>/<number of listens to artist n-1>, in other words his profile will best be described by an exponential function.
    On the contrary, an "exponential profile" will listen to the first few artists way more often than the remaining ones, so the aforementioned ratio will not be constant. In other words, his profile will not be well described by an exponential function.

    What I did was to import the 50 most listened to artists of the 3 profiles I take as an example, and to:
    1) Divide the number of times every artist was listened to by the number of listens of number 1 (in my case, 319). That way, for every listener, artist number 1 will have a score of 1
    2) Subtract 1 from the rank of the artists, so that artist number 1 becomes artist number 0, and so on
    3) Fit an exponential function to the profile

    The fit for C26000 was: score=exp(-0.0252*(rank-1)). His average ratio is therefore 98%: he listens to artist n+1, on average, 98% as often as he listens to artist n.
    My fit was: score=exp(-0.0356*(rank-1)). On average, I listen to artist n+1 97% as often as I listen to artist n.
    The fit for koal was: score=exp(-0.0899*(rank-1)). On average, she listens to artist n+1 40% as often as she listens to artist n.

    The coefficient of the fit therefore seems to be a good way to distinguish "non-exponential profiles" from "exponential profiles". There is another criterion that does the job: the R2, a goodness of fit indicator. It was 83% for C26000, 84% for me, and -37% for koal. In other words, an exponential function describes well the number of times C26000 and I have listened to our favourite artists (a perfect fit would yield an R2 of 100%), but not koal's.
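
The three steps and the two indicators can be sketched roughly as follows. This is only one plausible reading: the post does not say which tool or fitting method was used, so the least-squares fit on log(score) through the origin, the R2 definition (1 - SS_res/SS_tot on the normalized scores), the function name `fit_exponential` and the demo data are all assumptions.

```python
import math

def fit_exponential(plays):
    """Fit score = exp(-k * rank0) to play counts sorted high to low.

    Assumptions (not stated in the post): every play count is positive,
    k comes from a least-squares line through the origin on log(score),
    and R2 is 1 - SS_res / SS_tot computed on the normalized scores.
    """
    top = plays[0]
    scores = [count / top for count in plays]   # step 1: number 1 scores 1
    ranks = range(len(plays))                   # step 2: ranks start at 0
    # Step 3: minimise sum((log(score) + k * rank)^2) over k.
    k = -sum(r * math.log(s) for r, s in zip(ranks, scores)) / sum(r * r for r in ranks)
    fitted = [math.exp(-k * r) for r in ranks]
    mean = sum(scores) / len(scores)
    ss_res = sum((s - f) ** 2 for s, f in zip(scores, fitted))
    ss_tot = sum((s - mean) ** 2 for s in scores)
    return k, 1 - ss_res / ss_tot

# Made-up profile with decay coefficient 0.05 (not real Last.fm data).
demo = [1000 * math.exp(-0.05 * r) for r in range(50)]
k, r2 = fit_exponential(demo)
print(round(k, 4), round(r2, 4), round(math.exp(-k), 3))
```

The per-rank ratio implied by a coefficient k is exp(-k): about 0.975 for k = 0.0252, 0.965 for k = 0.0356 and 0.914 for k = 0.0899. With this R2 definition the value can go negative when the constrained curve fits worse than a flat line at the mean, which would be one way to end up with a figure like -37%.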

    Of course, I could have used another type of function (such as the linear one), and my method uses too many calculations anyway to be used routinely, but I found it funny that the names of the forums We Have Exponential Profiles and We Don't Have Exponential Profiles are mixed up. Hope you don't mind the intrusion!

  • Hhhhhmmmmm.... yeah, but the good thing about the formula as it stands is that thickies like me understand it :)

    Love peace and truth incorporated for all who seek
    • C26000 said...
    • User
    • Sep 2 2006, 7:04
    @leonelf

    first of all thank you very much for taking your time to analyse all this mathematical stuff

    leonelf said:
    It appears, following this logic, that a "non exponential profile" will have a constant ratio <number of listens to artist n>/<number of listens to artist n-1>, in other words his profile will best be described by an exponential function.


    If you are rigorous about the exponential function, that is true, but we can't be so rigorous; we should focus more on how the music is played. As you said, someone with an exponential profile is someone who shows a stronger preference for their top artists. In the following graph, the blue line shows a stronger preference for the top artists even if it has a constant ratio between the n and n-1 artists.



    leonelf said:
    The fit for C26000 was: score=exp(-0.0252*(rank-1)). His average ratio is therefore 98%: he listens to artist n+1, on average, 98% as often as he listens to artist n.
    My fit was: score=exp(-0.0356*(rank-1)). On average, I listen to artist n+1 97% as often as I listen to artist n.
    The fit for koal was: score=exp(-0.0899*(rank-1)). On average, she listens to artist n+1 40% as often as she listens to artist n.


    :)
    This was my first idea when I wanted to start the group; apparently it is the best option. You can classify how exponential a profile is based on the adjustable parameter of your formula, and you can sort the usernames from less to more exponential:

    C26000 0.0252
    leonelf 0.0356
    koal 0.0899

    It looks like an elegant solution, but in the application I found problems because not all profiles are easy to fit using a simple exponential function, especially those that are very exponential. As you have shown in your example, the R2 of koal's profile is -37%! That low number shows that the function you obtained for her profile is not well adapted, and any calculation derived from it is not reliable.

    leonelf said:
    There is another criterion that does the job: the R2, a goodness of fit indicator. It was 83% for C26000, 84% for me, and -37% for koal


    This is not a valid criterion: you can have a perfect fit for a very exponential profile and also a very bad fit for a more exponential profile.



    Now I'm going to explain my formula a little more. If you find some case in which my formula fails, please tell me and I'll try to correct it immediately.

    The logic behind my formula is really simple and it has worked really well.

    AEP = 5 - 25 *( Slope / AverageTop50 )

    The Slope term measures how fast the number of times listened decreases,

    and the AverageTop50 has 2 functions:

    1. To normalize the Slope value

    2. To add some extra points to profiles that are less exponential: in an exponential profile the AverageTop50 is lower than in a non-exponential profile, and this increases the value of AEP when the profile is less exponential because it makes the term 25*(Slope/AverageTop50) smaller.
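
A minimal sketch of the formula in Python. The post does not spell out how Slope is computed, so the definition used here (plays of the first artist minus plays of the last, divided by the number of artists) is an assumption; it was chosen because it reproduces the "AEP TOP 10 = 0" values quoted for the two example profiles later in this thread. The function name `aep` is likewise made up.

```python
def aep(plays):
    """AEP = 5 - 25 * (Slope / Average) for play counts sorted high to low.

    Assumption: Slope = (first - last) / number_of_artists. This is a guess
    that matches the top-10 examples in the thread, not a confirmed definition.
    """
    n = len(plays)
    slope = (plays[0] - plays[-1]) / n
    average = sum(plays) / n
    return 5 - 25 * slope / average

flat = [100] * 10                                  # no preference at all
step = [100] * 5 + [0] * 5                         # top-10 example from the thread
mixed = [100, 90, 75, 65, 45, 45, 45, 25, 10, 0]   # second top-10 example

print(aep(flat), aep(step), aep(mixed))
```

Under this reading a completely flat profile gets the maximum value of 5, and the score drops as the gap between the first and last artist grows relative to the average.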




    oooaaaa!

    you kept me busy for almost 4 hours

    anyway as i have just graduated i'm an unemployed chemical engineer with no girlfriend :)

  • Cool stuff...

    neocronos
  • Wouldn't exponential regression make more sense? It seems to be what you are going after. The exponential coefficient would be a number > 0; the higher it is, the 'more exponential' your profile is.

    • C26000 said...
    • User
    • Dec 31 2006, 3:20
    That's right, it makes more sense, but the application is another story; quoting myself from this same thread:

    It looks like an elegant solution, but in the application I found problems because not all profiles are easy to fit using a simple exponential function, especially those that are very exponential. As you have shown in your example, the R2 of koal's profile is -37%! That low number shows that the function you obtained for her profile is not well adapted, and any calculation derived from it is not reliable.

    • nwo said...
    • User
    • Apr 2 2007, 23:00
    This post continues a discussion from http://www.last.fm/group/We+Don%27t+Have+Exponential+Profiles/journal/2006/05/4/129052.
    Basically I state that the AEP is only a linear approximation based on the 1st and 50th track, with the exception of your average term, which makes this approximation a rational one. There is no exponential part in the formula, but that does not necessarily mean it does not do its job.

    @C26000
    If you are rigorous about the exponential function, that is true, but we can't be so rigorous; we should focus more on how the music is played. As you said, someone with an exponential profile is someone who shows a stronger preference for their top artists. In the following graph, the blue line shows a stronger preference for the top artists even if it has a constant ratio between the n and n-1 artists.


    I don't want to seem like a mathematics teacher, but the blue curve does not have a constant ratio but a constant difference (between two arbitrary intervals of same lengths). That is exactly the reason why the blue line is linear and NOT exponential. Also note that an exponential fit for the blue graph would create a smaller coefficient (a bit like koal) and also a coefficient of determination R2 (I didn't know about this before) which is significantly lower than 1 (not as extreme as for koal). So clearly in this example a real exponential fit would prefer the pink graph with both the R2 and exp-coefficient criteria.
    But you are right to point out that the R2 criterion is meaningless if it is used alone.
    As a consequence I think the exponential coefficient is the best measure for an "exponential profile" both mathematically and practically.

    But maybe I can give you some more disadvantages of the AEP. You sent me an Excel sheet
    (http://www.sendspace.com/file/9bsam1), which should demonstrate that the AEP takes all values for every artist between the 1st and the 50th rank into account. My claim was wrong, your statement is clearly right, however I only need one additional constraint. If two profiles have the same sum of played tracks in their top50, then the AEP only further depends on the slope of the 1st and 50th track. Whereas the distribution between the 2nd and 49th place can be completely arbitrary and does not have to be exponential at all.
    For example, people who have one artist they favor a little but apart from this a really "non-exponential" profile are really punished by the AEP. Maybe this is what you want, but it does not conform to the definition of an _exponential_ profile.
    I prefer the definition that a non-exponential profile means that you do not listen to only a few favorite artists. A real exponential coefficient would support this definition much more than the AEP.

    Still the AEP is not useless, so I hope you don't feel offended by my analysis. Maybe I am really missing some important part or maybe it is just too late and I should go and get some sleep.

    • C26000 said...
    • User
    • Apr 4 2007, 6:26
    nwo said:
    so I hope you don't feel offended by my analysis


    Not at all :)

    nwo said:
    I don't want to seem like a mathematics teacher, but the blue curve does not have a constant ratio but a constant difference (between two arbitrary intervals of same lengths).


    oh, you are right sorry.

    nwo said:
    As a consequence I think the exponential coefficient is the best measure for an "exponential profile" both mathematically and practically.


    It has some problems; take a look at the following graphic and analysis.



    Visually, Profile 2 is less exponential than Profile 1, and the AEP agrees with that because AEP2 (3.88) > AEP1 (3.57). But if we use the exponential approximation (a^(b*(X-50))+c) and use the coefficient b as an indicator of exponentiality, we have b1 (0.0201) < b2 (0.0203), which implies that Profile 2 is more exponential than Profile 1 (the larger b is, the more exponential the profile). That contradiction arises because an exponential function doesn't fit Profile 2 very well; that's the main problem with using exponential fitting to calculate the exponentiality of a musical profile.


    nwo said:
    however I only need one additional constraint. If two profiles have the same sum of played tracks in their top50, then the AEP only further depends on the slope of the 1st and 50th track. Whereas the distribution between the 2nd and 49th place can be completely arbitrary and does not have to be exponential at all.



    The values of the artists between 2nd and 49th can't be completely arbitrary: they must lie between the 1st and 50th place values and must always be decreasing. Those are important restrictions:




    The AverageTop50 term includes a sum of all the top 50 values and is directly proportional to the area under the profile curve. And since all profile curves are always decreasing, it is very hard to find 2 profiles with the same Slope and AverageTop50 values that are very different from each other, and that is even harder to do with very non-exponential profiles (orange points and line). With the other types of profile it is somewhat easier to find different profiles that have the same Slope and AverageTop50 values, but even if you can find them, your logic more or less tells you that they should have the same AEP*. In the graph I show different profiles (points) and attempts to make different profiles (thin lines) with the same AEP (the same area under the profile curve as the dotted profiles).


    * For example, the profile with the purple points has a strong preference for its top artists but a more diverse top sub 20 chart than the purple line profile, which has a weaker preference for its top artists. In this case both profiles have the same AEP because there are 2 contrary effects.


    here is the xls file if you want to play with some numbers :) -->

    http://rapidshare.com/files/24241350/aep.xls.html

    • nwo said...
    • User
    • Apr 4 2007, 19:51
    That was really a lot of work I think, thanks.
    Can you do me a favor and try dividing every value by the area under the curve (Sum(1:50))? That would make it a discrete random variable. After this, try the exponential fit. The coefficient for P2 should now be smaller. Maybe it will still be greater than P1's coefficient, or not significantly small enough. But let's give it a try.
    By the way, which exponential fit do you apply? You didn't write that in your Excel sheet.

    O.k., I don't have much time left. But as hard as I think, I can't really find a bad example against the AEP for the 2nd..49th place. It depends on the slope and you can either

    • choose one constant c for all the places 2..49, or
    • choose one constant (max = value of 1st) for the first few values and one constant (min = value of 50th) for the last. (And maybe one for the value in between.)



    Those two possibilities would result in the same AEP. But it is hard to say, if this is really that bad in practice.

    • C26000 said...
    • User
    • Apr 4 2007, 21:08
    nwo said:
    Can you do me a favor and try dividing every value by the area under the curve (Sum(1:50))? That would make it a discrete random variable. After this, try the exponential fit. The coefficient for P2 should now be smaller. Maybe it will still be greater than P1's coefficient, or not significantly small enough. But let's give it a try.


    I did it and it isn't smaller, b2 =0.021777 and b1=0.02

    nwo said:
    By the way, which exponential fit do you apply? You didn't write that in your Excel sheet.


    I used this function

    E(Pos) = a^(b*(Pos-50))+c

    and I adjusted a,b and c with solver making R2 min.
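
A small Python sketch of this fitting function, offered as an observation rather than a correction. Since a^(b*x) equals exp(b*ln(a)*x), only the product b*ln(a) is determined by the data; if a was left free in the solver, the value of b on its own may not be directly comparable between profiles. The parameter values below are made up for illustration.

```python
import math

def model(pos, a, b, c):
    """E(Pos) = a^(b*(Pos-50)) + c, as written in the post."""
    return a ** (b * (pos - 50)) + c

# Two made-up parameter sets with the same product b * ln(a).
curve_1 = [model(pos, 0.5, 0.02, 1.0) for pos in range(1, 51)]
curve_2 = [model(pos, 0.25, 0.01, 1.0) for pos in range(1, 51)]

same = all(math.isclose(u, v) for u, v in zip(curve_1, curve_2))
print(same)
```

Fixing a (for example a = e) before fitting would make b comparable across profiles.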

    • nwo said...
    • User
    • Apr 5 2007, 0:09
    Strange, the parameters are nearly the same as before, even though the values are now between 0 and 1?
    However, it is probably right that the exponential coefficient is smaller for the blue line. Interesting, but maybe the AEP is in this case really a better estimator for an exponential profile than a real exponential approximation.

    I hereby surrender and will now follow the one and only true (AEP) faith. Although if I have some time, I will try a least-squares approximation instead of the R2-approximation. And maybe impose some conditions on the parameters 'a' or 'c' by requiring that the curve should go through the first or last point. For now let's call it settled.

    • Geoj4 said...
    • User
    • Jan 28 2008, 14:02

    Entropy, anyone?

    Can you elaborate on why you are not using classical measurements like Shannon's entropy? Not only would this allow you to combine measurements of subsets, but it would in the end be much more meaningful in terms of "how much space is needed to encode my playlist". Or in other words, how widespread is the distribution of the music I listened to, independent of the monotonic order of the elements.

    • C26000 said...
    • User
    • Jan 28 2008, 14:47
    Well, I can't elaborate much on that ;). The first reason is that I didn't know about Shannon's entropy before (I just read something on Wikipedia and I still don't know how to apply it to our problem; can you give me an example?), and the second reason is that the AEP has worked reasonably well: it doesn't have serious flaws or exceptional cases that require a redesign of the formula.

    • Geoj4 said...
    • User
    • Jan 30 2008, 8:12
    Hi, I hope my post didn't come across too much as criticism. I would probably be much more interested in comparing the measures directly. (Probably together with some subjective evaluation of different profiles, for after all these are in the end subjective benchmarks we're talking about.)

    The entropy term is rather easy to calculate. The distribution of "number of plays" has to be normalized into a probability distribution (simply take P(x) = x / sum of all values in X). Then apply Shannon's entropy formula for discrete channels.

    H = - \sum p(x) log_2(p(x))

    and you should have some evaluation of the uncertainty within the distribution.
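
A minimal sketch of this calculation in Python, assuming play counts are given as a plain list. The function name `shannon_entropy` is made up, and zero counts are skipped using the usual convention that 0 * log(0) counts as 0.

```python
import math

def shannon_entropy(plays):
    """Entropy in bits of play counts normalized to p(x) = x / sum(X)."""
    total = sum(plays)
    probabilities = [x / total for x in plays if x > 0]   # 0 * log(0) := 0
    return -sum(p * math.log2(p) for p in probabilities)

even = [25, 25, 25, 25]     # made-up profile: four artists played equally
one_fan = [97, 1, 1, 1]     # made-up profile: one dominant artist

print(shannon_entropy(even), shannon_entropy(one_fan))
```

Higher values mean the plays are spread more evenly; four equally played artists give exactly 2 bits.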

    • C26000 said...
    • User
    • Jan 30 2008, 15:54
    No problem, I like to see new approaches to the same problem. Can you please give us an explicit formula for what you are talking about? Sorry, I'm not a statistics expert :S

    • Geoj4 said...
    • User
    • Feb 4 2008, 9:06
    I am no statistics expert either. The Shannon entropy is a foundational information theory measure. Just take the above formula without my typos.

    X = the set of observed values (e.g. the values in the top 50)
    p(x) = P(x) ; yep, that was a typo

    and then take the sum in the entropy formula over all x in X, and you're done.

    The beauty of this approach is, that you easily can
    a) normalize it
    b) break it down into subsets (and calculate with it)

    ad a) A normalized entropy would be the quotient of the entropy using p(x) and the entropy of a uniform distribution p_unif = 1/|X|. This would map to [0,1] and you could e.g. compare the normalized entropy of someone's top 50 with someone else's top 100. (Not that I deem it very useful, but it's a nice feature.)
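
Point a) could look roughly like this in Python. The entropy of a uniform distribution over |X| entries is log2(|X|), so the quotient is H / log2(|X|). The function names and the sample lists are made up for illustration.

```python
import math

def entropy_bits(plays):
    """Shannon entropy in bits of play counts normalized to probabilities."""
    total = sum(plays)
    return -sum((x / total) * math.log2(x / total) for x in plays if x > 0)

def normalized_entropy(plays):
    """Entropy divided by that of a uniform distribution over len(plays) entries."""
    return entropy_bits(plays) / math.log2(len(plays))

uniform_50 = [10] * 50             # made-up, perfectly flat top 50
uniform_100 = [3] * 100            # made-up, perfectly flat top 100
single_artist = [500, 0, 0, 0]     # made-up, everything on one artist

print(normalized_entropy(uniform_50),
      normalized_entropy(uniform_100),
      normalized_entropy(single_artist))
```

Both flat lists come out at (approximately) 1 regardless of their length, which is what makes a top 50 comparable with a top 100.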

    ad b) some decomposition into entropies of subchoices:
    H(top100) =
    H(w_top50, w_50till100) + w_top50 * H(top50) + w_50till100 * H(50till100)

    with w_top50 = sum(top50)/sum(top100) and w_50till100 = sum(50till100)/sum(top100).

    Where top50 is the set of values of the Top 50 and 50till100 is the set of values from 51 till 100's place.
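
A numeric check of this decomposition, using the grouping property of Shannon entropy with each subset weighted by its share of the total plays (w = sum(subset) / sum(all)). The two short lists stand in for "top50" and "50till100" and are made up; the function names are hypothetical as well.

```python
import math

def entropy(weights):
    """Shannon entropy in bits of non-negative weights, normalized internally."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)

def grouped_entropy(first, second):
    """Entropy of the combined list rebuilt from the entropies of two subsets."""
    s1, s2 = sum(first), sum(second)
    total = s1 + s2
    return (entropy([s1, s2])
            + (s1 / total) * entropy(first)
            + (s2 / total) * entropy(second))

upper = [100, 80, 65, 50]   # stand-in for the top50 values
lower = [50, 35, 20, 10]    # stand-in for the 50till100 values

print(entropy(upper + lower), grouped_entropy(upper, lower))
```

Both numbers agree up to floating-point rounding, so the entropy of the whole chart can be rebuilt from the entropies of its parts.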

    Right, b) is of little practical use for us, but, it's probably nice. You could, for example make a combined measure for someone with a split persona.


    Nay, please excuse me, I don't want to advertise here or criticise your formula. I would simply be interested in seeing both measures combined.

    • Geoj4 said...
    • User
    • Feb 4 2008, 14:43
    I think I just grasped the main problem of your distribution. Essentially you are relating the slope to the mean of your distribution, which is not bad, but you do not really consider the slope of the function.

    One could argue, but essentially the first example below would be just a special case of someone with a very narrow musical selection, while the second example is at least somewhat specialized. The AEP of both would be the same nonetheless.

    exmpl 1
    100 100 100 100 0 0 0 0

    exmpl 2
    100 80 65 50 50 35 20 0

    In the entropy measure both should be distinguishable.

    Am I correct?
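
A rough check of this question in Python, running both measures on the two example lists. The AEP slope definition used here (first value minus last, divided by the number of entries) is an assumption that is not confirmed anywhere in the thread, and both function names are made up.

```python
import math

def aep(plays):
    """5 - 25 * Slope / Average, assuming Slope = (first - last) / count."""
    n = len(plays)
    return 5 - 25 * ((plays[0] - plays[-1]) / n) / (sum(plays) / n)

def entropy_bits(plays):
    """Shannon entropy in bits; zero play counts contribute nothing."""
    total = sum(plays)
    return -sum((x / total) * math.log2(x / total) for x in plays if x > 0)

example_1 = [100, 100, 100, 100, 0, 0, 0, 0]
example_2 = [100, 80, 65, 50, 50, 35, 20, 0]

for name, plays in (("exmpl 1", example_1), ("exmpl 2", example_2)):
    print(name, aep(plays), round(entropy_bits(plays), 2))
```

Under these assumptions both lists get the same AEP, because they share the same first value, last value and sum, while the entropy is 2 bits for the first list and about 2.66 bits for the second, so the entropy measure does tell them apart.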

    • C26000 said...
    • User
    • Feb 4 2008, 23:05
    I don't know. I know that the AEP of the following profiles is the same, but it's not clear to me whether that's fair or not.


    100
    100
    100
    100
    100
    0
    0
    0
    0
    0

    AEP TOP 10 = 0



    100
    90
    75
    65
    45
    45
    45
    25
    10
    0

    AEP TOP 10 = 0

    What values does your approach produce for those profiles?
