Infodump updates: contact dates, comment length, metatalk closures, munging
December 14, 2009 7:20 AM
Infodump update: we've added a few new features.
New since the August relaunch:
1. Comment length files. For folks interested in analyzing how the general size of a comment correlates to other aspects of site activity, you can now work with number-of-characters information about each comment on mefi, askme, meta and music. These are stored in new files separate from the existing commentdata files.
2. Metatalk thread closure information. We've had a "deleted" column in the postdata files previously, listing a 0 for undeleted and 1 for deleted threads, but now that column in the metatalk file can also have a value of 2 for closed threads and 3 for (rare) threads that are both closed and deleted.
3. Contact creation dates. If you're interested in looking at networking activity over time, you can now explicitly examine contact info in that light. Some of this info is approximate, since we didn't originally track creation date in that table. Details are on the wiki.
4. ID munging. On request from one user back in August, there's now an ID-munging function built into the Infodump scripts, which, for any user who specifically requests to be on the munge list, swaps out their actual userid for a unique 7-digit fake id throughout the dump. It's a very low hurdle to identification, but it's there, for whatever that's worth. Folks doing any analysis that makes assumptions about userids themselves as meaningful values should be aware of and account for this in setting up their analyses.
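For anyone parsing the metatalk postdata file, the four values in the "deleted" column can be read as two independent yes/no flags: 1 for deleted, 2 for closed, with 3 meaning both. That reading is an interpretation that fits the values listed in item 2, not a documented spec. Here is a minimal Python sketch; `decode_status` and the flag constants are hypothetical names.

```python
# Codes in the metatalk postdata "deleted" column, as described in item 2:
# 0 = normal, 1 = deleted, 2 = closed, 3 = closed and deleted.
# Treating them as bit flags (1 = deleted, 2 = closed) is an assumption
# that happens to fit all four values.
DELETED_FLAG = 1
CLOSED_FLAG = 2


def decode_status(code: int) -> dict:
    """Split a metatalk 'deleted' code into separate deleted/closed booleans."""
    if code not in (0, 1, 2, 3):
        raise ValueError(f"unexpected status code: {code}")
    return {
        "deleted": bool(code & DELETED_FLAG),
        "closed": bool(code & CLOSED_FLAG),
    }


if __name__ == "__main__":
    for code in range(4):
        print(code, decode_status(code))
```

For the mefi, askme and music files, only 0 and 1 are described, so the same function works there unchanged.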
My to-do list is now completely clear. If folks have other Infodump additions they'd like to see in the future, let me know.
Also, there have been some interesting graphs coming out of this post-November thread, in case you're interested in datawankery but missed it somehow.
aww... I was just coming in here to poke fun about the story in the blue.
but, since you've put the kibosh on that, all I can say is thanks for the good work. I've never used the infodump myself, but I think it's super cool that you guys go to the effort to make this information so easily available.
/tip of the hat.
posted by 256 at 7:26 AM on December 14, 2009
wag of the finger.
posted by gman at 7:27 AM on December 14, 2009 [1 favorite]
pump of the rump.
posted by Secret Life of Gravy at 7:31 AM on December 14, 2009
Also, since I'm not positive I hadn't made a couple small script revisions since the last run of the dump, I'm regenerating it now. The contacts data is I think the only section that would be affected by this, however, so if you're raring to go on the rest of it you're all set immediately.
posted by cortex (staff) at 7:33 AM on December 14, 2009
Stan: Those New Yorker kids are gonna be here any second, and we still don't know what queef means.
Kyle: Well, we can still pre-tend like we know what it means.
Stan: No, they'll catch on. Hey, wait a minute. I've got a great idea. Let's make up our own word. We can make up a word, and then use it, …and then they'll act like they know it, and then we'll bust 'em.
Kyle: Yeah. That'll make 'em look stupid!
Stan: What word could we make up?
Kyle: How about… finkleroy?
Stan: No, uhno, not finkleroy.
Cartman: How about geebo, or, or mung?
Stan: Yeah, mung.
Kyle: Mung's good.
Stan: Sh. Here they come. [the New Yorkers arrive]
Tough Guy 1: Well hel-lo there, queefs. All bundled up nice and warm, are we?
Stan: You know what you guys are? You guys are nothing but mung?
Tough Guy 2: We're not mung. You're mung.
Kyle: Oh, so you know what mung means, hunh?
Tough Guy 1: Of course we know what mung means!
Athlete: Yeah, D'ya think we wouldn't know what mung means? [Stan laughs, then Kyle, Cartman, and Kenny join in]
Stan: We busted you!
Kyle: Hyeh. Yeah. Mung isn't even a word! We made it up! [they resume laughing]
Tough Guy 1: You guys are even stupider than I thought! Mung is so a word!
Stan: [the boys stop laughing] It is?
New Yorkers: [behind the two toughs and two others] Yeah. [they turn around]
Athlete: It sure is.
New Yorker 1: Yeah.
New Yorker 2: Uh huh. [turns around]
Tough Guy 1: Yeah! Mung is the stuff that comes out when you push down on a pregnant woman's stomach.
Kyle: [winces] Eewww.
Stan: Ooogh.
Tough Guy 1: You guys didn't know that? [the rest of the New Yorkers turn around and they all laugh. Then, the rest of the 4 million+ kids laugh with them] Come on, guys. Let's get away from these rednecks before we get redneckasitis, or somethin'! [they leave. Stan, Kyle, and Kenny turn on Cartman]
Stan: You dumbass, Cartman!
Kyle: Yeah! Next time you make up a word, don't make up one that already exists!
posted by jefficator at 7:56 AM on December 14, 2009
My to-do list is now completely clear.
Perfect! Could you go run some errands for me?
posted by amyms at 8:00 AM on December 14, 2009
Quiver of the liver.
posted by the littlest brussels sprout at 8:01 AM on December 14, 2009
I was playing around with the infodump just yesterday. I wanted to know how many users had achieved the triple-K, which I define as >= 1000 comments on each of MeFi, Ask, and MeTa. To my surprise it was a relatively low number: 44 people.
I fess up: I'm 12 blue comments away from achieving this and I was a little curious how common it was.
posted by Rhomboid at 8:17 AM on December 14, 2009
I guess it doesn't hurt, but for anyone that wants the info, the userid masking is so easy to get around as to seem nearly pointless.
posted by gsteff at 8:17 AM on December 14, 2009
Fart of the heart?
posted by Captain Cardanthian! at 8:22 AM on December 14, 2009
cortex is a self-linker!
posted by cjorgensen at 8:28 AM on December 14, 2009
Rhomboid: "To my surprise it was a relatively low number: 44 people."
Considering I'm 2.5-K (I've got exactly 500 in the green), and I consider myself far from prolific, I'm shocked that it's that low.
posted by Plutor at 8:30 AM on December 14, 2009 [1 favorite]
Yeah, me too. I didn't investigate any farther but I suspect that there's shitloads of people with a kilo of blue and a crapload with a kilo of green, and a ton with both, but also having the kilo of grey really narrows the field.
posted by Rhomboid at 8:31 AM on December 14, 2009
(And BTW that's with the Dec-6 dataset, not that I'd expect it to change much over a week.)
posted by Rhomboid at 8:37 AM on December 14, 2009
I was playing around with the infodump just yesterday. I wanted to know how many users had achieved the triple-K, which I define as >= 1000 comments on each of MeFi, Ask, and MeTa. To my surprise it was a relatively low number: 44 people.
I'd say "Challenge accepted," but I don't think there's any possible way I could make another 883 quality answers on the green in less than a long long time. I'm just not that smart.
posted by Caduceus at 8:49 AM on December 14, 2009
Time certainly does seem to be a factor. Of the 44, only 4 signed up in 2006 or later; 25 were pre-$5-signup users.
posted by Rhomboid at 9:02 AM on December 14, 2009
I'm another 2.5Ker (500 AskMe answers). I'd be curious how many people break the 1000 comment barrier in MetaTalk.
posted by Kattullus at 9:08 AM on December 14, 2009
To my surprise it was a relatively low number: 44 people.
only 4 signed up in 2006 or later; 25 were pre-$5-signup users.
I guess I shouldn't be surprised to be on this list - my nickname when I was a kid was motormouth.
posted by rtha at 9:09 AM on December 14, 2009
Ohh, I'm getting closer! Is there a prize? I like prizes!
posted by iamkimiam at 9:11 AM on December 14, 2009
I wanted to know how many users had achieved the triple-K
*checks profile*
Damn it. I guess I'm going to be spending some time in Askme.
posted by quin at 9:13 AM on December 14, 2009
Burhanistan: "24Heh. And even stranger"
Somebody should drop him a MeFi Mail..."Don't touch anything!"
posted by iamkimiam at 9:15 AM on December 14, 2009 [1 favorite]
There are only 44 and I'm one of them? Maybe I should spend less time here.
posted by grouse at 9:16 AM on December 14, 2009 [1 favorite]
Here's the full breakdown:
blue-K: 634
green-K: 287
grey-K: 114
blue-K + grey-K: 108
blue-K + green-K: 142
grey-K + green-K: 45
triple-K: 44
posted by Rhomboid at 9:23 AM on December 14, 2009
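The breakdown above can be reproduced along these lines. This Python sketch assumes per-user comment totals for each subsite have already been tallied from the commentdata files into a dictionary (that input shape is hypothetical) and uses the >= 1000 threshold from the triple-K definition. The pairwise counts are inclusive, which is why every triple-K user is also counted in each pair.

```python
from itertools import combinations

THRESHOLD = 1000  # ">= 1000 comments" per subsite, per the triple-K definition


def k_breakdown(counts: dict[str, dict[str, int]]) -> dict[str, int]:
    """Count users at or above THRESHOLD on each subsite and each combination.

    `counts` maps a user id to per-subsite totals, e.g.
    {"u1": {"blue": 1200, "green": 40, "grey": 1001}}.
    Combination counts are inclusive: "blue+grey" also includes triple-K users.
    """
    sites = ("blue", "green", "grey")
    # For each subsite, the set of users who cleared the threshold there.
    qualified = {
        site: {uid for uid, c in counts.items() if c.get(site, 0) >= THRESHOLD}
        for site in sites
    }
    result = {}
    for size in (1, 2, 3):
        for combo in combinations(sites, size):
            users = set.intersection(*(qualified[s] for s in combo))
            result["+".join(combo)] = len(users)
    return result
```

A missing subsite in a user's entry counts as zero comments there, so users who never posted to one subsite simply drop out of every combination that includes it.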
Man I wish google chart's venn diagrams weren't such shit.
posted by cortex (staff) at 9:28 AM on December 14, 2009
I'm deeply disturbed to be in a group of people 108 strong. If it turns out we all spent too much time watching Lost I think I might just 'splode.
posted by Kattullus at 9:33 AM on December 14, 2009
Kattullus: "If it turns out we all spent too much time watching Lost I think I might just 'splode."
If I said "there's no such thing as 'too much time watching Lost'" would that help your head situation?
posted by Plutor at 9:41 AM on December 14, 2009
So green K is for Superman, everyone knows that; I'm pretty sure blue K works on Bizarros, but what's this grey K for?
posted by jtron at 9:52 AM on December 14, 2009 [1 favorite]
I stopped watching when they had that incredibly contrived mudwrestling scene at the end of season 3. Did it ever pick up again?
posted by Kattullus at 9:52 AM on December 14, 2009
Obligatory "I want Markovfilter to come back" post.
posted by flatluigi at 10:06 AM on December 14, 2009 [1 favorite]
I'm deeply disturbed to be in a group of people 108 strong
I think maybe you, me, and the other hundred and six should get together and establish some kind of club.
We'll call ourselves the MetaMilitia and sit around drinking and cursing the cabal (all while secretly trying to figure out how to get in ourselves).
Not that there is one, of course.
posted by quin at 10:13 AM on December 14, 2009
There are only 44 and I'm one of them? Maybe I should spend less time here.
posted by grouse
Uhhh ditto. And that's not counting my sockpuppet.
But I like calling us The Fab 44, so there's that.
posted by The Deej at 10:14 AM on December 14, 2009
I think "triple K" has bad connotation though.
posted by The Deej at 10:19 AM on December 14, 2009 [1 favorite]
I guess the SQL DDLs need to be reworked to include the new columns too.
posted by smackfu at 10:21 AM on December 14, 2009
I think "triple K" has bad connotation though.
Agreed. Maybe the "Three G" instead? Or, as I like to call you, "Bastards who have answered more questions than me".
posted by quin at 10:39 AM on December 14, 2009
Huh, I am 6 away from elite 44 status. Time for an alphabet thread!
posted by Rumple at 11:06 AM on December 14, 2009
Let's just shorten it to 'Dumpers.
Good, it's settled... 'Dumpers.
posted by the littlest brussels sprout at 11:06 AM on December 14, 2009
So what's the status of MarkovFilter? I tried finding it the other night but ran into some 404's :(
posted by localhuman at 11:07 AM on December 14, 2009 [1 favorite]
It remains in Unsolved Security Issue territory. Me and pb have talked about a couple possible ways to try and make it resurrectable without any worries, but it's a little complicated and just hasn't been that high of a priority.
posted by cortex (staff) at 11:32 AM on December 14, 2009
I'm a bit imbalanced, so I need to post more in Metatalk.
posted by smackfu at 11:33 AM on December 14, 2009
smackfu: I'm a bit imbalanced, so I need to post more in Metatalk.
I believe it's in fact because you're balanced that you don't post that much in MetaTalk.
I was mighty perturbed to realize that I am fast approaching 2000 MetaTalk comments. And I seriously never thought of myself as a top 100 MetaTalk commenter.
posted by Kattullus at 11:37 AM on December 14, 2009
I guess the SQL DDLs need to be reworked to include the new columns too
I'm on it.
posted by FishBike at 11:56 AM on December 14, 2009 [1 favorite]
Huh, I'm only 36 (er, 35) comments away from being blue-K + grey-K.
posted by Pronoiac at 12:22 PM on December 14, 2009
from
posted by OmieWise at 12:34 PM on December 14, 2009 [1 favorite]
three.
I really didn't mean this joke to go on this long.
posted by OmieWise at 12:35 PM on December 14, 2009
Ok, I've updated the SQL scripts to create and load Infodump databases. If you have any trouble with these, MeFiMail me.
Also, as I don't have MySQL, I can't test the scripts for it, so would someone please volunteer to try the two MySQL scripts and let me know if they work for you?
posted by FishBike at 12:46 PM on December 14, 2009
Hey cortex, on a related note, I noticed a few months back that MarkovFilter was down. (I wanted to show a friend what distilled me sounded like and was disappointed that he had to make do with undistilled me.) It's still down. Is this intentional? Any chance we could get it running again?
posted by painquale at 12:52 PM on December 14, 2009
Since August, it's been an open question whether the jump in contacts was due to spouses or the Tenth Anniversary parties. I've compiled a graph of contact adding dates, using the new timestamp data (thanks, cortex!). You can see the stunning conclusion there.
spoiler: *shruggo*
posted by Pronoiac at 12:54 PM on December 14, 2009
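A sketch of the monthly bucketing behind a graph like that, in Python. It assumes the contact timestamps have been pulled out as strings beginning with a YYYY-MM-DD date; the real file's format may differ, so adjust the parsing to match. Since some creation dates are approximate (the table didn't originally track them), counts for older contacts should be read loosely.

```python
from collections import Counter
from datetime import datetime


def contacts_per_month(timestamps: list[str]) -> list[tuple[str, int]]:
    """Count contact additions per calendar month, oldest month first.

    Assumes each timestamp starts with a date like "2009-08-15"
    (e.g. "2009-08-15 21:04:33"); anything after the date is ignored.
    """
    months = Counter()
    for ts in timestamps:
        # Only the date part is needed to bucket by month.
        added = datetime.strptime(ts[:10], "%Y-%m-%d")
        months[added.strftime("%Y-%m")] += 1
    # "YYYY-MM" keys sort lexically in chronological order.
    return sorted(months.items())
```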
Wow, the 3K and 2-out-of-3K numbers are indeed surprisingly low. Can we attach usernames to these numbers somehow, or is that unkosher? I'm curious now.
posted by goodnewsfortheinsane at 12:59 PM on December 14, 2009
Er, yeah I just looked at the pastebin dump. Can you maybe do this for the "2 out of 3"s?
posted by goodnewsfortheinsane at 1:00 PM on December 14, 2009
Incidentally, I love the comment length data. Some more charts will be forthcoming as a result of this.
I'm not too sure what the MeTa thread closure information will be good for, but I assume somebody had a use for it already, and that's why it's there now.
Contact creation dates, even approximate ones, have a variety of cool uses. They're useful in the analysis of how a person's contact network influences their use of the site, because now we can tell roughly what that network looked like on any particular date, instead of just what it looks like now. And this adds another dimension to the contact network visualization possibilities.
I have a couple of questions about the userid munging:
- Is the userid in the usernames table also munged? (From a quick inspection, I think it's not.)
- Is a munged userid munged to the same value in all the tables where it appears, other than the usernames table? (I hope it is).
posted by FishBike at 1:05 PM on December 14, 2009
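FishBike's point about seeing what a contact network looked like on a particular date comes down to filtering on the creation timestamp. A minimal Python sketch, assuming the contacts have already been parsed into (user_id, contact_id, added_on) tuples; those field names are hypothetical, not the Infodump's column headers. It treats contacts as one-way, since adding someone doesn't mean they've added you back.

```python
from datetime import date


def network_as_of(contacts, cutoff: date) -> dict[int, set[int]]:
    """Rebuild who had added whom as a contact on or before `cutoff`.

    `contacts` is an iterable of (user_id, contact_id, added_on) tuples,
    with added_on already parsed into a datetime.date. Edges are directed.
    """
    network: dict[int, set[int]] = {}
    for user_id, contact_id, added_on in contacts:
        if added_on <= cutoff:
            network.setdefault(user_id, set()).add(contact_id)
    return network
```

Because the early dates are approximate, snapshots from before the creation date was tracked will be fuzzy too.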
Hmm. I'm 522..er 521 grey comments out of contention.
I don't think I have enough _________ to make up that ground before 2k becomes the new 1k.
posted by juv3nal at 1:15 PM on December 14, 2009
We need more Metafilter Achievements, with unlockable DLC and special profile themes.
posted by backseatpilot at 1:16 PM on December 14, 2009 [3 favorites]
Is the userid in the usernames table also munged? (From a quick inspection, I think it's not.)
Nope. That's left as normal; low a barrier as the munging is, it'd be even lower if the username table explicitly identified the munged id with the username, heh.
Is a munged userid munged to the same value in all the tables where it appears, other than the usernames table? (I hope it is).
It is. Every munge is unique from every other munge (through a not-very-fancy arithmetic transform), and the munge should remain static over time; the transformation function won't need revisiting until we close in on userid 1,000,000, and at that point we'll all be driving around the post-apocalyptic landscape in dune buggies, wearing leather and spikes and discussing who it is, exactly, that runs Bartertown.
Obviously it's possible to demunge manually where necessary, but for politeness' sake it'd be good to take apparent munging as a hint to not include a munged user's identity in any name-based results generated from analysis.
posted by cortex (staff) at 1:19 PM on December 14, 2009
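Following cortex's answers, munged ids can be spotted without reversing the transform: they're 7-digit numbers, and since the usernames table keeps real ids, they won't appear there. Here is a Python sketch of that heuristic, which also drops munged users from any name-based output as requested; `split_munged` and `named_results` are hypothetical helpers, and the check only holds while real userids stay below 1,000,000.

```python
def split_munged(userids, known_ids: set[int]) -> tuple[set[int], set[int]]:
    """Separate ids that look munged from ordinary ones.

    Heuristic: a munged id is a 7-digit number (1,000,000 to 9,999,999)
    that doesn't appear in the usernames table, which keeps real ids.
    """
    munged, ordinary = set(), set()
    for uid in userids:
        if 1_000_000 <= uid <= 9_999_999 and uid not in known_ids:
            munged.add(uid)
        else:
            ordinary.add(uid)
    return munged, ordinary


def named_results(scores: dict[int, float], usernames: dict[int, str]) -> dict[str, float]:
    """Attach usernames to per-user results, leaving munged users out."""
    munged, _ = split_munged(scores, set(usernames))
    return {
        usernames[uid]: score
        for uid, score in scores.items()
        if uid not in munged and uid in usernames
    }
```

Munged users still count in aggregate totals; they just never get a name attached.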
Obviously it's possible to demunge manually where necessary, but for politeness' sake it'd be good to take apparent munging as a hint to not include a munged user's identity in any name-based results generated from analysis
Cool, thanks for the clarifications. Yeah, I assumed the point was to prevent accidental disclosure of names for those MeFites who don't want to appear in the results of our datawankery efforts. This is a good way of doing that as it would be hard to include them by accident.
Anyone who includes them on purpose, well, I guess they just get the rubber hose treatment, or similar.
posted by FishBike at 1:33 PM on December 14, 2009
I've got over 3,000 comments on the blue, 2,000 on the grey, and 2,500 favorites received. I've also met almost 150 mefites. I'm going to play with the dump while I wait to find out what my prize is.
posted by Eideteker at 1:38 PM on December 14, 2009
Also, as I don't have MySQL, I can't test the scripts for it, so would someone please volunteer to try the two MySQL scripts and let me know if they work for you?
Thanks! It seems to be fine. I did get some warnings on a few rows in a few tables but that might have been existing (and I don't really know how to tell what the warning was for in MySQL).
posted by smackfu at 1:56 PM on December 14, 2009
Although... apparently MySQL does not like doing joins between two three-million-row tables. That makes the length tables somewhat less useful. Maybe I can just add length to the commentdata table.
posted by smackfu at 2:12 PM on December 14, 2009
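smackfu's idea of folding the length into commentdata could look something like this. The sketch uses Python's built-in sqlite3 so it runs on its own; the table and column names (commentdata.commentid, commentlength.chars) are hypothetical stand-ins for whatever the load scripts create. Indexing the join key before the update is likely what matters most on three-million-row tables; MySQL would express the same update with its multi-table UPDATE ... JOIN syntax rather than a correlated subquery.

```python
import sqlite3


def add_length_column(conn: sqlite3.Connection) -> None:
    """Copy comment lengths into commentdata so later queries skip the join.

    Assumes hypothetical tables commentdata(commentid, ...) and
    commentlength(commentid, chars); rename to match your schema.
    """
    cur = conn.cursor()
    # Index the join key so each lookup below is a seek, not a table scan.
    cur.execute(
        "CREATE INDEX IF NOT EXISTS idx_commentlength_id ON commentlength(commentid)"
    )
    cur.execute("ALTER TABLE commentdata ADD COLUMN chars INTEGER")
    # Comments with no matching length row end up with NULL.
    cur.execute(
        """
        UPDATE commentdata
        SET chars = (SELECT cl.chars FROM commentlength AS cl
                     WHERE cl.commentid = commentdata.commentid)
        """
    )
    conn.commit()
```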
Here's a top 25 list of users with more than 500 comments in the blue, green, and grey combined, by ratio of grey comments to blue+green comments.
Uh oh, I'm on that list! And I'm the newest MeFite on that list, too.
- pb: (9.570:1)
- If I Had An Anus: (3.067:1)
- and hosted from Uranus: (2.811:1)
- gramschmidt: (2.727:1)
- timeistight: (2.669:1)
- cortex: (2.510:1)
- little e: (1.865:1)
- cgc373: (1.524:1)
- dg: (1.352:1)
- Kwine: (1.331:1)
- Cranberry: (1.299:1)
- Plutor: (1.193:1)
- breezeway: (1.180:1)
- It's Raining Florence Henderson: (1.100:1)
- team lowkey: (1.083:1)
- SpiffyRob: (1.047:1)
- Dave Faris: (0.996:1)
- jessamyn: (0.989:1)
- carsonb: (0.957:1)
- Alvy Ampersand: (0.929:1)
- FishBike: (0.923:1)
- Ethereal Bligh: (0.920:1)
- CKmtl: (0.918:1)
- anapestic: (0.916:1)
- gleemax: (0.906:1)
posted by FishBike at 2:13 PM on December 14, 2009 [1 favorite]
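A list like this can be generated with a small ranking function. This Python sketch assumes per-user totals are already keyed by username; `lopsided` and its parameters are hypothetical. Setting `target` to "blue" or "green" gives the other two variations, and users with no comments outside the target subsite are skipped to avoid dividing by zero. Note that `min_total` is inclusive (500 or more), while the list above says "more than 500", so users right at the cutoff can differ.

```python
SITES = ("blue", "green", "grey")


def lopsided(counts, target: str, min_total: int = 500, top: int = 25):
    """Rank users by target-subsite comments divided by their other comments.

    `counts` maps username -> {"blue": n, "green": n, "grey": n}. Users whose
    combined total is below `min_total`, or who have no comments outside the
    target subsite (a zero denominator), are skipped.
    """
    ranked = []
    for name, c in counts.items():
        total = sum(c.get(s, 0) for s in SITES)
        others = total - c.get(target, 0)
        if total < min_total or others == 0:
            continue
        ranked.append((name, c.get(target, 0) / others))
    # Highest ratio first; ties keep their input order.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top]
```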
If we got achievements for Metafilter, I'd be tempted to aim for the Triple-K (100GC). As it is... whatever... have your elite cabal.
posted by yeti at 2:18 PM on December 14, 2009
If we got achievements for Metafilter, I'd be tempted to aim for the Triple-K (100GC).
At great personal peril, I hacked into the sekrit cabal-only site and found this.
posted by juv3nal at 2:48 PM on December 14, 2009
Here's a top 25 list of users with more than 500 comments in the blue, green, and grey combined, by ratio of grey comments to blue+green comments.
It's interesting to see people who spend a disproportionate amount of time on one specific subsite. How about the same list for [green vs grey/blue] and [blue vs green/grey]?
posted by chrisamiller at 2:51 PM on December 14, 2009
I expect that would be much more lopsided. There are tons of people that primarily just hang out on AskMe or the blue and infrequently venture to other parts of the site.
posted by Rhomboid at 3:00 PM on December 14, 2009
there's now an ID-munging function built into the Infodump scripts, which, for any user who specifically requests to be on the munge list
Hi cortex, can you please put my ID on the munge list? Thanks!
posted by Blazecock Pileon at 3:18 PM on December 14, 2009
pump of the rump.
Throb of the knob. Angle of the dangle. And, away we go.
posted by ericb at 3:21 PM on December 14, 2009
Should be possible to do a ternary plot of proportion of users' comments in each of blue, grey, green.
posted by Electric Dragon at 3:23 PM on December 14, 2009
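For a ternary plot, each user's blue/green/grey proportions act as barycentric coordinates inside a triangle. This Python sketch converts raw counts to plain x, y coordinates that any scatter-plot tool can draw; it assumes counts per user are already tallied, and the choice of which corner is which subsite is arbitrary.

```python
import math


def ternary_xy(blue: int, green: int, grey: int) -> tuple[float, float]:
    """Map a user's blue/green/grey comment counts to a point in a triangle.

    Corners: all-blue at (0, 0), all-green at (1, 0), all-grey at
    (0.5, sqrt(3)/2). The point is the proportion-weighted average of the
    corners, so a user split evenly across all three lands at the centroid.
    """
    total = blue + green + grey
    if total == 0:
        raise ValueError("user has no comments to plot")
    x = (green + 0.5 * grey) / total
    y = (math.sqrt(3) / 2) * grey / total
    return x, y
```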
Consider yourself munged, BP. The Infodump regenerates weekly (on Sunday morning, I think), so changes to the munge list will take up to a week to percolate into the data files.
posted by cortex (staff) at 3:24 PM on December 14, 2009
Thanks, cortex.
posted by Blazecock Pileon at 3:34 PM on December 14, 2009
Never mind - I got bored and crunched them myself. This shows users who disproportionately contribute to one subsite.
Same criteria as FishBike: only users who have 500 or more comments on all three sites combined. This obviously excludes those who have contributed to only one of the three subsites, to avoid division by zero.
Blue : (Green+Grey)
1) Foosnark 1112.0
2) tomplus2 962.0
3) HTuttle 950.5
4) spazzm 941.5
5) peeping_Thomist 823.0
6) rough ashlar 768.333
7) raygirvan 656.0
8) Cerebus 636.0
9) tgrundke 594.0
10) Western Infidels 579.0
11) Relay 540.0
12) QuietDesperation 539.0
13) dwivian 530.0
14) mike3k 513.0
15) FormlessOne 476.0
16) kozad 436.25
17) zoogleplex 424.6
18) kliuless 415.833
19) Elim 369.5
20) Mitrovarr 318.25
21) fleener 273.333
22) CynicalKnight 263.833
23) digaman 258.333
24) sfts2 251.333
25) kgasmart 250.0
Green : (Grey+Blue)
1) thinkingwoman 783.333
2) 4ster 650.0
3) rhizome 626.2
4) cooker girl 590.0
5) autojack 566.0
6) londongeezer 501.0
7) mu~ha~ha~ha~har 381.0
8) sully75 374.666
9) JimN2TAW 334.5
10) peanut_mcgillicuty 333.0
11) Nelsormensch 314.5
12) christinetheslp 270.0
13) jon1270 239.25
14) vanoakenfold 216.333
15) anadem 207.75
16) PatoPata 200.5
17) Gerard Sorme 171.333
18) bkeene12 169.0
19) devilsbrigade 135.8
20) Sara Anne 131.8
21) crazycanuck 119.0
22) sharkfu 110.25
23) Good Brain 108.047
24) advicepig 105.6
25) micawber 102.6
Grey : (Blue+Green)
1) pb 9.57
2) If I Had An Anus 3.067
3) and hosted from Uranus 2.81
4) gramschmidt 2.727
5) timeistight 2.668
6) cortex 2.51
7) little e 1.864
8) mathowie 1.557
9) cgc373 1.523
10) dg 1.352
11) Kwine 1.33
12) Cranberry 1.299
13) Plutor 1.192
14) breezeway 1.179
15) It's Raining Florence Henderson 1.099
16) team lowkey 1.082
17) SpiffyRob 1.046
18) Dave Faris 0.996
19) jessamyn 0.989
20) carsonb 0.957
21) Alvy Ampersand 0.928
22) FishBike 0.923
23) Ethereal Bligh 0.92
24) CKmtl 0.917
25) anapestic 0.916
BTW, FishBike, my numbers differ from yours for the grey: I've got mathowie in at #8, and he's absent from your list altogether. The rest seem the same, though.
posted by chrisamiller at 3:52 PM on December 14, 2009
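For anyone who wants to reproduce lists like these, here is a minimal sketch of the ranking, assuming each user's comment counts have already been summed per subsite. The `ratio_ranking` function and the sample counts are hypothetical illustrations, not chrisamiller's actual code; the 500-comment threshold and the rule of skipping users whose denominator would be zero come from the comment above.

```python
def ratio_ranking(counts, focus, min_total=500, top=25):
    """Rank users by comments on one subsite divided by comments on the other two.

    counts maps user -> (blue, green, grey) comment counts.
    focus is 0 (blue), 1 (green) or 2 (grey).
    Users with fewer than min_total combined comments are skipped, as are
    users with no comments outside the focus subsite (division by zero).
    """
    ranked = []
    for user, per_site in counts.items():
        total = sum(per_site)
        others = total - per_site[focus]
        if total < min_total or others == 0:
            continue
        ranked.append((user, per_site[focus] / others))
    # Highest ratio first; ties keep their original order (sort is stable).
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top]


# Made-up counts for illustration only; not Infodump data.
counts = {
    "alice": (1112, 1, 0),
    "bob": (300, 100, 100),
    "carol": (600, 0, 0),  # blue-only: skipped when ranking blue
    "dave": (10, 5, 20),   # under 500 total: always skipped
}
for rank, (user, ratio) in enumerate(ratio_ranking(counts, focus=0), start=1):
    print(f"{rank}) {user} {ratio:.3f}")
```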
"I'm going to play with the dump"
ew.
posted by mr_crash_davis mark II: Jazz Odyssey at 4:24 PM on December 14, 2009
BTW, FishBike, my numbers differ from yours for the grey - I've got mathowie in at #8, and he's absent from your list all together. The rest seem the same, though
Ha! That's because mathowie is the first user in the usernames table, and stupidly I tried to fix what looked like a bug in the data load script, and instead fixed the part of it that was right. It was skipping the first row in each file, which would be virtually unnoticeable for any table except username. Thanks for catching that!
I've re-updated the infodump_load.sql script to correct this; sorry to anyone who downloaded it earlier this evening. (The MySQL scripts, I think, are fine in this regard.)
posted by FishBike at 5:04 PM on December 14, 2009
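The failure mode is easy to reproduce. Here is a minimal Python sketch, assuming a simple tab-delimited file with a single header line (check the real Infodump files for how many leading lines they actually have); `load_rows` and the sample data are hypothetical and are not FishBike's infodump_load.sql.

```python
def load_rows(text, header_lines):
    """Split tab-delimited text into rows, skipping exactly header_lines lines.

    Skipping one line too many silently drops the first data row, which is
    only noticeable in a table whose first row matters (here, the first user).
    """
    lines = text.splitlines()
    return [line.split("\t") for line in lines[header_lines:] if line]


# Hypothetical two-user file with one header line.
SAMPLE = "userid\tname\n1\tfirst_user\n2\tsecond_user\n"

correct = load_rows(SAMPLE, header_lines=1)
buggy = load_rows(SAMPLE, header_lines=2)  # the off-by-one: first user vanishes
print(len(correct), len(buggy))
```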
I don't know what the munge list even is, but... I want to be on it.
And I know that's fucked up.
posted by flapjax at midnite at 5:38 PM on December 14, 2009
Oh, I just read the post. Yes, munge me, please. i wanna lunge into the munge.
posted by flapjax at midnite at 5:39 PM on December 14, 2009
Do longer comments get more favorites? Yes. And graphing comment length frequency sure is regular.
(These use the MeFi comment data only, rounding each comment length up to the nearest 10 and then weeding out any rounded length with a count of less than 1000. The AskMe data gives essentially the same graphs.)
The only two zero-length comments with favorites: 1, 2
posted by smackfu at 5:47 PM on December 14, 2009 [4 favorites]
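The binning described above is easy to replicate. A minimal sketch, assuming (length, favorites) pairs have already been pulled out of the comment length and favorites files; the function names are hypothetical, and the defaults (round up to the nearest 10, drop bins under 1000 comments) mirror smackfu's description.

```python
from collections import Counter


def bin_of(length, bin_size=10):
    """Round a comment length up to the nearest multiple of bin_size (integer math)."""
    return -(-length // bin_size) * bin_size


def length_histogram(lengths, bin_size=10, min_count=1000):
    """Count comments per rounded length, dropping bins with fewer than min_count."""
    counts = Counter(bin_of(n, bin_size) for n in lengths)
    return {b: c for b, c in sorted(counts.items()) if c >= min_count}


def mean_favorites_by_length(pairs, bin_size=10, min_count=1000):
    """pairs: (length, favorites) per comment. Returns bin -> mean favorites."""
    totals, counts = Counter(), Counter()
    for length, favorites in pairs:
        b = bin_of(length, bin_size)
        totals[b] += favorites
        counts[b] += 1
    return {b: totals[b] / counts[b] for b in sorted(counts) if counts[b] >= min_count}
```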
I thought it might be interesting to find the shortest best answers, but they are just "yes" or "no". So it's not.
posted by smackfu at 5:57 PM on December 14, 2009
I've got a lot of comments to make on the gray if I'm going to win this. *frantically composes erotic haiku involving decapods and traditional Bavarian garments in need of a polish*
I've also met almost 150 mefites.
I have 69 contacts that I claim to have met, but I have no idea what the actual number is. I suck at remembering names at the best of times, but give me a real name plus an internet name to remember, and throw in my excessive alcohol consumption over the summer and I'm hopeless. I know I've forgotten a decent number of names. It is also entirely possible that I've claimed to have met someone, when in reality I met someone with a remotely similar username. Though I have yet to receive any baffled mefi mail from a new contact questioning our alleged encounter.
posted by little e at 6:08 PM on December 14, 2009
I thought it might be interesting to find the shortest best answers, but they are just "yes" or "no". So it's not.
I'm curious what the relationship of comment length to "best answer" is. Do longer comments tend to be marked "best answer" more often than shorter comments?
posted by FishBike at 6:10 PM on December 14, 2009
smackfu: "Although... apparently MySQL does not like doing joins between two three-million-row tables. "
It doesn't mind unless your tables are totally un-indexed, which they are. Here are my recommendations for indexes to add to the MySQL file. Can't test it from here, though.
posted by Plutor at 6:12 PM on December 14, 2009
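Plutor's actual index list isn't reproduced in this thread. As a hedged illustration of the idea only, here is a sketch using Python's built-in sqlite3 rather than MySQL so it runs standalone; the two tables are trimmed, hypothetical versions of the Infodump files, and which secondary indexes are worth having depends on the queries you run.

```python
import sqlite3


def build_indexed_db():
    """In-memory stand-in for two Infodump tables (hypothetical, trimmed columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE commentdata (
            commentid INTEGER PRIMARY KEY,  -- unique per comment
            postid    INTEGER NOT NULL,
            userid    INTEGER NOT NULL
        );
        CREATE TABLE commentlength (
            commentid INTEGER PRIMARY KEY,  -- 1:1 with commentdata
            length    INTEGER NOT NULL
        );
        -- Secondary indexes on columns that typical queries join or group on.
        CREATE INDEX idx_commentdata_postid ON commentdata (postid);
        CREATE INDEX idx_commentdata_userid ON commentdata (userid);
        """
    )
    return conn


conn = build_indexed_db()
conn.executemany("INSERT INTO commentdata VALUES (?, ?, ?)", [(1, 100, 7), (2, 100, 8), (3, 101, 7)])
conn.executemany("INSERT INTO commentlength VALUES (?, ?)", [(1, 120), (2, 40), (3, 80)])

# With a primary key on commentlength.commentid, the join can look up each
# matching row by key instead of scanning the whole table for every comment.
AVG_LENGTH_BY_USER = """
    SELECT c.userid, AVG(l.length)
    FROM commentdata AS c
    JOIN commentlength AS l ON l.commentid = c.commentid
    GROUP BY c.userid
    ORDER BY c.userid
"""
print(conn.execute(AVG_LENGTH_BY_USER).fetchall())
```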
Oops, I meant to come back and mention that too. No primary keys in there either, which you've added too. Maybe Fishbike can swap out your create script for the one he has.
posted by smackfu at 7:10 PM on December 14, 2009
I thought I'd leave the indexes as the dreaded exercise for the reader, since useful indexes will depend on what queries you're running and what you're running them on. But if there's a set that will be generally useful, I can put a script or two for those onto that page with the other scripts, if anybody would like to send me a file.
Similarly, if defining a primary key makes a material difference with a particular database server platform, and anybody wants to send me an updated copy of the table creation script, I'll update the page with that.
I have a few other things I do with the Infodump data once it's loaded into a database, such as combining the data that's split into 4 separate files (for the 4 sub-sites) into a set of consolidated tables with a "siteid" field. I then transform the favorites data so it's easy to match up with those consolidated tables. If anybody would find that sort of stuff useful, let me know and I'll add that to the page too.
posted by FishBike at 7:25 PM on December 14, 2009
Is it possible to have an indexed sqlite database available for download? It would allow exportation to a variety of formats (csv, etc), and shouldn't be difficult to create. Having this format available with the text files would solve the various parsing issues that crop up and would reduce the amount of time needed to start using the data.
posted by null terminated at 7:50 PM on December 14, 2009 [1 favorite]
Also, since the database is static, it makes sense to index everything (where 'everything' is some subset of everything).
posted by null terminated at 7:51 PM on December 14, 2009
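null terminated's suggestion is straightforward with Python's standard library. A minimal sketch, assuming the data has already been loaded into an SQLite database; `index_all_columns` and `table_to_csv` are hypothetical helpers. Indexing every column costs disk space and load time, which is only a reasonable trade because the dump is static.

```python
import csv
import io
import sqlite3


def index_all_columns(conn, table):
    """Create one single-column index per column of a (trusted) table name."""
    # Identifiers can't be bound as SQL parameters, so only pass trusted names.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info('{table}')")]
    for col in columns:
        conn.execute(f'CREATE INDEX IF NOT EXISTS "idx_{table}_{col}" ON "{table}" ("{col}")')
    return columns


def table_to_csv(conn, table):
    """Export a whole table as CSV text, header row first."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([desc[0] for desc in cur.description])
    writer.writerows(cur)  # csv quotes fields containing commas or quotes
    return out.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usernames (userid INTEGER, name TEXT)")
conn.executemany("INSERT INTO usernames VALUES (?, ?)", [(1, "first, user"), (2, "second_user")])
index_all_columns(conn, "usernames")
print(table_to_csv(conn, "usernames"))
```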
There's something oddly fitting, to me, that 2 of the top 3 posters to metatalk are literally assholes.
posted by Cold Lurkey at 8:51 PM on December 14, 2009
2 of the top 3 posters to metatalk are literally assholes.
And, iirc, the same person? or no?
I think I have met the most MeFites. True?
posted by jessamyn (staff) at 9:10 PM on December 14, 2009
Probably, although cortex caught up a bit with his nationwide tour. Alas, the contact details like "met" aren't in the dump. Blah blah privacy or some such.
posted by smackfu at 9:15 PM on December 14, 2009
According to her contacts, jessamyn has met 374 users; cortex stands at 295. DaShiv, who's been at a bunch of meetups, has 158, and Ambrosia Voyeur has 108. Many of the users I often see on local meetup threads have 30-60 contacts (yes, I read meetup threads for places I don't live in).
There might, of course, be other people with a lot of "met" contacts, since I just did that by browsing around.
posted by Monday, stony Monday at 9:34 PM on December 14, 2009
(Did I ever miss someone: ThePinkSuperHero, at 278.)
posted by Monday, stony Monday at 9:37 PM on December 14, 2009
I have 135.
DaShiv, I'm coming for you!
I miss DaShiv... anyone know why he disappeared?
posted by Kattullus at 9:42 PM on December 14, 2009
I can confirm through the infodump that jessamyn, cortex and ThePinkSuperHero have "met" the most people through contacts.
posted by Monday, stony Monday at 9:52 PM on December 14, 2009
SQL question: how can I treat all the commentdata_* tables as a single big table? Something like a view (call it "commentdata_all") with a primary key of (site, commentid). My SQL was never very good to start with, and I haven't done any since back when MoFi was so active they were having hosting problems.
posted by Monday, stony Monday at 11:03 PM on December 14, 2009
how can I treat all the commentdata_* tables as a single big table
UNION.
posted by Electric Dragon at 2:44 AM on December 15, 2009
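Expanding on that answer: a view defined as a series of SELECTs joined with UNION ALL gives one queryable "table" with a site column. This sketch uses Python's sqlite3 so it's self-contained (MySQL also supports views defined this way); the table names `commentdata_mefi` and so on, the trimmed column list, and the siteid numbers are all assumptions. A view can't carry a primary key, so (siteid, commentid) is only unique if commentid is unique within each per-site table.

```python
import sqlite3

# Hypothetical table names and siteid codes; adjust to however you loaded the files.
SITES = {"mefi": 1, "askme": 2, "meta": 3, "music": 4}


def create_commentdata_all(conn):
    """Create a view stacking every per-site table, tagged with a siteid column."""
    selects = [
        f"SELECT {siteid} AS siteid, commentid, postid, userid FROM commentdata_{name}"
        for name, siteid in SITES.items()
    ]
    # UNION ALL keeps every row. Plain UNION would also de-duplicate, which costs
    # extra work and (assuming commentid is unique per table) removes nothing,
    # since the siteid column already makes rows from different sites distinct.
    conn.execute("CREATE VIEW commentdata_all AS " + "\nUNION ALL\n".join(selects))


conn = sqlite3.connect(":memory:")
for name in SITES:
    conn.execute(f"CREATE TABLE commentdata_{name} (commentid INTEGER, postid INTEGER, userid INTEGER)")
conn.execute("INSERT INTO commentdata_mefi VALUES (1, 10, 7)")
conn.execute("INSERT INTO commentdata_askme VALUES (1, 20, 7)")  # same commentid, other site
create_commentdata_all(conn)
print(conn.execute("SELECT siteid, commentid FROM commentdata_all ORDER BY siteid").fetchall())
```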
I would be really interested in seeing a graph of favorites over time both site total and by user, because I vaguely remember having a pretty middling number of favorites until like nine months ago when I achieved some sort of SNARK APOTHEOSIS. I was curious whether favorites behavior had changed or I just posted a shitload more.
posted by Optimus Chyme at 7:21 AM on December 15, 2009
Man, I wish Google Chart's Venn diagrams weren't such shit.
Triple-K full breakdown
posted by Akeem at 7:25 AM on December 15, 2009 [1 favorite]
Monday, stony Monday: "SQL question: how can I treat all the commentdata_* tables as a single big table? Something like a view (call it "commentdata_all") with a primary key of (site, commentid). My SQL was never very good to start with, and I haven't done any since back when MoFi was so active they were having hosting problems"
I guess this answers my earlier question about whether or not anybody would find my table consolidation code helpful. So I've posted it here in a new Table Consolidation and Favorites Transformation section of my SQL scripts page.
The script does two things:
- consolidates all the Infodump data that is split into separate tables by sub-site (adding a 'siteid' field to distinguish them)
- makes a transformed version of the favorites data that replaces the 'type, target, parent' business with 'siteid, postid, commentid' for easier joining with other tables
I can confirm through the infodump that jessamyn, cortex and ThePinkSuperHero have "met" the most people through contacts.
I wanted to find out who has "married" the most people through contacts, but apparently that information is not in the dump. Anyway, I bet I know who it is [NOT SPOUSIST].
posted by grouse at 7:38 AM on December 15, 2009
Optimus Chyme: "I would be really interested in seeing a graph of favorites over time both site total and by user, because I vaguely remember having a pretty middling number of favorites until like nine months ago when I achieved some sort of SNARK APOTHEOSIS. I was curious whether favorites behavior had changed or I just posted a shitload more"
It's not quite a graph, but here's a table showing how your activity compares to the site as a whole since January 2008, by month.
            ------ comments -------  ------ favorites -------  - favorites per comment -
year month     all   user  % of all      all   user  % of all      all    user  user/all
2008     1   84934     31    0.036%    48299    102    0.211%    0.569   3.290     5.786
2008     2   76556     37    0.048%    41213     97    0.235%    0.538   2.622     4.870
2008     3   79243     28    0.035%    49141     74    0.151%    0.620   2.643     4.262
2008     4   80191     15    0.019%    43806     47    0.107%    0.546   3.133     5.736
2008     5   74541     36    0.048%    41725    136    0.326%    0.560   3.778     6.749
2008     6   75003     61    0.081%    48774    240    0.492%    0.650   3.934     6.050
2008     7   80187     73    0.091%    55553    173    0.311%    0.693   2.370     3.421
2008     8   78087     39    0.050%    57007     72    0.126%    0.730   1.846     2.529
2008     9   81515     69    0.085%    78116    525    0.672%    0.958   7.609     7.940
2008    10   80923     78    0.096%    69410    281    0.405%    0.858   3.603     4.200
2008    11   74267     36    0.048%    64087    149    0.232%    0.863   4.139     4.796
2008    12   75877     85    0.112%    64664    282    0.436%    0.852   3.318     3.893
2009     1   84811     89    0.105%    79682    326    0.409%    0.940   3.663     3.899
2009     2   75059    127    0.169%    71410    395    0.553%    0.951   3.110     3.269
2009     3   87028    166    0.191%    80043    786    0.982%    0.920   4.735     5.148
2009     4   85818    157    0.183%    81562    529    0.649%    0.950   3.369     3.545
2009     5   80044    128    0.160%    81900    566    0.691%    1.023   4.422     4.322
2009     6   92503    160    0.173%   101426   1011    0.997%    1.096   6.319     5.763
2009     7   95361    120    0.126%   103988    451    0.434%    1.090   3.758     3.447
2009     8   90459    100    0.111%    98420    424    0.431%    1.088   4.240     3.897
2009     9   90221    154    0.171%   106918   1135    1.062%    1.185   7.370     6.219
2009    10   90370    102    0.113%   111718    497    0.445%    1.236   4.873     3.941
2009    11   88463    115    0.130%   112296    918    0.817%    1.269   7.983     6.288
2009    12   43060     85    0.197%    49820    652    1.309%    1.157   7.671     6.630
So, around February 2009 your number of comments went up, both in absolute terms, and as a percentage of the total number of comments site-wide that were yours. The same thing happened with favorites--more for you in absolute numbers, and more as a percentage of site-wide favorites given in each month.
Average favorites per comment has been going up steadily, site-wide. Your average favorites per comment has also been going up steadily. The last column is especially noisy, but I think it shows that overall, your favorites/comment growth outpaces the general growth rate on the site as a whole a little bit.
posted by FishBike at 9:15 AM on December 15, 2009 [5 favorites]
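The ratio columns in that table reduce to a few divisions per month. A minimal sketch, assuming comments and favorites received have already been counted per month for the whole site and for one user; `favorites_per_comment` is a hypothetical helper, checked here against the January 2008 row.

```python
def favorites_per_comment(site, user):
    """site, user: dicts mapping (year, month) -> (comments, favorites).

    Returns (year, month) -> (site fav/comment, user fav/comment, user/site).
    Months where either side has no comments, or the site has no favorites,
    are skipped to avoid dividing by zero.
    """
    result = {}
    for key in sorted(user):
        site_comments, site_favs = site[key]
        user_comments, user_favs = user[key]
        if site_comments == 0 or user_comments == 0 or site_favs == 0:
            continue
        site_rate = site_favs / site_comments
        user_rate = user_favs / user_comments
        result[key] = (site_rate, user_rate, user_rate / site_rate)
    return result


# January 2008 from the table: 84934 comments / 48299 favorites site-wide,
# 31 comments / 102 favorites for the user.
rates = favorites_per_comment({(2008, 1): (84934, 48299)}, {(2008, 1): (31, 102)})
print([round(x, 3) for x in rates[(2008, 1)]])  # [0.569, 3.29, 5.786]
```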
That is awesome. You are the best.
posted by Optimus Chyme at 9:32 AM on December 15, 2009
Metafilter: I can confirm through the infodump
posted by flapjax at midnite at 4:57 PM on December 15, 2009
I didn't investigate any farther but I suspect that there's shitloads of people with a kilo of blue and a crapload with a kilo of green, and a ton with both, but also having the kilo of grey really narrows the field.
Is having kilo of grey as good as having buns of steel?
posted by Secret Life of Gravy at 5:21 PM on December 15, 2009
So far in 2009:
MetaFilter: The most frequent commenting 10% (885 people) made 72% of all the comments.
AskMe: The most frequent commenting 10% (1202 people) made 64% of all the comments.
MetaTalk: The most frequent commenting 10% (380 people) made 71% of all the comments.
posted by smackfu at 5:38 PM on December 15, 2009 [3 favorites]
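A minimal sketch of the "top 10% made X% of comments" calculation, assuming one userid per comment for the period in question, so that only people who actually commented count toward the 10%. `top_share` is hypothetical, and rounding the cutoff up is an assumption: the thread doesn't say how the 885 / 1202 / 380 figures were rounded.

```python
from collections import Counter


def top_share(commenter_ids, percent=10):
    """Share of all comments made by the most frequent `percent`% of commenters.

    commenter_ids holds one userid per comment. Returns (top_n, share).
    """
    per_user = sorted(Counter(commenter_ids).values(), reverse=True)
    top_n = max(1, (len(per_user) * percent + 99) // 100)  # integer ceiling
    return top_n, sum(per_user[:top_n]) / sum(per_user)


# Made-up data: one heavy commenter (50 comments) plus nine users with 5 each.
ids = ["heavy"] * 50 + [f"user{i}" for i in range(9) for _ in range(5)]
print(top_share(ids))
```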
Is that 10% of the "active workforce" (all people with at least one comment in 2009) or 10% of all registered users?
But thanks for that; that's something I really wanted to see. In fact (and I may already have said as much), if anybody knows a really good way of showing the distribution of "wealth" among commenters (in terms of comment volume), I'd like to know.
posted by Monday, stony Monday at 6:18 PM on December 15, 2009
10% of the people who commented in 2009.
Unless I did something wrong. Which I might have, because by my stats, there are 42k users and only 20k have EVER posted a comment to the blue. Same numbers for the green. Unless those two sets of users are very disjoint, that seems weird.
posted by smackfu at 6:27 PM on December 15, 2009
I just ran a quick count of how many distinct users have commented in each sub-site:
- AskMe: 19847
- MeFi: 19892
- MeTa: 8411
- Music: 1634
posted by FishBike at 6:35 PM on December 15, 2009 [1 favorite]
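smackfu's puzzle (roughly 20k commenters on each of the blue and the green, out of about 42k users) comes down to how much the sets of commenters overlap. A minimal sketch, assuming a list of per-comment userids for each sub-site; `commenter_overlap` is hypothetical. Subtracting the all-sites total from the registered-user count gives the number of users who have never commented on any of these sub-sites.

```python
def commenter_overlap(userids_by_site):
    """userids_by_site: site -> iterable of userids, one entry per comment.

    Returns (distinct commenters per site, pairwise overlaps, distinct
    commenters across all sites).
    """
    sets = {site: set(ids) for site, ids in userids_by_site.items()}
    per_site = {site: len(s) for site, s in sets.items()}
    names = sorted(sets)
    overlap = {
        (a, b): len(sets[a] & sets[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    everyone = len(set().union(*sets.values()))
    return per_site, overlap, everyone


per_site, overlap, everyone = commenter_overlap({"mefi": [1, 2, 2, 3], "askme": [3, 4, 4]})
print(per_site, overlap, everyone)
```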
Ah, I think a big part of that is that there are around 10k free users who never posted. Which makes sense, since free isn't much of a barrier.
posted by smackfu at 7:06 PM on December 15, 2009
This is a preventive (and probably unnecessary) service announcement: http://www.metafilter.com/81106/Information-doesnt-want-to-be-scale-free
posted by Monday, stony Monday at 7:19 PM on December 15, 2009
The rest will be posted as links to another site, but I thought this one was really interesting. Here's the number of comments and commenters for every month, ever, on mefi.
+------+-----------+----------+-------+
| yyyy | mmmmm     | comments | users |
+------+-----------+----------+-------+
| 1999 | June      |        1 |     1 |
| 1999 | July      |       11 |     3 |
| 1999 | August    |        8 |     3 |
| 1999 | September |       21 |     4 |
| 1999 | October   |       16 |     5 |
| 1999 | November  |       48 |     9 |
| 1999 | December  |       26 |    10 |
| 2000 | January   |      312 |    64 |
| 2000 | February  |      853 |   150 |
| 2000 | March     |     1371 |   196 |
| 2000 | April     |     1931 |   269 |
| 2000 | May       |     3750 |   367 |
| 2000 | June      |     3723 |   359 |
| 2000 | July      |     3299 |   368 |
| 2000 | August    |     3222 |   384 |
| 2000 | September |     3961 |   412 |
| 2000 | October   |     5355 |   434 |
| 2000 | November  |     6056 |   505 |
| 2000 | December  |     4769 |   475 |
| 2001 | January   |     9677 |   662 |
| 2001 | February  |     8144 |   755 |
| 2001 | March     |    10478 |   862 |
| 2001 | April     |    12324 |   995 |
| 2001 | May       |    13983 |  1183 |
| 2001 | June      |    15416 |  1052 |
| 2001 | July      |    15226 |  1024 |
| 2001 | August    |     9028 |  1097 |
| 2001 | September |    22378 |  1955 |
| 2001 | October   |    25053 |  1683 |
| 2001 | November  |    20565 |  1458 |
| 2001 | December  |    16531 |  1537 |
| 2002 | January   |    22730 |  1632 |
| 2002 | February  |    22049 |  1490 |
| 2002 | March     |    20921 |  1507 |
| 2002 | April     |    20110 |  1512 |
| 2002 | May       |    18138 |  1437 |
| 2002 | June      |    15761 |  1367 |
| 2002 | July      |    18875 |  1446 |
| 2002 | August    |    24407 |  1967 |
| 2002 | September |    24977 |  2152 |
| 2002 | October   |    26055 |  2325 |
| 2002 | November  |    19668 |  2002 |
| 2002 | December  |    16105 |  1834 |
| 2003 | January   |    18980 |  1759 |
| 2003 | February  |    22479 |  1960 |
| 2003 | March     |    21440 |  1851 |
| 2003 | April     |    20634 |  1806 |
| 2003 | May       |    16518 |  1733 |
| 2003 | June      |    17787 |  1705 |
| 2003 | July      |    21308 |  1703 |
| 2003 | August    |    17823 |  1643 |
| 2003 | September |    19177 |  1642 |
| 2003 | October   |    19263 |  1629 |
| 2003 | November  |    19607 |  1604 |
| 2003 | December  |    22625 |  1734 |
| 2004 | January   |    23809 |  1765 |
| 2004 | February  |    17445 |  1579 |
| 2004 | March     |    26638 |  1822 |
| 2004 | April     |    27461 |  1878 |
| 2004 | May       |    24825 |  1777 |
| 2004 | June      |    30673 |  1894 |
| 2004 | July      |    28376 |  1830 |
| 2004 | August    |    29036 |  1806 |
| 2004 | September |    29570 |  1821 |
| 2004 | October   |    28602 |  1806 |
| 2004 | November  |    42375 |  3031 |
| 2004 | December  |    56359 |  3479 |
| 2005 | January   |    53239 |  3553 |
| 2005 | February  |    52862 |  3657 |
| 2005 | March     |    49939 |  3719 |
| 2005 | April     |    47690 |  3779 |
| 2005 | May       |    46391 |  3625 |
| 2005 | June      |    49878 |  3715 |
| 2005 | July      |    53149 |  3863 |
| 2005 | August    |    55293 |  4081 |
| 2005 | September |    60548 |  4112 |
| 2005 | October   |    58220 |  4218 |
| 2005 | November  |    70861 |  4431 |
| 2005 | December  |    70252 |  4502 |
| 2006 | January   |    72000 |  4699 |
| 2006 | February  |    62773 |  4653 |
| 2006 | March     |    75535 |  4883 |
| 2006 | April     |    64176 |  4753 |
| 2006 | May       |    66221 |  4870 |
| 2006 | June      |    67162 |  4988 |
| 2006 | July      |    70283 |  5102 |
| 2006 | August    |    72707 |  5310 |
| 2006 | September |    62130 |  5058 |
| 2006 | October   |    70227 |  5315 |
| 2006 | November  |    67866 |  5297 |
| 2006 | December  |    65938 |  5404 |
| 2007 | January   |    74891 |  5663 |
| 2007 | February  |    65430 |  5367 |
| 2007 | March     |    68375 |  5477 |
| 2007 | April     |    76363 |  5664 |
| 2007 | May       |    80248 |  5758 |
| 2007 | June      |    73615 |  5659 |
| 2007 | July      |    78175 |  5763 |
| 2007 | August    |    76476 |  5734 |
| 2007 | September |    75698 |  5697 |
| 2007 | October   |    82245 |  5984 |
| 2007 | November  |    85161 |  5978 |
| 2007 | December  |    74929 |  5912 |
| 2008 | January   |    84934 |  6322 |
| 2008 | February  |    76556 |  6240 |
| 2008 | March     |    79243 |  6503 |
| 2008 | April     |    80191 |  6425 |
| 2008 | May       |    74541 |  6322 |
| 2008 | June      |    75003 |  6310 |
| 2008 | July      |    80187 |  6467 |
| 2008 | August    |    78087 |  6364 |
| 2008 | September |    81515 |  6468 |
| 2008 | October   |    80923 |  6369 |
| 2008 | November  |    74267 |  6393 |
| 2008 | December  |    75877 |  6551 |
| 2009 | January   |    84811 |  6716 |
| 2009 | February  |    75059 |  6512 |
| 2009 | March     |    87028 |  6886 |
| 2009 | April     |    85818 |  6850 |
| 2009 | May       |    80044 |  6699 |
| 2009 | June      |    92503 |  6906 |
| 2009 | July      |    95361 |  7217 |
| 2009 | August    |    90459 |  7025 |
| 2009 | September |    90221 |  7095 |
| 2009 | October   |    90370 |  7086 |
| 2009 | November  |    88463 |  7076 |
| 2009 | December  |    43060 |  5427 |
+------+-----------+----------+-------+
posted by Monday, stony Monday at 10:51 PM on December 15, 2009 [1 favorite]
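A table like this comes from a single GROUP BY query, with COUNT(DISTINCT userid) turning comments into commenters. The query actually used isn't shown; this is a sketch using SQLite's strftime through Python's sqlite3, with a hypothetical, trimmed commentdata table and an assumed 'YYYY-MM-DD HH:MM:SS' datestamp format.

```python
import sqlite3

# One row per comment; the datestamp format is an assumption.
MONTHLY_SQL = """
    SELECT strftime('%Y', datestamp) AS yyyy,
           strftime('%m', datestamp) AS mm,
           COUNT(*)                  AS comments,
           COUNT(DISTINCT userid)    AS users
    FROM commentdata
    GROUP BY yyyy, mm
    ORDER BY yyyy, mm
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commentdata (userid INTEGER, datestamp TEXT)")
conn.executemany(
    "INSERT INTO commentdata VALUES (?, ?)",
    [
        (1, "1999-06-10 12:00:00"),
        (1, "1999-07-01 09:30:00"),
        (2, "1999-07-02 18:45:00"),
        (1, "1999-07-03 08:00:00"),
    ],
)
for row in conn.execute(MONTHLY_SQL):
    print(row)
```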
Nice, MsM. Clearly the next step is pretty graphs. But that more or less confirms the impression I've had from previous data-diving outings: we've had steady growth over time, but at a pretty slow rate compared to what sometimes feels like the popular perception.
The last two years have seen something like a 10% increase in total comments and commenting users, which is nothing to sneeze at but also not quite the storming-of-the-gates that gets suggested.
Of course, flat numbers like that don't express the rate of turnover; if there's approximately the same number of folks commenting each month but a hundred oldbies bail and a hundred newbies take their place, that's a lot of displacement, which would create a legitimate sense of growth/change even if the raw aggregate numbers are fairly static.
Also interesting to look at the peak-and-decline around Sept 2002, which only ever partially recovered before Nov 2004, when signups reopened. The whole stretch from Sept 2001 to Nov 2004 is an interesting period in userbase history.
posted by cortex (staff) at 7:03 AM on December 16, 2009
Metafilter owes its success to The Events of September 11th.
I leave the reader to work out the implications.
posted by Rumple at 10:11 AM on December 16, 2009
The most active 10% of those active in a given year: how often are they commenting?
posted by Pronoiac at 10:45 AM on December 16, 2009
I made a chart of the number of active users across all sub-sites, by month, with some extra information to look at user turnover. I categorized the users who were active (posted or commented) in each month as follows:
- Arrived and stayed: their first activity was in this month, and they were active in a subsequent month.
- Stayed: their first activity was in an earlier month, and they were active in a subsequent month, too.
- Arrived and left: their only activity was in this month.
- Left: their first activity was in an earlier month, but their last activity was in this month.
posted by FishBike at 10:57 AM on December 16, 2009
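The four categories above depend only on each user's first and last active month. A minimal sketch, assuming posts and comments have already been reduced to (userid, month) pairs; `turnover` is a hypothetical helper, not FishBike's code.

```python
from collections import Counter, defaultdict


def turnover(activity):
    """activity: iterable of (userid, month) pairs; month is any sortable key,
    e.g. (2009, 12). Returns month -> Counter of category -> number of users."""
    months_by_user = defaultdict(set)
    for user, month in activity:
        months_by_user[user].add(month)

    table = defaultdict(Counter)
    for months in months_by_user.values():
        first, last = min(months), max(months)
        for month in months:
            if month == first:
                # First activity this month: were they active in a later month?
                label = "arrived and left" if month == last else "arrived and stayed"
            else:
                # Active before: is this their last active month?
                label = "left" if month == last else "stayed"
            table[month][label] += 1
    return table


activity = [
    ("a", (2009, 1)), ("a", (2009, 2)), ("a", (2009, 3)),
    ("b", (2009, 2)),
    ("c", (2009, 2)), ("c", (2009, 3)),
]
for month, counts in sorted(turnover(activity).items()):
    print(month, dict(counts))
```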
I'm curious, do posts to the Blue made during the weekend get fewer comments and favorites on average than posts made during the week?
posted by Kattullus at 6:58 AM on December 20, 2009
Hmm, we've looked at the timing of posts, comments, and favorites before, but not how the timing of the post itself affects these. So based on the datestamp of every front-page post:
Day   Avg Comments   Avg Favorites
Sun      32.892721        4.764401
Mon      33.367296        4.028600
Tue      34.662431        3.989863
Wed      33.393797        3.988335
Thu      33.086604        3.909726
Fri      31.663129        3.708222
Sat      31.191443        4.626126
There seems to be a slight drop in comments on Friday, Saturday, and Sunday... but a significant increase in average favorites for posts made on the weekend. That's for all of MeFi's history. If we look at just the data for posts in 2009:
Day   Avg Comments   Avg Favorites
Sun      46.402571       13.958456
Mon      49.941473       13.713150
Tue      50.714749       12.743989
Wed      49.653896       13.207142
Thu      49.413397       12.966203
Fri      48.208473       12.802286
Sat      45.667016       13.702731
The numbers are larger, but the overall trend looks about the same.
posted by FishBike at 7:13 AM on December 20, 2009 [1 favorite]
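Per-weekday averages like these are a group-by on the day each post was made. A minimal sketch, assuming one (datestamp, comment count, favorite count) tuple per front-page post and a 'YYYY-MM-DD HH:MM:SS' datestamp format; `averages_by_weekday` is hypothetical. The weekday is taken from the datestamp as stored, so the time zone of the data decides which day late-night posts land on.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def averages_by_weekday(posts):
    """posts: iterable of (datestamp, comments, favorites), one per post.

    Returns day -> (average comments, average favorites), Sunday first.
    No time zone conversion is applied to the datestamps.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # [posts, comments, favorites]
    for stamp, comments, favorites in posts:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        day = DAYS[when.isoweekday() % 7]  # isoweekday: Mon=1 .. Sun=7
        totals[day][0] += 1
        totals[day][1] += comments
        totals[day][2] += favorites
    return {
        day: (totals[day][1] / totals[day][0], totals[day][2] / totals[day][0])
        for day in DAYS
        if day in totals
    }


# Made-up posts: December 20, 2009 was a Sunday, December 21 a Monday.
sample = [
    ("2009-12-20 10:00:00", 40, 10),
    ("2009-12-20 18:00:00", 20, 4),
    ("2009-12-21 09:00:00", 50, 12),
]
print(averages_by_weekday(sample))
```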
Huh? That's so not what I was expecting. I was pretty certain that there'd be little or no difference between the weekend and weekdays and I briefly considered that posts made during the weekend got a little bit less in terms of comments and favorites, never that they'd get more.
Also, 13 is considerably higher than I thought the favorites average would be. I thought it would be more like 10.
posted by Kattullus at 7:36 AM on December 20, 2009
For MeFi's entire history, the median number of favorites per post is 0 (more than half of all posts don't have any, probably because most of them pre-date the favorites feature).
For 2009 so far, the median favorites per post is 8.
I wonder if the minor difference in comments and favorites on the weekend has to do with a difference in the subject matter people post about on weekends? When I previously looked at comments and favorites vs. the tags used on posts, it was clear that the subjects people favorite a lot differ a great deal from the subjects people comment on a lot.
posted by FishBike at 7:52 AM on December 20, 2009 [2 favorites]
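An average of about 13 favorites alongside a median of 8 is what a right-skewed distribution looks like: most posts get a few favorites and a few get very many. The all-time median of 0 works the same way, driven by the posts that predate favorites. The following is a small illustration using Python's standard statistics module. The counts in it are made up and are not Infodump data:

```python
from statistics import mean, median


def favorite_summary(posts, year=None):
    """Return (mean, median) favorites per post.

    posts: iterable of (year, favorite_count) pairs.
    year: if given, restrict to posts from that year.
    Raises statistics.StatisticsError if no posts match.
    """
    counts = [favs for y, favs in posts if year is None or y == year]
    return mean(counts), median(counts)


if __name__ == "__main__":
    # Hypothetical counts: older posts predate favorites and have none.
    posts = [(2001, 0), (2002, 0), (2003, 0), (2004, 0),
             (2009, 5), (2009, 8), (2009, 30)]
    print(favorite_summary(posts))        # mean ~6.14, median 0
    print(favorite_summary(posts, 2009))  # mean ~14.33, median 8
```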
Do we see a dip in the raw volume of posts on the weekend? Somewhat fewer posts, each getting a slightly-higher-than-average disbursement of favorites, would un-rock my conceptual world, for example.
posted by cortex (staff) at 8:14 AM on December 20, 2009
Do we see a dip in the raw volume of posts on the weekend?
Yes, in a previous Infodump discussion I made this chart showing site activity by weekday. Front page posting activity definitely drops off on weekends, as does commenting and favoriting. It looks like favoriting doesn't drop off as much, in relative terms, as posting activity.
The lines for favoriting activity include favorites on posts and comments, not just posts, but given the numbers above I don't see how the curves for favorites on posts only could be any different.
posted by FishBike at 8:28 AM on December 20, 2009
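Whether favoriting "drops off less" than posting can be summed up as one number per activity: the average count per weekend day divided by the average count per weekday. This is a Python sketch. It assumes you already have a list of dates for each kind of event (post dates, favorite dates). The function name is hypothetical, and no Infodump column names are assumed:

```python
from collections import Counter
from datetime import date


def weekend_ratio(dates):
    """Mean events per weekend day divided by mean events per weekday.

    dates: iterable of datetime.date (or datetime) objects, one per event.
    1.0 means no weekend dip; 0.5 means a typical weekend day has half
    the activity of a typical weekday.
    """
    by_day = Counter(d.weekday() for d in dates)  # Monday == 0 ... Sunday == 6
    weekday_avg = sum(by_day[i] for i in range(5)) / 5
    weekend_avg = (by_day[5] + by_day[6]) / 2
    return weekend_avg / weekday_avg


if __name__ == "__main__":
    week = [date(2009, 12, 14 + i) for i in range(7)]  # Mon Dec 14 .. Sun Dec 20
    posts = [d for d in week for _ in range(2 if d.weekday() < 5 else 1)]
    faves = [d for d in week for _ in range(3)]
    print(weekend_ratio(posts), weekend_ratio(faves))  # 0.5 1.0
```

Dividing the favorites ratio by the posts ratio gives weekend favorites per post relative to weekday favorites per post (2.0 in the toy data). This is only a rough check of cortex's question, because favorites counted by date include favorites on comments and on older posts. Over a date range that isn't a whole number of weeks, the number of each weekday is also slightly uneven, which this ratio ignores.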
To see if people post significantly different stuff on the weekends, I ran a list of the top 25 most popular tags for posts on weekdays vs weekends, for all of MeFi history and for just 2009. They look pretty similar, but there are minor differences:
All-Time weekdays   All-Time weekends   2009 weekdays       2009 weekends
music               music               music               music
art                 art                 art                 art
politics            politics            photography         Photography
history             photography         history             history
photography         history             video               video
flash               iraq                science             youtube
science             video               youtube             politics
iraq                war                 film                science
war                 science             politics            film
video               film                obama               comics
bush                flash               game                food
film                bush                flash               flash
internet            youtube             food                obama
usa                 USA                 Comics              game
humor               religion            Movies              documentary
games               humor               Games               tv
movies              Movies              design              design
Religion            games               animation           animation
youtube             television          books               China
television          terrorism           war                 japan
game                books               Internet            economics
terrorism           game                tv                  technology
technology          Internet            literature          movies
law                 technology          obituary            batshitinsane
books               design              religion            books
posted by FishBike at 8:54 AM on December 20, 2009
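Lists like these need only a tag counter per group. Here is a Python sketch. It assumes posts have already been reduced to (is_weekend, tags) pairs, which is a hypothetical intermediate form and not an Infodump file layout. It also adds a simple overlap score that was not part of the comparison above. Tags are folded to lower case here as a design choice, which isn't necessarily how the table was produced:

```python
from collections import Counter


def top_tags(posts, weekend, n=25):
    """Most common tags among weekend (or weekday) posts.

    posts: iterable of (is_weekend, tags) pairs, tags being a list of strings.
    Tags are folded to lower case so "Movies" and "movies" count together.
    Ties keep the order in which tags were first seen (Counter.most_common).
    """
    counts = Counter()
    for is_weekend, tags in posts:
        if is_weekend == weekend:
            counts.update(tag.lower() for tag in tags)
    return [tag for tag, _ in counts.most_common(n)]


def overlap(list_a, list_b):
    """Jaccard index of two tag lists: shared tags / all distinct tags."""
    a = {t.lower() for t in list_a}
    b = {t.lower() for t in list_b}
    return len(a & b) / len(a | b)
```

Applied to the lower-cased columns in the table above, overlap() gives 24/26 (about 0.92) for the all-time lists, where law and design are the only unshared tags. For the 2009 lists it gives 19/31 (about 0.61). The score ignores rank order, so it only measures which tags appear in each list, not how their rankings compare.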
There's some difference, but largely it seems it's not that great. Nothing that leaps out at me, anyway.
posted by Kattullus at 9:09 AM on December 20, 2009
Perhaps people have more time to put together stupidly awesome mega-link posts over a weekend that then attract loads of favourites.
posted by Electric Dragon at 4:33 PM on December 21, 2009
Perhaps people have more time to put together stupidly awesome mega-link posts over a weekend that then attract loads of favourites.
It's possible. Other hypotheses that come to mind:
- Posts stay on the front page longer on weekends, giving people more time to notice and favorite them.
- Favoriting is far less time-consuming than posting, so people making minimal use of MeFi on weekends have time to favorite things but not to create new posts.
- People are in a better mood on weekends and are thus more likely to enjoy a post enough to favorite it.
- People don't have as much time to read the linked articles or the comment thread on weekends, so they're more likely to favorite it as a bookmark to come back to later.
posted by FishBike at 6:33 PM on December 21, 2009
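The first hypothesis can be checked from post datestamps alone: if fewer posts are made on weekends, each one gets pushed down the page more slowly. The following is a Python sketch. It assumes the front page holds a roughly fixed number of posts, which is an assumption and not something stated in this thread, so the gap to the next post serves as a proxy for time on the front page. The function name is hypothetical:

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def mean_gap_by_weekday(timestamps):
    """Average hours until the next front-page post, by weekday of the post.

    timestamps: datetimes of front-page posts, in any order.
    Assuming a roughly fixed number of posts on the front page, longer gaps
    suggest a post stays visible longer.
    """
    ordered = sorted(timestamps)
    gaps = defaultdict(list)
    for current, following in zip(ordered, ordered[1:]):
        hours = (following - current).total_seconds() / 3600
        # datetime.weekday() numbers Monday as 0; shift so Sunday maps to DAYS[0].
        gaps[DAYS[(current.weekday() + 1) % 7]].append(hours)
    return {day: sum(h) / len(h) for day, h in gaps.items()}


if __name__ == "__main__":
    posts = [datetime(2009, 12, 18, 12), datetime(2009, 12, 18, 14),
             datetime(2009, 12, 19, 2), datetime(2009, 12, 19, 10)]
    print(mean_gap_by_weekday(posts))  # {'Fri': 7.0, 'Sat': 8.0}
```

The most recent post has no successor, so it is left out.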
Random science project, if anyone is interested.
posted by cortex (staff) at 10:46 AM on December 22, 2009
but a significant increase in average favorites for posts made on the weekend.
I hate to be overly pedantic, but did you actually do a t-test or something to see if the apparent difference is likely to be caused by chance? Given the size of the dataset, I'm also inclined to believe that it's significant, but stats have a way of being non-intuitive sometimes.
posted by chrisamiller at 2:26 PM on December 22, 2009
I hate to be overly pedantic, but did you actually do a t-test or something to see if the apparent difference is likely to be caused by chance?
No, and since step 1 of me doing that would be "look up t-test on Wikipedia", I'm pretty sure I wouldn't do it right if I tried.
posted by FishBike at 3:02 PM on December 22, 2009
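For the question above, one standard option is Welch's t-test, which compares two means without assuming the groups have equal variances. Here is a standard-library Python sketch. Its p-value uses the normal approximation to the t distribution, a simplification that is only reasonable for large samples such as thousands of posts per weekday. The function name is hypothetical:

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Welch's two-sample t-test for a difference in means.

    a, b: sequences of numbers (e.g. favorites per weekend post and per
    weekday post), each with at least two values, not both constant.
    Returns (t, df, p): the t statistic, the Welch-Satterthwaite degrees of
    freedom, and a two-sided p-value from the normal approximation, which
    is close to the exact t-distribution value only when df is large.
    """
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p
```

With SciPy installed, scipy.stats.ttest_ind(a, b, equal_var=False) runs the same test with an exact t-distribution p-value. Samples this large can make even a tiny difference come out "significant", so the size of the difference matters as much as the p-value. Favorite counts are also heavily skewed, so a rank-based test such as scipy.stats.mannwhitneyu is a reasonable cross-check. Comparing within one year, as the 2009 table does, avoids mixing posts from before and after favorites existed.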
I've made a filter for Infodump files, which might help with analysis.
posted by Pronoiac at 12:21 PM on December 23, 2009
Heh, whoops, let me try that again: infodump filter.
posted by Pronoiac at 1:40 PM on December 23, 2009
Doing some sanity checks on beanplate, I ran across some gotchas.
1. These post titles are likely to cause hiccups while processing the Infodump. Maybe they should be edited.
AskMe 30523: ^M
Five minutes here, ten there, oh no!
Mefi 37099: Living can be lovely, here in New York State Ah, but I wish that I^M
Living can be lovely, here in New York state / Ah, but I wish that I were home again
Mefi 77685: “Every man dies - Not every man really lives.”^M
^M
--William Ross Wallace
2. It looks like the AskMe & Music postdata files have some (er, 1031?) entries without the "[NULL]" deletion reason. This is, admittedly, possibly utterly irrelevant to everyone & everything whatsoever.
I caught these with the beanplate condition: "\$#read_fields != \$#fields" - though running this on the posttitles files returns a not-terribly-helpful list of posts without titles.
3. The wiki has a note about an error - with favorites for comments without parents - but checking for "type =~ /^(2|4|6|9|12|13)$/ && parent == 0" showed that those have definitely been fixed. (I confirmed the output on an older Infodump with those errors.)
posted by Pronoiac at 3:59 PM on January 4, 2010
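The two checks quoted above can be approximated in Python. This sketch is not beanplate's actual code, and every function name in it is hypothetical. It assumes tab-separated files with a header row naming the columns. "type" and "parent" come from the condition above, but the rest of the file layout (any lines before the header, encoding, other column names) is an assumption to check against the wiki:

```python
import csv
import io

# Type codes taken from the "type =~ /^(2|4|6|9|12|13)$/" condition above.
COMMENT_FAVORITE_TYPES = {"2", "4", "6", "9", "12", "13"}


def bad_field_counts(lines, expected):
    """Yield (line_number, field_count) for lines without `expected` fields.

    A stray carriage return inside a title (the ^M cases above) splits one
    record across lines, so at least one of the pieces comes up short.
    """
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != expected:
            yield number, len(fields)


def infodump_filter(rows, predicate):
    """Yield the rows (dicts keyed by column name) that satisfy predicate."""
    return (row for row in rows if predicate(row))


def is_orphan_comment_favorite(row):
    """Comment-type favorite whose parent is 0, per the check above."""
    return row["type"] in COMMENT_FAVORITE_TYPES and row["parent"] == "0"


if __name__ == "__main__":
    # Hypothetical sample; "faveid" is a made-up column name.
    sample = "faveid\ttype\tparent\n1\t2\t0\n2\t2\t17\n3\t1\t0\n"
    rows = csv.DictReader(io.StringIO(sample), delimiter="\t",
                          quoting=csv.QUOTE_NONE)
    print([row["faveid"] for row in infodump_filter(rows, is_orphan_comment_favorite)])  # ['1']
    print(list(bad_field_counts(io.StringIO("a\tb\tc\n1\t2\n"), 3)))  # [(2, 2)]
```

csv.QUOTE_NONE keeps quotation marks in titles from being treated as field quoting.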
Pretty sure I just fixed those titles, not sure about the other stuff.
posted by jessamyn (staff) at 4:49 PM on January 4, 2010
posted by cortex (staff) at 7:20 AM on December 14, 2009 [6 favorites]