A relatively serious general-interest web community. April 8, 2015 3:48 AM

Link rot on Mefi, discussed on Aeon Ideas.
posted by Segundus to MetaFilter-Related at 3:48 AM (64 comments total) 6 users marked this as a favorite

Interesting piece, though I'm sure many long-term members would take issue with the contention that Metafilter is a "veteran link site which is ... almost unchanged since [April 2000]." So much (posting norms; what's considered appropriate here in terms of commenting; the degree of moderation) has changed over the last 15 years, changes which are perhaps elided by the continuities in site design.

I do wonder if, the longer MeFi does continue to exist, it's going to become a significant primary resource in itself for studying digital culture in the early twenty-first century.
posted by Sonny Jim at 4:59 AM on April 8, 2015 [12 favorites]

The anchor tag specs should be amended to allow date information. Future browsers could then automatically redirect to the Internet Archive's cached version most proximate to that date.
posted by nobody at 5:06 AM on April 8, 2015 [28 favorites]

I've periodically thought of doing something like this, going back and checking say 50 FPPs from a year ago, 2 years ago, 3 years ago, etc. And also possibly looking at the links themselves to see if I could find the pages they were linking to at other addresses or behind paywalls (more likely for things like newspaper sites that periodically change their link format but don't necessarily drop content). But then I figure somebody else has probably done this more comprehensively and better than I could. There was an interesting similar check done on the Million Dollar Home Page. I suspect I read about it on Metafilter, actually.
posted by jacquilynne at 5:28 AM on April 8, 2015

Jessamyn and others did a serious link rot sort here in the not-so-distant past and tagged a lot of posts with the "brokenlink" tag, but obviously that's an ongoing battle.
posted by briank at 5:38 AM on April 8, 2015 [1 favorite]

This is terrifying if you think about it too much.
posted by Potomac Avenue at 6:13 AM on April 8, 2015 [5 favorites]

This is terrifying if you think about it too much.

It's even more terrifying if you go all paranoid and consider that someone might decide to break the links on purpose.

/Paranoiddudetalk
posted by digitalprimate at 6:25 AM on April 8, 2015 [1 favorite]

I'd love to see MetaFilter be on the forefront of some kind of technical solution to this problem. Given pb is some kind of time-travelling supergenius, it shouldn't be too hard, right?

Wait you guys is pb a Time Lord? That would explain so much.
posted by Rock Steady at 6:37 AM on April 8, 2015 [1 favorite]

Jessamyn and others did a serious link rot sort here in the not-so-distant past and tagged a lot of posts with the "brokenlink" tag, but obviously that's an ongoing battle.

I'm curious, what are people supposed to do with that once posts are tagged?
posted by smackfu at 6:44 AM on April 8, 2015

Maybe it would be interesting to have a service that pushed links in new MeFi posts to perma.cc. Then you could have a site acting as a mirror to MeFi that rewrote the links to point to the archive.
posted by topynate at 7:05 AM on April 8, 2015 [5 favorites]

I'm curious, what are people supposed to do with that once posts are tagged?

It was part of a larger backtagging thing where we went through and added tags to posts that didn't have them. Often this involved checking out the original link. If it was broken we said so. I think the idea was that we could maybe sub in an Internet Archive link or something in some future pass through all of this. But realistically, that's not a great idea. Linkrot is in and of itself a sort of interesting thing (whose links die, whose live) but I don't think it's a thing MeFi should do something about programmatically. I bet there's a Greasemonkey script or Chrome thingdoo that would let you check all the links in a post via the Wayback Machine.

Realistically finding old stuff is something the internet has decided it's not really for and since MeFi is a part of the Internet this was always part of the culture here. I Have Opinions about how good search/find mechanisms can alleviate this but I've been busy hassling the Internet Archive about this over the past year (you can imagine the number of fucks they give).
posted by jessamyn (retired) at 7:33 AM on April 8, 2015 [18 favorites]

"... the veteran link site which is still running..."
*writer muffles astonishment*
posted by joseph conrad is fully awesome at 7:33 AM on April 8, 2015

(Metafilter: the veteran link site which is still running 🎉)
posted by joseph conrad is fully awesome at 7:35 AM on April 8, 2015 [7 favorites]

*fiddles with ring on finger*
This too shall pass.
posted by the man of twists and turns at 7:47 AM on April 8, 2015

It's even more terrifying if you go all paranoid and consider that someone might decide to break the links on purpose.

The Cabal would like a word with you, digitalprimate. Right this way, please.
posted by Johnny Wallflower at 7:50 AM on April 8, 2015 [2 favorites]

Jessamyn and others did a serious link rot sort here in the not-so-distant past and tagged a lot of posts with the "brokenlink" tag, but obviously that's an ongoing battle.

Could we maybe introduce a "brokenlink" flag? Even if we can't do something like update everything with appropriate archive.org / current links, it'd be cool if we could throw up banner warnings that the links have died. Which would be kind of silly, but would make it clear that we care.
posted by Going To Maine at 8:11 AM on April 8, 2015 [1 favorite]

I guess I am a contributor to link-rot here, authoring musical megaposts.

Because of the current structure of the World-Loom, I am reduced to linking to Youtube to provide examples of what these tunes sound like. I test every one of them just before posting. However, Youtube links are quite volatile for various reasons. Some last seemingly the life of Youtube, others are 'cyst and deceased' within days. It can be difficult to predict which.

My first music megapost six years ago featured 19 links to Youtube. Today, all but 4 are broken. However, every one of those clips behind a broken link is actually still available on Youtube (and other places, if one were interested) -- at a different address.

Whack-a-mole!

Not to say that I'd advocate banning Youtube links, at least not before the World Wide Wurlitzer goes online. That wouldn't leave me with much to do here.

"I was a back~~stabb~~tagging superstar on Metafilter."
posted by Herodios at 9:13 AM on April 8, 2015 [2 favorites]

I've long wanted to build a custom Wayback Machine that just covers what Mefi links.
posted by Pronoiac at 9:34 AM on April 8, 2015 [3 favorites]

...Metafilter, the veteran link site which is still running, almost unchanged since then.
Yes. Excellent. Tell them all how old and decrepit Metafilter is. It keeps away the people who want to spend their time on websites with designs that are "fresh", such as the OP site that has a wicked-cool non-dismissable scrolling banner that takes up 1/4 of the reading area.
posted by double block and bleed at 9:51 AM on April 8, 2015 [14 favorites]

I'm super excited by Perma.cc (and not just because I'm an academic librarian who wrestles with this issue for a living), though I don't think that in its current form it would work for this.

I do think that the up and coming perma-link solutions are something for MetaFilter to keep an eye on. We don't have the responsibility to keep the whole internet alive but to my mind, MetaFilter values include going a little out of the way to make a long-lasting solution happen.
posted by librarylis at 9:52 AM on April 8, 2015 [4 favorites]

And yet, all of my old stupid comments will long outlive me.
Unless the Singularity happens, in which case I will become all of my old stupid comments.
posted by Atom Eyes at 10:01 AM on April 8, 2015 [6 favorites]

This is one of those things that concerns me more than I ever thought it would. I'm way more skeptical now of the "anything you do on the internet lasts forever" concern and much more worried that there aren't more people uneasy that we are going to actively be losing aspects of culture, in light of how much we do online now, and where we have gladly transferred a lot of our social/educational/cultural identity. You didn't get this sense in the nascent days of the internet, but now it feels like a glacier slowly moving behind us as the internet develops, crushing everything in its past, leaving the occasional fossil or two.
posted by SpacemanStix at 10:03 AM on April 8, 2015 [1 favorite]

I seem to recall finding links which had gone dead in the original post and which I found resurrected later. I sent them in via the Contact form suggesting they be substituted for the 404'ed originals and got turned down flat by the mod du jour at the time. Which I found baffling. Here was the same material with a new address and I was given a weird explanation of why it could not be used. Which came down to, if I recall correctly, it added to the workload, set a bad precedent or, tl;dr, Too Hard or something. But I don't entirely trust my recollection and don't have the time now to check, so grain of salt this for the time being.
posted by y2karl at 10:19 AM on April 8, 2015

I Have Opinions about how good search/find mechanisms can alleviate this but I've been busy hassling the Internet Archive about this over the past year (you can imagine the number of fucks they give).

I give some fucks and I am a person in the Internet Business and it's not ludicrous for me to imagine that I might be in a position to do something about this at some point in my career. I imagine there are others like me reading. jessamyn we would like to hear your Opinions!
posted by Kwine at 10:33 AM on April 8, 2015 [1 favorite]

I Have Opinions about how good search/find mechanisms can alleviate this but I've been busy hassling the Internet Archive about this over the past year (you can imagine the number of fucks they give).

I can't imagine, could you elaborate? Knowing little about the subject, I would have imagined a group called The Internet Archive would care very much about that topic.
posted by ThePinkSuperhero at 11:07 AM on April 8, 2015 [2 favorites]

There are definitely some dead link checkers out there. The solution to the dead link problem (whatever it might be, & assuming we want one) could farm some labor out to the community.
posted by Going To Maine at 11:22 AM on April 8, 2015

Linkrot is absolutely infuriating if you're dealing with "hey, it's free on the internet, you don't need textbooks/libraries"-type thinking. When I cleaned out my blog sidebar a few months back, it was shocking to see just how many university-affiliated digital projects and resources had simply vanished without a trace, thanks to the vagaries of funding and academic affiliations. Commercial e-text sites are even worse--several online resources that I used regularly in my classes have now vanished right off the 'net. At this point, unless there's some sort of steady funding stream, I just assume that digital resources are going to be like dust in the wind.
posted by thomas j wise at 11:23 AM on April 8, 2015 [11 favorites]

Yeah, this is a good reason to always, when giving a recommendation or linking to something, say what you're recommending or linking. Rather than stuff like "The resource you need is this".

If you specify, then five years from now, knock wood, your answer will still be there in the thread saying "Take a look at Learning To Tapdance by Betsy Smith".... so even if the link is broken the reader can go look it up.

And why it's a nice thing to excerpt/quote a key paragraph or two from whatever you're discussing; more reproduction of it makes it more likely to persist for future readers.
posted by LobsterMitten (staff) at 11:33 AM on April 8, 2015 [41 favorites]

Yes! Thank you for the PSA, Sra. LM.
posted by Johnny Wallflower at 11:38 AM on April 8, 2015

The solution to the dead link problem (whatever it might be, & assuming we want one) could farm some labor out to the community.

I'm with jessamyn on thinking of this more as a third-party-script sort of situation; as much as I can appreciate the idea behind having linkrot be un-rotted, it's actually a big, complicated, hard-to-scope problem that wouldn't ever really go away and brings in a lot of questions of wanting to let the history/intentions/content of original posts stand vs. chasing down an illusion of a non-rotting web.

When you visit an old post with a link that's dead, you know that the link wasn't dead when the post went up, and you can do some digging if you're so inclined on an informed basis. When you visit an old post that's had a link quietly fixed, you don't know whether you're looking at what folks saw originally, and probably don't have a reason to even wonder. And when a broken-then-fixed link breaks again, the cycle continues.

I think linkrot is a huge bummer but I think it's a structural bummer that's part of the systems of the internet. I'm glad stuff like Internet Archive exists, and I'm also glad that I'm not in charge of it.
posted by cortex (staff) at 11:50 AM on April 8, 2015 [2 favorites]

I would have imagined a group called The Internet Archive would care very much about that topic.

Not as such. They care very much about archiving. They care less about findability figuring that they'll get to that later. And don't get me wrong, the archiving stuff is amazing and the people who work there are very good at tracking down obscure bits of random stuff that is there. But the search features are often... less than robust. Lists of search results are hard to interpret. There is a lot of metadata that it's tough to search for without getting waaaayy under the hood of search. Their basic search could be hot shit with some attention given to it but that's not where their focus is.

Which is fine, that is a choice that people can make. That is also a choice that MetaFilter has made. The site is optimized for community conversations and good moderation and a lot of encouraging people to communicate, meet, employ people and whatever. But it is not necessarily optimized for digging up old stuff, because Google can do that with some small effort, so things like faceted search (where you could search by date range, commenter, tags or a combination of those things) are not a priority.

I'm a librarian and I can generally find whatever the heck I want with enough effort. But when I go looking through something like Trove or Hathi Trust and then I look at this advanced search page? The first two are made by people who care about user experience, the last one is made by engineers, and it shows.
posted by jessamyn (retired) at 11:58 AM on April 8, 2015 [14 favorites]

A large part of this unreached material may still exist, either accessible via such resources as the Internet Archive's Wayback Machine, or preserved elsewhere.

I was hoping there was some attempt to find those links in the Wayback Machine, because I've been able to read a good number of "dead" links like that.

LobsterMitten: If you specify, then five years from now, knock wood, your answer will still be there in the thread saying "Take a look at Learning To Tapdance by Betsy Smith".... so even if the link is broken the reader can go look it up.

Great point. I've taken to adding titles to some links when I don't spell out what the linked YouTube videos are, as seen here, which allows for a shortened reference in the text but all the information is there, tucked away. Nothing to the extent of y2karl's title text, though.
posted by filthy light thief at 12:33 PM on April 8, 2015

When you visit an old post that's had a link quietly fixed, you don't know whether you're looking at what folks saw originally, and probably don't have a reason to even wonder.

You say that like it's a bad thing? But you know, you could fix it non-quietly. You could put REFRESHED LINK or something. If it's too much work, it's too much work, but this concern for the audit trail over the actual content seems a bit overstated.
posted by Segundus at 12:45 PM on April 8, 2015

One interesting thing mentioned in the article: the sites where the links still work, somewhat. The site isn't dead, but redirects the user to somewhere else. This got me thinking - what about the sites that keep the link, but subtly change the content? That is a far more nefarious sort of thing that is possible on the internet, where nothing is set in stone (or even printed to paper). Even a link to a PDF could be the same, while some small portions in the PDF change.

And if you still have 3.5 inch floppies, there are a number of external USB-powered drives that do a decent job of reading old discs, and abandonware sites have old copies of programs that you could run through Dosbox to open your old files and re-save them, if you can't copy the contents from within Dosbox to something outside (I just thought of this, and have only used Dosbox for old games, so I haven't tried to see what can be copied into or out of the Dosbox environment).
posted by filthy light thief at 12:54 PM on April 8, 2015

You say that like it's a bad thing? But you know, you could fix it non-quietly. You could put REFRESHED LINK or something. If it's too much work, it's too much work, but this concern for the audit trail over the actual content seems a bit overstated.

You could do a lot of things, and different sites can choose to do different things. I personally care a lot about the Metafilter archives staying, as much as they can, the archives of what actually got posted on Metafilter when it got posted; that drives my thinking on this stuff, but I'm not canonizing that feeling for the internet at large.

But, again, it's one thing to say "yeah but the site could do this" and another thing to try and comprehensively tackle it in a future-proofed way. And doing it uncomprehensively or in a way that's just going to immediately start rotting away again seems like a poor use of site and community resources if there's not some very clear and specific reason why it needs to be done only partway and only for this moment in time.
posted by cortex (staff) at 12:58 PM on April 8, 2015 [5 favorites]

If you want to see the Archive machine version of a linked page, I'd suggest getting a bookmarklet that hits the Wayback Machine, and use it on the Mefi page that has the link.

Putting REFRESHED LINK notices all over the place would be ugly, yet anything less could quietly break and confound expectations.
posted by Pronoiac at 1:29 PM on April 8, 2015 [1 favorite]

MetaFilter: you can imagine the number of fucks they give.
posted by pibeandres at 2:04 PM on April 8, 2015 [4 favorites]

I feel partially responsible - especially in my "wendell" days, I did a lot of Newsfilter links and many of my sources (especially YahooNews which WAS a good place to get news links at the time) just didn't bother to keep archives for very long. In fact, my very first post was a joke you couldn't get without clicking both the links. I was comparing a new "high tech bandage" that used nanotechnology to keep wounds clean with something about plans to fight hacking being criticized as a "band-aid". Again, both links dead, joke deader. Well, at least Matt liked it.
posted by oneswellfoop at 2:28 PM on April 8, 2015

Ephemera gonna ephemerate.

But I will say that using URL shorteners, as ever, is a terrible thing to do, because those (even ones via Google, because we all know how willing Google is to unceremoniously dump stuff they change their minds about) suckers are going to get the linkrot faster and more frequently than the plain link might.
posted by stavrosthewonderchicken at 4:22 PM on April 8, 2015 [2 favorites]

One yielded, not the original site itself, but a kind of obituary page.

Oh, goatse.cx. You are missed.
posted by shmegegge at 5:43 PM on April 8, 2015 [1 favorite]

I used to have a link to an article on a site I had. I still own the domain name, though it is currently unused. It may be possible to find the site on the Wayback Machine. I am really not sure.

Anyway, a particular article I had linked to on that site that I valued a lot got lost to link rot and I couldn't find it or anything like it for a long time, in fact, for several years. Then, not terribly long ago, I made a comment on MetaFilter and I googled, hoping to find something similar, and, voila, the original article was again available online. It had been gone for several years and there it was, back again.

I'm not sure what to think about link rot what with this zombie link rising from the dead again.
posted by Michele in California at 6:02 PM on April 8, 2015

Previously regarding Perma.cc.
Previously -- New Yorker article on linkrot and ways to fight it.

So, I work on Perma.cc and am full of opinions on linkrot. Librarylis is probably right that we're not in a position right now to take on a site like Metafilter. On the other hand we sure would like to be in that position, and we'd love to talk about it.

Some other handy resources:

On the reader/plugin side, the Memento project now has a public search engine -- it was mentioned in the New Yorker article but hadn't yet launched. You can paste any link in there and it will tell you all of the times it's been archived at the Internet Archive and a bunch of other places (soon to include Perma).

They also have an API, so you can build tools to find the best available archive for any given link if you know the date you want.

And they have a Chrome plugin, so you can right-click on any given link and get the closest available archive they know about.

One small change Metafilter could make to help out with the plugin approach is to include each post's publication date in machine-readable form, so the Memento plugin can find it. For example, NYTimes uses this markup in their head tag:

<meta property="article:published" itemprop="datePublished" content="2015-03-12" />

Then when I right-click on a link in an NYTimes article, the Memento plugin finds that markup and offers me a cached version from March 12 -- the date the link was published.

There's a few valid ways to encode the date in the page and I'm not 100% clear on the most beautiful perfect standards-compliant way to do it, but if y'all are potentially interested I can look into it. (The coolest thing would probably be to include some other semantic data too, à la Schema.org's BlogPosting schema. You never know what other neat open-web tools you might be supporting.)

Of course, if anyone wants to write a Metafilter-specific Greasemonkey script using the Memento API, you could scrape the date from the HTML without waiting for a machine-readable format and probably offer a much-better-integrated interface.

The other angle on this problem is to cache links on the server as they're created. I'm working with a project at the Berkman Center called Amber, currently in private beta, that lets site operators do that. It comes as a plugin for WordPress/Drupal/Apache/Nginx. The idea is that you install it, give it a certain amount of disk space, and everything you link to gets cached on your server. When your site detects that a given link is down, it can more or less aggressively offer users your cached version of the link as an alternative -- or you can offer the cached version all the time as an inline or hover option, sort of like Mefi's inline YouTube-player icon.

Running a server-side cache like that would be a larger headache for a site with Mefi's traffic and profile than for most blogs, and I don't expect cortex to leap at the chance just now. But I do think when folks see it working for smaller blogs it will start to seem like a more attractive option for larger sites as well. It's very neat to see in action -- it makes your blog archives a nicer place to be.

Where this gets really exciting is when we manage to pipe Amber archives into Memento, so you right-click on a broken link in Metafilter and get whisked off to an Amber archive held by a random blogger who happened to link to the same page on the same day. But that's still a ways off at best ...
posted by jhc at 6:54 PM on April 8, 2015 [18 favorites]
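
The date-scraping idea in the comment above can be sketched in a few lines of Python. This is only an illustration under assumptions: it looks for the `itemprop="datePublished"` markup quoted in the comment, expects an ISO `YYYY-MM-DD` value, and the names `PublishedDateParser` and `find_published_date` are hypothetical. It is not how the Memento plugin itself works.

```python
from datetime import date
from html.parser import HTMLParser
from typing import Optional


class PublishedDateParser(HTMLParser):
    """Remembers the first <meta itemprop="datePublished"> value it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.published: Optional[date] = None

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags matter, and only until a date has been found.
        if tag != "meta" or self.published is not None:
            return
        attr_map = dict(attrs)
        content = attr_map.get("content")
        if attr_map.get("itemprop") == "datePublished" and content:
            try:
                self.published = date.fromisoformat(content)
            except ValueError:
                # Assumption: non-ISO date formats are simply ignored here.
                pass


def find_published_date(html: str) -> Optional[date]:
    """Return the page's machine-readable publication date, if any."""
    parser = PublishedDateParser()
    parser.feed(html)
    return parser.published
```

Fed the NYTimes-style tag quoted above, `find_published_date` would return the date 2015-03-12; on a page without such markup it returns `None`, which is the case where a script would have to fall back to scraping the visible "posted by" timestamp.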

Oh, this:

nobody: The anchor tag specs should be amended to allow date information. Future browsers could then automatically redirect to the Internet Archive's cached version most proximate to that date.

Memento has proposed exactly that in a draft spec, and various people involved in the projects I mentioned before have started a W3C Robustness and Archiving Community Group to talk about it. If anyone wants to follow or get involved with that effort that's probably a good place to start.

I think it makes a lot of sense -- most links have an implicit "when" as well as a "where," and as we get more and more old links lying around it will be important for browsers to have some way to make sense of that. Because of the sandbox security model there are limits to what you can do to patch things up in JavaScript. But as usual the details get tricky -- for starters, if we're going to add a new attribute to the anchor tag we'll all have to agree on what to call it, and I'm betting that'll be a fight.
posted by jhc at 7:09 PM on April 8, 2015 [4 favorites]
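
For readers wondering what "a link with a when" might look like in practice, here is a small Python sketch. It assumes the Wayback Machine's timestamped URL form (`https://web.archive.org/web/<YYYYMMDDhhmmss>/<url>`), which to my understanding resolves to the capture nearest that time. The function name `wayback_url` is hypothetical, and nothing here reflects the draft spec mentioned above.

```python
from datetime import datetime

# Assumption: the Wayback Machine serves the capture closest to the
# 14-digit timestamp that follows this prefix.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(url: str, when: datetime) -> str:
    """Build a Wayback Machine address for `url` as of roughly `when`."""
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}{stamp}/{url}"
```

For example, `wayback_url("http://example.com/page", datetime(2015, 4, 8, 3, 48))` gives `https://web.archive.org/web/20150408034800/http://example.com/page`. A browser or plugin that knew a post's date could offer that address whenever the live link fails.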

I understand this isn't actually the "how should mefi solve linkrot" thread, but I'm fantasizing about a world where Maciej Ceglowski could offer commercial/institutional-level archiving accounts on pinboard.
posted by jjwiseman at 7:43 PM on April 8, 2015

Yeah, this is a good reason to always, when giving a recommendation or linking to something, say what you're recommending or linking. Rather than stuff like "The resource you need is this".

I have tended to give links as "this" rather than using descriptive titles, and this is a good reminder that it is not always the best approach.
posted by Dip Flash at 9:25 PM on April 8, 2015 [3 favorites]

Back in my web writing/usability days, we encouraged people not to use "this" (or, worse, "click here") because people tend to scan text for links and that provides very little context; since so much of how we use the web involves more scanning than reading, it's important to use linking text that gives a bit of context on what the link is to.
posted by NoraReed at 3:06 AM on April 9, 2015 [3 favorites]

Yes, if you don't give me a clue about what's on the other end of your link, I'm probably not going to click it. These old eyes have seen too many unspeakable things that way, too many soul-searing unerasable visions of humanity's most abysmal degradations - many of them Rick Astley.
posted by Segundus at 3:23 AM on April 9, 2015 [4 favorites]

Plus the rise in mobile browsing means that fewer people are able to hover over your link to see the address.
posted by soelo at 7:07 AM on April 9, 2015 [3 favorites]

> I have tended to give links as "this" rather than using descriptive titles, and this is a good reminder that it is not always the best approach.

It is never the best approach. Seriously, people, don't do that. Even without the link-rot issue, it's annoying and counterproductive.
posted by languagehat at 7:19 AM on April 9, 2015 [2 favorites]

I just wanted to point out that other than the Internet Archive and Perma.cc there's also webcitation.org's WebCite® program. Link rot has become a major issue for scholarly publishers and we're looking at both Perma.cc and WebCite® as possible solutions to the problem.
posted by Toekneesan at 7:57 AM on April 9, 2015 [2 favorites]

There's also archive.today which has gained wider exposure -at least in my parts of the Internet- thanks to its repeated use during Gamergate. No idea who runs it, though, so I'm glad that there are other, more public options.
posted by Going To Maine at 9:17 AM on April 9, 2015

Reading "a relatively serious general-interest web community" to describe MetaFilter is like hearing your own voice tape recorded. You know this to be a true representation, but it still surprises you to hear it nonetheless.
posted by MCMikeNamara at 9:52 AM on April 9, 2015 [5 favorites]

Also, "this" is unhelpful for blind/screenreader users who often navigate round pages by hitting tab and listening to the link text.
posted by alasdair at 12:19 PM on April 9, 2015 [4 favorites]

Anchor text is one of the stronger & more useful signals for search, too (even in the face of the occasional Google bombing).
posted by jjwiseman at 2:13 PM on April 9, 2015

If only we could have link rot...to our pasts.
posted by turbid dahlia at 8:08 PM on April 9, 2015 [2 favorites]

In a few decades when the current contents of the NSA's Utah Data Center fit on a thumb drive and get leaked, we can have it all back, along with the deleted comments and all of the preview versions of every comment that got sent to the MeFi server. And everybody's MeMails. and, and, and...
posted by XMLicious at 4:11 PM on April 10, 2015 [1 favorite]

This.
posted by Joseph Gurl at 6:53 PM on April 10, 2015

jessamyn: "Realistically finding old stuff is something the internet has decided it's not really for and since MeFi is a part of the Internet this was always part of the culture here. "

This seems like nonsense to me. Ascribing a general attitude to something as diverse as "the Internet" makes no sense, and insisting that we ought to or even have to go along with a terrible flaw in the system because so many people are lazy or careless is like saying we should all get blotto if we're at a bar because bars have decided they're about getting people drunk.
posted by Conrad Cornelius o'Donald o'Dell at 2:01 PM on April 12, 2015

I don't think Jessamyn's asserting that no one cares or that no one should care, just that there's not a systemic culture of caring of the sort that, ten or fifteen or twenty years ago, a lot of people may have reasonably thought there might be.

Practically speaking, there is a great deal more fragility and ephemerality to content, new and aging, hosted on the web than you'd likely have expected based on fevered editorials about the new Library of Alexandria in early issues of Wired; it turns out that permanence and continuity and archival are still hard, complicated things that require a lot of active and ongoing effort just to have a reasonable shot at in the face of all the things (mechanical, systemic, cultural) that can disrupt the presentation and storage of a given pile of information.

It doesn't make it bad or stupid or hopeless or anything else to want to fight that, to want to see stuff stick around and links to stay good and rot to be fought. But that's an uphill battle, and the problem with expecting Metafilter to specifically fight farther and harder and longer up that hill than the rest of the internet that it is daily linking to is that Metafilter never set out with that mission in mind and has limited resources that can be better used on other things more central to what the site itself does and what its members do here.
posted by cortex (staff) at 2:49 PM on April 12, 2015 [2 favorites]

is like saying we should all get blotto if we're at a bar because bars have decided they're about getting people drunk.

Yes! Let's go! Who's with me?
posted by Joseph Gurl at 3:25 PM on April 12, 2015

This seems like nonsense to me.

Yeah don't get me wrong, I think it's bullshit but I still see big companies that you've heard of redesigning their website and breaking all their old links because ... I don't know why. It's so stupidly avoidable but people act now and have acted like this is a thing that you don't need to handle as long as your website has a search box. I'm pretty sure every inbound link to MetaFilter still works with only a few exceptions and I think that is terrific. I think MeFi did it right. But that doesn't mean we should spend a lot of effort dealing with other people who did it wrong in the interest of trying to be more friendly to MeFi's crate diggers. I like the idea. I like watching people like perma.cc try to handle the problem (because broken URLs in legal citations is a huge disaster, imo) but as cortex says it's actually a hard problem with many more variables than it seems like it has and deciding to tackle it would be a pretty large resource-expenditure decision and there are other things that are More Worth It in my post-mod opinion and I'm sure the mods have other things higher on their todo list.
posted by jessamyn (retired) at 5:03 PM on April 12, 2015 [1 favorite]

Time flies at full warp.

And yet Sex Kitten endures.
posted by y2karl at 7:57 AM on April 13, 2015

One thing you could do is have a script trawl the db of posts, visit each link, and for every 404 reply or server timeout prepend https://web.archive.org/web/*/ to the href string.

This is the kind of thing that's theoretically easy to do, but in reality takes like 9 hours of your time for debugging.
posted by clarknova at 6:57 PM on April 24, 2015
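
A rough Python sketch of the script described above, for anyone who wants to try it on their own archive. Everything here is an assumption for illustration: the post bodies are taken to be HTML strings with double-quoted `href` attributes, a regular expression stands in for a real HTML parser, and the names `is_dead` and `rewrite_dead_links` are hypothetical. It is not anything MetaFilter runs.

```python
import re
import urllib.error
import urllib.request
from typing import Callable

ARCHIVE_PREFIX = "https://web.archive.org/web/*/"
# Assumption: links are written as href="http(s)://..." with double quotes.
HREF_PATTERN = re.compile(r'href="(https?://[^"]+)"')


def is_dead(url: str, timeout: float = 30.0) -> bool:
    """True for a 404 reply, a timeout, or an unreachable server."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return False
    except urllib.error.HTTPError as err:
        # Other HTTP errors (403, 500, ...) are left alone in this sketch.
        return err.code == 404
    except OSError:
        # Covers URLError, socket timeouts, and connection failures.
        return True


def rewrite_dead_links(html: str, dead: Callable[[str], bool] = is_dead) -> str:
    """Prepend the archive.org prefix to every href the checker calls dead."""

    def replace(match: re.Match) -> str:
        url = match.group(1)
        # Links already pointing at the archive are skipped without a check.
        if url.startswith("https://web.archive.org/") or not dead(url):
            return match.group(0)
        return f'href="{ARCHIVE_PREFIX}{url}"'

    return HREF_PATTERN.sub(replace, html)
```

The checker is passed in as a parameter so the rewriting logic can be tried with a stand-in function, with no network access, before pointing it at real links.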

clarknova: "One thing you could do is have a script trawl the db of posts, visit each link, and for every 404 reply or server timeout prepend https://web.archive.org/web/*/ to the href string.

This is the kind of thing that's theoretically easy to do, but in reality takes like 9 hours of your time for debugging."

It would take a lot longer than nine hours to 404-check every external link on metafilter. There has to be over a million links buried in unformatted text. If you allow a 30 second timeout for each link for slow sites, that's at least 5.5 days of processing time. That's easily parallelized but still a lot of work. There's lots of ways that much text munging could go wrong. Then you'd have to re-404-check your new links to find the ones that are still broken or got broken in the process.

The internet is ephemeral and ever-changing. Maybe it would be better to accept that everything has its time and place. I loved my Grandma and I loved her fried apple pies (secret ingredient: lard), but Grandma passed in 1982 and I haven't seen her or had a fried apple pie since. Some things don't last forever.
posted by double block and bleed at 1:43 PM on April 25, 2015
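
To put rough numbers on the running-time question above, here is a back-of-the-envelope Python sketch. The one-million-link count and the 30-second timeout are the figures guessed at in the comment, not measurements, and the function name `worst_case_days` is hypothetical. It computes an upper bound in which every single link runs to the full timeout; most live links answer far sooner, so real runs should be much shorter.

```python
SECONDS_PER_DAY = 86_400


def worst_case_days(links: int, timeout_s: float, workers: int = 1) -> float:
    """Upper bound on run time if every link waits out the full timeout."""
    return links * timeout_s / workers / SECONDS_PER_DAY
```

Under those assumptions, `worst_case_days(1_000_000, 30)` comes to about 347 days for a single thread, and `worst_case_days(1_000_000, 30, workers=64)` to about 5.4 days, which shows how much the estimate depends on how many checks run in parallel and on how many links actually time out.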

I meant nine hours to write, test, and debug. Not to actually run the task. Yes, if it's all happening in a single thread it could take up to a month or more. But so what. The actual processing and network traffic for that kind of thing is cheap. You could run the script on an RPi if you felt like. And once you prepend the archive.org string to a URL you don't need to make a network call for that link in subsequent passes.

Text processing isn't hard, and the text is hardly unformatted: it's HTML.

If we accepted that everything should just fall down the memory hole we might as well pulp the national archives into paper bags or something. Obviously this wasn't the philosophy of archive.org, which is a great resource thousands of people rely on.
posted by clarknova at 6:10 PM on April 25, 2015 [2 favorites]
