code stripping March 6, 2007 5:21 AM

It's pretty minor, but an & in the post title turns into 'amp' in the url. example.
posted by twine42 to Bugs at 5:21 AM (18 comments total)

I haven't got any suggestions what to replace it with, but as it is we've got a great new url structure with apparently random words in it...
posted by twine42 at 5:22 AM on March 6, 2007


"and" would seem a reasonable replacement.
posted by NinjaTadpole at 5:46 AM on March 6, 2007


How about "and per se and"? ;)
posted by goodnewsfortheinsane at 6:11 AM on March 6, 2007


&amp;amp; is the standard encoding for the ampersand. It looks to me like the ampersand and semicolon are getting stripped for being invalid URL entities.
posted by ardgedee at 6:36 AM on March 6, 2007
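A minimal sketch of how an "amp" could end up in the URL, assuming the title is HTML-escaped before punctuation is stripped. The function name naive_stub and the exact order of steps are hypothetical; the site's real routine is not shown in this thread.

```python
import html
import string


def naive_stub(title: str) -> str:
    """Hypothetical stub builder: escape HTML, strip punctuation, dash the spaces."""
    escaped = html.escape(title)  # "&" becomes "&amp;"
    # Dropping every punctuation character removes "&" and ";" but leaves "amp".
    kept = "".join(ch for ch in escaped if ch not in string.punctuation)
    return "-".join(kept.lower().split())


print(naive_stub("Cats & Dogs"))  # cats-amp-dogs
```

If the stripping ran on the raw title instead of the escaped one, the ampersand would simply vanish rather than leave a stray word behind.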


If you're going for "natural english" URLs, the sensible thing would be to convert any ampersands surrounded by spaces to the word "AND", and convert any ampersands that are attached on either side to characters to "%26" (the proper percent-escaped character).
posted by Civil_Disobedient at 6:45 AM on March 6, 2007
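The two-case rule in the comment above can be sketched roughly like this. The function name encode_ampersands is made up for illustration, and lowercase "and" is an assumption chosen for readability.

```python
import re


def encode_ampersands(title: str) -> str:
    """Hypothetical helper: spell out free-standing ampersands, escape attached ones."""
    # An ampersand with whitespace on both sides becomes the word "and".
    title = re.sub(r"(?<=\s)&(?=\s)", "and", title)
    # Any ampersand still left is attached to a word, so percent-encode it.
    return title.replace("&", "%26")


print(encode_ampersands("Cats & Dogs"))  # Cats and Dogs
print(encode_ampersands("AT&T rates"))   # AT%26T rates
```

The order matters here: the free-standing case has to run first, otherwise every ampersand would already have been turned into %26.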


ardgedee: "&amp;amp; is the standard encoding for the ampersand. It looks to me like the ampersand and semicolon are getting stripped for being invalid URL entities."

According to RFC 1738 § 5, both ampersand and semicolon are valid path characters (the path is defined as any number of xchars, xchars include reserved characters, of which both semicolon and ampersand are members).
posted by Plutor at 7:29 AM on March 6, 2007


Also: If you type "&lt;" into the AskMefi box and hit preview, the box now contains "<" which will then fail to post on the second iteration.
posted by DU at 7:59 AM on March 6, 2007
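One plausible cause of the preview problem, sketched under the assumption that the typed text is echoed back into the form without being escaped again, so the browser decodes the entity. The browser's decoding is simulated with html.unescape; none of this is confirmed by the thread.

```python
import html

typed = "&lt;"  # what the user entered in the box

# Echoed raw into the form: the browser decodes the entity on display.
shown_if_raw = html.unescape(typed)

# Escaped once more before echoing: the browser's decoding restores the original text.
shown_if_escaped = html.unescape(html.escape(typed))

print(repr(shown_if_raw))      # '<'
print(repr(shown_if_escaped))  # '&lt;'
```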


I'll change them to "and" in the future, since the goal is a readable URL.
posted by mathowie (staff) at 8:46 AM on March 6, 2007


On a related note, slashes in the title are stripped from the URL, as in this post. It probably makes more sense to replace them with a hyphen.
posted by Doofus Magoo at 8:55 AM on March 6, 2007


> ...both ampersand and semicolon are valid path characters...

No argument there, but the ampersand is commonly used as a field separator in GET requests, so there might have been an attempt to encode it.
posted by ardgedee at 9:00 AM on March 6, 2007
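To illustrate the field-separator point: in a query string (as opposed to the path), a raw ampersand splits the value, while a percent-encoded one survives. The parameter names title and page are invented for this example.

```python
from urllib.parse import parse_qs, quote

# Raw ampersand: the parser treats it as the boundary between fields.
broken = parse_qs("title=Cats & Dogs&page=2")

# Percent-encoded ampersand (%26): the value comes through intact.
fixed = parse_qs("title=" + quote("Cats & Dogs") + "&page=2")

print(broken)  # {'title': ['Cats '], 'page': ['2']}
print(fixed)   # {'title': ['Cats & Dogs'], 'page': ['2']}
```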


slashes in the title are stripped from the URL

Yeah, so that they're not interpreted as directories. I simply strip all punctuation and HTML from titles when creating/storing the URL stub.
posted by mathowie (staff) at 9:05 AM on March 6, 2007


I love it when no one listens to me. Do it again!
posted by Civil_Disobedient at 12:29 PM on March 6, 2007


Civil : This is mefi! No-one ever agrees with anyone - we just stop arguing with them.
posted by twine42 at 12:40 PM on March 6, 2007


How about turning a slash character into "-slash-" in the URL?
posted by nebulawindphone at 2:27 PM on March 6, 2007


That's a really cool etymology, goodnewsfortheinsane! I didn't know that the word "ampersand" comes from "& per se and", or "& is defined to mean 'and'". There's some great self-reference going on in that definition right there.

If linguistic history had knocked the '&' out of the etymology, and we just called them "persands" or "per se ands", then

"'per se and' per se and."

would be a true statement that is a quine.
posted by painquale at 6:45 PM on March 6, 2007


And I like to poo per se.
posted by davy at 7:24 PM on March 6, 2007


Thanks for inspiring an FPP, painquale!
posted by goodnewsfortheinsane at 4:44 PM on March 7, 2007


While we're on the topic of URL creatification: the current routine strips out all punctuation and replaces spaces with dashes. However, it strips out dashes that were already there and replaces them with... nothing! Example. The title of the post has a dash between Must and Do. I suppose in this case the URL is still kinda readable but you lose relevant (sometimes) words in the URL, which is bad for google juice purposes.

But this is nitpicking.
posted by heydanno at 7:40 PM on March 7, 2007
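For what it's worth, the suggestions in this thread (ampersand to "and", slashes to hyphens, existing dashes kept as word breaks) could be combined roughly as follows. This is only a sketch: readable_stub is a hypothetical name, the tag-stripping regex is deliberately simplistic, and it is not the site's actual routine.

```python
import html
import re


def readable_stub(title: str) -> str:
    """Hypothetical URL stub builder combining the fixes proposed in the thread."""
    text = html.unescape(title)           # turn "&amp;" back into "&"
    text = re.sub(r"<[^>]+>", "", text)   # drop simple HTML tags
    text = text.replace("&", " and ")     # ampersand becomes the word "and"
    text = re.sub(r"[/\-]+", " ", text)   # slashes and dashes become word breaks
    text = re.sub(r"[^\w\s]", "", text)   # strip the remaining punctuation
    return "-".join(text.lower().split())


print(readable_stub("Cats &amp; Dogs"))          # cats-and-dogs
print(readable_stub("Must-Do list: either/or"))  # must-do-list-either-or
```

Handling the ampersand, slash, and dash before the general punctuation strip is the whole trick; once they are word breaks, the final join puts a single hyphen between every word.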

