code stripping March 6, 2007 5:21 AM
It's pretty minor, but an & in the post title turns into 'amp' in the url. example.
"and" would seem a reasonable replacement.
posted by NinjaTadpole at 5:46 AM on March 6, 2007
&amp; is the standard encoding for the ampersand. It looks to me like the ampersand and semicolon are getting stripped for being invalid URL entities.
posted by ardgedee at 6:36 AM on March 6, 2007
If you're going for "natural English" URLs, the sensible thing would be to convert any ampersands surrounded by spaces to the word "AND", and convert any ampersands that are attached on either side to characters to "%26" (the proper percent-escaped form).
posted by Civil_Disobedient at 6:45 AM on March 6, 2007
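A minimal Python sketch of the rule suggested above. The function name `encode_ampersands` is made up for illustration and is not MetaFilter's actual code; it handles only the ampersand, not the rest of the title-to-URL conversion.

```python
import re

def encode_ampersands(title: str) -> str:
    """Hypothetical helper: spell out free-standing ampersands, escape the rest."""
    # An ampersand with whitespace on both sides becomes the word "and".
    title = re.sub(r"(?<=\s)&(?=\s)", "and", title)
    # Any remaining ampersand is attached to a character; percent-encode it.
    return title.replace("&", "%26")

print(encode_ampersands("Salt & Pepper"))  # Salt and Pepper
print(encode_ampersands("AT&T rates"))     # AT%26T rates
```

Whether "%26" belongs in a URL meant to be readable is a separate question; the sketch only shows that the two cases can be told apart.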
ardgedee: "&amp; is the standard encoding for the ampersand. It looks to me like the ampersand and semicolon are getting stripped for being invalid URL entities."
According to RFC 1738 § 5, both ampersand and semicolon are valid path characters (the path is defined as any number of xchars, xchars include reserved characters, of which both semicolon and ampersand are members).
posted by Plutor at 7:29 AM on March 6, 2007
Also: If you type "&lt;" into the AskMefi box and hit preview, the box now contains "<", which will then fail to post on the second iteration.
posted by DU at 7:59 AM on March 6, 2007
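One plausible cause of the preview behaviour described above is a form value being written back into the page without being escaped again. A small Python illustration using the standard `html` module; the `render_preview_box` function and the textarea markup are assumptions, not the site's actual code.

```python
import html

def render_preview_box(submitted: str) -> str:
    """Hypothetical: build the textarea shown again on the preview page."""
    # Escaping on output keeps "&lt;" as literal text inside the textarea,
    # so submitting the form a second time sends back the same characters.
    return "<textarea>" + html.escape(submitted) + "</textarea>"

typed = "&lt;"
print(render_preview_box(typed))  # <textarea>&amp;lt;</textarea>
# Without escaping, a browser would decode the entity and show this in the box:
print(html.unescape(typed))       # <
```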
I'll change them to "and" in the future, since the goal is a readable URL.
posted by mathowie (staff) at 8:46 AM on March 6, 2007
On a related note, slashes in the title are stripped from the URL, as in this post. It probably makes more sense to replace them with a hyphen.
posted by Doofus Magoo at 8:55 AM on March 6, 2007
> ...both ampersand and semicolon are valid path characters...
No argument there, but the ampersand is commonly used as a field separator in GET requests, so there might have been an attempt to encode it.
posted by ardgedee at 9:00 AM on March 6, 2007
slashes in the title are stripped from the URL
Yeah, so that they're not interpreted as directories. I simply strip all punctuation and HTML from titles when creating/storing the URL stub.
posted by mathowie (staff) at 9:05 AM on March 6, 2007
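For illustration only, a rough Python sketch of a stub builder that behaves the way this thread describes: HTML tags and punctuation are dropped and spaces become dashes. The name `make_stub` and the regular expressions are assumptions; the site's real routine is not shown here.

```python
import re

def make_stub(title: str) -> str:
    """Hypothetical URL-stub builder: strip HTML and punctuation, dash the spaces."""
    text = re.sub(r"<[^>]*>", "", title)        # drop anything that looks like an HTML tag
    text = re.sub(r"[^A-Za-z0-9\s]", "", text)  # drop punctuation, including "/", "&" and "-"
    return "-".join(text.split())               # collapse whitespace runs into single dashes

print(make_stub("Yes/No <b>questions</b>?"))  # YesNo-questions
print(make_stub("Cats &amp; Dogs"))           # Cats-amp-Dogs
```

The second call shows how a stray "amp" could appear if the title reaches the routine already HTML-encoded. That is a guess about the cause, not something confirmed in the thread.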
I love it when no one listens to me. Do it again!
posted by Civil_Disobedient at 12:29 PM on March 6, 2007
Civil : This is mefi! No-one ever agrees with anyone - we just stop arguing with them.
posted by twine42 at 12:40 PM on March 6, 2007
How about turning a slash character into "-slash-" in the URL?
posted by nebulawindphone at 2:27 PM on March 6, 2007
That's a really cool etymology, goodnewfortheinsane! I didn't know that the word "ampersand" comes from "& per se and", or "& is defined to mean 'and'". There's some great self-reference going on in that definition right there.
If linguistic history had knocked the '&' out of the etymology, and we just called them "persands" or "per se ands", then
"'per se and' per se and."
would be a true statement that is a quine.
posted by painquale at 6:45 PM on March 6, 2007
While we're on the topic of URL creatification: the current routine strips out all punctuation and replaces spaces with dashes. However, it strips out dashes that were already there and replaces them with... nothing! Example. The title of the post has a dash between Must and Do. I suppose in this case the URL is still kinda readable but you lose relevant (sometimes) words in the URL, which is bad for google juice purposes.
But this is nitpicking.
posted by heydanno at 7:40 PM on March 7, 2007
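One way the suggestions in this thread could be combined, sketched in Python: ampersands become "and", while slashes and existing dashes are treated as word separators instead of being deleted. The name `readable_stub` is hypothetical, and the sketch assumes the title arrives as plain, unencoded text.

```python
import re

def readable_stub(title: str) -> str:
    """Hypothetical variant: keep the word boundaries that punctuation used to mark."""
    text = re.sub(r"<[^>]*>", "", title)        # drop HTML tags
    text = text.replace("&", " and ")           # spell out ampersands
    text = re.sub(r"[/-]", " ", text)           # slashes and existing dashes separate words
    text = re.sub(r"[^A-Za-z0-9\s]", "", text)  # drop the remaining punctuation
    return "-".join(text.split())               # join words with single dashes

print(readable_stub("Must-Do list: yes/no & maybe"))  # Must-Do-list-yes-no-and-maybe
```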
posted by twine42 at 5:22 AM on March 6, 2007