non-ascii tags in posts October 26, 2019 3:57 AM

Small pony/nitpick: I tried to add a tag to my post about the Chilean protest movement with the text 'Piñera', the name of the president we're protesting against, and the system says: One of your tags couldn't be added because it contained non-ASCII characters or punctuation. It's 2019, we can handle unicode in tags and shouldn't assume the wide variety of human experience can be reduced to 26 letters.
posted by signal to Feature Requests at 3:57 AM (23 comments total) 19 users marked this as a favorite
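
For illustration only, here is a minimal sketch of what a Unicode-aware tag check could look like. It is written in Python, not in whatever the site actually runs. The function name `is_valid_tag` and the rule itself (letters, digits, and combining marks from any script are allowed; punctuation, symbols, and whitespace are not) are assumptions, not the site's real validation.

```python
import unicodedata

def is_valid_tag(tag: str) -> bool:
    """Hypothetical rule: accept letters, digits and combining marks in any script."""
    # NFC folds sequences such as 'n' + U+0303 into the single code point 'ñ'.
    normalized = unicodedata.normalize("NFC", tag)
    if not normalized:
        return False
    # General categories: L* = letters, N* = numbers, M* = combining marks.
    # Punctuation (P*), symbols (S*, which covers most emoji) and spaces (Z*) fail.
    return all(unicodedata.category(ch)[0] in "LNM" for ch in normalized)
```

Under this assumed rule, `is_valid_tag("Piñera")` passes while a tag containing an exclamation mark or the poop emoji does not. A few emoji built from digits plus combining marks (keycaps, for example) would still slip through, so a real filter would need more than a category check.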

This is worth revisiting and fixing.
posted by frimble (staff) at 3:58 AM on October 26, 2019 [24 favorites]


This has long been something I've been slightly annoyed with. If it can be done, I would be grateful to the programming gods.
posted by Fizz at 4:00 AM on October 26, 2019 [2 favorites]


This would be a very good thing to do.
posted by jessamyn (retired) at 8:28 AM on October 26, 2019 [1 favorite]


ooooh - I don't post much (so I haven't encountered this) but I really really like when things are meaningfully tagged. Now I know why some obvious tags, well, aren't.

Awesome catch.
posted by mce at 9:03 AM on October 26, 2019


Indeed, why don’t we already utf-8? I am curious about the history of this in the tags.
posted by bilabial at 10:03 AM on October 26, 2019


Friendly amendment: please filter out emojis. I can see that being (ab)used.
posted by tonycpsu at 10:05 AM on October 26, 2019 [4 favorites]


yeah, it's been vexing to me for a while now that i can't properly tag non-english words.

[eddie izzard voice] do you know ... there's other countries
posted by poffin boffin at 12:02 PM on October 26, 2019 [2 favorites]


In days of yore I asked pb about it, UTF-8 tags for CJK characters specifically. It may be a very large project, from what I remember about databases, charsets, and Cold Fusion, so let's allow frimble to do reconnaissance before holding them to any firm commitment, in case the above comment was made in haste.

But hopefully some related tech development has made it simple or frimble already has it all planned out and we can usher in a new era when the tag for potus45/uspolitics threads is just the poop emoji.
posted by XMLicious at 1:52 PM on October 26, 2019 [3 favorites]


> Friendly amendment: please filter out emojis. I can see that being (ab)used.
Have you seen the post before this one? Nothing wrong there.
On topics. I am very much in favour of accents being allowed in tags.
å ā ă ą á â ã ä è é ê ê ë ē ė ę ě ĕ to name but a few as this is a global site. (Although it is sometimes hard to realize that).
Are non western scripts allowed in tags?
posted by adamvasco at 4:31 PM on October 26, 2019 [1 favorite]


Well obviously we need n̈
posted by aubilenon at 8:24 PM on October 26, 2019 [1 favorite]

> Friendly amendment: please filter out emojis. I can see that being (ab)used.
> Have you seen the post before this one? Nothing wrong there.
I read that as a request for no emojis in tags, but I might be wrong.
posted by I'm always feeling, Blue at 9:41 PM on October 26, 2019


> I read that as a request for no emojis in tags, but I might be wrong.

👍
posted by tonycpsu at 10:53 PM on October 26, 2019 [4 favorites]


Is it just in tags?

I was all ready to hold forth in this recent apostrophe AskMe about how the letter ' i ' can be dotted or un-dotted in Turkish, in both upper and lower case; but I gave up in frustration when I couldn't illustrate with the appropriate 300-level HTML chars -- at least, they wouldn't show in the preview box.

> Well obviously we need n̈

¿Do you mean ñ?
posted by Rash at 3:31 PM on October 27, 2019


Ño
posted by aubilenon at 4:28 PM on October 27, 2019 [1 favorite]


> Indeed, why don’t we already utf-8? I am curious about the history of this in the tags.

Like XMLicious noted, it had come up years back but at the time pb's conclusion was it was showstoppingly hard to sort out with the state of the site at the time. But it's been a few years and frimble's reported at least a generally hopeful take on its doability with the current state of things, so, fingers crossed that we can improve it a bit.

> Is it just in tags?

IIRC, tags and usernames in particular are tightly restricted, primarily because they're mechanically depended on to do more complicated operations than are things like generic comment content, which really only need to be displayed as text in the appropriate areas. But some of those dependencies are architectural choices that Matt made at some point in the past that may not be iron-clad necessities so much as just where his preference was, so looking at these things periodically in a new light is worth doing.

With usernames, part of the concern with extended character sets was also identity abuse through look-alike characters (and in fact I think usernames were clamped down to a subset of low ASCII partly in response to some specific deliberate abuse along those lines), which is a harder problem to solve than just character set support, but likewise one we've talked about trying to find time to at least revisit.
posted by cortex (staff) at 4:35 PM on October 27, 2019 [1 favorite]
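
To make the look-alike concern concrete, here is a rough sketch of one common heuristic: flag names that mix letters from more than one script. It is in Python for illustration. The helper names `scripts_used` and `looks_spoofable` are hypothetical, and guessing the script from the first word of each character's Unicode name is a shortcut, not a proper confusables check. A serious implementation would likely lean on the Unicode confusables data described in UTS #39, which is not in the Python standard library.

```python
import unicodedata

def scripts_used(name: str) -> set:
    """Rough script guess: first word of each letter's Unicode character name."""
    scripts = set()
    for ch in name:
        if ch.isalpha():
            # 'LATIN SMALL LETTER A' -> 'LATIN'
            # 'CYRILLIC SMALL LETTER O' -> 'CYRILLIC'
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def looks_spoofable(name: str) -> bool:
    """Hypothetical policy: flag any name mixing letters from several scripts."""
    return len(scripts_used(name)) > 1
```

With this sketch, `looks_spoofable("cortex")` is false, while the same name with a Cyrillic "о" swapped in is flagged. The heuristic is blunt: it would also flag legitimate mixed-script names (Japanese written with both kanji and kana, for instance), and it does nothing about look-alikes within a single script.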


> ¿Do you mean ñ?
> posted by Rash An hour ago
> Ño
> posted by aubilenon 47 minutes ago


¡Ñí!
posted by signal at 5:16 PM on October 27, 2019 [1 favorite]


Mod note: Comment deleted; I appreciate the Zalgosian lilt to it but please don't break the page to make a point about the possibility of breaking the page.
posted by cortex (staff) at 7:16 PM on October 27, 2019 [2 favorites]


As an update, I poked around a bit since this was posted, and I can say that with the current state of things, tagging with utf-8 is definitely possible and I have a working test.
posted by frimble (staff) at 3:37 AM on October 28, 2019 [12 favorites]


Thanks frimble!
posted by signal at 4:46 AM on October 28, 2019


So in aubilenon's comment, I see the diaeresis/umlaut as off center, like shifted to the right.
posted by soelo at 9:36 AM on October 28, 2019
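
One possible explanation for the off-center mark, offered as a guess: the character is presumably a plain "n" followed by a combining diaeresis (U+0308). Unicode has no precomposed "n with diaeresis", so where the dots land is left to the font. The Python snippet below, with variable names chosen only for illustration, contrasts that with "ñ", which does have a precomposed form.

```python
import unicodedata

n_diaeresis = "n\u0308"  # 'n' followed by COMBINING DIAERESIS
n_tilde = "n\u0303"      # 'n' followed by COMBINING TILDE

# NFC composes a base letter and mark only when a precomposed character exists.
composed_diaeresis = unicodedata.normalize("NFC", n_diaeresis)
composed_tilde = unicodedata.normalize("NFC", n_tilde)

print(len(composed_diaeresis))  # stays two code points: no precomposed form
print(len(composed_tilde))      # one code point: U+00F1, 'ñ'
```

A font with good mark-positioning tables will center the diaeresis over the "n"; one without will simply draw the mark at a default offset, which can look shifted.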


Never forget user 63007.
posted by Chrysostom at 4:50 PM on October 28, 2019 [5 favorites]


> So in aubilenon's comment, I see the diaeresis/umlaut as off center, like shifted to the right.

Sorry, I'm doing the best I can
posted by aubilenon at 9:49 PM on October 28, 2019


It's just like Thuringia
posted by XMLicious at 12:34 AM on October 29, 2019

