Text made easy once again
Purpose of this document
This document contains special characters. Without proper rendering support, you may see question marks, boxes, or other symbols.
To promote usage and support of the UTF-8 encoding, and to convince readers that it should be the default choice of encoding for storing text strings in memory or on disk, for communication and for all other uses. We believe that all other encodings of Unicode (or of text in general) belong to rare edge cases of optimization and should be avoided by mainstream users.
In particular, we believe the very popular UTF-16 encoding (mistakenly used as a synonym for ‘widechar’ and ‘Unicode’ in the Windows world) has no place in library APIs (except specialized libraries dealing with text). If, at this point, you already think we are crazy, please skip straight to the FAQ section.
This document recommends choosing UTF-8 as string storage in Windows applications, where this standard is less popular due to historical reasons and the lack of native UTF-8 support in the API. Yet, we believe that even on this platform the following arguments outweigh the lack of native support. Also, we recommend forgetting forever what ‘ANSI codepages’ are and what they were used for. It is in the customer’s bill of rights to mix any number of languages in any text string.
We recommend avoiding C++ application code that depends on the _UNICODE define. This includes the TCHAR/LPTSTR types on Windows and APIs defined as macros, such as CreateWindow. We also recommend alternative ways to reach the goals of these APIs.
We also believe that if an application is not supposed to specialize in text, the infrastructure must make it possible for the program to be unaware of encoding issues. A file copy utility should not be written differently to support non-English file names.
Joel’s great article on Unicode explains the encodings well for beginners, but it lacks the most important part: how a programmer should proceed if she does not care what is inside the string.
Background
In 1988, Joseph D. Becker published the first Unicode draft proposal. At the basis of his design was the assumption that 16 bits per character would suffice. In 1991, the first version of the Unicode standard was published, with code points limited to 16 bits. In the following years many systems added support for Unicode and switched to the UCS-2 encoding. It was especially attractive for new technologies, like the Qt framework (1992), Windows NT 3.1 (1993) and Java (1995).
However, it was soon discovered that 16 bits per character would not do. In 1996, the UTF-16 encoding was created so that existing systems would be able to work with non-16-bit characters. This effectively nullified the rationale behind choosing a 16-bit encoding in the first place, namely being a fixed-width encoding. Currently Unicode spans over 109449 characters, with about 74500 of them being CJK Ideographs.
Microsoft has ever since mistakenly used ‘Unicode’ and ‘widechar’ as synonyms for ‘UCS-2’ and ‘UTF-16’. Furthermore, since UTF-8 can’t be set as the encoding for the narrow string WinAPI, you must compile your code with _UNICODE rather than _MBCS. This teaches Windows programmers that Unicode must be done with ‘widechars’. As a result of the mess, Windows C++ programmers are now among the most confused about what the right thing to do about text is.
In the Linux and the Web worlds, however, there’s a silent agreement that UTF-8 is the most correct encoding for Unicode on the planet Earth. Even though it gives a strong preference to English and therefore to computer languages (such as C++, HTML, XML, etc.) over any other text, it is seldom less efficient than UTF-16 for commonly used character sets.
The Facts
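To make the width argument from the background concrete, here is a minimal sketch that counts how many bytes a piece of text needs in UTF-8 and in UTF-16. The names utf8_bytes, utf16_bytes, EncodedSize and encoded_size are hypothetical helpers invented for this illustration; they are not part of the manifesto or of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: bytes needed to store one Unicode scalar value in UTF-8.
inline std::size_t utf8_bytes(char32_t cp) {
    // Precondition: cp is a valid scalar value (in range, not a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Hypothetical helper: bytes needed in UTF-16. Code points inside the BMP take
// one 16-bit unit; everything above U+FFFF takes a surrogate pair (two units).
inline std::size_t utf16_bytes(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    return cp < 0x10000 ? 2 : 4;
}

// Total storage for a text in both encodings, excluding any terminator.
struct EncodedSize {
    std::size_t utf8;
    std::size_t utf16;
};

inline EncodedSize encoded_size(const std::u32string& text) {
    EncodedSize total{0, 0};
    for (char32_t cp : text) {
        total.utf8 += utf8_bytes(cp);
        total.utf16 += utf16_bytes(cp);
    }
    return total;
}
```

With these definitions, encoded_size(U"Hello") reports 5 bytes for UTF-8 against 10 for UTF-16, the euro sign U+20AC takes 3 against 2, and U+1D11E, which lies outside the BMP, takes 4 bytes in both. Neither encoding is fixed-width, which is the point the background makes about UTF-16.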
Our Conclusions
UTF-16 is the worst of both worlds: variable length and too wide. It exists for historical reasons, adds a lot of confusion and will hopefully die out.
Portability, cross-platform interoperability and simplicity are more important than interoperability with existing platform APIs. So, the best approach is to use UTF-8 narrow strings everywhere and convert them back and forth on Windows before calling APIs that accept strings. Performance is seldom an issue of any relevance when dealing with string-accepting system APIs. There is a huge advantage to using the same encoding everywhere, and we see no sufficient reason to do otherwise.
Speaking of performance, machines often use strings to communicate (e.g. HTTP headers, XML). Many see this as a mistake, but regardless of that it is nearly always done in English, giving UTF-8 an advantage there. Using different encodings for different kinds of strings significantly increases complexity and the resulting bugs.
In particular, we believe that adding wchar_t to C++ was a mistake, and so are the Unicode additions to C++11. What must be demanded from the implementations, though, is that narrow strings be capable of storing any Unicode data. Then every std::string or char* parameter would be Unicode-compatible. ‘If this accepts text, it should be Unicode compatible’, and with UTF-8 it is also easy to do.
The standard facets have many design flaws. This includes std::numpunct, std::moneypunct and std::ctype not supporting variable-length encoded characters (non-ASCII UTF-8 and non-BMP UTF-16). They must be fixed:
How to do text on Windows
The following is what we recommend to everyone else for compile-time-checked Unicode correctness, ease of use and better portability of the code across platforms. This substantially differs from what is usually recommended as the proper way of using Unicode on Windows. Yet, in-depth research of these recommendations resulted in the same conclusion. So here goes:
Working with files, filenames and fstreams on Windows
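As a hedged illustration of the policy of keeping UTF-8 in narrow strings and converting only at the API boundary, the sketch below decodes UTF-8 into UTF-16 and, on Windows, passes the result to the wide CRT function _wfopen. The names append_utf16, widen and open_utf8 are hypothetical and chosen for this example; this is not the code of booster::nowide. The decoder is hand-rolled so that it compiles on any platform, and a MultiByteToWideChar-based routine could be substituted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: append one Unicode scalar value as UTF-16 code units.
inline void append_utf16(std::u16string& out, char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;  // 20 bits remain, split across a surrogate pair
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}

// Hypothetical helper: decode a UTF-8 narrow string into UTF-16.
// Throws std::invalid_argument on malformed input.
inline std::u16string widen(const std::string& utf8) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates and values beyond U+10FFFF.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong or out-of-range UTF-8 sequence");
        append_utf16(out, cp);
        i += len;
    }
    return out;
}

#ifdef _WIN32
static_assert(sizeof(wchar_t) == 2, "wchar_t is expected to hold UTF-16 units");

// Hypothetical wrapper: open a file whose name is given in UTF-8.
inline std::FILE* open_utf8(const std::string& name, const wchar_t* mode) {
    const std::u16string w = widen(name);
    const std::wstring wname(w.begin(), w.end());  // copy unit by unit
    return _wfopen(wname.c_str(), mode);
}
#endif
```

Application code would then keep file names in std::string throughout and call something like open_utf8(name, L"rb") only at the point where the operating system is involved. On other platforms the narrow name can be passed to std::fopen unchanged.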
Conversion functions
The policy uses the conversion functions from the CppCMS booster::nowide library, which can be downloaded as a separate package. The library also provides a set of wrappers for commonly used standard C and C++ library functions that deal with files.
These functions and wrappers are easy to implement using Windows’ MultiByteToWideChar and WideCharToMultiByte functions. Any other (possibly faster) conversion routines can be used.
FAQ
Myths
About the authors
This manifesto was written by Pavel Radzivilovsky, Yakov Galka and Slava Novgorodov, as a result of much experience and research of real-world Unicode issues and mistakes made by real-world programmers. The goal is to improve awareness of text issues and to inspire industry-wide changes to make Unicode-aware programming easier, ultimately improving the experience of users of those programs written by human engineers. None of us is involved in the Unicode consortium.
Much of the text is inspired by discussions on StackOverflow initiated by Artyom Beilis, the author of Boost.Locale. You can leave comments/feedback there.
External links
Sunday, May 6, 2012
The UTF-8-Everywhere Manifesto
Saturday, April 28, 2012
Why I'm Sticking With Dropbox (Over Google Drive)
There has been a massive build-up to the release of Google Drive, and while this new offering from the search giant was always going to be a big one, I firmly believe that there’s a really convincing argument why Dropbox is a better choice for storing your stuff online: privacy, and retaining rights over your content. I’m no lawyer, but you don’t have to be to understand why the implications of Google’s privacy policy are probably something you want to avoid.
What you’re giving Google when using Drive
Take a look at Google’s Terms of Service. Notice the highlighted portion that reads:
When you upload or otherwise submit content to our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content.
Do you really want to sign over a worldwide license to use, modify, create derivative works from, and publicly display or distribute every document you upload to Google? My guess is your answer is no.
Dropbox FTW
Now let’s see what Dropbox’s terms say. Their stance essentially is the complete opposite of Google’s. Notice the highlighted portion in the above image which reads:
You retain full ownership to your stuff. We don’t claim any ownership to any of it. These Terms do not grant us any rights to your stuff or intellectual property except for the limited rights that are needed to run the Services, as explained below.
Bravo, Dropbox! Well done for choosing a stance that supports my rights and privacy. That’s the kind of attitude more businesses should take.
Conclusion
Make up your mind yourself, but for me I know I’ll be sticking with Dropbox unless something radical changes.
Enjoy this post? You should totally follow me on Twitter!
RSS will never die
This was supposed to be a post about translating a URL into a [good] RSS feed. After reading The War on RSS and some of the passionate debate it kicked off on HackerNews I decided to write something else.
In short: RSS will never die.
The War on RSS part un
In May 2009 Steve Gillmor wrote on Techcrunch:
It’s time to get completely off RSS and switch to Twitter. RSS just doesn’t cut it anymore. The River of News has become the East River of news, which means it’s not worth swimming in if you get my drift.
~ Rest in Peace RSS, Steve Gillmor on Techcrunch, May 2009
It sparked a meme. Suddenly everyone and their dog was convinced RSS was dead and we should all move on. Twitter will save us from something as horrible as a fourteen-year-old idea. That’s much too old for us web people.
In early 2011 RSS still wasn’t quite dead. “If RSS is dead, what’s next?”, a guy asked on Quora. This time, a very diplomatic answer came from Robert Scoble (when I met him he said my startup idea was a fail just because it revolved around RSS):
First off, let’s define what dead means. To me, anytime someone says a tech is dead it usually means that tech is not very interesting to discuss anymore, or isn’t seeing the most innovative companies doing new things with it.
Essentially Scoble thinks RSS is dead because Google Reader stopped working out for him and nobody is innovating in the RSS space anymore.
Bummer.
Five months later he wrote about Feedly – an RSS reader for the iPad – saying “don’t miss out and get Feedly on your iPad”. He had called the idea of an RSS reader for the iPad stupid just 7 months prior.
Guess RSS isn’t that bad after all.
The War on RSS part deux
This week – April 2012 – RSS still wasn’t quite dead. The War on RSS got a lot of passionate attention on HackerNews.
There’s a veritable explosion of companies removing RSS from their products … for whatever reason. Usually because it doesn’t directly benefit the bottom line – they prefer proprietary formats.
The next Mac OS – Mountain Lion – will likely ship without native RSS support. Gone from Safari (in favor of their proprietary Reader/Read Later thingy). Gone from Mail.
Somewhere in the last few versions Firefox removed the RSS icon from its usual place in the url bar.
Twitter removed public support for RSS feeds of user accounts. The feeds still exist – discovering them just takes a bit of trickery since they aren’t even mentioned in the HTML anymore.
Once upon a time even Facebook had support for profile RSS feeds. These have long been gone, so long in fact I don’t remember ever having seen them.
And there has never been native RSS support in Chrome. So much for that.
This time RSS is well and truly busted, right? Took an arrow to the knee, never to be heard from again.
RSS Will Never Die
For a piece of tech that was declared dead and boring almost three years ago, RSS can stir up a surprisingly strong debate … mostly passionate users clinging on for dear life.
I asked Twitter whether anyone still uses RSS as a human. The replies started flying in as quickly as I pressed the submit button. 11 yes, 1 no-ish, 1 sort of no and 1 resounding no.
The data is skewed, yes. Only people passionate enough to care replied, and I am well aware that Normal Humans ™ don’t knowingly use RSS. That’s also quite a few responses for a random question posted to Twitter by some random guy.
It shows RSS will never die because of a simple reality: power users.
There is something called the 90-9-1 rule of online participation. At its core is the idea that 90% of users only lurk, 9% contribute now and then, and the top 1% of contributors produce most of the content.
Saying those top contributors are your power users is a pretty safe bet. And that’s why RSS is here to stay for at least a while longer – all those people doing most of the sharing? A lot of their stuff comes from RSS.
Why do people still use RSS anyway?
Ok, so the top 1% of that top 1% may have moved away from RSS and onto social media. Or at least that’s what everyone was claiming back in 2009 when Twitter was still something fresh, new and exciting. And most of all, much, much slower.
Twitter is not a replacement for RSS. Not by a long shot. It’s too busy!
My Twitter stream gets about 30 new messages every minute or two. This isn’t an environment to follow important-ish updates. Certainly not a place to look for 500+ word chunks of text that take ten minutes to read.
And god forbid anyone writes their blog only once a week, I’d miss 99% of their updates!
That’s where RSS comes in.
Not only does it take an hour for ten new posts to reach my Google Reader – when something does vanish, there is a sidebar full of subscriptions where I can see that, hey, there’s a bunch of stuff I want to read … eventually. No pressure. It’s all going to be here tomorrow, a week from now … even a month.
By the way, anything older than a week or two stops existing on Twitter.
When I want to read The Art of Manliness, I can just waltz over to Google Reader and check out the last few posts. No rush. The content is long, but it’s informative and it waits for me. There’s also no interruption or conversation. Just the curated best of what they have to say.
None of that on their Twitter though. Even though they only post every couple of hours, most of it is still reposts of old stuff and answering questions. I think there’s actually less than one new Actual Post ™ per day.
It gets worse for people, like me, who use Twitter as persons. Most of it is just random chitchat you don’t care about, sharing cool links from the web and generally everything but an RSS replacement for my personal blog.
Consequently, RSS offers bigger exposure to your content.
Looking at a recent personal post … tweeting three times creates 67 clickthroughs. Posting to RSS reached 145 readers, however Feedburner might be calculating that.
That’s a big difference!
RSS may have flopped for the regular user. It’s complex and kind of weird, but for that most important of readers – a fan – it will never really die.
And that’s before we even consider computers needing a simple and open way to follow websites’ updates.
GMail: designer arrogance and the cult of minimalism
Posted by jonoscript under Uncategorized | Tags: email, gmail, google, listen to your users, UI FAIL |
[5] Comments
It looks like Google has finally pulled the plug on the old GMail UI. There’s no more “revert to the old look temporarily” button, so I guess they’re finally forcing us laggards onto the new theme. I’ve been a mostly happy GMail user since the very early days, but I strongly dislike the new UI.
As far as I can tell, this redesign is just change for the sake of change. I can’t see a single improvement! But I can spot three distinct un-provements *:
- The featureless white void: the old interface had colored borders and variations in background color which served to delineate navigation from content and provide visual landmarks that helped me find my way around the page. It had visual ‘texture’. The new interface lacks that visual texture. Without borders or landmarks, everything blends together into a featureless sea of white and light grey. It requires more work for me to parse visually, to figure out what I’m looking at or to find the link I want to click. This is what happens when the cult of “minimalism” goes too far.
- The “importance” marker is now right next to the stars. I find the (algorithmically-applied) importance marker completely useless and would remove it if I could, but I use the stars quite heavily. In the old interface the importance marker was to the right, so I could ignore that column and scan the left column for stars. In the new interface, the two markers — being the same size, color, and location — blend together visually. I can no longer scan for stars; I have to look closely at each line to tell stars apart from importance markers.
- The new icons are inferior to the old text buttons. The text buttons were self-describing. The new icons are not. I’m not usually a fan of toolbar icons; they’re never as self-explanatory as their designers think they are, so they usually need text labels to be decipherable. At that point, why not cut out the middleman and just show the text label instead of the icon? But these icons are particularly bad. Again with the cult of minimalism: the icons are so streamlined and featureless that they all look the same: a row of meaningless, square, grey objects. When I want to mark something as spam, I used to be able to click the “spam” button. Now I have to mouse over each square grey object one at a time, looking for the one that pops up a “Report Spam” tooltip. (It’s the stop sign. Why a stop sign? I don’t know. Years of using GUIs have trained me to interpret a stop sign as an error message.)
Assuming for the moment that these features were actually needed (which I think is arguable), the fact is that any of these features could have been added without making the interface a featureless white void or replacing helpful labels with cryptic icons.
Just today I read this blog post from a Google UX designer about “Change Aversion”, or the supposedly irrational tendency of users to fear change. The underlying attitude here is that users will like the new UI just fine once they try it, but they don’t want to give it a chance because they’re stubborn, like toddlers refusing to try an unfamiliar food.
I’ve certainly encountered this attitude before. Mozilla UX designers like to use the example of tabs-on-top: when we moved the tabs above the navigation bar in Firefox 4, many users balked at the change. But nobody could give a reason why tabs-on-top was worse — they just didn’t like it because it was unfamiliar.
The problem with this attitude is that sometimes the users may just be stubborn, but other times the users are encountering a real serious problem with the design; something they can feel is wrong, but can’t quite articulate precisely. Your users aren’t trained as designers, so they may not be able to argue their case convincingly in the language of design. If you dismiss all negative user feedback as mere stubbornness, you’ll miss important warning signs when you’re about to make a mistake. People have certainly been telling Google that they don’t like the new GMail interface, but it doesn’t seem like Google has been listening.
Change aversion might be a real thing, but designer arrogance is a real thing too.
* – “un-provements”: a word that I just made up because English lacks a word for discrete ways in which something has gotten worse. What would you say here? “three degradations”? “three backslides”? “three worsenings”?