Ben Hyde is raising two concerns with TypeKey in his series on authentication: the privacy implications of assigning users globally unique IDs, and the lack of federation (the system is centralized).
Personally, I'm ready to accept this risk in exchange for a good single sign-on, at least when it comes to community sites. But I understand that some privacy "freaks" may not be as comfortable.
The fix that Ben suggests for the privacy concern is to use IDs that are local to each site. This way it is not as easy to track a user from site to site.
The problem with this approach is that it seems to work against TypeKey's goal, which is to reduce comment spam. A lot of power is lost if each site has to build its own list of known spammers and can't cross-reference it with those of other sites.
Cross-referencing and privacy
I had this gut feeling that the two objectives (ensuring users' privacy and fighting spam) were difficult to reconcile, because I assumed that a distributed reputation system required a unique ID across all participating sites.
Then I read some of Norman Hardy's articles about identity, where he points out that you don't really need to know who somebody is as much as you need evidence of his character.
Since TypeKey has to be able to map these local IDs to a global one, why not have TypeKey do the cross-referencing?
Here is a proposed solution.
Let's assume that each TypeKey-enabled site maintains a list of local IDs for the users that it trusts and one for the users that it mistrusts (because they posted junk/spam comments in the past).
If TypeKey can get this data from each site, then for each user, it could build a list of sites that trust him or mistrust him.
Whenever a user logs onto a site, these two lists would be passed along so that the site can make an informed decision, based on other sites that it trusts.
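Here is a small sketch of the cross-referencing step described above. All the data shapes and names (`site_reports`, `local_to_global`, `cross_reference`) are hypothetical; the point is just to show how TypeKey, which holds the local-to-global ID mapping, could invert each site's lists into per-user lists of vouching and banning sites:

```python
# Hypothetical input: each site reports the local IDs it trusts and mistrusts.
site_reports = {
    "blog-a.example": {"trusted": ["alice@a", "bob@a"], "banned": ["spammer@a"]},
    "blog-b.example": {"trusted": ["alice@b"], "banned": ["bob@b"]},
}

# TypeKey's private mapping from local IDs to global IDs (illustrative).
local_to_global = {
    "alice@a": "alice", "bob@a": "bob", "spammer@a": "mallory",
    "alice@b": "alice", "bob@b": "bob",
}

def cross_reference(reports, mapping):
    """Return {global_id: {"vouched_by": [...], "banned_by": [...]}}."""
    result = {}
    for site, lists in reports.items():
        for kind, key in (("trusted", "vouched_by"), ("banned", "banned_by")):
            for local_id in lists[kind]:
                user = result.setdefault(mapping[local_id],
                                         {"vouched_by": [], "banned_by": []})
                user[key].append(site)
    return result

profiles = cross_reference(site_reports, local_to_global)
```

When a user logs in, TypeKey would hand the site that user's two lists (here, `profiles["alice"]["vouched_by"]` and `profiles["alice"]["banned_by"]`) and let the site weigh them against the sites it trusts.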
There are two technical problems that need to be solved in this approach: how does TypeKey securely get the lists of local IDs from each site and how does it communicate the list of sites that vouched for or against a user?
Passing lists of trusted and banned users to TypeKey
I could think of at least three ways that the two lists built by each site (trusted users and mistrusted users) could be gathered by TypeKey:
Build the list of sites that trust or ban a given user
When a user logs in, we need TypeKey to securely provide a list of the sites that vouched for him and a list of those that banned him. An actual list of site tokens would be too big. The solution is to use a Bloom filter instead of the list, as it is much more efficient storage-wise.
LOAF is a very interesting system that uses Bloom filters to share your contact list without revealing any of your contacts' email addresses.
Here is the original paper "Space/time trade-offs in hash coding with allowable errors" from Burton Bloom.
You can also read O'Reilly Networks' "Using Bloom filters" article.
Another explanation (from Steven M. Bellovin):
A Bloom filter is an array of m bits, initialized to zero. It requires a set of k hash functions that are independent and produce uniformly distributed output in the range [0,m-1] on the possible inputs.
To add an entry R to the filter, calculate
b[1] = H1(R)
b[2] = H2(R)
...
b[k] = Hk(R)
and set bits b[i] to 1 in the array.
To see if a record exists, calculate the same b[i] and check the bit values. If all k bits are 1, the record probably exists; if even a single bit is 0, the record definitely does not exist.
In a database of any reasonable size, it is not possible to determine the input records from the bit array. Many different records can set any one bit; there is no way to tell which records actually did.
The size of the filters has to be adjusted so that the rate of false positives is acceptable.
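To make the mechanics concrete, here is a minimal Bloom filter sketch. The k hash functions are derived by salting SHA-256 with the function index, and the parameters m and k are arbitrary defaults, not a tuned recommendation:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: m bits, k hash functions derived from SHA-256."""

    def __init__(self, m=1024, k=4):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for clarity over compactness

    def _positions(self, item):
        # Derive k independent-ish bit positions by salting the hash with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # All k bits set => probably present; any bit 0 => definitely absent.
        return all(self.bits[pos] for pos in self._positions(item))
```

A site's token list would go in with repeated `add()` calls, and a recipient could test membership with `in` without ever seeing the tokens themselves.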
A limitation of Bloom filters is that you can add new entries, but you can't remove old ones. Yet it is quite possible that you change your mind about somebody you trusted and now want to ban.
You could deal with this in two ways: TypeKey could re-generate the filters frequently from the lists that it stores, or it could store only the filters but have them expire every couple of weeks.
The second solution is far better in terms of storage for TypeKey, but then the filters need to be cycled in some way. For example, a filter could last for two weeks before a new one is rebuilt from scratch. This means TypeKey needs to store twice the number of filters (the current generation and the next).
The time that it takes for a change to propagate (when you change your mind about somebody) can also be improved by using additional "exception" filters, used to "remove" somebody from a previous filter...
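The cycling and exception ideas above can be sketched as follows. Plain Python sets stand in for the Bloom filters to keep the sketch short (a real deployment would swap in filters for the storage savings), and the two-week period and `is_member` helper are illustrative assumptions:

```python
from datetime import date, timedelta

class CyclingFilter:
    """Two overlapping filter generations, each living two weeks.

    Plain sets stand in for Bloom filters; the rotation logic is the point.
    """

    PERIOD = timedelta(weeks=2)

    def __init__(self, today):
        self.current = set()
        self.next_gen = set()
        self.rollover = today + self.PERIOD

    def add(self, item, today):
        self._maybe_cycle(today)
        # New entries go into both generations so they survive the rollover.
        self.current.add(item)
        self.next_gen.add(item)

    def __contains__(self, item):
        # Lookups don't advance time; only add() does, to keep the sketch simple.
        return item in self.current

    def _maybe_cycle(self, today):
        if today >= self.rollover:
            # The old generation expires. Entries you stopped re-adding
            # (people you changed your mind about) silently disappear here.
            self.current = self.next_gen
            self.next_gen = set()
            self.rollover = today + self.PERIOD

def is_member(user, main_filter, exception_filter):
    # An "exception" filter overrides a previous generation without
    # waiting for it to expire.
    return user in main_filter and user not in exception_filter
```

An entry that keeps being re-added survives each rollover; one you drop fades out after at most one period, and an exception filter removes it immediately.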
Notice that Bloom filters could be used with the current implementation of TypeKey (using global IDs) to let a site efficiently share the list of users it vouches for and the list of users it has banned. You would fetch these lists from the blogs you trust and incorporate them into your moderation system.
Another option is to use the users' public keys in the filter instead of TypeKey IDs, to enable a distributed (TypeKey-less) reputation system.
Federation is the tech term for replacing a single authentication provider with a number of them.
One of the best examples of real-life federation is the ATM card system. Many banks issue these cards, but they trust each other, so you can use your card at any ATM.
Another example: email can be considered a not-too-secure federated authentication system (your ID is your email address). Central authentication systems like Passport or TypeKey are exactly the opposite: they allow only one provider, and most sites that use them are exclusive, supporting one or the other but not both.
Kerberos is an authentication system that supports some federation, in the form of Kerberos realms. With Kerberos, if Yahoo and AOL trust each other, you could get authenticated by AOL and then go shop at Yahoo without needing a Yahoo ID.
The Liberty protocol (from Liberty Alliance) tries to achieve that as well. I'm not very familiar with the details, but O'Reilly Network has an overview.
Who do you trust?
Ben seems to think that a federated system is definitely better than a centralized one. I agree there are obvious advantages, like interoperability, competition and a stronger network effect, but the difficulties with federation go beyond the simple challenge of sending the user to the appropriate authentication provider when he needs to sign in.
The real problem is with the service that is going to consume the identity assertion. Which identities/providers/realms should it trust?
You wouldn't let any "bank" join the VISA network, would you?
Or if you were PayPal, would you choose to support user accounts provided by Passport, TypeKey or both? What risk do you take by integrating TypeKey into your business? If TypeKey were found to have a security hole, how confident are you that it would be handled to your satisfaction?
But we can assume that building a business is not the goal here; the goal is only to offer single sign-on to community sites and help fight comment spam...
Still, spammers could start creating hundreds of authentication services, or hack into competitors (that aren't as well administered and secured as TypeKey might be) to create spam accounts or hijack legitimate ones. As a consumer of identity assertions, you still care about who issued them.
Each user as his own identity provider
That's why authentication systems that aren't based on a provider are easier to manage. You don't care that a third party authenticates me; you only care that I am the owner of a public key (or a public URL) and that I can prove it. I described some of these in this post about authentication by URL (which also mentions authentication by PGP public key).
With this kind of approach, each user is responsible for his own identity, and sites don't assume any level of trust with a user just because a third party said so... This effectively separates the problem of authentication from the problem of trust/reputation.
Other forms of assertions could be created to handle the reputation part, going back to Norman Hardy's idea that "Certificates should relate to Character, not identity."
As a follow-up to my deconstruction of the TypeKey protocol, Tomas suggested doing some threat modeling.
I'm still waiting for more official documentation of the protocol to do that, but here are the risks I could think of so far (in random order):
Some more pointers:
A great introduction to Bloom filters, with an approachable analysis of the size and false positive trade-offs, as well as overviews of many practical applications. (via)
A quick overview of many variants of Bloom filters (video).