Revisiting the Username Hash: Mitigating the Vulnerability of Username / Password Reuse
In a typical system, there are several “usernames”. There is the name that the user types to begin the login operation. Then there can be a “display name” (to be added to forum posts, automatic emails…), a contact email address, a billing name, a cardholder name… It makes relatively little sense to protect one of these names without dealing with the others, since they all contain, on average similar information (it may sound surprising, but, given the choice, many people will prefer to use their own true name for all these purposes).
The “login name”, i.e. the username which the user types to begin the login operation, servers the following role: it allows the server to efficiently locate the user-specific data. The server may have thousands of users, in a large database, and it is inconvenient to scan all of them upon each login attempt; you want to quickly find the “user identifier” from which you can get the hashed password for that user.
Login names are not usually considered to be secret data (if they were secret data, we would call them “passwords”), if only because it is normal and expected that the login name is similar to the user’s administrative identity. Nevertheless, if you want to somehow hide the list of login names, you must use a deterministic injection: that’s a transform, such that given twice the same login name, you get the same output (that’s determinism, it is required for the “find the user-specific data” part to work, and it must be understood server-wide), and two distinct login names will yield distinct output (the transform must be injective, or at least should not allow collisions to appear with a non-negligible probability). There are several solutions:
- You can use a hash function such as SHA-256. You get worldwide determinism (that’s the same SHA-256 for everybody) and almost-injection with high probability.
- You can add a server-wide salt if you want to nullify any advantage an attacker may get from precomputations. The salt must still be the same for all login names on your server, because of determinism. The salt is equivalent to choosing your own, server-specific hash function. Note that an attacker who got the list of hashed login names will still be able to do a parallel attack on all the names (for each potential user name, the attacker hashes it — with the server-wide salt — and compares the result to all the hashed login names); thus, this specific salt does only a part of the job normally performed by a salt (but you cannot avoid it, because you need determinism).
- You could make the salt secret; then, it is no longer a salt, but a key, and the hash function is no longer a hash function, but a Message Authentication Code (such as HMAC). This will do you any good only if the attacker can get access to the database but not to the key (so that’s a restrictive attack model). Keys entail key management, something which is never simple, especially in the presence of multiple front-ends (they share the database, thus they must share the key — and secrecy is not utterly compatible with sharing).
Remember that all of this is about a second-level defense: the model assumes that an illicit read access was achieved by the attacker (and that’s already a big worry !). While such read accesses do occur in practice (that’s why we hash passwords), we must not forget that this should not be the primary line of defense; on the other hand, every hash or encryption adds complexity, which is the well-known nemesis of security. So there is a trade-off, and the usual wisdom is to avoid complexity if it would serve only for secondary protection of data which is already semi-public anyway (i.e. the login name).
As for the details, the rough outline of the scheme you suggest would work (you are free to consider as “password” the concatenation of the login name and the password typed by the user, if you wish — it will not make the scheme weaker). However, do yourself a favour: don’t invent your own hashing. For the hash of the login name and of the password, rely on bcrypt or PBKDF2. Your user-specific key encrypted with the user password is really the result of a Key Derivation Function; there again, you’d better use a proper KDF like PBKDF2; you would just have to store the user-specific salt. Both the “hashed password” and the “encrypted user-specific key” can be used as the basis for a dictionary attack (aka “trying potential passwords”), so they must include the same level of protection (many iterations to make the processing slow, a salt to thwart parallelism).
I would still consider it a poor trade-off: much added complexity, for little extra security.