Security by encryption, salting, hashing, obscurity and other means

grahroll · 5 November 2022 03:37

Posts about benefits, drawbacks, methods and why and why not, are welcome here.

Gregr · 2 November 2022 03:34

The Gov needs to specify how document identifiers are stored by organizations, not just that they be used for user verification, and retained for possible audit or law enforcement purposes.

One way is the document ID is encrypted with the Gov issuer’s public key and then stored by the business. The original plain text is discarded.
Only the issuer will know the related private key to decrypt it.
If the, say, passport ID needs to be checked against user information held by a business, then the encrypted version will need to be sent to the passport issuer for decryption. Same for medicare, licence, etc.

Another way is to only store a hash of the original. The business again discards the original ID. Only the document holder, and the document issuer will know the original ID used to create the hash value.

In both cases a data breach of a business will not reveal document IDs that could be used by hackers.

In the encrypted case, the business could of course not offer user verification functions that compare IDs supplied by users in plain text against the data held in its database. In the hashed case it certainly could, since the hash values will match.
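A minimal sketch of the hash-and-compare idea described in this post, assuming SHA-256 as the hash. The helper names and the passport number are made up for illustration, and nothing here is a specified government scheme.

```python
import hashlib
import hmac


def hash_document_id(document_id: str) -> str:
    """Return the SHA-256 hex digest of a document ID (hypothetical helper)."""
    return hashlib.sha256(document_id.encode("utf-8")).hexdigest()


def verify(candidate_id: str, stored: str) -> bool:
    """Hash the ID the user re-enters and compare it with the stored hash."""
    return hmac.compare_digest(hash_document_id(candidate_id), stored)


# At sign-up the business keeps only the hash and discards the plain text.
stored_hash = hash_document_id("PA1234567")  # made-up passport number

print(verify("PA1234567", stored_hash))  # True: the same ID gives the same hash
print(verify("PA7654321", stored_hash))  # False: a different ID does not match
```

The comparison works because hashing is deterministic. That same property is what later posts in this thread lean on when discussing guessing attacks.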

person · 2 November 2022 04:46

Yes - maybe. Or maybe a complete rethink about how identification is done.

That could work - or variations on the theme. This is probably the easiest fix from among the choices of: complete rethink, encrypt, hash.

I don’t think this works. In the event that the document is found later on to be fraudulent in some way, the document presenter (the fraudster) will be nowhere to be found, the document itself will be nowhere to be found, and no one (not even the issuer) can reverse the hash to get the original document id in order to even begin the process of investigating the fraud.

(In reality though the search space is far too small and in fact a hash would be easily reversible by anyone in no time flat - so not very secure.)

And a hash can be forged easily by a fraudulent “telco”. (Someone may be thinking that the exact hash algorithm is kept “secret” but that’s obscurity, not security, and comes with the disadvantage of inventing new secret hash algorithms that are actually any good.)

If government thinks that it can trust the telco then there is no need to store anything at all. So we have to assume that government does not trust the telco.

Possibly an HMAC could be used instead of a hash in order to solve both these problems (easy forgery and easy reversal).
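A rough sketch of the HMAC idea, assuming the issuer holds a secret key that the "telco" never sees. The key handling, function names and document ID are hypothetical, chosen only to show the mechanism.

```python
import hashlib
import hmac
import secrets

# Assumption: a secret key known only to the document issuer.
issuer_key = secrets.token_bytes(32)


def tag_document_id(key: bytes, document_id: str) -> str:
    """Compute an HMAC-SHA256 tag over the document ID."""
    return hmac.new(key, document_id.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_tag(key: bytes, document_id: str, tag: str) -> bool:
    """Only a holder of the key can recompute the tag and check it."""
    return hmac.compare_digest(tag_document_id(key, document_id), tag)


tag = tag_document_id(issuer_key, "1234567890")  # made-up document ID
print(verify_tag(issuer_key, "1234567890", tag))  # True
```

Without the key, an attacker who steals the tag can neither forge a tag for another ID nor test guesses offline, which is the difference from a plain hash.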

Additionally the document ids themselves could be made much much longer, with the obvious negative in usability.

But yes that property could be met in either case.

(For the basic purpose of defeating a data breach, even just storing the document id encrypted by the “telco” is better than plaintext, particularly if the encryption key is kept on another system.)

Also, note that with either encrypt or hash, a fraudulent “telco” can simply clone the encrypted value or the hashed value from one account to another - so you really want something that is harder to do that with and get away with it - such as the unforgeable token that I proposed earlier.

Gregr · 2 November 2022 05:06

Do let me know when you, or someone else, or some spook organizations manage to determine the original text from a current hash algorithm using salting.
Very, very secure according to the security world. Which is why it is used widely with confidence.

And if the verifying document was fraudulent, forged, stolen when used, what difference does that make in how it is stored?
At least stored encrypted or hashed, and if it cannot be put back to plain text, or verified, it cannot be used again.

The trouble with business using their own keys for either symmetric encryption like AES or asymmetric using public/private keys is that the whole responsibility of maintaining the secrecy rests with them. And that requires trusted administrators and their access being secure.
Back to Medibank leaks, with an extra step in the decryption keys being revealed.

postulative · 2 November 2022 06:25

The important difference between your posts is the salt, and the assumption of unique identities. Without salt per instance of use, it would indeed be simple to just use the hash output for several business purposes. With a salt, you cannot simply copy it over because you need to be comparing your output with the right (unique) input.

No, a hash cannot be reversed. Yes, hashed data can be guessed based upon how often it repeats in a dataset if there is no salt. It is possible to throw data at a system that does not salt until it turns into the hash you are after - but without repeated data (e.g. hackers know the most popular passwords and so will try them out), or an understanding of the format of the input (oops - credit card numbers have a standard format), you are going nowhere fast. And even with one of these pieces of information, a random salt means that every database entry will be unique and none will match that ‘standard format’.

Gregr · 2 November 2022 06:55

Indeed. Hashing without salting is fairly useless as was realised long ago when hashing passwords.
If you had lots of users with qwerty or password as their, well, password they would hash to the same value.
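To illustrate the point about repeated passwords, here is a small sketch assuming SHA-256 and a 16-byte random salt per user. The function names are illustrative only.

```python
import hashlib
import os


def unsalted_hash(password: str) -> str:
    """Plain SHA-256 of the password, with no salt."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted_hash(password: str, salt: bytes) -> str:
    """SHA-256 of the salt followed by the password."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Two users pick the same weak password: the unsalted hashes are identical.
print(unsalted_hash("qwerty") == unsalted_hash("qwerty"))  # True

# With a fresh random salt per user, the stored values differ.
alice_salt, bob_salt = os.urandom(16), os.urandom(16)
print(salted_hash("qwerty", alice_salt) == salted_hash("qwerty", bob_salt))
```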

However, it would be expected that an issuer of passports, licences, medicare cards would have unique IDs for each identity issued, and therefore a unique hash value. In fact, salting is really not needed at all.

All that is required is that the hash value is a suitably long binary string, to make brute-force guessing infeasible, and that the algorithm works very well to avoid any two different input values hashing to the same value, which is what modern hashes focus on.

person · 2 November 2022 08:20

First of all, nowhere did you mention salting.

Secondly, salting provides a good level of benefit if you don’t know the salt. Salting provides less benefit if you do know the salt. In a typical “stolen database” scenario, the hacker knows all the salt values.

Also, if you are going to mention salt then for this scenario, you need to specify whether the “telco” generates the salt or the government generates the salt - and who stores it.

So if I know that a customer has been identified with a NSW drivers licence (not a huge stretch) and I know the salt and I know the hashing algorithm and I know the resulting hash value, iterate through all 1 billion drivers licence numbers … licence value recovered in seconds with off-the-shelf hardware that any self-respecting organised crime gang could acquire (probably minutes with the computer on your desk).
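The attack described above can be sketched as follows, assuming SHA-256, a salt prepended to the number, and a six-digit number so the demo finishes quickly. A real licence or Medicare number space is larger, but the loop is the same. All names and values here are made up.

```python
import hashlib
from typing import Optional


def salted_hash(salt: bytes, number: str) -> str:
    """SHA-256 over a known salt followed by the numeric ID."""
    return hashlib.sha256(salt + number.encode("ascii")).hexdigest()


def recover_number(target: str, salt: bytes, digits: int) -> Optional[str]:
    """Try every number of the given length until the hash matches."""
    for n in range(10 ** digits):
        candidate = str(n).zfill(digits)
        if salted_hash(salt, candidate) == target:
            return candidate
    return None


# What the attacker finds in the stolen database: the salt and the hash.
salt = bytes.fromhex("00112233445566778899aabbccddeeff")
target = salted_hash(salt, "048213")

print(recover_number(target, salt, 6))  # 048213
```

Knowing the salt costs the attacker nothing extra per guess. The work is set entirely by how many numbers fit the known format.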

The two main benefits of salting are:

equal plaintext values no longer produce equal hash values (not really important for this application anyway)
hacker can’t use rainbow tables

If you are going to use hashing for this application then the most important thing will be to use many, many rounds of the hashing function, in order to slow down the hacker’s brute force, offline attack.
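One way to get "many, many rounds" is a standard key-stretching function. The sketch below assumes PBKDF2-HMAC-SHA256 from Python's standard library and an illustrative round count; the thread does not name a specific function, so treat both as assumptions.

```python
import hashlib
import os

ROUNDS = 200_000  # illustrative value, not a recommendation from the thread


def stretched_hash(document_id: str, salt: bytes, rounds: int = ROUNDS) -> bytes:
    """Derive a 32-byte value by repeating HMAC-SHA256 `rounds` times."""
    return hashlib.pbkdf2_hmac("sha256", document_id.encode("utf-8"), salt, rounds)


salt = os.urandom(16)
stored = stretched_hash("1234567890", salt)  # made-up document ID

# Verification repeats the same slow computation with the same salt.
print(stretched_hash("1234567890", salt) == stored)  # True
```

Each guess now costs the attacker the same number of rounds, so the brute force is slowed by that factor. The number of candidate IDs is unchanged, so this buys time rather than making the search infeasible.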

Best case scenario: The government generates the salt and stores it and does not provide it to the “telco”. Unfortunately that allows the government to repudiate its acceptance and also that does not allow the “telco” to verify the hash. (In other words, government might love this but everyone else will hate it.)

Next best scenario: The salt is provided to the telco but the telco does not store it but does verify the hash at the time before storing the hash.

The document itself will be fraudulent. The photo on it will be fraudulent. The document id (that will pass the Document Verification Service) is legitimate. So if a criminal has used your id to open a bank account that will be used to funnel money out of Australia then the cops want to know whose id has been used in order to warn you and in order to confirm that you were an unwitting accomplice (so they want to be able to recover the document id) and you still want your id protected as well as possible from a second criminal even if your id is being used fraudulently on someone else’s account by the first criminal.

I remain of the view that encryption is a good solution here whereas hashing, with or without salting, is not.

All of that is true - but we should perhaps not assume that government will employ only trusted administrators who will never stuff up and that government will maintain perfect security.

Gregr · 2 November 2022 14:21

Very good point. Typically the salt will be stored in plaintext. The hashing algorithm will be known. And let’s say the hashed value is a medicare number plus salt, and the original input format as a medicare number is also known.
A 10 digit number. That is 10 billion guesses, which is a very small keyspace these days to match a hash.

So, I retract the idea of hashing, and go back to public/private key encryption.

postulative · 3 November 2022 07:22

There is no reason not to make the hash larger than the input.
Why limit the salt to a single character? Use a 20 character salt, and you’ll be safe even if the data to be hashed is only three characters.

Gregr · 3 November 2022 11:54

Don’t understand what you are saying.

A hash value is a fixed size regardless of the input size.
Whether the input is one byte or a thousand bytes.

And where did a one character salt value come from? Not from me.
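On the fixed-size point, a quick check, assuming SHA-256 as the example hash:

```python
import hashlib

# Inputs of very different lengths, including a short one with a long salt.
inputs = [b"a", b"a" * 1000, b"123" + b"s" * 20]

# SHA-256 always produces 32 bytes (256 bits), whatever the input length.
for data in inputs:
    print(len(data), len(hashlib.sha256(data).digest()))
```

So a long salt lengthens the input but not the digest, which is the same size in every case.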

postulative · 4 November 2022 06:57

Yes, and can be much longer than the input. If you include a lengthy random salt, I have no idea why you would be concerned about the original input being ten characters because the output has no such limitation. Each additional character added by the salt is additional entropy for the hash function’s output.

Gregr · 4 November 2022 13:27

The problem is that if the original input format is known, say a 10 digit Medicare number, and then a salt of known value is appended or prepended which is what normally happens, then the search to find a matching hash comes down to trying each of the possible numeric values that could be in a 10 digit decimal number.

That is a keyspace of 10^10, which is considerably less than a 10 character password, which in traditional ASCII would be at least 95^10 if the control codes are not used in the password.

That is the unknown. Unless the salt is kept secret it is known, and then it is not a factor at all in the keyspace search. The salt value would have to be kept separate from the hashed data and accessible by basically nobody. It would be generated randomly by a secure function, and accessed only by a secure function.
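The keyspace comparison in this post can be checked with a few lines of arithmetic. The variable names are just for illustration.

```python
# Ten unknown decimal digits, e.g. a Medicare number of known format.
digits_only = 10 ** 10

# Ten unknown characters drawn from the 95 printable ASCII characters.
printable_ascii = 95 ** 10

print(digits_only)                     # 10000000000
print(printable_ascii // digits_only)  # how many times larger the password space is
```

The password space comes out roughly six billion times larger than the ten-digit number space.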

postulative · 5 November 2022 01:11

So have I!

Oops, wrong forum.

The salt is applied before hashing, and so is an integral part of the input. Doesn’t matter if you know the salt or not, the salt extends the length of the input to the hash value and so the ten digit example is meaningless. This is why you always salt before hashing - salting afterwards adds no value.

Gregr · 6 November 2022 03:22

Except that extending the length of data inputted to a hash function is not the purpose of salting. Never was.

The purpose is to add a random string of bits to some data that could in many cases be non-unique, and would therefore produce the same hash value (like lots of users using the same password), turning it into something unique that produces a unique hash value.

Now, since the salt value is associated with the data before hashing, it must be kept so that it can be used with the original data when re-entered, in order to produce a matching hash value as in logins.

In Unix systems the salt is stored in plain text along with the userid in plain text with the hash value. However, it is in a supposedly secure ‘shadow’ password file accessible only by root.

Unlike encryption and decryption where a secret key is used for the decryption that is separate from the data, with hashing there is no ‘dehashing’. The task of a hacker is to try and guess the key used in the original hashing, which is the original data itself. This may or may not include a salt value, but if the salt value is known, then that is irrelevant.

If the original data format is unknown, and it could be anything from a pin, to a password, or a string of hundreds of characters, then the guessing task is very arduous.

But if the format of the original data is known, and I have been using the case of a ten digit Medicare number, then the length is known, the values that could be used are known, and the hacker task comes down to trying each value of numbers that could occur in a ten digit number, produce a hash using the known algorithm and known salt value, and see if it matches the stored hash value.

Much easier than decryption where a key value needs to be guessed, tried, see if the decrypted result makes any sense or works for a login, and if not move on to the next guess. Time consuming doing all that checking.

With hashing, the algorithm is very fast, and the only checking is seeing if a computed value matches the stored value.

postulative · 6 November 2022 09:09

Except it isn’t. The hash is not a reversible process, so if you have “12345” as the password and “abcde” as the salt specific to that user and password any attack has to break an effectively ten character password (“12345abcde”) even if they know five of those characters. Knowing half of the hashed input is of no help at all to cracking the hash.

No, this is not the purpose of a salt - which is to ensure that each individual input is unique. It is simply a handy side-effect.

So yes, the hacker knows that the pool of Medicare numbers is less than the pool of ten digit numbers, and can calculate that pool. As soon as a salt is added, the pool is expanded - whether by a single digit (in which case it is ten times the original pool) or by 50 characters (and no, I am not going to do the maths). Even if the hacker knows that the input is Medicare number with a 1 on the end as the salt, the number of possible combinations to produce the given hash output is still ten times greater than just the ten digit Medicare number.

Gregr · 6 November 2022 11:50

Exactly. You do not ‘crack’ the hash value which would be typically at least a 256 bit binary string in SHA2.

You guess what the input may have been to create that hash value.

And if you tell a nefarious hacker type how long the input was, and give away part of that input freely as the salt, then whatever is left is the part to be guessed.

Gregr · 6 November 2022 14:09

Here’s the maths.

Ten decimal digits gives you 10^10 combinations, which is ten billion.

The same ten unknown digits with one or fifty known and therefore fixed digits appended as a salt, is 10^10 combinations, which is ten billion.

Perhaps an analogy could help.

You have a combination padlock with five wheels on it, each having ten positions from zero to nine. Looks like I have 10^5 combinations to try, or one hundred thousand. Might be there a while. But the last two wheels are fixed, or I already know them: the salt, so to say, there to make what is essentially a three wheel padlock look more formidable to crack.
My cracking task is reduced to 10^3 which is one thousand. Done by coffee break time.
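The padlock analogy can be counted out directly. This is only a sketch; the known wheel values "42" are made up.

```python
from itertools import product
from typing import Iterator

DIGITS = "0123456789"


def padlock_candidates(total_wheels: int, known_suffix: str = "") -> Iterator[str]:
    """Yield every combination consistent with the wheels already known."""
    unknown = total_wheels - len(known_suffix)
    for combo in product(DIGITS, repeat=unknown):
        yield "".join(combo) + known_suffix


all_unknown = sum(1 for _ in padlock_candidates(5))
two_known = sum(1 for _ in padlock_candidates(5, known_suffix="42"))

print(all_unknown, two_known)  # 100000 1000
```

The known wheels play the role of a known salt: they are part of every candidate but add nothing to the number of candidates.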

postulative · 7 November 2022 06:35

Yes, I mansplained (messed up the maths) in my two posts, and made it sound harder than it is to crack a hash when you know the salt.

You are correct in stating that if the salt is known then an attacker need simply try all possible Medicare card number permutations with that salt as applied to the input (prefix, suffix, multiplier… I think in most cases a simple suffix).

If the salt is unknown by the hacker, then they do have to try all possible Medicare card number permutations with all possible salts - and even if the salt format and length is known that adds at least ten times the difficulty.

Then you are a lot more dexterous than me. Changing a padlock combination 500 times (so 50% chance I get it open) would seem to me a lengthy, arduous and… boring task. Can’t we get a computer to do it? And of course now I have looked at the numbers and figured that 50% should take less than 45 minutes at one attempt every five seconds.
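The figures in this post can be checked with a little arithmetic, assuming a salt made of decimal digits. The function names are illustrative.

```python
MEDICARE_CANDIDATES = 10 ** 10  # ten unknown decimal digits


def search_space(unknown_salt_digits: int) -> int:
    """Guesses needed when the salt digits are also unknown to the attacker."""
    return MEDICARE_CANDIDATES * 10 ** unknown_salt_digits


def minutes_for_attempts(attempts: int, seconds_per_attempt: float) -> float:
    """Time to make a number of manual padlock attempts, in minutes."""
    return attempts * seconds_per_attempt / 60


print(search_space(0))  # known salt: 10000000000
print(search_space(1))  # one unknown salt digit: 100000000000
print(minutes_for_attempts(500, 5))  # a little under 42 minutes
```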

person · 7 November 2022 11:28

In practical terms it won’t matter here and may not matter at all but … I have seen it suggested that, at least theoretically, with a plaintext salt that can be assumed to be known to the attacker it is not a good idea for the salt to dominate the combination of salt and password. Note that this is a crypto-theoretical point about (password) hash functions, not a point about the ease of brute forcing (which depends only on the length of the password).

Hence, for example, with the Linux passwords in /etc/shadow, the salt is limited to a maximum length of 16, I believe.

Gregr · 8 November 2022 05:32

One problem with salting passwords to create a strong and unique hash is that it takes away the responsibility of a password validation system to allow sensible passwords in the first place.

Commercial Linux systems I had logins for, Red Hat distro usually, had very onerous new password generation checks. Not at least 10 characters? Reject. Not at least two uppercase, two special chars, reject. Any group of chars that would appear in a standard dictionary, or could be considered a date? Reject. Any password a user has used in the last 13 months? Reject.

By the time you set a password that would pass the validation checks, there was almost no chance you would remember it in a few days’ time, so you would have to write it down or use some sort of password manager.
