If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

Main content

Bitcoin: Cryptographic hash functions

What cryptographic hash functions are and what properties are desired of them. Created by Zulfikar Ramzan.

Want to join the conversation?

  • blobby green style avatar for user Michael Orlík
    Generally speaking, is it really so difficult to de-crypt the digest (hash)? Let's say I have a digest which was produced by the SHA-256 function. It applies to this function that one input always returns the same output. Now, if I know the algorithm that is employed in the calculation, then I should be able to trace back the steps and convert the digest back into the original input, shouldn't I? As far as I know, every mathematical function has its inverse function.

    I'll be grateful if anyone sheds some light on this. :-)
    (6 votes)
    Default Khan Academy avatar avatar for user
    • blobby green style avatar for user alexwhittemore
      Your assumption that every mathematical function has an inverse is the problem - that's not actually true. It's not even really close to true. For example, consider f(x)=0. The fact that you know the output (0) tells you absolutely nothing about the input, because you threw away all the information about the input during computation. Hash functions are similar - I can hash an arbitrary length input to a deterministically 256-bit output. How could I possibly derive in input such as 2 billion bits (2GB) of perfectly random (incompressible) data from its 256-bit hash? The information about the input required to recalculate it is simply thrown away in the function's output.

      The key, as Quinn and others stated, is to find COLLISIONS. Now, 2^256 doesn't seem like a lot. I can come up with 256 bits pretty quick. The thing is, to enumerate ALL POSSIBLE COMBINATIONS of 256 bits takes a VERY long time. And if you wanted to store the enumeration in a file, it'd take 2^256 TIMES 256 bits to do it - that's 3.7*10^78 bytes. For reference, an estimate of all the data ever produced by humanity is 256 exabytes, or 256*10^18 bytes. (http://www.smartplanet.com/blog/thinking-tech/what-is-the-worlds-data-storage-capacity/)

      Let's say that didn't deter you. You want to find hash collisions - you're cool not storing ALL hashes, maybe just interesting ones. To start, you figure "I'll just hash every integer, from 1 to 2^256" at which point you are guaranteed to find at least one collision with some known current hash (let's say finding this hash collision lets you spoof a transaction and get a bitcoin). A reasonable hash rate on a good computer might be 1GH/s (billion hashes per second) - it'll be at least within a couple orders of magnitude of that from the very best hasher to the very worst. 2^256 is the number of hashes you might have to do, and you can do a billion a second, 2^256/10^9 = 1.16*10^68 seconds. For reference, an estimate of the age of the universe is 4*10^17 seconds.

      Hopefully now it makes a bit more sense why, while 256 bits SEEMS like a relatively small number to need to put in the right order, doing so actually requires an inconceivable amount of work.
      (21 votes)
  • winston default style avatar for user Gard
    Even with these great informative videos I find it very hard to grasp the bitcoin subject. Does this subject require some former knowledge of information technology, algoritms and or cryptography? In other words, at what level do I need to start to be able to grasp this subject easily? Thanks:)
    (18 votes)
    Default Khan Academy avatar avatar for user
  • blobby green style avatar for user Zakaria Midjiyawa
    What are the guarentee that these methodes are not reversible ? have a bitcoin transaction ever experienced hacking ?
    (3 votes)
    Default Khan Academy avatar avatar for user
  • sneak peak green style avatar for user Cybernetic Organism
    Was it ever designed to be a "mass market" currency? It seems that bitcoin and any other potential crypto-currencies might be just to overly complex for average person to engage with.
    (1 vote)
    Default Khan Academy avatar avatar for user
    • primosaur seedling style avatar for user Liam
      Currently yes, definitely too complex - like how the internet was in the early days. Then entrepreneurs who understood the potential came and started solving the problems. There are a huge amount of bitcoin startups racing to solve these problems of mass adoption. It will get easier to use bitcoins with time.
      (5 votes)
  • male robot hal style avatar for user domach1984
    Ok so in layman terms I do have a question. Would not the easiest way to make the two random inputs to map to the same message be to give each input its specific number of first entry into the system? There cannot be 3 entries in place number 451 or any other fixed place/point of entry?
    (1 vote)
    Default Khan Academy avatar avatar for user
    • piceratops ultimate style avatar for user Angelo
      1. The function has to be deterministic and create the same result everywhere. Creating a local table would mean different people have different digests for the same input.
      2. There are (theoretical) infinite inputs and only a finite number of digests, after the 2^246th message either the table would not give an output or would give a duplicate value anyway.
      (2 votes)
  • leaf green style avatar for user nickjarv
    What are our expectations for the adaptability of the Bitcoin security system in the long term, say, to the intellectual and technological environment that may exist a generation from now? For example, is there any likelihood that the Bitcoin network's current hashing regime would ever need to be modified or replaced in the future, such as if something even better comes along that they hadn't thought of back when this was first designed? If such a change was necessary, who would be responsible for making that decision? And could the existing network and ledger survive such a change intact, or would this require the creation of a new crypto-scrip scheme to replace the bitcoins?
    (1 vote)
    Default Khan Academy avatar avatar for user
    • ohnoes default style avatar for user Tejas
      The Bitcoin rules can be changed. The way it works is that there are a bunch of developers who regularly propose to update the network. Most of these updates are minor, and so the nodes will make the update. However, if the developers propose a change that the nodes don't approve of, they won't make that update. More than 90% of the miners need to make the update in order for it to become part of the network.
      (2 votes)
  • old spice man green style avatar for user Dania  Zaheer
    Can you or can you not decrypt/reverse a hash function?
    (1 vote)
    Default Khan Academy avatar avatar for user
  • leafers seedling style avatar for user Vincent Merkel
    If i got that right. A Bitcoin is a Collision of to Inputs, that result in the same output.

    Due to the fact, that the Input can be any length, this would mean, that there is a unlimited number of Inputs that can collide. Doesn't that mean, that the More Bitcoins exist, the less is thier worth?

    (Sorry for my bad english; im not a native speaker)
    (1 vote)
    Default Khan Academy avatar avatar for user
  • blobby green style avatar for user Bank Investor
    good stuff. this is way too confusing still. will there ever be a video on the step by step basics? Basic questions i am talking about are these:

    alice Vka sends bob VKb 50. got it.

    only 1 transaction must be made, no matter how big your account is, lets say Alice has 10,000 bitcoin.
    she would have to send 50 bitcoin to bob and 9950 back to herself?

    where is the bitcoin miner price listed?
    is there a section where you could type it in?
    i.e. bitcoin miner fee ____

    what about the rest of the transaction... Alice will not send Bob bitcoins for free...will bob send a product back to Alice?

    what if many people , Jay, Alice, John, send Bob 50 bitcoin? how does BOB know alice sent the bitcoin and not Jon? so Bob knows where to send the product

    could you please over the very basics? not what it is, but the process in a transaction? thanks
    (1 vote)
    Default Khan Academy avatar avatar for user
  • blobby green style avatar for user scirocco21
    At where you talk about collision resistance (it should be hard to find two inputs that produce the same output), I was not quite sure why that is a desirable feature for a cryptographic application. How would an application that is not collision resistant be vulnerable to being exploited? Thank you for clarifying.

    Edit: Is the worry that one might be able to reverse engineer the hashing algorithm once different inputs can be mapped to the same output?
    (1 vote)
    Default Khan Academy avatar avatar for user
    • blobby green style avatar for user SIPOX11
      The algorithm cannot be reversed engineered, since lots of information about the input get lost during the hashing function. There are a very large number of possible digests, but it is finite nevertheless. There are an infinite number of possible inputs, therefore there must be collisions.

      However, if the algorithm is well designed, let's say that if you brute-forced a hash it took you X number of years to find the original input. If the algorithm is not collision resistant, if there were 100 times more collisions than in another similar algorithm, it would take you X/100 years to crack it.

      Now if X= 100 years, that's an acceptable risk to take for your security. But X/100 = 1 year of computing is definitely not enough.
      (1 vote)

Video transcript

Voiceover: Cryptographic hash functions are basically fundamental building blocks that are used within many cryptographic algorithms and protocols, and they have a number of very important applications in the context of information security as a whole. Now, some of the more common algorithms in this category that are known as cryptographic has functions include: things like MD5, and also, it has some predecessors like MD4 and others, as well as algorithms like SHA-256, and actually, SHA-256 is preceded by other algorithms like SHA-1 and so on, and also there are some algorithms that maybe you've heard of, or maybe that are a bit lesser known, but things like RIPEMD, and BLAKE, and Skein, and others. Now, cryptographic hash functions are basically used as critical building blocks in many applications, and really the first motivating application, the first historical application of these types of hash functions was in the context of what's known as a digital signature, and digital signatures are used in so many different cryptographic applications today. They're the cornerstone of many ecommerce protocols. They are used in doing things like bitcoin generation and so on. Cryptographic has functions are also used in things like message authentication protocols, in pseudorandom number generation and password security, even encryption to some degree. In fact, aside from their use in digital signatures, these hash functions are also used in other places in the bitcoin protocol as well. First of all, let me talk about what a cryptographic hash function actually is, and of course, as the name implies, the first thing it is, it's a hash function. And by hash function, I basically mean that it will take input. It's a mathematical function, a transformation, if you will, that takes a particular input, and we call this input a message, and the message can be of arbitrary length, and the hash function basically applies a mathematical transformation, or maybe a set of mathematical transformations to this input to produce a single output, and we typically call this output a digest, although, sometimes you will see the output referred to as a tag, or as a hash, or as a fingerprint, but digest is sort of the most common nomenclature. In fact, MD5, which was one of the earlier hash functions, stands for message digest 5, and MD4 was message digest 4, and so on, and so forth. Now, the message, as I mentioned briefly can be of arbitrary size. It can be as long as you want, or as short as you want, but the output, the size of the digest or the size of the tag, is going to be fixed in length, and for example, in the context of a hash function like, let's say, SHA-256, the digest will actually be exactly 256 bits in length. It's going to have a fixed-length output, but an arbitrary-length input. The other thing I want to point out about these cryptographic hash functions is that the function here is a deterministic function, and by that, I mean that the output will always be the same for a given input, so if you have a given input, you're going to see the exact same output. You won't have a situation in which, maybe, a given input will have two different possible outputs. It's going to be consistent. Now, traditional hash functions have been used in computer science for quite some time, and they are used in many computing applications, so, for example, you may have heard of something like a hash function used in something called a hash table, but the kinds of hash functions that are used in hash tables, and I want to stress this, they aren't necessarily the same as cryptographic hash functions. The qualifier "cryptographic" here is very important, and it usually means or implies that that hash function has to have a certain set of critical design goals and maybe some particular properties in mind that make it suitable for use in applications that use cryptography or cryptographic applications, areas where perhaps security or privacy or confidentiality or authentications are a serious concern. First and foremost, maybe in describing some of these properties is that, and maybe this almost goes without saying, one of the first properties you want of a cryptographic hash function is that it should be computationally efficient, and by that, I mean that it shouldn't take a lot of time to compute the output from a given input. If you're given a message and you want to apply this set of transformations to that message to get a digest, that set of transformation should not take a long time to implement on a computer. It should be very fast, or relatively fast. It almost goes without saying, but I think it's important to stress and point it out because I've seen people come up with grossly inefficient hash functions sometimes, and those would not be considered suitable in the context of when typical cryptographic hash functions are used for cryptographic applications. The second property you typically need in the context, and this is especially in the context of digital signing, is that you want it to be the case that it's hard to find two inputs that actually map to the same output, and I mean two distinct inputs whose corresponding digest is identical. This property is typically referred to as collision resistance. That means it's hard to find a colliding pair of input, so in other words, if you have two inputs and let's say you have a message M1, and you have a message M2, their output under the application of the hash function should not be the same. You won't ever have it be the same that the, you won't even have it be the case, rather, that the output of M1 and M2 under an application of the hash function is the same. They should never be the same. It should always be different. Now, I should take a step back here and point out that of course in practice, given that messages can be of arbitrary size and given that the input or the output, rather, is a fixed size, it's not mathematically possible to guarantee that the output will always be different for two distinct messages, but what you typically want is not that the outputs are necessarily different, but that it's hard to find two distinct messages that produce the same output. We know they exist by virtue of the fact that there are many messages that can be hash, and only a finite, small number, relatively small number compared to the number of messages, a smaller number of possible digest values, but aside from that consideration, it should be hard. It should take a long time, and by long, I mean an astromonically long time to find two distinct messages that would result in the same output under the application of the hash function. Now, the third thing that I want to point out is that in many cases, you might want, also, in the context of a hash function, for the hash function to be able to hide information about the inputs. In other words, given the output, it should be hard to glean anything useful or interesting about the input. Anything, any relevant detail, and I don't just mean the whole input, but maybe even something as simple as was the input an even number or an odd number? I mean, that should be the kind of thing that should be hard to infer when you see the output, even something as simple as the, the evenness or oddness of the input. Now, a fourth property I want to point out is that you typically want the output to be well distributed. In other words, the output should look random. In other words, it should look like a set of coin flips took place, not that there was a predictable way in which the output was created. Really, what that means is that, and you can think of it maybe more conceptually as it's almost as if you flipped a bunch of coins to get to the output. It should look just that random. And so what you can maybe think of cryptographic hash functions as, as it's, perhaps, maybe the mathematical equivalent or mathematical analog of a meat grinder of sort. It can take inputs and apply these mathematical transformations to them such that the output looks, for all intents and purposes, completely random and unrelated to the original input. I do want to make a few quick remarks about these particular properties, and first of all, they are interrelated. For example, if you have a situation where the outputs, let's say, really appear to bear no relationship to the input and the outputs also look random, then that will, in fact, give you to a large degree, a lot of the collision resistance properties because just the fact that you can't predict the output and the fact that it hides all this information implies that it's going to be hard to find two inputs that are distinct that map to the same output. And so sometimes, you get one property in exchange for the others. The second thing I want to point out is that typically, these properties in practice, or maybe in the underlying mathematics, are things that you hope for but you can't always guarantee that they'll always hold, and it may be entirely possible that you could design a hash function that you think is completely collision resistant, but someone might come along a year from now and come up with a more clever way for finding a collision. Maybe they'll find a clever shortcut that does not involve doing a brute force search of any sort. It turns out that cryptographers, for better or worse currently do not have any mathematical techniques. They haven't developed techniques for being able to work around some of these limitations. So we often do take it on face value that these schemes are secure based on how long they've been around. Now, I also want to point out, and the last thing I want to point out, is that this treatment that I've given is not meant to be mathematically rigorous by any stretch of the imagination. There are far more precise ways to describe these underlying properties, but my hope, instead, is that this video gives you, maybe, a bit of a flavor for what is required of a cryptographic hash function without maybe getting bogged down in some of the mathematical minutia and formalism.