# Proof of the Cauchy-Schwarz inequality

Proof of the Cauchy-Schwarz Inequality. Created by Sal Khan.

## Want to join the conversation?

• Why does he bring up an artificial function out of thin air? [p(t) = ||ty - x||^2]
• This reminds me of the uncertainty principle... Anyone know if there's a relation?
• Why is |c| ||y|| = ||cy||?
• ||cy|| = sqrt((c*y1)^2 + (c*y2)^2 + ... + (c*yn)^2)
= sqrt(c^2*(y1^2 + y2^2 + ... + yn^2))
= sqrt(c^2)*sqrt(y1^2 + y2^2 + ... + yn^2)
= |c|*sqrt(y1^2 + y2^2 + ... + yn^2)   (because sqrt(c^2) = |c|, not c)
= |c| ||y||
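A quick numerical check of this identity, as a minimal sketch. The helper names `norm` and `scale` and the sample values are made up for illustration; a negative `c` is used on purpose, to show why the absolute value is needed.

```python
import math

def norm(v):
    """Euclidean length of a vector given as a list of real numbers."""
    return math.sqrt(sum(component ** 2 for component in v))

def scale(c, v):
    """Multiply every component of v by the scalar c."""
    return [c * component for component in v]

y = [3.0, -4.0, 12.0]   # ||y|| = sqrt(9 + 16 + 144) = 13
c = -2.5                # negative scalar (illustrative choice)

lhs = norm(scale(c, y))     # ||cy||
rhs = abs(c) * norm(y)      # |c| ||y||

print(lhs, rhs)                         # 32.5 32.5
print(math.isclose(lhs, c * norm(y)))   # False: without |c| the sign is wrong
```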
• I understand the outcome of this proof, but can anybody please explain what insight compels the decision to evaluate P(t) at t = b/2a? It seems that if you didn't already know to do this, from previous research into the Cauchy-Schwarz inequality, you wouldn't easily come to this substitution. Is there a naive way to come to b/2a?
• b/2a is something that comes up a lot when solving quadratic equations. Every quadratic function graphs as a parabola, which is given by the equation:
`ax² + bx +c = y`
All parabolas have a single vertex. The left and right sides of the graph are symmetric about the vertex, so if we can find the value at the vertex, we can easily find other information about the parabola and easily solve for x given any value for y (and a, b, and c obviously). If we change our equation into the form:
`ax²+bx = y-c`
Then we can factor out an x:
`x(ax+b) = y-c`
Since y-c only shifts the parabola up or down, it's unimportant for finding the x-value of the vertex. Because of this, I'll simply replace it with 0:
`x(ax+b) = 0`
Now, we just solve for x:
`x = 0` and
`ax+b = 0`
`x = -b/a`
This gives us 2 values of x that are an equal distance away from the vertex point. So, the vertex point is the value perfectly in between them (or the average). This gives:
`vx = (0+(-b/a))/2` or
`vx = -b/2a` (vx is the x-value of the vertex)
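In this proof the quadratic comes from expanding `||ty - x||^2`, and if you write it as `p(t) = at² - bt + c` the middle term carries a minus sign. Replacing b with -b in the formula above puts the vertex at `t = b/2a`.
So, when Sal inputs `b/2a` into the equation, what he's doing is evaluating p at its vertex, which is the lowest point of the parabola. Since p(t) is never negative, the value at the lowest point gives the strongest inequality. It's somewhat complex, but hopefully this helps.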
Here's a website that talks more about the vertex of a parabola:
http://hotmath.com/hotmath_help/topics/vertex-of-a-parabola.html
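A small numerical sketch of why b/2a is a natural input to try. It assumes the quadratic is written as p(t) = at² - bt + c with a = y·y, b = 2(x·y), c = x·x, which is what expanding ||ty - x||² gives; the function names and sample vectors are illustrative only.

```python
def dot(u, v):
    """Dot product of two equal-length lists of real numbers."""
    return sum(ui * vi for ui, vi in zip(u, v))

def p(t, x, y):
    """p(t) = ||t*y - x||^2, a squared length, so it is never negative."""
    diff = [t * yi - xi for xi, yi in zip(x, y)]
    return dot(diff, diff)

x = [1.0, 2.0, 3.0]    # sample vectors (illustrative)
y = [2.0, 0.0, -1.0]

a = dot(y, y)          # coefficient of t^2
b = 2 * dot(x, y)      # assumed convention: p(t) = a*t^2 - b*t + c
c = dot(x, x)

t_vertex = b / (2 * a)         # vertex of the parabola
p_min = p(t_vertex, x, y)      # smallest value p can take

# Nearby inputs never give a smaller value than the vertex does.
print(all(p(t_vertex + d, x, y) >= p_min for d in (-1.0, -0.1, 0.1, 1.0)))

# p_min = c - b^2/(4a) >= 0 rearranges to 4ac >= b^2.
print(4 * a * c >= b * b)
```

Evaluating at any other t still gives a true inequality, just a weaker one; the vertex is where the bound is tightest.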
• When he had 4ac >= b^2, is it just a coincidence that b^2 - 4ac is what is under the radical in the quadratic formula?
• Of course not, it comes from completing the square.

ax^2 + bx + c = 0
4a^2x^2 + 4abx + 4ac = 0
(2ax)^2 + 4abx + b^2 + 4ac - b^2 = 0
(2ax + b)^2 + 4ac - b^2 = 0
(2ax + b)^2 = b^2 - 4ac
2ax + b = ±√(b^2 - 4ac)
2ax = -b ±√(b^2 - 4ac)
x = (-b ±√(b^2 - 4ac)) / (2a)
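Another way to see the connection: a quadratic that is never negative can touch the horizontal axis at most once, so it has at most one real root and its discriminant cannot be positive. The sketch below checks this for p(t) = ||ty - x||², assuming the coefficients a = y·y, b = 2(x·y), c = x·x; the helper names and the sample vector pairs are illustrative.

```python
def dot(u, v):
    """Dot product of two equal-length lists of numbers."""
    return sum(ui * vi for ui, vi in zip(u, v))

def discriminant(x, y):
    """b^2 - 4ac for the quadratic obtained by expanding ||t*y - x||^2."""
    a = dot(y, y)
    b = 2 * dot(x, y)
    c = dot(x, x)
    return b * b - 4 * a * c

pairs = [
    ([1, 2, 3], [2, 0, -1]),   # generic pair
    ([1, 0], [0, 1]),          # perpendicular pair
    ([1, 2], [2, 4]),          # collinear pair: y = 2x
]

for x, y in pairs:
    print(x, y, discriminant(x, y))
# discriminants: -276, -4, 0
```

The discriminant is zero exactly in the collinear case, which is when Cauchy-Schwarz holds with equality.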
• Is there a difference between |x| and ||x||?
• |c| where c is a scalar is the absolute value of c.
|x| where x is a vector (note the bold letters) is equal to ||x||. This notation goes by several names: absolute value, norm, length, and/or magnitude of the vector.
The difference is that ||c|| where c is a scalar doesn't really make sense, I have never seen that notation used for calculating the absolute value of a scalar.

Notice that in Sal's video here, he has |x dot y|. Remember that a dot product produces a scalar, so he is taking the absolute value of the scalar that comes from that particular dot product. In other words
||x dot y|| would not make sense, or maybe is just uncommon notation, as you don't usually see ||c|| (I will ask my professor if ||c|| exists to clarify later).

Edit: Just talked to my professor about ||c|| where c is a scalar. He said it is fine to write that, but uncommon. It still means the absolute value of the scalar.
• What irritates me a lot is the strategy for the proof. In former videos the strategy is clear. He wants to prove a certain relationship, for instance that n linearly independent vectors span R^n, so he expresses this relationship as an equation and checks whether it can be solved.

In this video it all appears so random. The proof starts with "Well, let's take some function..." Why a function? Why this function? "And now, let's substitute..." Yes, but why? I can follow the whole process, but it is not clear why he does each thing at the moment he does it. Can anyone explain the strategy of the proof a little more concretely? Thanks so much.
• Where is the equation p(t) = ||ty-x||^2 from? How did you choose it?
• Look at the earlier videos introducing vectors and recall the insight about what happens when two vectors are collinear: one is just a scalar multiple of the other. Then think about what it takes to come up with a simple formula when it is not guaranteed that the two vectors are collinear.

Also, watch the early videos in the Geometry section regarding the lengths of the sides of a triangle, and the conclusions you can draw about the relations those lengths must satisfy for a group of 3 lines (or intervals) to form a triangle. Can you form a triangular shape if one side is the sum of the lengths of the other two sides? Or can you only form a line? Don't even worry about the right answer if you find it difficult, just try to think about what concepts you are using in your head to prove it to yourself.
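One way to read the choice of function, offered as a sketch and not necessarily the motivation in the video: ||ty - x|| is the distance from x to the point ty on the line through y, so p(t) is a squared distance and can never be negative. Expanding it with dot products (and assuming y is not the zero vector) gives the whole argument in a few lines:

```latex
\begin{aligned}
p(t) &= \lVert t\mathbf{y} - \mathbf{x} \rVert^2
      = (t\mathbf{y} - \mathbf{x}) \cdot (t\mathbf{y} - \mathbf{x}) \\
     &= (\mathbf{y}\cdot\mathbf{y})\,t^2
        - 2(\mathbf{x}\cdot\mathbf{y})\,t
        + \mathbf{x}\cdot\mathbf{x} \;\ge\; 0, \\
p\!\left(\frac{\mathbf{x}\cdot\mathbf{y}}{\mathbf{y}\cdot\mathbf{y}}\right)
     &= \mathbf{x}\cdot\mathbf{x}
        - \frac{(\mathbf{x}\cdot\mathbf{y})^2}{\mathbf{y}\cdot\mathbf{y}} \;\ge\; 0
\quad\Longrightarrow\quad
(\mathbf{x}\cdot\mathbf{y})^2 \le \lVert\mathbf{x}\rVert^2\,\lVert\mathbf{y}\rVert^2 .
\end{aligned}
```

Taking square roots of the last inequality gives |x · y| <= ||x|| ||y||. The input (x · y)/(y · y) is the value of t at which ty is closest to x.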
• Oh my god this solution is genius! But what if x, y ∈ C rather than R?
• Great question,

What facts did we use about the real numbers in this proof? It seems like all we used was that |x| = sqrt(x_1^2+x_2^2+...+x_n^2), right? So we need to define a "norm" on C. If z=a+bi, what should |z| be? The answer is |z|=sqrt(a^2+b^2). We can extend this to a norm on a vector by writing |z|=sqrt(|z_1|^2+|z_2|^2+...+|z_n|^2). Then the proof follows the video exactly from here.

If you want a stranger way to think about it, you might see that if z=a+bi we can think of C as actually being a copy of R^2, since they both have real dimension 2. In this sense we have the correspondence z = (a,b). This gives us |z| = sqrt(a^2+b^2), which agrees with the definition we chose above. Then C^n is a copy of R^(2n) and we are done.
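A minimal numerical sketch of the complex case. One assumption goes beyond the comment above: the dot product used here conjugates the second vector (the usual Hermitian inner product), which is what makes z · z equal to the squared norm. The function names and sample vectors are illustrative.

```python
import math

def hermitian_dot(z, w):
    """Inner product on C^n, conjugating the second argument (assumed convention)."""
    return sum(zi * wi.conjugate() for zi, wi in zip(z, w))

def norm(z):
    """sqrt(|z_1|^2 + ... + |z_n|^2); also works for lists of real numbers."""
    return math.sqrt(sum(abs(zi) ** 2 for zi in z))

def as_real(z):
    """View a vector in C^n as a vector in R^(2n): a+bi becomes (a, b)."""
    flat = []
    for zi in z:
        flat.extend([zi.real, zi.imag])
    return flat

z = [1 + 2j, 3 - 1j]   # sample vectors in C^2 (illustrative)
w = [2 - 1j, 1j]

lhs = abs(hermitian_dot(z, w))
rhs = norm(z) * norm(w)

print(lhs <= rhs)                                  # Cauchy-Schwarz in C^2
print(math.isclose(norm(z), norm(as_real(z))))     # same length in R^4
```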
• Let a and b be any vectors and x be the angle between a and b. Since we know that a.b = ||a|| ||b|| cos x, can we prove the Cauchy-Schwarz inequality as follows:

a . b = ||a|| ||b|| cos x
|a . b| = | ||a|| ||b|| cos x |
|a . b| = ||a|| ||b|| |cos x|
|a . b| <= ||a|| ||b|| since |cos x| <= 1 (proved; this way is much easier to understand from my point of view). Please comment!
• Well, this would be fine, but the thing is that we usually define the angle between two nonzero vectors `a` and `b` in `n`-space to be the number `x` for which `cos x = a · b / (||a|| ||b||)`, and the Cauchy-Schwarz inequality is what shows us that there is a unique such `x` in the interval `[0, π]`.
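The definition in this reply can be sketched in code. Cauchy-Schwarz is what guarantees that the ratio passed to `acos` lies in [-1, 1], so the angle exists; the clamp below only guards against floating-point rounding. The function names are illustrative, and both vectors are assumed to be nonzero.

```python
import math

def dot(u, v):
    """Dot product of two equal-length lists of real numbers."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(v):
    """Euclidean length of v."""
    return math.sqrt(dot(v, v))

def angle_between(a, b):
    """Angle in [0, pi] between nonzero vectors a and b in n-space."""
    cos_x = dot(a, b) / (norm(a) * norm(b))
    # Cauchy-Schwarz says |cos_x| <= 1 exactly; clamp only for rounding error.
    cos_x = max(-1.0, min(1.0, cos_x))
    return math.acos(cos_x)

print(angle_between([1.0, 0.0], [0.0, 1.0]))    # perpendicular: pi/2
print(angle_between([1.0, 0.0], [-2.0, 0.0]))   # opposite directions: pi
```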