Abstract: We give the definition of the derivative of a function of a single real variable, motivated by infinitesimals, but we use rigor and big-O notation to avoid ambiguities.
1. Weird Numbers
1.1. Slope. Consider some linear function

f(x) = mx + b

for some nonzero real number m, and an arbitrary real number b. We can calculate the slope by considering some constant “shift” in x, denoted Δx (really Δx is read as “the change in x”), and using this to figure out the change in f:

Δf(x) = f(x + Δx) − f(x).

What is this? Well, we plug in the definition of f to find

Δf(x) = (m(x + Δx) + b) − (mx + b),

which reduces to

Δf(x) = m·Δx.

Thus we may write the slope of f as

Δf(x)/Δx = m,

which is independent of both x and the choice of Δx.
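As a quick numerical sanity check (a Python sketch; the helper name `difference_quotient` and the sample values m = 3, b = 7 are my own choices, not from the text), the difference quotient of a linear function comes out to m no matter which x or Δx we pick:

```python
def f(x, m=3.0, b=7.0):
    """A linear function f(x) = m*x + b."""
    return m * x + b

def difference_quotient(f, x, dx):
    """The slope estimate (f(x + dx) - f(x)) / dx for a nonzero shift dx."""
    return (f(x + dx) - f(x)) / dx

# The result is m = 3 regardless of x and the choice of dx.
for x in (-2.0, 0.0, 5.0):
    for dx in (0.5, 0.01):
        assert abs(difference_quotient(f, x, dx) - 3.0) < 1e-9
```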
Can we do this in general for a polynomial

f(x) = x^n

for some n ∈ ℕ? Let us try! We consider some nonzero term Δx, and we write (for n ≥ 2)

(x + Δx)^n = x^n + n·x^(n−1)·Δx + (bonus parts),

where the “bonus parts” are other stuff. Actually, by the binomial theorem, the bonus parts would have to be (Δx)^2 multiplied by a polynomial in Δx with a nonzero constant term. This information is really encoded in writing

(x + Δx)^n = x^n + n·x^(n−1)·Δx + O((Δx)^2),

where O((Δx)^2) is a more rigorous way of saying “bonus parts at least quadratic in Δx.” This gives us a more precise way to specify the error when writing out terms at most linear in Δx.

Problem: What is O((Δx)^2) divided by Δx? What is Δx multiplied by O((Δx)^2)?

We see that we are abusing notation and writing

O((Δx)^2) = (Δx)^2·g(x, Δx)

for some function g. So by dividing through by Δx we obtain

O((Δx)^2)/Δx = Δx·g(x, Δx) = O(Δx),

and similar reasoning suggests

Δx·O((Δx)^2) = O((Δx)^3).
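These bookkeeping rules can be watched numerically (a quick Python sketch; the sample values x = 2 and n = 4 are my own): the remainder after the linear truncation of (x + Δx)^n, divided by (Δx)^2, stays bounded as Δx shrinks, which is exactly what O((Δx)^2) promises.

```python
x, n = 2.0, 4

for dx in (0.1, 0.01, 0.001):
    # Remainder of (x + dx)**n after removing the constant and linear terms.
    remainder = (x + dx)**n - x**n - n * x**(n - 1) * dx
    ratio = remainder / dx**2
    # The ratio stays bounded: it tends to the binomial coefficient
    # C(n,2) * x**(n-2) = 6 * 4 = 24 as dx shrinks.
    assert abs(ratio - 24.0) < 1.0
```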
So let us go on with our considerations.
We then have

(x + Δx)^n − x^n = n·x^(n−1)·Δx + O((Δx)^2),

where we were slick and noted the definition of f in order to plug it in. So, we can rewrite this as

f(x + Δx) − f(x) = n·x^(n−1)·Δx + O((Δx)^2),

and we want to divide both sides by Δx. But we know how to do this now! First we will write

Δf(x) = f(x + Δx) − f(x)

as shorthand, and rewrite our equation as

Δf(x) = n·x^(n−1)·Δx + O((Δx)^2).

We divide both sides by Δx:

Δf(x)/Δx = n·x^(n−1) + O(Δx).

But we have a problem that we didn’t have before: the slope depends on x and Δx.
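The dependence on Δx is easy to see numerically (a Python sketch with sample values of my own choosing): the difference quotient of x^3 misses the value 3x^2 by an error that shrinks roughly linearly with Δx.

```python
def diff_quotient(f, x, dx):
    """The slope estimate (f(x + dx) - f(x)) / dx."""
    return (f(x + dx) - f(x)) / dx

cube = lambda t: t**3   # n = 3, so the "slope" should be near 3*x**2
x = 2.0

# The O(dx) error term shrinks roughly linearly with dx.
err_big = abs(diff_quotient(cube, x, 0.1) - 3 * x**2)
err_small = abs(diff_quotient(cube, x, 0.001) - 3 * x**2)
assert err_small < err_big / 50   # ~100x smaller dx => ~100x smaller error
```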
Historically, people noted that we were working with a term O(Δx). If we could make that term equal to 0, then everything would work out nicely. How do we do this? Well, we formally invent a number ε and use it instead of a finite nonzero number Δx.
1.2. i and ε. We know that we have a “number” i satisfying

i^2 = −1.    (19)

There is no real number which satisfies this, but we can “adjoin” i to the real numbers ℝ. That is, we pretend that i is a variable satisfying equation (19); then we have polynomials of the form

a + b·i, with a, b ∈ ℝ.

Of course, we can formally multiply these polynomials together, and we end up with the number system (“ring”) of complex numbers ℂ (we would have to prove that 1/i exists to make it a field). Why do we not have higher order terms in i? That is, a general polynomial a + b·i + c·i^2 + d·i^3? Let’s consider it. Suppose we did have

z = a + b·i + c·i^2 + d·i^3.

Then we plug in (19) to find

z = a + b·i − c − d·i

(since i^2 = −1 and i^3 = i^2·i = −i), which simplifies to merely

z = (a − c) + (b − d)·i.

But this is precisely of the form we described: there is some term which is a multiple of i (the imaginary term) and another independent of i (the real term).
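This “multiply formally, then reduce with i² = −1” recipe is mechanical enough to hand to a computer (a Python sketch; the class name `Complex` is my own, and Python of course already has a built-in complex type):

```python
class Complex:
    """A formal polynomial a + b*i, with the reduction rule i**2 = -1."""
    def __init__(self, a, b):
        self.a, self.b = a, b   # real part a, imaginary part b

    def __mul__(self, other):
        # (a + b i)(c + d i) = ac + (ad + bc) i + bd i^2,
        # then substitute i^2 = -1 into the last term.
        return Complex(self.a * other.a - self.b * other.b,
                       self.a * other.b + self.b * other.a)

i = Complex(0, 1)
i_squared = i * i
assert i_squared.a == -1 and i_squared.b == 0
```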
Let’s consider a similar problem. We want a nonzero “number” ε which is the “smallest” number possible. What would this mean? Suppose we have a “small” finite number

0 < x < 1.

Then we see the property specifying that x is small would be

x^2 < x.

But if we had the smallest number, then the general argument is we expect

ε^2 = 0.

We call such an ε an “infinitesimal” number. If we formally consider such an ε (i.e., pretend it exists and obeys this relationship), then we can run into some problems. For example: what is 1/ε?
1.3. Division by Zero? The problem is: what is 1/ε? The answer is: we don’t know.
However, why would ε ever be useful? We can consider

f(x) = x^n

for some n ∈ ℕ. Then

f(x + ε) = (x + ε)^n

can be simplified to what? Let’s consider the n = 2 case:

(x + ε)^2 = x^2 + 2x·ε + ε^2.

But the ε^2 term vanishes, so

(x + ε)^2 = x^2 + 2x·ε.

We see that the n = 3 case

(x + ε)^3 = (x + ε)·(x + ε)^2 = (x + ε)·(x^2 + 2x·ε)

can be carried out as if it were polynomial multiplication. We then obtain

(x + ε)^3 = x^3 + 2x^2·ε + x^2·ε + 2x·ε^2,

and again, the ε^2 term vanishes. We thus obtain

(x + ε)^3 = x^3 + 3x^2·ε.

Indeed the general pattern appears to be

(x + ε)^n = x^n + n·x^(n−1)·ε.

We would like to write

f(x + ε) − f(x) = n·x^(n−1)·ε.

Notice the difference this time: we don’t have any O(ε^2) terms. The only price we paid is we cannot get rid of the factor ε.
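This arithmetic is exactly what so-called “dual numbers” implement, and it is the core trick behind forward-mode automatic differentiation. A minimal Python sketch (the class name `Dual` and the helper `power` are my own): multiply pairs a + b·ε formally, and drop the ε² term.

```python
class Dual:
    """A number a + b*eps, with the rule eps**2 = 0."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __mul__(self, other):
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps + bd eps^2,
        # and the eps^2 term simply vanishes.
        return Dual(self.a * other.a,
                    self.a * other.b + self.b * other.a)

def power(x, n):
    """Compute x**n by repeated (dual) multiplication."""
    out = Dual(1.0)
    for _ in range(n):
        out = out * x
    return out

# (x + eps)**n = x**n + n*x**(n-1)*eps, e.g. at x = 2, n = 5:
result = power(Dual(2.0, 1.0), 5)
assert result.a == 32.0    # x**n
assert result.b == 80.0    # n * x**(n-1) = 5 * 2**4
```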
1.4. Big O for the Bonus Parts. The take-home moral is that big-O notation enables us to rigorously consider infinitesimals. How? Well, the most significant terms are written out explicitly, and the rest are swept under the rug with O(…). For our example of

f(x) = x^n,

we saw we could write

f(x + Δx) = x^n + n·x^(n−1)·Δx + O((Δx)^2),

which tells us the error of “truncating,” or cutting off, the polynomial to be explicitly first order plus some change. This change we consider to be in effect “infinitesimal” in comparison to the n·x^(n−1)·Δx term.

We still have these bonus parts when considering the slope. That is, for some nonzero Δx and arbitrary x, we have

Δf(x) = n·x^(n−1)·Δx + O((Δx)^2),

which gives us

Δf(x)/Δx = n·x^(n−1) + O(Δx).

We want to get rid of that O(Δx) on the right hand side. How to do this?
Let’s be absolutely clear before moving on. We want to consider the slope of our function f. To do this we considered a nonzero Δx, and then constructed

Δf(x) = f(x + Δx) − f(x).

This function described the difference between the values of f at x + Δx and at x. So, to describe the rate of change we take

Δf(x)/Δx = (f(x + Δx) − f(x))/Δx.

But we want to describe the instantaneous rate of change. Although this sounds scary, it really means we don’t want to work with some extra parameter Δx. We want to consider the rate of change and describe it in such a way that it doesn’t depend on Δx.

So what do we do? Well, the first answer is to set Δx to be 0. This is tempting, but wrong, because we end up with

Δf(x)/Δx = 0/0,

which is not well-defined. The second answer is to consider the limit Δx → 0, so we can avoid division-by-zero errors. This is better, and we write

df(x)/dx = lim_{Δx→0} Δf(x)/Δx,

following Leibniz’s notation. This is the definition of the derivative of f.
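On a computer we cannot take a true limit, but we can approximate it by making Δx very small (a Python sketch; `derivative` is my own helper name, and the fixed dx = 1e-6 is a crude stand-in for the limit):

```python
def derivative(f, x, dx=1e-6):
    """Approximate df/dx = lim_{dx -> 0} (f(x + dx) - f(x)) / dx."""
    return (f(x + dx) - f(x)) / dx

# d(x^3)/dx = 3 x^2, so at x = 2 we expect something close to 12.
approx = derivative(lambda t: t**3, 2.0)
assert abs(approx - 12.0) < 1e-3
```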
1.5. Divide by Zero, and You Go To Hell! Well, formally, we need to take the limit Δx → 0. What does that mean for the left hand side? Could we accidentally be dividing by 0 and get infinities? This is a problem we have to seriously consider.

The first claim is that

Δf(x) = O(Δx).    (42)

This would imply that

Δf(x) = Δx·g(x, Δx)

for some function g. There would be no division-by-zero errors, but still we have to prove that equation (42) is true in general, i.e. for every function f. We have seen it is true only for polynomials.
So, let us consider a function

f(x) = 1/x^n

for some n ∈ ℕ (and x ≠ 0). What to do? Well, let’s consider what happens when, for nonzero Δx, we change x to be x + Δx. We have

f(x + Δx) = 1/(x + Δx)^n

by definition of f. We would expect then

Δf(x) = 1/(x + Δx)^n − 1/x^n.

What to do? Well, let’s gather the terms together:

Δf(x) = x^n/(x^n·(x + Δx)^n) − (x + Δx)^n/(x^n·(x + Δx)^n),

which we can do, since we multiply both terms by 1 (the first term by x^n/x^n, the second term by (x + Δx)^n/(x + Δx)^n). We can then add the fractions together:

Δf(x) = (x^n − (x + Δx)^n)/(x^n·(x + Δx)^n),

and consider expanding the numerator and denominator out. We see that to first order, we have

x^n − (x + Δx)^n = −n·x^(n−1)·Δx + O((Δx)^2),

which shouldn’t be surprising (we’ve proven this many times so far!). The denominator expands out to be

x^n·(x + Δx)^n = x^(2n) + O(Δx),

which, for nonzero x, cannot be made 0 by taking Δx small.
We combine these results to write

Δf(x) = (−n·x^(n−1)·Δx + O((Δx)^2)) / (x^(2n) + O(Δx)).

We observe that we can factor out a Δx in the numerator (the upstairs part of the fraction) and then we can divide both sides by it:

Δf(x)/Δx = (−n·x^(n−1) + O(Δx)) / (x^(2n) + O(Δx)).

So what happens if we set Δx = 0 on the right hand side? Do we run into problems? Well, we run into problems on the left hand side, but not on the right hand side.

So what to do? Well, the formal mathematical procedure is to take the limit Δx → 0, which then lets us write

lim_{Δx→0} Δf(x)/Δx = df(x)/dx

for the left hand side. For the right hand side, we can symbolically just set Δx = 0. This is sloppy, because it’s not quite true. But this is what’s done in practice. We get

df(x)/dx = −n·x^(n−1)/x^(2n).

Observe that we can combine these results to write

d(x^(−n))/dx = −n·x^(−n−1).

There was no risk of dividing by zero anywhere.
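A numerical spot check of this formula (a Python sketch; the sample values n = 3, x = 2 are mine):

```python
def derivative(f, x, dx=1e-7):
    """Approximate df/dx with a small but finite dx."""
    return (f(x + dx) - f(x)) / dx

n, x = 3, 2.0
numeric = derivative(lambda t: t**(-n), x)
formula = -n * x**(-n - 1)   # d(x^-n)/dx = -n x^(-n-1)
assert abs(numeric - formula) < 1e-5
```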
1.6. Product Rule. Suppose we have two arbitrary functions f and g. Let’s define a new function

h(x) = f(x)·g(x).    (56)

What would its derivative be? I don’t know, let us look. We see that we first pick some nonzero Δx and then consider

Δh(x) = h(x + Δx) − h(x).

Now we plug in this expression to equation (56), the equation where we defined h, and we find

Δh(x) = f(x + Δx)·g(x + Δx) − f(x)·g(x).

We do the following trick: add

0 = f(x)·g(x + Δx) − f(x)·g(x + Δx)

to both sides, and we obtain

Δh(x) = f(x + Δx)·g(x + Δx) − f(x)·g(x + Δx) + f(x)·g(x + Δx) − f(x)·g(x).
We can gather terms together:

Δh(x) = (f(x + Δx) − f(x))·g(x + Δx) + f(x)·(g(x + Δx) − g(x)),

which simplifies to

Δh(x) = Δf(x)·g(x + Δx) + f(x)·Δg(x).

As usual, we divide both sides by Δx:

Δh(x)/Δx = (Δf(x)/Δx)·g(x + Δx) + f(x)·(Δg(x)/Δx).

By taking the limit Δx → 0 we end up with

dh(x)/dx = (df(x)/dx)·g(x) + f(x)·(dg(x)/dx).

Notice that we implicitly noted

lim_{Δx→0} g(x + Δx) = g(x).

Of course, we assume that g is continuous at x, which turns out to be correct since differentiability implies continuity (we will prove this at some other time).
Theorem (Product Rule). Let f, g be differentiable and let h(x) = f(x)·g(x). Then

h′(x) = f′(x)·g(x) + f(x)·g′(x)

is the derivative.
We’ve already proven this. So let’s consider an example:

h(x) = x^n,

where n ∈ ℕ, and n ≥ 1. The claim is that

h′(x) = n·x^(n−1).

Is this surprising? No, but the surprising part is that it is a consequence of the product rule. How to prove this? Well, we need to do it by induction on n.
Base Case (n = 1): we see that

h(x) = x,

and we can see immediately that

Δh(x)/Δx = ((x + Δx) − x)/Δx = 1,

so h′(x) = 1 = 1·x^0. So this proves the base case.

Inductive Hypothesis: suppose this will work for arbitrary n, i.e., d(x^n)/dx = n·x^(n−1).

Inductive Case: for n + 1, we have

d(x^(n+1))/dx = d(x·x^n)/dx = (dx/dx)·x^n + x·d(x^n)/dx.

Observe we can consider the first term and apply the base case

(dx/dx)·x^n = 1·x^n,

which is then

(dx/dx)·x^n = x^n.

The second term is (recall the inductive hypothesis) simpler:

x·d(x^n)/dx = x·(n·x^(n−1)) = n·x^n.

We add both of these together to find

d(x^(n+1))/dx = x^n + n·x^n = (n + 1)·x^n.

But this is precisely what we wanted! And that concludes the inductive proof.
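Both results — the product rule and the power-rule formula just proved — can be spot-checked numerically (a Python sketch with my own sample functions f(t) = t² + 1 and g(t) = 3t):

```python
def derivative(f, x, dx=1e-7):
    """Finite-difference approximation of df/dx."""
    return (f(x + dx) - f(x)) / dx

f = lambda t: t**2 + 1.0
g = lambda t: 3.0 * t
h = lambda t: f(t) * g(t)

x = 1.5
lhs = derivative(h, x)                                   # h'(x) directly
rhs = derivative(f, x) * g(x) + f(x) * derivative(g, x)  # product rule
assert abs(lhs - rhs) < 1e-5

# And the power rule d(x^n)/dx = n x^(n-1), e.g. n = 4 at x = 1.5:
assert abs(derivative(lambda t: t**4, x) - 4 * x**3) < 1e-4
```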
1.7. Chain Rule. We can combine functions together through composition. This looks like

h(x) = g(f(x)).

The question is: what’s the derivative (rate of change) of h in terms of the derivatives of f and g?

Here we really take advantage of big-O notation. Observe for some nonzero Δx we have

Δh(x) = g(f(x + Δx)) − g(f(x)).    (83.1)

Let us write

u = f(x).    (83.2)

Its finite difference is then

Δu = f(x + Δx) − f(x), so that f(x + Δx) = u + Δu.

We plug this back into Eq (83.1) and obtain

Δh(x) = g(u + Δu) − g(u).
Divide both sides by Δx, in order to consider the derivative:

Δh(x)/Δx = (g(u + Δu) − g(u))/Δx.

Now what do we do? Well, we can do the following trick: multiply both sides by

1 = Δu/Δu.

This would give us

Δh(x)/Δx = ((g(u + Δu) − g(u))/Δu)·(Δu/Δx).

But what is Δu? We recall equation (83.2) and write

Δu = Δf(x).

Using this, we can simplify our equation:

Δh(x)/Δx = (Δg(u)/Δu)·(Δf(x)/Δx).

Observe that we may take the limit as Δx → 0 (which forces Δu → 0 as well), which gives us

dh(x)/dx = (dg(u)/du)·(df(x)/dx),

which intuitively looks like fractions cancelling out to give the right answer. Although this is the intuitive idea, DO NOT cancel terms!
Moreover, we should really clarify what is meant by

dg(u)/du.

Let us spell it out:

dg(u)/du = g′(u) = g′(f(x))

describes what we should do. Namely, first take the derivative of g, and then evaluate it at u = f(x).

Theorem (Chain Rule). Let f be differentiable at x, let g be differentiable at f(x), and let h(x) = g(f(x)). Then

h′(x) = g′(f(x))·f′(x)

describes the derivative of h at x.
Again, we also proved this, which concludes this post.
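As a closing spot check before the exercises, the chain rule also survives a numerical test (a Python sketch; the sample functions f(t) = t² and g(u) = u³, so that h(x) = x⁶, are my own):

```python
def derivative(f, x, dx=1e-7):
    """Finite-difference approximation of df/dx."""
    return (f(x + dx) - f(x)) / dx

f = lambda t: t**2        # inner function
g = lambda u: u**3        # outer function
h = lambda t: g(f(t))     # h(x) = g(f(x)) = x**6

x = 1.2
lhs = derivative(h, x)                        # h'(x) directly
rhs = derivative(g, f(x)) * derivative(f, x)  # g'(f(x)) * f'(x)
assert abs(lhs - rhs) < 1e-4
```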
Exercise 1. Prove or find a counter-example: if f′(x) = 0 for every x, then f is constant.
Exercise 2. Find the derivative of .
Exercise 3 (“Baby Quotient Theorem”). Prove for any differentiable function f with f(x) ≠ 0 that d(1/f(x))/dx = −f′(x)/f(x)^2.
Exercise 4. Prove the following is also a good definition for the derivative at x:

f′(x) = lim_{Δx→0} (f(x + Δx) − f(x − Δx))/(2·Δx).
[Hint: consider .]
Exercise 5. The derivative describes the rate of change of a curve. More precisely, we have a curve y = f(x); its derivative is another function denoted f′(x) [Newton’s notation] or df(x)/dx [Leibniz’s notation].

If we take the derivative again, we get the “second derivative” of the function. What is the limit definition for the derivative of f′ in terms of f?
Exercise 6. Dr Fig-Newton has a brilliant new definition of the derivative, defined by
Is this the same as our definition?