Taylor series is a familiar tool in analysis and often provides effective
polynomial approximations to complicated differentiable functions. The need
arises because polynomials are easier objects to manipulate. We say that the
Taylor series of an infinitely differentiable function f(x) around x0 can
be represented as an infinite sum of polynomials as
f(x) = f(x0) + f′(x0)(x−x0)/1! + f′′(x0)(x−x0)²/2! + ….
In this notation, the number of ticks on f represents the order of the
derivative: f′ is the first derivative, f′′ the second, and so on. When used in
practice, we resort to a truncated sum where we ignore the higher-order terms.
Vector inputs can also be handled, but the derivation
of this result is most easily seen for scalar inputs.
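To make the truncation concrete, here is a minimal sketch in Python (sympy assumed; the choices of f(x) = sin(x) and x0 = 0 are mine, purely for illustration) that assembles the degree-5 truncated sum directly from the formula above.

```python
# A minimal sketch of a truncated Taylor sum, assuming sympy is available.
# f(x) = sin(x) and x0 = 0 are illustrative choices, not prescribed above.
import sympy as sp

x = sp.symbols("x")
f = sp.sin(x)
x0 = 0

# Degree-5 truncation: sum over k of f^(k)(x0) / k! * (x - x0)^k.
taylor5 = sum(
    f.diff(x, k).subs(x, x0) / sp.factorial(k) * (x - x0) ** k for k in range(6)
)

print(sp.expand(taylor5))          # x - x**3/6 + x**5/120
print(float(taylor5.subs(x, 1)))   # ~0.8417, while sin(1) ~ 0.8415
```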
In this post, we want to derive Taylor polynomials using linear algebra. I must
confess that this will not look elegant at all from a calculus standpoint.
You've probably seen the most elegant and simple derivation already. I hope,
however, you'll appreciate how beautiful the result is from the perspective of
linear algebra. I call this "the hard way" not because it is conceptually
hard but because it is a long-winded path to an otherwise straightforward concept.
It is also important to note that what we'll arrive at is not Taylor series per se
and is probably better characterized as a "polynomial" approximation.
Background
The key result that we'll use from linear algebra is Orthogonal Projections.
Notation and Terminology
Finite Dimension (n): By definition, a vector space with a finite number of
basis vectors, where a basis affords a unique way to represent every vector.
Vector Space, U or V: Finite-dimensional vector spaces over some field F[a]
that follow the usual rules of closure under addition, scalar multiplication,
additive identity, additive inverse, multiplicative identity and so on.
Vector Subspace: Subspaces are to spaces what subsets are to sets.
Linear Functional ϕ: This is a special name for linear maps that go
from V to the field F, i.e. to a scalar. Since every linear map has an
associated matrix, it is equivalent to simply think of this as multiplying a
1×N matrix with an N×1 vector.
Inner Product ⟨v,u⟩: This is an abstract generalization of
the dot product, and the usual rules apply. I will note an
additional property of conjugate symmetry, where ⟨v,u⟩ is the complex conjugate of ⟨u,v⟩,
which isn't needed for real fields (because the conjugate of a real scalar is the scalar itself).
Norm ∥⋅∥: The norm gives us the notion of distance in vector
spaces. It is defined as ∥v∥ = √⟨v,v⟩,
the square root of the inner product of a vector with itself. Our familiar Euclidean
distance is also known as the L2 norm.
Orthogonal Projection, PUv: The orthogonal projection of a vector v
onto the subspace U = span(x) is defined as PUv = (⟨v,x⟩/∥x∥²) x.
This should look familiar, and the intuition of the "component" of a vector along a line
usually works (although the subspace U need not only be a line). Equivalently,
we can write this in an orthonormal basis e1,…,em of U as
PUv = ⟨v,e1⟩e1 + ⋯ + ⟨v,em⟩em,
which invokes the idea of projecting the vector onto each basis vector, where
the coefficients defined by the inner products are unique. Since these basis
vectors are normalized, their norm is unity - ∥ek∥ = 1 for all k ∈ {1,…,m}.
A small numerical sketch of these definitions follows this list.
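As promised, here is a small numerical sketch of these definitions in R³ (Python with numpy assumed; the vectors and the choice of subspace are made up for illustration).

```python
# A small numerical sketch of inner products, norms and orthogonal projection
# in R^3. The vectors and the subspace (the xy-plane) are arbitrary choices.
import numpy as np

v = np.array([1.0, 2.0, 3.0])

# Inner product <v, v> and norm ||v|| = sqrt(<v, v>).
print(np.dot(v, v))           # 14.0
print(np.linalg.norm(v))      # sqrt(14) ~ 3.7417

# Orthonormal basis e1, e2 of the subspace U (here, the xy-plane).
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])

# Orthogonal projection P_U v = <v, e1> e1 + <v, e2> e2.
P_U_v = np.dot(v, e1) * e1 + np.dot(v, e2) * e2
print(P_U_v)                  # [1. 2. 0.]
```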
Orthogonal projections
Suppose U is a finite-dimensional subspace of V, v∈V, and u∈U. Then
∥v−PUv∥≤∥v−u∥.
Furthermore, the inequality is an equality if and only if u=PUv.
To prove this result, we note the following chain of relations between squared norms.
∥v−PUv∥² ≤ ∥v−PUv∥² + ∥PUv−u∥² = ∥v−u∥²
The first inequality follows simply from the nonnegativity of
∥PUv−u∥², and the equality follows by the Pythagoras theorem
(yes, it works in abstract spaces too!). The Pythagoras theorem is applicable
because the two vectors are orthogonal - v−PUv intuitively amounts to
removing all components of v in the subspace U[d] and PUv−u belongs
to U (by definition of projection and additive closure of subspaces). As a
consequence of this derivation (taking square roots preserves the inequality), we also
see that equality holds only when the norm of the second term is zero, implying u=PUv.
This result says that the shortest distance between a vector and a subspace
is achieved at the vector's orthogonal projection onto the subspace. This is
akin to a familiar two-dimensional fact: the shortest distance between a point
p and a line x is along another line l that is perpendicular to x and passes
through p.
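A quick numerical check of this minimization property, again with made-up vectors in R³ and numpy assumed: no candidate u in the subspace gets closer to v than the orthogonal projection.

```python
# Numerically check that ||v - P_U v|| <= ||v - u|| for many random u in U.
# The vectors and subspace are arbitrary; U is again the xy-plane in R^3.
import numpy as np

rng = np.random.default_rng(0)

v = np.array([1.0, -2.0, 0.5])
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])

P_U_v = np.dot(v, e1) * e1 + np.dot(v, e2) * e2
best = np.linalg.norm(v - P_U_v)

for _ in range(1000):
    a, b = rng.normal(size=2)
    u = a * e1 + b * e2            # an arbitrary vector in U
    assert np.linalg.norm(v - u) >= best - 1e-12

print("distance to the projection:", best)  # 0.5; nothing in U beats it
```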
Polynomial approximations as Orthogonal projections
The key message of this post is this - Taylor series can be viewed as an orthogonal
projection from the space of continuous functions onto the subspace of polynomials [c].
We'll do this by example. Let C[−π,π] be the space of continuous
functions on the interval [−π,π] and P5 be the space of
polynomials of degree at most 5. We would like to find the best polynomial
approximation to the function f(x)=sin(x), which does belong to
C[−π,π]. The precise notion of "best" requires us to
reformulate this problem in linear algebra speak.
By defining an inner product between two functions f and g in the space of
continuous functions as ⟨f,g⟩ = ∫_{−π}^{π} f(x) g(x) dx,
we can afford the notion of a norm. We desire the best approximation in the
sense of minimizing this norm. For a v∈C[−π,π], we seek
u∈P5 such that norm ∥v−u∥ is minimized. In our
case v is the sine function and from our previous result, we know that the
minimum is achieved by the projection of v onto the subspace P5.
Hence, the polynomial we are after is
u=PP5v
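Before constructing this projection explicitly, here is a sketch of the function-space inner product and norm in action, computed numerically (Python with scipy assumed; the test functions are just examples).

```python
# The inner product <f, g> = integral of f(x) g(x) over [-pi, pi], evaluated
# numerically. The functions used below are illustrative examples.
import numpy as np
from scipy.integrate import quad

def inner(f, g):
    value, _ = quad(lambda t: f(t) * g(t), -np.pi, np.pi)
    return value

def norm(f):
    return np.sqrt(inner(f, f))

print(inner(np.sin, np.cos))         # ~0: sin and cos are orthogonal here
print(norm(np.sin))                  # sqrt(pi) ~ 1.7725
print(inner(np.sin, lambda t: t))    # 2*pi ~ 6.2832
```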
As we've noted before, the projection can be written in an orthonormal basis
e1,…,e6 of P5 as
PP5x = ⟨x,e1⟩e1 + ⋯ + ⟨x,e6⟩e6.
Note that I've preemptively fixed the number 6 in the sequence above because
the dimension (number of basis vectors) of P5 is 6. As one can verify,
1, x, x², x³, x⁴, x⁵ form a basis of P5. These, however, are not
orthonormal. Nevertheless, we can form an orthonormal basis from a known one using
the Gram–Schmidt process.
Be warned, there are ugly numbers ahead.
Orthonormal basis for polynomials
By the Gram-Schmidt orthonormalization, we note that for a given basis
b1,…,bm, an orthonormal basis e1,…,em is obtained by taking, for each k,
ek = (bk − Pe1bk − ⋯ − Pek−1bk) / ∥bk − Pe1bk − ⋯ − Pek−1bk∥.
We've slightly overloaded the projection notation Pei here for brevity.
This actually is supposed to mean PUi where Ui is the span of basis
vector ei. Intuitively, this method is simply removing components of the
basis vector that already have been covered by previous basis vectors and then
simply normalizing each of them.
We need to solve more than a few integrals for the inner product
calculations as a part of the projections, but they are straightforward.
You'll often find yourself computing symmetric integrals of odd polynomials,
which just amount to zero. The complete orthonormal basis can be worked out
this way, or reproduced with the symbolic sketch below.
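Rather than transcribing those ugly numbers by hand, here is a symbolic sketch (Python with sympy assumed) that runs Gram-Schmidt on 1, x, …, x⁵ under the inner product ⟨f,g⟩ = ∫ f g dx over [−π,π] and prints the resulting orthonormal basis.

```python
# Gram-Schmidt on the basis 1, x, ..., x^5 of P5 under
# <f, g> = integral of f*g over [-pi, pi], done symbolically.
import sympy as sp

x = sp.symbols("x")

def inner(f, g):
    return sp.integrate(f * g, (x, -sp.pi, sp.pi))

ortho = []
for b in [x**k for k in range(6)]:
    # Remove the components already covered by previous basis vectors ...
    e = b - sum(inner(b, ek) * ek for ek in ortho)
    # ... and normalize.
    ortho.append(sp.simplify(e / sp.sqrt(inner(e, e))))

for k, ek in enumerate(ortho, start=1):
    print(f"e{k} =", ek)
```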
With this derived orthonormal basis for P5, we are now in a position
to find the optimal (in the sense of norm) projection of sin(x) using the
projection identity
PP5f = ⟨f,e1⟩e1 + ⋯ + ⟨f,e6⟩e6.
For sin(x), we note that its inner product with e1, e3, e5 is zero because
these turn out to be symmetric integrals of odd functions around 0. The
remaining inner products for the sine function f = sin(x) can be computed
directly, as in the sketch below.
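Here is a sketch that evaluates those inner products and assembles the degree-5 approximation, continuing the symbolic setup above (sympy assumed); the printed coefficients are approximate.

```python
# Project sin(x) onto P5: compute <sin, e_k> for the orthonormal basis from
# the Gram-Schmidt sketch and form the sum of <sin, e_k> e_k.
import sympy as sp

x = sp.symbols("x")

def inner(f, g):
    return sp.integrate(f * g, (x, -sp.pi, sp.pi))

ortho = []
for b in [x**k for k in range(6)]:
    e = b - sum(inner(b, ek) * ek for ek in ortho)
    ortho.append(sp.simplify(e / sp.sqrt(inner(e, e))))

approx = sp.expand(sum(inner(sp.sin(x), ek) * ek for ek in ortho))

# The even-degree coefficients vanish by symmetry; the odd ones come out to
# roughly 0.9879, -0.1553 and 0.0056 (approximate values).
for k in (1, 3, 5):
    print(f"coefficient of x^{k} ~", sp.N(approx.coeff(x, k), 6))
```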
Our approximation is indeed very accurate as compared to the Taylor polynomial,
which demands higher-order terms to do better. When both are plotted against
sin(x), the projection and the true curve stay very close and are virtually
indistinguishable.
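To put a rough number on "very accurate", here is a small numerical comparison (numpy assumed; the projection coefficients are the approximate values printed by the sketch above) of the maximum error on [−π,π] against the degree-5 Taylor polynomial.

```python
# Compare maximum absolute error on [-pi, pi]: projection vs. degree-5 Taylor.
# The projection coefficients below are the approximate values from the
# previous sketch.
import numpy as np

xs = np.linspace(-np.pi, np.pi, 10001)

projection = 0.987862 * xs - 0.155271 * xs**3 + 0.00564312 * xs**5
taylor = xs - xs**3 / 6 + xs**5 / 120

print("max |sin - projection| ~", np.max(np.abs(np.sin(xs) - projection)))  # ~0.02
print("max |sin - Taylor|     ~", np.max(np.abs(np.sin(xs) - taylor)))      # ~0.5
```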
Summary
This was a fun way to discover polynomial approximations to functions, and
surprisingly accurate ones at that. Of course, I promise to never use this in real life.
[a] It is enough to think of fields as just real or complex numbers for now. You could also think of apples if you don't like abstract concepts (although you are probably going to have trouble thinking of irrational apples).↩
[d] Formally, we say v−PUv belongs to the orthogonal complement U⊥ of U. U⊥ is the set of all vectors that are orthogonal to all vectors in U.↩
[c] Applying the definitions may help us see why the set of all continuous functions is a vector space. To start with, the sum of two continuous functions is continuous, and multiplication by a scalar also keeps the function continuous. Further, polynomials are just a subset of continuous functions and also satisfy these two closure properties. Trust me that all other necessary properties are also satisfied.↩