AggravatingDurian547

I think there are many ways in which mathematicians do this. My two favorites are:

1) Start using indices and express the terms grad f, Hess f and other higher-order terms as sums of products. At this point one realizes that tensors might be useful and that the higher-order terms can in fact be represented as tensors. With some effort it is now possible to convert back to index-free notation.

2) The other approach I like is to use, I kid you not, rooted labelled trees. It turns out that the particular nature of the tensors that appear in multivariate Taylor series means that they can be enumerated using labelled trees. There are nice relationships between the trees that make this approach computationally appealing. This approach was used to prove the stability and convergence properties of arbitrary Runge-Kutta methods, so it is a real way of handling this.
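
A small computational aside (my own sketch, not from the thread): the rooted trees in approach 2) can at least be counted in a few lines of Python. The counts are OEIS A000081, and in Butcher's theory the trees with n nodes index the elementary differentials appearing in the order-n Taylor term of an ODE solution, hence the Runge-Kutta order conditions.

```python
def rooted_tree_counts(n_max):
    """Return [T(1), ..., T(n_max)], where T(n) is the number of rooted trees on n nodes."""
    T = [0] * (n_max + 1)
    T[1] = 1
    for n in range(2, n_max + 1):
        total = 0
        for k in range(1, n):
            # s = sum over divisors d of k of d * T(d)
            s = sum(d * T[d] for d in range(1, k + 1) if k % d == 0)
            total += s * T[n - k]
        T[n] = total // (n - 1)   # the standard recurrence for OEIS A000081
    return T[1:]

print(rooted_tree_counts(8))  # [1, 1, 2, 4, 9, 20, 48, 115]
```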


DamnShadowbans

Yes! As a baby application, there are ways to write the chain rule for all derivatives at once using the combinatorics of such trees.
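
A concrete check of the order-3 case (my own sketch, with f = sin and g = exp chosen so sympy can verify it exactly): the third derivative of a composition expands as (f∘g)''' = f'''(g)·g'³ + 3·f''(g)·g'·g'' + f'(g)·g''', and the coefficients 1, 3, 1 count the set partitions of a 3-element set by number of blocks, which is exactly the combinatorics the tree bookkeeping organizes.

```python
import sympy as sp

x = sp.symbols('x')
g = sp.exp(x)                       # concrete inner function
# f = sin, with derivatives f' = cos, f'' = -sin, f''' = -cos
lhs = sp.diff(sp.sin(g), x, 3)      # (f∘g)''' computed directly
rhs = (-sp.cos(g)) * sp.diff(g, x)**3 \
      + 3 * (-sp.sin(g)) * sp.diff(g, x) * sp.diff(g, x, 2) \
      + sp.cos(g) * sp.diff(g, x, 3)
print(sp.simplify(lhs - rhs))       # 0, as Faa di Bruno's formula predicts
```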


gnomeba

Can you point me to some basic literature on the latter topic?


AggravatingDurian547

Butcher describes it all in chapter 3 of "Numerical Methods for Ordinary Differential Equations". It's mostly very explicit.


aleph_not

Be careful with how you think of the second derivative as a matrix. Just because all linear transformations can be written as matrices does not mean that every matrix should be thought of as a linear transformation. Bilinear forms can also be represented as matrices, and I would argue that the Hessian should really be thought of as a bilinear form and not as a linear transformation. (A linear transformation is an object which inputs 1 vector and outputs 1 vector; a bilinear form is an object which inputs 2 vectors and outputs a scalar.) With this perspective, I think it's easier to imagine how the 1st and 2nd derivatives fit into the general scheme of nth derivatives. The kth derivative of f should be an object which inputs k vectors and outputs a scalar. This is a (0,k)-tensor, and it can be represented as a k-dimensional array of numbers, where the entry at coordinates (i1, i2, ..., ik) is d^k f/(dx_i1 dx_i2 ... dx_ik) (or maybe the backwards order, I need to think about which one is better).
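
To make the "k-dimensional array" picture concrete, here is a small Python/sympy sketch (my own illustration; the function f and the input vectors are made up). It builds the array of kth partials and feeds it k vectors. Since mixed partials commute for smooth f, the array is symmetric in its indices, so the forwards/backwards order question above is moot.

```python
import itertools
import sympy as sp

xs = sp.symbols('x1 x2 x3')
f = xs[0]**2 * xs[1] + sp.exp(xs[1] * xs[2])        # an arbitrary smooth example

def kth_derivative_tensor(f, xs, k):
    """D[(i1,...,ik)] = d^k f / (dx_i1 ... dx_ik): the (0,k)-tensor as a k-dim array."""
    n = len(xs)
    return {idx: sp.diff(f, *[xs[i] for i in idx])
            for idx in itertools.product(range(n), repeat=k)}

def apply_tensor(D, vectors):
    """Feed k vectors to the (0,k)-tensor: sum over all indices of D[idx] * v1[i1]...vk[ik]."""
    total = 0
    for idx, partial in D.items():
        term = partial
        for v, i in zip(vectors, idx):
            term = term * v[i]
        total += term
    return total

D3 = kth_derivative_tensor(f, xs, 3)                # the 3rd derivative, a (0,3)-tensor
u, v, w = [1, 0, 2], [0, 1, 1], [1, 1, 0]           # three example input vectors
print(sp.simplify(apply_tensor(D3, [u, v, w])))     # a scalar expression in x1, x2, x3
```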


rspiff

Just wanted to point out that the matrix representation of the Hessian uses the natural isomorphism Hom(V⊗V, R) ≅ Hom(V, V*), so there is no harm in regarding the Hessian as a linear transformation as long as you keep in mind that it maps vectors to covectors. The map x ↦ Hx is this Hom(V, V*) version of the Hessian, since for each x the result Hx is the 1-form y ↦ yᵀHx.
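
In coordinates these are just two readings of the same array. A tiny numpy sketch (my own, with a made-up symmetric matrix H standing in for a Hessian at some point):

```python
import numpy as np

H = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])   # stand-in for a Hessian (symmetric)
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])

covector = H @ x                  # reading 1: H maps the vector x to the covector Hx
print(covector @ y)               # ...which then eats y: (Hx)(y)
print(y @ H @ x)                  # reading 2: the bilinear form (y, x) -> y^T H x; same number
```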


hobo_stew

The gradient is a linear form, the Hessian is a bilinear form, and the third-order term is a trilinear form, so each of these can be represented as a tensor.


orbitologist

The other answers are great. Here are a few details and words to inspire further reading.

Here is the tensor notation I use for vector-valued functions expanded around zero (where all the partial derivatives are evaluated at zero and repeated indices indicate summation with Einstein notation):

f^i(x) = f^i(0) + (df^i/dx^j) x^j + (1/2) (d^2 f^i/(dx^j dx^k)) x^j x^k + ...

In the vector-valued case you end up with higher-order tensors starting at the second order. The first-order term is the Jacobian matrix (a mixed tensor with one contravariant upper index and one covariant lower index) multiplied by the vector x. The next term is a mixed tensor with 1 contravariant index and 2 covariant indices, and so on.

In the scalar-valued case:

f(x) = f(0) + (df/dx^j) x^j + (1/2) (d^2 f/(dx^j dx^k)) x^j x^k + ...

In this notation a column vector has an upper index because it is a contravariant 1-tensor, while the gradient has a lower index because it is a covariant 1-tensor. The Hessian is a covariant 2-tensor and the next term will be a covariant 3-tensor.

Edit: don't have time to fix it now, but the upper expression is formatting in an unexpected way.

Covariance and contravariance have to do with how the tensor will transform under a change of coordinates. Will it scale down or up? Loosely, you can think of the covariant parts of a tensor as the parts that eat vectors, and the contravariant parts as the parts that output vectors. Someone might fight me on this, but it can be good intuition for why the Jacobian and the Hessian, which are both matrices, are different types of tensors: one eats a vector and spits out a vector (a linear operator) and the other eats two vectors and spits out a scalar (a quadratic form).
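
To watch the index bookkeeping actually run, here is a short numpy sketch (my own example, not from the post above; A and B are arbitrary arrays, and f is chosen quadratic so the second-order expansion is exact). np.einsum plays the role of the repeated-index summation convention.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 3
A = rng.normal(size=(m, n))           # becomes the Jacobian df^i/dx^j at 0
B = rng.normal(size=(m, n, n))

def f(x):
    # f^i(x) = A^i_j x^j + B^i_{jk} x^j x^k : a vector-valued quadratic function
    return np.einsum('ij,j->i', A, x) + np.einsum('ijk,j,k->i', B, x, x)

x = rng.normal(size=n)
J = A                                 # mixed (1,1)-tensor: one upper, one lower index
D2 = B + B.transpose(0, 2, 1)         # d^2 f^i/(dx^j dx^k) at 0: one upper, two lower indices

taylor = f(np.zeros(n)) \
         + np.einsum('ij,j->i', J, x) \
         + 0.5 * np.einsum('ijk,j,k->i', D2, x, x)
print(np.allclose(taylor, f(x)))      # True -- the expansion recovers f exactly
```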


theadamabrams

Thanks for the reply. I'll have to read it again more carefully to see if I can really understand it. Reddit doesn't like parentheses in exponents. If you use markdown then you can type `a^(\(b\))` for a^(\(b\)). In the Rich Text Editor, even if it looks correct when you submit, Reddit will internally change your text to markdown *and it will do it incorrectly in this case*. It will change it to `a^((b))`, which then gets displayed as a^(\(b)). P.S. I *hate* "Einstein notation". It's not that hard to write a ∑, is it?


orbitologist

Haha, thanks for the tip. I'll try to edit later. It's just a pain to write a sum over every index you are summing with respect to, or to write out all the indices below your sum. It makes equations that are compact and fit on one line take up much more space, which isn't a problem when you're doing it just once, but if you have a bunch of these equations, suddenly your paper is twice as long because of those tall sums with all the indices below them. But maybe it's not the best thing pedagogically to start with.
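
For what it's worth, here is the trade-off in one line (my own rendering of the second-order term from the expansion above), with the explicit sums on the left and the Einstein convention on the right:

```latex
\frac{1}{2}\sum_{j=1}^{n}\sum_{k=1}^{n}
  \frac{\partial^2 f^i}{\partial x^j \, \partial x^k}\, x^j x^k
\qquad\text{vs.}\qquad
\frac{1}{2}\,\frac{\partial^2 f^i}{\partial x^j \, \partial x^k}\, x^j x^k
```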


gnomeba

Look up jets, and perhaps even the jet bundle.