Step 1 label, they do this T, T^(*).
Step 2 compare, they do this by assuming they differ by *at least* one edge
Step 3 exchange, they show the trees must have the same cost.
The end of the argument is that we only need to exchange at most n-1 edges to finish the transformation.
Swapping edges between optimal trees also results in optimal trees, but we might need n-1 swaps to switch between any two optimal trees, we’ll also just get a bunch of other optimal trees along each swap.

So you have to be optimal every swap? Like for instance, you are allowed to "rule out" any swap that's not optimal?
Also, what would an "optimal swap" look like? Swapping equal length edges?
thanks for commenting by the way!

If you start with any two optimal trees, then all swaps are optimal.
We prove this also holds for the result of the greedy algorithm. So it is also optimal.
Try it, draw two optimal spanning trees, and then watch what happens when you add an edge to one. It makes a cycle so we kick out any cheapest edge.
Perhaps a good exercise is to prove that if all edge weights are distinct then the minimum spanning tree is unique.

Step 1 label, they do this T, T^(*). Step 2 compare, they do this by assuming they differ by *at least* one edge Step 3 exchange, they show the trees must have the same cost. The end of the argument is that we only need to exchange at most n-1 edges to finish the transformation. Swapping edges between optimal trees also results in optimal trees, but we might need n-1 swaps to switch between any two optimal trees, we’ll also just get a bunch of other optimal trees along each swap.

So you have to be optimal every swap? Like for instance, you are allowed to "rule out" any swap that's not optimal? Also, what would an "optimal swap" look like? Swapping equal length edges? thanks for commenting by the way!

If you start with any two optimal trees, then all swaps are optimal. We prove this also holds for the result of the greedy algorithm. So it is also optimal. Try it, draw two optimal spanning trees, and then watch what happens when you add an edge to one. It makes a cycle so we kick out any cheapest edge. Perhaps a good exercise is to prove that if all edge weights are distinct then the minimum spanning tree is unique.