Bug #137 showed that the 2D DCT, specifically the 4x4 case, is very common. Add two extra loops to cover it.
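
The note does not say what the two loops do. One plausible reading, assumed here, is a separable 2D DCT: one loop applies a 1D DCT to every row, and a second applies it to every column. The sketch below illustrates that reading for a 4x4 block. The names `dct_1d` and `dct_2d_4x4` and the orthonormal DCT-II scaling are illustrative assumptions, not taken from the project.

```python
import math

N = 4  # block size for the 4x4 case


def dct_1d(v):
    """Orthonormal DCT-II of a length-N sequence (scaling is an assumption)."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        acc = sum(
            v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for n in range(N)
        )
        out.append(scale * acc)
    return out


def dct_2d_4x4(block):
    """Separable 2D DCT of a 4x4 block given as a list of four rows."""
    # Loop 1: transform each row.
    rows = [dct_1d(row) for row in block]

    # Loop 2: transform each column of the row-transformed block.
    out = [[0.0] * N for _ in range(N)]
    for c in range(N):
        col = dct_1d([rows[r][c] for r in range(N)])
        for r in range(N):
            out[r][c] = col[r]
    return out
```

With this scaling, a constant block puts all of its energy in the top-left (DC) coefficient and leaves every other coefficient at zero up to rounding, which makes a quick sanity check for either loop.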