It looks as if this might be a question of square vs distorted pixels.
The native sensor is likely 1280x960 (1.3 MP, 4:3, i.e. 16:12), and an image at that resolution shows everything the sensor captures.
At 1280x720 (720p, 16:9) they have only two choices: (1) show just the middle 720 of the 960 sensor rows, or (2) squash all 960 sensor rows vertically into 720, a 4:3 scale factor. Some would prefer the wider vertical angle of view of the latter approach; some would prefer the undistorted square pixels of the former. Neither is right or wrong in any absolute sense; it's a matter of tradeoffs. They chose approach #1, and if they changed it they would get negative feedback from those who prefer the current behavior.
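Not knowing the camera's actual firmware, here's a minimal NumPy/Pillow sketch of the two options; the array shape and crop math are mine, purely for illustration:

```python
import numpy as np
from PIL import Image

# Hypothetical 1.3 MP native frame (the camera's real pipeline is unknown).
sensor = np.zeros((960, 1280, 3), dtype=np.uint8)

# Approach #1 (what they chose): keep only the middle 720 of the 960 rows.
# Pixels stay square, but 240 rows (25% of the vertical field) are discarded.
top = (960 - 720) // 2
cropped = sensor[top:top + 720, :, :]  # 1280x720, square pixels

# Approach #2: squash all 960 rows into 720 (a 4:3 vertical scale).
# The full vertical field is kept, but every pixel is vertically distorted.
squashed = np.asarray(Image.fromarray(sensor).resize((1280, 720)))
```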
D1 at 704x576 is also 4:3, like the 1.3 MP native, so they can scale all of the 1280x960 sensor pixels into it without distortion; likewise CIF (352x288).
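By contrast, the 4:3 outputs can take the whole frame; a sketch under the same assumptions as above:

```python
import numpy as np
from PIL import Image

# Hypothetical full-resolution frame again, purely for illustration.
native = Image.fromarray(np.zeros((960, 1280, 3), dtype=np.uint8))

d1 = native.resize((704, 576))    # full frame scaled to D1, nothing cropped
cif = native.resize((352, 288))   # likewise for CIF
```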
Yeah, it might be cool if they let you choose how to handle 16:9 output... but apparently that's beyond the product envelope.
They seem more willing to distort the subchannel pixels.
Does this explanation fit your data?