Perceptual Quality Maximization for Video Calls with Packet Losses by Optimizing FEC, Frame Rate and Quantization

We consider video calls over networks with bursty packet losses and varying bitrate, where tight delay constraints require loss resiliency to be provided through frame-level forward error correction (FEC). For this problem, both the encoding frame rate (eFR) and the decoded frame rate (dFR) are crucial: a high eFR at low bitrates leads to a larger quantization stepsize (QS) and smaller frames, and thus suboptimal FEC, while a low eFR at high bitrates diminishes the perceptual quality. At the receiver, damaged frames and others predicted from them are typically discarded, reducing dFR. To mitigate frame losses, hierarchical-P coding (hPP) may be used, at the cost of lower coding efficiency than IPP..I coding (IPP), which in turn suffers irregular frame freezes in case of loss.
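
The link between frame size and FEC efficiency can be illustrated with a minimal sketch. It assumes iid packet losses and an ideal (MDS) erasure code such as Reed-Solomon, where any k of the k + m transmitted packets suffice to recover the frame; neither assumption is stated above, and the function name and numbers are illustrative only.

```python
from math import comb

def frame_recovery_prob(k: int, m: int, p: float) -> float:
    """Probability that a frame of k source packets plus m FEC packets is
    recoverable under iid packet loss rate p, assuming an ideal MDS code
    (the frame is recovered when at most m of the k + m packets are lost)."""
    n = k + m
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m + 1))

# Same 50% redundancy ratio, two different frame sizes (illustrative values).
small = frame_recovery_prob(k=2, m=1, p=0.1)    # about 0.972
large = frame_recovery_prob(k=20, m=10, p=0.1)  # larger than `small`
```

Under these assumptions, the larger frame is recovered with higher probability at the same redundancy ratio, which is one way to read the remark that small frames make FEC suboptimal.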

In this paper, we study how to maximize the received video call quality for both hPP and IPP by jointly optimizing eFR, QS and the FEC redundancy rates under a sending bitrate constraint. We use a perceptual quality model, Q-STAR, which depends on average dFR and QS, along with a rate model, R-STAR, which depends on eFR and QS. We cast the problem as a combinatorial optimization problem and, after replacing QS with the video bitrate, employ exhaustive search and hill climbing methods to solve for the eFR and the video bitrate. We also use a greedy FEC packet distribution algorithm to determine the FEC redundancy rate for each frame.
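
A greedy FEC packet distribution can be sketched as follows. This is not necessarily the algorithm or objective used in the paper: it is a generic marginal-gain greedy that assumes iid losses, an ideal MDS code, and hypothetical per-frame weights standing in for how many frames depend on each frame (for example, in a hierarchical-P group). All names are illustrative.

```python
from math import comb

def recovery_prob(k, m, p):
    """P(frame recoverable) with k source and m FEC packets, iid loss rate p,
    assuming an ideal MDS code."""
    n = k + m
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m + 1))

def greedy_fec(frame_pkts, weights, budget, p):
    """Hand out `budget` FEC packets one at a time, each time to the frame
    whose weighted recovery probability increases the most."""
    fec = [0] * len(frame_pkts)

    def gain(i):
        k, m = frame_pkts[i], fec[i]
        return weights[i] * (recovery_prob(k, m + 1, p) - recovery_prob(k, m, p))

    for _ in range(budget):
        best = max(range(len(frame_pkts)), key=gain)
        fec[best] += 1
    return fec

# Four frames of 4 packets each; the weights are hypothetical importance values.
allocation = greedy_fec(frame_pkts=[4, 4, 4, 4], weights=[4, 1, 2, 1],
                        budget=6, p=0.05)
```

In this sketch, frames with larger weights never receive fewer FEC packets than equally sized frames with smaller weights, which gives a simple form of unequal error protection.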

Evaluating the results for both hPP and IPP, we show that, for iid losses,

the total FEC bitrate ratio is an affine function of the packet loss rate,
the bitrate range over which a low eFR is selected widens at higher packet loss rates,
unequal error protection is less significant at higher bitrates,
IPP, while achieving higher Q-STAR values, is prone to abrupt freezing events, which are not considered by the Q-STAR model.

For bursty losses, we show that

compared to iid losses, FEC redundancies are much higher and keep rising with the mean burst length, reaching up to 80% when bursts are 50 packets long on average,
hPP, despite its coding overhead, achieves higher Q-STAR values than IPP at higher bitrates,
decoded frame distances are significantly smaller in mean and variance for hPP.
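
For experimentation, bursty losses with a chosen loss rate and mean burst length can be generated with a two-state Markov (simplified Gilbert) model. The loss model is not specified above, so this is an assumption, as are the function names; in this sketch every packet sent in the bad state is lost and none is lost in the good state.

```python
import random

def gilbert_params(loss_rate, mean_burst_len):
    """Return (p, q): p = P(good -> bad), q = P(bad -> good).
    The mean burst length is 1 / q and the stationary loss rate is p / (p + q)."""
    q = 1.0 / mean_burst_len
    p = loss_rate * q / (1.0 - loss_rate)
    return p, q

def simulate_losses(n, loss_rate, mean_burst_len, seed=0):
    """Generate n loss indicators (True = packet lost) from the two-state chain."""
    p, q = gilbert_params(loss_rate, mean_burst_len)
    rng = random.Random(seed)
    lost, bad = [], False
    for _ in range(n):
        # Stay in the bad state with probability 1 - q, enter it with probability p.
        bad = (rng.random() >= q) if bad else (rng.random() < p)
        lost.append(bad)
    return lost

trace = simulate_losses(100_000, loss_rate=0.1, mean_burst_len=5)
empirical_rate = sum(trace) / len(trace)
```

Longer mean bursts at the same loss rate concentrate losses in fewer, longer runs, so a frame hit by a burst can lose most of its packets at once; this is consistent with the higher FEC redundancies reported for bursty losses.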
