Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decoder::stateless::h264::vaapi::tests::test_25fps_block regression on Intel #44

Open
Gnurou opened this issue Jul 26, 2023 · 5 comments · May be fixed by #66
Open

decoder::stateless::h264::vaapi::tests::test_25fps_block regression on Intel #44

Gnurou opened this issue Jul 26, 2023 · 5 comments · May be fixed by #66
Assignees

Comments

@Gnurou
Copy link
Collaborator

Gnurou commented Jul 26, 2023

This is a regression introduced by 39e3d00:

thread 'decoder::stateless::h264::vaapi::tests::test_25fps_block' panicked at 'called `Result::unwrap()` on an `Err` value: decoder error: decoder error: while syncing picture

The sync() call on submit_picture when in blocking mode causes an internal decoder error on the VAAPI side. This doesn't happen in non-blocking mode, and also doesn't happen prior to commit 39e3d00. The AMD driver also seems to be unaffected.

Guess the first thing to do would be to set LIBVA_TRACE and check whether the sequence of calls to libva before and after this CL is different - it should not, but obviously something has changed.

@Gnurou Gnurou self-assigned this Jul 26, 2023
@bgrzesik
Copy link
Collaborator

Hi! I decided to take a look at this one.

First I followed the suggestion to diff the libva traces between 39e3d00 and 66bf1d2. I wasn't able to find any difference between regular traces except different VABuffer management. I enabled LIBVA_TRACE_BUFDATA as well which showed that the contents of VASliceDataBufferType are different for one slice during vaRenderBuffer (like a second slice overwrites the first). But I checked if the contents supplied to decode_slice are different but weren't.

So I modified the va_trace.c to dump buffer contents during vaCreateBuffer and it was the same for passing and failing run (extended traces: trace-pass.txt & trace-fail.txt)

This led me to believe that it is potentially an issue in buffer management. I decided to try to delete buffers more proactively, eg. after vaSyncPicture and surprisingly this made the test pass, hence I am not sure if the issue is in cros-codecs code...

diff --git a/src/picture.rs b/src/picture.rs
index ea9a8d9..6e882e8 100644
--- a/src/picture.rs
+++ b/src/picture.rs
@@ -220,10 +220,15 @@ impl<T> Picture<PictureEnd, T> {
         let res = self.surface().sync();
  
         match res {
-            Ok(()) => Ok(Picture {
-                inner: self.inner,
-                phantom: PhantomData,
-            }),
+            Ok(()) => {
+                let mut inner = self.inner;
+                inner.buffers.clear();
+
+                Ok(Picture {
+                    inner,
+                    phantom: PhantomData,
+                })
+            }
             Err(e) => Err((e, self)),
         }
     }

@dwlsalmeida
Copy link
Collaborator

dwlsalmeida commented Feb 27, 2024

Some time ago, @ndufresne found out that if slices aren't submitted as a slice array, only the last slice is registered. This is just how the Intel driver seems to work.

Back then, the capability of submitting slices as an array was added to cros-libva and then to AV1 decoder in cros-codecs, but the other decoders were left untouched.

@bgrzesik
Copy link
Collaborator

Thanks for the tip @dwlsalmeida!

In such a case I wonder why it would work before 39e3d00. I hope it's not that Intel driver would work correctly if VABufferIDs are subsequent otherwise not 😅

In either way I think this should be the next step. But because of 39e3d00 it might require holding slices or their corresponding VA structures (libva::BufferType) between StatelessH264DecoderBackend::decode_slice. Which could be tricky since Nalu and Slice lifetime is limited to StatelessDecoder::decode call.

Just FYI I tried also changing slice_data_flag to VA_SLICE_DATA_FLAG_BEGIN and VA_SLICE_DATA_FLAG_END accordingly, but it didn't seem to have any effect.

@dwlsalmeida
Copy link
Collaborator

@bgrzesik you can have owned Nalus if you rework them to use Cow instead. See the AV1 parser for inspiration.

The purpose of having cow was precisely to make it easy to have owned versions of these if needed, I just did not convert all the parsers to it.

@dwlsalmeida
Copy link
Collaborator

dwlsalmeida commented Feb 27, 2024

The h265 vaapi backend in particular can benefit from this since it needs to hold on to multiple slices before submitting. This is currently done in a hackish way.

This also has the added benefit of removing the lifetime parameter from Nalu, which will simplify a lot of the signatures.

bgrzesik added a commit to bgrzesik/cros-codecs that referenced this issue Mar 5, 2024
@bgrzesik bgrzesik linked a pull request Mar 5, 2024 that will close this issue
bgrzesik added a commit to bgrzesik/cros-codecs that referenced this issue Mar 8, 2024
bgrzesik added a commit to bgrzesik/cros-codecs that referenced this issue Mar 8, 2024
With this commit the chromeos#44 is fixed. It is achieved by submitting multiple
slices of one picture using a single slice data buffer and all slice
parameters using a single buffer with multiple elements.
bgrzesik added a commit to bgrzesik/cros-codecs that referenced this issue Mar 13, 2024
With this commit the chromeos#44 is fixed. It is achieved by submitting multiple
slices of one picture using a single slice data buffer and all slice
parameters using a single buffer with multiple elements.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants