OpenCVSharp Frame Buffer Pooling

I built a feature that reads every frame of a video using OpenCvSharp.

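using OpenCvSharp;

private static void ReadEachFrame_Normal()
{
    string path = "path/to/your/video.mp4";
    // The constructor already opens the file, so a separate Open() call is not needed.
    using VideoCapture capture = new VideoCapture(path);
    using Mat frame = new Mat();
    while (capture.Read(frame))
    {
        // Allocates a new byte[] for every frame.
        byte[] bytes = frame.ToBytes();
    }
}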

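As far as I know, OpenCvSharp offers no easy way to read a frame into a caller-supplied buffer, or to get it back in a pooled form.
The most common approach appeared to be calling Mat.ToBytes(), which returns a newly allocated byte[] each time.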

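In my use case, frames are pulled from several videos at the same time,
so a large number of byte[] arrays were being allocated and discarded.
It was clear that this would raise GC pressure and trigger frequent collections.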

So I set out to apply pooling, and after a good deal of googling
I found a hint in https://github.com/shimat/opencvsharp/issues/784.

Building on that approach, with some encapsulation added, I wrote the code below.

using System;
using System.Buffers;

public struct PoolingBytes : IDisposable
{
    public PoolingBytes(IMemoryOwner<byte> memory, int offset, int length)
    {
        Memory = memory;
        Offset = offset; // was never assigned in the first version
        Length = length;
    }

    public IMemoryOwner<byte>? Memory { get; private set; }
    public int Offset { get; init; }
    public int Length { get; init; }

    // Returns the rented memory to the pool.
    public void Dispose()
    {
        if (Memory != null)
        {
            Memory.Dispose();
            Memory = null;
        }
    }

    // Exposes only the valid region, not the whole rented buffer.
    public ReadOnlySpan<byte> AsSpan()
    {
        return Memory!.Memory.Span.Slice(Offset, Length);
    }
}

...

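public static unsafe PoolingBytes GetBytesPooled(Mat mat, VectorOfByte bufferVec)
{
    using InputArray inputArray = mat;
    // Encode to PNG (the default format of Mat.ToBytes()) into the reusable native vector.
    NativeMethods.imgcodecs_imencode_vector(".png", inputArray.CvPtr, bufferVec.CvPtr, null, 0, out int ret);
    if (ret == 0)
        throw new InvalidOperationException("imencode failed.");

    int size = bufferVec.Size;
    var rentArray = MemoryPool<byte>.Shared.Rent(size);
    using var pin = rentArray.Memory.Pin();
    // Arguments: source, destination, destination capacity in bytes, bytes to copy.
    Buffer.MemoryCopy((void*)bufferVec.ElemPtr, pin.Pointer, rentArray.Memory.Length, size);
    return new PoolingBytes(rentArray, 0, size);
}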

MemoryPool<byte>.Shared.Rent() returns memory whose size is greater than or equal to the size requested.
The Offset and Length properties are therefore used to mark exactly which region of the buffer holds the image.
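
The point about Rent() can be checked without OpenCvSharp. The following is only a minimal sketch: RentSliceSketch, CopyToPooled, and the 1000-byte stand-in buffer are hypothetical names and values chosen for illustration, not part of the code above. It rents a buffer, copies the source into it, and slices back to the valid length.

```csharp
using System;
using System.Buffers;

public static class RentSliceSketch
{
    // Hypothetical helper: copies 'source' into a rented buffer and returns
    // the owner together with the number of valid bytes.
    public static (IMemoryOwner<byte> Owner, int Length) CopyToPooled(ReadOnlySpan<byte> source)
    {
        IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(source.Length);
        // owner.Memory.Length may exceed source.Length, so only the first
        // source.Length bytes are meaningful after the copy.
        source.CopyTo(owner.Memory.Span);
        return (owner, source.Length);
    }

    public static void Main()
    {
        byte[] fakeImage = new byte[1000]; // stand-in for encoded image bytes
        fakeImage[999] = 42;

        var (owner, length) = CopyToPooled(fakeImage);
        using (owner) // returns the buffer to the pool when the block ends
        {
            ReadOnlySpan<byte> exact = owner.Memory.Span.Slice(0, length);
            Console.WriteLine($"rented={owner.Memory.Length}, valid={exact.Length}");
        }
    }
}
```

If the rented length comes out larger than the valid length, passing owner.Memory around unsliced would expose trailing bytes that belong to no image, which is why the struct carries Length.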

Test Code:

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;
using OpenCvSharp;

[MemoryDiagnoser]
[SimpleJob(RunStrategy.ColdStart, launchCount: 1, iterationCount: 1000)]
public class OpenCVPooling : IDisposable
{
    private readonly string videoPath = "path/to/your/video.mp4";

    private VideoCapture? capture;
    private Mat frame = new Mat();
    // Reused across iterations so the native vector is not recreated each time.
    private VectorOfByte bufferVec = new VectorOfByte();

    public void Dispose()
    {
        capture?.Dispose();
        frame?.Dispose();
        bufferVec?.Dispose();
    }

    // Reopens the video and reads one frame before each iteration.
    [IterationSetup]
    public void Setup()
    {
        capture?.Dispose();
        capture = new VideoCapture(videoPath);
        capture.Read(frame);
    }

    [Benchmark]
    public void Normal()
    {
        byte[] bytes = frame.ToBytes();
    }

    [Benchmark]
    public void Pooled()
    {
        using PoolingBytes bytes = OpenCVUtility.GetBytesPooled(frame, bufferVec);
    }
}

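These are the results measured with BenchmarkDotNet.
The elapsed time is nearly the same, but the point to note is that allocation is almost eliminated: from 1,072,120 B down to 840 B per call.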

Method	Mean	Error	StdDev	Median	Allocated
Normal	18.66 ms	0.053 ms	0.511 ms	18.57 ms	1072120 B
Pooled	17.88 ms	0.044 ms	0.423 ms	17.76 ms	840 B
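
The allocation gap can also be observed roughly without OpenCvSharp or BenchmarkDotNet. The sketch below is an illustration under assumptions, not a replacement for the benchmark: AllocationSketch, the 1,000,000-byte frame size (picked to be close to the Normal row), and the iteration count are all hypothetical. It compares the bytes allocated on the current thread when creating a fresh byte[] per iteration against renting from MemoryPool<byte>.Shared.

```csharp
using System;
using System.Buffers;

public static class AllocationSketch
{
    private const int FrameBytes = 1_000_000; // assumed size, close to the Normal row
    private const int Iterations = 100;

    // Allocates a fresh array per iteration, like Mat.ToBytes() does per frame.
    public static long MeasureFresh()
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < Iterations; i++)
        {
            byte[] bytes = new byte[FrameBytes];
            bytes[0] = 1; // touch the buffer so it is actually used
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Rents from the shared pool and returns the buffer at the end of each iteration.
    public static long MeasurePooled()
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < Iterations; i++)
        {
            using IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(FrameBytes);
            owner.Memory.Span[0] = 1;
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        Console.WriteLine($"fresh:  {MeasureFresh()} B");
        Console.WriteLine($"pooled: {MeasurePooled()} B");
    }
}
```

The pooled figure still includes the first rental, which has to allocate the underlying array once, plus a small owner object per Rent() call, so it will not be zero.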