Intro

After implementing the terminal video player, it was apparent that it was rather slow.

So some optimizations were in order. And oh boy did I cook on this one. It’s actually the reason this blog post took so long to make: I just kept finding ways to speed up my program.

Optimizations

A list of optimizations done:

  1. Use Spans as collections instead of arrays (if possible).
  2. Use Console.Out.Write.
  3. Instead of Dictionary<string, string>, use Dictionary<int, string>.
  4. Use StringBuilder in the server instead of concatenating the colors together in the frame.
  5. Use char[] instead of string when requesting frames.
  6. Empty the StringBuilder after each frame is sent. I forgot this on the first run, and memory was piling up.
  7. Move initialisation of frame array to send, outside of the loop, so it reuses the resource instead of creating a new one each loop.
  8. Remove redundant if statements. If the first if statement is true, run your logic and end with continue;. The code for the other case then goes directly after the first if statement, with no second if statement needed.
  9. Put the hot path inside the if statement, instead of after it.
  10. Custom console write implementation.
  11. Compile with native AOT.
  12. Use StreamWriter instead of BufferedStream.
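
To make items 1, 3, 4, 6 and 7 a bit more concrete, here is a minimal sketch of how they can fit together. This is not the player's actual server code: FrameBuilderSketch, ColorCache, EscapeFor and the idea of keying colors by a packed 0xRRGGBB int are all assumptions made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration, not the post's real server code.
public static class FrameBuilderSketch
{
    // Item 3: an int key (assumed here to be a packed 0xRRGGBB value)
    // is cheaper to hash and compare than a string key.
    private static readonly Dictionary<int, string> ColorCache = new();

    // Item 7: the builder is created once, outside the frame loop, and reused.
    private static readonly StringBuilder Frame = new(64 * 1024);

    // Returns the ANSI 24-bit background color escape for one pixel,
    // followed by a space that acts as the "pixel".
    private static string EscapeFor(int rgb)
    {
        if (ColorCache.TryGetValue(rgb, out var cached))
            return cached;

        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        var escape = $"\u001b[48;2;{r};{g};{b}m ";
        ColorCache[rgb] = escape;
        return escape;
    }

    // Item 1: the pixels are taken as a span, so callers can pass
    // an array or a slice of one without copying.
    public static string BuildFrame(ReadOnlySpan<int> pixels, int width)
    {
        Frame.Clear(); // Item 6: empty the builder so old frames don't pile up.

        for (int i = 0; i < pixels.Length; i++)
        {
            Frame.Append(EscapeFor(pixels[i])); // Item 4: append, don't concatenate.

            // Reset the color and start a new line at the end of each row.
            if ((i + 1) % width == 0)
                Frame.Append("\u001b[0m\n");
        }

        return Frame.ToString();
    }
}
```

Calling FrameBuilderSketch.BuildFrame(pixels, width) once per frame reuses the same builder and color cache every time, which is the whole point of items 6 and 7.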

Benchmarking

All of these benchmarks correspond to the list of optimizations above. Each one measures how long it takes to play and show a 10-second video.

Change     Time
Default    5.1s
1          5s
2          5s
3          4.99s
4          4.77s
5          4.62s
6          4.39s
7          4.31s
8          4.11s
9          4.1s
10         2s
11         1.9s
12         1s
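
For reference, timings like the ones above can be taken with a Stopwatch around the playback call. This is only a sketch of one way to do it, not necessarily how the numbers in the table were produced. PlaybackTimer and the playVideo delegate are names invented for the example.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper for illustration.
public static class PlaybackTimer
{
    // Runs the given action once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action playVideo)
    {
        var stopwatch = Stopwatch.StartNew();
        playVideo();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Wrap whatever plays the whole video in the delegate and print the returned TimeSpan's TotalSeconds. Running it a few times per change and comparing the results helps smooth out noise between runs.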

Optimizations In Depth

Optimization 10 And 12

At this point, I realized that just cleaning up the code and doing fewer conversions was not going to cut it. So I went to the interwebs to consult the gray-beards of old (aka looking at StackOverflow for faster custom console printing methods). This method is the one I use, with a few changes of my own.

Less IL means faster (most of the time). And less locking means less waiting.

This is one of the optimizations I would not do in production, but I like it anyways.

Let’s take this method for example. It’s pretty important since it flushes the buffer. But it also uses a lock, which we actually don’t need since our code is 100% sync.

public static void Flush()
{
    lock (BufferedStream)
    {
        BufferedStream.Flush();
    }
}

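So we remove the lock and annotate the method with [MethodImpl(MethodImplOptions.Synchronized)], so it can only be executed by one thread at a time. Just to be safe. Keep in mind that for a static method the runtime then locks on the type itself, so the lock moves out of the method body rather than disappearing entirely.

// Requires: using System.Runtime.CompilerServices;
[MethodImpl(MethodImplOptions.Synchronized)]
public static void Flush()
{
    BufferedStream.Flush();
}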

But that’s not all. It also saves quite a few IL instructions. The version with the lock compiles to this:

.method public hidebysig static void
  Flush() cil managed
{
  .maxstack 2
  .locals init (
    [0] class [System.Runtime]System.IO.BufferedStream V_0,
    [1] bool V_1
  )

  // [33 9 - 33 30]
  IL_0000: ldsfld       class [System.Runtime]System.IO.BufferedStream TerminalVideoPlayerShared.Helpers.FastConsole::BufferedStream
  IL_0005: stloc.0      // V_0
  IL_0006: ldc.i4.0
  IL_0007: stloc.1      // V_1
  .try
  {
    IL_0008: ldloc.0      // V_0
    IL_0009: ldloca.s     V_1
    IL_000b: call         void [System.Threading]System.Threading.Monitor::Enter(object, bool&)

    // [35 13 - 35 36]
    IL_0010: ldsfld       class [System.Runtime]System.IO.BufferedStream TerminalVideoPlayerShared.Helpers.FastConsole::BufferedStream
    IL_0015: callvirt     instance void [System.Runtime]System.IO.Stream::Flush()

    // [36 9 - 36 10]
    IL_001a: leave.s      IL_0026
  } // end of .try
  finally
  {

    IL_001c: ldloc.1      // V_1
    IL_001d: brfalse.s    IL_0025
    IL_001f: ldloc.0      // V_0
    IL_0020: call         void [System.Threading]System.Threading.Monitor::Exit(object)

    IL_0025: endfinally
  } // end of finally

  // [37 5 - 37 6]
  IL_0026: ret

} // end of method FastConsole::Flush

Without the lock statement, it is reduced to:

.method public hidebysig static void
  Flush() cil managed
{
  .maxstack 8

  // [30 35 - 30 57]
  IL_0000: ldsfld       class [System.Runtime]System.IO.BufferedStream TerminalVideoPlayerShared.Helpers.FastConsole::BufferedStream
  IL_0005: callvirt     instance void [System.Runtime]System.IO.Stream::Flush()
  IL_000a: nop
  IL_000b: ret

} // end of method FastConsole::Flush

Pretty neat, huh?

That’s not all though. I ended up switching from BufferedStream to StreamWriter. Just messing around a bit, and damn I’m happy I did that, because StreamWriter is way faster in this scenario.

public void Write(string s) => _streamWriter.Write(s);

public void Flush() => _streamWriter.Flush();
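
The two methods above only show the write path, so here is a minimal sketch of how such a wrapper could be wired up around standard output. The class name FastConsoleSketch, the ForStdout factory, the 64 KB buffer size and the BOM-less UTF-8 encoding are all assumptions for the example, not the setup actually used in the player.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical wrapper for illustration, not the post's actual FastConsole class.
public sealed class FastConsoleSketch : IDisposable
{
    private readonly StreamWriter _streamWriter;

    public FastConsoleSketch(Stream output, int bufferSize = 64 * 1024)
    {
        // UTF-8 without a byte order mark, so no stray bytes reach the terminal.
        var encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

        _streamWriter = new StreamWriter(output, encoding, bufferSize)
        {
            // Keep everything in the buffer until Flush() is called,
            // so a whole frame goes out in as few writes as possible.
            AutoFlush = false
        };
    }

    // Convenience factory that writes straight to the raw stdout stream.
    public static FastConsoleSketch ForStdout() => new(Console.OpenStandardOutput());

    public void Write(string s) => _streamWriter.Write(s);

    public void Flush() => _streamWriter.Flush();

    // Disposing the writer flushes it and closes the underlying stream.
    public void Dispose() => _streamWriter.Dispose();
}
```

The idea is to call Write for every piece of a frame and Flush exactly once when the frame is complete. Nothing is written to the underlying stream before that, as long as the frame fits in the buffer.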

Showcase

Here is a video showcasing the player.

Funny thing: the more clients you connect, the faster the video plays. That’s because I placed the video pacing logic in the client, and not the server.