After implementing the terminal video player, it was apparent that it was rather slow.
So some optimizations were in order. And oh boy did I cook on this one. It’s actually the reason this blog post took so long to make: I just kept finding ways to speed up my program.
A list of optimizations done:
- Use Spans as collections instead of arrays (if possible).
- Use […].
- Instead of Dictionary<string, string>, use Dictionary<int, string>.
- Use […] in the server instead of concatenating the colors together in the frame.
- Use […] instead of string when requesting frames.
- Empty the […] after each frame sent, since I forgot it the first run, and that was piling up memory.
- Move initialisation of the frame array to send outside of the loop, so it reuses the resource instead of creating a new one each loop.
- Remove redundant if statement. If the first if statement is true, use […] after your logic. If not, don’t use another if statement, just place the code after the first if statement.
- Hotpath inside if statement, instead of after if statement.
- Custom console write implementation.
- Compile with native AOT.
- Use StreamWriter instead of BufferedStream.
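To make a few of these concrete, here is a minimal sketch of three items from the list: a Span view instead of a copied array, an int-keyed dictionary, and a buffer allocated once outside the loop. The names (FrameSketch, Palette, FillRow, RenderAll) and the pixel data are made up for illustration and are not taken from the actual player.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical frame data: one palette index per pixel, 3 pixels per row.
byte[] pixels = { 0, 1, 1, 0, 1, 0 };
Console.WriteLine(FrameSketch.RenderAll(pixels, width: 3));

public static class FrameSketch
{
    // Hypothetical palette keyed by color index instead of by a string.
    public static readonly Dictionary<int, string> Palette = new()
    {
        [0] = "\u001b[40m ", // black background
        [1] = "\u001b[47m ", // white background
    };

    // Fills 'row' from a view over the pixel data; the slice is never copied.
    public static void FillRow(ReadOnlySpan<byte> rowPixels, string[] row)
    {
        for (int i = 0; i < rowPixels.Length; i++)
            row[i] = Palette[rowPixels[i]];
    }

    // Returns the number of complete rows that were processed.
    public static int RenderAll(byte[] pixels, int width)
    {
        // Allocated once, outside the loop, and reused for every row.
        var row = new string[width];
        int rows = 0;
        for (int start = 0; start + width <= pixels.Length; start += width)
        {
            // AsSpan creates a view over the existing array, no new array.
            FillRow(pixels.AsSpan(start, width), row);
            rows++;
        }
        return rows;
    }
}
```

The point of the sketch is only that nothing inside the loop allocates: the row buffer is reused and the Span is just a window over the original array.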
The benchmarks below follow the list of optimizations above: each numbered row corresponds to the optimization at that position in the list. Each benchmark measures how long it takes to play and show a 10-second video.
| Change | Time |
| --- | --- |
| Default | 5.1s |
| 1 | 5s |
| 2 | 5s |
| 3 | 4.99s |
| 4 | 4.77s |
| 5 | 4.62s |
| 6 | 4.39s |
| 7 | 4.31s |
| 8 | 4.11s |
| 9 | 4.1s |
| 10 | 2s |
| 11 | 1.9s |
| 12 | 1s |
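For anyone wanting to take similar measurements, a simple way is to wrap the whole playback in a Stopwatch. This is only a sketch of that idea, not necessarily how the numbers above were produced; Bench.Time is a hypothetical helper and the Thread.Sleep call is a stand-in for the real playback.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Stand-in for "play the whole video"; swap in the real playback call.
TimeSpan elapsed = Bench.Time(() => Thread.Sleep(50));
Console.WriteLine($"Playback took {elapsed.TotalSeconds:F2}s");

public static class Bench
{
    // Runs the action once and returns the wall-clock time it took.
    public static TimeSpan Time(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A single run like this is noisy, so repeating it a few times and comparing the results is probably a good idea.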
Optimizations In Depth
Optimization 10 And 12
At this point, I realized that just cleaning up the code and doing fewer conversions was not gonna cut it. So I went to the interwebs to consult the gray-beards of old (aka looking at StackOverflow for faster custom console printing methods). This method is the one I use. I did also make some changes to the original method.
Less IL means faster (most of the time). And less locking means less waiting.
This is one of the optimizations I would not do in production, but I like it anyways.
Let’s take this method for example. It’s pretty important since it flushes the buffer. But it also uses a lock, which we actually don’t need since our code is 100% sync.
public static void Flush()
{
    lock (BufferedStream) BufferedStream.Flush();
}
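So we remove the explicit lock statement and annotate the method with [MethodImpl(MethodImplOptions.Synchronized)], so that only one thread at a time can execute it. Just to be safe. Note that the runtime still takes a lock for a synchronized method (on the type itself for a static method), so the locking moves out of our IL rather than disappearing.
[MethodImpl(MethodImplOptions.Synchronized)]
public static void Flush() => BufferedStream.Flush();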
But that’s not all. It also saves quite a few lines of IL code. The version with the lock compiles to this:
.method public hidebysig static void
  Flush() cil managed
{
  .maxstack 2
  .locals init (
    [0] class [System.Runtime]System.IO.BufferedStream V_0,
    [1] bool V_1
  )
  // [33 9 - 33 30]
  IL_0000: ldsfld class [System.Runtime]System.IO.BufferedStream TerminalVideoPlayerShared.Helpers.FastConsole::BufferedStream
  IL_0005: stloc.0 // V_0
  IL_0006: ldc.i4.0
  IL_0007: stloc.1 // V_1
  .try
  {
    IL_0008: ldloc.0 // V_0
    IL_0009: ldloca.s V_1
    IL_000b: call void [System.Threading]System.Threading.Monitor::Enter(object, bool&)
    // [35 13 - 35 36]
    IL_0010: ldsfld class [System.Runtime]System.IO.BufferedStream TerminalVideoPlayerShared.Helpers.FastConsole::BufferedStream
    IL_0015: callvirt instance void [System.Runtime]System.IO.Stream::Flush()
    // [36 9 - 36 10]
    IL_001a: leave.s IL_0026
  } // end of .try
  finally
  {
    IL_001c: ldloc.1 // V_1
    IL_001d: brfalse.s IL_0025
    IL_001f: ldloc.0 // V_0
    IL_0020: call void [System.Threading]System.Threading.Monitor::Exit(object)
    IL_0025: endfinally
  } // end of finally
  // [37 5 - 37 6]
  IL_0026: ret
} // end of method FastConsole::Flush
Will be reduced to:
.method public hidebysig static void
  Flush() cil managed
{
  .maxstack 8
  // [30 35 - 30 57]
  IL_0000: ldsfld class [System.Runtime]System.IO.BufferedStream TerminalVideoPlayerShared.Helpers.FastConsole::BufferedStream
  IL_0005: callvirt instance void [System.Runtime]System.IO.Stream::Flush()
  IL_000a: nop
  IL_000b: ret
} // end of method FastConsole::Flush
Pretty neat, huh?
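If you want to poke at the difference yourself, here is a small sketch that puts both variants side by side and uses reflection to check whether the Synchronized flag ended up on the method. FlushSketch and its MemoryStream-backed buffer are stand-ins, not the real FastConsole class. Keep in mind that a synchronized static method still makes the runtime hold a lock on the type while it runs, so this trims IL rather than removing the locking altogether.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Runtime.CompilerServices;

// Look up the annotated method and inspect its implementation flags.
MethodInfo method = typeof(FlushSketch).GetMethod(nameof(FlushSketch.FlushSynchronized))!;
bool isSynchronized = method.MethodImplementationFlags.HasFlag(MethodImplAttributes.Synchronized);
Console.WriteLine($"Synchronized flag set: {isSynchronized}");

public static class FlushSketch
{
    // Stand-in for the real output stream (assumption: any BufferedStream will do here).
    private static readonly BufferedStream Buffered = new(new MemoryStream());

    // Variant 1: explicit lock, which compiles to Monitor.Enter/Exit inside a try/finally.
    public static void FlushWithLock()
    {
        lock (Buffered)
        {
            Buffered.Flush();
        }
    }

    // Variant 2: no lock in the method body; the runtime serializes callers instead.
    [MethodImpl(MethodImplOptions.Synchronized)]
    public static void FlushSynchronized() => Buffered.Flush();
}
```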
That’s not all though. I ended up switching from BufferedStream to StreamWriter. Just messing around a bit, and damn I’m happy I did that, because StreamWriter is way faster in this scenario.
public void Write(string s) => _streamWriter.Write(s);
public void Flush() => _streamWriter.Flush();
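For context, this is roughly how such a wrapper could be put together. It is a sketch under my own assumptions: the class name FastConsoleSketch, the 64 KB buffer size, and the choice to turn AutoFlush off are illustrative and may differ from the real implementation.

```csharp
using System;
using System.IO;
using System.Text;

using var console = new FastConsoleSketch(Console.OpenStandardOutput());
console.Write("frame 1\n");
console.Write("frame 2\n");
console.Flush(); // nothing is guaranteed to reach the terminal before this call

public sealed class FastConsoleSketch : IDisposable
{
    private readonly StreamWriter _streamWriter;

    // bufferSize is an assumed value; tune it to the size of a frame.
    public FastConsoleSketch(Stream output, int bufferSize = 64 * 1024)
    {
        // UTF-8 without a byte order mark; leaveOpen keeps the caller's stream usable.
        _streamWriter = new StreamWriter(output, new UTF8Encoding(false), bufferSize, leaveOpen: true)
        {
            AutoFlush = false, // batch writes and flush once per frame
        };
    }

    public void Write(string s) => _streamWriter.Write(s);
    public void Flush() => _streamWriter.Flush();
    public void Dispose() => _streamWriter.Dispose();
}
```

Writing a whole frame into the buffer and flushing once means far fewer writes to the underlying stream than pushing out many small strings.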
Here is a video showcasing the player.
Funny thing, the more clients you connect, the faster the video plays. And that’s because I placed the video pacing logic in the client, and not the server.
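For reference, client-side pacing generally looks something like the sketch below: the client works out when the next frame is due and sleeps until then. This is a generic illustration, not the player's actual code; Pacing.DelayBeforeFrame and the 30 fps value are assumptions.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

const double fps = 30; // assumed frame rate
var clock = Stopwatch.StartNew();
for (int frame = 0; frame < 5; frame++)
{
    // Here the client would request and draw the current frame.
    TimeSpan wait = Pacing.DelayBeforeFrame(frame + 1, fps, clock.Elapsed);
    if (wait > TimeSpan.Zero) Thread.Sleep(wait);
}

public static class Pacing
{
    // Time left until frame 'frameIndex' is due, or zero if we are already late.
    public static TimeSpan DelayBeforeFrame(int frameIndex, double fps, TimeSpan elapsed)
    {
        TimeSpan due = TimeSpan.FromSeconds(frameIndex / fps);
        return due > elapsed ? due - elapsed : TimeSpan.Zero;
    }
}
```

With pacing like this, each client keeps its own clock, so any state the server shares between clients (such as the current frame position) is outside the pacing's control.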