Intro
After implementing the terminal video player, it was apparent that it was rather slow.
So some optimizations were in order. And oh boy did I cook on this one. It's actually the reason this blog post took so long to make: I just kept finding ways to speed up my program.
Optimizations
A list of optimizations done:
1. Use Spans as collections instead of arrays (if possible).
2. Use Console.Out.Write.
3. Instead of Dictionary<string, string>, use Dictionary<int, string>.
4. Use StringBuilder in the server instead of concatenating the colors together in the frame.
5. Use char[] instead of string when requesting frames.
6. Empty the StringBuilder after each frame sent. I forgot it on the first run, and that was piling up memory.
7. Move the initialisation of the frame array to send outside of the loop, so it reuses the resource instead of creating a new one each loop.
8. Remove a redundant if statement. If the first if statement is true, use continue; after your logic. If not, don't use another if statement, just place the code after the first if statement.
9. Put the hot path inside the if statement, instead of after the if statement.
10. Custom console write implementation.
11. Compile with native AOT.
12. Use StreamWriter instead of BufferedStream.
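To make a few of these concrete, here is a minimal sketch of items 1, 2, 3, 4, 6 and 7 working together. It is not the player's actual code: the palette, the frame data and the BuildFrame helper are all made up for illustration, and I'm assuming colors can be packed into an int key.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical palette: packed RGB value -> ANSI background color escape.
var ansiByColor = new Dictionary<int, string>
{
    [0xFF0000] = "\u001b[48;2;255;0;0m",
    [0x0000FF] = "\u001b[48;2;0;0;255m",
};

// Two tiny made-up frames of packed RGB pixels.
int[][] frames =
{
    new[] { 0xFF0000, 0x0000FF },
    new[] { 0x0000FF, 0xFF0000 },
};

// Created once, outside the loop, and reused for every frame.
var builder = new StringBuilder(capacity: 256);

foreach (int[] pixels in frames)
{
    string frame = BuildFrame(builder, pixels, ansiByColor);
    Console.Out.Write(frame);          // stand-in for sending the frame to a client
    Console.Out.Write("\u001b[0m\n");  // reset colors and end the row
}

// Appends one colored cell per pixel, then empties the builder for the next frame.
static string BuildFrame(StringBuilder sb, ReadOnlySpan<int> framePixels, Dictionary<int, string> palette)
{
    foreach (int pixel in framePixels)
    {
        sb.Append(palette[pixel]).Append(' ');
    }

    string text = sb.ToString();
    sb.Clear(); // without this, every frame would also contain all earlier ones
    return text;
}
```

The array is passed as a ReadOnlySpan<int>, so the helper gets a view over the pixels without copying them.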
Benchmarking
The benchmarks below follow the list of optimizations above, with each change numbered by its position in the list. Each one measures how long it takes to play and show a 10 second video.
Change | Time |
---|---|
Default | 5.1s |
1 | 5s |
2 | 5s |
3 | 4.99s |
4 | 4.77s |
5 | 4.62s |
6 | 4.39s |
7 | 4.31s |
8 | 4.11s |
9 | 4.1s |
10 | 2s |
11 | 1.9s |
12 | 1s |
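If you want to take this kind of measurement yourself, a Stopwatch around the playback call is one simple way to do it. This is only a sketch: PlayVideo is a hypothetical stand-in for the real playback, and Measure is a helper name I made up here.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for playing the whole video; swap in the real playback call.
static void PlayVideo()
{
    for (int frame = 0; frame < 300; frame++)
    {
        Console.Out.Write('.');
    }
    Console.Out.WriteLine();
}

// Runs the action once and returns the wall-clock time it took.
static TimeSpan Measure(Action action)
{
    var stopwatch = Stopwatch.StartNew();
    action();
    stopwatch.Stop();
    return stopwatch.Elapsed;
}

TimeSpan elapsed = Measure(PlayVideo);
Console.Error.WriteLine($"Playback took {elapsed.TotalSeconds:F2}s");
```

The result goes to standard error so the timing line does not get mixed into the frames on standard output.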
Optimizations In Depth
Optimization 10 And 12
At this point, I realized that just cleaning up the code and doing fewer conversions was not gonna cut it. So I went to the interwebs to consult the gray-beards of old (aka looking at StackOverflow for faster custom console printing methods). This method is the one I use. I did also make some changes to the original method.
Less IL means faster (most of the time). And less locking means less waiting.
This is one of the optimizations I would not do in production, but I like it anyways.
Let’s take this method for example. It’s pretty important since it flushes the buffer. But it also uses a lock, which we actually don’t need since our code is 100% sync.
public static void Flush()
{
    lock (BufferedStream)
    {
        BufferedStream.Flush();
    }
}
So we remove the lock, and annotate the method with Synchronized (MethodImplOptions.Synchronized), so only one thread can execute it at a time. Just to be safe.
public static void Flush()
{
    BufferedStream.Flush();
}
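The snippet above does not show the annotation itself, so here is a rough sketch of how it could look. BufferedConsoleSketch, its Write method and the buffer size are assumptions made for this example, not the real FastConsole helper.

```csharp
using System;
using System.IO;
using System.Runtime.CompilerServices;
using System.Text;

BufferedConsoleSketch.Write("frame data\n");
BufferedConsoleSketch.Flush();

// Hypothetical stand-in for the FastConsole helper discussed in this post.
static class BufferedConsoleSketch
{
    // Buffer size is an arbitrary value picked for the example.
    private static readonly BufferedStream Output =
        new BufferedStream(Console.OpenStandardOutput(), 64 * 1024);

    public static void Write(string s)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(s);
        Output.Write(bytes, 0, bytes.Length);
    }

    // No lock statement in the body. The runtime serializes callers instead,
    // and for a static method it does that by locking on the containing type.
    [MethodImpl(MethodImplOptions.Synchronized)]
    public static void Flush() => Output.Flush();
}
```

Keep in mind that the attribute still makes the runtime take a lock, so what this mostly saves is the Monitor calls in the method's own IL.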
But that’s not all. It also saves quite a few lines of IL code. The version with the lock compiles to:
.method public hidebysig static void
  Flush() cil managed
{
  .maxstack 2
  .locals init (
    [0] class [System.Runtime]System.IO.BufferedStream V_0,
    [1] bool V_1
  )

  // [33 9 - 33 30]
  IL_0000: ldsfld class [System.Runtime]System.IO.BufferedStream TerminalVideoPlayerShared.Helpers.FastConsole::BufferedStream
  IL_0005: stloc.0 // V_0
  IL_0006: ldc.i4.0
  IL_0007: stloc.1 // V_1
  .try
  {
    IL_0008: ldloc.0 // V_0
    IL_0009: ldloca.s V_1
    IL_000b: call void [System.Threading]System.Threading.Monitor::Enter(object, bool&)

    // [35 13 - 35 36]
    IL_0010: ldsfld class [System.Runtime]System.IO.BufferedStream TerminalVideoPlayerShared.Helpers.FastConsole::BufferedStream
    IL_0015: callvirt instance void [System.Runtime]System.IO.Stream::Flush()

    // [36 9 - 36 10]
    IL_001a: leave.s IL_0026
  } // end of .try
  finally
  {
    IL_001c: ldloc.1 // V_1
    IL_001d: brfalse.s IL_0025
    IL_001f: ldloc.0 // V_0
    IL_0020: call void [System.Threading]System.Threading.Monitor::Exit(object)
    IL_0025: endfinally
  } // end of finally

  // [37 5 - 37 6]
  IL_0026: ret
} // end of method FastConsole::Flush
Without the lock, it is reduced to:
.method public hidebysig static void
  Flush() cil managed
{
  .maxstack 8

  // [30 35 - 30 57]
  IL_0000: ldsfld class [System.Runtime]System.IO.BufferedStream TerminalVideoPlayerShared.Helpers.FastConsole::BufferedStream
  IL_0005: callvirt instance void [System.Runtime]System.IO.Stream::Flush()
  IL_000a: nop
  IL_000b: ret
} // end of method FastConsole::Flush
Pretty neat, huh?
That’s not all though. I ended up switching from BufferedStream to StreamWriter. I was just messing around a bit, and damn I’m happy I did that, because StreamWriter is way faster in this scenario.
public void Write(string s) => _streamWriter.Write(s);
public void Flush() => _streamWriter.Flush();
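For context, here is roughly how such a StreamWriter could be set up around standard output. Treat it as a sketch under assumptions: the class name StreamWriterConsoleSketch, the constructor, the 64 KB buffer and the UTF-8 encoding are my picks for this example, and only the two one-line methods come from the code above.

```csharp
using System;
using System.IO;
using System.Text;

var console = new StreamWriterConsoleSketch(Console.OpenStandardOutput());
console.Write("one whole frame goes here\n");
console.Flush(); // a single flush per frame

// Hypothetical wrapper; only Write and Flush mirror the snippet above.
sealed class StreamWriterConsoleSketch
{
    private readonly StreamWriter _streamWriter;

    public StreamWriterConsoleSketch(Stream output, int bufferSize = 64 * 1024)
    {
        // UTF-8 without a byte order mark, so no stray bytes reach the terminal.
        _streamWriter = new StreamWriter(output, new UTF8Encoding(false), bufferSize)
        {
            AutoFlush = false, // nothing is written out until Flush is called
        };
    }

    public void Write(string s) => _streamWriter.Write(s);
    public void Flush() => _streamWriter.Flush();
}
```

With AutoFlush off, all the writes for a frame collect in the writer's buffer and go out together when Flush is called.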
Showcase
Here is a video showcasing the player.
Funny thing, the more clients you connect, the faster the video plays. That’s because I placed the video pacing logic in the client, and not the server.