Alternative title: Going Around In Circles with SDL2
(Notice: this article is part of the BGC Rewrite Series)
So, I’ve got flashing text, and I’ve got a system for updating game entity stuff… Now I just need explosions to render explosions.
Wait a second…
Where are the Circles?
As it turns out, you’re on your own when it comes to SDL2. SDL2 only provides support for basic rendering operations such as pixels, and lines, but for some reason does not provide circle drawing.
It is unclear why there is no SDL_RenderDrawCircle when there is SDL_RenderDrawLine, and asking google “sdl2 why no render draw circle function” yields no developers asking that question. It may be that there are so many different circle drawing algorithms that perhaps performance conscious folks will roll their own cicle renderer – who knows.
So I borrowed some code from Stackoverflow.
Stackoverflow Driven Development
I must admit that in my previous job as a software developer, a lot of folks thought the work I did required some level of genious intellect, when in actual fact it just required one to know what to type into google search. :P
I did read library documentation (especially when it came to Delphi) and experimented in F# Interactive, but I am guilty of just looking at stackoverflow when I need a quick answer to a specific problem.
The algorithm is known as Midpoint Algorithm, and the code can be found here: https://stackoverflow.com/a/38335842
Just in case Stackoverflow ever goes away (and simultaneously ruins the careers of many developers :P ), the code is included here
void DrawCircle(SDL_Renderer * renderer, int32_t centreX, int32_t centreY, int32_t radius)
{
const int32_t diameter = (radius * 2);
int32_t x = (radius - 1);
int32_t y = 0;
int32_t tx = 1;
int32_t ty = 1;
int32_t error = (tx - diameter);
while (x >= y)
{
// Each of the following renders an octant of the circle
SDL_RenderDrawPoint(renderer, centreX + x, centreY - y);
SDL_RenderDrawPoint(renderer, centreX + x, centreY + y);
SDL_RenderDrawPoint(renderer, centreX - x, centreY - y);
SDL_RenderDrawPoint(renderer, centreX - x, centreY + y);
SDL_RenderDrawPoint(renderer, centreX + y, centreY - x);
SDL_RenderDrawPoint(renderer, centreX + y, centreY + x);
SDL_RenderDrawPoint(renderer, centreX - y, centreY - x);
SDL_RenderDrawPoint(renderer, centreX - y, centreY + x);
if (error <= 0)
{
++y;
error += ty;
ty += 2;
}
if (error > 0)
{
--x;
tx += 2;
error += (tx - diameter);
}
}
}
I naively translated directly into F# and called it a day
let renderDrawCircle(_x, _y, radius) renderer =
let mutable x = radius - 1;
let mutable y = 0;
let mutable tx = 1;
let mutable ty = 1;
let mutable err = tx - (radius * 2); // shifting bits left by 1 effectively
// doubles the value. == tx - diameter
while (x >= y) do
// Each of the following renders an octant of the circle
SDL.SDL_RenderDrawPoint(renderer, _x + x, _y - y) |> ignore
SDL.SDL_RenderDrawPoint(renderer, _x + x, _y + y) |> ignore
SDL.SDL_RenderDrawPoint(renderer, _x - x, _y - y) |> ignore
SDL.SDL_RenderDrawPoint(renderer, _x - x, _y + y) |> ignore
SDL.SDL_RenderDrawPoint(renderer, _x + y, _y - x) |> ignore
SDL.SDL_RenderDrawPoint(renderer, _x + y, _y + x) |> ignore
SDL.SDL_RenderDrawPoint(renderer, _x - y, _y - x) |> ignore
SDL.SDL_RenderDrawPoint(renderer, _x - y, _y + x) |> ignore
if (err <= 0) then
y <- y + 1
err <- err + ty
ty <- ty + 2
elif (err > 0) then
x <- x - 1
tx <- tx + 2
err <- err + tx - (radius * 2)
I bodged up some test explosion rendering code in the main game loop
let keys = sdlGetKeys()
for key in keys do
if key = SDL.SDL_Scancode.SDL_SCANCODE_RETURN then
for i in [1..5] do
let ex,ey = r.Next(0, 20), r.Next(0, 20)
timers.Add(animationTimer(5, 100, (fun f ->
let explosion = [15;15;14;14;12;12;12;4;4;4;4]
explosion
|> List.iteri (fun i c ->
renderer
|> sdlSetDrawColor (drawColor c)
|> renderDrawCircle((int ex)+400,(int ey)+400, f+i+10)
()
)
)))
In case you are wondering, animationTimer
is defined as so:
let animationTimer (frames:int, interval:int, callback) =
let mutable elapsed = 0
let mutable counter = 0
let update gs =
elapsed <- elapsed + gs.elapsed
if (elapsed > interval) then
elapsed <- 0
counter <- counter + 1
if counter < frames || frames = -1 then
callback(counter)
Continue
else
Terminate
update
I have actually found timers to be a fairly versatile concept, but to talk about them here would require their own article, so I’ll cover that later. For now, we’re talking about explosions… rendering.
Anyway, one shiny new explosion generator, let’s give it a run
Above: Lots of explosions, so little framerate
To my dismay, the framerate nearly halves with less than 50 explosions per second. In the animation, about 495 circles are being rendered per second, and it’s bringing a reasonably modern quad core laptop to it’s knees. It was time to go back to the drawing board.
At this point you might reasonably assume I did the sensible thing and use the SDL_gfx library. But as it turns out, I am an unreasonable person I really just needed circles, and knowing from experience how it took about 5 minutes to compile the SDL2 CS project, I decided I would rather spend several hours optimising this algorithm than spend 5 minutes compiling SDL_gfx :) after all, rolling your own is more fun than using something off the shelf. cough
Ok ok, it’s worth mentioning that there is no .NET wrapper library for SDL2_gfx, but I’m sure it would have taken half an hour to create one using interop. Anyway, onto the problem solving.
Optimisation
Before we start making chages, we need a means of measuring the speed of the function so that a baseline (un-optimised speed) can be established. This baseline is then used to measure the improvement (if any) on making a change.
Now you might be tempted to do something like
let renderDrawCircle(_x, _y, radius) renderer =
//insert some debug output
let sw = System.Diagnostics.Stopwatch()
sw.Start()
let mutable x = radius - 1;
let mutable y = 0;
let mutable tx = 1;
let mutable ty = 1;
let mutable err = tx - (radius * 2); // shifting bits left by 1 effectively
// doubles the value. == tx - diameter
while (x >= y) do
// Each of the following renders an octant of the circle
SDL.SDL_RenderDrawPoint(renderer, _x + x, _y - y) |> ignore
SDL.SDL_RenderDrawPoint(renderer, _x + x, _y + y) |> ignore
SDL.SDL_RenderDrawPoint(renderer, _x - x, _y - y) |> ignore
SDL.SDL_RenderDrawPoint(renderer, _x - x, _y + y) |> ignore
SDL.SDL_RenderDrawPoint(renderer, _x + y, _y - x) |> ignore
SDL.SDL_RenderDrawPoint(renderer, _x + y, _y + x) |> ignore
SDL.SDL_RenderDrawPoint(renderer, _x - y, _y - x) |> ignore
SDL.SDL_RenderDrawPoint(renderer, _x - y, _y + x) |> ignore
if (err <= 0) then
y <- y + 1
err <- err + ty
ty <- ty + 2
elif (err > 0) then
x <- x - 1
tx <- tx + 2
err <- err + tx - (radius * 2)
printfn "%d" sw.ElapsedMilliseconds
cough
But one needs to run the same function multiple times with a range of parameters, so having to check and sum multiple values printed to the screen will get tedius. Instead, why not a general purpose high order function that executes a unit returning function and measures how long it took? Let’s call it, I dunno, measureTime
let measureTime func =
let watch = System.Diagnostics.Stopwatch()
watch.Start()
func()
watch.ElapsedMilliseconds
Now we need to isolate the explosion drawing code
let drawExplosion (ex,ey) frame =
let explosion = [15;15;14;14;12;12;12;4;4;4;4]
explosion
|> List.iteri (fun i c ->
renderer
|> sdlSetDrawColor (drawColor c)
|> renderDrawCircle((int ex)+250,(int ey)+400, frame+i+10)
)
Great! And now we can call the function and measure how long it took to execute
> measureTime (fun () -> drawExplosion(100,200) 10);;
val it : int64 = 5L
5 milliseconds. But re-run it again, and you’ll see it’s not the true speed
> measureTime (fun () -> drawExplosion(100,200) 10);;
val it : int64 = 1L
> measureTime (fun () -> drawExplosion(100,200) 10);;
val it : int64 = 0L
> measureTime (fun () -> drawExplosion(100,200) 10);;
val it : int64 = 0L
> measureTime (fun () -> drawExplosion(100,200) 10);;
val it : int64 = 0L
More than likely that 5 milliseconds was required for Just in Time compilation, initialisation etc. It’s important to run something many times when profiling. Not to mention that this function is quick enough to return less than 1 millisecond.
So here you go, something to execute the function many times and measure the total duration
measureTime (fun () ->
[1..100]
|> List.iter(fun frame -> drawExplosion(100,200) frame))
|> printfn "%d"
This is drawing 1400 circles gradually increasing in size up to 100 pixels in radius. The times are:
403
402
380
396
OK we’re ready to start optimising the function. An obvious starting point was to reduce the number of function calls to SDL_RenderDrawPoint
, especially as there is an equivilent SDL_RenderDrawPoints
which, as the name suggests, takes a list of points.
let renderDrawCircle1(_x, _y, radius) renderer =
let mutable x = radius - 1;
let mutable y = 0;
let mutable tx = 1;
let mutable ty = 1;
let mutable err = tx - (radius * 2); // shifting bits left by 1 effectively
// doubles the value. == tx - diameter
while (x >= y) do
let points =
[|
SDL.SDL_Point(x = _x + x, y = _y - y)
SDL.SDL_Point(x = _x + x, y = _y + y)
SDL.SDL_Point(x = _x - x, y = _y - y)
SDL.SDL_Point(x = _x - x, y = _y + y)
SDL.SDL_Point(x = _x + y, y = _y - x)
SDL.SDL_Point(x = _x + y, y = _y + x)
SDL.SDL_Point(x = _x - y, y = _y - x)
SDL.SDL_Point(x = _x - y, y = _y + x)
|] //|> Array.map (fun (x,y)-> SDL.SDL_Point(x = x, y = y))
SDL.SDL_RenderDrawPoints(renderer, points, 8) |> ignore
if (err <= 0) then
y <- y + 1
err <- err + ty
ty <- ty + 2
elif (err > 0) then
x <- x - 1
tx <- tx + 2
err <- err + tx - (radius * 2)
The renderDrawCircle1 function takes a quarter of the time, by virtue of calling the SDL_RenderDrawPoint
function 7 times less often. Below are the times:
99
96
93
93
The next optimisation that occurred to me is that there was no reason all the points of the circle couldn’t be computed all at once
let renderDrawCircle2(_x, _y, radius) renderer =
let mutable x = radius - 1;
let mutable y = 0;
let mutable tx = 1;
let mutable ty = 1;
let mutable err = tx - (radius * 2); // shifting bits left by 1 effectively
// doubles the value. == tx - diameter
let points = [|
while (x >= y) do
let points =
[|
SDL.SDL_Point(x = _x + x, y = _y - y)
SDL.SDL_Point(x = _x + x, y = _y + y)
SDL.SDL_Point(x = _x - x, y = _y - y)
SDL.SDL_Point(x = _x - x, y = _y + y)
SDL.SDL_Point(x = _x + y, y = _y - x)
SDL.SDL_Point(x = _x + y, y = _y + x)
SDL.SDL_Point(x = _x - y, y = _y - x)
SDL.SDL_Point(x = _x - y, y = _y + x)
|]
yield! points
if (err <= 0) then
y <- y + 1
err <- err + ty
ty <- ty + 2
elif (err > 0) then
x <- x - 1
tx <- tx + 2
err <- err + tx - (radius * 2)
|]
SDL.SDL_RenderDrawPoints(renderer, points, points.Length) |> ignore
This function makes use of the seq
capability to generate an as required number of points. Surprisingly, it’s not much quicker:
74
92
76
69
71
For the 200 pixel wide circles there are 120 calls to the SDL_RenderDrawPoints
, so you would think 120 calls less would make a difference. I suspect it’s probably the use of seq
; faster times might be realised switching to unfold or the like, but at this point I … hang on, wait a minute
Profiling needs to be done right
I got through most of this article before I realised I had potentially been profiling the wrong part of the code.
Originally I had this code for measuring the total time taken to run a particular version of the explosion renderer
[1..100]
|> List.map(fun frame -> measureTime (fun () -> drawExplosion(100,200) frame))
|> List.sum
|> printfn "%d"
What it was doing was measuring each individual time taken to draw an explosion frame, and then summing the results. In my boneheadedness, I had surmised that since a call to drawExplosion was regularly executing in less than a millisecond, that I would often be summing zeros, making it seem like the time taken to iterate 100 times was less than it actually was.
So I switched to the following code:
measureTime (fun () ->
[1..100]
|> List.iter(fun frame -> drawExplosion(100,200) frame))
|> printfn "%d"
It seemed logical at the time: measure the total time taken to draw 100 explosion frames. But keep in mind you are now also measuring the Interation code. As it turns out, it adds about 2400 ticks to the total (abour 0.24 milliseconds)
measureTime (fun () ->
[1..100]
|> List.iter(fun _ -> ()))
|> printfn "%d"
2228
2654
2245
2604
In this case, 0.24 milliseconds is not much, but it’s something to keep in mind. In this case, the answer is not to change the way you measure, but to increase the resolution of the timer. In this case, ElapsedTicks provides a higher resolution.
let measureTime func =
let watch = System.Diagnostics.Stopwatch()
watch.Start()
func()
watch.ElapsedTicks
It’s also important to average out the values to remove any runtime quirks. Now that we are measuring ElapsedTicks, consider the following which is executed 8 times
measureTime (fun () -> renderDrawCircle2(100,200,200) renderer);;
results:
5619
1812
1669
15940 <<< WHAT THE HECK 10 TIMES SLOWER!?
2403
1700
1890
1693
For seemingly no reason, the 4th time took an order of magnitude longer than the rest. But if you average a thousand measurements you will effectively filter out those peaks (unless there are lots of them!).
And as a final point, you need to iterate enough times such that re-running the test has little variance compared to the previous. To ensure this horse is properly dead, consider the following is executed 5 times:
[1..10]
|> List.map(fun frame -> measureTime (fun () -> renderDrawCircle(100,200,frame+10) renderer))
|> List.sum
|> printfn "%d"
results:
5514
4861
8408
14179
8743
But this almost always gives the same answer, because it’s running the circle drawing code ten thousand times!
[1..100] |> List.map (fun i -> [10..110] |> List.map id) |> List.concat
|> List.map(fun frame -> measureTime (fun () -> renderDrawCircle1(100,200,frame) renderer))
|> List.sum
|> printfn "%d"
Even then, you still need to run a few times to make sure the total time stabilises
Get to the point
Hey, profiling is important to me. It’s not enough to simply think something is faster, actual metrics are required.
Anyway, back to the original article, which was explosions (rendering). I define a list of each version of renderDrawCircle to be called in succession, hopefully ensuring that any .NET weirdness in measuring time at least affects all three versions equally.
let funcs = [
renderDrawCircle
renderDrawCircle1
renderDrawCircle2
]
funcs |> List.iter (fun func ->
[1..100] |> List.map (fun i -> [10..110]) |> List.concat
|> List.map(fun frame -> measureTime (fun () -> func(100,200,frame) renderer))
|> List.sum
|> printfn "%d")
This is run a few times, including after waiting a long time between run 4 and run 5, in order to see stable results.
version | run 1 | run 2 | run 3 | run 4 | run 5* | run 6 | average | % faster than v1 |
---|---|---|---|---|---|---|---|---|
1 | 31758916 | 31235178 | 31142280 | 31205663 | 34267728 | 33657498 | 32211210.5 | N/A |
2 | 7576802 | 6936058 | 7112976 | 7061530 | 7367735 | 7318996 | 7229016.167 | 77.5 |
3 | 6398260 | 6230180 | 6212428 | 6025629 | 6930623 | 6114915 | 6318672.5 | 80.4 |
We’re iterating enough times that each version consistently performs faster than the last, so we have eliminated (for the most part) those runtime quirks.
Interestingly, there’s not a lot of performance difference between v2 and v3. It may be 10% faster than v2, but this only translates to about 3% faster overall. Perhaps with a different iteration strategy it may be able to get faster.
The Bottom Line
Battle Ground Copyright is now one step closer to ripping off Liero!
Above: now with faster explosions!
The new explosion rendering code now renders explosions without an appreciable drop in framerate. In fact, I am sure that if I did better framerate limiting than simply Thread.Sleep(40) it might even not drop the framerate.