Measuring and improving runtime or compile-time performance
This page is about computer performance in the context of Raku.
Make sure you're not wasting time on the wrong code: start by identifying your "critical 3%" by profiling your code's performance. The rest of this document shows you how to do that.
Expressions of the form now - INIT now, where INIT is a phase in the running of a Raku program, provide a great idiom for timing code snippets.
Use the m: your code goes here evalbot on the raku channel to write lines like:
    m: say now - INIT now
    rakudo-moar abc1234: OUTPUT«0.0018558␤»
The now to the left of INIT runs 0.0018558 seconds later than the now to the right of the INIT because the latter occurs during the INIT phase.
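The same subtraction works for timing a single routine rather than the whole program. The following is a minimal sketch; slow-sum and its workload are hypothetical and exist only to give the timer something to measure.

```raku
# Hypothetical workload: add up the integers below $n one at a time.
sub slow-sum(Int $n) {
    my $total = 0;
    $total += $_ for ^$n;
    $total;
}

# Time one call by subtracting two Instants; the result is a Duration in seconds.
my $start   = now;
my $result  = slow-sum(1_000_000);
my $elapsed = now - $start;
say "slow-sum returned $result in {$elapsed.round(0.001)} seconds";

# Whole-program variant from the text: the right-hand now ran during INIT.
my $since-init = now - INIT now;
say "time since INIT: $since-init seconds";
```

Timings from a single run are noisy, so it is usually worth repeating the measurement a few times before drawing conclusions.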
The HTML profile opens to the "Overview" section, which gives some overall data about how the program ran, e.g., total runtime and time spent doing garbage collection. One important piece of information you'll get here is the percentage of the total call frames (i.e., blocks) that were interpreted (slowest, in red), speshed (faster, in orange), and jitted (fastest, in green).
The next section, "Routines", is probably where you'll spend the most time. It has a sortable and filterable table of routine (or block) name+file+line, the number of times it ran, the inclusive time (time spent in that routine + time spent in all routines called from it), exclusive time (just the time spent in that routine), and whether it was interpreted, speshed, or jitted (same color code as the "Overview" page). Sorting by exclusive time is a good way to know where to start optimizing. Routines with a filename that starts like gen/moar/ are from the compiler. A good way to see just the stuff from your own code is to put the filename of the script you profiled in the "Name" search box.
The "Call Graph" section gives a flame graph representation of much of the same information as the "Routines" section.
The "Allocations" section gives you information about the amount of different types that were allocated, as well as which routines did the allocating.
The "GC" section gives you detailed information about all the garbage collections that occurred.
The "OSR / Deopt" section gives you information about On Stack Replacements (OSRs), which is when routines are "upgraded" from interpreted to speshed or jitted. Deopts are the opposite, when speshed or jitted code has to be "downgraded" to being interpreted.
If the profile data is too big, it could take a long time for a browser to open the file. In that case, output to a file with a .json extension using the --profile=filename option, then open the file with the Qt viewer.
To deal with even larger profiles, output to a file with a .sql extension. This will write the profile data as a series of SQL statements, suitable for opening in SQLite.
    # create a profile
    raku --profile=demo.sql -e 'say (^20).combinations(3).elems'

    # create a SQLite database
    sqlite3 demo.sqlite

    # load the profile data
    sqlite> .read demo.sql

    # the query below is equivalent to the default view of the "Routines" tab in the HTML profile
    sqlite> select
          case when r.name = "" then "<anon>" else r.name end as name,
          r.file,
          r.line,
          sum(entries) as entries,
          sum(case when rec_depth = 0 then inclusive_time else 0 end) as inclusive_time,
          sum(exclusive_time) as exclusive_time
        from
          calls c,
          routines r
        where
          c.id = r.id
        group by
          c.id
        order by
          inclusive_time desc
        limit 30;
The in-progress, next-gen profiler is moarperf, which can accept .sql or SQLite files and has a bunch of new functionality compared to the original profiler. However, it has more dependencies than the relatively stand-alone original profiler, so you'll have to install some modules before using it.
To learn how to interpret the profile info, use the prof-m: your code goes here evalbot (explained above) and ask questions on the IRC channel.
If you want to profile the time and memory it takes to compile your code, use Rakudo's
If you run perl6-bench for multiple compilers (typically, versions of Perl, Raku, or NQP), results for each are visually overlaid on the same graphs, to provide for quick and easy comparison.
Once you've used the above techniques to identify the code to improve, you can then begin to address (and share) the problem with others:
For each problem, distill it down to a one-liner or a gist and either provide performance numbers or make the snippet small enough that it can be profiled using prof-m: your code or gist URL goes here.
Think about the minimum speed increase (or RAM reduction or whatever) you need/want, and think about the cost associated with achieving that goal. What's the improvement worth in terms of people's time and energy?
Let others know if your Raku use-case is in a production setting or just for fun.
This bears repeating: make sure you're not wasting time on the wrong code. Start by identifying the "critical 3%" of your code.
With multi-dispatch, you can drop in new variants of routines "alongside" existing ones:
    # existing code generically matches a two arg foo call:
    multi sub foo(Any $a, Any $b) { ... }

    # new variant takes over for a foo("quux", 42) call:
    multi sub foo("quux", Int $b) { ... }
The call overhead of having multiple foo definitions is generally insignificant (though see discussion of where below), so if your new definition handles its particular case more efficiently than the previously existing set of definitions, then you probably just made your code that much more efficient for that case.
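To see which candidate handles which call, here is a small runnable sketch of the two foo variants. The bodies and the strings they return are made up for illustration; in real code the narrower candidate would hold the faster special-case implementation.

```raku
# Generic candidate: matches any two-argument call.
multi sub foo(Any $a, Any $b) { "generic: $a, $b" }

# Narrower candidate: a literal "quux" first argument and an Int second one.
multi sub foo("quux", Int $b) { "special: quux, $b" }

say foo("bar", 42);     # handled by the generic candidate
say foo("quux", 42);    # handled by the narrower candidate
say foo("quux", "x");   # second argument is not an Int, so the generic one runs
```

Existing callers need no changes: the dispatcher picks the narrowest matching candidate on its own.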
Method calls are generally resolved as late as possible (dynamically at runtime), whereas sub calls are generally resolved statically at compile-time.
One of the most reliable techniques for making large performance improvements, regardless of language or compiler, is to pick a more appropriate algorithm.
A classic example is Boyer-Moore. To match a small string in a large string, one obvious way to do it is to compare the first character of the two strings and then, if they match, compare the second characters, or, if they don't match, compare the first character of the small string with the second character in the large string, and so on. In contrast, the Boyer-Moore algorithm starts by comparing the *last* character of the small string with the correspondingly positioned character in the large string. For most strings, the Boyer-Moore algorithm is close to N times faster algorithmically, where N is the length of the small string.
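The difference between the two approaches can be sketched in a few lines. Note that the second routine below is the Horspool simplification of Boyer-Moore (bad-character shifts only, no good-suffix rule), not the full algorithm, and that the names naive-index and horspool-index are hypothetical. In practice you would simply call the built-in index method; the sketch only illustrates the idea.

```raku
# Naive search: try every alignment, comparing characters left to right.
sub naive-index(Str $haystack, Str $needle) {
    my @h = $haystack.comb;
    my @n = $needle.comb;
    my $last = @h.elems - @n.elems;
    loop (my $start = 0; $start <= $last; $start++) {
        my $i = 0;
        $i++ while $i < @n.elems && @h[$start + $i] eq @n[$i];
        return $start if $i == @n.elems;
    }
    Nil;
}

# Horspool variant: compare right to left and skip ahead using a shift table.
sub horspool-index(Str $haystack, Str $needle) {
    my @h = $haystack.comb;
    my @n = $needle.comb;
    my $m = @n.elems;
    return 0 if $m == 0;

    # For each needle character except the last: how far the window may slide
    # when that character sits under the window's last position.
    my %shift;
    %shift{@n[$_]} = $m - 1 - $_ for ^($m - 1);

    my $start = 0;
    while $start <= @h.elems - $m {
        # Start with the last character of the needle and work backwards.
        my $i = $m - 1;
        $i-- while $i >= 0 && @h[$start + $i] eq @n[$i];
        return $start if $i < 0;

        # A character that never occurs in the needle allows a full-length jump.
        $start += %shift{@h[$start + $m - 1]} // $m;
    }
    Nil;
}

my $text = "here is a simple example";
say naive-index($text, "example");
say horspool-index($text, "example");
say $text.index("example");   # built-in, for comparison
```

The naive routine moves the window one position at a time, while the Horspool variant can jump by up to the needle's length after a mismatch, which is where the speedup on typical text comes from.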
The next couple of sections discuss two broad categories for algorithmic improvement that are especially easy to accomplish in Raku. For more on this general topic, read the Wikipedia page on algorithmic efficiency, especially the 'See also' section near the end.
This is another very important class of algorithmic improvement.
There are plenty of high performance C libraries that you can use within Raku and NativeCall makes it easy to create wrappers for them. There's experimental support for C++ libraries, too.
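As a rough illustration of how little wrapper code NativeCall needs, the sketch below binds the C standard library's strlen. It assumes a platform where libc symbols are visible in the running process (typically Linux and macOS); elsewhere you may need to pass a library name to is native.

```raku
use NativeCall;

# Bind C's strlen. With no library name given to `is native`, the symbol is
# looked up in the running process, which is an assumption about the platform.
sub strlen(Str --> size_t) is native { * }

# The Str is passed as a NUL-terminated UTF-8 string, so this counts bytes.
say strlen("performance");
```

Wrapping a real high-performance library follows the same pattern: declare each C function's signature and name the library in the is native trait.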
More generally, Raku is designed to smoothly interoperate with other languages and there are a number of modules aimed at facilitating the use of libs from other langs.
To date, the focus for the compiler has been correctness, not how fast it generates code or how fast or lean the code it generates runs. But that's expected to change, eventually... You can talk to compiler devs on the freenode IRC channels #raku and #moarvm about what to expect. Better still, you can contribute yourself:
Rakudo is largely written in Raku. So if you can write Raku, then you can hack on the compiler, including optimizing any of the large body of existing high-level code that impacts the speed of your code (and everyone else's).
Most of the rest of the compiler is written in a small language called NQP that's basically a subset of Raku. If you can write Raku, you can fairly easily learn to use and improve the mid-level NQP code too, at least from a pure language point of view. To dig into NQP and Rakudo's guts, start with NQP and internals course.
Some known current Rakudo performance weaknesses not yet covered in this page include the use of gather/take, junctions, regexes, and string handling in general.
If you think some topic needs more coverage on this page, please submit a PR or tell someone your idea. Thanks. :)
If you've tried everything on this page to no avail, please consider discussing things with a compiler dev on #raku, so we can learn from your use-case and what you've found out about it so far.
Once a dev knows of your plight, allow enough time for an informed response (a few days or weeks, depending on the exact nature of your problem and potential solutions).
If that hasn't worked out, please consider filing an issue about your experience at our user experience repo before moving on.