Tuesday, May 18, 2010

Perl vs Go vs Scala performance

I've been playing with Go and Scala a bit lately, and was curious to benchmark them doing something vaguely similar to my real-world Perl tasks. I know that there are already other benchmarks out there, but I wanted to see how things went on what is Perl's home turf - simple file processing and data manipulation.

Thus I created some tests, which you can see at http://github.com/TJC/PerfTesting.

I expected Perl to do badly here, as it was up against the natively-compiled Go, and the JVM, which I've heard is highly optimised. (Although it always seems to take a while to load for me, and uses tonnes of RAM)

However Perl did surprisingly well!
I repeated the tests for three sizes of data set. It's curious to note that Perl and Go scaled linearly with the size of the input data, while Scala did better, taking the lead once the file sizes increased.

Edit: After some suggestions from the crowd, I updated and re-ran the tests.

The results follow:


Language   100k rows   1m rows   10m rows
Perl       1.089 s     10.96 s   111.3 s
Scala      1.857 s     9.835 s   89.05 s
Go         1.682 s     16.77 s   154.3 s


In doing this test, I was running Ubuntu 10.04 64bit, and:
Perl 5.10.1 w/Text::CSV and Text::CSV_XS
Go (may 2010 build) w/csv.go
Scala 2.8.0.RC1 on Java 1.6 w/opencsv.sf.net

NOTE: This post has now been updated/superseded by: http://blog.dryft.net/2010/05/new-results-for-perl-vs-scala-vs-go-vs.html

44 comments:

  1. With Scala you need to warm up the JVM a little bit. This means running the test more than once, without closing the application, before benchmarking it.

  2. That seems wrong, Maime. If the JVM has initial performance problems, that's an important part of the benchmark. To ignore it would be inappropriate. However it is clear from these results that over time that initial cost is amortized.

  3. Ubuntu uses OpenJDK by default and, like Debian, will try to switch back to it if you switch to Sun's JRE. (This may explain, in part, Scala's performance--OpenJDK sucks.)

  4. A bit like Eddie Izzard asking for a pencil (http://www.youtube.com/watch?v=2uJqW9O6aW0#t=38) - would you ask your users to warm the app up a bit?

    Bottom line: Perl may not be pretty or elegant, but it is a useful tool and can get the job done.

  5. The JVM actually runs native code only after bytecode interpretation, profiling and JIT compilation. It also caches the compiled code for the next startup.

    So if you don't run it at least once before the benchmark, what you are benchmarking is not compiled code. It's like running Quake 3 in the Ch C++ interpreter and saying that C++ is slow.

    If you care about optimizing load time rather than runtime performance, you can have a look at AOT compilation with GCJ or Excelsior JET.

    It would also be worth using "scalac" the right way, with "scalac -optimise -Xdisable-assertions -d classes HelloWorld.scala".

    Also note that "java -client" and "java -server" give significantly different performance.

  6. I couldn't find a Scala CSV parsing library quickly, and so it is just using a
    simple String.split(","). This should be rectified and the tests re-run.


    Why haven't you stated this clearly in your blog post?

  7. Huh. I ran your tests w/ a 100k line input.csv, and for me:
    Go: 2.6s
    Scala: 3.57s
    Perl: 8.9s

    On a MacBook Pro; Scala 2.8, Perl 5.8.9, Go HEAD

  8. > I was running Ubuntu 10.04 64bit

    You don't seem to say anywhere what you were running Ubuntu on?

  9. This comment has been removed by the author.

  10. This comment has been removed by the author.

  11. the supposedly-highly-optimised JVM

    Supposedly?

    Scala :: Go :: Perl

    Compare more than one program, written by more than one programmer.

  12. Kyle, I was running on Perl 5.10.1 with a relatively recent version of Text::CSV and Text::CSV_XS installed.

    Perhaps you could try upgrading those latter two modules, then benchmark again, then upgrade perl to 5.10 or 5.12 and try again?

  13. Isaac, I did say that I wanted to benchmark the languages doing a task a bit like some of my work - I am aware of other benchmarks, thank you.

    Sorry about the poor "CSV" implementation in the Scala test - although it should be erring on the side of performance, not accuracy. I'd be delighted to take a patch to improve it. *hint*

    Maxine,
    If you check my code, you'll see the timing info is taken from inside the applications, in order to avoid the start-up time of the JVM or interpreter.

    The "small" test is run on 10,000 lines of CSV, so for that first slow iteration, there are another 9,999 that should be fast.

    It obviously does make a difference after a while, with Scala finally catching up to Perl on the big test with a million rows.

    However it is interesting that it seems to take the JVM many seconds to "Warm Up" before its performance starts to take hold!

    For the record, these tests were performed with Sun's JRE, not OpenJDK. Although now you've mentioned the performance issue I'm curious to try with the other.

  14. Maxine - I re-ran the tests using OpenJDK instead of Sun's JRE, and despite your assertion that it sucks, I found the performance to be the same. (Actually, very very slightly faster.)

  15. I have updated the Scala test to use a proper CSV parsing library (from Java, I couldn't find a Scala one), and re-run the tests.

    I'm just about to update the blog post above with those results - the short version is, the Scala test performs quite a bit faster now!

  16. CSV isn't exactly well defined so the test seems very dependent on each CSV library and how much/little work it does (also the startup cost of VMs).

    A CSV library that supports just the test file generated will be faster than a library that supports field quoting, escaping or other stuff you see in various peoples definitions of CSV.

  17. Regarding the table: in the source article try to remove all newlines that are placed between the tags.

  18. Regarding the JVM, please repeat the test with the sun-jdk. Instructions:

    To install the Sun JDK first uninstall openjdk (if you have it):
    - open Synaptic Package Manager
    - query: openjdk. Don't use the quick search field (it's buggy), but press the Search button in the toolbar of Synaptic.
    - Mark openjdk-6-jre for complete removal. The Icedtea plugin will automatically be marked as well, which is what you want.
    - press the Apply button in the toolbar.

    Then install the JDK with (currently you'll get 1.6 update 20):
    sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
    sudo apt-get update
    sudo apt-get install sun-java6-jdk sun-java6-fonts sun-java6-source

  19. Erik - as per my comment to Maxine above, I did try running the tests with both Sun JRE and OpenJDK.

    The performance was approximately the same.

    Thanks for the tip about fixing the table formatting.

  20. @Toby > I did say that I wanted to benchmark the languages doing a task a bit like some of my work - I am aware of other benchmarks, thank you.


    And it's reasonable to look at stuff like you would do for your work - if you do so with enough curiosity to profile the code and discover where the time is actually going.


    Where's the time go in your Scala program?

    If it's reading the file then can the size of the buffer be increased?

    If it's CSVReader then would it be faster to use one of the other Java packages - supercsv ?

    If it's writing output then is output to stdout being buffered?

  21. @Toby > The "small" test is run on 10,000 lines of CSV, so for that first slow iteration, there's another 9999 ones that should be fast.


    That just suggests you don't know enough about how the JVM works to respond to Maxine's comment.

    Use -XX:+PrintCompilation to examine when the methods are compiled.


    @Toby > However it is interesting that it seems to take the JVM many seconds to "Warm Up" before its performance starts to take hold!

    No, you haven't looked closely enough to figure out where the time's going yet.

  22. @Toby > I'd be delighted to take a patch to improve it. *hint*

    Here's a hint, replace

    val result = columns(1).toDouble * columns(2).toDouble

    printf("%s is %.02f\n", name, result);

    with

    val result = columns(1).toDouble * columns(2).toDouble

    result

    and make your timing measurements.

    Seems like 80% of the time taken by your Scala program is taken by that printf statement.

    I'd guess that output is being flushed every time that statement is called - so you're mostly measuring the JVM flushing stdout 10,000,000 times.

  23. Isaac, you are correct that I do not know enough about the JVM to profile where the time is spent.

    It's interesting that you say that printf() is the main consumer of time. Is that really the case, or is it that the JVM can optimise away all the work in the loop once the output is removed?

    It's not fair to just delete it.. The program needs to produce the same output as the other programs. Can you provide a patch that either replaces printf() with a faster version, or disables automatic flushing somehow?
    That would be a fairer test of your theory.

    Cheers,
    Toby

  24. > It's not fair to just delete it

    I provided a really easy way for you to confirm how much time is spent in printf - why don't you do so?

  25. Isaac, because as I explained, just deleting the printf() is not a valid way to confirm if that was using all the time. See comment above.

  26. > Is that really the case, or is it that the JVM can optimise away all the work in the loop once the output is removed?

    You really don't know what you're measuring do you.

  27. Isaac, you are not being helpful.

    I am measuring how long it takes to run the same task, in several languages. I'm happy to accept patches to improve the performance of one, but it still needs to produce the same output. Otherwise I'm not comparing apples with apples.

    So if you want to be helpful, you can provide a patch that disables buffer flushing for the printf() command - and we'll see how that goes.

    But removing the printf() altogether is not acceptable.

  28. @Toby > It's interesting that you say that printf() is the main consumer of time. Is that really the case, or is it that the JVM can optimise away all the work in the loop once the output is removed?

    So check it for yourself!

    Add a global accumulator variable

    object PerfTest {
    var x = 0.0

    replace
    printf("%s is %.02f\n", name, result);

    by the accumulator
    x = x + result

    and print the accumulated value
    time(csvparser(filename))
    println(x)

    How much more time is taken with printf for every line than printing the accumulated result once?

  29. @Toby > Otherwise I'm not comparing apples with apples.

    Do you consider yourself an expert Perl programmer?

    Do you consider yourself an expert Scala programmer?

    Are you comparing apples with apples?


    @Toby > Isaac, you are not being helpful.

    I've told you the printf statement takes most of the time in your Scala program, and given you 2 ways to confirm that for yourself.

    If you have any interest in making your Scala program better then you now know where to look - why don't you try?

    import java.text.DecimalFormat

    object PerfTest {
    val fmt = new DecimalFormat("0.00")


    val result = fmt.format( columns(1).toDouble * columns(2).toDouble )

    println(name + " is " + result);
    //printf("%s is %.02f\n", name, result);

  30. Wouldn't hurt to make your program look more like a Scala program -

    def csvparser(filename: String) {
      val reader = new CSVReader(new FileReader(filename))
      reader.readNext() // skip header line
      convertDataRows(reader)
    }

    def convertDataRows(csv: CSVReader) {
      csv.readNext() match {
        case null =>
          return
        case cols =>
          println(cols(0) + " is " + fmt.format(cols(1).toDouble * cols(2).toDouble))
          convertDataRows(csv)
      }
    }

  31. Let's write bytes rather than UTF-16 - which, along with avoiding printf formatting, should take the time down to 2/5ths of the 89.05s you blogged

    (I didn't bother changing main or time)

    import java.text.DecimalFormat
    import java.io.BufferedOutputStream

    def csvparser(filename: String) {
      val reader = new CSVReader(new FileReader(filename))
      reader.readNext() // skip header line
      convertDataRows(reader)
    }

    val fmt = new DecimalFormat("0.00")
    val out = new BufferedOutputStream(System.out)

    def convertDataRows(csv: CSVReader) {
      csv.readNext() match {
        case null =>
          out.flush()
          return
        case cols =>
          out.write((cols(0) + " is " + fmt.format(cols(1).toDouble * cols(2).toDouble) + "\n").getBytes)
          convertDataRows(csv)
      }
    }

  32. And that explicit return statement is no longer needed, so -

    def convertDataRows(csv: CSVReader) {
      csv.readNext() match {
        case null =>
          out.flush()
        case cols =>
          out.write((cols(0) + " is " + fmt.format(cols(1).toDouble * cols(2).toDouble) + "\n").getBytes)
          convertDataRows(csv)
      }
    }

  33. Now you're being helpful, by providing code that does perform equivalently, it seems.

    I'll make a patch and re-run the tests later today.

    I should go and modify the other programs to use unbuffered I/O as well, in case they were wasting time there too..

  34. @Toby > Now you're being helpful

    No - I'm doing what you should already have done.


    @Toby > The "small" test is run on 10,000 lines of CSV, so for that first slow iteration, there's another 9999 ones that should be fast.

    Here's a clue

    def time(f: => Unit) {
      var n = 10
      while (n > 0) {
        val t1 = System.currentTimeMillis()
        f
        val t2 = System.currentTimeMillis()
        System.err.println((t2 - t1).asInstanceOf[Float])
        n = n - 1
      }
    }

    100k rows

    350.0
    177.0
    90.0
    46.0
    51.0
    44.0
    52.0
    46.0
    44.0
    44.0

    Notice that for the 100k workload, the Scala program becomes 7x faster than the first time measurement.
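A comparable repeat-timing harness in Go, for anyone who wants to check whether the native binary shows any similar warm-up effect (a sketch; the busy-loop workload is a stand-in for the CSV test, not taken from the repo):

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// timeIt runs f ten times and reports each duration in
// milliseconds on stderr, mirroring the Scala harness above.
func timeIt(f func()) []float64 {
	var times []float64
	for n := 0; n < 10; n++ {
		t1 := time.Now()
		f()
		ms := float64(time.Since(t1).Microseconds()) / 1000.0
		fmt.Fprintln(os.Stderr, ms)
		times = append(times, ms)
	}
	return times
}

func main() {
	// Stand-in workload: a floating-point busy loop.
	work := func() {
		s := 0.0
		for i := 0; i < 1000000; i++ {
			s += float64(i)
		}
		_ = s
	}
	timeIt(work)
}
```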

  35. So it takes the JVM _three_hundred_thousand_ rows of the CSV file, before it reaches full speed?

    Damn :(

    That's interesting to know, but think how much better it would be if it could get there faster!

  36. Hi Isaac,
    Using just the buffered I/O change, the time for the big file went from 89 to 67 seconds, a nice improvement.
    Swapping to use your other code (to use java's DecimalFormat class) as well, brought the total time down to only 26.5 seconds on the big file.

    Nice work.

    In the meantime, someone has submitted a C version of the test, that does the big file in less than 8 seconds ;)

  37. @Toby > That's interesting to know, but think how much better it would be if it could get there faster!

    It's something the very first comment on this posting told you!

    We get there faster by using a server that's already run through the programs.


    @Toby > down to only 26.5 seconds

    And you haven't updated the times shown in your blog posting.


    @Toby > Nice work.

    No - the Scala program you wrote was really really bad.


    @Toby > someone has submitted a C version of the test, that does the big file in less than 8 seconds

    What's interesting or surprising about that?

    You said "I am aware of other benchmarks" - if you actually looked at them you'd have some clue how to write Scala and you'd know there'd be that sort of difference reading/writing plain ascii files.

  38. @Toby > That's interesting to know, but think how much better it would be if it could get there faster!

    Here's another clue - RTFM - "The client system is optimal for applications which need fast startup times or small footprints, the server system is optimal for applications where the overall performance is most important."

    http://java.sun.com/docs/hotspot/HotSpotFAQ.html#compiler_types

    java -client -Xbootclasspath/a:/usr/local/src/scala-2.7.7.final/lib/scala-library.jar -classpath opencsv-2.2.jar:. PerfTest ../input.csv > /dev/null

    142.0
    74.0
    71.0
    66.0
    63.0
    64.0
    64.0
    63.0
    65.0
    63.0

  39. @Toby > It's interesting that you say that printf() is the main consumer of time. Is that really the case, or ...


    The final demonstration that you were measuring the printf statement, this will produce identical output to your Scala program for your data file -

    def format(d: Double): String = {
      val s = (d * 100.0).round.toString
      val n = s.length
      if (n == 1) { "0.0" + s }
      else if (n == 2) { "0." + s }
      else { s.substring(0, n - 2) + "." + s.substring(n - 2) }
    }

    val out = new BufferedOutputStream(System.out)


    case cols =>
    out.write( (cols(0) + " is " + format(cols(1).toInt * cols(2).toDouble) + "\n").getBytes )

  40. So, we've safely concluded that Scala's printf() function sucks, and writing an equivalent fast replacement is tedious. Great.

    I want to update the other languages to use unbuffered I/O as well before updating the headline scores, since it's only fair to give them the same advantage too..


    I've lost interest in micro-optimisations, such as your formatting routine above, since in real life I'm not going to want to do such stuff. If Scala's printf() sucks, well, so be it. Maybe they can fix it in a future version.

    The DecimalFormat java library is more promising though.

  41. > So, we've safely concluded that

    All you're measuring is how much or how little you know about programming in each of those languages.


    > writing an equivalent fast replacement is tedious

    is easy


    > to use unbuffered I/O

    The Scala change was to use buffered output.


    > in real life I'm not going to want to do such stuff

    Then it probably doesn't matter whether your stuff runs slow or fast.

  42. P.S. In my previous post it was:
    — Mac OS X, Snow Leopard 10.6.4, iMac 3GHz Core 2 Duo, 4GB RAM.
    — Perl 5.10.0
    — Python 2.6.1
    — Java 1.6.0.20
    — GNU Emacs editor. :-)

  43. Go has been updated to Go 1, and the vector package has been deprecated for almost a year.

    And since you're using Perl's Text::CSV_XS, you should also write a cgo-based CSV version so it can be compared fairly against Text::CSV_XS.

  44. Yo-An, thank you for your comments.
    I wrote this blog post nearly three years ago; I have no doubt that the languages involved have all evolved somewhat since then.
    I haven't kept up with Go, but if you could submit an updated version of the Go code, I'd be happy to run it against updated Scala and Perl versions.
