AS3 – Fast memory access without Alchemy
With Flash Player 10, Adobe added a new set of instructions that make it possible to compile C/C++ into code the AVM2 can execute. With a little glue code written in C, Alchemy lets you reuse some of the many open source C libraries available.
And once you have seen the speed of Alchemy-compiled C code, you may wonder how it can possibly be so much faster than AS3. Unfair!
What makes Alchemy code so fast? The main secret is faster memory access, because C/C++ is obviously all about pointers and malloc’ing. ByteArray in AS3 is kind of slow, so Adobe had to hack the AVM2 to remove this bottleneck, or Alchemy would have been pointless.
And the good news for us AS3 geeks is that it is possible to use this fast memory in AS3…
Fast memory in AS3 implementations
A few smart people have managed to expose this feature to “regular” Flash languages.
Likewise, a week ago, Burak Kalayci (you know, the ASV guy) released Azoth, a little tool that specializes in enabling a fast memory API too.
Update June 2010: Joa & others are working on Apparat, a serious general purpose bytecode optimizer which includes TDSI. A recent update of Apparat fixes some issues I found with TDSI.
Apparat vs Azoth vs haXe
All have a similar API using static methods:
- select a target ByteArray for fast access (don’t change the target ByteArray during computation)
- read/write numbers.
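To make the shape of such an API concrete, here is a minimal sketch of what a pure-AS3 fallback could look like. The class name `FastMem`, its method names and the `(value, address)` argument order are all hypothetical and do not match any of the three tools exactly. The only firm detail is the little-endian setting: the Alchemy memory instructions work on little-endian data, while ByteArray defaults to big-endian.

```actionscript
package {
    import flash.utils.ByteArray;
    import flash.utils.Endian;

    /**
     * Hypothetical fallback class, for illustration only.
     * A post-compilation tool would replace calls like these with
     * the fast memory instructions; without that step they simply
     * go through the regular ByteArray methods.
     *
     * Usage (hypothetical):
     *   var buffer:ByteArray = new ByteArray();
     *   buffer.length = 1024;
     *   FastMem.select(buffer);
     *   FastMem.writeInt(0xFF00FF, 0);
     *   var v:int = FastMem.readInt(0);
     */
    public final class FastMem {
        // The ByteArray currently selected for access.
        private static var target:ByteArray;

        /** Selects the target ByteArray; do this once, before the hot loop. */
        public static function select(bytes:ByteArray):void {
            bytes.endian = Endian.LITTLE_ENDIAN;
            target = bytes;
        }

        /** Writes a 32-bit integer at the given byte address. */
        public static function writeInt(value:int, address:int):void {
            target.position = address;
            target.writeInt(value);
        }

        /** Reads a 32-bit integer from the given byte address. */
        public static function readInt(address:int):int {
            target.position = address;
            return target.readInt();
        }
    }
}
```

A fallback of this shape adds a static call and a position assignment to every access, which is one plausible reason why non-optimized SWFs can end up slower than direct ByteArray access.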
Both Apparat and Azoth run as an optional post-compilation step: the SWF is decompiled, optimized and recompiled. Both can easily be automated to run after each compilation.
Joa’s approach using Apparat is interesting: this optimization fits naturally within its generic optimization engine.
Azoth, being completely specialized for this optimization, processes a SWF much faster than TDSI.
Finally haXe: the memory API is built into the compiler, so there is no additional step. Oh, and compilation is just insanely faster than with the Flex SDK.
Some real stats now
Since TDSI and Azoth processing is optional, I also measured the timings of the non-optimized SWFs: the numbers are surprising.
The test is very basic and involves little calculation: a ByteArray is filled with a gradient and then copied into a BitmapData.
The haXe version is honestly identical to the AS3 one. The generated bytecode appeared to be quite similar (only slightly shorter) and no methods were inlined.
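For reference, the “direct ByteArray access” baseline could look roughly like the sketch below. This is not the original benchmark source: the class name, the 512×256 size, the gradient formula and the single pass are assumptions chosen for illustration.

```actionscript
package {
    import flash.display.Bitmap;
    import flash.display.BitmapData;
    import flash.display.Sprite;
    import flash.utils.ByteArray;
    import flash.utils.getTimer;

    /** Illustrative baseline: fill a ByteArray, then copy it to a BitmapData. */
    public class GradientTest extends Sprite {
        // Assumed dimensions, not taken from the original test.
        private static const W:int = 512;
        private static const H:int = 256;

        public function GradientTest() {
            var bytes:ByteArray = new ByteArray();
            bytes.length = W * H * 4; // one 32-bit ARGB value per pixel
            var bmp:BitmapData = new BitmapData(W, H, false, 0);

            var start:int = getTimer();
            bytes.position = 0;
            for (var y:int = 0; y < H; y++) {
                for (var x:int = 0; x < W; x++) {
                    // Red follows x, green follows y, alpha is opaque.
                    bytes.writeInt(0xFF000000 | ((x >> 1) << 16) | (y << 8));
                }
            }

            // setPixels reads from the current position, so rewind first.
            bytes.position = 0;
            bmp.setPixels(bmp.rect, bytes);
            trace("fill + copy: " + (getTimer() - start) + "ms");

            addChild(new Bitmap(bmp));
        }
    }
}
```

A fast memory version would keep the same loop and only swap the `writeInt` call for the tool’s own write method.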
|Test||Debug player||Release player (ActiveX)|
|Direct ByteArray access||1010ms||543ms|
|Azoth, non optimized||1428ms||1184ms|
|TDSI, old version non optimized||> 16s||3856ms|
|Apparat, non optimized||2214ms||646ms|
|haXe, direct ByteArray access||984ms||555ms|
* “Release” compilation, Windows 7 32-bit, Core 2 E6600 2.4GHz. Timings are identical with the Flex 3 and Flex 4 SDKs.
The 50ms timing in this test (writing about 6.5 million ints to memory) means you could draw a 1920×1024 bitmap pixel by pixel at 40fps. Not bad I guess.
Please note: this test was entirely focused on memory access, with almost no calculations, so it should only give you an idea of what you can save on one particular operation (memory access). It says nothing meaningful about other aspects of compiler optimization.
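The arithmetic behind that estimate can be checked with a few lines. The sketch assumes that the write rate scales linearly with the number of pixels and that each pixel is one int; the class name is made up for the example.

```actionscript
package {
    import flash.display.Sprite;

    /** Back-of-the-envelope check of the 40fps estimate. */
    public class FrameBudget extends Sprite {
        public function FrameBudget() {
            // 6.5 million ints in 50ms: 130000 ints written per millisecond.
            var intsPerMs:Number = 6.5e6 / 50;
            // A 1920x1024 bitmap holds 1966080 pixels, one int each.
            var pixels:int = 1920 * 1024;
            // Time to write one full frame: about 15.1ms.
            var fillMs:Number = pixels / intsPerMs;
            // Time available per frame at 40fps: 25ms.
            var budgetMs:Number = 1000 / 40;
            trace(fillMs < budgetMs); // true
        }
    }
}
```

Under these assumptions, filling a frame takes about 15ms of the 25ms budget, which leaves roughly 10ms for everything else.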
haXe, almost a one-man project, has provided this memory API since FP10 came out.
haXe compiled code is generally more optimized than Adobe’s, but this test does not let haXe show its strength in this regard: if you want to be blown away by the fast memory API + haXe’s awesome compiler optimizations, read this post about a haXe version of an Alchemy experiment.
Azoth is a handy little tool:
Azoth is specialized in this task and it’s very fast – you know you’re not using a Java tool.
It does a great job of providing decent performance even when running the non-optimized version (unlike TDSI). I believe this is a real selling point of the tool, especially if you work with Flash CS, which hardly offers any post-compilation automation.
Sadly this is a Windows-only tool, which excludes a large group of potential users. As a command line application it could reasonably be ported to MacOS/Linux, but I kind of doubt this will happen: it’s apparently just a (nice) little side project of Burak’s.
Apparat wins as the most practical:
Apparat has improved a lot and is becoming the tool of choice for code optimization (inlining, fast math functions, etc.). It offers many practical optimization tools, like the MemoryPool, which makes it easy to manage small memory chunks inside the single Fast Memory buffer.
The memory API fallback code is now pretty decent, thanks to the pre-optimized SWC shipping with the tool instead of raw code.
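To illustrate what managing small chunks inside a single buffer means, here is a minimal bump allocator sketch. It is NOT Apparat’s MemoryPool API: the class `ChunkPool` and its methods are hypothetical, and a real pool would also need a way to free individual chunks.

```actionscript
package {
    import flash.utils.ByteArray;

    /** Hypothetical bump allocator; not Apparat's MemoryPool API. */
    public final class ChunkPool {
        private var buffer:ByteArray;
        // Address of the next free byte in the buffer.
        private var next:int = 0;

        public function ChunkPool(buffer:ByteArray) {
            this.buffer = buffer;
        }

        /** Reserves size bytes and returns the start address of the chunk. */
        public function allocate(size:int):int {
            // Round up to a multiple of 4 so int reads and writes stay aligned.
            var aligned:int = (size + 3) & ~3;
            if (next + aligned > int(buffer.length)) {
                throw new Error("ChunkPool: buffer is full");
            }
            var address:int = next;
            next += aligned;
            return address;
        }

        /** Releases every chunk at once by rewinding the allocator. */
        public function reset():void {
            next = 0;
        }
    }
}
```

Each caller keeps the address it was given and adds its own offsets to it, so several independent data sets can share the one buffer selected for fast access.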
Does anyone at Adobe work on the compiler?
Although I can agree that this memory API is not a major feature to add, these tools prove it is really usable and it would only represent a very small addition to the existing compiler. It could/should have been added in the Flex 4 SDK already.
As an example, the latest Flex 4 SDK does not even fold constant numeric expressions at compile time (like: var len:int = 512 * 256 * 4;). WTF!
Applications of the Fast Memory APIs
Admittedly most Flashers are not going to use it, but some geeks may find a few uses:
- Pixel by pixel image manipulation, for example fancy bitmap effects – see Azoth sample bitmap animation,
- Data crunching – like tons of 3D particles,
- Binary data reading/writing – binary file formats, image encoders/decoders.
Links to get your hands dirty: