What should Adobe do to save AIR?

ASNext was supposed to solve the AS3 VM’s performance problems once and for all.
ASNext is now canceled, but the problem remains.

Why and how slow is AS3?

We know AS3’s VM and JIT aren’t very efficient in general, partly because the language is fairly rich and dynamic. And even if the VM were awesome, all maths are computed with the double numeric type, which is known to be significantly slower than floats.

However, compared to other scripting languages, we shouldn’t complain that it’s slow: AS3 AOT-ed on iOS is definitely much faster than interpreted Lua, for instance. So why is it such a problem in AIR and not in Corona/MOAI/Gideros?

The problem is Stage3D

And more precisely, the problem is Stage3D being a very low-level API.

A low-level API offers a lot of freedom, and you have to give credit to Adobe for creating an efficient abstraction of DirectX, OpenGL and OpenGL ES.

However, this freedom puts much more pressure on the scripting side, and that’s where AS3 falls short, mostly because driving Stage3D requires a lot of numeric computation.

Competitive analysis

Let’s look at Haxe NME. The Haxe language is close to AS3; with NME it is cross-compiled to C++ and uses doubles for maths just like AS3, so the resulting executable shouldn’t be meaningfully faster than AS3 AOT on iOS.

However, 2D benchmarks tell a different story: CPU usage is a lot lower and framerate (generally) a lot higher for identical scenes.

The difference? It’s the API!

The closest API to Stage3D in NME is drawTriangles. As with Stage3D you must provide the complete geometry; it’s GPU accelerated, flexible, but it’s far from being the fastest option.

NME’s secret is that instead of having to do all the computations in Haxe, you can use:

  • a Tilesheet object holding rectangle definitions (i.e. UVs, size, pivot) for a spritesheet,
  • and the drawTiles method, which only requires a list of (x, y, rect_id, scale, rotation) or (x, y, rect_id, 2x2_matrix).
  • This data is passed to the runtime, which does all the computations using optimized C++ maths.

The result: no double bottleneck.
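To make that division of labour concrete, here is a minimal sketch of the kind of expansion the runtime performs on the script’s behalf. It is written in TypeScript purely for illustration; the `TileRect` layout, the `expandTiles` name and the vertex format (x, y, u, v) are assumptions, not NME’s actual internals, which do this work in C++.

```typescript
// Hypothetical data layout for illustration; not NME's real internals.
interface TileRect {
  u0: number; v0: number; u1: number; v1: number; // UV corners in the atlas
  width: number; height: number;                  // size in pixels
  pivotX: number; pivotY: number;                 // pivot, measured from the top-left corner
}

const FLOATS_PER_TILE = 5;  // x, y, rectId, scale, rotation (what the script supplies)
const FLOATS_PER_QUAD = 16; // 4 vertices * (x, y, u, v)   (what the GPU needs)

// Expands a flat list of tile records into quad vertices stored as 32-bit floats.
function expandTiles(rects: TileRect[], tiles: number[]): Float32Array {
  const count = Math.floor(tiles.length / FLOATS_PER_TILE);
  const out = new Float32Array(count * FLOATS_PER_QUAD);
  for (let i = 0; i < count; i++) {
    const t = i * FLOATS_PER_TILE;
    const x = tiles[t];
    const y = tiles[t + 1];
    const rect = rects[tiles[t + 2]];
    if (!rect) throw new RangeError("unknown rect id");
    const scale = tiles[t + 3];
    const rotation = tiles[t + 4];

    // 2x2 matrix combining uniform scale and rotation.
    const a = Math.cos(rotation) * scale;
    const b = Math.sin(rotation) * scale;

    // Corner offsets relative to the pivot, paired with their UVs.
    const left = -rect.pivotX;
    const top = -rect.pivotY;
    const right = rect.width - rect.pivotX;
    const bottom = rect.height - rect.pivotY;
    const corners: number[][] = [
      [left, top, rect.u0, rect.v0],
      [right, top, rect.u1, rect.v0],
      [right, bottom, rect.u1, rect.v1],
      [left, bottom, rect.u0, rect.v1],
    ];

    let o = i * FLOATS_PER_QUAD;
    for (let c = 0; c < corners.length; c++) {
      const cx = corners[c][0];
      const cy = corners[c][1];
      out[o++] = x + cx * a - cy * b; // transformed x
      out[o++] = y + cx * b + cy * a; // transformed y
      out[o++] = corners[c][2];       // u
      out[o++] = corners[c][3];       // v
    }
  }
  return out;
}
```

With Stage3D, everything inside that loop has to run in AS3 on doubles for every quad, every frame; with drawTiles the script only fills in the five numbers per tile.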

Conclusion: we want drawTiles in AS3

In addition to IndexBuffer3D + VertexBuffer3D, I’m calling on Adobe to add a Tilesheet + TileBuffer3D, which would expose a similar API to generate geometry more efficiently.

And although the quads are generated in 2D, they still go through vertex shaders, so there’s no reason this should be limited to flat 2D rendering: it could serve anything that needs to animate quads or triangles (“mmmh, particles!”).
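For the sake of discussion, here is one possible shape for the Tilesheet + TileBuffer3D pair proposed above. Neither class exists in Stage3D: every name, signature and the five-floats-per-tile layout below are hypothetical, and the sketch is in TypeScript rather than AS3.

```typescript
// HYPOTHETICAL API: nothing here exists in Stage3D; names and layout are assumptions.
const FLOATS_PER_TILE_RECORD = 5; // x, y, rectId, scale, rotation

class Tilesheet {
  // Each rectangle: [u0, v0, u1, v1, width, height, pivotX, pivotY]
  readonly rects: number[][] = [];

  // Registers a rectangle once and returns the id used by tile records.
  addRect(u0: number, v0: number, u1: number, v1: number,
          width: number, height: number, pivotX = 0, pivotY = 0): number {
    this.rects.push([u0, v0, u1, v1, width, height, pivotX, pivotY]);
    return this.rects.length - 1;
  }
}

class TileBuffer3D {
  readonly sheet: Tilesheet;
  readonly capacity: number;
  readonly data: Float32Array; // what a runtime would expand into quads natively

  constructor(sheet: Tilesheet, capacity: number) {
    this.sheet = sheet;
    this.capacity = capacity;
    this.data = new Float32Array(capacity * FLOATS_PER_TILE_RECORD);
  }

  // The only per-frame work left to the script: five writes per tile.
  setTile(index: number, x: number, y: number,
          rectId: number, scale: number, rotation: number): void {
    if (index < 0 || index >= this.capacity) {
      throw new RangeError("tile index out of range");
    }
    if (rectId < 0 || rectId >= this.sheet.rects.length) {
      throw new RangeError("unknown rect id");
    }
    const o = index * FLOATS_PER_TILE_RECORD;
    this.data[o] = x;
    this.data[o + 1] = y;
    this.data[o + 2] = rectId;
    this.data[o + 3] = scale;
    this.data[o + 4] = rotation;
  }
}
```

The design choice is simply to move the boundary: the script describes tiles, and the runtime owns the vertex maths and the float conversion.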

Come on Adobe, how hard could it be to implement?

PS: do you see other easy API additions that could help?

  1. makc says:

    Wow, a random feature request to save AIR… dramatic.

  2. Philippe says:

    Ha! Makc, you’ve got to make a dramatic title nowadays to get some pageviews :P

  3. makc says:

    I actually thought the post would be about the pricing model you were discussing on CAB.

  4. Shawn says:

    Great post Philippe, I’ve just finished a new benchmark in Starling, and true to your point, I’m completely CPU limited.

    My benchmark maxes out, with GPU-Z reporting only 30% usage. Chrome only seems to use a single core.

    On iPhone 4, I can only get 50 instances going (300 sprites). It’s also not even scratching the surface of the GPU, and completely CPU limited.

  5. Horsetopus says:

    Performances are a huge issue, yes.
    But I really feel Adobe has made a lot of progress.
    It took some time… some looong time to get things done.
    For a while I kept asking myself why I was still using this technology, I tried to move to pure Android dev, or Haxe NME but I failed ( well, “gave up” is the word ).

    But lately I have made a calendar app of some sort, and I learned a lot thanks to Scout.
    For example I learned that the AS class for Date is a performance hog.
    You would not guess, at first, but imagine the impact on a calendar application…
    I went with a custom Julian date base class and got a great boost.

    And also, even though I started with a classic display list, I made a sliding element that I wanted smooth with Starling, using batched quads and bitmap fonts, and I reached amazing performance.

    I was lacking some functionality, but thanks to ANE I was able to launch some native request for DPI ( Yes, AIR on Android gets the DPI most of the time ), to export to the native Calendar, to launch a service at boot time, to get custom actions on notifications etc etc.

    I also made some tests with FD4.7, which seems much faster and will also bring ASC2 ( not a breakthrough, but progress ) and support native deployment for iOS.
    And soon workers should be available on every platform, right?

    So yes, thank you Philippe for making this kind of request.
    I am not qualified enough to make them myself.
    But I think Flash is on the right way, and my work isn’t an old day long “F**k you Adobe!” anymore.
    Some good people must be at work now and I am sure they will provide performances improvement.

    As far as I am concerned, what Air needs is different:

    1/ It needs to support Linux and Windows metro.
    It needs to support everything, period.
    Every common chipset too ( did you know ARMv6 isn’t supported? It came as a slap in the face to me ).
    This is the main selling point of the technology in my opinion.
    Everybody working with Flash knows that your selling point when you want to get a job or convince a client is “One development, every platform”.
    With performances on top of that, Adobe could rule everything.

    2/ Workflow.
    ANE, multi-platforms, AS workers etc etc all of this is wonderful.
    But it is really complicated to get all of that working together.
    My multi-platform project, which should in my opinion be one project, easily exportable and sharable, turns out to be 2 projects, 4 or 5 Ant scripts to build + a project for each ANE X platforms ( Windows, iOS, Android ).
    I even had to use a Mac ( I hate Macs… can’t say it any other way ) to make my Objective-C library (well, this is more Apple’s fault here… I hate Apple… can’t say it any other way ).
    Plus gathering all the assets, giving them the right properties, setting the right compilation constants… all this is a mess, and I can’t imagine getting back to it in a few months for an update.
    They need to figure out a way for things to be much more simple.
    As simple as building an SWF on the Flash IDE.
    This has been for a very long time the other reason for the success of Flash.

    This second part, I guess could come from other IDEs ( who said Flashdevelop? ), but the first one is the thing Flash developers should worry about the most, IMO.

    Turned out to be long, just to say “you are right but…”.
    He, I don’t have a blog, where else am I gonna write?

  6. makc says:

    quad is 4 vertices, right? so, assuming you are using static indices and separate uvs, you upload 8 numbers per quad. 2×2 matrix is 4 numbers – how much gain you expect here? without scaling, you don’t even have multiplications – you just add those numbers to x,y. I can’t buy that math is the bottleneck here.

  7. ben w says:

    I am inclined to agree that math has nothing to do with the performance issue when it comes to rendering with Stage3D; pretty sure it boils down to the drawTriangles method. Under the hood there is probably a fair amount of work going on, and maybe that is where more attention needs to be paid. That said, I am pretty sure the Adobe engineers will have worked very hard on this to make it as fast and robust as possible… but there must still be room for improvement in there somewhere, given the stark differences between what Flash can achieve vs other languages.

  8. Philippe says:

    @makc @ben you assume wrong:
    - the 2×2 matrix gets computed in both cases: so you’re actually saving the computation of all 8 vertex coordinates and their storage as doubles, along with the UV data, in a buffer that has to be converted to floats before going to the GPU,
    - Adobe is well aware that maths with doubles are the bottleneck in AS3, and one of the goals of ASNext was to introduce a native float type which would have allowed this kind of computation to get near C++ performance.

  9. ben w says:

    you are right that they are slow, I just don’t think that they are a bottleneck in this case… an easy way to check is to remove all matrix uploads to the GPU and just do the raw draw calls. In my benchmarks constant uploads account for 60% of the time used by the drawTriangles calls, even for very simple geometry.

    If you upload from a ByteArray, you can use writeFloat rather than writeDouble to save time when uploading (this is faster), but the savings are still pretty small, as once uploaded it makes no difference at all, and you cannot write to a ByteArray anywhere near as fast as you can to a Vector :( … well, maybe Alchemy will help a little here?

    I still cry myself to sleep about the news that ASNext is dead! So annoyed at those who moaned about the extra effort it would cause….. people still moan about AS3 compared to AS2, just imagine how crap Flash would be if they never brought out AS3 because they listened to the whining back then!!!

  10. Kevin Newman says:

    I saw the changes to the roadmap and the dropping of AVMNext and ASNext as disinvestment in the Flash platform (and really, it should have been obvious sooner). That particular problem won’t be fixed by an API improvement, no matter how good it is.

  11. makc says:

    so you’re saying converting from double to float is the problem? this is like a matter of bit shifting, you just shift the fraction part by 4 bits upwards and drop last 32 bits. this is like few asm instructions per number somewhere deep inside buffer.uploadFromVector…

  12. Kevin Newman says:

    ARM chips don’t optimize doubles like x86 chips do, but they do optimize floats (through SIMD). The problem is in the ARM hardware. On top of having to spend extra cycles converting every double to float for the GPU, you also don’t get any SIMD benefit when using double for maths.

  13. Adam Harte says:

    This would definitely align with Adobe’s focus on gaming.

  14. HB says:

    I think my main problem has always been the lack of generics and function overloading. I’ve worked on large projects that required a level of abstraction that often caused a lot of boilerplate and difficult-to-read code.

    Generics would also improve performance in several cases, like data structures, and if the classic display list were based on some IDisplayObject interface instead of DisplayObject, people could use them to build some interesting engines for their projects with only a few lines of code.

  15. justinfront says:

    I was just wondering if NME or hx-gameplay could be used as a native extension from AIR to run a graphics layer above an AIR app? Have you used haxe to create native extensions for AIR? I would love an example if you have tried, would be great to have wrapped libpd in haxe and use from AIR for instance.

  16. Josh says:

    Anything to speed up Starling and Feathers would be great. I’m always hitting CPU bottlenecks. Creating separate batches that aren’t updated every frame can help, but it doesn’t scale after a while. It’s frustrating.

  17. Brendan says:

    I think Haxe NME are doing a great job. I think AIR is also, but it is way too broad. Being an Adobe product it has to pack in as much as possible. Haxe NME is therefore slightly less accessible, but most of the features in AIR I would never use. I just want to get my game or interactive piece onto as many devices as possible. If I need barebones and I have a bit of extra budget to push a little harder, then I use Haxe. If I need video playback and a very full-featured toolset, and need to get it done faster, then AIR. If I am doing very content-rich, layout-based stuff, then the HTML5/CSS3 guys can gladly take it and produce great layouts, and I will PhoneGap it. But mobile native is everything. The only thing that is bugging me is that the GPU is potentially a short-term thing. Will CPUs not be able to do this work again soon on mobiles as they reach the speed of desktops? Is this why NVIDIA makes CPUs, to cover their asses for when GPUs become redundant?

  18. Gabree says:

    They should simply save AS3 and use it to write apps the way Appcelerator Titanium does: AS3 could be the equivalent of their JS.

  19. julien says:

    Using Objective-C C/C++ and even Lua for games gives you speed.

    And TBH, with all the stuff Adobe did to screw Flash (or Flex) development (oh wow, they released Scout) and the fact that no one wants it on the web except Flash developers, they actually convinced me to use something else for both web and game development a while back, and I’m rather happy.

  1. [...] We’re seeing about 1100 instances on our Core 2 Duo, and 2100 on the Intel i7 CPU. Each character consists of 6 pieces, so that’s between 6,000 and 12,000 animated sprites. Not bad! We seem to be consistently CPU limited, which is extremely unfortunate and shows why AS3 is still the main bottleneck for GPU based rendering. [...]