14ms glClear?!?

During development, I've run into something rather odd that I was hoping someone here might be able to explain.

It appears that, per frame, the first draw command issued to the GL context - be it a 1px poly, a glClear() call, or a simple texture draw - takes a minimum of 14ms. All other rendering must wait on that first call completing whatever it's doing. The target resolution does not appear to affect this result.

Note that I am timing entirely inside the onDrawFrame() method, so it should not be affected by the buffer copy. Here's the snippet I use:

    long total;  // accumulated frame time in ms (long avoids overflow)
    int count;   // frames measured so far

    public void onDrawFrame(GL10 gl)
    {
        long start = System.currentTimeMillis();

        GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT);

        total += (System.currentTimeMillis() - start);
        count++;

        // Report the average every 100 frames
        if(count >= 100)
        {
            System.out.println((total / count) + "ms");
            total = 0;
            count = 0;
        }
    }

If I time anything other than a GL operation (e.g. ALL of the in-game logic), it runs sub-millisecond and can't even be measured. 

Even more interesting is that it appears that GLSurfaceView is automatically clearing the buffer. I dropped the glClear() to see what difference I'd get and expected to see overdraw. I did not see any overdraw. Not even my favorite, "oh, this is the buffer from two frames ago". The screen is quite obviously being cleared. (The only thing I left running was the warp stars. There was no background texture.)

This issue is rather frustrating since it leaves only ~2ms of the 16ms frame budget to render the rest of the frame. This leads to late or dropped frames, which is the opposite of what I want to see. 

This leads me to my questions for you, my fellow developers:
  1. Has anyone else seen this and/or know what it is?
  2. Is it possibly an issue with the GLSurfaceView design? Should I eliminate the GLSurfaceView panel from my design and create the GL context myself?
  3. Is it possible that this is a problem with the Mali-400 GPU and isn't exhibited by the Tegra 3?
My setup is as follows:
  • Using the GLES20 context (OpenGL ES 2.0)
  • Developing on an MK808 (Dual core Rockchip Cortex A9 w/Mali-400 Quad Core GPU)
  • 720p display that can accept a 1080p signal (tried both)
Your help is most appreciated!

Comments

  • theaceofthespade Posts: 21 Member
    Try running the clear multiple times and see if that changes anything.  It may just be that that is the minimum amount of time required.
  • jbanes Posts: 45 Member
    edited January 2013
    @theaceofthespade - Thanks for your response!

    Multiple clears do not appear to have a major impact. Only the first call (whatever it may be) is a significant problem.

    A thought did occur to me, though. Is it possible that this is how the double-buffering is implemented? i.e. the rendering thread is forced to wait until the buffer swap completes? If so, it seems like triple buffering may be the answer. Unfortunately, I don't see any way to turn triple buffering on. :(

    I have a sneaking suspicion that I may end up having to manually manage the GL context / buffers. 
  • noct Posts: 122 Member
    edited January 2013
    @jbanes I noticed this as well on the Tegra 3; just change your rendering order so you clear immediately after the swap, rather than clearing before rendering.
  • jbanes Posts: 45 Member
    @noct - Thanks for the advice! 

    Is it possible to accomplish that with GLSurfaceView or do I need more control over the buffers & context?
  • KonajuGames Posts: 560 Member
    It sounds like it is waiting on a vertical sync.  The 14ms you are experiencing is not time from the current frame.  It is the left-over time from the previous frame.
  • jbanes Posts: 45 Member
    @KonajuGames - Yup, that's what I'm starting to think. However, the problem is that my rendering exceeds 16ms with some regularity, resulting in dropped frames. My options are to optimize the render path (which is definitely on my list right now) or use triple buffering to smooth out the hills and valleys in processing time. 

    I'd love to get triple buffering going, but unfortunately I don't see a way to accomplish it without directly managing the GL context. 
  • Magnesus Posts: 304 Member
    Why not drop glClear?
    Anyway, waiting for VBL shouldn't affect your code if you do everything in the right order.

  • jbanes Posts: 45 Member
    @Magnesus - Thanks for the suggestion!

    To be clear, the call to glClear() is not the underlying problem. Whatever call is made first is what will take 14ms. (Though I did already consider dropping the glClear() for performance once I realized the GLSurfaceView was doing it.)

    As I get a better picture about this, I think everyone here is right: the 14ms is waiting on a buffer swap. 

    The real problem I'm trying to solve is that frame rendering times are inconsistent. This sometimes leads to the frame time being exceeded and the frame getting dropped. Triple buffering would be an easy solution (robbing Peter to pay Paul, as it were), but further optimization may be another. 
  • KonajuGames Posts: 560 Member
    Are the long frames coincident with garbage collection?
  • ZeroDependency Posts: 27 Member
    Please please please do not time single GL calls like this. The data you get back from it is meaningless.

    When you make a call to glClear(), you are instructing the driver that this is what you want done before the next function you call. 

    The only contract the driver has to uphold is that it completes the functions in the order you specified, not WHEN you specified them.

    Therefore it is entirely possible that the 14ms time you're getting back is in fact a number of previous functions all being completed at that time.

    The only accurate way to time individual calls is to force completion with glFinish (glFlush only submits the queued commands; it doesn't wait for them), but this will stall the pipeline and again result in pretty meaningless information.

    Time the entire frame draw time, as that is useful information, or use a tool like gDEbugger on Windows to analyse where most of your Frametime/CPU time is being taken up.
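    The whole-frame timing this comment recommends could be sketched roughly as below. This is a minimal, illustrative sketch, not code from the thread: the FrameTimer class name, the reporting interval, and the use of System.nanoTime() are all my own choices, and the GL-free accumulation logic is separated out so it can be exercised without a GL context.

```java
// Sketch: accumulate whole-frame durations and report a running
// average, instead of timing individual GL calls.
class FrameTimer {
    private long totalNanos;          // accumulated frame time in ns
    private int frames;               // frames since last report
    private final int reportEvery;    // report interval in frames

    FrameTimer(int reportEvery) {
        this.reportEvery = reportEvery;
    }

    // Record one frame's duration in nanoseconds. Returns the average
    // in milliseconds every 'reportEvery' frames, or -1 between reports.
    double record(long frameNanos) {
        totalNanos += frameNanos;
        frames++;
        if (frames >= reportEvery) {
            double avgMs = (totalNanos / (double) frames) / 1_000_000.0;
            totalNanos = 0;
            frames = 0;
            return avgMs;
        }
        return -1;
    }
}
```

    In onDrawFrame() the timer would then wrap the ENTIRE frame (take System.nanoTime() at the top, pass the elapsed nanoseconds to record() at the bottom, and log the result whenever it is non-negative), rather than bracketing a single glClear().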
  • jbanes Posts: 45 Member
    @KonajuGames - That was my first thought as well. 

    It's hard to prove that GC isn't the issue, but after spending a lot of time trying to line the two up, I realized there was a lot less GC than I would expect if it were causing this issue. (i.e. gaps between GC events measured in seconds to tens of seconds.) 

    My suspicion at this point is that background processes are the culprit. Which isn't too big of a deal and things may run better on the Ouya (assuming it's running less out of the box), but it does start pushing me toward solutions to smooth that out.

    Of course, the more stuff I put in my game, the less noticeable the dropped frames are. When there's lasers and explosions everywhere, it's surprisingly easy to overlook a little jitter in the movement of something as big as a small moon. :)
  • Magnesus Posts: 304 Member
    http://gafferongames.com/game-physics/fix-your-timestep/ - this is always a good article to know.
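    For anyone skimming, the core idea of the linked article is to step the simulation in fixed increments driven by an accumulator, so variable frame times don't change game speed. A minimal sketch (the class and method names are illustrative, not from the article or the thread):

```java
// Fixed-timestep accumulator: render as often as you like, but run
// the simulation in fixed DT-sized steps.
class FixedTimestep {
    static final double DT = 1.0 / 60.0;  // fixed step: 60 updates/sec
    private double accumulator = 0;

    // Returns how many fixed updates to run for a frame that took
    // 'frameTime' seconds of wall-clock time.
    int stepsFor(double frameTime) {
        accumulator += frameTime;
        int steps = 0;
        while (accumulator >= DT) {
            accumulator -= DT;  // game.update(DT) would run here
            steps++;
        }
        return steps;
    }
}
```

    A 50ms frame would yield three 16.7ms simulation steps (with the remainder carried forward), while a 10ms frame would yield none; either way the simulation advances at a constant rate.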

  • jbanes Posts: 45 Member
    @LeadHyperion - Thanks for the feedback. For what it's worth, I'm not trying to microbenchmark here. I was hoping to understand what's going on in the system. If I can understand what was happening under the covers, I can better optimize the code. That's all. 

    Honestly, I wish there was better documentation to explain the inner machinations of GLSurfaceView. It would make it a lot easier to understand and optimize for.

    @Magnesus - True story. I looked down at the first comment in the article you posted and realized that it was posted by the brother of a guy who's sitting two cubes down from me at this very moment. Small world! 
  • jbanes Posts: 45 Member
    Very good point. :)
  • noct Posts: 122 Member
    @jbanes I'd like to retract my original solution.

    After discussing the issue with a friend of mine, I believe what they're doing is a minor optimization, and clearing immediately is counter-productive. While a good driver will let you buffer up a full frame of commands before blocking on the next swap, the Tegra 3 driver goes halfway and doesn't block until the first GL command. For single-threaded applications, this reduces the amount of time spent blocked on GL calls.

    So basically: don't worry about it. As you increase the CPU workload between the swap and clear, you should see the delay on the clear drop proportionally.
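    That proportional drop can be illustrated with a toy model (purely illustrative arithmetic of my own, not a real driver): if the swap leaves roughly 14ms of vsync wait pending, the first GL call only blocks for whatever portion of that wait the CPU hasn't already absorbed with other work.

```java
// Toy model of the observed behavior: the driver defers the vsync
// wait until the first GL command of the next frame, so CPU work done
// between the swap and that first command absorbs the wait.
class VsyncWaitModel {
    static long firstCallBlockMs(long pendingWaitMs, long cpuWorkMs) {
        // The block shrinks one-for-one with CPU work, never below zero.
        return Math.max(0, pendingWaitMs - cpuWorkMs);
    }
}
```

    With 14ms pending, 5ms of game logic before the first GL call would leave a ~9ms block, and 14ms or more of logic would leave none - matching "the delay on the clear should drop proportionally."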