FitnessViaPostMortem

(Difference between revisions)
 
(5 intermediate revisions not shown)
Line 1: Line 1:
__NOTOC__
__NOTOC__
-
This is a continuation of the [[FitnessViaPartnership | FitnessViaPartnership]] that enhances the original ideas with automatic postmortem identification and notification of problems. Will be ready as part of [[Fitness| performance work for 6.8]].
+
This is a continuation of the [[FitnessViaPartnership | FitnessViaPartnership]] that enhances the original ideas with automatic postmortem identification and notification of problems. Done as part of [[Fitness| performance work for 6.8]].
* Infrastructure: [http://openide.netbeans.org/issues/show_bug.cgi?id=166023 IZ 166023]
* Infrastructure: [http://openide.netbeans.org/issues/show_bug.cgi?id=166023 IZ 166023]
Line 7: Line 7:
===How it works?===
===How it works?===
[[Image:notification_FitnessViaPostMortem.png]]
[[Image:notification_FitnessViaPostMortem.png]]
-
* Detects that AWT thread is blocked for more than 100ms
+
* Detects that AWT thread is blocked for more than certain time (100ms in dev builds, 10s in final production build)
* Starts self-sampling profiling
* Starts self-sampling profiling
* As soon as AWT thread is free:
* As soon as AWT thread is free:
:* stops the profiling
:* stops the profiling
-
:* if the elapsed time is more than [http://performance.netbeans.org/responsiveness/whatisresponsiveness.html appropriate], it displays a notification
+
:* if the elapsed time is more than [http://performance.netbeans.org/responsiveness/whatisresponsiveness.html appropriate] (3s in dev builds, 20s in final production build), it displays a notification
:* the notification stays on for a minute, then it disappears
:* the notification stays on for a minute, then it disappears
* What are the user's options?
* What are the user's options?
Line 19: Line 19:
===What our users shall do?===
===What our users shall do?===
 +
 +
====Enabling the slowness detector====
 +
 +
The slowness detector is enabled by default in development builds and beta builds. However, it is disabled by default in final releases in NetBeans, and in release candidates. To enable the slowness detector in the FCS and RC builds, it is necessary to enable assertions, i.e. run the IDE with the following switch:
 +
<pre>
 +
-J-ea
 +
</pre>
 +
Alternatively, the <code>-J-ea</code> parameter can be added to the <code>netbeans_default_options</code> property in <code>etc/netbeans.conf</code>.
====Not all slowness is a problem. ====
====Not all slowness is a problem. ====
Line 36: Line 44:
-J-Dorg.netbeans.core.TimeableEventQueue.quantum=1000
-J-Dorg.netbeans.core.TimeableEventQueue.quantum=1000
</pre>
</pre>
 +
 +
Since 6.9 development time there is also ''slowness detector'' on code completion. If it takes more than 2s to provide results, it starts self profiling and allows you to complain. You can adjust the time by specifying:
 +
 +
<pre>
 +
-J-Dorg.netbeans.modules.editor.completion.slowness.report=10000
 +
</pre>
 +
 +
(to be included in 7.0beta2).
====Real Slowness====
====Real Slowness====
Line 63: Line 79:
====Plan to Address Weird Behaviour====
====Plan to Address Weird Behaviour====
-
* [[Image:NB67PerfPlan/yes.png]] Have a meeting with profiler guys
+
* [[Image:yes.png]] Have a meeting with profiler guys
-
* [[Image:NB67PerfPlan/yes.png]] 'Jarda': Write the plan
+
* [[Image:yes.png]] 'Jarda': Write the plan
* [[NB67PerfPlan/yes.png | NB67PerfPlan/yes.png]] '''native wait''' - treat Object.wait and ''[[[native | [native]]'' state equally (e.g. zero execution time)
* [[NB67PerfPlan/yes.png | NB67PerfPlan/yes.png]] '''native wait''' - treat Object.wait and ''[[[native | [native]]'' state equally (e.g. zero execution time)
-
* [[Image:NB67PerfPlan/yes.png]] '''two times''' - output two times (execution time, wall clock time) into the .nps file
+
* [[Image:yes.png]] '''two times''' - output two times (execution time, wall clock time) into the .nps file
-
* [[Image:NB67PerfPlan/yes.png]] '''display two times''' - make sure the UI represents the two times in some columns
+
* [[Image:yes.png]] '''display two times''' - make sure the UI represents the two times in some columns
-
* [[Image:NB67PerfPlan/yes.png]] '''no sum for threads''' - the sum of all threads execution is incorrect (in sampling or on multicore CPUs), thus hide it
+
* [[Image:yes.png]] '''no sum for threads''' - the sum of all threads execution is incorrect (in sampling or on multicore CPUs), thus hide it
-
* [[Image:NB67PerfPlan/yes.png]] '''JMX''' - rewrite the sampler to JMX. This will open the door to possible use in VisualVM as well as provide more info about the real thread execution time
+
* [[Image:yes.png]] '''JMX''' - rewrite the sampler to JMX. This will open the door to possible use in VisualVM as well as provide more info about the real thread execution time
-
* [[Image:NB67PerfPlan/yes.png]] '''overhead''' is slightly less than 10% - ''Jarda'': measure overhead of sampling (I ran the sieve of erastothenes for first 100000 numbers. Without self profiling 4.1s+-100ms with profiling 4.3ms+-100ms)
+
* [[Image:yes.png]] '''overhead''' is slightly less than 10% - ''Jarda'': measure overhead of sampling (I ran the sieve of erastothenes for first 100000 numbers. Without self profiling 4.1s+-100ms with profiling 4.3ms+-100ms)
-
* [[Image:NB67PerfPlan/yes.png]] '''inaccuracy''': compute average snapshot time and ignore intervals that are far (more than 30ms) from average. Potentially don't report these snapshots at all as a slowdown.
+
* [[Image:yes.png]] '''inaccuracy''': compute average snapshot time and ignore intervals that are far (more than 30ms) from average. Potentially don't report these snapshots at all as a slowdown.
Tentatively:
Tentatively:
* '''collect wait info''': generate a subnode below Object.wait() or monitor enter(?) with a subtree of other threads activity during the blocked time
* '''collect wait info''': generate a subnode below Object.wait() or monitor enter(?) with a subtree of other threads activity during the blocked time
-
* '''format''': change the sampling format to be (more or less) GZip of stacktraces. Processing to tree structure would occur later.
+
* [[Image:yes.png]] '''format''': change the sampling format to be (more or less) GZip of stacktraces. Processing to tree structure would occur later. Done for NetBeans 6.9
{{attachments|
{{attachments|
* [[Media:slowness-detector_FitnessViaPostMortem.odp]]}}
* [[Media:slowness-detector_FitnessViaPostMortem.odp]]}}
 +
 +
[[Category:Performance]] [[Category:Performance:ToolsAndTests]]

Current revision as of 15:24, 6 April 2011

This is a continuation of the FitnessViaPartnership that enhances the original ideas with automatic postmortem identification and notification of problems. Done as part of performance work for 6.8.

  • Infrastructure: IZ 166023
  • Bugs reported with perf-profileme status board: list

How it works?

Image:notification_FitnessViaPostMortem.png

  • Detects that AWT thread is blocked for more than certain time (100ms in dev builds, 10s in final production build)
  • Starts self-sampling profiling
  • As soon as AWT thread is free:
  • stops the profiling
  • if the elapsed time is more than appropriate (3s in dev builds, 20s in final production build), it displays a notification
  • the notification stays on for a minute, then it disappears
  • What are the user's options?
  • ignore the icon (disappears after a minute)
  • submit a bug report via FitnessViaPostMortemAutoReport
  • explore the content themselves (view data/show profiler data and work in editor as the dialog is modal)

What our users shall do?

Enabling the slowness detector

The slowness detector is enabled by default in development builds and beta builds. However, it is disabled by default in final releases in NetBeans, and in release candidates. To enable the slowness detector in the FCS and RC builds, it is necessary to enable assertions, i.e. run the IDE with the following switch:

-J-ea

Alternatively, the -J-ea parameter can be added to the netbeans_default_options property in etc/netbeans.conf.

Not all slowness is a problem.

Sometimes it may be acceptable. For example if the system wakes up from hibernation, etc. In such case just ignore the report and close it.

I don't care about slowness

In case you are annoyed by the "slowness detector" you may disable it by providing additional properties when launching NetBeans. For example to report only blockages higher than 10s use:

-J-Dorg.netbeans.core.TimeableEventQueue.report=10000

You can also increase the time when the sampling starts:

-J-Dorg.netbeans.core.TimeableEventQueue.quantum=1000

Since 6.9 development time there is also slowness detector on code completion. If it takes more than 2s to provide results, it starts self profiling and allows you to complain. You can adjust the time by specifying:

-J-Dorg.netbeans.modules.editor.completion.slowness.report=10000

(to be included in 7.0beta2).

Real Slowness

If you really noticed the slowness (e.g. blocked mouse, or keyboard input), then you can submit it for evaluation to FitnessViaPostMortemAutoReport or you can evaluate the snapshot first and then submit it.

To evaluate the snapshot follow these steps: click on the slowness notification icon in bottom right corner, click view data and click show profiler data. Snapshot is open in editor area and you can work with it as above dialog is modal are modal.

Thanks in advance for your help!

Evaluating Snapshot

  1. Important things are happening in AWT-Event threads. Explore what is happening there.
  2. If the snapshot ends in Object.wait() or while waiting on some lock, explore other threads to see who's blocking the evaluation

Weird Behaviour

There are some things that can confuse those who evaluate the profiler snapshots taken by the slowness detector. Please be patient, the improvements are in the queue:

  • Sum of time of all threads is wrong - currently the time of all threads is summed up, which is wrong. We use sampling and we are measuring the real time, not CPU time. The sum for all threads shall be max of all thread times. Confusion among poor developers would be lowered.
  • Object.wait() interpretation is almost impossible - Just try to analyse dump from IZ 168536 and find out why the AWT thread is blocked. Quite hard and some simplification is needed.
  • Hotspots Tab Unusable - most developers start understanding of a slowness problem by looking at Hotspots tab of the snapshot. However this is currently unusable as it primarily shows CLI Handler and XAWT-Dispatch threads as the biggest bottleneck (they are waiting for socketAccept or other I/O). This results in large confusion, useless close/reopen bug lifecycle stage, etc.
  • Inaccurate Snapshots - the accuracy of the snapshot's data depends on regular execution of the time (e.g. each 10ms a thread dump shall be taken). This may not be satisfied all the time, especially if the system is being awaken from hibernation, etc. This needs to be detected and indicated.

We need help from profiler team to improve the snapshots themselves or simplify life of those who analyse them. Please, help our developers be more productive!

Plan to Address Weird Behaviour

  • Image:yes.png Have a meeting with profiler guys
  • Image:yes.png 'Jarda': Write the plan
  • NB67PerfPlan/yes.png native wait - treat Object.wait and [[[native | [native]] state equally (e.g. zero execution time)
  • Image:yes.png two times - output two times (execution time, wall clock time) into the .nps file
  • Image:yes.png display two times - make sure the UI represents the two times in some columns
  • Image:yes.png no sum for threads - the sum of all threads execution is incorrect (in sampling or on multicore CPUs), thus hide it
  • Image:yes.png JMX - rewrite the sampler to JMX. This will open the door to possible use in VisualVM as well as provide more info about the real thread execution time
  • Image:yes.png overhead is slightly less than 10% - Jarda: measure overhead of sampling (I ran the sieve of erastothenes for first 100000 numbers. Without self profiling 4.1s+-100ms with profiling 4.3ms+-100ms)
  • Image:yes.png inaccuracy: compute average snapshot time and ignore intervals that are far (more than 30ms) from average. Potentially don't report these snapshots at all as a slowdown.

Tentatively:

  • collect wait info: generate a subnode below Object.wait() or monitor enter(?) with a subtree of other threads activity during the blocked time
  • Image:yes.png format: change the sampling format to be (more or less) GZip of stacktraces. Processing to tree structure would occur later. Done for NetBeans 6.9
Not logged in. Log in, Register

By use of this website, you agree to the NetBeans Policies and Terms of Use. © 2012, Oracle Corporation and/or its affiliates. Sponsored by Oracle logo