Distributed Clover coverage

Clark Wright July 8, 2015

We have a large, distributed nightly build system that currently works well, but generating Clover coverage is killing our network with traffic: we move almost 1 TB of data every night generating all the coverage reports. (Yes, 1 terabyte of network traffic.)

Currently, our config looks roughly like the default distributed config here: https://confluence.atlassian.com/display/CLOVER031/Working+with+Distributed+Applications

One machine compiles, many run tests, a final machine produces the reports.

Aside from network bandwidth, it works fine. 

But that bandwidth is killing us.

In trying different approaches, we have found that working locally and then moving the artifacts to the network once all the tests are done on that particular machine reduces a lot of the bandwidth.

Problem is, the reports are then showing no coverage for any of the Java files. The Grails components, however, are showing correct and expected coverage.

Any ideas?

What additional information should I provide?

 

13 answers

0 votes
Marek Parfianowicz
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
July 20, 2015

If I set clover.initstring and clover.logging.level as environment variables, will they get picked up correctly?

Setting it as a normal environment variable (e.g. "export clover.logging.level=debug") won't work. Clover uses System.getProperty(String) and not System.getenv(String). However, Java has a dedicated environment variable named JAVA_TOOL_OPTIONS which can be used to pass arbitrary arguments for the 'java' process.

Thus, setting it as follows:

export JAVA_TOOL_OPTIONS=-Dclover.logging.level=debug

should work.

See http://www.oracle.com/technetwork/java/javase/envvars-138887.html for more details.
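To pass both properties discussed in this thread at once, something like the following sketch might work. The database path is a placeholder (an assumption, not taken from your setup), and the value is quoted because the two -D options are separated by a space:

```shell
# Hypothetical path; substitute the location of your own Clover database.
CLOVER_DB="/path/to/clover/clover.db"

# Quote the whole value so both -D options end up in the one variable.
export JAVA_TOOL_OPTIONS="-Dclover.initstring=${CLOVER_DB} -Dclover.logging.level=debug"

# Show what JVMs started from this shell will receive.
echo "JAVA_TOOL_OPTIONS=${JAVA_TOOL_OPTIONS}"
```

Any JVM launched from that shell should pick the options up, and HotSpot JVMs normally print a "Picked up JAVA_TOOL_OPTIONS" notice at startup, which is a quick way to confirm the variable reached the process.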

0 votes
Clark Wright July 20, 2015

Hmmm, we may be hitting an issue with apps not getting the properties correctly from the container.

The Clover docs are currently down. If I set clover.initstring and clover.debug.level as environment variables, will they get picked up correctly?

There is exactly one Clover database, and we are using whatever the default coverage recorder settings are.

Thanks,

-Clark.

0 votes
Marek Parfianowicz
Atlassian Team
July 20, 2015

Hi Clark,

1) Are you using "fixed" or "growable" coverage recorder?

 

2) I suspect that the -Dclover.initstring flag was not passed correctly to the JVM. Could you please run your application with the following flag: -Dclover.logging.level=debug - you should then see messages related to resolving the path to clover.db, like the following:

"[creating new recorder for <some key>]"

followed by:

"overriding initstring: <path>" or

"overriding initstring basedir: <path>" or

"prepending initstring prefix: <path>" or

"Failed to retrieve Clover properties " - this appears when System.getProperty() failed

 

3) Do you have only one Clover database? Or more precisely: does one JVM run one application using one Clover database? I'm asking because, if your system uses many databases, you may actually need to use -Dclover.initstring.basedir=/path/to/clover/databases/root/directory. See https://confluence.atlassian.com/display/CLOVER/Using+Clover+in+various+environment+configurations for various deployment options.

 

4) Do you run your applications in a restricted security environment? If yes, then you may need to grant a few permissions to Clover. See https://confluence.atlassian.com/display/CLOVER/Working+with+Restricted+Security+Environments for more details.

 

Cheers
Marek

0 votes
Clark Wright July 20, 2015

Hi Marek,

 

That dirset should do the trick, thank you very much.

When we start up applications (web or daemons) that were compiled with Clover, we are seeing complaints about getting to the clover.db. Passing in clover.initstring as a JVM argument did in fact resolve that issue.

Now, however, we are seeing this message:

[echo] Configuring FTP directories

 [java] ERROR: java.io.FileNotFoundException flushing coverage for recorder /usr/local/litle-home/int13/output/clover/clover.dbdawhmu_icc5ag1m: /usr/local/litle-home/int13/output/clover/clover.dbdawhmu_icc5ag1m (Permission denied)

int13 was the account the source was compiled in. In this instance it is running in int11, with this passed to the JVM: -Dclover.initstring=/usr/local/litle-home/int11/output/clover/clover.db

Any suggestions?  

Thanks,

-Clark.

0 votes
Marek Parfianowicz
Atlassian Team
July 17, 2015

Hi Clark,

Unfortunately, you have to list all source roots, i.e. your second example is correct.

However, if your source roots follow a consistent naming pattern, you can use a <dirset> element with wildcards inside the <sourcepath>. For example:

<clover-report initstring="${clover.db}">
    <current outfile="clover-html">
        <format type="html"/>
        <sourcepath>
            <dirset dir=".">
                <include name="**/src/main/java"/>
                <include name="**/src/main/groovy"/>
                <include name="**/src/test/java"/>
                <include name="**/src/test/groovy"/>
            </dirset>
        </sourcepath>
    </current>
</clover-report>

 

Cheers
Marek

0 votes
Clark Wright July 17, 2015

Question for you about sourcepath in the clover-report Ant tag.

How smart is it?

The doc simply says:

 

<sourcepath>
Specifies an Ant path that Clover should use when looking for source files.

This is a multi-module project, and we generate the reports on a different machine/account than the one the tests are run on. If I have file paths at compile time that look like:

/usr/local/cwright/cvs/feVap/src/main/java/com/vantiv/services/merchant/payments/v2/foo.java

/usr/local/cwright/cvs/core/src/main/java/com/vantiv/services/bar.java

/usr/local/cwright/cvs/UI/src/groovy/com/vantiv/ui/page.groovy

But at report time they look like:

/usr/local/build/cvs/feVap/src/main/java/com/vantiv/services/merchant/payments/v2/foo.java

/usr/local/build/cvs/core/src/main/java/com/vantiv/services/bar.java

/usr/local/build/cvs/UI/src/groovy/com/vantiv/ui/page.groovy
The difference being the username.
Would this sourcepath be sufficient?
<sourcepath>
<pathelement path="/usr/local/build/cvs/"/>
</sourcepath>

 

or something more along the lines of:

 

<sourcepath>
<pathelement path="/usr/local/build/cvs/feVap/main/java; /usr/local/build/cvs/core/src/main/java; /usr/local/cwright/cvs/UI/src/groovy"/>
</sourcepath>

 

0 votes
Marek Parfianowicz
Atlassian Team
July 9, 2015

Performance concerns with creating the HTML reports.  Given the size of the system, it can take a couple of hours to create.

While generating a report, Clover in most cases spends the majority of its time merging per-test coverage recording files. Thus, in order to speed this up, you can:

a) ensure that you don't keep old, outdated coverage recording files from previous builds

b) reduce number of recording files produced by Clover

c) reduce size of per-test coverage files

d) do not report per-test coverage

 

ad a) This is usually simple, just run a clean build. ;) But more seriously, in complex setups there's always a possibility that some artifacts are not passed correctly, a workspace is not cleaned, jobs have superfluous dependencies etc. It may be worth checking.

ad b) Reducing this is tricky. In general, Clover creates a dedicated coverage recorder for every set of classes being instrumented in one call - such as <clover-instr>, <clover-setup>+<javac> or <clover-setup>+<groovyc> in case of Ant, every maven-compiler-plugin invocation in case of Maven, every Grails module etc. This can be a problem especially in Grails projects, because every domain class and every controller is compiled by Grails separately, resulting in a vast number of coverage recorders. For this reason, a dedicated "SHARED" recorder has been prepared for Grails projects - see https://confluence.atlassian.com/display/CLOVER/Coverage+Recorders for more details.

ad c) It's possible to reduce the size of these files by changing instrumentation granularity from "statement" to "method" - the reduction is about 80%. Please keep in mind, however, that this comes at the cost of lower coverage precision - an entire method will be shown as covered even if only one statement was executed. Use cases where I see this approach as applicable are as follows:

  • to use the method-level coverage as an upper-bound estimate; when more detailed data is needed, a developer could, for instance, run a report locally on their own machine
  • to use the method-level coverage for test optimization

ad d) Either don't record per-test coverage recording files (-Dclover.pertest.coverage=off) or delete these files before reporting.
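For point a), one possible way to drop recording files left over from earlier builds is to compare them against a marker file touched at build start. The following is only a sketch under assumptions: the directory and the marker file name are hypothetical, and it assumes recording files sit next to clover.db and are named clover.db plus a suffix (as in the clover.dbdawhmu_icc5ag1m file mentioned elsewhere in this thread). Timestamps are set explicitly here only to simulate an old and a new build.

```shell
# Hypothetical coverage directory; a temp dir is used so the sketch runs as-is.
COVERAGE_DIR="$(mktemp -d)"
MARKER="$COVERAGE_DIR/.build-start"

# Simulated state: a recording left over from an earlier build, the marker,
# the database itself, and a recording written by the current build.
touch -t 201507080000 "$COVERAGE_DIR/clover.dbold_aaaa"
touch -t 201507090000 "$MARKER"   # in a real build: touch "$MARKER" when the build starts
touch "$COVERAGE_DIR/clover.db"
touch "$COVERAGE_DIR/clover.dbnew_bbbb"

# Remove recordings that are not newer than the marker. The pattern
# 'clover.db?*' needs at least one character after "clover.db", so the
# database file itself is never matched.
find "$COVERAGE_DIR" -type f -name 'clover.db?*' ! -newer "$MARKER" -exec rm -f {} +

ls "$COVERAGE_DIR"
```

If the build always starts from an empty workspace, simply deleting the whole coverage directory before compilation is simpler and achieves the same thing.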

 

 

Feel free to ask if you have more questions.

Cheers
Marek

0 votes
Marek Parfianowicz
Atlassian Team
July 9, 2015

Source files not found during creation of the html report.  Because this is an extremely distributed build and test system, the box/user that creates the reports is not the same as the one that compiles.  And the location of the source tree may be different.   The path to the source file is in the clover.db file, we are looking for ways either to:

  1. update those references to the current location 
  2. how to get the file name in the clover.db to be relative, not absolute. 
  3. use carefully crafted symbolic links to just side step the issue entirely.

During report generation you can point to the location of the source files; see the <clover-report> / <current> / <sourcepath> tag. Again, this is for Ant, but the same can be achieved in Maven (by using a report descriptor) and in Grails (by using a reporttask).

It's also possible to generate an HTML report without sources - in that case the report will contain coverage statistics for packages, files and classes only (as well as test results) - see <clover-report> / <current> / <format srcLevel="false">.

0 votes
Marek Parfianowicz
Atlassian Team
July 9, 2015

Not all the coverage is being reported.  This may actually be an internal network error caused by unannounced patching.

There may be several reasons for missing coverage data. For example:

  • source code might not have been instrumented by Clover (e.g. wrong inclusion/exclusion pattern, clover2:setup called after 'compile' and not before etc)
  • normal classes instead of the instrumented ones were executed during tests (e.g. a normal WAR instead of the instrumented file was deployed for testing, JAR contains non-instrumented classes)
  • clover.db could not be found during test execution (e.g. not copied to the machine, copied to a wrong location, lack or wrong -Dclover.initstring)
  • an inappropriate flush policy was used (e.g. the default one writes coverage data in a JVM shutdown hook, which does not help if your JVM does not stop - for example when an application server runs continuously)
  • coverage recording files were not copied to a reporting machine or copied to a wrong folder (as they must be present in the same directory where clover.db is located)

Please have a look at https://confluence.atlassian.com/display/CLOVERKB/Troubleshooting+Reports or feel free to raise a ticket at http://support.atlassian.com.

0 votes
Marek Parfianowicz
Atlassian Team
July 9, 2015

For managing the clover db, we have a 3 tb NFS mount backed by the corporate SAN. Currently all the work is done directly to the NFS mount, we are investigating more efficient approaches.

Do you need to keep all clover.db files and coverage recordings in one place? I'm asking about your requirements regarding management of coverage data. For instance:

  • do you need to create one central Clover report for all of your 80 development streams
  • or does each of them have its own report?

If each stream has its own Clover report, you could for example:

  • dedicate build agents to builds with Clover, so that compilation, testing and reporting are performed on the same agent using its local hard drive (no NFS)

If this is not an option, then please check whether tar-ing and copying coverage files would be faster:

  • Clover's coverage recording files are already compressed, however they are usually very small and there are thousands of them
  • thus, theoretically it should be faster to write coverage recording files on agent's local hard drive, tar all of them and send to your central SAN storage as one file
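A minimal sketch of that tar-and-copy idea follows. All directory names and the archive name are hypothetical, and temporary directories stand in for the agent's local disk, the shared mount and the reporting machine so that it can be tried as-is. No compression flag is passed to tar, since the recording files are already compressed.

```shell
# Hypothetical locations; temp dirs stand in for the real paths.
LOCAL_DIR="$(mktemp -d)"    # agent's local coverage directory
SHARED_DIR="$(mktemp -d)"   # the NFS/SAN mount
REPORT_DIR="$(mktemp -d)"   # directory used by the reporting machine
WORK_DIR="$(mktemp -d)"     # scratch space for building the archive

# Simulated Clover database and coverage recording files.
touch "$LOCAL_DIR/clover.db" \
      "$LOCAL_DIR/clover.dbdawhmu_icc5ag1m" \
      "$LOCAL_DIR/clover.dbdawhmu_icc5ag1n"

# On the test agent, once all tests are done: pack everything into one
# archive on local disk, then move that single file over the network.
ARCHIVE="coverage-agent01.tar"
tar -cf "$WORK_DIR/$ARCHIVE" -C "$LOCAL_DIR" .
cp "$WORK_DIR/$ARCHIVE" "$SHARED_DIR/"

# On the reporting machine: unpack so that the recordings end up in the
# same directory as clover.db.
tar -xf "$SHARED_DIR/$ARCHIVE" -C "$REPORT_DIR"
ls "$REPORT_DIR"
```

The point of the design is that the shared storage sees one large sequential write per agent instead of thousands of small file operations.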
0 votes
Marek Parfianowicz
Atlassian Team
July 9, 2015

Hi Clark,

We have 80 active development streams being fully built and tested every night. The unit test count per stream is on the order of 6k test classes and 42k test methods. It's a big system.

Do you need to see per-test coverage in HTML reports? By per-test coverage I mean that you can see which tests hit a given source line (or file) as well as which classes were executed by a given test. If not, you can disable per-test coverage recording at runtime by providing the -Dclover.pertest.coverage=off property. As a result, the number of coverage recording files produced will drop dramatically (as a rule of thumb, by about 90%), because only global coverage will be recorded. Note that Clover will then also not record test results; this is not a problem, however, because you can provide an external source of test results, namely XML test result files in JUnit-compatible format. See:

 

0 votes
Clark Wright July 9, 2015

So we are currently using:

Java 7

Grails 1.3.7

Ant 1.7

Clover 3.1.12

There is an effort underway to update us to the latest versions.

We are not using the distributed coverage feature. The topology of the system being tested doesn't lend itself to that model (as far as we can tell).

We have 80 active development streams being fully built and tested every night. The unit test count per stream is on the order of 6k test classes and 42k test methods. It's a big system.

For managing the clover db, we have a 3 TB NFS mount backed by the corporate SAN. Currently all the work is done directly on the NFS mount; we are investigating more efficient approaches. (Shockingly, what works in a small-scale dev environment may not scale successfully...)

In solving this, we are running into several issues:

  1. Not all the coverage is being reported.  This may actually be an internal network error caused by unannounced patching. 

  2. Source files not found during creation of the html report.  Because this is an extremely distributed build and test system, the box/user that creates the reports is not the same as the one that compiles.  And the location of the source tree may be different.   The path to the source file is in the clover.db file, we are looking for ways either to:
    1. update those references to the current location 
    2. how to get the file name in the clover.db to be relative, not absolute. 
    3. use carefully crafted symbolic links to just side step the issue entirely.

  3. Performance concerns with creating the HTML reports.  Given the size of the system, it can take a couple of hours to create.

Using a shared recorder is on our list of things to try; I will certainly expedite that.

Thank you,

-Clark.

0 votes
Marek Parfianowicz
Atlassian Team
July 8, 2015

Hi Clark,

Currently, our config looks roughly like the default distributed config here: https://confluence.atlassian.com/display/CLOVER031/Working+with+Distributed+Applications

Please note that the CLOVER031 space is for Clover 3.1.x. If you use the latest version of Clover, please have a look at https://confluence.atlassian.com/display/CLOVER/Working+with+Distributed+Applications - the diagram there is slightly different (more accurate).

Yes, 1 Terabyte of network traffic

Indeed, this is really a lot. Could you please tell me how you manage Clover databases and coverage recording files?

  • do you use a network shared drive or
  • do you copy database and recordings between machines?

Could you also tell me how many coverage recording files you have in total?

How many projects / modules / Clover databases do you have?

In trying different approaches, we have found that working locally and then moving the artifacts to the network once all the tests are done on that particular machine reduces a lot of the bandwidth

Could you please explain what you mean by artifacts? JARs containing instrumented code? / Clover database? / recording files?

Problem is, the reports are then showing no coverage for any of the java files.

What tool do you use to compile your Java files? Ant / Maven / Grails?

Do you see any recording files generated for your Java code on your test machine(s)? If yes, then it may be that you copy the recording files to the wrong location on the reporting machine.

What additional information should I provide?

What Clover / Java / Grails / Ant / Maven versions do you use?

Do you build and run tests in a homogeneous environment (i.e. all build agents are the same)?

Are you using Clover's "Distributed Coverage" feature?

Any ideas?

The first things that come to my mind are as follows:

  • the enormous network traffic may be caused by use of a shared network drive (please confirm); if so, then copying recording files from the test machine(s) to the reporting machine should improve this
  • the enormous network traffic may also be caused by a large number of coverage recording files, since Grails projects instrumented by Clover may produce lots of them; changing the default coverage recorder type to a "shared" one may improve this
  • this traffic may also be caused by the Distributed Coverage feature (although I would not expect this to be the main cause); you may consider disabling it, in case you use it, but the drawback is that you will lose per-test coverage information in reports (which is usually very useful)

I will share more details with you as soon as I know more about your environment.

 

Cheers

Marek
