Difference between full and partial artifact (replication package) #6

@apanichella

When reviewing or trying to replicate existing studies, I often run into problems with what some peers call a **full replication package**. This mostly affects ML and mining studies.

Let me take defect prediction as an example. Most studies provide a replication package that includes only a matrix of numbers (features) and labels. Often, there is no link whatsoever to the actual code, commit, or file.

To me, replicability in this case is partial (i.e., the artifact is partial), because we can replicate only half of the study: feeding the data to an ML model and checking whether the results match those in the paper. This also implies that we cannot check whether the matrix (and the labels in particular) matches the original source code data.
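To make the missing half concrete: if each row of the feature matrix carried its provenance (repository, commit, file path), a reviewer could re-extract the features from the original source and compare them with the shipped matrix. The sketch below is illustrative only. The `Row` format, `count_loc`, and `verify_rows` are hypothetical names, non-blank line count stands in for a real metric, and a dict stands in for checking out the repository at the given commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One row of a hypothetical defect-prediction data set, with provenance."""
    repo: str
    commit: str
    path: str
    loc: int         # feature value as shipped in the artifact
    defective: bool  # label as shipped in the artifact


def count_loc(source: str) -> int:
    """Stand-in metric (assumption): number of non-blank lines."""
    return sum(1 for line in source.splitlines() if line.strip())


def verify_rows(rows, snapshot):
    """Return the rows whose stored feature cannot be reproduced.

    `snapshot` maps (repo, commit, path) to the file content at that commit.
    Rows with no matching source file are reported as unverifiable.
    """
    mismatches = []
    for row in rows:
        source = snapshot.get((row.repo, row.commit, row.path))
        if source is None or count_loc(source) != row.loc:
            mismatches.append(row)
    return mismatches
```

Without the `repo`, `commit`, and `path` columns, which is the situation described above, no such check is possible. Verifying the labels would additionally require the bug-fix commits or issue links used to derive them, which this sketch does not model.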

A complete artifact should include:

  1. The link to the GitHub repo
  2. Feature extraction procedure (metrics)
  3. Data set building
  4. ML training and evaluation
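The four components above could be turned into a simple checklist for reviewers. The following is a minimal sketch under stated assumptions: `ReplicationPackage`, its field names, and `classify` are hypothetical, and a component counts as present if any non-empty value (a URL or a script name) is recorded for it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ReplicationPackage:
    """Hypothetical record of what a replication package ships."""
    repo_url: Optional[str] = None             # 1. link to the GitHub repository
    feature_extraction: Optional[str] = None   # 2. feature extraction procedure
    dataset_building: Optional[str] = None     # 3. data set building
    training_evaluation: Optional[str] = None  # 4. ML training and evaluation


# Maps each field to the component it represents, in the order listed above.
COMPONENTS = {
    "repo_url": "link to the GitHub repository",
    "feature_extraction": "feature extraction procedure (metrics)",
    "dataset_building": "data set building",
    "training_evaluation": "ML training and evaluation",
}


def classify(pkg: ReplicationPackage) -> Tuple[str, List[str]]:
    """Label a package 'complete' or 'partial' and list the missing components."""
    missing = [desc for field, desc in COMPONENTS.items() if not getattr(pkg, field)]
    return ("complete" if not missing else "partial", missing)
```

For example, a package that ships only the matrix and a training script, `ReplicationPackage(training_evaluation="train.py")`, is classified as partial, with the repository link, feature extraction, and data set building reported as missing.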

I used an ML-based application as the example here, but I believe that distinguishing between partial and complete artifacts is necessary in general, especially when working with open-source data.
