assert_replication/ReplicationSteps.txt at master · caseycas/assert_replication · GitHub

Replication Steps:
In order to replicate the data, you will need to set the file paths and the SQL schema names to be appropriate for your context.

In order to run all the scripts you need to have installed:

The commands 'rar' and 'unrar'.
Java 1.7 - (Required Libraries included under Resources)
	- I recommend using Eclipse to run the files.
PostgreSQL
R - libraries:
	pscl, car, lsr, sqldf, xtable, stargazer
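One way to confirm the command-line prerequisites listed above before starting is a small shell check.  This is only a sketch: the helper name missing_tools is hypothetical, and the executable names java, psql and Rscript are assumptions about how Java, PostgreSQL and R appear on your PATH.  It does not check the R libraries.

```shell
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example (executable names other than rar/unrar are assumptions):
#   missing_tools rar unrar java psql Rscript
```

If the example call prints nothing, all five commands were found.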

Also, there are some spreadsheets saved in .ods format that are not necessary
to replicate the results.  To view them, you will need OpenOffice or another
program capable of opening .ods files.

Your computer should also have at least 8GB in main memory to run all the
scripts on the processed data.  If you are rerunning the gitcproc tool,
you will likely need at least 16 GB (processing the linux log in particular
may take up a lot of memory).  For further instructions on gitcproc, see the
READMEs for that project.


1. Generate Raw Data from logs
	- The raw git logs are too large to store on GitHub.  If you want to replicate from the logs, please contact me at ccasal@ucdavis.edu or caseycasal@gmail.com and we can work out a way to transfer them to
	you.
	- The tool to parse the logs is located at github.com/caseycas/gitcproc.
	- Running instructions for this tool are included in that repository.

OR

1. Copy the first round processed data from the rar files.  The tables
change_summary and method_change_detail are the only two that are produced
by the gitcproc tool.  Instructions on how to use the sql commands
to create the other tables are located in the README.md files in the
respective sql_scripts folders.
	- Otherwise, to load in the data, do the following:
		a) In your postgres database, create a schema called 'assert_replication'
		b) Run the unpackRarFiles.sh script to decompress the csv files ('sh unpackRarFiles.sh').
		c) Run each of the sql scripts in src/sql_scripts/TableCreationAndPopulation/TableCreation.
		Note that at the bottom of each script there is a section that looks
		like:
		----------------------------------------------------------------------
		ALTER TABLE assert_replication.<table_name>
  			OWNER TO <owner_name>;
		GRANT ALL ON TABLE assert_replication.<table_name> TO <owner_name>;
		----------------------------------------------------------------------
		You will need to alter <owner_name> to whatever your postgres username
		is.
			Then run with "psql <database_name> < <sql_script>"
		d) Run the copy_csv_to_tables.sql script located under src/sql_scripts/TableCreationAndPopulation/TablePopulation to copy the csv files into your newly created
		tables.
			E.g. psql <database_name> < <sql_script>
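Steps c) and d) above could be scripted roughly as follows.  This is a sketch under assumptions, not part of the repository: the helper name set_owner is hypothetical, the creation scripts are assumed to end in .sql, and <old_owner> stands for whatever role name currently appears at the bottom of the scripts.

```shell
# Hypothetical helper: print a table-creation script with the postgres role
# in its "OWNER TO" and "GRANT ... TO" lines changed from OLD to NEW.
# Usage: set_owner OLD NEW FILE
# Assumes both role names are plain identifiers (no spaces or sed metacharacters).
set_owner() {
    sed "s/ TO $1;/ TO $2;/g" "$3"
}

# Example: run every creation script with the owner rewritten on the fly
# (the *.sql glob is an assumption about the file names).
#   for f in src/sql_scripts/TableCreationAndPopulation/TableCreation/*.sql; do
#       set_owner <old_owner> <your_username> "$f" | psql <database_name>
#   done
```

Piping the rewritten script into psql leaves the files in the repository unchanged.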

2. To reproduce the Summary table from these files, run the run_all.sh shell script ('sh run_all.sh') in src/sql_scripts/SummaryTableSqlScripts.
This script will produce the Summary table's values from left to right, top to bottom.  There are more details in the README file in this directory.


3. Run the Java programs against the SQL database to create the relevant csv files. There are two ways to do
this.  The first is to use the two runnable jar files in the bin/ directory, BuildMethodBugTable.jar and
BuildMethodUserTable.jar.  The first jar produces the table used in RQ #1, the second the tables used in
RQ #2.
To run them:
java -Xmx4096m -Xms512m -jar BuildMethodBugTable.jar <jdbc database connection> <username> [password]
java -Xmx4096m -Xms512m -jar BuildMethodUserTable.jar <jdbc database connection> <username> [password]
For example, the three arguments could be: jdbc:postgresql://localhost:5432/caseycas caseycas password
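The two jar invocations above can be wrapped in a small shell function so the memory flags are not retyped.  This is a sketch only: the helper name run_jar and the DRY_RUN variable are hypothetical, while the java flags and argument order are the ones given above.

```shell
# Hypothetical helper: run one of the jars with the memory settings from the text.
# Usage: run_jar JAR JDBC_URL USERNAME [PASSWORD]
# Set DRY_RUN=1 to print the command instead of executing it.
run_jar() {
    jar=$1; url=$2; user=$3; password=${4-}
    set -- java -Xmx4096m -Xms512m -jar "$jar" "$url" "$user"
    # Append the password only when one was supplied.
    if [ -n "$password" ]; then
        set -- "$@" "$password"
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example, run from the bin/ directory:
#   run_jar BuildMethodBugTable.jar jdbc:postgresql://localhost:5432/caseycas caseycas
#   run_jar BuildMethodUserTable.jar jdbc:postgresql://localhost:5432/caseycas caseycas
```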

Alternatively, you can load the source files into Eclipse, link the libraries, and run them from there.
	a) To load in the source files:
		- Create a new Eclipse project.
		- Import the Java files under the src/Java folder into the project
		(Import File System -> select src/Java).

	b) To link in the supporting libraries:
		- Right click on the project -> select Build Path -> select Configure Build Path.
		- Select the Libraries tab and click Add External JARs.
		- Navigate to the Resources folder, add the JARs there, then click OK.

	c) To create the table used for RQ #1, run BuildMethodBugTable.java
		- To set up the necessary VM and command line arguments, select
		Run -> Run Configurations.
		- Go to the tab "Arguments"
		- Under "Program Arguments", input your database connection string,
		your username, and your password.  If you have no password to the
		database, leave the last argument blank.  For example, a default
		account into a locally hosted postgres database named "caseycas" would
		use the arguments: jdbc:postgresql://localhost:5432/caseycas postgres
		- The default VM memory is not large enough to process the data.
		Under "VM arguments", put "-Xmx4096m -Xms512m".
	d) To create the tables used for RQ #2, run BuildMethodUserTable.java
		- The command line and VM arguments are the same as in part c).


4. Run the R scripts on the csv files to reproduce the results used in the
replication.  The csv files can be recreated by following Step 3, or you can
use the files already included in the data folder.
If you've recreated the processed data files by following Step 3, copy them from your Eclipse project directory into data/Csvs/R_Inputs.
Also, before you run the R scripts, set the current working directory in R
to the top level of this git repository.
	- E.g. setwd("/Users/caseycas/caseycas___assert_replication")

RQ1 - bug_model.R
	Input: method_assert_everything_before_ICSE_no_merge_expanded.csv
	This table contains a record of changes to functions in our logs from before the ICSE deadline.
	It ignores merge commits.

RQ2 - plots.R
	Inputs: MethodAssertInfoNoMerge.csv
			MethodUserAssertInfoNoMerge.csv
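Before starting R, it may help to confirm that the three input files named above are in place.  The sketch below assumes all three live in data/Csvs/R_Inputs (inferred from Step 4, not stated for each file); the helper name check_r_inputs is hypothetical.

```shell
# Hypothetical helper: report which R input files are missing from a directory.
# Usage: check_r_inputs [DIR]   (DIR defaults to data/Csvs/R_Inputs, an assumption)
# Returns 0 if all three files are present, 1 otherwise.
check_r_inputs() {
    dir=${1:-data/Csvs/R_Inputs}
    missing=0
    for f in \
        method_assert_everything_before_ICSE_no_merge_expanded.csv \
        MethodAssertInfoNoMerge.csv \
        MethodUserAssertInfoNoMerge.csv
    do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Example, run from the top level of the repository:
#   check_r_inputs && echo "all R inputs found"
```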

Final Note on Accuracy Studies:
The files nonFunctionStudySample.ods, FunctionSampleRep.ods and
bug_accuracy_study.ods contain the randomly sampled change chunks or commits
used to estimate the bug false positives and the number of false positives and
negatives in function name and assert identification from the version of
gitcproc used to create these datasets.  The shas and chunk names are included
along with an evaluation, Y (yes) or N (no), of whether the tool labelled each
chunk correctly.  The criteria for calling a chunk correct are listed in the
paper.