This is a script for automatic instruction sequence extraction from CFGs generated by Ghidra.
Create the venv with python3 -m venv .venv, then activate it.
Because the script uses pyghidra, the environment variable 'GHIDRA_INSTALL_DIR' must be set to the path of your Ghidra installation folder.
# On Windows
> .\.venv\Scripts\activate.bat
# On Linux
$ source ./.venv/bin/activate
Finally, install the requirements and run the script:
pip install -r requirements.txt
python3 calls.py
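Before running the script, it can help to confirm that GHIDRA_INSTALL_DIR points at a real directory. The following is a minimal sketch and not part of calls.py; the helper name resolve_ghidra_install_dir is hypothetical, and it only checks that the path exists, not that it holds a valid Ghidra installation.

```python
import os
from pathlib import Path


def resolve_ghidra_install_dir(env=None):
    """Return the Ghidra install path from the environment, or raise.

    Hypothetical helper: it only verifies that the variable is set and
    that the path is an existing directory.
    """
    env = os.environ if env is None else env
    value = env.get("GHIDRA_INSTALL_DIR")
    if not value:
        raise RuntimeError("GHIDRA_INSTALL_DIR is not set")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"GHIDRA_INSTALL_DIR is not a directory: {path}")
    return path
```

Calling it at the top of a script, before pyghidra is started, would give an earlier and clearer error when the variable is missing.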
The following packages need to be installed:
- mingw
sudo apt install mingw-w64
- wine (optional, only needed if you also want to run the PE on Linux)
sudo apt install wine
The directory looks like this:
.
├── calls.py
├── graphs
│   ├── a86bc.gf
│   ├── da4a0.dot.gf
│   └── dacb9.dot.gf
├── README
└── symbols
    ├── a86bc_data.csv
    ├── a86bc_symbols.csv
    ├── da4a0_data.csv
    ├── da4a0_symbols.csv
    ├── dacb9_data.csv
    └── dacb9_symbols.csv
The graphs directory holds the DOT graph files, and the symbols directory holds the data and symbol tables. All of them were exported manually from Ghidra.
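Assuming the layout above, the three input files of each sample can be matched by their shared prefix. This is a sketch only; collect_samples is a hypothetical helper and not necessarily how calls.py locates its inputs.

```python
from pathlib import Path


def collect_samples(root):
    """Map each sample prefix to its graph file and its two CSV tables.

    Assumes graphs/<prefix>[.dot].gf plus symbols/<prefix>_data.csv and
    symbols/<prefix>_symbols.csv. Samples missing a file are skipped.
    """
    root = Path(root)
    samples = {}
    for graph in sorted((root / "graphs").glob("*.gf")):
        prefix = graph.name.split(".", 1)[0]  # "da4a0.dot.gf" -> "da4a0"
        data = root / "symbols" / f"{prefix}_data.csv"
        symbols = root / "symbols" / f"{prefix}_symbols.csv"
        if data.is_file() and symbols.is_file():
            samples[prefix] = {"graph": graph, "data": data, "symbols": symbols}
    return samples
```

For the tree shown, collect_samples(".") would be expected to return entries for a86bc, da4a0 and dacb9.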
- How to convert a dot file to png?
dot <graph.dot> -Tpng > <graph.png>
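The same conversion can be driven from Python when several graphs need rendering. The sketch below assumes Graphviz's dot is on PATH; the function names are hypothetical, and it uses dot's -o option instead of shell redirection.

```python
import subprocess
from pathlib import Path


def dot_to_png_command(dot_file, png_file=None):
    """Build the argv for Graphviz `dot` that renders dot_file as PNG."""
    dot_file = Path(dot_file)
    # Default output: same name with the last suffix replaced by .png
    png_file = Path(png_file) if png_file else dot_file.with_suffix(".png")
    return ["dot", "-Tpng", str(dot_file), "-o", str(png_file)]


def dot_to_png(dot_file, png_file=None):
    """Run Graphviz; raises CalledProcessError if `dot` fails."""
    subprocess.run(dot_to_png_command(dot_file, png_file), check=True)
```

Keeping the command builder separate from the call makes it easy to print the command before running it.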
Print the symbol table using the API:
# True also includes dynamic symbols
st = currentProgram.symbolTable.getAllSymbols(True)
for i in st:
    print('\n-----\nsymbol:' + str(i) + '\naddress:' + str(i.getAddress()))
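Instead of printing, the same iteration could write a CSV file. This is a sketch under assumptions: it targets Python 3 (as used with pyghidra), write_symbols_csv is a hypothetical helper, and the column names are placeholders, not the headers that Ghidra's own CSV export produces.

```python
import csv


def write_symbols_csv(symbols, out_path):
    """Write one row per symbol with its name and address.

    `symbols` is any iterable of objects exposing getName() and
    getAddress(), such as the result of getAllSymbols(True).
    Returns the number of rows written.
    """
    count = 0
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["name", "address"])  # placeholder column names
        for symbol in symbols:
            writer.writerow([symbol.getName(), str(symbol.getAddress())])
            count += 1
    return count
```

Inside a Ghidra session it might be called as write_symbols_csv(currentProgram.symbolTable.getAllSymbols(True), "symbols.csv").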
How to manually generate the input files:
To generate a "*.gf" file:
<focus the Listing panel / Window -> Listing>
CTRL + A
Graph -> Graph Output -> Graph Export (check)
Graph -> Code Flow -> <Format: DOT> OK
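Once a graph has been exported in DOT format, its edges can be read back with a few lines of Python. The exact text Ghidra writes is not shown here, so this sketch assumes plain DOT syntax with one edge statement per line; read_edges is a hypothetical helper, and a proper DOT parser would be more robust.

```python
import re

# Matches lines such as:  "00401000" -> "00401010" [ ... ];
# Quotes around node ids are optional.
EDGE_RE = re.compile(r'^\s*"?([^"\s]+)"?\s*->\s*"?([^"\s;\[]+)"?')


def read_edges(dot_text):
    """Return (source, target) pairs for every edge line in dot_text."""
    edges = []
    for line in dot_text.splitlines():
        match = EDGE_RE.match(line)
        if match:
            edges.append((match.group(1), match.group(2)))
    return edges
```

Node declarations and attribute lines are ignored because they contain no "->" outside a quoted label.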
Header column we work with (example):
To generate a "_data.csv" file:
<focus the Defined Data panel / Window -> Defined Data>
CTRL + A
<Right click> -> Export -> Export to CSV...
OK
To generate a "_symbols.csv" file:
Window -> Symbol Table / CTRL + T
<Right click on table header> -> Add/Remove Columns... -> <tick "Function Name" checkbox>
CTRL + A
<Right click> -> Export -> Export to CSV...
OK
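The exported symbol table can then be loaded with the standard csv module. The sketch below relies only on the "Function Name" column enabled in the steps above; symbols_by_function is a hypothetical helper, and any other column is passed through untouched.

```python
import csv


def symbols_by_function(csv_path):
    """Group exported symbol rows by their "Function Name" column.

    Rows with an empty or missing function name are grouped under "".
    """
    groups = {}
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            function = row.get("Function Name") or ""
            groups.setdefault(function, []).append(row)
    return groups
```

Each value is a list of dictionaries keyed by the CSV header, so the remaining columns stay available to the caller.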
pm = currentProgram.treeManager.getRootModule(currentProgram.treeManager.DEFAULT_TREE_NAME)
currentProgram.symbolTable.externalSymbols
currentProgram.symbolTable.getAllSymbols(0)
- document how "_data.csv", "_symbols.csv" and "*.gf" are generated (done by "How to manually generate the input files")
- find a way to programmatically generate "*.gf" files (related issue)
- find a way to programmatically generate "_data.csv" and "_symbols.csv" files (done by the "print symbol table using api" section)
- rewrite the section of the code that generates the sequences and encapsulate it in a function; document what information that function uses from the dot files (we may gather that data from the API) (half done)
- need review for the bug documented on the bug_in_read_csv_samples branch, or at least I need the executable samples for the provided symbols/graphs
- improve the encoding: it should include the registers used by mnemonics; we should use dictionaries of mnemonics, system calls, and registers
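The dictionary-based encoding mentioned in the last item could look roughly like this. It is a sketch of one possible design, not the encoding calls.py currently uses; Vocabulary and encode_instruction are hypothetical names, and system calls could get a third dictionary in the same way.

```python
class Vocabulary:
    """Assigns a stable integer id to each distinct token."""

    def __init__(self):
        self.ids = {}

    def encode(self, token):
        # New tokens get the next free id; known tokens keep theirs.
        return self.ids.setdefault(token, len(self.ids))


def encode_instruction(mnemonic, registers, mnemonics, register_vocab):
    """Encode one instruction as (mnemonic id, tuple of register ids).

    Tokens are upper-cased so "mov" and "MOV" share one id.
    """
    return (
        mnemonics.encode(mnemonic.upper()),
        tuple(register_vocab.encode(r.upper()) for r in registers),
    )
```

Separate dictionaries keep mnemonic ids and register ids in independent ranges, so a sequence model could embed them separately.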