There are a bunch of files for which it reports "No token file", for example:
No token file for Political Files/2020/Federal/US Senate/amy mcgrath order 519-525 30/mcgrath 30
No token file for Political Files/2020/Federal/President/Mike Bloomberg 2020/mike bloomberg egxa 117 invoice 1.21
No token file for Political Files/2020/Federal/President/Sanders/Orders/sanders.wsoc.663906R.02.28.2020-2
No token file for Political Files/2020/Federal/US House/Sheila Jackson/sheilajacksonfinalinvoice
No token file for Political Files/2020/Federal/President/Steyer/Telemundo/Orders/steyer2020.esoc.649599.1.21.20
No token file for Political Files/2020/Federal/President/tom steyer invoice kgwn 12.15
As a result, only 8990 parquet files end up being generated under data/training.
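To see how many documents are being skipped, and which folders they come from, here is a minimal sketch that tallies the "No token file for ..." messages. It assumes the tool's console output has been saved to a text log. The function names (missing_token_paths, count_by_prefix, summarize) are hypothetical helpers for this sketch and are not part of the tool itself.

```python
import re
from collections import Counter

# Matches log lines of the form "No token file for <document path>".
NO_TOKEN_RE = re.compile(r"^No token file for (.+)$")


def missing_token_paths(log_lines):
    """Return the document paths reported as lacking a token file."""
    paths = []
    for line in log_lines:
        m = NO_TOKEN_RE.match(line.strip())
        if m:
            paths.append(m.group(1))
    return paths


def count_by_prefix(paths, depth=4):
    """Tally skipped paths by their first `depth` path components,
    e.g. 'Political Files/2020/Federal/President'."""
    return Counter("/".join(p.split("/")[:depth]) for p in paths)


def summarize(log_path, depth=4):
    """Print a per-folder count of skipped documents from a saved log (hypothetical helper)."""
    with open(log_path, encoding="utf-8") as f:
        paths = missing_token_paths(f)
    print(f"{len(paths)} documents skipped for lack of a token file")
    for prefix, n in count_by_prefix(paths, depth).most_common():
        print(f"{n:6d}  {prefix}")
    return paths
```

Comparing the skipped count plus the 8990 generated files against the total number of input documents would show whether the missing token files fully account for the shortfall.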