1. I'd also look at how the data is structured before you apply the join. A naive join can be expensive, so a bit of data preparation may be necessary to improve join performance. Get a baseline measurement first so you can tell whether each change actually helps. Arrow should help improve it as well.
2. Try storing the result back as Parquet, laid out so that the next application can take advantage of predicate pushdown (for example, partitioned or sorted on the columns it is likely to filter by).