Hi everyone, I'd like some guidance on my log analyzer platform idea. I have an Elasticsearch system that collects logs from different platforms and creates alerts. The system writes the alerts to an Elasticsearch index, and the alerts are also stored in a folder as multi-line JSON files. Here is what I want to do:
1. Read the JSON folder or the ES index as a stream (pick up new entries within 5 minutes).
2. Select only the alerts I want to work on (alert.id = 100, status = true, ...).
3. Create a DataFrame with a 10-minute window.
4. Run a query on that DataFrame grouping by IP (if the same IP gets 3 alerts, show me the result).
5. All the code should be in Python.
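Before picking a framework, the filter/window/group logic in the list above can be pinned down with a small batch sketch in plain Python. It assumes each file in the folder holds one alert with a nested `alert.id`, a boolean `status`, an `ip` string and an ISO-8601 `timestamp`; those field names and the helpers `load_alerts`, `window_start` and `repeated_ips` are hypothetical. It uses fixed (tumbling) 10-minute windows and reports every (window, IP) pair with at least 3 matching alerts.

```python
import json
from collections import Counter
from datetime import datetime, timedelta
from pathlib import Path

WINDOW = timedelta(minutes=10)


def load_alerts(folder):
    """Load alerts from every *.json file in `folder`.

    Assumes one pretty-printed (multi-line) JSON object per file.
    """
    return [json.loads(p.read_text()) for p in sorted(Path(folder).glob("*.json"))]


def window_start(ts, size=WINDOW):
    """Floor `ts` to the start of its fixed window (aligned to 1970-01-01 00:00 in ts's timezone)."""
    epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)
    return ts - (ts - epoch) % size


def repeated_ips(alerts, alert_id=100, threshold=3):
    """Return {(window_start, ip): count} for IPs with >= threshold matching alerts per window."""
    counts = Counter()
    for a in alerts:
        # Selection step: keep only the alert id and status we care about (assumed field names).
        if a["alert"]["id"] != alert_id or not a["status"]:
            continue
        ts = datetime.fromisoformat(a["timestamp"])
        counts[(window_start(ts), a["ip"])] += 1
    # Grouping step: keep only the (window, IP) pairs that reach the threshold.
    return {key: n for key, n in counts.items() if n >= threshold}
```

Usage would look like `repeated_ips(load_alerts("/path/to/alerts"))` (hypothetical path). Re-running this every 5 minutes only approximates the streaming requirement; a streaming engine keeps the per-window counts between runs instead of recomputing them from scratch.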
That is the idea. My question is: how should I approach this task, and which technologies should I use?
Can Apache Spark + Python + PySpark + Koalas handle this?