Guidance

Suat Toksöz
Hi everyone, I want to ask for guidance on my log analyzer platform idea. I have an Elasticsearch system that collects logs from different platforms and creates alerts. The system writes the alerts to an index in ES. My alerts are also stored in a folder as JSON (multi-line format).

The Goals:
  1. Read the JSON folder or the ES index as a stream (pick up new entries within 5 min)
  2. Select only the alerts that I want to work on (alert.id = 100, status = true, ...)
  3. Create a DataFrame + Window for a 10 min period
  4. Run a query on that DataFrame, grouping by IP (if the same IP gets 3 alerts, show me the result)
  5. All the coding should be in Python
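
To make goals 2 to 4 concrete, the logic can be sketched in plain Python without any streaming framework. This is only an illustration under assumptions: the field names (`alert_id`, `status`, `ip`, `ts`), the sample records, and the helper names `wanted` and `ips_over_threshold` are hypothetical and not the real alert schema, and the window is a fixed (tumbling) 10 minute bucket.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)   # goal 3: 10 minute window
THRESHOLD = 3                    # goal 4: report an IP with 3 or more alerts
EPOCH = datetime(1970, 1, 1)

# Hypothetical sample alerts; the real schema may differ.
SAMPLE_ALERTS = [
    {"alert_id": 100, "status": True,  "ip": "10.0.0.1", "ts": "2020-01-01T12:01:00"},
    {"alert_id": 100, "status": True,  "ip": "10.0.0.1", "ts": "2020-01-01T12:04:00"},
    {"alert_id": 100, "status": True,  "ip": "10.0.0.1", "ts": "2020-01-01T12:09:00"},
    {"alert_id": 100, "status": True,  "ip": "10.0.0.2", "ts": "2020-01-01T12:05:00"},
    {"alert_id": 200, "status": True,  "ip": "10.0.0.2", "ts": "2020-01-01T12:06:00"},
    {"alert_id": 100, "status": False, "ip": "10.0.0.2", "ts": "2020-01-01T12:07:00"},
    {"alert_id": 100, "status": True,  "ip": "10.0.0.1", "ts": "2020-01-01T12:11:00"},
]


def wanted(alert):
    """Goal 2: keep only the alerts of interest (assumed filter)."""
    return alert.get("alert_id") == 100 and alert.get("status") is True


def ips_over_threshold(alerts, window=WINDOW, threshold=THRESHOLD):
    """Goals 3 and 4: count alerts per (window start, IP) and keep
    the groups that reach the threshold."""
    counts = defaultdict(int)
    for alert in alerts:
        if not wanted(alert):
            continue
        ts = datetime.fromisoformat(alert["ts"])
        # Floor the timestamp to the start of its 10 minute bucket.
        window_start = ts - ((ts - EPOCH) % window)
        counts[(window_start, alert["ip"])] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    for (start, ip), n in ips_over_threshold(SAMPLE_ALERTS).items():
        print(f"{start.isoformat()}  {ip}  {n} alerts")
```

In a streaming engine the same shape would be a filter followed by a windowed group-by on the IP column with a count; the sketch above only fixes the intended semantics so that candidate technologies can be compared against it.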

The idea is something like this. My question is how I should proceed with this task. Which technologies should I use?

Can Apache Spark + Python + PySpark + Koalas handle this?

-- 

Best regards,

Suat Toksoz