Immutable ledger-based security bigdata analytic system
The proposed system supports both batch and real-time log analysis. Audit trails can be analyzed in batch mode, either automatically at regular intervals or manually, and in near real time through the application. The serverless architecture adopted for the system improves scalability while keeping audit log processing cost-effective, and it removes the overhead of maintaining servers to run the application.
In the batch path, HiveQL running on the EMR cluster queries the raw data in storage and loads it into an external table, against which the batch processing runs. The processed data is written to an output directory in cloud storage, and that output is then queried and displayed on a dashboard for visualization.
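The batch step can be sketched as a short HiveQL script: an external table over the raw logs in S3, a Parquet-backed output table, and an INSERT OVERWRITE that aggregates one into the other. The Python helper below only assembles the script as a string, so it runs anywhere. The table names, columns and S3 paths are hypothetical, and reading the raw logs as JSON through the Hive JsonSerDe is an assumption rather than the system's documented input format.

```python
from textwrap import dedent


def build_batch_hiveql(raw_location: str, output_location: str) -> str:
    """Return a HiveQL script that reads raw JSON audit logs from S3 and writes
    per-host event counts back to S3 as Snappy-compressed Parquet.

    All table and column names are hypothetical placeholders.
    """
    return dedent(f"""\
        CREATE EXTERNAL TABLE IF NOT EXISTS raw_audit_logs (
          event_time STRING,
          host STRING,
          event_id INT,
          message STRING
        )
        ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
        LOCATION '{raw_location}';

        CREATE EXTERNAL TABLE IF NOT EXISTS audit_event_counts (
          host STRING,
          event_id INT,
          event_count BIGINT
        )
        STORED AS PARQUET
        LOCATION '{output_location}'
        TBLPROPERTIES ('parquet.compression'='SNAPPY');

        INSERT OVERWRITE TABLE audit_event_counts
        SELECT host, event_id, COUNT(*) AS event_count
        FROM raw_audit_logs
        GROUP BY host, event_id;
        """)


if __name__ == "__main__":
    # Hypothetical bucket names, for illustration only.
    print(build_batch_hiveql("s3://example-audit-raw/logs/",
                             "s3://example-audit-output/event_counts/"))
```

Keeping the raw table external means dropping it never deletes the source logs in S3, which matters when the cluster itself is short-lived.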
For real-time log analysis and anomaly detection, a server is monitored with a monitoring agent; the logs it collects are ingested into the Elasticsearch cluster and scanned for anomalies using an unsupervised machine learning model.
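The text does not name the model used inside the Elasticsearch cluster. As a stand-in illustration only, the sketch below applies a very simple unsupervised rule, a rolling z-score over per-interval event counts, which needs no labelled training data. It is not the system's model; the window size, threshold and sample counts are assumed values.

```python
from statistics import mean, pstdev
from typing import List


def rolling_zscore_anomalies(counts: List[int], window: int = 5,
                             threshold: float = 3.0,
                             min_sigma: float = 1.0) -> List[int]:
    """Return the indices of intervals whose count deviates from the mean of
    the preceding `window` intervals by more than `threshold` standard
    deviations. No labelled training data is required."""
    anomalies = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor the spread so a perfectly flat history cannot divide by zero.
        sigma = max(pstdev(history), min_sigma)
        if abs(counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical per-minute counts of failed logon events for one host.
failed_logons = [4, 5, 3, 4, 6, 5, 4, 40, 5, 4]
print(rolling_zscore_anomalies(failed_logons))  # [7]
```

Because a spike enters the following windows, it widens their spread for a few intervals; a median-based variant would be less sensitive to that.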
Application Workflow Diagram
The following configuration was used for the EMR cluster provisioned while developing the proposed system.
Environment: Development
Applications: Hive 2.3.6, Hue 4.4.0
Release label: emr-5.28.0
Hadoop distribution: Amazon 2.8.5
Availability zone: us-east-1d
Master: 1 m5.xlarge
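The EMR settings listed above map onto the request parameters of boto3's EMR client method `run_job_flow`. The sketch below only builds the request dictionary, so it runs without AWS credentials. The cluster name, log and script locations are hypothetical, the IAM roles are the EMR default role names, and only the single master node listed above is modelled. Auto-termination after the last step and the Glue Data Catalog as the Hive metastore follow the system description, but the exact settings the authors used are an assumption.

```python
def build_emr_request(hive_script_uri: str, log_uri: str) -> dict:
    """Build keyword arguments for boto3's EMR `run_job_flow` call that mirror
    the development cluster described above (a sketch, not the original)."""
    return {
        "Name": "audit-log-batch-dev",  # hypothetical cluster name
        "ReleaseLabel": "emr-5.28.0",
        "LogUri": log_uri,
        "Applications": [{"Name": "Hive"}, {"Name": "Hue"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "InstanceCount": 1,  # master node only, as listed above
            "Placement": {"AvailabilityZone": "us-east-1d"},
            # Terminate the cluster automatically once all steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Configurations": [{
            # Keep Hive table metadata in the AWS Glue Data Catalog so it
            # survives cluster termination.
            "Classification": "hive-site",
            "Properties": {
                "hive.metastore.client.factory.class":
                    "com.amazonaws.glue.catalog.metastore."
                    "AWSGlueDataCatalogHiveClientFactory",
            },
        }],
        "Steps": [{
            "Name": "run-batch-hiveql",  # hypothetical step name
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["hive-script", "--run-hive-script", "--args",
                         "-f", hive_script_uri],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_emr_request("s3://example-audit-scripts/batch.q",
                            "s3://example-audit-logs/emr/")
print(request["ReleaseLabel"], request["Instances"]["MasterInstanceType"])
```

With credentials configured, the dictionary could be submitted with `boto3.client("emr", region_name="us-east-1").run_job_flow(**request)`.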
Auditors can enumerate the audit logs and upload them to the cloud using the application. The exported ledger output is stored in S3. Once processing completes, the results are saved to S3 as compressed Parquet files and the EMR cluster is terminated automatically. Because terminating the cluster would otherwise lose the table metadata, the system stores that metadata in the AWS Glue Data Catalog. AWS Glue is used to prepare and load the data for analytics. System event logs, application logs and security logs from Windows and Linux systems were used for analysis while developing the proposed system.

To demonstrate real-time analysis, an EC2 instance was used as the source of the audit trails. IIS server logs and event logs from the server, along with VPC flow logs, were streamed to CloudWatch, and each log group is then streamed to the Elasticsearch cluster for near real-time analysis and anomaly detection.

The audit trails are stored in an immutable, verifiable ledger to protect their integrity and ensure accountability. Each record in the ledger is treated as a document. Each document can be cryptographically verified, and every alteration made to it can be identified.
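The text does not describe the ledger's verification algorithm. As a simplified illustration of the idea only, the sketch below chains SHA-256 hashes over a document's revisions: each revision's hash covers its content and the previous hash, so silently editing an earlier revision breaks verification from that point on. A bare chain can be defeated by someone who recomputes every later hash, which is why production ledgers additionally rely on structures such as Merkle trees checked against separately stored digests. The revision layout and sample data here are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first revision


def revision_hash(prev_hash: str, data: Dict[str, Any]) -> str:
    """SHA-256 over the previous revision's hash plus canonical JSON data."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_revision(history: List[Dict[str, Any]], data: Dict[str, Any]) -> None:
    """Append a new revision whose hash is chained to the previous one."""
    prev = history[-1]["hash"] if history else GENESIS_HASH
    history.append({"version": len(history), "data": dict(data),
                    "hash": revision_hash(prev, data)})


def first_tampered_version(history: List[Dict[str, Any]]) -> int:
    """Recompute the chain and return the first version whose stored hash no
    longer matches its content, or -1 if every revision verifies."""
    prev = GENESIS_HASH
    for rev in history:
        if revision_hash(prev, rev["data"]) != rev["hash"]:
            return rev["version"]
        prev = rev["hash"]
    return -1


doc: List[Dict[str, Any]] = []
append_revision(doc, {"user": "alice", "action": "login"})
append_revision(doc, {"user": "alice", "action": "export_logs"})
print(first_tampered_version(doc))  # -1
doc[0]["data"]["action"] = "logout"  # silent edit of an earlier revision
print(first_tampered_version(doc))  # 0
```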
The application can export the ledgers, and the exported journals are used for analytics, auditing and verification. These ledger exports can also serve as a backup and can be exported to other systems when needed.
High Level System Architecture Diagrams
Storing and retrieving ledger data
Batch processing log data
Near real-time log analysis
The following configuration was used for the Elasticsearch cluster while developing the application:
Number of nodes: 1
Number of data nodes: 1
Active primary shards: 32
Active shards: 32
Data node storage type: EBS
EBS volume type: General Purpose (SSD)
Instance type (data): t2.small.elasticsearch

The EC2 instance used as the source of the audit trails for the real-time demonstration had the following configuration:
Instance type: t2.micro
AMI ID: Windows_Server-2019-English-Full-Base-2019.10.09 (ami-0d4df21ffeb914d61)
IIS web server version: 10.0
Monitoring agent: Unified CloudWatch agent
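Each CloudWatch log group is streamed to the Elasticsearch cluster. When AWS sets up this streaming, records typically reach a forwarding Lambda function as a base64-encoded, gzip-compressed JSON document in `event["awslogs"]["data"]`, with the individual entries in its `logEvents` list. The sketch below decodes such a payload and turns it into an Elasticsearch `_bulk` request body. The daily index naming and the extra fields are assumptions, not the system's actual mapping, and omitting a mapping type from the bulk action assumes Elasticsearch 7.x.

```python
import base64
import gzip
import json
from datetime import datetime, timezone


def subscription_to_bulk(awslogs_data: str, index_prefix: str = "cwl") -> str:
    """Decode a CloudWatch Logs subscription payload (base64 + gzip + JSON)
    and return an Elasticsearch _bulk body as newline-delimited JSON."""
    record = json.loads(gzip.decompress(base64.b64decode(awslogs_data)))
    if record.get("messageType") != "DATA_MESSAGE":
        # CONTROL_MESSAGE records only test that the destination is reachable.
        return ""
    lines = []
    for event in record["logEvents"]:
        # Event timestamps are milliseconds since the Unix epoch (UTC).
        ts = datetime.fromtimestamp(event["timestamp"] / 1000, tz=timezone.utc)
        # One index per day, e.g. "cwl-2019.11.01" (naming is an assumption).
        action = {"index": {"_index": f"{index_prefix}-{ts:%Y.%m.%d}",
                            "_id": event["id"]}}
        source = {
            "@timestamp": ts.isoformat(timespec="milliseconds"),
            "message": event["message"],
            "log_group": record["logGroup"],
            "log_stream": record["logStream"],
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The _bulk API expects every line, including the last, to end in "\n".
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical payload shaped like a CloudWatch Logs subscription record.
sample = {
    "messageType": "DATA_MESSAGE", "owner": "123456789012",
    "logGroup": "iis-logs", "logStream": "i-0example",
    "subscriptionFilters": ["to-elasticsearch"],
    "logEvents": [{"id": "1", "timestamp": 1572609600123,
                   "message": "GET /index.html 200"}],
}
payload = base64.b64encode(gzip.compress(json.dumps(sample).encode())).decode()
print(subscription_to_bulk(payload), end="")
```

The resulting body could be sent with an HTTP POST to the domain's `_bulk` endpoint using the `application/x-ndjson` content type.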