Transform CloudFront log files to Parquet format.
Read CloudFront log files from a source S3 bucket path, transform their content to Parquet format, and save the resulting files to a destination S3 bucket path.
Variables
Obtain CloudFront log files from this S3 Bucket location.
Folder path from the Folder Path Prefix in the Source S3 Bucket Connection to the folder where the CloudFront log files are located.
Specify a number of days. When the Task runs, this number is subtracted from the current date to create a date range. Any file whose date (extracted from its filename) falls within that range is included for potential transformation.
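The date filtering can be sketched in Python as follows. This is an illustration under assumptions, not the Task's actual implementation: it assumes the standard CloudFront log file naming `<distribution-ID>.YYYY-MM-DD-HH.<unique-ID>.gz`, and it assumes the range runs inclusively from (current date minus the number of days) to the current date. The helper names `file_date` and `in_range` are hypothetical.

```python
import re
from datetime import date, timedelta

# Assumed CloudFront standard log name: <distribution-ID>.YYYY-MM-DD-HH.<unique-ID>.gz
LOG_NAME = re.compile(r"\.(\d{4})-(\d{2})-(\d{2})-\d{2}\.[^.]+\.gz$")


def file_date(filename):
    """Return the date embedded in a CloudFront log filename, or None if absent."""
    m = LOG_NAME.search(filename)
    if m is None:
        return None
    year, month, day = map(int, m.groups())
    return date(year, month, day)


def in_range(filename, days_back, today):
    """True if the file's date lies in [today - days_back, today] (inclusive, assumed)."""
    d = file_date(filename)
    return d is not None and today - timedelta(days=days_back) <= d <= today


if __name__ == "__main__":
    names = [
        "E2ABC123.2024-03-10-14.a1b2c3d4.gz",
        "E2ABC123.2024-02-01-00.e5f6a7b8.gz",
    ]
    # Only the first file falls within the 3-day window ending 2024-03-12.
    print([n for n in names if in_range(n, 3, date(2024, 3, 12))])
```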
If enabled, the next run of this Task will transform CloudFront files again, even if they have previously been recorded as successfully transformed. Any data in existing Parquet files created from those CloudFront files will be ignored, and the existing files will be deleted. After the next successful run of this Task, this option is automatically disabled.
Specify which columns in the CloudFront logs to include in the output files. The date and time columns are always included automatically; any other column that is not specified is excluded.
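A minimal sketch of the column selection, assuming the standard CloudFront log layout: a `#Fields:` directive listing space-separated column names, followed by tab-separated data lines. The function `select_columns` is hypothetical and returns plain dictionaries rather than Parquet rows; the sample log lines are illustrative only.

```python
def select_columns(lines, wanted):
    """Keep date, time and the requested columns from CloudFront log lines.

    `lines` are decoded text lines of one log file; `wanted` is the
    user-chosen column list, using names from the #Fields header.
    """
    fields = None
    keep = []
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            # date and time are always kept; columns stay in header order
            requested = {"date", "time", *wanted}
            keep = [i for i, name in enumerate(fields) if name in requested]
            continue
        if not line or line.startswith("#") or fields is None:
            continue  # skip #Version and other directives, and blank lines
        values = line.split("\t")
        rows.append({fields[i]: values[i] for i in keep if i < len(values)})
    return rows


if __name__ == "__main__":
    sample = [
        "#Version: 1.0",
        "#Fields: date time x-edge-location sc-bytes c-ip cs-method",
        "2024-03-10\t14:02:11\tLHR62-C2\t2390\t192.0.2.10\tGET",
    ]
    print(select_columns(sample, ["sc-bytes", "cs-method"]))
```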
Create Parquet formatted files in this S3 Bucket location.
Folder path from the Folder Path Prefix in the Destination S3 Bucket Connection to the folder where the Parquet formatted files will be created.
Select the type of compression (if any) to apply to all columns in the Parquet file.
If enabled, when combining data from CloudFront with Parquet data from previous Task runs, any new rows that are identical to existing rows will be removed.
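The duplicate-removal rule can be sketched as below, with each row represented as a tuple of column values. The function name `merge_rows` is hypothetical, and the sketch assumes that only new rows matching an existing row are dropped; duplicates within the new batch itself are kept, since the description does not say otherwise.

```python
def merge_rows(existing, new, remove_duplicates):
    """Combine rows from earlier Parquet output with newly parsed CloudFront rows.

    Rows are hashable tuples. When remove_duplicates is True, any new row
    identical to an existing row is dropped (assumed semantics).
    """
    existing = list(existing)  # materialize once so it can be reused below
    if not remove_duplicates:
        return existing + list(new)
    seen = set(existing)
    return existing + [row for row in new if row not in seen]


if __name__ == "__main__":
    old = [("2024-03-10", "14:02:11", "GET")]
    fresh = [("2024-03-10", "14:02:11", "GET"), ("2024-03-11", "09:00:00", "HEAD")]
    print(merge_rows(old, fresh, remove_duplicates=True))
```

Using a set makes each membership check constant time on average, so the comparison stays cheap even when the existing data is large.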
Connection for the MySQL database used to record which files this Task has processed in the past.
Schema and table name where processed file information is stored within the MySQL database.
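The processed-file bookkeeping described by the MySQL settings, together with the reprocess option, might look like the sketch below. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs on its own; the table `processed_files`, its columns, and the helper functions are hypothetical. `INSERT OR REPLACE` is SQLite syntax; MySQL would use `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3


def open_tracking(conn):
    """Create the hypothetical tracking table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_files ("
        " file_key TEXT PRIMARY KEY,"
        " processed_at TEXT NOT NULL)"
    )


def files_to_process(conn, candidates, reprocess):
    """Return the candidate files that still need to be transformed."""
    if reprocess:
        return list(candidates)  # previous records are ignored when reprocessing
    done = {row[0] for row in conn.execute("SELECT file_key FROM processed_files")}
    return [f for f in candidates if f not in done]


def record_success(conn, file_keys):
    """Record files as successfully transformed (re-recording just updates the time)."""
    conn.executemany(
        "INSERT OR REPLACE INTO processed_files (file_key, processed_at)"
        " VALUES (?, datetime('now'))",
        [(k,) for k in file_keys],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    open_tracking(conn)
    record_success(conn, ["a.gz"])
    print(files_to_process(conn, ["a.gz", "b.gz"], reprocess=False))
```

Unless `reprocess` is true, only files without a record are returned. Per the option description above, the Task itself turns the reprocess option off after its next successful run.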