Catalogue processing

For a more general introduction to the QuakeT tool, please refer to the online documentation available at https://gemsciencetools.github.io/quakeT/

This notebook demonstrates how to convert a Stochastic Event Set (SES) into a catalogue format compatible with the Hazard Modeller’s Toolkit (HMTK). An SES is a set of ruptures describing the potential seismicity occurring within a certain period of time according to a PSHA input model; here it is generated by the event-based PSHA calculation workflow of the OpenQuake (OQ) Engine. For a general description of the HMTK, please refer to the QuakeT Documentation.

The main steps of the workflow described in this notebook are summarized below:

1. Load the raw SES event and SES rupture files.
2. Merge event and rupture data using the build_hmtk_ses_catalogue utility.
3. Assign random datetimes to events (Months, Days, Hours, Minutes, Seconds).
4. Transform the current catalogue format into HMTK format.
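
The workflow listed above can be illustrated with a small, self-contained sketch in plain pandas. This is not the implementation of build_hmtk_ses_catalogue; it only mimics the merge, random-datetime and renaming steps on two tiny made-up tables. The input column names (`event_id`, `rup_id`, `mag`, `centroid_lon`, `centroid_lat`, `centroid_depth`), the HMTK-style output names other than `magnitude`, `depth` and `year`, and the choice of drawing days from 1 to 28 are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two CSV files exported by the OQ Engine.
# Column names and values are assumptions, not taken from the real files.
events = pd.DataFrame({
    "event_id": [0, 1, 2, 3],
    "rup_id": [10, 10, 11, 12],   # several events may share one rupture
    "year": [3, 57, 57, 9120],
})
ruptures = pd.DataFrame({
    "rup_id": [10, 11, 12],
    "mag": [4.05, 5.35, 6.15],
    "centroid_lon": [10.1, 10.4, 11.0],
    "centroid_lat": [45.2, 45.0, 44.7],
    "centroid_depth": [5.0, 15.0, 27.5],
})

# Merge: one row per event, carrying the attributes of its rupture.
catalogue = events.merge(
    ruptures, on="rup_id", how="left", validate="many_to_one"
)

# Random datetimes: draw each component independently with a seeded generator.
# Days are limited to 1..28 so that every (month, day) pair is a valid date.
rng = np.random.default_rng(seed=42)
n = len(catalogue)
catalogue["month"] = rng.integers(1, 13, size=n)
catalogue["day"] = rng.integers(1, 29, size=n)
catalogue["hour"] = rng.integers(0, 24, size=n)
catalogue["minute"] = rng.integers(0, 60, size=n)
catalogue["second"] = rng.integers(0, 60, size=n)

# Format conversion: rename to HMTK-style column names (assumed mapping).
catalogue = catalogue.rename(columns={
    "event_id": "eventID",
    "mag": "magnitude",
    "centroid_lon": "longitude",
    "centroid_lat": "latitude",
    "centroid_depth": "depth",
})

print(catalogue)
```

The `validate="many_to_one"` argument makes the merge fail loudly if a rupture identifier appears more than once in the rupture table, which would otherwise silently duplicate events.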

[1]:

import os
import pandas as pd

from openquake.man.ses_cat import build_hmtk_ses_catalogue

Merging rupture and event sets

[2]:

# Paths to the CSV files produced by the OQ Engine, usually exported with the following command:
# `oq engine --eos <calculation ID>`
events = '../data/aux/ses/output-241-events_62.csv'
ruptures = '../data/aux/ses/output-244-ruptures_62.csv'

output_folder = os.path.join("..", "output")
output = os.path.join(output_folder, "hmtk_sample_catalogue.csv")

# Run build_hmtk_ses_catalogue function
result = build_hmtk_ses_catalogue(events, ruptures, output)
print(f"Done!\n\nOutput file: {result}")

Done!

Output file: ../output/hmtk_sample_catalogue.csv

Statistical summary

This block loads the catalogue and generates a descriptive statistics table to provide an overview of the catalogue’s range and distribution.

[4]:

# Load the generated HMTK catalogue into a `pandas.DataFrame` instance
df = pd.read_csv(result)

# Columns for the summary
summary_cols = ["magnitude", "depth", "year"]

# Generate descriptive statistics
stats_summary = df[summary_cols].describe().T
stats_summary['range'] = stats_summary['max'] - stats_summary['min']
stats_summary.columns = [
    'Count', 'Mean', 'Std Dev', 'Min', '25%', '50% (Median)', '75%', 'Max', 'Range'
]

print("Statistical Summary of the catalogue: ")
display(stats_summary.style.format("{:.2f}"))
print(f"\nTotal Number of Events: {len(df)}")
print(f"Time Span: {df['year'].min():.0f} to {df['year'].max():.0f} ({df['year'].max() - df['year'].min():.0f} years)")

Statistical Summary of the catalogue:

	Count	Mean	Std Dev	Min	25%	50% (Median)	75%	Max	Range
magnitude	59501.00	3.99	0.48	3.55	3.65	3.85	4.15	7.45	3.90
depth	59501.00	9.56	7.06	5.00	5.00	5.00	15.00	27.50	22.50
year	59501.00	5009.78	2891.54	1.00	2518.00	5003.00	7529.00	10000.00	9999.00


Total Number of Events: 59501
Time Span: 1 to 10000 (9999 years)
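
As a complement to the descriptive statistics, the distribution of magnitudes can also be inspected by counting events per magnitude bin. The sketch below uses a handful of made-up magnitudes rather than the catalogue loaded above, and the bin edges are an arbitrary choice for illustration; with the real data, `df["magnitude"]` from the generated catalogue would take the place of the synthetic column.

```python
import numpy as np
import pandas as pd

# Synthetic magnitudes for illustration only (not the notebook's catalogue).
df = pd.DataFrame(
    {"magnitude": [3.55, 3.65, 3.65, 3.85, 4.15, 4.15, 5.05, 7.45]}
)

# Arbitrary bin edges; each bin is [low, high), except the last, which
# also includes its upper edge (numpy.histogram convention).
bin_edges = [3.5, 4.0, 5.0, 6.0, 7.0, 8.0]
counts, edges = np.histogram(df["magnitude"], bins=bin_edges)

# Print one line per bin with its event count.
for low, high, count in zip(edges[:-1], edges[1:], counts):
    print(f"M {low:.1f} to {high:.1f}: {count} events")
```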
