Catalogue processing

For a more general introduction to the QuakeT tool, please refer to the online documentation available at https://gemsciencetools.github.io/quakeT/

This notebook demonstrates how to convert a Stochastic Event Set (SES), generated by the event-based PSHA calculation workflow of the OpenQuake (OQ) Engine, into a catalogue format compatible with the Hazard Modeller’s Toolkit (HMTK). An SES is a set of ruptures describing the potential seismicity occurring within a certain period of time according to a PSHA input model. For a general description of the HMTK, please refer to the QuakeT documentation.

The main steps of the workflow described in this notebook are summarized below:

  • Load raw SES event and SES rupture files,

  • Merge event and rupture data using the build_hmtk_ses_catalogue utility,

  • Assign random datetimes to events (Months, Days, Hours, Minutes, Seconds),

  • Transform the current catalogue format into HMTK format.
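
The third step can be pictured with a small, self-contained sketch. It is only an illustration of drawing random date and time components for events whose year is already known; the column names, the fixed seed, and the choice to cap days at 28 are assumptions made here, not necessarily what build_hmtk_ses_catalogue does internally.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-catalogue: only the SES year is known for each event.
catalogue = pd.DataFrame({"event_id": [0, 1, 2, 3], "year": [1, 1, 57, 10000]})

rng = np.random.default_rng(seed=42)  # fixed seed so the draw is reproducible
n = len(catalogue)

# Draw each component uniformly. Days are capped at 28 (an assumption) so that
# every (month, day) pair is valid without handling month lengths or leap years.
# The upper bound of rng.integers is exclusive.
catalogue["month"] = rng.integers(1, 13, size=n)   # 1..12
catalogue["day"] = rng.integers(1, 29, size=n)     # 1..28
catalogue["hour"] = rng.integers(0, 24, size=n)    # 0..23
catalogue["minute"] = rng.integers(0, 60, size=n)  # 0..59
catalogue["second"] = rng.integers(0, 60, size=n)  # 0..59

print(catalogue)
```

Because the years in an SES are synthetic, the drawn components carry no physical meaning; they only make each row a complete timestamp for tools that expect one.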

[1]:
import os
import pandas as pd

from openquake.man.ses_cat import build_hmtk_ses_catalogue

Merging rupture and event sets

[2]:
# Paths to the files produced by the OQ Engine. They are usually exported with
# the following command:
# `oq engine --eos <calculation ID>`
events = '../data/aux/ses/output-241-events_62.csv'
ruptures = '../data/aux/ses/output-244-ruptures_62.csv'

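output_folder = os.path.join("..", "output")
# Make sure the output folder exists before anything is written to it
os.makedirs(output_folder, exist_ok=True)
output = os.path.join(output_folder, "hmtk_sample_catalogue.csv")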

# Run build_hmtk_ses_catalogue function
result = build_hmtk_ses_catalogue(events, ruptures, output)
print(f"Done!\n\nOutput file: {result}")
Done!

Output file: ../output/hmtk_sample_catalogue.csv
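
Conceptually, merging the two files amounts to a join between the event table and the rupture table. The sketch below shows that idea with plain pandas on tiny made-up tables. The column names (`rup_id`, `mag`, `centroid_*`) and the HMTK-style target names are assumptions for illustration; the real files contain more columns, and the utility used above may proceed differently.

```python
import pandas as pd

# Hypothetical stand-ins for the two exported CSV files.
events = pd.DataFrame({
    "event_id": [0, 1, 2],
    "rup_id": [10, 10, 11],   # several events can share one rupture
    "year": [3, 48, 120],
})
ruptures = pd.DataFrame({
    "rup_id": [10, 11],
    "mag": [4.5, 6.1],
    "centroid_lon": [10.0, 10.5],
    "centroid_lat": [45.0, 45.2],
    "centroid_depth": [5.0, 15.0],
})

# One row per event, carrying the magnitude and location of its rupture.
# validate="many_to_one" raises if a rup_id is duplicated in the rupture table.
merged = events.merge(ruptures, on="rup_id", how="left", validate="many_to_one")

# Rename to HMTK-style column names (assumed names, for illustration only).
hmtk = merged.rename(columns={
    "event_id": "eventID",
    "mag": "magnitude",
    "centroid_lon": "longitude",
    "centroid_lat": "latitude",
    "centroid_depth": "depth",
})
print(hmtk[["eventID", "year", "magnitude", "depth"]])
```

A left join keeps every event even if its rupture is missing from the rupture table, which makes such gaps visible as NaN values instead of silently dropping rows.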

Statistical summary

This block loads the catalogue and generates a descriptive statistics table to provide an overview of the catalogue’s range and distribution.

[4]:
# Load the generated HMTK catalogue into a `pandas.DataFrame` instance
df = pd.read_csv(result)

# Columns for the summary
summary_cols = ["magnitude", "depth", "year"]

# Generate descriptive statistics
stats_summary = df[summary_cols].describe().T
stats_summary['range'] = stats_summary['max'] - stats_summary['min']
stats_summary.columns = [
    'Count', 'Mean', 'Std Dev', 'Min', '25%', '50% (Median)', '75%', 'Max', 'Range'
]

print("Statistical Summary of the catalogue: ")
display(stats_summary.style.format("{:.2f}"))
print(f"\nTotal Number of Events: {len(df)}")
first_year, last_year = df['year'].min(), df['year'].max()
print(f"Time Span: {first_year:.0f} to {last_year:.0f} ({last_year - first_year:.0f} years)")
Statistical Summary of the catalogue:
              Count     Mean  Std Dev   Min      25%  50% (Median)      75%       Max    Range
magnitude  59501.00     3.99     0.48  3.55     3.65          3.85     4.15      7.45     3.90
depth      59501.00     9.56     7.06  5.00     5.00          5.00    15.00     27.50    22.50
year       59501.00  5009.78  2891.54  1.00  2518.00       5003.00  7529.00  10000.00  9999.00

Total Number of Events: 59501
Time Span: 1 to 10000 (9999 years)
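
The quartiles above hint that most events sit at the low end of the magnitude range. One way to see the distribution more directly is to count events per magnitude bin. The sketch below uses a handful of made-up magnitudes in place of `df["magnitude"]`; the half-unit bin width is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical magnitudes standing in for df["magnitude"].
mags = pd.Series([3.55, 3.65, 3.65, 3.85, 4.15, 4.95, 7.45], name="magnitude")

# Half-unit bin edges from M 3.5 to M 7.5. np.histogram treats each bin as
# [low, high), except the last one, which also includes its upper edge.
edges = np.arange(3.5, 8.0, 0.5)
counts, _ = np.histogram(mags, bins=edges)

table = pd.DataFrame({
    "M_low": edges[:-1],
    "M_high": edges[1:],
    "count": counts,
})
print(table)
```

With the real catalogue, replacing `mags` by `df["magnitude"]` gives the same table for all events.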