Code
import json
import os
import pandas as pd
import pdfplumber
os.getcwd()
'c:\\Users\\joost\\ReposWindows\\bedrock-web\\sandbox'
'c:\\Users\\joost\\ReposWindows\\bedrock-web\\sandbox'
Upload the AGS3-3-1-2005.pdf
Then use a prompt like this:
Put the Groups and Headings table in a CSV with columns group_name,contents,notes,parent_group
The Groups and Headings table starts at the bottom of page 13 and finishes on page 17.
It might require a bit of chatting with ChatGPT before you get a proper table in your chat. Once you do, you can copy-paste it to a .tsv file. A .tsv file is a Tab Separated Value file instead of CSV - Comma Separated Value file, which is what you get when you copy the table from ChatGPT. TSV’s are also handy, because they don’t run into issues when one of the values contains a comma…
Now it could be that ChatGPT doesn’t always return the same results and that the table isn’t exactly as in the AGS 3 .pdf document. This was the case for me:
df1 = pd.read_csv("ags3_chatgpt_groups_and_headings.tsv", sep="\t")
df1["df"] = 1
df2 = pd.read_csv("ags3_chatgpt_groups_and_headings2.tsv", sep="\t")
df2["df"] = 2
# Concatenate the two DataFrames
df_concat = pd.concat([df1, df2])
# Find the duplicate rows
manual_duplicates = df_concat.duplicated(
subset=df_concat.columns.difference(["df"]), keep=False
)
# Find the rows that are not duplicates (i.e., the rows that are unique to one DataFrame)
unique_rows = df_concat[~manual_duplicates]
unique_rows.sort_index()
group_name | contents | parent_group | df | |
---|---|---|---|---|
32 | HPGI | Horizontal Profile Gauge Installation Details | HOLE | 1 |
32 | HPGI | Horizontal Profile Gauge Installation | HOLE | 2 |
37 | IFID | On Site Volatile Headspace Testing (FID) | HOLE | 2 |
37 | IFID | On Site Volatile Headspace Testing Using Flame... | HOLE | 1 |
38 | INST | Single Point Instrument Installation | HOLE | 2 |
38 | INST | Single Point Instrument Installation Details | HOLE | 1 |
40 | IPID | On Site Volatile Headspace Testing (PID) | HOLE | 2 |
71 | UNIT | Definition of <UNITS> and CNMT_UNIT | - | 1 |
72 | UNIT | Definition of Units | - | 2 |
def extract_ags3_data_dict_table(table):
headings = []
for row in table[2:]: # Skip first 2 rows: 1st = title, 2nd = headings
headings.append(
{
"status": None if row[0] == "" else row[0].strip(),
"heading": row[1].strip(),
"unit": None if row[2] == "" else row[2].strip().replace("\n", " "),
"description": row[3].strip().replace("\n", " "),
"example": None if row[4] == "" else row[4].strip().replace("\n", " "),
}
)
return headings
def extract_ags4_data_dict_table(table):
# Skip rows that don't contain data
for i, row in enumerate(table):
if "Suggested\nUnit / Type" in row or "Unit / Type" in row:
first_data_row = i + 1
break
headings = []
for row in table[first_data_row:]:
row = [x for x in row if x is not None]
headings.append(
{
"status": None if row[0] == "" else row[0].strip(),
"heading": row[1].strip(),
"unit": None if row[2] == "" else row[2].strip().replace("\n", ""),
"type": row[3].strip(),
"description": row[4].strip().replace("\n", " "),
"example": None if row[5] == "" else row[5].strip().replace("\n", " "),
}
)
return headings
pdf_file, from_page, to_page = pdf_dict[ags_version].values()
# List to store extracted data for each group
extracted_data = []
previous_group_name = ""
with pdfplumber.open(pdf_file) as pdf:
# Adjust the page range based on where the tables are located
for page_number in range(from_page, to_page):
page = pdf.pages[page_number - 1] # pdfplumber is 0-based, so subtract 1
tables_on_current_page = page.extract_tables() # Extract tables from the page
# Iterate through all tables found on the page
for table in tables_on_current_page:
if ags_version == 3:
table_title = table[0][0].strip() # Get table title from AGS3
elif ags_version == 4:
table_title = table[0][1].strip() # Get table title from AGS4
print(table_title)
parts = table_title.split(": ", 1) # Split on the first occurrence of ': '
if "Group Name" in parts[0]:
group_name = parts[1].split(" - ")[0]
group_description = " - ".join(parts[1].split(" - ")[1:])
group_description = group_description.replace("\n", " ")
if ags_version == 3:
headings = extract_ags3_data_dict_table(table)
elif ags_version == 4:
headings = extract_ags4_data_dict_table(table)
if group_name == previous_group_name:
extracted_data[-1]["headings"].extend(headings)
else:
extracted_data.append(
{
"group_name": group_name,
"group_description": group_description,
"headings": headings,
}
)
previous_group_name = group_name
Group Name: PROJ - Project Information
Group Name: ABBR - Abbreviation Definitions
Group Name: DICT - User Defined Groups and Headings
Group Name: FILE - Associated Files
Group Name: TRAN - Data File Transmission Information / Data Status
Group Name: TYPE - Definition of Data Types
Group Name: UNIT - Definition of Units
Group Name: AAVT - Aggregate Abrasion Tests
Group Name: AAVT - Aggregate Abrasion Tests
Group Name: ACVT - Aggregate Crushing Value Tests
Group Name: AELO - Aggregate Elongation Index Tests
Group Name: AFLK - Aggregate Flakiness Tests
Group Name: AFLK - Aggregate Flakiness Tests
Group Name: AIVT - Aggregate Impact Value Tests
Group Name: AIVT - Aggregate Impact Value Tests
Group Name: ALOS - Los Angeles Abrasion Tests
Group Name: ALOS - Los Angeles Abrasion Tests
Group Name: APSV - Aggregate Polished Stone Tests
Group Name: ARTW - Aggregate Determination of the Resistance to Wear (micro-Deval)
Group Name: ASDI - Slake Durability Index Tests
Group Name: ASNS - Aggregate Soundness Tests
Group Name: AWAD - Aggregate Water Absorption Tests
Group Name: AWAD - Aggregate Water Absorption Tests
Group Name: BKFL - Exploratory Hole Backfill Details
Group Name: CBRG - California Bearing Ratio Tests - General
Group Name: CBRT - California Bearing Ratio Tests - Data
Group Name: CDIA - Casing Diameter by Depth
Group Name: CHIS - Chiselling Details
Group Name: CHOC - Chain of Custody Information
Group Name: CMPG - Compaction Tests - General
Group Name: CMPT - Compaction Tests - Data
Group Name: CONG - Consolidation Tests - General
Group Name: CONG - Consolidation Tests - General
Group Name: CONS - Consolidation Tests - Data
Group Name: CONS - Consolidation Tests - Data
Group Name: CTRC - Cyclic Triaxial Tests - Consolidation
Group Name: CTRC - Cyclic Triaxial Tests - Consolidation
Group Name: CTRD - Cyclic Triaxial Tests - Data
Group Name: CTRG - Cyclic Triaxial Test - General
Group Name: CTRG - Cyclic Triaxial Test - General
Group Name: CTRP - Cyclic Triaxial Test - Derived Parameters
Group Name: CTRP - Cyclic Triaxial Test - Derived Parameters
Group Name: CTRS - Cyclic Triaxial Test - Saturation
Group Name: CORE - Coring Information
Group Name: DCPG - Dynamic Cone Penetrometer Tests - General
Group Name: DCPG - Dynamic Cone Penetrometer Tests - General
Group Name: DCPT - Dynamic Cone Penetrometer Tests - Data
Group Name: DETL - Stratum Detail Descriptions
Group Name: DETL - Stratum Detail Descriptions
Group Name: DISC - Discontinuity Data
Group Name: DISC - Discontinuity Data
Group Name: DLOG - Driller Geological Description
Group Name: DOBS - Drilling/Advancement Observations & Parameters
Group Name: DOBS - Drilling/Advancement Observations & Parameters
Group Name: DPRG - Dynamic Probe Tests - General
Group Name: DPRB - Dynamic Probe Tests - Data
Group Name: DREM - Depth Related Remarks
Group Name: ECTN - Sample Container Details
Group Name: ELRG - Environmental Laboratory Reporting
Group Name: ELRG - Environmental Laboratory Reporting
Description
Depth to top of test
specimen
Laboratory analytical
name
Group Name: ERES - Environmental Contaminant Testing
Group Name: ERES - Environmental Contaminant Testing
Group Name: ESCG - Effective Stress Consolidation Tests - General
Group Name: ESCG - Effective Stress Consolidation Tests - General
Group Name: ESCT - Effective Stress Consolidation Tests - Data
Group Name: ESCT - Effective Stress Consolidation Tests - Data
Group Name: FGHG - Field Geohydraulic Testing - General
Group Name: FGHI - Field Geohydraulic Testing - Instrumentation Details
Group Name: FGHS - Field Geohydraulic Testing - Test Results (per stage)
Group Name: FGHT - Field Geohydraulic Testing - Data
Group Name: FLSH - Drilling Flush Details
Group Name: FRAC - Fracture Spacing
Group Name: FRST - Frost Susceptibility Tests
Group Name: FRST - Frost Susceptibility Tests
Group Name: GCHM - Geotechnical Chemistry Testing
Group Name: GCHM - Geotechnical Chemistry Testing
Group Name: GEOL - Field Geological Descriptions
Group Name: GRAG - Particle Size Distribution Analysis - General
Group Name: GRAG - Particle Size Distribution Analysis - General
Group Name: GRAT - Particle Size Distribution Analysis - Data
Group Name: GRAT - Particle Size Distribution Analysis - Data
Group Name: HDIA - Hole Diameter by Depth
Group Name: HDPH - Depth Related Exploratory Hole Information
Group Name: HDPH - Depth Related Exploratory Hole Information
Group Name: HORN - Exploratory Hole Orientation and Inclination
Group Name: ICBR - In Situ California Bearing Ratio Tests
Group Name: ICBR - In Situ California Bearing Ratio Tests
Group Name: IDEN - In Situ Density Tests
Group Name: IFID - On Site Volatile Headspace Testing Using Flame Ionisation Detector
Group Name: IPEN - In Situ Hand Penetrometer Tests
Group Name: IPID - On Site Volatile Headspace Testing by Photo Ionisation Detector
Group Name: IPID - On Site Volatile Headspace Testing by Photo Ionisation Detector
Group Name: IPRG - In Situ Permeability Tests - General
Group Name: IPRG - In Situ Permeability Tests - General
Group Name: IPRT - In Situ Permeability Tests - Data
Group Name: IRDX - In Situ Redox Tests
Group Name: IRES - In Situ Resistivity Tests
Group Name: IRES - In Situ Resistivity Tests
Group Name: ISAG - Soakaway Tests - General
Group Name: ISAG - Soakaway Tests - General
Group Name: ISAT - Soakaway Tests - Data
Group Name: ISPT - Standard Penetration Test Results
Group Name: ISPT - Standard Penetration Test Results
Group Name: IVAN - In Situ Vane Tests
Group Name: LBSG - Testing Schedule
Group Name: LBST - Testing Schedule Details
Group Name: LBST - Testing Schedule Details
Group Name: LDEN - Density Tests
Group Name: LDEN - Density Tests
Group Name: LDYN - Dynamic Testing
Group Name: LDYN - Dynamic Testing
Group Name: LFCN – Laboratory Fall Cone Test
Group Name: LFCN – Laboratory Fall Cone Test
Group Name: LLIN - Linear Shrinkage Tests
Group Name: LLIN - Linear Shrinkage Tests
Group Name: LLPL - Liquid and Plastic Limit Tests
Group Name: LLPL - Liquid and Plastic Limit Tests
Group Name: LNMC - Water/moisture Content Tests
Group Name: LOCA - Location Details
Group Name: LOCA - Location Details
Group Name: LPDN - Particle Density Tests
Group Name: LPEN - Laboratory Hand Penetrometer Tests
Group Name: LRES - Laboratory Resistivity Tests
Group Name: LRES - Laboratory Resistivity Tests
Group Name: LSLT - Shrinkage Limit Tests
Group Name: LSTG - Initial Consumption of Lime Tests - General
Group Name: LSTG - Initial Consumption of Lime Tests - General
Group Name: LSTT - Initial Consumption of Lime Tests - Data
Group Name: LSWL - Swelling Index Testing
Group Name: LSWL - Swelling Index Testing
Group Name: LTCH - Laboratory Thermal Conductivity
Group Name: LTCH - Laboratory Thermal Conductivity
Group Name: LUCT - Laboratory Unconfined Compression Test
Group Name: LUCT - Laboratory Unconfined Compression Test
Group Name: LVAN - Laboratory Vane Tests
Group Name: LVAN - Laboratory Vane Tests
Group Name: MCVG - MCV Tests - General
Group Name: MCVG - MCV Tests - General
Group Name: MCVT - MCV Tests - Data
Group Name: MCVT - MCV Tests - Data
Group Name: MOND - Monitoring Readings
Group Name: MONG - Monitoring Installations and Instruments
Group Name: MONG - Monitoring Installations and Instruments
Group Name: PIPE - Monitoring Installation Pipe Work
Group Name: PLTG - Plate Loading Tests - General
Group Name: PLTT - Plate Loading Tests - Data
Group Name: PMTG - Pressuremeter Test Results - General
Group Name: PMTG - Pressuremeter Test Results - General
Group Name: PMTD - Pressuremeter Test Data
Group Name: PMTL - Pressuremeter Test Results - Individual Loops
Group Name: PREM - Project Specific Time Related Remarks
Group Name: PTIM - Boring/Drilling Progress by Time
Group Name: PTST - Laboratory Permeability Tests
Group Name: PTST - Laboratory Permeability Tests
Group Name: PTST - Laboratory Permeability Tests
Group Name: PUMG - Pumping Tests - General
Group Name: PUMT - Pumping Tests - Data
Group Name: RCAG - Rock Abrasiveness Tests - General
Group Name: RCAG - Rock Abrasiveness Tests - General
Group Name: RCAT - Rock Abrasiveness Tests - Data
Group Name: RCAT - Rock Abrasiveness Tests - Data
Group Name: RCCV - Chalk Crushing Value Tests
Group Name: RCCV - Chalk Crushing Value Tests
Group Name: RDEN - Rock Porosity and Density Tests
Group Name: RDEN - Rock Porosity and Density Tests
Group Name: RELD - Relative Density Tests
Group Name: RELD - Relative Density Tests
Group Name: RESC - Resonant Column Test - Consolidation
Group Name: RESC - Resonant Column Test - Consolidation
Group Name: RESD - Resonant Column Test – Data
Group Name: RESD - Resonant Column Test – Data
Group Name: RESG - Resonant Column Test – General
Group Name: RESG - Resonant Column Test – General
Group Name: RESP - Resonant Column Test - Derived Parameters
Group Name: RESS - Resonant Column Test – Saturation
Group Name: RPLT - Point Load Testing
Group Name: RPLT - Point Load Testing
Group Name: RSCH - Schmidt Rebound Hardness Tests
Group Name: RSCH - Schmidt Rebound Hardness Tests
Group Name: RSCH - Schmidt Rebound Hardness Tests
Group Name: RSHR - Shore Scleroscope Hardness Tests
Group Name: RTEN - Tensile Strength Testing
Group Name: RUCS - Rock Uniaxial Compressive Strength and Deformability Tests
Group Name: RUCS - Rock Uniaxial Compressive Strength and Deformability Tests
Group Name: RWCO - Water Content of Rock Tests
Group Name: RWCO - Water Content of Rock Tests
Group Name: SAMP - Sample Information
Group Name: SAMP - Sample Information
Group Name: SCDG - Static Cone Dissipation Tests - General
Group Name: SCDT - Static Cone Dissipation Tests - Data
Group Name: SCDT - Static Cone Dissipation Tests - Data
Group Name: SCPG - Static Cone Penetration Tests - General
Group Name: SCPP - Static Cone Penetration Tests - Derived Parameters
Group Name: SCPT - Static Cone Penetration Tests - Data
Group Name: SCPT - Static Cone Penetration Tests - Data
Group Name: SHBG - Shear Box Testing - General
Group Name: SHBG - Shear Box Testing - General
Group Name: SHBT - Shear Box Testing - Data
Group Name: SHBT - Shear Box Testing - Data
Group Name: STND - Standards / Specifications
Group Name: SUCT - Suction Tests
Group Name: SUCT - Suction Tests
Group Name: TNPC - Ten Per Cent Fines
Group Name: TNPC - Ten Per Cent Fines
Group Name: TREG - Triaxial Tests - Effective Stress - General
Group Name: TRET - Triaxial Tests - Effective Stress - Data
Group Name: TRET - Triaxial Tests - Effective Stress - Data
Group Name: TREM - Location Specific Time Related Remarks
Group Name: TRIG - Triaxial Tests - Total Stress - General
Group Name: TRIT - Triaxial Tests - Total Stress - Data
Group Name: TRIT - Triaxial Tests - Total Stress - Data
Group Name: WADD - Water Added Records
Group Name: WADD - Water Added Records
Group Name: WETH - Weathering
Group Name: WGPG - Wireline Geophysics - General
Group Name: WGPT - Wireline Geophysics - Readings
Tool
Calliper X-Y (2 x two
arm Calliper)
Downhole Magnetic
Resonance
Group Name: WINS - Window or Windowless Sampling Run Details
Group Name: WSTG - Water Strike - General
Group Name: WSTD - Water Strike - Details
{'LFCN'}
{'LFCN – Laboratory Fall Cone Test'}