Deleting Metadata
A guide to deleting V7 metadata in your storage bucket
Permanent Deletion of Data
Performing this action will permanently delete both the item in V7 and the metadata in your storage account. Please be certain you wish to proceed before continuing.
When registering items from external storage, metadata is added to your storage bucket either by V7 (for read-write integrations) or by you (for read-only integrations). Once you are completely finished with an item in V7, you will want to delete the metadata stored in your cloud storage as well.
Unless you have a GCP integration with storage.objects.delete permissions enabled, V7 cannot clear this data up automatically. The operation will have to be performed through the cloud service hosting your data.
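For GCP-hosted data where V7 has not been granted delete permissions, you would remove the metadata yourself. Below is a minimal sketch using the google-cloud-storage client; the bucket name and prefix are hypothetical placeholders for your own values:

from google.cloud import storage

# Hypothetical values: substitute your own bucket and the prefix your metadata lives under
BUCKET_NAME = "my-gcs-bucket"
METADATA_PREFIX = "v7-prefix/data/example-item.jpg"

client = storage.Client()  # uses your default GCP credentials

# Delete every object under the metadata prefix
for blob in client.list_blobs(BUCKET_NAME, prefix=METADATA_PREFIX):
    print(f"Deleting gs://{BUCKET_NAME}/{blob.name}")
    blob.delete()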
Below is an example script that deletes an item in V7 together with the read-write metadata V7 created in your S3 bucket. The script assumes this metadata lives under keys of the form {V7_STORAGE_PREFIX}/[optional path]/data/{item_name}. Please note that items registered before 2023 have a different metadata structure.
Use of Script
Please make sure to double-check that you are saving or deleting the relevant S3 metadata. This script is an example template and should be tailored to your individual environment.
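One way to double-check is a dry run that lists every key the deletion would touch, without removing anything. Here is a minimal sketch, assuming the same environment variables the script below relies on and a hypothetical item name:

import os
import boto3

# Assumes the same environment variables used by the deletion script below
s3_client = boto3.client(
    's3',
    aws_access_key_id=os.environ.get('S3_ACCESS_KEY'),
    aws_secret_access_key=os.environ.get('S3_SECRET_KEY')
)
bucket_name = os.environ.get('S3_BUCKET_NAME')

# Hypothetical item name; mirrors the path the script builds when no --s3_path is given
prefix = os.environ.get('V7_STORAGE_PREFIX') + "/data/example-item.jpg"

# List (but do not delete) every key under the prefix, handling pagination
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
    for obj in page.get('Contents', []):
        print(obj['Key'])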
'''
DESCRIPTION
When executed from the command line, this script:
- 1: Deletes a given item in V7
- 2: Deletes all associated metadata in S3
USAGE
python3 auto-metadata-deletion.py [-h] [--s3_path S3_PATH] team_slug dataset_id item_name
REQUIRED ARGUMENTS
team_slug: V7 Team slug
dataset_id: V7 Dataset ID containing the item
item_name: Name of Item to be deleted
OPTIONAL ARGUMENTS
-h, --help Print the help message for the function and exit
--s3_path: Path to the item in S3
'''
import boto3
import requests
import os
import argparse
import logging
def delete_s3_file(s3_client, bucket_name, file_key) -> None:
    """
    Deletes a single item from an S3 bucket.

    Parameters
    ----------
    s3_client (boto3.client): An authenticated S3 client
    bucket_name (str): The S3 bucket name where the item is stored
    file_key (str): The path to the item to be deleted

    Returns
    -------
    None
    """
    try:
        # Delete the file from the S3 bucket
        s3_client.delete_object(Bucket=bucket_name, Key=file_key)
        logging.info(f"File '{file_key}' deleted successfully from bucket '{bucket_name}'.")
    except Exception as e:
        logging.warning(f"Error deleting file '{file_key}' from bucket '{bucket_name}': {str(e)}")
def delete_s3_files_in_path(s3_client, bucket_name, path) -> None:
    """
    Deletes all objects under a given path (prefix) in an S3 bucket.

    Parameters
    ----------
    s3_client (boto3.client): An authenticated S3 client
    bucket_name (str): The S3 bucket name where the items are stored
    path (str): The prefix under which all objects will be deleted

    Returns
    -------
    None
    """
    try:
        # List all objects in the specified path, handling pagination
        paginator = s3_client.get_paginator('list_objects_v2')
        for page in paginator.paginate(Bucket=bucket_name, Prefix=path):
            # Delete each object in the path
            for obj in page.get('Contents', []):
                s3_client.delete_object(Bucket=bucket_name, Key=obj['Key'])
                logging.info(f"File '{obj['Key']}' deleted successfully from bucket '{bucket_name}'.")
        logging.info(f"All files in path '{path}' deleted successfully from bucket '{bucket_name}'.")
    except Exception as e:
        logging.warning(f"Error deleting files in path '{path}' from bucket '{bucket_name}': {str(e)}")
def delete_v7_file(item_name, team_slug, dataset_id, api_key) -> None:
    """
    Deletes a V7 item with a given name in the designated dataset.

    Parameters
    ----------
    item_name (str): The name of the item to be deleted
    team_slug (str): The slug of the V7 team
    dataset_id (int): The ID of the V7 dataset containing the item
    api_key (str): A V7 API key with permission to delete items

    Returns
    -------
    None
    """
    try:
        # Specify header and payload information for V7 item deletion
        url = f"https://darwin.v7labs.com/api/v2/teams/{team_slug}/items"
        payload = {"filters": {
            "item_names": [item_name],
            "dataset_ids": [dataset_id]
        }}
        headers = {
            "accept": "application/json",
            "content-type": "application/json",
            "Authorization": f"ApiKey {api_key}"
        }
        response = requests.delete(url, json=payload, headers=headers)
        response.raise_for_status()
        logging.info(f"File '{item_name}' deleted successfully from V7 dataset '{dataset_id}'.")
    except Exception as e:
        logging.warning(f"Error deleting file '{item_name}' from dataset '{dataset_id}': {str(e)}")
def main():
    '''
    Top level function to execute sub-functions
    '''
    parser = argparse.ArgumentParser(description='Delete V7 metadata files in a specified path within an S3 bucket when deleting a V7 item.')
    parser.add_argument('team_slug', help='the V7 team slug')
    parser.add_argument('dataset_id', type=int, help='the ID of the V7 dataset containing the item')
    parser.add_argument('item_name', help='the name of the item to be deleted in V7')
    parser.add_argument('--s3_path', help='path to the original S3 item')
    args = parser.parse_args()

    # Ensure INFO-level log messages are actually emitted
    logging.basicConfig(level=logging.INFO)

    # Loading environment variables
    access_key = os.environ.get('S3_ACCESS_KEY')
    secret_key = os.environ.get('S3_SECRET_KEY')
    v7_prefix = os.environ.get('V7_STORAGE_PREFIX')
    bucket_name = os.environ.get('S3_BUCKET_NAME')
    v7_api_key = os.environ.get('V7_API_KEY')

    s3_client = boto3.client('s3',
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key
    )

    # Build the S3 prefix where V7 wrote the item's metadata
    if args.s3_path:
        path = v7_prefix + "/" + args.s3_path + "/data/" + args.item_name
    else:
        path = v7_prefix + "/data/" + args.item_name

    delete_s3_files_in_path(s3_client, bucket_name, path)
    delete_v7_file(args.item_name, args.team_slug, args.dataset_id, v7_api_key)

if __name__ == "__main__":
    main()
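As an example invocation, with the S3_ACCESS_KEY, S3_SECRET_KEY, S3_BUCKET_NAME, V7_STORAGE_PREFIX, and V7_API_KEY environment variables already exported, and using hypothetical team, dataset, and item values:

python3 auto-metadata-deletion.py my-team 1234 example-item.jpg --s3_path nested/folder

This would remove every S3 object under {V7_STORAGE_PREFIX}/nested/folder/data/example-item.jpg and then delete the item from the V7 dataset.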
Read-only Differences
If you have decided to set up your external storage using read-only access, then the location of your metadata will depend on where you have chosen to store it. This may not follow the same structure as the V7-written metadata, but the deletion process will be very similar.
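For instance, here is a minimal sketch reusing the delete_s3_files_in_path helper from the script above, where the custom prefix is a hypothetical placeholder for wherever you chose to store your read-only metadata:

# Hypothetical prefix: wherever your read-only metadata actually lives
custom_prefix = "my-metadata/example-item.jpg"

# Reuses the s3_client and bucket_name set up in the script above
delete_s3_files_in_path(s3_client, bucket_name, custom_prefix)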