Creating and Uploading a Custom Sitemap

Overview

Use a custom sitemap when you need to batch upload URLs and their associated content. The "Add Website" feature supports filtering and is typically more efficient, but a custom sitemap is the better fit for specific bulk upload needs.

Sitemap JSON Format

The JSON file lists web URLs and functions like a sitemap. Once it is uploaded, the system ingests all the web content linked to these URLs.

Example JSON:

{
  "type": "webLinks",
  "crawlDelay": 2,
  "forceMode": "auto",
  "lang": ["en"],
  "urls": [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/fileName.pdf"
  ]
}

Add more URLs to the urls array as needed. JSON does not allow comments, so keep the file free of them.

Parameters

  • type: Must be set to webLinks.
  • crawlDelay (Optional): Delay between crawls, in seconds. Default is 2, which is also the minimum value.
  • forceMode (Optional): One of auto, sync, or async. Default is auto, which is the right setting in most cases (99%).
  • lang (Optional): Array of two-letter language codes, for example ["en"]. Default is [], which means all languages. Set it when you want to keep a single language and filter out all others.
  • urls: Array of the URLs to ingest, as in the example above.
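
Before uploading, it can help to check a file against the parameter rules listed above. The following is a minimal sketch, not part of the product: the function name validate_sitemap is hypothetical, and the checks that urls is non-empty and that each URL starts with http:// or https:// are assumptions rather than documented requirements.

```python
ALLOWED_FORCE_MODES = {"auto", "sync", "async"}
MIN_CRAWL_DELAY = 2  # seconds; documented default and minimum


def validate_sitemap(payload: dict) -> list:
    """Return a list of problems; an empty list means no problems were found."""
    problems = []

    if payload.get("type") != "webLinks":
        problems.append('type must be "webLinks"')

    # crawlDelay is optional; when present it must be a number >= 2.
    delay = payload.get("crawlDelay", MIN_CRAWL_DELAY)
    if (isinstance(delay, bool) or not isinstance(delay, (int, float))
            or delay < MIN_CRAWL_DELAY):
        problems.append("crawlDelay must be a number >= 2")

    # forceMode is optional; when present it must be auto, sync, or async.
    mode = payload.get("forceMode", "auto")
    if not isinstance(mode, str) or mode not in ALLOWED_FORCE_MODES:
        problems.append("forceMode must be auto, sync, or async")

    # lang is optional; when present it must be an array of 2-letter codes.
    lang = payload.get("lang", [])
    if not isinstance(lang, list) or any(
        not (isinstance(code, str) and len(code) == 2) for code in lang
    ):
        problems.append("lang must be an array of two-letter codes")

    # Assumption: an upload needs at least one http(s) URL.
    urls = payload.get("urls")
    if not isinstance(urls, list) or not urls:
        problems.append("urls must be a non-empty array")
    elif any(
        not (isinstance(u, str) and u.startswith(("http://", "https://")))
        for u in urls
    ):
        problems.append("every URL must start with http:// or https://")

    return problems
```

Load the file with json.load and pass the resulting dictionary to validate_sitemap; json.load itself will reject a missing comma or a stray comment.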

Python Helper Code to Generate JSON from CSV

The following Python script converts a CSV file containing URLs into the required JSON format for the custom sitemap.

import csv
import json

# Path to CSV file
csv_file_path = "path/to/your/csvfile.csv"

# Read CSV file and convert to JSON format
json_data = {"type": "webLinks", "crawlDelay": 2, "forceMode": "auto", "urls": []}

with open(csv_file_path, mode='r', encoding='utf-8-sig') as file:
    csv_reader = csv.DictReader(file)
    for row in csv_reader:
        url = row["URL"]
        if "uas" in url:  # Filter condition
            json_data["urls"].append(url)

# Save JSON data to a file
json_file_path = "path/to/your/jsonfile.json"
with open(json_file_path, 'w', encoding='utf-8') as jsonfile:
    json.dump(json_data, jsonfile, ensure_ascii=False, indent=2)

print("JSON file saved at:", json_file_path)

Note:

  • Ensure the CSV file has a column named "URL" containing the URLs.
  • Adjust the csv_file_path and json_file_path as needed.
  • The filter condition in the script (if "uas" in url) should be modified according to your specific requirements.
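
One way to make the filter condition easier to change is to pass it in as a function. The sketch below is a hedged variation on the script above, not a replacement for it: build_sitemap and its keep and lang parameters are hypothetical names, and skipping blank cells and duplicate URLs is an added assumption. It reads from an in-memory CSV so it can be run as-is.

```python
import csv
import io
import json


def build_sitemap(csv_file, keep=lambda url: True, lang=None):
    """Build the sitemap dictionary from an open CSV file with a "URL" column."""
    urls = []
    seen = set()
    for row in csv.DictReader(csv_file):
        url = (row.get("URL") or "").strip()
        # Assumption: blank cells and repeated URLs are not worth uploading.
        if url and url not in seen and keep(url):
            seen.add(url)
            urls.append(url)

    data = {"type": "webLinks", "crawlDelay": 2, "forceMode": "auto", "urls": urls}
    if lang:
        data["lang"] = list(lang)  # optional language filter, e.g. ["en"]
    return data


# In-memory stand-in for a real CSV file; the URLs are placeholders.
sample = io.StringIO(
    "URL\n"
    "https://example.com/docs/a\n"
    "https://example.com/blog/b\n"
    "https://example.com/docs/a\n"
)

# Keep only URLs under /docs/ and restrict ingestion to English.
sitemap = build_sitemap(sample, keep=lambda url: "/docs/" in url, lang=["en"])
print(json.dumps(sitemap, indent=2))
```

For a real file, replace the StringIO object with open(csv_file_path, encoding='utf-8-sig') and write the result with json.dump as in the script above.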