Content Hash List API (v2.0.0)
The Content Hash Fingerprinting API provides access to a database of content fingerprints for use in content moderation systems. Hashes are available for several algorithms (MD5, SHA256, SHA512, and PDQ), so platforms can match content with whichever algorithm their pipeline already uses.
Key features and use cases
The TCAP Archive is a repository of known terrorist or violent extremist content (TVEC) media files, including images, videos and documents. The TCAP Archive Hash List API allows platforms to ingest the hashes produced from these media files in bulk, so they can use them in their content moderation processes.
The TCAP Archive’s hashes are distinct from any existing TVEC hash lists, as they complement Tech Against Terrorism’s proactive monitoring of terrorist internet usage by its team of open-source intelligence specialists. By leveraging this expertise, alongside a suite of automated monitoring capabilities, the TCAP Archive hash list reflects content created and uploaded over a number of years by a range of violent Islamist and violent far-right terrorist entities.
Authentication
To use any of the Hash List endpoints, you need to be an onboarded Hash List TCAP user. These endpoints sit inside the main TCAP backend, so you can access all of them with a standard TCAP JWT.
To obtain a token, make a request to the TCAP authentication endpoint with your username and password.
POST https://beta.terrorismanalytics.org/token-auth/tcap/
{
"username": "YOUR_TCAP_USERNAME",
"password": "YOUR_TCAP_PASSWORD"
}
Response
The Authentication endpoint returns the following data on each request:
token
:String
Token to send with each request you make to the API, as a Bearer token
user
:User
Your system user information
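As a rough illustration of the authentication flow described above, the following Python sketch builds the token request and turns the response into an Authorization header. The endpoint URL and the token field come from this page; the helper names (build_auth_request, bearer_headers, fetch_token_headers) are hypothetical, and sending the body as JSON is an assumption.

```python
import json
import urllib.request

# URL taken from the authentication example on this page.
AUTH_URL = "https://beta.terrorismanalytics.org/token-auth/tcap/"


def build_auth_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the authentication endpoint."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        AUTH_URL,
        data=body,
        headers={"Content-Type": "application/json"},  # assumption: JSON request body
        method="POST",
    )


def bearer_headers(auth_response: dict) -> dict:
    """Turn the parsed authentication response into headers for later requests."""
    return {"Authorization": f"Bearer {auth_response['token']}"}


def fetch_token_headers(username: str, password: str) -> dict:
    """Send the request and return Bearer headers (performs network I/O)."""
    with urllib.request.urlopen(build_auth_request(username, password)) as resp:
        return bearer_headers(json.load(resp))
```

Keeping the request construction separate from the network call makes it possible to inspect or unit-test the request without contacting the server.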
Hash List by Ideology
GET /hash-list/v2/all
This Hash List API endpoint retrieves a hash list file filtered by a specified ideology.
Parameters
ideology
: (Type: String, Required: Yes): Specifies the ideology to filter the results.
GET /api/hash-list/:ideology <'islamist' | 'far-right' | 'all'>
Response
The Hash List endpoint returns the following data on each request:
count
:Integer
Total number of hash records available
next
:String
URL of the next page of results
previous
:String
URL of the previous page of results (null if first page)
checkpoint
:String
Timestamp-based checkpoint identifier for synchronization
results
:Array
Array of Hash objects
Hash Object Fields
hash_digest
:String
The computed hash value
algorithm
:"MD5" | "SHA256" | "SHA512" | "PDQ"
The algorithm used to generate the hash
ideology
:'islamist' | 'far-right' | 'all'
Content classification category
file_type
:String
Source file format
deleted
:Boolean
If the file has been removed from the system
updated_on
:Float
Unix timestamp of the last update
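The fields above map naturally onto a small typed record. The sketch below is one possible client-side representation, not part of the API: the HashRecord class and its from_json helper are hypothetical names, and lower-casing the ideology value is an assumption made so that values compare consistently with the documented 'islamist' | 'far-right' | 'all' set.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class HashRecord:
    hash_digest: str
    algorithm: str      # "MD5", "SHA256", "SHA512" or "PDQ"
    ideology: str
    file_type: str
    deleted: bool
    updated_on: float   # Unix timestamp of the last update

    @classmethod
    def from_json(cls, item: dict) -> "HashRecord":
        """Build a record from one element of the `results` array.

        Only the documented fields are read; any extra keys are ignored.
        """
        return cls(
            hash_digest=item["hash_digest"],
            algorithm=item["algorithm"],
            ideology=item["ideology"].lower(),  # assumption: normalise case
            file_type=item["file_type"],
            deleted=bool(item["deleted"]),
            updated_on=float(item["updated_on"]),
        )

    @property
    def updated_at(self) -> datetime:
        """The `updated_on` timestamp as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.updated_on, tz=timezone.utc)
```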
Pagination
The API implements cursor-based pagination using timestamp and ID pairs. Results can be traversed using the next and previous URLs provided in the response.
Query Parameters
limit
: Number of results per page (default: 1000)
offset
: Starting position for pagination
order
: Sort order for results (asc/desc)
after
: Cursor value for pagination (format: timestamp,id)
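To show how these query parameters might combine into a request URL, here is a minimal Python sketch. The hash_list_url helper is hypothetical. The path /hash-list/v2/<ideology> is an assumption inferred from the GET /hash-list/v2/all line and the example response, and the host is taken from the authentication example; check both against your onboarding details.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://beta.terrorismanalytics.org"  # host from the authentication example
IDEOLOGIES = ("islamist", "far-right", "all")


def hash_list_url(
    ideology: str = "all",
    limit: int = 1000,
    order: Optional[str] = None,
    after: Optional[str] = None,
) -> str:
    """Build a hash list URL; `after` is a cursor string in "timestamp,id" form."""
    if ideology not in IDEOLOGIES:
        raise ValueError(f"unknown ideology: {ideology!r}")
    if order not in (None, "asc", "desc"):
        raise ValueError(f"order must be 'asc' or 'desc', got {order!r}")
    params = {"limit": limit}
    if order is not None:
        params["order"] = order
    if after is not None:
        params["after"] = after
    # Assumed path layout: /hash-list/v2/<ideology>
    return f"{BASE_URL}/hash-list/v2/{ideology}?{urlencode(params)}"
```

Note that urlencode percent-encodes the comma in the cursor, so "1730213563.621023,29152" is sent as 1730213563.621023%2C29152.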
Implementation Notes
Multiple hashing algorithms provide redundancy and enhanced detection capabilities
Checkpoint field enables efficient delta updates for client-side caching
Each hash entry includes metadata for content categorization and tracking
Real-time updates reflected through updated_on timestamps
Deleted flag allows for soft deletion while maintaining hash history
Example Response
{
"count": 19676,
"next": "http://beta.terrorismanalytics.org/hash-list/v2/all?<params>",
"previous": null,
"checkpoint": "1730213563.621023,29152",
"results": [
{
"hash_digest": "baf781254eb82811cdf3fe4751240eb8",
"algorithm": "MD5",
"ideology": "Far-right",
"file_type": "mp4",
"deleted": false,
"updated_on": 1730204429.302388,
"id": 28023
}
...
]
}
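Given a response of this shape, a client can walk every page by following the next URL until it is null. The Python sketch below illustrates that loop under stated assumptions: iter_hashes is a hypothetical helper, and fetch_page is a caller-supplied function (not part of the API) that performs an authenticated GET and returns the parsed JSON body.

```python
from typing import Callable, Iterator, Optional


def iter_hashes(first_url: str, fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every Hash object, following `next` links until there are none.

    `fetch_page` is supplied by the caller and must return the decoded
    response body (a dict with `results` and `next` keys) for a given URL.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page["results"]
        url = page.get("next")  # null/None on the last page ends the loop
```

Passing the fetch function in keeps the traversal logic independent of any particular HTTP library and easy to exercise with canned pages.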
Best Practices
Implement local caching using the checkpoint mechanism
Process updates incrementally using the pagination system
Consider implementing parallel processing for multiple hash algorithms
Store hash values in their original format to maintain precision
Monitor the deleted flag for deprecated hash values
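The caching and deleted-flag practices above could look roughly like the following Python sketch. It is an illustration only: apply_page is a hypothetical helper, keying the cache on (algorithm, hash_digest) is a design assumption, and how a stored checkpoint is passed back on the next sync is not specified on this page.

```python
from typing import Dict, Tuple

Cache = Dict[Tuple[str, str], dict]


def apply_page(cache: Cache, page: dict) -> str:
    """Merge one response page into a local cache and return its checkpoint.

    Entries flagged as deleted are dropped from the cache; all others are
    inserted or overwritten. The returned checkpoint can be persisted so a
    later sync knows where this one stopped.
    """
    for item in page["results"]:
        key = (item["algorithm"], item["hash_digest"])  # assumed cache key
        if item["deleted"]:
            cache.pop(key, None)  # tolerate deletions for hashes never cached
        else:
            cache[key] = item
    return page["checkpoint"]
```

Hash digests are stored exactly as received, so no precision or casing is lost between the API and the local copy.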
Usage With Meta's ThreatExchange
If you want to use the Hash List through Meta's ThreatExchange, you can create a collaboration configuration for our API, then fetch and compare PDQ image hashes and MD5 video hashes.
Step 1 - Install threatexchange
$ pip install threatexchange
Step 2 - Configure the default credentials
$ threatexchange config api tat --credentials '<TCAP_USERNAME>' '<TCAP_PASSWORD>'
Step 3 - Set up config
$ threatexchange config collab edit tat --create 'TAT'
Step 4 - Fetch hashes with verbose logging
$ threatexchange -v fetch
Step 5 - View dataset
$ threatexchange dataset
Step 6 - Match a piece of content
$ threatexchange match ~/path/to/image.jpg