Cluster Documents

POST
/api/discovery/enhanced-search/cluster-documents

Cluster documents into semantic groups using K-means or community detection.

Request Body:

  • num_clusters: Number of clusters to create (2-20, default: 5)
  • method: Clustering method (“kmeans” or “community”, default: “kmeans”)
  • model_name: Embedding model reported in the response (default: “multilingual-e5-large-instruct”)
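
The request constraints above can be checked on the client before sending anything. The following is a minimal sketch, not part of the API: `build_cluster_request` is a hypothetical helper, and the camelCase keys (`numClusters`, `modelName`) are taken from the schema listing on this page. Whether the snake_case names from the bullets are also accepted is not stated here.

```python
import json
import re

# Allowed values for "method", mirroring the documented pattern.
METHOD_PATTERN = re.compile(r"kmeans|community")


def build_cluster_request(num_clusters=5, method="kmeans", model_name=None):
    """Validate parameters locally and return the JSON body as a dict.

    Hypothetical helper: key names follow the camelCase schema fields
    shown on this page (an assumption about what the server expects).
    """
    if isinstance(num_clusters, bool) or not isinstance(num_clusters, int):
        raise TypeError("num_clusters must be an integer")
    if not 2 <= num_clusters <= 20:
        raise ValueError("num_clusters must be between 2 and 20")
    if not METHOD_PATTERN.fullmatch(method):
        raise ValueError("method must be 'kmeans' or 'community'")

    body = {"numClusters": num_clusters, "method": method}
    if model_name is not None:
        # Omitted by default so the server-side default applies.
        body["modelName"] = model_name
    return body


if __name__ == "__main__":
    print(json.dumps(build_cluster_request(8, "community"), indent=2))
```

Leaving `modelName` out of the body lets the documented default apply instead of hard-coding it on the client.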

Returns:

  • clusters: Dictionary mapping cluster IDs to document ID lists
  • cluster_stats: Statistics for each cluster:
    • document_count: Number of documents in cluster
    • documents: Preview of first 5 document IDs

Raises:

  • 404: No document embeddings found (reindex required)
  • 500: Clustering operation failed
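
The two documented failure modes call for different reactions: a 404 means the embeddings are missing and a reindex is needed, while a 500 means the clustering run itself failed. A minimal sketch using only the standard library follows. The exception classes, `cluster_documents`, and the `base_url` argument are hypothetical names, and any authentication the deployment requires is not covered on this page.

```python
import json
import urllib.error
import urllib.request

ENDPOINT = "/api/discovery/enhanced-search/cluster-documents"


class ReindexRequired(Exception):
    """404: no document embeddings found."""


class ClusteringFailed(Exception):
    """500: the clustering operation failed."""


def raise_for_cluster_status(status_code):
    """Translate the documented status codes into specific exceptions."""
    if status_code == 404:
        raise ReindexRequired("No document embeddings found; reindex first")
    if status_code == 500:
        raise ClusteringFailed("Clustering operation failed")


def cluster_documents(base_url, body):
    """POST the request body and return the decoded JSON response.

    base_url is an assumption, e.g. the root URL of your deployment.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        raise_for_cluster_status(err.code)
        raise  # Any other status is re-raised unchanged.
```

Callers can then catch `ReindexRequired` to trigger a reindex and treat `ClusteringFailed` as a server-side error.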

DocumentClusteringRequest (object)

  • numClusters (integer, default: 5, minimum: 2, maximum: 20): Number of clusters to create
  • method (string, default: kmeans, pattern: /^(kmeans|community)$/): Clustering method
  • modelName (any of: string)

Successful Response

ClusterResult

Document clustering result response model.

Contains the cluster assignments and per-cluster statistics produced by a document clustering operation.

Fields:

  • clusters: Dictionary mapping cluster IDs to lists of document IDs assigned to each cluster
  • cluster_stats: Dictionary containing cluster statistics including:
    • document_count: Number of documents in cluster
    • documents: Preview list of document IDs in cluster

Usage: POST /api/discovery/enhanced-search/cluster-documents returns this response model.

JSON Example:

{
  "clusters": {
    "cluster_0": ["doc_1", "doc_2", "doc_3"],
    "cluster_1": ["doc_4", "doc_5"]
  },
  "clusterStats": {
    "cluster_0": {
      "documentCount": 3,
      "documents": ["doc_1", "doc_2", "doc_3"]
    },
    "cluster_1": {
      "documentCount": 2,
      "documents": ["doc_4", "doc_5"]
    }
  }
}
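
A response of this shape is usually easier to work with once it is inverted into a document-to-cluster lookup. The sketch below uses the JSON example above as sample data. Both helper functions are hypothetical, and it assumes each document ID appears in exactly one cluster and that `documentCount` counts the full assignment list, which the sample is consistent with but this page does not guarantee.

```python
import json

# Sample data copied from the JSON example on this page.
SAMPLE_RESPONSE = json.loads("""
{
  "clusters": {
    "cluster_0": ["doc_1", "doc_2", "doc_3"],
    "cluster_1": ["doc_4", "doc_5"]
  },
  "clusterStats": {
    "cluster_0": {"documentCount": 3, "documents": ["doc_1", "doc_2", "doc_3"]},
    "cluster_1": {"documentCount": 2, "documents": ["doc_4", "doc_5"]}
  }
}
""")


def document_to_cluster(result):
    """Invert the clusters mapping into document ID -> cluster ID."""
    return {
        doc_id: cluster_id
        for cluster_id, doc_ids in result["clusters"].items()
        for doc_id in doc_ids
    }


def count_mismatches(result):
    """Return cluster IDs whose documentCount differs from the list length."""
    mismatched = []
    for cluster_id, doc_ids in result["clusters"].items():
        stats = result["clusterStats"].get(cluster_id, {})
        if stats.get("documentCount") != len(doc_ids):
            mismatched.append(cluster_id)
    return mismatched


if __name__ == "__main__":
    print(document_to_cluster(SAMPLE_RESPONSE))
    print(count_mismatches(SAMPLE_RESPONSE))
```

Note that `documents` inside the statistics is only a preview, so the full membership should always be read from `clusters`.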

ClusterResult fields (object):

  • clusters (object): Cluster assignments
    • additional properties (key): Array<string>
  • clusterStats (object): Cluster statistics
    • additional properties (key): any of: string

Validation Error

HTTPValidationError (object)

  • detail (Array<object>): Detail; each item is a ValidationError

ValidationError (object)

  • loc (Array, required): Location
  • msg (string, required): Message
  • type (string, required): Error Type
  • input: Input
  • ctx (object): Context
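
A validation error body can be flattened into one readable line per problem by joining `loc` and appending `msg` and `type`. This is a minimal sketch: `format_validation_errors` is a hypothetical helper, and the sample payload is made up for illustration to match the schema above, not captured from a real response.

```python
def format_validation_errors(payload):
    """Turn an HTTPValidationError body into one message per error."""
    lines = []
    for item in payload.get("detail", []):
        # loc is an array of path segments (strings or integers).
        location = ".".join(str(part) for part in item["loc"])
        lines.append(f"{location}: {item['msg']} ({item['type']})")
    return lines


# Illustrative payload only; field values are invented for this example.
SAMPLE_ERROR = {
    "detail": [
        {
            "loc": ["body", "numClusters"],
            "msg": "Input should be less than or equal to 20",
            "type": "less_than_equal",
            "input": 50,
        }
    ]
}

if __name__ == "__main__":
    for line in format_validation_errors(SAMPLE_ERROR):
        print(line)
```

The optional `input` and `ctx` fields are ignored here because only `loc`, `msg`, and `type` are marked as required.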