Function Details

Semantic Density Clustering

Semantic Density Clustering groups similar items together based on their meaning and content, rather than just numerical values. It converts all data to text, embeds that text as vectors, and then detects regions where these vectors are densely concentrated. Each dense region becomes a cluster, and points outside every dense region are treated as noise.
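
The density step described above can be sketched in plain Python. This is a minimal illustration that assumes a DBSCAN-style algorithm (suggested by the Min Samples and Epsilon parameters, but not confirmed here), Euclidean distance, and hand-made 2-D points standing in for real text embeddings. The function name density_cluster is hypothetical.

```python
import math

NOISE = -1

def density_cluster(points, epsilon=0.5, min_samples=3):
    """DBSCAN-style sketch: returns one label per point, -1 meaning noise."""
    n = len(points)
    # Assumption: a point's neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= epsilon]
        for i in range(n)
    ]
    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
        cluster_id += 1
    return labels

# Toy stand-ins for embeddings: two dense groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (10.0, 0.0)]
labels = density_cluster(points)
print(labels)
```

The number of clusters is never passed in; it falls out of how many dense regions the points form.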

Features

  • Semantic clustering based on meaning rather than just numerical similarity

  • Optional meaningful labels and descriptions for each cluster

  • Automatic detection of noise/outlier points

  • No need to specify number of clusters beforehand

Parameters

  • Min Samples: Minimum number of samples required to form a cluster (default: 3)

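  • Epsilon: Maximum distance between two points for them to count as neighbors (default: 0.5). Smaller values give tighter clusters and more noise points; larger values merge points into fewer, broader clusters.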

  • Label Types: Choose from:

      • Basic numbers (integers)

      • Short meaningful labels

      • Cluster descriptions
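
How Min Samples and Epsilon interact can be illustrated with a small neighbor count. The sketch below assumes the common DBSCAN convention that a point counts toward its own neighborhood and that distance is Euclidean; the helper names are hypothetical and the 2-D points stand in for embeddings.

```python
import math

def neighbors_within(points, index, epsilon):
    """Indices of all points within epsilon of points[index], itself included."""
    return [j for j, p in enumerate(points)
            if math.dist(points[index], p) <= epsilon]

def is_core_point(points, index, epsilon=0.5, min_samples=3):
    """A point can seed a cluster only if its neighborhood is dense enough."""
    return len(neighbors_within(points, index, epsilon)) >= min_samples

points = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.4), (2.0, 2.0)]

# Defaults: point 0 has three neighbors (itself and two nearby points).
print(is_core_point(points, 0))                # dense enough
# A smaller epsilon excludes the point at distance 0.4.
print(is_core_point(points, 0, epsilon=0.35))  # no longer dense enough
# The isolated point never qualifies at the default settings.
print(is_core_point(points, 3))
```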

Output

The output contains the original columns plus one added column per selected label type:

  • Numeric cluster labels (default), in the cluster column

  • Human-readable short labels, in the cluster_label column

  • Detailed cluster descriptions, in the cluster_description column
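
As a rough sketch of how the selected label types could map to output columns: the column names cluster, cluster_label, and cluster_description come from the example in this document, while the label-type identifiers and the function attach_cluster_columns are hypothetical.

```python
# Hypothetical identifiers for the three label types; real option names may differ.
NUMERIC, SHORT, DESCRIPTION = "numeric", "short_label", "description"

def attach_cluster_columns(rows, cluster_ids, summaries, label_types=(NUMERIC,)):
    """Return copies of rows with the columns selected by label_types added.

    summaries maps each cluster id to a (short label, description) pair.
    """
    out = []
    for row, cid in zip(rows, cluster_ids):
        new_row = dict(row)  # leave the input rows untouched
        label, description = summaries[cid]
        if NUMERIC in label_types:
            new_row["cluster"] = cid
        if SHORT in label_types:
            new_row["cluster_label"] = label
        if DESCRIPTION in label_types:
            new_row["cluster_description"] = description
        out.append(new_row)
    return out

rows = [{"feedback": "The app keeps crashing when I open it"},
        {"feedback": "Great user interface, very intuitive"}]
cluster_ids = [0, 1]
summaries = {
    0: ("App Crashes", "Users reporting application crash and launch issues"),
    1: ("Positive UI Feedback", "Users praising the application's interface and design"),
}

default_out = attach_cluster_columns(rows, cluster_ids, summaries)
full_out = attach_cluster_columns(rows, cluster_ids, summaries,
                                  (NUMERIC, SHORT, DESCRIPTION))
```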


Example

Given a table with customer feedback:

[
  { "feedback": "The app keeps crashing when I open it", "platform": "iOS" },
  { "feedback": "Can't launch the application at all", "platform": "Android" },
  { "feedback": "Great user interface, very intuitive", "platform": "iOS" },
  { "feedback": "Love the new design, very clean", "platform": "Android" }
]

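Output with all label types enabled. Each group here has only two rows, so this illustration assumes Min Samples is lowered to 2; with the default of 3, groups of two would not qualify as clusters: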

[
  {
    "feedback": "The app keeps crashing when I open it",
    "platform": "iOS",
    "cluster": 0,
    "cluster_label": "App Crashes",
    "cluster_description": "Users reporting application crash and launch issues"
  },
  {
    "feedback": "Can't launch the application at all",
    "platform": "Android",
    "cluster": 0,
    "cluster_label": "App Crashes",
    "cluster_description": "Users reporting application crash and launch issues"
  },
  {
    "feedback": "Great user interface, very intuitive",
    "platform": "iOS",
    "cluster": 1,
    "cluster_label": "Positive UI Feedback",
    "cluster_description": "Users praising the application's interface and design"
  },
  {
    "feedback": "Love the new design, very clean",
    "platform": "Android",
    "cluster": 1,
    "cluster_label": "Positive UI Feedback",
    "cluster_description": "Users praising the application's interface and design"
  }
]
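
Once the cluster columns are present, a common next step is to group rows by cluster for review. The following is a small sketch over an abbreviated copy of the output above; the helper group_by_cluster is hypothetical and not part of the function itself.

```python
from collections import defaultdict

# Abbreviated copy of the example output (descriptions omitted for brevity).
output = [
    {"feedback": "The app keeps crashing when I open it", "platform": "iOS",
     "cluster": 0, "cluster_label": "App Crashes"},
    {"feedback": "Can't launch the application at all", "platform": "Android",
     "cluster": 0, "cluster_label": "App Crashes"},
    {"feedback": "Great user interface, very intuitive", "platform": "iOS",
     "cluster": 1, "cluster_label": "Positive UI Feedback"},
    {"feedback": "Love the new design, very clean", "platform": "Android",
     "cluster": 1, "cluster_label": "Positive UI Feedback"},
]

def group_by_cluster(rows):
    """Map each short cluster label to the feedback texts assigned to it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["cluster_label"]].append(row["feedback"])
    return dict(groups)

groups = group_by_cluster(output)
for label, texts in groups.items():
    print(f"{label}: {len(texts)} rows")
```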

When to Use

  • When you want to group similar items based on their meaning rather than exact values

  • When analyzing text data or mixed data types

  • When you need human-readable explanations of the clusters

  • When you want to automatically identify outliers or noise in your data

  • When dealing with irregularly shaped or non-spherical clusters

  • When you don't know the number of clusters beforehand

Notes

  • Points that don't fit into any cluster are treated as noise: they receive the numeric label -1 and the label "Noise Points"

  • The algorithm automatically determines the number of clusters based on data density

  • All data is converted to text and processed semantically, making it suitable for mixed data types
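
Two of the notes above can be sketched in a few lines: turning a mixed-type row into text before embedding, and separating noise rows (-1) from clustered rows afterwards. The "key: value" serialization format, the rating field, and both helper names are assumptions made for illustration; the actual conversion used by the function is not specified here.

```python
def row_to_text(row):
    """Serialize a mixed-type row into one string (hypothetical format)."""
    return "; ".join(f"{key}: {value}" for key, value in row.items())

def split_noise(rows, cluster_column="cluster"):
    """Separate clustered rows from rows labeled -1 (noise)."""
    clustered = [r for r in rows if r[cluster_column] != -1]
    noise = [r for r in rows if r[cluster_column] == -1]
    return clustered, noise

# Numbers and strings end up in the same text representation.
text = row_to_text({"feedback": "Love the new design", "platform": "Android", "rating": 5})
print(text)

labeled = [
    {"feedback": "The app keeps crashing when I open it", "cluster": 0},
    {"feedback": "Can't launch the application at all", "cluster": 0},
    {"feedback": "asdf qwerty", "cluster": -1},
]
clustered, noise = split_noise(labeled)
```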