# Elements of Access: Clustering

When we have nodes or links with high Betweenness values, it is often because our network is split into various sub-groups that can be called clusters.  Clusters tend to have their own unique set of properties, so it is useful to be able to identify clusters quantitatively.

While there is a growing number of clustering algorithms, the basic idea behind them is to capture the degree to which nodes cluster.  The Clustering coefficient, for instance, represents how likely it is that two connected nodes are part of a larger group of highly connected nodes.  It can be calculated by dividing the number of actual connections between the neighbors of a node (i.e. the nodes directly connected to the node in question) by the number of possible connections between these same neighboring nodes.  For instance, in the image above, the red node is the node of interest, and it has a Degree of 4.  Those 4 neighboring nodes make 4 actual connections (i.e. the black lines in the figure on the right) but have 6 possible connections (i.e. the black lines plus the red dashed lines), since 4 nodes can form 4 × 3 / 2 = 6 distinct pairs.  Thus, the Clustering coefficient for the red node is 4 divided by 6, or 0.67.
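The calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `local_clustering`, the neighbor labels `a` through `d`, and the particular arrangement of the 4 links among the neighbors are assumptions chosen only to reproduce the counts in the example (Degree of 4, 4 actual connections, 6 possible).

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Share of possible links among a node's neighbors that actually exist."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        # Assumed convention: with fewer than two neighbors there are no
        # possible links, so the coefficient is reported as 0.
        return 0.0
    possible = k * (k - 1) // 2
    actual = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return actual / possible


# Hypothetical layout: the red node plus four neighbors joined in a ring,
# which gives 4 actual links among the neighbors out of 6 possible.
edges = [
    ("red", "a"), ("red", "b"), ("red", "c"), ("red", "d"),
    ("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"),
]

# Build an undirected adjacency map: node -> set of directly connected nodes.
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

red_coefficient = local_clustering(adjacency, "red")
print(round(red_coefficient, 2))  # 0.67
```

Storing neighbors as sets keeps the "is this pair connected?" check cheap, which matters once the same function is applied to every node in a larger network.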

The Clustering coefficient ranges from 0 (i.e. no clustering) to 1 (i.e. complete clustering).  If we are interested in the amount of clustering for an entire network, we average the Clustering coefficients for all of the nodes.  Clustering tends to be higher in real-world networks than in random networks.  As a network becomes more centralized (i.e. a small percentage of nodes have high connectivity), the overall topology becomes more differentiated and clusters begin to emerge.
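The network-wide average can be sketched the same way. The two five-node networks below are hypothetical and exist only to show the two ends of the scale on small inputs. The helper names `build_adjacency` and `average_clustering` are illustrative, and counting nodes with fewer than two neighbors as 0 is an assumed convention.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency map: node -> set of directly connected nodes."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def local_clustering(adjacency, node):
    """Actual links among a node's neighbors divided by possible links."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    actual = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return actual / (k * (k - 1) // 2)


def average_clustering(adjacency):
    """Mean of the local Clustering coefficients over all nodes."""
    return sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)


# Hypothetical network 1: a triangle (a, b, c) with a two-link tail (c-d-e).
tailed_triangle = build_adjacency(
    [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
)

# Hypothetical network 2: five nodes in a ring, so no neighbors are linked.
ring = build_adjacency(
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")]
)

print(round(average_clustering(tailed_triangle), 2))  # 0.47
print(average_clustering(ring))                       # 0.0
```

In the first network, nodes `a` and `b` each score 1, node `c` scores 1/3, and the two tail nodes score 0, so the average is (1 + 1 + 1/3 + 0 + 0) / 5, or about 0.47.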

Other related terms include component and clique.  A component is a sub-group of nodes in which every node can be reached from every other node, and which has no connections to nodes outside of the sub-group.  A clique is a sub-group in which every node is directly connected to every other node.  Understanding clusters, components, and cliques in networks can be useful because they can hold more influence over behavior than overall network structure (Neal, 2013).  Imagine, for instance, a New Urbanist neighborhood with great street connectivity set into a city with poor overall street connectivity.  Analyzing network structure for the overall city might lead us to one conclusion; yet, we could find very different outcomes in the New Urbanist neighborhood.  While factors such as land use, street design, and demographics influence transportation-related outcomes as well, the concept of clustering holds value for those interested in truly understanding transportation networks.