Topological stability and textual differentiation in human interaction networks:

statistical analysis, visualization and linked data


Advisor: Prof. Dr. Osvaldo Novais de Oliveira Junior

Candidate: Renato Fabbri

Doctoral thesis defense in Computational Physics, May 8, 2017

São Carlos Institute of Physics, University of São Paulo


Outline


How stable are the scale-free and other topological features of social networks? How do text and topology relate in social interaction networks?

These questions are important for us to characterize our social systems.

We relied on the literature and on data mining to reach the two main results presented in this work:

There are subsidiary results in dynamic graph visualization and linked data. They enabled and shaped the core analysis.


Introduction

As a scale reference, there are about \(10^{80}\) atoms in the observable universe. Let \(N\) be the number of individuals needed to yield more possible networks than there are atoms in the universe. Each of the \({N \choose 2}\) possible edges is a Bernoulli variable (the edge is either present or absent), so there are \(2^{N \choose 2}\) possible networks.


2^{N \choose 2} > 10^{80} \Rightarrow \log_2\left[2^{N \choose 2}\right] > \log_2(10^{80}) \Rightarrow {N \choose 2} > \frac{\log_{10}(10^{80})}{\log_{10}2} \Rightarrow \frac{N(N-1)}{2} > \frac{80}{\log_{10}2} \Rightarrow N > 23.5599

I.e. only 24 vertices are needed for there to be more possible networks than atoms in the universe. This endorses the utility of paradigms for networks, and of generic measures for each vertex and for the network, instrumental for complex networks, including human interaction networks.
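
The bound can be checked with exact integer arithmetic. The following is a minimal Python sketch; the function name `min_vertices_exceeding` is hypothetical and introduced only for illustration.

```python
from math import comb


def min_vertices_exceeding(bound):
    """Smallest N such that 2**C(N, 2) > bound, using exact integers."""
    n = 1
    # C(n, 2) is the number of possible undirected edges among n vertices;
    # each edge is present or absent, giving 2**C(n, 2) possible networks.
    while 2 ** comb(n, 2) <= bound:
        n += 1
    return n


atoms = 10 ** 80  # scale reference used in the text
n = min_vertices_exceeding(atoms)
print(n, comb(n, 2))  # number of vertices and number of possible edges
```

With 23 vertices there are 253 possible edges and \(2^{253} < 10^{80}\); with 24 vertices there are 276 and the bound is exceeded.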

Complex system \(\Rightarrow\) a system consisting of several parts whose interactions exhibit emergent behavior. A complex system is usually considered to process information, to exhibit adaptive mechanisms, and possibly to have reproduction capabilities. It is integrated with other complex systems and with the environment in which it subsists.


Materials



Methods


These are the methods considered for studying the topology of the systems:

The core method used to observe textual differences in the Erdös sectors is an adaptation of the Kolmogorov-Smirnov test.

For enabling the research, we had to use methods for:

Directional statistics (also called spherical or circular statistics) are generic for observations in Riemannian manifolds and were used to observe the distribution of the sent times of email messages.


Circular statistics

Consider each measure over time:

\theta_i=2\pi \frac{\text{measure}_i}{\text{period}}\\ z_i= e^{\mathrm{i}\theta_i} \\ m_n=\frac{1}{N}\sum_{i=1}^N z_i^n \;\;\text{ are the moments}

The norm and angle of the moments:

R_n=|m_n| \\ \theta_\mu=\mathrm{Arg}(m_1) \\ \theta_\mu'=\frac{\text{period}}{2\pi} \theta_\mu

The angle is a measure of location, and the dispersion is given by:

\mathrm{Var}(z)=1 - R_1 \\ S(z)= \sqrt{-2\ln(R_1)}\\ \delta(z)=\frac{1-R_2}{2 R_1^2}

We also used the ratio \(\frac{b_h}{b_l}\) between the largest incidence \(b_h\) and the smallest incidence \(b_l\) in the histograms.
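
The moments and dispersion measures above can be sketched in Python as follows. This is an illustrative sketch using only the standard library, not the code used in the thesis; the function name `circular_stats`, the dictionary keys, and the modulo that maps the mean back to \([0, \text{period})\) are assumptions made for the example.

```python
import cmath
import math


def circular_stats(measures, period):
    """Circular moments and dispersion measures of values within a period."""
    # Map each measure to a point on the unit circle: z = exp(i * theta).
    zs = [cmath.exp(1j * 2 * math.pi * m / period) for m in measures]

    def moment(n):
        # n-th moment: mean of z**n over all observations.
        return sum(z ** n for z in zs) / len(zs)

    m1, m2 = moment(1), moment(2)
    r1, r2 = abs(m1), abs(m2)
    theta_mu = cmath.phase(m1)  # Arg(m_1), in (-pi, pi]
    # Mean in the original units; the modulo is an added convention
    # (assumption) that keeps the result in [0, period).
    mean_measure = (period * theta_mu / (2 * math.pi)) % period
    return {
        "R1": r1,
        "mean": mean_measure,
        "variance": 1 - r1,
        # max() guards against R1 exceeding 1 by rounding error.
        "std": math.sqrt(max(0.0, -2 * math.log(r1))),
        "dispersion": (1 - r2) / (2 * r1 ** 2),
    }


# Two messages sent at 5h and 7h, on a 24-hour period.
stats = circular_stats([5, 7], period=24)
print(stats["mean"], stats["variance"])
```

The sketch does not handle the degenerate case \(R_1 = 0\) (observations spread perfectly evenly around the circle), for which the standard deviation and the dispersion are undefined.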


Interaction networks attainment



Erdös sectioning

\sum_{x=k_i}^{k_j} \widetilde{P}(x) < \sum_{x=k_i}^{k_j} P(x) \Rightarrow \text{i is intermediary}
P(k)=\binom{2(N-1)}{k}p_e^k(1-p_e)^{2(N-1)-k}

where

p_e=\frac{z}{N(N-1)}
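
As a rough illustration of the comparison above, the following Python sketch computes the binomial \(P(k)\) and flags the degree values whose empirical fraction falls below it. It rests on assumptions not stated on the slide: \(z\) is taken to be the number of edges, the degree is the total (in plus out) degree, and each interval \([k_i, k_j]\) is reduced to a single degree value. The function names are hypothetical.

```python
from collections import Counter
from math import comb


def binomial_degree_pmf(n_vertices, n_edges):
    """P(k) for k = 0..2(N-1), with p_e = z / (N (N - 1))."""
    p_e = n_edges / (n_vertices * (n_vertices - 1))
    trials = 2 * (n_vertices - 1)
    return [
        comb(trials, k) * p_e ** k * (1 - p_e) ** (trials - k)
        for k in range(trials + 1)
    ]


def intermediary_degrees(degrees, n_edges):
    """Degree values whose empirical fraction is below the binomial P(k).

    Simplifying assumption: each interval [k_i, k_j] is a single degree.
    """
    n = len(degrees)
    pmf = binomial_degree_pmf(n, n_edges)
    counts = Counter(degrees)
    return sorted(k for k, c in counts.items() if c / n < pmf[k])


# Toy network: 5 vertices, 10 edges, total degrees summing to 2 * 10.
print(intermediary_degrees([1, 3, 4, 5, 7], n_edges=10))
```

In this toy case each degree value has an empirical fraction of 0.2, which is below the binomial probability only for the central degrees 3, 4 and 5.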


PCA in topological metrics


Mean and standard deviation of measurement j in eigenvector k, and of eigenvalue k, taken over all \(L\) instances \(l\):

\begin{eqnarray} \mu_{V'}[j,k] &=&\frac{\sum_{l=1}^L V'[j,k,l]}{L}\nonumber\\ \sigma_{V'}[j,k]&=&\sqrt{\frac{\sum_{l=1}^L(\mu_{V'}[j,k]-V'[j,k,l])^2}{L}}\\ \mu_{D'}[k]&=&\frac{\sum_{l=1}^L D'[k,l]}{L}\nonumber\\ \sigma_{D'}[k]&=&\sqrt{\frac{\sum_{l=1}^L(\mu_{D'}[k]-D'[k,l])^2}{L}}\nonumber \end{eqnarray}
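
A minimal sketch of these averages in Python, using only the standard library. The data layout (`V[l][j][k]` and `D[l][k]` as nested lists, one entry per instance) and the function name `eigen_stats` are assumptions made for illustration. `pstdev` is the population standard deviation, which divides by \(L\) as in the formulas.

```python
from statistics import fmean, pstdev


def eigen_stats(V, D):
    """Mean and standard deviation over instances l.

    V[l][j][k]: measurement j in eigenvector k for instance l (assumed layout).
    D[l][k]: eigenvalue k for instance l (assumed layout).
    """
    n_metrics, n_components = len(V[0]), len(V[0][0])
    mu_V = [[fmean([Vl[j][k] for Vl in V]) for k in range(n_components)]
            for j in range(n_metrics)]
    sigma_V = [[pstdev([Vl[j][k] for Vl in V]) for k in range(n_components)]
               for j in range(n_metrics)]
    mu_D = [fmean([Dl[k] for Dl in D]) for k in range(len(D[0]))]
    sigma_D = [pstdev([Dl[k] for Dl in D]) for k in range(len(D[0]))]
    return mu_V, sigma_V, mu_D, sigma_D


# Two instances, one measurement, two components (toy numbers).
V = [[[1.0, 0.0]], [[3.0, 0.0]]]
D = [[2.0, 1.0], [4.0, 1.0]]
print(eigen_stats(V, D))
```

A small standard deviation relative to the mean indicates that the composition of a principal component is stable across instances.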

We used standard measures of (in, out, total-) degree, (in, out, total-) strength, betweenness centrality and clustering coefficient. We also used non-standard measures of asymmetry and disequilibrium.


Adaptation of the Kolmogorov-Smirnov two sample test

D_{n,n'} > c(\alpha)\sqrt{\frac{n+n'}{nn'}} \Rightarrow F_{1,n} \neq F_{2,n'}
c(\alpha) < \frac{D_{n,n'}}{\sqrt{\frac{n+n'}{nn'}}} = c'
α 0.1 0.05 0.025 0.01 0.005 0.001
c(α) 1.22 1.36 1.48 1.63 1.73 1.95
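
The statistic \(c'\) can be computed directly from two samples. The following Python sketch is illustrative only: the function names are hypothetical, and reporting the smallest tabulated \(\alpha\) whose \(c(\alpha)\) lies below \(c'\) is one possible way of reading the table, not necessarily the procedure used in the thesis.

```python
from bisect import bisect_right
from math import sqrt

# Critical values c(alpha) from the table above.
C_ALPHA = {0.1: 1.22, 0.05: 1.36, 0.025: 1.48,
           0.01: 1.63, 0.005: 1.73, 0.001: 1.95}


def ks_statistic(sample1, sample2):
    """D_{n,n'}: largest distance between the two empirical CDFs."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n, m = len(s1), len(s2)
    # The supremum is attained at one of the pooled sample points.
    return max(
        abs(bisect_right(s1, x) / n - bisect_right(s2, x) / m)
        for x in s1 + s2
    )


def c_prime(sample1, sample2):
    """c' = D_{n,n'} / sqrt((n + n') / (n n'))."""
    n, m = len(sample1), len(sample2)
    return ks_statistic(sample1, sample2) / sqrt((n + m) / (n * m))


def smallest_rejecting_alpha(sample1, sample2):
    """Smallest tabulated alpha with c(alpha) < c', or None if there is none."""
    cp = c_prime(sample1, sample2)
    rejected = [alpha for alpha, c in C_ALPHA.items() if c < cp]
    return min(rejected) if rejected else None


a, b = [1, 2, 3], [4, 5, 6]
print(ks_statistic(a, b), c_prime(a, b), smallest_rejecting_alpha(a, b))
```

Working with \(c'\) instead of a fixed \(\alpha\) gives a continuous measure of how different two samples are, which can then be compared across pairs of samples.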


Audiovisualization of data



Linked data representations and ontologies


To enable our research and to start a social dataset that fits our needs, we developed:


For some months we maintained an online infrastructure for navigating and querying the linked data. When the USP cloud services started charging, we had to withdraw these services.


Typological and humanistic considerations



Results


Temporal and topological stability


Textual differentiation



Audiovisualization with Versinus


In Versinus (from the Latin versus + sinus, meaning line + sinusoid), the Erdös sectors are positioned on the first half of the sinusoid, on the second half of the sinusoid, and on the upper line. Vertex size corresponds to in- and out-strength. Color corresponds to the clustering coefficient. Music is synthesized from the total activity of the four most active hubs.


Linked social data




Art and sensory mappings



Software


Official Python packages (PyPI) for precise and efficient sharing of the developments:


Conclusions



Bibliography









Thank you!