首页 网络科学-哥伦比亚大学-NetSci-Fall2011-Lecture5

网络科学-哥伦比亚大学-NetSci-Fall2011-Lecture5

举报
开通vip

网络科学-哥伦比亚大学-NetSci-Fall2011-Lecture5 1 © 2011 Columbia University E6885 Network Science Lecture 5: Network Models E 6885 Topics in Signal Processing -- Network Science Ching-Yung Lin, Dept. of Electrical Engineering, Columbia University October 10th, 2011 © 2011 Columbia University2 E6885...

网络科学-哥伦比亚大学-NetSci-Fall2011-Lecture5
1 © 2011 Columbia University E6885 Network Science Lecture 5: Network Models E 6885 Topics in Signal Processing -- Network Science Ching-Yung Lin, Dept. of Electrical Engineering, Columbia University October 10th, 2011 © 2011 Columbia University2 E6885 Network Science – Lecture 7: Network Models -- II Course Structure Final Project Presentation 1412/19/11 Large-Scale Network Processing System 1312/12/11 Behavior Understanding and Cognitive Networks 1212/05/11 Privacy, Security, and Economy Issues in Networks 1111/28/11 Graphical Models1011/21/11 Final Project Proposal Presentation911/14/11 Info Diffusion in Networks810/31/11 Dynamic Networks710/24/11 Network Topology Inference610/17/11 Network Modeling510/10/11 Network Visualization, Sampling and Estimation410/03/11 Network Partitioning, Clustering, and Use Case309/26/11 Network Representations and Characteristics209/19/11 Overview – Social, Information, and Cognitive Network Analysis109/12/11 Topics Covered Class Number Class Date 2 © 2011 Columbia University3 E6885 Network Science – Lecture 7: Network Models -- II Random Graph Models © 2011 Columbia University4 E6885 Network Science – Lecture 7: Network Models -- II Other Network Graph Models -- overview � Modeling of random network graphs. { }( ), :G Gθ θΡ ∈Γ ∈Θ Γ Θ Pθ : probability distribution on Γ : a collection of possible graphs : a collection of parameters 3 © 2011 Columbia University5 E6885 Network Science – Lecture 7: Network Models -- II Usage of network graph models � In practice, network graph models are used for a variety of purposes. � Study of proposed mechanisms for the emergence of certain commonly observed properties in real-world networks � Or, the testing for significance of a pre-defined characteristics in a given network graph. © 2011 Columbia University6 E6885 Network Science – Lecture 7: Network Models -- II Random Graph Models � The term ‘Random Graph Model’ typically is used to refer to a model specifying a collection and a uniform probability over . � Random graph models are arguably the most well-developed class of network graph models, due to: – comparatively simpler nature of these models –This nature allows for the precise analytical characterization of many of the structural summary measures (in Chapter 4). Γ ΓP( )⋅ 4 © 2011 Columbia University7 E6885 Network Science – Lecture 7: Network Models -- II Model-based estimation vs. Design-based Estimation � Model-Based Estimation vs. Design-based Estimations in Network Graphs. –Design-based: inference is based entirely on the random mechanism by which a subset of elements were selected from the population to create the sample. –Model-based: a model is given by the analyst that specifies a relationship between the sample and the population. � In recent decades, the distinction between these two approaches has become more blurred. � Consider the task of estimating a given characteristic of a network graph G, based on a sampled version of that graph, G*. � In Chapter 5, we used ‘design-based’ perspective. � If we augment this perspective to include model-based component, then G is assumed to be randomly from a collection and inference needs to consider it. ( )Gη © 2011 Columbia University8 E6885 Network Science – Lecture 7: Network Models -- II Example – Assessing Significance in Network Graphs � Suppose we have a graph derived from observations: � We are interested in accessing whether the value is ‘significant’, in the sense of being somehow unusual or unexpected. � Formally, a random graph model is used to create a reference distribution which, under the accompanying assumption of uniform likelihood of elements in Γ , takes the form: � If is found to be sufficiently unlikely under this distribution, this is taken as evidence against the hypothesis that is a uniform draw from Γ . � How best to choose Γ is a practical issue of some importance. obs G ( )obsGη , ( ) #{ : ( ) } | | t G G t Pη η Γ ∈Γ ≤ = Γ ( )obsGη obs G 5 © 2011 Columbia University9 E6885 Network Science – Lecture 7: Network Models -- II Classical Random Graph Models � Erdos and Renyi models (1959): – A simple model that places equal probability on all graphs of a given order and size. � A collection of all graphs with and . � Assign probability to each , where is the total number of distinct vertex pairs. � The key contribution of Erdos and Renyi was to develop a foundation of formal probabilistic results concerning the characteristics of graphs G drawn randomly from ,v eN N Γ ( , )G V E= | | vV N= | | eE N= 1 ( ) e N P G N −   =     2 vN N   =    ,v eN N G∈Γ ,v eN N Γ © 2011 Columbia University10 E6885 Network Science – Lecture 7: Network Models -- II Classical Random Graph Models � Gilber Model (1959): � A collection of all graphs with . � Assign edge independently to each pair of distinct vertices with probability � When p is an appropriately defined function of and , these two classes of models are essentially equivalent for large . ,vN p Γ ( , )G V E= | | vV N= (0,1)p∈ vN e vN e N⋅∼ vN 6 © 2011 Columbia University11 E6885 Network Science – Lecture 7: Network Models -- II Example � Let � Then, if c>1, with high probability G will have a single connected component consisting of vertices, for some constant depending on c, with the remaining components having only on the order of O(logNv) vertices. � If c<1, then all components will have on the order of O(logNv) vertices, with high probability G will consist entirely of a large number of very small, separate components. v c p N = c vNα cα © 2011 Columbia University12 E6885 Network Science – Lecture 7: Network Models -- II Classic random graphs have distributions that are concentrated � Classical random graphs have distributions that are concentrated, with exponentially decaying tails. (1 ) ( ) (1 ) ! ! d c d c d c e c e f G d d ε ε − − − ≤ ≤ + ( )df G : the proportion of vertices with degree d. v c p N = � For large Nv, G will have a degree distribution that is like a Poisson distribution with mean c. 7 © 2011 Columbia University13 E6885 Network Science – Lecture 7: Network Models -- II Are Classical Random Graphs Practical? � Classical random graphs do not have the broad degree distribution observed in many large-scale real-world network. � They do not display much clustering. � On the other hand, these graphs do possess the smal-world property. The diameter can be shown to vary like O(log Nv). © 2011 Columbia University14 E6885 Network Science – Lecture 7: Network Models -- II Comparing Random Graph Models with Small World Graphs � Small World graphs start from the lattice structure, which can be shown a high level of clustering. The clustering coefficient is roughly ¾ for r large. This model begin with a set of Nv vertices, arranged in a periodic fashion, and join each vertex to r of its neighbors to each side. � Add a few randomly rewired edges. 8 © 2011 Columbia University15 E6885 Network Science – Lecture 7: Network Models -- II Network Growth Models � Network grows over time � Preferential Attachment Models: –‘The rich get richer’ principle. –Simon (1955) proposed a class of models that produced such broad, skewed distributions. –Price (1965) took this idea and applied it in creating a model for the manner in which networks of citations for document sin the literature grow. –Barabasi and Albert’s model (1999) – a network growth model for undirected graphs. © 2011 Columbia University16 E6885 Network Science – Lecture 7: Network Models -- II Some examples of Degree Distribution � (a) scientist collaboration: biologists (circle) physicists (square), (b) collaboration of move actors, (d) network of directors of Fortune 1000 companies 9 © 2011 Columbia University17 E6885 Network Science – Lecture 7: Network Models -- II Power-Law Model � Barabasi-Albert model: –Start with an initial graph of vertices and edges. –At Stage t=1,2,…, the current graph is modified to create a new graph by adding a new vertex of degree , where the m new edges are attached to m different vertices in , and the probability that the new vertex will be connected to a given vertex v is given by –At each stage, m existing vertices are connected to a new vertex in a manner preferential to those with higher degrees. –After t iterations, the resulting graph G will have vertices and edges. –In the time as t tends to infinity, the graph G have degree distributions that tend to a power-law form , with . (0)G (0)vN (0) eN ( 1)tG − ( )tG 1m≥ ( 1)tG − '' v vv V d d ∈∑ ( ) (0)t v vN N t= + ( ) (0)t e eN N tm= + d α− 3α = © 2011 Columbia University18 E6885 Network Science – Lecture 7: Network Models -- II Copying Models � More common in biochemical networks, rather the WWW. � Gene duplication is at the heart of nature’s observed tendency of ‘re-use’ biological information in evolving the genomes of living orgamisms. � Chung et. al. (2003): – Beginning with an initial graph . – Graphs are constructed from their immediate predecessors, , by the addition of a new vertex, say v, that is connected to some randomly chosen subset of neighbors of a randomly chosen existing vertex, say u. – A vertex u is chosen from uniformly at random, and then the new vertex v is joined with each of the neighbors of u independently with probability p. – The degree distribution will tend to a power-law form, with exponent α satisfying the equation – When p=1, each new vertex is connected to by fully duplicating the edges of the randomly selected vertex u. (0)G ( )tG ( 1)tG − 1( 1) 1p pαα −− = − ( 1)tG − ( 1)tG − 10 © 2011 Columbia University19 E6885 Network Science – Lecture 7: Network Models -- II Fitting Network Growth Models � Predicting – making informal comparisons between certain characteristics of an observed network and the graph resulting from such models. � Example Wiuf duplication-attachment models – calculating a univariate likelihood function for a network (e.g., interactions among 2,368 proteins). � However, there are a number of open issues, such as the methodology to be scaled up effectively to more complicated contexts, such as involving multivariate parameters, larger networks, more realistic network growth models, etc. © 2011 Columbia University20 E6885 Network Science – Lecture 7: Network Models -- II Exponential Random Graph Models � Robins and Morris: “A good statistical network graph model needs to be both estimable from data and a reasonable representation of that data, to be theoretically plausible about the type of effects that might have produced the network, and to be amenable to examining which competing effects might be the best explanation of the data.” � A potential set of such models are the “Exponential Random Graph Models” – ERGM models. 11 © 2011 Columbia University21 E6885 Network Science – Lecture 7: Network Models -- II Dynamic Probabilistic Complex Network Model © 2011 Columbia University22 E6885 Network Science – Lecture 7: Network Models -- II The Most Difficult Challenge: State-of-the-Arts? � Social Networks in sociological and statistic fields: focus on (1) overall network characteristics, (2) dynamic random graphs, (3) binary edges, etc. � Not consider probabilistic nodes/edges or individual nodes/edges. � Epidemic Networks & Computer Virus Network: focus on (1) overall network characteristics – when will an outbreak occurs, (2) regular / random graphs. � Not focus on individual nodes/edges. � (Computer) Communication Networks: focus on (1) packet transmission – information is not duplicated, or (2) broadcasting – not considering individual nodes/edges or complex network topology. � WWW: focus on (1) topology description, (2) binary edges and ranked nodes (e.g., Google PageRank) � Not consider probabilistic edges ���� Our Objectives: Find important people, community structures, or information flow in a network, which is dynamic, probabilistic and complex, in order allocate resources in a large-scale mining system. 12 © 2011 Columbia University23 E6885 Network Science – Lecture 7: Network Models -- II What is a Dynamic Probabilistic Complex Network? � Example: http://smallblue.research.ibm.com http://smallblue.research.ibm.com/publications/netsci2007.pdf © 2011 Columbia University24 E6885 Network Science – Lecture 7: Network Models -- II Modeling a Dynamic Probabilistic Complex Network � [Assumption] A DPCN can be represented by a Dynamic Transition Matrix P(t), a Dynamic Vertex Status Random Vector Q(t), and two dependency functions fM and gM. , 1 , 2 , Pr( ( ) ) Pr( ( ) ) ( ) , Pr( ( ) ) E i j i j i j y t SE y t SE t y t SEΩ =   =     =   i,jp ≜ ⋮ where ( )ix t : the status value of vertex i at time t. and 1 2 Pr( ( ) ) Pr( ( ) ) ( ) , Pr( ( ) ) V i i i x t SV x t SV t x t SVΩ =   =     =   iq ≜ ⋮ Pr( ( ) ) 1, V ix t SVω ω∈Ω = =∑ , ( )i jy t : the status value of edge i →j at time t. ,Pr( ( ) ) 1, E i jy t SEω ω∈Ω = =∑ where ( ) ( ) ( ) ( ) ( ) ( ) ( ) , ( ) ( ) ( ) t t t t t t t t t t                1,1 2,1 N,1 1,2 2,2 N,2 1,N 2,N N,N p p p p p p P p p p ⋯ ⋯ ⋮ ⋮ ⋱ ⋮≜ ⋮ ⋮ ⋱ ⋮ ⋯ ⋯ ( ) ( ) ( ) , ( ) t t t t                1 2 N q q Q q ≜ ⋮ ⋮ ( ) ( ( ), ( )),Mt t f t tδ+P Q P≜ ( ) ( ( ), ( ), ( )),M t t g t t t t δ δ + + Q P Q P≜ and 13 © 2011 Columbia University25 E6885 Network Science – Lecture 7: Network Models -- II Modeling a Dynamic Probabilistic Complex Network – cont’d � Also the Network Topology should follow the characteristics of complex network: Network topology follows power-law: , , 1, ,Pr( ( ) ) 0 u( ) 0, i j i j if t y t null p else ∃ ≠ > =   ,Pr( u( ) ) d i j i p l S l−= ⋅∑ ∼ and the clustering coefficient C is typically > 0.2. where d is typically in the range of 2 ~ 2.5. , , ,Pr(u( ) 1 | u( ) 1,u( ) 1)j k i j i kC p p p= = = = © 2011 Columbia University26 E6885 Network Science – Lecture 7: Network Models -- II Modeling a Binary DPCN of binary nodes and edges � A Binary DPCN can be represented by a Dynamic Transition Matrix P(t), a Dynamic Vertex Status Random Vector Q(t), and two dependency functions fM and gM. , ( ) Pr( ( ) 1),i j i jp t edge t−> =≜where ( )ix t : the status value of vertex i at time t. and 1,1 2,1 ,1 1,2 2,2 ,2 1, 2, , ( ) ( ) ( ) ( ) ( ) ( ) ( ) , ( ) ( ) ( ) N N N N N N p t p t p t p t p t p t t p t p t p t                P ⋯ ⋯ ⋮ ⋮ ⋱ ⋮≜ ⋮ ⋮ ⋱ ⋮ ⋯ ⋯ 1 2 ( ) ( ) ( ) , ( )N q t q t t q t                Q ≜ ⋮ ⋮ ( ) ( ( ), ( )),M t t f t t δ+P Q P≜ ( ) ( ( ), ( ), ( )),M t t g t t t t δ δ + + Q P Q P≜ and ( ) Pr( ( ) 1),i iq t x t =≜ Pr( ( ) 1) Pr( ( ) 0) 1,i j i jedge t edge t−> −>= + = = Pr( ( ) 1) Pr( ( ) 0) 1,i ix t x t= + = = 14 © 2011 Columbia University27 E6885 Network Science – Lecture 7: Network Models -- II Markov Model is a special case of Binary DPCN � Markov Model where ( )ix t : the status value of vertex i at time t. and 1,1 2,1 ,1 1,2 2,2 ,2 1, 2, , , N N N N N N p p p p p p p p p                P ⋯ ⋯ ⋮ ⋮ ⋱ ⋮≜ ⋮ ⋮ ⋱ ⋮ ⋯ ⋯ 1 2 ( ) ( ) ( ) , ( )N q t q t t q t                Q ≜ ⋮ ⋮ ( ) ( , ( )) ( ) t t g t t δ+ = ⋅ Q P Q P Q ≜ ( ) Pr( ( ) 1)i iq t x t= =, 1 1 N i j j p = =∑ © 2011 Columbia University28 E6885 Network Science – Lecture 7: Network Models -- II Many Prior Researches are based on Markov Models � Random Walks: (2) ( )lim ( ) 1 (1 (0)).*(1 (0)).* .*(1 (0))V E V E V E V t Q t P Q P Q P Q∞ →∞ = − − − −⋯ 15 © 2011 Columbia University29 E6885 Network Science – Lecture 7: Network Models -- II Markov Model is not appropriate to model information flow � Random Walks assume the existence of a token � unique existence. � However, information can be duplicated at nodes. � New models are needed. © 2011 Columbia University30 E6885 Network Science – Lecture 7: Network Models -- II Outline � Complex Network: Characteristics and Examples � Dynamic Probabilistic Complex Network � Information Flow in Dynamic Probabilistic Complex Network � Summary and Conclusion 16 © 2011 Columbia University31 E6885 Network Science – Lecture 7: Network Models -- II Information Flow in Dynamic Probabilistic Complex Network (Let’s call it: Behavioral Information Flow (BIF) Model) � [Assumption] Edge can be represented by a four-state S-D-A-R (Susceptible-Dormant-Active- Removed) Markov Model. Nodes can be represented by three states S-A-I (Susceptible-Active- Informed) Model. , , , , , , , , Pr( ( ) ) Pr( ( ) ) ( ) , Pr( ( ) ) Pr( ( ) ) i j i j i j i j i j i j i j i j y t S y t D t y t A y t R σ ψ µ ρ =       =   =    =     =       i,jp ≜ where ( ) ( ) ( ) ( ) ( ) ( ) ( ) , ( ) ( ) ( ) t t t t t t t t t t                1,1 2,1 N,1 1,2 2,2 N,2 1,N 2,N N,N p p p p p p P p p p ⋯ ⋯ ⋮ ⋮ ⋱ ⋮≜ ⋮ ⋮ ⋱ ⋮ ⋯ ⋯ ( ) ( ) ( ) , ( ) t t t t                1 2 N q q Q q ≜ ⋮ ⋮ ( ) ( , ( ), ( )), t t f t t δ+P M Q P≜ ( ) ( ( ), ( ), ( )), t t g t t t t δ δ + + Q P Q P≜ and Pr( ( ) ) ( ) Pr( ( ) ) , Pr( ( ) ) i i i i i i x t S t x t A x t I λ η ν =       = =       =    iq ≜ , , , , 1i j i j i j i jσ ψ µ ρ+ + + = 1i i iλ η ν+ + = © 2011 Columbia University32 E6885 Network Science – Lecture 7: Network Models -- II Major Difference between BIF and Prior Modeling Methods in Epidemic Research and Computer Virus Fields � Model Human Nodes as S-I-R (Susceptible, Infected, and Removed). � Did not consider individual node’s behavior distinctly in network structure/topology � did not consider edge status. � We propose to model edge status as (autonomous) S-D-A-R Markov Model (Susceptible, Dormant, Active, Removed) � We propose to model human node behavior as S-A-I (Susceptible, Active, and Informed). 17 © 2011 Columbia University33 E6885 Network Science – Lecture 7: Network Models -- II Edges are Markov State Machines, Nodes are not � State transitions of edges: S-D-A-R model. (Susceptible, Dormant, Active, and Removed) This indicates the time-aspect changes of the state of edges. S A RD 1 α− trigger α β γ 1 β− 1 γ− 1 � States of nodes: S-A-I model. (Susceptible, Active, and Informed) Trigger occurs when the start node of the edge changes from state S to state I : Node view Network view Edge view S I trigger A © 2011 Columbia University34 E6885 Network Science – Lecture 7: Network Models -- II Edge State Probability and Network Configuration Model � Nodes and Edges ( ) ( , ( ), ( )),t t f t tδ+ =P M Q P 1,1 1,1 1,1 2,1 2,1 2,1 ,1 ,1 ,1 1,2 1,2 1,2 2,2 2,2 2,2 ,2 ,2 ,2 1, 1, 1, 2, 2, 2, , , , ( , , ) ( , , ) ( , , ) ( , , ) ( , , ) ( , , ) , ( , , ) ( , , ) ( , , ) N N N N N N N N N N N N N N N N N N α β γ α β γ α β γ α β γ α β γ α β γ α β γ α β γ α β γ                M ⋯ ⋯ ⋮ ⋮ ⋱ ⋮≜ ⋮ ⋮ ⋱ ⋮ ⋯ ⋯ � αi,j = 0 � No Edge between i and j � Our KDD 2005 paper is a special case that αi,j =1 or 0, and did not mod
本文档为【网络科学-哥伦比亚大学-NetSci-Fall2011-Lecture5】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑, 图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。
下载需要: 免费 已有0 人下载
最新资料
资料动态
专题动态
is_105753
暂无简介~
格式:pdf
大小:840KB
软件:PDF阅读器
页数:27
分类:
上传时间:2013-03-20
浏览量:20