知識圖譜如火如荼,首先推薦在python下進行社交網絡分析networkx
建立圖網絡
無向圖
import networkx as nxG = nx.Graph() #建立一個空的無向圖GG.add_node(1) #添加一個節點1G.add_edge(2,3) #添加一條邊2-3(隱含著添加了兩個節點2、3)G.add_edge(3,2) #對於無向圖,邊3-2與邊2-3被認為是一條邊print (G.nodes()) #輸出全部的節點: [1, 2, 3]print (G.edges()) #輸出全部的邊:[(2, 3)]print (G.number_of_edges()) #輸出邊的數量:1
[1, 2, 3][(2, 3)]1
有向圖
將G = nx.Graph() 改為 G = nx.DiGraph()即進行有向圖,表示不同的邊
import networkx as nxG = nx.DiGraph() #建立一個空的無向圖GG.add_node(1) #添加一個節點1G.add_edge(2,3) #添加一條邊2-3(隱含著添加了兩個節點2、3)G.add_edge(3,2) #對於無向圖,邊3-2與邊2-3被認為是一條邊print (G.nodes()) #輸出全部的節點: [1, 2, 3]print (G.edges()) #輸出全部的邊:[(2, 3), (3, 2)]print (G.number_of_edges()) #輸出邊的數量:
[1, 2, 3][(2, 3), (3, 2)]2
同時,有向圖和無向圖是可以相互轉化的,分別用到Graph.to_undirected() 和 Graph.to_directed()兩個方法。
帶權圖
有向圖和無向圖都可以給邊賦予權重,用到的方法是add_weighted_edges_from,它接受1個或多個三元組[u,v,w]作為參數,其中u是起點,v是終點,w是權重。例如:
G.add_weighted_edges_from([(0,1,3.0),(1,2,7.5)])
添加0-1和1-2兩條邊,權重分別是3.0和7.5。
如果想讀取權重,可以使用get_edge_data方法,它接受兩個參數u和v,即邊的起訖點。例如:
import networkx as nx
G = nx.DiGraph()
G.add_node(1)
G.add_edge(2,3)
G.add_edge(3,2)
G.add_weighted_edges_from([(0,1,3.0),(1,2,7.5)])
print (G.nodes())
print (G.edges())
print (G.number_of_edges())
print (G.get_edge_data(1,2))
[1, 2, 3, 0]
[(1, 2), (2, 3), (3, 2), (0, 1)]
4
{‘weight’: 7.5}
NetworkX提供了常用的圖論經典算法,例如DFS、BFS、最短路、最小生成樹、最大流等等
#調用多源最短路徑算法,計算圖G所有節點間的最短路徑
path=dict(nx.all_pairs_shortest_path(G))print (path[0][2])
[0, 1, 2]
一個完整官方小案例
import networkx as nx G = nx.Graph()G.add_edge('A', 'B', weight=4) G.add_edge('B', 'D', weight=2) G.add_edge('A', 'C', weight=3) G.add_edge('C', 'D', weight=4)nx.shortest_path(G, 'A', 'D', weight='weight')
[‘A’, ‘B’, ‘D’]
圖數據的保存與繪圖
import networkx as nximport matplotlib.pyplot as pltG = nx.DiGraph()G.add_node(1)G.add_edge(2,3)G.add_edge(3,2)nx.write_edgelist(G, path="grid.edgelist", delimiter=":")# write edgelist to grid.edgelistnx.write_edgelist(G, path="grid.edgelist", delimiter=":")# read edgelist from grid.edgelistH = nx.read_edgelist(path="grid.edgelist", delimiter=":")
nx.draw(H)plt.show()
數據庫基本統計
!usr/bin/env python
* coding:utf-8 *
import matplotlib.pyplot as plt
from networkx import nx
G = nx.lollipop_graph(4, 6)
pathlengths = []
print(“source vertex {target:length, }”)
for v in G.nodes():
spl = dict(nx.single_source_shortest_path_length(G, v))
print(‘{} {} ‘.format(v, spl))
for p in spl:
pathlengths.append(spl[p])
print(”)
print(“average shortest path length %s” % (sum(pathlengths) / len(pathlengths)))
histogram of path lengths
dist = {}for p in pathlengths: if p in dist: dist[p] += 1 else: dist[p] = 1print('')print("length #paths")verts = dist.keys()for d in sorted(verts): print('%s %d' % (d, dist[d]))print("radius: %d" % nx.radius(G))print("diameter: %d" % nx.diameter(G))print("eccentricity: %s" % nx.eccentricity(G))print("center: %s" % nx.center(G))print("periphery: %s" % nx.periphery(G))print("density: %s" % nx.density(G))
nx.draw(G, with_labels=True)plt.show()
source vertex {target:length, }
0 {0: 0, 1: 1, 2: 1, 3: 1, 4: 2, 5: 3, 6: 4, 7: 5, 8: 6, 9: 7}
1 {1: 0, 0: 1, 2: 1, 3: 1, 4: 2, 5: 3, 6: 4, 7: 5, 8: 6, 9: 7}
2 {2: 0, 0: 1, 1: 1, 3: 1, 4: 2, 5: 3, 6: 4, 7: 5, 8: 6, 9: 7}
3 {3: 0, 0: 1, 1: 1, 2: 1, 4: 1, 5: 2, 6: 3, 7: 4, 8: 5, 9: 6}
4 {4: 0, 5: 1, 3: 1, 6: 2, 0: 2, 1: 2, 2: 2, 7: 3, 8: 4, 9: 5}
5 {5: 0, 4: 1, 6: 1, 3: 2, 7: 2, 0: 3, 1: 3, 2: 3, 8: 3, 9: 4}
6 {6: 0, 5: 1, 7: 1, 4: 2, 8: 2, 3: 3, 9: 3, 0: 4, 1: 4, 2: 4}
7 {7: 0, 6: 1, 8: 1, 5: 2, 9: 2, 4: 3, 3: 4, 0: 5, 1: 5, 2: 5}
8 {8: 0, 7: 1, 9: 1, 6: 2, 5: 3, 4: 4, 3: 5, 0: 6, 1: 6, 2: 6}
9 {9: 0, 8: 1, 7: 2, 6: 3, 5: 4, 4: 5, 3: 6, 0: 7, 1: 7, 2: 7}
average shortest path length 2.86
length #paths
0 10
1 24
2 16
3 14
4 12
5 10
6 8
7 6
radius: 4
diameter: 7
eccentricity: {0: 7, 1: 7, 2: 7, 3: 6, 4: 5, 5: 4, 6: 4, 7: 5, 8: 6, 9: 7}
center: [5, 6]
periphery: [0, 1, 2, 9]
density: 0.26666666666666666
閱讀更多 Farmer001 的文章