To make this easier to understand, we express it as follows:
It can also be mapped to the following bipartite graph.
Once the graph data has been imported, we can move on to the analysis.
We already have a simple graph model. Next, starting from the bipartite network of heroes and comics (two node types), we derive a monopartite graph (only one node type) in which two heroes are connected according to the number of times they appear in the same comic.
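The projection described above can be sketched in a few lines of Python. This is only a minimal illustration on made-up data, not the query used in this post; the hero and comic names and the helper name project_heroes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def project_heroes(appearances):
    """Derive weighted hero-hero pairs from (hero, comic) appearances."""
    # Group heroes by the comic they appear in.
    heroes_by_comic = {}
    for hero, comic in appearances:
        heroes_by_comic.setdefault(comic, set()).add(hero)

    # Each comic adds 1 to the weight of every pair of heroes it contains.
    weights = Counter()
    for heroes in heroes_by_comic.values():
        for pair in combinations(sorted(heroes), 2):
            weights[pair] += 1
    return weights


# Hypothetical toy data: (hero, comic) pairs.
appearances = [
    ("A", "c1"), ("B", "c1"), ("C", "c1"),
    ("A", "c2"), ("B", "c2"),
]
weights = project_heroes(appearances)
```

Here heroes A and B share two comics, so their pair ends up with weight 2, while the pairs involving C get weight 1.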
Next, we will analyze the monopartite graph derived above. When analyzing data, my usual habit is to compute some overall statistics first, get a general feel for the graph, and only then study the details in depth.
Let's first look at the distribution of relationship weights between heroes. The weight is the total number of comics in which two heroes appear together.
When you first see the clause (k.weight / 10) * 10 in this query, you may well think it is a very silly thing to write. But once you know the query language's rule for integer division (dividing one integer by another still yields an integer), you will see that it acts as a "bucket" function: every weight is assigned to a bucket labeled with a multiple of 10. This makes the following results easy to read.
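The bucketing trick can be tried out in Python, where // plays the role of integer division. This is a small sketch on hypothetical sample weights, assuming the weights are non-negative integers; the helper name bucket_weights is made up for illustration.

```python
from collections import Counter


def bucket_weights(weights, size=10):
    """Count how many weights fall into each bucket of width `size`."""
    # (w // size) * size rounds w down to the nearest multiple of `size`.
    counts = Counter((w // size) * size for w in weights)
    return dict(sorted(counts.items()))


# Hypothetical sample weights, not the real data.
sample = [1, 3, 9, 10, 17, 25, 724]
histogram = bucket_weights(sample)
```

With this sample, the weights 1, 3 and 9 land in bucket 0, 10 and 17 in bucket 10, 25 in bucket 20, and 724 in bucket 720.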
As the results show, of the 171644 relationships in the Marvel hero network, 162489 (about 94% of all relationships) have a weight below 10; that is, most pairs of heroes have appeared together fewer than 10 times.
The maximum weight is 724, which occurs between THING/BENJAMIN J. GR (the Thing) and Torch/Johnny's (the Human Torch). These two are really good friends.
Although it looks as if everyone knows everyone in the social network of Marvel heroes, the weights show that most of these connections are very weak links. I will venture two hypotheses:
To test my hypothesis, I will first try the following query.
The query above gives some averages, but I personally prefer to look at the distribution, using the same "bucket" function as before:
My hypothesis seems to hold. 8999 comics (71%) feature fewer than 10 heroes. Combined with the earlier average of 7.5 heroes per comic, this suggests that most comics probably feature 5 heroes or fewer. A few comics are "family gatherings" with more than 30 heroes. One comic, named "COCI", has 110 heroes, which is presumably an issue built around a gathering of superheroes.
We can use min-max normalization to standardize the weight values. Note the expression we use this time: (toFloat(k1.weight) - min) / (max - min). Here k1.weight is first converted to a floating-point number, so the division yields a floating-point result rather than an integer, and the value is not rounded down into a bucket as it was before.
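The same normalization can be sketched in Python on hypothetical sample weights. The helper name min_max_normalize and the choice to return 0.0 when all weights are equal are assumptions made for this illustration.

```python
def min_max_normalize(weights):
    """Rescale weights to the range [0, 1] with min-max normalization."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # Assumption: when all weights are equal, map them all to 0.0
        # to avoid dividing by zero.
        return [0.0 for _ in weights]
    # float() mirrors the explicit conversion in the expression above;
    # Python 3's / already performs floating-point division.
    return [(float(w) - lo) / (hi - lo) for w in weights]


# Hypothetical sample weights, not the real data.
sample = [1, 10, 724]
normalized = min_max_normalize(sample)
```

The smallest weight maps to 0.0, the largest to 1.0, and everything else falls in between in proportion to its distance from the minimum.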