Some people still struggle to build a thinking framework for data analysis, so today your class teacher will walk you through building one step by step. Experts can skip this article, though of course you are welcome to exchange views. Anyone who needs it can use it as a reference.
Someone once asked me what data analysis thinking is. If analytical thinking is about structure, then data analysis thinking adds one more standard on top of it:
Not "I think", but "the data proves".
This is the watershed. "I think" is intuitive, experience-based thinking. You cannot rely on intuition everywhere in your work, let alone in steering a company. Proof by data is the most direct expression of data analysis, and it depends on a data-oriented mindset rather than on skills. The former provides guidance; the latter is only application.
As an individual, how should we establish data analysis thinking?
First, establish your index system.
Before talking about indicators, let's go back several decades. Peter Drucker, the father of modern management, is credited with a classic line:
If you can't measure it, then you can't grow effectively.
Measurement means defining and evaluating the business against a unified standard. That standard is an indicator. Suppose Lao Wang opens a fruit shop next door. If you ask him how business is each day, he might answer that it is selling well, selling very well, or has been slow recently. These are all empty words: to him "selling well" may mean 50 sales, while to you it means 100.
This is the cognitive trap set by "I think". Move the example into a company and you will run into more problems. One operator tells you the product is performing very well because many users praise it every day, and shows you some screenshots. Another operator says the product has problems and is not selling well. Whom should you trust?
In fact, neither can really be trusted. These conflicting judgments all come from a lack of data analysis thinking.
If Lao Wang wants to describe his business, he should use sales volume; that is his indicator. Internet products should be described the same way, with indicators such as activity rate, usage rate and conversion rate.
If you can't describe the business with indicators, then you can't grow effectively.
Understanding and using indicators is the first step of data analysis thinking. The next step is to build an indicator system, because isolated indicators cannot bring out the value of data. Like analytical thinking, indicators can be structured and should be structured.
Let's look at Internet products. A user goes through a series of steps from start to finish, and this holds for an e-commerce app as much as for a content platform. Think about it: which indicators would you need at each step?
The following figure shows what it means to express a business in indicators. This is the difference between having and lacking data analysis thinking, and it is also typical data-driven operations. We can go deeper into this when time allows.
There is no universal template for an index system; different business forms have different index systems. Mobile apps differ from websites, SaaS differs from e-commerce, and low-frequency consumption differs from high-frequency consumption. A wedding-related app has no need to consider repurchase rate; Internet finance must have risk control indicators; in e-commerce, sellers and buyers have different indicators.
These all take industry experience and business knowledge to learn and master. Are there any general techniques and caveats?
Second, tell good indicators from bad indicators.
Not all indicators are good. This is a common beginner's mistake. Let's go back to Lao Wang's fruit shop and think about sales volume.
Recently, prices have gone up. Lao Wang's purchase cost for fruit rose with them, but he did not dare raise his own prices. Although sales volume hardly changed, Lao Wang found that he earned little that month and did not even have enough pocket money.
Lao Wang sold 2,000 units of fruit this month, yet in the end he lost money. After a careful look, he found that although sales volume was high, inventory was also high. Hundreds of units of fruit went unsold every month and finally spoiled at a loss.
Both examples show how unreliable it is to look only at sales volume. Sales volume is a measure, but it is not a good indicator. As a sole proprietor, Lao Wang should take the fruit shop's profit as his core indicator.
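The gap between sales volume and profit can be made concrete with a small sketch. All figures below are hypothetical and chosen only for illustration; the article gives no prices or costs for Lao Wang's shop.

```python
# Hypothetical monthly figures for Lao Wang's fruit shop (not from the article).
units_bought = 2300      # units purchased from the wholesaler
units_sold = 2000        # units actually sold
unit_cost = 5.0          # purchase cost per unit
unit_price = 5.5         # retail price per unit

sales_volume = units_sold
revenue = units_sold * unit_price
cost = units_bought * unit_cost      # unsold stock still has to be paid for
profit = revenue - cost

print(f"sales volume: {sales_volume} units")
print(f"profit: {profit:.2f}")
```

With these assumed numbers the sales volume looks healthy while the profit is negative, which is the situation described above.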
Good indicators should be core driving indicators. All indicators matter, but some matter more. Compare sales volume with profit, or the number of users with the number of active users: in each pair the latter is more important than the former.
A core indicator is not just a number written in the weekly report; it is the shared goal of the operations team, the product team and even the R&D team.
A core driving indicator is tied to the company's development: it marks the company's key direction at a given stage. Remember that it is tied to a stage. Core driving indicators differ from period to period and from business to business.
Common core indicators for Internet companies are the number of users and the activity rate. The number of users represents market size and market share, and the activity rate represents the health of the product, but these are the core indicators of the growth stage. In the product 1.0 period, the focus is on polishing the product and improving its quality before promoting it at scale, so retention rate is the core indicator. In the later stage, once the product has a certain user base, commercialization matters more than activity, and attention shifts to money-related indicators such as advertising click-through rate and profit margin.
Core driving indicators are generally the overall goals of the company. If you look at your own job responsibilities, you can find your own core indicators too. For example, content operations can focus on the number of reads and reading time.
Core driving indicators bring the greatest advantage and benefit to companies and individuals. Remember the 80/20 rule? Roughly 20% of the indicators bring 80% of the effect, and that 20% is the core.
A good indicator has another feature: it should be a ratio or a proportion.
Take the number of active users on its own. We have 100,000 active users. What does this mean? On its own, nothing. If the product has tens of millions of registered users, then 100,000 active users means the product is very unhealthy and going downhill. If the product has only 400,000 to 500,000 users, it means the product is very sticky.
Precisely because the number of active users means little by itself, operations and product teams pay more attention to the activity rate. This indicator is a ratio: the number of active users divided by the total number of users. So when setting an indicator, always ask whether it can be expressed as a ratio.
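A minimal sketch of the activity rate as a ratio, assuming a hypothetical active-user count of 100,000 and two hypothetical products of very different size. The function name and the user totals are illustrative, not taken from any real product.

```python
def activity_rate(active_users: int, total_users: int) -> float:
    """Share of all users who were active in the period."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return active_users / total_users

# Same active-user count, two hypothetical products.
large_product = activity_rate(100_000, 20_000_000)
small_product = activity_rate(100_000, 450_000)

print(f"large product: {large_product:.1%}")
print(f"small product: {small_product:.1%}")
```

The absolute count is identical in both calls; only the ratio shows that one product is struggling and the other is sticky.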
Understanding and looking at things
Questions of this kind are common: how to evaluate a celebrity or a historical event, how to view a product, how to understand something, how to view or analyze a behavior or a hot topic, and so on.
Thinking about or analyzing a thing draws on much of what we covered earlier about thinking logic. The thing should be analyzed comprehensively in terms of its external environment, the time axis and its core dimensions; its interactions with the outside; its internal structure and cohesion; and its own dynamic behavior. Getting this analysis clear first gives a comprehensive and objective understanding of the thing itself.
One core of this kind of thinking is dialectical thinking; I prefer not to use the term critical thinking here. The point of dialectical thinking is to be comprehensive and objective, to speak with data and to reduce subjective bias. For this kind of question you do not need to express your subjective feelings. What matters more is laying out the facts and reasons clearly and with solid grounds.
Once the factual analysis is clear, the question evolves into how to evaluate the thing. Evaluation still rests on the analysis of objective data, but data by itself is not an evaluation index, so evaluating naturally means building or borrowing an evaluation system. A historical emperor can be evaluated on politics, economy, diplomacy, military affairs and people's livelihood. A car can be evaluated on power, comfort, fuel consumption and handling. A product can be evaluated on functional fit, usability, performance and price. For any evaluation, first find an established, sound evaluation system, then map the analyzed data onto it. In other words, the value assigned to any evaluation index must be supported by the thing's internal data and operating mechanism.
To sum up, the focus of this kind of thinking is: decomposing and integrating the thing; analyzing its behavior or activities; analyzing its internal and external environmental factors; analyzing its key attribute dimensions and settling on an evaluation system; and analyzing how its key indicators constrain and reinforce one another (similar to positive and negative feedback loops in systems thinking).
What are bad indicators?
The first is the vanity indicator, which has no practical significance.
Does it matter that the product gets hundreds of thousands of exposures in the app store? No; what I need is actual downloads. Do downloads matter? Not much; what I want is users who register successfully. Exposure and downloads are both vanity indicators, just to different degrees.
New media teams chase read counts on WeChat official accounts. If you earn money from advertising priced by reads, then the read count is meaningful. If you sell goods through articles, you should pay more attention to conversion rate and sales volume. After all, an exaggerated title can bring a high read count, and in that case the read count is a vanity indicator. Unfortunately, many bosses still chase 10W+ (100,000+ reads) tirelessly, even when the numbers are faked.
A vanity indicator is a meaningless indicator. It often looks good and can dress up the performance of operations and products, but we should avoid using it.
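The exposure, download and registration example can be sketched as a simple funnel. The counts are hypothetical and the helper name `step_conversion` is made up for this illustration.

```python
# Hypothetical funnel counts for one week (illustrative only).
funnel = [
    ("exposure", 300_000),
    ("download", 12_000),
    ("registration", 3_000),
]

def step_conversion(stages):
    """Return (from_stage, to_stage, rate) for each adjacent pair of stages."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        rate = count / prev_count if prev_count else 0.0
        rates.append((prev_name, name, rate))
    return rates

conversions = step_conversion(funnel)
for src, dst, rate in conversions:
    print(f"{src} -> {dst}: {rate:.1%}")
```

Looking at the step-to-step rates instead of the raw exposure count keeps attention on the stage that actually matters to the business.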
The second bad indicator is the after-the-fact (lagging) indicator, which can only reflect what has already happened.
For example, suppose I define a lost user as one who has not opened the app for three months. Then the lost users that operations count each day are people who stopped opening the app long ago. The loss happened a long time back and is hard to reverse with any measure. I may learn that some bad operational tactic hurt users, but is that still useful?
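A minimal sketch of that lost-user definition, assuming "three months" is taken as 90 days. The user IDs and dates are hypothetical.

```python
from datetime import date, timedelta

CHURN_WINDOW = timedelta(days=90)  # assumption: "three months" taken as 90 days

def lost_users(last_open_by_user, today):
    """Return the IDs of users whose last app open is older than the churn window."""
    return sorted(
        user_id
        for user_id, last_open in last_open_by_user.items()
        if today - last_open > CHURN_WINDOW
    )

# Hypothetical last-open dates.
last_open = {
    "u1": date(2024, 1, 5),
    "u2": date(2024, 5, 20),
    "u3": date(2024, 2, 28),
}
lost = lost_users(last_open, today=date(2024, 6, 1))
print(lost)
```

Every user this function returns has, by construction, already been gone for at least 90 days, which is exactly why the count arrives too late to act on.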
The ROI (return on investment) of a campaign is also a lagging indicator: the return is only known after the campaign has paid its costs. By then the money has been spent and the outcome of the campaign is settled. If the campaign cycle is long, there is still room for adjustment. If the campaign is short, this indicator can only be used for a post-mortem review; it cannot drive the business.
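A short sketch of ROI for a long-running campaign, computed here as (gain - cost) / cost, a common definition that the article itself does not spell out. The weekly figures are hypothetical.

```python
def roi(gain: float, cost: float) -> float:
    """Return on investment: net gain divided by cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost

# Hypothetical cumulative figures for a long campaign, checked weekly:
# (label, cumulative gain, cumulative cost)
weekly_checkpoints = [
    ("week 1", 2_000.0, 5_000.0),
    ("week 2", 7_000.0, 8_000.0),
    ("week 3", 15_000.0, 10_000.0),
]

running_roi = {label: roi(gain, cost) for label, gain, cost in weekly_checkpoints}
for label, value in running_roi.items():
    print(f"{label}: {value:+.1%}")
```

Interim checkpoints like these are only available when the campaign runs long enough; for a short campaign there is a single number at the end.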
The third bad indicator is the overly complex indicator set, which traps data analysis in a pile of indicators.
Indicators can be subdivided and decomposed. For example, activity rate can be subdivided into daily activity rate, weekly activity rate, monthly activity rate and existing-user activity rate. Choose indicators according to the specific situation. For a weather tool, use the daily activity rate; for a social app, use the weekly activity rate; for lower-frequency products, use the monthly activity rate.
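The daily, weekly and monthly activity rates differ only in the length of the look-back window. A minimal sketch, assuming a 7-day week, a 30-day month, and a hypothetical app-open log:

```python
from datetime import date, timedelta

def active_rate(open_events, total_users, end_day, window_days):
    """Share of users with at least one app open in the window ending on end_day."""
    start_day = end_day - timedelta(days=window_days - 1)
    active = {user for user, day in open_events if start_day <= day <= end_day}
    return len(active) / total_users

# Hypothetical app-open log: (user_id, date).
events = [
    ("u1", date(2024, 6, 30)),
    ("u2", date(2024, 6, 26)),
    ("u3", date(2024, 6, 10)),
    ("u1", date(2024, 6, 29)),
]
TOTAL_USERS = 4
end = date(2024, 6, 30)

daily = active_rate(events, TOTAL_USERS, end, 1)
weekly = active_rate(events, TOTAL_USERS, end, 7)
monthly = active_rate(events, TOTAL_USERS, end, 30)
print(daily, weekly, monthly)
```

Which of the three to track depends on how often users are expected to open the product, not on which number looks best.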
Each product has a few indicators that suit it. Don't pile on a bunch of indicators. If you prepare twenty or thirty indicators for an analysis, you will find you don't know where to start.
Third, establish a correct index structure.
Since indicators can be too many and too complicated, how should we choose the appropriate ones?
Like the pyramid structure of analytical thinking, indicators also have an internal structure, and it is tree-shaped. The core of building an index structure is to follow the business process and to be structured.
Suppose you are a content operator and need to analyze the existing business and improve content-related data. What would you do?
We turn pyramid thinking into a data analysis method.
Starting from the content operation process, the steps are: content collection -> content editing and publishing -> user browsing -> user clicking -> user reading -> user commenting or forwarding -> on to the next browse.
This is a standard process, and indicators can be set up for each step. Content collection can use a hotness index to see which content is trending. User browsing and clicking are standard PV (page view) and UV (unique visitor) statistics, and user reading is measured by reading time.
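The browsing and reading indicators can be sketched from an event log. The log layout, the event names `view` and `read`, and the function `content_metrics` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical event log: (user_id, article_id, event, seconds_read).
events = [
    ("u1", "a1", "view", 0),
    ("u1", "a1", "read", 120),
    ("u2", "a1", "view", 0),
    ("u1", "a1", "view", 0),
    ("u2", "a1", "read", 60),
    ("u3", "a2", "view", 0),
]

def content_metrics(log):
    """Per article: PV (views), UV (unique viewers), average reading time."""
    pv = defaultdict(int)
    viewers = defaultdict(set)
    read_seconds = defaultdict(list)
    for user, article, event, seconds in log:
        if event == "view":
            pv[article] += 1
            viewers[article].add(user)
        elif event == "read":
            read_seconds[article].append(seconds)
    result = {}
    for article in pv:
        reads = read_seconds.get(article, [])
        result[article] = {
            "pv": pv[article],
            "uv": len(viewers[article]),
            "avg_read_seconds": sum(reads) / len(reads) if reads else 0.0,
        }
    return result

metrics = content_metrics(events)
for article, values in metrics.items():
    print(article, values)
```

Each process step gets its own small indicator, and together they cover the path a reader takes through the content.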
Built from the process, the index framework can cover all user-related data without omission.
The indicators in this framework still follow the principles above: core driving indicators are needed. Remove vanity indicators, trim where appropriate, and don't add indicators for the sake of adding indicators.
Fourth, understand dimensional analysis.
With indicators in place, you can start the analysis. Data analysis can be roughly divided into three categories:
Using dimensions to analyze data
Using statistical knowledge, such as data distributions and hypothesis testing
Using machine learning
Let's start with dimensional analysis.
A dimension is a parameter that describes an object. In concrete analysis, it can be thought of as the angle from which you analyze things. Sales is one angle, activity rate is one angle, and time is also an angle, so all of them can count as dimensions.
Once we have dimensions, we can combine them into a data model. A data model is not an abstruse concept; here it is simply a data cube.
The figure above shows a data model, or data cube, built from three dimensions: product type, time and region. From it we can read the sales of electronic products in Shanghai in the second quarter of 2010 as well as the sales of books in Jiangsu in the first quarter of 2010.
A data model organizes complex data in a structured form. All the indicators we discussed earlier can be used as dimensions. Here are two examples:
Combine the three dimensions of user type, activity and time to observe how different user groups use the product. Does Group A, for instance, use the product for noticeably longer?
Combine the three dimensions of commodity type, order amount and region to observe whether sales of different commodities differ across regions.
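A data cube over product type, time and region can be sketched as a dictionary keyed by one value from each dimension. The sales records are hypothetical; a real system would typically sit on a data warehouse.

```python
from collections import defaultdict

# Hypothetical sales records: (product_type, quarter, region, sales).
records = [
    ("electronics", "2010-Q2", "Shanghai", 500),
    ("electronics", "2010-Q2", "Shanghai", 300),
    ("books", "2010-Q1", "Jiangsu", 120),
    ("books", "2010-Q2", "Zhejiang", 80),
]

# The "cube": total sales keyed by one value from each of the three dimensions.
cube = defaultdict(int)
for product_type, quarter, region, sales in records:
    cube[(product_type, quarter, region)] += sales

electronics_shanghai_q2 = cube[("electronics", "2010-Q2", "Shanghai")]
books_jiangsu_q1 = cube[("books", "2010-Q1", "Jiangsu")]
print(electronics_shanghai_q2, books_jiangsu_q1)
```

Each cell of the cube answers one question of the form "how much of this product sold in this region in this period".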
A data model lets you observe data from different angles and levels, which makes analysis more flexible and able to meet different needs. This process is called OLAP (online analytical processing). It also involves more complex data modeling and data warehousing, which we don't need to know in detail here.
There are several common operations on a data model: drilling down, rolling up and slicing.
Drilling down means subdividing a dimension further. For example, Zhejiang Province is subdivided into Hangzhou, Wenzhou and Ningbo, and the first quarter of 2010 becomes January, February and March. Rolling up is the opposite of drilling down: it aggregates dimensions, for example merging Zhejiang, Shanghai and Jiangsu into a single Jiangsu-Zhejiang-Shanghai dimension. Slicing selects a specific value of a dimension, such as only Shanghai or only the first quarter of 2010. Slicing is needed because the data cube is multidimensional, while we can only observe and compare two-dimensional data, that is, tables.
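The three operations can be sketched with one small aggregation helper. The rows, the field names and the function `aggregate` are hypothetical, and Nanjing is added only so that Jiangsu has a city in the sample.

```python
from collections import defaultdict

# Hypothetical sales rows: (province, city, month, sales).
rows = [
    ("Zhejiang", "Hangzhou", "2010-01", 100),
    ("Zhejiang", "Wenzhou", "2010-01", 40),
    ("Zhejiang", "Ningbo", "2010-02", 60),
    ("Shanghai", "Shanghai", "2010-01", 200),
    ("Jiangsu", "Nanjing", "2010-03", 90),
]
FIELDS = ("province", "city", "month", "sales")

def aggregate(data, by, where=None):
    """Sum sales grouped by the `by` fields, keeping only rows that match `where`."""
    totals = defaultdict(int)
    for row in data:
        record = dict(zip(FIELDS, row))
        if where and any(record[k] != v for k, v in where.items()):
            continue
        totals[tuple(record[f] for f in by)] += record["sales"]
    return dict(totals)

# Starting view: one total per province.
by_province = aggregate(rows, by=["province"])
# Drill down: split Zhejiang into its cities.
drill_down = aggregate(rows, by=["city"], where={"province": "Zhejiang"})
# Roll up: merge the three provinces into one regional total.
roll_up = sum(by_province.values())
# Slice: fix province = Shanghai and look at the remaining time dimension.
shanghai_slice = aggregate(rows, by=["month"], where={"province": "Shanghai"})

print(by_province, drill_down, roll_up, shanghai_slice, sep="\n")
```

Drilling down and rolling up change the level of the grouping, while slicing fixes one dimension so the rest fits in a table.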
The tree structure in the figure above represents drilling down (subdividing by source and by time); the specific data are then obtained by slicing along the chosen path.
As you may have guessed, the pivot table we use every day is a form of dimensional analysis: the dimensions to be analyzed go into rows and columns, and the cells hold sums, counts or averages. Here is a use case figure: the average salary calculated across two dimensions, city and years of work experience.
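The average-salary pivot can be sketched in plain Python. The postings and salary figures are hypothetical, and `pivot_mean` is a name invented for this example.

```python
from collections import defaultdict

# Hypothetical job postings: (city, working_years, monthly_salary).
postings = [
    ("Shanghai", "1-3", 12_000),
    ("Shanghai", "1-3", 14_000),
    ("Shanghai", "3-5", 20_000),
    ("Hangzhou", "1-3", 10_000),
    ("Hangzhou", "3-5", 16_000),
    ("Hangzhou", "3-5", 18_000),
]

def pivot_mean(rows):
    """Average salary with cities as rows and working-years bands as columns."""
    cells = defaultdict(list)
    for city, years, salary in rows:
        cells[(city, years)].append(salary)
    table = defaultdict(dict)
    for (city, years), salaries in cells.items():
        table[city][years] = sum(salaries) / len(salaries)
    return {city: dict(cols) for city, cols in table.items()}

pivot = pivot_mean(postings)
for city, cols in pivot.items():
    print(city, cols)
```

In practice, pandas users would usually reach for `DataFrame.pivot_table` with `index`, `columns`, `values` and `aggfunc` to get the same table; the hand-written version only shows what the pivot does underneath.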
Besides Excel, dimensional analysis can also be done with BI tools, R and Python. BI tools are relatively the easiest.
Speaking of dimensions, I want to stress one core idea of analysis: comparison. Comparing across different dimensions is probably one of the best shortcuts for newcomers to improve quickly. Examples include comparing past and present along a time trend, comparing regions, comparing product types and comparing user groups. A single number has no analytical meaning; only combinations of numbers bring out the full value of data.
Suppose I want to analyze the company's profit, where profit = sales - cost. I would find the indicators and dimensions involved in sales, such as product type, region and user group, and through repeated combining and splitting locate the cause of a problem or of a good result. The same goes for cost.
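A minimal sketch of breaking profit = sales - cost down by dimension. The order lines are hypothetical and `profit_by` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical order lines: (product_type, region, sales, cost).
orders = [
    ("fruit", "Shanghai", 1_000, 700),
    ("fruit", "Jiangsu", 600, 650),
    ("snacks", "Shanghai", 400, 250),
    ("snacks", "Jiangsu", 300, 200),
]

def profit_by(rows, dimension):
    """Break total profit (sales - cost) down by 'product_type' or 'region'."""
    index = {"product_type": 0, "region": 1}[dimension]
    totals = defaultdict(int)
    for row in rows:
        totals[row[index]] += row[2] - row[3]
    return dict(totals)

total_profit = sum(sales - cost for _, _, sales, cost in orders)
by_product = profit_by(orders, "product_type")
by_region = profit_by(orders, "region")

print(total_profit, by_product, by_region, sep="\n")
```

With these assumed numbers the product split looks balanced, while the regional split shows that Jiangsu contributes far less, which is the kind of contrast that comparing across dimensions is meant to surface.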
This is the correct way of thinking in data analysis. To sum up: we build and screen indicators through the business, and analyze them with dimensions.
Many people ask: what is the difference between indicators and dimensions?
A dimension is an angle for explaining and observing things, and an indicator is a standard for measuring data. Dimension is the broader concept and is not limited to data: time and city are dimensions that cannot be expressed as indicators, whereas indicators (retention rate, bounce rate, browsing time and so on) can serve as dimensions. Put simply: dimension > indicator.
At this point, you have a thinking framework for data analysis. It is only a framework because it lacks specific techniques, such as how to verify that a certain dimension is the key factor affecting the data, or how to improve the business with machine learning. Those involve data and statistical knowledge and will be explained later.
I want to stress that data analysis is not a result but a process. Remember the sentence "If you can't measure it, then you can't grow effectively"? The ultimate goal of data analysis is to grow the business. If data analysis needs a performance measure, it is not whether the analysis was right or wrong, but whether the data ultimately improved.
Data analysis needs feedback. When I identify a factor that affects business results, I verify it: I tell the operations and product staff, then look at the data after the change, and let the results decide. If the results have not improved, it is time to reflect on the analysis process.
This is another element of data analysis: it is results-oriented. If an analysis ends when the report is submitted, with no follow-up or improvement measures, then its value is zero.
Business guides data, and data drives business. This is the only way.