Software ecosystem research

We then verify it in the case study. Equation 2 expresses the primary productivity of a software ecosystem as a linear function of the number of users and their willingness to contribute.

In natural ecosystems, primary productivity varies with the proportion of photosynthetically active radiation that vegetation absorbs. Similarly, software ecosystems differ in programming language, project lifetime, number of project followers, and other factors. These different factors together constitute the. The default minimum productivity of a participant is 1.

Thus, when C is negative and the product of the activity factor and the number of participants is less than the absolute value of C, the primary productivity equals the number of participants. Likewise, by the definition of secondary productivity, secondary productivity is converted from primary productivity.
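Equation 2 itself did not survive extraction. The following is a minimal sketch, assuming the linear form P = Ac * Pe + C, with Ac the activity factor, Pe the number of participants, and C a constant (these names are used later in this section), combined with the default minimum productivity of 1 per participant. The function name `primary_productivity` is hypothetical.

```python
def primary_productivity(pe, ac, c):
    """Sketch of the assumed model P = ac * pe + c (reconstruction, not the original).

    pe: number of participants; ac: activity factor; c: constant term.
    """
    # If c is negative and ac * pe is smaller than |c|, the linear term would be
    # negative, so each participant is credited with the minimum productivity of 1.
    if c < 0 and ac * pe < abs(c):
        return pe
    return ac * pe + c
```

For example, with ac = 2 and c = -10, three participants fall back to the floor (productivity 3), while ten participants give 2 * 10 - 10 = 10.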

It is therefore also a linear function of the number of users and their willingness to contribute. This hypothesis is expressed as equation 3, in which the secondary productivity of the software ecosystem depends linearly on primary productivity through the productivity conversion rate, plus a constant. Substituting the primary productivity formula, the secondary productivity model can be rewritten as equation 4.

In equation 4, the conversion parameter of productivity is obtained by multiplying the activity factor by the conversion rate. It follows that the secondary productivity of a software ecosystem is also linearly related to the number of users and their willingness to contribute.
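Because the equations were lost in extraction, the step from equation 2 to equation 4 can be sketched as below. The symbols P_1 (primary productivity), P_2 (secondary productivity), alpha (conversion rate), and beta (constant) are our own labels; A_c, P_e, and C follow the names used elsewhere in this section. The linear forms are a reconstruction from the prose, not the authors' original notation.

```latex
\begin{aligned}
P_1 &= A_c P_e + C && \text{(2) primary productivity}\\
P_2 &= \alpha P_1 + \beta && \text{(3) secondary from primary}\\
    &= \alpha \left( A_c P_e + C \right) + \beta\\
    &= A_c' P_e + C', \qquad A_c' = \alpha A_c,\; C' = \alpha C + \beta && \text{(4)}
\end{aligned}
```

The conversion parameter A_c' is the product of the activity factor and the conversion rate, which matches the description of equation 4, and P_2 remains linear in P_e.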

In Sections 5 and 6, we analyzed the factors influencing software productivity on different platforms and conducted an empirical study of the feasibility and generality of the hypothetical model above, to verify whether it can express the impact of participants on ecosystem productivity.

Q3: How should we measure productivity in open source software ecosystems? Our research method first analyzed the factors affecting productivity and then established and verified the software ecosystem productivity model in three steps.

Step 1, data collection: first, appropriate data were selected and the ecosystem was divided. To verify the feasibility of the model, GitHub, a typical open source software platform, was selected for the empirical study.

In Section 4, we take the GitHub platform as an example. Because project popularity and development language are important platform factors affecting an ecosystem, we controlled for their influence by dividing each platform into ecosystems, mainly using development language as the criterion.

For platforms that cannot be divided into ecosystems by development language, we adopted other criteria, such as project type and project popularity.

Step 2, correlation analysis: to analyze the impact of user activities on software ecosystem productivity, we examined the relationship between productivity and the participants in each software ecosystem.

Productivity and participant data were analyzed over a one-month observation period, and we examined the relationship between each type of ecosystem productivity and the number of people producing it per unit time. A positive correlation coefficient means that software ecosystem productivity is positively correlated with the number of participants, a negative value means that it is negatively correlated, and a value close to 0 means that there is no correlation between them.

Step 3, model construction: the model was constructed in two parts. The first part is regression analysis. Based on the prototype of the software ecosystem productivity model, we checked whether productivity and participants are linearly related; where they are, the initial regression equation between productivity and participants was obtained by the least squares method [ 18 ].
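Steps 2 and 3 rest on two standard computations: the Pearson correlation coefficient and an ordinary least-squares line. A minimal standard-library sketch, assuming paired per-period counts of participants (x) and productivity (y); the data are made up, and reading the slope as the activity factor and the intercept as the constant is our interpretation of the model prototype.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical monthly data for one ecosystem: participants vs. productivity.
participants = [1, 2, 3, 4]
productivity = [3, 5, 7, 9]
r = pearson(participants, productivity)  # +1 here: perfectly linear, positive
activity, constant = least_squares(participants, productivity)  # slope 2, intercept 1
```

The regression is only fitted after the correlation check has established a strong linear relationship, mirroring the order of Steps 2 and 3.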

Because different types of producers in the same ecosystem create different amounts of productivity, each specific ecosystem yields a different regression model depending on the activity of its participants. The second part is the construction of the real model: the regression equations of the individual ecosystems are inconsistent.

To determine a regression model applicable to most projects, this study used truth discovery [ 19 ], a method that measures the reliability of multisource information and estimates the true information. The flow of this algorithm is shown in Algorithm 1. With this method, we calculated the reliability of the regression equation of each project and obtained a general regression equation with greater accuracy.

The algorithm proceeds as follows. First, the estimates of the first iteration are the averages of the activity factors and of the constants of the ecosystems' regression equations. Second, the weight of each ecosystem in the overall estimate is calculated; the weight formula involves the ecosystem index, the activity of each ecosystem obtained from its regression equation, and the activity and constant estimated at the t-th iteration.

Finally, the results of the t-th iteration are calculated. In this paper, we used the truth discovery algorithm to obtain the real software ecosystem productivity quantification model for a specific software ecosystem platform. The experimental process is described in detail in Section 5. This paper focuses on the influence of user factors on ecosystem productivity.
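The exact weight formula of Algorithm 1 is not recoverable from the text, so the sketch below swaps in the well-known CRH-style weighting (each source's weight is the negative log of its share of the total distance) as an assumption. It follows the described flow: start from plain averages, weight each ecosystem's regression equation by how close it is to the current estimate, and recompute. The name `truth_discovery` and its parameters are hypothetical.

```python
import math


def truth_discovery(models, iterations=20, eps=1e-9):
    """Estimate one (activity, constant) pair from per-ecosystem regressions.

    models: list of (activity, constant) tuples, one per ecosystem.
    The CRH-style weighting is an assumed stand-in for Algorithm 1.
    """
    n = len(models)
    # First iteration: plain averages of the activities and of the constants.
    a = sum(ak for ak, _ in models) / n
    c = sum(ck for _, ck in models) / n
    if n == 1:
        return a, c
    for _ in range(iterations):
        # Squared distance of each ecosystem's equation from the current estimate;
        # eps keeps every distance positive so the logarithm is defined.
        dists = [(ak - a) ** 2 + (ck - c) ** 2 + eps for ak, ck in models]
        total = sum(dists)
        # Reliability weight: equations closer to the estimate get larger weights.
        weights = [-math.log(d / total) for d in dists]
        wsum = sum(weights)
        # Weighted averages become the estimates for the next iteration.
        a = sum(w * ak for w, (ak, _) in zip(weights, models)) / wsum
        c = sum(w * ck for w, (_, ck) in zip(weights, models)) / wsum
    return a, c
```

With three similar equations near (2, 1) and one outlier at (5, -3), the plain average activity is 2.75, but the weighted estimate settles close to 2 because the outlier's weight shrinks toward zero. In practice the activity and constant may need rescaling first, since the distance mixes both.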

This section verifies the relationship between the number of users, their activity, and productivity. Taking GitHub as an example, we quantified the software ecosystem productivity model and conducted an empirical study. The first task was data collection and preprocessing. The main external factors affecting ecosystem productivity include the functional type of the project, the popularity of the project, and the popularity of the development language.

Based on development language, we chose seven of the most popular languages and divided the GitHub platform into technical ecosystems to analyze the productivity model. As shown in Table 1, we converted the popularity rankings of these languages into reverse-rank scores: the first-ranked language on each platform received 10 points, the second 9 points, and so on. The scores from the four platforms were summed, and the seven languages with the highest totals were selected.
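The reverse-rank scoring described above can be sketched as follows. The rankings are placeholder lists, not the data behind Table 1, and `top_languages` is a hypothetical helper; breaking ties alphabetically is our own choice, since the text does not say how ties were handled.

```python
from collections import Counter


def top_languages(rankings, k=7, top_score=10):
    """rankings: one ordered list (most popular first) per ranking platform."""
    scores = Counter()
    for ranking in rankings:
        # 1st place earns top_score points, 2nd earns top_score - 1, and so on;
        # positions below top_score earn nothing.
        for position, lang in enumerate(ranking[:top_score]):
            scores[lang] += top_score - position
    # Highest total first; ties broken alphabetically for a deterministic result.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [lang for lang, _ in ordered[:k]]


# Placeholder rankings from four hypothetical sources.
rankings = [["A", "B", "C"], ["B", "A", "D"], ["A", "C", "B"], ["B", "A", "C"]]
```

Here A scores 10 + 9 + 10 + 9 = 38 and B scores 9 + 10 + 8 + 10 = 37, so `top_languages(rankings, k=2)` returns `["A", "B"]`.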

The platform was then divided into technical ecosystems with language as the main factor: using the seven popular languages above, we formed one software ecosystem per language and analyzed its productivity. Following the definition of software ecosystem productivity in this paper, data on user contribution activities were used.

The GitHub platform retains extensive historical data on the development process, and its API provides data with high integrity for crawling.

After data processing was completed, the correlation between productivity and participants was analyzed. We sought the relationship between the number of participants and productivity through a statistical analysis of average user activity. In a specific software ecosystem, it is necessary to select data that represent ecosystem productivity and correlate them with the number of users. In the GitHub ecosystem, the most important user contributions usually come through pull requests (PRs), so PR data represent software ecosystem productivity well, and valuable PRs are usually merged into the project code base.

Therefore, merged PRs were used as the secondary productivity, that is, the productivity remaining after conversion, and the correlation between productivity and ecosystem participants was analyzed for the software ecosystem projects.
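A sketch of the per-period aggregation, assuming PR records have already been crawled and flattened into hypothetical `(month, author, merged)` tuples; this is not the shape of any GitHub API response. Counting all submitted PRs as primary productivity is our assumption; counting merged PRs as secondary productivity follows the text.

```python
from collections import defaultdict


def monthly_productivity(prs):
    """prs: iterable of (month, author, merged) tuples (hypothetical layout)."""
    authors = defaultdict(set)   # distinct PR authors per month
    total = defaultdict(int)     # all PRs per month (assumed primary productivity)
    merged = defaultdict(int)    # merged PRs per month (secondary productivity)
    for month, author, was_merged in prs:
        authors[month].add(author)
        total[month] += 1
        if was_merged:
            merged[month] += 1
    return {
        month: {"participants": len(authors[month]),
                "primary": total[month],
                "secondary": merged[month]}
        for month in sorted(authors)
    }
```

The resulting participant and productivity series per month are what the correlation and regression steps consume.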

For each observation period, we counted each project's productivity per unit time and its participants within the statistical time window, and used the Pearson correlation coefficient to analyze the correlation between participant data and software productivity data. As Figure 1 shows, on the GitHub platform the total number of PRs is highly positively correlated with the total number of participants.

Among them, the lowest correlation for any development language is 0. The number of merged PRs is also highly positively correlated with the number of participants, with correlation ranging from 0.

Therefore, in the GitHub ecosystem, software ecosystem productivity is highly positively correlated with the number of participants in the ecosystem, and the number of participants directly affects the value of productivity. Because productivity arises from the interactions of all participants in the ecosystem, the main factors influencing software productivity are the number and the activity of producers in the ecosystem.

We therefore used the number of users and their activity to build a model of the impact of participants on productivity. Given the clear positive correlation found above, the next step was to test whether productivity is a linear function of the number of participants and to carry out regression analysis.

On the GitHub platform, the model of participants' impact on productivity was constructed using the ecosystems of the seven representative languages.

As Figure 2 shows, productivity is a linear function of the number of participants in every ecosystem, although specific projects or ecosystems present different regression models depending on the activity of their participants.

This linear relationship holds for both primary and secondary productivity. Therefore, the initial regression equations were obtained by the least squares method, as shown in Table 3.

The regression equations of the individual projects are inconsistent, and the lowest activity factor, Ac, was 1. Our analysis found that the activity factor usually indicates the willingness of users in the ecosystem to contribute. To obtain a regression model applicable to most projects, the model of participants' impact on productivity was constructed using the truth discovery approach described in Section 3.

Pe represents the number of participants in the software ecosystem. The activity factor for primary productivity was 2, and the default minimum productivity of a user was 1. The analysis of software ecosystem productivity on the GitHub platform showed that the software productivity model proposed in Section 4 can be applied to multiple software ecosystems.

In general, the average activity of users can substitute for the activity factor obtained by inverting the model from productivity and the number of participants.

Q4: Can this evaluation method and productivity model be applied to other ecosystems on GitHub and on other platforms? To answer this question, this section verifies the model first on three different classes of ecosystems on GitHub and then on ecosystems in Stack Overflow and Bugzilla.

In this section, we verified the model on three different classes of ecosystems on GitHub. In biology, an ecosystem can be large or small, and ecosystem productivity can describe the production capacity of individuals, groups, ecosystems, regions, and even the biosphere. Similarly, we selected three GitHub software ecosystems of different sizes and types for verification: a single software product ecosystem, a software development team ecosystem, and a language ecosystem.

The relationship between the productivity predicted by the model and the actual productivity was checked statistically. Three ecosystems, Moby, GitBook, and Ruby, were selected to validate the model. Moby is an open source project dedicated to advancing software containerization.

In the Moby project, users gather together with software frameworks, components, and other software products to form a software ecosystem. The GitBook team mainly develops text editors based on Git technology. In the GitBook ecosystem, users, software development environments, software products, services, and other elements are brought together by a team to form a software ecosystem. Ruby is a simple and fast object-oriented scripting language.

It is a popular development language on GitHub. In the Ruby ecosystem, users, software products, and the development environment form a software ecosystem that shares the same development language. Figure 3 shows the primary productivity of these software ecosystems as calculated by the productivity model.

The calculated values follow the trend of the actual productivity of each software ecosystem and are of roughly the same magnitude. However, the predicted and measured values differ. Because the model was built from the characteristics of many ecosystems, the large dataset obscures the characteristics of any specific ecosystem, so the prediction reflects the average productivity of the platform rather than that of the individual ecosystem. When the predicted value is greater than the measured value, the productivity of the ecosystem is lower than the platform average, and user activity should be improved.

When the predicted value is less than the measured value, the users of the ecosystem are more active and its productivity is higher than the platform average. When the productivity of an ecosystem is consistently higher than the model predicts, the model should be adjusted to the characteristics of that software ecosystem to preserve its accuracy.
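The interpretation rules above can be sketched as a small check over aligned predicted and measured series. The text does not say how long "consistently higher" must last, so `run_length` (consecutive periods above the prediction) is a hypothetical parameter, as is the function name.

```python
def compare_with_model(predicted, measured, run_length=3):
    """Label each period and flag when measured productivity stays above the model.

    run_length is a hypothetical threshold for "consistently higher".
    """
    labels = []
    streak = 0
    needs_adjustment = False
    for p, m in zip(predicted, measured):
        if p > m:
            # Prediction above measurement: ecosystem below the platform average.
            labels.append("below platform average")
            streak = 0
        elif p < m:
            # Prediction below measurement: users more active than average.
            labels.append("above platform average")
            streak += 1
            if streak >= run_length:
                needs_adjustment = True
        else:
            labels.append("matches model")
            streak = 0
    return labels, needs_adjustment
```

Requiring several consecutive periods avoids recalibrating the model after a single unusually productive month.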

This shows that the above model of software ecosystem primary productivity is applicable to GitHub and other software ecosystems. We also found that, during stable operation of the platform, the number of users is limited by the environmental capacity of the ecosystem. In a software ecosystem, the transition from primary productivity to secondary productivity takes time, because converting primary productivity into secondary productivity requires collaboration with other users.

Hence, the predictions of secondary productivity were poor for the last few months. The secondary productivity predictions for these three GitHub ecosystems were also analyzed; the results are shown in Figure 4. During verification, we also found that factors other than the number of users and their activity affect software ecosystem productivity. On the GitHub platform, primary productivity was highly correlated with the number of participants, but when an ecosystem is small and its data are sparse, the accuracy of the model drops considerably.

In the productivity verification experiment for a single project team, we found that most users who submitted PRs contributed only one or two submissions, whereas a few core developers generated a large number of submissions during certain observation periods, causing partial errors in the model.

The correlation between secondary productivity and the number of participants was lower than that of primary productivity because the many submissions generated by these few core developers are often merged into the code base, so secondary productivity depends more on the activities of the core developers.

In the initial phase of a project, the contribution rate of core users is usually high. As the project progresses, however, more noncore developers participate, and the proportion of secondary productivity converted from primary productivity gradually increases. In the experiment, we also found that, during stable operation of the platform and in the absence of sudden external shocks, the number of users remains within a range for a long time once it reaches a certain level.

Therefore, the key to raising primary productivity is to increase the number of participants and their activity level. Moreover, in these open source software ecosystems, converting primary productivity into secondary productivity takes time, because contributions must be discovered and collaboratively completed by other users.

Because ecosystem secondary productivity is converted from primary productivity, secondary productivity is almost zero when primary productivity is low, so for the products of small project teams the secondary productivity model depends too heavily on core users. Therefore, to increase secondary productivity, it is necessary to increase primary productivity, the conversion factor, and the number of core users.

To verify the general applicability of the model, we also verified it on ecosystems of other platforms. Using the same method, we obtained the software ecosystem productivity of the Stack Overflow and Bugzilla platforms. On Stack Overflow, users can perform a variety of activities, such as asking questions, answering, voting, and commenting.

Through collective-intelligence collaboration such as asking and answering questions, users form an open source ecosystem. Stack Overflow is highly popular with software developers and is considered one of the most successful open source ecosystems.

We therefore selected ecosystems on these platforms to verify the productivity model. Before verification, the ecosystems and their productivity data on these platforms were determined. On Stack Overflow, information takes the form of questions and answers, so the number of questions and answers represents productivity. In Bugzilla, productivity is represented by the number of reports. Next, the ecosystems on each platform were divided. Given the characteristics of the platform, Stack Overflow users usually gather around different technical fields.

Stack Overflow was therefore divided by programming language. Bugzilla has five types of projects, so its ecosystems were divided by project type. We then determined the activity factor of each ecosystem model using the method in Section 4, starting with an analysis of the correlation between productivity and the number of participants.
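The same aggregation generalizes across platforms by changing which activity types count as productivity. A sketch, assuming events are flattened into hypothetical `(ecosystem, user, activity)` tuples; the activity labels and the rule that any recorded activity makes a user a participant are our assumptions, while counting questions and answers on Stack Overflow and reports on Bugzilla follows the text.

```python
from collections import defaultdict

# Activity types counted as productivity on each platform (labels are hypothetical).
PRODUCTIVE_ACTIVITIES = {
    "stackoverflow": {"question", "answer"},
    "bugzilla": {"report"},
}


def ecosystem_productivity(platform, events):
    """events: iterable of (ecosystem, user, activity) tuples.

    Returns {ecosystem: (participants, productivity)}. Any recorded activity,
    including votes or comments, makes a user a participant (an assumption).
    """
    counted = PRODUCTIVE_ACTIVITIES[platform]
    users = defaultdict(set)
    output = defaultdict(int)
    for eco, user, activity in events:
        users[eco].add(user)
        if activity in counted:
            output[eco] += 1
    return {eco: (len(users[eco]), output[eco]) for eco in users}
```

Each ecosystem's (participants, productivity) pairs per period can then go through the same correlation, regression, and truth discovery steps used for GitHub.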

It was found that, on Stack Overflow, productivity is highly positively correlated with the number of participants.