
Machine learning report writing, machine learning writing, R writing

2023-02-13 10:49, Monday. Category: report writing


Optimize the Portfolio of Soybean Varieties at Target Farm through Machine Learning


Abstract

The world is facing food shortages for three reasons. First, the world's population has grown rapidly. Second, urbanization has reduced the area of land under food crops. Third, COVID-19 is seriously affecting production globally. It is therefore imperative to select the best soybean varieties to increase yields and help alleviate the shortage.

This project will use machine learning to optimize the soybean variety portfolio of the target farm by selecting 5 kinds of soybeans from 182 varieties and allocating a certain amount of land for them to optimize production. Descriptive analytics, predictive analytics and prescriptive analytics will all be used in this project.

Descriptive analytics mainly studies the influence of location on yield, the relationship between weather and location, the distribution of yields and varieties, and whether there are enough data to build a model.

The predictive analytics methods are Linear Regression, LASSO, Regression Tree, Bagging, Random Forest, Boosted Trees and Neural Network. These methods are used to build models, predict yield and compare accuracy. Finally, in prescriptive analytics, the mean-risk heuristic is applied to produce the final answer. Both the average yield predicted by the models and the mean square error, which indicates the risk of growing a variety, are considered in this part.

The ultimate goal of the project is to identify the optimal soybean varieties through machine learning and thereby increase yield per unit area. The final result will help soybean farmers increase their yields and reduce the threat of starvation for people in certain areas. It is therefore of great significance to conduct this research.

Keywords: Food Shortage, Optimization, Portfolio of Soybean Varieties, Machine Learning

Introduction

Food shortages have become a global problem. On the one hand, the world’s population has grown tremendously: in the past 200 years it has grown about 7.5-fold, from 1 billion to about 7.5 billion. On the other hand, urbanization has reduced rural areas and cultivated land, making food shortages more serious. In addition, the impact of COVID-19 is gradually deepening.

The World Food Programme pointed out that 135 million people around the world were already facing severe food shortages and that, because of the epidemic, this figure is expected to increase by about 130 million this year, to 265 million. In this situation it is particularly important to increase grain production per unit area. This is the motivation of this project: to increase yield and help address the food shortage problem by selecting the best soybean varieties.

The goal of this project is to optimize the portfolio of soybean varieties at the target farm through machine learning.

The specific question of this project is which 5 of 182 soybean varieties should be selected for the target farm, and how much land should be allocated to each, so that yield is optimized. If the results are convincing and credible, many people will benefit, such as farmers who grow soybeans and people who are starving, especially children who are malnourished because of food shortages. It is therefore significant to do this research.

The analysis includes three components: descriptive analytics, predictive analytics, and prescriptive analytics. More specifically, descriptive analytics studies qualitative and descriptive questions to understand the general state of the data related to the question. In predictive analytics, 7 methods are used to build models for the different varieties and predict the yield for the target farm. All methods are compared for accuracy by mean square error. Finally, in the prescriptive analytics part, the Mean-Risk Heuristic, which considers both average yield and risk, is applied to give the final portfolio.

The important result of this project is the portfolio of 5 soybean varieties and the specific land allocation for the target farm. This indicates that through machine learning, the optimal varieties can be selected, and the purpose of increasing the yield per unit area could be achieved. Such experience may also be applied in other target areas, which may help solve the problem of food shortages on a larger scale.

Literature Review

Some articles have studied how to increase soybean or other grain yields by optimizing variety selection. For example, Huang et al. (2020) used an inter-comparison method in “Comparative Test Analysis and Evaluation of New Summer Soybean Varieties (lines) in Xinxiang Area”. “Effects of planting patterns on agronomic characters and yield of different soybean varieties” (Wang et al., 2020) applied algorithms to construct a reasonable group structure of soybeans and increase yields. The structure and methods of these studies are useful references.

“Machine-Learning-Based Simulation for Estimating Parameters in Portfolio Optimization: Empirical Application to Soybean Variety Selection” (Sundaramoorthi & Dong, 2019) is more instructive because its method is entirely machine-learning-based. It used Bagging, Random Forest and Regression Trees, which provide a theoretical basis for this project. Barkley, Peterson & Shroyer (2010) used portfolio theory from business investment analysis to find the best portfolio of wheat varieties with maximum yield and minimum risk. This article inspired me to consider risk and yield together.

However, such studies (Basnet, Mader & Nickell, 1974; Cui et al., 2020) usually use only one or two methods for modeling and prediction. In fact, each method has drawbacks. Using more methods can make the results better and less biased. It also makes it more likely that the approach can be applied to other regions, widening the scope of problems it can solve. This project tries to fill this research gap.


Methodology and Analysis

There are three components of the methodology: descriptive analytics, predictive analytics, and prescriptive analytics. Before these three steps, data pre-processing is also an important step because it provides a solid foundation for the subsequent analysis.

Step 1: Data Pre-processing

In this step, varieties are divided into two types: those with sufficient data and those with insufficient data. Varieties with sufficient data are retained as they are, and those with insufficient data are merged. Taking fewer than 50 observations as the criterion for insufficient data, 99 of the 182 varieties are insufficient and 83 are sufficient. The 99 insufficient varieties are combined into a new variety, called “Vnew”, which takes part in the fitting together with the other 83, for a total of 84 varieties. In the subsequent fitting, each method builds 84 models. Other operations are also carried out to prepare for the subsequent fitting.
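The merging rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout (a list of dictionaries with a `variety` key), the function name `merge_sparse_varieties` and the toy data are hypothetical; only the threshold of 50 and the name “Vnew” come from the text.

```python
from collections import Counter

MIN_ROWS = 50  # threshold from the text: fewer than 50 rows counts as insufficient


def merge_sparse_varieties(rows, min_rows=MIN_ROWS, merged_name="Vnew"):
    """Relabel every variety that has fewer than min_rows rows as merged_name."""
    counts = Counter(row["variety"] for row in rows)
    return [
        {**row, "variety": row["variety"] if counts[row["variety"]] >= min_rows else merged_name}
        for row in rows
    ]


# Toy data (hypothetical): V1 has 60 rows, V2 has 10 and V3 has 5.
rows = (
    [{"variety": "V1", "yield": 60.0}] * 60
    + [{"variety": "V2", "yield": 55.0}] * 10
    + [{"variety": "V3", "yield": 50.0}] * 5
)
merged = merge_sparse_varieties(rows)
merged_counts = Counter(row["variety"] for row in merged)
print(merged_counts)
```

In this toy case V2 and V3 are pooled into a single “Vnew” group of 15 rows, while V1 is kept unchanged.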

Step 2: Descriptive Analytics

This part is mainly to have a general understanding of the data. Firstly, plot the latitudes and longitudes on a map to visualize the locations of farms in Figure 1.

Figure 1. Location Information of the farms in the given data.

Secondly, generate frequency distribution for varieties in Figure 2.


Figure 2. The Frequency of the Variety for all data.

It is apparent from Figure 2 that the amount of data varies greatly between varieties: some varieties have sufficient data and others do not. This was already handled in Step 1, which will benefit the quality of the subsequent models.

Thirdly, check whether there is any relationship between location and variety yield. Linear regression is used here, and the relationship is judged from the p-values and coefficients. Both longitude and latitude have a significant effect on yield because their p-values are close to 0. The coefficient of longitude is -0.33, which is so small in magnitude that it is hard to judge whether its effect on production is positive or negative. The coefficient of latitude is -2.74, which indicates that the lower the latitude, the greater the yield.

Fourthly, explore relationships between locations and weather related variables.

The linear regression suggests that both Weather 1 and Weather 2 are positively related to location: the two p-values are close to 0, and the coefficients of Weather 1 and Weather 2 are 1.09 and 1.97 respectively.

Fifthly, plot the distribution of the yield variable in Figure 3. Figure 3 shows an approximately normal distribution of variety yield and indicates that yield varies greatly between varieties. Therefore, the goal of this project should be to select the better varieties to improve total yield.

Figure 3. Histogram of the variety yield.

Step 3: Predictive Analytics

The target variable in this project is Variety_Yield. The 84 varieties, comprising the 83 varieties with sufficient data and the new variety called “Vnew” that pools all varieties with insufficient data, are used to build the models. Seven methods are applied here: Linear Regression, LASSO, Regression Tree, Bagging, Random Forest, Boosted Trees and Neural Network.

The steps to build a model are similar for each method. Firstly, divide all data into a training set and a test set in the ratio 8:2. Secondly, predict the yield of all 84 varieties in a “for” loop: fit the training data, predict the test set, and calculate the mean square error (MSE) to measure the accuracy of the method. Finally, predict the yield of the evaluation data for the target farm.
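A minimal sketch of this split, fit and score loop is given below, assuming Python and a deliberately simple stand-in model. The single-feature least-squares line, the toy data and all function names are assumptions made for illustration; the report's seven methods and its real features are not reproduced here.

```python
import random


def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of rows and split it 8:2 into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(train):
    """Ordinary least squares for yield = intercept + slope * x with one feature x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def mse(model, test):
    """Mean square error of the fitted line on the held-out rows."""
    intercept, slope = model
    return sum((y - (intercept + slope * x)) ** 2 for x, y in test) / len(test)


# Toy data (hypothetical): {variety: [(feature, yield), ...]} with 60 rows each.
rng = random.Random(1)
data = {
    name: [(x, base + 0.5 * x + rng.gauss(0, 1)) for x in range(60)]
    for name, base in [("V1", 50.0), ("V2", 40.0), ("Vnew", 45.0)]
}

results = {}
for variety, rows in data.items():  # one model per variety
    train, test = train_test_split(rows)
    model = fit_line(train)
    results[variety] = {"model": model, "mse": mse(model, test)}

for variety, res in results.items():
    print(variety, round(res["mse"], 3))
```

With 84 varieties the loop body stays the same; only the model-fitting call would change from one method to the next.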

There are several points to note among the seven methods. Firstly, LASSO differs from the other methods: it requires additional regression pre-processing of the data frame before fitting. Secondly, even though the data had been pre-processed in advance, an error occurred in the loop of the Boosted Trees method because the data were insufficient. This problem was solved by adjusting the loop range. Thirdly, the Neural Network method needs the data to be standardized before fitting. The formula for standardization is (X-min)/(max-min).

The results of the prediction can be seen below in Table 1.

Table 1. Predicted yield of different varieties by method.
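The standardization formula (X-min)/(max-min) can be illustrated with a short Python sketch. The function name `min_max_scale` and the handling of a constant column are assumptions added for the example, not details given in the report.

```python
def min_max_scale(values):
    """Scale values to the range [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero (assumed convention)
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([10.0, 15.0, 20.0])
print(scaled)  # [0.0, 0.5, 1.0]
```

A common practice, which the report does not confirm it followed, is to take min and max from the training set only and reuse them when scaling the test and evaluation data.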

Variety Linear Regression LASSO Regression Tree Bagging Random Forest Boosted Trees Neural Network
V100 112.6819803 65.18063591 70.02227682 64.68006616 64.75130555 65.56819895 86.55001021
V101 100.1761093 57.02092699 58.74248239 57.74756935 58.30591639 55.62336377 88.47904301
V102 57.76424079 63.62861206 57.69892858 61.57844969 61.99542259 67.6825447 99.95074506
V103 61.50147159 66.18768647 43.16572425 58.52990743 59.01815772 52.74332988 97.4743919
V104 90.04583615 63.24537448 59.34155904 59.26660596 58.32978547 63.62265776 93.09393859
V105 63.28664144 64.15628013 55.70881625 58.05946715 58.90818474 60.43203184 115.6697742
V106 95.18119428 60.95025391 58.4372731 59.56121281 59.88475096 50.18138299 85.86605985
V107 64.58351253 68.63657536 60.47064032 58.61390831 58.06799915 60.6825476 99.61836252
V108 74.59180266 62.52148554 48.33331444 55.39438697 55.03133129 58.54584829 92.79804262
V109 62.2590398 64.41932599 55.19891827 56.13975415 58.01575387 64.61644134 91.75252446
V110 70.73957641 61.19700908 60.79995268 60.93484401 60.3007325 62.71432929 87.6647739
V111 63.12875576 67.55673962 59.19266437 58.04586013 56.63538705 52.10891863 86.72573571
V112 73.0723474 66.90948133 56.22177384 58.72016627 59.03399816 64.61683611 89.72766442
V114 59.51929313 60.99421047 54.84387274 57.10822593 57.8543227 54.00845871 85.30094106
V115 62.25947432 64.37366175 69.82979417 58.83971293 57.86527428 50.02048224 102.5309678
V116 -929.4503032 47.83455902 54.15194142 49.7853575 49.39766396 50.32464398 73.12323454
V117 62.45410717 58.99087959 54.93492436 54.99392153 54.72097492 54.5991056 90.73609857
V118 55.96325129 55.68537017 52.47233522 54.51578862 55.89744269 51.85565025 86.10926931
V119 44.02900824 56.90416414 44.29714131 52.70952276 53.62105414 56.18499717 93.02048592
V121 58.78954239 59.39031401 54.1570868 55.49908152 55.01089998 32.08863654 105.7599135
V122 45.69461977 57.7078316 56.81222375 51.82878334 51.52560117 50.36948139 87.97906017
V124 88.4745568 56.67059064 60.80477336 53.55030403 54.9217717 54.78632791 79.75623728
V125 -355.8928473 48.18237053 50.49200568 52.88173615 52.20755789 0 64.52248074
V126 64.27944551 57.00401805 64.72036744 54.05213706 54.3566554 66.27144938 92.03500761
V127 51.97222519 69.02170065 37.12745528 46.25098813 48.08354609 40.09954802 73.18214852
V128 -110.111735 81.46268537 79.81791803 56.25024086 54.9895276 58.31450786 92.01592498
V130 -295991.064 52.87317115 51.79883041 50.8390562 50.53069405 46.20298869 77.51456631
V131 3228.775683 68.83773718 53.47943307 51.0921526 51.60233134 53.59107986 71.14493931
V133 882237.7943 58.50506726 60.00990364 48.62824027 48.49385244 51.5754167 87.03977229
V135 -42981.98149 55.92438675 60.40135846 48.10611492 47.9690189 45.93819059 75.3090758
V136 142776.7128 44.97741592 42.09221778 43.07008466 42.89561098 46.85419215 82.92352273
V137 -77301.8127 42.88827691 42.52617216 36.6968592 37.25211529 34.21647654 71.84029338
V138 3312.620365 43.99827692 52.3892111 42.6436661 42.38232617 36.98697034 75.25713636
V139 -43649.94441 41.98683887 29.90236296 35.55724395 36.22138564 48.67731866 67.0473347
V140 -12285.02071 41.7296011 34.67279417 39.84071598 38.35600717 39.50572437 64.98804297
V169 -37970.0721 28.4695012 50.71726808 51.09414105 51.73651807 51.10642065 88.45234246
V179 868.3821495 75.66002337 54.12209191 60.48061254 61.17765996 58.89906281 86.861495
V180 300.0392466 60.31813473 75.14458347 66.98092237 65.70026091 72.86432688 89.25291391
V181 -3535.110402 63.01954876 62.9838544 61.49756756 62.52623435 60.91804453 83.65247886
V183 69.50047722 61.90443939 61.6125274 58.26592123 58.58700213 58.64384123 87.74383484
V185 12818.14947 60.35110302 52.22385801 54.39711645 55.58875598 54.28898398 79.89387435
V186 80.75199106 60.10133466 48.6927661 56.68903753 57.33623531 62.09918411 83.71241357
V187 355.5063985 62.20440013 60.90305002 59.28488672 61.09158135 66.90738493 87.77214958
V188 61.43990407 61.22242724 54.40304051 58.49960435 58.59701108 55.03299809 84.61083927
V189 63.81374744 61.52924509 62.88808061 60.40639675 59.78986754 60.4744975 88.29118463
V190 32.54284615 58.46607469 59.79578661 55.03738111 56.64329632 56.36857103 79.69779757
V191 34.96283151 57.28432055 67.1188618 51.39759108 52.34676607 62.53797276 84.48099701
V192 57.84897296 76.30603216 46.42201911 56.56424204 57.12966566 59.99185514 79.93395304
V193 36.68941554 58.06421889 39.44389071 53.17233488 53.35921346 50.68375301 78.14500234
V194 12570409.87 64.43078004 58.08755544 50.41946118 51.31744878 52.57776077 86.78233998
V195 88372.32648 68.26771043 67.65027253 57.63376716 56.68007943 58.29651232 78.55092628
V196 -21581730.25 59.07413665 55.48377119 53.1975548 53.66858256 54.74732697 82.19622987
V31 -18231465.38 79.75719408 60.90651095 62.84662761 63.56252955 57.81897931 90.47262509
V32 -82364.01534 78.49617284 68.75748331 64.82894187 64.98542435 60.62933599 84.5450634
V36 94586956.39 79.22139038 58.25863914 65.28044486 65.43308843 73.90618251 88.15588295
V38 35362.24119 64.64885372 51.23172521 63.543584 63.24707043 67.57525266 89.5136613
V39 144837.5432 64.75317968 60.88123661 61.9221465 62.43259266 69.08356302 81.93952727
V40 4719.304605 62.91058379 52.32644162 57.73106802 58.12714477 58.3782358 79.07237815
V41 -113811231 65.15025395 59.90649521 62.05799451 63.04924149 64.61736251 85.33566101
V42 8085.800581 61.04544475 56.77610532 57.9887994 57.94797297 51.73323917 87.03576504
V43 53.93564332 62.57085027 51.90648824 56.8506727 58.01553756 57.14109669 87.04316807
V44 168.0675351 60.88200889 52.35164255 56.51456099 57.26948553 59.19570743 86.18108629
V45 111961.7153 61.2391268 63.85091657 54.18753218 54.08018128 52.94394781 82.70367346
V46 64.71912114 63.59648409 66.36835637 59.51095116 57.94943526 67.63817308 88.59488646
V47 64.43590373 61.25594532 54.8607355 60.87768045 59.98495753 68.19862266 78.62697814
V48 38.81158613 59.74674225 51.23598056 54.88151054 55.47105266 54.74459132 81.37433194
V49 -10452.05125 61.59371522 52.24030824 56.93075548 58.20986234 57.32381628 79.72989069
V51 149042.2013 59.84185677 51.31206448 55.63793612 56.31026847 64.3737881 81.41539401
V52 -10227.39447 57.69102436 69.31505539 56.20734508 56.49324219 56.64307495 76.48343358
V54 4374.377771 61.89406967 49.53123535 54.99915998 55.93281599 51.77181006 82.74961592
V56 88300.6969 63.16869238 55.49762304 53.12815731 50.80869952 63.06396717 79.27525701
V8 2425185.521 59.07306522 66.05912473 60.53120773 60.42068334 65.58507743 88.87808504
V87 -33535.75165 56.53485808 43.76346273 48.28065608 51.3300429 51.87624032 77.624912
V88 -184164.2453 50.76524107 57.26742813 59.2546737 59.24639782 61.05039632 81.82995349
V9 -305944.2698 74.34657329 58.49765809 58.35948305 58.8901849 73.29179773 83.84272849
V90 -527800957.7 99.02743208 72.24215477 59.63938064 60.4631578 59.32364058 104.8754767
V92 -766494.0612 45.9537126 43.00975128 56.72061643 58.71636854 64.77833997 84.16517855
V94 -80778.6646 52.12197375 51.92567675 57.7075577 57.76309266 59.69825646 77.53049732
V95 3036035.183 37.57766531 51.3037449 60.16463488 61.4358126 54.67157376 90.09971169
V96 2573032.856 64.94371687 56.4966121 59.01382432 61.70679277 64.07693631 94.53271472
V97 27205.55249 60.80261568 67.59211554 61.26005455 61.30674414 55.0005536 89.16797002
V98 68.86288426 71.82933768 68.89192141 67.1311739 65.29808979 64.24985717 92.38771294
V99 84.70390657 72.91427377 58.10841352 61.56336709 60.39798121 65.56884353 101.7332391
Vnew 71.73584042 60.89675222 54.14046816 58.45951856 58.02478818 62.60185974 93.70239313

Step 4: Prescriptive Analytics

In this step, the accuracy of the different methods is summarized first. Table 2 shows the MSE of the 7 methods.

Table 2. MSE of 7 methods.


Then, assign weights to the methods according to MSE. Since the MSE of linear regression is too large to have any reference value, its weight is 0. The other methods are assigned weights 6, 5, 4, 3, 2 and 1 in order of MSE from smallest to largest. Refer to Table 3 for the weights.

Table 3. Weights of 7 methods.

Next, calculate the weighted average of the yield predicted by different methods according to the weights above.
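The rank-based weighting and the weighted average can be sketched as follows in Python. The MSE values below are hypothetical placeholders, since the contents of Table 2 are not available here, so the resulting number is not expected to match Table 4. The predictions are the V98 row of Table 1 rounded to two decimals, and the function names are made up for the example.

```python
def rank_weights(mse_by_method, excluded=()):
    """Weight 0 for excluded methods; n, n-1, ..., 1 for the rest by ascending MSE."""
    ranked = sorted((m for m in mse_by_method if m not in excluded), key=mse_by_method.get)
    weights = {m: 0 for m in excluded}
    for position, method in enumerate(ranked):
        weights[method] = len(ranked) - position
    return weights


def weighted_yield(pred_by_method, weights):
    """Weighted average of the per-method predictions for one variety."""
    total = sum(weights.values())
    return sum(weights[m] * pred_by_method[m] for m in weights) / total


# Hypothetical MSE values, for illustration only (not the report's Table 2).
mse_by_method = {
    "Linear Regression": 1e12,
    "LASSO": 85.0,
    "Regression Tree": 120.0,
    "Bagging": 80.0,
    "Random Forest": 81.0,
    "Boosted Trees": 95.0,
    "Neural Network": 900.0,
}
weights = rank_weights(mse_by_method, excluded=("Linear Regression",))

# Predictions for V98 from Table 1, rounded to two decimals.
v98 = {
    "Linear Regression": 68.86,
    "LASSO": 71.83,
    "Regression Tree": 68.89,
    "Bagging": 67.13,
    "Random Forest": 65.30,
    "Boosted Trees": 64.25,
    "Neural Network": 92.39,
}
combined = weighted_yield(v98, weights)
print(weights)
print(round(combined, 2))
```

Because the weights sum to 21, each method's share is its weight divided by 21, so the most accurate method contributes 6/21 of the combined prediction.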

Also, the Mean-Risk Heuristic, which considers both average yield and risk, is applied here. Standard deviation is a good way to express risk.

By calculating the standard deviation (SD) of the yield of all the data for each variety, the uncertainty of planting this soybean is determined, which is the risk.
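As a small illustration of this risk measure, the Python sketch below groups yields by variety and takes the standard deviation of each group. The text does not say whether the sample or the population standard deviation was used; the sample version (`statistics.stdev`) is assumed here, and the data and function name are hypothetical.

```python
from collections import defaultdict
from statistics import stdev


def risk_by_variety(rows):
    """Sample standard deviation of yield for each variety with at least 2 rows."""
    groups = defaultdict(list)
    for variety, observed_yield in rows:
        groups[variety].append(observed_yield)
    return {v: stdev(ys) for v, ys in groups.items() if len(ys) >= 2}


# Toy observations (hypothetical): (variety, yield).
rows = [("V1", 60.0), ("V1", 62.0), ("V1", 64.0), ("V2", 50.0), ("V2", 70.0)]
risk = risk_by_variety(rows)
print(risk)
```

Here V2 has the same mean yield as a steady variety would, but its much wider spread gives it a larger risk value.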

To combine the weighted average yield and risk, the formula balance = (average yield)^2 / risk is used, because greater yield and less risk are desired and yield should count for more than SD. After computing balance, the varieties are ranked by it and the top 5 are chosen. The land is allocated in proportion to balance. The five selected varieties are therefore V98, V39, V9, V90 and V31, with land allocation proportions of 20.95%, 20.59%, 20.07%, 19.47% and 18.92%, respectively. Table 4 lists the ordered balance with the corresponding weighted average yield and risk, together with the final allocation proportions.

Table 4. Process and the final result of the land allocation.

Variety Average_yield Risk(SD) Balance Allocation
V98 68.65 9.75 483.12 20.95%
V39 64.22 8.68 474.96 20.59%
V9 64.23 8.91 463.00 20.07%
V90 71.84 11.50 448.97 19.47%
V31 66.54 10.14 436.43 18.92%
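The last two columns of Table 4 can be checked with a short Python sketch. The function names are hypothetical; the balance values are copied from Table 4. Note that recomputing balance from the rounded yield and SD columns gives slightly different values (for example about 483.37 rather than 483.12 for V98), presumably because the table was computed from unrounded inputs.

```python
def balance(average_yield, risk):
    """Mean-risk score from the text: (average yield)^2 / risk."""
    return average_yield ** 2 / risk


def allocate(balance_by_variety, top_n=5):
    """Keep the top_n varieties by balance and split land in proportion to balance."""
    top = sorted(balance_by_variety.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    total = sum(b for _, b in top)
    return {variety: round(100 * b / total, 2) for variety, b in top}


# Balance column of Table 4.
table4_balance = {"V98": 483.12, "V39": 474.96, "V9": 463.00, "V90": 448.97, "V31": 436.43}
allocation = allocate(table4_balance)
print(allocation)
# {'V98': 20.95, 'V39': 20.59, 'V9': 20.07, 'V90': 19.47, 'V31': 18.92}
```

Squaring the yield in the score makes a given relative gain in yield count for more than the same relative reduction in SD, which matches the stated preference for yield over risk.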

Conclusion

This project uses 7 machine learning methods, namely Linear Regression, LASSO, Regression Tree, Bagging, Random Forest, Boosted Trees and Neural Network, to build models that analyze and predict the yield of different soybean varieties. The predicted values are combined in a weighted average that reflects the accuracy of each method. Innovatively, both risk and yield are considered to determine the final optimal portfolio.

The final result of this project is that the five selected varieties are V98, V39, V9, V90 and V31.

The land allocation proportions are 20.95%, 20.59%, 20.07%, 19.47% and 18.92%, respectively. Also, among the 7 methods, Bagging, Random Forest and LASSO perform best, and Linear Regression and Neural Network are the least suitable for this case.

Using several methods and weighting them by accuracy is the first innovation of this project, because this can compensate for the shortcomings of any single method and yield less biased results. The second innovation is to summarize whether the different methods are suitable for this type of data. Such experience can also be used in other regions, supporting future research on soybeans or other grain varieties and so helping to address the wider food shortage problem. The findings of this project can help both soybean growers and starving people, and could even guide relevant government departments in issuing policies on varieties.

References

  1. Huang Jinhua, Wang Lingyan, Tang Zhenhai, Dou Shishu, Li Mingwei, Ma Haitao, Zhang Suping, Li Junli, Zheng Qiudao, Fan Yongsheng. Analysis and evaluation of comparative experiment of new summer soybean varieties (lines) in Xinxiang area[J]. Anhui Agricultural Sciences, 2020, 48(17):21-23+27.
  2. Wang He,Sun Jiaxing,Mo Yan,Yang Shuang. Effects of Planting Patterns on Agronomic Characters and Yield of Different Soybean Varieties[J].China Seed Industry,2020(12):60-63.
  3. Sundaramoorthi D, & Dong L. Machine-Learning-Based Simulation for Estimating Parameters in Portfolio Optimization: Empirical Application to Soybean Variety Selection[J]. SSRN Electronic Journal, 2019.
  4. Barkley, A., Peterson, H., & Shroyer, J. (2010). Wheat Variety Selection to Maximize Returns and Minimize Risk: An Application of Portfolio Theory. Journal of Agricultural and Applied Economics, 42(1), 39-55.
  5. Basnet B., Mader E. L. & Nickell C. D., Influence of Altitude on Seed Yield and Other Characters of Soybeans Differing in Maturity in Sikkim (Himalayan Kingdom), Agronomy Journal, 1974.
  6. Cui Jihan, Li Shunguo, Liu Meng, Guo Shuai, Zhao Yu, Ma Junting, Xia Xueyan. The effect of millet and peanut/soybean intercropping on yield and the differences between varieties[J]. Journal of Anhui Agricultural Sciences, 2020, 48(17) :35-40+45.

 
