
        MS6711 Assignment Writing and Python Programming Help

        Date: 2025-03-07  Source: 合肥網 hfw.cc  Author: hfw.cc



        MS6711 Data Mining
        Homework 2
        Instructions
        This homework contains both coding and non-coding questions. Please submit two files:
        1. One Word or PDF document with answers and plots for ALL questions, without coding details.
        2. One Jupyter notebook with your code.
        Questions 1 and 2 are about concepts; questions 3 to 6 are about coding.
        Problem 1 [20 points]
        We perform best subset, forward stepwise and backward stepwise selection on the same dataset with p
        predictors. For each approach, we obtain p + 1 models containing 0, 1, 2, · · · , p predictors. Explain your
        answer.
        1. Which of the three models with k predictors has the smallest training RSS?
        2. Which of the three models with k predictors has the smallest test RSS? (best
        subset, forward, backward, or cannot determine?)
        3. True or False: The predictors in the k-variable model identified by forward stepwise are a subset of
        the predictors in the (k + 1)-variable model identified by forward stepwise selection.
        4. True or False: The predictors in the k-variable model identified by best subset are a subset of the
        predictors in the (k + 1)-variable model identified by best subset selection.
        5. True or False: The lasso, relative to OLS, is less flexible and hence will give improved prediction
        accuracy when its increase in bias is less than its decrease in variance.
        Problem 2 [20 points]
        Suppose we estimate Lasso by minimizing
        $$\|Y - X\beta\|_2^2 + \lambda \|\beta\|_1$$
        for a particular value of λ. For part 1 to 5, indicate which of (a) to (e) is correct and explain your answer.
        1. As we increase λ from 0, the training RSS will
        (a) Increase initially, and then eventually start decreasing in an inverted U shape.
        (b) Decrease initially, and then eventually start increasing in a U shape.
        (c) Steadily increase.
        (d) Steadily decrease.
        (e) Remain constant.
        2. Repeat 1. for test RSS.
        3. Repeat 1. for variance.
        4. Repeat 1. for (squared) bias.
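
To make the objective in this problem concrete, here is a minimal sketch that evaluates it for a given coefficient vector. The function name `lasso_objective` and the toy data are illustrative assumptions, not part of the assignment, and the sketch only evaluates the criterion; it does not minimize it.

```python
import numpy as np

def lasso_objective(X, y, beta, lam):
    """Return ||y - X beta||_2^2 + lam * ||beta||_1 (hypothetical helper)."""
    residual = y - X @ beta
    rss = float(residual @ residual)          # squared L2 norm of the residual
    l1_penalty = float(np.sum(np.abs(beta)))  # L1 norm of the coefficients
    return rss + lam * l1_penalty

# Toy data (assumed for illustration): beta fits y exactly, so RSS is 0.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])
value = lasso_objective(X, y, beta, lam=0.5)
print(value)
```

With `lam=0` the criterion reduces to the ordinary least squares RSS.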
        Problem 3 [20 points]
        These data record the level of atmospheric ozone concentration from eight daily meteorological
        measurements made in the Los Angeles basin in 1976. We have the 330 complete cases. We want to find
        climate/weather factors that impact ozone readings. Ozone is a hazardous byproduct of burning fossil
        fuels and can harm lung function. The data set for this problem is:
        Variable name   Definition
        ozone           Long Maximum Ozone
        vh              Vandenberg 500 mb Height
        wind            Wind speed (mph)
        humidity        Humidity (%)
        temp            Sandburg AFB Temperature
        ibh             Inversion Base Height
        dpg             Daggot Pressure Gradient
        ibt             Inversion Base Temperature
        vis             Visibility (miles)
        doy             Day of the Year
        [Note: I would recommend that you use R for this question, since Python does not have a package for
        forward / backward selection. See the code example on Canvas. Alternatively, you may use the sample
        Python code I provided.]
        1. Report result of linear regression using all variables. Note that ozone is the response variable to
        predict. What variables are significant?
        2. Report the selected variables using the following model selection approaches.
        (a) All subset selection.
        (b) Forward stepwise
        (c) Backward stepwise
        3. Compare the outcome of these methods with the significant variables found in the full linear
        regression in question 1.
        4. Potentially, other transformations of covariates might be important. What happens if you do all
        subset selection using both the original variables and their squares? That is, for all variables, include
        both $X$ and $X^2$ in the linear regression model for all subset selection.
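
For readers working in Python, forward stepwise selection can be sketched with NumPy alone. This is an illustrative sketch on synthetic data, not the sample code mentioned in the note: the helper names `rss_of_fit` and `forward_stepwise` are assumptions, and the sketch only builds the sequence of models of size 0 to p by training RSS. Choosing among those sizes still needs a separate criterion such as Cp, BIC, or cross-validation.

```python
import numpy as np

def rss_of_fit(X, y, columns):
    """Training RSS of an OLS fit with an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    return float(residual @ residual)

def forward_stepwise(X, y):
    """Return the selected column indices for each model size 0..p."""
    remaining = list(range(X.shape[1]))
    selected = []
    path = [[]]  # model with no predictors (intercept only)
    while remaining:
        # Add the single predictor that lowers training RSS the most.
        best = min(remaining, key=lambda j: rss_of_fit(X, y, selected + [j]))
        selected = selected + [best]
        remaining.remove(best)
        path.append(list(selected))
    return path

# Synthetic data (assumed): column 2 has the strongest effect, then column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 2] + 0.5 * X[:, 0] + rng.normal(scale=0.1, size=200)

path = forward_stepwise(X, y)
print(path[1], path[2])
```

Squared terms can be offered to the same procedure by passing `np.hstack([X, X ** 2])` as the design matrix.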
        Problem 4 [20 points]
        In this exercise, we will predict the number of applications received using the other variables in the College
        data set.
        Private         Public/private school indicator
        Apps            Number of applications received
        Accept          Number of applicants accepted
        Enroll          Number of new students enrolled
        Top10perc       New students from top 10% of high school class
        Top25perc       New students from top 25% of high school class
        F.Undergrad     Number of full-time undergraduates
        P.Undergrad     Number of part-time undergraduates
        Outstate        Out-of-state tuition
        Room.Board      Room and board costs
        Books           Estimated book costs
        Personal        Estimated personal spending
        PhD             Percent of faculty with Ph.D.
        Terminal        Percent of faculty with terminal degree
        S.F.Ratio       Student/faculty ratio
        perc.alumni     Percent of alumni who donate
        Expend          Instructional expenditure per student
        Grad.Rate       Graduation rate
        1. Split the data set into a training set and a test set.
        2. Fit a linear regression model using OLS on the training set, and report the test error obtained.
        3. Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test
        error obtained.
        4. Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error
        obtained, along with the number of non-zero coefficient estimates.
        5. Fit a PCR model on the training set, with number of components chosen by cross-validation. Report
        the test error obtained, along with the value of M selected by cross-validation.
        6. Fit a PLS model on the training set, with number of components chosen by cross-validation. Report
        the test error obtained, along with the number of components selected by cross-validation.
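
The pattern shared by steps 3 to 6 (tune on the training set by cross-validation, then report error on the held-out test set) can be sketched for ridge regression as follows. This is a minimal NumPy sketch on synthetic data, not the College data; the helper names, the candidate grid of λ values, and the 5-fold split are all assumptions. In practice the predictors would usually be standardized first.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge on centered data; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta

def mse(X, y, intercept, beta):
    """Mean squared prediction error."""
    return float(np.mean((y - (intercept + X @ beta)) ** 2))

def cv_choose_lambda(X, y, lambdas, n_folds=5, seed=0):
    """Return the candidate lambda with the lowest average validation MSE."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for lam in lambdas:
        errors = []
        for k in range(n_folds):
            val = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            b0, b = ridge_fit(X[train], y[train], lam)
            errors.append(mse(X[val], y[val], b0, b))
        scores.append(np.mean(errors))
    return lambdas[int(np.argmin(scores))]

# Synthetic data (assumed), split into training and test sets.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(size=300)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = cv_choose_lambda(X_train, y_train, lambdas)  # uses training data only
b0, b = ridge_fit(X_train, y_train, best_lam)           # refit on all training data
test_mse = mse(X_test, y_test, b0, b)
print(best_lam, test_mse)
```

The test set is touched only once, after λ has been fixed, so the reported test error is not biased by the tuning step.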
        Problem 5 [20 points]
        We will now try to predict per capita crime rate in the Boston data set.
        crim            per capita crime rate by town.
        zn              proportion of residential land zoned for lots over 25,000 sq.ft.
        indus           proportion of non-retail business acres per town.
        chas            Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
        nox             nitrogen oxides concentration (parts per 10 million).
        rm              average number of rooms per dwelling.
        age             proportion of owner-occupied units built prior to 1940.
        dis             weighted mean of distances to five Boston employment centres.
        rad             index of accessibility to radial highways.
        tax             full-value property-tax rate per $10,000.
        ptratio         pupil-teacher ratio by town.
        black           $1000(B_k - 0.63)^2$, where $B_k$ is the proportion of blacks by town.
        lstat           lower status of the population (percent).
        medv            median value of owner-occupied homes in $1000s.
        1. Try out some of the regression methods explored in this chapter, such as best subset selection, the
        lasso, ridge regression, PCR and partial least squares. Present and discuss results for the approaches
        that you consider.
        2. Propose a model (or set of models) that seem to perform well on this data set, and justify your
        answer. Make sure that you are evaluating model performance using validation set error,
        cross-validation, or some other reasonable alternative, as opposed to using training error.
        3. Does your chosen model involve all of the features in the data set? Why or why not?
        Problem 6 [20 points]
        In a bike sharing system the process of obtaining membership, rental, and bike return is automated
        via a network of kiosk locations throughout a city. In this problem, you will try to combine historical
        usage patterns with weather data to forecast bike rental demand in the Capital Bikeshare program in
        Washington, D.C.
        You are provided hourly rental data collected from the Capital Bikeshare system spanning two years.
        The file Bike train.csv, as the training set, contains data for the first 19 days of each month, while
        Bike test.csv, as the test set, contains data from the 20th to the end of the month. The dataset includes
        the following information:
        daylabel                day number ranging from 1 to 731
        year, month, day, hour  hourly date
        season                  1 = spring, 2 = summer, 3 = fall, 4 = winter
        holiday                 whether the day is considered a holiday
        workingday              whether the day is neither a weekend nor a holiday
        weather                 1 = clear, few clouds, partly cloudy
                                2 = mist + cloudy, mist + broken clouds, mist + few clouds, mist
                                3 = light snow, light rain + thunderstorm + scattered clouds, light rain
                                4 = heavy rain + ice pellets + thunderstorm + mist, snow + fog
        temp                    temperature in Celsius
        atemp                   'feels like' temperature in Celsius
        humidity                relative humidity
        wind speed              wind speed
        count                   number of total rentals, outcome variable to predict
        Predictions will be evaluated using the root mean squared error (RMSE), calculated as
        $$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
        where $y_i$ is the true count, $\hat{y}_i$ is the prediction, and $n$ is the number of entries to be evaluated.
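
The RMSE definition above translates directly into a few lines of NumPy. This is a minimal sketch; the function name `rmse` and the sample numbers are assumptions for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up counts: only the last prediction is off, by 3.
score = rmse([3, 5, 10], [3, 5, 7])
print(score)  # sqrt(9 / 3)
```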
        Build a model on the training dataset to predict the bikeshare counts for the hours recorded in the test
        dataset. Report your prediction RMSE on the test set.
        Some tips
        • This is a relatively open question; you may use any model you learned in this class.
        • It will be helpful to examine the data graphically to spot any seasonal pattern or temporal trend.
        • There is one day in the training data with an unusual atemp record and another day with abnormal
        humidity. Find those rows and think about what you want to do with them. Is there anything
        unusual in the test data?
        • It might be helpful to transform the count to log(count + 1). If you do that, do not forget to
        transform your predicted values back to the count scale.
        • Think about how you would include each predictor into the model, as continuous or as categorical?
        • Is there any transformation of the predictors or interactions between them that you think might be
        helpful?
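
The log(count + 1) tip can be sketched with NumPy's `log1p` and `expm1`, which are exact inverses of each other. The sample counts are made up, and the "model" here is a placeholder assumption that simply returns the transformed targets, so only the round trip is illustrated.

```python
import numpy as np

counts = np.array([0, 1, 16, 250])        # made-up rental counts
log_counts = np.log1p(counts)             # log(count + 1), defined at count = 0

# Placeholder (assumption): a real model would be fitted to log_counts and
# would return predictions on the log scale for the test rows.
log_predictions = log_counts

predictions = np.expm1(log_predictions)   # exp(value) - 1, back on the count scale
predictions = np.clip(predictions, 0, None)  # counts cannot be negative
print(predictions)
```

RMSE should be computed on the back-transformed `predictions`, since the evaluation is defined on the count scale.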
        Try to summarize your exploration of the data and your modeling process. You may fit a few models and
        choose one from them. You will receive points based on your write-up and test RMSE. This is not a
        competition among the class to achieve the minimal RMSE, but your result should be in a reasonable
        range.

