Suitable for those who have no prior experience of Python or programming, this course will hand-hold you to an intermediate level of expertise beginning with the basics of programming in Python.
At the end of this course, you will need to attempt the Python Programming Associate (PyA) certification, and upon successfully passing the exam, the PyA designation will be bestowed on you. -Learn to solve real world problems-Use statistics in Python...
Business Analytics Reporting
Planning:
■ Clarify the objective. ■ Develop a clear plan. ■ Give yourself enough time.
Developing a Report:
■ Write a quick irst draft. ■ Edit and proofread. ■ Give your report a professional look.
Be Clear:
■ Provide suficient background information. ■ Tailor statistical explanations to your audience. ■ Place charts and tables in the body of the report.
Be Concise:
■ Let the charts do the talking. ■ Be selective in the computer outputs you include.
Be Precise:
■ List assumptions and potential limitations. ■ Limit the decimal places. ■ Report the results fairly. ■ Get advice from an expert.
Example 1:
I am a statistical consultant, and I have been hired by Company ABC, a semi conductor manufacturing company, to analyze its overhead data. The company has supplied me with historical monthly data from the past three years on overhead expenses, machine hours, and the number of production runs. My task is to develop a method for forecasting overhead expenses in future months, given estimates of the machine hours and number of production runs that are expected in these months. My contact, Sudhir Singh, is in the company’s finance department. He obtained an MBA degree about 10 years ago, and he vaguely remembers some of the statistics he learned at that time. However, he does not profess to be an expert. The more I can write my report in nontechnical terms, the more he will appreciate it.
REPORT BEGINS:
Sudhir, here is the report you requested. (See also the attached the R Markdown HTML file, Overhead Costs Analysis.htm, which contains the details of my analysis. By the way, it was done with the help of R Commander. If you plan to do any further statistical analysis, I would strongly recommend using this tool.) As I explain in this report, regression analysis is the best-suited statistical methodology for your situation. It fits an equation to historical data, uses this equation to forecast future values of overhead, and provides a measure of the accuracy of these forecasts. I believe you will be able to sell this analysis to your colleagues. The theory behind regression analysis is admittedly complex, but the outputs I provide are quite intuitive, even to people without a statistical background.
OBJECTIVES AND DATA:
To ensure that we are on the same page, I will briefly summarize my task. You supplied me with Bendrix monthly data for the past 36 months on three variables: Overhead (total overhead expenses during the month), MachHrs (number of machine hours used during the month), and ProdRuns (number of separate production runs during the month). You suspect that Overhead is directly related to MachHrs and ProdRuns, and you want me to quantify this relationship so that you can forecast future overhead expenses on the basis of (estimated) future values of MachHrs and ProdRuns. Although you did not state this explicitly in your requirements, I assume that you would also like a measure of the accuracy of the forecasts.
STATISTICAL METHODOLOGY:
Fortunately, there is a natural methodology for solving your problem: regression analysis. Regression analysis was developed specifically to quantify the relationship between a single dependent variable and one or more explanatory variables (assuming that there is a relationship to quantify). In your case, the dependent variable is Overhead, the explanatory variables are MachHrs and ProdRuns, and from a manufacturing perspective, there is every reason to believe that Overhead is related to MachHrs and ProdRuns. The outcome of the regression analysis is a regression equation that can be used to forecast future values of Overhead and provide a measure of the accuracy of these forecasts. There are a lot of calculations involved in regression analysis, but statistical software, such as StatTools, performs these calculations easily, allowing you to focus on the interpretation of the results.
PRELIMINARY ANALYSIS OF THE DATA:
Before diving into the regression analysis itself, it is always a good idea to check graphically for relationships between the variables. The best type of chart for your problem is a scatterplot, which shows the relationship between any pair of variables. The scatterplots in Exhibits A.10 and A.11 illustrate how Overhead varies with MachHrs and with ProdRuns. In both charts, the points follow a reasonably linear pattern from bottom left to upper right. That is, Overhead tends to increase linearly with MachHrs and with ProdRuns, which is probably what you suspected. The correlations below these plots indicate the strength of the linear relationships. These correlations, 0.632 and 0.521, are fairly large. (The maximum possible correlation is 1.0.) They suggest that regression analysis will yield reasonably accurate forecasts of overhead expenses.
<Show charts as relevant>
Before moving to the regression analysis, there are two other charts you should consider. First, you ought to check whether there is a relationship between the two explanatory variables, MachHrs and ProdRuns. If the correlation between these variables is high (negative or positive), then you have a phenomenon called multicollinearity. This is not necessarily bad, but it complicates the interpretation of the regression equation. Fortunately, as Exhibit X1 indicates, there is virtually no relationship between MachHrs and ProdRuns, so multicollinearity is not a problem for you.
You should also check the time series nature of your overhead data. For example, if your overhead expenses are trending upward over time, or if there is a seasonal pattern to your expenses, then MachHrs and ProdRuns, by themselves, would probably not be adequate to forecast future values of Overhead. However, as illustrated in Exhibit X2 a time series graph of Overhead indicates no obvious trends or seasonal patterns.
REGRESSION ANALYSIS:
The plots in Exhibits X1-X3 provide some evidence that regression analysis for Overhead, using MachHrs and ProdRuns as the explanatory variables, will yield useful results. Therefore, I used StatTools’s multiple regression procedure to estimate the regression equation. As you may know, the regression output from practically any software package, including StatTools, can be somewhat overwhelming. For this reason, I report only the most relevant outputs. The estimated regression equation is Forecast Overhead = 3997 + 43.54MachHrs + 883.62ProdRuns Two important summary measures in any regression analysis are R-square and the standard error of estimate. Their values for this analysis are 93.1% and $4109. Now let’s turn to interpretation. The two most important values in the regression equation are the coefficients of MachHrs and ProdRuns. For each extra machine hour your company uses, the regression equation predicts that an extra $43.54 in overhead will be incurred. Similarly, each extra production run is predicted to add $883.62 to overhead. Of course, these values should be considered as approximate only, but they provide a sense of how much extra machine hours and extra production runs add to overhead.
<Charts as required>
(Don’t spend too much time trying to interpret the constant term 3997. Its primary use is to get the forecasts to the correct “level.”) The R-square value indicates that 93.1% of the variation in overhead expenses you observed during the past 36 months can be explained by the values of MachHrs and ProdRuns your company used. Alternatively, only 6.9% of the variation in overhead has not been explained. To explain this remaining variation, you would probably need data on one or more other relevant variables. However, 93.1% is quite good. In statistical terms, you have a good fit. For forecasting purposes, the standard error of estimate is even more important than R-square. It indicates the approximate magnitude of forecast errors you can expect when you base your forecasts on the regression equation. This standard error can be interpreted much like a standard deviation. Specifically, there is about a 68% chance that a forecast will be off by no more than one standard error, and there is about a 95% chance that a forecast will be off by no more than two standard errors.
FORECASTING:
Your forecasting job is now quite straightforward. Suppose, for example, that you expect 1525 machine hours and 45 production runs next month. (These values are in line with your historical data.) Then you simply plug these values into the regression equation to forecast overhead:
Forecast overhead = 3997 + 43.54(1525) + 883.62(45) = $101, 158
Given that the standard error of estimate is $4109, you can be about 68% confident that this forecast will be off by no more than $4109 on either side, and you can be about 95% confident that it will be off by no more than $8218 on either side. Of course, I’m sure you know better than to take any of these values too literally, but I believe this level of forecasting accuracy should be useful to your company
■ Clarify the objective. ■ Develop a clear plan. ■ Give yourself enough time.
Developing a Report:
■ Write a quick irst draft. ■ Edit and proofread. ■ Give your report a professional look.
Be Clear:
■ Provide suficient background information. ■ Tailor statistical explanations to your audience. ■ Place charts and tables in the body of the report.
Be Concise:
■ Let the charts do the talking. ■ Be selective in the computer outputs you include.
Be Precise:
■ List assumptions and potential limitations. ■ Limit the decimal places. ■ Report the results fairly. ■ Get advice from an expert.
Example 1:
I am a statistical consultant, and I have been hired by Company ABC, a semi conductor manufacturing company, to analyze its overhead data. The company has supplied me with historical monthly data from the past three years on overhead expenses, machine hours, and the number of production runs. My task is to develop a method for forecasting overhead expenses in future months, given estimates of the machine hours and number of production runs that are expected in these months. My contact, Sudhir Singh, is in the company’s finance department. He obtained an MBA degree about 10 years ago, and he vaguely remembers some of the statistics he learned at that time. However, he does not profess to be an expert. The more I can write my report in nontechnical terms, the more he will appreciate it.
REPORT BEGINS:
Sudhir, here is the report you requested. (See also the attached the R Markdown HTML file, Overhead Costs Analysis.htm, which contains the details of my analysis. By the way, it was done with the help of R Commander. If you plan to do any further statistical analysis, I would strongly recommend using this tool.) As I explain in this report, regression analysis is the best-suited statistical methodology for your situation. It fits an equation to historical data, uses this equation to forecast future values of overhead, and provides a measure of the accuracy of these forecasts. I believe you will be able to sell this analysis to your colleagues. The theory behind regression analysis is admittedly complex, but the outputs I provide are quite intuitive, even to people without a statistical background.
OBJECTIVES AND DATA:
To ensure that we are on the same page, I will briefly summarize my task. You supplied me with Bendrix monthly data for the past 36 months on three variables: Overhead (total overhead expenses during the month), MachHrs (number of machine hours used during the month), and ProdRuns (number of separate production runs during the month). You suspect that Overhead is directly related to MachHrs and ProdRuns, and you want me to quantify this relationship so that you can forecast future overhead expenses on the basis of (estimated) future values of MachHrs and ProdRuns. Although you did not state this explicitly in your requirements, I assume that you would also like a measure of the accuracy of the forecasts.
STATISTICAL METHODOLOGY:
Fortunately, there is a natural methodology for solving your problem: regression analysis. Regression analysis was developed specifically to quantify the relationship between a single dependent variable and one or more explanatory variables (assuming that there is a relationship to quantify). In your case, the dependent variable is Overhead, the explanatory variables are MachHrs and ProdRuns, and from a manufacturing perspective, there is every reason to believe that Overhead is related to MachHrs and ProdRuns. The outcome of the regression analysis is a regression equation that can be used to forecast future values of Overhead and provide a measure of the accuracy of these forecasts. There are a lot of calculations involved in regression analysis, but statistical software, such as StatTools, performs these calculations easily, allowing you to focus on the interpretation of the results.
PRELIMINARY ANALYSIS OF THE DATA:
Before diving into the regression analysis itself, it is always a good idea to check graphically for relationships between the variables. The best type of chart for your problem is a scatterplot, which shows the relationship between any pair of variables. The scatterplots in Exhibits A.10 and A.11 illustrate how Overhead varies with MachHrs and with ProdRuns. In both charts, the points follow a reasonably linear pattern from bottom left to upper right. That is, Overhead tends to increase linearly with MachHrs and with ProdRuns, which is probably what you suspected. The correlations below these plots indicate the strength of the linear relationships. These correlations, 0.632 and 0.521, are fairly large. (The maximum possible correlation is 1.0.) They suggest that regression analysis will yield reasonably accurate forecasts of overhead expenses.
<Show charts as relevant>
Before moving to the regression analysis, there are two other charts you should consider. First, you ought to check whether there is a relationship between the two explanatory variables, MachHrs and ProdRuns. If the correlation between these variables is high (negative or positive), then you have a phenomenon called multicollinearity. This is not necessarily bad, but it complicates the interpretation of the regression equation. Fortunately, as Exhibit X1 indicates, there is virtually no relationship between MachHrs and ProdRuns, so multicollinearity is not a problem for you.
You should also check the time series nature of your overhead data. For example, if your overhead expenses are trending upward over time, or if there is a seasonal pattern to your expenses, then MachHrs and ProdRuns, by themselves, would probably not be adequate to forecast future values of Overhead. However, as illustrated in Exhibit X2 a time series graph of Overhead indicates no obvious trends or seasonal patterns.
REGRESSION ANALYSIS:
The plots in Exhibits X1-X3 provide some evidence that regression analysis for Overhead, using MachHrs and ProdRuns as the explanatory variables, will yield useful results. Therefore, I used StatTools’s multiple regression procedure to estimate the regression equation. As you may know, the regression output from practically any software package, including StatTools, can be somewhat overwhelming. For this reason, I report only the most relevant outputs. The estimated regression equation is Forecast Overhead = 3997 + 43.54MachHrs + 883.62ProdRuns Two important summary measures in any regression analysis are R-square and the standard error of estimate. Their values for this analysis are 93.1% and $4109. Now let’s turn to interpretation. The two most important values in the regression equation are the coefficients of MachHrs and ProdRuns. For each extra machine hour your company uses, the regression equation predicts that an extra $43.54 in overhead will be incurred. Similarly, each extra production run is predicted to add $883.62 to overhead. Of course, these values should be considered as approximate only, but they provide a sense of how much extra machine hours and extra production runs add to overhead.
<Charts as required>
(Don’t spend too much time trying to interpret the constant term 3997. Its primary use is to get the forecasts to the correct “level.”) The R-square value indicates that 93.1% of the variation in overhead expenses you observed during the past 36 months can be explained by the values of MachHrs and ProdRuns your company used. Alternatively, only 6.9% of the variation in overhead has not been explained. To explain this remaining variation, you would probably need data on one or more other relevant variables. However, 93.1% is quite good. In statistical terms, you have a good fit. For forecasting purposes, the standard error of estimate is even more important than R-square. It indicates the approximate magnitude of forecast errors you can expect when you base your forecasts on the regression equation. This standard error can be interpreted much like a standard deviation. Specifically, there is about a 68% chance that a forecast will be off by no more than one standard error, and there is about a 95% chance that a forecast will be off by no more than two standard errors.
FORECASTING:
Your forecasting job is now quite straightforward. Suppose, for example, that you expect 1525 machine hours and 45 production runs next month. (These values are in line with your historical data.) Then you simply plug these values into the regression equation to forecast overhead:
Forecast overhead = 3997 + 43.54(1525) + 883.62(45) = $101, 158
Given that the standard error of estimate is $4109, you can be about 68% confident that this forecast will be off by no more than $4109 on either side, and you can be about 95% confident that it will be off by no more than $8218 on either side. Of course, I’m sure you know better than to take any of these values too literally, but I believe this level of forecasting accuracy should be useful to your company


