#!/usr/bin/env python # coding: utf-8 # # Spreadsheet basics # Although the Part 7 Notebooks do not contain any Python code, the Notebook format is still useful because - as explained in Part 1 - you can add your own notes to Notebooks. # ## Download a spreadsheet # Visit the Office for National Statistics (ONS) website at: # # [http://www.ons.gov.uk/ons/datasets-and-tables/index.html](http://www.ons.gov.uk/ons/datasets-and-tables/index.html) # In the text box next to the button labelled ‘Search for a keyword(s) or time series ID’ type the word 'Crime' (without the quotes) and click the button. # # You should see a page with three tabs: "All results", "Data" and "Publications". Select the tab "Data". # # You should be provided with a list of datasets and reference tables relating to crime in England and Wales. Many of the datasets are provided as Microsoft Excel spreadsheets. # Find the link entitled 'Crime in England and Wales: Appendix Tables' (you may need to move onto the second or third page of datasets). Alternatively, search for *Crime in England and Wales: Appendix Tables*.Click through, and select the dataset described as "Year ending June 2018". A link will appear to download an `.xlsx` file. # # Depending on your web browser, you may be asked whether you want to open or save the file. Choose to save the file. The chosen file will be downloaded to your computer. # The remainder of this activity uses the Microsoft Office 365 / Office.com Excel application, although if you are familiar with another spreadsheet application, such as Google Sheets or the Microsoft Excel desktop application, you can use that instead. The instructions given below are designed specifically for Microsoft Excel Online but you should be able to interpret them for other spreadsheet systems. # # For more information about accessing Office 365 see https://help.open.ac.uk/microsoft-office-365 . # ### Download a spreadsheet from ONS # The remainder of this activity asks you to interact with the spreadsheet you have just downloaded. # # The activity is divided into a number of sections, each of which points out a salient feature of the spreadsheet. Some sections simply ask you to observe a feature of the spreadsheet; others ask you to amend the spreadsheet. # ### Open the spreadsheet # Go to office.com / Excel online: [https://www.office.com/launch/excel](https://www.office.com/launch/excel) . You should be able to log in with your Open University credentials. # # On the left hand side, just above your own personal Onedrive document listing, click the `Upload and open…` link. # # Select the ONS crime file that you have just downloaded. Once it has been uploaded you should be able to explore the spreadsheet in your browser. # ### Examine the data # Spend a few minutes becoming familiar with the structure of this spreadsheet. Initially you should see a list of the titles of tables contained in the dataset. There is a considerable number of tables and figures. You can select a particular table or figure by clicking on one of the tabs at the bottom of the spreadsheet. # ## Observations: learning your way around a spreadsheet # *This section is intended to familiarise you with the essential features of a spreadsheet. If you are already familiar with spreadsheets then skip to the section labelled 'Exercise 1'.* # Clicking on a tab at the bottom of the spreadsheet takes you to a single **sheet**. A sheet can contain one or more tables, headings, notes and other information. In this dataset, each table has been placed on a separate sheet. # A **table** is a two-dimensional array of data; some data is numeric, some is text. # Click on any element in the table and it will be highlighted with a box drawn around it. Such an element is known as a **cell**. Try clicking on any cells to get a feeling for the structure of this table. # Each row in the table is numbered (the numbers are listed down the left-hand side) and each column is labelled with a letter (listed across the top). This enables each cell of the table to be identified by specifying its column letter and row number. For example, in Table A1 of the crime file, the number of 'Violence with injury' offences in the period April 2014 to Mar 2015 is recorded as 678 and is located at position `W9`. That is, cell `W9` contains the (numeric) **value** 678. The label `W9` is the **address** of the cell. # When you click on a cell, the value contained in the cell is also displayed in the **formula bar** located just beneath the menu bar (and is labelled fx in Google Sheets). # # The formula bar is used for editing the contents of a cell. Click on a cell, click in the formula bar and change its contents. Notice that, as you edit the contents of the formula bar, what you see in the corresponding cell also changes. # # Press *Enter* when you have finished editing the cell. The blue rectangle highlights the next cell in the column. # Table A1 is just a set of data. Each cell contains a value and you can change that value. Some cells, shown in bold type, contain totals. Clearly, someone or something has previously performed the additions to give these totals. # # In this table, the totals in bold type in row 8 represent the total violent crimes found from the sum of row 9 (*Violence with injury*) and row 12 (*Violence without injury*). Also note that the values in row 9 represent the sum of row 10 (*Wounding*) and row 11 (*Assault with minor injury*). # # However, if you were to change the number of woundings in the period April 2015 to Mar 2016 (the value in cell `W10`) because you were told that this number had been input incorrectly, this change would not be reflected in the totals in cells `W9` and `W8`. Try it and confirm that this is indeed the case. # # Clearly, it would be advantageous to arrange that a cell containing a value that depends on other values in the spreadsheet is updated automatically when one of those other values is changed. This is possible using an appropriate formula as you will see in the next section. # ## Performing simple calculations # ### Summing values in a range of cells # A really useful property of spreadsheets is that you can get them to perform calculations, such as summing the values in a column, very easily, as follows. # # a. Make a note of the value in cell `W9` (678). # # b. Click on cell `W9`. # # c. Replace the contents of cell `W9` by editing the contents of the formula bar to become: # # =SUM(W10:W11) # # The equals sign is significant because it informs the spreadsheet that a calculation is to be performed (otherwise the text *SUM(W10:W11)* would be stored in the cell). The text following the equals sign is known as a **formula**. # # The construct `W10:W11` is known as a **range**. A contiguous set of cells, either horizontal or vertical, can be specified by a pair of cell references separated by a colon (:). Thus, `=SUM(W10:W11)` means "apply a formula that sums the values in the cells from `W10` to `W11` inclusive". (Although the "range" here consists of only two cells, the range can obviously be many more than two, provided that the cells exist in a horizontal or vertical contiguous set of cells.) # # Notice how the spreadsheet system highlights the cells appearing in the `SUM` functions when you are editing the corresponding formula by surrounding the associated cells with dashed lines. This is to help you to check that you have chosen the appropriate cells for your calculation. # Press *Enter* to see the result of editing cell `W9`. You should see that the value displayed in the cell is the same as before, but the actual contents of the cell is the function `=SUM(W10:W11)` which you see in the formula bar when you click on the cell. # Now edit the value in cell `W10` (to 330, say, representing the correction of a transposition error when the value was first input). Press *Enter* and you will see that the value in cell `W9` changes to take into account the change in `W10`. # # It may be useful to think of a cell as a box containing a value or a formula and what you see displayed on screen in the cell is a view of that value or the result of carrying out the calculation. # There are many functions (such as `SUM`) built into spreadsheets and you can place arbitrarily complex calculations (a complex formula) into a single cell. This is both a strength and a weakness of spreadsheets. Putting a formula into a cell allows you to specify a possibly complex calculation, the result of which will be accurately recorded. Whenever you change a value of one or more of the cells mentioned in a calculation, the result will automatically change as a consequence. This ease of use is a potential weakness as you will see later. # ### Specifying a range of cells # Most spreadsheet systems offer a visual way of specifying a range of cells. For example, click on cell `W18` (it contains the value 447). # # In the formula bar replace the value with the following (this is a deliberately incomplete formula, the remaining details will be added shortly): # # =SUM( # # Notice how the spreadsheet tries to help you by giving information about the `SUM` function. # # Click and hold on cell `W19`, then drag down to cell `W21` and then release the mouse button. # # A dotted rectangle should have been drawn that surrounds the range of cells `W19:W21`. The range will have been automatically entered into the formula bar. # # Type a closing bracket into the formula bar and the appropriate formula will have been constructed. # # Press *Enter* and the calculation will be performed and the result will be displayed in the selected cell. # ### Activity 1 # Replace the contents of cell W8 with an appropriate formula: recall that the "Violence" values are the sum of the "Violence with injury" and "Violence without injury" values. # #### Our solution # # To reveal our solution, click on the triangle symbol on the left-hand end of this cell. # Two correct formulas could be: # # =W9+W12 # # or # # =SUM(W10:W11) + W12 # # because W9 is itself the sum of W10 and W11. # # This illustrates that you have to be very diligent when reading spreadsheets, as the tables may not be uniform in their contents. This table contains lists and sublists, values and formulas which are not immediately obvious from the nice layout on the screen. # # #### End of Activity 1 # -------------------------------------------------------------- # ## Relative addressing # *If you know that an address such as U9 represents a relative address, skip to the section headed 'Absolute addressing'.* # In Table A1, suppose that you wanted to replace the static values in row 8 (total violence figures) by formulas that will calculate the sum of the individual values of violence with injury and violence without injury. We have already seen that to calculate the value in, say, cell U9, you could replace the value (982) by the formula: # # =SUM(U10:U11) # or # =U10 + U11 # # Rather than type what is essentially the same formula for the remaining cells in row 9, there is a simpler way. # # Click on cell U9; it should have a box around it with a small square at the bottom right-hand corner. When you hover the mouse cursor over the square it turns into a large '+' symbol. Drag the small square to the right, across the cells V9 to Y9, and release the mouse button. The same values as before should still be visible in cells V9 to Y9. # # Now click on any of the cells from V9 to Y9. You should see a formula in the formula bar. The formula will have the same structure as the original in U9 but the locations (cell addresses) will have been amended to fit the appropriate percentage cell. For example, cell X9 will contain the formula: # # =X10 + X11 # # and cell Y9 will contain: # # =Y10 + Y11 # # Copying a formula in this way is clearly a great benefit, saving considerable effort. The spreadsheet system has many such efficiency features. # # At this point, you might also be starting to see how complicated a spreadsheet may become and how hard it might be to find errors in it. In this case, we would ideally want *all* the values in row 9 to be calculated values rather than some cells containing literal values and some cells containing calculated values. The easiest way to do this is to place the formula in the first cell for which you want to calculate values and then drag the formula across the rest of the row. # Now suppose that you now want to alter the spreadsheet to contain an additional row that contains the figure for whether the victim of an assault required hospital treatment. Perhaps you want to place this row between the current rows 10 and 11 ("Wounding" and "Assault with minor injury"). # # Click on the number 10 at the left-hand end of row 10. The whole of row 10 will be highlighted. # # From the Home tab toolbar, open the Insert menu and select `Insert sheet rows...` (Alternatively, right click on the row number and from the pop-up menu select `Insert Rows`). A new row will be added to the spreadsheet. # # If you now click on cell W9, you will find that it contains a revised formula which has been changed to match the fact that the original data is now in different rows: # # =W10 + W12 # # The same change will have been performed on all the cells which you modified in row 8. This is another example where the spreadsheet system has amended formulas to match an editing activity. # # This form of cell referencing is called **relative addressing**. The system does not store the actual address, but instead stores the number of columns and the number of rows relative to the cell containing the formula. # # ### Absolute addressing # *If you know that an address such as \$B\$7 represents an absolute address, skip to the section headed 'Formatting: the presentation of data'.* # # Before starting this section, return Table A1 to its original state by removing the row you inserted in the previous section (either press the 'undo' button or select the new row 11 and use 'Delete row 11' from the Edit menu). # There is another form of cell referencing known as **absolute addressing**. There are times when you don’t want the system to automatically change the references when you copy formulas. Try the following: # # First, add two empty rows to Table A1, under row 13. To do this: click on the number 13 on the far left of the spreadsheet to highlight row 13, then select `Insert Rows` from the pop-up menu. Do this a second time. # # Next, write the text "% Wounding" in cell `A14`, and "% minor injury" in cell `A15`. Your spreadsheet should now look something like this (depending on whether you are using Excel online, and your choice of browser, operating system, etc.): # # # # # # # ![Excel online - crime spreadsheet with added cells](./appendixtablesyearendingjune2018_xlsx.png) # We will now use row 14 to contain the percentage of violent crimes which resulted in an injury. # # In cell `B14`, enter the formula: # # = 100 * B10/B9 # # This should result in the value 45 (the display of the calculated percentage has been rounded) appearing in the cell. Now, as before, copy this value across the row, by selecting `B14`, clicking on the small square, and dragging across to cover cells `B14` to `AB14`. You should find that each cell now contains a value representing the percentage of violent crimes which involved wounding (generally between about 35% and 55%). # Next, we will fill in row 15, to contain the percentages for minor injuries. To fill this row in the same way as for the woundings, we might try to copy row 14: because the spreadsheet recognises relative values, copying row 14 into row 15 should use the context to obtain the same relationship as between rows 10 and 11. # # Select the cells `B14` to `AB14`, and use the square to drag the cells onto `B15` to `AB15`. Your spreadsheet should now look like this: # ![Excel online spreadsheet with added rows](./appendixtablesyearendingjune2018b_xlsx.png) # Clearly something has gone wrong: the values in row 15 are much higher than we would expect. If we look at the contents of the cell `B15`, it contains the formula: # # = 100 * B11/B10 # # The spreadsheet has correctly chosen `B11` rather than `B10` as the number of minor injuries, but has used `B10` rather than `B9` as the total number of injuries. # # To remedy this, edit the formula in cell `B14` to become: # # = 100 * B10/B$9 # # This states that the the relative address `B9` has been replaced by the absolute address `B$9` (the row number is now preceded by a dollar sign to fix, or *anchor* the row number; in a similar way, placing a dollar character before the column label would indicate that the column should also be treated as a fixed, absolute column value). # # Now repeat the above steps, by copying this formula to the appropriate cells in the range `B14` to `AB14`, and then dragging the cells `B14` to `AB14` onto `B15` to `AB15` so that the rectangle from `B14` to `AB15` is filled. The correct percentages should appear. # # ### Formatting: the presentation of data # *If you know how to change the format of a numeric value, skip to the next section headed 'Built-in statistical functions'. However, the examples in this section use the values generated in the previous section (Absolute Addressing), so if you are working through this section, do ensure that you have the additional two rows defined in that section.* # If you click on the cell `AE8`, and look at the cell's value, you should see that the actual cell value is displayed as -66.7630950366119 in the formula bar. However, in the spreadsheet itself, the value of the cell appears to be -67. This is an illustration of another feature of spreadsheets: the ability to set the display format of the cells. # Click on cell `B14`, which should contain the value 45 (this section assumes that you are using the spreadsheet as modified in the previous section). # # On the Home tab tool bar there is a button labeled with '.0' with an arrow pointing to the left. Click on this tool two or three times and observe the effect on cell `B14`. # # Perform a similar experiment with the tool labeled with '.00' with an arrow pointing to the right. # # This is a way of indicating how the actual value (as shown in the formula bar) should be displayed in the cell. # There are a variety of standard formats. Select the Home tab toolbar item ‘Number’. There you will see that it is possible to apply a variety of formats (to a single cell or a range of cells) including percentage, scientific, financial, currency, and date. # ### Built-in statistical functions # *If you know how to use built-in functions, skip to the next section headed 'Visualising data in a spreadsheet'.* # All spreadsheets come with a variety of built-in statistical functions. It is very easy, for example, to calculate the mean and standard deviation of a set of values in a range of cells. # #### Move to Table A10. # # In cell `A15` enter the following formula: # # =AVERAGE(B14:K14) # # The cell should now evaluate to the value 15, the average value of the data in the range `B14` to `K14`. # # In cell `A15` enter the formula: # # =STDEVA(B14:K14) # # The cell should now evaluate to the value 4, the rounded standard deviation of the data in the range `B14` to `K14`. Click on the numeric formatting button with the '.0' and the arrow pointing to the left to increasing the decimal precision of the display standard deviation. # # ### Visualising data in a spreadsheet: creating simple graphs # The final exercise that we want you to carry out illustrates how you can create visualisations of data held in a spreadsheet. # # In this part of the activity you will create a simple graph of some of the data in Table A10 that illustrates the trend data. # # To create a graph, the data must first be recorded in columns (rather than the rows which are currently used). # # Many spreadsheet applications, such as Google Spreadsheets or the Microsoft Excel desktop application support a facility that will help get the data in the correct form ‘Paste transpose’ option, sometimes as a refinement of a ‘Paste special’ option. Unfortunately, Excel on line *does not* have such a utility. # # Start by selecting all the headings and data in cells `A8` to `L12`. To select multiple cells in a row, click in cell `A8` and then hold the `SHIFT` key and click on cell `L12`. (If you click in cell `B8` to select it and then drag along to cell `L8`, you will copy the value of `A8` into each cell.) Enter `Ctrl+C` to take a copy of the data (`CMD+C` on a Mac). # # #### Creating the Chart - Google Spreadsheets / Excel desktop application # # Click in cell `AD7`. Select the ‘Paste transpose’ option. The data that you selected will now have been copied but the values will have been arranged vertically rather than horizontally. # # # #### Creating the Chart # # In Google Spreadsheets / Excel desktop application, select the whole of the copied data (if you have not clicked on the spreadsheet since pasting the data, the copied data will already be selected). # # In Office 365 / Excel online, select the original data in cells `A8` to `L12`. # # From the Insert menu, select the *Line chart* from the available chart types and then select the simplest 2D line chart. # # In Google Sheets / Excel desktop, you should be presented with a view of the chart showing the selected data with appropriate axes. # # In Office 365 / Excel online, you will need to click on the chart to select it and then from the Chart tab select the `Switch Row / Column` to transpose the rows and columns of the data used to generate the chart. # # In some spreadsheet applications / charts, you can hover the cursor along the line to reveal the data points and their values. # Microsoft Excel online and Google Sheets both automatically save your spreadsheet (to your Microsoft OneDrive and to MyDrive in Google Drive respectively). There is no need to explicitly close the spreadsheet using online spreadsheets. In other spreadsheet systems you may have to go to the File menu to save the spreadsheet file with a suitable name, and then explicitly close the spreadsheet. # ## Concluding remarks # Finally, we would like to draw your attention to some general points about the data provided by ONS. # # As you will have observed, the spreadsheet provided by ONS contains values alone: no formulas are provided. To discover how summary data has been calculated you have to examine the notes provided by ONS. # # Each table has meaningful labels for its columns and rows. There are a variety of notes to accompany each table and figure to help you understand the data presented. Even then, with this information it is not always easy to understand precisely what the data represents: in Table 1, what is the difference, if any, between robbery and theft, for example? # # Fortunately, ONS provides a wealth of additional information on its website. If you move to the first sheet, labelled *Notes*, there is a link to a set of explanatory notes for the statistics in this spreadsheet. There you will find that a robbery is an incident or offence in which force or the threat of force is used either during or immediately prior to a theft or attempted theft. # # Clearly, you need to have some understanding of the categorisation of offences in order to fully appreciate the data you are viewing. # ## Summary # In this Notebook you have seen: # # 1. How to open an Excel spreadsheet file using Office 365 / Excel online. # 2. That a spreadsheet consists of one or more sheets, each sheet typically holds one two-dimensional table. # 3. That a table consists of cells each one of which is identified, or addressed, by the letter of its column and the number of its row. # 4. That a cell can contain a value (either numeric or string) or a formula (a calculation). # 5. When clicked on, the value in a cell is displayed in the formula bar ready for editing. # 6. That the value shown in a cell is either a view of the value shown using a specific format, or the result of carrying out the formula stored in the cell. # 7. How to specify a range of cells by giving the addresses of the first and last cells in the range. # 8. That an address given in terms of a specific column letter and row number is a relative address. Relative addresses are adjusted automatically as necessary whenever rows or columns are inserted or deleted. # 9. That references to a specific cell or range of cells that must not be automatically adjusted use absolute addresses in which either a column letter and/or a row number is preceded by a dollar sign, e.g. `$B$2`. # 10. That data can be formatted using one of a number of in-built formats. # 11. That spreadsheet systems provide in-built functions for performing common calculations. # 12. That it is possible to generate simple graphs using the Chart function, provided the data has been suitably arranged in a table to fit the requirements of the function. # # ## What next? # If you are working through this Notebook as part of an inline exercise, return to the module materials now. # # If you are working through this set of Notebooks as a whole, move on to `07.2 Problems in spreadsheet construction`.