Friday, November 29, 2019
Maths Statistic Coursework Essay Example
I have been given the task of finding what affects the price of a used car, using a spreadsheet of data on a hundred cars. The data given on each car were (see Spreadsheet 1):

Make, Model, Price When New, Used Price, Age, Colour, Engine Size, Fuel Type, MPG, Mileage, Service, Owners, Length of MOT, Tax (Months left), Insurance Group, Doors (Amount), Style, Central Locking, Seats, Gearbox, Air Conditioning, Airbags

Immediately from looking at those categories I omitted colour, fuel type, service, doors, style, central locking, seats, gearbox, air conditioning and airbags. I omitted this data because it either has a low range or contains words. It would be hard to show on graphs and would give me little evidence of what affects a used car price.

E.g.
Colour: Cannot produce a scatter graph as it uses words.
Seats: Has a range of 2-5, so it would produce poor scatter graphs and it would be hard to find a direct relationship.

Then from the remaining categories I picked age, insurance group, MPG, mileage and of course used price, as this is what I was investigating. It then dawned on me that I could use the depreciation, the amount left when I took the used price away from the price when new. This could perhaps be a more accurate look at the data, as some cars depreciate quicker than others. Looking further into it I decided against this, as it would take longer and time was of the essence, but it is an extension that could be added on at the end.

Reasons Why
* Age: Has a large range and it would be interesting to see what sort of relationship there is.
* Insurance Group: Again a wide range.
* MPG: Grouped data could be used on a cumulative frequency graph and it has quite a large range.
* Mileage: Huge range and a definite influence on used price, but it would be interesting to see exactly how much.

Sample
I was given 100 cars but to investigate all of them would be very time consuming, so I would have to bring that number down.
In the end I chose a 40 car sample as it is a round number, lower than 100 but still big enough to give a fair representation of the data supplied.

Sampling Method
Now I've decided how big my sample needs to be, I have to decide how I will sample. There are two main methods, random or stratified. Eventually I want to try both but for now I will use a random sample. To do this I will use the random number function on my calculator.

I press the random number button and a 3 decimal place number is displayed. I then took the first 2 digits and used these as the car number. If a number was repeated I ignored it and chose again.

E.g.
Random produced number 0.311 so I chose car number 31
Random produced number 0.981 so I chose car number 91

Using this sampling method I chose my first group of cars. They ended up being numbers:

1 2 4 5 7 8 15 16 17 18 21 22 24 26 27 31 32 35 37 38 44 51 53 63 65 67 68 70 71 73 76 77 83 86 91 95 96 97 98 98

From these car numbers I made a table with all the data I needed on the cars above, such as used price, MPG and mileage. (See Spreadsheet 2)

From this data I compiled four scatter graphs:
* Age against used price
* MPG against used price
* Mileage against used price
* Insurance group against used price

I used scatter graphs as they display relationships between the data, which is why used price is in every one.
A scatter graph also lets me draw a line of best fit, giving me the ability to predict future data.

Predictions
* For age I believe there will be a very strong negative correlation, as the older the car gets the lower the price.
* For MPG I believe there will be a weak positive correlation, as the higher the MPG the higher the price, but I believe it doesn't affect it that much.
* For mileage I believe there will be a very strong negative correlation, as when the mileage increases the price will decrease.
* For insurance group I believe there will be a weak negative correlation, as the higher the insurance group the lower the price, but not by much.

As you can see from my predictions I believe that mileage will affect used price the most, while insurance group will affect it the least of the ones I chose.

See scatter graphs 1, 2, 3 and 4.

Conclusions of Random Sampling
As you can see some of my predictions were right while others weren't.
* Age was a big influence on price and had quite a strong negative correlation, as I predicted.
* MPG also had a very strong negative correlation, showing it did affect price a lot, which I predicted wrongly.
* Mileage had quite a strong negative correlation, but not a very strong one as I had said. It shows mileage affects price but only to a degree. By the shape of the graph it appears a curved line of best fit would suit it better, but I shall leave it at that.
* Insurance group did have a positive correlation and quite a strong one at that, showing that as the insurance group went up so did used price.

Observations
As you can see on all of the graphs there are pieces of data that are way off the lines of best fit and away from the rest of the data. I purposely kept this data in as it gives me a valid reason to try another sampling method. This data can be called anomalies as they differ from the rest of the data.
I could cut this data out to make the sample fairer but then it wouldn't be a true random sample.

With these observations made I can say a few things about what affects used car prices, but now I shall move on to a stratified sample and see if the data is more reliable.

Stratified
A stratified sample is one where all the data has been put into an order and split into groups, and then the same proportion is picked out from each group. For my stratified sample I ordered the cars by mileage, grouped the mileages and picked 40% from each group. This ensures I get 40 cars again so I can fairly compare the random and stratified samples.

The mileage groups were:
0-5000
5000-10,000
10,000-20,000
20,000-40,000
40,000-70,000
70,000-110,000

With these sorted I took 40% at random from each group. I ensured it was random by drawing numbers out of a hat corresponding to the numbers of the cars. I then noted each number and placed it back in, so that each time the chance of drawing any single card was equal and didn't change. If I drew the same one twice I simply ignored it, placed it back in and redrew. (See Spreadsheet 3)

If actually counted there are 41 cars.
As 40 and 41 are very close, rather than tamper with any results, which could make them biased, I simply left them.

From this data I then compiled scatter graphs just as before.

Predictions
* Age: I believe that there will be a strong negative correlation as there was before, but as this is supposedly a more reliable sample it should be more evident.
* MPG: I believe there will be a strong negative correlation as there was before, but it should be more evident due to the sample being more reliable.
* Mileage: should have a strong negative correlation, for the reasons above.
* Insurance group: should have a strong positive correlation, for the reasons above.

See graphs 5, 6, 7 and 8.

Conclusions on Stratified Sampling
As you can see some very strange results came up.
* Age showed the very strong negative correlation I said there would be.
* MPG showed a strong negative correlation as well, as I said.
* Mileage proved very weird. The data was basically in two groups, one showing high mileage and low price while the other showed low mileage and low price. From this I can deduce that mileage is a limiting factor of used price.
* Insurance group showed no correlation, with data all over the place, showing that perhaps my random sample was a mishap and in fact insurance group has little or no relationship with used price.

Observations
Correlations were generally a lot tighter, showing that stratified sampling reduces anomalous data but can produce strange results, such as for mileage. This result however may not be wrong. It may in fact be right and the random results wrong. To find out I shall become more specific and look at another way of representing the data.

Histograms
After some thought, I decided a great way of comparing two sets of data in a visual manner would be a histogram.

To make a histogram I would have to group the mileages. This was easy as I shall take the groups I used for stratifying the data.

The mileage groups were:
0-5000
5000-10,000
10,000-20,000
20,000-40,000
40,000-70,000
70,000-110,000

I then made a tally chart with the groups for both the random and stratified data.

Random
| Mileage Group | Frequency |
|---|---|
| 0-5000 | 1 |
| 5000-10,000 | 1 |
| 10,000-20,000 | 5 |
| 20,000-40,000 | 14 |
| 40,000-70,000 | 19 |
| 70,000-110,000 | 2 |

Stratified
| Mileage Group | Frequency |
|---|---|
| 0-5000 | 1 |
| 5000-10,000 | 2 |
| 10,000-20,000 | 4 |
| 20,000-40,000 | 11 |
| 40,000-70,000 | 18 |
| 70,000-110,000 | 5 |

Then to construct a histogram I would have to work out the frequency density to go on the vertical axis. This is worked out by:

Frequency Density = Frequency / Group Width

So I ended up with this.

Random
| Mileage Group | Frequency | Frequency Density |
|---|---|---|
| 0-5000 | 1 | 1/5000 = 0.0002 |
| 5000-10,000 | 1 | 1/5000 = 0.0002 |
| 10,000-20,000 | 5 | 5/10,000 = 0.0005 |
| 20,000-40,000 | 14 | 14/20,000 = 0.0007 |
| 40,000-70,000 | 19 | 19/30,000 = 0.00063 |
| 70,000-110,000 | 2 | 2/40,000 = 0.00005 |

Stratified
| Mileage Group | Frequency | Frequency Density |
|---|---|---|
| 0-5000 | 1 | 1/5000 = 0.0002 |
| 5000-10,000 | 2 | 2/5000 = 0.0004 |
| 10,000-20,000 | 4 | 4/10,000 = 0.0004 |
| 20,000-40,000 | 11 | 11/20,000 = 0.00055 |
| 40,000-70,000 | 18 | 18/30,000 = 0.0006 |
| 70,000-110,000 | 5 | 5/40,000 = 0.000125 |

Predictions
* I predict that the random histogram will have a much more erratic distribution of car mileage, while the stratified distribution will be more of a bell shape, displaying the majority in the mid range with few or no extreme values.

I then proceeded to draw the graphs.

See Graphs 9, 10 and 11

Results
* As seen on the two histograms there are some slight differences. The spread of the random sample is a little more erratic and uneven than the more bell shaped graph that the stratified data shows.
From this you could deduce that the stratified sample is a more reliable source of data than the random sample.
* From the individual graphs you can see that the majority of the cars are in the 20,000 to 60,000 miles range in both the random and stratified samples. Standard deviation could perhaps tell me which sample is more accurate, so that could be an extension to the work done.
* I mentioned a bell shaped graph before. By this I mean one which slowly goes up to a peak and then reduces down, with the majority of the data displayed in the middle and little or no data displayed in the highest and lowest areas.

However, from the histograms I did not find any reasoning behind the weird shape and correlation of the stratified scatter graph. Further investigation into this could prove interesting.

Overall Conclusion
From all the work carried out above you can clearly see that many different things affect used car prices, some more than others. You could say that the different categories are limiting factors and a combination of these results in the depreciation of a car's price.

As a further investigation I would look into the strange scatter graph produced by my stratified mileage sample. Perhaps using standard deviation or other data representation methods I could find out why it is so peculiar. I could also look at how one category affects another, such as engine size and mileage or engine size and MPG, and find a relationship between those. There are many more aspects that I could have considered, but from the work I've done there are things that are certainly clear.
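
As a check on the histogram working, the frequency density calculation (frequency divided by group width) could be sketched in Python as below. This is only an illustration and not part of the original coursework. The group boundaries and frequencies are the ones from the tally charts, while the names GROUPS, RANDOM_FREQ, STRATIFIED_FREQ and frequency_density are my own and assumed for the example.

```python
# Mileage groups as (lower, upper) bounds, the same groups used for
# stratifying the data and for the histograms.
GROUPS = [(0, 5000), (5000, 10000), (10000, 20000),
          (20000, 40000), (40000, 70000), (70000, 110000)]

# Frequencies copied from the random and stratified tally charts.
RANDOM_FREQ = [1, 1, 5, 14, 19, 2]
STRATIFIED_FREQ = [1, 2, 4, 11, 18, 5]


def frequency_density(groups, frequencies):
    """Return frequency / group width for each mileage group."""
    return [freq / (upper - lower)
            for (lower, upper), freq in zip(groups, frequencies)]


if __name__ == "__main__":
    samples = [("Random", RANDOM_FREQ), ("Stratified", STRATIFIED_FREQ)]
    for name, freqs in samples:
        print(name)
        densities = frequency_density(GROUPS, freqs)
        for (lower, upper), density in zip(GROUPS, densities):
            print(f"  {lower}-{upper}: {density:.6f}")
```

Each value printed is the height of one bar on the histogram, so the area of a bar (height times group width) gives back the frequency for that group.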