I’ve been digging further into the fire data from my last post (here and here), trying to see if there is a pattern that might explain the large number of Riverside County fires in 2007.
Here are some descriptive numbers. Of the 955 fires in Riverside County during 2007:
932 (97.6%) were unnamed. The vast majority of fires in the data set have names, but most of these Riverside fires do not. This is unusual: only four Riverside fires in other years were unnamed. However, 2007 also had the largest number of unnamed fires across the state (2529/3211, 78.8%).
933 (97.7%) burned 1 acre or less. The average percentage of such fires for Riverside across all other years was 67.5%. Statewide, 2007 had a relatively high percentage of fires that burned 1 acre or less (88.6%, 84.8% without Riverside), but not the highest (1976: 89.6%). Note that most (92.5%), but not all, of these fires were unnamed.
A total of 2,721 acres was burned. For all other years, the average total acreage burned in Riverside was 14,902 acres. Across the state, a total of 648,614 acres were burned in 2007. This was the largest annual total in the data set.
By comparison, San Diego County had the largest total area burned in 2007 with 453,447 acres burned by 172 fires. Santa Clara County had the largest average fire size, with 10 fires averaging 9,552 acres each.
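As a quick sanity check, the percentages quoted above can be recomputed from the raw counts. This is a minimal Python sketch for illustration only (the analysis itself was done in R); the helper name `pct` and the variable names are my own, and the only inputs are counts already stated in this post.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Counts quoted in the post for 2007
riverside_fires = 955   # fires in Riverside County
state_fires = 3211      # fires statewide

unnamed_riverside = pct(932, riverside_fires)       # unnamed Riverside fires
small_riverside = pct(933, riverside_fires)         # Riverside fires of 1 acre or less
unnamed_state = pct(2529, state_fires)              # unnamed fires statewide
riverside_share = pct(riverside_fires, state_fires) # Riverside's share of all fires

# Mean fire size in San Diego County: total acres divided by fire count
san_diego_mean_acres = 453447 / 172

print(unnamed_riverside, small_riverside, unnamed_state, riverside_share)
print(round(san_diego_mean_acres, 1))
```

The last figure puts San Diego's average at roughly 2,600 acres per fire, which is well below the Santa Clara average quoted above even though San Diego's total is far larger.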
This chart shows how much of an outlier the number of fires in Riverside is. The next highest fire count was for San Diego County in 1998 (424 fires).
I’ve been playing some more with the fire data from my last post, trying to identify some aspect of the data that might help explain the pattern. My first step was to localize the fires to specific counties. The original data set does not record the county, but it does contain a latitude and longitude for each fire. I was able to map these coordinates to California counties using the over function in R (from the sp package).
A choropleth map for 2007 shows that Riverside County had a strikingly large number of fires, almost a third of the fires that took place in California that year.
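To give a feel for what mapping a coordinate to a county involves, here is a minimal point-in-polygon sketch in Python. It is not the sp code I used: it is a plain ray-casting test, and the two "counties" are hypothetical rectangles rather than real boundaries. The function names `point_in_polygon` and `county_for` are my own.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) is inside polygon.

    polygon is a list of (lon, lat) vertices in order around the boundary.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def county_for(lon, lat, counties):
    """Return the name of the first polygon containing the point, else None."""
    for name, polygon in counties.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical rectangles for illustration, NOT real county boundaries
toy_counties = {
    "County A": [(-117.0, 33.0), (-116.0, 33.0), (-116.0, 34.0), (-117.0, 34.0)],
    "County B": [(-116.0, 33.0), (-115.0, 33.0), (-115.0, 34.0), (-116.0, 34.0)],
}

print(county_for(-116.5, 33.5, toy_counties))
print(county_for(-120.0, 40.0, toy_counties))
```

A real overlay also has to deal with map projections, polygons with holes, and points that fall exactly on a boundary, which is why a spatial library is the better tool for the actual data.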
Inspired by a post on Nathan Yau’s excellent data visualization blog Flowing Data showing a calendar view of fatal accidents, I thought I’d try creating a calendar map with some wildfire data I have been playing with (available from NIFC). This data set includes fire data going back to 1972.
Starting with R calendar map code by Paul Bleicher, I created plots of fires in California for various years.
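The core of a calendar map is simple: count events per day, then place each day in a grid with one column per weekday and one row per week. The Python sketch below illustrates just that layout step. It is not Paul Bleicher's R code, the function name `calendar_cell` is my own, and the sample dates are made up.

```python
from collections import Counter
from datetime import date


def calendar_cell(d):
    """Return (week_row, weekday_col) for a date within its own year.

    weekday_col is 0 for Monday through 6 for Sunday; week_row counts
    from 0 starting with the week that contains January 1.
    """
    first_weekday = date(d.year, 1, 1).weekday()
    day_index = d.timetuple().tm_yday - 1  # 0-based day of year
    return ((day_index + first_weekday) // 7, d.weekday())


# Made-up fire start dates, for illustration only
fire_dates = [date(2007, 1, 1), date(2007, 1, 8), date(2007, 1, 8), date(2007, 10, 21)]

# Fires per day, keyed by grid cell
grid = Counter(calendar_cell(d) for d in fire_dates)
for cell, count in sorted(grid.items()):
    print(cell, count)
```

Each cell's count would then be mapped to a color to produce the heat-map look of the calendar plots.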
I’ve been working on some research with Robb Dunbar of the University of Minnesota – Rochester that looks at using group or ‘pyramid’ exams as a cooperative learning technique. In this model, students take each exam individually. They then re-take the same exam cooperatively within a small group. Their overall grade is a weighted combination of the two grades.
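To make the grading scheme concrete, here is a minimal Python sketch of a weighted combination. The 75/25 split is purely an assumption for illustration; the actual weights used in the course are not given here, and the function name `combined_grade` is my own.

```python
def combined_grade(individual, group, individual_weight=0.75):
    """Weighted combination of an individual score and a group score.

    individual_weight is a hypothetical default, not the course's actual weight.
    """
    if not 0.0 <= individual_weight <= 1.0:
        raise ValueError("individual_weight must be between 0 and 1")
    return individual_weight * individual + (1.0 - individual_weight) * group


# A student who scores 80 alone and 90 with the group
print(combined_grade(80, 90))
```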
Analyzing this kind of data presents an interesting challenge. The group’s score is not completely dependent on the individual scores. The group members interact to determine their group answers in ways that can have significant effects on the group score. In playing around with the data, I ended up creating a chart that highlights patterns in the data that were not visible with standard techniques.
The first thing to notice is that most students improved on the group test. The other thing is that it looks like there are three types of groups:
Type 1: The group score is (more or less) equal to that of the highest individual. This group may be characterized as having a strong leader – the highest scoring student asserts that his/her answers are correct and the others follow.
Type 2: The group score is less than the highest individual score, but greater than all other individual scores in the group. This group may be characterized as having a negotiating leader – the highest scoring student asserts that his/her answers are correct, but others may disagree. The negotiated answer may not be correct.
Type 3: The group score is higher than all individual scores in the group. This group may be characterized as one where everyone works together to figure out which answer is correct. The highest individual score in this type of group tends to be lower than that of the other groups. Maybe their strategy is the result of the recognition that no one student has all of the correct answers.
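The three types can be written down as a simple rule. The Python sketch below is one possible reading of the descriptions above, not the procedure used in the research: the `tolerance` that decides what counts as "more or less equal" is an assumption, as is the "unclassified" label for groups that fit none of the three descriptions.

```python
def classify_group(individual_scores, group_score, tolerance=1.0):
    """Assign a group to Type 1, 2, or 3 from its scores.

    tolerance (in score points) is a hypothetical threshold for treating
    the group score as equal to the highest individual score.
    """
    if len(individual_scores) < 2:
        raise ValueError("need at least two individual scores")
    ranked = sorted(individual_scores, reverse=True)
    highest, second = ranked[0], ranked[1]

    if abs(group_score - highest) <= tolerance:
        return "Type 1"  # group roughly matches the top individual
    if group_score > highest:
        return "Type 3"  # group beats every individual
    if group_score > second:
        return "Type 2"  # below the top individual, above everyone else
    return "unclassified"  # group scored at or below the second-best individual


print(classify_group([70, 80, 90], 90))
print(classify_group([70, 80, 90], 85))
print(classify_group([70, 75, 80], 88))
```

Changing the tolerance shifts groups between Type 1 and its neighbors, so any counts of each type would need to be checked against a few different values.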
Further research is needed to figure out whether this pattern holds up and whether the descriptions make sense. However, it is an interesting example of how visualization can reveal patterns that would otherwise go unnoticed.