The next hurdle in my quest for data science knowledge was building a classification model. To push myself into unfamiliar territory, I chose a dataset that would require multi-class classification. My objective was to build a model that predicts the primary cause of an accident from factors such as weather, speed limit, and lighting conditions.
I began by consolidating the primary contributory cause of each accident. The raw data recorded 39 different causes, but each one ultimately fell into one of four broader groups.
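A grouping like this can be sketched as a simple lookup table. The snippet below is only an illustration, not the exact code I ran: the raw cause labels are hypothetical stand-ins for the 39 recorded reasons, `group_cause` and the `"Unmapped"` fallback are names made up for this example, and only the three group names that appear in the totals further down are used.

```python
# Hypothetical raw labels standing in for the 39 recorded causes.
# The group names are the ones that appear in this post's totals.
CAUSE_TO_GROUP = {
    "FOLLOWING TOO CLOSELY": "Driver Negligence",
    "FAILING TO YIELD RIGHT-OF-WAY": "Driver Negligence",
    "WEATHER": "External Factors",
    "ROAD CONSTRUCTION/MAINTENANCE": "External Factors",
    "UNDER THE INFLUENCE OF ALCOHOL/DRUGS": "Intoxication",
}


def group_cause(raw_cause: str) -> str:
    """Return the broad group for a raw cause label.

    Labels are normalized (trimmed, upper-cased) before the lookup.
    Anything missing from the table comes back as "Unmapped" so that
    gaps in the mapping are visible instead of silently dropped.
    """
    return CAUSE_TO_GROUP.get(raw_cause.strip().upper(), "Unmapped")


# Example: three raw records, one of which has no mapping yet.
records = ["Weather", "FOLLOWING TOO CLOSELY", "ANIMAL"]
grouped = [group_cause(cause) for cause in records]
print(grouped)
```

If the data lives in a pandas DataFrame, the same dictionary can be passed to `Series.map` on the cause column. Keeping an explicit fallback label makes it easy to spot any of the 39 causes that were missed.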
Next, I built a map using KeplerGL to get a glimpse into where these accidents occurred.
Driver negligence accounted for by far the most accidents. The breakdown of totals by primary contributory cause was as follows:
- Driver Negligence — 221,684
- External Factors — 17,185
- Intoxication — 7,672