Sunday, December 11, 2016

NYC vehicular accidents: WHEN?


When do most vehicular accidents in NYC happen?


My data source is the NYPD vehicular collision dataset, covering July 1, 2012 to December 6, 2016.



We would expect most accidents to occur during hours when more people are driving or commuting.  Since most people neither work nor go to school on weekends, we also want to look at weekdays and weekends separately.  Which hours have the most and fewest accidents?  Are there any differences between weekends and weekdays?  Are the findings consistent across years (2012-2016)?

Below are colored bar plots of the yearly percentage of vehicular accidents (y-axis) by hour of the day (x-axis).  Hour 0 runs from 12:00 am to 12:59 am, and hour 23 runs from 11:00 pm to 11:59 pm.  The bar colors range from dark green (least) to dark red (most) to accentuate the variation across the hours of the day.  The top graph shows weekdays, while the bottom shows weekends.

From the results in the bar plots, below are the main findings that are consistent year over year.


  • 2 pm as well as 4-5 pm are when most vehicular accidents occur year over year.
  • There is an 8-9 am spike in accidents on weekdays that is not present on weekends.  
  • The hour with the lowest percentage of accidents shifts from 3 am on weekdays to 7 am on weekends.  
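
For readers who want to reproduce this kind of breakdown, here is a minimal sketch of the calculation: the percentage of accidents falling in each hour, computed separately per year and per weekday/weekend group.  The function name and the sample timestamps are made up for illustration; this is not the code behind the plots, and the real dataset would first need its date and time columns parsed into datetime values.

```python
from collections import Counter, defaultdict
from datetime import datetime


def hourly_percent(timestamps):
    """Return {(year, 'weekday' or 'weekend'): {hour: percent of accidents}}."""
    counts = defaultdict(Counter)
    for ts in timestamps:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday,
        # so 5 and 6 are Saturday and Sunday.
        day_type = "weekend" if ts.weekday() >= 5 else "weekday"
        counts[(ts.year, day_type)][ts.hour] += 1

    result = {}
    for key, by_hour in counts.items():
        total = sum(by_hour.values())
        # Hour 0 covers 12:00 am to 12:59 am, hour 23 covers 11:00 pm to 11:59 pm.
        result[key] = {h: 100.0 * by_hour.get(h, 0) / total for h in range(24)}
    return result


# Made-up collision times, for illustration only.
sample = [
    datetime(2016, 12, 5, 8, 15),    # Monday, hour 8
    datetime(2016, 12, 5, 16, 40),   # Monday, hour 16
    datetime(2016, 12, 6, 16, 5),    # Tuesday, hour 16
    datetime(2016, 12, 6, 17, 30),   # Tuesday, hour 17
    datetime(2016, 12, 10, 14, 0),   # Saturday, hour 14
    datetime(2016, 12, 11, 14, 45),  # Sunday, hour 14
]
percents = hourly_percent(sample)
```

Because each group is normalized by its own total, the 24 hourly percentages within a year and day type always sum to 100, which is what makes weekdays and weekends comparable despite their different volumes.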





Which day of the week has the most vehicular accidents?  Is it Friday, when people are gearing up for a night out?  Or is it Monday, when people are groggy from sleeping in on the weekend and rushing to work?  Which day of the week has the fewest vehicular accidents?  To answer these questions, see the horizontal bar plots below illustrating the percentage of accidents by day of week across years.





From the results in the bar plots, these are the main findings consistent year over year:


  • Friday had the most vehicular accidents 
  • Sunday had the fewest vehicular accidents 


Note that the findings don't necessarily mean that you are less likely to get in a vehicle collision on Sunday versus Friday, only that the absolute volume of vehicle collisions is lower.

Saturday, November 26, 2016

NYC vehicular accidents: WHERE?

WHERE -- Where do most vehicular accidents in NYC happen?

Happy Thanksgiving!  After a long hiatus, I will present some results addressing the above question about New York City vehicular collisions (source: NYPD vehicular collision dataset July 1, 2012 - November 21, 2016).  

--------------------------------------------------------------------------------------------------------------------

WHERE -- Where do most vehicular accidents in NYC happen?



  • in the last week of the dataset (November 14 2016 - November 21 2016)

  • in the last month of the dataset (October 21 2016 - November 21 2016)

  • in November (Novembers 2012-2016)


  • in the entire dataset (July 1, 2012 - November 21, 2016) 








The zip codes that are common among all these views are:
  • Manhattan - Midtown & Hell's Kitchen (10019)
  • Manhattan - Turtle Bay & Midtown (10022)
  • Long Island City (11101)
  • Downtown Brooklyn (11201)
  • Brooklyn - East New York (11207)


Thus, it wouldn't hurt to be extra careful driving, walking, or cycling through the above 5 areas.
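
Finding the zip codes shared by all four views amounts to a set intersection.  The sketch below assumes each view has already been reduced to a list of its top zip codes; the function name, the dictionary keys, and the "0000x" placeholder zip codes are made up for illustration, and only the five shared zip codes come from the results above.

```python
def common_zip_codes(views):
    """Return the zip codes that appear in every view, sorted."""
    zip_sets = [set(zips) for zips in views.values()]
    return sorted(set.intersection(*zip_sets))


# The five shared zip codes come from the post; the "0000x" entries are
# made-up placeholders standing in for zip codes unique to some views.
shared = ["10019", "10022", "11101", "11201", "11207"]
top_zips = {
    "last_week": shared + ["00001"],
    "last_month": shared + ["00002"],
    "novembers": shared + ["00003", "00001"],
    "entire_dataset": shared + ["00002"],
}
common = common_zip_codes(top_zips)
```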

--------------------------------------------------------------------------------------------------------------------




Still to come....
  • WHEN -- When do most vehicular accidents in NYC happen?
  • HOW -- What is the predominant reason/cause for these vehicle accidents?  

Drive safely as we approach the end of the holiday...and remember, don't drive like my brother!


Monday, June 29, 2015

NYC vehicular accidents: Introduction


NYC, the most populous American city (8 million+ as of July 2014), is inundated with motor vehicles (e.g. taxis, passenger vehicles, vans, etc.) amidst pedestrians and cyclists, all sharing roadway access.  In large cities like NYC, unintentional motor vehicle accidents occur every day.



According to 2013 CDC estimates, injuries resulting from unintentional motor vehicle accidents were one of the top 10 leading causes of nonfatal injuries treated in U.S. hospitals (ages 5+).  For age groups of legal driving age, these injuries consistently ranked within the top 5 leading causes of nonfatal injuries treated in U.S. hospitals (see Table 1).

Table 1

Even more alarming, fatal injuries (death) resulting from unintentional motor vehicle accidents were consistently within the top 4 leading causes of injury death for all age groups in the U.S. (see Table 2).

Table 2

Although these CDC statistics are worrisome, vehicles and roadways are being made safer through technological innovations (e.g. self-stopping cars) and new navigational options such as traffic redirection.  However, little has been done to address the actual roadway on which one drives.  Historically, are some areas particularly prone to vehicle accidents?  When and how so?

In the next blog post, I will address these questions.  I will be using historical motor vehicle collision data (since July 1, 2012) in NYC to deliver statistical insights as to where, when, and how motor vehicle accidents have occurred in the past. 

Specifically....

where: in what zip code did this vehicle accident happen? were pedestrians or cyclists involved?
when:  year, season, month, and hour of vehicle accident
how: the reason given for the accident (if one was given)

I hope this overarching idea can be:
  1. incorporated in navigational systems in the future as an extra feature to promote increased vigilance for drivers, pedestrians, and cyclists...
  2. utilized by emergency patrol vehicles, such as police or EMTs, to be faster responders to vehicle accidents, resulting in better health outcomes

Until next time!

Saturday, December 20, 2014

Maximizing Opening Weekend Sales for Wide-Release Horror Films


As scared as I get, I really do enjoy the experience of watching horror films.  But what exactly determines a horror film's success?  More specifically...

  • How can a horror film producer maximize a wide-release horror film's opening weekend sales?  
  • Does the horror film's budget matter?  
  • Which major studio distributor should the producer use?
  • What other factors (e.g. month of release, runtime, sub-genre, etc.) are at play?

I set out to answer these questions using BoxOfficeMojo data from all wide-release horror films with film budget information.  A wide-release film is released in at least 600 theaters, thereby factoring out all obscure horror films.

I built a regression model to predict opening weekend sales from the following features of each horror film:
  • number of theaters released (≥ 600 for wide-release)
  • budget for the film
  • years since 1975 (are opening weekend sales increasing or decreasing?)
  • runtime of movie (does film length matter?)
  • the rating of the movie (PG-13 or R rating)
  • horror subgenre (comedy, found footage, period, prequel, remake, scifi, serial killer, slasher, supernatural, terror in the water, torture, vampire, video game adaptation, & zombie) ... a movie can be of multiple subgenres
  • studio distributor (Buena Vista, Dimension, DreamWorks, Fox, Lionsgate, MGM, Miramax, Paramount, Relativity Media, Sony Entertainment, Universal Pictures, & Warner Brothers)
  • month of release

Ordinary least squares was used to estimate the parameters for the model below:


There was a significant (p<0.001) positive correlation of number of theaters released with opening weekend sales.  Specifically, the model predicts that one additional theater would result in a $15,560 ± $1,869 increase in opening weekend sales.  

The figure below shows the relationship between opening weekend sales and the number of theaters released and illustrates: 
  • the positive correlation of opening weekend sales with the number of theaters released
  • the observed data compared with the model prediction (R-squared: 0.871).

R-squared goodness of fit value: 0.871
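
To make the slope and R-squared figures more concrete, here is a minimal sketch of ordinary least squares with a single predictor (number of theaters).  This is a simplification: the model described above uses many features at once, and the function name and sample numbers below are made up for illustration rather than taken from the BoxOfficeMojo data.

```python
def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared is the share of the variance in y explained by the fit.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Made-up example: theater counts and opening weekend sales in $ millions.
theaters = [1000, 2000, 3000, 4000]
sales_millions = [5.0, 12.0, 16.0, 25.0]
slope, intercept, r_squared = fit_line(theaters, sales_millions)
```

In this single-predictor form, the slope reads as the predicted change in opening weekend sales per additional theater, which is the same interpretation given to the theater coefficient above.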

Interestingly, the only other significant correlation found in the regression model was a positive correlation (p=0.026) of the period horror subgenre with opening weekend sales.  Specifically, the model predicts that releasing a period horror film (i.e. a horror film set in a past time period) would result in a $7.66 ± $3.39 million increase in opening weekend sales.  Some examples of period horrors include Paranormal Activity 3, The Conjuring, Exorcist: The Beginning, Sleepy Hollow, and Shutter Island.

Why do period horror films appear to do better than the rest?  My hypothesis is that period horrors are usually based on something familiar, whether it be the time period or the story.  Due to the mere-exposure effect, individuals tend to show an affinity toward items they have encountered before over ones they have not.  This may result in a subconscious preference for watching familiar period horror films, especially for older generations.  In addition, the availability heuristic biases individuals to believe that more familiar events are more likely to occur.  This can intensify the creepiness of the film and increase viewership by horror film fans.

These features did not have a significant effect on opening weekend sales:
  • budget
  • years since 1975
  • runtime
  • rating
  • studio/distributor
  • month of release

Based on the findings, my advice for film producers wishing to maximize opening weekend sales for their wide-release horror film is to:
  • release in as many theaters as possible
  • produce a period horror thematically based on a legend, myth, and/or time period that most people are familiar with
  • not worry about the film's budget
  • not worry about the possible decrease in the number of movie-goers with the increased accessibility of streaming movie providers such as Netflix
  • not worry about the length of the film or the rating
  • not worry about which studio/distributor to use unless they can release in more theaters
  • not worry about releasing the film around Halloween...there is no evidence that it helps (possibilities include students being in school, lack of proximity to holidays such as Thanksgiving or Christmas, and increased competition from other horror films)

Thank you

=]

Friday, November 28, 2014

The LA|NY State of Mind: LA Restaurants May Offer Better, More Memorable Service Compared to NYC Restaurants


Do LA restaurants generally offer better service than NYC restaurants?  As before, I answered this question using Yelp restaurant comments from the top 40 restaurants from 44 different categories of cuisine in LA and NYC.  To determine the relative impact of service, I first calculated the tf-idf of the word service in all comments from the top 40 restaurants in each of the 44 categories.  I then normalized the tf-idf of the word service as a percentage in relation to the highest tf-idf word from the same set of comments.  I only used 5-star and 1-star Yelp comments to remove any ambiguity in sentiment when comparing positive and negative service, respectively.  I then compared the average impact of positive and negative service across all 44 restaurant categories within each city as well as between cities.  

I found that within each city, there were no significant differences in positive versus negative service importance (LA: t(43)=0.722, p=0.472; NYC: t(43)=1.502, p=0.137).  Thus, neither LA nor NYC diners placed more weight on positive service than on negative service, or vice versa.  There was also no significant difference between cities with regard to the average importance placed on negative service (t(43)=0.090, p=0.929).  With positive comments, however, I found a significant difference whereby LA diners placed higher importance on positive service compared to their NYC counterparts (t(43)=2.272, p=0.026).  
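
The 43 degrees of freedom reported above are consistent with a paired comparison across the 44 restaurant categories.  Assuming that is the test used, here is a minimal sketch of how the t statistic is computed from two matched lists of service-importance values.  The function name and the sample values are made up for illustration, and the p-value lookup (which needs the t distribution) is left out.

```python
import math


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1


# Made-up normalized tf-idf values for four categories, matched by position.
la_positive = [0.40, 0.35, 0.50, 0.45]
nyc_positive = [0.30, 0.30, 0.35, 0.40]
t_stat, dof = paired_t(la_positive, nyc_positive)
```

With 44 categories per city, the same function would return 43 degrees of freedom.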


Negative vs Positive Service Importance Within Cities

N.S. = not significant (p≥0.05)

Negative vs Positive Service Importance Between Cities

star = significant (p<0.05)


For the top 40 restaurants among 44 popular categories of food, LA restaurant reviewers placed significantly more importance on positive service compared to NYC restaurant reviewers.  

I can't help but wonder:
  • Does LA's sunnier, warmer, and less rainy weather make diners and servers more positive?
  • Conversely, does NYC's gloomier and colder weather make diners and servers less positive?
  • Do other positive aspects (taste of the food, plating, ambience, etc.) in NYC restaurants overshadow the importance of service?
Ultimately, for top restaurants in each city, LA may offer more memorable positive service than NYC.  

Thanks again, 

Thursday, November 27, 2014

SERVING THE CITY OF ANGELS: How Important is 'Service' among LA Restauranters?









Ahh the City of Angels (LA).  Like the previous post about serving the Big Apple, I will now discuss how important service is to different categories of restaurants in LA.  To address this, I used Yelp comment data (as of Nov 6th, 2014) from the top 40 restaurants from 44 different categories of cuisines.  I looked at high (5-star) and low (1-star) comments in order to discover the relative impact of positive and negative service in restaurant assessments.

To calculate the importance of service, I used a text mining method called term frequency inverse document frequency (tf-idf) (see previous post on serving the Big Apple for details on this method).

Figures 1 and 2 below are example word clouds that illustrate the relative importance (i.e. relative tf-idf) of the word service.  More specifically, each word cloud shows the top 50 tf-idf words from positive (Figure 1) and negative (Figure 2) comments in a particular restaurant category, and sizes each word by its relative tf-idf value.



Figure 1
French Restaurant Category Word Cloud (Positive 5-Star Comments)
Relatively High Service Importance


Figure 2
Pizza Restaurant Category Word Cloud (Negative 1-Star Comments)
Relatively Low Service Importance


High Yelp Ratings 

Using only 5-star rated comments from the different categories of restaurants, I found the following (Figure 3) relative tf-idf values for the word service ordered by restaurant category from highest to lowest.


Figure 3


The categories with the most positive service comments were steak, French, Italian, and diner restaurants.  The restaurant categories with the least positive service comments (note that this does not mean negative, simply less impact of service) were food stands, fish n chips, pizza, and Filipino.  


Low Yelp Ratings 


Using only 1-star rated comments from the different categories of restaurants, I found the following (Figure 4) relative tf-idf values for the word service ordered by restaurant category from highest to lowest.

Figure 4


The categories with the most negative service comments were from cafes, Korean, breakfast/brunch, and traditional American restaurants.  The restaurant categories with the least negative service comments were food stands, pizza, food courts, and fondue.

As was found in the Big Apple, a positive relationship was found between high and low service normalized tf-idf values (p<0.001).  Thus, the more importance LA restaurant diners placed on positive service, the more they placed on negative service as well (and vice versa).  Next I examined whether the price of the restaurant influenced how important service was for the diners.  To examine this possibility, I used Yelp's restaurant pricing system ($, $$, $$$, $$$$).  These dollar signs represent the cost per person for a meal including one drink, tax, and tip (see Serving the Big Apple post for details).

For each category of food, I calculated a PRICE SCORE to quantify the overall price of a particular restaurant category from its top 40 restaurants using the following equation:


PRICE SCORE = 
(# of $ restaurants) + 2*(# of $$ restaurants) + 3*(# of $$$ restaurants) + 4*(# of $$$$ restaurants)

In contrast to the Big Apple, I found only a marginally non-significant (p=0.051) positive correlation between price score and positive service importance (from 5-star Yelp comments).  Although a negative correlation was found between price score and negative service importance (from 1-star Yelp comments), this correlation was not significant (p=0.658).

I deduce two salient possibilities from this data:

1.  Higher prices at LA restaurants do not necessarily equate to better service. 
2.  LA restaurant diners may generally place more importance on other factors (e.g. ambience, taste, plating).

Wednesday, November 12, 2014

SERVING THE BIG APPLE: How Important is 'Service' among NYC Restaurant Diners?


How important is service to different categories of restaurants in NYC?  To address this, I used Yelp comment data (as of Nov 6th, 2014) from the top 40 restaurants from different categories of cuisine (e.g. Traditional American, French, Mexican, burgers, etc.).  I looked at high (5-star) and low (1-star) comments in order to discover the relative impact of positive and negative service in restaurant assessments.

To calculate the importance of service, I used a text mining method called term frequency inverse document frequency (tf-idf), which counts the number of times a word appears in all the comments and divides that number by the number of comments that contain at least one instance of that word.  This calculation allows us to quantify the importance of words while reducing the importance of words that appear in almost all comments such as 'a', 'the', 'I', etc. that are less meaningful.  Once I found the tf-idf values of all the words, I normalized the tf-idf value of the word service relative to the highest tf-idf value (associated with the most important word) to obtain a measure for the relative importance of service in the users' dining experiences.  For example, if the highest tf-idf was 0.5 and service's tf-idf value was 0.2, service's normalized tf-idf would be 0.4 or 40%.
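
Here is a minimal sketch of the idea in code.  It uses a common formulation of tf-idf (word count multiplied by the log of total comments over comments containing the word), which is an assumption and may differ from the exact variant used for the results in this post.  The function names and sample comments are made up for illustration.

```python
import math
from collections import Counter


def tfidf_scores(comments):
    """Score each word as (total count) * log(N / comments containing it)."""
    n = len(comments)
    term_counts = Counter()  # how many times each word appears overall
    doc_counts = Counter()   # how many comments contain each word
    for comment in comments:
        words = comment.lower().split()
        term_counts.update(words)
        doc_counts.update(set(words))
    return {w: term_counts[w] * math.log(n / doc_counts[w]) for w in term_counts}


def relative_importance(scores, word):
    """Normalize a word's score against the highest score in the set."""
    top = max(scores.values())
    return scores.get(word, 0.0) / top if top > 0 else 0.0


# Made-up comments, for illustration only.
comments = [
    "great service and great food",
    "the food was cold",
    "the service was slow",
]
scores = tfidf_scores(comments)
service_share = relative_importance(scores, "service")

# The worked example from the text: 0.2 relative to a top score of 0.5.
example_share = relative_importance({"food": 0.5, "service": 0.2}, "service")
```

Under this formulation a word that appears in every comment scores zero, which is how words like 'a' and 'the' get pushed down in a large set of comments.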

Figures 1 and 2 below are example word clouds that illustrate the relative importance (i.e. relative tf-idf) of the word service.  More specifically, each word cloud shows the top 50 tf-idf words from positive (Figure 1) and negative (Figure 2) comments in a particular restaurant category, and sizes each word by its relative tf-idf value.

Figure 1
French Restaurant Category Word Cloud (Positive 5-Star Comments)
Relatively High Service Importance


Figure 2
Pizza Restaurant Category Word Cloud (Negative 1-Star Comments)
Relatively Low Service Importance

High Yelp Ratings 

Using only 5-star rated comments from the different categories of restaurants, I found the following (Figure 3) relative tf-idf values for the word service ordered by restaurant category from highest to lowest.


Figure 3

The categories with the most positive service comments were French, seafood, steak, and Italian restaurants.  The restaurant categories with the least positive service comments (note that this does not mean negative, simply less impact of service) were food courts, food stands, pizza, and hot dog.


Low Yelp Ratings 

Using only 1-star rated comments from the different categories of restaurants, I found the following (Figure 4) relative tf-idf values for the word service ordered by restaurant category from highest to lowest.

Figure 4


The categories with the most negative service comments were traditional American, French, German, and fondue restaurants.  The restaurant categories with the least negative service comments were food courts, food stands, pizza, and hot dog.


From Figures 3 and 4, the importance of service appears to be similar regardless of the valence of the comments for a particular restaurant category.  Statistically, a positive relationship was indeed found between high and low service normalized tf-idf values (p<0.001).  Simply put, NYC diners within a restaurant category placed roughly equal importance on service whether the experience was positive or negative.  But why would this be the case?  One possibility is that the price of the restaurant influenced how important service was for the diners.  To examine this possibility, I used Yelp's restaurant pricing system ($, $$, $$$, $$$$).  These dollar signs represent the cost per person for a meal including one drink, tax, and tip.

  • $ = under $10
  • $$ = $11-$30
  • $$$ = $31-$60
  • $$$$ = above $61
For each category of food, I calculated a PRICE SCORE to quantify the overall price of a particular restaurant category from its top 40 restaurants using the following equation:


PRICE SCORE = 
(# of $ restaurants) + 2*(# of $$ restaurants) + 3*(# of $$$ restaurants) + 4*(# of $$$$ restaurants)
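
The equation above can be sketched in a few lines.  Since each restaurant contributes its number of dollar signs, the score is simply the sum of the tier lengths.  The function name and the sample mix of tiers are made up for illustration.

```python
VALID_TIERS = {"$", "$$", "$$$", "$$$$"}


def price_score(tiers):
    """Sum the number of dollar signs over a category's restaurants."""
    for tier in tiers:
        if tier not in VALID_TIERS:
            raise ValueError(f"unknown price tier: {tier!r}")
    return sum(len(tier) for tier in tiers)


# Made-up category of 40 restaurants: 10 at $, 20 at $$, 8 at $$$, 2 at $$$$.
category = ["$"] * 10 + ["$$"] * 20 + ["$$$"] * 8 + ["$$$$"] * 2
score = price_score(category)
```

For a category of 40 restaurants the score ranges from 40 (all $) to 160 (all $$$$).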

As one may expect, I found a significant (p=0.001) positive correlation between price score and positive service importance (from 5-star Yelp comments).  This means the more expensive the restaurant category, the better the service and/or the more importance Yelp diners placed on good service in their reviews.  Although a negative correlation was found between price score and negative service importance (from 1-star Yelp comments), this correlation was not significant (p=0.158).
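
As a rough illustration of this kind of check, here is a minimal sketch of a correlation coefficient between price scores and service importance across categories.  It assumes a Pearson correlation, which the post does not specify, and the function name and sample values are made up for illustration.  The significance test is left out.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up values: one price score and one normalized tf-idf per category.
price_scores = [40, 60, 80, 100]
service_importance = [0.10, 0.20, 0.25, 0.40]
r = pearson_r(price_scores, service_importance)
```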

Ultimately, for diners in the Big Apple, the more you pay for your meal, the better the service and/or the more attention you pay to great service.  In addition, you also pay more attention to bad service.