This group project was completed for the course CMPT 732 (Professional Master’s Program Lab) at Simon Fraser University.
As technology develops and society progresses, more and more people are moving to cities. Studying cities is therefore essential to making people’s lives easier. For this reason, we chose to study New York, the world’s most prosperous metropolis. To keep the study focused, we narrowed our analysis to taxis in New York.
Our main goal is to explore the NYC taxi data and uncover the patterns it reveals. Each group member took on a different subject, from the perspectives of passengers, taxi drivers, and city taxi regulators. Our questions are listed below, grouped under four subjects:
Based on the group members’ interests, we analyze the data from the following aspects:
Our main dataset is the New York City TLC Trip Record Data. We selected yellow and green cab data for the period from 2017 to 2021.
Column | Filter | Remark |
---|---|---|
Time span | 2017-2021 | We focus only on the five most recent years of data. |
Car type | Yellow and green cabs | Yellow cabs may pick up passengers anywhere in the city. Green cabs may only pick up passengers in the Bronx, Staten Island, Brooklyn, Queens (excluding airports), and Northern Manhattan. |
Payment type | Cash and credit card | Trips with other payment types (no charge, dispute, etc.) are treated as invalid. |
Total amount | Total payment amount greater than or equal to $2.50 | The initial fare of an NYC taxi is $2.50. |
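The filters in the table can be expressed as a single predicate applied to each trip record. The sketch below is written in plain Python so it runs without a Spark cluster. The same conditions would go into a PySpark `DataFrame.filter` call. It assumes the TLC data dictionary encodes payment type as 1 = credit card and 2 = cash. The field names `cab_type` and `pickup_datetime` are hypothetical: the raw files use `tpep_pickup_datetime` for yellow cabs and `lpep_pickup_datetime` for green cabs, and the cab type is implied by which file a record comes from.

```python
from datetime import datetime

# Assumed TLC encodings: 1 = credit card, 2 = cash (all other codes are treated as invalid).
VALID_PAYMENT_TYPES = {1, 2}
MIN_FARE = 2.5                  # initial NYC taxi fare in dollars
YEARS = range(2017, 2022)       # 2017 through 2021 inclusive
CAB_TYPES = {"yellow", "green"}


def is_valid_trip(trip: dict) -> bool:
    """Return True if a trip record passes every filter in the table.

    `cab_type` and `pickup_datetime` are hypothetical field names chosen for illustration.
    """
    return (
        trip["cab_type"] in CAB_TYPES
        and trip["pickup_datetime"].year in YEARS
        and trip["payment_type"] in VALID_PAYMENT_TYPES
        and trip["total_amount"] >= MIN_FARE
    )


# Made-up sample records, one for each filter outcome.
trips = [
    {"cab_type": "yellow", "pickup_datetime": datetime(2019, 5, 1, 8, 30), "payment_type": 1, "total_amount": 12.3},
    {"cab_type": "green", "pickup_datetime": datetime(2016, 12, 31, 23, 0), "payment_type": 2, "total_amount": 8.0},   # outside time span
    {"cab_type": "yellow", "pickup_datetime": datetime(2020, 3, 2, 9, 0), "payment_type": 4, "total_amount": 15.0},   # dispute
    {"cab_type": "green", "pickup_datetime": datetime(2021, 7, 4, 18, 15), "payment_type": 2, "total_amount": 2.0},   # below initial fare
]

valid = [t for t in trips if is_valid_trip(t)]
print(len(valid))  # 1
```

Keeping the thresholds as named constants makes it easy to change the time span or the minimum fare without touching the predicate itself.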
Category | Tools |
---|---|
Data Analysis | PySpark DataFrame, PySpark SQL, Pandas |
Data Visualization | Matplotlib, GeoPandas, Seaborn, ECharts |
Cloud Resources | AWS S3, AWS EMR |
UI | Jekyll |
The final analyses of our questions are as follows:
You can also open each analysis report from the left sidebar.