In Week 1 of Term 4 we talked about random sampling.
What is it and why
In a simple random sample, every unit in the population has an equal chance of being selected. Perhaps the most important benefit of random sampling is that it lets the researcher rely on statistical theory to draw conclusions about the whole population from what is observed in the sample.
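To make the idea concrete, here is a minimal sketch using only Python's standard library. The population (the numbers 1 to 1000) is made up purely for illustration. random.sample draws units without replacement, so every unit is equally likely to be picked, and the sample mean then serves as an estimate of the population mean.

```python
import random
import statistics

# A made-up population of 1,000 units (illustration only)
population = list(range(1, 1001))

# Simple random sample of 50 units, drawn without replacement:
# every unit has the same chance of ending up in the sample.
rng = random.Random(42)  # fixed seed so the example is reproducible
sample = rng.sample(population, k=50)

# Use the sample mean as an estimate of the population mean
print("population mean:", statistics.mean(population))
print("sample mean:    ", statistics.mean(sample))
```

With a larger sample, the sample mean tends to land closer to the population mean, which is the kind of conclusion statistical theory lets us make.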
What did I do this week
This week I went to Kaggle and found the ‘London Crime Data’ database.
I used it to write a simple script that draws a random sample from the database and then uses that sample to generate visualisations.
The script works in the following steps:
- Download the database
- Query it with SQL and load the result into a pandas DataFrame
- Use pandas.DataFrame.sample to draw a random sample
- Use matplotlib to plot the sample as a bar chart and a scatter plot
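The steps above can be sketched roughly as follows. This is not the exact notebook I shared: since the real Kaggle database cannot be bundled here, a tiny in-memory SQLite table with made-up numbers stands in for it, and the table name crime and its columns (borough, major_category, year, value) are assumptions, so the real dataset's names may differ.

```python
import itertools
import sqlite3

import matplotlib
matplotlib.use("Agg")  # draw to image files instead of opening a window
import matplotlib.pyplot as plt
import pandas as pd

# Step 1 (stand-in): a hypothetical crime table filled with made-up counts
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crime (borough TEXT, major_category TEXT, year INTEGER, value INTEGER)"
)
boroughs = ["Camden", "Hackney", "Westminster"]
categories = ["Burglary", "Robbery", "Theft and Handling"]
rows = [
    (b, c, y, (len(b) * y + len(c)) % 50)  # fake counts, illustration only
    for b, c, y in itertools.product(boroughs, categories, range(2008, 2017))
]
conn.executemany("INSERT INTO crime VALUES (?, ?, ?, ?)", rows)

# Step 2: query with SQL and load the result into a pandas DataFrame
df = pd.read_sql_query("SELECT borough, major_category, year, value FROM crime", conn)

# Step 3: simple random sample of 20 rows (drawn without replacement by default)
sample = df.sample(n=20, random_state=1)

# Step 4: bar chart of sampled counts per borough, scatter plot of count vs year
fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
sample.groupby("borough")["value"].sum().plot.bar(ax=ax_bar)
ax_bar.set_title("Sampled crime counts by borough")
ax_bar.set_ylabel("Count")
ax_scatter.scatter(sample["year"], sample["value"])
ax_scatter.set_title("Sampled records: count vs year")
ax_scatter.set_xlabel("Year")
ax_scatter.set_ylabel("Count")
fig.tight_layout()
fig.savefig("london_crime_sample.png")
```

Only the SQL query and the stand-in table would need to change to point this at the real data; the sampling and plotting steps stay the same.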
The result
Because it is inconvenient to take screenshots and put them here on my blog, I shared my work on Kaggle instead, via here
Unfortunately, the notebook does not keep my previous results, but running it again will generate a new visualisation of crime activity in London across different years, because each run draws a different random sample.
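Each run gives a different picture because DataFrame.sample picks new rows every time unless it is given a seed. The small sketch below uses a made-up DataFrame (not the real crime data) to show how the random_state parameter would make the sample, and therefore the charts, reproducible if that were ever wanted.

```python
import pandas as pd

# A small made-up DataFrame standing in for the crime records
df = pd.DataFrame({"year": range(2008, 2017), "value": [5, 9, 2, 7, 4, 8, 1, 6, 3]})

# No random_state: each run may draw a different set of rows
unseeded = df.sample(n=3)

# Same random_state: the same rows are drawn every time
seeded_a = df.sample(n=3, random_state=0)
seeded_b = df.sample(n=3, random_state=0)
print(seeded_a.equals(seeded_b))  # True
```

Leaving the seed out, as in my notebook, is what makes every run show a fresh random view of the data.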
Conclusion
I think learning about random sampling helps me learn more about Machine Learning, and Artificial Intelligence in particular, because the selection of data matters: data is what an AI model actually learns from.