Machine learning means using machines to analyze data. Data is the most important part of Data Analytics, Machine Learning, and Artificial Intelligence. Without data, we can’t train any model, and all modern research and automation would be in vain. Big enterprises spend large sums of money just to gather as much reliable data as possible. Although machine learning algorithms have been around for a long time, the ability to automatically apply complex mathematical calculations to big data, over and over and faster and faster, is a recent development.
DATA : Any unprocessed fact, value, text, sound, or picture that has not yet been interpreted or analyzed.
INFORMATION : Data that has been interpreted and manipulated and now carries some meaningful inference for its users.
KNOWLEDGE : A combination of inferred information, experience, learning, and insights. It results in awareness or concept building for an individual or organization.
How do we split data in Machine Learning?
- Training Data: The part of the data we use to train our model. This is the data that the model actually sees and learns from.
- Validation Data: The part of the data used to frequently evaluate the model fit on the training dataset and to tune its hyperparameters (parameters set before the model begins learning). This data plays its part while the model is still training.
- Testing Data: Once our model is completely trained, testing data provides an unbiased evaluation. When we feed in the inputs of the testing data, our model predicts some values (without seeing the actual output). After prediction, we evaluate the model by comparing its predictions with the actual output present in the testing data. This is how we see how much our model has learned from the experience fed in as training data.
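The three-way split and the final test-set comparison described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the helper names `split_dataset` and `accuracy`, the 60/20/20 proportions, the toy even/odd dataset, and the stand-in `predict` function are all assumptions made for the example, not part of any particular library or a recommended setup.

```python
import random


def split_dataset(records, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle records and split them into (train, validation, test) lists."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]      # everything left over is training data
    return train, validation, test


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual outputs."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty test set")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Toy data (hypothetical): (input, label) pairs, label = "is the input even?"
data = [(x, x % 2 == 0) for x in range(100)]
train, validation, test = split_dataset(data)


def predict(x):
    """Stand-in for a trained model; a real model would be fit on `train`."""
    return x % 2 == 0


# Feed only the test inputs to the model, then compare with the actual labels.
test_inputs = [x for x, _ in test]
test_labels = [label for _, label in test]
predictions = [predict(x) for x in test_inputs]
print(len(train), len(validation), len(test))
print(accuracy(predictions, test_labels))
```

In a real project the validation list would be scored repeatedly while tuning hyperparameters, and the test list would be touched only once at the end, so that the final score stays an unbiased estimate.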
Properties of Data –
- Volume : Scale of data. With a growing world population and wider exposure to technology, huge amounts of data are being generated every millisecond.
- Variety : Different forms of data – healthcare, images, videos, audio clippings.
- Velocity : Rate of data streaming and generation.
- Value : Meaningfulness of data in terms of information which researchers can infer from it.
- Veracity : Certainty and correctness of the data we are working with.