There are quite a few online resources that crowdsource and share price data. Wherever prices are entered or collected, some bad or invalid data gets captured.
I am interested in discussing algorithms for cleaning, verifying, or sanity-checking price data.
For each commodity, I guess there is a sensible range for the buy price and for the sell price. The idea, of course, is to eliminate or ignore prices that fall outside these ranges. But what methodology do we use to establish the valid ranges? How do we know if a price makes sense? Assuming we have a lot of data in a database, we can find the average buy/sell and the min/max buy/sell. We can get the standard deviation too, but is it meaningful to use it to establish a valid range in this context?
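To make the standard deviation question concrete, here is a minimal Python sketch comparing two ways of deriving a range from the stored prices of one commodity. The function names, the multiplier k and the sample prices are all assumptions for illustration, not an established method for this kind of data. The first range is mean +/- k standard deviations; the second swaps in the median and the median absolute deviation (MAD), which a single wild value does not drag around as much.

```python
from statistics import mean, median, stdev

def sigma_range(prices, k=3.0):
    """Range as mean +/- k sample standard deviations (k is an assumed cutoff)."""
    mu = mean(prices)
    sigma = stdev(prices)
    return mu - k * sigma, mu + k * sigma

def mad_range(prices, k=3.0):
    """Range as median +/- k scaled MADs (k is an assumed cutoff)."""
    med = median(prices)
    mad = median(abs(p - med) for p in prices)
    # 1.4826 makes the MAD comparable to a standard deviation
    # when the data is roughly normally distributed.
    scaled = 1.4826 * mad
    return med - k * scaled, med + k * scaled

# Hypothetical buy prices for one commodity, with one obviously bad entry.
buy_prices = [100, 102, 98, 101, 99, 100000]

print("mean/stdev range:", sigma_range(buy_prices))
print("median/MAD range:", mad_range(buy_prices))
```

With this sample the bad entry inflates the mean and standard deviation so much that it still falls inside the mean/stdev range, while the median/MAD range stays close to the sensible prices and excludes it. If more than half the prices are identical the MAD is 0 and the range collapses to a single value, so that case would need separate handling.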
If we decide what a valid range is for a buy price or a sell price, do we delete everything outside that range in one pass, or do we delete one value (the max or the min) and then recalculate the range?
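The difference between the two options can be sketched as follows. This is only an illustration under assumptions: the range is taken to be mean +/- k standard deviations, and the function names, k and the sample data are hypothetical. trim_once deletes everything outside the range in a single pass; trim_iteratively deletes only the most extreme value, recalculates, and repeats until nothing is out of range.

```python
from statistics import mean, stdev

def trim_once(prices, k=3.0):
    """Single pass: drop every price more than k stdevs from the mean."""
    mu, sigma = mean(prices), stdev(prices)
    kept = [p for p in prices if abs(p - mu) <= k * sigma]
    removed = [p for p in prices if abs(p - mu) > k * sigma]
    return kept, removed

def trim_iteratively(prices, k=3.0):
    """Drop the single most extreme price, recompute, and repeat."""
    kept = list(prices)
    removed = []
    while len(kept) > 2:
        mu, sigma = mean(kept), stdev(kept)
        if sigma == 0:
            break  # all remaining prices are identical
        worst = max(kept, key=lambda p: abs(p - mu))
        if abs(worst - mu) <= k * sigma:
            break  # everything left is inside the current range
        kept.remove(worst)
        removed.append(worst)
    return kept, removed

# Hypothetical data: twenty sensible prices plus two bad entries.
prices = [98, 99, 100, 101, 102] * 4 + [5000, 100000]

print("one pass removed: ", trim_once(prices)[1])
print("iterative removed:", trim_iteratively(prices)[1])
```

On this sample the single pass only catches the largest bad entry, because that entry inflates the standard deviation enough to hide the smaller one. Recalculating after each deletion exposes the second bad entry on the next round. Note that with very few prices no value can ever be k = 3 standard deviations from the mean, so a cutoff like this only makes sense once there is a reasonable amount of data per commodity.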
As a first step, if both the buy price and the sell price are 0, the entry is ignored/deleted. I will post some algorithms I have considered in another message. What do others use?
This is NOT a discussion of the merits of sharing price data, so if you think sharing is wrong, it may be better for you to move along.