
Thread: Price sharing - price sanity check algorithms

  1. #1

    Price sharing - price sanity check algorithms

    There are quite a few online resources that collect (crowdsource) and distribute (share) price data. Somewhere in the various entry and collection processes, bad or invalid data gets captured.

    I am interested in discussing algorithms for cleaning, verifying or sanity-checking price data.

    For each commodity, I guess there is a sensible range for the buy price and for the sell price. The idea, of course, is to eliminate/ignore prices outside these ranges. But what methodology do we use to establish the valid ranges? How do we know whether a price makes sense? Assuming we have a lot of data in a database, we can find the average buy/sell and the min/max buy/sell. We can get the standard deviation too; does it mean anything in this context for establishing a valid range?
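    To make the standard-deviation question concrete, here is a minimal Python sketch; `summarize` and the sample prices are hypothetical, not taken from any real feed. It reports min/max/mean/standard deviation next to the median and the median absolute deviation (MAD). One price entered at a tenth of its value drags the mean down and inflates the standard deviation, so the standard deviation does mean something, but it is distorted by the very errors we want to catch. The median and MAD barely move.

```python
import statistics

def summarize(prices):
    """Naive and robust summary statistics for one commodity's prices."""
    med = statistics.median(prices)
    # Median absolute deviation: a spread estimate that a few bad values barely move.
    mad = statistics.median(abs(p - med) for p in prices)
    return {
        "min": min(prices),
        "max": max(prices),
        "mean": statistics.fmean(prices),
        "stdev": statistics.stdev(prices),  # sample standard deviation
        "median": med,
        "mad": mad,
    }

# Hypothetical Gold buy prices, one of them mistyped as a tenth of its value.
stats = summarize([9000, 9100, 9200, 9300, 9400, 920])
print(stats["mean"], stats["median"], stats["mad"])
```

    With these six prices the mean drops to 7820 while the median stays at 9150, and the lower edge of a mean ± 3 sd band falls below zero, so that band would not catch the bad price in a sample this small.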

    If we decide what a valid range is for a buy price or a sell price, do we delete everything outside that range OR do we delete one value (the Max or Min) and then recalculate the range?
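    The two options can be compared directly. This sketch (hypothetical function names, population standard deviation, k = 3) either filters once against mean ± k·sd, or removes only the single most extreme price and recomputes, repeating until nothing lies outside the band. A very large error inflates the first band enough to hide a smaller error, which the iterative version then catches.

```python
import statistics

def one_shot_filter(prices, k=3.0):
    """Keep prices within mean +/- k*sd, with mean and sd computed once."""
    mean = statistics.fmean(prices)
    sd = statistics.pstdev(prices)
    return [p for p in prices if abs(p - mean) <= k * sd]

def iterative_trim(prices, k=3.0):
    """Drop only the most extreme price while it lies outside mean +/- k*sd,
    recomputing mean and sd after every removal. Returns the kept prices, sorted."""
    kept = sorted(prices)
    while len(kept) > 2:
        mean = statistics.fmean(kept)
        sd = statistics.pstdev(kept)
        low_dev, high_dev = mean - kept[0], kept[-1] - mean  # extremes sit at the ends
        if max(low_dev, high_dev) <= k * sd:
            break  # nothing left outside the band
        if low_dev >= high_dev:
            kept.pop(0)
        else:
            kept.pop()
    return kept

good = list(range(9000, 9451, 50))  # ten plausible prices
prices = good + [900, 94500]        # one typo a tenth too low, one ten times too high
print(sorted(set(prices) - set(one_shot_filter(prices))))
print(sorted(set(prices) - set(iterative_trim(prices))))
```

    Here the one-shot filter drops 94500 but keeps 900, while the iterative version removes both. The trade-off is that iterating on a small or genuinely skewed sample can keep trimming valid prices, so a minimum sample size is probably wise.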

    First, if both the buy and the sell are 0, the price is ignored/deleted. I will post some algorithms I have considered in another message. What do others use?
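    The zero rule is the easy first pass. In this sketch `PriceRecord` and its field names are hypothetical; `buy` is taken to be what the trader pays the station and `sell` what the station pays the trader, matching how the later posts use the terms.

```python
from typing import NamedTuple

class PriceRecord(NamedTuple):
    """One commodity price at one station (hypothetical record layout)."""
    station: str
    commodity: str
    buy: int   # what the trader pays; 0 if the station doesn't sell the commodity
    sell: int  # what the station pays; 0 if the station doesn't buy it

def drop_empty(records):
    """Ignore records where both buy and sell are 0: there is no price to check."""
    return [r for r in records if r.buy != 0 or r.sell != 0]

records = [
    PriceRecord("Station A", "Gold", 0, 0),
    PriceRecord("Station B", "Gold", 0, 9100),
    PriceRecord("Station C", "Gold", 9500, 9200),
]
print([r.station for r in drop_empty(records)])
```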

    This is NOT a discussion of the merits of sharing price data, so if you think it's wrong, it may be better for you to move along.
    DEADLY / ELITE / PIONEER / HELPLESS / ENSIGN / DUKE
    Maddavo Market Share (MMS) - www.davek.com.au/td

  2. #2
    A very interesting thread. I wanted to post in the EDDN thread but you ninja-ed me ;-) I've taken the liberty of creating a post about it in the 3rd party tools thread and also adding it to the OP, in the "announcement" section. I think this thread deserves all the exposure it can get.

    I DO have some suggestions, not for an algorithm but for how other tools could use the data. This time I will keep quiet for a while and let others have a go at it. I'm too visible in the forums already ;-(

  3. #3
    This chart shows the distribution of Gold Buy prices that have been recorded across a sample of 4000 stations since release.


    [Image: Gold Buy Price Histogram]


    It can be seen that there are a few 'bell'-type curves in there. I expect these are due to the economy types at different stations. The small bell curve on the left is probably what we are interested in for setting a low-buy limit. A low-buy limit is critical because incorrectly low buy prices attract traders, i.e. if Gold is (incorrectly) reported somewhere at 4000 instead of the minimum of 8707 in the table, traders will be drawn there for nothing.

    Similarly, traders are attracted to high sell prices, so incorrect high sell prices also draw traders to the wrong place.

    How do we programmatically establish the parameters of the small bell curve on the left and then set a low-buy limit?
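    One simple way to isolate the left-most curve, assuming it is separated from the others by a visible gap, is to sort the prices and start a new cluster wherever two neighbours differ by more than a chosen gap. The limit is then the mean minus k standard deviations of the lowest cluster big enough to count as a curve. This is a swapped-in technique, not something from the thread; `gap`, `min_size` and `k` are hypothetical tuning values, and curves that overlap would need a proper mixture-model fit instead.

```python
import statistics

def split_clusters(prices, gap):
    """Split sorted prices wherever two neighbours differ by more than `gap` credits."""
    ordered = sorted(prices)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > gap:
            clusters.append([cur])  # big jump: start a new curve
        else:
            clusters[-1].append(cur)
    return clusters

def low_buy_limit(prices, gap, min_size=5, k=3.0):
    """mean - k*sd of the lowest cluster with at least `min_size` prices.

    Smaller clusters below it are treated as stray bad values, not as a curve.
    """
    for cluster in split_clusters(prices, gap):
        if len(cluster) >= min_size:
            return statistics.fmean(cluster) - k * statistics.stdev(cluster)
    raise ValueError("no cluster has min_size prices")

# Hypothetical Gold buy prices: two stray errors, then two economy curves.
buys = [4000, 4100, 8700, 8750, 8800, 8850, 8900, 9800, 9850, 9900, 9950, 10000]
limit = low_buy_limit(buys, gap=300)
print(round(limit))
```

    The two stray prices form a cluster too small to count, so the limit comes from the 8700 to 8900 curve and lands a little above 8550, leaving 4000 and 4100 below it.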

  4. #4
    This chart shows the distribution of Gold Sell Prices across 4000 stations where the buy price was 0 (not sold).


    [Image: Gold Sell Price Histogram]


    The few prices at the left are probably wrong entries, not because the sell price itself is incorrect, but because the station's corresponding buy price was not recorded, so the sell price gets included in the table. That's not much of a concern here, though. It is a high-sell limit we are interested in, so that we can filter out high sells that were probably recorded incorrectly. As with the buy-price histogram, a few economy-specific curves seem to emerge. How do we parameterize the right-most curve and use it to establish a high-sell limit?
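    For the high-sell limit, here is a histogram-based sketch that follows that description fairly literally: bin the prices, take the right-most well-populated bin as the top of the right-most curve, walk right through its tail, and use the end of the tail as the limit. `bin_width` and `min_count` are hypothetical tuning values, and a bad price sitting in the bin right next to the tail would be swallowed by it.

```python
from collections import Counter

def high_sell_limit(prices, bin_width, min_count=3):
    """Upper edge of the right-most histogram curve."""
    counts = Counter(p // bin_width for p in prices)  # bin index -> number of prices
    # Right-most bin with enough prices to belong to a real curve; sparser bins
    # further right are treated as stray values.
    dense = [b for b, c in counts.items() if c >= min_count]
    if not dense:
        raise ValueError("no bin holds min_count prices")
    b = max(dense)
    while counts[b + 1] > 0:  # follow the curve's tail while adjacent bins are non-empty
        b += 1
    return (b + 1) * bin_width  # upper edge of the last bin in the tail

# Hypothetical Gold sell prices: two economy curves plus one ten-times typo.
sells = [8950, 9010, 9020, 9040, 9060, 9080, 9150,
         9720, 9790, 9800, 9810, 9830, 9850, 9880, 9960, 98000]
limit = high_sell_limit(sells, bin_width=100)
print(limit, [p for p in sells if p > limit])
```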

    Another consideration is whether the limits should be based on existing data at all. We might have a large enough sample to extrapolate what sensible prices are, but we may not.

    I am not a statistician; is there someone who deals with this kind of data who can give some advice?

  5. #5
    eddb already works with preset price ranges. I checked all commodities and took min_buy_price - 10% and max_sell_price + 10%, excluding the wrong submissions. This actually works pretty well, since the wrong prices are usually off by a factor of ten.
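    That approach is easy to reproduce. A minimal sketch, assuming we already have verified prices per commodity (the function names and sample numbers are hypothetical): widen the lowest verified buy and the highest verified sell by 10%. The post doesn't say whether the range is applied to buy prices, sell prices or both, so this only checks a single price against it. Because typos tend to be off by a factor of ten, a 10% margin leaves plenty of room between genuine prices and the usual errors.

```python
def preset_range(trusted_buys, trusted_sells, margin=0.10):
    """(low, high) = (lowest verified buy - margin, highest verified sell + margin)."""
    return min(trusted_buys) * (1 - margin), max(trusted_sells) * (1 + margin)

def in_range(price, bounds):
    low, high = bounds
    return low <= price <= high

# Hypothetical verified Gold prices.
bounds = preset_range(trusted_buys=[8800, 9000, 9300], trusted_sells=[9100, 9900, 10200])
print(in_range(9500, bounds), in_range(950, bounds), in_range(95000, bounds))
```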
    Creator of EDDB - A site about systems, bodies, stations, commodities and trade routes in Elite: Dangerous.
    Member of EDCD - Elite Dangerous Community Developers - A group of developers that create and maintain 3rd party tools and sites for Elite Dangerous.
    Forum thread | EDCD Discord Server | QuakeNet IRC Channel: #eddb (webchat)

  6. #6
    I recently added price checking to the site after adding an importer for eliteOCR, as mistakes started throwing things off. I looked at the data, found sensible min/max ranges, then asked users to let me know when they found verifiable prices outside the range. After a day of tweaking I ended up with these ranges:

    Commodity  Min  Max (credits)
    Advanced Catalysers 2350 3400
    Agri-Medicines 700 1850
    Algae 25 270
    Aluminium 190 500
    Animal Meat 900 1700
    Animal Monitors 160 490
    Aquaponic Systems 140 440
    Atmospheric Processors 240 600
    Auto-Fabricators 3200 4400
    Basic Medicines 180 480
    Battle Weapons 6000 7500
    Bauxite 45 300
    Beer 65 310
    Bertrandite 1850 3000
    Beryllium 7400 9200
    Bioreducing Lichen 700 1300
    Biowaste 10 390
    Chemical Waste 60 150
    Clothing 170 490
    Cobalt 450 1000
    Coffee 1000 1700
    Coltan 1050 1800
    Combat Stabilisers 2400 3350
    Computer Components 350 750
    Consumer Technology 5950 7600
    Copper 300 700
    Crop Harvesters 1795 3100
    Domestic Appliances 350 750
    Explosives 160 470
    Fish 250 600
    Food Cartridges 30 280
    Fruit and Vegetables 170 480
    Gallite 1500 2400
    Gallium 4500 5900
    Gold 8250 10500
    Grain 80 350
    H.E. Suits 150 440
    Hydrogen Fuel 80 170
    Imperial Slaves 15000 17600
    Indite 1800 2700
    Indium 5200 6700
    Land Enrichment Systems 4000 6100
    Leather 65 310
    Lepidolite 360 900
    Liquor 430 900
    Lithium 1200 2000
    Marine Equipment 3540 4900
    Microbial Furnaces 85 350
    Mineral Extractors 360 830
    Mineral Oil 80 330
    Narcotics 60 300
    Natural Fabrics 240 600
    Non-Lethal Weapons 1300 2250
    Palladium 11800 14500
    Performance Enhancers 5950 7650
    Personal Armor 3600 4800
    Personal Weapons 3600 4900
    Pesticides 100 370
    Platinum 17700 20000
    Polymers 40 290
    Power Generators 350 750
    Progenitor Cells 6000 7600
    Reactive Armour 1600 2500
    Resonating Separators 5000 6700
    Robotics 1400 2250
    Rutile 180 510
    Scrap 20 130
    Semiconductors 600 1200
    Silver 4050 5600
    Slaves 10100 12000
    Superconductors 6000 7600
    Synthetic Fabrics 75 330
    Synthetic Meat 115 410
    Tantalum 3400 4600
    Tea 1100 1900
    Titanium 700 1400
    Tobacco 4100 5600
    Uraninite 600 1200
    Uranium 2200 3200
    Water Purifiers 160 470
    Wine 130 410
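    Applying a fixed table like this takes only a dictionary lookup. A sketch with a handful of rows copied from the table above (the names `PRICE_RANGES` and `check_price` are hypothetical); returning None for commodities without a row lets them be queued for manual review rather than silently accepted or rejected. Prices of 0 (not traded) should presumably be skipped before this check.

```python
# A few rows copied from the table above; the real table covers every commodity.
PRICE_RANGES = {
    "Gold": (8250, 10500),
    "Silver": (4050, 5600),
    "Palladium": (11800, 14500),
    "Platinum": (17700, 20000),
}

def check_price(commodity, price, ranges=PRICE_RANGES):
    """True if price is within the commodity's range, False if outside,
    None if the table has no row for the commodity."""
    bounds = ranges.get(commodity)
    if bounds is None:
        return None
    low, high = bounds
    return low <= price <= high

print(check_price("Gold", 9400), check_price("Gold", 940), check_price("Unlisted Goods", 500))
```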

  7. #7
    I'm not a programmer, but the answer looks intuitive to me:

    1. IF X has Highest_Frequency for all Data Points below Average THEN set X as Low_Value_Marker
    2. IF Y has Highest_Frequency for all Data Points above Average THEN set Y as High_Value_Marker

    Estimate the drop-off curve on both sides of both markers, and flag any data point that deviates from expectations.
    And yes, always cut off the extreme High and Low values.
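    Here is one way those steps could be turned into code. Raw prices rarely repeat exactly, so this sketch bins them first and takes the centre of the most populated bin on each side of the average as the marker. The post doesn't say how to estimate the drop-off curve, so that part is an assumption: the median distance of the points lying beyond each marker, with anything more than k of those distances past the marker flagged. `bin_width` and `k` are hypothetical tuning values.

```python
import statistics
from collections import Counter

def mode_markers(prices, bin_width):
    """Steps 1 and 2: most frequent (binned) price below and above the average."""
    avg = statistics.fmean(prices)
    below = Counter(p // bin_width for p in prices if p < avg)
    above = Counter(p // bin_width for p in prices if p > avg)
    # Marker = centre of the most populated bin on each side of the average.
    low_marker = (below.most_common(1)[0][0] + 0.5) * bin_width
    high_marker = (above.most_common(1)[0][0] + 0.5) * bin_width
    return low_marker, high_marker

def flag_outliers(prices, bin_width, k=4.0):
    """Flag prices far beyond the low or high marker."""
    trimmed = sorted(prices)[1:-1]  # always cut off the extreme high and low values
    low_m, high_m = mode_markers(trimmed, bin_width)
    # Assumed drop-off: median distance of the points lying beyond each marker
    # (one bin width if there are none).
    low_tail = statistics.median([low_m - p for p in trimmed if p < low_m] or [bin_width])
    high_tail = statistics.median([p - high_m for p in trimmed if p > high_m] or [bin_width])
    return [p for p in prices
            if p < low_m - k * low_tail or p > high_m + k * high_tail]

prices = [920, 8800, 8850, 8860, 8870, 8900, 9000,
          9500, 9600, 9610, 9620, 9650, 9700, 96500]
print(flag_outliers(prices, bin_width=100))
```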

  8. #8
    Originally Posted by maddavo
    We can get the standard deviation too - does it mean anything in this context to use in establishing a valid range?
    By default, Quazil's uses 3 standard deviations on the EDDN feed, and that seems to have been working well. There is no correction: prices outside that band are simply filtered out.
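    I don't know how Quazil's tool implements this, but for a live feed like EDDN a 3-sigma band can be kept per commodity with a running mean and variance (Welford's algorithm), so no price history needs to be stored. In this sketch `RunningBand` and `warmup` are hypothetical. Rejected prices are simply dropped and do not update the statistics, and everything is accepted until `warmup` prices have been seen, which means bad values arriving early get absorbed into the band.

```python
import math

class RunningBand:
    """Running mean/variance for one commodity (Welford's algorithm) with a k-sigma band."""

    def __init__(self, k=3.0, warmup=30):
        self.k = k
        self.warmup = max(warmup, 2)  # a sample sd needs at least two prices
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def _learn(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def accept(self, price):
        """True (and update the statistics) if price is inside mean +/- k*sd, else False."""
        if self.n >= self.warmup:
            sd = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if abs(price - self.mean) > self.k * sd:
                return False  # outside the band: filtered out, no correction
        self._learn(price)
        return True

gold_buys = RunningBand(k=3.0, warmup=5)
for p in [9000, 9100, 9200, 9100, 9000, 9150, 910, 9120]:
    print(p, gold_buys.accept(p))
```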
    LET PLAYERS DETERMINE THE MISSION REWARDS AND WORK FOR THEM.

    It is by will alone I set my ship in motion.
    It is by gaming that thoughts acquire speed, the hands acquire shaking, the shaking becomes a warning.
    It is by will alone I set my ship in motion.

  9. #9
    Applying cutoffs seems the right way...
    It's also good to look at the distribution of the spread between buy and sell for each commodity and crop the values outside it too, because a pair can fall inside each price's distribution yet be wrong in the spread. I mean buy and sell values that are each "correct" (they pass the cutoffs) but are too far apart (one too low and the other too high) or inverted.
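    The spread check could be sketched like this, for one commodity at stations that both buy and sell it. It assumes a station normally pays traders less than it charges them, so a sell price above the buy price counts as inverted. The spread cutoff uses the median and median absolute deviation rather than the standard deviation, so the bad pairs don't widen it. `spread_outliers` and `k` are hypothetical.

```python
import statistics

def spread_outliers(pairs, k=3.0):
    """Flag (buy, sell) pairs whose spread is implausible.

    A pair is flagged if it is inverted (sell > buy) or its spread buy - sell is
    more than k median absolute deviations away from the median spread.
    """
    spreads = [b - s for b, s in pairs]
    med = statistics.median(spreads)
    mad = statistics.median(abs(x - med) for x in spreads) or 1  # avoid a zero-width band
    return [(b, s) for (b, s), x in zip(pairs, spreads)
            if s > b or abs(x - med) > k * mad]

# Hypothetical Gold (buy, sell) pairs; every single price passes an 8250-10500 cutoff.
pairs = [(9100, 8900), (9200, 8950), (9150, 8920), (9180, 8960),
         (9120, 8890), (9400, 8300), (8900, 9100)]
print(spread_outliers(pairs))
```

    Every price in the sample pairs would pass the Gold range from the table in post #6, yet the too-wide pair and the inverted pair are both flagged.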

  10. #10
    Originally Posted by Cmdr Thrudd
    I recently added price checking to the site after adding an importer for eliteOCR, as mistakes started throwing things off. I looked at the data, found sensible min/max ranges, then asked users to let me know when they found verifiable prices outside the range. After a day of tweaking I ended up with these ranges:



    Do you apply the limits to the sell price only, or also to the buy price?