Making my own basketball statistics program



Postby dwayne2005 on Thu Jul 06, 2017 4:24 am

I am not a programmer, and I have no education in the area. A year or so ago I needed to make a program to help prove some audio had been doctored (you don't want to go there; I will maintain I am right to the grave, but that is another story). I had GameMaker from a Humble Bundle, so I used that. As a result I had some familiarity with it, and rather than learn everything again in something else, I decided to make the statistics program in GameMaker.

Actually, I had hoped to start it as a statistics program and then mold it into a manager/simulator, so it made sense to use a game-making interface rather than the conventional utility interface a proper, competent programmer might build. (The only other free stats program I know of that has NBA stats is Bball Sports SuperDB.)

The problem is that GameMaker is horrible for this kind of thing, but I'm sticking with it anyway. It's too far along now to go back, and I don't have the patience to learn everything anew in another program. I have created some algorithms that are complex (for my brain, anyway). GameMaker doesn't provide many options for presenting data: it doesn't have readily available input boxes or menus, and the user-created plug-ins are very game-like.

This is what it looks like so far:

[Image: screenshot of the program's main window]

At the moment, it calculates splits from individual game data, just as a site like basketball-reference does. For instance, it has totals, totals per team if the player was traded, home totals, away totals, win totals and loss totals. Instead of monthly splits like basketball-reference, I have a system of customizable day splits (the three input boxes at the top, to the right of the name). You can set it to split the data into blocks of every X days (bottom-left box of the trio), or set the number of days between games that counts as an absence, so the games are split into pre-absence and post-absence periods (bottom-right box).
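A rough sketch of the two day-based splits, for illustration only. It assumes each game is a (date, points) pair already sorted by date, that the X-day blocks are counted from the first game in the data, and that any gap of at least `absence_days` between consecutive games counts as an absence; `split_every_n_days` and `split_around_absences` are hypothetical names. The sample games are Detlef Schrempf's five 1990-91 games against Chicago from the table further down, so the "absences" here are really just schedule gaps.

```python
from datetime import date


def split_every_n_days(games, n_days):
    """Group date-sorted (date, value) games into blocks of n_days,
    counted from the first game in the data."""
    start = games[0][0]
    blocks = {}
    for game_date, value in games:
        block = (game_date - start).days // n_days
        blocks.setdefault(block, []).append(value)
    return blocks


def split_around_absences(games, absence_days):
    """Start a new split whenever consecutive games are at least
    absence_days apart, giving pre-absence and post-absence runs."""
    splits = [[games[0][1]]]
    for (prev_date, _), (game_date, value) in zip(games, games[1:]):
        if (game_date - prev_date).days >= absence_days:
            splits.append([])
        splits[-1].append(value)
    return splits


games = [  # (date, points)
    (date(1990, 11, 30), 12),
    (date(1990, 12, 22), 20),
    (date(1991, 3, 2), 18),
    (date(1991, 3, 23), 9),
    (date(1991, 4, 10), 11),
]
print(split_every_n_days(games, 30))     # {0: [12, 20], 3: [18, 9], 4: [11]}
print(split_around_absences(games, 30))  # [[12, 20], [18, 9, 11]]
```

Empty 30-day blocks (1 and 2 here) simply don't appear, which is probably what you want in a splits table.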

This system only splits the data by days if the data is sorted by date. If it is sorted by another column, such as minutes, it splits the data into ranges of that column instead. For instance, if 5 is entered and the data is sorted by points, it splits the data into every 5 points. So if a player had a 50-point game, the groups would be 46-50, 41-45, 36-40, 31-35 and so on, so you can see the pattern of statistics that corresponds with each group.
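A sketch of the value-range splits, assuming the ranges count down from the highest value in the data, which is how the 50-point example reads (ranges fixed at multiples of 5 would be an equally plausible reading). Values are plain non-negative integers and `split_by_value` is a hypothetical name.

```python
def split_by_value(values, width):
    """Group non-negative stat values into ranges `width` wide, counting
    down from the highest value (50 -> 46-50, 41-45, 36-40, ...)."""
    top = max(values)
    groups = {}
    for v in values:
        steps = (top - v) // width             # ranges below the top one
        upper = top - steps * width
        lower = max(upper - width + 1, 0)      # stats can't go below 0
        groups.setdefault((lower, upper), []).append(v)
    return dict(sorted(groups.items(), reverse=True))  # highest range first


print(split_by_value([50, 47, 44, 31, 12], 5))
# {(46, 50): [50, 47], (41, 45): [44], (31, 35): [31], (11, 15): [12]}
```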

To the right of the trio of boxes are the filter boxes, so you can, for instance, filter out triple-double games. Filters affect the splits: if you filter out games, the splits only take into account the games that remain.
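The key point is the order: filter first, then split and total. A minimal sketch, assuming each game is a dict of box-score totals; the first two games are Schrempf's 18- and 11-point games from the table further down, and the third is made up so that something gets filtered out. `is_triple_double` is a hypothetical helper.

```python
def is_triple_double(game):
    """True when at least three of PTS, TRB, AST, STL, BLK are 10 or more."""
    return sum(game[c] >= 10 for c in ("PTS", "TRB", "AST", "STL", "BLK")) >= 3


games = [
    {"PTS": 18, "TRB": 3, "AST": 3, "STL": 1, "BLK": 0},
    {"PTS": 11, "TRB": 15, "AST": 7, "STL": 0, "BLK": 0},
    {"PTS": 25, "TRB": 12, "AST": 11, "STL": 2, "BLK": 1},  # hypothetical
]

# Filter first; splits and totals then only ever see the remaining games.
kept = [g for g in games if not is_triple_double(g)]
print(len(kept), sum(g["PTS"] for g in kept))  # 2 29
```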

Currently, it can display all the games, and the totals and averages, of every player who played against a given player, but I need to integrate this into the main game data rather than the totals, so that it can be filtered and sorted along with the games. This should also speed it up greatly. At the moment, it takes 26-27 minutes to generate all the players listed at SG or SF whom Jordan played against from 1985 to 2003 in the regular season. The time grows far faster than the amount of data: a single season for all positions takes perhaps 10 seconds, but all seasons could take hours. When I integrate it into the main data, it will be part of the same loop as the main data, so I have hopes of cutting the time down. But it's still very slow to work with when handling large amounts of data.
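"Part of the same loop" can be pictured as a single pass over every box-score line that accumulates into a dictionary keyed by opponent, instead of re-scanning the seasons once per opponent. A minimal sketch; the field names (`game_id`, `team`, `player`, `PTS`) and the map from each game to the subject's team are assumptions, and the sample lines reuse figures from the 1990-91 table further down plus one placeholder Chicago line.

```python
from collections import defaultdict


def opponent_totals(box_scores, subject_team_by_game):
    """Single pass over every box-score line: a line counts if the subject
    played in that game and the line belongs to the other team."""
    totals = defaultdict(lambda: {"games": 0, "PTS": 0})
    for line in box_scores:
        subject_team = subject_team_by_game.get(line["game_id"])
        if subject_team is None or line["team"] == subject_team:
            continue  # subject didn't play this game, or it's a teammate
        entry = totals[line["player"]]
        entry["games"] += 1
        entry["PTS"] += line["PTS"]
    return dict(totals)


box_scores = [
    {"game_id": "910302INDCHIH", "team": "IND", "player": "Detlef Schrempf", "PTS": 18},
    {"game_id": "901222CHIINDA", "team": "IND", "player": "Detlef Schrempf", "PTS": 20},
    {"game_id": "910111CHIATLA", "team": "ATL", "player": "Dominique Wilkins", "PTS": 23},
    {"game_id": "910111CHIATLA", "team": "CHI", "player": "Chicago player", "PTS": 0},  # placeholder
]
subject_team_by_game = {gid: "CHI" for gid in ("910302INDCHIH", "901222CHIINDA", "910111CHIATLA")}
print(opponent_totals(box_scores, subject_team_by_game))
```

Each line is touched once, so the cost grows with the number of lines rather than with lines times opponents.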

The second line of data in the main window (the advanced stats) is freely customizable in the program's .ini file. I've created an algorithm that turns formulas into the functions used in the program. This is one of the reasons the data takes so long to process: it evaluates the formulas each time rather than simply pulling out pre-computed data. At the moment, the customization is a bit 'weird': every number has to be 5 characters long, like $5... for 5, $50.. for 50 or $5.5. for 5.5. This was to simplify the algorithm (all the variable names are also 5 characters long, e.g. PLPTS, TMPTS, OPPTS). I have other priorities than fixing this. And while it can evaluate expressions with parentheses, it fails when more than one set of parentheses is nested inside another, and I spent hours trying to figure out why before giving up!
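Nested parentheses are usually handled with a recursive-descent evaluator: a parenthesised sub-expression is parsed by calling the same expression routine again, so any depth works. A minimal sketch of that technique for the fixed-width format described above, assuming operators and parentheses are single characters, every other token is exactly 5 characters, and a number's padding dots can be stripped from the right. `tokenize` and `evaluate` are hypothetical names, not the program's own functions.

```python
OPERATORS = set("+-*/()")


def tokenize(formula):
    """Split into single-character operators and 5-character operands."""
    tokens, i = [], 0
    while i < len(formula):
        if formula[i] == " ":
            i += 1
        elif formula[i] in OPERATORS:
            tokens.append(formula[i])
            i += 1
        else:
            tokens.append(formula[i:i + 5])  # e.g. PLPTS or $5.5.
            i += 5
    return tokens


def evaluate(formula, variables):
    """Evaluate +, -, *, / and parentheses nested to any depth
    (no unary minus in this sketch)."""
    tokens = tokenize(formula)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("formula ended unexpectedly")
        pos += 1
        return tokens[pos - 1]

    def expression():  # term (('+' | '-') term)*
        value = term()
        while peek() in ("+", "-"):
            op, rhs = take(), term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():  # factor (('*' | '/') factor)*
        value = factor()
        while peek() in ("*", "/"):
            op, rhs = take(), factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor():  # number | variable | '(' expression ')'
        token = take()
        if token == "(":
            value = expression()  # recursion handles any nesting depth
            if take() != ")":
                raise ValueError("missing ')'")
            return value
        if token.startswith("$"):
            return float(token[1:].rstrip("."))  # "$5.5." -> 5.5
        return float(variables[token])

    result = expression()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result


stats = {"PLPTS": 25, "PLFGA": 15}
print(evaluate("((PLPTS+$5...)*($2...))/PLFGA", stats))  # 4.0
```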

The program also generates graphs. This one is for LeBron James in 2015-16:

[Image: graph of LeBron James's scoring against points differential, 2015-16]

This shows the correlation of scoring with points differential (the margin by which the game was won or lost), with scoring running from high on the left to low on the right. It is set to the mean method, which averages the values out, so points below the line indicate a below-average points differential and points above it an above-average one. It is colour coded, and it is actually a three-way comparison, but two of the three 'ways' are set to differential here, so there are just two colours: yellow (differential above 0) and white (below 0). Some yellow falls below the mean. The colour gradient is determined by how the main data is sorted and whether splits are set, but here it shows the range from yellow to white.
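Read that way, the mean method plots each game's differential relative to the average differential, while the two-colour scheme uses the raw sign, which is why a won game (yellow) can still sit below the mean line. A sketch under that reading, modelling only the two-colour case; the sample is Schrempf's five games from the table further down rather than LeBron's season, and `mean_method` is a hypothetical name.

```python
from statistics import mean


def mean_method(games):
    """games: (points, differential) pairs. Returns (points, height, colour)
    rows ordered from highest to lowest scoring, where height is the
    differential minus the average differential (0 = the mean line) and the
    colour depends only on whether the game was won."""
    average = mean(diff for _, diff in games)
    rows = []
    for pts, diff in sorted(games, key=lambda g: g[0], reverse=True):
        colour = "yellow" if diff > 0 else "white"
        rows.append((pts, diff - average, colour))
    return rows


games = [(18, 21), (20, -10), (12, -29), (9, -14), (11, -5)]
for pts, height, colour in mean_method(games):
    print(pts, round(height, 1), colour)
```

In this sample the mean differential is -7.4, so the 11-point game, lost by 5, plots at +2.4: white above the line, the mirror image of the yellow-below-the-line effect in the graph.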

This shows that LeBron scoring above average had very little correlation with win margin, but his scoring efficiency (points/FGA), regardless of how many shots he took, had a much stronger correlation.
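The graphs show this visually. If you wanted a single number for "much stronger correlation", one common choice (not something the program is described as doing) is Pearson's r, computed once for points against differential and once for points/FGA against differential. A self-contained sketch; the inputs are Schrempf's five games from the table further down, purely to show the calls, since five games are far too few to mean anything.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


pts = [18, 20, 12, 9, 11]
fga = [9, 10, 5, 6, 6]
diff = [21, -10, -29, -14, -5]
r_points = pearson(pts, diff)
r_efficiency = pearson([p / f for p, f in zip(pts, fga)], diff)
print(round(r_points, 2), round(r_efficiency, 2))
```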

This shows the correlation with minutes played:

[Image: graph of LeBron James's minutes played against points differential, 2015-16]

The differential was worse the longer LeBron was on court. This isn't because LeBron was bad; it is because star players are rested in blowouts and spend much more time on court during close, competitive games. This pattern holds for every star player I've looked at so far, from every era. (So if I were ever to branch it out into a historical manager, I'd make players play less in blowouts and more in close games and games with reachable deficits, rather than simply setting their minutes to the player's average.)

EDIT: I have adapted the program to run all the opponents' totals in the same loop, but it throws errors when processing multiple seasons with player-specific positions, so I couldn't run a speed test. It should be much faster now, but with all positions included over a career it ran for 45 minutes before I gave up waiting. However, single-season outputs take about 30 seconds, if that.

Here is an example of what I'm talking about, from all of the players who played against Jordan's team in 1990-91. The "V/" prefix needs to be changed to something else. The win-loss record under the name implies it is Jordan's, but it is in fact the opponent's, so a 1-4 record is 4-1 for Jordan. I have yet to create a system to show Jordan's averages alongside these figures. (It is wrapping at the wrong line length, but anyway...)

DATE/GAME                  DIFF     +/-     MINS    FGM    FGA    3PM    3PA    FTM    FTA    ORB    DRB    TRB    AST    STL    BLK    TOV    PFS    PTS
                             ON     OFF    MINS%   FGM%   FGA%   3PM%   3PA%   FTM%   PTS%   ORB%   DRB%   TRB%   AST%   STL%   BLK%   AST:   FTA:   SCORE EFF
===============================================================================================================================================================
910302INDCHIH               +21       -    24:--      5      9      0      0      8     13      1      2      3      3      1      0      3      0     18
 Detlef Schrempf           -.--    -.--     50.0   55.6   11.0    0.0    0.0   61.5   44.4    3.0    4.7    3.9    3.1    0.9    0.0    1.0    0.0   2.00 +0.40
901222CHIINDA               -10       -    30:--      6     10      0      0      8     10      3      2      5      4      2      0      1      6     20
 Detlef Schrempf           -.--    -.--     62.5   60.0   11.1    0.0    0.0   80.0   40.0    6.5    4.8    5.7    3.9    1.8    0.0    4.0   44.0   2.00 +0.77
901130CHIINDA               -29       -    26:--      4      5      0      0      4      4      1      3      4      2      1      0      3      1     12
 Detlef Schrempf           -.--    -.--     54.2   80.0    6.1    0.0    0.0    100   33.3    2.3    9.4    5.3    1.9    1.0    0.0    0.7    5.0   2.40 +1.32
910323CHIINDA               -14       -    16:--      3      6      0      0      3      3      0      2      2      3      0      0      3      3      9
 Detlef Schrempf           -.--    -.--     33.3   50.0    7.6    0.0    0.0    100   33.3    0.0    6.3    3.1    3.1    0.0    0.0    1.0    8.5   1.50 -0.01
910410INDCHIH                -5       -    29:--      2      6      0      0      7      8      6      9     15      7      0      0      2      3     11
 Detlef Schrempf           -.--    -.--     60.4   33.3    7.0    0.0    0.0   87.5   63.6   13.0   23.7   17.9    7.1    0.0    0.0    3.5    6.3   1.83 +0.77
---------------------------------------------------------------------------------------------------------------------------------------------------------------
V/Detlef Schrempf          -7.4       -   125:00    4.0    7.2    0.0    0.0    6.0    7.6    2.2    3.6    5.8    3.8    0.8    0.0    2.4    2.6   14.0
 1-4                       -.--    -.--    52.08  55.56   8.59   0.00   0.00  78.95  42.86   5.50   9.63   7.49   3.82   0.79   0.00   1.58   7.18  1.944 +.657
===============================================================================================================================================================
910320CHIATLA               -22       -    30:--      3     13      1      5      2      2      2      2      4      7      0      0      1      2      9
 Doc Rivers                -.--    -.--     62.5   23.1   13.3   20.0   38.5    100   22.2    3.7    7.7    5.0    6.5    0.0    0.0    7.0    4.8   0.69 -0.46
910310ATLCHIH               -35       -    24:--      1      5      0      2      2      2      0      2      2      2      2      0      0      1      4
 Doc Rivers                -.--    -.--     50.0   20.0    5.5    0.0   40.0    100   50.0    0.0    5.1    2.2    1.8    2.0    0.0    2.0    3.0   0.80 -0.17
910212CHIATLA                -9       -    33:--      2      7      0      2      2      2      1      1      2      4      0      0      1      1      6
 Doc Rivers                -.--    -.--     68.8   28.6    8.2    0.0   28.6    100   33.3    2.4    3.7    2.9    4.0    0.0    0.0    4.0    3.5   0.86 -0.51
910111CHIATLA                -3       -    32:--      8     15      2      5      0      0      0      1      1      4      0      0      0      4     18
 Doc Rivers                -.--    -.--     66.7   53.3   19.2   40.0   33.3    0.0    0.0    0.0    2.8    1.3    4.6    0.0    0.0    4.0    7.7   1.20 -0.04
910118ATLCHIH                +9       -    34:--      7     13      4      5      0      0      0      5      5      5      4      1      1      2     18
 Doc Rivers                -.--    -.--     70.8   53.8   17.8   80.0   38.5    0.0    0.0    0.0   11.1    6.4    5.5    4.1    1.0    5.0    3.0   1.38 -0.22
---------------------------------------------------------------------------------------------------------------------------------------------------------------
V/Doc Rivers              -12.0       -   153:00    4.2   10.6    1.4    3.8    1.2    1.2    0.6    2.2    2.8    4.4    1.2    0.2    0.6    2.0   11.0
 1-4                       -.--    -.--    63.75  39.62  12.47  36.84  35.85    100  10.91   1.35   6.36   3.54   4.40   1.26   0.21   7.33   4.04  1.038 -.204
===============================================================================================================================================================
910111CHIATLA                -3       -    42:--      6     19      1      3     10     12      4      8     12      1      3      0      2      2     23
 Dominique Wilkins         -.--    -.--     87.5   31.6   24.4   33.3   15.8   83.3   43.5    9.8   22.2   15.6    1.1    3.3    0.0    0.5    4.6   1.21 -0.03
910118ATLCHIH                +9       -    40:--     12     23      2      7      8     11      1      8      9      3      2      1      1      1     34
 Dominique Wilkins         -.--    -.--     83.3   52.2   31.5   28.6   30.4   72.7   23.5    3.0   17.8   11.5    3.3    2.0    1.0    3.0    2.5   1.48 -0.12
910320CHIATLA               -22       -    36:--     11     19      1      2      5      6      1      6      7      4      1      1      3      0     28
 Dominique Wilkins         -.--    -.--     75.0   57.9   19.4   50.0   10.5   83.3   17.9    1.9   23.1    8.8    3.7    1.1    1.1    1.3    0.0   1.47 +0.47
910310ATLCHIH               -35       -    30:--      6     17      0      2      2      2      5      3      8      3      0      3      3      0     14
 Dominique Wilkins         -.--    -.--     62.5   35.3   18.7    0.0   11.8    100   14.3    9.3    7.7    8.6    2.6    0.0    2.9    1.0    0.0   0.82 -0.16
910212CHIATLA                -9       -    42:--     13     29      2      4      9      9      5      6     11      3      1      1      3      1     37
 Dominique Wilkins         -.--    -.--     87.5   44.8   34.1   50.0   13.8    100   24.3   12.2   22.2   16.2    3.0    1.1    1.1    1.0    3.5   1.28 -0.08
---------------------------------------------------------------------------------------------------------------------------------------------------------------
V/Dominique Wilkins       -12.0       -   190:00    9.6   21.4    1.2    3.6    6.8    8.0    3.2    6.2    9.4    2.8    1.4    1.2    2.4    0.8   27.2
 1-4                       -.--    -.--    79.17  44.86  25.18  33.33  16.82  85.00  25.00   7.17  17.92  11.87   2.80   1.47   1.26   1.17   3.26  1.271 +.073
===============================================================================================================================================================
910312CHIMINA               -32       -    18:--      2      4      0      0      3      3      3      3      6      3      1      1      0      2      7
 Doug West                 -.--    -.--     37.5   50.0    4.5    0.0    0.0    100   42.9    7.5    7.9    7.7    3.0    1.0    1.0    3.0    5.2   1.75 +0.65
---------------------------------------------------------------------------------------------------------------------------------------------------------------
V/Doug West               -32.0       -    18:00    2.0    4.0    0.0    0.0    3.0    3.0    3.0    3.0    6.0    3.0    1.0    1.0    0.0    2.0    7.0
 0-1                       -.--    -.--    37.50  50.00   4.55   0.00   0.00    100  42.86   7.50   7.89   7.69   3.00   0.95   0.95   3.00   5.20  1.750 +.655
===============================================================================================================================================================
910328NJNCHIH               -34       -    14:--      4      7      0      1      2      4      1      1      2      2      0      0      0      3     10
 Drazen Petrovic           -.--    -.--     29.2   57.1    8.1    0.0   14.3   50.0   20.0    2.1    2.0    2.1    2.0    0.0    0.0    2.0    7.3   1.43 +0.37
910216CHINJNA               -12       -    25:--      7     10      1      1      2      2      1      2      3      1      1      0      3      2     17
 Drazen Petrovic           -.--    -.--     52.1   70.0   11.6    100   10.0    100   11.8    2.0    4.7    3.2    0.9    1.0    0.0    0.3    4.4   1.70 +0.78
---------------------------------------------------------------------------------------------------------------------------------------------------------------
V/Drazen Petrovic         -23.0       -    39:00    5.5    8.5    0.5    1.0    2.0    3.0    1.0    1.5    2.5    1.5    0.5    0.0    1.5    2.5   13.5
 0-2                       -.--    -.--    40.63  64.71   9.88  50.00  11.76  66.67  14.81   2.04   3.26   2.63   1.43   0.47   0.00   1.00   5.67  1.588 +.595
===============================================================================================================================================================
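Working backwards from the Schrempf block, the V/ summary line appears to be built like this: DIFF and the counting stats averaged per game, minutes summed, MINS% as total minutes over 48 minutes per game, FG% as total FGM over total FGA, and the win-loss record taken from the sign of each game's DIFF, from the opponent's side (so Jordan's record is the reverse). That is an inference from the numbers, not the program's actual code; the sketch below reproduces the Schrempf summary under those assumptions, and `summary_row` is a hypothetical name.

```python
def summary_row(games, minutes_per_game=48):
    """Rebuild a V/ summary line from per-game lines (inferred rules)."""
    n = len(games)
    total = {k: sum(g[k] for g in games) for k in games[0]}
    wins = sum(1 for g in games if g["DIFF"] > 0)
    return {
        "DIFF": total["DIFF"] / n,                        # averaged
        "MINS": total["MINS"],                            # summed
        "MINS%": round(100 * total["MINS"] / (minutes_per_game * n), 2),
        "PTS": total["PTS"] / n,                          # averaged
        "FG%": round(100 * total["FGM"] / total["FGA"], 2),
        "W-L": (wins, n - wins),                          # opponent's record
    }


schrempf = [  # DIFF, MINS, FGM, FGA, PTS from the five 1990-91 games above
    {"DIFF": 21, "MINS": 24, "FGM": 5, "FGA": 9, "PTS": 18},
    {"DIFF": -10, "MINS": 30, "FGM": 6, "FGA": 10, "PTS": 20},
    {"DIFF": -29, "MINS": 26, "FGM": 4, "FGA": 5, "PTS": 12},
    {"DIFF": -14, "MINS": 16, "FGM": 3, "FGA": 6, "PTS": 9},
    {"DIFF": -5, "MINS": 29, "FGM": 2, "FGA": 6, "PTS": 11},
]
row = summary_row(schrempf)
print(row)
# {'DIFF': -7.4, 'MINS': 125, 'MINS%': 52.08, 'PTS': 14.0, 'FG%': 55.56, 'W-L': (1, 4)}
wins, losses = row["W-L"]
print("Jordan:", (losses, wins))  # Jordan: (4, 1)
```

The same rules also match the other blocks' MINS% figures (Rivers 153 minutes over 5 games gives 63.75, Wilkins 190 gives 79.17).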


EDIT: It has gone from 26-27 minutes to 14 minutes to generate totals for Jordan's SG and SF rivals over his career (including players listed at either as a secondary position). In addition, I've added an option to set a minimum number of games, so that it processes, for instance, only players he played against in 10 or more games. This speeds up the generation considerably: for all positions it takes under 10 minutes, and for just Jordan's position it would take rather less than that.
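The minimum-games option amounts to a cheap counting pass before the expensive per-opponent work, so most of the processing is skipped for players with only a handful of meetings. A minimal sketch, assuming matchups are (opponent, game_id) pairs; the counts below are the per-player game counts from the 1990-91 table above, with made-up game IDs, and `rivals_with_min_games` is a hypothetical name.

```python
from collections import Counter


def rivals_with_min_games(matchups, min_games):
    """Count games per opponent first, then keep only the opponents who
    reach the threshold; only they go on to full processing."""
    counts = Counter(opponent for opponent, _ in matchups)
    return {name: n for name, n in counts.items() if n >= min_games}


games_per_opponent = {"Detlef Schrempf": 5, "Doc Rivers": 5,
                      "Dominique Wilkins": 5, "Doug West": 1,
                      "Drazen Petrovic": 2}
matchups = [(name, f"game{i}") for name, n in games_per_opponent.items()
            for i in range(n)]
print(rivals_with_min_games(matchups, 5))
```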
dwayne2005
 
Posts: 153
Joined: Fri Aug 29, 2008 2:00 pm


Postby shadowgrin on Sun Jul 09, 2017 11:13 pm

The answer is clear, man, but you don't want to accept it.

Find better software than Gamemaker.

You have these interesting ideas, only to be hindered by the limitations of GameMaker.
HE'S USING HYPNOSIS!
JaoSming2KTV wrote:its fun on a bun
shadowgrin
NLSC's Jefferson Davis/The Questioneer
 
Posts: 22596
Joined: Thu Dec 12, 2002 6:21 am
Location: In your mind


Postby dwayne2005 on Mon Jul 10, 2017 3:16 am

It's going to be GameMaker, and I know I have made 'rookie' mistakes that are making it slower than it should be (if processing the seasons all together takes longer than processing them one at a time, then clearly there is something I can and should fix). One thing I've noticed is that GameMaker Studio never uses more than 35% of my laptop's CPU, so I gather it doesn't work on both cores (although I once read something elsewhere that contradicted that). But you have to understand that I had never used Excel, and would never pay the kind of money they want for programs like that. I had to use OpenOffice, which was not only restricted to a single core but also coped worse with far lighter demands than my current program in GameMaker. So by contrast, GameMaker seems amazing.

I am a rookie programmer and am not going to make any money off this, so I'm not buying alternatives like Unity. I also need some 'feedback' as I'm doing things, and I doubt I'd get along well with raw programming at this stage. Small steps: maybe I'll reach a point where other tools become more approachable, but I'm not yet at the stage of dumping GameMaker. As I'm offline most of the time, the GameMaker offline manuals have also been a major help and a major incentive to keep using it.

I doubt I'd ever have gotten as far as I have without it.

At the moment, I only have game data organized by player. So the program takes all of the subject's data, finds all the matching game data within the season data, tallies it up and then exports the result. It processes this game data once so that later output is quicker.
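If finding a subject's games means scanning the whole season's data once per game, a common fix is to index the season data once, up front. This is only a guess at where time might be going, not necessarily the speed-up mentioned in the next paragraph; the field names and `build_indexes` are hypothetical, and the box-score lines are made up.

```python
from collections import defaultdict


def build_indexes(season_lines):
    """One pass over the season: game_id -> all lines in that game, and
    player -> the game_ids he appears in. Built once, reused for every subject."""
    by_game = defaultdict(list)
    games_by_player = defaultdict(list)
    for line in season_lines:
        by_game[line["game_id"]].append(line)
        games_by_player[line["player"]].append(line["game_id"])
    return by_game, games_by_player


season = [  # hypothetical box-score lines
    {"game_id": "G1", "player": "A", "PTS": 10},
    {"game_id": "G1", "player": "B", "PTS": 7},
    {"game_id": "G2", "player": "C", "PTS": 12},
    {"game_id": "G3", "player": "A", "PTS": 15},
    {"game_id": "G3", "player": "C", "PTS": 4},
]
by_game, games_by_player = build_indexes(season)
for gid in games_by_player["A"]:  # player A's games, no season scan needed
    print(gid, [line["player"] for line in by_game[gid]])
# G1 ['A', 'B']
# G3 ['A', 'C']
```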

Actually, I think I've just figured out a way to massively speed it up that I'd been overlooking. A rookie mistake, if so; I need to check it over. I have been too busy looking for new features to add, so optimization has been secondary.

I have a bunch of optimizations in mind, but when you're new to programming you don't think about these things before you start, so some parts need to be rewritten. I'll get there eventually.


Postby dwayne2005 on Mon Jul 10, 2017 5:31 am

So how is it useful to accumulate the data and then re-total it with some of the content filtered out? Actually, the filters and the splits are somewhat interchangeable. With the splits, I can, for instance, split steals into groups of 1 and show the tallies for each additional steal a player gets. The most interesting stats there tend to be points differential and +/-. At the moment, I can get it to display all the players, but that really bogs down the program. I don't have the 2016-17 data yet, but in 2015-16 the average +/- (I think, as opposed to the differential) decreases for games with more than 5 steals, I think. That is, the value increases with steals and then suddenly begins to decrease past a certain point. There is evidence the same is true for blocks, but the drop-off comes at a higher threshold, where the pool of games is too thin to draw any conclusion.
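Splitting by steals in groups of 1 and looking at +/- comes down to grouping games by steal count and averaging +/- within each group. Reporting the number of games next to each average makes the "pool is too thin" problem visible at a glance. A sketch with made-up numbers; `average_by_count` and the `STL`/`PM` field names are hypothetical.

```python
from collections import defaultdict


def average_by_count(games, split_stat, target_stat):
    """Group games by each value of split_stat and return
    {value: (mean of target_stat, number of games)}, lowest value first."""
    groups = defaultdict(list)
    for g in games:
        groups[g[split_stat]].append(g[target_stat])
    return {k: (sum(v) / len(v), len(v)) for k, v in sorted(groups.items())}


games = [  # hypothetical games
    {"STL": 1, "PM": 4}, {"STL": 1, "PM": -2},
    {"STL": 5, "PM": 9}, {"STL": 7, "PM": -3},
]
print(average_by_count(games, "STL", "PM"))
# {1: (1.0, 2), 5: (9.0, 1), 7: (-3.0, 1)}
```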

What does this mean? On the whole, blocks seem to correspond on average with a higher points differential than steals. But this data may be misleading: it is heavily position dependent (the value of steals drops off at a much higher level for guards than for centres, for instance), and higher numbers correspond with more minutes, which correspond with closer games (so a close game may in fact show a higher advantage; then again, it takes bench players into account as well, which may nullify that effect). But it gets you thinking about the value of blocks and steals, and the intricate ways those stats may influence a game. For instance, as is already well known, blocks might on average lead to lower opponent field goal shooting (excluding the blocked shots themselves, I assume), which may compensate for the fact that, unlike steals, not every block results in a change of possession. But how often do you think about the fact that steals may create a faster game and blocks a slower one? There is nothing necessarily wrong with a faster game, except that the team-foul penalty threshold is fixed at a set number of fouls regardless of pace. On average, I imagine steals would correspond with an increased rate of team fouls and of opponents being sent to the free throw line, and blocks with a decreased rate. But that's an unproven theory at the moment. (It is also possible that either stat increases fouls, especially if taller players attempt steals or smaller players attempt shot blocks, but from recollection the data didn't show a strong, definite indication of that.)

The data actually shows it is far more valuable to have a 10-assist player than a 10-rebound player. The average for a 10+ rebound game was, I think, about +1.0, while 10+ assist games were at about +2.0, which I believe was an even higher advantage than 7+ blocks. For a rebound-and-assist double-double, I think the number came to about +2.48, and adding 10+ points only increases it to about +2.50, so a triple-double is hardly worth anything over an assist-and-rebound double-double, with assists being by far the most valuable attribute. That said, really high-scoring games (probably 50+) did seem to have advantages in excess of 10+ assist games.

This data covers only one season, in which certain players with certain attributes were on more dominant teams, which probably distorts the data heavily, so take it with a grain of salt.


Postby dwayne2005 on Tue Jul 11, 2017 2:17 am

Two of my optimization ideas appear to have had no effect. Even though it may sound otherwise, it is workable, depending on how enthusiastic you are about the numbers: you just need a little patience, or you can use the options that limit the output over entire careers. There is one big optimization I have in mind, which I think will be my next focus, but I also have to figure out a system to calculate players' stats relative to their opponents, which as it stands is only halfway there.

Output: the highest-scoring rivalries against Jordan over his career, including playoffs, with a minimum of 10 games played. This output took about 10-15 minutes to generate with the limit set to 10 games. It takes longer because the playoffs add about 25% more data, and extra data increases the processing time more than proportionally.

[Image: highest-scoring rivalries against Jordan over his career, including playoffs, minimum 10 games]

In case you are wondering about the programs installed at the bottom of the screen: Corel Painter Essentials 5, OBS, Amiga Forever, AmiKit, Warblade, DotBot, Directory Opus and OSU are among them. ;)

This is how it displays the rivalry data on startup. When the data is sorted (at the moment, it can only sort by points), it splits the games and the averages into two sections and sorts those sections separately.

[Image: the rivalry data as displayed on startup]

GameMaker's sorts seem to scramble the order of rows with identical values: they do not keep the order from the previous sort, which is a messed-up way of doing things. I have fixed the main sorts by adding a number to the columns so that ties reflect the previous sort, but I haven't integrated that at startup yet, so at the moment two sorts need to be run before it presents a sensible order. I'd very much like the games here to be in order, but it's one of the things to fix up later on. I do have a solution already; it just needs adapting.
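The fix described above is the standard trick for making an unstable sort behave stably: sort on the value first and on the row's position from the previous sort second, so rows with equal values can never swap. Python's built-in sort is already stable, so the explicit tiebreaker here is only to show the technique. The rows are the V/ averages from the 1990-91 table earlier in the thread, where Rivers and Wilkins tie on DIFF; `sort_keeping_ties` is a hypothetical name.

```python
def sort_keeping_ties(rows, key, descending=True):
    """Sort rows on a numeric column; ties keep their previous relative
    order because the previous position is the secondary sort key."""
    sign = -1 if descending else 1
    order = sorted(range(len(rows)), key=lambda i: (sign * rows[i][key], i))
    return [rows[i] for i in order]


averages = [  # previously sorted by name
    {"name": "Detlef Schrempf", "DIFF": -7.4, "PTS": 14.0},
    {"name": "Doc Rivers", "DIFF": -12.0, "PTS": 11.0},
    {"name": "Dominique Wilkins", "DIFF": -12.0, "PTS": 27.2},
    {"name": "Doug West", "DIFF": -32.0, "PTS": 7.0},
    {"name": "Drazen Petrovic", "DIFF": -23.0, "PTS": 13.5},
]
print([r["name"] for r in sort_keeping_ties(averages, "DIFF")])
# ['Detlef Schrempf', 'Doc Rivers', 'Dominique Wilkins', 'Drazen Petrovic', 'Doug West']
```

Sorting the games section and the averages section separately is then one call per section.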

The 'advanced stats' on the second line are not final. 'On' and 'Off' mean the points advantage or deficit per minute while the player was on court and off court, so they compare how well the team performed with him against how it performed with the bench. The 'Off' value is relative to his 'On' value: for instance, if a player has +.500 (On) and +.600 (Off), the bench had a deficit of -.100 per minute. These calculations are the only major ones that are not customizable from the .ini file.
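The On/Off arithmetic can be written out as below. It assumes the team's final margin splits into the player's on-court +/- plus the margin while he sat, a 48-minute game with no overtime, and that he sat for at least part of it; `on_off` and its arguments are hypothetical. The example reproduces the +.500 / +.600 / -.100 case from the text.

```python
def on_off(plus_minus, minutes_on, team_margin, game_minutes=48):
    """Return (on, off, bench) margins per minute.

    on:    team margin per minute with the player on court
    bench: team margin per minute with him off court
    off:   on - bench, the relative value shown in the 'Off' column
    Assumes minutes_on < game_minutes (he sat for part of the game).
    """
    minutes_off = game_minutes - minutes_on
    on = plus_minus / minutes_on
    bench = (team_margin - plus_minus) / minutes_off
    return on, on - bench, bench


# 38 minutes at +19 is +.500 per minute. The team won by 18, so the bench
# was -1 over its 10 minutes, i.e. -.100 per minute.
on, off, bench = on_off(plus_minus=19, minutes_on=38, team_margin=18)
print(round(on, 3), round(off, 3), round(bench, 3))  # 0.5 0.6 -0.1
```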

The rebound percentages use my own formula, based on missed shots rather than the total rebounds that actually occurred. They are not per-minute percentages reflecting only the time he was on court. You could estimate that, or derive it from the game logs, but I'm a long way off parsing game logs for more exact data, although that would be a dream.
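The per-minute version mentioned above is a small change: scale the available rebounds by the share of the game the player was on court, which has the same shape as Basketball-Reference's TRB% formula. A sketch, assuming "available" rebounds are simply the missed shots the percentage is already based on (the program's exact definition isn't given) and a 48-minute game; the function names are hypothetical.

```python
def rebound_pct(rebounds, available):
    """Whole-game share: rebounds over all available (missed-shot) rebounds."""
    return 100 * rebounds / available


def on_court_rebound_pct(rebounds, available, minutes, game_minutes=48):
    """Share of the rebounds available while he was on court, estimated by
    scaling the game's available rebounds by his fraction of the minutes."""
    return 100 * rebounds / (available * minutes / game_minutes)


print(rebound_pct(10, 50), on_court_rebound_pct(10, 50, 24))  # 20.0 40.0
```

This is still an estimate: it assumes rebounds become available evenly across the game, which is exactly what parsing the game logs would avoid.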

The ratio under personal fouls is an experimental ratio of opponent FTAs to the player's personal fouls. The scoring efficiency is just PTS/FGA at the moment, and I am debating whether it should be something like PTS/(FGA+(FTA/2)). At the moment, the formula punishes players like Larry Bird extraordinarily: despite how good and efficient he was, Bird just didn't get to the free throw line or take 3-pointers often enough, and he pays the cost of that. It doesn't really make sense to leave FTAs out of the calculation, since free throws are not bonus points: trips to the line use up possessions too. But I haven't changed it yet. The +/- value right next to it is another relative measurement, comparing his efficiency with the rest of his team's (by default; you can also set it to measure relative to the opposing team). Despite being a 50-40-90 player against Jordan, Bird in all 34 games against Jordan was just +.003 over the rest of his team, excluding his own shots. I will prioritize changing this calculation to account for free throws more accurately, although I don't expect it to be final.
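The two candidate efficiency formulas and the relative measure can be compared side by side. A sketch with hypothetical function names; `relative_to_team` assumes the +/- figure is a simple difference between his PTS/FGA and the rest of the team's PTS/FGA with his own points and shots removed, which is one reading of "excluding his own shots". For comparison, the widely used true shooting percentage weights FTA by 0.44 rather than 0.5: PTS / (2 × (FGA + 0.44 × FTA)).

```python
def pts_per_fga(pts, fga):
    """Current scoring efficiency: PTS / FGA."""
    return pts / fga


def pts_per_attempt(pts, fga, fta):
    """Candidate: PTS / (FGA + FTA/2), so trips to the line count as attempts."""
    return pts / (fga + fta / 2)


def relative_to_team(pts, fga, team_pts, team_fga):
    """His PTS/FGA minus the rest of the team's PTS/FGA, with his own
    points and shots removed from the team totals."""
    return pts_per_fga(pts, fga) - pts_per_fga(team_pts - pts, team_fga - fga)


# Schrempf's 910302 game from the table: 18 points, 9 FGA, 13 FTA.
print(pts_per_fga(18, 9))                     # 2.0, the 2.00 in the table
print(round(pts_per_attempt(18, 9, 13), 3))   # 1.161
print(round(relative_to_team(18, 9, 100, 80), 3))  # 0.845 with made-up team totals
```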

I'd like space for a game rating column.

I know the interface is bad, but this is actually as good as it gets for me. I am partially red/green colour blind, so the visual side of things has always been a weak point for me. Even though that theoretically shouldn't matter much for this kind of presentation, I guess I have just learned not to pride myself much on these things. I'm much more concerned about features than about the user interface.

Advantages in 2015-16 league-wide (regular season only):

10+ rebounds: +.101 points per minute differential on court and +1.8 game differential.
10+ assists: +.192 / +4.3
Double-double in assists and rebounds: +.348 / +10.0 (my earlier recollection was off here)
Triple double: +.356 / +10.3

So there's not much difference between an assist-and-rebound double-double and a triple-double, but it only occurred to me after posting that the double-double figure included the triple-doubles. Filtering out games with a double in points, I get only 5 games from 3 players (Ricky Rubio, Draymond Green and Rajon Rondo), so not enough for a clear indication of how much that extra double in points is worth, but based on those 5 games the numbers seem very consistent: +.233 and +6.0. So the extra double in points may be worth an extra +.123 per minute (I think there was some indication that 10-20 points tended to be more valuable than 20-40 points, but I am not sure).
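The correction amounts to swapping an inclusive filter for an exclusive one and then differencing the two results. A sketch with hypothetical field names and made-up games; the subtraction at the end uses the 2015-16 figures above (triple-double minus assist-and-rebound double-double without double-digit points).

```python
def assist_rebound_dd(g):
    """Inclusive: 10+ assists and 10+ rebounds (triple-doubles included)."""
    return g["AST"] >= 10 and g["TRB"] >= 10


def assist_rebound_dd_only(g):
    """Exclusive: the same, but with fewer than 10 points."""
    return assist_rebound_dd(g) and g["PTS"] < 10


games = [  # hypothetical
    {"PTS": 8, "TRB": 11, "AST": 12},
    {"PTS": 15, "TRB": 10, "AST": 10},
]
print([assist_rebound_dd(g) for g in games])       # [True, True]
print([assist_rebound_dd_only(g) for g in games])  # [True, False]

# Value of the extra double in points: per minute, then per game.
print(round(0.356 - 0.233, 3), round(10.3 - 6.0, 1))  # 0.123 4.3
```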

4+ steals: +.149 / +2.8
4+ blocks: +.143 / +4.1

The curious thing here is that blocks dominate the per-game margin despite having a similar per-minute advantage. So do 4+ shot blockers spend more time on court? No: 4+ shot blockers spent 63% of the game on court, compared to 69% for 4+ stealers. So why is there this advantage? Once again, is it due to team-foul penalty situations? My ratio of opponent FTAs to the player's PFs was 5.3 for 4+ shot blockers versus 5.2 for 4+ stealers. But this relates the player's own personal fouls, which may be low, to the opponents' free throw attempts. It might be his teammates who are accruing the fouls because of a faster-paced game.

5+ steals: +.148 / +2.6
5+ blocks: +.169 / +4.3

No increased advantage for steals. Blocks continue to climb per minute.

6+ steals: +.139 / +1.6
6+ blocks: +.187 / +4.3

Blocks continue to increase per minute but stabilize in overall impact on the game, while steals keep declining in advantage. That is, a 4- or 5-steal game appears more valuable than a 6+ steal game.

Finally, although the amount of data is slim:

7+ steals: +.047 / -0.9
7+ blocks: +.179 / +1.7

A small decrease in the per-minute advantage of blocks, and a big decrease per game. Steals are massively decreased in value; in fact, 7 or more steals showed a deficit per game (more points scored against), the first time any of these figures showed a negative advantage. And the ratio of opponent FTAs to personal fouls? 5.6 for blocks and 6.8 for steals. So at really high steal counts, the game may be getting pushed into team-foul penalty situations, probably not because the player himself is fouling (that would make him lay off) but because there are more possessions for the rest of the team.

Going back in the other direction (when the thresholds get too small the program bogs down horribly, so I like to start fairly high):

3+ steals: +.114 / +1.9
3+ blocks: +.113 / +2.7

And as I said, 10+ assists were more valuable than 7+ blocks, while 3+ blocks and 3+ steals were both more valuable than 10+ rebounds.

As for points in a game:

30+: +.176 / +3.9
40+: +.242 / +5.2
50+: +.318 / +8.5

Double-doubles with points:
Assists: +.206 / +5.0
Rebounds: +.111 / +1.9

And turnovers:
6+: -.022 / -2.6
7+: -.009 / -2.5
8+: -.016 / -3.6

It doesn't seem like more turnovers are actually that big a detriment, probably because the player is getting assists to compensate, but oddly the bench doesn't like it: the game deficit increases to -3.6, so go figure... It's not the player's fault for being benched; he actually plays marginally more minutes with each additional turnover. So it must be the bench bringing the margin down when he is off court. There are 58 games in 2015-16 with a player committing 8+ turnovers, so there should be something in it. Probably it comes down to the players doing the turning over being on worse teams, more than anything especially meaningful.

So ranked in value per minute:

Triple doubles: +.356
50+ points: +.318
40+ points: +.242
Assist double: +.192
6+ blocks: +.187
30+ points: +.176
4+ steals: +.149
Rebound double: +.101

Ranked per game margin:

Triple double: +10.3
50+ points: +8.5
40+ points: +5.2
Assist double: +4.3
5+ blocks: +4.3
30+ points: +3.9
4+ steals: +2.8
Rebound double: +1.8

You can test other theories as well, such as whether it is better for the game when point guards do the scoring (probably drastically skewed by Stephen Curry that year)...

30+ points for:
PG: +.218 / +5.7
SG: +.164 / +3.6
SF: +.163 / +3.7
PF: +.167 / +1.9
C: +.118 / +1.1

So it looks like the point guards are probably the ones who should be scoring!

At the moment, it only filters by player name, by position (1 or 2 characters), or by all (leaving the input box blank). I intend to extend it to height, so you can see how much height really matters in team basketball (you'll be surprised), and to teams by their three-letter abbreviation (e.g. Cleveland = CLE). I thought it would also be interesting to have filters for birth states or things of that nature, but that is further away and I'm not convinced it's worth anything. I also need to create a fix for player names entered with typos, which at the moment just crash the program.
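One way to make the filter box tolerate typos instead of crashing is to fall back to a fuzzy match with `difflib.get_close_matches` from Python's standard library, and return an empty result when nothing is close. A sketch of the current filter rules (blank = all, 1-2 characters = position, anything else = name) plus that fallback; the player table, `resolve_filter`, and the 0.8 cutoff are assumptions for illustration.

```python
import difflib


def resolve_filter(text, players):
    """players maps name -> list of positions. Returns the matching names."""
    query = text.strip()
    if not query:                      # blank box: everyone
        return sorted(players)
    if len(query) <= 2:                # 1 or 2 characters: a position
        pos = query.upper()
        return sorted(name for name, positions in players.items()
                      if pos in positions)
    if query in players:               # exact name
        return [query]
    # Typo: best close match, or an empty list rather than a crash.
    return difflib.get_close_matches(query, list(players), n=1, cutoff=0.8)


players = {"Michael Jordan": ["SG"], "Scottie Pippen": ["SF"],
           "Detlef Schrempf": ["SF", "PF"]}
print(resolve_filter("sf", players))              # ['Detlef Schrempf', 'Scottie Pippen']
print(resolve_filter("Michael Jordon", players))  # ['Michael Jordan']
print(resolve_filter("Zzzz Qqqq", players))       # []
```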

What I eventually hope to do is let you customize formulas for additional player rating output for games. What I'd really like to do is create a basketball equivalent of Out of the Park Baseball for historical day-to-day basketball simulation. I would like the user to choose games at will, rather than simulate full seasons, and be graded on how well they perform relative to their historical counterparts despite picking the games themselves. That way, you can also see your overall high score against real NBA coaches. I would not do it in 'realistic' ways like others, with complex plays and stuff like that, at least not initially. That is too fantasy; there is nothing in the data that I know of that indicates the relative strengths of specific plays. Instead, you would be able to increase or decrease certain values, and there would be a yelling/voice mechanism to steer a player toward your targets, with the player always drifting back to his default play style. If you want certain players to shoot more, for instance, you have a voice 'bank', so you have to use your voice and your timeouts carefully. But it looks impossible at this stage to even think about branching out. The present interface is not how I need it to be, and things like clickable hyperlinked text in a scrollable window appear to be impossible in GameMaker (so you can't, for instance, start a game just by clicking on it).
dwayne2005
 
Posts: 153
Joined: Fri Aug 29, 2008 2:00 pm

Re: Making my own basketball statistics program

Postby shadowgrin on Tue Jul 11, 2017 5:49 am

The data actually shows it is far more valuable to have a 10 assist player than a 10 rebound player.

Of course. An assist can only be registered if the team scores, so 10 assists means the team got at least 20 points off them. Rebounds can only guarantee possession, not points. That might explain why your data favors assists.



Adrian Dantley > Michael Jordan confirmed.
HE'S USING HYPNOSIS!
JaoSming2KTV wrote:its fun on a bun
shadowgrin
NLSC's Jefferson Davis/The Questioneer
 
Posts: 22596
Joined: Thu Dec 12, 2002 6:21 am
Location: In your mind

Re: Making my own basketball statistics program

Postby dwayne2005 on Tue Jul 11, 2017 7:13 am

shadowgrin wrote:
The data actually shows it is far more valuable to have a 10 assist player than a 10 rebound player.

Of course. An assist can only be registered if the team scores so if you have 10 assists it means the team got 20 points off it. Rebounds can only guarantee possession not points. That might explain why your data favors assists.



Adrian Dantley > Michael Jordan confirmed.


I'll look into where they draw level, if they ever do, later on. Taken one for one, those numbers tell you which is more valuable. I think 10 assists is also much rarer than 10 rebounds, so that may also be a factor.

Re: Making my own basketball statistics program

Postby dwayne2005 on Wed Jul 12, 2017 2:25 am

I found what was bogging down the program. Now it runs smoothly even when processing a lot of data. If only I could figure out why it never uses more than 40% of the CPU.

In terms of frequency, 6 assists are about as common as 10 rebounds. They don't keep pace 1:1 beyond that, but that's the pairing I'll present. Not only does every assist correspond to a made field goal, every rebound also corresponds to a miss, so that needs to be taken into account, and the presence of offensive boards may be distorting the figures (correlating more rebounds with worse team performance).

(There is a chance some of my numbers are marginally distorted because total rebounds and defensive rebounds each have their own range. If I set a minimum for total rebounds and it picks up a minimum of 1 defensive rebound, it inserts that minimum; when I revert to 0 total rebounds, it then reverts to 1 total rebound instead. It doesn't seem to have affected these numbers (it would mean a player needed at least 1 rebound in all the steals and blocks figures), though I'm not sure I would have noticed if it had affected the others. I need to fix that.)
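The "equal frequency" pairing (6+ assist games being about as common as 10+ rebound games) can be found mechanically by counting games at or above each threshold. A Python sketch, assuming one value per player-game; the function names are hypothetical and the toy lists are made up purely to exercise the code, not real data.

```python
def games_at_or_above(values, threshold):
    """Count player-games whose value is at least `threshold`."""
    return sum(1 for v in values if v >= threshold)


def frequency_match(assists, rebounds, assist_threshold):
    """Rebound threshold whose 'or more' game count is closest to the assist one.

    Ties go to the lowest rebound threshold.
    """
    target = games_at_or_above(assists, assist_threshold)
    candidates = range(0, max(rebounds) + 1)
    return min(candidates, key=lambda t: abs(games_at_or_above(rebounds, t) - target))


# Toy per-game values, not real data.
toy_assists = [2, 6, 7, 1, 9, 3]
toy_rebounds = [5, 10, 12, 4, 11, 9]
print(frequency_match(toy_assists, toy_rebounds, 6))  # 10
```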

Advantages:

6 assists: +.131 / +2.5
10 rebounds: +.101 / +1.8

7 assists: +.149 / +2.9
11 rebounds: +.102 / +1.9

8 assists: +.173 / +3.6
12 rebounds: +.099 / +1.7

9 assists: +.183 / +3.8
13 rebounds: +.110 / +1.8

10 assists: +.191 / +4.2
14 rebounds: +.128 / +2.2

11 assists: +.204 / +4.7
15 rebounds: +.135 / +2.6

12 assists: +.227 / +5.3
16 rebounds: +.147 / +3.2

13 assists: +.202 / +4.8
17 rebounds: +.154 / +3.1

14 assists: +.176 / +3.7
18 rebounds: +.187 / +4.2

Interesting facts:

- You need 15 rebounds or more to equal the advantage of 6 assists or more.
- At their most frequent, 6 assists (2391 games) are about 1.30-1.39 times as valuable as 10 rebounds (2214 games), i.e. 30-39% more valuable.
- There is hardly any gain from rebounds 10-13.
- Assists begin losing value after a certain point, just like steals and blocks. Here it is 12 assists. 13 assists roughly equals 11 assists, but the 11 assists figure includes the 12 assist games (and all games above), so 13 is probably a little better. Higher than 12 is worse for the team on average, for whatever reason. There is no correlation with more opponent foul shots per player foul, so it may not be a game-pace factor like the one I suspect happens with steals.
- Rebounds begin losing value after 18 rebounds.
- At 18 rebounds and 14 assists, rebounds become more valuable for the first time in the figures.
- At their peaks, 12 assists is about 1.21-1.26 times as valuable as 18 rebounds (21-26% better). There are 210 games with 12 assists and 104 with 18 rebounds, so the assist games are about twice as frequent.
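The percentage comparisons in the list are ratios of the two advantage figures (value per minute and per-game margin). A small Python sketch that reproduces them from the numbers posted above; the dictionaries just hold the quoted figures, and the helper names are hypothetical.

```python
# Advantage figures quoted above: (value per minute, per-game margin).
ASSISTS = {6: (.131, 2.5), 7: (.149, 2.9), 8: (.173, 3.6), 9: (.183, 3.8),
           10: (.191, 4.2), 11: (.204, 4.7), 12: (.227, 5.3), 13: (.202, 4.8),
           14: (.176, 3.7)}
REBOUNDS = {10: (.101, 1.8), 11: (.102, 1.9), 12: (.099, 1.7), 13: (.110, 1.8),
            14: (.128, 2.2), 15: (.135, 2.6), 16: (.147, 3.2), 17: (.154, 3.1),
            18: (.187, 4.2)}


def times_as_valuable(a, b):
    """How many times as valuable a is as b, on each of the two measures."""
    return round(a[0] / b[0], 2), round(a[1] / b[1], 2)


def first_matching_rebounds(assists):
    """Lowest rebound level that equals or beats an assist level on both measures."""
    target_value, target_margin = ASSISTS[assists]
    for level in sorted(REBOUNDS):
        value, margin = REBOUNDS[level]
        if value >= target_value and margin >= target_margin:
            return level
    return None


print(times_as_valuable(ASSISTS[6], REBOUNDS[10]))   # (1.3, 1.39)
print(times_as_valuable(ASSISTS[12], REBOUNDS[18]))  # (1.21, 1.26)
print(first_matching_rebounds(6))                    # 15
```

The last line matches the first fact in the list: 15 rebounds is the lowest level that equals 6 assists on both measures.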


I probably should find a way to accumulate the team data for each split as well, as that might give a clearer picture of what is happening and why these stats invert in value after a certain point. It is a bit intimidating with all the data already on the screen, though, and you can customize the second line of stats to reveal the team stats. For instance, you could show just the opposing team's raw FTAs, or how many more they are getting than the team.

Doing something like this, perhaps with another line of customizable data, helps to separate it from existing online databases.


I timed some outputs. These are measured from the moment I push compile in GameMaker, so actual use of the program is a little quicker. Despite perception, there appears to be no exponential increase in times for regular player output. Something like that is probably happening with the opponents output, but it isn't as major as it seems.


LeBron James (2016) output times.

Regular stats (regular season): 0:19
Regular stats (regular season + playoffs): 0:22 (116%)

Opponent stats (regular season, limit 1 game): 0:45
Opponent stats (regular season + playoffs, limit 1 game): 0:54 (120%)

Opponent stats (regular season, limit 2 games): 0:33
Opponent stats (regular season + playoffs, limit 2 games): 0:40 (121%)

Michael Jordan (1985-2003) output times.

Regular stats (regular season): 2:09 (1072 games)
Regular stats (regular season + playoffs): 2:31 ... 117% (1251 games ... 117%)

Opponent stats (regular season, limit 10 games): 8:25
Opponent stats (regular season + playoffs, limit 10 games): 11:48 (140%)

All players (PG, regular season): 0:37
All players (all positions, regular season): 3:14 (524%, should be around 500% for 5 positions)
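The bracketed percentages are one timing divided by another, and dividing a timing by its game count is a quick way to check the "no exponential increase" observation. A Python sketch; `to_seconds` and `percent_of` are hypothetical helpers.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' timing into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_of(new: str, base: str) -> int:
    """Express one timing as a whole-number percentage of another."""
    return round(100 * to_seconds(new) / to_seconds(base))


print(percent_of("0:22", "0:19"))   # 116
print(percent_of("11:48", "8:25"))  # 140
print(percent_of("3:14", "0:37"))   # 524

# Seconds per game for the two Jordan runs; similar values point to linear scaling.
print(to_seconds("2:09") / 1072, to_seconds("2:31") / 1251)
```

Both Jordan runs work out to roughly 0.12 seconds per game, which fits the regular output scaling linearly with the number of games.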

Re: Making my own basketball statistics program

Postby dwayne2005 on Thu Jul 13, 2017 2:38 am

There is a clearer trend with defensive rebounds alone, suggesting total rebounds showed no gains because they partly reflect how poorly a team was doing on the offensive end, which results in more offensive rebounds.

6: +.103
7: +.115
8: +.121
9: +.134
10: +.145
11: +.163
12: +.181
13: +.185
14: +.189
15: +.189
16: +.184
17: +.096 (just 12 games)
18: +.089 (just 9 games)
19: +.027 (just 2 games)

So there is very little change from 12 to 16 defensive rebounds and a possible decrease from 17 rebounds up, although the games pulled from just 1 season are too few to be sure.

Re: Making my own basketball statistics program

Postby dwayne2005 on Fri Jul 14, 2017 4:26 am

I've now set it to display game (team and opponent) totals for easier analysis, but they only show in the splits, not in the main game area.

Image

This shows the steals from 3 to 8. [3:3] means the range runs from 3 steals to 3 steals, so it grabs just the stats for that individual steal count. If it were points and the range was every 5 points, it might be [1:5], [6:10], [11:15], etc. Despite this, settings.ini is set to amass all the data above the bracket, so [3:3] effectively becomes 3 and above. For steals, you can see the average is in fact 3.4 for the 3 bracket, and it climbs to 4.4 for the 4 bracket, 5.3 for the 5 bracket, etc., narrowing toward the selected value as the number of games above it decreases.
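The "amass everything above the bracket" setting can be sketched like this, assuming one value per game. With `cumulative=True` the [3:3] bracket really means 3 or more, which is why its average sits above 3. `bracket_averages` is a hypothetical Python helper and the steal counts are toy values.

```python
def bracket_averages(values, cumulative=True):
    """Map each bracket [v:v] to (games, average value inside it)."""
    result = {}
    for level in sorted(set(values)):
        if cumulative:
            chosen = [x for x in values if x >= level]  # level and everything above it
        else:
            chosen = [x for x in values if x == level]  # exactly this level
        result[level] = (len(chosen), round(sum(chosen) / len(chosen), 1))
    return result


toy_steals = [3, 3, 3, 4, 5, 8]
print(bracket_averages(toy_steals))                    # {3: (6, 4.3), 4: (3, 5.7), 5: (2, 6.5), 8: (1, 8.0)}
print(bracket_averages(toy_steals, cumulative=False))  # {3: (3, 3.0), 4: (1, 4.0), 5: (1, 5.0), 8: (1, 8.0)}
```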

Team fouls go: 20.3, 20.5, 20.2, 21.4, 24.5 and 23.3. This is despite no increase in the opponents' fouls, which range from 20 to 20.6 before finishing down at 18.3. There is a clear gap that opens up depending on how many steals a player gets. The ball thief's own personal fouls range from 2.3 to 2.9, so they can only explain a bit of that increase. The opponents' free throw attempts also see a corresponding rise, ranging from 23.1 to 29.5.

The average game time rises until about midway, which may explain some of the increase, but it peaks at 49 minutes. (It's shown as a decimal for now; I will switch to a minutes:seconds format later.)

Despite this, most other measures don't seem to show a faster-paced game. Average points go up and down and are at their lowest when fouls peak. So are steals creating a faster-paced game that sends a team over the foul limit, or not?

Earlier I speculated that steals might reduce opponents shooting (due to the threat of intercepts or fumbles causing them to put up earlier and more stupid shots), but the shooting remains stable.

For blocks:

Image

Opponents' shooting is pretty much stable throughout, but there is a sizeable gap between team and opponent. Since blocked shots count as missed FGAs, this distorts the data (naturally there would be more misses; it doesn't necessarily show intimidation). Fouls remain stable, but early on it's actually the blocking player's team that gets more FTAs (frustration fouls? Trying to get one back?) before the opponent begins getting significantly more FTAs at the higher ranges.

I also briefly analyzed rebounds, and surprisingly more rebounds don't correspond with worse team shooting, at least in the range where there was little movement.

Re: Making my own basketball statistics program

Postby dwayne2005 on Sat Jul 15, 2017 5:05 am

It now lists a subject's stats alongside those of the players he played against. At the moment, it only lists the subject's stats in the totals.

I just stress tested processing every opponent he played against, from 1 game up. It took 1 hour and 45 minutes. I saved it as a 5 MB text file, then ran the sort, which took 20 minutes. I hit CTRL to copy it so I could save it to text, but it didn't copy and crashed. So I don't have the data sorted by points per game.

An example of the output (bear in mind you can get this data much quicker if you limit it by number of games):

Image

The @ Larry Bird line is Larry Bird's stats; the V/Larry Bird line is Jordan's stats.

I have uploaded the text file here (the first host I could find that allowed a 5 MB text upload): https://file.io/7pWmiF (bear in mind the equations for the second line of stats are experimental).

Re: Making my own basketball statistics program

Postby benji on Sun Jul 16, 2017 4:14 pm

You're loading too much data in an inefficient format to start with. These kinds of queries should actually be lightning fast with a proper dataset. Otherwise B-R wouldn't even work.
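One common approach to the format problem benji describes is to read the data once and group it by player, so a query only touches that player's games instead of rescanning everything. A Python sketch, assuming each game is a row with a `player` field; this is a generic technique, not how basketball-reference works internally, and `build_index` is a hypothetical name.

```python
from collections import defaultdict


def build_index(game_rows):
    """Group game rows by case-folded player name in one pass over the data."""
    index = defaultdict(list)
    for row in game_rows:
        index[row["player"].casefold()].append(row)
    return index


# Hypothetical rows; a real dataset would carry every box-score column.
rows = [
    {"player": "Player A", "pts": 30},
    {"player": "Player B", "pts": 12},
    {"player": "Player A", "pts": 25},
]
index = build_index(rows)
print(len(index["player a"]))  # 2
```

After the one-off build, looking up a player is a dictionary access, so the cost of a query no longer grows with the size of the whole database.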

I love Bob, but his program has been made completely outdated by the -reference.com sites, especially since he never used it to include his own higher-level data, like the data BBALL the simulator uses.

LibreOffice is better than OpenOffice, which is abandoned imo.

Unity and Unreal are completely free, but you don't need a game engine, let alone one of that complexity; you just need a compiler and the ability to write code. (Visual Studio, which is overkill, is free if you have any ties to a university. Hell, even an old version like the 2010 one I keep around should be more than enough.)

To make the jump from where you are to a simulator is heading in the wrong direction. I can "simulate" you a season of basketball for a team rotation, as shadow is aware though never to the extent of my full powers, and do it with an incredibly low standard error. And this is just from raw data and Excel doing a single set of calculations. And it takes zero seconds to process. Swapping players in and out of the rotation, like Antoine Walker for Kobe Bryant, takes only as long as selecting the latter and pasting him over the former.

I could even, theoretically, set up my system to do this league-wide and not need a final adjustment, without much difficulty.

All the correlations you're doing have already been done, and regressions for them have been done. This is actually easier than you're making it, 2K and EA don't use crazy complicated systems to produce their statistical simulations or shouldn't anyway, because you can reverse engineer this fairly simply. Then you just add noise to it.

My major advice would be to avoid all the stats and databases right now, create your simulator, and then import the data later. I independently "discovered" point differential's correlation with win %, separately from Dean Oliver and John Hollinger, like twenty years ago or whatever. Mine was actually more accurate than either's originally published versions, and more accurate than Hollinger's public version ever was before he got hired away from ESPN. And it was because I was trying to create a simulator first and backfill the data in after getting the groundwork running. I noticed the pattern in my dummy data.
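For reference, the point-differential relationship is usually written as a "Pythagorean" win expectation. A Python sketch with the exponent left as a parameter; the commonly cited basketball values are about 14 (credited to Dean Oliver) and 16.5 (John Hollinger). benji's own version isn't given in the thread, so this is not his formula.

```python
def pythagorean_win_pct(points_for, points_against, exponent=14.0):
    """Expected winning percentage from points scored and allowed (totals or per game)."""
    scored = points_for ** exponent
    allowed = points_against ** exponent
    return scored / (scored + allowed)


print(pythagorean_win_pct(105, 105))  # 0.5
print(round(82 * pythagorean_win_pct(110, 100), 1))
print(round(82 * pythagorean_win_pct(110, 100, exponent=16.5), 1))
```

A larger exponent makes the same point differential translate into more wins, which is the main practical difference between the two published versions.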
benji
 
Posts: 14381
Joined: Sat Nov 16, 2002 9:09 am

Re: Making my own basketball statistics program

Postby dwayne2005 on Mon Jul 17, 2017 5:09 am

I'll take your post under advisement. It is not exactly the same as other info on the web, because it lets you customize calculations (one of the riskier calculations I have is that for steals I link it to opponent shots, not possessions) and splits, and lets you filter the data. Unlike basketball-reference (to my knowledge), you can also load in the full career of a player, not just season by season. Outputting all opponents at once is also a nicer way of juxtaposing them and makes for more interesting reading than singling out each rivalry as it comes to mind.

The general game stats for 1 season take only a few seconds to process. This is after the games have been output (which doubles the generation time), but it still performs a number of calculations based on each game.

I want to do the stats first, as it's easier for me and is making me more accustomed to the software. It is also something I'm more passionate about than the simulation. I am not a quick learner, especially with programming. I have to do things my own way, otherwise it will never sink in.

I have stalled on doing anything new, mucking around with the GUI, trying to bring features into place and fixing bugs. I had previously created this opponent generation as an operation through the ini. Now that I am trying to put it into the main program, I am getting memory errors from GameMaker despite 'only' using 600 MB of data for a full career (and I know that is probably way too large, I'm a newb). I can't see what else I need to shut down to reduce memory, but as it's not taking up the full RAM, I think it's some kind of bug somewhere in GameMaker. So now I'm looking into ways to reduce the memory. I have already reduced it enough for it to output full career stats, but I will add as many optimizations in that regard as I can figure out. This is probably actually a point in favor of doing this kind of thing first, as I wouldn't learn about optimization without the need. When I get to making games, I can optimize them with everything I take away from this.

I can put together a scrappy version if there was any interest in checking it out, but right now I wish I knew more about the GUI and how to do things in that regard...

I do have a CryEngine bundle of assets, so if I were to get into any other engine, I guess that would be my first pick.

Re: Making my own basketball statistics program

Postby dwayne2005 on Tue Jul 18, 2017 3:19 am

I have created a test version so you can try it before criticizing. Maybe it has been done before; I'm sure it has. I just think some features are unique compared to any source I've found, and chances are most who've done what I have done don't share their work. While you can filter broadly in other online databases, this allows custom filters. While you can filter on SuperDB: 1) SuperDB is season data, not game by game; 2) it requires successive filters, and if you make a mistake you have to reset. Criticize the goofy game-looking interface all you like, but having the filters all on the screen like this is a better way of doing things.

https://www.mediafire.com/?1ei327p1usp6fi2

I guess this is what you call pre-alpha or something. It is a 35 MB zip archive (no installer). It contains two statistics archives that unpack to your APPDATA/Local directory, taking about 75 MB. In addition, it adds several more MB to this APPDATA directory in the form of game output to speed things up.

A few things to note:
1) I believe exiting works with either Escape or double Escape. At one point I tried disabling it, so I don't know where it's at. Most of the time when using it, I hit Escape to get out and it seems to work, but I don't remember fixing it back. I also often use the terminate button in GameMaker. There is a chance you will need to force-terminate it in Task Manager.
2) It goes horribly slowly on first boot only. You can't even Alt-Tab out of it. That's because it is unarchiving the main data, which contains thousands of files, which is taxing for most unarchivers, especially GameMaker's. It takes several minutes to unarchive the files. The data isn't finalized yet, so I don't want to create archives for each individual season and then realize I have to redo them. It is far better when it is done season by season, but I have seen things that possibly need fixing up first. Originally I did do it season by season, and the code is written that way, but I have amended it for now to get all the data in.
3) Even though it has every season's data, only 1983-84 onward works right now, I believe. And again, I don't have the 2016-17 season data yet.
4) For most other operations that take several minutes, you can Alt-Tab out and do something else while it works in the background. But often upon finishing, or upon returning, you are met with a white screen. You have to Alt-Tab out and back again to get it to show. No idea what's causing that, nor whether it happens in the built version, which I haven't even trialled.
5) It is fullscreen only. No windowed mode. At the moment, the data is too wide for the screen at common low resolutions. I should look into workarounds, such as possibly scaling down the font. It requires a widescreen resolution width of either 1280 or 1366, I think.

Some notes about using it.

The player's name has to match (case-insensitively) the name in the data. Anything else will crash the program at present, so it doesn't weed out spelling mistakes yet.
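A sketch of a name lookup that keeps the case-insensitive match but returns suggestions instead of crashing on a typo. It uses Python's standard `difflib.get_close_matches`; `find_player` is a hypothetical helper, and the 0.6 cutoff is just difflib's default similarity threshold, not a tuned value.

```python
import difflib


def find_player(typed, known_names):
    """Case-insensitive lookup that returns suggestions instead of crashing on a typo.

    Returns (matched_name, []) on success, or (None, suggestions) otherwise.
    """
    by_key = {name.casefold(): name for name in known_names}
    key = typed.strip().casefold()
    if key in by_key:
        return by_key[key], []
    close = difflib.get_close_matches(key, list(by_key), n=3, cutoff=0.6)
    return None, [by_key[k] for k in close]


names = ["Michael Jordan", "Magic Johnson", "Larry Bird"]
print(find_player("michael jordan", names))  # ('Michael Jordan', [])
print(find_player("Micheal Jordan", names))
```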

To apply the player name, you hit the S ('search') button below the name and next to the year range.

Most of the input boxes on the top are filters, except the 3 in a triangle just to the right of the name input box. The top box is a min filter, the bottom box is a max filter. The numbers already in the filters should be the best and worst of that player or data. If you go below the min or above the max, the number will come up red but that won't crash it and sometimes you want to do it. It will be yellow if it's in range. If you put a min higher than max, it will crash the program obviously.
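One way to keep the yellow/red cue but avoid the min > max crash is to validate the pair before filtering. A Python sketch with hypothetical function names. The EDIT further down notes that DIF and +/- deliberately accept an inverted range, so those two columns would need to skip this check.

```python
def box_colour(value, data_min, data_max):
    """Yellow while a typed bound lies within the data's own range, red outside it."""
    return "yellow" if data_min <= value <= data_max else "red"


def range_error(minimum, maximum):
    """Return a message instead of crashing when min exceeds max; None means the range is fine."""
    if minimum > maximum:
        return f"Minimum ({minimum}) is higher than maximum ({maximum})"
    return None


print(box_colour(20, 4, 37))  # yellow
print(box_colour(40, 4, 37))  # red (allowed, just flagged)
print(range_error(30, 10))    # Minimum (30) is higher than maximum (10)
```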

To apply the filters, you hit the R ('range') button the top right of the screen.

The trio of boxes are the splits boxes. The topmost one of the triangle, with a -1 in it, is a game limit. If you want to split the data into 20-game splits, you enter 20 and leave the bottom two boxes at 0. -1 either sets it to a really high value or circumvents this limitation, I can't remember which. So if you want all games, you choose -1. I rarely do anything with this box, as it works in conjunction with the two other, more interesting boxes, so it doesn't really have much purpose for me.
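The game-limit box amounts to cutting the date-ordered games into fixed-size chunks. A Python sketch, assuming that -1 (or any non-positive size) means "no limit"; the post itself isn't sure what -1 does internally, and `game_splits` is a hypothetical name.

```python
def game_splits(games, size):
    """Cut a date-ordered list of games into consecutive chunks of `size` games.

    Assumes -1 (or any non-positive size) means "no limit", i.e. one chunk.
    """
    if size <= 0:
        return [list(games)]
    return [games[i:i + size] for i in range(0, len(games), size)]


season = list(range(1, 46))  # stand-in for 45 games in date order
print([len(chunk) for chunk in game_splits(season, 20)])  # [20, 20, 5]
print(len(game_splits(season, -1)))                       # 1
```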

The two bottom splits boxes: the leftmost one is the most important. This value is the increment for whatever value is sorted. As the program sorts by date by default (except in the opponents output), a number here means the number of days to split the data into. If you want it split into 15-day splits, you enter 15; 30-day splits, you enter 30; etc. The box to the right is a little varied, and I can't quite remember how it functions anymore. Either positive or negative numbers, I don't remember which, should trigger an 'except if' days between games equals the value. That is, you can stop it counting the games if there is an absence of, say, 10 days between games. Again, it works in conjunction with the other input boxes. If you just want to split the data into chunks at absences of a selected duration, you have to enter either 0 or possibly -1 (I don't remember) in the leftmost box; otherwise it works in conjunction with it. So you can set it to split data up every 10 days, except that if there is a 3-day break, the split is cut short. Or you can isolate the days between games for pre-absence and post-absence data to try to gauge whether the time off affected a player's game.
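The day-based splits can be sketched as a single pass over sorted game dates: a new split starts when the window runs out or when the gap between two games reaches the absence threshold. Assumptions: the gap is measured as the difference between the two game dates (the program may count rest days differently), and `None` switches a rule off. `day_splits` is a hypothetical Python name.

```python
from datetime import date


def day_splits(dates, window_days=None, break_days=None):
    """Group sorted game dates into splits.

    window_days: start a new split once a game falls this many days after the
        first game of the current split (None switches this rule off).
    break_days: also start a new split when two consecutive games are at least
        this many days apart (None switches this rule off).
    """
    splits, current = [], []
    window_start = previous = None
    for d in dates:
        window_over = (window_days is not None and window_start is not None
                       and (d - window_start).days >= window_days)
        long_break = (break_days is not None and previous is not None
                      and (d - previous).days >= break_days)
        if current and (window_over or long_break):
            splits.append(current)
            current = []
        if not current:
            window_start = d  # first game of a new split
        current.append(d)
        previous = d
    if current:
        splits.append(current)
    return splits


games = [date(2017, 1, d) for d in (1, 2, 4, 8, 12)]
print([[g.day for g in s] for s in day_splits(games, window_days=10)])                # [[1, 2, 4, 8], [12]]
print([[g.day for g in s] for s in day_splits(games, window_days=10, break_days=3)])  # [[1, 2, 4], [8], [12]]
```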

The splits, once generated, are the bottom most part of the output.

If it is sorted by another statistic, like points, then these two boxes mean something different. Let's say a player had a top of 37 points in a season; then entering 5 in the leftmost box means all his games should be split into 5s: 33-37, 28-32, 23-27, 18-22, etc. But if you want it fixed to ranges like 36-40, 31-35, 26-30, 21-25, etc., then you put a positive number in the right box, +3 I think, which 'shifts' the ranges that many points. If you use a negative value in the right box, however, it has a different meaning. In this case, unless there is a games split number, the games are split into just 3 groups: above average, average and below average. The negative value is the maximum deviation from either the mean or the median (can't remember which). So if you wanted to find the average points game with a deviation of at most 5 points from the average, you enter -5. Anything outside this range is either above or below it in the output.
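Both modes of the splits boxes can be sketched in a few lines: fixed-width ranges counted down from the top value (optionally shifted up), and three buckets around the mean or median. A Python sketch; the function names are hypothetical, and `centre` is a parameter because the post isn't sure whether the mean or the median is used.

```python
from statistics import mean, median


def value_bins(top, width, shift=0):
    """Ranges of `width` counting down from the top value, moved up by `shift`."""
    if width < 1:
        raise ValueError("width must be at least 1")
    high = top + shift
    bins = []
    while high >= 0:
        low = max(high - width + 1, 0)
        bins.append((low, high))
        high = low - 1
    return bins


def deviation_buckets(values, deviation, centre=mean):
    """Split values into below / within / above `deviation` of the mean (or median)."""
    mid = centre(values)
    below = [v for v in values if v < mid - deviation]
    within = [v for v in values if mid - deviation <= v <= mid + deviation]
    above = [v for v in values if v > mid + deviation]
    return below, within, above


print(value_bins(37, 5)[:4])           # [(33, 37), (28, 32), (23, 27), (18, 22)]
print(value_bins(37, 5, shift=3)[:4])  # [(36, 40), (31, 35), (26, 30), (21, 25)]
print(deviation_buckets([10, 20, 25, 30, 40], 5))  # ([10], [20, 25, 30], [40])
```

Pass `centre=median` to bucket around the median instead.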

If you want to do a full season output, you enter in nothing for the name. This will take some time. After output, the first thing you should do is apply a filter to make sure it speeds up correctly. If you wanted to analyze the splits for steals, you should enter at least a 1 in the minimum range box for steals, but 2 is better, and then hit the R button. After you do this, and only after you do this (it'll take far less time), you sort by steals by clicking the column header. Then once sorted, if you want to split the steals data up into each one steal then you enter 1 in the bottom left of the triangle of boxes and then you hit the R button. The splits should be generated at the bottom of the page.

You can also enter PG, SG, SF, PF or C in the player name field. If you do any of these, don't expect to keep using the program afterwards: it'll likely run out of memory and crash, though hopefully it should still output properly if you type in any number of player enquiries. Most of my testing is one search deep; I rarely run successive searches.

The G button at the top of screen is the graphs button. These graphs display differently to the graphs in the main window. They are sorted differently from left to right. In the main window, the left-right order is what the column is sorted by. So if it is sorted by points, and then you select it to show points, you will see in the main screen the order decreases as it moves from left to right. In the graphs window, however, the sorts are based on each individual statistic. Points is sorted by points (high-low/left-right), rebounds by rebounds, assists by assists, etc. The height of the bars is then assigned to whatever sort you have set at the bottom of the screen; so if you set it to display points at the bottom of the screen, this becomes the vertical size of the bars in the graphs screen. If you have set it to show splits as mentioned above, this will in addition create a colour scale with yellow being the highest number and white being the lowest number corresponding with each of those splits.
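The white-to-yellow split scale can be sketched as a linear blend. White is (255, 255, 255) and yellow is (255, 255, 0), so only the blue channel changes. A Python sketch; `split_colour` is a hypothetical name, and the linear blend is an assumption about how the program interpolates.

```python
def split_colour(value, lowest, highest):
    """Blend from white (lowest split) to yellow (highest split) as an (R, G, B) tuple."""
    if highest == lowest:
        t = 1.0  # a single split gets the full colour
    else:
        t = (value - lowest) / (highest - lowest)
    t = min(max(t, 0.0), 1.0)  # clamp values outside the range
    # Only the blue channel differs between white and yellow.
    return (255, 255, round(255 * (1 - t)))


print(split_colour(0, 0, 10))   # (255, 255, 255)
print(split_colour(10, 0, 10))  # (255, 255, 0)
```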

Finally, if you use the OPP button at the bottom of the main data, you need to first set a number in the blank box at the side for the minimum number of rivalries. This will determine how fast it generates the opponents data. If you have only 1 season, it will take a few seconds to generate all players with at least 1 game. If you have a career output, this number will need to be high and the generation will take quite some time.
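The minimum-rivalries number works like a filter on a group-by: aggregate the subject's games per opponent, then drop opponents met fewer than N times. A Python sketch, assuming each row is one game against one opponent with the subject's points; `rivalries` is a hypothetical name and the rows are toy data.

```python
from collections import defaultdict


def rivalries(games, min_games):
    """Per-opponent games and points per game, keeping rivalries with at least min_games.

    Each row is assumed to be one game: {"opponent": name, "pts": subject's points}.
    """
    per_opponent = defaultdict(list)
    for game in games:
        per_opponent[game["opponent"]].append(game["pts"])
    table = [
        (opponent, len(points), sum(points) / len(points))
        for opponent, points in per_opponent.items()
        if len(points) >= min_games
    ]
    table.sort(key=lambda row: row[2], reverse=True)  # highest points per game first
    return table


toy_games = [
    {"opponent": "Player A", "pts": 30},
    {"opponent": "Player A", "pts": 40},
    {"opponent": "Player B", "pts": 20},
]
print(rivalries(toy_games, 2))  # [('Player A', 2, 35.0)]
```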

EDIT: To copy the contents to clipboard (which is often the better way to read it anyway), just hit CTRL.

EDIT: Deleted my other post because I don't want people to see the wall of text and not notice I uploaded the test version.

EDIT: Also, the DIF and +/- ranges may not work in conjunction. I don't know why yet... You can actually invert those ones. For instance, by default if you set a -10 minimum to a +10 maximum it grabs all the games from -10 to +10. But you can take the negative away from the minimum and add the negative to the maximum and it should filter for games which were victories or losses over 10 points.
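The inverted DIF and +/- range can be sketched as "keep the outside when min is greater than max". A Python sketch; `in_range` is a hypothetical name, and whether the ±10 bounds themselves are included is an assumption (here they are).

```python
def in_range(value, low, high):
    """Usual inclusive range when low <= high; inverted (keep the outside) when low > high."""
    if low <= high:
        return low <= value <= high
    return value >= low or value <= high


margins = [-15, -10, -4, 0, 6, 10, 22]
print([m for m in margins if in_range(m, -10, 10)])  # [-10, -4, 0, 6, 10]
print([m for m in margins if in_range(m, 10, -10)])  # [-15, -10, 10, 22]
```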

The program uses by default for splits a feature that adds up all the values below it. This is to get the 'or above' figures, which results in clearer patterns. This actually probably messes some of the splits up (I imagine the days counter will produce horrible results)! I need to better direct that feature. It can be turned off in the settings.ini. If you mess with the settings, keep in mind the equations still don't work 100% depending on where you put parentheses.

EDIT: Actually, the results of aggregating data above a certain date do serve a purpose.

I should mention that difference and +/- won't sort properly yet. I noted the issue when I was writing it up but didn't bother changing it then and still haven't gotten around to it.
Last edited by dwayne2005 on Tue Jul 18, 2017 4:11 am, edited 2 times in total.

Re: Making my own basketball statistics program

Postby benji on Tue Jul 18, 2017 7:22 am

dwayne2005 wrote:Unlike basketball reference (to my awareness) you can also load in the full career of a player, not just season by season. Having all opponents output is also a nicer way of juxtaposition and provides for more interesting reading than simply singling out each rivalry as they come to mind.

https://www.basketball-reference.com/pl ... er.cgi?p1=

https://www.basketball-reference.com/pl ... 01/splits/

etc

Re: Making my own basketball statistics program

Postby dwayne2005 on Wed Jul 19, 2017 2:41 am

The second one doesn't feature any custom game-by-game filters or custom splits. With the first one, you have to type out each rivalry as it comes to mind, not all at once, which is the feature of mine I said was unique; I never claimed to be inventing the rivalry feature. To my knowledge, I am handling the data in ways that differ from what is readily available, and you haven't posted anything to challenge that.

There are a lot of features I discussed that haven't been countered, and even more that I didn't discuss. And I intend to add more. There is also a lot I can't do. I do not have the game logs, so I don't know when two players were simultaneously on court or exactly how many rebounds a player missed while on court, I don't know how far from the hoop they were when they scored, I don't know the various complex stats the NBA tracks, etc.

I have added the players data vs the rivals data game by game, but it will only generate for the players who meet the minimum number of games specified.

It has a new look:

Image

If I had known it'd take so little time to find the right visual balance, I'd have done it already. I had to manually fill in the black spaces between the boards of wood so they wouldn't distract from the data. As I am colour blind, those patches may leap out in a way they don't to my eyes, but I'm going on the assumption that they blend in well enough.

Right now, it looks good in windowed mode. I had issues with windowed mode (resizing, and the button to put it in the background), and it didn't look so good anyway, before I moved to fullscreen. Now it's back, but it still has its crashes, and the fullscreen option in settings.ini won't work.

I have inserted the option to turn off that cumulative splits thing in the settings.ini which I had previously missed.

I have fixed the sorting of DIFF and +/-. The problem was that I needed to move to a text-based sorting method, as I need to add data based on a previous sort, and I had to add extra characters to the numbers so that real numbers sort correctly as text. That was easy enough with positive numbers, but negative numbers needed a little more work.
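Plain text order puts "10" before "9" and mis-orders negative numbers. One common trick, not necessarily the one used in the program, is to add a fixed offset so every value is non-negative and then zero-pad to a fixed width. A Python sketch; `sort_key`, the offset of 10,000 and the one-decimal precision are all assumptions for illustration.

```python
def sort_key(number, offset=10_000, decimals=1):
    """Encode a signed number as fixed-width text whose alphabetical order matches numeric order.

    Assumes -offset <= number < offset and at most `decimals` decimal places.
    """
    shifted = round((number + offset) * 10 ** decimals)  # always non-negative
    width = len(str(2 * offset * 10 ** decimals))        # enough digits for the largest value
    return str(shifted).zfill(width)


values = [-12.5, 3.0, -3.0, 0.0, 27.4, -0.1]
as_text = sorted(sort_key(v) for v in values)
print(as_text == [sort_key(v) for v in sorted(values)])  # True
```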

I have found an issue that might explain the memory crashes. I changed the vertex method in GameMaker to the more compatible method, and it fixed the memory issues I was having (at just 200 MB, with the data doubled up by posting the rivals' games in the output). I also reduced the number of lines of data in the scroll box. In the future, I want to put those lines in an array and find a way to scroll while only drawing the lines pulled from the array. I need it to move in lines, not scroll smoothly, so I can assign buttons to the box to activate games for managing and possibly advance my ideas there. If there is a way to make those button boxes move with the text, I don't know of it.
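The line-based scrolling described above comes down to two calculations: which slice of the array is visible, and which line a click lands on, so a per-line button can follow its text. A Python sketch, assuming pixel coordinates and a fixed row height; both function names are hypothetical, and GameMaker would need its own drawing and input calls around this.

```python
def visible_rows(lines, first_row, rows_on_screen):
    """Rows currently in view as (index, text), scrolling one whole line at a time."""
    last_start = max(len(lines) - rows_on_screen, 0)
    first_row = max(0, min(first_row, last_start))  # clamp the scroll position
    return list(enumerate(lines[first_row:first_row + rows_on_screen], start=first_row))


def row_under_mouse(mouse_y, box_top, row_height, first_row, rows_on_screen, total_rows):
    """Index of the line under the cursor, or None if the click misses every line."""
    if mouse_y < box_top:
        return None
    offset = int((mouse_y - box_top) // row_height)
    if offset >= rows_on_screen:
        return None
    row = first_row + offset
    return row if row < total_rows else None


lines = [f"game {i}" for i in range(100)]
print(visible_rows(lines, 10, 3))                 # [(10, 'game 10'), (11, 'game 11'), (12, 'game 12')]
print(row_under_mouse(130, 100, 20, 10, 3, 100))  # 11
```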

Doesn't seem like I'm going to get any real support here...

I uploaded it, so I'll post the latest version, since it is a far better test version than yesterday's. Once again, it takes several minutes to extract the data, this time without giving any indication. Once you click on the .exe, nothing will happen for a few minutes.
https://www.mediafire.com/?gt9vrqlr06y7vnt

EDIT: Some opponent totals are not correct. For singular game rivalries, team totals are combining the two for some reason.

Re: Making my own basketball statistics program

Postby dwayne2005 on Thu Jul 20, 2017 3:01 am

I shouldn't keep uploading every small change I make; in fact, I should give you a break from me, but yesterday's version was severely broken with the opponents totals and it needs addressing. It wasn't just 1 game, or 2 games, it was ALL of them that were messing up!

I'm still learning how to troubleshoot bugs. My first port of call should have been to check the old 'version' to see if the opponents totals were working there, because then I'd have known it was most likely caused by adding the player's stats for each game. I didn't do this. Why? I don't know. When I finally figured that out, I tried looking for what was causing things to add up. And my own code confused the hell out of me. :crazyeyes: Eventually, I figured out how to circumvent the script I had isolated it to, and not only is it working now, it should run a little faster too.

In addition to this, I fixed the problem of it crashing when switching between widescreen/fullscreen/windowed/resizing. It is a well-known issue with Gamemaker, and had I been on the internet at all that day I'd have figured it out in an instant. So now it can be viewed in widescreen or fullscreen, or switched between them to your heart's content.

Oh, you may have noticed the text was slightly translucent in some areas. This problem mystified me and I have yet to find the cause, but it was the draw area that was becoming translucent. I have fixed it, but it is not an ideal remedy, as I now draw the background both on the background and on the draw box.

I have begun adapting the opponents generation into a teammates generation. I made some quick adaptations I didn't fully test, but they made sense, so hopefully it works. Right now it still shows V/., which I'll probably change to W/. I may look into adding a third player into the equation.

I have removed the drop-down bar for the graph in the main window and moved graph selection to the column headers, on the right mouse button. So left mouse button to sort, right mouse button to set the graph. What is set there also affects the other graphs. As it's now set from the column headers, it has more options than the drop-down had, which was limited (one of several reasons why I needed to remove it).

I have also set it to update the graph instantly with every sort or change of graph. Because of this, I noticed that the last two sort features aren't sorting the first 3 or 4 results correctly, a problem I have yet to investigate.

I fixed the time format in the game totals, so it's no longer decimal.

The version lettering isn't really versioning. It's just my way of backing up data when I make large changes. In the future, it is possible I'll upload a new version with the same letters that won't take time to extract the archives on first boot, because I wasn't worried about things going awry. But I should hold off on uploading small updates. Once again, the long first boot is not a problem with the program. I changed the program to handle larger zip files only because at the moment I don't want to archive them individually, as the source files need fixing.

https://www.mediafire.com/?e7xh4gd8sgvg0mr

EDIT: Not uploaded yet: I've changed the scroll screen to load only 12 games at a time, and for some reason this may have almost halved the RAM usage. It also moves incrementally with the scrollbar. On the main screen, dragging the mouse down will move 2 full lines each time, so it steps through one full game. On the opponents screen, it moves down 4 lines. It works very smoothly, nothing odd about it at all! This allows me to set clickable areas to get game-relevant stats and to initialize game managing. Due to the way it handles lines, totals and splits are now going to be on a different page, so no more scrolling to the bottom to get them. Opponent stats will have different pages for averages and individual games. As the screen now outputs each game as an individual variable in an array, I'm able to do different things with the text. Right now I have inserted space between games, which removes a lot of the clutter.
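For anyone curious how this kind of windowed scroll box fits together, here is a minimal Python sketch (not the author's Gamemaker code). It assumes each game is stored in an array as a fixed number of text lines; `GameListView`, `visible_games` and `lines_per_game` are hypothetical names. Only the visible slice is drawn, the scroll position snaps to whole games, and a screen row can be mapped back to a game for click areas.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GameListView:
    """Hypothetical windowed view: only `visible_games` games are drawn at once."""

    games: list[list[str]]   # one entry per game, each a list of text lines
    visible_games: int = 12  # games drawn per screen
    lines_per_game: int = 2  # e.g. 2 on the main screen, 4 for opponents
    top: int = 0             # index of the first game on screen

    def max_top(self) -> int:
        return max(0, len(self.games) - self.visible_games)

    def scroll(self, steps: int) -> None:
        # Move by whole games (never partial lines) and clamp to the list.
        self.top = min(max(self.top + steps, 0), self.max_top())

    def set_from_scrollbar(self, fraction: float) -> None:
        # Snap a 0.0-1.0 scrollbar position to a whole-game index.
        fraction = min(max(fraction, 0.0), 1.0)
        self.top = round(fraction * self.max_top())

    def visible(self) -> list[list[str]]:
        # Only this slice needs to be turned into text and drawn.
        return self.games[self.top:self.top + self.visible_games]

    def game_at_row(self, row: int) -> int | None:
        # Map a screen text row to the game it belongs to (for click areas).
        slot = row // self.lines_per_game
        if row < 0 or slot >= self.visible_games:
            return None
        index = self.top + slot
        return index if index < len(self.games) else None
```

Because clicks are resolved through `top`, any buttons tied to a row effectively move with the text as you scroll.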

Re: Making my own basketball statistics program

Postby dwayne2005 on Wed Jul 26, 2017 4:02 am

Progress has slowed, but I continue to work on it. I am trying to get a team search into it. Once I've got it done, you will be able to get multi-season analysis of the value of individual stats as they apply to team totals. With the current player-based full season analysis, you get the full number of games (around 1,230) multiplied by the average number of players who play in those games (over 20), so you get over 25,000 entries and uneven win and loss counts of around 13,000. However, with team totals you are dealing with just 1,230 entries per season, so in theory it would take 20 or so seasons for it to become as bogged down as the other searches.

It should have been easy to adapt, but it didn't turn out that way.

Some changes I have made may have sped it up, but I have performed no speed tests, and it won't make a world of difference if it has. It's still going to run at the speed of a C64 datacassette game.

(I have got the team analysis done, but I haven't got it displaying correctly and processing all teams yet, and for some perplexing reason it is not appending all the data of the output games to the new file I use to track which games have been processed.)

EDIT: I isolated the problem in parentheses above to the one line that was causing it, and I still couldn't figure it out, which led me to believe it was glitching or something, so I replaced it with another operation. Anyway, I have added 2-dimensional arrays to record how many games are played. The 1st dimension is for the various splits, the 2nd dimension is for how many games were played for each recorded stat. As the data is incomplete in seasons before 1986, this means that if, say, only 1 game had recorded 11 rebounds, it won't produce averages that divide those 11 rebounds by 82 games, but by the 1 game that was recorded. Next to these averages there will be an asterisk to show the data is incomplete. I will do the same for minutes. Everything is working: team mates output, opponents output, team data... A number of seasons prior to 1983 are now working too, but all of these features have output issues there. As it's generating the game data, I am going to put an error detection check in place to notify you if anything doesn't add up in the numbers.
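The per-stat games-played idea can be illustrated with a small Python sketch. It assumes each game in one split is a dictionary of stat values, with `None` (or a missing key) for anything the box score didn't record; `split_averages` is a hypothetical name, and where the author keeps the counts in a 2D array (one row per split), this sketch simply recomputes them for a single split.

```python
from __future__ import annotations


def split_averages(games: list[dict[str, float | None]],
                   stats: list[str]) -> dict[str, str]:
    """Average each stat over the games in which it was actually recorded.

    A value of None (or a missing key) means the box score did not record
    that stat. Averages built from fewer games than the split contains are
    marked with an asterisk.
    """
    result: dict[str, str] = {}
    for stat in stats:
        recorded = [g[stat] for g in games if g.get(stat) is not None]
        if not recorded:
            result[stat] = "-"  # never recorded in this split
            continue
        average = sum(recorded) / len(recorded)
        marker = "*" if len(recorded) < len(games) else ""
        result[stat] = f"{average:.1f}{marker}"
    return result
```

With three games where rebounds were recorded only once (11 of them), the rebound average comes out as `11.0*` rather than 11 divided by 3.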

Re: Making my own basketball statistics program

Postby dwayne2005 on Tue Aug 08, 2017 2:56 am

The height advantage in basketball. This splits the heights into 3 brackets, starting from the median (6'7", or 79 inches), then subtracts and adds a custom amount (here 2 inches), so anything below or above this range (6'5"-6'9") is grouped into the below or above bracket.
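A minimal Python sketch of the three-way split, assuming heights are stored in whole inches and that the middle band includes both of its end points (which is consistent with 6'4" and under counting as short below). The function names are hypothetical.

```python
def to_inches(feet: int, inches: int) -> int:
    return feet * 12 + inches


def height_bracket(height_in: int, median_in: int = 79, margin_in: int = 2) -> str:
    """Put a height into one of three brackets around median +/- margin.

    The average bracket is inclusive at both ends, so with the defaults
    6'5" (77 in) through 6'9" (81 in) count as average height.
    """
    if height_in < median_in - margin_in:
        return "below"
    if height_in > median_in + margin_in:
        return "above"
    return "average"
```

For example, `height_bracket(to_inches(6, 4))` gives `"below"` and `height_bracket(to_inches(6, 10))` gives `"above"`.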

Image

So much for a height advantage in position-based team basketball. In 2016, short players (6'4" and under) averaged 10.8 PPG, compared with 9.1 for both the average height players and the tall players. They scored with slightly worse efficiency, however: 1.050 to 1.068 (average) and 1.085 (tall). Yet 20 years earlier the shorter players were scoring with the best efficiency of the 3 groups: 1.073 to 1.066 and 1.054. But they were mid-ranked in scoring: 9.8 to 10.2 and 9.3. As this data is for 2016, it doesn't include the breakthrough seasons of Isaiah Thomas or Russell Westbrook. Also note that relative to 1996, centres were scoring at about the same rate of 9.1/9.2 points per game. Minutes have dropped off for mid-height and taller players but remain steady for below average height.

I know from my examinations elsewhere that the trend increases below 6'4", so it's not necessarily the upper heights that are responsible for those numbers. There is also the possibility that fewer players 6'4" and under are being played, which might account for the difference: the points are not getting spread among as many players, and that is distorting the figures. But relative to the 1996 figures, more shorter players are being played and the figures are higher.

What the figures show, however, is that average height players are responsible for the biggest points differential: +0.3 more than the other two brackets (I have yet to figure out how those numbers add up). But that number is in decline, with a +0.8 difference between average height and below average height players. It also might indicate that the better all-around teams just didn't want to take a chance on shorter players.

Interestingly, the difference is smaller when the player is actually on court (his +/-), suggesting the advantage is coming from the bench. Then there is the 'off' value, which gives the team's points differential relative to the player's own points differential, and which shows that for all 3 brackets the bench had a very slightly better points differential per minute than the players. (I gather this is possible without that value being in error, because this combines players with different minutes per game, but I've yet to crunch the numbers to explain how they can all be negative.)

You may notice that the team and opponent team FG% numbers are not mathematically correct. That is because I have set FGA to exclude attempts that were blocked. This helps, for instance, to identify whether blocks are having a negative effect on opponents' shooting, and how much of an effect beyond the misses, which are already counted in the statistics.
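The adjusted percentage is just field goals made divided by attempts that weren't blocked. A small Python sketch (the `fg_pct` name and `blocked_fga` parameter are hypothetical); a team's blocked attempts are the same thing as the opposing team's BLK total.

```python
def fg_pct(fgm: int, fga: int, blocked_fga: int = 0) -> float:
    """Field goal percentage, optionally leaving blocked attempts out.

    A team's blocked attempts equal the opposing team's BLK total.
    """
    attempts = fga - blocked_fga
    if attempts <= 0:
        raise ValueError("no unblocked attempts to divide by")
    return fgm / attempts
```

With 40 makes on 85 attempts, 5 of them blocked, the standard figure is 40/85 ≈ .471, while the adjusted one is 40/80 = .500.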

Once again, the reason these numbers are so high, given there are fewer than 500 individual players, is that this is game by game. A player who played 82 games in the regular season gets 82 shares. I believe these figures include the playoffs, too. They are not averages based on each individual player, but averages based on each player in each game.
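The difference between the two kinds of average can be shown with a short Python sketch; the function names and the example data are hypothetical. The per-game version weights every player-game equally, so players with more games pull the figure towards their numbers.

```python
from __future__ import annotations

from statistics import mean


def per_game_average(player_games: dict[str, list[float]]) -> float:
    # Every player-game is one share: an 82-game player counts 82 times.
    return mean(value for games in player_games.values() for value in games)


def per_player_average(player_games: dict[str, list[float]]) -> float:
    # Each player's own average counts once, however many games he played.
    return mean(mean(games) for games in player_games.values())
```

For a starter scoring 20 in each of 82 games and a reserve scoring 5 in each of 10 games, the per-game figure is 1690/92 ≈ 18.4, while the per-player figure is 12.5.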

Re: Making my own basketball statistics program

Postby dwayne2005 on Wed Aug 09, 2017 4:47 am

Reportedly, I can actually use the YYC option to build a far more efficient version. I need to download Visual Studio and I haven't the internet connection or free hard drive space for that at the moment. I imagine that will make it as efficient as alternative sources...

Like I said, I am new to everything.

Rest assured, I am very determined to see this through and do a little almost every day.

EDIT: I am thinking about some form of seeded generation for game simulation so two people can manage more or less the same game and contrast. I don't even know where to begin with game simming.
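Seeded generation usually means giving the simulation its own random number generator created from a shared seed, so the same seed always produces the same sequence. A toy Python sketch, assuming a hypothetical model where every shot is a 2-pointer made with a fixed probability; it only illustrates the seeding, not a real game engine.

```python
import random


def simulate_shots(seed: int, shots: int = 10, make_prob: float = 0.45) -> list[int]:
    """Toy model: each shot is a 2-pointer that goes in with probability make_prob."""
    rng = random.Random(seed)  # private generator: same seed -> same sequence
    return [2 if rng.random() < make_prob else 0 for _ in range(shots)]
```

If a manager's decisions also draw from the same generator, two managers' games drift apart after their first differing decision, so one option is to give each part of the simulation its own seeded generator.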

Re: Making my own basketball statistics program

Postby dwayne2005 on Sun Aug 13, 2017 5:03 am

It is compiling with Visual Studio 2012, the only VS version that seems compatible with Gamemaker Studio 1 (incidentally, Humble is running the same bundle again currently). Although I don't understand a lot about the way things are done, I believe this means it is converted to C++, so if it isn't efficient here it wouldn't have been had I programmed this in C++. And it doesn't help a lot. It seems to be almost double the speed, but that doesn't mean much when you are talking about generations this slow. For example, generating the output for all players for a full season is down from about 2.5 minutes to 1.5 minutes. Before, I'd just flip it into the background and do something else for a few minutes, so I don't even notice the gain. It's not like I sit around waiting for it for 2.5 minutes. Even 1.5 minutes is an intolerable time to just stare at the screen and wait, but I am now more inclined to do that than when it was taking 2.5 minutes, which makes it feel just as bad.

But the gain is significant, so I'll keep it. That said, it also has some nasty side effects. If it is processing in fullscreen, it won't let you switch out of the screen with ALT-TAB. There is nothing you can do to get out of the screen but wait for it to finish. The only other thing I can do is unplug the power from my laptop, forcing it to shut down. For instance, I have a program that assigns all the shutdown options to hotkeys. If I use it to lock the computer and go back to the Windows Welcome screen, it won't do so: the screen stays locked on the program, and the lock only happens after the program has finished processing, despite it using only a single core of my CPU.

So I'll have to get rid of fullscreen mode or try to figure out how to prevent this from happening. With Windowed mode, the screen just becomes inaccessible until it has finished processing.
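One likely cause (an assumption, not something confirmed in this thread) is that the whole generation runs inside one long loop, so the program never gets a chance to respond to the operating system until the loop ends. A common workaround is to split the work across frames, for example processing a batch of games in each Step event. The Python sketch below shows the general pattern with a generator; `process_in_slices` and its `budget_s` parameter are hypothetical names.

```python
import time
from typing import Callable, Iterable, Iterator


def process_in_slices(items: Iterable, work: Callable[[object], None],
                      budget_s: float = 0.010) -> Iterator[int]:
    """Run `work` on each item, pausing whenever the time budget is used up.

    Each yield hands control back to the caller (e.g. once per frame) so it
    can handle window events and redraw; the yielded value is the number of
    items processed so far. The last yield is always the final count.
    """
    done = 0
    deadline = time.perf_counter() + budget_s
    for item in items:
        work(item)
        done += 1
        if time.perf_counter() >= deadline:
            yield done
            deadline = time.perf_counter() + budget_s
    yield done
```

The caller advances the generator once per frame (with `next()`), draws a progress count, and stays responsive to ALT-TAB and lock requests in between.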

Things that are currently different from the test version I uploaded:

- I have fixed the window resizing nonsense, of course. Now the window size is customized from the .ini, not freely stretched.
- It is processing all data correctly. There was a glitch where I inserted a +1 that I believe might have errored the opponent stats or messed up the first entry or something, if I recall correctly. In the test version, I had only started implementing the team mates calculations, which were all messed up and erroring as well. That is all working 100% now.
- Both the team mates and opponents outputs now have a line that sums or subtracts the two players' statistics, depending on which search method is set. For team mates, a line adds the two players together so you can see their combined statistics at a glance without having to add up the numbers in your head. It might serve more than a cosmetic purpose, as I may add a sort option for it so you can see, for instance, which was the best tandem for Jordan, but I'm not doing that at the moment as it would involve increasing the size of the already pretty large ds_grid. For opponent stats, it subtracts the opponent's stats from Jordan's stats to produce a +/- value, so you can see how dominant Jordan was over the rival. It is a new feature I only implemented yesterday, and I need to look into also adding the summing/subtracting to the 'advanced stats'.
- When compiling for Windows (YYC) for the first time, it found a number of errors I had previously overlooked. These errors had to do with the variable names for the custom calculations. For example, when I used 'replace all' to change TMxxx to OPxxx, it also changed TMFTM to OPFOP. I knew this at the time, but I didn't consolidate the operation into one script, so I had to make multiple changes and overlooked some. I had also changed the variable for personal fouls from PER to PFS but didn't change all the entries for that either.
- When processing regular seasons + playoffs, I didn't clear out a variable, which caused the total game calculations to double up for the first playoff game. This is one impediment to getting in seasons before 1970, as it seems to have been the cause of the crashing.
- I have created a grid for games played per statistic, so that it averages correctly when there are incomplete statistics. I still need to do the minutes played grid.
- I have fixed most of the formatting of the first line for incomplete seasons (before 1985), but I need to fix the second line, and there are still some issues with the first line.
- I have stripped out the processing of all output as one variable and turned it into an array of variables. This array only loads one screen at a time, so there is far less on the screen than before. Before, the screen was full of stuff that you could scroll down to. Now it draws up the stuff you scroll down to. This seems to have improved performance considerably: there is absolutely no lag when processing full season output. For some reason I think it also improved memory use, even though it just switched the same content into an array.
- The side effect of the above, and one reason why I haven't posted a new version, is that copy to clipboard is currently broken. You can turn the array into a single string for copying to the clipboard, but it inserts array formatting. To remove all the commas, for instance, I have to run a replace-all operation in the script, and whereas copying to the clipboard used to be instantaneous, doing that bogs down the program, insanely so for very large outputs (so insanely that I'm not sure how long it takes, but it could be well in excess of 30 minutes just to copy and paste). I was hoping this might work better after compiling with Visual Studio, but that's not the case. So search/replace is out of the question, and building the string up from the arrays is no better either. I think the way to do it is to save the output to a file, appending one line at a time (as I don't want an extra variable that size always in memory), and when CTRL is pressed, read from the file, create the variable, copy it to the clipboard and then clear the variable.
- I created an early version of error detection in the output. Since it generates the team stats from the individual stats, if anything doesn't add up correctly it'll tell you by inserting an ERR or an asterisk, depending on where it is. I keep the points differential stats separate from the game calculation, so I can compare the generated scores for both teams, and if the differential doesn't match then it will produce an ERR beneath the date/game ID text. It will also add up all made shots, and if the point total doesn't equal the game total for both teams, then it will also produce the ERR. The asterisks (which will appear next to the total stats) are not actually errors; they just mean the data is incomplete. Over the years I may extend the program to include more game data and better error detection than just points, but that's all I have at hand.
- One thing I want to do is to compile all the data into separate .zips so first boot will be instantaneous. As it was in my test version, it could take minutes to boot up, and it may even corrupt the data if it's not allowed to process fully, as it only checks the general data, not the season-specific data. Zipping it is all easy, a little repetitive but nothing too bothersome. But I'm not convinced I'm ready to do that just yet, and I don't want to do things more times than I need to.
- One of the main problems I want to fix is that while the team mates and opponents output is correct, after it is sorted it changes format, and I haven't got it processing the player stats adjacent to the opponents/team mates stats. For some reason, I was confused by my own programming when trying to fix this yesterday.
- I have included an extra sortable column for height for the multi-player searches only, so splits can be generated by height. I have also added a team output mode that is triggered by 3-letter initials, e.g. BOS, CLE, NYK or ALL. However, due to the way the data is processed, the full season output needs to be generated first before it produces correct results for all games.
- I have also been trying to get an option in it to add up data above and below a midpoint, but first I need to get it to add up data below it. Since it is processed sequentially, only above works. I have changed it to a system that will work, but it only adds up data below it for the main game stats. The game stats calculations needed for the second line of stats won't work! And I can't figure out why! I have spent hours on the problem. It is so frustrating. I have the data adding up correctly in another section of my program, but I can't transfer this over to the new 'addabove' method, where it should produce identical results but doesn't. Not even close. I can't even make sense of the results it is producing. If I can get it to add up above and below the median, it will show greater trends above and below certain heights, for instance.
- In a similar vein, I tried again to fix the problem with parentheses in the custom calculation formulas, which seems to affect either two sets of parentheses within parentheses, or the first parenthesis inside a parenthesis if it sits right next to the first open parenthesis. I can't figure out what could possibly be going wrong.
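For reference, a small recursive-descent evaluator handles any depth of nesting, because each "(" simply starts a fresh parse of a sub-expression. This Python sketch is not a fix for the author's script (which isn't shown here); it assumes formulas use only numbers, stat names, + - * / and parentheses, and `evaluate` is a hypothetical name.

```python
from __future__ import annotations

import re

# Numbers (e.g. 0.44), stat names (e.g. FGA) or any single symbol.
_TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|\S")


def evaluate(formula: str, variables: dict[str, float]) -> float:
    """Evaluate + - * / with arbitrarily nested parentheses.

    Grammar (recursive descent):
        expr   := term (('+' | '-') term)*
        term   := factor (('*' | '/') factor)*
        factor := NUMBER | NAME | '(' expr ')' | '-' factor
    """
    tokens = _TOKEN.findall(formula)
    pos = 0

    def peek() -> str | None:
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("formula ended unexpectedly")
        pos += 1
        return tok

    def parse_expr() -> float:
        value = parse_term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += parse_term()
            else:
                value -= parse_term()
        return value

    def parse_term() -> float:
        value = parse_factor()
        while peek() in ("*", "/"):
            if take() == "*":
                value *= parse_factor()
            else:
                value /= parse_factor()
        return value

    def parse_factor() -> float:
        tok = take()
        if tok == "(":
            value = parse_expr()  # a nested '(' just recurses again
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok == "-":
            return -parse_factor()
        if tok[0].isdigit():
            return float(tok)
        if tok in variables:
            return float(variables[tok])
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

For example, `evaluate("PTS / (2 * (FGA + 0.44 * FTA))", {"PTS": 30, "FGA": 20, "FTA": 10})` works regardless of how the brackets sit next to each other. Stat names must start with a letter or underscore in this sketch.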

There was more, but I have forgotten it right now.
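The points check from the error detection item can be sketched in Python as follows, assuming each player line carries made field goals (twos and threes together), made threes and made free throws. `ShootingLine`, `points_from_shots` and `check_game` are hypothetical names; like the author's version, this only checks points.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ShootingLine:
    fgm: int   # all made field goals, twos and threes
    fg3m: int  # made three-pointers (0 before the 3-point line existed)
    ftm: int   # made free throws


def points_from_shots(lines: list[ShootingLine]) -> int:
    # 2 per field goal, +1 extra per three, +1 per free throw.
    return sum(2 * p.fgm + p.fg3m + p.ftm for p in lines)


def check_game(home: list[ShootingLine], away: list[ShootingLine],
               home_score: int, away_score: int) -> list[str]:
    """Return a list of problems; an empty list means the game adds up."""
    errors = []
    home_pts, away_pts = points_from_shots(home), points_from_shots(away)
    if home_pts != home_score:
        errors.append(f"ERR home: shots add to {home_pts}, score is {home_score}")
    if away_pts != away_score:
        errors.append(f"ERR away: shots add to {away_pts}, score is {away_score}")
    if home_pts - away_pts != home_score - away_score:
        errors.append("ERR differential mismatch")
    return errors
```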

EDIT: For the record, the reason the bench is showing up as better than the players, even though the data combines both the bench and the players, is that the data is lopsided by how many players are played in wins and losses. If, say, a team was winning, they may field 10 players, but if a team was losing they may field 11 players, which creates an imbalance in the output. This is not an error at all, but it may create some noise that could affect some of the conclusions about the data.

EDIT2: The ready-to-go text variable that I was using for copy to clipboard seems to have been dramatically slowing it down. I think it is much faster now. Everything I can do with the data slows it down considerably, but exporting it to a text file was the quickest. Even so, it increased the generation time from 1:40 to 2:40, which is unacceptable. There must be another way. Actually, I haven't tried the other loops yet; I'm not sure if they are any more efficient, and I can't see any reason why they should be.

Re: Making my own basketball statistics program

Postby dwayne2005 on Tue Aug 15, 2017 4:27 am

Okay, Gamemaker (and probably programming in general, I just didn't know it) really, REALLY does not like extremely long variables that are continually called back within thousands of loops. So when it comes to copying 56,000 lines of text to the clipboard, building up a very large string over and over, even if I'm adding it one line at a time through an array, it kills the software. The solution was to split the data into pages, each the size of the square root of the data, creating an equal number of pages, and use two nested loops. It now copies the 56,000 lines in about 1.5 seconds.
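The square-root paging trick works because, when every append copies the string being appended to, adding n lines one by one to a single huge string costs on the order of n² copying, while building short pages first and then appending each page to the result costs on the order of n·√n. A Python sketch of the two-loop version (the `join_in_pages` name is hypothetical):

```python
import math


def join_in_pages(lines: list[str]) -> str:
    """Concatenate lines using about sqrt(n) pages of about sqrt(n) lines.

    Assuming every append copies the string being appended to, adding lines
    one at a time to one huge string costs on the order of n * n copying.
    Here each line is appended to a short page, and the big result is only
    extended once per page, which is on the order of n * sqrt(n).
    """
    if not lines:
        return ""
    page_size = math.isqrt(len(lines))  # at least 1 when lines is non-empty
    result = ""
    for start in range(0, len(lines), page_size):
        page = ""
        for line in lines[start:start + page_size]:  # inner loop: one page
            page += line + "\n"
        result += page  # outer loop: one append per page
    return result
```

In Python itself, `"".join(line + "\n" for line in lines)` produces the same output in a single linear pass; the paging approach is what helps when you are stuck with plain string appends.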

I wondered if a similar thing could also be done with the data grids, and maybe I could dramatically speed up the processing there. I implemented it in a test version and broke some things I haven't a clue how to fix, but I eventually got it to work for single season output, and there was no gain: about 8.5 seconds each (actually I timed the fixed version at 8.8 seconds, but I think I was about that much off in clicking the buttons on my stopwatch). It wouldn't really apply to this data anyway, as the grids are small; it might work better on very large grids, but I have seen no indication that it did anything. I don't know whether it's worth trying to figure out all the bugs to get it fully implemented and take it full scale, but I did read something today which suggested to me there might be something to it.

At present, it is set to create up to 300 grids (300^2=90,000 entries, with two lines of data per entry, so 180,000 line output). It could be I got some speed increase from it but it was offset by the additional scripts that create an array of grids. Cubing things (sharing the load 3 ways) might also increase the processing speed?
dwayne2005
 
Posts: 153
Joined: Fri Aug 29, 2008 2:00 pm

Re: Making my own basketball statistics program

Postby dwayne2005 on Wed Aug 16, 2017 3:03 am

The grid array experiment had no effect on most searches, and on the really long ones like team mates and opponents, it made it much slower, not faster.

Here is the current state of things. I'm only posting this because the last version was so bad. There are so many things that need fixing! With this release, only seasons back to 1980 are included (and the presentation needs work for incomplete data seasons anyway), as I've zipped them up manually to speed up start time. But it will still take a minute or two to load at first, because all the playoffs for every season are still included in 1 archive. Do not launch it twice, even though it might appear not to be doing anything: it might (but probably won't) abort the generation of game stats, which could produce incorrect results later. And it might botch the extraction of the playoffs data. Like I've said in the past, the initial load time is not a flaw with the program. It will boot almost instantly when I have all the data zipped in their own archives. The generation of game stats makes initial searches longer. In the future, when I'm happy with the format, I may include the game stats in the data.

https://www.mediafire.com/file/z32j78u3 ... tatsCJ.zip

One difference is that the games and totals are split across different tabs, accessible with the left and right arrow keys. Eventually, I'll add the tabs. You can also scroll down the page with the arrow keys now. Scrolling with the mouse is strictly confined to the area with the scroll button.

A reminder: in order to generate team mate and opponent stats more quickly, you need to insert a number greater than 1 in the input box beside them, which calculates totals only for the players who met that minimum number of games. The figure should be at least 1 per season for opponents stats, while even at 1 you can generate the team mates output much faster than the opponent stats with a 10 game minimum (I was doing it in about 4 minutes on my laptop's CPU).
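The reason a minimum-games setting speeds things up is that counting games per pair is cheap, so the expensive per-pair totals only need to be built for pairs that pass the threshold. A hypothetical Python sketch, assuming one (player, other player) tuple per game:

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def pairs_meeting_minimum(matchups: Iterable[tuple[str, str]],
                          min_games: int) -> dict[tuple[str, str], int]:
    """Count games per player pair and keep pairs with at least min_games.

    Pairs are stored in sorted order so (A, B) and (B, A) are the same pair.
    """
    counts = Counter(tuple(sorted(pair)) for pair in matchups)
    return {pair: n for pair, n in counts.items() if n >= min_games}
```

Only the pairs returned here would then go through the full stat accumulation.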

Eventually, I will add the ability to highlight an entry and double click it to reveal game stats (from the game stats screen I may then branch out into the game manager).

The team output is doubling up data, so isn't properly working yet. But you can see how it will be formatted and that the ranges should adapt to team stats.

EDIT: Fixed (not uploaded) the team gens counting double the stats. It turns out it was importing double the games. I am working on getting the opponent team stats to show in the second row of stats. Even though it's generally faster than it was before, opponents output is just as slow as it was, maybe a fraction worse, due to the game info being generated, I think.
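The thread doesn't say what caused the double import, but one general guard is to de-duplicate games by ID as they are loaded. A Python sketch, assuming every record carries a hypothetical `game_id` key that is the same in both teams' files:

```python
from __future__ import annotations


def unique_games(games: list[dict]) -> list[dict]:
    """Drop repeated games, keeping the first copy of each game ID.

    Assumes every record carries a hypothetical "game_id" key that is the
    same in both teams' files.
    """
    seen: set = set()
    unique = []
    for game in games:
        if game["game_id"] in seen:
            continue  # already imported, e.g. from the other team's file
        seen.add(game["game_id"])
        unique.append(game)
    return unique
```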

