Making my own basketball statistics program


Re: Making my own basketball statistics program

Postby dwayne2005 on Thu Mar 08, 2018 3:34 am


As far as I know, player analysis is functional enough going back to about 1984 or 1985. I have added splits by year for multi-season operations (these are seasons, i.e. 2018 means 2017-18, not calendar 2018; I should probably change that). This introduced another YYC compiler issue I had to figure out: it can't convert a string copy to a real in one operation. You should be able to muck around with custom formulas from the interface by right-clicking next to the range boxes. They don't save, they don't update in real time (I think one way to trigger an update without doing a new search is to switch to per minute and back to per game), and there are no custom labels.

Oh, some of the sort operations are still broken, or partly broken, and I have yet to figure out why. I don't think player names sort at all when there are multiple players, while ON, OFF and possibly DIF and +/- values seem to sort correctly except for values starting with 0.
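One common reason signed numeric columns misbehave in a sort is that the values are being compared as text rather than as numbers; whether that is what is happening in this program is only a guess. A minimal Python sketch of sorting stat strings by numeric value (the function name is hypothetical, not from the program):

```python
def sort_stat_strings(values, descending=False):
    """Sort stat values stored as text (e.g. "+5.2", "-1.0", "0.3") by numeric value."""
    # float() accepts a leading "+" or "-", so signed strings parse correctly.
    return sorted(values, key=float, reverse=descending)


raw = ["+5.2", "-1.0", "0.3", "12.1", "-0.4"]
print(sorted(raw))             # text order: ['+5.2', '-0.4', '-1.0', '0.3', '12.1']
print(sort_stat_strings(raw))  # numeric order: ['-1.0', '-0.4', '0.3', '+5.2', '12.1']
```

In text order the sign characters and leading digits decide the position, so negative values and values starting with 0 land in the wrong places.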

Video explaining some features:
[video]

I ran into an apparent issue while making this video. For some reason I couldn't combine the teammates using the right-click method in the Westbrook analysis, so I had to add one teammate manually using the + method. Maybe I clicked on the wrong thing or did something wrong, because it should have been no different from the Kyrie Irving/Kevin Love/LeBron James analysis. It looks like an issue I thought I had fixed: following the chain and not destroying enough variables to get the data to reset. I'm sure it's a rookie mistake, losing track of variables. But that is all for player stats; I have fixed up a lot of stuff.

The infinite loops were caused by having two valid while loops nested inside one another. I can see no reason why this was happening; I can only assume there is a glitch in GameMaker's YYC compiler. I have replaced them with a repeat loop that uses a break command instead, and it works fine.
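A counted loop that exits with break, as described above, also guarantees termination even if its exit condition is never met. A minimal Python sketch of that pattern; the task (finding the first game from a given month onward) and the function name are hypothetical, and the sketch does not reproduce or explain the YYC behaviour:

```python
def first_matching_index(rows, predicate, max_steps=None):
    """Return the index of the first row satisfying predicate, or -1.

    The loop runs at most max_steps times (default: len(rows)), the Python
    analogue of a GML `repeat (n) { ... if (cond) break; }` loop, so it
    cannot run forever even if the condition is never met.
    """
    limit = len(rows) if max_steps is None else min(max_steps, len(rows))
    found = -1
    for i in range(limit):
        if predicate(rows[i]):
            found = i
            break  # leave early, like `break` inside a GML repeat block
    return found


games = [{"month": 4}, {"month": 10}, {"month": 11}]
print(first_matching_index(games, lambda g: g["month"] >= 9))  # 1
```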

Currently, much of the team statistics functionality is broken. I want to make repeat searches more reliable before I get to all that. Maybe I'll take another break.
Last edited by dwayne2005 on Sun Mar 11, 2018 3:36 am, edited 3 times in total.
Posts: 202
Joined: Fri Aug 29, 2008 2:00 pm


Postby dwayne2005 on Sat Mar 10, 2018 3:02 am ...
EDIT: Fixed some of the things mentioned below, but more needs to be done.

Identified and fixed the issue with second players not being added to the entries on TMM/OPP searches (it was a mistype of 2 variables).

It occurred to me that the invert button might produce some unusual results when adding and removing players, or filtering data, through right-click. And it did. The invert operation lets you remove specific data instead of filtering for it: for instance, do a full career search on a player and remove a specific team from the results. I have now integrated it, but it is not working 100% for some reason. It seems to work with data you can't otherwise invert in the program (teams, opponents, seasons) but not with data for which you do have other options (home, away, wins and losses). I have no idea why yet. I have disabled the invert operation for TMM and OPP searches, but I now know I need to add it back and just put special conditions on it.
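The filter/remove distinction that the invert button provides can be expressed as one comparison flipped by a flag. A minimal Python sketch; the function name, field names and team codes are hypothetical:

```python
def filter_rows(rows, field, value, invert=False):
    """Keep rows where row[field] == value; with invert=True, remove them instead."""
    # XOR-style test: matches are kept normally and dropped when inverted.
    return [row for row in rows if (row[field] == value) != invert]


# Hypothetical career log: one row per game.
career = [
    {"team": "TMA", "home": True},
    {"team": "TMA", "home": False},
    {"team": "TMB", "home": True},
]
print(len(filter_rows(career, "team", "TMA")))               # 2: only TMA games
print(len(filter_rows(career, "team", "TMA", invert=True)))  # 1: TMA games removed
```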

I have changed the plus-minus per minute (on/off) stats to represent 48 minutes rather than the custom minutes a player sets in the per-minute calculations. Why? The ON entry sits right beneath the DIF entry, which allows a very helpful juxtaposition: with DIF you can see how well the team did over roughly 48 minutes (plus some extra time for overtimes), and with ON you can see how they might theoretically have done had the player played the full 48 minutes, so the DIF and ON values relate almost 1:1. I could make it reflect overtimes, but then it would lose comparability with other players, so capping it at 48 minutes seems ideal. That way it is mostly relatable.
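The per-48 scaling is a single proportion. A minimal Python sketch, assuming playing time is kept in exact seconds and the cap is a fixed 48 minutes regardless of overtime (the function name is hypothetical):

```python
SECONDS_PER_48 = 48 * 60  # fixed 48-minute cap; overtime is not added


def per_48(plus_minus, seconds_played):
    """Scale a plus-minus to a 48-minute rate using exact seconds on court."""
    if seconds_played <= 0:
        raise ValueError("player has no court time")
    return plus_minus * SECONDS_PER_48 / seconds_played


print(per_48(1, 24 * 60))  # 2.0: +1 in 24:00 extrapolates to +2.0 per 48
```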

The ON rating, as mentioned, is his plus-minus (now) per 48 minutes, measured against a baseline of 0 points. The OFF rating moves the baseline up or down depending on how well his team did when he was off the court, so the OFF value shows how much better or worse he was than the team playing without him. So if a player has a +5 ON rating but a +6 OFF rating, this means that when he was benched the rest of the team was -1 per 48 minutes. I found this awfully confusing yesterday, because a lot of comparisons get into double-negative territory and it's easy to lose track of what they are saying. For example, say the ON value was -10.3 and the OFF value was -12.6. What does this mean? It means the team was +2.3 per 48 minutes with him on the bench, so he was 12.6 points worse than the bench. Why don't I just show the bench value instead? Because the OFF value is more meaningful. In fact, it is far more valuable than the ON value: it doesn't penalize him for being on a bad team, and it shows his true worth to a team, namely how much he has improved that team's plus-minus.
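Under this reading of the post, the three numbers are related by OFF = ON - bench, where bench is the team's plus-minus per 48 minutes while the player sits. A minimal Python sketch of that relation (function names are hypothetical):

```python
SECONDS_PER_48 = 48 * 60  # fixed 48-minute cap; overtime is not added


def rate_per_48(plus_minus, seconds):
    """Scale a plus-minus to a 48-minute rate."""
    return plus_minus * SECONDS_PER_48 / seconds


def on_off_ratings(pm_on, sec_on, pm_bench, sec_bench):
    """Return (ON, bench, OFF) per 48 minutes, where OFF = ON - bench."""
    on = rate_per_48(pm_on, sec_on)
    bench = rate_per_48(pm_bench, sec_bench)
    return on, bench, on - bench


# +5 with the player on court and -1 with him benched, each over 48 minutes.
print(on_off_ratings(5, 2880, -1, 2880))  # (5.0, -1.0, 6.0)
```

Reading the relation the other way, bench = ON - OFF, so ON = -10.3 with OFF = -12.6 means the team was +2.3 per 48 minutes without him.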

I decided to replace the odd formula named TEST in the custom formulas with a game ranking formula. I picked GMSC (Game Score), as I was offline and that was all I had at hand. Doing this exposed an issue I hadn't realized was there. All of the formulas up to this point were self-contained ratios: for example, shooting percentage worked out as FGM/FGA is the same for a game as for a season. But a game ranking formula treats the entire season as one giant, long game. Whereas GMSC may average 20 per game, on the totals splits it was showing over 1,000! This was not an issue for any other formula I had been working with, but I needed to add an option that forces a custom formula to be divided by the number of games or minutes. So there is now an extra option to set in the .ini for each formula, should the need arise.
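The difference between a self-contained ratio and an additive rating can be shown with John Hollinger's published Game Score formula (assumed here to be what GMSC refers to) next to FG%. A minimal Python sketch with a made-up stat line; function names are hypothetical:

```python
def game_score(s):
    """John Hollinger's Game Score for a box-score line (dict of counts)."""
    return (s["PTS"] + 0.4 * s["FGM"] - 0.7 * s["FGA"]
            - 0.4 * (s["FTA"] - s["FTM"]) + 0.7 * s["ORB"] + 0.3 * s["DRB"]
            + s["STL"] + 0.7 * s["AST"] + 0.7 * s["BLK"] - 0.4 * s["PF"] - s["TOV"])


def fg_pct(s):
    return s["FGM"] / s["FGA"]


def add_lines(lines):
    """Sum box-score lines into totals, as a 'totals' split would."""
    return {key: sum(line[key] for line in lines) for key in lines[0]}


game = {"PTS": 25, "FGM": 9, "FGA": 18, "FTM": 5, "FTA": 6, "ORB": 1,
        "DRB": 6, "STL": 2, "AST": 7, "BLK": 1, "PF": 3, "TOV": 3}
season = add_lines([game, game, game])
games_played = 3

print(fg_pct(game), fg_pct(season))                 # 0.5 0.5 (ratio unchanged)
print(round(game_score(game), 1))                   # 21.5
print(round(game_score(season), 1))                 # 64.5 (season as one long game)
print(round(game_score(season) / games_played, 1))  # 21.5 (divided by games)
```

Because Game Score is a weighted sum, it grows with the number of games pooled, while FGM/FGA does not; dividing by games played restores the per-game scale.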

I was getting random errors when right-clicking on the season splits to filter that data. I believe I tracked it down to the same issue as before: the YYC compiler not working when real(string_copy(...)) calls are combined. To identify when the season changes, I need to take the game ID, which consists of YYMMDDHTMATM plus 1 extra character. It is a text string, even though the first 6 characters are digits. The important ones are characters 3 and 4, the month, where I can tell it that the NBA off-season (assumed to be September, or 9) signifies the start of a new season. (I can't just add a new column to the grid to identify the season as the grid gets built, which would be preferable, because that would mean changing an enormous number of values.) So I need to copy that substring, then convert it into a number so I can compare it with greater-than or less-than. Doing this in one operation works in the standard GameMaker builds, but it breaks in YYC builds. So it looks like it is working, since I build and test mostly in standard GameMaker as it is a lot faster, until it dawns on me that it isn't.
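The season detection described above can be sketched in Python. This is a minimal sketch, assuming IDs of the form YYMMDD + two team codes + one extra character, a season start month of 9, and that two-digit years of 50 and above belong to the 1900s (that cutoff is an assumption). Python has no YYC-style quirk; the copy and the conversion are kept as separate steps only to mirror the two-line workaround:

```python
def season_end_year(game_id, start_month=9):
    """Return the season label (the year the season ends) for a game ID.

    Assumes hypothetical IDs like "171018AAABBB0" (YYMMDD + team codes + 1 char).
    """
    yy_text = game_id[0:2]  # step 1: copy the substrings
    mm_text = game_id[2:4]  # characters 3-4 (1-indexed) hold the month
    yy = int(yy_text)       # step 2: convert to numbers separately
    month = int(mm_text)
    year = 1900 + yy if yy >= 50 else 2000 + yy
    # Games from the start month onward belong to the season ending next year.
    return year + 1 if month >= start_month else year


print(season_end_year("171018AAABBB0"))  # 2018: October 2017 is in 2017-18
print(season_end_year("180405AAABBB0"))  # 2018: April 2018 is in 2017-18
```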

Last time, I fixed it by splitting the operation into two lines. But it looked like there were more issues. After I fixed those, I decided to search for every instance of real(string_copy and found dozens of entries, which took a really long time to fix up, none of which I had associated with any error. So either the program had a great many unidentified bugs from this that are now fixed, or something else was triggering this malfunction.
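Finding every combined call is easy to script. A minimal Python sketch that flags lines where real( directly wraps string_copy(, assuming the script text has already been read into a string (loading GameMaker project files from disk is left out; the function name is hypothetical):

```python
import re

# real( immediately wrapping string_copy(, allowing whitespace in between.
PATTERN = re.compile(r"\breal\s*\(\s*string_copy\s*\(")


def find_nested_conversions(source_text):
    """Return 1-based line numbers containing real(string_copy(...)) calls."""
    return [n for n, line in enumerate(source_text.splitlines(), start=1)
            if PATTERN.search(line)]


sample = """var a = real(string_copy(gid, 3, 2));
var t = string_copy(gid, 3, 2);
var b = real(t);
var c = real( string_copy(gid, 1, 2) );"""
print(find_nested_conversions(sample))  # [1, 4]
```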

Postby dwayne2005 on Tue Mar 13, 2018 5:10 am

I am getting some odd numbers. I read an article saying that Isaiah averaged a -15.8 +/- per 48 minutes for the Cavs. My numbers were way off, but that was because I had forgotten to change the switch from 36 minutes to 48 minutes elsewhere in the program. However, the numbers are still off: it reports he was -15.6. This might be because their calculations are based on values rounded to a tenth, whereas I use exact seconds; I'm not sure yet.

When I split the values into smaller sections of 5 games, the average of the plus-minus per 48 values does not match the totals my program produces. I have yet to figure out the logic as to why; they're way off. Over the longer stretch, though, it is 15.6 versus 15.8. I'm not sure whether the data is breaking with smaller pools of games or not. It could just be the way the numbers are; I've yet to wrap my head around the logic of it all, i.e. why you can't just average those games even though each is stretched to 48 minutes. I need to investigate that. I am not the smartest person, so it'll take some time and some calmness to comprehend.
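A gap between averaging per-game per-48 values and computing per-48 from totals is expected in general: the totals-based figure is a minutes-weighted average of the game rates, so low-minute games pull an unweighted mean around far more than they move the total. That may account for at least part of the difference. A minimal Python sketch with two made-up games (function names are hypothetical):

```python
SECONDS_PER_48 = 48 * 60


def rate(pm, sec):
    return pm * SECONDS_PER_48 / sec


def pooled_per_48(games):
    """Per-48 rate from total plus-minus over total seconds (how totals work)."""
    return rate(sum(pm for pm, _ in games), sum(sec for _, sec in games))


def mean_of_game_rates(games):
    """Unweighted average of each game's own per-48 rate."""
    return sum(rate(pm, sec) for pm, sec in games) / len(games)


def minutes_weighted_mean(games):
    """Average of game rates weighted by each game's share of court time."""
    total_sec = sum(sec for _, sec in games)
    return sum(rate(pm, sec) * sec / total_sec for pm, sec in games)


# (plus_minus, seconds) for two hypothetical games: -4 in 12:00, -4 in 36:00
games = [(-4, 12 * 60), (-4, 36 * 60)]
print(pooled_per_48(games))                    # -8.0
print(round(mean_of_game_rates(games), 2))     # -10.67
print(round(minutes_weighted_mean(games), 2))  # -8.0
```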

Thirdly, when using TMM or OPP searches and applying the custom splits, if the columns are in descending order and the player is selected twice (maybe once as well), it produces really weird results in the top superfluous values for the bottom-most bracket. The numbers should be the same when calculating for the player twice, but they differ only in the bottom bracket. I'll have to look into that.

I spent a long time trying to fix the team searches yesterday, and can't for the life of me figure out what is making them eat up all the RAM almost instantly on ALL searches over a full season's worth of data. It works when I halve the size of the ds grids. I've rewritten code and pondered it closely, but nothing... It makes no sense. There is nothing there, unless one of the loops is becoming infinite due to some glitch like the nested-loop one I encountered before.

EDIT: My program correctly averages the court time Isaiah received in his 15-game Cavs stretch at 27:03. Basketball Reference lists this as 27.1 minutes, or 27:06, because 27:03.47 is rounded to the nearest tenth of a minute. (27.1/27.05) × 15.6 = 15.63, so the rounding isn't enough to make sense of things. The mean of the 15 per-48 plus/minus values is 15.253. The means are based on values accurate to a tenth, so it should be accurate to a tenth. Isolating a 24:00 game with a +/- of 1 extrapolates correctly to 2.0 per 48 minutes on both the game and the averages.
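The rounding check above can be reproduced in a few lines. A minimal Python sketch using only the figures quoted in the post: rounding 27:03.47 to 27.1 minutes shifts a per-48 value of 15.6 by less than 0.05 in either direction, well short of 15.8:

```python
def to_minutes(minutes, seconds):
    """Convert mm:ss.ss court time to decimal minutes."""
    return minutes + seconds / 60


exact = to_minutes(27, 3.47)  # about 27.058 minutes
rounded = round(exact, 1)     # 27.1, as listed to one decimal
print(rounded)

# Rescale a per-48 figure of 15.6 by the rounding ratio in both directions.
low = 15.6 * 27.05 / 27.1
high = 15.6 * 27.1 / 27.05
print(round(low, 2), round(high, 2))  # 15.57 15.63
```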

Postby dwayne2005 on Wed Mar 14, 2018 2:59 am

The team stuff is working, and better than ever before. I am re-enabling a number of splits I had disabled for players: if a player played for multiple teams, you could also find his splits for each team at home, away, and in wins and losses. I disabled these for being superfluous. But at least the home and away ones are very valuable if I get around to letting you create predictive formulas. For example, you might estimate a team's rebounds on any given night by adding their average to their opponent's average rebounds conceded and dividing by two. With the home and away splits, you can now take home and away into account. It gives additional tools for trying to create the ultimate formula for predicting game results. I hope to finesse it in the next day. I have to create a system that allows for 3 things: partial formulas that only determine the point spread, like SRS; full formulas that predict each statistic in a game, such as rebounds; and game score formulas that assign weightings to each statistic to produce game scores for predicting results. I see no reason why all 3 can't be accommodated simultaneously and flexibly.
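The averaging estimate described above, with home/away splits plugged in, might look like this minimal Python sketch; the function name, the choice of splits and the numbers are all hypothetical:

```python
def predict_stat(team_avg_for, opp_avg_allowed):
    """Estimate a team's total for one stat in one game.

    Averages what the team usually records with what the opponent
    usually concedes (a simple, hypothetical predictor).
    """
    return (team_avg_for + opp_avg_allowed) / 2


# Hypothetical rebounding averages (per game).
season_for, season_allowed = 44.0, 42.0  # full-season splits
home_for, road_allowed = 45.5, 43.5      # home team at home, visitor on the road

print(predict_stat(season_for, season_allowed))  # 43.0 using season averages
print(predict_stat(home_for, road_allowed))      # 44.5 using home/away splits
```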

I am reasonably confident my software is not in error with the plus/minus figures, but ESPN was: it should have been 15.6, or possibly 15.7 if it was rounded from about 15.65 after the slight adjustment for approximated minutes played, not 15.8.

Postby dwayne2005 on Thu Mar 15, 2018 3:29 am

Most up to date version: ...

When processing team stats, the assist formulas mess things up, because they subtract the player's own field goals (they determine the percentage of his teammates' field goals that a player assisted, discounting his own shots). When the team version is processed, it treats player values as team values. So it works out the assist percentage from 0 field goals (either made or attempted, I can't remember which I use right now) + turnovers, producing a result that throws things a bit out of balance. I'll look into accommodating formulas like that.
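A simplified version of such a formula shows why it breaks at the team level: when the team's own totals are substituted for the player's, the teammate field goals in the denominator drop to zero. This minimal Python sketch uses a simplified assist share, AST / (team FGM - player FGM), which is an assumption and not the program's actual formula (the post says that one also involves turnovers):

```python
def assist_share(player_ast, team_fgm, player_fgm):
    """Share of teammates' made field goals that the player assisted.

    Simplified, hypothetical form: AST / (team FGM - player FGM).
    Returns None when there are no teammate field goals to assist,
    which is what happens if team totals are fed in as "player" values.
    """
    teammate_fgm = team_fgm - player_fgm
    if teammate_fgm <= 0:
        return None
    return player_ast / teammate_fgm


print(assist_share(8, 40, 8))    # 0.25: 8 assists on 32 teammate field goals
print(assist_share(24, 40, 40))  # None: team processed as if it were a player
```

Returning None (or flagging the formula) is one way to accommodate formulas whose denominator collapses in team mode.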

I've also made explicit mention that the program doesn't support order of operations, so you should be generous with brackets. For example, it doesn't process this very well: PLFTM + PLFGM * 2 + PL3PM * 3. Order of operations rules say you do the multiplication and division first, multiplying PLFGM by 2 and PL3PM by 3, and then process from left to right. Without doing those first, the results can come out incorrectly, especially as the expression becomes more complex. It does process parentheses first, so instead you need to be generous with brackets, which forces those parts to be done first: PLFTM + (PLFGM * 2) + (PL3PM * 3). Fixing this is low priority, as I imagine it would be stressful to figure out how to add it to the mess I made (I had to come up with my own approach for converting strings into formulas). I didn't really notice it was a problem until I recently added GMSC.
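Supporting order of operations usually means parsing with one grammar level per precedence tier rather than reading strictly left to right. A minimal recursive-descent sketch in Python, not the program's own string-to-formula converter; it assumes formulas contain only numbers, stat names such as PLFTM, the four arithmetic operators, unary minus and brackets (all names are hypothetical):

```python
import re

# Numbers, stat names (letters then letters/digits, e.g. PL3PM), operators, brackets.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z_][A-Za-z0-9_]*|[-+*/()]")


def tokenize(text):
    """Split a formula string into tokens, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


class FormulaParser:
    """One method per precedence level, so * and / bind tighter than + and -."""

    def __init__(self, tokens, values):
        self.tokens, self.values, self.i = tokens, values, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.i += 1
        return token

    def expression(self):
        # expression := term (("+" | "-") term)*, evaluated left to right
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.take(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term := factor (("*" | "/") factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, rhs = self.take(), self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # factor := number | stat name | "(" expression ")" | "-" factor
        token = self.take()
        if token is None:
            raise ValueError("formula ended unexpectedly")
        if token == "(":
            value = self.expression()
            if self.take() != ")":
                raise ValueError("missing closing bracket")
            return value
        if token == "-":
            return -self.factor()
        if token[0].isdigit():
            return float(token)
        if token in self.values:
            return float(self.values[token])
        raise ValueError(f"unexpected token {token!r}")


def evaluate(formula, values):
    """Evaluate a formula string against a dict of stat values."""
    parser = FormulaParser(tokenize(formula), values)
    result = parser.expression()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return result


stats = {"PLFTM": 5, "PLFGM": 9, "PL3PM": 2}
print(evaluate("PLFTM + PLFGM * 2 + PL3PM * 3", stats))      # 29.0
print(evaluate("PLFTM + (PLFGM * 2) + (PL3PM * 3)", stats))  # 29.0
# Strict left to right would instead give ((5 + 9) * 2 + 2) * 3 = 90.
```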

Next: the value of assists compared with rebounds from a team stats perspective. I had to type these numbers out by hand, so I think my next goal will be a custom copy-to-clipboard option, especially as my eyes weren't in a state to see very well (slime coming out of them after an eye trauma; it may also be from lack of sleep, as my eyes tend to go slimy when I don't sleep).

Since offensive rebounds often correspond to poor team shooting, I've isolated defensive rebounds.

This is the first time I've examined this seriously at the team level (using the "ALL" search after processing a full season to generate game stats with ""). I thought there might be some difference in valuation compared with the player numbers, but it appears more or less the same. The numbers are from the 2016-17 regular season.

I've grouped the assist and defensive rebound brackets so that each pair covers a similar number of games, starting at the first bracket with enough games for the numbers to stabilize.

AST 36 (37.6): 31-3 (91.18%) +21.8 FG:54.21% DFG:44.54%
...means 36 or more team assists in a game, for an average of 37.6 team assists, corresponding to 31 wins and 3 losses (2.8% of all games), a 91.18% win percentage and an average winning margin of 21.8 points. It then shows FG% and DFG%, the team's and its opponents' field goal percentages. High assist totals come with an inflated FG%, while defensive rebounds correspond to a lower opponent FG%. The league-wide average FG% for that season was 45.72%.
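Lines in this layout can be produced by filtering team-games at or above a threshold and aggregating. A minimal Python sketch, assuming hypothetical field names, one row per team-game, a win meaning more points than the opponent, and FG%/DFG% pooled over all games in the bracket (the post doesn't say whether they are pooled or averaged per game):

```python
def bracket_summary(games, stat, threshold):
    """Aggregate team-games where game[stat] >= threshold (None if there are none)."""
    rows = [g for g in games if g[stat] >= threshold]
    if not rows:
        return None
    n = len(rows)
    wins = sum(1 for g in rows if g["PTS"] > g["OPP_PTS"])
    return {
        "avg": sum(g[stat] for g in rows) / n,
        "record": (wins, n - wins),
        "win_pct": 100 * wins / n,
        "margin": sum(g["PTS"] - g["OPP_PTS"] for g in rows) / n,
        # Pooled percentages: total makes over total attempts in the bracket.
        "fg_pct": 100 * sum(g["FGM"] for g in rows) / sum(g["FGA"] for g in rows),
        "dfg_pct": 100 * sum(g["OPP_FGM"] for g in rows) / sum(g["OPP_FGA"] for g in rows),
    }


def format_bracket(label, threshold, s):
    """Render a summary in the same layout as the bracket lines."""
    wins, losses = s["record"]
    return (f"{label} {threshold} ({s['avg']:.1f}): {wins}-{losses} ({s['win_pct']:.2f}%) "
            f"{s['margin']:+.1f} FG:{s['fg_pct']:.2f}% DFG:{s['dfg_pct']:.2f}%")


# Three hypothetical team-games.
games = [
    {"AST": 38, "PTS": 120, "OPP_PTS": 100, "FGM": 48, "FGA": 88, "OPP_FGM": 38, "OPP_FGA": 85},
    {"AST": 36, "PTS": 110, "OPP_PTS": 112, "FGM": 44, "FGA": 86, "OPP_FGM": 42, "OPP_FGA": 84},
    {"AST": 22, "PTS": 95, "OPP_PTS": 105, "FGM": 36, "FGA": 85, "OPP_FGM": 40, "OPP_FGA": 83},
]
print(format_bracket("AST", 36, bracket_summary(games, "AST", 36)))
# AST 36 (37.0): 1-1 (50.00%) +9.0 FG:52.87% DFG:47.34%
```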

It is clear that assists are more valuable than defensive rebounds (and even more so than total rebounds, given that offensive rebounds correlate negatively with team performance). Having two great passers (e.g. Chris Paul and James Harden) or numerous good passers on the same team may be vastly beneficial in this way.

AST 36 (37.6): 31-3 (91.18%) +21.8 FG:54.21% DFG:44.54%
DRB 46 (47.6): 31-6 (83.78%) +15.6 FG:44.95% DFG:37.77%

AST 35 (36.9): 43-4 (91.49%) +20.1 FG:54.03% DFG:45.17%
DRB 45 (46.8): 44-9 (83.02%) +13.8 FG:44.57% DFG:38.03%

AST 34 (36.2): 55-6 (90.16%) +19.5 FG:53.86% DFG:45.27%
DRB 44 (46.1): 56-13 (81.16%) +13.1 FG:44.70% DFG:38.39%

AST 33 (35.3): 75-9 (89.29%) +17.2 FG:53.26% DFG:45.03%
AST 32 (34.3): 103-17 (85.83%) +15.2 FG:52.32% DFG:45.31%
DRB 43 (45.1): 83-20 (80.58%) +11.8 FG:44.57% DFG:38.78%

AST 31 (33.4): 145-23 (86.31%) +14.0 FG:51.79% DFG:45.42%
DRB 42 (44.1): 120-29 (80.54%) +10.8 FG:44.74% DFG:38.97%

AST 30 (32.5): 186-44 (80.87%) +12.1 FG:51.35% DFG:45.71%
DRB 41 (43.3): 163-45 (78.37%) +9.8 FG:44.80% DFG:39.42%

AST 29 (31.4): 252-74 (77.30%) +10.2 FG:50.72% DFG:45.86%
DRB 40 (42.2): 235-69 (77.30%) +9.2 FG:45.19% DFG:40.11%

AST 28 (30.7): 315-103 (75.36%) +9.7 FG:50.15% DFG:45.63%
DRB 39 (41.4): 306-104 (74.63%) +8.9 FG:45.62% DFG:40.68%

Postby dwayne2005 on Sat Mar 17, 2018 3:43 am

It now does custom-formatted output for the 3 modes, and for the 2 types of stats in 2 of those 3 modes, so 5 of 6 overall. It is probably more than 83% complete, though, as the remaining one only needs a bit of research and some tweaks to get working. What the custom formatting can't do at present is output team stats unless a team search is done. I haven't committed those stats to a grid or an array; they just get processed as they go and currently can't be referenced afterwards, and I am reluctant to increase RAM usage by storing them unless, over time, I am persuaded it is needed. I am even less likely to devise a formula to pull them from the program's broader text output. But you can get it to process custom results, so you could simply type (OPFGM/OPFGA)*100 into the field below minutes, for instance, and format a custom output with |C02 to include the opposing team's shooting percentage.
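A custom output template with tokens such as |C02 can be filled in with a regular-expression substitution. A minimal Python sketch, assuming (hypothetically) that |Cnn refers to the nth custom formula field and that results are shown to two decimal places; the program's actual token rules may differ:

```python
import re

# |C followed by exactly two digits, e.g. |C02.
CUSTOM_TOKEN = re.compile(r"\|C(\d{2})")


def render_custom(template, custom_values, decimals=2):
    """Replace |Cnn tokens with the nth custom formula result."""
    def substitute(match):
        index = int(match.group(1))
        return f"{custom_values[index]:.{decimals}f}"
    return CUSTOM_TOKEN.sub(substitute, template)


results = {1: 27.05, 2: 45.4545}
print(render_custom("MIN |C01, OPP FG% |C02", results))  # MIN 27.05, OPP FG% 45.45
```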

I think I am getting faster and more efficient at programming. The idea is that eventually, when you click on games, it'll bring up the game stats, and from there a button will let you coach a game, where I'll implement my ideas for that. But I may look into other projects before I get there, such as creating a customizable roguelike game. I actually have no idea what the first thing to do is when approaching game management. I've got a feeling I should go straight to the clocks, have them run on top of one another, then insert various things and overlap them with your inputs. I propose a shout meter that lets you shout instructions at players, for instance to shoot, and the player will respond. So you can influence the game beyond simply subbing players in and out and, eventually, choosing a fantasy lineup.