Introducing the REFORMS checklist for ML-based science
Brilliant work!!! Guidelines like these help keep science on the pathway of accuracy, productivity, and ethical clarity.
I would be interested to know what you make of a recent paper in Nature from AlphaDev which claims to have developed a novel, faster sorting algorithm. It did not contain a new sorting algorithm. Coming up with assembly-level tricks is not the same as finding a new sorting algorithm. The results contained in it are interesting. It just does not do what it says on the tin. Is AI given a free pass when it comes to reviews in that particular venue?
This seems like it applies well to avoiding ML screwups in a commercial setting also.
Have you requested feedback from Dr. Gelman? This sort of thing seems like it would be right up his alley, even if the subject matter isn’t his area of expertise.
It gets even worse when it comes to real-life research that aims to find causal relations between genAI and outcome variables that many organisations care about, such as productivity. Quality of texts was measured by three guys (who unwittingly fed the training data set, which is a very human form of data leakage) and a self-invented interrater reliability measure, IRC = .40, which is "small" at best. Also in a renowned journal:
https://www.science.org/doi/10.1126/science.adh2586