Michal Černý and Milan Hladík. The complexity of computation and approximation of the t-ratio over one-dimensional interval data. Comput. Stat. Data Anal., 80(0):26–43, 2014.
[PDF] [gzipped postscript] [postscript] [HTML]
The main question is how to compute the upper and lower limits of the range of possible values of a given statistic, when the data range over given intervals. Initially some well-known statistics, such as sample mean, sample variance or F-ratio, are considered in order to illustrate that in some cases the limits can be computed efficiently, while in some cases their computation is NP-hard. Subsequently, the t-ratio (variation coefficient) is considered. It is investigated when the limits for t-ratio are computable in polynomial time and a new efficient algorithm is designed for this case. Conversely, complementary NP-hardness results are proved, demonstrating the cases when the computation cannot be done efficiently. Subsequently, the NP-hardness results are strengthened: it is shown that under certain assumptions, even an approximate evaluation with an arbitrary absolute error is NP-hard. Finally, it is shown that the situation can also be (in some sense) regarded positively: a new pseudopolynomial algorithm is developed. The algorithm is of practical importance, especially when the dataset to be processed is large and does not contain excessively large numbers.
@article{CerHla2014a, author = "Michal {\v{C}}ern\'{y} and Milan Hlad\'{\i}k", title = "The complexity of computation and approximation of the t-ratio over one-dimensional interval data", journal = "Comput. Stat. Data Anal.", fjournal = "Computational Statistics and Data Analysis", volume = "80", number = "0", pages = "26-43", year = "2014", doi = "10.1016/j.csda.2014.06.007", issn = "0167-9473", url = "https://doi.org/10.1016/j.csda.2014.06.007", bib2html_dl_html = "http://www.sciencedirect.com/science/article/pii/S0167947314001789", abstract = "The main question is how to compute the upper and lower limits of the range of possible values of a given statistic, when the data range over given intervals. Initially some well-known statistics, such as sample mean, sample variance or F-ratio, are considered in order to illustrate that in some cases the limits can be computed efficiently, while in some cases their computation is NP-hard. Subsequently, the t-ratio (variation coefficient) is considered. It is investigated when the limits for t-ratio are computable in polynomial time and a new efficient algorithm is designed for this case. Conversely, complementary NP-hardness results are proved, demonstrating the cases when the computation cannot be done efficiently. Subsequently, the NP-hardness results are strengthened: it is shown that under certain assumptions, even an approximate evaluation with an arbitrary absolute error is NP-hard. Finally, it is shown that the situation can also be (in some sense) regarded positively: a new pseudopolynomial algorithm is developed. The algorithm is of practical importance, especially when the dataset to be processed is large and does not contain excessively large numbers.", keywords = "Interval data, t-ratio, Computational complexity, NP-hardness, Inapproximability, Pseudopolynomial algorithms", }
Generated by bib2html.pl (written by Patrick Riley ) on Wed Oct 23, 2024 08:16:44