
Repositório Institucional da Produção Científica da Marinha do Brasil (RI-MB)

Use this identifier to cite or link to this item: https://www.repositorio.mar.mil.br/handle/ripcmb/847104
Title: Online Large-Scale Hypothesis Testing with Corrupted Data
Author(s): Alves, Victor Benicio Ardilha da Silva
Advisor(s): Szechtman, Roberto
Chen, Louis
Keywords: False discovery rate
Power
Data corruption
Cascading effect
DGPM knowledge areas: Production engineering applied to operations research and innovation management
Document date: 2024
Publisher: Naval Postgraduate School (NVS)
Description: This thesis examines the robustness of the Levels Based On Recent Discovery (LORD) algorithm when exposed to corrupted data, particularly within critical real-time processing environments like the Brazilian Navy’s Blue Amazon Management System (SisGAAz). Our study reveals that maintaining the integrity of statistical testing is crucial, mainly where decision-making depends on the accuracy of data analysis conducted online. Our research identifies and rigorously evaluates effective mitigation strategies against probabilistic data corruption scenarios. Key findings highlight the robust efficacy of “phantom” rejections and the strategic integration of the LORD algorithm with the online Benjamini and Hochberg (BH) algorithm, a variation adapted from the traditional offline BH method. These approaches, we assert, maintain testing power significantly, even under adversarial manipulations, instilling confidence in their effectiveness. We propose a controlled adversarial setup involving two entities: “Blue,” the defender who aims to make true discoveries, and “Red,” the attacker focused on data corruption. Our analysis investigates several attack scenarios. The first is a singular anticipated attack that manipulates the first true discovery and traditionally triggers a cascade effect, countered by adjusting the decay rate of each test level to buffer against such disruptions. Additionally, we explore multiple p-value corruption scenarios where strategically placed “phantom” rejections can reclaim compromised testing power, although this strategy faces practical challenges due to the necessity of predicting attack probabilities. Lastly, indiscriminate attacks on any p-value show that integrating the LORD algorithm with the online BH algorithm is exceptionally effective, maintaining the algorithm’s robustness even amidst widespread corruption.
The thesis concludes that while prevalent algorithms are adequate for handling FDR in trustworthy data scenarios, their effectiveness diminishes under adversarial data manipulation, a common issue in real-time data environments. Our findings suggest that enhancing algorithmic robustness against data corruption supports reliability in statistical testing and contributes to broader research and application in adversarial conditions. We propose new avenues for future investigation, such as exploring data corruption impacts on other existing algorithms and developing a “pure” algorithm. This new algorithm could offer a more robust alternative to the current mixed approach, providing a stronger defense against data manipulation.
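The cascade effect mentioned in the abstract can be illustrated with a small sketch. The code below is not taken from the thesis: it assumes the LORD++ form of the rule as commonly stated in the online FDR literature, a hypothetical decay sequence gamma_k = 6 / (pi^2 k^2), and illustrative values for `alpha` and `w0`. The function names are likewise made up for this example. It shows how corrupting the first small p-value lowers later test levels, so a later discovery can be lost as well.

```python
import math


def gamma(k):
    # Assumed decay sequence (not from the thesis): gamma_k = 6 / (pi^2 * k^2),
    # which is non-negative and sums to 1 over k = 1, 2, 3, ...
    return 6.0 / (math.pi ** 2 * k ** 2) if k >= 1 else 0.0


def lord_plus_plus(p_values, alpha=0.05, w0=0.025):
    """Sketch of a LORD++-style online test; returns (levels, decisions).

    alpha is the target FDR level and w0 <= alpha the initial wealth;
    both defaults are illustrative assumptions.
    """
    rejection_times = []  # 1-based times of past rejections
    levels, decisions = [], []
    for t, p in enumerate(p_values, start=1):
        # Contribution of the initial wealth, decaying with time.
        level = gamma(t) * w0
        # Each past rejection adds wealth that decays with its age.
        for j, tau in enumerate(rejection_times):
            weight = (alpha - w0) if j == 0 else alpha
            level += weight * gamma(t - tau)
        reject = p <= level
        levels.append(level)
        decisions.append(reject)
        if reject:
            rejection_times.append(t)
    return levels, decisions


# Same stream, with and without corruption of the first p-value.
clean_levels, clean_decisions = lord_plus_plus([0.001, 0.01, 0.5])
corrupt_levels, corrupt_decisions = lord_plus_plus([0.9, 0.01, 0.5])
```

In this toy stream the second p-value (0.01) is rejected when the first discovery is intact, but not once the first p-value has been replaced by 0.9, because the second test level no longer receives the wealth earned by the first rejection.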
Access type: Open access
URI: https://www.repositorio.mar.mil.br/handle/ripcmb/847104
Type: Dissertation
Appears in collections: Naval Engineering: Dissertations Collection

Files associated with this item:
File: Dissertação - CC Benicio.pdf
Size: 8.21 MB
Format: Adobe PDF


Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.