Physical Activity in Adults With Fatigue After Cancer Treatment: A Systematic Review Of Randomized Trials With Fatigue as an Eligibility Criterion

Editor’s Note: This article is included within the RiSE issue of Communications in Kinesiology at the behest of the editors due to its rigor, reproducibility, and transparency.

Rosie Twomey, Samuel Yeung, James G. Wrightson, Lillian Sung, Paula D. Robinson, Guillaume Y. Millet, S. Nicole Culos-Reed


Introduction
Fatigue, characterized by a sensation of tiredness, weariness, or exhaustion, is a common and debilitating symptom associated with cancer or cancer treatment. Cancer-related fatigue (CRF) is a distressing, persistent sense of physical, emotional, and/or cognitive tiredness or exhaustion that is not proportional to recent activity and interferes with usual functioning (Berger et al., 2015). CRF differs from fatigue experienced by healthy individuals because it is non-transient and less likely to be relieved by rest. Most people undergoing cancer therapy will experience CRF (e.g. ~85% during chemotherapy; Pearce et al., 2017). Although estimates of prevalence after treatment (with curative intent) vary depending on the definition and measurement of CRF, many people will experience an improvement in CRF over time once treatment is completed. However, although CRF can resolve within the first few months after treatment, for approximately 25% of people it will continue for months or years (Abrahams et al., 2016; Goedendorp et al., 2013; Jones et al., 2016). The terms 'post-cancer fatigue' and 'chronic CRF' are used to describe CRF that continues for months or years after treatment (greater than 3 months is one criterion used; see Bruggeman-Everts et al., 2017; Sandler et al., 2017; Twomey et al., 2020). For the purpose of this review, we use the term 'post-cancer fatigue' to describe CRF that is clinically relevant (i.e. moderate to severe) and continues after cancer treatment. Post-cancer fatigue can result in a reduction in health-related quality of life (Bower et al., 2000), increased utilization of healthcare resources (Goldstein et al., 2012; Heins et al., 2013), difficulties returning to work, and a reduced capability to work (Duijts et al., 2014; Islam et al., 2014).
Physical activity is recommended for the management of CRF, both during and after cancer treatment (Campbell et al., 2019). This is based on evidence from more than 170 randomized trials to date (Oberoi et al., 2018), and there have been multiple systematic reviews on the topic in the past decade (Kelley & Kelley, 2017; Kessels et al., 2018; Mustian et al., 2017; Oberoi et al., 2018; Vulpen JK et al., 2019). However, the majority of physical activity interventions are delivered during treatment, with <25% delivered after treatment completion (Oberoi et al., 2018). Furthermore, relatively few randomized trials involving physical activity interventions include fatigue as an eligibility criterion (Oberoi et al., 2018); thus, CRF may not be present or may only be mild at baseline. Therefore, there are relatively few studies investigating the benefits of exercise for post-cancer fatigue. We recently reported that some people with post-cancer fatigue experience post-exertional symptom exacerbation after exercise (Twomey et al., 2020). This highlights the need to monitor tolerance of exercise and symptom exacerbation in future exercise interventions involving people with post-cancer fatigue. There is also a need to examine the quantity and quality of the existing literature on physical activity for post-cancer fatigue. The objective of this systematic review was to summarize and evaluate the effect of physical activity on post-cancer fatigue in adults, using randomized trials where fatigue was an eligibility criterion.
Participants had to be adults (aged ≥ 18 or described as adults) with a cancer diagnosis who had completed initial cancer treatments (e.g., surgery, chemotherapy and/or radiation therapy). Studies had to explicitly state that fatigue was a participant eligibility/inclusion criterion, regardless of how this was described or assessed. Studies had to involve a physical activity intervention. As previously reported by Oberoi et al. (2018), eligible interventions included physical activity of any type, intensity, frequency, or duration used for the management of fatigue. Physical activity was defined as any bodily movement produced by skeletal muscles that results in energy expenditure above resting (basal) levels (Garber et al., 2011). This included physical activity interventions categorized using the American College of Sports Medicine classification (Garber et al., 2011): (i) aerobic: includes brisk walking, running, cycling, climbing stairs, working out in a gym using a treadmill, stationary bike or elliptical machine, water aerobics, aerobic dance and hiking; (ii) neuromotor: includes yoga, tai chi and qigong; (iii) resistance: includes use of free weights and dumbbells, machines with stacked weights or pneumatic resistance, elastic tubing and resistance bands; and (iv) flexibility: includes ballistic stretches, dynamic stretching, and static stretching. Single-joint physiotherapy interventions were excluded, and cognitive behavioral therapy and other similar therapies were excluded unless physical activity was a major focus of the intervention. As previously detailed by Oberoi et al. (2018), eligible control group types included: (i) usual care or waitlist control; (ii) physical activity, when studies compared two different types of physical activities such as aerobic versus resistance; (iii) non-physical activity, active control (not usual care or waitlist control) such as psychological, pharmacological or mind and body interventions; and (iv) combination (physical activity and non-physical activity, active control). Studies were included if they measured fatigue as a primary or secondary outcome.
The primary difference between the present review and the previous review (Oberoi et al., 2018) is that the present review excluded studies if they did not include fatigue as an eligibility/inclusion criterion or if the physical activity intervention was delivered during chemotherapy or radiation therapy. We screened the 170 studies included in the previous review for eligibility in the present review according to these additional criteria (Figure 1).

Data Extraction
Search results were stored and de-duplicated in Zotero, an open-source reference management software (https://www.zotero.org/). Following de-duplication, screening of the references retrieved in the literature search was performed in systematic review software (Ouzzani et al., 2016). The titles and abstracts were independently screened for eligibility by two researchers (SY, RT), each blinded to the other researcher's decisions during this initial process. Any publication considered potentially eligible by either reviewer was retrieved in full and assessed for eligibility. After this process, the inclusion of studies in this meta-analysis was determined by the agreement of both reviewers. Discrepancies between the two reviewers were resolved by consensus and adjudication by a third reviewer (JGW). Data were extracted in duplicate by two reviewers (SY and RT), and any discrepancies were resolved by consensus and adjudication by a third reviewer (JGW). Where multiple records reported findings from the same trial, only one record was included (that which reported on fatigue as an outcome pre- versus post-intervention). Extracted variables are presented in Tables 1-3 and the supplements.

(a) Risk of Bias
The risk of bias was assessed using the Cochrane Collaboration's tool for randomized trials (RoB 1.0; Higgins et al., 2011). RT and JGW graded each risk of bias parameter as high risk, low risk, or unclear risk based on recommendations for judging the risk of bias (Higgins et al., 2011). The approach evaluates sequence generation and allocation concealment (selection bias), blinding of participants and personnel (performance bias), blinding of outcome assessors (detection bias), incomplete outcome data (attrition bias), selective reporting (reporting bias) and other biases. For attrition bias, we judged studies to be at high risk when more than 30% of data were missing for the post-intervention (short-term) follow-up. For reporting bias, one reviewer (RT) checked for a major discrepancy between the registered and published outcomes of these trials, and decisions were reviewed by a second reviewer (JGW). A major discrepancy was recorded if the registered and published primary outcomes (or fatigue outcome, if fatigue was not specified as a primary outcome) were different or assessed at a different time point, according to specific criteria described in Chen et al. (2019). The other bias category was used for baseline differences, reporting of adherence to interventions and contamination between groups. Effects may be underestimated if participants in the physical activity intervention do not adhere and if participants in the control group increase physical activity due to involvement in the study (and this may or may not be reported). We considered studies with <70% adherence to have a high risk of other bias. We did not restrict the meta-analysis to only studies that were judged as having a low risk of bias because we anticipated that few would be included.
The Grades of Recommendation, Assessment, Development and Evaluation (GRADE) system was used, which classifies the certainty of evidence as one of four levels: high, moderate, low, and very low (Guyatt et al., 2008). Because there can be challenges with blinding, adherence and contamination in physical activity studies in clinical populations even if they are well designed, we did not downgrade based on this alone. Instead, we downgraded by one level if other biases (e.g. selection or attrition bias) were apparent.

(b) Meta-Analysis
A meta-analysis for the severity of fatigue across different scales at the end of the intervention was conducted. For two trials that contained three arms (two eligible physical activity groups versus a control group) (Kröz et al., 2017; Yuen, 2007), a single pairwise comparison was created by combining data from both groups using methods recommended in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al., 2019). For trials with two arms comparing two different types of physical activities, either the aerobic or the activity of lower intensity was chosen as the experimental arm (with the consensus of RT, SY and JGW). Data were analyzed for the first time-point following completion of the intervention to maximize the data available and due to heterogeneity in the timing of follow-up. The meta-analysis was performed using an open-source add-on for meta-analysis (Viechtbauer, 2010) in the statistical software environment of R (R Core Team, 2017). For analyses across different fatigue scales, data were synthesized using the standardized mean difference (SMD), i.e. Hedges' g, after rescaling such that higher scores reflect more fatigue. Because we anticipated heterogeneity between studies, a random-effects model was used (IntHout et al., 2014). Cochran's Q and the I² statistic were used to evaluate heterogeneity [the latter using Cochrane's rough guide to interpretation; Higgins et al. (2011)]. Although not a measure of absolute heterogeneity, I² describes the percentage of variability in the point estimates that is due to heterogeneity rather than sampling error (Higgins et al., 2011).
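The steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used for the review (which was run with an R add-on and is provided in the supplements). The function names and the example numbers are hypothetical. The pooling step uses the DerSimonian-Laird estimator of between-study variance for simplicity, which is an assumption here; the estimator actually used in the review is not stated in this section and may differ.

```python
import math


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Merge two arms into one group (sample size, mean, SD),
    following the combining-groups formula in the Cochrane Handbook."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    ss = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
          + (n1 * n2 / n) * (m1 - m2) ** 2)
    return n, mean, math.sqrt(ss / (n - 1))


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g and its variance for intervention (1) versus control (2).
    With higher scores meaning more fatigue, negative g favours intervention."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def random_effects(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator
    (an assumption for this sketch), plus Cochran's Q and I^2 (%)."""
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1 / (v + tau2) for v in variances]
    estimate = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return {"estimate": estimate, "se": se, "tau2": tau2, "q": q, "i2": i2}


if __name__ == "__main__":
    # Hypothetical trials: (mean, SD, n) for intervention, then control.
    trials = [
        (4.1, 2.0, 20, 5.0, 2.1, 21),
        (3.8, 1.9, 15, 5.2, 2.2, 14),
        (4.9, 2.3, 30, 4.8, 2.0, 29),
    ]
    pairs = [hedges_g(*t) for t in trials]
    result = random_effects([g for g, _ in pairs], [v for _, v in pairs])
    print(result)
```

For a three-arm trial, `combine_groups` would be applied to the two physical activity arms first, and the merged group would then be passed to `hedges_g` against the control arm.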

Results
The flow diagram of study identification and selection is presented in Figure 1. The database search identified 10,044 citations, of which 6301 were removed as duplicates, including 29 citations that were duplicated with Oberoi et al. (2018) (due to a planned overlap in search dates). From the 3743 remaining records, we evaluated 67 full texts. Five studies met the eligibility criteria and were included in this systematic review (Abrahams et al., 2016; Kim et al., 2020; Kröz et al., 2017; Pagola et al., 2020a; Sandler et al., 2017). One study required adjudication with a third reviewer (Pyszora et al., 2017). That study investigated two weeks of physiotherapy for patients admitted to a palliative care service and was excluded due to the severity of the disease and high risk of deterioration in the general condition of the participants during the study period (Pyszora et al., 2017). We included 16 records from Oberoi et al. (2018) for full-text evaluation based on collaboration with the authors of that review, and excluded two. We contacted the authors of three studies to confirm eligibility regarding treatment completion (Campo et al., 2014; Donnelly et al., 2011; Mayo et al., 2014), and these were subsequently included. Therefore, a total of 19 studies were eligible for this review and were included (Figure 1). Of these 19 studies, 12 were described by the authors as pilot studies or as assessing feasibility (including Mayo et al., 2014).
(a) Estimated Proportion of Studies with Fatigue as an Eligibility Criterion
Of the 3743 citations from our updated search, we estimate (based mainly on abstract screening alone) that approximately 92 would have met the original eligibility criteria of Oberoi et al. (2018). This provides an estimate of the proportion of studies that investigated post-cancer fatigue and included fatigue as an eligibility criterion versus the total number that used a physical activity intervention for the management of fatigue. Adding 92 to the 169 studies (in adults) included in Oberoi et al. (2018) gives a total of 261 studies. Therefore, we estimate that 7% (i.e., 19 of 261) of all randomized studies on physical activity for the management of CRF included fatigue as an eligibility criterion after cancer treatment (i.e., investigated post-cancer fatigue). Of the remaining 242 studies, only four included fatigue as an eligibility criterion (during active treatment), while 238 (91% of the 261) included fatigue as an endpoint but not an eligibility criterion.
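The arithmetic behind these proportions can be reproduced as follows. This is a minimal sketch with hypothetical variable names; all counts are taken from the paragraph above.

```python
# Counts reported in the text (variable names are illustrative only).
previous_review_adult_studies = 169   # adult studies in Oberoi et al. (2018)
updated_search_estimate = 92          # estimated eligible studies from the updated search
post_cancer_fatigue_studies = 19      # studies included in the present review
eligibility_during_treatment = 4      # fatigue as eligibility criterion, during active treatment

total = previous_review_adult_studies + updated_search_estimate
remaining = total - post_cancer_fatigue_studies
endpoint_only = remaining - eligibility_during_treatment

share_post_cancer = 100 * post_cancer_fatigue_studies / total
share_endpoint_only = 100 * endpoint_only / total

print(total, remaining, endpoint_only)   # 261 242 238
print(round(share_post_cancer))          # 7
print(round(share_endpoint_only))        # 91
```

Both percentages use the full set of 261 studies as the denominator.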

(b) Fatigue as an Eligibility Criterion
Across studies, there was variability in how fatigue as an eligibility criterion was assessed (Table 2). Sandler et al. (2017) used an empirically derived somatic symptom subscale of the Somatic and Psychological Health Report (SPHERE) (Hickie et al., 2001), where a score of ≥ 3 has been validated to designate disabling/prolonged fatigue states (Hadzi-Pavlovic et al., 2000). Two studies (Abrahams et al., 2016; Prinsen et al., 2013) used the fatigue severity subscale of the Checklist Individual Strength (1994) and a cut-point of ≥ 35, developed for use in people with chronic fatigue syndrome and also recommended for CRF (Worm-Smeitink et al., 2017). Cantarero-Villanueva et al. (2013) used the revised Piper Fatigue Scale (Piper et al., 1998); the origin of the cut-point used was not stated (Table 2). Campo et al. (2014) used the National Cancer Institute Common Terminology Criteria for Adverse Events (CTCAE Version 4.0, 2009) to qualify fatigue not relieved by rest, limiting instrumental or self-care activities of daily living. However, a score >20 on a 0-100 general fatigue grading scale was also accepted. Pagola et al. (2020a) used the Perform questionnaire and a cut-point of <45 that was not justified by the authors but may have been based on the mean score of patients not undergoing any kind of treatment (Baró et al., 2009). Nearly half of the studies (n=9) used a 0-10 numerical rating scale (NRS) or visual analogue scale (Table 2) as an eligibility criterion for fatigue. Six of these used a threshold score of ≥ 4 (Table 2) as recommended for routine CRF screening (Fabi et al., 2020; Howell et al., 2013; Pearson et al., 2016). Kröz et al. (2017) supplemented the rating with the Cancer Fatigue Scale (CFS-D) (Kröz et al., 2008; Okuyama et al., 2000), and we were unable to determine the origin of the cut-point used, though other cut-points were available (Okuyama et al., 2001). Rogers et al. (2014) used a 0-10 scale for average fatigue over the past week but also accepted a rating of sleep dysfunction. The authors noted that this method was not selected based on validity/clinical judgment, but to avoid overly restrictive inclusion criteria that would have impeded their ability to recruit within a time frame limited by budgetary constraints. Payne et al. (2008) also used a score of ≥ 3 and specified three components (rating of usual fatigue, frequency and interference). In a lenient use of a 0-10 NRS, Donnelly et al. (2011) included participants if they scored ≥ 1.

(c) Measuring and Monitoring Fatigue
Overall, 13 different scales were used across the 19 studies (Table 1), indicating that there was no consensus on the measurement of CRF before and after a physical activity intervention. Only 10 studies specified that fatigue was a primary outcome (and one included it as part of a composite measure). Five studies did not specify a primary outcome in the paper (Table 1). Only two studies used the same tool for both the screening for and measurement of fatigue (Abrahams et al., 2019; Pagola et al., 2020a). Three studies used more than one fatigue scale to measure fatigue as an outcome (including Donnelly et al., 2011; Larkey et al., 2015). Six studies specified the minimal clinically important difference (MCID) in fatigue in the methods (Abrahams et al., 2019; Cantarero-Villanueva et al., 2013; Donnelly et al., 2011; Mayo et al., 2014; Pagola et al., 2020a; Sandler et al., 2017), though only four of these reported on this in the results/discussion (Abrahams et al., 2019; Donnelly et al., 2011; Pagola et al., 2020a; Sandler et al., 2017). Three further studies considered the MCID (including Bower et al., 2012), and the remainder (n=10) did not (Table 1). Six studies considered improvement or deterioration of fatigue on an individual level (Abrahams et al., 2019; Campo et al., 2014; Mayo et al., 2014; Rogers et al., 2014; Sandler et al., 2017; Yuen, 2007) (Table 1). Despite relatively small sample sizes (ranging from 5 to 42 participants in the intervention arm), no study plotted individual trajectories from pre- to post-intervention.

(d) Characteristics of the Intervention
The characteristics of the interventions are presented in Table 3. Half the interventions were categorized as aerobic, four included a combination of resistance and aerobic exercise, three were neuromotor, and one was resistance only. Several interventions were multimodal, including behavioral support alongside physical activity. More than half of the interventions were 12 weeks long, and the others ranged from 6 to 24 weeks. Most of the interventions were home-based and some were fully or partially supervised. Exercise frequency and duration varied, and when exercise intensity was reported, it was mainly low to moderate and monitored using heart rate or rating of perceived exertion. Several studies did not include details regarding exercise progressions (Table 3) [...] events (Cantarero-Villanueva et al., 2013; Kröz et al., 2017; Larkey et al., 2015; Pagola et al., 2020a; Stan et al., 2016). Therefore, most studies did not include the minimum data set considered necessary to report exercise interventions (Slade et al., 2016).

(e) Risk of Bias
Most studies were considered to be at low risk of selection bias related to random sequence generation (two at high risk (Heim et al., 2007; Kröz et al., 2017) and three at unclear risk (Galantino et al., 2003; Mayo et al., 2014; Payne et al., 2008)) and allocation concealment (two at high risk (Heim et al., 2007; Kröz et al., 2017) and seven at unclear risk (Galantino et al., 2003; Kim et al., 2020; Mayo et al., 2014; Pagola et al., 2020a; Payne et al., 2008; Stan et al., 2016; Yuen, 2007)). Larkey et al. (2015) used a form of 'sham' QiGong to blind participants, but the personnel delivering the interventions were not blinded. Some bias is inherent to exercise studies even if they are well-designed, due to the inability to blind participants. All studies except Larkey et al. (2015) were at high risk of detection bias (where the participant is the assessor for patient-reported outcomes, and knowledge of the intervention could influence the outcome). Most studies were considered to be at low risk of attrition bias, but four studies were at high risk of attrition bias due to >30% attrition and/or unequal attrition between groups (Heim et al., 2007; Kröz et al., 2017; Mayo et al., 2014; Prinsen et al., 2013; Stan et al., 2016). Only seven studies included a reference to a trial registration (Abrahams et al., 2019; Campo et al., 2014; Donnelly et al., 2011; Kröz et al., 2017; Pagola et al., 2020b; Prinsen et al., 2013; Sandler et al., 2017), and Prinsen et al. (2013) also cited a study protocol. However, only three studies were prospectively registered (Abrahams et al., 2019; Pagola et al., 2020a; Prinsen et al., 2013), and only two of these could be considered low risk of reporting bias (Figure 2). By definition (Chen et al., 2019), the comparison in the paper by Sandler et al. (2017) resulted in a discrepancy with the trial registration and was rated as high risk of reporting bias due to a difference in the primary outcome between the trial registration (an improvement in fatigue assessed as a global score from a structured clinical interview) and the paper (an improvement in fatigue assessed using a scale). In practical terms, both outcomes are essentially a self-report of fatigue, but the justification for the switch should have been noted in the paper. There were no further major discrepancies, but Prinsen et al. (2013) was rated as unclear risk of reporting bias because the trial registration listed numerous primary outcomes, and the primary outcome was not specified in the paper. Other details for trial registrations are reported in Supplementary File 7. For the other nine studies, we were unable to find trial registrations. Five were conducted after the 2008 Declaration of Helsinki Revision regarding prospective registration as a principle of medical research. Three studies were considered to be at high risk of other bias due to baseline differences (Bennett et al., 2007; Kröz et al., 2008; Prinsen et al., 2013), and the risk was unclear in three studies (Donnelly et al., 2011; Galantino et al., 2003; Payne et al., 2008). Four studies were at high risk of other bias due to adherence of <70% (Campo et al., 2014; Donnelly et al., 2011; Kröz et al., 2008; Stan et al., 2016). Adherence was unclear or not reported in seven studies (Abrahams et al., 2019; Bennett et al., 2007; Galantino et al., 2003; Heim et al., 2007; Mayo et al., 2014; Payne et al., 2008; Prinsen et al., 2013). Finally, six studies were at high risk of other bias due to contamination (due to increases in physical activity in the control groups and/or due to similarities in physical activity levels between groups; including Larkey et al., 2015).

(f) Meta-Analysis
Sixteen studies were included in the meta-analyses (data from 758 participants) (Abrahams et al., 2019; Bennett et al., 2007; Bower et al., 2012; Cantarero-Villanueva et al., 2013; Donnelly et al., 2011; Kim et al., 2020; Kröz et al., 2017; Larkey et al., 2015; Mayo et al., 2014; Pagola et al., 2020a; Prinsen et al., 2012; Rogers et al., 2014; Sandler et al., 2017; Stan et al., 2016; Yuen, 2007). Three studies were excluded due to a lack of data or lack of response to a request for additional data. Data from one study (Heim et al., 2007) were extracted from a figure using WebPlotDigitizer (Rohatgi, n.d.). Data, code and additional details for the meta-analysis are available in the supplements. The reduction in the severity of fatigue in the physical activity intervention compared to control was g = -0.40 (95% CI -0.68 to -0.11; p=0.010; Figure 3). The 95% CI in this random-effects model contains highly probable values for the mean effect of physical activity. However, the 95% prediction interval (-1.41 to 0.62; Figure 3) provides information on the range of treatment effects that are likely to be seen in other settings (future studies or when working with people with post-cancer fatigue in an exercise oncology setting) (IntHout et al., 2016). Although the prediction interval includes larger effects in the direction of benefit (i.e. a decrease in fatigue), it also contains values above zero, meaning that physical activity may have no effect or may lead to worse fatigue in some settings/participants (as also noted in Table 1 and in previous reviews (Kelley & Kelley, 2017)). In addition, Cochran's Q indicated variation across studies (Q = 45.64, p<0.001) and the I² statistic was 67% (95% CI 44.6% to 80.5%), which may represent substantial heterogeneity (Higgins et al., 2019).
Figure 3. The grey squares represent the SMD, and the left and right extremes of the squares represent the corresponding 95% confidence intervals. The grey diamond represents the overall effect and the bold black line represents the prediction interval, derived from the SMD and 95% CI.
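As a rough consistency check on the reported heterogeneity statistics, I² can be recomputed from Cochran's Q and its degrees of freedom. The sketch below assumes k = 16 pooled comparisons (one per included study, which is a reading of the text rather than a value stated with the statistic); the function name is hypothetical.

```python
def i_squared(q, k):
    """I^2 as a percentage, from Cochran's Q and the number of studies k.
    Negative values are truncated to zero, as is conventional."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Q = 45.64 as reported; k = 16 is an assumption based on the study count.
print(round(i_squared(45.64, 16), 1))  # 67.1
```

Under that assumption, the result agrees with the reported I² of 67%.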
Using GRADE, the certainty of the evidence (the extent of confidence that an estimate of effect is correct) was downgraded to low certainty (i.e. further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate) (Guyatt et al., 2008). The evidence was downgraded one point due to study limitations (high risk of bias) and one point due to inconsistent results (high heterogeneity) (Guyatt et al., 2008).
Exploratory sub-analysis showed that (i) removing n=3 studies that included physical activity in the control group (e.g. of a lower intensity) had very little influence on the primary meta-analysis (g = -0.42; 95% CI -0.77 to -0.08; prediction interval -1.55 to 0.70; p=0.020; Supplementary File 10); (ii) the summary effect was slightly reduced and not statistically significant in n=8 studies reporting change scores from pre- to post-intervention (g = -0.34; 95% CI -0.81 to 0.13; prediction interval -1.61 to 0.94; p=0.135); and (iii) the summary effect was reduced and not statistically significant in n=7 studies reporting a follow-up measurement of fatigue (g = -0.23; 95% CI -0.65 to 0.18; prediction interval -1.29 to 0.82; p=0.221).

Discussion
The primary objective of this review was to summarize the evidence for the effect of physical activity on CRF after cancer treatment in adults, based on randomized trials where fatigue was an eligibility criterion. We found 19 studies that met our inclusion criteria, indicating that only ~7% of all randomized trials on physical activity for the management of CRF are designed to include people with fatigue after cancer treatment, and to date, most of these are pilot/feasibility trials. Most studies were at a high or unknown risk of more than one type of bias (excluding performance and detection bias), the evidence was graded as low certainty, and it was rarely possible to determine if participants were experiencing a chronic fatigue state.

(a) Fatigue as an eligibility criterion
Although fatigue is one of the most assessed outcomes in exercise oncology (Kelley & Kelley, 2017) and is considered one of the most consistent effects (Campbell et al., 2019), only a small minority of papers target people experiencing fatigue within the study design. More than 200 studies include fatigue as an endpoint but not as an eligibility criterion, and fatigue that persists for years after treatment is under-recognized in the literature. To investigate post-cancer fatigue, a study must include participants who are experiencing CRF (a distressing, persistent sense of tiredness or exhaustion that interferes with usual functioning associated with cancer or cancer treatment (Berger et al., 2015)) that has not resolved over a prolonged period (e.g. several months). To ensure fatigue is cancer-related, alongside an evaluation of CRF onset and history, studies should exclude participants based on factors other than cancer/cancer therapy that may be the cause of fatigue. These include psychiatric disorders (e.g. a major depressive episode or psychosis), anemia, sleep disorders (e.g. untreated sleep apnea or restless leg syndrome), untreated hypothyroidism, autoimmune-related disorders (e.g. systemic lupus or rheumatoid arthritis) or shift work. Most studies (n=14) reported that they excluded participants based on a psychiatric disorder, but only half of the studies included consideration of factors that might explain CRF (Abrahams et al., 2019; Bower et al., 2012; Donnelly et al., 2011; Kim et al., 2020; Kröz et al., 2017; Larkey et al., 2015; Prinsen et al., 2013; Rogers et al., 2014; Sandler et al., 2017; Yuen, 2007), rather than only exclusion criteria based on contraindications to physical activity.
Although not all included studies used the terms 'post-cancer fatigue' or 'chronic CRF,' there was good congruence between the purpose of this review and the aims of the included studies. Therefore, it is surprising that most studies did not fully consider the screening and measurement of CRF (with some exceptions, e.g. Sandler et al. (2017)). Half of all studies reported the minimum time since treatment completion (Table 1), and half excluded participants based on other common causes of fatigue. Furthermore, there was uncertainty about the validity of most of the questionnaires and/or cut-points used as an assessment of CRF severity because it was unclear if the method was able to differentiate clinically-significant CRF from everyday tiredness. Two studies (including Larkey et al., 2015) used the vitality (energy and fatigue) scale of the 36-Item Short-Form Health Survey (SF-36). Beginning in 2000 (Bower et al., 2000), Bower et al. used the mid-point of the SF-36 (1992) to dichotomize participants into fatigued and non-fatigued groups, where >50 represents well-being, and ≤ 50 represents limitations or disability related to fatigue. The extent to which this is a valid cut-point for moderate-severe CRF is not certain (for example, an individual could answer "some of the time" for all four of the above items for a score of 42), but the cut-point has since been used in several studies and differences in, for example, pro-inflammatory cytokine activity between groups have been found (Bower et al., 2002).
Finally, two studies did not report the scale/method used for eligibility based on the presence of fatigue but reported that participants were included if they were "fatigued" (Bennett et al., 2007) or were "suffering from self-reported fatigue" (Galantino et al., 2003). Using an arbitrary or overly lenient definition of CRF may result in the inclusion of people who have negligible or mild fatigue. However, this can be avoided because there are well-established questionnaires with validated and clinically meaningful cut-points for the assessment of CRF severity (Belle et al., 2005; Minton & Stone, 2009; Yellen et al., 1997). Half of all studies used a 0-10 NRS for fatigue, and most did not report the framing of the question or recall period (e.g. fatigue right now, average fatigue over the past week, etc.). Although a 0-10 NRS is recommended for the routine screening of CRF in clinical practice (Fabi et al., 2020; Howell et al., 2013; Pearson et al., 2016), it is not recommended as the only screening method for eligibility in a randomized trial focused on CRF. The guidelines state that a score of ≥ 4 should be followed by a focused assessment that includes (though is not limited to) measurement of fatigue severity using a valid tool (Fabi et al., 2020; Howell et al., 2013; Pearson et al., 2016).
In recent studies that include a physical activity intervention and explicitly focus on CRF, there is little consensus in the literature on the cut-point to evaluate fatigue, even when the most widely-recommended questionnaire is used (the FACIT-F). For example, recent studies have used a cut-point of <45 (Sheehan et al., 2020), which is described as arbitrary in the study that first used it (Downie et al., 2006). Others have used a cut-point of <42 (Adams et al., 2018) or <43 (Dhillon et al., 2018), where 43 is the score that divided groups based on analysis of the general population versus patients with chemotherapy-induced anemia (Hgb <11.0 g/dL) (Cella et al., 2002). Scales used to measure CRF have been reviewed elsewhere (Dittner et al., 2004; Minton & Stone, 2009; Whitehead, 2009). One framework that should be considered is the consensus-developed diagnostic criteria, as proposed for the International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD-10) (Cella et al., 1998). To help operationalize this without a diagnostic interview, the FACIT-F has a cut-point (<34) that correctly identifies over 90% of 'ICD-10 positive' cases and has been recommended for the 'diagnosis' of CRF (Belle et al., 2005).
Meta-analytic evidence for the effect of physical activity on post-cancer fatigue
As shown in other meta-analyses, physical activity has a beneficial effect on CRF, but the effect is small to moderate (e.g. -0.30 to -0.49) (Mustian et al., 2017;Oberoi et al., 2018). It has previously been suggested that physical activity may be most beneficial if those most in need are targeted (i.e. people with more severe or persistent CRF) (Bower, 2012;Campbell et al., 2019). However, in the present study, we found a small-moderate effect (-0.40) similar to those earlier meta-analyses. One conclusion could be that the effect of physical activity on CRF is consistent and modest. However, there was substantial heterogeneity and, as we have outlined above, there were limitations with the definition of CRF in the included studies. Therefore, it is possible that the small effect is a result of methodological decisions made in the included studies, and our meta-analysis may involve the same problems as the previous literature. Indeed, in a quasi-experimental non-randomized study, Sheehan et al. (2020) included people with moderate-severe CRF after cancer treatment (mean scores of 20 on the FACIT-F). In this study, a large effect (~1.33) of progressive exercise versus health education was observed. This suggests that people with severe CRF may benefit more within a tailored and partly-supervised physical activity intervention (Sheehan et al., 2020). Future well-designed studies in people with post-cancer fatigue may help confirm this finding.
Exploratory sub-analyses showed that (i) removing n=3 studies that included physical activity in the control group (e.g. of a lower intensity) had little influence on the primary meta-analysis (g = -0.42; 95% CI -0.77 to -0.08; prediction interval -1.55 to 0.70; p=0.020; Supplementary File 10); (ii) the summary effect was slightly reduced and not statistically significant in n=8 studies reporting change scores from pre- to post-intervention (g = -0.34; 95% CI -0.81 to 0.13; prediction interval -1.61 to 0.94; p=0.135); and (iii) the summary effect was reduced and not statistically significant in n=7 studies reporting a follow-up measurement of fatigue (g = -0.23; 95% CI -0.65 to 0.18; prediction interval -1.29 to 0.82; p=0.221).
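The summary effects above are reported as Hedges' g. As a reminder of what that statistic represents for a single trial, the following minimal Python sketch computes g and an approximate 95% confidence interval from group means, standard deviations and sample sizes, using the standard small-sample correction. It is not the authors' analysis code; the function name `hedges_g` and the example values are hypothetical, and the normal-approximation interval is an assumption made for simplicity.

```python
import math


def hedges_g(mean_int, sd_int, n_int, mean_ctl, sd_ctl, n_ctl):
    """Return (g, lower, upper) for intervention versus control.

    With a fatigue scale where higher scores mean more fatigue, a negative
    g indicates lower fatigue in the intervention group. The interval uses
    a normal approximation (an assumption of this sketch).
    """
    df = n_int + n_ctl - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(
        ((n_int - 1) * sd_int**2 + (n_ctl - 1) * sd_ctl**2) / df
    )
    d = (mean_int - mean_ctl) / pooled_sd
    # Small-sample correction factor that converts Cohen's d to Hedges' g.
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Approximate sampling variance of d, then of g.
    var_d = (n_int + n_ctl) / (n_int * n_ctl) + d**2 / (2 * (n_int + n_ctl))
    se_g = math.sqrt(j**2 * var_d)
    return g, g - 1.96 * se_g, g + 1.96 * se_g


if __name__ == "__main__":
    # Hypothetical post-intervention fatigue scores (not data from any trial).
    g, lower, upper = hedges_g(4.0, 2.0, 20, 5.0, 2.0, 20)
    print(f"g = {g:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With small groups such as these, the interval is wide and crosses zero even though the point estimate is of similar size to the pooled effect, which is one reason individual trials in this field are difficult to interpret in isolation.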

(b) Monitoring of fatigue
Although it was not the primary aim of the study, we did pre-specify our interest in the monitoring of fatigue in included studies. We have recently reported that some people with post-cancer fatigue may experience post-exertional symptom exacerbation (Twomey et al., 2020). Almost half of the included studies did not report any method for monitoring fatigue during the intervention. We acknowledge that these studies may have included monitoring without reporting it, and that some studies instead reported the number of participants who deteriorated from pre- to post-intervention alongside the number of participants who improved (Table 1). One study did advise that some participants will not experience improvements in their fatigue with an exercise intervention (based on the findings therein) (Rogers et al., 2014). Monitoring fatigue and adjusting exercise dose/intensity to avoid symptom exacerbation is important because not doing so may increase attrition and/or lead to deflated effect sizes. We suggested that a 0-10 NRS is well suited to frequent monitoring of momentary fatigue for regular review by an exercise professional, and this could be combined with a patient diary as used in some of the included studies. In at least one study (Prinsen et al., 2013), the effects of the intervention were not mediated by a change in physical activity. Thirteen studies reported raw means and standard deviations for participants with complete data (post-intervention). Only three studies included missing values carried forward (i.e. used an intention-to-treat analysis, which avoids overoptimistic estimates of the benefits of an intervention), and this is reflected in the sample sizes.

(c) Limitations
We acknowledge that although fatigue at baseline is rarely an eligibility criterion in studies on physical activity for CRF, this is not synonymous with fatigue being rarely present at baseline. Most studies that explicitly focus on CRF without an eligibility criterion for fatigue likely include a subset of participants with moderate-severe CRF. Arguably, the people most in need are those with advanced cancer with a high symptom burden. Although not a limitation per se, in our screening, we noticed that there are very few studies on physical activity for CRF in people living with end-stage cancer. In this population, supportive care interventions and trials are challenging, and at least in the short term, an eligibility criterion for CRF may be overly restrictive.
Recommendations for research on physical activity for post-cancer fatigue
There are published guidelines for the assessment of CRF in clinical practice that may be useful to researchers (Fabi et al., 2020;Howell et al., 2013;Pearson et al., 2016). In addition to prospective trial registration and transparent reporting practices, our synthesis of the evidence on physical activity resulted in several specific recommendations for designing future trials on post-cancer fatigue:
• Fatigue severity should be clinically relevant. This could be identified using a validated cut-point on a questionnaire with known psychometric properties, e.g. <34 (Belle et al., 2005) on the FACIT-F (Maqbali et al., 2019;Yellen et al., 1997).
• The time since fatigue onset that is used to define 'post-cancer fatigue' or 'chronic CRF' should be specified and justified.
• Participants should be excluded based on factors other than cancer/cancer therapy that may be the cause of fatigue (e.g. psychiatric disorders).
• The primary outcome should be a questionnaire that is validated for the measurement of CRF.
• The validated MCID or the change score of interest must be pre-specified, and the number of participants improving or deteriorating must be reported using this score.
• Fatigue should be monitored during the intervention, and data on symptom exacerbation must be reported.
• The intervention should be tailored and guided by an exercise professional to monitor fidelity, support adherence and reduce attrition.
• A complete description of the intervention in line with international reporting guidelines (Hoffmann et al., 2014;Slade et al., 2016) must be included so other researchers can replicate the intervention and build on research findings.
• The protocol and all findings must be transparently reported (using supplementary files or archives such as the Open Science Framework as needed) (Altman & Moher, 2014).
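The recommendation to report the number of participants improving or deteriorating against a pre-specified change score could be implemented along the following lines. This is a minimal Python sketch under stated assumptions: the function name `classify_changes`, the `higher_is_better` flag, and the example scores and threshold are all hypothetical, and the threshold passed as `mcid` would need to be the validated value for whichever questionnaire a trial uses.

```python
def classify_changes(pre, post, mcid, higher_is_better=True):
    """Count participants who improved, deteriorated or stayed unchanged.

    A participant is counted as improved (or deteriorated) when the change
    from pre to post meets or exceeds the pre-specified `mcid` in the
    favourable (or unfavourable) direction. Names are illustrative only.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one score per participant")
    if mcid <= 0:
        raise ValueError("mcid must be a positive change score")

    counts = {"improved": 0, "unchanged": 0, "deteriorated": 0}
    for before, after in zip(pre, post):
        change = after - before
        if not higher_is_better:
            # Flip the sign so that a positive change always means improvement.
            change = -change
        if change >= mcid:
            counts["improved"] += 1
        elif change <= -mcid:
            counts["deteriorated"] += 1
        else:
            counts["unchanged"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical scores on a scale where higher means less fatigue.
    pre_scores = [20, 30, 25, 40]
    post_scores = [26, 30, 21, 41]
    print(classify_changes(pre_scores, post_scores, mcid=3))
```

Reporting all three counts, rather than only the group mean change, makes any symptom exacerbation visible alongside the average benefit.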

(d) Conclusion
We estimate that less than 10% of the randomized trials of physical activity for CRF after cancer treatment include fatigue as an eligibility criterion. Based on limited data and substantial heterogeneity between trials, the benefit of physical activity for post-cancer fatigue is modest and variable. Additional transparently reported randomized clinical trials are needed to better understand the benefits of physical activity for post-cancer fatigue.

(e) Conflict of Interest
The authors have no conflicts of interest to declare.

(f) Funding
This study was not funded. However, RT was supported by the O'Brien Institute of Public Health and Ohlson Research Initiative at the University of Calgary during the conduct of this review.

(g) Acknowledgments
LS is the Canada Research Chair in Pediatric Oncology Supportive Care. The authors would like to thank Dr. Ian Lahart for helpful discussions and comments on the first version of the preprint.