Consider chiral fermions $\psi$ interacting with gauge fields $A_{\mu,L/R}$. With $P_{L/R} \equiv \frac{1\mp\gamma_{5}}{2}$ and $t_{a,L/R}$ denoting the generators, the corresponding action reads
$$
S = \int d^{4}x\bar{\psi}i\gamma_{\mu}D^{\mu}\psi, \quad D_{\mu} = \partial_{\mu} - iA_{\mu,L}^{a}t_{a,L}P_{L} - i\gamma_{5}A_{\mu,R}^{a}t_{a,R}P_{R}
$$
To check for the presence of the anomaly $\text{A}(x)$ in the conservation law for the current
$$
J^{\mu}_{L/R,c} \equiv \bar{\psi}\gamma^{\mu}t_{c,L/R}P_{L/R}\psi,
$$
we have to calculate the VEV of its covariant divergence:
$$
\tag 1 \langle (D_{\mu}J^{\mu}_{L/R}(x))_{a}\rangle_{A_{L/R}} \equiv \langle \partial_{\mu}J^{\mu}_{L/R,a} +f_{abc}A^{L/R}_{\mu,b}J^{\mu}_{L/R,c}\rangle_{A_{L/R}} \equiv \text{A}^{L/R}_{a}(x),
$$
where $f_{abc}$ are the structure constants.
Let's study the one-loop contributions to $(1)$ (there are no higher-loop contributions, as established by Adler and Bardeen). In general, we have to study triangle diagrams, box diagrams, pentagon diagrams and so on, arising from the quantum effective action $\Gamma$. From dimensional analysis of the corresponding integrals we conclude that the three-point vertex
$$
\Gamma_{\mu\nu\alpha}^{abc}(x,y,z) \equiv \frac{\delta^{3} \Gamma}{\delta A^{\mu}_{a}(x)\delta A^{\nu}_{b}(y)\delta A^{\alpha}_{c}(z)},
$$
which generates the triangle diagram, is linearly divergent, the four-point vertex $\Gamma_{\mu\nu\alpha\beta}^{abcd}(x,y,z,t)$ is logarithmically divergent, the five-point vertex $\Gamma_{\mu\nu\alpha\beta\gamma}^{abcde}(x,y,z,t,p)$ is convergent, and so on.
Unlike the abelian case, where only the triangle diagram contributes to the anomaly, here more diagrams contribute. More precisely, we know that a non-zero anomaly in the triangle diagram requires a non-zero coefficient
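The power counting behind these statements can be sketched as follows, assuming each fermion propagator scales as $1/k$ at large loop momentum $k$ and the gauge vertices carry no momentum dependence (the symbol $\omega_{n}$ for the superficial degree of divergence is introduced here only for illustration). For a one-loop diagram with $n$ external gauge fields, the measure $d^{4}k$ and $n$ propagators give
$$
\omega_{n} = 4 - n \quad\Rightarrow\quad \omega_{3} = 1,\quad \omega_{4} = 0,\quad \omega_{5} = -1,
$$
i.e. a linear divergence, a logarithmic divergence, and superficial convergence, respectively.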
$$
D_{abc}^{L/R} \equiv \text{tr}[t_{a}\{t_{b},t_{c}\}]_{L/R}
$$
The box diagram (once Bose symmetry is imposed) is proportional to
$$
\tag 2 D_{abcd}^{L/R} \equiv \text{tr}[t_{a}\{t_{b},[t_{c},t_{d}]\}] = if_{cde}D^{L/R}_{abe},
$$
while the pentagon diagram is proportional to (the subscript $[\,]$ denotes antisymmetrization)
$$
\tag 3 D_{abcde}^{L/R} \equiv \text{tr}[t_{a}t_{[b}t_{c}t_{d}t_{e]}] \sim f_{r[bc}f_{de]s}D^{L/R}_{ars}
$$
Therefore, it seems that they also contribute to the anomaly.
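As a quick check of the group-theory factors above, here is a sketch assuming the normalization $[t_{c},t_{d}] = if_{cde}t_{e}$ and unit-weight antisymmetrization (both conventions are assumptions; other choices only change the overall factors). For the box diagram,
$$
\text{tr}[t_{a}\{t_{b},[t_{c},t_{d}]\}] = if_{cde}\,\text{tr}[t_{a}\{t_{b},t_{e}\}] = if_{cde}D_{abe}.
$$
For the pentagon diagram, antisymmetrizing within the pairs $(bc)$ and $(de)$ gives $t_{[b}t_{c]} = \frac{i}{2}f_{bcr}t_{r}$. The combination $f_{r[bc}f_{de]s}$ is symmetric under $r\leftrightarrow s$ (exchanging the two pairs is an even permutation), so only $\frac{1}{2}\{t_{r},t_{s}\}$ survives:
$$
\text{tr}[t_{a}t_{[b}t_{c}t_{d}t_{e]}] = -\frac{1}{4}f_{r[bc}f_{de]s}\,\text{tr}[t_{a}t_{r}t_{s}] = -\frac{1}{8}f_{r[bc}f_{de]s}D_{ars}.
$$
Both factors are therefore proportional to the triangle coefficient and vanish whenever it does.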
I have two questions.
1) The chiral anomaly arises because it is impossible to define a local (polynomial in momenta) action functional generating a counterterm that cancels the gauge-invariance-breaking corrections to the n-point vertices. The triangle diagram is linearly divergent, and using Bose symmetry it can be shown that only a non-local action can generate the anomaly in the limit of small momenta. In this spirit, we can cancel the box and pentagon contributions (which diverge at most logarithmically) by adding local counterterms, so I don't understand why they contribute to the anomaly $(1)$.
2) If there is a reason why they can't be cancelled by adding a counterterm, what about hexagon diagrams and so on? Why do they vanish? Because of something like the Jacobi identity for the structure constants?
An edit
It seems that the answer is that these diagrams do contribute to the anomaly $(1)$, but not because of $(2)$, $(3)$ (the latter just show that the anomalous contributions of the box and pentagon diagrams vanish if there is no triangle anomaly). The reason they contribute lies in the structure of the anomalous Ward identities.
Suppose we're dealing with the consistent anomaly. Then, by definition (I've omitted the subscript $L/R$), we have
$$
-\text{A}_{a}(x) = \delta_{\epsilon_{a}(x)}\Gamma \equiv \partial_{\mu}^{x}\frac{\delta\Gamma}{\delta A_{\mu,a}(x)} + f_{abc}A_{\mu,b}(x)\frac{\delta \Gamma}{\delta A_{\mu,c}(x)}
$$
The Ward identities for the $n$-point vertex are obtained by taking $n-1$ functional derivatives with respect to $A_{\mu_{i},a_{i}}$ and setting $A_{\mu_{i},a_{i}}$ to zero. It can be shown that the Ward identity for the derivative of the 4-point vertex (which is logarithmically divergent) contains 3-point vertex functions, which are anomalous. Therefore the 4-point vertex also contributes to the anomaly (not by itself, since it is only logarithmically divergent, but through the linearly divergent 3-point vertex).
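To illustrate, here is a sketch of the resulting 4-point identity, obtained by applying three functional derivatives to the consistent-anomaly equation above and then setting $A=0$. Index placement is schematic, and $\delta(x-y)$ denotes the four-dimensional delta function; these are notational assumptions rather than part of the original statement.
$$
\partial^{\mu}_{x}\Gamma^{abcd}_{\mu\nu\alpha\beta}(x,y,z,t) + f_{abe}\,\delta(x-y)\,\Gamma^{ecd}_{\nu\alpha\beta}(x,z,t) + f_{ace}\,\delta(x-z)\,\Gamma^{ebd}_{\alpha\nu\beta}(x,y,t) + f_{ade}\,\delta(x-t)\,\Gamma^{ebc}_{\beta\nu\alpha}(x,y,z) = -\frac{\delta^{3}\text{A}_{a}(x)}{\delta A^{\nu}_{b}(y)\delta A^{\alpha}_{c}(z)\delta A^{\beta}_{d}(t)}\bigg|_{A=0}
$$
The 3-point vertices enter through the term $f_{abc}A_{\mu,b}\,\delta\Gamma/\delta A_{\mu,c}$, which is how the triangle anomaly feeds into the 4-point identity.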
What about the 5-point vertex? The Ward identity for its derivative contains only the 4-point function, so at first sight it seems that it doesn't contribute to the anomaly. However, this is not true in particular cases. Indeed, if one of the currents $\text{J}_{\mu}^{a}$ running in the loop is a global one, we can preserve gauge invariance by shifting the anomaly into the $\text{J}^{a}_{\mu}$ conservation law. This is realized in particular by modifying the 4-point vertex (not its derivative!) by an anomalous polynomial. Therefore the Ward identity for the 5-point vertex becomes anomalous. However, even in this case this vertex may give no contribution to the anomaly (for example, when the global current is abelian); in this case the $A^{4}$ term in the anomaly vanishes identically by group-theoretic arguments, because of the Jacobi identities.
This also illustrates why there are no anomalous contributions from the derivatives of hexagon diagrams and higher.