I have a 16-node Cassandra 2.1.11 cluster, divided into 2 racks (R0 and R1), 8 nodes per rack. Each node serves about 700 GB of data, and the cluster looks pretty balanced. Each node has 2 × 3 TB HDDs.
Datacenter: DC0
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.21.72 677.37 GB 256 ? 0260665f-5f5c-4fbc-9583-8da86848713a R0
UN 192.168.21.73 658.8 GB 256 ? ed1e7814-f715-41c8-97c9-8b52164835f9 R1
UN 192.168.21.74 676.97 GB 256 ? 62833182-b339-46f2-9370-0e23f6bb7eab R1
UN 192.168.21.75 657.1 GB 256 ? ab31f28b-ffea-489f-a4da-3b5120760b8e R1
UN 192.168.21.76 690.29 GB 256 ? e636bf2e-89d6-4bf3-9263-cf1ed67fcbd9 R1
UN 192.168.21.77 679.77 GB 256 ? 959e5207-1251-4c58-afa9-3910b5a27ff5 R1
UN 192.168.21.78 648.85 GB 256 ? 6f650315-1cd1-4169-b300-391800be974f R1
UN 192.168.21.79 675.96 GB 256 ? 324bd475-b5f6-4b39-a753-0cd2c80a46c4 R1
UN 192.168.21.65 636.01 GB 256 ? 65e3faa1-e8d5-4d78-a87e-bfde1f4095a5 R0
UN 192.168.21.66 674.89 GB 256 ? 213696eb-c4a0-4803-a9b3-0efd04c567f2 R0
UN 192.168.21.67 716.77 GB 256 ? 62542a8e-8177-4f13-9077-ea2426607ace R0
UN 192.168.21.68 666.1 GB 256 ? a9864059-3de2-48a2-a926-00db3f9791ee R0
UN 192.168.21.69 691.9 GB 256 ? 02ea1b28-90f9-4837-8173-ff79fa6966d7 R0
UN 192.168.21.70 681.16 GB 256 ? a9c8adae-e54e-4e8e-a333-eb9b2b52bfed R0
UN 192.168.21.71 653.18 GB 256 ? 6aa8cf0c-069a-4049-824a-8359d1c58e59 R0
UN 192.168.21.80 694.14 GB 256 ? 7abb5609-7dca-465a-a68c-972e54469ad6 R1
Now I'm trying to expand the cluster by adding 16 more nodes, also divided into 2 racks (R2 and R3). After adding all the new nodes I expect a 32-node cluster across 4 racks, with about 350 GB of data on each node.
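The 350 GB expectation is simple arithmetic over the figures above (a quick sketch; 700 GB is the rough per-node average from the nodetool output):

```python
# Rough capacity math for the expansion (illustrative only).
nodes_before = 16
avg_load_gb = 700                        # approximate current load per node
total_gb = nodes_before * avg_load_gb    # total data in the cluster

nodes_after = 32                         # after adding 16 new nodes
expected_per_node_gb = total_gb / nodes_after
print(total_gb, expected_per_node_gb)    # 11200 350.0
```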
I'm adding one node at a time, following the Cassandra documentation. I started the Cassandra process on the first new node with the same configuration as the existing nodes, but assigned to the new rack R3. This triggered 16 streams from the existing nodes to the newly added node, about 250 GB each. All the data transferred successfully, and up to that point the process looked normal.
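For context, the only rack-related difference in the new node's configuration is its rack assignment. Assuming the cluster uses GossipingPropertyFileSnitch (an assumption; the snitch isn't shown above), that would look like:

```
# conf/cassandra-rackdc.properties on the new node
# (hypothetical example -- the actual snitch configuration isn't shown above)
dc=DC0
rack=R3
```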
But after that data landed on the new node, its load as shown by nodetool status started to increase; it now reads 1.69 TB and keeps growing:
UJ 192.168.21.89 1.69 TB 256 ? 42a80db9-59d6-44b6-b79c-ac7819f69cee R3
This is the opposite of what I expected (350 GB per node, not 1.7 TB). The Cassandra data directory already occupies 4 TB of the 6 TB of disk space on the new node.
Thinking this wasn't normal, I stopped the process.
Now I'm wondering what I'm doing wrong, and how I should add the 16 nodes properly so that I end up with 32 nodes holding about 350 GB each. Should I expand the existing racks instead of adding new ones? Should I calculate tokens for the new nodes manually? Any other options?